Pitfalls in uncertainty estimation: few measurements, non-Gaussian distributions

In metrology practice, the calculation of an uncertainty budget may include components based on very few measurement data. The coverage factor applied to the combined uncertainty at a given level of confidence accounts for this lack of information. We propose to apply a similar factor to the individual components of an uncertainty budget when their value is calculated from a limited number of measurements. These factors have the same function as the coverage factor defined in the GUM, but at the level of the individual component of the total uncertainty. The goal of this work is to provide values of these factors for practical use. Inspired by Supplement 1 of the GUM, we use Monte Carlo simulations to calculate the component coverage factors for repeated measurements from non-normal distributions. We apply the algorithm to the most common distributions in metrology, for the 95 % and 99 % levels of confidence: rectangular, arc sine, triangular and trapezoidal.


Introduction
In metrology practice, it is common to see a type A uncertainty component calculated from only 3 measurements. This can lead to an underestimation of the global uncertainty. As highlighted in Annex E of the GUM [1], for only 3 observations of a normally distributed variable, the experimental standard deviation of the mean can be underestimated by about 50 %.
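The order of magnitude quoted above can be checked with a short calculation. The sketch below assumes normally distributed, independent observations and evaluates the relative standard deviation of the experimental standard deviation, std(s)/E[s], from the closed-form expression involving the gamma function. The function name `relative_sd_of_s` is illustrative and does not come from the GUM.

```python
import math

def relative_sd_of_s(n: int) -> float:
    """Relative standard deviation std(s)/E[s] of the experimental standard
    deviation s, for n independent observations of a normal variable."""
    nu = n - 1  # degrees of freedom
    # c4 = E[s] / sigma for a normal sample of size n
    c4 = math.sqrt(2.0 / nu) * math.gamma(n / 2.0) / math.gamma(nu / 2.0)
    # Var[s] = sigma^2 * (1 - c4^2), hence std(s)/E[s] = sqrt(1 - c4^2) / c4
    return math.sqrt(1.0 - c4 * c4) / c4

for n in (2, 3, 5, 10, 30):
    print(f"n = {n:2d}: {100 * relative_sd_of_s(n):.0f} %")
```

For n = 3 the ratio is about 52 %, and it only falls to about 13 % for n = 30, which is consistent with the statement that few observations give a poorly known standard deviation.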
The concept of applying a coverage factor to a measurement uncertainty when reporting measurement results is widely accepted. The GUM recommends the use of a coverage factor in order to determine an expanded uncertainty, which encompasses a given percentage of the probable values of the measurand. We propose to apply a similar factor to the individual components of the uncertainty budget when their value is calculated from a limited number of measurements. These factors have the same function as the coverage factor defined in the GUM, but at the level of the individual component of the total uncertainty; we call them component coverage factors. The goal of the present work is to provide values of these factors for different types of distribution, for practical use.
One of the conditions for applying the GUM uncertainty framework is that the distribution of the measurand can adequately be approximated by a Gaussian distribution. For non-Gaussian distributions, it is always possible to use the method proposed in Supplement 1 of the GUM, which gives as output the probability density function of the measurand and, as a consequence, the coverage probability of an interval, once the probability density functions of the input quantities are known [2,3]. However, this possibility is seldom used in practice.
Inspired by Supplement 1 of the GUM, we use Monte Carlo simulations to calculate the component coverage factors for repeated measurements from non-normal distributions. An algorithm has been developed and applied to the most common distributions in metrology, for the 95 % and 99 % levels of confidence: rectangular, arc sine, triangular and trapezoidal distributions. Tables of component coverage factors depending on the assumed probability distribution and the number of repeated measurements are given in section 3. For a large number of repetitions (number of measurements n > 30), the differences due to the non-normal distribution are negligible, but for small n the use of Student-based coverage factors clearly underestimates the uncertainty. To overcome this difficulty, we propose a numerical method based on Monte Carlo simulation.
According to the GUM, the coverage factor is defined as the factor by which the combined standard uncertainty, $u_c(x)$, is multiplied in order to obtain the expanded uncertainty U, on the basis of a level of confidence. In this definition, the coverage factor multiplies the combined standard uncertainty, which is obtained by combining the individual components. Assuming that the number of individual components is large and that there is no dominant component, either of type A based on few observations or of type B based on a rectangular distribution, the Central Limit Theorem justifies the use of Student's t-distribution to calculate coverage factors (GUM G.2.3, [7]). When the Central Limit Theorem is not applicable, the use of a Student-based coverage factor will lead to an underestimation of the uncertainty, and a dedicated calculation as presented in this paper is needed.
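For reference, the relations described in this paragraph can be written compactly. The notation below is an assumption made for illustration: M uncorrelated components $u_i(x)$, all expressed in the unit of the measurand, and a coverage factor k chosen for the level of confidence P.

```latex
u_c^2(x) = \sum_{i=1}^{M} u_i^2(x), \qquad U = k(P)\, u_c(x)
```

With many components of comparable size, no single $u_i(x)$ determines the shape of the resulting distribution, which is the situation in which a Student-based k is justified.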

Algorithm
Let us consider any probability distribution of a random variable. Supplement 1 of the GUM gives recommendations on how to sample from the distributions of interest to this work [8]. If we take a sample from this distribution by making a limited number n of repeated measurements, the best estimate of the measurand is the arithmetic mean of the observations, $\bar{x}$, and the experimental standard deviation of this estimate is the standard deviation of the mean, $s(\bar{x})$.
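As an illustration of this first step, the sketch below draws observations from the four distributions considered here and computes $\bar{x}$ and $s(\bar{x})$ for one sample. The samplers are written from the standard constructions (sine of a uniform phase, sums of uniform numbers) and are meant to follow the spirit of Supplement 1; the function names, the limits a and b, and the shape parameter `beta` (ratio of the top width to the base width of the trapezoid) are assumptions made for this example.

```python
import math
import random

def rectangular(a, b):
    # Uniform on [a, b]
    return a + (b - a) * random.random()

def arc_sine(a, b):
    # U-shaped distribution on [a, b]: sine of a uniformly distributed phase
    return (a + b) / 2 + (b - a) / 2 * math.sin(2 * math.pi * random.random())

def triangular(a, b):
    # Symmetric triangle on [a, b]: scaled sum of two uniform numbers
    return a + (b - a) / 2 * (random.random() + random.random())

def trapezoidal(a, b, beta):
    # Symmetric trapezoid on [a, b]; beta = top width / base width (0 <= beta <= 1)
    r1, r2 = random.random(), random.random()
    return a + (b - a) / 2 * ((1 + beta) * r1 + (1 - beta) * r2)

def mean_and_sdom(sample):
    """Arithmetic mean and experimental standard deviation of the mean."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean, math.sqrt(variance / n)

random.seed(1)
sample = [arc_sine(0.0, 1.0) for _ in range(3)]  # n = 3 repeated measurements
x_bar, s_x_bar = mean_and_sdom(sample)
print(x_bar, s_x_bar)
```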
In the proposed Monte Carlo approach, this operation is repeated N times, such that there are N samples of n observations, n being the number of repeated measurements. For each of the N samples, the mean $\bar{x}_i$ and the associated standard deviation of the mean $s(\bar{x}_i)$ are calculated. We obtain N values of $\bar{x}_i$ and $s(\bar{x}_i)$: $\bar{x}_1, \dots, \bar{x}_N$ and $s(\bar{x}_1), \dots, s(\bar{x}_N)$.
The second step consists in applying the concept of coverage factor as defined in the GUM and explained previously. The component coverage factor $k_c$ is defined as follows: if the best estimate of the value attributable to the measurand is x, with an associated uncertainty u(x), there is a probability P that the mean of repeated measurement values will be in the interval $[x - U, x + U]$, where $U = k_c(P)\,u(x)$, the value of $k_c$ depending on the probability P.
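In the present method, the best estimate of the measurand is the mean of the N iterations: $\bar{\bar{x}} = \frac{1}{N}\sum_{i=1}^{N}\bar{x}_i$. Therefore, $k_c$(99 %) ($k_c$(95 %)) is determined numerically such that 99 % (95 %) of the calculated $\bar{x}_i$ are in the interval $[\bar{\bar{x}} - k_c\,s(\bar{x}_i),\ \bar{\bar{x}} + k_c\,s(\bar{x}_i)]$.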
The Monte Carlo method is valid if the number of iterations N is sufficiently large. The minimum N was selected based on the asymptotic behavior of the level of confidence of the interval. The results of the calculations presented here are obtained with $N = 10^6$. Note that the results of the algorithm have been verified to be independent of the parameters of the distribution.
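The two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the tables: the interval is assumed to be built with the standard deviation of the mean of each sample, $k_c$ is read directly as an empirical quantile of $|\bar{x}_i - \bar{\bar{x}}| / s(\bar{x}_i)$ rather than found by iteration, and N is reduced to 20 000 to keep the run short. The function name `component_coverage_factor` and its parameters are hypothetical.

```python
import math
import random

def component_coverage_factor(draw, n, p, n_trials=20_000, seed=0):
    """Monte Carlo estimate of k_c for samples of n observations.

    draw(rng) returns one observation; p is the level of confidence (e.g. 0.95).
    """
    rng = random.Random(seed)
    means, sdoms = [], []
    for _ in range(n_trials):
        sample = [draw(rng) for _ in range(n)]
        mean = sum(sample) / n
        variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
        means.append(mean)
        sdoms.append(math.sqrt(variance / n))  # standard deviation of the mean

    grand_mean = sum(means) / n_trials  # best estimate of the measurand

    # For each sample, the smallest k for which the interval
    # grand_mean +/- k * s(mean_i) contains mean_i.
    ratios = sorted(
        abs(m - grand_mean) / s if s > 0 else math.inf
        for m, s in zip(means, sdoms)
    )
    # k_c is the value that covers a fraction p of the samples.
    index = min(n_trials - 1, math.ceil(p * n_trials) - 1)
    return ratios[index]

# Check on a normal distribution with n = 5: the result should be close to
# Student's t for 4 degrees of freedom (2.776 at 95 %).
k95 = component_coverage_factor(lambda rng: rng.gauss(0.0, 1.0), n=5, p=0.95)
print(k95)
```

The normal case is a convenient sanity check because the expected answer is known; for other distributions, `draw` is replaced by the corresponding sampler.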

Examples
The proposed algorithm can be run for classical distributions but also for any distribution that can be modeled with Monte Carlo. Unlike Student-based coverage factors, the Monte Carlo based method is not limited to symmetric distributions.
The special case of an exponential distribution for which only 2 observations (n = 2) are available is taken as an example: a variable X is assigned an exponential distribution, and only 2 measurement points are available. The algorithm is divided in 2 parts. The first part consists in taking N samples of 2 random observations from the distribution. For each of these N trials, the mean $\bar{x}_i$ and the standard deviation of the mean $s(\bar{x}_i)$ are calculated.
$\{x_{1,1}, x_{1,2}\} \rightarrow \bar{x}_1,\ s(\bar{x}_1)$
$\{x_{2,1}, x_{2,2}\} \rightarrow \bar{x}_2,\ s(\bar{x}_2)$
…
$\{x_{N,1}, x_{N,2}\} \rightarrow \bar{x}_N,\ s(\bar{x}_N)$

Figure 1 illustrates the distribution of the $\bar{x}_i$. The mean of the $\bar{x}_i$, $\bar{\bar{x}}$, is the best estimate of the expected value of the distribution. In the present case of an exponential distribution of expected value µ equal to 0.5, $\bar{\bar{x}}$ = 0.4495 for $10^6$ iterations. The Monte Carlo method gives a numerical representation of the distribution of the mean of the 2 observations. This distribution is not symmetric in the case of an exponential distribution, such that the interval with a given level of confidence is not centered on the best estimate of X. Furthermore, when an interval is given, a coverage factor is theoretically not required. However, from the practical point of view of a user familiar with the GUM coverage factor method, we interpret the level of confidence of that interval in terms of a component coverage factor. The latter is calculated as follows: $k_c$ values are tested until the interval $[\bar{\bar{x}} - k_c\,s(\bar{x}_i),\ \bar{\bar{x}} + k_c\,s(\bar{x}_i)]$ has a coverage probability of 99 % (95 %) of the $\bar{x}_i$. This is done via an iterative process, using linear interpolation, starting from the Student's coefficient. Generally, 3 or 4 iterations are sufficient. In the present case, a value of 124 (24.3) is found for a level of confidence of 99 % (95 %) (see Table 1 and Table 2 in section 3).
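The exponential example can be reproduced approximately with the sketch below. Several choices are assumptions made for illustration: the interval uses the standard deviation of the mean of each sample, N is reduced to 200 000, and the search for $k_c$ uses bracket doubling followed by bisection, which stands in for the linear-interpolation iteration mentioned in the text. The starting values are the two-sided Student's coefficients for 1 degree of freedom.

```python
import bisect
import math
import random

MU = 0.5             # expected value of the exponential distribution
N_TRIALS = 200_000   # fewer than the 10^6 iterations of the text, for speed
T_STUDENT = {0.95: 12.706, 0.99: 63.657}  # Student's t, 1 degree of freedom

rng = random.Random(12345)
means, sdoms = [], []
for _ in range(N_TRIALS):
    x1 = rng.expovariate(1.0 / MU)   # expovariate takes the rate 1/mean
    x2 = rng.expovariate(1.0 / MU)
    means.append((x1 + x2) / 2)
    sdoms.append(abs(x1 - x2) / 2)   # standard deviation of the mean for n = 2

grand_mean = sum(means) / N_TRIALS
ratios = sorted(
    abs(m - grand_mean) / s if s > 0 else math.inf
    for m, s in zip(means, sdoms)
)

def coverage(k):
    """Fraction of samples whose mean lies within grand_mean +/- k * s(mean_i)."""
    return bisect.bisect_right(ratios, k) / N_TRIALS

def find_k(p, tol=1e-3):
    """Search upward from Student's coefficient for the k reaching coverage p."""
    lo = hi = T_STUDENT[p]
    while coverage(hi) < p:          # enlarge the bracket until p is reached
        lo, hi = hi, 2 * hi
    while hi - lo > tol:             # bisection inside the bracket
        mid = (lo + hi) / 2
        if coverage(mid) < p:
            lo = mid
        else:
            hi = mid
    return hi

k95, k99 = find_k(0.95), find_k(0.99)
print(k95, k99)
```

With this reduced N the estimates fluctuate from one seed to another, more strongly at 99 % than at 95 %, so they should only be compared with the tabulated values to within the Monte Carlo scatter.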
Another example is treated with an arc sine distribution, again for 2 observations. Figure 2 shows the distribution of the means of 2 random observations. A component coverage factor of 263 (37) is found for a level of confidence of 99 % (95 %).
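For n = 2 the coverage condition takes a particularly simple form, which may help to see why such large factors arise. The relation below assumes that the interval is built with the standard deviation of the mean of each sample; $x_1$ and $x_2$ denote the two observations of a sample.

```latex
s(\bar{x}) = \frac{|x_1 - x_2|}{2}, \qquad
|\bar{x} - \bar{\bar{x}}| \le k_c\, s(\bar{x})
\;\Longleftrightarrow\;
|x_1 + x_2 - 2\bar{\bar{x}}| \le k_c\, |x_1 - x_2|
```

The condition is hard to satisfy when the two observations are close to each other but far from $\bar{\bar{x}}$. For an arc sine distribution, whose probability is concentrated near the two limits, this situation is comparatively frequent, which is a plausible reading of the large values found.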
The results of the algorithm have been tested to be independent of the parameters assigned to the distribution.

Component coverage factors for non-normal distributions
The component coverage factors have been calculated for the normal, rectangular, arc sine, triangular, trapezoidal and exponential distributions. The results of the calculations are reported in Table 1 for the 99 % confidence interval and in Table 2 for the 95 % confidence interval. As expected, in the case of a normal distribution, the algorithm gives rise to the same value as the Student's factor. For the other distributions, the component coverage factors for a small number of observations are significantly larger than for the Gaussian one. Hence, the systematic use of Student's factors, whatever the kind of distribution, will lead to an underestimate of the uncertainty.

Conclusions
In the present contribution, we propose a numerical method, based on Monte Carlo simulation, for the calculation of component coverage factors for non-Gaussian distributions and a limited number of repeated measurements. The coverage factors for the 95 % and 99 % levels of confidence have been calculated as a function of the number of observations for rectangular, arc sine, triangular, trapezoidal and exponential distributions. The calculated component coverage factors for a small number of observations are significantly larger than the Gaussian ones. This difference decreases when the number of observations increases. Hence, it is proposed to use these coefficients rather than the usual Student's factors for the contribution of non-Gaussian distributed uncertainty components to a global uncertainty budget. The proposed component coverage factors do not rule out the use of values based on Student's distribution for an uncertainty budget in which the requirements for the application of the Central Limit Theorem are met and the distribution approaches a normal distribution.

Figure 1: Distribution of the mean of 2 observations from an exponential distribution, built with Monte Carlo for µ=0.5

Figure 2: Distribution of the mean of 2 observations from an arc sine distribution, built with Monte Carlo for µ=0.5

Table 1: Coverage factor for different distributions, for 99 % level of confidence

Table 2: Coverage factor for different distributions, for 95 % level of confidence