Bayesian approach to uncertainty evaluation: is it always working?

Since the GUM was published, measurement uncertainty has been defined in terms of the standard deviation of the probability distribution of the values that can reasonably be attributed to the measurand, and it has been evaluated using statistical or probabilistic methods. Metrologists have long debated whether a frequentist or a Bayesian approach should be followed to evaluate uncertainty. The Bayesian approach, based on the available a priori knowledge about the measurand, seems to prevail nowadays. This paper starts from the consideration that the Bayesian approach is based on the well-known Bayes' theorem which, like all mathematical theorems, is valid only to the extent that the assumptions made to prove it are valid. The main question, when following the Bayesian approach, is hence whether these assumptions are satisfied in practical cases, especially when the a priori information is combined with the information coming from the measurement data to refine the uncertainty evaluation. This paper considers some case studies to analyze when the Bayesian approach can be usefully and reliably employed, by discussing the amount and pertinence of the available a priori knowledge.


Introduction
A general consensus has grown in the scientific and technical community that a measurement result cannot be expressed by a single numerical value and a measurement unit.
According to the GUM [1,2], a measurement result must be represented by a probability density function (pdf), which represents the distribution of the values that can be reasonably attributed to the measurand. Then, from this pdf, it is possible to retrieve a measured value and a standard uncertainty, which are given, respectively, by the mean value and standard deviation of the pdf. Two evaluation methods are distinguished: a type A evaluation of measurement uncertainty, which is generally considered to correspond to the frequentistic approach, and a type B evaluation of measurement uncertainty, which is generally considered to correspond to the Bayesian approach.
It is well known that, in the frequentistic approach, measurement uncertainty is obtained by processing a statistically significant amount of experimental data. On the other hand, in the Bayesian approach, measurement uncertainty is obtained starting from the available a priori knowledge, and the evaluation of the relevance of the available information is left to the scientific judgment of the operator.
These different points of view (Bayesian and frequentistic) have been widely discussed in the literature, and they appear to be incompatible when seen from a purely mathematical and philosophical perspective [3]. However, when a more pragmatic, metrological approach is adopted, devoid of the philosophical disputes of the last three centuries, they appear to be two sides of the same coin [4]: the instrument's manufacturer, who has the complete series of experimental data about the behavior of each manufactured instrument, can process them in a frequentistic way to find the value of measurement uncertainty associated with the measured values; on the other hand, the instrument's user, who cannot directly access the manufacturer's data, is somehow forced to adopt the Bayesian approach [4].
Regardless of the way measurement uncertainty is evaluated, its correctness depends on the correctness of the available information, and on how well it is processed to evaluate the different contributions to uncertainty and propagate them through the measurement process [1,5].
Ensuring that the available information is reliable is critical whenever measurement results are employed as input elements in decision-making processes, such as conformity assessment. This is probably the most frequent use of measurement results, especially in industrial applications, as well as in healthcare, environmental protection and many other important applications.
According to [6], no decision can be taken without knowing the measurement uncertainty associated with the obtained measured value. Therefore, in this respect, it could be thought that the GUM [1] provides the main and most important guidelines to be followed. However, [6] also states that the approach followed by the GUM to assign a pdf to a measured value is only a simplification, which applies when no a priori information about the measurand is available. The more general approach is represented by the application of Bayes' theorem.
It follows that Bayes' theorem is expected to provide a better result than that provided by the commonly employed GUM approach. The aim of this paper is to further discuss this statement and the robustness of the Bayesian approach with respect to the correctness of the a priori assumptions. Simulation and experimental results are presented and analyzed for this purpose.
The next section recalls Bayes' theorem and tries to highlight its advantages and disadvantages when it is applied in metrology.

Bayes' theorem in metrology
Let us consider a random variable $X$, mathematically represented by its probability density function $p_X(x)$, and a random variable $Y$, mathematically represented by its probability density function $p_Y(y)$, where $x$ and $y$ represent single realizations of $X$ and $Y$ respectively.
If the possible mutual dependence of $X$ and $Y$ is also known, it is possible to define the joint probability density function $p_{X,Y}(x, y)$ of the two variables $X$ and $Y$.
In particular, it can be written:

$$p_{X,Y}(x, y) = p_{Y|X}(y|x) \, p_X(x) \tag{1}$$

where $p_{Y|X}(y|x)$ is the conditional probability of random variable $Y$, given $X$. It is obvious that the conditional probability $p_{Y|X}(y|x)$ of event $y$ given $x$ depends on the particular value assumed by the random variable $X$. When the random variables $X$ and $Y$ are independent of each other, the conditional probability $p_{Y|X}(y|x)$ of event $y$ given $x$ does not depend on the value $x$ assumed by $X$. Therefore, it is:

$$p_{Y|X}(y|x) = p_Y(y) \tag{2}$$

Under this assumption, it follows:

$$p_{X,Y}(x, y) = p_X(x) \, p_Y(y) \tag{3}$$

Equation (1) can also be rewritten as:

$$p_{X,Y}(x, y) = p_{X|Y}(x|y) \, p_Y(y) \tag{4}$$

where $p_{X|Y}(x|y)$ is the conditional probability of random variable $X$, given $Y$. $p_{X|Y}(x|y)$ depends on the particular value assumed by random variable $Y$, except when $X$ and $Y$ are independent variables. In this case:

$$p_{X|Y}(x|y) = p_X(x) \tag{5}$$

and (4) becomes the same as (3).
In the more general situation of dependent variables, because of (1) and (4), it is:

$$p_{X|Y}(x|y) \, p_Y(y) = p_{Y|X}(y|x) \, p_X(x) \tag{6}$$

that can be rewritten as:

$$p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x) \, p_X(x)}{p_Y(y)} \tag{7}$$

which is the formulation of Bayes' theorem. From the mathematical point of view, it is also:

$$p_Y(y) = \int p_{Y|X}(y|x) \, p_X(x) \, dx \tag{8}$$

where $p_Y(y)$ is called the marginal probability distribution. Thanks to (8), it follows that $p_{X|Y}(x|y)$ can be obtained by simply knowing $p_X(x)$ and $p_{Y|X}(y|x)$.
Eq. (7) is very important in metrology. Indeed, let us suppose that random variable $X$ represents the measurand and random variable $Y$ represents the measurement result. The obtained distribution of the measured values is $p_{Y|X}(y|x)$, since it is the distribution of the measured values, given the particular measurand's value.
However, the very aim of a measurement process is not to know the distribution of the measured values ($p_{Y|X}(y|x)$), but to know the distribution of the values that can reasonably be attributed to the measurand, given the measured value. In mathematical terms, this means that the aim of a measurement process is to evaluate $p_{X|Y}(x|y)$. This can be obtained by applying Bayes' theorem (7). In fact, according to [6], Bayes' theorem can be effectively employed to obtain the desired distribution if reliable a priori information ($p_X(x)$) is available about the measurand.
In particular, the following interpretation is given to the quantities in (7):
1. $p_X(x)$ is the pdf expressing the a priori information about the measurand;
2. $p_{Y|X}(y|x)$ is the pdf representing the distribution of measured values provided by the measurement process;
3. $p_Y(y)$ is the marginal pdf, given by (8);
4. $p_{X|Y}(x|y)$ is the pdf associated with the measurand, which combines the a priori information with the information provided by the distribution of measured values.
The main advantage in the use of Bayes' theorem in metrology is that it yields the pdf associated to the measurand (posterior), by combining two different kinds of information: the available information about the measurand itself (prior) and the information provided by the measured values.
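As an illustration of Eqs. (7) and (8), the posterior pdf can be evaluated numerically on a grid of candidate measurand values. The following Python sketch is not part of the original study: the function names, the grid, and the numerical values (a normal prior with standard deviation 25 mΩ and a hypothetical instrument with standard deviation 10 mΩ) are assumptions chosen only to make the example self-contained.

```python
import math

def normal_pdf(x, mean, sigma):
    """Normal probability density evaluated at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_on_grid(grid, prior, likelihood, y):
    """Evaluate Eq. (7) on a uniform grid of candidate measurand values.

    prior(x)         -> p_X(x)
    likelihood(y, x) -> p_{Y|X}(y|x)
    Returns p_{X|Y}(x|y) at each grid point.
    """
    dx = grid[1] - grid[0]
    joint = [likelihood(y, x) * prior(x) for x in grid]  # numerator of Eq. (7)
    p_y = sum(joint) * dx                                # Eq. (8), rectangle rule
    return [j / p_y for j in joint]

# Hypothetical numbers, in ohm: prior N(15, 0.025), instrument sigma 0.010,
# one measured value y = 15.02.
grid = [14.85 + i * 1e-4 for i in range(3001)]
dx = grid[1] - grid[0]
post = posterior_on_grid(
    grid,
    prior=lambda x: normal_pdf(x, 15.0, 0.025),
    likelihood=lambda y, x: normal_pdf(y, x, 0.010),
    y=15.02,
)
post_mean = sum(x * p for x, p in zip(grid, post)) * dx
print(f"posterior mean = {post_mean:.4f} ohm")
```

With these assumed values the posterior mean lies between the prior mean and the measured value, closer to the latter because the assumed instrument is more accurate than the prior.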
However, all previous considerations have been derived from a purely mathematical perspective, which does not question the correctness of the considered quantities. In other words, the implicit assumption behind Bayes' theorem is that there is full belief in both the a priori knowledge and the distribution of measured values. This means full knowledge of the possible distribution of the measurand values and no doubt about the correctness of the evaluation of the uncertainty associated with the measured value.
It can be readily perceived that this is not the situation in practical applications. Even if the measurand variability is known because, for instance, the production process is known, the process may always drift from its normal operating conditions. Similarly, the measurement process may deviate from its expected operating conditions, thus making the evaluated uncertainty incorrect. It is then important to understand the consequences of a lack of total trust in the a priori knowledge or in the measured values.
This paper considers, with both simulations and experimental data, the effect of the application of Bayes' theorem in different possible situations:
1. when one can assign total belief to both the a priori knowledge and the measured values;
2. when no total belief can be assigned to the a priori knowledge $p_X(x)$; this means that a deviation might occur in the measurement model, or the measurement model does not represent exactly the measurement process;
3. when no total belief can be assigned to the assumed distribution of measured values; this means that the measured values might deviate from the expected distribution due, for instance, to a drift in the employed measurement instrument.
This paper also considers the risk of taking a wrong decision (false acceptance or false rejection of a product) when the measurement results are employed in conformity assessment [6]. The obtained risk values are compared, under different assumptions:
• when only the distribution of measured values is taken into account, without applying Bayes' theorem to estimate the measurand value;
• when Bayes' theorem is applied, that is, when the a posteriori values are considered as the measurement result.

Simulations
A typical example of conformity assessment is considered to discuss the robustness of Bayes' approach: the resistance of manufactured resistors is measured to check whether it is inside the tolerance limits or not. The a priori information is represented by the expected dispersion of the resistance values of the manufactured resistors due to variations in the production process. A nominal value $R_{nv} = 15$ Ω is assumed for the resistors, and the given production tolerance is ±0.5%, that is ±75 mΩ. The dispersion of values due to the production process is supposed to be represented by a normal probability distribution, where the ±3σ coverage interval is assumed to be the same as the tolerance interval. Therefore, in the considered example, the prior distribution is given by:

$$p_X(x) = \frac{1}{\sigma_{nv}\sqrt{2\pi}} \exp\left(-\frac{(x - R_{nv})^2}{2\sigma_{nv}^2}\right)$$

where $R_{nv} = 15$ Ω and $\sigma_{nv} = 25$ mΩ.
It is then assumed that the resistance is measured by a Fluke 8845A, 6.5 digit precision multimeter. According to the manufacturer specifications, the following applies:
• the measured 15 Ω value falls in the 100 Ω range;
• in the 100 Ω range, an accuracy interval of ±(0.003% of reading + 0.003% of range) is provided, which corresponds to:

$$\pm\left(3 \cdot 10^{-5} \cdot R_m + 3 \cdot 10^{-5} \cdot 100\ \Omega\right)$$

where $R_m$ is the resistor measured value.
The distribution of the possible measured values is assumed to be a normal distribution, with mean value equal to the measured value $R_m$ and standard deviation given by:

$$\sigma_m = \frac{3 \cdot 10^{-5} \cdot R_m + 3 \cdot 10^{-5} \cdot 100\ \Omega}{3}$$

If, for instance, $R_m = R_{nv}$, then $\sigma_m = 1.15$ mΩ. In each simulation, 100000 measured values have been considered, and the following three cases have been analyzed.
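The arithmetic behind the quoted value of 1.15 mΩ can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and dividing the accuracy half-width by 3 reflects the assumption that the manufacturer's accuracy interval corresponds to a ±3σ interval.

```python
def accuracy_half_width(reading_ohm, range_ohm=100.0,
                        pct_reading=0.003, pct_range=0.003):
    """Half-width of the accuracy interval, in ohm:
    +/-(pct_reading % of reading + pct_range % of range)."""
    return (pct_reading * reading_ohm + pct_range * range_ohm) / 100.0

def sigma_m(reading_ohm, coverage=3.0):
    """Standard deviation of the measured values, assuming the accuracy
    interval is a +/-(coverage * sigma) interval."""
    return accuracy_half_width(reading_ohm) / coverage

print(f"{accuracy_half_width(15.0) * 1e3:.2f} mOhm")  # prints 3.45 mOhm
print(f"{sigma_m(15.0) * 1e3:.2f} mOhm")              # prints 1.15 mOhm
```

Most of the interval (3 mΩ out of 3.45 mΩ) comes from the range term, so $\sigma_m$ changes very little across the ±75 mΩ tolerance band.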

Case I: no deviation in the instrument or the process
Case I considers the situation in which neither the instrument nor the process is deviating.
According to the assumptions given above in Sect. 3, a prior normal pdf with mean value $R_{nv}$ and standard deviation $\sigma_{nv}$ is associated with each resistor. Then, since the process is not deviating, the "simulated true values" $R_{STV}$ of the 100000 resistors have been simulated by randomly generating 100000 values from this normal pdf.
On the other hand, since the instrument is not deviating, to simulate the measurement of each resistor ($R_m$), a random value is generated from a normal pdf centered on the corresponding "simulated true value" $R_{STV}$ and with standard deviation $\sigma_m$. Hence, a normal pdf centered at $R_m$ with the corresponding standard deviation $\sigma_m$ is associated with every measured value. Bayes' theorem is then applied, according to the measurement data and the a priori knowledge.
It is well known that the mean value of the posterior pdf, according to the given measurement data $y$ and the a priori knowledge, is given by:

$$\mu_{posterior} = \frac{\sigma_y^2 \, \mu_{a\,priori} + \sigma_{a\,priori}^2 \, y}{\sigma_{a\,priori}^2 + \sigma_y^2} \tag{10}$$

Considering the proposed example, the quantities in (10) take the following values:
• $\mu_{a\,priori} = R_{nv} = 15$ Ω;
• $\sigma_{a\,priori} = \sigma_{nv} = 25$ mΩ;
• $y = R_m$;
• $\sigma_y = \sigma_m$.
Fig. 1 shows the histogram associated with the 100000 obtained a posteriori mean values $\mu_{posterior}$, given by (10). These values represent the a posteriori values associated with the 100000 resistors ($R_{a\,posteriori}$).
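The posterior mean of Eq. (10) is a weighted average of the measured value and the prior mean, which the following minimal Python sketch makes explicit. The function name, the default arguments and the sample reading of 15.05 Ω are illustrative assumptions, not values taken from the study.

```python
def posterior_mean(y, sigma_y, mu_prior=15.0, sigma_prior=0.025):
    """Posterior mean of Eq. (10) for a normal prior and a normal
    distribution of measured values (all quantities in ohm)."""
    # Weight given to the measured value; the prior mean gets 1 - w.
    w = sigma_prior**2 / (sigma_prior**2 + sigma_y**2)
    return w * y + (1.0 - w) * mu_prior

# Hypothetical reading of 15.05 ohm with sigma_y = 1.15 mOhm.
r_post = posterior_mean(15.05, 1.15e-3)
print(f"correction = {(r_post - 15.05) * 1e3:.3f} mOhm")
```

Writing the formula in terms of the weight `w` shows directly that an instrument much more accurate than the prior (small `sigma_y`) leaves the measured value almost unchanged, while equal standard deviations would place the posterior mean halfway between the two.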
It can be readily checked that the obtained histogram approximates a normal pdf quite well, as expected, since normal pdfs have been assumed. The mean and standard deviation of the 100000 obtained a posteriori means are evaluated, and the corresponding normal pdf is drawn as a blue line in Fig. 2. This pdf is compared with the a priori pdf (red line in Fig. 2).
It can be seen that the two pdfs are in perfect agreement. This result was expected, since no deviation in the measuring instrument or the process is considered in this case I.
It is also possible to perform a risk analysis, according to $R_{a\,posteriori}$. In fact, for every obtained value, it is possible to verify whether it falls inside the tolerance limits ($R_{nv} \pm 75$ mΩ) or not. If the value falls inside the limits, the resistor is within the tolerances and should be accepted; on the other hand, if the value falls outside the limits, the resistor is supposed to exceed the tolerances and should be rejected.
Of course, due to measurement uncertainty, the actual resistance value can differ from the measured one. Therefore, if this last value is considered in the comparison with the tolerance limits, there is always a risk that a bad resistor is erroneously accepted, or that a good resistor is erroneously rejected. In order to state whether a decision is correct or wrong, for every resistor, we also compared the corresponding "simulated true value" $R_{STV}$ with the tolerance limits ($R_{nv} \pm 75$ mΩ) and verified whether the two values ($R_{a\,posteriori}$ and $R_{STV}$) lead to the same decision. Of course, the correct decision is the one obtained when $R_{STV}$, which is the true value of the resistor, is considered.
The percentage risk of taking a wrong decision is defined as:

$$\mathrm{Risk}_{total} = \frac{\text{Total wrong decisions}}{\text{Total decisions}} \cdot 100 \tag{11}$$

When only false acceptances are considered, the following risk can be defined, which represents the risk that bad resistors are erroneously accepted:

$$\mathrm{Risk}_{f.a.} = \frac{\text{Total false acceptances}}{\text{Total decisions}} \cdot 100 \tag{12}$$

On the other hand, when only false rejections are considered, the following risk can be defined, which represents the risk that good resistors are erroneously rejected:

$$\mathrm{Risk}_{f.r.} = \frac{\text{Total false rejections}}{\text{Total decisions}} \cdot 100 \tag{13}$$

Of course, it is:

$$\mathrm{Risk}_{total} = \mathrm{Risk}_{f.a.} + \mathrm{Risk}_{f.r.}$$

Considering the values provided by Eq. (10) in this case I, the following percentage risk values have been obtained:

The total risk can be compared with the corresponding total risk value evaluated without applying Bayes' theorem, when the "measured values" $R_m$ are taken into account and compared with the tolerance interval $R_{nv} \pm 75$ mΩ:

The obtained values are perfectly compatible, as expected, since no deviation in the instrument or the process is supposed in this case I. Therefore, in this case I, in which no deviation is present in the process or the instrument, the application of Bayes' theorem does not modify the results of the risk analysis.
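The case I procedure and the risk definitions of Eqs. (11)-(13) can be sketched in Python as follows. This is a hedged reconstruction, not the authors' code: the function names and the random seed are arbitrary, and the standard deviation used to generate each measured value is evaluated at the simulated true value, which is an assumption. The numerical results will therefore not reproduce the paper's figures exactly.

```python
import random

R_NV = 15.0        # nominal resistance, ohm
SIGMA_NV = 0.025   # prior standard deviation, ohm
TOL = 0.075        # tolerance half-width, ohm

def sigma_m(reading):
    """Instrument standard deviation; accuracy interval taken as +/-3 sigma."""
    return (3e-5 * reading + 3e-5 * 100.0) / 3.0

def posterior_mean(y):
    """Posterior mean of Eq. (10) for a measured value y."""
    s_y = sigma_m(y)
    w = SIGMA_NV**2 / (SIGMA_NV**2 + s_y**2)
    return w * y + (1.0 - w) * R_NV

def in_tolerance(r):
    return abs(r - R_NV) <= TOL

def risks(decision_values, true_values):
    """Percentage risks of Eqs. (11)-(13); true values define the correct decision."""
    n = len(true_values)
    fa = fr = 0
    for d, t in zip(decision_values, true_values):
        if in_tolerance(d) and not in_tolerance(t):
            fa += 1   # bad resistor erroneously accepted
        elif not in_tolerance(d) and in_tolerance(t):
            fr += 1   # good resistor erroneously rejected
    return {"total": 100.0 * (fa + fr) / n,
            "fa": 100.0 * fa / n,
            "fr": 100.0 * fr / n}

def simulate_case_1(n=100_000, seed=0):
    """Return (risks with Bayes, risks without Bayes) for case I."""
    rng = random.Random(seed)
    true_vals, measured, posterior = [], [], []
    for _ in range(n):
        r_stv = rng.gauss(R_NV, SIGMA_NV)        # process not deviating
        r_m = rng.gauss(r_stv, sigma_m(r_stv))   # instrument not deviating
        true_vals.append(r_stv)
        measured.append(r_m)
        posterior.append(posterior_mean(r_m))
    return risks(posterior, true_vals), risks(measured, true_vals)

if __name__ == "__main__":
    with_bayes, without_bayes = simulate_case_1()
    print("with Bayes:   ", with_bayes)
    print("without Bayes:", without_bayes)
```

Keeping the decision rule (`in_tolerance`) separate from the risk counting makes it easy to apply the same comparison to the measured values and to the a posteriori values.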

Case II: deviation in the process
Case II considers the situation in which the process is deviating.
In this second case, to account for the process deviation, the "simulated true values" cannot be extracted from the a priori pdf. Therefore, 100000 normal pdfs have been considered with:
• mean value $R_{nv} + 0.75 \cdot k$ µΩ, where $k = 1 \ldots 100000$;
• standard deviation $\sigma_{nv}$.
In other words, the same shape and standard deviation of the a priori pdf are considered, but the mean value is not centered on the resistors' nominal value $R_{nv}$: it is shifted by $0.75 \cdot k$ µΩ. The value 0.75 µΩ has been chosen so that, at the end of the 100000 iterations, the process has deviated by exactly 75 mΩ, which is the tolerance limit (see Sect. 3).
Hence, the 100000 "simulated true values" $R_{STV}$ are obtained by randomly extracting one value from each of the above pdfs.
On the other hand, since the instrument is not deviating, to simulate the measured value of each resistor ($R_m$), a random value is generated from a normal pdf centered on the corresponding "simulated true value" $R_{STV}$ and with standard deviation $\sigma_m$. Hence, a normal pdf centered on $R_m$ with the corresponding standard deviation $\sigma_m$ is associated with every measured value.
Bayes' theorem is then applied, according to the measurement data and the a priori knowledge, as given by Eq. (10). In the application of this formula, with respect to the previous case I, the a priori knowledge is always the same, but the measurement values $R_m$ are different, as explained above. The histogram in Fig. 3 is obtained, associated with the obtained resistors' values $R_{a\,posteriori}$.
In this case II as well, the histogram approximates a normal distribution quite well, and the blue line in Fig. 4 represents the corresponding pdf. In the same figure, the a priori pdf is also reported (red line). It can be seen that the blue line is shifted in the direction in which the process is deviating. Moreover, it has a higher variance than the a priori pdf.
As in case I, it is also possible to perform a risk analysis, according to the obtained values R a posteriori (that is, when Bayes' theorem is applied) and according to the measured values R m (that is, without applying Bayes' theorem). The following values are obtained, as far as the total risk is concerned: when R a posteriori is considered; Risk total = 0.59% when R m is considered.
It can be noted that the risk increases slightly when Bayes' theorem is applied. Therefore, in this case, the application of Bayes' theorem increases the risk of a wrong decision.
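A compact Python sketch of the case II simulation is given below, under stated assumptions: the process mean is taken to drift linearly in the positive direction, reaching the 75 mΩ tolerance limit at the last resistor (so the step equals 0.75 µΩ when `n` is 100000), and the function names and seed are illustrative. It is meant to show the structure of the comparison rather than to reproduce the reported risk values.

```python
import random

R_NV, SIGMA_NV, TOL = 15.0, 0.025, 0.075   # ohm

def sigma_m(reading):
    """Instrument standard deviation; accuracy interval taken as +/-3 sigma."""
    return (3e-5 * reading + 3e-5 * 100.0) / 3.0

def posterior_mean(y):
    """Posterior mean of Eq. (10); the prior stays centered on R_NV."""
    s_y = sigma_m(y)
    w = SIGMA_NV**2 / (SIGMA_NV**2 + s_y**2)
    return w * y + (1.0 - w) * R_NV

def ok(r):
    return abs(r - R_NV) <= TOL

def total_risk(decisions, truths):
    """Eq. (11): percentage of decisions that differ from the correct one."""
    wrong = sum(1 for d, t in zip(decisions, truths) if ok(d) != ok(t))
    return 100.0 * wrong / len(truths)

def simulate_case_2(n=100_000, seed=0):
    """Return (total risk with Bayes, total risk without Bayes)."""
    rng = random.Random(seed)
    step = TOL / n   # 0.75 micro-ohm per resistor when n = 100000
    truths, measured, posterior = [], [], []
    for k in range(1, n + 1):
        # Drifting process (direction assumed positive); prior is NOT updated.
        r_stv = rng.gauss(R_NV + step * k, SIGMA_NV)
        r_m = rng.gauss(r_stv, sigma_m(r_stv))   # instrument not deviating
        truths.append(r_stv)
        measured.append(r_m)
        posterior.append(posterior_mean(r_m))
    return total_risk(posterior, truths), total_risk(measured, truths)

if __name__ == "__main__":
    risk_bayes, risk_meas = simulate_case_2()
    print(f"with Bayes: {risk_bayes:.2f} %, without Bayes: {risk_meas:.2f} %")
```

The key point is that `posterior_mean` keeps pulling every measured value toward the nominal 15 Ω, even though the true process mean has moved away from it.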
It is worth noting that the amount of risk strictly depends on the accuracy of the instrument. The greater the accuracy of the instrument with respect to the a priori knowledge, the smaller the correction due to Bayes' theorem and, therefore, the smaller the difference between the risks with and without the application of Bayes' theorem. Conversely, with less accurate instruments, the risk of taking wrong decisions after the application of Bayes' theorem could become significant. This may be a trivial conclusion, since very accurate instruments do not require, in principle, corrections. On the other hand, this conclusion must be taken into careful consideration if the correction obtained by the application of Bayes' theorem is used to counterbalance possible drifts in the measurement process. If this is the case, the belief in the stability of the manufacturing process must be higher than that in the stability of the measurement process, to avoid unpleasant surprises.
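The size of the correction can be made explicit by rearranging Eq. (10). The short worked calculation below uses the values of the simulated example; the shorthand $w$ is introduced here only for illustration, and the second numerical case (an instrument as uncertain as the prior) is a hypothetical comparison.

```latex
\mu_{posterior} - y = -\,w\,\bigl(y - \mu_{a\,priori}\bigr),
\qquad
w = \frac{\sigma_y^2}{\sigma_{a\,priori}^2 + \sigma_y^2}

% Simulated example: \sigma_y = 1.15\ \mathrm{m\Omega},\ \sigma_{a\,priori} = 25\ \mathrm{m\Omega}
w = \frac{1.3225}{625 + 1.3225} \approx 0.0021

% Measured value at the tolerance limit, y - \mu_{a\,priori} = 75\ \mathrm{m\Omega}:
|\mu_{posterior} - y| \approx 0.0021 \cdot 75\ \mathrm{m\Omega} \approx 0.16\ \mathrm{m\Omega}

% Hypothetical instrument with \sigma_y = \sigma_{a\,priori} = 25\ \mathrm{m\Omega}:
w = 0.5, \qquad |\mu_{posterior} - y| = 37.5\ \mathrm{m\Omega}
```

With the considered multimeter, the correction is about 0.2% of the distance between the measured value and the nominal value, which is why the risks with and without Bayes' theorem differ only slightly.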

Case III: Deviation in the instrument
Case III considers the situation in which the instrument is deviating.
In order to simulate the true values of the resistors, since the process is not deviating, the same considerations as in case I apply. Therefore, 100000 extractions are taken from the a priori pdf. These are the "simulated true values" R STV .
On the other hand, the measurements $R_m$ must take into account that the instrument is deviating. Therefore, 100000 normal pdfs have been considered with:
• mean value $R_{STV} + 0.75 \cdot k$ µΩ, where $k = 1 \ldots 100000$;
• standard deviation $\sigma_m$.
In other words, all pdfs have the same shape and standard deviation, but their mean value drifts. The value 0.75 µΩ has been chosen so that, for the last (100000th) resistor, the measured value has deviated by exactly 75 mΩ, which is the resistor tolerance limit (see Sect. 3). Hence, the 100000 measurement results $R_m$ are obtained by randomly extracting one value from each of the above pdfs. Then, for every measured value, a normal pdf with mean value $R_m$ and standard deviation $\sigma_m$ is assumed.
Bayes' theorem is then applied, according to the measurement data and the a priori knowledge, as given by Eq. (10). As already stated above, the a priori knowledge is always the same, but the measurement values $R_m$ are different. The histogram in Fig. 5 is obtained, associated with the obtained resistors' values $R_{a\,posteriori}$.
In this case III as well, the histogram approximates a normal distribution quite well, and the blue line in Fig. 6 represents the corresponding pdf. In the same figure, the a priori pdf is also reported (red line). It can be seen that the blue line is shifted in the direction in which the instrument is deviating. Moreover, it has a higher variance than the a priori pdf.
As in the previous cases, a risk analysis has been performed, and the risks of wrong decisions, obtained when the measurements R m and when the a posteriori values R a posteriori are considered, are compared with each other. In particular: when R a posteriori are considered; Risk total = 13.26% when R m are considered. The difference between the obtained values for the total risk is small, even in this case III. However it can be noted that, in this case, the application of the Bayes' theorem yields better results.
Once again, it can be underlined that the risk depends on the accuracy of the instrument. The greater the accuracy of the instrument with respect to the a priori knowledge, the smaller the correction due to Bayes' theorem and, consequently, the smaller the difference between the risks with and without the application of Bayes' theorem. Therefore, with a less accurate instrument, the improvement due to the application of Bayes' theorem could be significant.
It is also interesting to observe that $\mathrm{Risk}_{total} = 13.14\%$ and $\mathrm{Risk}_{f.r.} = 13.02\%$, meaning that most of the wrong decisions correspond to false rejections, because of the wrong measurements due to the instrument's deviation.
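The predominance of false rejections under an instrument drift can be checked with the following Python sketch of case III. As before, this is an illustrative reconstruction under assumptions: the drift is taken as positive and linear, reaching 75 mΩ at the last resistor, and the function names and seed are arbitrary, so the output is not expected to match the reported percentages exactly.

```python
import random

R_NV, SIGMA_NV, TOL = 15.0, 0.025, 0.075   # ohm

def sigma_m(reading):
    """Instrument standard deviation; accuracy interval taken as +/-3 sigma."""
    return (3e-5 * reading + 3e-5 * 100.0) / 3.0

def posterior_mean(y):
    """Posterior mean of Eq. (10) for a measured value y."""
    s_y = sigma_m(y)
    w = SIGMA_NV**2 / (SIGMA_NV**2 + s_y**2)
    return w * y + (1.0 - w) * R_NV

def in_tol(r):
    return abs(r - R_NV) <= TOL

def simulate_case_3(n=100_000, seed=0):
    """Return (false-acceptance %, false-rejection %) using a posteriori values."""
    rng = random.Random(seed)
    step = TOL / n   # 0.75 micro-ohm per resistor when n = 100000
    fa = fr = 0
    for k in range(1, n + 1):
        r_stv = rng.gauss(R_NV, SIGMA_NV)                  # process not deviating
        r_m = rng.gauss(r_stv + step * k, sigma_m(r_stv))  # drifting instrument
        decided_ok = in_tol(posterior_mean(r_m))
        truly_ok = in_tol(r_stv)
        if decided_ok and not truly_ok:
            fa += 1
        elif truly_ok and not decided_ok:
            fr += 1
    return 100.0 * fa / n, 100.0 * fr / n

if __name__ == "__main__":
    risk_fa, risk_fr = simulate_case_3()
    print(f"Risk f.a. = {risk_fa:.2f} %, Risk f.r. = {risk_fr:.2f} %")
```

Because the biased readings push good resistors beyond one tolerance limit, nearly all wrong decisions in this sketch are false rejections, in line with the observation above.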

Experimental results
In order to verify whether the simulation results are experimentally confirmed, so that the method based on Bayes' theorem can be applied in practice, 80 resistors have been used, whose resistance value, according to the manufacturer's specifications, is 15 ± 0.15 Ω.
First of all, the values of the resistors have been measured with a reference multimeter, a Fluke 8508A with 8.5 digits.
The obtained measurements have been considered the conventionally true values of the resistors. The mean of all conventionally true values has been found to be 14.94 Ω. This value is indeed within the manufacturer tolerance interval but, ideally, it was supposed to be 15 Ω. So, according to the three case studies described above, we are in the case in which "the process is deviating".
Moreover, for the sake of the experiment, we also made a different assumption. Since the conventionally true values of the resistors were all inside the tolerance interval given by the manufacturer, in order to simulate some outliers, we assumed a stricter manufacturer specification: 15 ± 0.075 Ω. Under this different assumption as well, the mean of the true values (14.94 Ω) falls within the tolerance interval.
Therefore, it is supposed that the a priori knowledge is represented by a normal pdf with mean value 15 Ω and standard deviation 25 mΩ.
At this point, we measured the resistance values of the 80 resistors with the precision multimeter Fluke 8845. The obtained values are the resistors' measured values. Taking into account these measured values and the a priori knowledge, Bayes' theorem has been applied to get the a posteriori mean values according to (10), where:
• $y$ is the measured value of the specific resistor;
• $\sigma_y$ is the uncertainty associated with the resistor's measured value and, according to the Fluke 8845 precision multimeter specifications, it is calculated as:

$$\sigma_y = \frac{3 \cdot 10^{-5} \cdot y + 3 \cdot 10^{-5} \cdot 100\ \Omega}{3}$$

where it is assumed that the total error of the instrument, given by the manufacturer's accuracy specifications, corresponds to the ±3σ interval;
• $\mu_{a\,priori} = 15$ Ω, that is the nominal value given by the manufacturer;
• $\sigma_{a\,priori} = 25$ mΩ, as explained above.
The distribution of the a posteriori mean values is shown in Fig. 7 as a blue line, while the red line shows, for comparison, the distribution of the a priori knowledge. This result (equal total risk) should not be surprising: indeed, in this case, the employed instrument is quite accurate and the variance of the pdf expressing the distribution of the possible measured values is significantly lower than that of the prior pdf. Therefore, the correction provided by the application of Bayes' theorem is negligible and does not lead to a risk reduction.
Moreover, if the partial risks Risk f. a. and Risk f. r. are analyzed, we get: when the measured values are considered. So, also the partial risks do not differ. It is important to underline that, out of the total 26.58%, 22.78% of the risk is due to false acceptances, while only 3.8% is due to false rejection. This happens because the instrument is deviating in the opposite direction with respect to the process' deviation. So, the bias caused by the process deviation is compensated by the bias caused by the instrument deviation. Therefore, in this case Bayes' theorem does not improve the measurement result and does not reduce the risk of wrong decision.

Conclusions
The application of Bayes' theorem in metrology has been discussed on the basis of theoretical considerations and a simple practical example, analyzed both by simulation and experimentally. The practical example confirmed the theoretical analysis: the validity of the results provided by the application of Bayes' theorem strongly depends on the validity of the assumptions made. Since the main hypothesis of Bayes' theorem is the full credibility of the a priori knowledge, that is, the validity of the random variable expressing it, whenever this fails, the final result may fail too.
Although this conclusion may sound quite pessimistic and doubtful of the actual utility of Bayes' theorem in metrology, it should be taken only as a warning to check all assumptions before trusting the results blindly, as best practice in measurement suggests.
The worst situation, because it is the most difficult to detect, is the use of untrustworthy a priori knowledge. This may happen, for instance, when the available information on the production process is considered, but the process itself is deviating from its normal operating conditions.
However, a deviation in the production process should manifest itself in several "disagreements" between the expected data and those actually measured at several points along the production line. In the big-data and Internet-of-Things era, all those data should be available for different kinds of post-processing activities. Therefore, a diligent comparison of the production data at the different points along the production line should reveal a drift in the production process, thus allowing the metrologist to implement the needed corrections.
Once more, this is a step forward in the direction of best practice in measurement, since it exploits as much "relevant information" as possible in providing a measurement result.