Use of Pooled Standard Deviation of Paired Samples in Calculating the Measurement Uncertainty by the Monte Carlo Method

This paper presents guidelines for expressing the measurement uncertainty of tensile tests (tensile strength), with the Type A evaluation of uncertainty made from historical data. Applying a pooled standard deviation, obtained from several samples and calculated on the tensile strength values, gives a better expression of measurement uncertainty. With this approach, the results obtained with the GUM and Monte Carlo evaluation methods were consistent.


Introduction
In the International Vocabulary of Metrology (VIM, JCGM 200:2012) [1], a measurement result is defined as a set of quantity values being attributed to a measurand together with any other available relevant information. A measurement result is generally expressed as a single measured quantity value and a measurement uncertainty. According to the Guide to the Expression of Uncertainty in Measurement (GUM, JCGM 100:2008) [2], the measurement result is in general only an approximation of the value of the measurand and thus is complete only when accompanied by a statement of the uncertainty.
As emphasized by Meyer [3] and by Heping and Xiangqian [4], when reporting a measurement result it is important to give a quantitative indication of the quality of the result, so that users can assess its reliability. Without this indication, measurement results cannot be compared, either among themselves or with reference values provided by technical specifications or regulations. According to the VIM, measurement uncertainty is a non-negative parameter characterizing the dispersion of the quantity values being attributed to a measurand, based on the information used. The GUM lists possible sources of uncertainty in a measurement, including: a) incomplete definition of the measurand; b) imperfect realization of the definition of the measurand; c) nonrepresentative sampling (the sample measured may not represent the defined measurand); d) inadequate knowledge of the effects of environmental conditions on the measurement, or imperfect measurement of environmental conditions; e) personal bias in reading analogue instruments; f) finite instrument resolution or discrimination threshold; g) inexact values of measurement standards and reference materials; h) inexact values of constants and other parameters obtained from external sources and used in the data-reduction algorithm; i) approximations and assumptions incorporated in the measurement method and procedure; j) variations in repeated observations of the measurand under identical conditions. For Kessel [5] and Silva [6], the evaluation of measurement uncertainty proposed by the GUM has been well implemented since 1993 for the calibration of measuring instruments and for measurements in general. According to Martins [7], applying the GUM is not without difficulties: a mathematical model that represents the measurement process must be created, and the sensitivity coefficients and the correlations between input quantities must be determined.
This paper presents the application of the GUM uncertainty framework and the Monte Carlo method to measurement uncertainty estimation for tensile tests, specifically for the tensile strength parameter. The paper first presents the guidelines of the two approaches to evaluating measurement uncertainty. Next, the main components of measurement uncertainty are described, along with how they are taken into account. Finally, a case study checks the applicability of the methods in reporting the measurement result of the measurand.

Methods of measurement uncertainty evaluation
The traditional method for the expression of measurement uncertainty is described in the GUM. This guide was developed to harmonize the methods used by metrology laboratories to calculate and express measurement uncertainty. Its principle is that the measurement uncertainty incorporates several components arising from systematic and random effects, which allows measurement results obtained by different laboratories to be compared. The traditional method discussed in the GUM is illustrated in Figure 1.
The implementation of the GUM framework starts with the definition of the mathematical model of the measurement (the measurement equation itself), which includes all relevant contributions to the measurement result. The mathematical model can be represented by:

$$Y = f(X_1, X_2, \ldots, X_N) \qquad (1)$$

The combined standard uncertainty $u_c(y)$ is estimated by the law of propagation of uncertainty, after the standard uncertainty of each input has been identified and quantified. The standard uncertainty is the uncertainty of a result expressed as one standard deviation.
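For reference, the law of propagation of uncertainty mentioned above can be written out as follows. This is the general first-order form given in the GUM for a model $Y = f(X_1, \ldots, X_N)$; the symbol $c_i$ for the sensitivity coefficients is the usual GUM notation, and the second line is the special case of uncorrelated inputs.

```latex
% First-order law of propagation of uncertainty (GUM), with sensitivity
% coefficients c_i = \partial f / \partial x_i evaluated at the input estimates.
u_c^2(y) = \sum_{i=1}^{N} c_i^2 \, u^2(x_i)
         + 2 \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} c_i \, c_j \, u(x_i, x_j),
\qquad c_i = \frac{\partial f}{\partial x_i}

% Special case: uncorrelated input quantities, u(x_i, x_j) = 0 for i \neq j.
u_c(y) = \sqrt{\sum_{i=1}^{N} c_i^2 \, u^2(x_i)}
```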
The evaluation of the standard uncertainty $u(x_i)$ can be classified as Type A or Type B. The classification indicates two different ways of evaluating uncertainty components and serves only for discussion. Both types of evaluation are based on probability distributions, and each component is quantified by a variance or standard deviation. The Type A evaluation uses statistical procedures and can be applied when several independent observations of an input quantity have been made under the same measurement conditions. The Type B evaluation is made by means other than those used for a Type A evaluation and is based on available information. Examples of such information are values: a) published by a competent authority; b) associated with a certified reference material; c) obtained from a calibration certificate; d) concerning drift; e) obtained from the specification of a verified measuring instrument; f) obtained from limits deduced from personal experience. Although the combined standard uncertainty $u_c(y)$ can be universally used to express the measurement uncertainty, in some commercial, industrial and regulatory applications, and when health and safety are at issue, it is often necessary to give a measure of uncertainty that defines an interval about the result which is expected to cover a large fraction of the values that could reasonably be attributed to the measurand. The additional measure of uncertainty that meets this requirement is called the expanded uncertainty and is denoted by U. The expanded uncertainty U is obtained by multiplying the combined standard uncertainty $u_c(y)$ by a coverage factor k, for a given level of confidence.
The measurement result is then conveniently expressed as $Y = y \pm U$. The best estimate of the value attributable to the measurand Y is y, and the interval from $y - U$ to $y + U$ is expected to encompass a large fraction of the distribution of values that could reasonably be attributed to Y. Whenever practicable, the level of confidence p associated with the interval defined by U should be stated.
According to Jornada and Jornada [8], the GUM method of calculating uncertainty is accepted and widely used by laboratories and companies because it is: a) universal: it can be applied to any type of measurement and test; b) internally consistent: the result is derivable from the input components that influence the uncertainty; c) transferable: a stated uncertainty can be used directly in the calculation of another uncertainty, in line with a method based on the propagation of uncertainties. However, studies point to some limitations of the GUM: a) linearization of the model: in the propagation of uncertainties used by the GUM to calculate the combined standard uncertainty, the Taylor series expansion is truncated to the first-order terms, and in some cases this linear approximation is insufficient and higher-order terms are required; b) the assumption of normality of the measurand: following the GUM recommendation, it is common practice to assume a normal distribution when expressing the expanded uncertainty. Désenfant and Priel [9] highlight the difficulties laboratories face in determining measurement uncertainty, particularly in calculating the correlation between input variables and in differentiating the mathematical models of the measurement.
Applying the GUM to models that do not meet the requirements of the method introduces weaknesses, whose consequence is an inaccurate expression of the measurement result.
As an alternative to the traditional method, a supplement to the GUM was issued: Evaluation of measurement data, Supplement 1 to the "Guide to the expression of uncertainty in measurement", Propagation of distributions using a Monte Carlo method (JCGM 101:2008) [10]. This supplement provides a general numerical approach, in line with the general principles of the GUM, for carrying out the calculations required for measurement uncertainty. It also provides guidance in situations where the conditions for the GUM uncertainty framework are not met, or where it is not clear whether they are met. It can be used when the traditional GUM method is difficult to apply, for example because of the complexity of the model.
The Monte Carlo Method (MCM) is basically a sampling experiment whose aim is to estimate the distribution of possible results of the output variable, based on one or more input variables, which behave according to some probability distribution previously defined.
According to Possolo [11] and Suzuki et al. [12], the MCM, unlike the GUM framework, propagates the probability distributions of the input quantities and not only their standard uncertainties. That is, the probability distribution of each source of uncertainty is propagated through the measurement equation. Figure 2 and Figure 3 show the traditional method (GUM) and the Monte Carlo Method (MCM), respectively. The propagation of distributions used in the Monte Carlo Method consists primarily in assigning appropriate probability distributions (rectangular, normal, triangular, etc.) to the input quantities.
The measurement uncertainty is calculated for a given confidence level after a large number of trials. JCGM 101:2008 highlights that, with the Monte Carlo Method, the probability density functions of the input quantities are propagated through the mathematical model of the measurement to obtain a probability density function for the output quantity. Thus, the output distribution is not assumed to be Gaussian, as it usually is in the GUM framework, but is calculated from the probability distributions of the input quantities.
The MCM can be stated as a step-by-step procedure (Figure 4): a) define the measurand; b) establish the mathematical model of the measurement; c) identify the probability density function of each input quantity; d) select the number of iterations; e) generate random numbers to obtain the probability density function (PDF) of the output; f) extract from the PDF obtained the average value of the output quantity, the standard uncertainty, or an appropriate coverage interval for the measurand at a stipulated level of confidence. Donatelli and Konrath [13] used computational simulation to compare the MCM with the GUM method in measurement uncertainty evaluation for some artificial examples. They identified critical factors that directly influence the quality of the results when using the Monte Carlo method: a) representativeness of the mathematical model; b) quality of the characterization of the input quantities; c) characteristics of the pseudorandom number generator used; d) number of trials (M); e) definition of the coverage interval. Cox and Harris [14] and JCGM 101:2008 itself present examples of uncertainty evaluation using $M = 10^5$ or $M = 10^6$ trials, but this can result in long run times for complex mathematical models, depending on the computer configuration. Additional studies by Gonçalves et al. [15] corroborate the above considerations: for simulations on the order of $10^4$ trials the results may vary, whereas for simulations on the order of $10^5$ trials the variation in the uncertainty is very low. Compared with $10^6$ and $10^7$ trials, the difference is around 1%.
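The dependence on the number of trials M can be explored with a small experiment. The sketch below is only an illustration under stated assumptions: the toy model Y = X1 + X2 (one standard normal input and one rectangular input on [-1, 1]), the function names and the seeds are all hypothetical and are not taken from the cited studies. It repeats the simulation with different seeds and reports how much the estimated standard uncertainty varies between runs.

```python
import math
import random
import statistics


def mc_standard_uncertainty(trials, seed):
    """Estimate u(y) for the toy model Y = X1 + X2 by Monte Carlo sampling.

    X1 ~ N(0, 1) and X2 ~ U(-1, 1) are assumptions made for illustration only.
    """
    rng = random.Random(seed)
    outputs = [rng.gauss(0.0, 1.0) + rng.uniform(-1.0, 1.0) for _ in range(trials)]
    return statistics.stdev(outputs)


def spread_over_seeds(trials, seeds=range(5)):
    """Relative spread (max - min) / mean of u(y) over repeated runs."""
    estimates = [mc_standard_uncertainty(trials, s) for s in seeds]
    return (max(estimates) - min(estimates)) / statistics.mean(estimates)


# Exact standard uncertainty of the toy model: Var(N(0,1)) + Var(U(-1,1)) = 1 + 1/3.
EXACT_U = math.sqrt(1.0 + 1.0 / 3.0)

if __name__ == "__main__":
    for trials in (10**3, 10**4, 10**5):
        print(trials, round(spread_over_seeds(trials), 4))
```

The run-to-run spread is expected to shrink roughly as $1/\sqrt{M}$, which is in line with the qualitative observation that results stabilize between $10^4$ and $10^5$ trials.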
To determine the coverage interval when the distribution of the possible values of the measurand is symmetric, the output vector can be sorted from the lowest to the highest value and the limits of the interval identified by counting its elements. For example, assuming $M = 10^5$ and p = 95%, the limits of a symmetric coverage interval can be estimated by the 2500th and 97500th elements of the sorted array.
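The counting rule above can be sketched in a few lines. The function name `symmetric_coverage_interval` is hypothetical, and positions are 1-based to match the 2500th and 97500th elements mentioned in the text.

```python
def symmetric_coverage_interval(outputs, p=0.95):
    """Probabilistically symmetric coverage interval from Monte Carlo outputs.

    Sorts the outputs and picks the elements at 1-based positions
    M(1 - p)/2 and M(1 + p)/2 of the sorted array.
    """
    ordered = sorted(outputs)
    m = len(ordered)
    lower_pos = max(round(m * (1.0 - p) / 2.0), 1)  # 1-based position
    upper_pos = round(m * (1.0 + p) / 2.0)          # 1-based position
    return ordered[lower_pos - 1], ordered[upper_pos - 1]


# With the integers 1..100000 as outputs, the limits are the 2500th and
# 97500th elements of the sorted array.
example_limits = symmetric_coverage_interval(range(1, 100001), p=0.95)
```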
However, this method is not appropriate when the distribution of the output quantity is not symmetric. In such cases, the procedure recommended in JCGM 101:2008 for estimating the shortest coverage interval should be applied.
Also according to Donatelli and Konrath [13], the Monte Carlo method is applicable when: a) the mathematical model of the measurement is markedly non-linear; b) the probability distribution of the output quantity deviates significantly from the normal distribution; c) the mathematical model is complex, so that it is difficult or inconvenient to determine the partial derivatives required by the traditional method; d) the measurand cannot be expressed explicitly as a function of the influence quantities.
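A minimal sketch of a shortest coverage interval search is given below. It follows the general idea described in JCGM 101:2008 (slide a window spanning a fraction p of the sorted outputs and keep the narrowest one); the function name and the skewed test data are hypothetical, and details such as the rounding of pM are simplifying assumptions.

```python
def shortest_coverage_interval(outputs, p=0.95):
    """Shortest interval that spans a fraction p of the sorted outputs.

    Slides a window over the sorted values and keeps the window with the
    smallest width. Ties are resolved in favour of the lowest window.
    """
    ordered = sorted(outputs)
    m = len(ordered)
    q = round(p * m)  # index offset between the two ends of each window
    if not 0 < q < m:
        raise ValueError("p and the number of outputs give an empty search range")
    best_start = min(range(m - q), key=lambda r: ordered[r + q] - ordered[r])
    return ordered[best_start], ordered[best_start + q]


# Right-skewed toy data: the shortest interval sits at the dense low end.
skewed = [i**2 for i in range(101)]
skewed_limits = shortest_coverage_interval(skewed, p=0.95)
```

For a symmetric, unimodal output distribution this interval and the symmetric one should nearly coincide; they differ when the distribution is skewed, as in the toy data above.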

Type A Evaluation of Uncertainty
In a Type A evaluation of measurement uncertainty, the uncertainty components are obtained through statistical analysis of a series of repeated observations. Usually, the Type A standard uncertainty is obtained by calculating the experimental standard deviation of the mean.
Because of the small sample sizes used in mechanical tests such as the tensile test, the use of a pooled standard deviation combined from several batches is recommended. The pooled standard deviation is determined by the following equation:

$$s_p = \sqrt{\frac{\sum_{i=1}^{K} (n_i - 1)\, s_i^2}{N - K}} \qquad (2)$$

where: $s_p$ = pooled standard deviation; $s_i$ = standard deviation of batch i; $n_i$ = number of measurements in batch i; N = total number of measurements; K = number of batches. To use the pooled standard deviation, it is essential that the variances of the samples are not significantly different, so a test of the equality of variances must be applied. One test used for this purpose is based on the theory of Bartlett [16].
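The pooled standard deviation can be computed directly from the batch data. The sketch below assumes the usual definition, in which batch variances are weighted by their degrees of freedom $n_i - 1$ and divided by $N - K$; the function name and the example values are hypothetical and are not the measurements of this study.

```python
import math
import statistics


def pooled_standard_deviation(batches):
    """Return (s_p, degrees of freedom) for a list of batches of measurements.

    Each batch needs at least two values, because statistics.variance
    (the sample variance, divisor n - 1) is undefined for a single value.
    """
    k = len(batches)
    n_total = sum(len(batch) for batch in batches)
    dof = n_total - k  # N - K
    if dof <= 0:
        raise ValueError("need more measurements than batches")
    weighted = sum((len(batch) - 1) * statistics.variance(batch) for batch in batches)
    return math.sqrt(weighted / dof), dof


# Hypothetical tensile strength pairs (MPa), two test pieces per batch.
pairs = [[421.0, 425.0], [418.0, 420.0], [430.0, 427.0]]
s_p, dof = pooled_standard_deviation(pairs)
```

With K batches of two test pieces each, the pooled estimate has K degrees of freedom, compared with a single degree of freedom for the standard deviation of one pair.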

Tensile Test
Determining and knowing the mechanical properties is very important for the choice of a material. According to Callister [17], mechanical properties define the behaviour of a material when subjected to mechanical stress, as they relate to the material's ability to resist or transmit the applied loads without breaking and without deforming uncontrollably. Some important mechanical properties are strength, hardness, ductility and stiffness. The results of the tensile test are influenced by factors related to the material, the specimen, the test equipment, the test procedure and the calculation of the mechanical properties. These factors are confirmed by Silva [6], who classifies the influences into two categories: metrological parameters, and material and testing parameters.
This paper assesses the uncertainties related to the tensile strength parameter. The main objective is to apply the GUM and Monte Carlo methods and to compare the use of the sample standard deviation with that of the pooled standard deviation.
According to ISO 6892-1:2009 [18], tensile strength is defined as the stress corresponding to the maximum force.
The mathematical model for the tensile strength ($R_m$) can be summarized as a function of the repeatability and the measuring equipment (Eq. 3). Considering the influence factors in Eq. 3, the tensile strength, represented by the ratio between force and area, is given by Eq. 4. With the mathematical model of Eq. 4, it is possible to identify the influence of the different components on the measurement result and to calculate the sensitivity coefficients. From Eq. 4, the sensitivity coefficients for the experimental load (F), diameter (D), machine (M) and caliper (C) are obtained as Eqs. 5 to 8, respectively.
In the model of Eq. 4, the variables F and D are treated independently. However, the specimens may have different dimensions, which leads to different applied forces. That is, the variability is not due to the testing process but to the dimensions of the test pieces. For this reason, each test result is calculated independently, and the tensile strength is usually calculated from the uncorrected, experimentally obtained values of force and test piece diameter (Eq. 9). By isolating F in Eq. 9 and substituting it into Eq. 4, the mathematical expression for tensile strength in terms of σ is obtained (Eq. 10). From Eq. 10, the sensitivity coefficients for the experimental tensile strength (σ), diameter (D), machine (M) and caliper (C) are obtained as Eqs. 11 to 14, respectively.
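To illustrate how sensitivity coefficients follow from a force-over-area model, the derivation below uses a simplified expression for a circular cross section, $R_m = 4F / (\pi D^2)$, with no machine or caliper terms. This simplified model is an assumption made for illustration and is not the full model of this paper; it treats F and D as uncorrelated.

```latex
% Simplified model (assumption): circular cross section, no instrument terms.
R_m = \frac{F}{A} = \frac{4F}{\pi D^2}

% Sensitivity coefficients obtained by partial differentiation.
c_F = \frac{\partial R_m}{\partial F} = \frac{4}{\pi D^2} = \frac{R_m}{F},
\qquad
c_D = \frac{\partial R_m}{\partial D} = -\frac{8F}{\pi D^3} = -\frac{2 R_m}{D}

% Combined standard uncertainty in relative form (uncorrelated F and D).
\left( \frac{u_c(R_m)}{R_m} \right)^2
  = \left( \frac{u(F)}{F} \right)^2 + 4 \left( \frac{u(D)}{D} \right)^2
```

The factor 4 on the diameter term shows why a relative uncertainty in D weighs twice as heavily (in standard-uncertainty terms) as the same relative uncertainty in F.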

Experimental procedures and results
The experimental data come from mechanical tests performed on SAE 1020 steel test pieces in accordance with the procedures of ISO 6892-1:2009 (Table 1). Nineteen samples, each composed of two test pieces, were used in the measurement uncertainty evaluation. Each sample comes from a different melting charge. All specimens were turned on the same CNC machine and tested under the same conditions. The test specimens had a circular cross section, and the diameters were measured with a digital caliper. The measurement uncertainties of the instruments were taken from their calibration certificates. The expanded uncertainties of the testing machine and the caliper were 370 N and 0.01 mm, respectively, both with a coverage factor k = 2. Bartlett's test was used to check the equality of variances between the samples for original diameter, maximum force and tensile strength. All tests indicated equal variances, since the P-value was greater than the significance level (α = 0.05). The equality of the tensile strength variances of the different samples is represented graphically in Figure 5. To evaluate the impact of standard deviations obtained from small samples, samples G and P were analyzed in comparison with the pooled standard deviation obtained from the 19 samples (Table 2). The measurand Y is the value of tensile strength obtained for each test piece. For each sample, four cases were evaluated by the GUM uncertainty framework and by the Monte Carlo Method: a) separate Type A evaluations for diameter and force, obtained from the sample itself; b) tensile strength standard deviation obtained from the sample itself; c) pooled standard deviation for diameter and force; d) pooled standard deviation for tensile strength.
The GUM uncertainty framework shown in Figure 6 lists the principal sources of uncertainty, with the respective standard uncertainties, for the cases where the maximum force (F) and the original diameter of the test piece were considered separately.
In Figure 6, the values $s_1$ and $s_4$ represent the standard deviations obtained by the Type A evaluation of uncertainty. When the standard deviation obtained from a single sample was used, one degree of freedom was assigned to $\nu_1$ and $\nu_4$. When the pooled standard deviation was used, 19 degrees of freedom were assigned to $\nu_1$ and $\nu_4$. The sensitivity coefficients are calculated according to equations (5) to (8). An example of the measurement uncertainty calculation, evaluated from a single sample, is shown in Figure 7.
The GUM uncertainty framework shown in Figure 8 lists the principal sources of uncertainty, with the respective standard uncertainties, for the cases where the tensile strength was calculated for each test piece.
In Figure 8, the value s represents the standard deviation obtained by the Type A evaluation of uncertainty. When the standard deviation obtained from a single sample was used, one degree of freedom was assigned to $\nu$. When the pooled standard deviation was used, 19 degrees of freedom were assigned to $\nu$. The sensitivity coefficients are calculated according to equations (11) to (14). An example of the measurement uncertainty calculation, evaluated from a single sample, is shown in Figure 9.
The Monte Carlo Method was carried out by applying the models presented in equations (4) and (10). For each case, $10^5$ iterations were performed. A probability density function (PDF) was assigned to each input: a) for D, F and σ: t distribution; b) for $B_M$ and $B_C$: Gaussian distribution; c) for $L_M$ and $L_C$: rectangular distribution. The randomly generated values were sorted in ascending order. With a stipulated coverage probability of 95%, an appropriate coverage interval for $R_m$ was obtained.
The results for all cases are presented in Table 3. Column $\nu_i$ gives the degrees of freedom assigned to the input quantities evaluated by Type A evaluation (F, D and σ). One degree of freedom corresponds to the standard deviation calculated from a sample of size 2; 19 degrees of freedom correspond to the pooled standard deviation calculated over all samples.
A sample size of n = 2 did not give an estimate of measurement uncertainty of adequate quality. The values were discrepant both within and between GUM and MCM.
With the pooled standard deviation, GUM and MCM gave compatible expanded uncertainties.
The expanded uncertainty obtained with the pooled standard deviations of force and diameter was larger than that obtained with the pooled standard deviation of tensile strength.
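The effect of the degrees of freedom assigned to the Type A components can be illustrated with the Welch-Satterthwaite formula, which the GUM (Annex G) uses to obtain effective degrees of freedom for the combined uncertainty. The text does not state that this formula was used here, so the sketch below is an assumption about how such a budget is commonly combined; the function name and the numerical contributions are hypothetical.

```python
import math


def effective_degrees_of_freedom(contributions):
    """Welch-Satterthwaite effective degrees of freedom.

    contributions: iterable of (c_i * u_i, nu_i) pairs, where c_i * u_i is the
    uncertainty contribution of input i and nu_i its degrees of freedom
    (math.inf for Type B components treated as exactly known).
    """
    contributions = list(contributions)
    u_c_squared = sum(cu**2 for cu, _ in contributions)
    denominator = sum(cu**4 / nu for cu, nu in contributions)
    if denominator == 0.0:
        return math.inf  # every component has infinite degrees of freedom
    return u_c_squared**2 / denominator


# Hypothetical budget: one dominant Type A term and one Type B term.
nu_single_pair = effective_degrees_of_freedom([(4.0, 1), (1.0, math.inf)])
nu_pooled = effective_degrees_of_freedom([(4.0, 19), (1.0, math.inf)])
```

When the Type A term dominates, the effective degrees of freedom stay close to those of that term, so a standard deviation from a single pair of test pieces leads to a very large Student-t coverage factor, while 19 degrees of freedom give a coverage factor close to 2.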
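The simulation described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the additive model $R_m = 4(F + B_M + L_M) / (\pi (D + B_C + L_C)^2)$, the nominal values of force and diameter, the Type A standard deviations and the instrument resolutions are all assumptions. Only the expanded uncertainties of the testing machine (370 N, k = 2) and caliper (0.01 mm, k = 2), the distribution types and the 95% coverage probability are taken from the text.

```python
import math
import random
import statistics


def scaled_t(rng, mean, scale, dof):
    """Draw from a scaled and shifted Student t distribution with dof degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(dof / 2.0, 2.0)  # chi-square(dof) as Gamma(dof/2, scale 2)
    return mean + scale * z / math.sqrt(chi2 / dof)


def simulate_tensile_strength(trials=10**5, seed=1, dof=19):
    """Monte Carlo propagation for an assumed tensile strength model (MPa)."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        force = scaled_t(rng, 33000.0, 300.0, dof)   # N, hypothetical mean and s
        diameter = scaled_t(rng, 10.0, 0.02, dof)    # mm, hypothetical mean and s
        bias_machine = rng.gauss(0.0, 370.0 / 2.0)   # N, from U = 370 N with k = 2
        bias_caliper = rng.gauss(0.0, 0.01 / 2.0)    # mm, from U = 0.01 mm with k = 2
        res_machine = rng.uniform(-0.5, 0.5)         # N, hypothetical 1 N resolution
        res_caliper = rng.uniform(-0.005, 0.005)     # mm, hypothetical 0.01 mm resolution
        f = force + bias_machine + res_machine
        d = diameter + bias_caliper + res_caliper
        results.append(4.0 * f / (math.pi * d**2))
    results.sort()
    lower = results[round(trials * 0.025) - 1]       # symmetric 95 % interval
    upper = results[round(trials * 0.975) - 1]
    return statistics.mean(results), statistics.stdev(results), (lower, upper)
```

Calling `simulate_tensile_strength(dof=1)` instead of the default mimics a standard deviation taken from a single pair of test pieces; the heavy tails of the t distribution with one degree of freedom then widen the coverage interval considerably.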

Conclusion
Mechanical tests are commonly carried out on small samples, which affects the determination of the repeatability standard deviation. The results also showed that the largest contribution to the uncertainty of the tensile strength comes from the Type A evaluation of uncertainty. In this case, historical data become a viable alternative for estimating the standard deviation. However, it is important to emphasize that the historical data must correspond to the material type being analyzed, the equipment model and the test method. The pooled standard deviation is thus a viable and justifiable way to represent the standard uncertainty determined by Type A evaluation. The expanded uncertainties obtained by the GUM uncertainty framework and the Monte Carlo Method are compatible when the Type A evaluation has a sufficient number of degrees of freedom. To prevent dimensional variation from contributing wrongly to the estimate of measurement uncertainty, the use of tensile strength values is recommended.

Figure 1. Flowchart for the expression of uncertainty in measurement (ISO GUM)

Figure 2. Law of propagation of uncertainties by GUM [10]

Figure 3. Propagation of distributions for input quantities by MCM [10]

Figure 4. Simplified flowchart of the evaluation of measurement uncertainty using the Monte Carlo Method
Symbols: D = external diameter of test piece; F = maximum force; $B_M$ = bias of testing machine; $L_M$ = testing machine resolution; $B_C$ = bias of caliper; $L_C$ = caliper resolution

Figure 6. Sources of uncertainty, considering Type A evaluation for diameter and force

Table 1. Mechanical test results

Table 2. Statistics of samples G and P, and pooled standard deviation

i | Source of uncertainty | Value | Denominator | Standard uncertainty $u(x_i)$ | Probability distribution | Degrees of freedom $\nu_i$

Table 3. Expanded uncertainty for samples G and P