Analysis of interlaboratory comparison when the measurements are not normally distributed

Testing laboratories are increasingly concerned with the characterization of their measurement processes. In particular, the standard ISO 17025 [2] requires accredited laboratories to participate in interlaboratory comparisons to evaluate their proficiency in performing the measurement. Different statistical methods are available to exploit the results of this type of comparison. In our study, we first evaluated the reference value and the proficiency standard deviation according to ISO 5725-2 [1]; we then used these estimated parameters to calculate the Z-score, the statistical indicator of ISO 13528 [5], to assess the proficiency of each laboratory. However, these statistical methods rely on the assumption that the measurement results are normally distributed. Based on measurements expressed in dBµV/m, which is a log transformation of an electric field level expressed in µV/m, this paper compares the statistical analyses of the data expressed in the two units and relates the results to the underlying statistical assumptions.


The objectives of an interlaboratory comparison
Interlaboratory comparisons are defined as the organization, execution and exploitation of measurements, tests or calibrations on similar items (samples, standards, reference solutions) by at least two different laboratories under predetermined conditions. The implementation of an interlaboratory comparison can have different objectives (cf. Fig. 1):
-Evaluation of the performance of the laboratories. The objective is to estimate and demonstrate the proficiency of the laboratories in performing the measurement. Each participant applies its routine measurement method.
-Estimation of the accuracy (trueness and precision) of a measurement method. The objective is to evaluate the performance of the measurement method through the repeatability and reproducibility standard deviations. All participants apply the same measurement method.
-Attribution of a consensus value to a characteristic of a material. The objective is to assign a reference value to a material. The participating laboratories must be specialized in the determination of the characteristic concerned.

Fig.1. Objectives of interlaboratory comparisons
In this study, several objectives are pursued: the performance of the method, the performance of the participants and the evaluation of the measurement uncertainty. This is possible because all participants implement the same testing method, described in the next section. First, from the results of the participants, we applied the method described in ISO 5725-2 [1] to evaluate the overall mean, the repeatability standard deviation and the reproducibility standard deviation. Second, using these statistical parameters, the performance of each participant is evaluated with a Z-score, a statistical indicator from ISO 17043 [4]. Finally, in accordance with ISO 21748 [6], the measurement uncertainty is evaluated from the reproducibility standard deviation.
However, all these statistical methods rely on assumptions that are often made without being checked, in particular the Gaussian behaviour of the observations.

Testing method of all participants
In order to ensure quality control, the participants in the Eurolab France (a professional association of laboratories) group dedicated to Electromagnetic Compatibility (EMC) regularly organise interlaboratory comparison schemes. For each scheme, a protocol is defined for a specific measurand. In this paper, we consider the measurement of the electric field emitted by an electronic device according to the standard EN 55016-2-3:2010 [9]. To this end, the device (a comb generator coupled with an omnidirectional antenna) shown in Fig. 2 was circulated among the participants, who performed the required measurement within their own facilities.

Fig.2. Illustration of the Device under testing
The measurement is performed in an anechoic room to avoid electromagnetic perturbations of the surroundings. The device is positioned at a distance d = 3 m from the reference point of the antenna, which is at a height denoted as h (see Fig.3).

Fig. 3. Representation of the measurement setup
The participants are asked to perform the measurement both in vertical and horizontal polarisations. For each polarisation, a set of 9 frequencies is chosen and the maximum value for the electric field is reported for each of the 9 frequencies.
The measurement results are expressed in dBµV/m, which is a nonlinear transformation of the corresponding SI unit for an electric field, µV/m. Let $y$ be the measurement expressed in dBµV/m and $x$ the one expressed in µV/m:

$$y = 20 \log_{10}(x)$$

In order to evaluate the repeatability and reproducibility of the measurement method, 4 independent and repeated measurements are performed for each polarization and each frequency.
As a consequence, the statistical analysis is performed for each polarization and each frequency: 18 levels are available for the comparison. In this paper, we will focus on the results for some of these 18 levels to illustrate our purpose.
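The conversion between the two units can be sketched as follows. This is a minimal illustration, not part of the original study: the function names and the three readings are hypothetical, and the only thing taken from the text is the standard definition of dBµV/m as 20 times the base-10 logarithm of the level in µV/m.

```python
import math


def uv_per_m_to_db(x_uv_per_m: float) -> float:
    """Convert a field strength in µV/m to dBµV/m: y = 20 * log10(x)."""
    if x_uv_per_m <= 0:
        raise ValueError("field strength must be strictly positive")
    return 20.0 * math.log10(x_uv_per_m)


def db_to_uv_per_m(y_db: float) -> float:
    """Convert a level in dBµV/m back to µV/m: x = 10 ** (y / 20)."""
    return 10.0 ** (y_db / 20.0)


# Hypothetical readings in dBµV/m, for illustration only.
readings_db = [58.0, 60.0, 62.0]
readings_lin = [db_to_uv_per_m(y) for y in readings_db]

mean_db = sum(readings_db) / len(readings_db)
mean_lin = sum(readings_lin) / len(readings_lin)

# Because the transformation is nonlinear, the mean computed in µV/m
# differs from the back-transformed mean computed in dBµV/m.
print(mean_lin, db_to_uv_per_m(mean_db))
```

The same effect applies to any statistic computed after the transformation, which is why the analysis is carried out separately in each unit.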

Implementation of the interlaboratory comparison
This interlaboratory comparison follows a process defined in a campaign plan that describes the collection of the participants' results, the statistical methods used to exploit them and the parameters to be estimated. This process is applied to the measurand in its two expressions: dBµV/m and µV/m.
The final objective is to raise awareness of the assumptions underlying a statistical method and to describe the implications of their inappropriate use. Finally, we discuss the choice of the suitable unit of the measurand for the statistical analysis in our example.

Results of interlaboratory comparison organized by Eurolab
The results of the interlaboratory comparison were expressed in dBµV/m, which is not an SI unit but a convenient working unit in the field of EMC. However, a request was made to perform the analysis in µV/m. In this section, we present the results obtained with the data in both units, and we conclude in the discussion on the best choice for the purpose of the statistical analysis.

2.1.1. Data
Each of the 22 participating laboratories performed a set of 4 repeated measurements for 9 frequencies of the measurement domain and in 2 polarizations. For clarity, we only present in this paper the results obtained in horizontal polarization at the frequency 2.25 GHz (cf. Table 1). First, it can be observed that the data can be considered as normally distributed when expressed in dBµV/m, but not when expressed in µV/m, as shown by the two statistical tests for normality in Table 2: Lilliefors and Anderson-Darling [7,8].
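The effect of the unit on a normality test can be illustrated with a small standard-library sketch of the Anderson-Darling statistic. Everything here is an assumption made for illustration: the 22 levels are synthetic (exact Gaussian quantiles in dBµV/m with an arbitrary mean of 60 and standard deviation of 10), not the Eurolab data, and the 5% critical value 0.752 is the commonly tabulated one for the small-sample-adjusted statistic when the mean and standard deviation are estimated. It does not reproduce Table 2, and the Lilliefors test is not covered.

```python
import math
from statistics import NormalDist, mean, stdev

# Commonly tabulated 5% critical value for the adjusted statistic
# (mean and standard deviation estimated from the data).
CRITICAL_5_PERCENT = 0.752


def anderson_darling_normal(data):
    """Return the small-sample-adjusted Anderson-Darling statistic A*^2."""
    n = len(data)
    m, s = mean(data), stdev(data)
    std_normal = NormalDist()
    z = sorted((v - m) / s for v in data)
    total = 0.0
    for i in range(n):
        lower_tail = std_normal.cdf(z[i])           # F(z) of the (i+1)-th smallest
        upper_tail = std_normal.cdf(-z[n - 1 - i])  # 1 - F(z) of the (i+1)-th largest
        total += (2 * i + 1) * (math.log(lower_tail) + math.log(upper_tail))
    a2 = -n - total / n
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


# Synthetic levels: Gaussian in dBµV/m by construction (illustration only).
levels_db = [NormalDist(60.0, 10.0).inv_cdf((i + 0.5) / 22) for i in range(22)]
levels_lin = [10.0 ** (y / 20.0) for y in levels_db]

a2_db = anderson_darling_normal(levels_db)
a2_lin = anderson_darling_normal(levels_lin)
print("dBµV/m:", a2_db, "reject normality:", a2_db > CRITICAL_5_PERCENT)
print("µV/m:  ", a2_lin, "reject normality:", a2_lin > CRITICAL_5_PERCENT)
```

With data that are Gaussian in dBµV/m, the same values expressed in µV/m are log-normal, so the statistic grows markedly after the transformation.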

2.2.1. Statistical procedures
In order to evaluate the performance of a measurement method, guidance is provided in the standard ISO 5725-2. Before exploiting the results of the participants, it is necessary to make sure that the results arise from the same measurement process by applying homogeneity tests. These homogeneity tests are performed to detect potential outliers among the results.
First, Cochran's test must confirm the homogeneity of the variances of the participants.
If not, this means that the variance associated with one laboratory is significantly different from the others. In this case, the repeated measurements of that laboratory are investigated: if they are consistent, the laboratory is removed from the statistical analysis (because its variance is too high), whereas if an outlier is found, only this outlier value is discarded and the other results are kept in the analysis with an updated mean value and standard deviation for the laboratory. The same procedure is then applied until all the variances are homogeneous.
In a second step, Grubbs' test is performed to ensure that the mean values from the different laboratories are consistent. If not, outlier laboratories are also discarded from the statistical analysis. The goal of this procedure is to avoid the impact of outliers on the estimation of the overall mean value and of the repeatability and reproducibility standard deviations.
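The second screening step can be sketched as follows. This is a minimal illustration assuming the usual single-outlier Grubbs statistics, i.e. the distance of the smallest or largest laboratory mean from the overall mean, divided by the standard deviation of the means. The function name and the data are hypothetical, and the critical value must be read from Grubbs' table for the relevant number of laboratories; it is deliberately left as a user-supplied input.

```python
from statistics import mean, stdev


def grubbs_statistics(values):
    """Return (G_low, G_high) for one suspected outlier at either end."""
    m, s = mean(values), stdev(values)
    g_low = (m - min(values)) / s    # tests the smallest value
    g_high = (max(values) - m) / s   # tests the largest value
    return g_low, g_high


# Hypothetical laboratory means, for illustration only.
lab_means = [1.0, 2.0, 3.0, 4.0, 10.0]
g_low, g_high = grubbs_statistics(lab_means)
print(g_low, g_high)

# critical_value = ...  # to be read from Grubbs' table (not hard-coded here)
# is_outlier = g_high > critical_value
```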

2.2.2. Cochran's homogeneity test for the variances
Let $p$ denote the number of participants. The principle of Cochran's test is to test the assumption $H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_p^2$. To this end, the following Cochran's statistic is computed from the results of the comparison:

$$C = \frac{s_{\max}^2}{\sum_{i=1}^{p} s_i^2}$$

where $s_i^2$ is the variance of the repeated measurements of laboratory $i$ and $s_{\max}^2$ is the largest of these variances. Under the assumption $H_0$ of equal variances, $C$ follows Cochran's distribution. As a result, the observed value of $C$ is compared with the critical value in Cochran's table for $p$ participants and $n$ repeated measurements. This test is commonly used in the analysis of interlaboratory comparisons. However, its conclusions are valid only under the assumption that the measurements follow a Gaussian distribution.
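Cochran's statistic itself is simple to compute. The sketch below assumes the usual definition (largest within-laboratory variance divided by the sum of all within-laboratory variances); the function name and the three sets of repeated measurements are hypothetical. The critical value is not computed here and must be taken from Cochran's table for the relevant p and n.

```python
from statistics import variance


def cochran_statistic(results_by_lab):
    """Return (C, index of the laboratory with the largest variance)."""
    variances = [variance(r) for r in results_by_lab]
    largest = max(variances)
    return largest / sum(variances), variances.index(largest)


# Hypothetical repeated measurements for p = 3 laboratories, n = 4 each.
results = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [0.0, 5.0, 10.0, 15.0],
]
c_value, suspect_lab = cochran_statistic(results)
print(c_value, suspect_lab)

# critical_value = ...  # to be read from Cochran's table for p and n
# variances_homogeneous = c_value <= critical_value
```

Returning the index of the suspect laboratory makes it easy to inspect its repeated measurements before deciding whether to discard a single value or the whole laboratory.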

2.2.3. Grubbs' homogeneity test for the mean values
Grubbs' test aims at identifying an outlier, either among the mean values of the different laboratories or among the repeated measurements of a single laboratory. For a set of values with mean $\bar{x}$ and standard deviation $s$, the statistic is $G = (x_{\max} - \bar{x})/s$ if one is interested in the maximal value, and $G = (\bar{x} - x_{\min})/s$ if one is interested in the minimal value. The considered quantity is then compared with the critical value in Grubbs' table. However, the validity of Grubbs' test also depends on the Gaussian behaviour of the observations.
In Fig. 4 and Fig. 5, we provide a representation of all results expressed in both units. The red dots (laboratory 8 in Fig. 4 and laboratory 4 in Fig. 5) correspond to observations which were discarded after the homogeneity tests (Cochran and Grubbs). In both cases, the outlier laboratory has too large a variance and was discarded by Cochran's test. However, it is not the same laboratory in both cases: as the transformation is nonlinear, it affects the variances. It can also be observed that the transformation to µV/m results in a larger spread of the measurement results.
After elimination by the homogeneity tests, we obtain $p'$ participating laboratories, with $p' \le p$. The precision parameters (repeatability standard deviation $S_r$ and reproducibility standard deviation $S_R$) and the position parameter (the overall average) are then evaluated from the results in Table 4 using formulas (1), (2) and (3).
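Formulas (1) to (3) are not reproduced in this text. As an illustration only, the sketch below assumes the standard estimators for a balanced design (the same number n of repeated measurements in every retained laboratory): the repeatability variance is the average within-laboratory variance, the between-laboratory variance is the variance of the laboratory means minus the repeatability variance divided by n (floored at zero), and the reproducibility variance is their sum. The function name and data are hypothetical.

```python
import math
from statistics import mean, variance


def precision_estimates(results_by_lab):
    """Return (overall mean, S_r, S_R) for a balanced design."""
    sizes = {len(r) for r in results_by_lab}
    if len(sizes) != 1:
        raise ValueError("this sketch only handles a balanced design")
    n = sizes.pop()
    lab_means = [mean(r) for r in results_by_lab]
    overall_mean = mean(lab_means)
    # Repeatability variance: average of the within-laboratory variances.
    s_r2 = mean([variance(r) for r in results_by_lab])
    # Between-laboratory variance, floored at zero.
    s_l2 = max(variance(lab_means) - s_r2 / n, 0.0)
    # Reproducibility variance: sum of both components.
    s_R2 = s_r2 + s_l2
    return overall_mean, math.sqrt(s_r2), math.sqrt(s_R2)


# Hypothetical results for 3 laboratories with 2 repetitions each.
results = [[9.0, 11.0], [19.0, 21.0], [29.0, 31.0]]
overall_mean, s_r, s_R = precision_estimates(results)
print(overall_mean, s_r, s_R)
```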

Evaluation of uncertainty of measurement
As an alternative to the GUM [3] (Guide to the expression of uncertainty in measurement, the reference method for the evaluation of measurement uncertainty), we can use the reproducibility standard deviation obtained in an interlaboratory comparison analysed according to ISO 5725-2 as an estimate of the standard uncertainty. So, for every studied frequency, the standard uncertainty is taken equal to $S_R$. The direct calculation in µV/m seems unrealistic in the sense that the standard deviation is of the same order of magnitude as the overall mean. However, when the data are expressed in dBµV/m, it is possible to compute an approximately 95% coverage interval using a coverage factor $k = 2$, which is adequate because of the Gaussian behaviour of the results.
If the measurement result must be expressed in µV/m, care should be taken when applying the transformation. Indeed, measurements expressed in dBµV/m can be transformed into µV/m, but this transformation cannot be applied to the variance (and thus to the standard uncertainty), as it is nonlinear. The corresponding results are represented in Fig. 6, with the individual mean value for each participant. First, the coverage interval obtained from a direct analysis in µV/m overlaps 0, whereas the intensity of the electric field is supposed to be positive. Of course, such a coverage interval was obtained with a "naïve" assumption of Gaussian behaviour, which is false as explained above. On the other hand, the transformed coverage interval obtained in dBµV/m encloses only positive values, which already makes it more reliable.
But it also has an asymmetric shape: the lower bound is closer to the overall mean than the upper bound. This is a consequence of the nonlinear transformation used. Moreover, Fig 6 shows that this last coverage interval is consistent with all individual data when expressed in µV/m.
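The asymmetry can be checked numerically. The sketch below is only an illustration under stated assumptions: the overall mean of 60 dBµV/m and the reproducibility standard deviation of 3 dB are hypothetical values, not results of the comparison, and the function names are introduced for this example. The interval is built in dBµV/m with k = 2 and only its bounds are transformed into µV/m.

```python
def coverage_interval_db(mean_db, s_R_db, k=2.0):
    """Return the (lower, upper) bounds mean ± k * S_R in dBµV/m."""
    return mean_db - k * s_R_db, mean_db + k * s_R_db


def db_to_uv_per_m(y_db):
    """Convert a level in dBµV/m to µV/m: x = 10 ** (y / 20)."""
    return 10.0 ** (y_db / 20.0)


# Hypothetical values, for illustration only.
mean_db, s_R_db = 60.0, 3.0

low_db, high_db = coverage_interval_db(mean_db, s_R_db)

# Transform the bounds (not the standard deviation) into µV/m.
low_lin = db_to_uv_per_m(low_db)
centre_lin = db_to_uv_per_m(mean_db)
high_lin = db_to_uv_per_m(high_db)

print(low_lin, centre_lin, high_lin)
print("lower half-width:", centre_lin - low_lin)
print("upper half-width:", high_lin - centre_lin)
```

Both bounds stay strictly positive by construction, and the upper half-width is larger than the lower one, which matches the shape described for Fig. 6.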

Results of the proficiency testing (ISO 13528)
The evaluation of the proficiency of laboratories is based on:
-An assigned value $X_{pt}$, which can be calculated by several methods. For this study, the assigned value is taken equal to the overall average obtained from the exploitation of the results above (Table 6): $X_{pt} = \bar{y}$.
-A proficiency standard deviation $\sigma_{pt}$, which can be fixed or calculated. In our case, we used the reproducibility standard deviation $S_R$ estimated in Table 6: $\hat{\sigma}_{pt} = S_R$.
The performance statistic estimates the proficiency of the participant in carrying out the test measurement. There are various performance statistics (Z-score, difference). In the configuration of this interlaboratory comparison, we used the Z-score given by formula (4):

$$Z_{score} = \frac{X_{lab} - X_{pt}}{\hat{\sigma}_{pt}} \quad (4)$$
The Z-score is interpreted as follows:
-If |z| <= 2, the performance of the laboratory is satisfactory;
-If 2 < |z| <= 3, the performance of the laboratory is questionable and a warning signal is generated;
-If |z| > 3, the performance of the laboratory is "unsatisfactory" and an action signal is generated.
Table 7 gives, as an example, the Z-scores of every laboratory for one frequency in horizontal polarization, using the results in both units.
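The Z-score and its interpretation can be sketched as follows. The function names, the labels returned and the numerical values are hypothetical; only the formula (laboratory result minus assigned value, divided by the proficiency standard deviation) and the thresholds 2 and 3 come from the text.

```python
def z_score(x_lab, x_pt, sigma_pt):
    """Return (x_lab - x_pt) / sigma_pt."""
    if sigma_pt <= 0:
        raise ValueError("the proficiency standard deviation must be positive")
    return (x_lab - x_pt) / sigma_pt


def interpret(z):
    """Map a Z-score to the three performance classes."""
    magnitude = abs(z)
    if magnitude <= 2:
        return "satisfactory"
    if magnitude <= 3:
        return "warning signal"
    return "action signal"


# Hypothetical assigned value and proficiency standard deviation (dBµV/m).
x_pt, sigma_pt = 60.0, 2.0
for x_lab in (62.0, 65.0, 53.0):
    z = z_score(x_lab, x_pt, sigma_pt)
    print(x_lab, z, interpret(z))
```

Because the assigned value and the proficiency standard deviation both depend on the unit in which the analysis was carried out, the same laboratory can fall into different classes in dBµV/m and in µV/m.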

Conclusion
In conclusion, this article points out the importance of verifying the assumptions underlying the use of statistical methods. In metrology, a wide majority of the statistical methods commonly used implicitly assume that the data are normally distributed. This is the case when applying the GUM [3] with the common usage of a coverage factor k = 2, and this is also the case in the analysis of interlaboratory comparisons, whether the objective is to characterize the measurement method or to evaluate the proficiency of a laboratory.
In the first case, Cochran's and Grubbs' tests are both accurate only for Gaussian data. In the second case, the comparison of a Z-score with the values 2 or 3 also relies on a Gaussian assumption, as these thresholds approximate a 95% or 99% confidence level (the exact values for a Gaussian distribution are 1.96 and 2.58, while the thresholds 2 and 3 correspond to about 95.4% and 99.7%).
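These Gaussian reference values can be checked with the Python standard library. The variable names are illustrative; the computation only uses the standard normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-sided quantiles for 95% and 99% confidence levels.
k_95 = std_normal.inv_cdf(0.975)
k_99 = std_normal.inv_cdf(0.995)

# Probability that a Gaussian Z-score lies within the thresholds 2 and 3.
coverage_2 = std_normal.cdf(2.0) - std_normal.cdf(-2.0)
coverage_3 = std_normal.cdf(3.0) - std_normal.cdf(-3.0)

print(round(k_95, 2), round(k_99, 2))
print(round(coverage_2, 4), round(coverage_3, 4))
```

These probabilities hold only if the Z-scores are Gaussian; for skewed data such as field levels expressed in µV/m, the thresholds 2 and 3 no longer carry this meaning.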
Another warning of this paper is to be careful when applying nonlinear transformations to data. In particular, such transformations cannot be applied directly to a variance or a standard deviation.