Reconstruction of a signal from time-averaged data

In measurement applications such as air quality monitoring, sensors record the average of a signal over a fixed time period, for example, every hour or every day. Sensor networks may involve a number of different types of sensors with different temporal resolutions, for instance, expensive reference sensors recording accurate hourly averages and inexpensive sensors recording noisy averages every ten minutes. In addition, readings for some time periods may be missing. In this paper, we discuss methods to combine sensor data representing heterogeneous time averages in order to i) reconstruct an estimate of the signal, with associated uncertainty, at any given time, and ii) use data from reference sensors to calibrate low-cost sensors in situ.


Introduction
Many monitoring instruments gather time-averaged data. For example, the London Air Quality Network of reference sensors [6] provides 15-minute or hourly averages. Often we wish to combine knowledge gathered from instruments using different time windows, e.g., reference sensors providing hourly averages along with low-cost sensors providing less accurate data at finer time resolutions. There are often problems with how to interpret time-averaged data when some of the averages are known to have been taken over a partial period due to missing data. We may also want to construct the signal in finer time-resolved steps in order to make comparisons with other variates. In this paper, we present a general approach for constructing a signal from time-averaged data. We make almost no assumptions about the regularity of the averaging process, and the algorithm works for data representing averages over completely random sets of time steps. Instead, we assume that at least part of the underlying process evolves smoothly, so that the responses at nearby times are correlated to some extent. In section 2 we describe a general model for time-averaged data related to a temporally correlated signal, and in section 3 we describe how the model parameters can be determined from time-averaged data. In section 4, we illustrate how the signal reconstruction process behaves on problems that occur in practice. Our concluding remarks are given in section 5.

General model for time-averaged data
We assume that the signal $x_q$ at time $t_q$ can be modelled as

$$ x = Ca + e + \delta, \qquad (1) $$

where $x = (x_1, \ldots, x_q, \ldots, x_M)^T$ is the signal at times $t = (t_1, \ldots, t_q, \ldots, t_M)^T$, $a$ are parameters that describe expected characteristics of the system such as drift and/or cyclical behaviour, $C$ is the observation matrix associated with $x$ and is usually specified by basis functions evaluated at $t$, and $e$ are temporally correlated effects [1,7] whose correlation is modelled in terms of a correlation kernel $\mathrm{cov}(e, e') = k(t, t')$, e.g.,

$$ k(t, t') = \sigma_E^2 \exp\left( -\frac{(t - t')^2}{2\tau^2} \right), \qquad (2) $$

and $\delta$ are uncorrelated random effects, $\delta \sim N(0, \sigma_R^2 I)$. We let $V_e$ be the $M \times M$ variance matrix associated with $e$ calculated using $t$ and assume that

$$ e \sim N(0, V_e), \quad V_e = V_e(t \mid \sigma_E^2, \tau). \qquad (3) $$

We also assume that the times $\{t_1, t_2, \ldots, t_M\}$ are regularly spaced. We assume that a number of time-averaged measurements $y = (y_1, \ldots, y_m)^T$ are available. We let $t_i \subset \{t_1, \ldots, t_M\}$ be the times associated with the $i$th measurement, $n_i$ the number of elements in $t_i$, and $d_i$ the $M \times 1$ vector whose $q$th element is $d_{i,q} = 1/n_i$ if $t_q$ is an element of $t_i$ and zero otherwise. Thus, the vector $d_i$ can be used to construct the average $d_i^T x$ of the signal $x$ over the times $t_i$. The $i$th measurement is modelled as

$$ y_i = d_i^T x + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma_i^2). \qquad (4) $$

We denote by $V(\sigma)$ the diagonal variance matrix with $\sigma_i^2$ in the $i$th diagonal element. If $D$ is the $m \times M$ matrix with $d_i^T$ in the $i$th row, then (4) can be written in matrix-vector form

$$ y = Dx + \epsilon, \quad \epsilon \sim N(0, V(\sigma)). \qquad (5) $$

In most circumstances, $m$, the number of time-averaged data recorded, will be significantly smaller than $M$, the number of times in the complete set. The goal of the analysis is to estimate the signal $x$ based on the measurements $y$ gathered according to (4) and the prior correlation information about $e$ specified by (3). The formulation in (5) is quite general. The matrix $D$ can represent any set of time averages, so that a heterogeneous set of sensors can be modelled in this way. For example, some sensors providing monthly averages do so over either four- or five-week time intervals, depending on the length of the month. There is no requirement that the vectors $t_i$ (and therefore $D$) represent sets of contiguous time steps, although in most practical applications they will. The matrix $V(\sigma)$ allows different sensor accuracies to be represented, e.g., an accurate (expensive) reference sensor giving hourly averages alongside a (low-cost) sensor giving ten-minute averages with much less accuracy. The model can be expanded to incorporate parameters $b$ that represent calibration parameters for the less accurate sensors [3], so that the reference sensor can be used to calibrate the less accurate sensors in situ. A simple example of this type of co-calibration is given below.

e-mail: alistair.forbes@npl.co.uk
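A minimal NumPy sketch of the two ingredients of the model may help: the variance matrix $V_e$ and the averaging matrix $D$. It assumes the squared-exponential kernel $k(t, t') = \sigma_E^2 \exp(-(t - t')^2 / (2\tau^2))$, which is consistent with the one-hour correlations of 0.61, 0.14 and 0.0003 quoted in the numerical examples. The function names `kernel_variance` and `averaging_matrix` are hypothetical, and each set $t_i$ is given as positions on the time grid rather than as the times themselves.

```python
import numpy as np

def kernel_variance(t, sigma_e, tau):
    """V_e from an assumed squared-exponential kernel: sigma_e^2 exp(-(t - t')^2 / (2 tau^2))."""
    diff = t[:, None] - t[None, :]
    return sigma_e**2 * np.exp(-diff**2 / (2.0 * tau**2))

def averaging_matrix(index_sets, M):
    """m x M matrix D: row i holds 1/n_i at the grid indices of t_i and zeros elsewhere."""
    D = np.zeros((len(index_sets), M))
    for i, idx in enumerate(index_sets):
        idx = np.unique(np.asarray(list(idx), dtype=int))  # t_i is a set: drop duplicates
        D[i, idx] = 1.0 / idx.size
    return D

# 5-minute grid over 12 hours (M = 144) with hourly averages (m = 12) ...
t = np.arange(144) / 12.0                      # times in hours
hourly = [range(12 * h, 12 * (h + 1)) for h in range(12)]
# ... plus one hour with missing readings, averaged over a non-contiguous subset.
partial = [[0, 1, 2, 7, 8, 9, 10]]
D = averaging_matrix(hourly + partial, t.size)
V_e = kernel_variance(t, sigma_e=0.1, tau=1.0)
```

Each row of `D` sums to one, so `D @ x` returns the averages; the partial-hour row shows that nothing in the construction requires contiguous time steps.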

Estimation of the model parameters
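Let $D$ be the $m \times M$ matrix with $d_i^T$ in the $i$th row, and let $L_e$ be the Cholesky factor [4] of $V_e = L_e L_e^T$. Introduce the parameters $f$ related to $e$ through $e = L_e f$, so that (1) and (4) can be combined to form

$$ y = DCa + DL_e f + D\delta + \epsilon, \qquad (6) $$

with $f \sim N(0, I)$. Re-defining $\epsilon$ to be the sum of the random effects $D\delta + \epsilon$, we end up with

$$ y = DCa + DL_e f + \epsilon, \quad \epsilon \sim N(0, V_\epsilon), \qquad (7) $$

with $V_\epsilon = \sigma_R^2 DD^T + V(\sigma)$. Estimates $\hat f$ and $\hat a$ of $f$ and $a$, respectively, are found by solving the least squares system

$$ J \begin{bmatrix} a \\ f \end{bmatrix} \approx \begin{bmatrix} Wy \\ 0 \end{bmatrix}, \quad J = \begin{bmatrix} WDC & WDL_e \\ 0 & I \end{bmatrix}, $$

where $W$ is such that $W V_\epsilon W^T = I$, for example $W = L_\epsilon^{-1}$ with $V_\epsilon = L_\epsilon L_\epsilon^T$. The use of the weighting matrix $W$ ensures that the variance matrix associated with the weighted data $Wy$ is the identity matrix, while the lower block row encodes the prior $f \sim N(0, I)$. This means that the variance matrix associated with the estimates $[\hat a^T \ \hat f^T]^T$ is given by

$$ V_{\hat a, \hat f} = (J^T J)^{-1}. $$

The estimated signal is given by $\hat x = C\hat a + L_e \hat f$ with associated variance matrix

$$ V_{\hat x} = [C \ \ L_e] \, V_{\hat a, \hat f} \, [C \ \ L_e]^T. $$

Numerical examples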

Reconstruction of a signal from hourly averages
We first illustrate the behaviour of the algorithm in determining estimates of a signal every five minutes given hourly averages. The model involves parameters $a$ modelling a linear trend.
The simulation data were generated as in equation (6), with the independent random component $\delta$ generated with $\sigma_R = 0.01$ and the temporally correlated component $e$ generated as in (3) using the kernel in (2) with $\sigma_E = 0.1$ and correlation time-scale parameter $\tau$ = 1.00, 0.50 and 0.25. The hourly averages are generated as in (4) with $\sigma_i = 0.005$. Times are measured in hours, and the signal is in arbitrary units for this simulation. The three values of $\tau$ correspond to a correlation between the signal at one time and at one hour later (or earlier) of 0.61, 0.14 and 0.0003, respectively.
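The quoted correlations can be checked directly, assuming the squared-exponential kernel $k(t, t') = \sigma_E^2 \exp(-(t - t')^2 / (2\tau^2))$: the correlation at a lag of $\Delta$ hours is $\exp(-\Delta^2 / (2\tau^2))$, independent of $\sigma_E$. A short sketch:

```python
import numpy as np

# Correlation implied by a squared-exponential kernel at a one-hour lag.
taus = np.array([1.00, 0.50, 0.25])   # correlation time-scale parameters, in hours
lag = 1.0                             # hours
rho = np.exp(-lag**2 / (2.0 * taus**2))
```

This gives $e^{-1/2} \approx 0.61$, $e^{-2} \approx 0.14$ and $e^{-8} \approx 0.0003$, so halving $\tau$ quickly decorrelates responses one hour apart.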
Figure 1 shows the reconstructed signals for data corresponding to the three values of $\tau$ over a 12-hour period. For the case $\tau = 1.00$ (top graph), the reconstructed signal is a very good representation of the actual signal over the period. Considering that we are estimating the 144 values of the signal $x_q$ (every five minutes over 12 hours), along with the two linear drift parameters $a$, from just 12 measurements, the quality of the reconstruction is very good. The reason is that the temporal correlation forces a degree of smoothness on the signal, and the hourly averages are sufficient to select a good reconstruction from all the possible signals with the requisite degree of smoothness.
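The estimation procedure of section 3 can be sketched with dense NumPy linear algebra using the settings of this example ($\tau = 1.00$). This is an illustrative sketch, not the authors' implementation: the function name `reconstruct` is hypothetical, the squared-exponential kernel is assumed, the small diagonal jitter added before the Cholesky factorisation is a numerical safeguard not described in the text, and the trend values `[1.0, 0.05]` are made up for the simulation.

```python
import numpy as np

def reconstruct(y, D, C, V_e, sigma, sigma_r, jitter=1e-8):
    """Sketch of the weighted least-squares reconstruction for
    y = D C a + D L_e f + eps, f ~ N(0, I), eps ~ N(0, sigma_r^2 D D^T + V(sigma))."""
    M, n = V_e.shape[0], C.shape[1]
    # Jitter (an assumption): smooth kernels make V_e numerically singular.
    L_e = np.linalg.cholesky(V_e + jitter * np.trace(V_e) / M * np.eye(M))
    V_eps = sigma_r**2 * D @ D.T + np.diag(np.asarray(sigma, dtype=float) ** 2)
    L_eps = np.linalg.cholesky(V_eps)
    # W = L_eps^{-1} whitens the data: W V_eps W^T = I.
    Wy = np.linalg.solve(L_eps, y)
    WA = np.linalg.solve(L_eps, np.hstack([D @ C, D @ L_e]))
    # Prior f ~ N(0, I) appended as M pseudo-observations 0 = f + noise.
    J = np.vstack([WA, np.hstack([np.zeros((M, n)), np.eye(M)])])
    z = np.concatenate([Wy, np.zeros(M)])
    p, *_ = np.linalg.lstsq(J, z, rcond=None)
    V_p = np.linalg.inv(J.T @ J)           # variance of [a_hat; f_hat]
    B = np.hstack([C, L_e])                # x_hat = C a_hat + L_e f_hat
    x_hat = B @ p
    V_x = B @ V_p @ B.T
    return x_hat, V_x, p[:n], p[n:]

# Usage with the settings of this example; the trend values are illustrative only.
rng = np.random.default_rng(0)
M, m = 144, 12
t = np.arange(M) / 12.0                              # 5-minute steps, in hours
D = np.kron(np.eye(m), np.full((1, 12), 1.0 / 12))   # hourly averages
C = np.column_stack([np.ones(M), t])                 # linear trend
V_e = 0.1**2 * np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 1.00**2))
e = np.linalg.cholesky(V_e + 1e-10 * np.eye(M)) @ rng.standard_normal(M)
x_true = C @ np.array([1.0, 0.05]) + e + 0.01 * rng.standard_normal(M)
y = D @ x_true + 0.005 * rng.standard_normal(m)
x_hat, V_x, a_hat, f_hat = reconstruct(y, D, C, V_e, np.full(m, 0.005), 0.01)
u_x = np.sqrt(np.diag(V_x))                          # standard uncertainties of x_hat
```

Dense solves are adequate for $M = 144$, but their cost grows as $O(M^3)$, so much longer records would call for exploiting the structure of $V_e$ and $D$.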
A more quantitative argument can be derived by looking at the eigenvalues of the variance matrix $V_e$. The sum of the eigenvalues is equal to the trace of $V_e$, the sum of the diagonal elements of the variance matrix. For the case $\tau = 1.00$, the largest 10 eigenvalues account for approximately 98 % of the trace of $V_e$, indicating that $V_e$ can be approximated well by a rank 10 matrix. In other words, there is a small number of effective degrees of freedom associated with the model [2,5], so that the 12 hourly averages determine good estimates of these parameters. The reconstructed signal explains over 98 % of the variance of the data.
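The eigenvalue argument can be sketched as follows, again assuming the squared-exponential kernel on a 5-minute grid over 12 hours. The helper `top_k_trace_fraction` is hypothetical, and this sketch is not guaranteed to reproduce the exact percentages quoted in the text, which depend on the grid and window used by the authors; it does show how the fraction falls as $\tau$ decreases.

```python
import numpy as np

def top_k_trace_fraction(V, k):
    """Fraction of trace(V) carried by the k largest eigenvalues of symmetric V."""
    w = np.linalg.eigvalsh(V)          # eigenvalues in ascending order
    return np.sum(w[-k:]) / np.trace(V)

t = np.arange(144) / 12.0              # 5-minute steps over 12 hours, in hours
fractions = {}
for tau in (1.00, 0.50, 0.25):
    V_e = 0.1**2 * np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * tau**2))
    fractions[tau] = top_k_trace_fraction(V_e, 10)
```

A fraction close to one means that a few smooth modes carry almost all of the prior variance, which is why 12 averages can pin down the reconstruction.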
The reconstruction for the case $\tau = 0.50$ is given in the middle graph of figure 1. For this case, it is clear that the reconstruction does not capture all elements of the signal. The largest 10 eigenvalues of $V_e$ account for approximately 80 % of its trace, while the reconstructed signal accounts for approximately 78 % of the variance of the data. The bottom graph in figure 1 is the reconstruction for the case $\tau = 0.25$. For this case, the largest 10 eigenvalues of $V_e$ account for approximately 48 % of its trace, while the reconstructed signal accounts for approximately 47 % of the variance of the data. From another point of view, there is a much broader range of signals with the same smoothness characterisation that could have given rise to the measured hourly averages.

Collaborative measurement using a reference and low-cost sensor
The second set of simulations relates to the very practical requirement of combining less accurate, low-cost sensors with a reference sensor in order to deliver an enhanced capability. The simulations involve exactly the same set of signals and hourly averages as described above in section 4.1. However, 10-minute averages are also available from a low-cost sensor that provides measurements $y_i$ related to $x$ according to

$$ y_i = d_i^T x + b + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma_L^2), $$

where $b$ is an offset common to all the measurements and $\epsilon_i$ is a random effect with associated uncertainty $\sigma_L$. The low-cost nature of the sensor is modelled in this case by the offset, with modest prior information $b \sim N(0, \sigma_O^2)$, $\sigma_O = 0.1$, and by the fact that $\sigma_L = 0.025$ is five times greater than $\sigma_i = 0.005$ for the reference sensor. This additional sensor can be accounted for in the general model (6) through appropriate assignment of the matrices $C$, $D$ and $V(\sigma)$.
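One possible way of assembling the two-sensor problem is sketched below; the arrangement and the names are assumptions, not necessarily the authors' assignment of $C$, $D$ and $V(\sigma)$. Here the offset $b$ is carried as an extra column of the measurement-level design matrix (zeros for reference rows, ones for low-cost rows), and its prior $b \sim N(0, \sigma_O^2)$ becomes one weighted pseudo-observation row over the $[a, b]$ columns. The helper `block_average_matrix` is hypothetical.

```python
import numpy as np

def block_average_matrix(M, width):
    """Averages over consecutive, non-overlapping blocks of `width` grid steps."""
    m = M // width
    D = np.zeros((m, M))
    for i in range(m):
        D[i, i * width:(i + 1) * width] = 1.0 / width
    return D

M = 144                                  # 5-minute steps over 12 hours
t = np.arange(M) / 12.0                  # times in hours
C = np.column_stack([np.ones(M), t])     # linear trend parameters a
D_ref = block_average_matrix(M, 12)      # hourly reference averages (m = 12)
D_low = block_average_matrix(M, 2)       # 10-minute low-cost averages (m = 72)
D = np.vstack([D_ref, D_low])
# Offset b affects only the low-cost rows.
offset_col = np.concatenate([np.zeros(D_ref.shape[0]), np.ones(D_low.shape[0])])
DC_aug = np.column_stack([D @ C, offset_col])   # columns: a_1, a_2, b
# V(sigma): reference rows sigma_i = 0.005, low-cost rows sigma_L = 0.025.
sigma = np.concatenate([np.full(D_ref.shape[0], 0.005), np.full(D_low.shape[0], 0.025)])
V_sigma = np.diag(sigma**2)
# Prior b ~ N(0, sigma_O^2) as the weighted pseudo-observation 0 = b / sigma_O + noise.
sigma_O = 0.1
prior_row = np.array([0.0, 0.0, 1.0 / sigma_O])
```

In the full least-squares system these rows would sit alongside the $DL_e$ columns for $f$, with zeros in those columns for the prior row on $b$.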
Figure 2 shows, in the upper graph, the signal reconstructed from the reference data alone for the case $\tau = 1.00$. This signal is exactly the same as that in the upper graph of figure 1. The middle graph in figure 2 is the signal reconstructed from the low-cost sensor data alone. The reconstruction is seen to follow the data well but is offset due to the systematic effect $b$. The uncertainty associated with this systematic effect determined from the low-cost data alone is exactly the same as the prior uncertainty $\sigma_O$: the data provide no extra information about $b$. The lower graph in figure 2 is the signal reconstructed from both the reference and low-cost sensor data and gives a good reconstruction, as in the upper graph. The standard uncertainty associated with $b$ derived from the combined data set is $u(b) = 0.004$, comparable with the standard deviation $\sigma_i = 0.005$ of the random effects associated with the reference sensor, showing that the analysis of the combined set of data gives an accurate in situ calibration of the low-cost sensor.
The upper graph in figure 5 gives the standard uncertainties associated with the three reconstructed signals. The uncertainty associated with the signal reconstructed from the low-cost sensor alone is dominated by the uncertainty of 0.1 associated with the offset $b$. The uncertainties associated with the other two reconstructions are similar, with the combined data set giving marginally lower uncertainties, as expected. In a practical setting, the low-cost sensor provides little useful information if the background signal is sufficiently smooth (but the collaborative approach still provides the in situ calibration).
Figure 3 provides reconstructions similar to those in figure 2 but for the case $\tau = 0.50$, representing less smooth signals. Here, the main outcome is that the collaborative approach, in the lower graph, gives a good reconstruction of the signal, significantly better than that derived from the reference (upper) or low-cost sensor data (middle) alone. This is also reflected in the standard uncertainties associated with the reconstructed signals, with the uncertainties for the collaborative approach significantly smaller than those for the other two reconstructed signals.
Figure 4 shows the reconstructions for the case $\tau = 0.25$ and again shows that the collaborative approach gives a good reconstruction (lower graph). The benefit of the collaborative approach, in terms of the standard uncertainties in the reconstructed signals, is seen in the lower graph of figure 5.
For all three values of $\tau$, the standard uncertainty $u(b)$ associated with the estimate of $b$ derived from the combined data sets is $u(b) = 0.004$, demonstrating that the collaborative approach can determine an accurate in situ calibration of the low-cost sensor.

Concluding remarks
We have described a general algorithm for reconstructing a signal from time-averaged data and illustrated its performance on simulated data representing practical problems in, for example, air quality monitoring. The underlying model uses a prior assumption of temporal correlation in the signal in order to improve the reconstruction. The algorithm can be used to estimate the signal at finer time resolutions, to combine multiple data streams that are averaged over different time windows, and to provide in situ calibration of sensors against a reference sensor.