ERROR ANALYSIS

Reference: An Introduction to Error Analysis, 2nd Ed., John R. Taylor (University Science Books, Sausalito, 1997).

There are two main types of experimental errors in physical measurements.

Systematic Error will cause the distribution of data points to be offset with respect to the true value. Causes of systematic error may be poor measurement technique, errors in instrumental calibration, software errors or failure to correct for external conditions (e.g. temperature). Possible sources of systematic error must be considered in all of the stages of the experiment, from design to the data analysis. (See Taylor Chapter 4.)

Random Error is the deviation from the true value that occurs in any physical measurement. The assumption is that this error will result in experimental readings which are equally distributed between "too high" and "too low", so that the mean value reflects the true value to within some precision. Such a distribution of data can be represented by a Gaussian distribution (or Normal distribution, Taylor Chapter 5). It must be kept in mind that certain types of measurements do not result in Gaussian distributions of data points. For instance, repeated measurements of the number of radioactive decays in a time interval will result in a Poisson distribution (Taylor Chapter 11), because the number of decays can never be less than 0. We will return to this below. In many cases the Poisson distribution and others may be approximated by a Gaussian, and the following error analysis may be applied.

 

Analysis of Random Error

General Case:

Standard Error is the error in a single measurement. It is the precision to which one can measure an experimental quantity in one reading (the sample standard deviation of the readings), and is denoted by $\sigma_{n-1}$:

$$\sigma_{n-1} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \langle x \rangle)^2}{n-1}},$$

where $x_i$ is a particular reading, $\langle x \rangle$ is the mean reading, and $n$ is the number of readings. Most calculators have a standard error function, and use of this can save time when dealing with multiple measurements. See Taylor Section 4.3.

When quoting the final result, the appropriate error to quote is the Standard Error in the Mean, which reflects the increased precision resulting from making multiple measurements. This is denoted by $\sigma_m$:

$$\sigma_m = \frac{\sigma_{n-1}}{\sqrt{n}}.$$

See Taylor Section 4.4.
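
As a rough illustration of the two quantities above, the following Python sketch computes the standard error of a single reading and the standard error in the mean for a set of repeated readings. The readings are made-up numbers used only for illustration, and the variable names are arbitrary; `statistics.stdev` uses the $n-1$ denominator.

```python
import math
import statistics

# Hypothetical repeated readings of one quantity (illustrative values only).
readings = [9.78, 9.82, 9.79, 9.85, 9.81]

n = len(readings)
mean = statistics.mean(readings)

# Standard error of a single reading: statistics.stdev divides by n - 1.
sigma_n1 = statistics.stdev(readings)

# Standard error in the mean: smaller by a factor of sqrt(n).
sigma_m = sigma_n1 / math.sqrt(n)

print(f"mean = {mean:.3f}, sigma_n-1 = {sigma_n1:.3f}, sigma_m = {sigma_m:.3f}")
```

The final result would then be quoted as the mean plus or minus `sigma_m`.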

Poisson Statistics:

When dealing with counting (or Poisson) statistics, i.e. when the quantity measured is a number of counts (e.g., of radioactive decays), the appropriate formula for the error in the mean is:

$$\sigma_m = \sqrt{\frac{\langle x \rangle}{n}}$$

(note that this is $\sqrt{\langle x \rangle}/\sqrt{n}$, i.e. the error in a single reading is taken to be $\sqrt{\langle x \rangle}$). If the general formula is used instead of this one, the error in the mean obtained will be overestimated. See Taylor Sections 3.2, 11.2.
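
A minimal sketch of the counting case, assuming the error in the mean is taken as $\sqrt{\langle x \rangle / n}$ (the square-root rule for counts applied to the mean of $n$ readings). The counts below are invented for illustration, and the general formula is evaluated alongside only for comparison.

```python
import math
import statistics

# Hypothetical counts recorded in n equal time intervals (illustrative only).
counts = [98, 105, 110, 93, 101, 97]

n = len(counts)
mean_counts = statistics.mean(counts)

# Counting-statistics estimate: the single-reading error is sqrt(mean),
# so the error in the mean is sqrt(mean / n).
sigma_m_poisson = math.sqrt(mean_counts / n)

# General formula, shown for comparison.
sigma_m_general = statistics.stdev(counts) / math.sqrt(n)

print(f"mean counts           = {mean_counts:.1f}")
print(f"Poisson error in mean = {sigma_m_poisson:.2f}")
print(f"general error in mean = {sigma_m_general:.2f}")
```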

 

Propagation of Error

Since most physical measurements do not directly result in the quantity of interest, it is necessary to "propagate" the error of measurement to the final quantity of interest. The following results are based on a formal statistical theory of random errors and have been reduced down to a few simple equations.

Addition or Subtraction (Taylor Section 3.6)

$C = A \pm B \qquad \delta C = \sqrt{(\delta A)^2 + (\delta B)^2}$

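Functions (Taylor Section 3.11)

$C = C(X, Y) \qquad \delta C = \sqrt{\left(\frac{\partial C}{\partial X}\,\delta X\right)^2 + \left(\frac{\partial C}{\partial Y}\,\delta Y\right)^2}$ (e.g. if $C = \ln X$ then $\delta C = \delta X / X$)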

Functions of more variables are dealt with in the same way, i.e. there are additional terms in the square root. Some of the simple relationships which can be derived from the above equation are:

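Multiplication or Division (Taylor Section 3.6)

$C = XYZ$ or $C = XY/Z$, etc. $\qquad \frac{\delta C}{|C|} = \sqrt{\left(\frac{\delta X}{X}\right)^2 + \left(\frac{\delta Y}{Y}\right)^2 + \left(\frac{\delta Z}{Z}\right)^2}$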

Powers (Taylor Section 3.7)

$C = X^n \qquad \frac{\delta C}{|C|} = |n|\,\frac{\delta X}{|X|}$
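
The propagation rules in this section can be checked numerically. The sketch below is one possible way to do this, not a standard routine: the helper `propagate` is a hypothetical function that estimates each partial derivative by a central difference and adds the contributions in quadrature, assuming independent errors. The measured values are invented.

```python
import math

def propagate(func, values, errors, rel_step=1e-6):
    """Return sqrt(sum((dC/dX_i * dX_i)**2)) using central differences."""
    total = 0.0
    for i, (v, e) in enumerate(zip(values, errors)):
        h = rel_step * (abs(v) if v != 0 else 1.0)
        up = list(values)
        up[i] = v + h
        down = list(values)
        down[i] = v - h
        partial = (func(*up) - func(*down)) / (2 * h)
        total += (partial * e) ** 2
    return math.sqrt(total)

# Hypothetical measured values with independent errors.
X, dX = 2.00, 0.05
Y, dY = 3.00, 0.10

# Product C = X * Y: compare with the relative-error rule.
dC_numeric = propagate(lambda x, y: x * y, [X, Y], [dX, dY])
dC_rule = X * Y * math.sqrt((dX / X) ** 2 + (dY / Y) ** 2)

# Logarithm C = ln X: the function rule gives dC = dX / X.
dln_numeric = propagate(math.log, [X], [dX])
dln_rule = dX / X

print(f"product: numeric {dC_numeric:.5f}, rule {dC_rule:.5f}")
print(f"ln X:    numeric {dln_numeric:.5f}, rule {dln_rule:.5f}")
```

The same helper can be applied to any function of independent variables, which is convenient when the analytic derivatives are tedious.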

 

Correlated Errors  (Taylor Section 3.6)

When carrying out the error analysis it is extremely important that correlated errors be dealt with properly. In keeping with the assumption of random errors, all variables used in the error equations MUST be independent variables. Otherwise, serious mistakes can occur in evaluating the error.

As a trivial example, consider the error in the calculation of the volume of a cube from one measurement of the length of one side:

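$V = L \times W \times H$

If these variables (which have resulted from one measurement and are identical) are treated as independent variables, the following result is obtained using the multiplication rule above:

$$\frac{\delta V}{V} = \sqrt{\left(\frac{\delta L}{L}\right)^2 + \left(\frac{\delta W}{W}\right)^2 + \left(\frac{\delta H}{H}\right)^2} = \sqrt{3}\,\frac{\delta L}{L}$$

This, however, is incorrect. Since the errors are not independent (all of them are the same), the correct procedure is to write the equation in terms of one variable and to carry out the propagation on that:

$V = L^3 \qquad \frac{\delta V}{V} = 3\,\frac{\delta L}{L}$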
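
To put numbers on the cube example, the short sketch below compares the two treatments for an invented side length. Treating the three sides as independent gives a relative error of $\sqrt{3}\,\delta L/L$, while the power rule applied to $V = L^3$ gives $3\,\delta L/L$.

```python
import math

# Hypothetical single measurement of the side of a cube.
L, dL = 2.00, 0.02

V = L ** 3

# Incorrect: treats L, W and H as three independent measurements.
dV_wrong = V * math.sqrt(3) * (dL / L)

# Correct: V = L**3 depends on one variable, so the power rule applies.
dV_right = V * 3 * (dL / L)

print(f"V = {V:.2f}")
print(f"treated as independent (wrong): dV = {dV_wrong:.3f}")
print(f"single variable (right):        dV = {dV_right:.3f}")
```

In this example the incorrect treatment underestimates the error by a factor of $\sqrt{3}$.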

 

Criterion for Consistency

The criterion for consistency is used to determine if two measured values of a parameter are consistent with one another. It arises from statistical theory and must be regarded as a loose inequality. It is meant to provide a guide to the validity of experimental or theoretical results.

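If two different determinations of a parameter are made, $A_1 \pm \delta A_1$ and $A_2 \pm \delta A_2$, then they are consistent if the difference $|A_1 - A_2|$ is not significantly larger than the combined error from $\delta A_1$ and $\delta A_2$.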

 

Linear Least Squares Fitting (Taylor Chapter 8)

Take a set of $n$ pairs of measurements $(x_i \pm \delta x_i)$ and $(y_i \pm \delta y_i)$, $i = 1, \ldots, n$, where $\delta x_i$ are possible systematic errors on the $x_i$, and $\delta y_i$ are the statistical (random) errors on the $y_i$ (from the probability distribution of $y$ at the particular $x_i$ of interest). Define $\sigma_i = \sqrt{(\delta y_i)^2 + (b\,\delta x_i)^2}$ (in practice this often simplifies to $\sigma_i = \delta y_i$ when the statistical errors dominate). If a linear relationship of the form $y = a + bx$ is suspected between the two variables, the linear least squares fit parameters $a$ and $b$ can be obtained from the following formulas:

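Define $\Delta = \sum_i \frac{1}{\sigma_i^2} \sum_i \frac{x_i^2}{\sigma_i^2} - \left( \sum_i \frac{x_i}{\sigma_i^2} \right)^2$, then

$$a = \frac{1}{\Delta} \left( \sum_i \frac{x_i^2}{\sigma_i^2} \sum_i \frac{y_i}{\sigma_i^2} - \sum_i \frac{x_i}{\sigma_i^2} \sum_i \frac{x_i y_i}{\sigma_i^2} \right),$$

$$b = \frac{1}{\Delta} \left( \sum_i \frac{1}{\sigma_i^2} \sum_i \frac{x_i y_i}{\sigma_i^2} - \sum_i \frac{x_i}{\sigma_i^2} \sum_i \frac{y_i}{\sigma_i^2} \right),$$

$$\sigma_a = \sqrt{\frac{1}{\Delta} \sum_i \frac{x_i^2}{\sigma_i^2}},$$

and

$$\sigma_b = \sqrt{\frac{1}{\Delta} \sum_i \frac{1}{\sigma_i^2}}.$$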

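In the case where all data points have the same error $\sigma_i = \sigma$, these formulas simplify to:

$$\Delta = n \sum_i x_i^2 - \left( \sum_i x_i \right)^2,$$

$$a = \frac{1}{\Delta} \left( \sum_i x_i^2 \sum_i y_i - \sum_i x_i \sum_i x_i y_i \right),$$

$$b = \frac{1}{\Delta} \left( n \sum_i x_i y_i - \sum_i x_i \sum_i y_i \right),$$

$$\sigma_a = \sigma \sqrt{\frac{\sum_i x_i^2}{\Delta}},$$

and

$$\sigma_b = \sigma \sqrt{\frac{n}{\Delta}}.$$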
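
A weighted straight-line fit of $y = a + bx$ can be sketched in a few lines of Python. The function name `weighted_line_fit` and the data are hypothetical; the sketch assumes each point carries a weight $1/\sigma_i^2$ and uses the standard weighted least squares sums (Taylor Chapter 8), returning $a$, $b$ and their errors.

```python
import math

def weighted_line_fit(x, y, sigma):
    """Fit y = a + b*x with weights 1/sigma_i**2.

    Returns (a, b, sigma_a, sigma_b).
    """
    w = [1.0 / s ** 2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    return a, b, math.sqrt(Sxx / delta), math.sqrt(S / delta)

# Hypothetical data with a different error on each y value.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
sigma = [0.1, 0.1, 0.2, 0.2]

a, b, sigma_a, sigma_b = weighted_line_fit(x, y, sigma)
print(f"a = {a:.2f} +/- {sigma_a:.2f}")
print(f"b = {b:.2f} +/- {sigma_b:.2f}")
```

Passing the same value for every entry of `sigma` gives the equal-error case.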

If a relationship of the form $y = a x^b$ is suspected, then $\ln y = \ln a + b \ln x$, and a linear least squares fit of $\ln y$ as a function of $\ln x$ can be calculated. If the relationship is of the form $y = a e^{bx}$, then $\ln y = \ln a + bx$, and a linear fit of $\ln y$ as a function of $x$ can be done. In both cases the intercept of the fit is $\ln a$ and the slope is $b$. Based on the above, you can calculate the expected errors on the fit parameters $a$ and $b$ in these cases.
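
A short sketch of the linearization, using invented noise-free data so the recovered parameters can be checked by eye. Taking logarithms, $y = a x^b$ becomes $\ln y = \ln a + b \ln x$, and $y = a e^{bx}$ becomes $\ln y = \ln a + bx$. The helper `line_fit` is a hypothetical name for an unweighted straight-line fit.

```python
import math

def line_fit(x, y):
    """Unweighted least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    Sx, Sy = sum(x), sum(y)
    Sxx = sum(xi * xi for xi in x)
    Sxy = sum(xi * yi for xi, yi in zip(x, y))
    delta = n * Sxx - Sx ** 2
    return (Sxx * Sy - Sx * Sxy) / delta, (n * Sxy - Sx * Sy) / delta

x = [1.0, 2.0, 3.0, 4.0, 5.0]

# Power law y = 2 * x**1.5: a straight line in (ln x, ln y).
y_pow = [2.0 * xi ** 1.5 for xi in x]
ln_a, b_pow = line_fit([math.log(xi) for xi in x], [math.log(yi) for yi in y_pow])
a_pow = math.exp(ln_a)

# Exponential y = 3 * exp(0.5 * x): a straight line in (x, ln y).
y_exp = [3.0 * math.exp(0.5 * xi) for xi in x]
ln_a, b_exp = line_fit(x, [math.log(yi) for yi in y_exp])
a_exp = math.exp(ln_a)

print(f"power law:   a = {a_pow:.3f}, b = {b_pow:.3f}")
print(f"exponential: a = {a_exp:.3f}, b = {b_exp:.3f}")
```

With real data, the error on each $\ln y_i$ follows from the function rule, $\delta(\ln y_i) = \delta y_i / y_i$, and the error on $a$ follows from the error on the intercept, $\delta a = a\,\delta(\ln a)$.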

 

Trend Fitting with Software Packages

In many cases you can use software packages to fit a function to a data set, with the intention of obtaining fit parameters (with error estimates) and an estimate of the goodness of the fit (e.g. the $\chi^2$ parameter). For instance, you may have used Microsoft Excel in such analyses. You should be very careful of exactly how errors on your data points are handled and propagated in such packages when determining the uncertainties on your fit parameters (or even whether your errors are taken into account at all!). Microsoft Excel is very limited in this regard. The computers in 309 and 310 Osmond are equipped with SigmaPlot 2001, which is a much more powerful statistical analysis and graphing package, but even then care has to be taken to make sure the error bars on your data points are handled properly. Here we will look at an example with Excel and SigmaPlot. You may be using some other analysis tool, but be very wary of what is done with your error bars!

Consider the following data set, constructed so that $y = x^2$:

x       y       δy
1.02    0.96    0.02
1.95    4.11    0.34
3.1     8.78    0.45
3.9     16.5    0.7
5.2     24.0    1.5

With Excel, a scatterplot is drawn of y vs x. The errors are introduced by clicking on one of the points, selecting “Format data series”, then the “y Error Bars” tab, then selecting “Custom” to read in the error values from column 3. Now the error bars are displayed properly. Then we click on one of the data points and select “Add Trendline”, select polynomial of order 2, and under the “Options” tab select “Display equation on chart”. We find the best fit is $y = 0.5692x^2 + 2.136x - 2.0657$, not at all the $y = x^2$ we were looking for. There are no error estimates provided on the fit parameters, and the error bars on the y values were completely ignored in doing the fit.

With SigmaPlot, a scatterplot of y-with-errors vs x is drawn, where the x, y and δy values are obtained from three columns. We click on one of the data points and select “Fit Curve”. We select “Polynomial” and “Quadratic” and we find $y = (0.57 \pm 0.35)\,x^2 + (2.1 \pm 2.2)\,x + (-2.1 \pm 3.0)$. Here the errors on y have also been ignored so far, and the fit parameters are the same as obtained with Excel, but at least here errors are provided on the fit parameters, and we see that the best fit is in fact essentially compatible with $y = x^2$ within errors. To properly take into account the δy error bars, we go through the following procedure:

  1. Click on one data point, and select “Fit Curve”.
  2. Select “Polynomial” and “Quadratic”. Instead of clicking “Next”, click on “Edit Code…”. This brings up the preprogrammed function. Modify it in the following way:
    1. Click “Add as…” to create a new custom function; call it “Weighted quadratic”.
    2. Change “fit f to y” to “fit f to y with weight w”.
    3. Under “x=col(1)” and “y=col(2)”, add a third row “w=1/(col(3)*col(3))”; this will weight each value y by $1/(\delta y)^2$.
  3. Now a new equation called “Weighted quadratic” exists in the database. Select it to do the fit. The new best fit is: $y = (0.83 \pm 0.15)\,x^2 + (0.73 \pm 0.68)\,x + (-0.65 \pm 0.54)$, which is much closer to the expected $y = x^2$ and properly takes the δy errors into account.

Note that you can create any new fit equation with proper weighting as above, introducing the correct $1/(\delta y)^2$ weights.
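
For readers using a general-purpose language instead of a plotting package, the weighted and unweighted quadratic fits to the data set above can be sketched in plain Python. This is an illustrative sketch, not what Excel or SigmaPlot do internally: the function `polyfit_weighted` is a hypothetical helper that solves the weighted normal equations with $1/(\delta y)^2$ weights, and it returns only the best-fit coefficients, not their errors.

```python
def polyfit_weighted(x, y, sigma, degree=2):
    """Weighted least squares polynomial fit.

    Minimises sum(((y_i - f(x_i)) / sigma_i)**2) and returns the
    coefficients [c0, c1, ..., c_degree] of f(x) = c0 + c1*x + ...
    """
    m = degree + 1
    w = [1.0 / s ** 2 for s in sigma]
    # Normal equations, stored as an augmented matrix [A | rhs].
    aug = []
    for j in range(m):
        row = [sum(wi * xi ** (j + k) for wi, xi in zip(w, x)) for k in range(m)]
        row.append(sum(wi * yi * xi ** j for wi, xi, yi in zip(w, x, y)))
        aug.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [ar - factor * ac for ar, ac in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]

# Data set from the table above.
x = [1.02, 1.95, 3.1, 3.9, 5.2]
y = [0.96, 4.11, 8.78, 16.5, 24.0]
dy = [0.02, 0.34, 0.45, 0.7, 1.5]

# Equal weights reproduce a fit that ignores the error bars.
unweighted = polyfit_weighted(x, y, [1.0] * len(x))
# Weights 1/dy**2 take the error bars into account.
weighted = polyfit_weighted(x, y, dy)

print("unweighted c0, c1, c2:", [round(c, 3) for c in unweighted])
print("weighted   c0, c1, c2:", [round(c, 3) for c in weighted])
```

The coefficients are listed from the constant term upward, so the last entry is the coefficient of $x^2$.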