Reasonable Averages that give Wrong Answers

A. K. SHAHANI

A Quality Control Problem
Queues: A Project
Vanishing Volume
Losing on a Winning Ticket
A Mathematical Insight

Averages are meant to convey the essential features of a set of data, or of a random variable, in a simple and concise way. Like any other summary, an average can be misleading, misused and abused; there is a fair amount of literature on this aspect of averages, the book by D. Huff (1973) being a particularly readable account. One intuitive use of averages contains a source of error which can be quite serious and which is often not recognised. This source of error is illustrated below by a quality control problem, a project, an experiment and a game. A Taylor series expansion gives an insight into the nature of the error.
 


A Quality Control Problem

A company receives a supply of a certain type of component in batches of 5000 components. In the past the policy has been to inspect each batch completely before use in production. The results obtained from the past 1000 batches are summarised as follows:

TABLE 1
 
 
Percent of defective components in batch 1 2 3 4 5 6
Percent of batches 15 20 25 20 10 10

Each defective component used in production causes a loss of £1.00. Three options are under consideration.

(i) Continue as before, that is, inspect each batch fully. This will cost £160.00 per batch.

(ii) Use the components without any inspection.

(iii) First inspect 100 components from the batch at a cost of £10.00 and use the results either to accept the batch for production without any further inspection or to reject it. A rejected batch will have to be fully inspected, and the inspection of the remaining 4900 components will cost £160.00.

Which option should be used? One argument is as follows.

From the past results the mean percent of defective items per batch is 1 x 0.15 + 2 x 0.2 + ... + 6 x 0.1 = 3.2%, so we expect 5000 x 0.032 = 160 defectives per batch, on average. Thus options (i) and (ii) cost the same on average, £160.00 per batch. With option (iii), if a batch is accepted, an average of 160 - 3.2 = 156.8 defectives will be let through (the 3.2 expected in the sample being found and removed), so in this case the cost per batch will be 10 + 156.8 = £166.80. If a batch is rejected under option (iii) the cost is 10 + 160 = £170. Thus option (iii) is more expensive and the company should choose between (i) and (ii).

Many people find this argument quite convincing. The truth of the matter is rather different. In option (iii) we need to find the acceptance number c which gives the decision rule "if there are c or fewer defective items in the sample of 100, accept the batch; otherwise reject it". A reasonable choice of c is one which minimises the average cost per batch, assuming that future variation in batch quality will be as in the past. This requires a numerical evaluation of the average cost for each value of c. For our purpose it is sufficient to show that option (iii) can achieve an average cost lower than £160.00, and for this c = 3 will do. The necessary calculations, using a Poisson approximation for the probability of 3 or fewer defective items in a sample of 100 from a given batch, are set out below.
 
 
 
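Columns: (1) batch quality, % defective; (2) probability of a batch of this quality; (3) probability of 3 or fewer defectives in a sample of 100; (4) = (2) x (3); (5) loss (£) due to defectives passed = 49 x (1) x (4), since an accepted batch lets through, on average, 4900 x (1)/100 = 49 x (1) defectives.

 (1)    (2)     (3)      (4)      (5)
  1    0.15    0.981   0.1472     7.21
  2    0.20    0.857   0.1714    16.80
  3    0.25    0.647   0.1618    23.79
  4    0.20    0.433   0.0866    16.97
  5    0.10    0.265   0.0265     6.49
  6    0.10    0.151   0.0151     4.44
Total                  0.6086    75.7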
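To check the table, columns (3) to (5) can be recomputed with a short script. This is a sketch assuming, as the text does, that the number of defectives in a sample of 100 from a batch with p% defective is approximately Poisson with mean p; the names `BATCH_DIST`, `poisson_cdf` and `table_rows` are illustrative, not from the original.

```python
import math

# Table 1: percent defective in a batch -> probability of a batch of that quality
BATCH_DIST = {1: 0.15, 2: 0.20, 3: 0.25, 4: 0.20, 5: 0.10, 6: 0.10}

def poisson_cdf(c, lam):
    """P(X <= c) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(c + 1))

def table_rows(c=3, sample=100, remainder=4900):
    """Yield columns (1)-(5) of the table for acceptance number c."""
    for pct, p_batch in BATCH_DIST.items():
        p_accept = poisson_cdf(c, sample * pct / 100)   # column (3)
        joint = p_batch * p_accept                       # column (4) = (2) x (3)
        loss = remainder * pct / 100 * joint             # column (5) = 49 x (1) x (4)
        yield pct, p_batch, p_accept, joint, loss

rows = list(table_rows())
for pct, p_batch, p_accept, joint, loss in rows:
    print(f"{pct}  {p_batch:.2f}  {p_accept:.3f}  {joint:.4f}  {loss:6.2f}")
print("Total", round(sum(r[3] for r in rows), 4), round(sum(r[4] for r in rows), 1))
```

Because the script works with unrounded probabilities, its column (4) total may differ from the printed one in the last digit.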

 

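Note that the probability of rejecting a batch is 1 - 0.6086 = 0.3914. The average total cost per batch, made up of the £10.00 sampling cost, the expected loss from defectives passed and the expected cost of fully inspecting rejected batches, is

10 + 75.7 + 0.3914 x 160 = 148.3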

which is less than 160. Hence option (iii) should be chosen; repeating the above calculations over a range of values of c will determine the minimum cost for this option.
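The search over c can be sketched as follows. This is a minimal illustration under the same assumptions as the text (Table 1 describes future batches, Poisson approximation for the sample, £10.00 to sample, £160.00 to inspect the rest of a rejected batch, £1.00 per defective passed); the function names are hypothetical.

```python
import math

# Table 1: percent defective in a batch -> probability of a batch of that quality
BATCH_DIST = {1: 0.15, 2: 0.20, 3: 0.25, 4: 0.20, 5: 0.10, 6: 0.10}
SAMPLE_COST = 10.00   # £, inspecting the sample of 100
REST_COST = 160.00    # £, inspecting the other 4900 components of a rejected batch
SAMPLE, REMAINDER = 100, 4900

def poisson_cdf(c, lam):
    """P(X <= c) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(c + 1))

def average_cost(c):
    """Average cost per batch (£) of option (iii) with acceptance number c."""
    cost = SAMPLE_COST
    for pct, p_batch in BATCH_DIST.items():
        p_accept = poisson_cdf(c, SAMPLE * pct / 100)
        passed = REMAINDER * pct / 100   # defectives let through, at £1.00 each
        cost += p_batch * (p_accept * passed + (1 - p_accept) * REST_COST)
    return cost

print(round(average_cost(3), 1))  # 148.3, as in the text
best_c = min(range(16), key=average_cost)
print(best_c, round(average_cost(best_c), 1))
```

As c grows the rule accepts almost every batch, and the average cost tends to £166.80, the figure the mean-based argument gave for an accepted batch.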


Queues: A Project

Queuing to obtain a service is a common occurrence and there is a vast literature on queuing theory. People queuing in banks, post offices and supermarkets, cars waiting at traffic lights, and aircraft circling to land or waiting on the runway to take off are just a few of the many queuing situations. Typically, in spite of the vast mathematical literature, practical queuing problems are analysed through simulation.

Consider a simple queue in which customers arrive singly for service and wait if necessary. The server is always available and customers are served in order of arrival. Carry out a simulation in which the interval between successive customers takes one of two values, say t1 and t2, with mean 100 seconds; the simple case of probability of t1 = probability of t2 = 0.5 allows a coin to be used for generating arrivals if random digits are not available. Similarly, let the service time, independently of the arrival intervals, take one of two values s1 and s2 with mean 90 seconds. Observe the behaviour of the queuing system as the ranges of the arrival intervals (t2 - t1) and of the service times (s2 - s1) increase while the means remain constant at 100 and 90 seconds. Repeat using a slower (faster) server, say with a mean service time of 95 (80) seconds. Many variations on this theme are possible. Relevant measures of this queuing situation include the number of customers in the system, the waiting time of a customer and the percentage of time that the server is busy.

Consider the case of mean arrival interval = 100 seconds and mean service time = 90 seconds. Since the service time is shorter, on average, than the arrival interval, it seems reasonable to suppose that customers will not suffer delays in obtaining service, and that the server will be free for 10% of the time. This project, which is probably best carried out by a group of pupils or, if possible, on a computer (after a small-scale hand simulation), will demonstrate that this intuitive feeling can be far off the mark. Table 2 shows the mean waiting time of a customer, obtained from a computer simulation of 500 customers for each case. Note that the mean arrival interval is 100 seconds in all cases and that there are two values, 90 and 95 seconds, of the mean service time.

TABLE 2

Mean waiting time of a customer. All times are in seconds.
 
 t1    t2    s1    s2    Mean waiting time
 80   120    70   110         30.2
 60   140    50   130        112.8
 40   160    30   150        221.3
 80   120    75   115         56.4
 60   140    55   135        184.9
 40   160    35   155        338.3
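The simulation described above can be sketched in a few lines. This is a minimal sketch, not the program used for Table 2: the function name `simulate_queue`, the fixed seed and the busy-fraction measure (total service time divided by the time the last customer leaves) are illustrative choices. It uses the Lindley recursion: each customer waits for the time, if any, by which the previous customer's departure falls after his or her arrival. Because the runs are random, the figures will not match Table 2 exactly.

```python
import random

def simulate_queue(t1, t2, s1, s2, n=500, seed=None):
    """Single-server queue, first come first served. Each arrival interval is
    t1 or t2 and each service time is s1 or s2, all equally likely and
    independent. Returns (mean waiting time, fraction of time server is busy)."""
    rng = random.Random(seed)
    arrival = 0.0         # arrival time of the current customer
    wait = 0.0            # the first customer finds the server free
    total_wait = 0.0
    total_service = 0.0
    for _ in range(n):
        service = rng.choice((s1, s2))
        total_wait += wait
        total_service += service
        departure = arrival + wait + service
        arrival += rng.choice((t1, t2))         # the next customer arrives
        wait = max(0.0, departure - arrival)    # waits if the server is still busy
    return total_wait / n, total_service / departure

# The six cases of Table 2; a different seed gives different figures.
cases = [(80, 120, 70, 110), (60, 140, 50, 130), (40, 160, 30, 150),
         (80, 120, 75, 115), (60, 140, 55, 135), (40, 160, 35, 155)]
for t1, t2, s1, s2 in cases:
    w, busy = simulate_queue(t1, t2, s1, s2, seed=1)
    print(t1, t2, s1, s2, round(w, 1), round(busy, 2))
```

Setting t1 = t2 = 100 and s1 = s2 = 90 removes all variability; no customer then waits, which makes a useful baseline for pupils to compare against.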


Vanishing Volume: An Experiment

Obtain a bag of small marbles, say of diameter about 1.5 cm, and measure the diameters of 20 marbles. Compute the mean diameter, d, of the 20 marbles and hence predict that the total volume of the 20 marbles is 20πd³/6. By immersing the 20 marbles in water in a graduated beaker, check that the prediction is quite good. Repeat with 20 large marbles, say of diameter about 2.5 cm.

Now compute the mean diameter, D, of all 40 marbles and check the prediction that their total volume is 40πD³/6. The results are likely to be rather unexpected.

In one such experiment the results obtained were:

Average diameter of 20 small marbles = 1.646 cm
Prediction of the volume of the 20 marbles = 46.7 cm³
Actual measured volume = 46 cm³

Average diameter of 20 large marbles = 2.533 cm
Prediction of the volume of the 20 marbles = 170.3 cm³
Actual measured volume = 170 cm³

Average diameter of 40 marbles = 2.089 cm
Prediction of the volume of the 40 marbles = 191 cm³
Actual measured volume = 170 + 46 = 216 cm³

Thus the prediction for the 20 small marbles seems all right, and so does the prediction for the 20 large marbles. However, for the combined 40 marbles the prediction is 191 cm³, which is well short of the actual 216 cm³. Where has the volume gone?


Losing On A Winning Ticket: A Game

Essentially, two players, say Peter and Paul, are involved. Several variations are possible; in one case I was Peter, 258 sixth formers jointly constituted Paul, and the game took about 15 minutes.

Peter: The game is that you toss a coin until you get a head. If you get the head on the kth toss, you pay me £2^k, and I will pay you £5.00 per game. I hope you realise that I am in a generous mood, for the average number of tosses until getting a head is 2. Since 2^2 = 4, you should rake off £1.00 per game, on average, from me.

Paul: Let us have a go.

A total of 258 games were played, so Paul expected to gain £258. The actual result was that Paul lost £1882. Paul was quite surprised at this result, and some parts of him were motivated to find out what had gone wrong.
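For classes without 258 volunteers, the game can be simulated. This is a sketch with hypothetical function names; it will not reproduce the classroom result above, and each seed gives a different total. Running several seeds shows how much Paul's total can swing from one session of 258 games to another, compared with the £258 gain predicted from the mean number of tosses.

```python
import random

def tosses_until_head(rng):
    """Toss a fair coin until a head appears; return the number of tosses k."""
    k = 1
    while rng.random() < 0.5:   # treat a value below 0.5 as a tail
        k += 1
    return k

def paul_total(games, seed=None):
    """Paul's net gain in £ over a number of games: he receives £5.00 per game
    and pays Peter £2**k, where k is the toss on which the head appears."""
    rng = random.Random(seed)
    return sum(5 - 2 ** tosses_until_head(rng) for _ in range(games))

# Five simulated sessions of 258 games each.
for seed in range(5):
    print(seed, paul_total(258, seed))
```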


A Mathematical Insight

Wrong averaging explains the surprising results in the situations considered in the previous sections. Mathematically, given a variable y = f(x), we can explain the error as the result of using the equation

$$\operatorname{mean}(y) = f(\operatorname{mean}(x)) \tag{1}$$

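Equation (1) is true if f is linear; however, for non-linear f the use of (1) can lead to a huge error. For example, Peter relied on the intuitive appeal of (1) when he argued that Paul was on to a good thing: with x the number of tosses and f(x) = 2^x, he used mean(x) = 2 to get f(2) = 4. The correct mean gain per game for Peter is far from the loss of £1.00 that his argument implied. Since the probability of the first head on the kth toss is (1/2)^k, the mean gain is

$$\sum_{k=1}^{\infty} \left(2^k - 5\right)\left(\frac{1}{2}\right)^k = \sum_{k=1}^{\infty} \left(1 - \frac{5}{2^k}\right),$$

which is infinite. (This game is essentially the St. Petersburg Paradox.)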
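Peter's mean gain per game is the sum over k of (2^k - 5)(1/2)^k. The partial sums of this series can be computed exactly; the sketch below uses Python's `fractions.Fraction` to avoid rounding, and the function name is illustrative. Each extra term adds almost exactly £1, so the partial sums grow without bound.

```python
from fractions import Fraction

def peter_mean_gain_partial(max_tosses):
    """Partial sum, up to k = max_tosses, of the series (2**k - 5) * (1/2)**k
    for Peter's mean gain per game in £."""
    return sum((2 ** k - 5) * Fraction(1, 2 ** k) for k in range(1, max_tosses + 1))

for m in (5, 10, 20, 40):
    print(m, float(peter_mean_gain_partial(m)))
```

Algebraically the partial sum up to m is m - 5(1 - 2^-m), which the exact arithmetic reproduces.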
 
 

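A Taylor series expansion of f(x) gives an insight into the possible danger of equation (1), and it suggests the sort of discovery that pupils should be guided towards. Suppose the mean of x is µ. Now expand f(x) as

$$f(x) = f(\mu) + (x - \mu)\, f'(\mu) + \frac{(x - \mu)^2}{2}\, f''(\mu) + \cdots$$

Taking mean values, and noting that mean(x - µ) = 0, gives

$$\operatorname{mean}(y) = f(\mu) + \frac{1}{2}\, f''(\mu)\, \operatorname{variance}(x) + \cdots$$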
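As a numerical check of the expansion, take f(d) = πd³/6, the volume of a marble of diameter d, so that f''(d) = πd. The sketch below makes a simplifying assumption not made in the experiment: every marble has exactly its group's mean diameter (1.646 cm or 2.533 cm). In this symmetric two-point case the expansion up to the variance term is exact, because f is a cubic and the third central moment is zero.

```python
import math

small, large = 1.646, 2.533          # mean diameters (cm) of the two groups of 20
diameters = [small] * 20 + [large] * 20

def volume(d):
    """f(d): volume of a sphere of diameter d."""
    return math.pi * d ** 3 / 6

n = len(diameters)
mu = sum(diameters) / n
var = sum((d - mu) ** 2 for d in diameters) / n

naive = n * volume(mu)                                 # uses equation (1)
taylor = n * (volume(mu) + math.pi * mu * var / 2)     # f(mu) + f''(mu) var / 2
direct = sum(volume(d) for d in diameters)             # volume of each marble, summed

print(round(naive, 1), round(taylor, 1), round(direct, 1))  # 191.1 216.9 216.9
```

The naive figure is slightly above the 191 cm³ in the text because the mean diameter here is not rounded to 2.089 cm.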

Thus a large variance of x can result in a large error if equation (1) is used. This explains the results for the volume of marbles: the variability within each of the two sets of marbles was quite small, so (1) worked well for each set separately, but the combined set of 40 marbles has a much larger variance of diameter, and since f''(d) = πd > 0 for the volume f(d) = πd³/6, equation (1) underestimates the total volume. School pupils can discover the truth about the use of equation (1) (without invoking Taylor's expansion) if they are encouraged to note the error as the variability is increased. The queuing and marble examples show this, and there are of course many other possibilities. An example of a linear f which may interest some pupils is the conversion between temperatures in °C and °F; an exercise could involve predicting the mean temperature in °F from published temperatures in °C.
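The temperature exercise can be tried quickly. The readings below are invented for illustration, not published data; because the conversion F = 9C/5 + 32 is linear, equation (1) holds for any readings, apart from floating-point rounding.

```python
# Hypothetical daily temperatures in °C (illustrative values, not published data)
celsius = [12.5, 15.0, 9.0, 18.5, 21.0, 14.0, 11.5]

def to_fahrenheit(c):
    """A linear f: F = 9C/5 + 32."""
    return 9 * c / 5 + 32

def mean(xs):
    return sum(xs) / len(xs)

mean_of_f = mean([to_fahrenheit(c) for c in celsius])   # mean(y)
f_of_mean = to_fahrenheit(mean(celsius))                 # f(mean(x))
print(mean_of_f, f_of_mean)   # equal up to rounding, since f is linear
```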
 
 

Acknowledgements

It is a pleasure to acknowledge the computational assistance of David Gill which was made possible by a grant from the School Mathematics Project.

University of Southampton


Back to contents of The Best of Teaching Statistics
Back to main Teaching Statistics page