The First shall be Last
BRYAN WILSON

Combining samples from different distributions is a hazardous process.  Traditionally form orders in schools have been produced by combining the marks from different subjects. Bryan Wilson describes a teaching approach to sensitize student teachers to the dangers of doing this.

Form orders, wherein the students in a class are ranked in some overall order of merit, are less common in schools than they used to be.  Nevertheless, plenty of schools in Britain still use them.  In many other countries the practice is widespread.

Objections to form orders are of two kinds, educational and statistical.  The educational arguments against them are partly psychological, partly sociological, and most young teachers and student teachers would be aware of them.  The statistical objections, however, while no less serious, are much less well known.  Some consideration should be given to them in the educational measurement component of initial teacher education courses, not only to convince the students themselves of the unreliability of form orders, but also to provide them with convincing evidence, on statistical grounds, in defence of their views in the staff-room.

The unreliability arises, of course, from the need to produce a single rank order from a set of individual subject-orders, usually available as sets of raw marks.

These marks may themselves either be examination results, or the aggregate of marks given for separate pieces of work over a period of time.  In the latter case the problem is compounded - but we will restrict ourselves to the one-stage problem!

The student teacher will probably be familiar with the meaning of mean, median and mode, and aware that average usually implies the mean.  The unexpected properties of the average of numbers derived by combining two or more sets of numbers may be shown by a series of carefully chosen examples.

Example: Average Speeds
I drive to Leeds, 60 km away, at an average speed of 40 kph, and I drive back again at 30 kph.  What is my average speed for the double journey?

This kind of problem is, of course, very well known.  Nevertheless, a significant number of student teachers are surprised to find that the answer is not 35 kph.  The teaching point to emphasise is that the average for the combined situation lies between the two separate averages.
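
One way to check the arithmetic is to divide the total distance by the total time, rather than averaging the two speeds. The short Python sketch below does this; the function name `average_speed` and the list-of-legs representation are illustrative choices, not part of the original exercise.

```python
def average_speed(legs):
    """Overall average speed for a journey given as (distance_km, speed_kph) legs."""
    total_distance = sum(distance for distance, _ in legs)
    total_time = sum(distance / speed for distance, speed in legs)  # hours
    return total_distance / total_time


# 60 km to Leeds at 40 kph, then 60 km back at 30 kph.
legs = [(60, 40), (60, 30)]
overall = average_speed(legs)
print(round(overall, 2))  # 34.29
```

More time is spent on the slower leg, so the result falls below the simple mean of 35 kph while still lying between 30 and 40.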

Example: Cricket Averages
The Gravelpatch village cricket team has two bowlers, Sam Slinger and Tom Thrower, who are great rivals.  A cup is to be presented to the one who has the better bowling average in the match which is the highlight of the season, the 2-innings game against Battleham.  Both bowlers did very well in what turned out to be a famous victory, their figures being:
 
 
 
           First Innings             Second Innings
           Runs  Wickets  Average    Runs  Wickets  Average
Slinger       6        2        3      60        6       10
Thrower      28        7        4      33        3       11

Who won the cup?

Lest it be thought that justice was done to Thrower because he took the most wickets, his first-innings figures could be replaced by 5 wickets for 25 runs, to make the surprising conclusion even more obvious.
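
The match averages can be checked with a few lines of Python. This is only a sketch: the dictionary `figures` transcribes the (runs, wickets) pairs from the table, and the helper name `match_average` is an assumption made for illustration.

```python
def match_average(innings):
    """Runs per wicket over a list of (runs, wickets) pairs."""
    total_runs = sum(runs for runs, _ in innings)
    total_wickets = sum(wickets for _, wickets in innings)
    return total_runs / total_wickets


figures = {
    "Slinger": [(6, 2), (60, 6)],
    "Thrower": [(28, 7), (33, 3)],
}
match = {name: match_average(innings) for name, innings in figures.items()}
print(match["Slinger"])  # 8.25
print(match["Thrower"])  # 6.1

# The modified case: Thrower takes 5 wickets for 25 runs in the first innings,
# so both bowlers finish with 8 wickets.
thrower_modified = match_average([(25, 5), (33, 3)])
print(thrower_modified)  # 7.25
```

In both versions Thrower has the worse average in each innings but the better (lower) average for the match, because each bowler's match average is weighted by the wickets taken in each innings.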

This apparent paradox can be illuminated by considering the situation on a number-line.

Let S1 be Slinger's average in the first innings, and similarly for S2, T1 and T2.  Then

    S1 < T1     and     S2 < T2                    (1)

Their match averages, S and T, must satisfy

    S1 < S < S2     and     T1 < T < T2

(from the first example), but the diagram makes it easy to see that it is possible for S > T despite (1).  With the figures above, S = 66/8 = 8.25 and T = 61/10 = 6.1.

Alternatively, the paradox can be illustrated graphically.

The gradient of each line-segment represents the average runs per wicket.  Each line-segment of Thrower's graph is steeper (hence a 'worse' average) than the corresponding segment of Slinger's graph, but gradient OT < gradient OS.

By now the students will be becoming aware of the unexpected, even 'unfair' results of combining different sets of data, and they should be ready to consider the more complex problem of combining several sets of marks to try to produce a single overall order of merit.  The following 3-stage investigation illustrates the pitfalls very dramatically.

Example: Form-Orders
(a) A class of 12 students obtained the following marks in their 8 subjects.

Raw Marks

           Art  Biology  Chemistry  Drama  English  French  Geography  History
Anne       100       30         47     72       40      75         30       47
Barbara     90       38         43     60       20      65         48       70
Chris       61       36         40     45       41      55         62       80
David       63       32         51     90       30      70         47       35
Edward      56       55         41     82       45      40         49       41
Francis     80       45         49     64       65      45         38       20
George      23       47         45     55       60      80         32       60
Henry       40       35         52     70       56      20         60       65
Iris        85       40         60     40       28      51         55       30
Jenny       72       54         50     10       25      35         66       75
Kathy       48       57         55     34       70      60         36       10
Lesley      10       60         59     20       35      30         70       58
 
Their tutor decided to produce an overall order of merit simply by adding the raw marks for each student.  Work out this order of merit.
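
For readers who prefer to check part (a) by machine, the following Python sketch adds the raw marks for each student. The dictionary `MARKS` simply transcribes the table of raw marks; the names `SUBJECTS`, `MARKS` and `order_of_merit` are assumptions made for illustration.

```python
SUBJECTS = ["Art", "Biology", "Chemistry", "Drama",
            "English", "French", "Geography", "History"]

# Raw marks, one list per student, in the subject order given above.
MARKS = {
    "Anne":    [100, 30, 47, 72, 40, 75, 30, 47],
    "Barbara": [90, 38, 43, 60, 20, 65, 48, 70],
    "Chris":   [61, 36, 40, 45, 41, 55, 62, 80],
    "David":   [63, 32, 51, 90, 30, 70, 47, 35],
    "Edward":  [56, 55, 41, 82, 45, 40, 49, 41],
    "Francis": [80, 45, 49, 64, 65, 45, 38, 20],
    "George":  [23, 47, 45, 55, 60, 80, 32, 60],
    "Henry":   [40, 35, 52, 70, 56, 20, 60, 65],
    "Iris":    [85, 40, 60, 40, 28, 51, 55, 30],
    "Jenny":   [72, 54, 50, 10, 25, 35, 66, 75],
    "Kathy":   [48, 57, 55, 34, 70, 60, 36, 10],
    "Lesley":  [10, 60, 59, 20, 35, 30, 70, 58],
}


def order_of_merit(totals):
    """Names sorted from highest total to lowest."""
    return sorted(totals, key=totals.get, reverse=True)


raw_totals = {name: sum(marks) for name, marks in MARKS.items()}
raw_order = order_of_merit(raw_totals)

for place, name in enumerate(raw_order, start=1):
    print(place, name, raw_totals[name])
```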

(b) Jenny, who had been quite good at maths before she dropped it, objected, and asked that the marks in each subject should be scaled before being aggregated.  After consultation with his mathematics colleague, the tutor agreed to this request, and scaled each set of marks linearly onto the range 0 to 100, so that the lowest mark in each subject became 0 and the highest 100. (Students can do this either graphically or by calculator.) For example, the set of scaled marks for Biology, in the order of the table and rounded to the nearest whole number, is:

0, 27, 20, 7, 83, 50, 57, 17, 33, 80, 90, 100

Work out the aggregates of scaled marks.  Has scaling made any difference to the overall order of merit? (Wow!)
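
Part (b) can be sketched in Python as follows. Two assumptions are made here: that "scaled linearly from 0 to 100" means the lowest mark in each subject maps to 0 and the highest to 100 (which agrees with the Biology figures), and that the scaled marks are added without rounding. The names `MARKS`, `SUBJECTS` and `scale_column` are illustrative only.

```python
SUBJECTS = ["Art", "Biology", "Chemistry", "Drama",
            "English", "French", "Geography", "History"]

# Raw marks transcribed from the table, in the subject order above.
MARKS = {
    "Anne":    [100, 30, 47, 72, 40, 75, 30, 47],
    "Barbara": [90, 38, 43, 60, 20, 65, 48, 70],
    "Chris":   [61, 36, 40, 45, 41, 55, 62, 80],
    "David":   [63, 32, 51, 90, 30, 70, 47, 35],
    "Edward":  [56, 55, 41, 82, 45, 40, 49, 41],
    "Francis": [80, 45, 49, 64, 65, 45, 38, 20],
    "George":  [23, 47, 45, 55, 60, 80, 32, 60],
    "Henry":   [40, 35, 52, 70, 56, 20, 60, 65],
    "Iris":    [85, 40, 60, 40, 28, 51, 55, 30],
    "Jenny":   [72, 54, 50, 10, 25, 35, 66, 75],
    "Kathy":   [48, 57, 55, 34, 70, 60, 36, 10],
    "Lesley":  [10, 60, 59, 20, 35, 30, 70, 58],
}


def scale_column(column):
    """Map one subject's marks linearly so the lowest is 0 and the highest 100."""
    lowest, highest = min(column), max(column)
    return [100 * (mark - lowest) / (highest - lowest) for mark in column]


names = list(MARKS)
columns = list(zip(*MARKS.values()))  # one tuple of 12 marks per subject
scaled_columns = [scale_column(column) for column in columns]

# Aggregate of (unrounded) scaled marks for each student.
scaled_totals = {
    name: sum(column[i] for column in scaled_columns)
    for i, name in enumerate(names)
}
scaled_order = sorted(scaled_totals, key=scaled_totals.get, reverse=True)

# Scaled Biology marks, rounded, for comparison with the list in the text.
biology_scaled = [round(x) for x in scaled_columns[SUBJECTS.index("Biology")]]
print(biology_scaled)

for place, name in enumerate(scaled_order, start=1):
    print(place, name, round(scaled_totals[name], 1))
```

Scaling gives every subject the same range, so a subject with a narrow spread of raw marks, such as Chemistry, now carries as much weight in the aggregate as one with a wide spread, such as Art.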

(c) This provoked a near-riot and the Principal was called in to adjudicate.  With the wisdom of Solomon, he decreed that the order of merit would not be derived directly from the marks, either raw or scaled, but from the rank orders in each subject.  The tutor accordingly worked out the 'places', subject by subject, and aggregated them for each student, so that the student with the smallest aggregate became top in the overall order of merit.
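
The Principal's method in part (c) can be sketched in the same way. The names `MARKS` and `places` are illustrative, and the treatment of tied marks (sharing the better place) is an assumption, since the text does not say how ties are handled; no subject in this data set actually contains a tie.

```python
# Raw marks transcribed from the table: Art, Biology, Chemistry, Drama,
# English, French, Geography, History.
MARKS = {
    "Anne":    [100, 30, 47, 72, 40, 75, 30, 47],
    "Barbara": [90, 38, 43, 60, 20, 65, 48, 70],
    "Chris":   [61, 36, 40, 45, 41, 55, 62, 80],
    "David":   [63, 32, 51, 90, 30, 70, 47, 35],
    "Edward":  [56, 55, 41, 82, 45, 40, 49, 41],
    "Francis": [80, 45, 49, 64, 65, 45, 38, 20],
    "George":  [23, 47, 45, 55, 60, 80, 32, 60],
    "Henry":   [40, 35, 52, 70, 56, 20, 60, 65],
    "Iris":    [85, 40, 60, 40, 28, 51, 55, 30],
    "Jenny":   [72, 54, 50, 10, 25, 35, 66, 75],
    "Kathy":   [48, 57, 55, 34, 70, 60, 36, 10],
    "Lesley":  [10, 60, 59, 20, 35, 30, 70, 58],
}


def places(column):
    """Place of each mark within its subject, 1 = highest mark.

    Tied marks would share the better place (an assumption).
    """
    ordered = sorted(column, reverse=True)
    return [ordered.index(mark) + 1 for mark in column]


names = list(MARKS)
columns = list(zip(*MARKS.values()))  # one tuple of 12 marks per subject
place_columns = [places(column) for column in columns]

# Aggregate of places for each student; the smallest aggregate comes top.
place_totals = {
    name: sum(column[i] for column in place_columns)
    for i, name in enumerate(names)
}

for name in sorted(place_totals, key=place_totals.get):
    print(name, place_totals[name])
```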
If a student teacher, having worked carefully through that assignment, still retains faith in form-orders, then perhaps he is pursuing the wrong vocation.

Of course, raw scores in a 'real' classroom situation will usually be more closely correlated between subjects than in this example.  The principle of unreliability, however, is still applicable even in circumstances when it is less dramatic.
 
