Combining samples from different distributions is a hazardous process. Traditionally form orders in schools have been produced by combining the marks from different subjects. Bryan Wilson describes a teaching approach to sensitize student teachers to the dangers of doing this.
Form orders, wherein the students in a class are ranked in some overall order of merit, are less common in schools than they used to be. Nevertheless, plenty of schools in Britain still use them. In many other countries the practice is widespread.
Objections to form orders are of two kinds, educational and statistical. The educational arguments against them are partly psychological, partly sociological, and most young teachers and student teachers would be aware of them. The statistical objections, however, while no less serious, are much less well known. Some consideration should be given to them in the educational measurement component of initial teacher education courses, not only to convince the students themselves of the unreliability of form orders, but also to provide them with convincing evidence, on statistical grounds, in defence of their views in the staff-room.
The unreliability arises, of course, from the need to produce a single rank order from a set of individual subject-orders, usually available as sets of raw marks.
These marks may themselves either be examination results, or the aggregate of marks given for separate pieces of work over a period of time. In the latter case the problem is compounded - but we will restrict ourselves to the one-stage problem!
The student teacher will probably be familiar with the meaning of mean, median and mode, and aware that average usually implies the mean. The unexpected properties of the average of numbers derived by combining two or more sets of numbers may be shown by a series of carefully chosen examples.
Example: Average Speeds
I drive to Leeds, 60 km away, at an average speed of 40 kph, and I
drive back again at 30 kph. What is my average speed for the double
journey?
This kind of problem is, of course, very well known. Nevertheless, a significant number of student teachers are surprised to find that the answer is not 35 kph. The teaching point to emphasise is that the average for the combined situation lies between the two separate averages.
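The arithmetic can be checked with a few lines of Python. This is only a sketch: the function name `average_speed` and the list `legs` are introduced here for illustration and are not part of the original exercise.

```python
def average_speed(legs):
    """Overall average speed for a list of (distance_km, speed_kph) legs."""
    total_distance = sum(distance for distance, _ in legs)
    # Time for each leg is distance / speed; the slower leg takes longer.
    total_time = sum(distance / speed for distance, speed in legs)
    return total_distance / total_time


# Outward and return journeys from the example: 60 km each way.
legs = [(60, 40), (60, 30)]

overall = average_speed(legs)   # total distance / total time
naive = (40 + 30) / 2           # simple mean of the two speeds

print(f"Mean of the two speeds: {naive:.2f} kph")
print(f"True average speed:     {overall:.2f} kph")
```

More time is spent on the slower leg, so the overall figure is pulled towards 30 kph while still lying between the two separate averages.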
Example: Cricket Averages
The Gravelpatch village cricket team has two bowlers, Sam Slinger and
Tom Thrower, who are great rivals. A cup is to be presented to the
one who has the better bowling average in the match which is the highlight
of the season, the 2-innings game against Battleham. Both bowlers
did very well in what turned out to be a famous victory, their figures
being:
| | First Innings | | | Second Innings | | |
| | Runs | Wickets | Average | Runs | Wickets | Average |
| Slinger | 6 | 2 | 3 | 60 | 6 | 10 |
| Thrower | 28 | 7 | 4 | 33 | 3 | 11 |
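Slinger has the better (lower) average in each innings, yet over the whole match Thrower's average is 61 runs for 10 wickets, or 6.1, against Slinger's 66 runs for 8 wickets, or 8.25. Lest it be thought that justice was done to Thrower because he took more wickets, his first-innings figures could be replaced by 5 wickets for 25 runs (a match average of 58/8 = 7.25, with the same 8 wickets as Slinger), to make the surprising conclusion even more obvious.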
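Students who prefer to check the figures by machine might use something like the following sketch. The dictionary `figures` and the helper `match_average` are names chosen here for illustration. Each innings is stored as a (runs, wickets) pair taken from the table.

```python
from fractions import Fraction

# (runs conceded, wickets taken) for the first and second innings.
figures = {
    "Slinger": [(6, 2), (60, 6)],
    "Thrower": [(28, 7), (33, 3)],
}


def match_average(innings):
    """Bowling average over all innings: total runs / total wickets."""
    total_runs = sum(runs for runs, _ in innings)
    total_wickets = sum(wickets for _, wickets in innings)
    return Fraction(total_runs, total_wickets)


for name, innings in figures.items():
    per_innings = [float(Fraction(runs, wickets)) for runs, wickets in innings]
    print(name, "innings averages:", per_innings,
          "match average:", float(match_average(innings)))

# The variant in which Thrower takes 5 for 25 in the first innings.
thrower_variant = [(25, 5), (33, 3)]
print("Thrower (variant) match average:", float(match_average(thrower_variant)))
```

Exact fractions are used for the totals so that the comparison between the two bowlers does not depend on rounding.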
This apparent paradox can be illuminated by considering the situation on a number-line.
Let S1 be Slinger's average in the first innings, and similarly for S2, T1 and T2. Then
S1 < T1 and S2 < T2 (1)
Their match averages, S and T, must satisfy
S1 < S < S2
and T1 < T < T2
(from the first example), but marking the six values on the number-line makes it easy to see that
it is possible for S > T despite (1). Here they fall in the order S1 < T1 < T < S < S2 < T2 (that is, 3 < 4 < 6.1 < 8.25 < 10 < 11).
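One way to see why the two match averages can cross over is to write each as a weighted mean of the innings averages. In the sketch below, r_i and w_i (runs conceded and wickets taken in innings i) are notation introduced here for illustration. They do not appear in the original.

```latex
\[
  S \;=\; \frac{r_1 + r_2}{w_1 + w_2}
    \;=\; \frac{w_1 S_1 + w_2 S_2}{w_1 + w_2},
  \qquad S_i = \frac{r_i}{w_i}.
\]
% Slinger took 2 and 6 wickets; Thrower took 7 and 3.
\[
  S = \tfrac{2}{8}\,(3) + \tfrac{6}{8}\,(10) = 8.25,
  \qquad
  T = \tfrac{7}{10}\,(4) + \tfrac{3}{10}\,(11) = 6.1 .
\]
```

The weights are the shares of each bowler's wickets taken in each innings. Slinger's weight falls mainly on his expensive second innings and Thrower's mainly on his cheap first innings, so the two match averages are pulled in opposite directions.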
Alternatively, the paradox can be illustrated graphically, plotting each bowler's cumulative runs against cumulative wickets.
The gradients of the line-segments represent the average runs/wicket.
Each line segment of Thrower's graph is steeper
(hence a 'worse' average) than the corresponding segment of Slinger's
graph, but gradient OT < gradient OS, where O is the origin and S and T are the end-points of the two graphs.
By now the students will be becoming aware of the unexpected, even 'unfair' results of combining different sets of data, and they should be ready to consider the more complex problem of combining several sets of marks to try to produce a single overall order of merit. The following 3-stage investigation illustrates the pitfalls very dramatically.
Example: Form-Orders
(a) A class of 12 students obtained the following marks in their 8
subjects.
Raw Marks
| | Art | Biology | Chemistry | Drama | English | French | Geography | History |
| Anne | 100 | 30 | 47 | 72 | 40 | 75 | 30 | 47 |
| Barbara | 90 | 38 | 43 | 60 | 20 | 65 | 48 | 70 |
| Chris | 61 | 36 | 40 | 45 | 41 | 55 | 62 | 80 |
| David | 63 | 32 | 51 | 90 | 30 | 70 | 47 | 35 |
| Edward | 56 | 55 | 41 | 82 | 45 | 40 | 49 | 41 |
| Francis | 80 | 45 | 49 | 64 | 65 | 45 | 38 | 20 |
| George | 23 | 47 | 45 | 55 | 60 | 80 | 32 | 60 |
| Henry | 40 | 35 | 52 | 70 | 56 | 20 | 60 | 65 |
| Iris | 85 | 40 | 60 | 40 | 28 | 51 | 55 | 30 |
| Jenny | 72 | 54 | 50 | 10 | 25 | 35 | 66 | 75 |
| Kathy | 48 | 57 | 55 | 34 | 70 | 60 | 36 | 10 |
| Lesley | 10 | 60 | 59 | 20 | 35 | 30 | 70 | 58 |
(b) Jenny, who had been quite good at maths before she dropped it, objected, and asked that the marks in each subject should be scaled before being aggregated. After consultation with his mathematics colleague, the tutor agreed to this request, and scaled each set of marks linearly from 0 to 100, so that the lowest mark in each subject became 0 and the highest 100. (Students can do this either graphically or by calculator.) For example, the set of scaled marks for Biology, rounded to the nearest whole number, is:
0, 27, 20, 7, 83, 50, 57, 17, 33, 80, 90, 100
Work out the aggregates of scaled marks. Has scaling made any difference to the overall order of merit? (Wow!)
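For those who would rather let a computer do the aggregation, parts (a) and (b) can be sketched as follows. The names `RAW`, `scale_column` and `rank_order` are illustrative choices. The scaling assumed is the min-max rule implied by the Biology example, (mark - lowest) / (highest - lowest) x 100, and the scaled marks are left unrounded before being added.

```python
# Raw marks from the table, in the order Art, Biology, Chemistry, Drama,
# English, French, Geography, History.
RAW = {
    "Anne":    [100, 30, 47, 72, 40, 75, 30, 47],
    "Barbara": [90, 38, 43, 60, 20, 65, 48, 70],
    "Chris":   [61, 36, 40, 45, 41, 55, 62, 80],
    "David":   [63, 32, 51, 90, 30, 70, 47, 35],
    "Edward":  [56, 55, 41, 82, 45, 40, 49, 41],
    "Francis": [80, 45, 49, 64, 65, 45, 38, 20],
    "George":  [23, 47, 45, 55, 60, 80, 32, 60],
    "Henry":   [40, 35, 52, 70, 56, 20, 60, 65],
    "Iris":    [85, 40, 60, 40, 28, 51, 55, 30],
    "Jenny":   [72, 54, 50, 10, 25, 35, 66, 75],
    "Kathy":   [48, 57, 55, 34, 70, 60, 36, 10],
    "Lesley":  [10, 60, 59, 20, 35, 30, 70, 58],
}


def scale_column(marks):
    """Linearly rescale one subject so its lowest mark is 0 and highest 100."""
    lowest, highest = min(marks), max(marks)
    return [100 * (mark - lowest) / (highest - lowest) for mark in marks]


def rank_order(totals):
    """Names sorted from the highest aggregate to the lowest."""
    return sorted(totals, key=totals.get, reverse=True)


names = list(RAW)
columns = list(zip(*RAW.values()))                 # one column per subject
scaled_columns = [scale_column(column) for column in columns]
SCALED = dict(zip(names, zip(*scaled_columns)))    # back to one row per student

raw_totals = {name: sum(RAW[name]) for name in names}
scaled_totals = {name: sum(SCALED[name]) for name in names}

print("Biology scaled:", [round(mark) for mark in scaled_columns[1]])
print("Raw order:     ", rank_order(raw_totals))
print("Scaled order:  ", rank_order(scaled_totals))
```

Comparing the two printed orders answers the question posed in (b).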
(c) This provoked a near-riot and the Principal was called in to adjudicate.
With the wisdom of Solomon, he decreed that the order of merit would not
be derived directly from the marks, either raw or scaled, but from the
rank orders in each subject. The tutor accordingly worked out the
'places', subject by subject, and aggregated them for each student, so
that the student with the smallest aggregate became top in the overall
order of merit.
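The Principal's method in (c) can be sketched in the same way. The helper `places` is a name introduced here. It gives place 1 to the highest mark in a subject and assumes no tied marks within a subject, which holds for the table above.

```python
# Raw marks from the table, in the order Art, Biology, Chemistry, Drama,
# English, French, Geography, History.
RAW = {
    "Anne":    [100, 30, 47, 72, 40, 75, 30, 47],
    "Barbara": [90, 38, 43, 60, 20, 65, 48, 70],
    "Chris":   [61, 36, 40, 45, 41, 55, 62, 80],
    "David":   [63, 32, 51, 90, 30, 70, 47, 35],
    "Edward":  [56, 55, 41, 82, 45, 40, 49, 41],
    "Francis": [80, 45, 49, 64, 65, 45, 38, 20],
    "George":  [23, 47, 45, 55, 60, 80, 32, 60],
    "Henry":   [40, 35, 52, 70, 56, 20, 60, 65],
    "Iris":    [85, 40, 60, 40, 28, 51, 55, 30],
    "Jenny":   [72, 54, 50, 10, 25, 35, 66, 75],
    "Kathy":   [48, 57, 55, 34, 70, 60, 36, 10],
    "Lesley":  [10, 60, 59, 20, 35, 30, 70, 58],
}


def places(marks):
    """Place of each mark within its subject (1 = highest mark).

    Assumes no ties; tied marks would share the better place.
    """
    ordered = sorted(marks, reverse=True)
    return [ordered.index(mark) + 1 for mark in marks]


names = list(RAW)
columns = list(zip(*RAW.values()))                  # one column per subject
place_columns = [places(column) for column in columns]

# Aggregate of places for each student; the smallest aggregate comes top.
place_totals = {
    name: sum(column[i] for column in place_columns)
    for i, name in enumerate(names)
}

for name in sorted(place_totals, key=place_totals.get):
    print(f"{name:8s} {place_totals[name]}")
```

Setting this output beside the raw and scaled orders completes the three-stage comparison.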
If a student teacher, having worked carefully through that assignment,
still retains faith in form-orders, then perhaps he is pursuing the wrong
vocation.
Of course, raw scores in a 'real' classroom situation will usually be
more closely correlated between subjects than in this example. The
principle of unreliability, however, is still applicable even in circumstances
when it is less dramatic.