Not all practical work turns out as expected. The good teacher should be prepared to help his pupils learn from mistakes.
This note concerns an exercise that was given to a group of 30 A-level biology teachers to illustrate the concept of Standard Error. The mistakes implicit in the statement of the exercise, explicit once the exercise had been attempted, serve to highlight the effect of sampling without replacement in small populations and lead subsequently to consideration of the relative importance of sample and population size.
Exercise
(a) Find the mean and standard deviation of the numbers 1, 2, 3, 4, 5 and 6.
(b) Divide yourselves into groups of six; in each group each person should take a different sample of five numbers from the six in (a) and find the mean $\bar{x}$ of those five numbers.
(c) As a group tabulate the six different values of $\bar{x}$ found by the members of your group.
(d) Find the mean and standard deviation of the values of $\bar{x}$ tabulated in (c) and verify that
(i) mean in part (d) = mean in part (a) and
(ii) standard deviation in part (d) = [standard deviation in part (a)]/$\sqrt{5}$.
This exercise was designed to illustrate, with as little arithmetic as possible, a series of important steps necessary to the understanding of standard errors, namely that the mean $\bar{x}$ varies according to which sample is chosen, and therefore gives rise to a new distribution, the sampling distribution or distribution of $\bar{x}$ as tabulated in (c), different from the original distribution. This distribution of $\bar{x}$ itself has a mean $\mu_{\bar{x}}$ and a standard deviation $\sigma_{\bar{x}}$, and it is this standard deviation which is called the Standard Error. The relationships between the mean and standard deviation of the distribution of $\bar{x}$ and the mean µ and standard deviation s of the original set of numbers 1 to 6, namely
$\mu_{\bar{x}} = \mu$
and
$\sigma_{\bar{x}} = s/\sqrt{n}$,
were to be illustrated by part (d).
When the group had finished its work we looked at the answers.
(a) Mean 3.5, Standard Deviation 1.87.
(b) and (c)
Sample           $\bar{x}$
1, 2, 3, 4, 5    3.0
1, 2, 3, 4, 6    3.2
1, 2, 3, 5, 6    3.4
1, 2, 4, 5, 6    3.6
1, 3, 4, 5, 6    3.8
2, 3, 4, 5, 6    4.0
(d) Mean of the values of $\bar{x}$ = 3.5 = Mean in part (a).
Standard Deviation of the values of $\bar{x}$ = 0.37, which does not equal [Standard Deviation in part (a)]/$\sqrt{5}$ = 1.87/$\sqrt{5}$ = 0.84.
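The arithmetic of parts (a) to (d) can be checked with a short script. This is only a sketch: it assumes, as the figure 1.87 implies, that standard deviations are computed with the divisor n - 1 (Python's `statistics.stdev`), and the variable names are illustrative.

```python
import math
from itertools import combinations
from statistics import mean, stdev

population = [1, 2, 3, 4, 5, 6]

# Part (a): mean and standard deviation of the original numbers
# (stdev uses the divisor n - 1, which reproduces the 1.87 in the text).
pop_mean = mean(population)
pop_sd = stdev(population)

# Parts (b) and (c): every sample of five different numbers from the six,
# i.e. all six possible samples drawn without replacement.
samples = list(combinations(population, 5))
sample_means = [mean(sample) for sample in samples]

# Part (d): mean and standard deviation of the six sample means.
mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)

print("mean, SD of 1..6:        ", pop_mean, round(pop_sd, 2))
print("mean, SD of sample means:", round(mean_of_means, 2), round(sd_of_means, 2))
print("SD in (a) / sqrt(5):     ", round(pop_sd / math.sqrt(5), 2))
```

The last two printed lines show the discrepancy directly: the standard deviation of the sample means is about 0.37, while the value expected in part (d)(ii) is about 0.84.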
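Why then was the standard deviation of the $\bar{x}$ values, $\sigma_{\bar{x}}$, not the same as $s/\sqrt{n}$?
The relationship $\sigma_{\bar{x}} = s/\sqrt{n}$ is in any case obviously incorrect here, for if we extended the sample from 5 to 6 numbers, thus giving only one possible sample and hence no variation from sample to sample, then $\sigma_{\bar{x}} = 0$, not $s/\sqrt{6}$.
The relationship $\sigma_{\bar{x}} = s/\sqrt{n}$ is so basic to the concept of Standard Error; where can the error be?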
The only thing unusual about our exercise was that we had sampled from a small population, the numbers 1 to 6, so that we could look at all possible samples!
The Error Explained
The mistake is either blindingly obvious or totally invisible depending on which way you happen to be looking at the problem. In my case I had to re-examine the proof of $\sigma_{\bar{x}} = s/\sqrt{n}$ to find my error. This fundamental relationship applies when $\sigma_{\bar{x}}$ refers to the standard deviation of the means of all possible independent random samples of size n. Thus in choosing the n elements of the sample the choices must be independent. With our population consisting of the numbers 1 to 6 this implies sampling with replacement, so that there are not simply 6 different samples of size 5: numbers may be repeated within a sample.
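The effect of independent choices can be seen by enumerating every ordered sample of five numbers drawn with replacement. The sketch below makes one assumption that should be stated loudly: it uses the divisor-N form of the standard deviation throughout (Python's `statistics.pstdev`), since that is the convention under which the relationship holds exactly for a fully enumerated population. This differs from the divisor n - 1 figure of 1.87 quoted earlier.

```python
import math
from itertools import product
from statistics import pstdev

population = [1, 2, 3, 4, 5, 6]
n = 5

# Independent choices: each of the n draws may be any of the six numbers,
# so numbers can repeat and there are 6**5 equally likely ordered samples.
samples = list(product(population, repeat=n))
sample_means = [sum(sample) / n for sample in samples]

# Assumption of this sketch: divisor-N standard deviations (pstdev).
sigma = pstdev(population)
sd_of_means = pstdev(sample_means)

print("number of samples:  ", len(samples))
print("SD of sample means: ", round(sd_of_means, 4))
print("sigma / sqrt(n):    ", round(sigma / math.sqrt(n), 4))
```

Under these assumptions the two final figures agree, which is the behaviour the exercise was expected to show but could not, because its six samples were drawn without replacement.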
This may seem strange to biologists, for suppose I am taking a random sample of, say, kilometre squares in which to count the number of dead elms: I am hardly likely to allow a sample which includes the same square twice. So if in practice we sample without replacement and still use the relationship $s/\sqrt{n}$ to calculate the Standard Error, a relationship which applies to sampling with replacement, how great is the inaccuracy?
The Importance of Sample Size
Let N be the population size and n the sample size. Then if $\bar{x}$ is the mean of a sample taken without replacement it can be shown that the variance of $\bar{x}$ is
$\frac{s^2}{n} \cdot \frac{N - n}{N - 1}$
as opposed to $s^2/n$ for sampling with replacement. Since
$\frac{N - n}{N - 1} = \frac{1 - n/N}{1 - 1/N}$,
if n/N is small $s^2/n$ is a good approximation to the variance of $\bar{x}$.
In our exercise, where N = 6 and n = 5, the formula gives the variance of $\bar{x}$ to be $s^2/25$ and so the standard deviation of $\bar{x}$ should be $s/5$ rather than $s/\sqrt{5}$ as we previously suggested. The calculations confirm this: 1.87/5 = 0.37. The "sampling fraction" n/N was in our case 5/6 and thus the approximation was poor.
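The corrected formula is easy to evaluate directly. In the sketch below the function name `sd_of_mean_without_replacement` is hypothetical, introduced only for illustration, and s is taken to be the divisor n - 1 standard deviation of the numbers 1 to 6 from part (a).

```python
import math


def sd_of_mean_without_replacement(s, n, N):
    """Standard deviation of the sample mean when sampling without
    replacement, using variance = (s**2 / n) * (N - n) / (N - 1)."""
    variance = (s ** 2 / n) * (N - n) / (N - 1)
    return math.sqrt(variance)


# SD of the numbers 1..6 from part (a): variance 17.5 / 5 = 3.5, SD about 1.87.
s = math.sqrt(3.5)

with_replacement = s / math.sqrt(5)
without_replacement = sd_of_mean_without_replacement(s, n=5, N=6)
whole_population = sd_of_mean_without_replacement(s, n=6, N=6)

print("s / sqrt(5):              ", round(with_replacement, 2))
print("corrected, n = 5, N = 6:  ", round(without_replacement, 2))
print("corrected, n = 6, N = 6:  ", whole_population)
```

With n = 5 the corrected value equals s/5, matching the 0.37 found in part (d), and with n = 6 it is zero, as it must be when only one sample is possible.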
The importance of the standard error, or standard deviation of the distribution of $\bar{x}$, is that it is a measure of the accuracy of using the sample mean as an estimate of the population mean. From the formula
$\sigma_{\bar{x}}^2 = \frac{s^2}{n} \cdot \frac{N - n}{N - 1}$
it is clear that the sample size n is the principal influence as long as N is reasonably large. Thus suppose there are two populations with 1000 and 100000 individuals respectively and a sample of 10 is to be drawn from each. The factor 1/n = 1/10 is the same for each, whilst for the population of 1000, (N - n)/(N - 1) = 990/999 = 0.99, to 2 significant figures, and for N = 100000, (N - n)/(N - 1) = 99990/99999 = 1.00. So although the population is 100 times greater in the second case the variance of the sample mean is increased by only 1 per cent.
Thus it is the size of the sample which is important rather than the fraction of the population sampled. The intuitive notion that to sample a ten acre field, for buttercups say, we ought to take ten times as large a sample as we would for a 1 acre field is therefore false.
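The comparison of the two populations can be reproduced in a few lines. This is a sketch only; the helper name `finite_population_factor` is hypothetical and simply evaluates (N - n)/(N - 1).

```python
def finite_population_factor(n, N):
    """The factor (N - n) / (N - 1) that multiplies s**2 / n when
    sampling n items without replacement from a population of N."""
    return (N - n) / (N - 1)


n = 10
small = finite_population_factor(n, 1000)
large = finite_population_factor(n, 100000)

for N, factor in ((1000, small), (100000, large)):
    # The sampling fraction n / N changes 100-fold between the two
    # populations, but the factor itself barely moves.
    print(f"N = {N:6d}  n/N = {n / N:.4f}  factor = {factor:.4f}")

print("ratio of the two variances:", round(large / small, 3))
```

The ratio printed on the last line is the proportional change in the variance of the sample mean between the two populations for the same sample size of 10.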