ROBERT W. JERNIGAN
Part of this problem is that most students have never seen a normal population, much less actually worked with one. To attack this problem I resolved to give my students direct exposure to at least an approximation of a normal population. Of course any physical model must be a discrete approximation to the normal distribution. This is somewhat unfortunate, since one difficulty that students have is in understanding continuous probability distributions. But much insight and experience can still be gained by using a discrete approximation.
There are available commercially several approximate normal population models: from tags in a box to counting stripes on sunflower seeds, but I rejected these as too vague or too tedious. In the end, I constructed my own model of a normal population from 500 wooden beads, which are readily available in most hobby or teacher supply stores. The beads were grouped and stacked by colour on heavy wire poles mounted on a 3 foot long piece of 2" x 4" timber. The beads were then numbered to conform approximately to a normal distribution with a mean of 50 and a standard deviation of 10. This selection allowed a range of numbers from 20 to 80; all positive and of magnitudes that are easily comprehended. Figure 1 shows the model population. The frequencies of each number are listed in Table 1 from Li (1964).
Even if used as nothing more than a display, this model would help reinforce
the idea of how a bell-shaped curve relates to a discrete population. In
class, we derived descriptive statistics for the frequency distribution,
many by simply counting beads, e.g. percentiles.
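As a rough sketch of how bead counts for such a model might be derived, the following Python snippet assigns to each integer from 20 to 80 the expected number of beads out of 500 under a normal distribution with mean 50 and standard deviation 10. The unit-wide binning, the rounding rule, and the function name `bead_frequencies` are assumptions made for illustration. The result is not the Li (1964) table: rounding can leave the extreme values with no beads, and the total need not be exactly 500.

```python
from statistics import NormalDist

def bead_frequencies(n_beads=500, mean=50, sd=10, low=20, high=80):
    """Expected bead count for each integer value from low to high (hypothetical helper)."""
    dist = NormalDist(mean, sd)
    freqs = {}
    for value in range(low, high + 1):
        # Probability of the unit-wide interval centred on this value.
        p = dist.cdf(value + 0.5) - dist.cdf(value - 0.5)
        freqs[value] = round(n_beads * p)
    return freqs

freqs = bead_frequencies()
total = sum(freqs.values())
pop_mean = sum(value * count for value, count in freqs.items()) / total

# Percentiles can then be found by cumulative counting, as with the beads.
running = 0
median = None
for value in sorted(freqs):
    running += freqs[value]
    if running >= total / 2:
        median = value
        break

print(total, pop_mean, median)
```

Counting up the cumulative frequencies in this way mirrors what the students did by hand when they found percentiles from the stacked beads.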
TABLE 1
Population frequency distribution
To demonstrate sampling and statistical inference we conducted sampling experiments. For example, the beads were removed, placed in a bowl, and each student was instructed to draw, with replacement, 10 samples each containing 5 beads. For each of the 10 samples the students calculated the mean, and these values were arranged in a frequency distribution shown in Table 2.
In keeping with several well-known theorems of statistics, we noticed
that the frequency distribution of the sample means followed the characteristic
bell-shaped curve, suggesting that the sample means were themselves a normal
population. Next, computing the mean and standard deviation of the sample
means, we obtained the values 50.24 and 4.41, very close to the theoretical
values of 50 and 10/√5 = 4.472, respectively.
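The sampling experiment can be imitated on a computer. The sketch below is only an illustration: it builds an approximate bead population by rounding normal probabilities (an assumption, not the frequencies of Table 1), assumes each bead is drawn with replacement, and uses a fixed seed, so its figures will not match the classroom values of 50.24 and 4.41. The names `build_population` and `draw_sample_means` are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def build_population(n_beads=500, mu=50, sigma=10, low=20, high=80):
    """Approximate discrete normal population (assumed construction)."""
    dist = NormalDist(mu, sigma)
    population = []
    for value in range(low, high + 1):
        p = dist.cdf(value + 0.5) - dist.cdf(value - 0.5)
        population.extend([value] * round(n_beads * p))
    return population

def draw_sample_means(population, n_students=30, samples_each=10,
                      sample_size=5, seed=1):
    """Each student draws several samples; return every sample mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_students * samples_each):
        # Beads are drawn with replacement (an assumption about the procedure).
        sample = rng.choices(population, k=sample_size)
        means.append(mean(sample))
    return means

population = build_population()
sample_means = draw_sample_means(population)
mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
theoretical_sd = 10 / sqrt(5)

print(round(mean_of_means, 2), round(sd_of_means, 2), round(theoretical_sd, 3))
```

The mean and standard deviation of the simulated sample means should fall near 50 and 10/√5, in the same way as the classroom values did.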
The model proved most useful in illustrating hypothesis testing. Here, I had the opportunity to demonstrate several concepts that students of elementary statistics find difficult, namely Type I and Type II errors and confidence intervals.
Each student was asked to use their samples to test the hypothesis that
the true mean of the population was 50, assuming that the standard deviation
was known to be 10. This test was made against the alternative that the
mean was not 50, at the 5% level of significance. Each of 30 students
performed this test on their first 5 sample means.
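The test each student carried out can be sketched as a two-sided z-test with known standard deviation. The function name `z_test_rejects` and its parameter names are hypothetical. The decision rule, rejecting when the standardised sample mean exceeds the normal critical value in absolute size, is the standard one for this setting and is assumed to be the one used in class.

```python
from math import sqrt
from statistics import NormalDist

def z_test_rejects(sample_mean, mu0=50.0, sigma=10.0, n=5, alpha=0.05):
    """Return True if the two-sided z-test rejects the hypothesis mean = mu0."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-sided critical value, about 1.96 when alpha = 0.05.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

for xbar in (50.0, 58.0, 60.0):
    print(xbar, z_test_rejects(xbar))
```

With samples of 5 beads the standard error is 10/√5, so under these assumptions a sample mean must lie more than about 8.8 away from 50 before the hypothesis is rejected.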
TABLE 2
Frequency distribution for the sample means
Even though the mean of the population was truly 50, and the students knew that they should not reject this true hypothesis, 10 out of 150, or 6.7%, of the sample means indicated that the assumption was false. Each such rejection is a Type I error, and 0.067 is the observed rate of committing this error. The discrepancy between the stated significance level and the computed significance level is explained by the fact that we are dealing with a finite number of samples from a discrete approximation to a normal distribution.
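To get a feel for how much of this discrepancy a finite number of samples alone could produce, one can treat the 150 tests as independent trials, each rejecting with probability exactly 0.05. That is an idealisation, since the real population is discrete. The sketch below computes the expected number of rejections and the chance of seeing 10 or more. The helper `binomial_tail` is a hypothetical name.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n_tests = 150             # 30 students, 5 tests each
alpha = 0.05              # stated significance level
observed_rejections = 10  # rejections reported in the class experiment

expected_rejections = n_tests * alpha
tail_probability = binomial_tail(observed_rejections, n_tests, alpha)

print(expected_rejections, tail_probability)
```

Under this idealised model the expected count is 7.5, so 10 rejections is not a surprising result.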
The students were then asked to perform the same tests on the five of their samples that had the largest means, thus adding a selection bias. The rejection percentage was now up to 10% = 15/150. This, of course, indicated the trouble with letting a group of data suggest a statistical test.
Type II errors were also demonstrated by testing the hypothesis that the population mean was 65, which is obviously false. On this test 7.3% = 11/150 of the sample means did not refute this hypothesis.
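The observed Type II error rate can be compared with the value that an exactly normal population would give. The sketch below assumes that the sample mean is normally distributed with mean 50 and standard deviation 10/√5, and that the same two-sided 5% z-test is used. The function name `type_ii_probability` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_probability(true_mean=50.0, hypothesised_mean=65.0,
                        sigma=10.0, n=5, alpha=0.05):
    """Chance that the two-sided z-test fails to reject a false hypothesised mean."""
    standard_error = sigma / sqrt(n)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Acceptance region for the sample mean under the hypothesised mean.
    lower = hypothesised_mean - critical * standard_error
    upper = hypothesised_mean + critical * standard_error
    # Actual sampling distribution of the mean under the true mean.
    sampling_dist = NormalDist(true_mean, standard_error)
    return sampling_dist.cdf(upper) - sampling_dist.cdf(lower)

beta = type_ii_probability()
print(round(beta, 3))
```

Under these assumptions the probability is about 0.08, of the same order as the 11/150 observed in class.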
Several 95% confidence intervals were also formed for the population mean from 5 different samples. Those calculations showed that 96.7% = 145/150 of these intervals contained the true population mean of 50.
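A confidence interval of this kind can be sketched as the sample mean plus or minus the normal critical value times the standard error, assuming, as in the tests, a known standard deviation of 10 and samples of 5 beads. The function name `confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, sigma=10.0, n=5, level=0.95):
    """Two-sided interval for the population mean with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

low, high = confidence_interval(52.0)
covers = low <= 50 <= high
print(round(low, 2), round(high, 2), covers)
```

An interval built this way contains 50 exactly when the two-sided z-test at the matching level does not reject the hypothesis that the mean is 50, so coverage counts and rejection counts are two views of the same calculation.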
The possibilities for sampling or probability experiments, applications, and demonstrations with this model are unlimited. With the aid of computers, similar and even more extensive sampling experiments could be carried out without the model population. There, one could demonstrate other concepts such as the central limit theorem since you would not be limited to one type of population from which to draw your samples. But this population model has the advantage of providing a visual, in-class demonstration. It also provides the students with a "hands on" approach to basic statistics. They have the opportunity to see statistics at work and gain some needed first-hand experience.
The American University, Washington D.C.
Reference