without Calculus
C. W. PURITZ
Since the fitting of a straight line $y = a + bx$ to a set of data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ involves finding two numbers a and b, establishing formulae from the principle of least squares using calculus requires partial differentiation. To avoid this, some textbooks use an algebraic method which has always seemed to me rather complicated and hence off-putting to the average A-level maths student. I have therefore used partial differentiation (without the special notation) and found no problems with this. However, the algebraic approach does have merits, particularly the fact that it gives a guaranteed global minimum rather than a local stationary point as furnished by calculus.
This year I have discovered and used a somewhat simpler algebraic method which I found straightforward enough for my A-level statistics set, which had as usual a very wide ability range. We actually did a simple numerical example first and used calculus for this, so they saw both methods. For the general case I proceeded as follows:
The equation

$$y = a + bx \qquad (1)$$

is intended to be used to estimate y given x. If applied to the data x values it produces estimates $\hat{y}_i = a + bx_i$, $i = 1, \ldots, n$, which are generally different from the given $y_i$. Let $e_i = y_i - (a + bx_i)$, the y-discrepancy for the ith data point, and consider

$$\overline{e^2} = \frac{1}{n}\sum_{i=1}^{n} e_i^2,$$

the mean square error of estimate of y on x. This is the quantity to be minimised by choice of a and b.
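Now

$$\overline{e^2} = \operatorname{var}(e) + \bar{e}^2 \qquad (2)$$

using the familiar formula $\operatorname{var}(e) = \overline{e^2} - \bar{e}^2$ for variance (with divisor n, as for all variances and covariances below), and we shall see that a and b can be chosen so that each of the two terms on the right-hand side attains its absolute minimum value.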
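As a quick check on definitions (1) and (2), here is a minimal Python sketch. The helper names (`mean`, `variance`, `residuals`, `mean_square_error`) and the five data points are hypothetical, made up only for illustration. It uses exact `Fraction` arithmetic so that identity (2) can be tested with `==` rather than up to rounding error, and the variance uses divisor n, as the identity requires.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean, kept exact by summing into a Fraction."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def variance(values):
    """Variance with divisor n (not n - 1)."""
    m = mean(values)
    return mean((v - m) ** 2 for v in values)


def residuals(xs, ys, a, b):
    """The y-discrepancies e_i = y_i - (a + b*x_i) for the line y = a + bx."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def mean_square_error(xs, ys, a, b):
    """The mean of the squared discrepancies, the quantity to be minimised."""
    return mean(e ** 2 for e in residuals(xs, ys, a, b))


# Hypothetical data points, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Identity (2) holds for any trial line, not just the best one.
for a, b in [(0, 1), (Fraction(1, 2), Fraction(2, 3)), (3, -1)]:
    e = residuals(xs, ys, a, b)
    assert mean_square_error(xs, ys, a, b) == variance(e) + mean(e) ** 2

print(mean_square_error(xs, ys, 0, 1))  # 9/5
```

The trial lines are arbitrary; the point is that (2) is an identity, so the minimisation can be split into two separate problems.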
First, $\operatorname{var}(e) = \operatorname{var}(y - a - bx) = \operatorname{var}(y - bx)$, since subtracting the constant a makes no difference to the variance. (a is constant in the sense that, unlike x and y, it does not vary from point to point within the data.)
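$$\operatorname{var}(y - bx) = \operatorname{var}(y) + b^2\operatorname{var}(x) - 2b\operatorname{cov}(x, y) = b^2 s_x^2 - 2b\,s_{xy} + s_y^2 \qquad (3)$$

and, completing the square,

$$\operatorname{var}(y - bx) = s_x^2\left(b - \frac{s_{xy}}{s_x^2}\right)^2 + s_y^2 - \frac{s_{xy}^2}{s_x^2} \qquad (4)$$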
Clearly this attains its absolute minimum when
$$b = \frac{s_{xy}}{s_x^2} \qquad (5)$$
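The algebra in (3) to (5) can be checked numerically. This sketch (helper names and data are hypothetical, for illustration only) evaluates var(y - bx) directly from the data, in the expanded form (3) and in the completed-square form (4), for several values of b, and confirms that none of them falls below the value at b = s_xy/s_x^2.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean, kept exact by summing into a Fraction."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def cov(us, vs):
    """Covariance with divisor n; cov(u, u) is the variance of u."""
    mu, mv = mean(us), mean(vs)
    return mean((u - mu) * (v - mv) for u, v in zip(us, vs))


# Hypothetical data points, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sx2, sy2, sxy = cov(xs, xs), cov(ys, ys), cov(xs, ys)


def var_y_minus_bx(b):
    """var(y - bx) computed directly from the data."""
    d = [y - b * x for x, y in zip(xs, ys)]
    return cov(d, d)


def expanded(b):
    """Right-hand side of (3): b^2 s_x^2 - 2b s_xy + s_y^2."""
    return b * b * sx2 - 2 * b * sxy + sy2


def completed_square(b):
    """Right-hand side of (4): s_x^2 (b - s_xy/s_x^2)^2 + s_y^2 - s_xy^2/s_x^2."""
    return sx2 * (b - sxy / sx2) ** 2 + sy2 - sxy ** 2 / sx2


b_best = sxy / sx2  # equation (5)
for b in [Fraction(-1), Fraction(0), Fraction(1, 3), b_best, Fraction(2)]:
    # All three forms agree, and none is smaller than the value at b_best.
    assert var_y_minus_bx(b) == expanded(b) == completed_square(b)
    assert completed_square(b) >= completed_square(b_best)

print(b_best, completed_square(b_best))  # 3/5 12/25
```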
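The second term of (2), $\bar{e}^2$, is a square, so it has absolute minimum value 0, and this is attained whenever $\bar{e} = \bar{y} - a - b\bar{x} = 0$, that is, whenever

$$a = \bar{y} - b\bar{x} \qquad (6)$$

so that the fitted line passes through the mean point $(\bar{x}, \bar{y})$. Since a does not appear in $\operatorname{var}(e)$, (5) and (6) can be satisfied together.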
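Equations (5) and (6) together give the least squares line. A minimal sketch, assuming the x values are not all equal (so that $s_x^2 > 0$); `fit_line`, `mse` and the data are hypothetical names and values for illustration. As a sanity check it nudges each coefficient and confirms the mean square error only goes up.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean, kept exact by summing into a Fraction."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def cov(us, vs):
    """Covariance with divisor n; cov(u, u) is the variance of u."""
    mu, mv = mean(us), mean(vs)
    return mean((u - mu) * (v - mv) for u, v in zip(us, vs))


def fit_line(xs, ys):
    """Least squares coefficients (a, b) for y = a + bx from (5) and (6).

    Assumes the x values are not all equal, so that s_x^2 > 0;
    otherwise the division raises ZeroDivisionError.
    """
    b = cov(xs, ys) / cov(xs, xs)   # (5): b = s_xy / s_x^2
    a = mean(ys) - b * mean(xs)     # (6): a = ybar - b * xbar
    return a, b


# Hypothetical data points, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(a, b)  # 11/5 3/5


def mse(a, b):
    """Mean square error of the line y = a + bx on the data above."""
    return mean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Nudging either coefficient away from the fitted values increases the error.
step = Fraction(1, 10)
for da, db in [(step, 0), (-step, 0), (0, step), (0, -step)]:
    assert mse(a + da, b + db) > mse(a, b)
```

The nudging loop is only a spot check, not a proof; the guarantee of a global minimum comes from the algebra above.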
As a bonus, (4) tells us that the minimum value of $\overline{e^2}$ is

$$s_y^2 - \frac{s_{xy}^2}{s_x^2} = s_y^2\,(1 - r^2) \qquad (7)$$

where $r = s_{xy}/(s_x s_y)$ is the correlation coefficient. Since $\overline{e^2} \ge 0$, this gives an immediate proof that $r^2 \le 1$, with equality occurring only when $\overline{e^2} = 0$, i.e. when the data points are exactly collinear.
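Result (7) can be checked in the same way. In this sketch (hypothetical helper names and data) the correlation enters only as $r^2 = s_{xy}^2/(s_x^2 s_y^2)$, which keeps the arithmetic exact and sidesteps the square root; a second, exactly collinear data set illustrates the equality case.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean, kept exact by summing into a Fraction."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def cov(us, vs):
    """Covariance with divisor n; cov(u, u) is the variance of u."""
    mu, mv = mean(us), mean(vs)
    return mean((u - mu) * (v - mv) for u, v in zip(us, vs))


def r_squared(xs, ys):
    """r^2 = s_xy^2 / (s_x^2 s_y^2); working with r^2 avoids a square root."""
    return cov(xs, ys) ** 2 / (cov(xs, xs) * cov(ys, ys))


def min_mean_square_error(xs, ys):
    """Left-hand side of (7): s_y^2 - s_xy^2 / s_x^2."""
    return cov(ys, ys) - cov(xs, ys) ** 2 / cov(xs, xs)


# Hypothetical data points, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys)
assert min_mean_square_error(xs, ys) == cov(ys, ys) * (1 - r2)  # (7)
assert r2 <= 1
print(r2)  # 3/5

# Exactly collinear points give zero minimum error and r^2 = 1.
assert min_mean_square_error([1, 2, 3], [3, 5, 7]) == 0
assert r_squared([1, 2, 3], [3, 5, 7]) == 1
```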
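Furthermore, we can show that $r^2 s_y^2$ is the variance of the estimated y values, since

$$\operatorname{var}(\hat{y}) = \operatorname{var}(a + bx) = b^2 s_x^2 = \frac{s_{xy}^2}{s_x^2} = r^2 s_y^2.$$

Thus, because $\bar{e} = 0$ for the fitted line, so that $\overline{e^2} = \operatorname{var}(e)$, (7) gives

$$\operatorname{var}(e) = \operatorname{var}(y) - \operatorname{var}(\hat{y})$$

or

$$\operatorname{var}(y) = \operatorname{var}(\hat{y}) + \operatorname{var}(e)$$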
i.e. the total variance of the data y values equals the explained variance (that accounted for by the relationship between y and x) plus the unexplained variance (caused by the data points not lying on the line); and in this sum the explained variance accounts for a fraction $r^2$ of the total.
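The split into explained and unexplained variance can be seen on a small example. This sketch fits the line by (5) and (6), then compares the three variances; the data and names are hypothetical, chosen for illustration.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean, kept exact by summing into a Fraction."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def var(values):
    """Variance with divisor n."""
    m = mean(values)
    return mean((v - m) ** 2 for v in values)


# Hypothetical data points, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares line from (5) and (6).
mx, my = mean(xs), mean(ys)
sxy = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
b = sxy / var(xs)
a = my - b * mx

y_hat = [a + b * x for x in xs]           # estimated y values
e = [y - yh for y, yh in zip(ys, y_hat)]  # y-discrepancies

explained, unexplained, total = var(y_hat), var(e), var(ys)
assert total == explained + unexplained
assert explained / total == sxy ** 2 / (var(xs) * var(ys))  # the fraction r^2
print(explained, unexplained, total)  # 18/25 12/25 6/5
```

The decomposition depends on using the least squares slope: in general $\operatorname{var}(y) = \operatorname{var}(\hat{y}) + \operatorname{var}(e) + 2\operatorname{cov}(\hat{y}, e)$, and the cross term $2b(s_{xy} - b s_x^2)$ vanishes because b satisfies (5).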
This last part (after (7)) was hard going for the set, but didn’t take very long, and gave perhaps the best understanding possible at this stage of how the correlation coefficient actually measures the extent to which data can be fitted by a straight line.
Royal Grammar School, High Wycombe
The Best of Teaching Statistics