ID: 404325 • Letter: C
Question
Consider the differences between a linear trend forecast, a simple linear regression, and a multiple linear regression. What are some of the restrictions of each of the different forecasting techniques? Are there any circumstances when one technique would be preferred to the others?
Explanation / Answer
Linear regression is the most widely used of all statistical techniques: it is the study of linear (i.e., straight-line) relationships between variables, usually under an assumption of normally distributed errors.
The first thing you ought to know about linear regression is how the strange term regression came to be applied to the subject of linear statistical models. This type of predictive model was first studied in depth by a 19th-century scientist, Sir Francis Galton. Galton was a self-taught naturalist, anthropologist, astronomer, and statistician--and a real-life Indiana Jones character. He was famous for his explorations, and he wrote a best-selling book on how to survive in the wilderness entitled "The Art of Travel; or, Shifts and Contrivances Available in Wild Countries." (The book is still in print and still considered a useful resource--you can find a copy in Perkins Library. Among other handy hints for staying alive--such as how to treat spear-wounds or extract your horse from quicksand--it introduced the concept of the sleeping bag to the Western world.)
Galton was a pioneer in the application of statistical methods to human measurements, and in studying data on relative heights of fathers and their sons, he observed the following phenomenon: a taller-than-average father tends to produce a taller-than-average son, but the son is likely to be less tall than the father in terms of his relative position within his own population. Thus, for example, if the father's height is x standard deviations from the mean within his own population, then you should predict that the son's height will be rx (r times x) standard deviations from the mean within his own population, where r is a number less than 1 in magnitude. (r is what will be defined below as the correlation between the height of the father and the height of the son.) The same is true of virtually any physical measurement that can be performed on parents and their offspring. This seems at first glance like evidence of some genetic or sociocultural mechanism for damping out extreme physical traits, and Galton therefore termed it a "regression toward mediocrity," which in modern terms is a "regression to the mean." But the phenomenon discovered by Galton is a mathematical inevitability: unless every son is exactly as tall as his father in a relative sense (i.e., unless the correlation is exactly equal to 1), the predictions must regress to the mean regardless of the underlying mechanisms of inheritance or culture.
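This inevitability is easy to check numerically. Below is a minimal sketch, not part of the original discussion, that simulates standardized father and son heights with an assumed correlation r = 0.5 (the value and the data are made up for illustration) and shows that the least-squares prediction of the son's standardized height has a slope close to r, i.e., it is pulled toward the mean.

```python
# Minimal sketch (assumed data): with both heights standardized, the
# least-squares prediction of the son's height is about r times the
# father's height, where r is their correlation.
import numpy as np

rng = np.random.default_rng(0)
r = 0.5            # assumed correlation between father and son heights
n = 100_000

father = rng.standard_normal(n)                 # standardized father heights
noise = rng.standard_normal(n)
son = r * father + np.sqrt(1 - r**2) * noise    # standardized son heights, correlation r

# Fit son ~ father by ordinary least squares; the slope should be close to r.
slope, intercept = np.polyfit(father, son, 1)
print(f"sample correlation: {np.corrcoef(father, son)[0, 1]:.3f}")
print(f"fitted slope:       {slope:.3f}   (close to r = {r})")
```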
Regression to the mean is an inescapable fact of life. Your children can be expected to be less exceptional (for better or worse) than you are. Your score on a final exam in a course can be expected to be less good (or bad) than your score on the midterm exam. A baseball player's batting average in the second half of the season can be expected to be closer to the mean (for all players) than his batting average in the first half of the season. And so on. The key word here is "expected." This does not mean it's certain that regression to the mean will occur, but that's the way to bet! (More precisely, that's the way to bet if you wish to minimize squared error.)
We have already seen a suggestion of regression to the mean in some of the time series forecasting models studied so far: plots of forecasts tend to be smoother--i.e., they exhibit less variability--than the plots of the original data. (This is not true of random walk models, but it is generally true of smoothing models and other models in which coefficients other than a constant term are estimated.) The intuitive explanation for the regression effect is simple: the thing we are trying to predict usually consists of a predictable component ("signal") and a statistically independent unpredictable component ("noise"). The best we can hope to do is to predict the value of the signal, and then let the noise fall where it may. Hence our forecasts will tend to exhibit less variability than the actual values, which implies a regression to the mean.
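The signal-plus-noise argument can be illustrated with a small simulation. In the sketch below the "signal" is a made-up linear trend and the "noise" is Gaussian; both are assumptions for illustration only. A forecast that recovers only the signal necessarily has a smaller variance than the actual series.

```python
# Minimal sketch (assumed data): actual = signal + independent noise.
# A forecast that recovers only the signal varies less than the actuals.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)
signal = 10 + 0.05 * t                          # predictable component (a linear trend)
noise = rng.normal(scale=2.0, size=t.size)      # unpredictable component
actual = signal + noise
forecast = signal                               # the best we can hope to predict

print(f"variance of actual values: {actual.var():.2f}")
print(f"variance of forecasts:     {forecast.var():.2f}")  # smaller: the regression effect
```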
Another way to think of the regression effect is in terms of sampling bias. Suppose we select a sample of baseball players whose batting averages were much higher than the mean (or students whose grades were much higher than the mean) in the first half of the year. Presumably their averages were unusually high in part because they were "good" and in part because they were "lucky." The fact that they did so well in the first half of the year makes it probable that both their ability and their luck were better than average during that period. In the second half of the year they may be just as good, but they probably will not be as lucky. So we should predict that in the second half their performance will be closer to the mean.
Now, why do we often assume that relationships between variables are linear?
And why do we often assume the errors of linear models are normally distributed?
A variable is, by definition, a quantity that varies. (If it didn't vary, it would be a constant, not a variable.) In fitting statistical models in which some variables are used to predict others, what we hope to find is that the different variables do not vary independently (in a statistical sense), but that they tend to vary together.
In particular, when fitting linear models, we hope to find that one variable (say, Y) is varying as a straight-line function of another variable (say, X). In other words, if all other possibly-relevant variables could be held fixed, we would hope to find the graph of Y versus X to be a straight line (apart from the inevitable random errors or "noise").
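As a concrete illustration of fitting such a straight-line relationship, here is a minimal sketch using ordinary least squares; the data, the "true" intercept and slope, and the noise level are all made up for the example.

```python
# Minimal sketch (assumed data): fit Y = b0 + b1*X + error by least squares.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 1.5 * x + rng.normal(scale=1.0, size=x.size)   # true line plus noise

b1, b0 = np.polyfit(x, y, 1)          # slope and intercept
print(f"fitted line: Y = {b0:.2f} + {b1:.2f}*X")
```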
A measure of the absolute amount of "variability" in a variable is (naturally) its variance, which is defined as its average squared deviation from its own mean. Equivalently, we can measure variability in terms of the standard deviation, which is defined as the square root of the variance. The standard deviation has the advantage that it is measured in the same units as the original variable, rather than squared units.
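The two definitions can be written out directly in code. The sample values below are made up for illustration; the point is only that the variance is the average squared deviation from the mean and the standard deviation is its square root, in the same units as the variable.

```python
# Minimal sketch (made-up sample): variance and standard deviation.
import numpy as np

y = np.array([4.0, 7.0, 6.0, 9.0, 5.0, 8.0])

variance = np.mean((y - y.mean()) ** 2)   # average squared deviation from the mean
std_dev = np.sqrt(variance)               # square root of the variance, same units as y

print(f"variance:           {variance:.3f}")
print(f"standard deviation: {std_dev:.3f}")
```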
Our task in predicting Y might be described as that of "explaining" some or all of its variance--i.e., why, or under what conditions, does it deviate from its mean? Why is it not constant? That is, we would like to be able to improve on the "naive" predictive model in which the prediction of Y is simply a constant, namely its own mean.
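The sketch below, with made-up data, contrasts that naive mean-only model with a simple linear regression on a predictor X. The error variance of the naive model is just the variance of Y, so the fraction of that variance removed by the regression is a natural measure of how much has been "explained."

```python
# Minimal sketch (assumed data): naive mean model vs. simple linear regression.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.8 * x + rng.normal(scale=1.0, size=x.size)

naive_pred = np.full_like(y, y.mean())        # naive model: always predict the mean of Y
b1, b0 = np.polyfit(x, y, 1)
reg_pred = b0 + b1 * x                        # simple linear regression

naive_mse = np.mean((y - naive_pred) ** 2)    # equals the variance of y
reg_mse = np.mean((y - reg_pred) ** 2)        # residual variance after regression
print(f"naive model error variance:     {naive_mse:.2f}")
print(f"regression residual variance:   {reg_mse:.2f}")
print(f"fraction of variance explained: {1 - reg_mse / naive_mse:.2f}")
```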