


Question

A data set shows the cost of a four-color, one-page ad in a selection of magazines. Use multiple regression to explain these costs. The independent variables provided are: X1 = the size of the audience (projected number of readers, in thousands), X2 = the median household income of the audience, and X3 = the percentage of males among the projected readers.

Build a model using these three variables. Is it significant? Why? Give other measures of the goodness of fit of the model.

Are all of the variables useful in explaining the cost of an ad? Why or why not?

Remove any variable or variables that are not useful and build a new model. Compare the two models, using the typical measures of goodness of fit. Which is better?

Data (Y = page cost; X1 = audience, in thousands; X2 = median household income; X3 = % male):

Magazine                      Y Page Cost  X1 Audience  X2 Income  X3 %Male
AAA Westways                       53,310        8,740     92,600     47.0%
AARP The Magazine                 532,600       35,721     58,990     39.7%
Allure                            131,721        6,570     65,973      9.0%
Architectural Digest              119,370        4,988    100,445     42.0%
Audubon                            25,040        1,924     73,446     39.0%
Better Homes & Gardens            468,200       38,946     67,637     19.6%
Bicycling                          55,385        2,100     74,175     73.2%
Bon Appétit                       143,612        8,003     91,849     25.0%
Brides                             82,041        5,800     56,718     10.0%
Car and Driver                    187,269        2,330    141,873     72.8%
Conde Nast Traveler               118,657        3,301    110,037     45.0%
Cosmopolitan                      222,400       18,331     57,298     15.8%
Details                            69,552        1,254     82,063     69.0%
Discover                           57,300        7,140     61,127     61.0%
Every Day with Rachael Ray        139,000        6,860     70,162     12.0%
Family Circle                     254,600       21,062     52,502     10.0%
Fitness                           142,300        6,196     70,442     23.7%
Food & Wine                        86,000        8,034     84,750     37.0%
Golf Magazine                     141,174        5,608     96,659     83.0%
Good Housekeeping                 344,475       24,484     60,981     12.2%
GQ (Gentlemen's Quarterly)        143,681        6,360     75,103     77.0%
Kiplinger's Personal Finance       54,380        2,407    101,900     62.0%
Ladies' Home Journal              254,000       13,865     55,249      5.9%
Martha Stewart Living             157,700       11,200     74,436     11.0%
Midwest Living                    125,100        3,913     69,904     25.0%
Money                             201,800        7,697     98,057     63.0%
More                              148,400        1,389     93,550      0.0%
O, The Oprah Magazine             150,730       15,575     72,953     12.0%
Parents                           167,800       15,300     59,616     17.6%
Prevention                        134,900       10,403     66,799     16.0%
Reader's Digest                   171,300       31,648     62,076     38.0%
Readymade                          32,500        1,400     52,894     16.0%
Road & Track                      109,373        1,492    143,179     75.0%
Self                              166,773        6,078     85,671      6.6%
Ser Padres                         74,840        3,444     37,742     28.0%
Siempre Mujer                      48,300        1,710     46,041     17.0%
Sports Illustrated                352,800       21,000     72,726     80.0%
Teen Vogue                        115,897        5,829     56,608      9.0%
The New Yorker                    135,263        4,611     91,359     49.9%
Time                              287,440       20,642     73,946     52.0%
TV Guide                          134,700       14,800     49,850     45.0%
Vanity Fair                       165,600        6,890     74,765     23.0%
Vogue                             151,133       12,030     68,667     12.0%
Wired                              99,475        2,789     91,056     75.5%
Woman's Day                       259,960       20,325     58,053      0.0%

Explanation / Answer

We do all calculations in MINITAB.

Now using the three given variables we fit a multiple linear regression model:

General Regression Analysis: Y versus X1, X2, X3

Regression Equation:

Y = -22385.1 + 10.5066*X1 + 1.09198*X2 - 20779*X3

Coefficients

Summary of Model:

R-Sq = 75.82%        R-Sq(adj) = 74.05%
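For readers without Minitab, the kind of fit reported above can be sketched in plain Python. This is only an illustration of least squares via the normal equations, not what Minitab does internally. The helper name `fit_ols` is hypothetical, X3 is entered as a fraction (an assumption about how the percentages were coded), and only the first eight magazines from the table are used, so the printed coefficients will differ from the full 45-magazine output.

```python
def fit_ols(rows, y, intercept=True):
    """Return least-squares coefficients by solving the normal equations."""
    X = [([1.0] if intercept else []) + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# First eight magazines from the table: (X1 audience, X2 income, X3 male share).
# Illustrative subset only; X3 as a fraction is an assumption.
rows = [(8740, 92600, 0.470), (35721, 58990, 0.397), (6570, 65973, 0.090),
        (4988, 100445, 0.420), (1924, 73446, 0.390), (38946, 67637, 0.196),
        (2100, 74175, 0.732), (8003, 91849, 0.250)]
cost = [53310, 532600, 131721, 119370, 25040, 468200, 55385, 143612]

b0, b1, b2, b3 = fit_ols(rows, cost)
print(f"Y = {b0:.1f} + {b1:.4f}*X1 + {b2:.5f}*X2 + {b3:.1f}*X3")
```

Passing `intercept=False` drops the constant term, which is how a model without a constant would be fitted with this sketch.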

The p-value for the constant is 0.538 and the p-value for X3 is 0.587. Both exceed alpha = 0.05, so we fail to reject the null hypothesis that each of these coefficients is zero: the constant and X3 are not significant in this model. X1 (p = 0.000) and X2 (p = 0.023) are significant.
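The decision rule used here can be checked with a few lines of Python. The sketch below takes the coefficients, standard errors and p-values from the Minitab coefficient table in this answer; the T ratio is recomputed only for the constant and X3, because the X1 and X2 rows of that table are rounded too coarsely to reproduce their T values. The variable names are illustrative.

```python
ALPHA = 0.05

# (coefficient, standard error) as reported by Minitab for the two terms
# that are printed with full precision.
estimates = {
    "Constant": (-22385.1, 36059.9),
    "X3": (-20779.0, 37961.3),
}

# Reported p-values for every term in the three-variable model.
p_values = {"Constant": 0.538, "X1": 0.000, "X2": 0.023, "X3": 0.587}

# T ratio = coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

# A term is significant when its p-value is below alpha.
significant = {name: p < ALPHA for name, p in p_values.items()}

for name, t in t_stats.items():
    print(f"{name}: T = {t:.4f}")
for name, flag in significant.items():
    print(f"{name}: {'significant' if flag else 'not significant'}")
```

The recomputed T ratios agree with the reported -0.6208 and -0.5474 to four decimals.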

Goodness of fit: R-sq is 75.82%, which means that 75.82% of the total variation in Y is explained by the regression of Y on X1, X2 and X3. R-Sq(adj), which penalizes the number of predictors, is 74.05%.
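As a quick check on the reported figures, R-Sq(adj) can be recomputed from R-Sq with the usual formula for a model that includes a constant. The sketch assumes n = 45 (the number of magazines in the data table) and p = 3 predictors; the function name is illustrative.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-sq for a model with a constant and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


n = 45  # magazines in the data table
p = 3   # predictors X1, X2, X3
adj = adjusted_r2(0.7582, n, p)
print(f"R-Sq(adj) = {100 * adj:.2f}%")
```

With these inputs the result rounds to 74.05%, in line with the Minitab summary.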

PART 2: After removing the constant and X3, we fit the regression of Y on X1 and X2.

The output is given below:

General Regression Analysis: Y versus X1, X2

Regression Equation:

Y = 10.2057*X1 + 0.754411*X2

Now we see that all the variables are significant and the regression equation holds.

Summary of Model:

S = 52936.1           R-Sq = 92.69%        R-Sq(adj) = 92.35%

Goodness of fit: R-sq is now 92.69%, which means that 92.69% of the total variation is explained by the regression of Y on X1 and X2.

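NOTE: Judged by R-sq, the 2nd model is better, since its R-sq is higher (92.69% versus 75.82%). One caution: the 2nd model has no constant, and R-sq for a fit without a constant is normally measured about zero rather than about the mean of Y, so the two R-sq values are not strictly on the same scale.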
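Because the 2nd model has no constant, its R-sq deserves a second look. The sketch below uses a made-up four-point data set (not the magazine data) to show how far apart R-sq measured about zero and R-sq measured about the mean can be for the same no-constant fit. It then checks the reported R-Sq(adj) under the assumption that Minitab uses n and n - p degrees of freedom when there is no constant; that formula is an assumption here, not something stated in the output.

```python
def r2_pair(x, y):
    """Fit y = b*x with no constant; return (R2 about zero, R2 about the mean)."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    mean_y = sum(y) / len(y)
    sst_zero = sum(yi ** 2 for yi in y)             # variation about zero
    sst_mean = sum((yi - mean_y) ** 2 for yi in y)  # variation about the mean
    return 1 - sse / sst_zero, 1 - sse / sst_mean


# Made-up illustration data, not taken from the magazine table.
x = [1, 2, 3, 4]
y = [10, 11, 12, 13]
r2_zero, r2_mean = r2_pair(x, y)
print(f"R2 about zero: {r2_zero:.3f}   R2 about the mean: {r2_mean:.3f}")

# Assumed no-constant adjustment: 1 - (1 - R2) * n / (n - p), n = 45, p = 2.
adj_no_const = 1 - (1 - 0.9269) * 45 / (45 - 2)
print(f"R-Sq(adj), no-constant formula: {100 * adj_no_const:.2f}%")
```

In the toy data the same fit looks strong about zero and very poor about the mean, which is why a higher R-sq on a no-constant model is weak evidence on its own. The assumed no-constant formula gives about 92.35%, matching the reported R-Sq(adj).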

Coefficients for the first model (Y versus X1, X2, X3):

Term      Coefficient  SE Coefficient        T  P value
Constant     -22385.1         36059.9  -0.6208    0.538
X1               10.5             0.9  11.1508    0.000
X2                1.1             0.5   2.3642    0.023
X3             -20779         37961.3  -0.5474    0.587