

Question

Description of the dataset: Cigarette Consumption Data. A national insurance organization wanted to study the consumption pattern of cigarettes in all 50 states and the District of Columbia. The variables chosen for the study are given in Table 1. The data from 1970 are given in the project.xlsx Excel file. The states are listed in alphabetical order.

Table 1. Variables in the Cigarette Consumption Data

Variable   Definition
Age        Median age of a person living in a state
HS         Percentage of people over 25 years of age in a state who had completed high school
Income     Per capita personal income for a state (income in dollars)
Female     Percentage of females living in a state
Price      Weighted average price (in cents) of a pack of cigarettes in a state
Sales      Number of packs of cigarettes sold in a state on a per capita basis

Project Instruction:

Q1. Construct a linear regression using Backward Elimination Process, where Sales is the dependent variable (Y) and the rest are independent variables. Describe the procedure and all the models in Backward Elimination Process.

dataset:

https://drive.google.com/file/d/1LBVusBswsbcKazhYCf9IeP6yxqtstze6/view?usp=sharing

Can someone explain step by step how this is done? I don't want the answers, but the process in which it is done. Thanks.

SOLUTION:

After running a regression analysis in Excel with Sales as the dependent variable and all five other variables as predictors, we obtain the SUMMARY OUTPUT shown further down.

Residual plots: [not reproduced here]

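The hypotheses for the Female variable are:

H0: Slope coefficient of Female = 0

H1: Slope coefficient of Female is not equal to 0

The p-value for Female in the coefficient table is 0.838, which is greater than 0.05.

Thus the null hypothesis is not rejected at the 5% level of significance.

Female also has the largest p-value of all the predictors, so it is the first variable to be removed in backward elimination. The model is then refit without Female, and the same check is repeated until every remaining variable is significant.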
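As a rough cross-check of that p-value: the t statistic is the coefficient divided by its standard error, and the two-sided p-value is the tail area of a t distribution with n - k - 1 = 51 - 5 - 1 = 45 degrees of freedom. The sketch below uses the Female coefficient and standard error from the regression output. The helper names and the Simpson's-rule integration are illustrative choices made here so the example needs only the standard library; they are not part of the Excel workflow.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and +|t|.

    The area from 0 to |t| is approximated with Simpson's rule
    (steps must be even), then doubled by symmetry.
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

# Values for Female taken from the regression output.
female_coef = 0.984939934
female_se = 4.793254993
residual_df = 51 - 5 - 1  # observations - predictors - intercept

female_t = female_coef / female_se
female_p = two_sided_p_value(female_t, residual_df)
print(f"t = {female_t:.6f}, p = {female_p:.6f}")
```

The printed values should land close to the t Stat and P-value that Excel reports for Female.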


SUMMARY OUTPUT

Regression Statistics
Multiple R          0.559044386
R Square            0.312530625
Adjusted R Square   0.236145139
Standard Error      28.02911931
Observations        51

ANOVA
             df   SS            MS            F             Significance F
Regression   5    16072.02662   3214.405324   4.091492264   0.003799263
Residual     45   35353.41881   785.6315291
Total        50   51425.44543

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   43.48048598    230.4722259      0.188658246    0.851208944   -420.7144053   507.6753773
Age         3.350274053    2.782776575      1.203932102    0.234910647   -2.254525678   8.955073784
HS          -0.410796823   0.657852433      -0.624451325   0.53548458    -1.735779637   0.914185991
Income      0.022988274    0.008559833      2.685598272    0.010103083   0.005747884    0.040228663
Female      0.984939934    4.793254993      0.205484569    0.838120301   -8.669171192   10.63905106
Price       -3.383667392   1.011150079      -3.346355268   0.001660453   -5.420228192   -1.347106592
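The summary figures in this output are tied together by simple arithmetic, which is a handy check when reading any regression printout. The sketch below recomputes R Square, Adjusted R Square, F, and the Standard Error from the sums of squares and degrees of freedom in the ANOVA block. The variable names are chosen here for illustration.

```python
import math

# Values copied from the ANOVA block of the regression output.
ss_regression = 16072.02662
ss_residual = 35353.41881
df_regression = 5    # number of predictors
df_residual = 45     # 51 observations - 5 predictors - 1 intercept
n_obs = 51

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
std_error = math.sqrt(ms_residual)  # "Standard Error" in Regression Statistics

print(f"R Square          = {r_squared:.6f}")
print(f"Adjusted R Square = {adj_r_squared:.6f}")
print(f"F                 = {f_stat:.6f}")
print(f"Standard Error    = {std_error:.6f}")
```

Each backward elimination step produces a new output of this kind, and comparing Adjusted R Square across the steps shows how little explanatory power is lost when an insignificant variable is dropped.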

Explanation / Answer

The regression equation for the full model is
Sales = 43.48049 + 3.350274 Age - 0.410797 HS + 0.0229883 Income + 0.984940 Female - 3.383667 Price

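H0: Slope coefficient of Female is zero
H1: Slope coefficient of Female is not equal to zero
Let the level of significance be alpha = 5%

The p-value of Female = 0.83812, which is > alpha = 0.05, so we fail to reject H0.
Thus the slope coefficient of Female is not significantly different from zero. Female has the largest p-value in the model, so it is dropped first; the regression is then rerun with Age, HS, Income, and Price, and the process repeats until all remaining p-values are below alpha.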
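The backward elimination loop itself can be sketched in a few lines. This is a minimal illustration, not the assignment's required tool: `backward_eliminate` and its `fit_pvalues` argument are hypothetical names, and `fit_pvalues` stands in for whatever refits the regression (for example rerunning Excel's regression tool) and returns the p-value of each predictor in the current model. Only the first step is shown with real numbers, because the p-values of the later models come from refitting on the data.

```python
from typing import Callable, Dict, List, Tuple

def backward_eliminate(
    predictors: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> Tuple[List[str], List[Dict[str, float]]]:
    """Drop the least significant predictor until all p-values are <= alpha.

    fit_pvalues is a hypothetical callback: given the predictors currently
    in the model, it refits the regression and returns their p-values.
    Returns the surviving predictors and the p-values seen at each step.
    """
    remaining = list(predictors)
    history = []
    while remaining:
        pvalues = fit_pvalues(remaining)  # refit with the current predictors
        history.append({name: pvalues[name] for name in remaining})
        worst = max(remaining, key=lambda name: pvalues[name])
        if pvalues[worst] <= alpha:
            break  # every remaining predictor is significant
        remaining.remove(worst)
    return remaining, history

# First step only, using the p-values from the full-model output.
full_model_pvalues = {
    "Age": 0.234910647,
    "HS": 0.53548458,
    "Income": 0.010103083,
    "Female": 0.838120301,
    "Price": 0.001660453,
}
first_to_drop = max(full_model_pvalues, key=full_model_pvalues.get)
print(first_to_drop, full_model_pvalues[first_to_drop])
```

Which variable is dropped at the second step depends on the p-values of the refitted four-variable model, since they generally change once Female is removed.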