Description of the dataset : Cigarette Consumption Data: A national insurance or
Question
Description of the dataset: Cigarette Consumption Data: A national insurance organization wanted to study the consumption pattern of cigarettes in all 50 states and the District of Columbia. The variables chosen for the study are given in Table 1. The data from 1970 are given in the project.xlsx excel file. The states are given in alphabetical order.
Table 1. Variables in the Cigarette Consumption Data
Variable    Definition
Age         Median age of a person living in a state
HS          Percentage of people over 25 years of age in a state who had completed high school
Income      Per capita personal income for a state (income in dollars)
Female      Percentage of females living in a state
Price       Weighted average price (in cents) of a pack of cigarettes in a state
Sales       Number of packs of cigarettes sold in a state on a per capita basis
Project Instruction:
Q1. Construct a linear regression using Backward Elimination Process, where Sales is the dependent variable (Y) and the rest are independent variables. Describe the procedure and all the models in Backward Elimination Process.
dataset:
https://drive.google.com/file/d/1LBVusBswsbcKazhYCf9IeP6yxqtstze6/view?usp=sharing
Can someone explain step by step how this is done? I don't want the answers, but the process in which it is done. Thanks.
SOLUTION:
After running a regression analysis in Excel with Sales as the dependent variable and all five other variables as predictors, we obtain the summary output shown below.
Residual plots are given as -
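The hypotheses for the Female coefficient are:
H0: Slope coefficient of Female = 0
H1: Slope coefficient of Female is not equal to 0
The p-value for Female in the table is 0.838, which is greater than 0.05.
Thus the null hypothesis is not rejected at the 5% level of significance.
Hence the Female variable may not be needed in the model. It also has the largest p-value of the five predictors, so it is the first variable to remove in backward elimination.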
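The decision rule used here can be sketched in a few lines of Python. This is only an illustration of one backward elimination step: the p-values are copied from the Excel summary output, while the function name `removal_candidate` and the dictionary layout are assumptions made for the example.

```python
# p-values of the slope coefficients, copied from the full-model summary output.
# The intercept is left out because it is not a candidate for removal.
p_values = {
    "Age": 0.234910647,
    "Hs": 0.53548458,
    "Income": 0.010103083,
    "Female": 0.838120301,
    "Price": 0.001660453,
}
ALPHA = 0.05  # significance level used in the solution


def removal_candidate(p_values, alpha):
    """Return the predictor with the largest p-value if it exceeds alpha, else None."""
    name = max(p_values, key=p_values.get)
    return name if p_values[name] > alpha else None


print(removal_candidate(p_values, ALPHA))  # Female
```

A return value of None would mean every remaining predictor is significant at the chosen level, which is the stopping condition of the procedure.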
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.559044386
R Square             0.312530625
Adjusted R Square    0.236145139
Standard Error       28.02911931
Observations         51

ANOVA
              df    SS             MS             F              Significance F
Regression    5     16072.02662    3214.405324    4.091492264    0.003799263
Residual      45    35353.41881    785.6315291
Total         50    51425.44543

             Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept    43.48048598     230.4722259       0.188658246     0.851208944    -420.7144053    507.6753773
Age          3.350274053     2.782776575       1.203932102     0.234910647    -2.254525678    8.955073784
Hs           -0.410796823    0.657852433       -0.624451325    0.53548458     -1.735779637    0.914185991
Income       0.022988274     0.008559833       2.685598272     0.010103083    0.005747884     0.040228663
Female       0.984939934     4.793254993       0.205484569     0.838120301    -8.669171192    10.63905106
Price        -3.383667392    1.011150079       -3.346355268    0.001660453    -5.420228192    -1.347106592

Explanation / Answer
The regression equation for the full model is
Sales = 43.48049 + 3.350274 Age - 0.410797 Hs + 0.022988 Income + 0.984940 Female - 3.383667 Price
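To make the equation concrete, here is a minimal sketch that evaluates it for one state. The coefficients come from the summary output; the function name `predict_sales` and the input values are hypothetical and are not rows from project.xlsx.

```python
# Coefficients of the full model, taken from the Excel summary output.
COEF = {
    "Intercept": 43.48048598,
    "Age": 3.350274053,
    "Hs": -0.410796823,
    "Income": 0.022988274,
    "Female": 0.984939934,
    "Price": -3.383667392,
}


def predict_sales(age, hs, income, female, price):
    """Predicted packs sold per capita for one state under the full model."""
    return (COEF["Intercept"] + COEF["Age"] * age + COEF["Hs"] * hs
            + COEF["Income"] * income + COEF["Female"] * female
            + COEF["Price"] * price)


# Hypothetical state: these inputs are illustrative only.
example = predict_sales(age=28.0, hs=55.0, income=3800.0, female=51.0, price=38.0)
print(round(example, 2))
```

Holding the other variables fixed, raising Price by one cent lowers the predicted sales by about 3.38 packs per capita, which is how the Price coefficient is read.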
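H0: The slope coefficient of Female is zero
H1: The slope coefficient of Female is not equal to zero
Let the level of significance be alpha = 5%.
The p-value of Female = 0.83812, which is > alpha = 0.05, so we fail to reject H0.
Thus we conclude that the slope coefficient of Female is not significantly different from zero, i.e. Female is not significant. In backward elimination, Female is therefore dropped, the regression is rerun on the remaining variables (Age, Hs, Income, Price), and the same check is repeated on the variable with the largest p-value until every remaining p-value is below alpha.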
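The whole procedure (fit, test the weakest variable, drop it, refit) can be sketched in plain Python. This is a rough illustration under stated assumptions, not the Excel workflow itself: the helper names `invert`, `fit_t_stats` and `backward_eliminate` are made up for the example, the data are synthetic, and the loop compares |t| with a fixed critical value `t_crit` as a stand-in for comparing p-values with alpha. Within one model the smallest |t| belongs to the largest p-value, but the exact critical value shifts slightly as the residual degrees of freedom change, so a fixed cutoff is an approximation.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_t_stats(columns, y):
    """Fit OLS with an intercept; return {name: t statistic} for each predictor."""
    names = list(columns)
    n, k = len(y), len(names) + 1
    X = [[1.0] + [columns[name][i] for name in names] for i in range(n)]
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    residuals = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    mse = sum(e * e for e in residuals) / (n - k)  # residual mean square
    return {name: beta[j + 1] / (mse * inv[j + 1][j + 1]) ** 0.5
            for j, name in enumerate(names)}


def backward_eliminate(columns, y, t_crit):
    """Drop the predictor with the smallest |t| until every |t| >= t_crit."""
    kept = dict(columns)
    history = []  # t statistics of every model fitted along the way
    while kept:
        t_stats = fit_t_stats(kept, y)
        history.append(t_stats)
        weakest = min(t_stats, key=lambda name: abs(t_stats[name]))
        if abs(t_stats[weakest]) >= t_crit:
            break
        del kept[weakest]
    return list(kept), history


# Synthetic, made-up data (not the cigarette data): y depends on x1 only.
x1 = [float(i) for i in range(12)]
x2 = [float((i * 7) % 5) for i in range(12)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1, -0.2, 0.1]
y = [2.0 + 3.0 * a + e for a, e in zip(x1, noise)]

# 2.26 is roughly the two-sided 5% critical t value for 9 residual df.
kept, history = backward_eliminate({"x1": x1, "x2": x2}, y, t_crit=2.26)
print(kept)
```

For the cigarette data the columns would be Age, Hs, Income, Female and Price, y would be Sales, and a cutoff of about 2.01 corresponds to alpha = 0.05 with around 45 residual degrees of freedom. Each entry of `history` corresponds to one model that the write-up should describe.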