Question
Hi, can someone help with this? There is an Excel file at the link below.
We have all heard that house prices are all about location, location, and location. But what else might be important in determining the price of a house? To find out, we collected data on sales of houses in a suburban region in Pennsylvania (exact location cannot be disclosed), which would, at least partially, neutralize location as the most important factor in determining the price of a house. The data set, contained in “house.xls” in the Assignments section of Blackboard, has data on 59 home sales for one year. Its columns are listed at the end of this question.
Your assignment is to take four of the variables in this data set and estimate a linear regression equation for the house price. The variables should be chosen to give the “best fit” – which simply means choose four factors that are most important in determining the price of a house.
Once you have determined your best regression equation (or best four variables), report the results along with all appropriate statistics to show the appropriateness of your chosen variables. You should present R², adjusted R², and t ratios for all four of your independent variables.
Complete the report by discussing why the four variables you chose are the most important determinants of the price of a house. What intuition/explanation can you give for those variables? Were there any surprises? Conclude your report by discussing one piece of data/variable you would wish to include that was not given in this data set. Explain the intuition behind why you think that variable would have replaced one of the four that you actually included in your regression.
It is estimated that this report can be completed in less than four hours, if that. It is further estimated that you would need no more than five pages to hit everything that is asked for in this report. And that includes the cover page, which should have a snappy, jingoistic title (conjure up your inner marketing specialist) and your name.
Link for excel file.
https://we.tl/8YAqzlyEi5
Column Name   Description
Price         Sales price in thousands of dollars
Age           In years
Aircon        1 if central A/C is installed, 0 otherwise
Baths         # of full baths; 0.5 bath = powder room
Bedrms        # of bedrooms
Cond          Condition of home, higher is better
firepl        # of fire places
Floors        # of floors
Garage        Size of the garage: 1 = parking for 1 car, etc.
Sqft          Size of the home in square feet
Yard          Size of the yard in square feet

Explanation / Answer
Using R software
Variable selection method
We use the forward selection method to choose the best variables.
> D = read.table(file.choose(), header=TRUE)  # pick the data file interactively
> attach(D)  # makes the columns of D visible by name in later calls
> null=lm(price~1, data=D)
> null
Call:
lm(formula = price ~ 1, data = D)
Coefficients:
(Intercept)
228.6
> full=lm(price~., data=D)
> full
Call:
lm(formula = price ~ ., data = D)
Coefficients:
(Intercept) age aircon baths bedrms cond
-20.968991 2.227015 27.580543 -5.454734 -38.007277 14.726856
firepl floors garage sqft yard
22.615113 -32.228182 -3.399932 0.140483 0.003779
> step(null, scope=list(lower=null, upper=full), direction="forward")
Start: AIC=558.23
price ~ 1
Df Sum of Sq RSS AIC
+ sqft 1 261450 471776 534.22
+ yard 1 156383 576843 546.08
+ baths 1 105758 627468 551.04
+ firepl 1 96232 636993 551.93
+ cond 1 35183 698043 557.33
+ bedrms 1 33666 699560 557.46
<none> 733226 558.23
+ garage 1 9325 723900 559.48
+ floors 1 2040 731186 560.07
+ aircon 1 2005 731221 560.07
+ age 1 628 732598 560.18
Step: AIC=534.22
price ~ sqft
Df Sum of Sq RSS AIC
+ yard 1 43218 428558 530.55
+ bedrms 1 38863 432913 531.14
+ age 1 32591 439185 531.99
+ floors 1 21740 450036 533.43
<none> 471776 534.22
+ firepl 1 12164 459611 534.68
+ baths 1 8019 463757 535.21
+ garage 1 4892 466883 535.60
+ cond 1 3861 467915 535.73
+ aircon 1 1016 470759 536.09
Step: AIC=530.55
price ~ sqft + yard
Df Sum of Sq RSS AIC
+ bedrms 1 30317.7 398240 528.22
+ age 1 26580.1 401977 528.77
<none> 428558 530.55
+ firepl 1 13605.6 414952 530.64
+ floors 1 10664.2 417893 531.06
+ garage 1 9827.4 418730 531.18
+ aircon 1 2375.1 426182 532.22
+ baths 1 1763.7 426794 532.30
+ cond 1 1298.0 427260 532.37
Step: AIC=528.22
price ~ sqft + yard + bedrms
Df Sum of Sq RSS AIC
+ age 1 31416.2 366824 525.37
+ floors 1 13350.5 384889 528.21
<none> 398240 528.22
+ firepl 1 7939.4 390300 529.03
+ garage 1 3987.3 394252 529.63
+ cond 1 1952.5 396287 529.93
+ baths 1 723.2 397517 530.11
+ aircon 1 336.5 397903 530.17
Step: AIC=525.37
price ~ sqft + yard + bedrms + age
Df Sum of Sq RSS AIC
<none> 366824 525.37
+ floors 1 8231.6 358592 526.03
+ cond 1 7471.5 359352 526.16
+ firepl 1 6362.2 360461 526.34
+ aircon 1 2526.0 364298 526.96
+ baths 1 1991.4 364832 527.05
+ garage 1 587.8 366236 527.28
Call:
lm(formula = price ~ sqft + yard + bedrms + age, data = D)
Coefficients:
(Intercept) sqft yard bedrms age
24.702924 0.138975 0.004537 -41.227447 2.211733
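The AIC column in the trace above can be checked by hand. For a linear model, step() reports n*log(RSS/n) + 2p (the quantity returned by extractAIC()), where p counts the estimated coefficients including the intercept. A minimal sketch, assuming n = 59 from the question and copying the RSS values from the trace (they are rounded there, so agreement is only to about two decimals):

```r
# Values copied from the step() trace; n = 59 comes from the question text.
n   <- 59
rss <- c(null = 733226, sqft = 471776, final = 366824)
p   <- c(null = 1, sqft = 2, final = 5)   # coefficients, intercept included

# AIC as step() computes it for lm fits
aic_by_hand <- n * log(rss / n) + 2 * p

# AIC values printed in the trace, for comparison
aic_trace <- c(null = 558.23, sqft = 534.22, final = 525.37)

print(round(cbind(aic_by_hand, aic_trace), 2))
```

Each added variable costs 2 AIC points, so a variable is only accepted when it lowers n*log(RSS/n) by more than 2. That is why the search stops after age is added.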
Comment
Using the forward selection method above, we select these four variables: sqft, yard, bedrms and age.
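The same selection can be written without attach() or file.choose(), and with an explicit cap on the number of variables. The sketch below is only an illustration: the house data is not reproduced here, so R's built-in mtcars data set stands in for D and mpg stands in for price (both are assumptions, not part of the assignment). The steps argument of step() limits the search to at most four additions; it does not force exactly four.

```r
# Stand-in data: mtcars ships with R. For the assignment, D would be the
# house data frame and the response would be price instead of mpg.
D <- mtcars

null <- lm(mpg ~ 1, data = D)   # intercept-only model
full <- lm(mpg ~ ., data = D)   # model with every candidate variable

# Forward selection by AIC, stopping after at most four added variables.
# formula(full) expands the "." into the explicit list of predictors.
sel <- step(null,
            scope = list(lower = formula(null), upper = formula(full)),
            direction = "forward", steps = 4, trace = 0)

chosen <- attr(terms(sel), "term.labels")
print(chosen)                        # variables picked, in order of entry
print(summary(sel)$adj.r.squared)    # adjusted R-squared of the chosen model
```

Passing data = D to every lm() call keeps the fit tied to one data frame, which is safer than relying on attach().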
> f=lm(price~age+bedrms+sqft+yard)
> summary(f)
Call:
lm(formula = price ~ age + bedrms + sqft + yard)
Residuals:
Min 1Q Median 3Q Max
-151.96 -38.64 -25.22 21.35 370.80
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.702924 54.703673 0.452 0.6534
age 2.211733 1.028461 2.151 0.0360 *
bedrms -41.227447 18.123069 -2.275 0.0269 *
sqft 0.138975 0.026018 5.341 1.89e-06 ***
yard 0.004537 0.002221 2.043 0.0460 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 82.42 on 54 degrees of freedom
Multiple R-squared: 0.4997, Adjusted R-squared: 0.4627
F-statistic: 13.48 on 4 and 54 DF, p-value: 1.097e-07
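The statistics the assignment asks for (R², adjusted R² and the t ratios) can be recomputed from numbers already printed above, which is a useful check when writing the report. A minimal sketch, assuming n = 59 and taking the total and residual sums of squares from the step() trace (733226 for price ~ 1, 366824 for the final model):

```r
n <- 59; k <- 4
tss <- 733226          # RSS of the intercept-only model (total sum of squares)
rss <- 366824          # RSS of price ~ sqft + yard + bedrms + age
df_res <- n - k - 1    # 54 residual degrees of freedom

r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res
rse    <- sqrt(rss / df_res)                   # residual standard error
f_stat <- ((tss - rss) / k) / (rss / df_res)   # overall F statistic

# t ratio = estimate / standard error, values copied from summary(f)
est <- c(age = 2.211733, bedrms = -41.227447, sqft = 0.138975, yard = 0.004537)
se  <- c(age = 1.028461, bedrms = 18.123069, sqft = 0.026018, yard = 0.002221)
t_ratio <- est / se
p_value <- 2 * pt(-abs(t_ratio), df_res)       # two-sided p-values

print(round(c(R2 = r2, adjR2 = adj_r2, RSE = rse, F = f_stat), 4))
print(round(cbind(t_ratio, p_value), 4))
```

The inputs are rounded in the printed output, so the results agree with summary(f) only up to rounding.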
Conclusion
I think sqft is the best predictor (independent variable) in this model; it has the strongest effect on house price, with a t ratio of 5.341.
The other variables, such as bedrms and yard, are also related to price, but they are much less important: their t ratios are only about 2 in absolute value.
Only sqft is highly significant in explaining house price.
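To see what the fitted equation implies, the coefficients can be applied to a single house. The house below is hypothetical (2000 sq ft, a 10000 sq ft yard, 3 bedrooms, 20 years old) and is not taken from the data; price is in thousands of dollars, as in the column description.

```r
# Coefficients copied from summary(f)
coefs <- c(intercept = 24.702924, sqft = 0.138975, yard = 0.004537,
           bedrms = -41.227447, age = 2.211733)

# Hypothetical house, for illustration only (intercept term is always 1)
house <- c(intercept = 1, sqft = 2000, yard = 10000, bedrms = 3, age = 20)

pred <- sum(coefs * house[names(coefs)])
print(round(pred, 1))   # about 268.6, i.e. roughly $268,600

# Same house with one more bedroom and the same square footage
pred_4bed <- pred + coefs[["bedrms"]]
print(round(pred_4bed, 1))
```

This is a point prediction only. With a residual standard error of 82.42, individual prices can differ from it by a wide margin. The negative bedrms coefficient means that, at a fixed sqft, an extra bedroom goes with a lower predicted price, which may be one of the surprises the report asks about.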
__________________________
I would advise adding the area of the house to the data. It is a very important variable, because sometimes the price depends only on the area.
Thank you.
Best of luck :)
Tushar barkade.