Individual Project Questions Instruction. Please use PHStat to do the following
Question
Individual Project Questions
Instruction. Please use PHStat to do the following data analysis and copy and paste your PHStat output to the specified space for each question. Type necessary conclusions as asked. This project accounts for a total of 100 points and each question contributes 10 points.
Question 1. Please use PHStat to obtain the descriptive statistics (mean, median, mode, variance, standard deviation, etc.) for the house price data (column A in real estate dataset.xls) and give the three normal empirical rules by providing three confidence intervals. Interpret each of the three confidence intervals.
(Paste your Excel output and type your conclusions here)
Question 2. Construct the following contingency table by filling in correct numbers in the table.
                     Number of Bedrooms
No. of Baths    1    2    3    4    5    6    7    8    Total
1.5
2
2.5
3
Total
Question 3. Based on the table in Question 2, please give a bar chart and a pie chart for the number of bedrooms. Also please give a bar chart and a pie chart for the number of baths.
(Paste your Excel output here)
Question 4. Please plot the histogram for the house price data in column A by using the bin numbers and the midpoints provided in the dataset real estate dataset.xls. Describe the possible skewness (right-skewed, left-skewed or bell-shaped?)
(Paste your Excel output and type your conclusions here)
Question 5. Please plot the boxplot for the house price data in column A by using the PHStat program. Provide the five-number summary. Describe the possible skewness (right-skewed, left-skewed or bell-shaped?). Interpret the meaning of the three quartiles.
(Paste your Excel output and type your conclusions here)
Question 6. Based on the table in Question 2, define the following events:
A={randomly select a house, the house has 3 bedrooms and 2 baths}
B={randomly select a house, the house has 4 bedrooms}
C={randomly select a house, the house has 2 baths}
Compute the probabilities P(A) and P(B), and the conditional probability P(B|C).
(Provide your computational details here)
Question 7. Assume that the house price data are normal. Use the sample mean and the sample standard deviation in Question 1 as the population mean µ and the population standard deviation σ (keep 4 decimals only). Define the normal random variable
X=the price of a randomly selected house
Question 7a. Compute the following probabilities P(X<170), P(X>250), and P(150<X<300) by using PHStat.
(Paste your PHStat output here)
Question 7b. What is the price x so that only 5% of house prices are lower than this price? That is, find x such that P(X<x)=5%.
(Paste your PHStat output here)
Question 7c. What is the price x so that the top 10% of house prices are higher than this price? That is, find x such that P(X>x)=10%.
(Paste your PHStat output here)
Question 7d. What are the two prices x1 and x2 that are symmetrically distributed around the population mean price so that 99% of house prices are between these two prices? That is, find x1 and x2 that are symmetrically distributed around the population mean price such that P(x1<X<x2)=99%.
(Paste your PHStat output here)
Question 8. Use the sample mean and the sample standard deviation in Question 1 as the population mean µ and the population standard deviation σ (keep 4 decimals only). Randomly select 50 houses. Compute the following:
Question 8a. P(Sample mean<230)
(Paste your PHStat output here)
Question 8b. P(Sample mean>210)
(Paste your PHStat output here)
Question 8c. P(200<Sample mean<240)
(Paste your PHStat output here)
Question 8d. What are the two prices x1 and x2 that are symmetrically distributed around the population mean price so that 95% of the sample means with sample size n=50 are between these two prices? That is, find x1 and x2 that are symmetrically distributed around the population mean price such that P(x1<sample mean<x2)=95%.
(Paste your PHStat output here)
Question 9. Provide 90%, 95% and 99% t-confidence intervals for the house price using the data in column A in the given dataset and PHStat.
(Paste your PHStat output here)
Question 10. Use PHStat to carry out the hypothesis tests as in the individual project paper:
Question 10a. The population mean is
µ=mean house price
The population standard deviation is known to be σ = 47.11, as mentioned in the paper. Use the data in column A in the given dataset.
Comment on whether you got the same conclusion as in the paper. Your p-value from PHStat should be close to 0.0078, as in the paper.
(Paste your PHStat output here)
Question 10b. Use PHStat to carry out the same hypothesis test as in the individual project paper:
Although Lisa was happy with her findings, she decided that she wanted to dig down even deeper and draw a random sample of 15 from her original sample of 105 to use for T-test analysis. In this mini-sample, Lisa found a mean home value of 239.06k with a standard deviation of 47.09k. She then inputted this new data into the formula for a one sample, right-tailed T-test as follows:
T = (239.06 – 210.00) / (47.09 / √15) = 29.06 / 12.168 = 2.39
The critical T for α = .05 and ν = 14 is 1.761.
The P-value for α = .05 at 14 DF is .0315.
Her hypotheses for this one sample T-test are:
H0: µ ≤ 210.00 versus H1: µ > 210.00
Please use PHStat to double check the above result and provide your PHStat output. Compare your p-value from PHStat with the one 0.0315 obtained in the paper. Discuss whether your p-value is correct or the one 0.0315 in the paper may be incorrect.
(Paste your PHStat output here)
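The figures quoted from the paper in Question 10b can be cross-checked outside PHStat. The following is a minimal sketch using only the Python standard library; the helper names `t_pdf` and `t_right_tail` are hypothetical, and the tail probability is approximated by Simpson's rule rather than taken from PHStat. Note that 47.09 / √15 evaluates to about 12.159, slightly different from the 12.168 printed in the paper, although the t statistic still rounds to 2.39.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t].

    steps must be even for Simpson's rule.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Values quoted in the question (mini-sample of 15 houses, prices in $000).
n, xbar, s, mu0 = 15, 239.06, 47.09, 210.00

se = s / math.sqrt(n)            # standard error of the mean
t_stat = (xbar - mu0) / se       # one-sample t statistic
p_one = t_right_tail(t_stat, n - 1)  # right-tailed p-value
p_two = 2 * p_one                    # two-tailed p-value, for comparison

print(f"SE = {se:.4f}, t = {t_stat:.4f}")
print(f"one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
```

Comparing both printed p-values with 0.0315 is one way to judge whether the paper's figure matches the stated right-tailed test.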
Explanation / Answer
We are given the following variables:
Price = Selling price in $000
Bedrooms = Number of bedrooms
Size = House size in square feet
Pool: 1 = yes, 0 = no
Distance = Distance from the center of the city
Twnship = Township number
Garage: 1 = yes, 0 = no
Baths = Number of baths
Question 1. Please use PHStat to obtain the descriptive statistics (mean, median, mode, variance, standard deviation, etc.) for the house price data (column A in real estate dataset.xls) and give the three normal empirical rules by providing three confidence intervals. Interpret each of the three confidence intervals
The number of observations is 105.
The mean price is 221.10.
The median price is 213.57.
The mode of the data is 188.325.
The standard deviation of the price is 47.11.
The variance of the price is 2219.35.
The empirical rule states that, for approximately normal data, about 68% of the data values lie within one standard deviation of the mean,
about 95% of the data values lie within two standard deviations of the mean, and
about 99.7% of the data values lie within three standard deviations of the mean.
Now we have to find the interval for each of these three cases.
The intervals are:
mu - sigma to mu + sigma
mu - 2*sigma to mu + 2*sigma
mu - 3*sigma to mu + 3*sigma
where mu is the sample mean and sigma is the sample standard deviation.
About 68% of the data values lie within (173.99, 268.21).
About 95% of the data values lie within (126.88, 315.32).
About 99.7% of the data values lie within (79.77, 362.43).
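The three intervals are plain arithmetic on the reported mean and standard deviation. As a small illustrative sketch (the function name `empirical_interval` is hypothetical, and the rounded values 221.10 and 47.11 are taken from the answer above), they can be reproduced like this:

```python
# Empirical-rule intervals from the sample statistics reported in the answer.
mean = 221.10   # sample mean of Price, in $000
sd = 47.11      # sample standard deviation of Price, in $000

def empirical_interval(k):
    """Return (mean - k*sd, mean + k*sd), rounded to 2 decimals."""
    return (round(mean - k * sd, 2), round(mean + k * sd, 2))

for k, coverage in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = empirical_interval(k)
    print(f"about {coverage} of prices lie within ({low}, {high})")
```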
--------------------------------------------------------------------------------------------------
Question 7. Assume that the house price data are normal. Use the sample mean and the sample standard deviation in Question 1 as the population mean µ and the population standard deviation σ (keep 4 decimals only). Define the normal random variable
X=the price of a randomly selected house
Here X ~ Normal(mu = 221.1006, sigma = 47.11).
Question 7a. Compute the following probabilities P(X<170), P(X>250), and P(150<X<300) by using PHStat.
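Each probability is found by converting x to a z-score:
z = (x - mu) / sigma
P(X < 170)
Convert x = 170 into a z-score.
z = (170 - 221.1006) / 47.11 = -1.08
P(X < 170) = P(Z < -1.08) = 0.1390
P(X > 250)
z = (250 - 221.1006) / 47.11 = 0.61
Now we have to find P(Z > 0.61).
P(X > 250) = P(Z > 0.61) = 0.2698
P(150 < X < 300)
The z-scores for x = 150 and x = 300 are
z = (150 - 221.1006) / 47.11 = -1.51
z = (300 - 221.1006) / 47.11 = 1.67
Now we have to find P(-1.51 < Z < 1.67).
P(150 < X < 300) = P(-1.51 < Z < 1.67) = P(Z < 1.67) - P(Z < -1.51) = 0.9530 - 0.0656 = 0.8874
Note: the z-scores are shown rounded to two decimals, but the probabilities are computed from the unrounded z-scores, so they differ slightly from values read off a z-table at the rounded z.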
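These three probabilities can also be checked without PHStat. The sketch below uses `statistics.NormalDist` from the Python standard library, with mu = 221.1006 and sigma = 47.11 as in the calculations above; the variable names are illustrative only.

```python
from statistics import NormalDist

# House price model used in the answer: X ~ Normal(221.1006, 47.11), in $000.
price = NormalDist(mu=221.1006, sigma=47.11)

p_below_170 = price.cdf(170)                 # P(X < 170)
p_above_250 = 1 - price.cdf(250)             # P(X > 250)
p_between = price.cdf(300) - price.cdf(150)  # P(150 < X < 300)

print(f"P(X < 170)       = {p_below_170:.4f}")
print(f"P(X > 250)       = {p_above_250:.4f}")
print(f"P(150 < X < 300) = {p_between:.4f}")
```

Because `cdf` works with the unrounded z-score, the results should agree with the four-decimal probabilities in the answer rather than with a table lookup at z rounded to two decimals.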
----------------------------------------------------------------------------------------------------------
Question 7b. What is the price x so that only 5% of house prices are lower than this price? That is, find x such that P(X<x)=5%.
Given that P(X < x) = 0.05.
The formula for x is
x = mu + z*sigma
where z is the z-score with cumulative probability 0.05:
z = -1.64485 (about -1.645)
x = 221.1006 + (-1.64485)*47.11 = 143.61
-----------------------------------------------------------------------------------------------------------
Question 7c. What is the price x so that the top 10% of house prices are higher than this price? That is, find x such that P(X>x)=10%.
Given that
P(X > x) = 0.1
P(X < x) = 1 - 0.1 = 0.9
The z-score with cumulative probability 0.9 is
z = 1.28155 (about 1.282)
x = 221.1006 + (1.28155*47.11) = 281.47
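The cutoff prices in Questions 7b and 7c are inverse-normal lookups, so they can be sketched with `NormalDist.inv_cdf` from the Python standard library. This assumes the same mu = 221.1006 and sigma = 47.11 used above; the variable names are illustrative.

```python
from statistics import NormalDist

# House price model used in the answer: X ~ Normal(221.1006, 47.11), in $000.
price = NormalDist(mu=221.1006, sigma=47.11)

# 7b: price below which 5% of houses fall, i.e. P(X < x) = 0.05.
x_5th = price.inv_cdf(0.05)

# 7c: price exceeded by the top 10% of houses, i.e. P(X < x) = 0.90.
x_90th = price.inv_cdf(0.90)

print(f"5th percentile price:  {x_5th:.2f}")
print(f"90th percentile price: {x_90th:.2f}")
```

`inv_cdf` takes a cumulative (left-tail) probability, which is why the top-10% question is passed 0.90 rather than 0.10.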
-------------------------------------------------------------------------------------------------------------------------
Question 1. Please use PHStat to obtain the descriptive statistics (mean, median, mode, variance, standard deviation, etc.) for the house price data (column A in real estate dataset.xls) and give the three normal empirical rules by providing three confidence intervals. Interpret each of the three confidence intervals
Descriptive statistics for Price:
Mean                  221.1006
Standard Error        4.597462
Median                213.57
Mode                  188.325
Standard Deviation    47.10997
Sample Variance       2219.349
Kurtosis              -0.27624
Skewness              0.474345
Range                 220.32
Minimum               125.01
Maximum               345.33
Sum                   23215.56
Count                 105
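Several entries in this output are tied together by simple identities, which gives a quick sanity check on a pasted table. The sketch below only re-derives values from other values in the output (it does not read the dataset), and the variable names are illustrative.

```python
import math

# Values copied from the descriptive statistics output for Price.
total, count = 23215.56, 105
sd = 47.10997
minimum, maximum = 125.01, 345.33

mean = total / count                 # mean = sum / count
std_error = sd / math.sqrt(count)    # standard error = sd / sqrt(n)
variance = sd ** 2                   # sample variance = sd squared
value_range = maximum - minimum      # range = max - min

print(f"mean           = {mean:.4f}")
print(f"standard error = {std_error:.6f}")
print(f"variance       = {variance:.3f}")
print(f"range          = {value_range:.2f}")
```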