Question
This dataset contains several obvious missing values.
Using A) mean imputation, B) deletion, C) last observation carried forward and D) regression-based imputation, compare the results of these methods with the basic statistics (distributions) of the affected variables.
Determine, select and apply the appropriate imputation method, then continue analyzing the dataset through regression. Using all appropriate methods you have learned, do the following:
Create a model that effectively describes the relationship between the Xs and the Y (using the imputed values). Which approach did you use? Why?
What if the true values of the missing entries were the following? Create a regression model that includes the true values. How would you evaluate the performance of the imputation methods?
Brand        Tar (mg)   Nicotine (mg)   Weight (g)   Carbon monoxide (mg)
Kent         12.4       0.95            0.9225       12.3
Kool         16.6       1.12            0.9372       16.3
L&M          14.9       1.02            0.8858       15.4
LarkLights   13.7       1.01            0.9643       13
Explanation / Answer
A) Mean imputation
Each missing value of a variable is replaced by the mean of all known values of that variable.
The estimated carbon monoxide content of the Kent brand is 12.54 mg.
The estimated nicotine content of the L&M brand is 0.870 mg.
The estimated weight of the Lark Lights brand is 0.9239 g.
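As a rough illustration of mean imputation, here is a minimal Python sketch. It assumes only the four rows listed in the question, with the three cells named above set to None; the full 25-brand dataset is not reproduced here, so the imputed numbers will not match the figures quoted in this answer. The names rows and mean_impute are hypothetical.

```python
from statistics import mean

# Rows from the question's table: (brand, tar, nicotine, weight, co).
# None marks the cells treated as missing in the answer above.
# Only four brands are available here, so results differ from the full data.
rows = [
    ("Kent",       12.4, 0.95, 0.9225, None),
    ("Kool",       16.6, 1.12, 0.9372, 16.3),
    ("L&M",        14.9, None, 0.8858, 15.4),
    ("LarkLights", 13.7, 1.01, None,   13.0),
]

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    col_means = {}
    for j in range(1, len(rows[0])):  # column 0 holds the brand name
        observed = [r[j] for r in rows if r[j] is not None]
        col_means[j] = mean(observed)
    return [
        tuple(col_means[j] if (j > 0 and v is None) else v
              for j, v in enumerate(r))
        for r in rows
    ]

imputed = mean_impute(rows)
for row in imputed:
    print(row)
```

Every imputed cell in a column receives the same value, which is why mean imputation leaves the column mean unchanged but shrinks its variance.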
B) Deletion
We delete every observation (row) in which any of the variables is missing.
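Deletion of incomplete observations can be sketched in a few lines of Python. The example assumes the four rows from the question, with None marking the cells treated as missing; the names rows and listwise_delete are hypothetical.

```python
# Rows from the question's table: (brand, tar, nicotine, weight, co).
# None marks a missing cell (assumed positions, for illustration only).
rows = [
    ("Kent",       12.4, 0.95, 0.9225, None),
    ("Kool",       16.6, 1.12, 0.9372, 16.3),
    ("L&M",        14.9, None, 0.8858, 15.4),
    ("LarkLights", 13.7, 1.01, None,   13.0),
]

def listwise_delete(rows):
    """Keep only the rows in which every variable is observed."""
    return [r for r in rows if all(v is not None for v in r)]

complete = listwise_delete(rows)
print(f"kept {len(complete)} of {len(rows)} rows")
```

In this toy subset three of the four rows are dropped, which shows the main cost of deletion: every row with a single missing cell loses all of its observed values as well.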
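C) Last observation
Last observation carried forward (LOCF) replaces each missing value with the most recent observed value of the same variable, that is, the value from the nearest preceding row. It is designed for ordered (for example, longitudinal) data; for an unordered list of brands the carried-forward value depends on the arbitrary row order.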
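A minimal Python sketch of last observation carried forward, assuming the rows are processed in table order and that None marks a missing cell. The data are the four rows from the question; the names rows and locf are hypothetical.

```python
# Rows from the question's table: (brand, tar, nicotine, weight, co).
# None marks a missing cell (assumed positions, for illustration only).
rows = [
    ("Kent",       12.4, 0.95, 0.9225, None),
    ("Kool",       16.6, 1.12, 0.9372, 16.3),
    ("L&M",        14.9, None, 0.8858, 15.4),
    ("LarkLights", 13.7, 1.01, None,   13.0),
]

def locf(rows):
    """Fill each None with the most recent observed value in the same column."""
    last_seen = [None] * len(rows[0])
    filled_rows = []
    for row in rows:
        filled = []
        for j, value in enumerate(row):
            if value is None:
                value = last_seen[j]  # stays None if nothing was observed yet
            else:
                last_seen[j] = value
            filled.append(value)
        filled_rows.append(tuple(filled))
    return filled_rows

filled = locf(rows)
for row in filled:
    print(row)
```

Note that a missing value in the first row cannot be filled this way, because there is no earlier observation to carry forward.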
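D) Regression-based imputation
A regression of the variable with missing values on the other variables is fitted to the complete observations, and each missing value is replaced by its predicted value.
Descriptive statistics for the four variables: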
Statistic            Tar content (mg)   Nicotine content (mg)   Weight (g)   Carbon monoxide content (mg)
Mean                 12.216             0.8704                  0.968668     12.5376
Standard Error       1.133162           0.070558                0.017641     0.947889
Median               12.8               0.87                    0.9496       13
Mode                 #N/A               1.01                    #N/A         10.2
Standard Deviation   5.66581            0.352792                0.088207     4.739446
Sample Variance      32.1014            0.124462                0.00778      22.46234
Kurtosis             2.951539           4.348977                0.385094     0.672066
Skewness             0.756658           1.031911                0.50791      -0.19969
Range                28.8               1.9                     0.3799       22
Minimum              1                  0.13                    0.7851       1.5
Maximum              29.8               2.03                    1.165        23.5
Sum                  305.4              21.76                   24.2167      313.44
Count                25                 25                      25           25
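As a rough sketch of what regression-based imputation could look like, and of one way to judge it once the true values are known, the Python example below predicts Kent's carbon monoxide content from its tar content. It rests on several assumptions: only the four rows from the question are used, tar is taken as the single predictor, and the helper fit_line is hypothetical. It is not the procedure used to produce the figures in this answer.

```python
# (tar, co) pairs from the question's table for the brands with CO observed.
observed = [(16.6, 16.3), (14.9, 15.4), (13.7, 13.0)]  # Kool, L&M, LarkLights

# Kent's CO is treated as missing; its true value comes from the question.
kent_tar, kent_co_true = 12.4, 12.3

def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = fit_line(observed)

# Regression imputation: the prediction at Kent's tar value.
co_regression = intercept + slope * kent_tar

# Mean imputation on the same three observations, for comparison.
co_mean = sum(y for _, y in observed) / len(observed)

# Absolute error of each imputed value against the true value.
errors = {
    "mean": abs(co_mean - kent_co_true),
    "regression": abs(co_regression - kent_co_true),
}
print(errors)
```

With the full dataset, the same comparison could be repeated for every imputed cell and summarized per method, for example as a mean absolute error or root mean squared error, alongside a check of how much each method changes the fitted regression coefficients.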