Correlation analysis and simple linear regression are two statistical techniques
Question
Correlation analysis and simple linear regression are two statistical techniques used to examine linear relationships between two variables. With the concepts of each technique in mind: What minimum requirements must the two variables meet to use these techniques? When is it appropriate to use a correlation analysis? How do you interpret the results of a correlation analysis - what does the correlation coefficient mean? When is it appropriate to use simple regression analysis? How do you interpret the intercept and slope terms in a regression analysis? How do you determine if a regression analysis will produce meaningful predictive results? Please answer each question, point by point.
Explanation / Answer
A correlation analysis examines the extent of linear dependence between two variables. The correlation coefficient r measures this. It can take any value from -1 to +1. A value of 0 means that there is no linear relationship.
A value other than 0 indicates some linear dependence. The closer the value is to -1 or +1 (that is, the larger its absolute value), the stronger the dependence. A positive value shows that Y tends to rise as X rises, and a negative value shows that Y tends to fall as X rises.
The significance of a correlation is assessed by comparing its p-value with a chosen significance level (say 0.05). If the p-value is less than the significance level, the correlation is statistically significant: the observed value of r would be unlikely to arise by chance if the two variables were truly uncorrelated.
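As a rough illustration of the points above, the sketch below computes the correlation coefficient by hand and checks it for significance. The data (hours studied and test scores) are made up for this example, and the helper names pearson_r and t_statistic are hypothetical. Instead of computing an exact p-value, it compares the t statistic with the standard two-sided 5% critical value for 4 degrees of freedom (2.776), which leads to the same decision at the 0.05 level.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no linear correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Made-up illustrative data, not taken from the question.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 71]

r = pearson_r(hours, score)
t = t_statistic(r, len(hours))

# Two-sided 5% critical value of Student's t with 6 - 2 = 4 degrees of freedom.
T_CRITICAL = 2.776
significant = abs(t) > T_CRITICAL

print(f"r = {r:.3f}, t = {t:.2f}, significant at 0.05: {significant}")
```

Here r is close to +1, so the scores rise almost linearly with hours studied, and the t statistic is well above the critical value. Negating one of the variables would flip the sign of r without changing its magnitude.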
When linear regression is used, the variables are assumed to have a linear relationship. We attempt to fit a linear model (a linear equation) that describes how one variable depends on the other.
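The slope of the line is the expected change in Y for a one-unit increase in X, and its sign gives the direction of the relationship. The slope by itself does not measure the strength of the dependence, because its size depends on the units of X and Y; strength is measured by the correlation coefficient.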
The intercept is the predicted value of Y when X is 0. It is sometimes meaningful and sometimes not. For instance, if you are measuring plant growth as a result of rainfall, you would expect 0 growth when there is 0 rain, yet the intercept fitted from collected data is usually not 0. In that case the y-intercept has no physical meaning; it stays in the model so that the fitted line is not biased by being forced through the origin.
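A minimal sketch of a least-squares fit for the rainfall example, using made-up numbers. The helper names fit_line and r_squared are hypothetical, and R squared (the share of the variation in Y that the line accounts for) is included here only as one common way to judge how well the line fits.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Return the fraction of the variance in ys accounted for by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up illustrative data: rainfall and plant growth, both in cm.
rainfall = [10, 20, 30, 40, 50]
growth = [7, 11, 14, 19, 22]

slope, intercept = fit_line(rainfall, growth)
r2 = r_squared(rainfall, growth, slope, intercept)

print(f"growth = {intercept:.2f} + {slope:.2f} * rainfall, R^2 = {r2:.3f}")
```

With these numbers the slope is about 0.38, meaning each extra cm of rain goes with roughly 0.38 cm more growth. The intercept is about 3.2 rather than 0, even though no growth is expected without rain, so it should not be read as a real prediction at zero rainfall, which also lies outside the observed range of 10 to 50 cm.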