Chapter 10 Notes
©2008 by S. Gramlich
Correlation and Regression
! = Important Note
! These Notes are not meant to replace the Reading. Read the Chapter first
10-2
Correlation (Pearson Product Moment) = measures strength of linear relationship between 2 variables
population parameter is ρ, sample statistic is r
correlation can only be a value between -1 & +1, that is, -1 <= r <= +1
2 ways to inspect: 1) Scatterplot or 2) formula
Scatterplot (Scatter Diagram) = plot (x,y) ordered pairs like in Algebra
horizontal axis represents the independent variable (X-values or explanatory or predictor)
vertical axis represents the dependent variable (Y-values or response or criterion)
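if the pattern of scatterplot rises from your left to right (like a + slope) then r will be positive, and closer to +1 the more tightly the points follow a straight line
if the pattern of scatterplot falls from your left to right (like a - slope) then r will be negative, and closer to -1 the more tightly the points follow a straight line
if there is no visible linear pattern then r will be close to 0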
r = sxy / (sx * sy)
where sxy = covariance = SSxy / (n-1) = Σ[(x-xbar)*(y-ybar)] / (n-1)
SSxy = sum of cross products = Σ[(x-xbar)*(y-ybar)]
sx = standard deviation for x = √[Σ(x-xbar)^2 / (n-1)]
sy = standard deviation for y = √[Σ(y-ybar)^2 / (n-1)]
! I prefer a variation of the formula on p. 529, whereas the text uses formula 10-1 on page 520
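The formula for r above can be sketched in Python as follows. This is only an illustration, not the textbook's procedure: the function name pearson_r and the five (x, y) pairs are made up for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation r = sxy / (sx * sy), following the notes' formula."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # SSxy = sum of cross products
    ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    s_xy = ss_xy / (n - 1)                                    # covariance
    s_x = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))   # std dev of x
    s_y = sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))   # std dev of y
    return s_xy / (s_x * s_y)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

Here SSxy = 6, sx^2 = 2.5 and sy^2 = 1.5, so r = 1.5 / √3.75, which is about 0.77. The scatterplot rises from left to right, so r is positive.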
To see if there is significant relationship between 2 variables, perform a correlation HT:
State Hypotheses:
H0: ρ = 0
H1: ρ ≠ 0
! (2 tailed test)
Find cv from Table A-6
r will be used as the test statistic (ts)
Traditional Decision Rule (Compare r to cv):
If r is visually inside critical region (r < -cv or r > +cv), Reject Null (significant relationship)
If r is visually outside critical region (-cv <= r <= +cv), Fail to Reject Null (relationship not significant)
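The traditional decision rule can be written as a small Python sketch. The function name correlation_decision is made up for this example. The sample r and the critical value 0.878 (n = 5, α = 0.05) are assumed for illustration; always look up the cv in Table A-6 for your own n and α.

```python
def correlation_decision(r, cv):
    """Two-tailed test of H0: rho = 0, using r itself as the test statistic."""
    # The critical region is r < -cv or r > +cv, i.e. |r| > cv
    if abs(r) > cv:
        return "Reject H0 (significant relationship)"
    return "Fail to reject H0 (relationship not significant)"

r = 0.7746   # hypothetical sample correlation from n = 5 pairs
cv = 0.878   # assumed critical value (n = 5, alpha = 0.05); confirm in Table A-6
decision = correlation_decision(r, cv)
print(decision)  # Fail to reject H0 (relationship not significant)
```

Note that r = 0.77 looks large, but with only 5 pairs of data it does not fall in the critical region.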
10-3
Regression Line = the line that best fits through the scatterplot; it minimizes the sum of the squared vertical distances between the observed y values and the y values on the line (least squares property)
Yhat = b0 + b1X {recall Y = mX + b from Algebra}
b1 = slope, found by b1 = SSxy / SSx
where SSxy = sum of cross products = Σ[(x-xbar)*(y-ybar)]
and SSx = Sum of Squares for X = Σ(x-xbar)^2
! I prefer using this formula whereas the text uses formula 10-2 on page 542
b0 = y-intercept (or constant) found by b0 = ybar - b1 * xbar
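A minimal Python sketch of the slope and intercept formulas above. The function name regression_line and the data are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
def regression_line(x, y):
    """Return (b0, b1) for Yhat = b0 + b1*X using b1 = SSxy / SSx."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # cross products
    ss_x = sum((xi - xbar) ** 2 for xi in x)                        # sum of squares for X
    b1 = ss_xy / ss_x          # slope
    b0 = ybar - b1 * xbar      # y-intercept
    return b0, b1

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = regression_line(x, y)
print(round(b0, 2), round(b1, 2))  # 2.2 0.6
```

Here xbar = 3, ybar = 4, SSxy = 6 and SSx = 10, so b1 = 0.6 and b0 = 4 - 0.6*3 = 2.2, giving Yhat = 2.2 + 0.6X.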
Only find the Regression line if there is a Significant linear relationship between the independent variable (iv) and the dependent variable (dv) from the correlation HT above.
The regression line is used to predict Y values for other values of X; Yhat is called the Predicted Value.
If the correlation isn't significant then the best predicted value for any X is just the mean for Y (Ybar).
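The prediction rule in the last two notes can be sketched in Python. The function name predict, the coefficients, and the X value are all hypothetical. The significant flag stands for the result of the correlation HT, which you must carry out separately.

```python
def predict(x_new, b0, b1, ybar, significant):
    """Best predicted value for Y at x_new."""
    if significant:
        # Significant linear relationship: use the regression line
        return b0 + b1 * x_new
    # Not significant: the best prediction for any X is the mean of Y
    return ybar

# Hypothetical values, for illustration only
b0, b1, ybar = 2.2, 0.6, 4.0
yhat_sig = predict(6, b0, b1, ybar, significant=True)
yhat_not = predict(6, b0, b1, ybar, significant=False)
print(round(yhat_sig, 2), yhat_not)  # 5.8 4.0
```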
TECHNOLOGY
using StatCrunch:
! data has to be entered in columns in StatCrunch spreadsheet
for Correlation HT:
Stat - Summary Stats - Correlation - (Select Columns) - Next - (check Display 2 sided P-val from sig test) - Calculate
for Regression HT:
Stat - Regression - Multiple Linear - (select X Variables & Y Variable column) - Calculate
Excel commands:
! highlight the data you want to calculate inside the parentheses
independent variable data set = iv, dependent variable data set = dv
Statistic             Excel Command
Correlation           =correl(dv,iv)
Slope                 =slope(dv,iv)
Y-intercept           =intercept(dv,iv)
Sum Squares Total     =DevSq(dv)
! dv must be entered first
EXCEL Data Analysis Procedure
Tools - Data Analysis - Regression - ok - (highlight & enter data) - ok
! x-variables have to be in adjacent columns
! The Data Analysis add-in must be added in first
! in Excel 2007 the Data Analysis procedures are found in the Data menu, not the Tools menu