Chapter 10 Notes

©2008 by S. Gramlich

Correlation and Regression

! = Important Note

! These notes are not meant to replace the reading.  Read the chapter first.

 

10-2

Correlation (Pearson Product Moment) = measures the strength and direction of the linear relationship between 2 variables

population parameter is ρ, sample statistic is r

correlation can only be a value between -1 & +1, that is, -1 <= r <= +1

            2 ways to inspect:  1) Scatterplot or 2) formula

 

Scatterplot (Scatter Diagram) = plot (x,y) ordered pairs like in Algebra

horizontal axis represents the independent variable (X-values or explanatory or predictor)

vertical axis represents the dependent variable (Y-values or response or criterion)

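if the pattern of the scatterplot rises from your left to right (like a + slope) then r will be positive; the more tightly the points cluster around a line, the closer r is to +1

if the pattern of the scatterplot falls from your left to right (like a - slope) then r will be negative; the more tightly the points cluster around a line, the closer r is to -1

if there is no visible pattern then r will be close to 0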

 

r = s_xy / (s_x * s_y)

where  s_xy = covariance = SSxy / (n-1) = Σ[(x-xbar)*(y-ybar)] / (n-1)

SSxy = sum of cross products = Σ[(x-xbar)*(y-ybar)]

s_x = standard deviation for x = √[Σ(x-xbar)^2 / (n-1)]

s_y = standard deviation for y = √[Σ(y-ybar)^2 / (n-1)]

! I prefer a variation of the formula on p. 529, whereas the text uses formula 10-1 on page 520
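
The formula above can be checked by hand or with a few lines of code. The following is a minimal Python sketch, not the textbook's procedure; the function name pearson_r and the five (x, y) pairs are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Compute r = s_xy / (s_x * s_y) from paired data."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # SSxy = sum of cross products
    ss_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    # covariance and the two sample standard deviations
    s_xy = ss_xy / (n - 1)
    s_x = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical data, chosen only to keep the arithmetic small
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # 0.7746
```

Here xbar = 3, ybar = 4, SSxy = 6, s_xy = 1.5, s_x = √2.5 and s_y = √1.5, so r = 1.5 / √3.75, which is about 0.7746.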

 

To see if there is a significant linear relationship between 2 variables, perform a correlation hypothesis test (HT):

State Hypotheses:

            H0:  ρ = 0

            H1:  ρ ≠ 0

            ! (2 tailed test)

            Find the critical value (cv) from Table A-6

            r will be used as the test statistic (ts)

            Traditional Decision Rule (Compare r to cv):

            If r is inside the critical region (|r| > cv), Reject Null (significant relationship)

            If r is outside the critical region (|r| <= cv), Fail to Reject Null (relationship not significant)
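
The decision rule can be written as a single comparison of |r| against the critical value. The sketch below is only an illustration: the function name correlation_decision is made up, the sample r is hypothetical, and the critical value 0.878 is assumed to be the tabulated two-tailed value for n = 5 at α = 0.05 (always confirm against Table A-6).

```python
def correlation_decision(r, cv):
    """Two-tailed test of H0: rho = 0, using r itself as the test statistic."""
    # The critical region is r < -cv or r > +cv, i.e. |r| > cv
    if abs(r) > cv:
        return "Reject H0"
    return "Fail to reject H0"

r_sample = 0.7746   # hypothetical sample correlation from n = 5 pairs
cv_table = 0.878    # assumed critical value for n = 5, alpha = 0.05
decision = correlation_decision(r_sample, cv_table)
print(decision)  # Fail to reject H0
```

Because the test is 2 tailed, a strongly negative r such as -0.95 would also land in the critical region with this cv.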

 

10-3

Regression Line = the line that best fits the scatterplot; it minimizes the sum of the squared vertical distances between the observed y values and the y values on the line (least squares property)

Yhat = b0 + b1X {recall Y = mX + b from Algebra}

b1 = slope and found by b1 = SSxy / SSx

where SSxy = sum of cross products = Σ[(x-xbar)*(y-ybar)]

and SSx = Sum of Squares for X = Σ(x-xbar)^2

! I prefer using this formula whereas the text uses formula 10-2 on page 542

b0 = y-intercept (or constant) found by b0 = ybar - b1 * xbar

Only find the regression line if there is a significant linear relationship between the independent variable (iv) and the dependent variable (dv) from the correlation HT above.

The regression line is used to predict Y values for other values of X; Yhat is called the Predicted Value.

If the correlation isn't significant then the best predicted value for any X is just the mean for Y (Ybar).
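
The slope, intercept, and prediction rule for this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the text's procedure: the names fit_line and predict, the significant flag (the outcome of the correlation HT, decided separately), and the data are all hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) using b1 = SSxy / SSx and b0 = ybar - b1 * xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    ss_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    ss_x = sum((x - xbar) ** 2 for x in xs)
    b1 = ss_xy / ss_x
    b0 = ybar - b1 * xbar
    return b0, b1

def predict(x, xs, ys, significant):
    """Best predicted Y for a given x.

    If the correlation is not significant, fall back to Ybar.
    """
    if not significant:
        return sum(ys) / len(ys)
    b0, b1 = fit_line(xs, ys)
    return b0 + b1 * x

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(round(b1, 4), round(b0, 4))            # 0.6 2.2
print(predict(6, xs, ys, significant=False))  # 4.0
```

With these numbers SSxy = 6 and SSx = 10, so Yhat = 2.2 + 0.6X. When the correlation is treated as significant, the prediction at X = 6 is about 5.8; when it is not, the best prediction is Ybar = 4.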

 

TECHNOLOGY

using StatCrunch:

! data has to be entered in columns in StatCrunch spreadsheet

for Correlation HT:

Stat - Summary Stats - Correlation - (Select Columns) - Next - (check Display 2 sided P-val from sig test) - Calculate

for Regression HT:

Stat - Regression - Multiple Linear - (select X Variables & Y Variable column) - Calculate

 

Excel commands:

! highlight the data you want to calculate inside the parentheses

independent variable data set = iv, dependent variable data set = dv

Statistic              Excel Command

Correlation            =correl(dv,iv)

Slope                  =slope(dv,iv)

Y-intercept            =intercept(dv,iv)

Sum Squares Total      =DevSq(dv)

! dv must be entered first

 

EXCEL Data Analysis Procedure

Tools - Data Analysis - Regression - ok - (highlight & enter data) - ok

! x-variables have to be in adjacent columns

! The Data Analysis add-in must be added in first

! in Excel 2007 the Data Analysis procedures are found in the Data menu, not the Tools menu