Chapter 2 Notes

©2025, 2020, 2008 by S. Gramlich

updated 10/12/2025

 

Definitions and Describing Data (Visually)

! = Important Note

! These Notes are not meant to replace the Reading.  Read the Chapter first.

 

2-1

Frequency Table = lists the number of data points in each class (each count is called the frequency = f)

Sample Size = n = sum of all f  (formula  n = Σf)

Σ = Greek uppercase Sigma = add up all the values

!  Frequency tables will differ depending upon how many classes are initially chosen.

using StatCrunch:  Stat - Tables - Frequency - Select Column - Calculate

 

Relative Frequency Table = lists relative frequency for each class (found by taking f/n for each class)

f/n = f divided by n

 

Cumulative Frequency Table = lists cumulative frequency for each class (found by adding each class's f to the total of f for all previous classes)
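
The three tables above can be sketched in a few lines of Python. This is only an illustration: the data values and the class limits are made up for the example, and the variable names are arbitrary.

```python
# Hypothetical sample data and class limits, chosen only for illustration.
data = [52, 55, 61, 64, 67, 68, 70, 73, 75, 78, 81, 88]
classes = [(50, 59), (60, 69), (70, 79), (80, 89)]  # (lower limit, upper limit)

# Frequency f = count of data points that fall in each class.
freqs = [sum(1 for x in data if lo <= x <= hi) for lo, hi in classes]

# Sample size n = sum of all f.
n = sum(freqs)

# Relative frequency = f/n for each class.
rel_freqs = [f / n for f in freqs]

# Cumulative frequency = running total of f, class by class.
cum_freqs = []
running = 0
for f in freqs:
    running += f
    cum_freqs.append(running)

for (lo, hi), f, rf, cf in zip(classes, freqs, rel_freqs, cum_freqs):
    print(f"{lo}-{hi}: f={f}  f/n={rf:.3f}  cumulative f={cf}")
```

The last cumulative frequency should always equal n, and the relative frequencies should add up to 1; both make quick checks on a hand-built table.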

 

2-2

Histogram = bar graph with class boundaries as horizontal axis and f as vertical axis

! Again, Histograms will differ depending upon how many classes are initially chosen.

using StatCrunch:  Graphics - Histogram - Select Column - Create Graph
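
To see why the number of classes matters, here is a rough Python sketch that counts f for the same data using 3, 4, and 6 equal-width classes. The data are hypothetical, and the helper name class_counts and its equal-width rule are assumptions made for this example (StatCrunch may choose its classes differently).

```python
def class_counts(data, num_classes):
    """Split the data's range into equal-width classes and count f in each.

    Assumes the data values are not all identical (the width would be 0).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_classes
    counts = [0] * num_classes
    for x in data:
        # Index of the class containing x; the maximum goes in the last class.
        i = min(int((x - lo) / width), num_classes - 1)
        counts[i] += 1
    return counts

data = [52, 55, 61, 64, 67, 68, 70, 73, 75, 78, 81, 88]  # hypothetical sample

for k in (3, 4, 6):
    counts = class_counts(data, k)
    print(f"{k} classes: {counts}")
    for f in counts:
        print("  " + "*" * f)  # crude text bar with length f
```

The same 12 data points produce a different picture for each choice of k, even though Σf stays equal to n every time.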

 

2-3

Frequency Polygon = line graph that connects the midpoints of the tops of the bars of the histogram

Will show you 1 of 3 basic shapes or Distributions:

            Normal (0 skewness) = bell shaped and symmetric, hump in middle

            Skewed Right = tapered to Right, hump to Left

            Skewed Left = tapered to Left, hump to Right

 

Stemplot (stem-and-leaf plot) = stem has the leftmost digit(s), leaf has the rightmost digit

using StatCrunch:  Graphics - Stem and Leaf - Select Column - Create Graph
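
A stemplot can also be built by hand or with a short script. The sketch below assumes non-negative whole numbers and treats the ones digit as the leaf; the function name stemplot and the data are made up for the example.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows for non-negative whole numbers.

    Assumption for this sketch: the leaf is the last (ones) digit and the
    stem is everything to its left, so 47 has stem 4 and leaf 7.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    lines = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

data = [52, 55, 61, 64, 67, 68, 70, 73, 75, 78, 81, 88]  # hypothetical sample
print("\n".join(stemplot(data)))
```

Turned on its side, the rows of leaves look like the bars of a histogram, but the original data values can still be read off.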

 

Pareto Chart = bar graph for qualitative data; categories on horizontal axis, with bars ordered from highest to lowest frequency
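
The ordering step is the only thing that separates a Pareto chart from an ordinary bar graph. A minimal Python sketch, using hypothetical complaint categories invented for the example:

```python
from collections import Counter

# Hypothetical qualitative data: reason given for each customer complaint.
complaints = ["billing", "delivery", "billing", "quality", "billing",
              "delivery", "quality", "billing", "other", "delivery"]

# most_common() returns (category, frequency) pairs sorted from the
# highest frequency to the lowest, which is the Pareto ordering.
pareto_order = Counter(complaints).most_common()

for category, f in pareto_order:
    print(f"{category:<10}{'#' * f} ({f})")
```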

 

2-4

Correlation (Pearson Product Moment) = measures strength of linear relationship between 2 variables

            linear = straight line

population parameter is ρ, sample statistic is r

correlation can only be a value between -1 & +1, that is, -1 <= r <= +1

            2 ways to inspect:  1) Scatterplot or 2) formula (see Chapter 10)

 

Scatterplot (Scatter Diagram) = plot (x,y) ordered pairs like in Algebra

horizontal axis has x-values for independent (iv) or explanatory or predictor variable

vertical axis has y-values for dependent (dv) or response or criterion variable.

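if the pattern of the scatterplot rises from your left to right (like a + slope) then r will be positive; the more tightly the points follow a straight line, the closer r is to +1

if the pattern of the scatterplot falls from your left to right (like a - slope) then r will be negative; the more tightly the points follow a straight line, the closer r is to -1

if there is no visible pattern then r will be close to 0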
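
The link between the scatterplot pattern and the sign of r can be checked numerically. The sketch below uses the standard formula for Pearson's r (these notes defer the formula to Chapter 10); the function name pearson_r and the three small data sets are made up for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation r for paired data (x, y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]       # pattern rises left to right
falling = [6, 4, 5, 4, 2]      # pattern falls left to right
no_pattern = [2, 1, 3, 1, 2]   # no visible linear pattern

for label, y in (("rising", rising), ("falling", falling), ("none", no_pattern)):
    print(label, round(pearson_r(x, y), 3))
```

The rising set gives a positive r, the falling set gives the same size r with a negative sign, and the third set gives an r of essentially 0.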

 

To see if there is a significant linear relationship between 2 variables, perform a Correlation Hypothesis Test (HT):

1)      State Hypotheses:

Null Hypothesis H0:  ρ = 0 (no significant linear correlation)

Alternative Hypothesis H1:  ρ ≠ 0 (significant linear correlation)

2)      Find critical value r (cv) with n= # of pairs of data from Table 2-11 or A-6.

3)      Correlation r will be used as the sample test statistic.

! (2 tailed test)

4)      Traditional Decision Rule (compare r to cv):

If r is visually inside critical tail region or |r| >= |cv|, Reject Null (significant linear correlation).

If r is visually outside critical tail region or |r| < |cv|, Fail to Reject Null (no significant linear correlation).

5)      P-Value Decision Rule (compare P-Value to 5%):

P-Value = probability of getting sample results at least as extreme as those observed by chance alone, assuming H0 is true.

if P-Value  <= .05, significant linear correlation (Reject H0).

            if P-Value  > .05, no significant linear correlation  (Fail to Reject H0).
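
The two decision rules in steps 4 and 5 can be written as simple comparisons. In this sketch the function names are invented, and the sample r, critical value, and P-Values are hypothetical inputs; the critical value must still be looked up in the table for your own n, and the P-Value still comes from technology.

```python
def traditional_decision(r, cv):
    """Two-tailed rule: reject H0 when |r| >= |cv|."""
    if abs(r) >= abs(cv):
        return "Reject H0 (significant linear correlation)"
    return "Fail to Reject H0 (no significant linear correlation)"

def p_value_decision(p_value, alpha=0.05):
    """Reject H0 when the P-Value is at or below the 5% significance level."""
    if p_value <= alpha:
        return "Reject H0 (significant linear correlation)"
    return "Fail to Reject H0 (no significant linear correlation)"

# Hypothetical inputs: r from n = 5 pairs, and cv = 0.878 taken as the
# tabled critical value for n = 5 (an assumption; check your own table).
print(traditional_decision(0.853, 0.878))
print(traditional_decision(-0.95, 0.878))

# Hypothetical P-Values as reported by software.
print(p_value_decision(0.03))
print(p_value_decision(0.20))
```

Because the test is 2 tailed, a strongly negative r such as -0.95 rejects H0 just as a strongly positive r does.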

 

Regression Line = the line that best fits the scatterplot; it minimizes the sum of the squared vertical distances between the observed y values and the y values on the line (least squares property)

Yhat = b0 + b1X {recall Y = mX + b from Algebra}

b1 = m = slope

b0 = b = y-intercept (or constant)

Only find the Regression line if there is a significant linear relationship between iv and dv from the correlation HT above.

The regression line is used to find Predicted Values Yhat for x.

If the correlation isn't significant then the best predicted value for any X is just the mean for Y (Ybar).
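
As a rough illustration of the prediction rule above, the sketch below finds b0 and b1 with the standard least squares formulas and falls back to Ybar when the correlation is not significant. The function names, the data, and the significant flag (which stands in for the result of the correlation HT) are assumptions made for the example.

```python
def regression_line(xs, ys):
    """Return (b0, b1) for the least squares line Yhat = b0 + b1*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # y-intercept
    return b0, b1

def predict(x_new, xs, ys, significant):
    """Best predicted value of Y for x_new.

    significant = outcome of the correlation HT (True if H0 was rejected).
    """
    if significant:
        b0, b1 = regression_line(xs, ys)
        return b0 + b1 * x_new
    # Correlation not significant: the best prediction is Ybar.
    return sum(ys) / len(ys)

x = [1, 2, 3, 4, 5]   # hypothetical iv data
y = [2, 4, 5, 7, 8]   # hypothetical dv data

b0, b1 = regression_line(x, y)
print(f"Yhat = {b0:.2f} + {b1:.2f}X")
print(predict(6, x, y, significant=True))    # uses the regression line
print(predict(6, x, y, significant=False))   # uses Ybar instead
```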

 

TECHNOLOGY

using StatCrunch:

to find any of the above graphs: {File menu at top} Graph - {then select desired graph, e.g. Scatter Plot}

 

! for Correlation HT, data has to be entered in columns in StatCrunch spreadsheet:

Stat - Summary Stats - Correlation - (Select Columns) - Next - (check Display 2 sided P-val from sig test) - Calculate

for Regression HT:

Stat - Regression - Multiple Linear - (select X Variables & Y Variable column) - Calculate

 

Excel commands:

! highlight the data you want to calculate inside the parentheses

independent variable data set = iv, dependent variable data set = dv

Statistic                                   Excel Command

Correlation                              =correl(dv,iv)

Slope                                       =slope(dv,iv)

Y-intercept                              =intercept(dv,iv)

Sum Squares Total                  =DevSq(dv)

! dv must be entered first

 

EXCEL Data Analysis Procedure

Tools - Data Analysis - Regression - ok - (highlight & enter data) - ok

! x-variables have to be in adjacent columns

! The Data Analysis add-in must be added in first

! in Excel 2007 and later, the Data Analysis procedures are found in the Data menu, not the Tools menu