Chapter 1: Introduction
A few terms:
Data
Populations
Samples
Parameters
Statistics
Branches of Statistics:
Descriptive
Inferential
Classifying Data
Two types:
Qualitative
Quantitative
Levels of Measurement
Nominal
Ordinal
Interval
Ratio
Experimental Design
Data Collection
Experiment
Simulation
Census
Sampling
Sampling Techniques
Simple random sampling
Stratified sampling
Cluster sampling
Systematic sampling
Convenience sampling
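A quick sketch of three of these sampling techniques in Python (the "roster" of 100 IDs and the strata are made-up assumptions, not from the text), using only the standard library:

import random

population = list(range(1, 101))          # hypothetical roster of 100 student IDs
random.seed(1)

# Simple random sampling: every group of size n has the same chance of being chosen
simple = random.sample(population, 10)

# Systematic sampling: pick a random start, then take every k-th member
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: split the population into strata, then sample within each stratum
strata = {"freshmen": population[:40], "sophomores": population[40:70], "juniors": population[70:]}
stratified = {name: random.sample(group, 3) for name, group in strata.items()}

print(simple, systematic, stratified, sep="\n")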
Chapter 2: Frequency Distributions and Their Graphs
Frequency distribution key words:
Classes
Class size
Frequency (f)
Class width
Upper and lower class limits
Midpoints
Relative frequency
Cumulative frequency - ogive
Frequency histogram
Frequency polygon
Questions:
How do you find the mean from a frequency distribution? The standard deviation?
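One way to answer this question, sketched in Python with a made-up frequency table: use the class midpoints x and frequencies f, so the mean is Σ(x·f)/n and the sample standard deviation is √(Σ((x − x̄)²·f)/(n − 1)).

from math import sqrt

# Hypothetical frequency distribution: (class midpoint, frequency)
table = [(12, 6), (17, 10), (22, 13), (27, 8), (32, 5), (37, 4), (42, 2)]

n = sum(f for _, f in table)                                 # total number of observations
mean = sum(x * f for x, f in table) / n                      # mean of a frequency distribution
var = sum((x - mean) ** 2 * f for x, f in table) / (n - 1)   # sample variance
std = sqrt(var)                                              # sample standard deviation

print(f"n = {n}, mean = {mean:.2f}, s = {std:.2f}")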
Other types of graphs
Stem-and-leaf plot
Pie chart
Pareto chart
Scatter plot
Measures of central tendency
Mean
Median
Mode
Weighted mean (relate to mean of a frequency distribution)
Shapes of distributions
Symmetric
Uniform
Skewed (left or right/negative or positive)
Measures of variation
Range
Variance
Standard deviation (population versus sample – calculate differently!)
What does the standard deviation mean?!?
Empirical rule
Chebychev’s theorem
Measures of position
Quartiles and the Interquartile Range
Percentiles
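A short sketch of these measures of variation and position using Python's standard statistics module (the data set is hypothetical); note that the population and sample standard deviations are computed differently (divide by n versus n - 1):

import statistics as st

data = [7, 9, 10, 12, 13, 14, 15, 15, 18, 22]      # hypothetical data set

rng = max(data) - min(data)                        # range
pop_sd = st.pstdev(data)                           # population std dev (divide by n)
samp_sd = st.stdev(data)                           # sample std dev (divide by n - 1)
q1, q2, q3 = st.quantiles(data, n=4)               # quartiles
iqr = q3 - q1                                      # interquartile range

print(rng, round(pop_sd, 2), round(samp_sd, 2), (q1, q2, q3), iqr)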
Chapter 3: Probability
Terms:
Probability
Outcome
Sample space
Event
Classical probability vs. Empirical probability (f/n)
Law of large numbers
Complement of an event
Conditional probability
What is P(B|A)?
Independence: P(B|A) = P(B)
Multiplication Rule
P(A and B) = P(A) · P(B|A)
Mutually exclusive events
Addition Rule: P(A or B) = P(A) + P(B) – P(A and B)
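A small worked sketch of these rules in Python, using the familiar two-dice sample space (the choice of events A and B is an assumption for illustration, not from the outline):

from itertools import product
from fractions import Fraction

# Sample space: all 36 equally likely outcomes of rolling two dice
space = list(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(sum(1 for o in space if event(o)), len(space))

def A(o): return o[0] == 6              # first die shows a 6
def B(o): return o[0] + o[1] >= 10      # total is at least 10

p_a, p_b = P(A), P(B)
p_a_and_b = P(lambda o: A(o) and B(o))
p_b_given_a = p_a_and_b / p_a                               # conditional probability P(B|A)
print(p_a_and_b == p_a * p_b_given_a)                       # multiplication rule
print(P(lambda o: A(o) or B(o)) == p_a + p_b - p_a_and_b)   # addition rule
print(P(lambda o: not A(o)) == 1 - p_a)                     # complement of an event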
Counting and Factorials!
Permutations
Combinations
With replacement and without replacement
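These counting rules map directly onto Python's math module; a small sketch with hypothetical numbers:

from math import factorial, perm, comb

# Factorial: number of ways to order 5 distinct items
print(factorial(5))          # 120

# Permutations: ordered selections of 3 items from 10 (order matters)
print(perm(10, 3))           # 720 = 10!/(10-3)!

# Combinations: unordered selections of 3 items from 10 (order does not matter)
print(comb(10, 3))           # 120 = 10!/(3!(10-3)!)

# With replacement: each of the 3 picks has all 10 choices available
print(10 ** 3)               # 1000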
Chapter 4: Discrete Probability Distributions
Mean, standard deviation, and expected value of a discrete probability distribution
Discrete Distributions:
Binomial distribution
Mean, variance and standard deviation
Geometric Distribution
Poisson
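A sketch of the binomial case in Python (n and p are made-up values), checking the probabilities against the mean μ = np and standard deviation σ = √(npq):

from math import comb, sqrt

n, p = 10, 0.3                      # hypothetical number of trials and success probability
q = 1 - p

# Binomial probability of exactly x successes: C(n, x) * p^x * q^(n-x)
pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

mean = n * p                        # mean: np
var = n * p * q                     # variance: npq
std = sqrt(var)                     # standard deviation: sqrt(npq)

print(round(sum(pmf), 6))           # the probabilities sum to 1
print(mean, round(std, 3))
# Expected value computed directly from the distribution matches np
print(round(sum(x * prob for x, prob in enumerate(pmf)), 6))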
Chapter 5: Normal Probability Distributions
A continuous distribution
Bell-shaped
Mean, median and mode are equal
Empirical rule
68% of area lies within 1σ of μ
95% of area lies within 2σ of μ
99.7% of area lies within 3σ of μ
Z scores (standard normal)
Finding area under the curve (to the left of Z)
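A quick z-score and area sketch using Python's statistics.NormalDist (μ, σ, and x are hypothetical):

from statistics import NormalDist

mu, sigma = 100, 15                  # hypothetical population mean and standard deviation
x = 118

z = (x - mu) / sigma                 # z-score: distance from the mean in standard deviations
area_left = NormalDist().cdf(z)      # area under the standard normal curve to the left of z

print(round(z, 2), round(area_left, 4))
# Same area taken directly from the non-standard normal distribution
print(round(NormalDist(mu, sigma).cdf(x), 4))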
Sampling distribution
Central limit theorem
Applying the Central limit theorem:
Normal approximation to binomial distribution
Rule of thumb for using it?
μ = np
σ = √(npq)
Continuity Correction
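A sketch of the normal approximation with the continuity correction (n and p are made up, chosen so that np and nq are both at least 5), compared with the exact binomial answer:

from math import comb, sqrt
from statistics import NormalDist

n, p = 60, 0.4                      # hypothetical binomial parameters
q = 1 - p
mu, sigma = n * p, sqrt(n * p * q)  # mean np and standard deviation sqrt(npq)

# P(X <= 20) with the continuity correction: use 20.5 on the normal curve
approx = NormalDist(mu, sigma).cdf(20.5)

# Exact binomial probability, for comparison
exact = sum(comb(n, x) * p**x * q**(n - x) for x in range(21))

print(round(approx, 4), round(exact, 4))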
Chapter 6: Confidence Intervals
Point estimate, interval estimate, level of confidence (c)
E: Maximum error of the estimate
E = zc · σ/√n
Confidence interval for μ (n ≥ 30)
Confidence interval for μ (n < 30, σ unknown)
Confidence interval for population proportion, p
Rule of thumb for use
Finding minimum sample size to estimate μ
Finding minimum sample size to estimate p
When to use z, when to use t?
Confidence interval for σ and σ²
What are the guidelines?
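A sketch of two of these intervals in Python (the summary statistics are hypothetical); the critical value zc comes from the standard normal inverse CDF:

from math import sqrt
from statistics import NormalDist

c = 0.95
zc = NormalDist().inv_cdf((1 + c) / 2)        # critical value zc for a 95% confidence level

# Confidence interval for the mean (n >= 30, so s can stand in for sigma): xbar +/- zc*s/sqrt(n)
xbar, s, n = 22.4, 5.1, 40                    # hypothetical sample mean, std dev, size
E = zc * s / sqrt(n)                          # maximum error of the estimate
print((round(xbar - E, 2), round(xbar + E, 2)))

# Confidence interval for a proportion (rule of thumb: np-hat >= 5 and nq-hat >= 5)
x, n2 = 120, 300                              # hypothetical successes out of n2 trials
phat = x / n2
E2 = zc * sqrt(phat * (1 - phat) / n2)
print((round(phat - E2, 3), round(phat + E2, 3)))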
Chapter 7: Hypothesis Testing with One Sample
Null and Alternative hypotheses
Equality ALWAYS included in the H0
Claim can be in either H0 or Ha
Hypotheses are ALWAYS about population parameters (NEVER about sample statistics)
Type I and II errors
Level of significance
Right-tailed, left-tailed and two-tailed tests (and determining critical values)
Rejection region and making a decision to reject or not reject H0
P-value
Finding critical values for Z, t, and χ² distributions
Hypothesis testing for proportions
Rule of thumb: np ≥ 5 and nq ≥ 5
z = (p̂ - p) / √(pq/n)
Hypothesis testing for σ and σ²
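To tie the steps of this chapter together, here is a sketch of the one-sample proportion test in Python (the claim, sample data, and α are all made up); the p-value shown is for a left-tailed test:

from math import sqrt
from statistics import NormalDist

# Hypothetical claim: H0: p >= 0.50, Ha: p < 0.50 (left-tailed), alpha = 0.05
p0, alpha = 0.50, 0.05
x, n = 210, 450                                   # hypothetical sample: successes, trials
phat = x / n

# Rule of thumb check: np0 >= 5 and nq0 >= 5
assert n * p0 >= 5 and n * (1 - p0) >= 5

z = (phat - p0) / sqrt(p0 * (1 - p0) / n)         # test statistic
p_value = NormalDist().cdf(z)                     # left-tailed p-value
print(round(z, 2), round(p_value, 4),
      "reject H0" if p_value < alpha else "fail to reject H0")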
Chapter 8: Hypothesis Testing with Two Samples
Two-sample Z test for the difference between means (n ≥ 30 for both populations)
z = ((x̄₁ - x̄₂) - (μ₁ - μ₂)) / √(σ₁²/n₁ + σ₂²/n₂), using s₁ and s₂ in place of σ₁ and σ₂ when the population values are unknown
Two-sample t test for the difference between means (n < 30 for at least one population)
Equal variances (pooled): t = ((x̄₁ - x̄₂) - (μ₁ - μ₂)) / (σ̂ · √(1/n₁ + 1/n₂)), where σ̂ = √(((n₁ - 1)s₁² + (n₂ - 1)s₂²)/(n₁ + n₂ - 2)); d.f. = n₁ + n₂ - 2
Unequal variances: t = ((x̄₁ - x̄₂) - (μ₁ - μ₂)) / √(s₁²/n₁ + s₂²/n₂); d.f. = the smaller of n₁ - 1 and n₂ - 1
Testing the difference between means (dependent samples)
Use the paired differences d
t = (d̄ - μ_d) / (s_d/√n); d.f. = n - 1
Difference between proportions
z = ((p̂₁ - p̂₂) - (p₁ - p₂)) / √(p̄·q̄·(1/n₁ + 1/n₂)), where p̄ = (x₁ + x₂)/(n₁ + n₂) and q̄ = 1 - p̄
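A sketch of the dependent-samples (paired) t statistic in Python with made-up before/after scores; the critical value would still come from a t table with n - 1 degrees of freedom:

from math import sqrt
from statistics import mean, stdev

# Hypothetical paired data (e.g., scores before and after a treatment)
before = [72, 85, 68, 90, 77, 81, 74, 88]
after  = [75, 88, 70, 94, 76, 85, 79, 91]

d = [a - b for a, b in zip(after, before)]   # paired differences
n = len(d)
d_bar, s_d = mean(d), stdev(d)               # mean and sample std dev of the differences

t = d_bar / (s_d / sqrt(n))                  # H0: mu_d = 0, so t = (d_bar - 0)/(s_d/sqrt(n))
print(round(d_bar, 2), round(s_d, 2), round(t, 2), "d.f. =", n - 1)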
Chapter 9: Correlation and Regression
Correlation coefficient r = (nΣxy - ΣxΣy) / (√(nΣx² - (Σx)²) · √(nΣy² - (Σy)²))
Correlation is not causality!
t-test for correlation coefficient
t = r / √((1 - r²)/(n - 2)); d.f. = n - 2
Linear regression
Regression line ŷ = mx + b
Residuals sum to zero
m = (nΣxy - ΣxΣy)/(nΣx² - (Σx)²), b = ȳ - m·x̄
Coefficient of determination r² = explained variation (by the line) / total variation
Standard error of estimate se = √(Σ(yᵢ - ŷᵢ)²/(n - 2))
Prediction interval for y at a particular value x = x₀: ŷ ± E
E = tc · se · √(1 + 1/n + n(x₀ - x̄)²/(nΣx² - (Σx)²)); d.f. = n - 2
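A sketch of these formulas on a small made-up data set in Python; r, the slope m, the intercept b, r², and se are all computed directly from the sums that appear in the formulas above:

from math import sqrt

# Hypothetical (x, y) data
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 3.6, 4.8, 5.1, 6.2]
n = len(xs)

sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx, syy = sum(x * x for x in xs), sum(y * y for y in ys)

r = (n * sxy - sx * sy) / (sqrt(n * sxx - sx**2) * sqrt(n * syy - sy**2))
m = (n * sxy - sx * sy) / (n * sxx - sx**2)        # slope of the regression line
b = sy / n - m * (sx / n)                          # intercept: b = ybar - m*xbar

residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
se = sqrt(sum(e * e for e in residuals) / (n - 2)) # standard error of estimate

print(round(r, 4), round(r**2, 4), round(m, 3), round(b, 3), round(se, 3))
print(round(sum(residuals), 10))                   # residuals sum to (essentially) zero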
Chapter 10: Chi-Square and F Tests
Goodness of fit tests
χ² = Σ (O - E)²/E, where O = observed values from the sample and E = expected values (based on the distribution claimed in the null hypothesis)
This is a right-tailed chi-square test with k - 1 degrees of freedom (# of categories - 1)
Test for independence: just like the goodness-of-fit test except that degrees of freedom = (r - 1)(c - 1). Expected values are obtained by multiplying each row sum by the corresponding column sum and dividing by the total sample size. Also a right-tailed chi-square test.
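A sketch of the goodness-of-fit statistic in Python with hypothetical observed counts and a claimed uniform distribution; the critical χ² value would come from a table with k - 1 degrees of freedom:

# Hypothetical observed counts across k = 5 categories (n = 100 observations)
observed = [24, 18, 22, 15, 21]
n, k = sum(observed), len(observed)

# Claimed (null) distribution: uniform, so each expected count is n/k
expected = [n / k] * k

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1
print(round(chi_sq, 3), "d.f. =", df)   # compare with the right-tail critical value from a chi-square table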
Comparing two variances: the ratio of sample variances from two normally distributed populations is F-distributed.
F = s₁²/s₂²
Put the larger variance in the numerator. Can be either a one- or two-tailed test. The critical F value depends on the degrees of freedom of the numerator (n₁ - 1) and of the denominator (n₂ - 1).
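A sketch of the two-variance F statistic in Python with hypothetical sample variances; the larger variance goes in the numerator, and the critical value comes from an F table with (n1 - 1, n2 - 1) degrees of freedom:

# Hypothetical sample variances and sample sizes
s1_sq, n1 = 38.4, 16
s2_sq, n2 = 20.1, 21

# Put the larger variance in the numerator
if s1_sq < s2_sq:
    (s1_sq, n1), (s2_sq, n2) = (s2_sq, n2), (s1_sq, n1)

F = s1_sq / s2_sq
print(round(F, 3), "d.f. numerator =", n1 - 1, "d.f. denominator =", n2 - 1)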
Analysis of Variance
F = MSB/MSW (mean square between groups / mean square within groups); a right-tailed test