Chapter 1: Statistics: The Art and Science of Learning from Data

Design, Description, Inference

Parameter – numerical summary of a population

Statistic – numerical summary of a sample

                 

 

Chapter 2: Exploring Data with Graphs and Numerical Summaries

Categorical Variables:

Summarize with Counts and Percentages

Graphs – Bar Charts and Pie Charts

Quantitative Variables:

Graphs – Dotplots, Histograms, Stemplots, Boxplots

Measures of Center – Mean, Median, Mode

Measures of Spread – Range, IQR, Variance, Standard Deviation

Choosing the best measures of center and spread for the shape of a distribution (skewed: median and IQR; symmetric: mean and standard deviation)

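How outliers affect measures of center and spread (the mean, range, and standard deviation are sensitive to outliers; the median and IQR are resistant)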
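A minimal Python sketch of these measures using the standard `statistics` module. The data values are hypothetical, and `statistics.variance`/`statistics.stdev` compute the sample versions (dividing by n - 1). Adding one large outlier shows which measures move.

```python
import statistics

def summarize(data):
    """Common measures of center and spread for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (divides by n - 1)
        "sd": statistics.stdev(data),           # sample standard deviation
    }

# Hypothetical data, then the same data with one large outlier added
data = [2, 3, 3, 5, 7]
with_outlier = data + [40]

print(summarize(data))
print(summarize(with_outlier))
```

With the outlier, the mean jumps from 4 to 10 and the range from 5 to 38, while the median only moves from 3 to 4.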

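Empirical Rule (68% – 95% – 99.7%): for bell-shaped data, about 68%, 95%, and 99.7% of the observations fall within 1, 2, and 3 standard deviations of the mean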
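A quick check of where the 68–95–99.7 figures come from, assuming an exactly normal distribution (Python's `statistics.NormalDist`, standard library since 3.8). Real bell-shaped data only follow these percentages approximately.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

# Exact normal area within k standard deviations of the mean
areas = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD: {area:.4f}")
```

This prints about 0.6827, 0.9545, and 0.9973, which round to the Empirical Rule percentages.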

 

Quartiles and Percentiles

Five-Number Summary: minimum, Q1, median, Q3, maximum

z-score: z = (x - x̄)/s, the number of standard deviations an observation falls from the mean
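A small sketch of the five-number summary and z-score. It assumes the "median of each half" rule for quartiles (the middle value is left out of both halves when n is odd); other software may use slightly different quartile rules and give slightly different Q1 and Q3. The data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """(min, Q1, median, Q3, max), with Q1 and Q3 as medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                # values below the median
    upper = xs[half + (n % 2):]      # values above the median (skip middle value if n is odd)
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

def z_score(x, data):
    """Number of sample standard deviations that x lies from the sample mean."""
    return (x - statistics.mean(data)) / statistics.stdev(data)

data = [2, 3, 3, 5, 7, 8, 10, 12, 15]
print(five_number_summary(data))
print(round(z_score(15, data), 2))
```

For these nine values the summary is (2, 3, 7, 11, 15), so IQR = Q3 - Q1 = 8.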

 

 

Chapter 3: Association: Contingency, Correlation, and Regression

Contingency Tables: Conditional Proportions

Correlation (r): 

Measures the strength and direction of the linear association between two quantitative variables

positive, negative

strong, weak

number between -1 and +1, no units
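A minimal sketch of computing r as the average product of z-scores, r = (1/(n - 1)) Σ z_x·z_y, using sample means and standard deviations. The x and y values are hypothetical.

```python
import statistics

def correlation(xs, ys):
    """Pearson correlation r = (1/(n - 1)) * sum(z_x * z_y)."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    # Multiply each pair of z-scores, then average using n - 1
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 3))
```

Because z-scores have no units, r has no units, and rescaling x or y (for example inches to centimeters) leaves it unchanged.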

Regression:

Equation to predict y from x

x=explanatory (or predictor) variable

y=response variable

Regression Equation:  ŷ = a + bx   (ŷ = predicted y, a = y-intercept, b = slope)

      slope – average change in y for a one-unit change in x

  y-intercept – predicted value of y when x = 0, BUT we only interpret it if x = 0 makes sense and is close to the x values observed in the data

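Find the equation using the data summaries:  b = r(sy/sx) and a = ȳ - b·x̄, where x̄, ȳ are the means and sx, sy the standard deviations of x and y

Use the line to make predictions: substitute a value of x to get ŷ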
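A minimal sketch of building the least squares line from summary statistics and using it to predict. The function names and the summary values are hypothetical.

```python
def regression_from_summaries(x_bar, y_bar, s_x, s_y, r):
    """Least squares line y-hat = a + b*x from means, standard deviations, and r."""
    b = r * s_y / s_x          # slope
    a = y_bar - b * x_bar      # y-intercept: the line passes through (x_bar, y_bar)
    return a, b

def predict(a, b, x):
    """Predicted y for a given x."""
    return a + b * x

# Hypothetical summaries
a, b = regression_from_summaries(x_bar=10, y_bar=50, s_x=2, s_y=8, r=0.5)
print(a, b)               # 30.0 2.0
print(predict(a, b, 12))  # 54.0
```

Predictions are only trustworthy for x values within the range of the data used to get the summaries.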

            Residual = observed y – predicted y = y – ŷ     (prediction error)

Least Squares Method: finds the line that minimizes the sum of squared residuals

R² = r²:  proportion of the variability in y that is explained by the regression on x
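A sketch that fits the least squares line directly from raw (hypothetical) data, computes the residuals, and checks that R² = 1 - (sum of squared residuals)/(total sum of squares) matches r².

```python
import statistics

def fit_least_squares(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_least_squares(x, y)

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observed y - predicted y
sse = sum(e ** 2 for e in residuals)                    # what least squares minimizes
sst = sum((yi - statistics.mean(y)) ** 2 for yi in y)   # total variability in y
r_squared = 1 - sse / sst
print(round(a, 2), round(b, 2), round(r_squared, 3))
```

For these data r ≈ 0.775 and R² comes out to 0.6, which equals r². The residuals of a least squares line always sum to zero.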

Cautions:

            Extrapolation

            Influential Outlier

            Correlation (or Association) does not imply Causation

Simpson’s Paradox – a lurking variable can reverse the direction of the association between two categorical variables in a contingency table
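A small illustration of Simpson's Paradox using made-up counts (not real data). Treatment A has the higher success rate within each severity level, yet the lower rate overall, because A was given mostly to severe cases; severity is the lurking variable.

```python
# Hypothetical counts for illustration: (successes, patients) by treatment and severity
counts = {
    "A": {"mild": (18, 20), "severe": (32, 80)},
    "B": {"mild": (64, 80), "severe": (6, 20)},
}

results = {}
for treatment, groups in counts.items():
    # Conditional proportion of successes within each severity level
    by_severity = {level: s / n for level, (s, n) in groups.items()}
    # Overall proportion, collapsing over (ignoring) severity
    total_s = sum(s for s, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    results[treatment] = (by_severity, total_s / total_n)
    print(treatment, by_severity, round(total_s / total_n, 2))
```

Within each level A beats B (0.90 vs 0.80, 0.40 vs 0.30), but overall A loses (0.50 vs 0.70).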

 

 

Chapter 4: Gathering Data

Experiments vs Observational Studies

Simple Random Sample
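A minimal sketch of drawing a simple random sample, in which every group of n subjects is equally likely to be chosen. It uses Python's `random.Random.sample`; the sampling frame of student labels is hypothetical, and the seed only makes the draw reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Select n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

students = [f"student_{i}" for i in range(1, 101)]  # hypothetical sampling frame
sample = simple_random_sample(students, 10, seed=1)
print(sample)
```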

 

Surveys:

Margin of Error: roughly 1/√n (as a proportion) for a simple random sample of size n

Sampling Bias: Undercoverage, Volunteer Samples, Convenience Samples

Nonresponse Bias

Response Bias
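A quick sketch of the rough margin-of-error approximation 1/√n for a sample proportion from a simple random sample. It only describes random sampling variability; it does not account for sampling, nonresponse, or response bias.

```python
import math

def approx_margin_of_error(n):
    """Rough margin of error (as a proportion) for an SRS of size n."""
    return 1 / math.sqrt(n)

for n in (100, 400, 1000, 2500):
    print(n, f"{approx_margin_of_error(n):.1%}")
```

Quadrupling the sample size (100 to 400) only halves the margin of error (10% to 5%).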

Experiments:

Control:  Placebos, Blind Study, Lurking Variables, Matched Pairs (Blocks)

Randomization

Replication
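A minimal sketch of randomization in an experiment: shuffle the experimental units, then deal them into treatment groups of (nearly) equal size. The function name, subject labels, and group names are hypothetical.

```python
import random

def randomize(units, treatments, seed=None):
    """Randomly assign units to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)                     # random order removes assignment bias
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)  # deal in turn
    return groups

subjects = [f"subject_{i}" for i in range(1, 21)]  # hypothetical experimental units
groups = randomize(subjects, ["treatment", "placebo"], seed=42)
print(groups)
```

Using several units per group is the replication part of the design.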

 

Experimental Units

Response Variable

Factors

Treatments

Observational Studies:

Cross-sectional Studies

Retrospective Studies

Prospective Studies

 

 

Chapter 5: Probability in Our Daily Lives

Randomness

Probability: the proportion of times an outcome would occur in a very long sequence of observations

Independent Trials
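A small simulation of probability as a long-run proportion: independent trials, each a success with probability p, with the cumulative proportion of successes tracked. Here p = 0.5 (a fair coin) and the seed are assumptions for illustration.

```python
import random

def running_proportion(n_trials, p=0.5, seed=None):
    """Simulate independent trials; return the cumulative proportion of successes."""
    rng = random.Random(seed)
    successes = 0
    proportions = []
    for i in range(1, n_trials + 1):
        if rng.random() < p:   # each trial is a success with probability p
            successes += 1
        proportions.append(successes / i)
    return proportions

props = running_proportion(10_000, p=0.5, seed=0)
for n in (10, 100, 1_000, 10_000):
    print(n, round(props[n - 1], 3))
```

Early proportions can be far from 0.5; they settle down only as the number of trials grows.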

Sample Space

Complement of an Event:  P(Aᶜ) = 1 - P(A)

Disjoint Events A and B:   P(A or B) = P(A) + P(B)

Conditional Probability:     P(A | B) = P(A and B) / P(B)

Independent Events A and B: 

                     Definition:   P(A | B) = P(A)

Multiplication Rule (independent events):  P(A and B) = P(A) × P(B)

P(at least one) = 1 - P(none)
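A one-line sketch of the complement trick for independent trials: P(at least one success in n trials) = 1 - (1 - p)ⁿ. The dice example is only for illustration.

```python
def prob_at_least_one(p, n):
    """P(at least one success in n independent trials) = 1 - P(no successes)."""
    return 1 - (1 - p) ** n

# Chance of at least one six in 4 rolls of a fair die
print(round(prob_at_least_one(1 / 6, 4), 4))
```

Exactly, this is 1 - (5/6)⁴ = 671/1296 ≈ 0.518.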

Problems involving sensitivity = P(positive test | disease) and specificity = P(negative test | no disease)
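A sketch of the typical sensitivity/specificity question: given a positive test, what is P(disease | positive)? It applies P(A | B) = P(A and B) / P(B). The prevalence, sensitivity, and specificity values are hypothetical.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test) from the definition of conditional probability."""
    true_pos = prevalence * sensitivity                # P(disease and positive)
    false_pos = (1 - prevalence) * (1 - specificity)   # P(no disease and positive)
    return true_pos / (true_pos + false_pos)           # divide by P(positive)

# Hypothetical values: 1% prevalence, 90% sensitivity, 95% specificity
print(round(positive_predictive_value(0.01, 0.90, 0.95), 3))
```

Here P(disease | positive) = 0.009 / 0.0585 ≈ 0.154. When a condition is rare, most positives are false positives even for an accurate test.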

 

Chapter 6: Probability Distributions

 

Discrete Random Variable:

Possible values are separate (countable) values, such as 0, 1, 2, ...

Probability Distribution:  list, graph, or formula giving all possible values of X and their probabilities (each between 0 and 1, summing to 1)

Population Mean (expected value):  μ = Σ x·P(x), summing over all possible values x
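A minimal sketch of μ = Σ x·P(x) for a discrete distribution stored as a dictionary from values to probabilities. `Fraction` keeps the arithmetic exact; the fair-die distribution is just an example.

```python
from fractions import Fraction

def distribution_mean(dist):
    """mu = sum of x * P(x) over all possible values x."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())

# Fair six-sided die: each face has probability 1/6
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(distribution_mean(die))  # 7/2
```

The mean 3.5 is not itself a possible roll; it is the long-run average of many rolls.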

 

Continuous Random Variables:

            Possible values form an interval (infinitely many possible values)

            Probabilities are areas under a density curve (smooth) with a total area of 1

Assign probabilities to intervals, not individual values of X

 

 

Normal Probability Distributions:

Symmetric, bell-shaped curves, indexed by their mean μ and standard deviation σ

Follows Empirical Rule

z-score:  z = (x - μ)/σ

Empirical Rule

Using the Z table

area to the left, to the right, in between

finding the value of x for the top 5%, bottom 20%, central 50%, etc.
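The same table look-ups can be sketched with Python's `statistics.NormalDist`: `cdf` gives the area to the left of a value and `inv_cdf` gives the value with a given area to its left. The distribution (mean 100, standard deviation 15) is hypothetical.

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=15)  # hypothetical normal distribution

left = X.cdf(120)                  # P(X < 120): area to the left
right = 1 - X.cdf(120)             # P(X > 120): area to the right
between = X.cdf(120) - X.cdf(85)   # P(85 < X < 120): area in between

top_5 = X.inv_cdf(0.95)            # value with 5% of the area above it
bottom_20 = X.inv_cdf(0.20)        # value with 20% of the area below it
middle_50 = (X.inv_cdf(0.25), X.inv_cdf(0.75))  # bounds of the central 50%

print(round(left, 4), round(right, 4), round(between, 4))
print(round(top_5, 1), round(bottom_20, 1), [round(v, 1) for v in middle_50])
```

By hand, the top 5% cutoff is x = μ + zσ with z ≈ 1.645, giving about 124.7.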

 

 

Binomial Distribution:

Each of n trials can have two possible outcomes:  success or failure

The probability of success, p, is the same for each trial, and the trials are independent

Binomial Random Variable X counts the number of successes

Mean:  μ = np      Standard Deviation:  σ = √(np(1 - p))

Binomial Formula:  P(x) = n!/[x!(n - x)!] · p^x (1 - p)^(n - x),  for x = 0, 1, 2, ..., n
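A minimal sketch of the binomial formula using `math.comb` (Python 3.8+) for n!/[x!(n - x)!], together with the mean np and standard deviation √(np(1 - p)). The values n = 10 and p = 0.3 are hypothetical.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = n!/(x!(n - x)!) * p**x * (1 - p)**(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3  # hypothetical: 10 trials, success probability 0.3
dist = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

mean = n * p                     # mu = np
sd = math.sqrt(n * p * (1 - p))  # sigma = sqrt(np(1 - p))

print(round(dist[3], 4), mean, round(sd, 3))
```

As a check, Σ x·P(x) over this distribution equals np = 3, matching the shortcut formula for the mean.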