Midterm Coverage: Textbook 
  - All of Chapter 1 (except normal quantile plots on pp. 65-68)
  - Sections 2.1-2.4
  - Sections 3.1-3.2
  - Sections 4.1-4.5
  - Sections 5.1-5.2

Midterm Study Guide 

1.1  Graphical Displays of Data
  - Two types of variables: categorical and quantitative
  - Graphs
    # Pie charts
      + area represents percentage
      + percentages must add up to 1 or 100%
    # Bar graphs
      + percentages need not add up to 1 or 100%
    # Histograms 
      + It's the area that represents percentage
      + symmetric, right-skewed, left-skewed, number of modes
      + position of mean and median
      + outlier 
    # Stemplots
      + how to make a stemplot?
      + back-to-back stemplot?
    # Boxplot
    # Time plots
  - When to use which graph?
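A stemplot can be built mechanically from the data. The sketch below assumes non-negative integer data and uses the last digit as the leaf and the remaining digits as the stem (v // 10 and v % 10); other choices, such as rounding first or splitting stems, are also common. The function name `stemplot` is hypothetical.

```python
from collections import defaultdict


def stemplot(values):
    """Return the rows of a stemplot for non-negative integers.

    Assumption: stem = all digits except the last (v // 10),
    leaf = last digit (v % 10).
    """
    leaves = defaultdict(list)
    for v in sorted(values):  # sorting first puts each row's leaves in order
        leaves[v // 10].append(v % 10)
    rows = []
    # List every stem from smallest to largest, including stems with no
    # leaves, so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_str = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>3} | {leaf_str}")
    return rows


if __name__ == "__main__":
    for row in stemplot([12, 15, 21, 24, 24, 38, 41, 43, 47, 55]):
        print(row)
```

For a back-to-back stemplot, both data sets share the same column of stems, and the second data set's leaves are written to the left of the stems.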

1.2 Numerical Descriptions of Data 
  - Mean vs. median (the median is resistant to outliers and skewness; the mean is not)
  - Five number summary
  - IQR
  - 1.5 IQR rule
  - Boxplot, modified boxplot
  - When to use which numerical summary? 
     # If unimodal, symmetric, no outliers, use the mean and SD
     # If unimodal, skewed distribution (w/ or w/o outlier), use 5-number summary, Boxplots
     # If multimodal (i.e. clustered), use histograms or stemplots
  - Effects of a linear transformation x_new = a + b*x: mean_new = a + b*mean, SD_new = |b|*SD
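The five-number summary, the IQR, the 1.5 IQR rule, and the effect of a linear transformation can all be checked numerically. This is a minimal sketch that assumes the common textbook convention for quartiles (Q1 and Q3 are the medians of the lower and upper halves, with the overall median excluded when n is odd); software, including Python's `statistics.quantiles`, may use other conventions and give slightly different quartiles. The function names are hypothetical.

```python
import math
import statistics


def five_number_summary(data):
    """(min, Q1, median, Q3, max); needs at least 2 observations.

    Assumed convention: Q1/Q3 are medians of the lower/upper halves,
    leaving out the overall median when n is odd.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


def iqr_outliers(data):
    """1.5 IQR rule: flag points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
    print(five_number_summary(data))  # (1, 2.5, 5, 7.5, 100)
    print(iqr_outliers(data))         # [100]

    # Linear transformation x_new = a + b*x; b < 0 shows why SD uses |b|.
    a, b = 10, -2
    new = [a + b * x for x in data]
    assert math.isclose(statistics.mean(new), a + b * statistics.mean(data))
    assert math.isclose(statistics.stdev(new), abs(b) * statistics.stdev(data))
```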

1.3 Normal Distributions
  - 68-95-99.7% Rule
  - Using the standard normal table and normal calculations
  - Inverse normal calculations
  - Skip Normal quantile plots (p.65-68)
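Normal calculations can be checked with Python's `statistics.NormalDist` (Python 3.8+), which plays the role of the standard normal table in this sketch. The N(100, 15) distribution is a hypothetical example, and because table lookups round z to two decimals, hand answers may differ in the last digit.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1


def within_k_sd(k):
    """Area within k SDs of the mean; the same for every normal distribution."""
    return Z.cdf(k) - Z.cdf(-k)


def prob_below(x, mu, sigma):
    """Normal calculation: P(X < x) for X ~ N(mu, sigma)."""
    z = (x - mu) / sigma  # standardize
    return Z.cdf(z)       # equivalent to looking up z in the table


def value_below(p, mu, sigma):
    """Inverse normal calculation: the x with P(X < x) = p."""
    z = Z.inv_cdf(p)       # find z first (the table read in reverse)
    return mu + z * sigma  # then unstandardize


if __name__ == "__main__":
    for k in (1, 2, 3):  # the 68-95-99.7 rule
        print(k, round(within_k_sd(k), 4))
    print(round(prob_below(130, 100, 15), 4))    # z = 2.00
    print(round(value_below(0.90, 100, 15), 2))  # 90th percentile
```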

2.1 Scatter Plot
  - How to read information for one variable in a scatter plot
  - Form, direction, strength of a relationship
  - Are there outliers or clusters?
  - Points in different categories can be marked with different colors or symbols.
  - Use side-by-side boxplots to display the relationship between one quantitative variable and one categorical variable

2.2 Correlation r
  - r does not distinguish between x and y
  - r ranges from -1 to +1. When is r exactly -1 or +1?
  - r has no units
  - Shifting x or y (adding a constant) has no effect on r
  - Rescaling x or y has no effect on the magnitude of r; multiplying by a negative constant flips its sign
  - When is it not appropriate to use r to describe the strength of relationships?
     # nonlinear, outlier, or clusters
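The properties of r listed above can be verified on a small data set. This sketch computes r as the sum of products of z-scores divided by n - 1; the data values and the function name `correlation` are hypothetical.

```python
import math
import statistics


def correlation(x, y):
    """Pearson r: sum of products of z-scores, divided by n - 1."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    r = correlation(x, y)
    # r does not distinguish between x and y
    assert math.isclose(correlation(y, x), r)
    # Shifting x has no effect on r
    assert math.isclose(correlation([xi + 100 for xi in x], y), r)
    # Scaling by a negative constant keeps |r| but flips the sign
    assert math.isclose(correlation([-3 * xi for xi in x], y), -r)
    # Points exactly on a line with positive slope give r = 1
    assert math.isclose(correlation(x, [2 * xi + 1 for xi in x]), 1.0)
    print(round(r, 4))
```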

2.3-2.4 Regression
  - Equation of the regression line: slope b1 = r * sy/sx, intercept b0 = y-bar - b1 * x-bar
  - Use the regression line to predict the response, numerically and graphically
  - Residual = observed y - predicted y 
                   = vertical (signed) distance from a point to the regression line
  - The regression line always passes through the point of means (x-bar, y-bar)
  - There are two regression lines (y on x and x on y); the roles of the response and explanatory variables are not interchangeable
  - Reading R output from lm(y ~ x)
  - Residuals always sum to zero
  - Residuals have zero correlation with the explanatory variable
  - Residuals have zero correlation with the predicted responses y-hat
  - The SD of the residuals is sqrt(1 - r^2) times the SD of the response
  - The mean of the predicted responses is the same as the mean of the response
  - The SD of the predicted responses is |r| times the SD of the response
  - r^2 is the fraction of variation in the response explained by the explanatory variable 
  - Residual plot
  - Good residual plot: evenly spread around the zero line
  - Bad residual plots: a curved (nonlinear) pattern, or heteroscedasticity (unequal spread around the zero line); what does each imply?
  - identification of outliers and influential observations
  - Correlation and regression do not imply causation
  - Skip log transformation of variables (Example 2.13 on p.90)
  - Skip section 2.5
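The regression facts above can be verified on a small made-up data set. This sketch computes the least-squares line from b1 = r * sy/sx and b0 = y-bar - b1 * x-bar (using n - 1 in every SD) and then asserts each listed property. The data and the name `fit_line` are hypothetical; in R, `coef(lm(y ~ x))` should report the same intercept and slope.

```python
import math
import statistics


def fit_line(x, y):
    """Least-squares line y_hat = b0 + b1*x via b1 = r*sy/sx, b0 = ybar - b1*xbar."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / ((len(x) - 1) * sx * sy)
    b1 = r * sy / sx
    b0 = my - b1 * mx
    return b0, b1, r


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    b0, b1, r = fit_line(x, y)
    y_hat = [b0 + b1 * xi for xi in x]
    resid = [yi - yh for yi, yh in zip(y, y_hat)]  # observed y - predicted y
    mx, my, sy = statistics.mean(x), statistics.mean(y), statistics.stdev(y)

    # The line passes through the point of means (x-bar, y-bar)
    assert math.isclose(b0 + b1 * mx, my)
    # Residuals sum to zero and have zero covariance (so zero correlation) with x
    assert math.isclose(sum(resid), 0.0, abs_tol=1e-9)
    assert math.isclose(sum((xi - mx) * e for xi, e in zip(x, resid)), 0.0, abs_tol=1e-9)
    # Mean of y_hat equals mean of y; SD of y_hat is |r| times SD of y
    assert math.isclose(statistics.mean(y_hat), my)
    assert math.isclose(statistics.stdev(y_hat), abs(r) * sy)
    # SD of residuals is sqrt(1 - r^2) times SD of y
    assert math.isclose(statistics.stdev(resid), math.sqrt(1 - r ** 2) * sy)
    print(f"y_hat = {b0:.2f} + {b1:.2f} x,  r^2 = {r ** 2:.2f}")
```

R's "Residual standard error" divides the sum of squared residuals by n - 2, so it is slightly larger than the SD of the residuals computed here with n - 1.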

3.0-3.1 Observational studies and Experiments
  - difference between an observational study and an experiment
  - What is a confounding variable?
  - Given a study or an experiment, identify possible confounding variables
  - completely randomized design
  - single-blind, double-blind, placebo
  - Randomized Block Design
  - Matched Pair Design
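Random assignment is what separates these designs from one another. A minimal sketch using Python's `random` module: subject labels, treatment names, and block names are hypothetical, and equal group sizes (dealing shuffled subjects round-robin) are an assumption rather than a requirement of the designs.

```python
import random


def _assign(subjects, treatments, rng):
    """Shuffle subjects, then deal them round-robin into treatment groups.

    `treatments` must be a list (it is indexed).
    """
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


def completely_randomized(subjects, treatments, seed=None):
    """All subjects are randomly allocated among all treatments at once."""
    return _assign(subjects, treatments, random.Random(seed))


def randomized_block(blocks, treatments, seed=None):
    """Random assignment is carried out separately within each block."""
    rng = random.Random(seed)
    return {name: _assign(members, treatments, rng) for name, members in blocks.items()}


def matched_pairs(pairs, seed=None):
    """For each pair, a coin flip decides which member gets the treatment.

    Returns (treated, control) tuples.
    """
    rng = random.Random(seed)
    return [(a, b) if rng.random() < 0.5 else (b, a) for a, b in pairs]


if __name__ == "__main__":
    print(completely_randomized(range(1, 11), ["placebo", "drug"], seed=1))
    print(randomized_block({"men": ["M1", "M2", "M3", "M4"],
                            "women": ["W1", "W2", "W3", "W4"]},
                           ["placebo", "drug"], seed=2))
    print(matched_pairs([("twin1a", "twin1b"), ("twin2a", "twin2b")], seed=3))
```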

3.2 Sampling Design
  - 4 Keywords: Population, sample, parameter, statistic
  - Bad sampling methods: convenience sampling, voluntary response sampling
  - Better Sampling Designs
     # Simple Random Sampling
     # Stratified Sampling
     # Cluster Sampling
     # Multistage Clustered Sampling
  - Given a description of a sampling method, classify its sampling design
  - Problems in sampling
     # Undercoverage
     # Non-response bias
     # Response bias: wording of questions, design of the questionnaire, attitude of the interviewer
Skip 3.3-3.4
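The probability sampling designs in Section 3.2 can be sketched with `random.sample`, which draws without replacement so that every subset of the requested size is equally likely. The population, strata, and cluster names are hypothetical; a multistage design would chain these steps (for example, sample clusters first, then take an SRS within each chosen cluster).

```python
import random


def simple_random_sample(population, n, seed=None):
    """SRS: every subset of n units is equally likely to be chosen."""
    return random.Random(seed).sample(list(population), n)


def stratified_sample(strata, sizes, seed=None):
    """A separate SRS within each stratum; sizes maps stratum name -> sample size."""
    rng = random.Random(seed)
    return {name: rng.sample(list(units), sizes[name]) for name, units in strata.items()}


def cluster_sample(clusters, k, seed=None):
    """Randomly choose k whole clusters and keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)
    return [unit for name in chosen for unit in clusters[name]]


if __name__ == "__main__":
    print(simple_random_sample(range(1, 101), 5, seed=0))
    print(stratified_sample({"freshmen": range(1, 51), "seniors": range(51, 81)},
                            {"freshmen": 5, "seniors": 3}, seed=0))
    print(cluster_sample({"dorm A": [1, 2, 3], "dorm B": [4, 5], "dorm C": [6, 7, 8]},
                         2, seed=0))
```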

4.1, 4.2, 4.5 Probability
  # Probability rules
  # Conditional Probability
  # General Multiplication Rule
  # Independence of Events
  # The Rule of Total Probability
  # Bayes' Rule (IMPORTANT!!!)
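Bayes' rule combines the general multiplication rule (numerator) with the rule of total probability (denominator). This sketch handles the two-event case A and its complement; the screening-test numbers are hypothetical.

```python
def total_probability(p_a, p_b_given_a, p_b_given_not_a):
    """P(B) = P(B | A) P(A) + P(B | A^c) P(A^c)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)


def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) = P(B | A) P(A) / P(B)."""
    # The numerator is P(A and B), by the general multiplication rule.
    return p_b_given_a * p_a / total_probability(p_a, p_b_given_a, p_b_given_not_a)


if __name__ == "__main__":
    # Hypothetical screening test: 1% prevalence, P(+ | disease) = 0.95,
    # P(+ | no disease) = 0.05.
    print(round(total_probability(0.01, 0.95, 0.05), 4))  # P(+) = 0.059
    print(round(bayes(0.01, 0.95, 0.05), 4))              # P(disease | +) = 0.161
```

Even with a fairly accurate test, a positive result here means only about a 16% chance of disease, because the condition is rare.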

4.3-4.4 Random Variables
  - Based on the description of a problem, find the distribution of a (discrete) random variable
  - Mean 
  - Variance (2 formulas: Var(X) = E[(X - mu)^2] = E(X^2) - mu^2)
  - Properties of Mean and Variance
     # E(X + c) = E(X) + c, E(cX) = cE(X)
     # Var(X + c) = Var(X)      
     # Var(cX) = c^2 Var(X),  SD(cX) = |c| SD(X)
     # E(X + Y) = E(X) + E(Y) (always valid)
     # Var(X + Y) = Var(X) + Var(Y) when X and Y are independent
   - Sums and Means of i.i.d. Random Variables
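For a discrete random variable given as a table of values and probabilities, both variance formulas and the rules for sums can be checked directly. This sketch represents a distribution as a dict mapping each value to its probability (an assumption of the example); the coin-toss distribution is illustrative.

```python
import math


def mean_and_variance(dist):
    """dist maps each value x of a discrete X to P(X = x).

    Returns (mean, variance by definition, variance by shortcut formula).
    """
    assert math.isclose(sum(dist.values()), 1.0), "probabilities must sum to 1"
    mu = sum(x * p for x, p in dist.items())
    var_def = sum((x - mu) ** 2 * p for x, p in dist.items())       # E[(X - mu)^2]
    var_short = sum(x ** 2 * p for x, p in dist.items()) - mu ** 2  # E(X^2) - mu^2
    return mu, var_def, var_short


def add_independent(dist_x, dist_y):
    """Distribution of X + Y when X and Y are independent."""
    out = {}
    for x, px in dist_x.items():
        for y, py in dist_y.items():
            # Independence: P(X = x and Y = y) = P(X = x) P(Y = y)
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out


if __name__ == "__main__":
    heads = {0: 0.25, 1: 0.5, 2: 0.25}  # X = number of heads in 2 fair coin tosses
    print(mean_and_variance(heads))                          # (1.0, 0.5, 0.5)
    # Sum of two i.i.d. copies: mean 2*mu, variance 2*sigma^2
    print(mean_and_variance(add_independent(heads, heads)))  # (2.0, 1.0, 1.0)
    # Var(cX) = c^2 Var(X) with c = 3
    print(mean_and_variance({3 * x: p for x, p in heads.items()}))  # (3.0, 4.5, 4.5)
```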
 
5.1 The Sampling Distribution for a Sample Mean
   - Statistical Model of Simple Random Sampling
      # Observations are nearly i.i.d. when the population is much larger than the sample
   - CLT
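A quick simulation shows the CLT at work: even when the population is skewed, the sample means center on mu with SD close to sigma/sqrt(n). This sketch assumes an exponential population with mean 1 and SD 1 (a hypothetical, right-skewed choice), samples of size n = 30, and 5000 repetitions.

```python
import random
import statistics


def simulate_sample_means(n, reps, seed=None):
    """Means of `reps` samples of size n from a right-skewed population.

    Assumed population: exponential with mean 1 and SD 1.
    """
    rng = random.Random(seed)
    return [statistics.mean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]


if __name__ == "__main__":
    n = 30
    means = simulate_sample_means(n, reps=5000, seed=42)
    # CLT: x-bar is approximately N(mu, sigma / sqrt(n)) = N(1, 1/sqrt(30)) here
    print("mean of x-bars:", round(statistics.mean(means), 3))
    print("SD of x-bars:  ", round(statistics.stdev(means), 3),
          "vs sigma/sqrt(n) =", round(1 / n ** 0.5, 3))
```

A histogram of `means` would look close to a normal curve, while a histogram of the raw exponential draws would be strongly right-skewed.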

5.2 Sampling Distributions for Counts and Proportions
   - Binomial formula
   - When is it all right to use the binomial formula?
   - CLT
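The binomial formula applies in the binomial setting: a fixed number n of independent trials, each a success or failure with the same success probability p. This sketch computes exact probabilities with `math.comb` (Python 3.8+) and compares one of them with the normal approximation from the CLT, using mean np and SD sqrt(np(1-p)) for the count. The numbers are hypothetical; a common rule of thumb for the approximation is np >= 10 and n(1-p) >= 10, but check the textbook's exact condition.

```python
import math
from statistics import NormalDist


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k), obtained by summing the binomial formula."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1))


if __name__ == "__main__":
    print(binom_pmf(5, 10, 0.5))  # 252/1024 = 0.24609375
    # Count X: mean np, SD sqrt(np(1-p)).
    # Proportion p-hat = X/n: mean p, SD sqrt(p(1-p)/n).
    n, p = 100, 0.5
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    exact = binom_cdf(55, n, p)
    approx = NormalDist(mu, sd).cdf(55)  # normal approximation
    print(round(exact, 4), round(approx, 4))
```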