Statistics 343

Applied Regression Methods, Autumn 1995

Instructor:	Ronald Thisted
Office:	Eckhart 126
Phone:	702-8332 (voice mail)
email:	r-thisted@uchicago.edu

Prerequisites

  1. Mathematical analysis (advanced calculus)
  2. Introduction to statistical theory (Stat 244-245)
  3. Matrix linear algebra (Math 250)

Required Textbooks

Venables, W. N., and Ripley, B. D. (1994). Modern Applied Statistics with S-Plus. New York: Springer-Verlag. [Ordered at Seminary Coop Bookstore]

Weisberg, Sanford (1985). Applied Linear Regression, Second Edition. New York: Wiley. Abbreviation: ALR. [Ordered at Seminary Coop Bookstore]

Additional Resources

Becker, Richard A., Chambers, John M., and Wilks, Allan R. (1988). The New S Language. Pacific Grove: Wadsworth & Brooks/Cole. Abbreviation: NSL.

Chambers, John M., and Hastie, Trevor J., eds. (1992). Statistical Models in S. Pacific Grove: Wadsworth & Brooks/Cole. Abbreviation: SMS.

McCullagh, Peter, and Nelder, John (1989). Generalized Linear Models, Second Edition. London: Chapman & Hall.

Mosteller, Frederick and Tukey, John W. (1977). Data Analysis and Regression: A Second Course in Statistics. Reading: Addison-Wesley.

Rao, C. R. (1973). Linear Statistical Inference and its Applications, Second Edition. New York: Wiley.

Thisted, Ronald A. (1988). Elements of Statistical Computing: Numerical Computation. New York: Chapman & Hall.

Computational Information

1. We shall use S-Plus for most purposes in this course. The program is available on the Statistics Department computers and on the Sun Cluster. If you are not a member of the Statistics Department, use the Sun Cluster for access to S-Plus and to the data sets used in class. Important: If you are not using the Statistics Department computers, you are responsible for obtaining an account and for learning the computing environment on the Sun Cluster or on whatever other computer system you use.

2. All of the data sets from Weisberg's book, Applied Linear Regression, are available in the directory /ga/thisted/343 on galton. The file names are of the form ALRnnn, where nnn denotes the three-digit page number on which the data set appears. Note the UPPER CASE letters in the file names. These data are also available on the Sun Cluster in the directory /nfs/quads/q2/rats/343.

3. If you will be working on the Sun Cluster instead of galton, read the document entitled "Using S+ on the Sun Cluster."
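
The file-naming convention in item 2 can be sketched in a few lines. The Python fragment below is only an illustration and is not part of the course software: the function name alr_path is hypothetical, and the default directory is simply the galton directory quoted in item 2. It builds the path of the data set printed on a given page of ALR.

```python
def alr_path(page, directory="/ga/thisted/343"):
    """Return the path of the ALR data set printed on the given page.

    Hypothetical helper: file names have the form ALRnnn, where nnn is
    the page number zero-padded to three digits and the letters are in
    upper case.
    """
    if not 0 <= page <= 999:
        raise ValueError("page must fit in three digits")
    return f"{directory}/ALR{page:03d}"


print(alr_path(28))  # /ga/thisted/343/ALR028
```

On the Sun Cluster, the directory named in item 2 would be passed as the second argument instead of the default.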

Example. Here are some computations in S to help you get started. The data are from exercise 1.2 in ALR. The statement "S UCINIT" should be run once and only once in each directory in which you plan to use S. It sets up a directory called .Data and disables the hidden file .Audit, which would otherwise grow without bound.

galton% mkdir Stat343
galton% cd Stat343
galton%  S UCINIT	      # This sets up your Stat343 directory (one time only)
/ga/thisted/Stat343/.Data has been created and new S can be run in
... directory /ga/thisted/Stat343
/ga/thisted/Stat343/.Data/.Audit is now null and locked against use.
.. If you need Audit, see S notes on Audit.
galton% Splus -e         # On the Sun Cluster, omit "-e"
Warning: Cannot open audit file	      # This is entirely normal if you have set up S correctly
> x <- read.table("/ga/thisted/343/ALR028") 
> # On the Sun Cluster, the file name should be "/nfs/quads/q2/~thisted/343/ALR028"
> x
      V1     V2
 1 210.8 29.211
 2 210.2 28.559
 3 208.4 27.972

  ... etc ...

30 181.0 15.919
31 180.6 15.376
> reg.out <- lm(x[,2] ~ x[,1])
> summary(reg.out)

Call: lm(formula = x[, 2] ~ x[, 1])
Residuals:
     Min      1Q   Median     3Q    Max
 -0.6138 -0.2497 -0.09921 0.2636 0.8123

Coefficients:
               Value Std. Error  t value Pr(>|t|)
(Intercept) -64.4128   1.4292   -45.0702   0.0000
     x[, 1]   0.4403   0.0074    59.1431   0.0000

Residual standard error: 0.3563 on 29 degrees of freedom
Multiple R-Squared: 0.9918
F-statistic: 3498 on 1 and 29 degrees of freedom, the p-value is 0

Correlation of Coefficients:
       (Intercept)
x[, 1] -0.999
> reg.out
Call:
lm(formula = x[, 2] ~ x[, 1])

Coefficients:
 (Intercept)    x[, 1]
   -64.41275 0.4402819

Degrees of freedom: 31 total; 29 residual
Residual standard error: 0.356344
> q()
galton%
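
For readers who want to see what lm computes when there is a single predictor, the sketch below works through the closed-form least-squares estimates in Python rather than S. It is an illustration under stated assumptions: the function name simple_ols is hypothetical, and the data are only the five rows printed in the listing above (rows 1 to 3, 30, and 31), so the estimates will not reproduce the 31-row fit. The residual standard error is computed on n - 2 degrees of freedom, which corresponds to the 29 degrees of freedom reported for the 31 observations.

```python
from math import sqrt


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x for one predictor.

    Returns (intercept, slope, residual standard error on n - 2 df).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Corrected sum of squares and cross-products.
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual sum of squares around the fitted line.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sqrt(rss / (n - 2))


# Only the five rows shown in the listing (V1, V2); not the full data set.
v1 = [210.8, 210.2, 208.4, 181.0, 180.6]
v2 = [29.211, 28.559, 27.972, 15.919, 15.376]
b0, b1, s = simple_ols(v1, v2)
print(f"intercept={b0:.4f} slope={b1:.4f} s={s:.4f}")
```

With the complete data file in place of the five rows, the three values would be expected to correspond to the Value column and the residual standard error in the summary output.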
