Due noon Wednesday, 11 December 1996

The data set bodyfat.dat (in the Other Data Sets folder) contains estimates of the percentage of body fat determined by underwater weighing for 252 men along with various body circumference measurements. Accurate measurement of body fat (e.g. by underwater weighing) can be inconvenient or costly and it is desirable to have easier methods of estimating body fat, such as those based on circumference measurements that are easily obtained. The goal is to develop a predictive equation for body fat in terms of the other measurements.

The columns in the data set, from left to right, are:

- Density determined from underwater weighing
- Percent body fat from Siri's (1956) equation
- Age (years)
- Weight (lbs)
- Height (inches)
- Neck circumference (cm)
- Chest circumference (cm)
- Abdomen 2 circumference (cm)
- Hip circumference (cm)
- Thigh circumference (cm)
- Knee circumference (cm)
- Ankle circumference (cm)
- Biceps (extended) circumference (cm)
- Forearm circumference (cm)
- Wrist circumference (cm)
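
As a starting point, the column list above can be turned into a small loader. This is a minimal sketch, not part of the original documentation: it assumes bodyfat.dat is whitespace-delimited with one subject per line, 15 numeric fields, and no header row. The column names and the functions `parse_bodyfat` and `load_bodyfat` are hypothetical labels chosen for illustration.

```python
# Hypothetical column labels for this sketch; the order follows the
# left-to-right column list in the assignment.
COLUMNS = [
    "density", "percent_fat", "age", "weight_lb", "height_in",
    "neck", "chest", "abdomen", "hip", "thigh",
    "knee", "ankle", "biceps", "forearm", "wrist",
]


def parse_bodyfat(text):
    """Parse whitespace-delimited rows into a list of dicts keyed by COLUMNS.

    Assumption: one subject per line, 15 numeric fields, no header row.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {line_no}: expected {len(COLUMNS)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(COLUMNS, map(float, fields))))
    return rows


def load_bodyfat(path="bodyfat.dat"):
    """Read the data file at `path` (assumed location) and parse it."""
    with open(path) as fh:
        return parse_bodyfat(fh.read())
```

If the file turns out to have a header or a different delimiter, only `parse_bodyfat` should need adjusting.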

There are many predictor variables, some of them very highly correlated. Rather than trying to evaluate every possible combination of predictors, think hard about which variables you expect to be important and in what combinations. You might classify variables into categories such as trunk variables, arm variables, and leg variables. Alternatively, you might put wrist and ankle together, biceps and thigh together, etc. The idea is to find categories of variables that might be expected to make similar contributions to the model. If their contributions do turn out to be redundant, this could be a way of weeding out some variables. Once you have narrowed things down to a few variables, you can start comparing models. (Feel free to try transforming variables if you think it will help.) There will likely be a number of reasonable models. Summarize and interpret your results. Note any anomalies in the data and their impact. Your submission for this problem should consist of at most four typewritten pages, including any tables or figures.
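
One way to check whether the variables within a category are redundant is to look at their pairwise correlations. The sketch below is only an illustration of that idea, not a prescribed method. The grouping in `GROUPS`, the column names, and the function names are hypothetical, and `rows` is assumed to be a list of dicts with one numeric value per column name.

```python
from itertools import combinations
from math import sqrt

# Hypothetical grouping of circumference variables, for illustration only.
GROUPS = {
    "trunk": ["neck", "chest", "abdomen", "hip"],
    "arm": ["biceps", "forearm", "wrist"],
    "leg": ["thigh", "knee", "ankle"],
}


def pearson(x, y):
    """Sample Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def within_group_correlations(rows, groups=GROUPS):
    """Return {(group, col1, col2): r} for every pair of columns in a group."""
    result = {}
    for name, cols in groups.items():
        for c1, c2 in combinations(cols, 2):
            x = [row[c1] for row in rows]
            y = [row[c2] for row in rows]
            result[(name, c1, c2)] = pearson(x, y)
    return result
```

Pairs with correlations near 1 are candidates for keeping only one member, although a high correlation alone does not say which member predicts body fat better.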

A variety of popular health books suggest that readers assess their health, at least in part, by estimating their percentage of body fat. One text suggests that readers estimate body fat from tables using their age and various skin-fold measurements obtained with a caliper. Other texts give predictive equations for body fat using body circumference measurements (e.g. abdominal circumference) and/or skin-fold measurements. Percentage of body fat for an individual can be estimated once body density has been determined. It is often assumed that the body consists of two components, lean body tissue and fat tissue. Letting

D = Body Density (gm/cm^3)

A = proportion of lean body tissue

B = proportion of fat tissue (A+B=1)

a = density of lean body tissue (gm/cm^3)

b = density of fat tissue (gm/cm^3)

we have

D = 1/[(A/a) + (B/b)]

Substituting A = 1 - B gives 1/D = 1/a + B(a-b)/(ab); solving for B, we find

B = (1/D)*[ab/(a-b)] - [b/(a-b)].

Using the estimates a=1.10 gm/cm^3 and b=0.90 gm/cm^3, so that ab/(a-b) = 4.95 and b/(a-b) = 4.50, we obtain "Siri's equation":

Percentage of Body Fat (i.e. 100*B) = 495/D - 450.
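
The two-component formula and Siri's equation can be checked against each other numerically. The following is a minimal sketch using the symbols defined above. The function names are hypothetical, and the default densities are the estimates a=1.10 and b=0.90 quoted in the text.

```python
def fat_fraction(density, lean_density=1.10, fat_density=0.90):
    """Proportion of fat tissue B from the two-component model.

    Implements B = (1/D)*[ab/(a-b)] - [b/(a-b)], with densities in gm/cm^3.
    """
    a, b = lean_density, fat_density
    return (1.0 / density) * (a * b / (a - b)) - b / (a - b)


def siri_percent_fat(density):
    """Percentage of body fat from Siri's equation: 495/D - 450."""
    return 495.0 / density - 450.0
```

Under these estimates, `100 * fat_fraction(D)` and `siri_percent_fat(D)` agree up to floating-point rounding. A density equal to that of lean tissue (1.10) corresponds to 0% fat, and a density equal to that of fat tissue (0.90) corresponds to 100% fat.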

Volume, and hence body density, can be accurately measured in a variety of ways. The technique of underwater weighing "computes body volume as the difference between body weight measured in air and weight measured during water submersion. In other words, body volume is equal to the loss of weight in water with the appropriate temperature correction for the water's density". Using this technique,

Body Density = WA/[(WA-WW)/c.f. - LV]

where

WA = Weight in air (kg)

WW = Weight in water (kg)

c.f. = Water correction factor (=1 at 39.2 deg F, as one gram of water occupies exactly one cm^3 at this temperature; =.997 at 76-78 deg F)

LV = Residual Lung Volume (liters)
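
The underwater-weighing formula translates directly into a short function. This is a sketch under the definitions above, with a hypothetical function name and parameter names. Since WA and WW are in kg and LV is in liters, the result is in kg/liter, which is numerically the same as gm/cm^3.

```python
def body_density(weight_air_kg, weight_water_kg, correction_factor,
                 residual_lung_volume_l):
    """Body density from WA / [(WA - WW)/c.f. - LV].

    The bracketed term is the body volume in liters after subtracting
    residual lung volume; it must be positive for the result to make sense.
    """
    volume_l = ((weight_air_kg - weight_water_kg) / correction_factor
                - residual_lung_volume_l)
    if volume_l <= 0:
        raise ValueError("computed body volume is not positive")
    return weight_air_kg / volume_l
```

The correction factor is passed explicitly rather than given a default, because it depends on the water temperature at the time of measurement.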

This data set and documentation were obtained from StatLib, submitted by Roger Johnson of Carleton College, and originally supplied by Dr. A. Garth Fisher. More complete references to the literature discussed above are cited there (http://www.stat.cmu.edu/datasets/).