Statistics 227
Final Assignment Problem 1
Due noon Wednesday, 11 December 1996
Essential information
The data set bodyfat.dat (in the Other Data Sets folder) contains estimates
of the percentage of body fat determined by underwater weighing for 252 men,
along with various body circumference measurements. Accurate measurement of
body fat (e.g. by underwater weighing) can be inconvenient or costly, and it
is desirable to have easier methods of estimating body fat, such as those
based on circumference measurements that are easily obtained. The goal is to
develop a predictive equation for body fat in terms of the other measurements.
The columns in the data set, from left to right, are:
- Density determined from underwater weighing
- Percent body fat from Siri's (1956) equation
- Age (years)
- Weight (lbs)
- Height (inches)
- Neck circumference (cm)
- Chest circumference (cm)
- Abdomen 2 circumference (cm)
- Hip circumference (cm)
- Thigh circumference (cm)
- Knee circumference (cm)
- Ankle circumference (cm)
- Biceps (extended) circumference (cm)
- Forearm circumference (cm)
- Wrist circumference (cm)
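As a minimal sketch of reading the file, the code below assumes bodyfat.dat is plain text with one whitespace-separated row per man and the fifteen columns in the order listed above. The file layout, the short column names, and the sample row are all assumptions made for illustration; they are not taken from the data set itself.

```python
# Short column names are hypothetical labels chosen for this sketch;
# their order follows the column list above.
COLUMNS = [
    "density", "pct_fat", "age", "weight", "height",
    "neck", "chest", "abdomen", "hip", "thigh",
    "knee", "ankle", "biceps", "forearm", "wrist",
]


def parse_row(line):
    """Turn one whitespace-separated line into a {column: float} dict."""
    values = [float(field) for field in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


def load_bodyfat(path):
    """Read every non-blank line of the file at `path` into a list of dicts."""
    with open(path) as handle:
        return [parse_row(line) for line in handle if line.strip()]


# Made-up row for illustration only (not a record from bodyfat.dat).
example = parse_row("1.05 21.4 40 180 70 38 100 92 99 59 38 23 32 28 18")
```

If the real file has a header line or a different delimiter, parse_row would need adjusting before load_bodyfat("bodyfat.dat") works.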
Some guidelines:
There are a large number of predictor variables, some of them very highly
correlated. Rather than trying to evaluate every possible combination
of predictors, think hard about what variables you expect will be
important and in what combinations. You might think about classifying
variables into categories such as trunk variables, arm variables, and leg
variables. Alternatively, you might want to put wrist and ankle
together, biceps and thigh together, etc. The idea would be to find
categories of variables that might be expected to make similar
contributions to the model. If their contributions do turn out to be
redundant, this could be a way of weeding out some
variables. When you have narrowed things down to a few variables, then
you can start comparing models. (Feel free to try transforming variables
if you think it will help.) There will likely be a number of
reasonable models. Summarize and interpret your results. Note any
anomalies in the data and their impact. Your submission for this
problem should consist of at most four typewritten pages, including any
tables or figures.
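One way to check whether two variables in a category are redundant is to look at their sample correlation. The sketch below computes Pearson's correlation in plain Python; the function name and the small made-up measurements are assumptions for illustration only. A high correlation is just a hint that two predictors carry similar information, not a substitute for comparing fitted models.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up circumferences (cm) for five hypothetical men.
hip = [94.0, 99.0, 101.5, 97.0, 105.0]
thigh = [58.0, 60.5, 62.0, 59.0, 64.5]
r_hip_thigh = pearson_r(hip, thigh)
```

Applied to the real columns, the same function could be run over every pair within a category to see which variables are candidates for dropping.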
More Details:
A variety of popular health books suggest that the readers assess their
health, at least in part, by estimating their percentage of body fat.
One text suggests that readers estimate body fat from tables
using their age and various skin-fold measurements obtained by using a
caliper. Other texts give predictive equations for body fat using body
circumference measurements (e.g. abdominal circumference) and/or
skin-fold measurements.
Percentage of body fat for an individual can be estimated once body density
has been determined. It is often assumed that the body consists of two
components, lean body tissue and fat tissue. Letting
D = Body Density (gm/cm^3)
A = proportion of lean body tissue
B = proportion of fat tissue (A+B=1)
a = density of lean body tissue (gm/cm^3)
b = density of fat tissue (gm/cm^3)
we have
D = 1/[(A/a) + (B/b)]
Solving for B, we find
B = (1/D)*[ab/(a-b)] - [b/(a-b)].
Using the estimates a=1.10 gm/cm^3 and b=0.90 gm/cm^3 we come up with
"Siri's equation":
Percentage of Body Fat (i.e. 100*B) = 495/D - 450.
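The algebra above can be checked numerically. The following sketch implements both the general two-component formula for B and Siri's equation, then compares them at one density. The function names and the test density are assumptions chosen for illustration.

```python
def fat_fraction(density, lean_density=1.10, fat_density=0.90):
    """Proportion of fat tissue B from body density D (all in gm/cm^3)."""
    a, b = lean_density, fat_density
    return (1.0 / density) * (a * b / (a - b)) - b / (a - b)


def siri_percent_fat(density):
    """Percentage of body fat from Siri's equation, 495/D - 450."""
    return 495.0 / density - 450.0


# Hypothetical density, chosen only to exercise the functions.
d = 1.05
general = 100.0 * fat_fraction(d)
siri = siri_percent_fat(d)
```

Up to floating-point rounding the two values agree, since ab/(a-b) = 4.95 and b/(a-b) = 4.5 for the stated densities. As a sanity check, a body density equal to b gives B = 1 and a density equal to a gives B = 0.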
Volume, and hence body density, can be accurately measured in a variety of
ways. The technique of underwater weighing "computes body volume as the
difference between body weight measured in air and weight measured during
water submersion. In other words, body volume is equal to the loss of weight
in water with the appropriate temperature correction for the water's
density". Using this technique,
Body Density = WA/[(WA-WW)/c.f. - LV]
where
WA = Weight in air (kg)
WW = Weight in water (kg)
c.f. = Water correction factor (= 1 at 39.2 deg F, as one gram of water
occupies exactly one cm^3 at this temperature; = 0.997 at 76-78 deg F)
LV = Residual Lung Volume (liters)
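The density formula above translates directly into code. In this sketch the function name, parameter names, and the example measurements are assumptions made for illustration. With weights in kg and volumes in liters, the result is in kg per liter, which is numerically the same as gm/cm^3.

```python
def body_density(weight_air_kg, weight_water_kg, correction_factor, lung_volume_l):
    """Body density WA / [(WA - WW)/c.f. - LV], in kg per liter (= gm/cm^3)."""
    volume_l = (weight_air_kg - weight_water_kg) / correction_factor - lung_volume_l
    if volume_l <= 0:
        raise ValueError("computed body volume must be positive")
    return weight_air_kg / volume_l


# Made-up measurements for illustration only.
density = body_density(weight_air_kg=80.0, weight_water_kg=3.0,
                       correction_factor=0.997, lung_volume_l=1.3)
```

For these made-up inputs the density comes out a little above 1.05 gm/cm^3, which could then be fed into Siri's equation to obtain a body fat percentage.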
This data set and documentation were obtained from StatLib, submitted by
Roger Johnson of Carleton College, and originally supplied by Dr. A. Garth
Fisher. More complete references to the literature discussed above are
cited there (http://www.stat.cmu.edu/datasets/).