Loglinear analysis of cross-classifications                  [STB-8: smv5.1]
-------------------------------------------

    ^loglin^ count varlist [^in^ range] [^if^ exp] [^weight^] , fit(^margins to be fit^)
        [ltol(^#^) iter(^#^) offset(^variable^) level(^#^) irr anova keep resid collapse]

estimates a Poisson maximum-likelihood loglinear model. There are two cases:

    1) you have only a summary table, and count indicates the number of cases
       that fall in each level of varlist, or
    2) you have full information on all cases, so that each case should count
       once.

If you fall into case 2, you would be better served by the ^poisson^ command.

For ^loglin^, the ^count^ variable should be an integer giving the number of
cases that fall in each cell of the cross-classification of varlist. The counts
must be non-negative for all combinations of the independent variables
specified in varlist. If a count exactly equals zero, you have three choices:

    1) you may assume that it is a ^structural zero^ and replace it with a
       missing value or a zero cell weight;
    2) you may add a small positive constant, for example .5, to zero cells; or
    3) best of all, you may get more data.


Cell weights
------------

If you specify a ^weight^, ^loglin^ will assume that the numbers represent cell
weights. Only frequency weights are allowed as cell weights. If you wish to
specify that a particular cell is a ^structural zero^, an appropriate method is
to specify a cell weight of zero or missing for that cell. In most instances
you will want to use only cell weights of zero or one.


Functional Form
---------------

This model falls in the class of generalized linear models with a categorical
design matrix, a log link, and a Poisson-distributed disturbance. Thus, the
program generates a design matrix similar to that of the ^anova^ command, which
is then passed to ^poisson^.
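
As a rough sketch of this idea (not ^loglin^'s actual internals), the
independence model for a 2 x 2 table can be fit by building indicator variables
by hand and passing them to ^poisson^. The data, the counts, and the names
^A2^ and ^B2^ are assumptions made up for illustration; the first level of each
variable is treated as the baseline.

```stata
* Hypothetical 2 x 2 summary table: one observation per cell, counts in dv
clear
input iv1 iv2 dv
1 1 20
1 2 30
2 1 25
2 2 25
end

* Hand-built indicators (illustrative names), first level as the baseline
generate A2 = (iv1 == 2)
generate B2 = (iv2 == 2)

* Poisson regression of the cell counts on the indicators
poisson dv A2 B2
```

Going by the description in this help file, this should correspond to
^loglin dv iv1 iv2, fit(iv1,iv2)^ with the default regression-like constraints.
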
The functional form of the model is log-linear:

    E(count) = exp{(predicted value) + (offset, if present)}

or

    ln E(count) = (predicted value) + (offset, if present)

where the predicted value is a linear combination of the columns of the design
matrix for the categorical independent variables in varlist. If you wish to see
estimated expected cell frequencies, residuals, and standardized residuals,
specify the ^resid^ option.

If the offset is present, it is added onto the predicted value for the purposes
of estimation, so that the prediction is actually a predicted rate.


^Anova^ option and Constraints
--------------------------------

As with ^anova^, the design matrix for ^loglin^ is not identified; hence,
constraints must be imposed on the estimated parameters in order to generate a
unique solution. Two kinds of constraints are used in this command: anova-like
and regression-like.

In regression-like constraints, redundant levels of independent variables are
summarily dropped (the ^first^ level is dropped, then any interaction with it).
In anova-like constraints, the ^last^ level is dropped, but the missing level
is set equal to -1 times the sum of all the other levels.

Interpret regression-like parameter estimates as deviations from the baseline
level, and interpret anova-like parameter estimates as deviations from the
grand mean.

To activate anova-like constraints, specify the ^anova^ option. Otherwise,
regression-like constraints will be used.


^Resid^ option
------------

If you specify the ^resid^ option, estimated expected cell frequencies,
residuals, and standardized residuals will be calculated and displayed as the
variables ^cellhat^, ^resid^, and ^stdres^.


^Keep^ option
-----------

Normally, the loglin program ^drop^s all the variables it generates for
estimation. If you specify the ^keep^ option, these variables, along with the
estimated expected cell frequencies, residuals, and standardized residuals,
will remain in the data set for future use.
Only the 1st-order variables (i.e., A1...m, B1...n, C1...o, etc.)
will be labeled. Keeping the variables allows the user to create a new design
matrix from the already existing variables. It does add substantially to the
size of the data set, however. ^Keep^ does not work when ^collapse^ is
specified.


^Collapse^ option
---------------

Specify the ^collapse^ option ONLY if:

    1) your data set contains more variables than you wish to work with in the
       specific model fit, AND
    2) you wish to analyze the subset specified in ^varlist^ AS IF they were
       the complete table.

The ^collapse^ option calculates cell counts for the variables in ^varlist^,
adding together the counts from all other variables not in ^varlist^ and
placing them in the appropriate cells (i.e., it collapses the table). It then
generates a temporary data set on which it performs the analysis. After the
calculations are completed, it restores the original data set.

Note that if you specify both the ^keep^ and ^collapse^ options, your estimated
expected cell frequencies, residuals, and standardized residuals will be
displayed, but not saved with your original data set.


Fit(^margins to be fit^)
----------------------

To specify a loglinear model, the fit option must be specified. This program
generates hierarchical models, so only the highest-order interactions need to
be specified; all lower-level interactions are included automatically.
Separate the margins by commas, and specify an interaction by separating its
variables with a ^blank^.

The fit notation follows that developed by S. Fienberg, 1981, ^The Analysis of^
^Cross-classified Categorical Data^, Cambridge, MA: MIT Press.

For example, suppose we have summary data with three independent variables,
^iv1^, ^iv2^, and ^iv3^, with counts coded in a variable called ^dv^.
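
Such summary data might be laid out with one observation per cell, as sketched
here for a 2 x 2 x 2 table. The counts are invented purely for illustration and
do not come from any real data set.

```stata
* Hypothetical 2 x 2 x 2 summary table: one observation per cell
clear
input iv1 iv2 iv3 dv
1 1 1 12
1 1 2 18
1 2 1 25
1 2 2 15
2 1 1 30
2 1 2 22
2 2 1 10
2 2 2 28
end

* Check that every combination of iv1, iv2, and iv3 appears exactly once
list iv1 iv2 iv3 dv
```
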
If we wish to fit an independence model, we type:

    ^loglin dv iv1 iv2 iv3, fit(iv1,iv2,iv3)^

If we wish to fit a saturated model, we type:

    ^loglin dv iv1 iv2 iv3, fit(iv1 iv2 iv3)^

An alternative model might be:

    ^loglin dv iv1 iv2 iv3, fit(iv1 iv2,iv2 iv3)^


Estimation
----------

^Loglin^ generates the appropriate design matrix and passes that matrix to the
^poisson^ command for estimation. ^Poisson^ uses iteratively reweighted least
squares, which yields estimates equivalent to maximum likelihood.


Convergence
-----------

The parameters ^ltol()^ and ^iter()^ may be used to control the maximization
process. ^ltol()^ specifies the maximum change in the log likelihood that will
be accepted as indicating convergence (default 1e-7), and ^iter()^ specifies
the maximum number of iterations (default 100).


Other options
-------------

The ^level()^ option sets the confidence level used for the displayed
confidence intervals of the estimates. The ^irr^ option presents estimates in
their exponentiated form, i.e., as incidence rate ratios. In either case, see
^help^ for ^poisson^ for details.


Also see
--------

 Manual:  [4] Estimate, [5s] Poisson
On-line:  ^help^ for ^correlate^, ^epitab^, ^linktest^, ^lrtest^, ^predict^, ^test^