Assignments for Lesson 11

1. Refer to the USEDCARS data. Suppose that you want to see whether there are significant differences among the prices charged by the dealers. However, you feel that it is necessary to adjust for the annual depreciation of the cars: some dealers may stock mainly newer cars, and their prices will naturally be higher. Perform an analysis of covariance with the price of the car as the response and two predictors: a categorical variable for the used car dealer and a continuous variable for the year in which the car was manufactured. Name another variable, not necessarily in the data, that you would expect to be a significant predictor of the price of a used car.
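The analysis of covariance described above fits one line per dealer, all sharing a common slope on year. The Python sketch below (not SAS, and run on made-up numbers rather than the USEDCARS data) shows the arithmetic behind that model: the common slope is pooled from within-dealer variation, and each dealer's mean price is then adjusted to the overall mean year. The function name and record layout are hypothetical.

```python
from collections import defaultdict


def ancova_parallel_slopes(records):
    """Least-squares fit of y = (group intercept) + (common slope) * x.

    records: iterable of (group, x, y) tuples, e.g. (dealer, year, price).
    Returns (slope, adjusted_means), where adjusted_means[g] is the mean
    of y for group g adjusted to the overall mean of x.
    """
    by_group = defaultdict(list)
    for g, x, y in records:
        by_group[g].append((float(x), float(y)))

    sxy = sxx = 0.0
    group_means = {}
    all_x = []
    for g, pts in by_group.items():
        xbar = sum(x for x, _ in pts) / len(pts)
        ybar = sum(y for _, y in pts) / len(pts)
        group_means[g] = (xbar, ybar)
        # Pool within-group cross-products: this gives the common slope.
        sxy += sum((x - xbar) * (y - ybar) for x, y in pts)
        sxx += sum((x - xbar) ** 2 for x, _ in pts)
        all_x.extend(x for x, _ in pts)

    slope = sxy / sxx
    grand_xbar = sum(all_x) / len(all_x)
    # Adjusted mean = group intercept + slope * (overall mean of x).
    adjusted = {g: ybar - slope * (xbar - grand_xbar)
                for g, (xbar, ybar) in group_means.items()}
    return slope, adjusted
```

With toy data in which dealer B sells only newer cars, the raw difference in mean price can shrink sharply once year is taken into account; this is exactly the kind of confounding the exercise asks you to control for.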

2. Refer to the DOGS data. Create a new dataset with 75 observations and four variables: dog, drug concentration received, week of treatment, and the eosinophil count in that week for that dog. Consider dog, drug concentration, and week to be categorical variables, and fit an analysis of variance model for eosinophil count as a function of concentration, week, the concentration-by-week interaction, and dog nested within concentration. This is one way to conduct a repeated-measures analysis of variance.
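The first step of this exercise is a wide-to-long reshape: one row per dog becomes one row per dog per week. A minimal Python sketch follows, assuming a hypothetical layout in which each dog's record holds its ID, its concentration, and one field per week; the real DOGS column names may differ.

```python
def wide_to_long(rows, weeks):
    """Reshape one-row-per-dog records into one row per dog per week.

    rows:  list of dicts with keys 'dog', 'conc', and one key per week
           label (hypothetical key names, not taken from the DOGS file).
    weeks: the week labels to stack, in order.
    """
    long_rows = []
    for row in rows:
        for week in weeks:
            long_rows.append({
                "dog": row["dog"],
                "conc": row["conc"],
                "week": week,
                "eos": row[week],  # eosinophil count for this dog in this week
            })
    return long_rows
```

The number of long rows is the number of dogs times the number of weeks, which is how the 75 observations in the exercise arise.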

3. Refer to the CATS data. Create a new dataset with three variables: name of the cat, week, and the ratio of the GFR of the treated kidney to the GFR of the untreated kidney for that cat in that week. This dataset should have 24 observations, three per cat. Treat week and cat as categorical variables, and fit a general linear model that predicts the GFR ratio from week and cat effects. This is an example of a two-factor analysis of variance model; because there is only one observation per cat-week combination, the model contains main effects only, with no cat-by-week interaction.
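With one observation per cell, the two-factor model is additive and the leftover (interaction) variation serves as the error term. The Python sketch below computes the sums of squares and F statistics for that model from a table in which each entry is assumed to be an already-computed treated/untreated GFR ratio. The function name and table layout are hypothetical, and the numbers in any example call would be toy values, not the CATS data.

```python
def two_way_additive_anova(table):
    """Two-factor ANOVA without replication.

    table[i][j] = response for cat i in week j (one value per cell).
    Model: y_ij = mu + cat_i + week_j + error. With one observation per
    cell, the interaction is not estimable and is used as the error term.
    """
    a = len(table)      # number of cats (rows)
    b = len(table[0])   # number of weeks (columns)
    grand = sum(sum(r) for r in table) / (a * b)
    row_means = [sum(r) / b for r in table]
    col_means = [sum(table[i][j] for i in range(a)) / a for j in range(b)]

    ss_cat = b * sum((m - grand) ** 2 for m in row_means)
    ss_week = a * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((y - grand) ** 2 for r in table for y in r)
    ss_error = ss_total - ss_cat - ss_week

    ms_error = ss_error / ((a - 1) * (b - 1))
    return {
        "ss_cat": ss_cat,
        "ss_week": ss_week,
        "ss_error": ss_error,
        "F_cat": (ss_cat / (a - 1)) / ms_error,
        "F_week": (ss_week / (b - 1)) / ms_error,
    }
```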

4. Refer to the SOCCER data. Create a new variable that defines each player's primary position as the position listed in the data if only one is given, or as the first one listed if two are given. (For example, Kerri Doran's position is listed as MF/D, so her primary position is MF.) Perform a one-way analysis of variance to see whether the players' heights (in inches) can be predicted by the categorical primary-position variable.

Strictly speaking, the soccer players are not a random sample from a well-defined population, so the assumptions underlying the F test do not hold. However, the statistics in the ANOVA table can still serve as descriptive measures of the variation between and within primary positions.
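Read descriptively, the ANOVA table splits the total variation in height into a between-position part and a within-position part. The Python sketch below shows both steps of this exercise, extracting the first listed position and computing those sums of squares and the F ratio. The function names and the (position, height) pair layout are hypothetical, and any example heights are toy values.

```python
from collections import defaultdict


def primary_position(listed):
    """Return the first listed position: 'MF/D' -> 'MF', 'D' -> 'D'."""
    return listed.split("/")[0].strip()


def group_by_primary_position(players):
    """players: iterable of (listed_position, height_in_inches) pairs."""
    groups = defaultdict(list)
    for listed, height in players:
        groups[primary_position(listed)].append(height)
    return dict(groups)


def one_way_anova(groups):
    """groups: dict label -> list of responses.

    Returns (ss_between, ss_within, F).
    """
    values = [y for ys in groups.values() for y in ys]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    ss_between = sum(len(ys) * (means[g] - grand) ** 2
                     for g, ys in groups.items())
    ss_within = sum((y - means[g]) ** 2
                    for g, ys in groups.items() for y in ys)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_between, ss_within, f
```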

5. Refer to the LIMES data. Suppose that a grower wants to see whether the juice volumes of the limes depend on when they were harvested. Of course, juice volume could also vary with fruit size. Create a new categorical variable that labels harvest time as "early," "middle," or "late" season for limes picked in February, March, or April, respectively. Fit an analysis of covariance model to predict juice volume from the categorical time variable and fruit diameter, treated as a continuous variable. (Hint: If X is a SAS date, the SAS function MONTH(X) returns the month of the year, expressed as an integer from 1 to 12, for that date.)
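The season variable is a lookup on the month number. In Python, `datetime.date.month` plays the role of the SAS MONTH function; the sketch below (function and dictionary names hypothetical) maps a harvest date to its season and rejects months outside February to April rather than silently mislabeling them.

```python
import datetime

# Month number -> season label, as defined in the exercise.
SEASON_BY_MONTH = {2: "early", 3: "middle", 4: "late"}


def harvest_season(picked):
    """Map a harvest date (datetime.date) to 'early', 'middle', or 'late'."""
    try:
        return SEASON_BY_MONTH[picked.month]
    except KeyError:
        raise ValueError(f"unexpected harvest month: {picked:%B}") from None
```

For example, `harvest_season(datetime.date(2001, 3, 15))` returns `"middle"`.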

6. Refer to the IRIS data. Stepwise discriminant analysis could be used to identify the biological characteristics that best distinguish the three species from one another. The first step of this analysis is a one-way analysis of variance for each of the four plant measurements. Fit four one-way ANOVA models predicting sepal length, sepal width, petal length, and petal width from the categorical species variable. Then list the four variables in decreasing order of their F statistics (labeled "F Value" in the output). The variable with the largest F statistic is the best single-variable discriminator. (Note: There is a special procedure to do this in SAS. If you find it, you may use it.)
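The ranking step amounts to one one-way ANOVA per measurement, followed by a sort on F. A minimal Python sketch, assuming each observation is a dict with a species field and one field per measurement (hypothetical key names; test values would be toy numbers, not the IRIS data):

```python
def f_statistic(groups):
    """One-way ANOVA F statistic; groups: dict label -> list of values."""
    values = [y for ys in groups.values() for y in ys]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    ss_between = sum(len(ys) * (means[g] - grand) ** 2
                     for g, ys in groups.items())
    ss_within = sum((y - means[g]) ** 2
                    for g, ys in groups.items() for y in ys)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_by_f(data, group_key, variables):
    """Return [(variable, F)] sorted from largest F to smallest."""
    results = []
    for var in variables:
        groups = {}
        for row in data:
            groups.setdefault(row[group_key], []).append(row[var])
        results.append((var, f_statistic(groups)))
    return sorted(results, key=lambda item: item[1], reverse=True)
```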

7. Refer to the MANATEES data. Transpose the data to form a new dataset with three variables: years (calendar year minus 1975), cause of death, and count. Suppose that you want to compare the baseline counts (in 1975) for each cause of death, as well as the rate at which the counts for each cause increase or decrease over time. One way to do this is to fit a regression model in which each cause of death has its own slope and intercept with respect to time. Using the transposed dataset, fit a general linear model with the number of dead manatees as the response, and use years since 1975, cause, and the years-by-cause interaction as predictors. Treat elapsed time as a continuous variable and cause of death as a categorical variable.
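In the model with cause, years, and their interaction, the estimated intercept and slope for each cause equal those from a separate least-squares line fitted to that cause alone; the general linear model differs only in pooling the residual variance across causes for its tests and standard errors. The Python sketch below shows the transpose and those per-cause lines, assuming a hypothetical wide layout with a 'year' field and one count field per cause.

```python
def to_long(wide_rows, causes, base_year=1975):
    """Transpose rows of {'year': ..., cause1: count, ...} into
    (years_since_base, cause, count) tuples."""
    return [(row["year"] - base_year, cause, row[cause])
            for row in wide_rows for cause in causes]


def per_cause_lines(long_rows):
    """Least-squares intercept and slope of count on years, per cause.

    Returns {cause: (intercept, slope)}; the intercept is the fitted
    count at years = 0, i.e. in the base year.
    """
    by_cause = {}
    for t, cause, count in long_rows:
        by_cause.setdefault(cause, []).append((t, count))

    fits = {}
    for cause, pts in by_cause.items():
        n = len(pts)
        tbar = sum(t for t, _ in pts) / n
        ybar = sum(y for _, y in pts) / n
        sxx = sum((t - tbar) ** 2 for t, _ in pts)
        sxy = sum((t - tbar) * (y - ybar) for t, y in pts)
        slope = sxy / sxx
        fits[cause] = (ybar - slope * tbar, slope)
    return fits
```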

8. Refer to the DOGS data. Suppose that we want to see whether the white blood cell count at week 0 depends on any of the demographic information we have about the dogs. If so, we may want to adjust for those differences before evaluating the treatment effects. Take the logarithm of the week-0 white blood cell count and model it as a function of gender (categorical), age in months (continuous), hair length (categorical), and weight in pounds (continuous).
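This model mixes continuous and categorical predictors, so each categorical variable enters the linear model through indicator (dummy) columns. The Python sketch below builds the response and predictor row for one dog using reference-cell coding, with the first listed level of each factor as the reference. It uses the natural logarithm, which is what the SAS LOG function computes. The field names are hypothetical. PROC GLM parameterizes CLASS variables differently internally, but the fitted values are the same.

```python
import math


def design_row(dog, gender_levels, hair_levels):
    """Return (response, predictors) for one dog.

    Response: natural log of the week-0 white blood cell count.
    Predictors: intercept, age (months), weight (lb), then one 0/1
    indicator per non-reference level of gender and of hair length.
    The first level in each list is the reference category.
    """
    y = math.log(dog["wbc_week0"])
    x = [1.0, float(dog["age_months"]), float(dog["weight_lb"])]
    x += [1.0 if dog["gender"] == lev else 0.0 for lev in gender_levels[1:]]
    x += [1.0 if dog["hair"] == lev else 0.0 for lev in hair_levels[1:]]
    return y, x
```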


Return to STA 5106 home page