Assignments for Lesson 9

1. Refer to the DOGS1 data. The investigator may want to ensure that the dogs allocated to each treatment group were similar in composition with respect to gender and hair coat. Use PROC FREQ to conduct Fisher's exact test of whether the concentration of the drug received was independent of the gender of the dog. Likewise, use Fisher's exact test to see whether coat length and drug treatment were statistically independent. Write your interpretation of the results of these tests.
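
As a rough illustration of the PROC FREQ syntax involved, here is a minimal sketch on made-up data. The dataset name TOY, the variable names DOSE and SEX, and every value are hypothetical; the real DOGS1 variables may be named and coded differently.

```sas
/* Hypothetical toy data, not the DOGS1 data. */
data toy;
    input dose $ sex $ @@;   /* @@ reads several observations per line */
    datalines;
low M  low F  low F  low M
mid M  mid M  mid F  mid F
high F  high F  high M  high M
;
run;

proc freq data=toy;
    /* FISHER (alias EXACT) requests Fisher's exact test for the table */
    tables dose*sex / fisher;
run;
```

For a 2 x 2 table the CHISQ option also prints Fisher's exact test; for larger tables, as in this 3 x 2 sketch, FISHER or EXACT has to be requested explicitly.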

2. Refer to the MANATEES data. Your task is to see if the proportion of manatees killed by human-related causes has remained about the same through time or if this proportion has changed significantly from year to year. Create a dataset with 46 observations and three variables: year, cause indicator (human or non-human), and total number of deaths. Define deaths attributed to humans as the sum of deaths from all watercraft, flood gates, canal locks, and other human-related deaths. Non-human deaths include perinatal, other natural, and undetermined causes of death. Then, apply a chi-square test to the table of year versus cause of death (human or non-human). Make sure that this table correctly indicates total numbers of manatees in each cell.
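
One possible way to get from one record per year to two records per year, sketched on made-up numbers. The names WIDE, LONG, HUMAN, NONHUMAN, CAUSE, and DEATHS are assumptions, the counts are invented, and the sketch assumes the human and non-human totals have already been summed.

```sas
/* Hypothetical counts, not the MANATEES values. */
data wide;
    input year human nonhuman;
    datalines;
1990 10 20
1991 15 18
1992 12 25
;
run;

/* Write two observations per year: one for each cause category. */
data long;
    set wide;
    length cause $ 9;
    cause = 'human';     deaths = human;    output;
    cause = 'non-human'; deaths = nonhuman; output;
    keep year cause deaths;
run;

proc freq data=long;
    weight deaths;              /* each record stands for DEATHS animals */
    tables year*cause / chisq;  /* chi-square test of independence */
run;
```

Without the WEIGHT statement, every cell of the table would show a count of 1, because PROC FREQ would count records rather than manatees.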

3. Refer to the LIMES dataset. Fruits and vegetables are sometimes classified into groups by size for sale on the market. Classify the diameters of the limes as "small," "medium," and "large," using respective cutoff points of less than 5 cm, 5 to 6 cm, and over 6 cm. Likewise, classify the juice volumes as "low," "medium," and "high," using cutoff points of less than 20 ml, 20 to 40 ml, and more than 40 ml. Delete any observations which have missing diameters or missing juice volumes from the dataset. Prepare a contingency table showing the size classification versus the juiciness classification. Make sure that the rows and columns in the table are labeled with the words shown above and that they appear in the proper order from the lowest category to the highest category. Does the table indicate that larger limes tend to have more juice?
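
The classification and ordering requirements can be sketched with user-defined formats. This is only one approach, shown on invented measurements; the variable names DIAMETER and JUICE and the format names SIZEFMT and JUICEFMT are hypothetical.

```sas
proc format;
    value sizefmt  low -< 5   = 'small'
                   5 - 6      = 'medium'
                   6 <- high  = 'large';
    value juicefmt low -< 20  = 'low'
                   20 - 40    = 'medium'
                   40 <- high = 'high';
run;

/* Hypothetical toy measurements, not the LIMES data. */
data toy;
    input diameter juice;
    /* drop observations with a missing diameter or juice volume */
    if diameter = . or juice = . then delete;
    datalines;
4.2 15
5.5 30
6.8 45
5.0 .
6.1 38
;
run;

proc freq data=toy;
    tables diameter*juice;
    format diameter sizefmt. juice juicefmt.;
run;
```

Because the underlying values stay numeric, the default ORDER=INTERNAL setting of PROC FREQ should list the categories from lowest to highest rather than alphabetically.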

4. Refer to the SOCCER dataset. Suppose that Coach Burleigh needs to know how many players of each academic class are available at each position on the team, so that she can establish her priorities for recruiting this year. Prepare a table showing the number of players at each position (goalkeeper, forward, etc.) in each academic class (freshman, sophomore, etc.). Make sure that the classes appear in proper order, from freshman to senior, in the table. Which position would you advise the coach to recruit for next year?

Some of the players have two positions listed; for example, Jennifer Bransford is listed as a forward/midfielder. For this assignment, use only the first position listed for each player. An easy way to do this is to read the POSITION variable, then, in the same DATA step, create a new variable with the statement NEWPOS = SCAN(POSITION, 1, '/');. This tells SAS to treat the forward slash as the break between "words" and to return the first "word" in the sequence of letters given by POSITION.
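
A small sketch of the SCAN approach on made-up values (the dataset name PLAYERS and the three input lines are hypothetical, not taken from the SOCCER data):

```sas
data players;
    length position $ 20 newpos $ 20;
    input position $;
    /* first "word" of POSITION, using '/' as the only delimiter */
    newpos = scan(position, 1, '/');
    datalines;
forward/midfielder
goalkeeper
defender/midfielder
;
run;

proc print data=players;
run;
```

When POSITION contains no slash, SCAN returns the whole value, so single-position players are left unchanged.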

5. The Gainesville Sun recently published this list of manufacturers and models for the 15 cars which are most likely to be stolen. In descending order of popularity among thieves, these were:

Manufacturer Model
Honda        Accord
Toyota       Camry
Oldsmobile   Cutlass
Honda        Civic
Ford         Mustang
Chevrolet    C/K
Nissan       Maxima
Jeep         Grand Cherokee
Ford         F150
Jeep         Cherokee
Cadillac     Deville
Ford         Taurus
Chevrolet    Caprice
Plymouth     Voyager

Suppose that you work for an insurance company which provides coverage for used car dealers in Gainesville. You want to find which dealerships have the highest risk of having cars stolen from their lots. Create one SAS dataset from the USEDCARS data and another dataset from the information given above. Use programming statements with those two datasets to classify each car as "high-risk" or "low-risk," where a car is at high risk if it appears in the list above. Then, prepare a table showing the car dealerships versus the risk categories.
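
One way to combine the two datasets is a match-merge with IN= flags, sketched below. The dealer names, the variable names DEALER, MAKE, MODEL, and RISK, and the non-listed car are hypothetical, and only two entries from the list above are entered.

```sas
/* Two entries from the published list, for illustration only. */
data stolen;
    length make $ 12 model $ 16;
    input make $ model $;
    datalines;
Honda Accord
Ford Mustang
;
run;

/* Hypothetical dealer inventory, not the USEDCARS data. */
data usedcars;
    length dealer $ 12 make $ 12 model $ 16;
    input dealer $ make $ model $;
    datalines;
DealerA Honda Accord
DealerA Ford Escort
DealerB Ford Mustang
DealerB Honda Accord
;
run;

proc sort data=stolen;   by make model; run;
proc sort data=usedcars; by make model; run;

data classified;
    merge usedcars (in=onlot) stolen (in=onlist);
    by make model;
    if onlot;                 /* keep only cars actually on a lot */
    length risk $ 9;
    if onlist then risk = 'high-risk';
    else risk = 'low-risk';
run;

proc freq data=classified;
    tables dealer*risk;
run;
```

The match depends on MAKE and MODEL being spelled and capitalized identically in both datasets. Models with embedded blanks, such as Grand Cherokee, cannot be read with the simple list input used here and would need a different input style.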

6. Alan Agresti presented the following data, obtained from Michael Radelet, in the book Categorical Data Analysis (Wiley, New York, 1990). The data describe the circumstances of 326 homicide cases in Florida from 1976-1977.

Defendant's race   Victim's race    Death penalty   Count
Black              White            Yes               11
Black              White            No                52
Black              Black            Yes                6
Black              Black            No                97             
White              White            Yes               19
White              White            No               132
White              Black            Yes                0
White              Black            No                 9

Use PROC FREQ to create appropriate tables to answer the following questions. Write down your answers to these questions, and circle those same numbers on your SAS printout. (In other words, make sure that your tables explicitly show the requested percentages.)

A. In what percentage of cases was the death penalty verdict given?

B. When the defendant was white and the victim was black, in what percentage of cases was the death penalty verdict given?

C. When the defendant was black and the victim was white, in what percentage of cases was the death penalty verdict given?

D. When the races of the victim and the defendant were the same, in what percentage of cases was the death penalty given?
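
The table above can be read as cell counts and passed to PROC FREQ with a WEIGHT statement. The sketch below shows one possible layout; the dataset name PENALTY and the variable names DEFRACE, VICRACE, DEATH, and COUNT are assumptions.

```sas
data penalty;
    input defrace $ vicrace $ death $ count;
    datalines;
Black White Yes 11
Black White No 52
Black Black Yes 6
Black Black No 97
White White Yes 19
White White No 132
White Black Yes 0
White Black No 9
;
run;

proc freq data=penalty;
    weight count / zeros;    /* ZEROS keeps the cell whose count is 0 */
    tables death;            /* overall percentages */
    /* one VICRACE-by-DEATH table per DEFRACE, with row percentages only */
    tables defrace*vicrace*death / nocol nopercent;
run;
```

Question D would additionally need a derived variable indicating whether the defendant's and victim's races match, which this sketch does not create.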

7. Refer to the HOCKEY data. Did Ohio State have a home-ice advantage? Use PROC FREQ to perform Fisher's exact test for the 2 x 2 table of the outcome of the game ("won" or "lost or tied") versus location (Columbus, Ohio or elsewhere). Don't forget to change the score of the final game to Boston College 5, Ohio State 2.
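
Collapsing scores and sites into the two-level variables needed for the 2 x 2 table could look roughly like this. The variable names SITE, OSU, OPP, RESULT, and VENUE, and all of the scores, are hypothetical; the real HOCKEY layout may differ.

```sas
/* Hypothetical games, not the HOCKEY data. */
data games;
    length site $ 10 result $ 12 venue $ 9;
    input site $ osu opp;
    if osu > opp then result = 'won';
    else result = 'lost or tied';
    if site = 'Columbus' then venue = 'home';
    else venue = 'elsewhere';
    datalines;
Columbus 4 2
Columbus 3 3
Columbus 5 1
Detroit 1 2
Oxford 2 2
Detroit 3 1
;
run;

proc freq data=games;
    tables venue*result / fisher;
run;
```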

8. Refer to the HANKS data. Your task is to use a SAS program to count the number of movies in which Tom Hanks appeared in each of the years from 1984-1998, then make a scatterplot with the number of movies made in each year on the vertical axis versus the year on the horizontal axis. In some years, such as 1997, he did not appear in any movies. On the scatterplot, indicate those years by plotting a point at zero. To do this, you could create another dataset with all of the years from 1984 to 1998 by using a DO loop, then MERGE that dataset with the dataset containing the movie counts. Then, you will need to replace missing values for movie counts with zeroes.
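
The DO loop and MERGE steps described above can be sketched as follows, on invented records with one observation per movie. The dataset names MOVIES, COUNTS, ALLYEARS, and PLOTDATA are hypothetical; COUNT is the variable name that PROC FREQ assigns in its OUT= dataset.

```sas
/* Hypothetical toy records, not the HANKS data. */
data movies;
    input year;
    datalines;
1984
1984
1986
1988
1988
1988
;
run;

/* Count movies per year; OUT= saves the counts in a dataset. */
proc freq data=movies noprint;
    tables year / out=counts (keep=year count);
run;

/* One observation for every year from 1984 to 1998. */
data allyears;
    do year = 1984 to 1998;
        output;
    end;
run;

data plotdata;
    merge allyears counts;
    by year;
    if count = . then count = 0;   /* years with no movies */
run;

proc plot data=plotdata;
    plot count*year;               /* vertical*horizontal */
run;
quit;
```

Both datasets are already in YEAR order here, so no PROC SORT is needed before the MERGE.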

