Assignments for Lesson 7

1. Refer to the CLINTON data. Write a SAS program which reads the data. Create a new variable which indicates the number of days elapsed between successive polls. Set the elapsed time for January 24, 1993 (the day of the first poll) to be a missing value. Then, use PROC UNIVARIATE to examine the distribution of those elapsed times. Identify the 5 shortest and 5 longest elapsed times with dates, using an appropriate format for the date. What was happening to President Clinton when the opinion polls were spaced very close together?

2. Refer to the USEDCARS data. Suppose that an econometrician wanted to use results applicable to the normal distribution to describe the prices of used cars. Use PROC UNIVARIATE to decide whether the prices or the logarithms of the prices more closely follow a normal distribution. Write down at least two findings from PROC UNIVARIATE to support your claim.

3. The midrange is a statistic which is sometimes used to report a central value of a distribution. The midrange is defined as (minimum value + maximum value)/2. Refer to the LIMES data. Use PROC UNIVARIATE to calculate the midrange of the juice liquid volumes of the limes. For this problem, you must use SAS to perform all of the calculations; for example, you may not find the minimum and maximum with SAS, then calculate the midrange by hand.
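As a quick check on the definition (not a substitute for the SAS program this problem requires), the midrange arithmetic can be sketched in Python. The volumes below are made-up numbers, not the LIMES data, and the function name `midrange` is only illustrative.

```python
def midrange(values):
    """Return (minimum + maximum) / 2 for a non-empty sequence of numbers."""
    if not values:
        raise ValueError("midrange needs at least one value")
    return (min(values) + max(values)) / 2


# Hypothetical juice volumes in millilitres (NOT the LIMES data).
volumes = [38, 45, 41, 52, 47]
print(midrange(volumes))  # (38 + 52) / 2
```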

4. Refer to the IRIS data. Suppose that you want to present summary statistics to someone who is not familiar with SAS PROC UNIVARIATE output. Prepare a printout which lists only the sample size, mean, and standard deviation of the sepal widths separately for each of the three iris species. Use SAS commands to explicitly label these numbers on your printout as "Sample Size," "Mean," and "Standard deviation."

5. Refer to the CATS data. Suppose that the veterinarian wants to see if the treatment had altered kidney function within the first week after surgery. One way to do this is to perform a paired t-test. Calculate a new variable representing ((GFR of the untreated kidney in Week 1) minus (GFR of the surgically-treated kidney in Week 1)) for each of the eight cats. Then, apply PROC UNIVARIATE to those differences. The p-value of the 2-sided t-test is the number marked Pr>|T|. Based on this value, would you decide that the surgery had an effect after one week?
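For readers who want to see what lies behind the Pr>|T| value, here is a minimal Python sketch of the paired t statistic: the mean of the differences divided by its standard error. The differences are invented for illustration and are not the CATS data. The p-value itself would come from a t distribution with n - 1 degrees of freedom, which this sketch does not compute.

```python
import math
import statistics


def paired_t_statistic(differences):
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d."""
    n = len(differences)
    if n < 2:
        raise ValueError("need at least two differences")
    mean_d = statistics.mean(differences)
    sd_d = statistics.stdev(differences)  # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n))


# Hypothetical (untreated minus treated) differences for eight animals.
diffs = [1, 2, 3, 4, 5, 6, 7, 8]
t = paired_t_statistic(diffs)
print(f"t = {t:.3f} on {len(diffs) - 1} degrees of freedom")
```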

6. In some situations, a trimmed mean is used as a measure of the central value of a distribution. To calculate a trimmed mean, equal numbers of the lowest and highest observations are removed from the data, and the remaining observations are averaged. Trimmed means are used in the Olympics. In subjectively-scored events such as diving, the lowest and highest ratings given by judges are thrown out, and the remaining scores are averaged.

Refer to the GRADES dataset. Use a SAS program to find the total number of points earned by each student. You should find that the lowest point total is 63 and the largest is 103. Then, calculate the trimmed mean of the point totals after removing the single lowest point total and the single highest point total, but do not explicitly write 63 and 103 in the program.
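The trimming rule described above can be sketched in Python as follows. This only illustrates the definition, using made-up judges' scores rather than the GRADES data, and the function name `trimmed_mean` and its parameter `k` are hypothetical. The assignment itself must still be done in SAS. Note that the extremes are located by sorting, so no minimum or maximum is hard-coded.

```python
def trimmed_mean(values, k=1):
    """Average the values left after dropping the k lowest and k highest."""
    if len(values) <= 2 * k:
        raise ValueError("not enough values left after trimming")
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]  # drop k from each end
    return sum(kept) / len(kept)


# Hypothetical diving scores from seven judges (NOT the GRADES data).
scores = [7.5, 8.0, 8.5, 8.0, 9.0, 6.5, 8.5]
print(trimmed_mean(scores))  # averages the five middle scores
```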

7. In statistical analyses, it is important to consider the impact that outliers may have on the results. In the book Statistical Methods for the Social Sciences, Third Edition by Alan Agresti and Barbara Finlay (Prentice Hall, Upper Saddle River, NJ, 1997), an outlier is defined as an observation which either exceeds Q3 + 1.5(Q3 - Q1) or is less than Q1 - 1.5(Q3 - Q1), where Q1 and Q3 refer to the first and third quartiles, respectively. In PROC UNIVARIATE with the PLOT option, SAS indicates outliers with the character "0" in the boxplots. Extreme outliers are defined by using 3 rather than 1.5 in the formulas above, and SAS marks them with "*".

Refer to the USEDCARS data. Using the definition above, analyze the prices of the cars and create a new variable which has a value of 1 if the observation is an outlier, 0 otherwise. Print the year, manufacturer, model, price, and the new indicator variable for outliers. (You would need to perform these steps if you wanted to delete the outliers from further analyses.)
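The outlier rule can be illustrated with a short Python sketch. The prices are invented and are not the USEDCARS data, and the function name `outlier_flags` is hypothetical. One assumption deserves emphasis: software packages compute quartiles by different conventions, so the Q1 and Q3 produced here by Python's `statistics.quantiles` may differ slightly from what PROC UNIVARIATE reports for the same data.

```python
import statistics


def outlier_flags(values, multiplier=1.5):
    """Return 1 for values outside [Q1 - m*IQR, Q3 + m*IQR], else 0."""
    # Assumption: "inclusive" quartiles; SAS may use a different convention.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return [1 if (v < lower or v > upper) else 0 for v in values]


# Hypothetical prices in thousands of dollars (NOT the USEDCARS data).
prices = [2, 4, 4, 5, 6, 7, 8, 9, 30]
for price, flag in zip(prices, outlier_flags(prices)):
    print(price, flag)
```

Passing `multiplier=3` applies the stricter rule for extreme outliers.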

8. Refer to the DOGS data. The researcher may want to verify that, before the shampoo treatments were administered, the distributions of white blood cells in the three treatment groups were roughly the same. (For example, we might be concerned about the statistical analyses if the dogs with the lowest initial white blood cell counts all received the highest drug concentration of 2 grams per bottle.) Use PROC UNIVARIATE to produce side-by-side boxplots of the distributions of the white blood cell counts in the three treatment groups. Would you conclude that these distributions were approximately the same across the three groups?

