This topic is not covered in The Little SAS Book.
SAS has many mathematical functions, including log, sine, and square root. SAS also has a function that generates random numbers uniformly distributed between 0 and 1. These numbers are not "random" in the sense that SAS pulls them out of a hat; in fact, SAS uses a complicated formula to produce its so-called "random" numbers. Rather, they are "random" in the sense that we cannot anticipate the numbers SAS will choose, and the numbers it chooses look as if they came from a uniform distribution.
SAS uses a number that you provide, called a seed, to start the generation of the first random number you request. The first random number then becomes the seed for the second; the second, for the third; and so on. If you use the seed 0, SAS bases its random number generation on the time of day, according to the computer's clock, at which you issued the command. You can also use positive integers as seed values.
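As a quick check of how a positive seed behaves, the sketch below runs the same DATA step twice with the same seed and compares the results. The dataset names RUN1 and RUN2 are made up for illustration; if seeds work as described above, PROC COMPARE should report no differences in X.

```sas
/* Two DATA steps that start from the same positive seed */
data run1;
  do i=1 to 5;
    x=ranuni(293317);
    output;
  end;
run;

data run2;
  do i=1 to 5;
    x=ranuni(293317);
    output;
  end;
run;

/* Compare the two datasets value by value */
proc compare base=run1 compare=run2;
run;
```

Replacing the seed with 0 in both steps should, by contrast, give different values on different runs, since the starting point then depends on the clock.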
The following program produces 100 random uniform numbers between 0 and 1 and provides the univariate statistics for the random sample.
data randdata;
  do i=1 to 100;
    x=ranuni(293317);
    output;
  end;
proc univariate plot data=randdata;
  var x;
run;
According to statistical theory, the mean and variance of this sample should be approximately 0.5 and 0.083, respectively. Compare these to the results from PROC UNIVARIATE.
Univariate Procedure
Variable=X
Moments
N 100 Sum Wgts 100
Mean 0.471139 Sum 47.11391
Std Dev 0.281132 Variance 0.079035
Skewness 0.135847 Kurtosis -1.03805
USS 30.0217 CSS 7.824489
CV 59.67071 Std Mean 0.028113
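The theoretical values quoted above are the standard moments of a uniform distribution on (0, 1). A short derivation follows, where X (a symbol introduced here for illustration) denotes a single uniform draw:

```latex
\begin{align*}
E[X] &= \int_0^1 x \, dx = \frac{1}{2}, \\
E[X^2] &= \int_0^1 x^2 \, dx = \frac{1}{3}, \\
\operatorname{Var}(X) &= E[X^2] - \bigl(E[X]\bigr)^2
      = \frac{1}{3} - \frac{1}{4} = \frac{1}{12} \approx 0.083.
\end{align*}
```

Assuming the 100 draws behave like independent observations, the sample mean has standard error sqrt((1/12)/100), or about 0.029, so the observed mean of 0.471 lies roughly one standard error below 0.5.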
The stem-leaf and box plots also indicate that the numbers are uniformly distributed.
 Stem Leaf          #  Boxplot
    9 789          3     |
    9 113444       6     |
    8 67789        5     |
    8 23           2     |
    7 89           2     |
    7 011234       6     |
    6 566777999    9  +-----+
    6 3            1  |     |
    5 578          3  |     |
    5 111112234    9  |     |
    4 5678         4  |  +  |
    4 00023333     8  *-----*
    3 55667789     8  |     |
    3 022344       6  |     |
    2 677          3  +-----+
    2 0023         4     |
    1 5679         4     |
    1 0133         4     |
    0 5556677999  10     |
    0 234          3     |
      ----+----+----+----+
  Multiply Stem.Leaf by 10**-1
So, we can generate random numbers - so what? This actually has several practical applications.
As an example of how random number generation can be used to approximate the answer to a complicated problem, consider the following. The score of a basketball game is Tigers 67, Bears 66 with seconds to play. John, one of the Bears' players, is fouled just before the buzzer and is awarded two free throws. From past experience, we know that John has a 70% chance of making the first free throw. If he makes the shot, he gets some confidence, and his chance of making the second shot is 80%. However, if he misses the first shot, then he gets discouraged and makes the second shot only 40% of the time. What is the probability that John misses both shots, letting the Tigers win?
We know that John makes the first shot 70% of the time. Likewise, a random uniform number falls between 0 and 0.70 about 70% of the time, so we can use one to decide whether John makes the first shot, as follows:
data basket;
  shot_1=ranuni(472321);
  if shot_1<=.70 then result_1='Good';
  else result_1='Miss';
Then, we can decide what happens on the second shot.
  shot_2=ranuni(493929);
  if (result_1='Good' and shot_2<=.80) or
     (result_1='Miss' and shot_2<=.40) then result_2='Good';
  else result_2='Miss';
  if result_1='Miss' and result_2='Miss' then missboth='Yes';
  else missboth='No ';
The dataset BASKET looks like this:
OBS    SHOT_1     RESULT_1    SHOT_2     RESULT_2    MISSBOTH
  1    0.70594      Miss      0.27227      Good         No
In this case, John missed the first shot but made the second shot, sending the game into overtime. Of course, to estimate the probability of missing both shots, we should do this a large number of times. Use a loop, as shown below.
data basket;
  do w=1 to 10000;
    /* First shot: good 70% of the time */
    shot_1=ranuni(472321);
    if shot_1<=.70 then result_1='Good';
    else result_1='Miss';
    /* Second shot: good 80% of the time after a make, 40% after a miss */
    shot_2=ranuni(493929);
    if (result_1='Good' and shot_2<=.80) or
       (result_1='Miss' and shot_2<=.40) then result_2='Good';
    else result_2='Miss';
    if result_1='Miss' and result_2='Miss' then missboth='Yes';
    else missboth='No ';
    output;
  end;
proc freq data=basket;
  tables missboth;
run;
The following frequency table is produced.
                                 Cumulative  Cumulative
MISSBOTH   Frequency   Percent    Frequency     Percent
------------------------------------------------------
No              8243      82.4         8243        82.4
Yes             1757      17.6        10000       100.0
Thus, we estimate that John would miss both shots 17.57% of the time. The true probability, based on statistical theory, is 18%.
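The 18% figure can be checked directly from the probabilities given in the problem: John misses the first shot 30% of the time and, having missed it, misses the second shot 60% of the time. Written out (the event labels are introduced here for illustration):

```latex
\begin{align*}
P(\text{miss both})
  &= P(\text{miss 1st}) \times P(\text{miss 2nd} \mid \text{miss 1st}) \\
  &= (1 - 0.70)(1 - 0.40) \\
  &= 0.30 \times 0.60 = 0.18.
\end{align*}
```

Treating the 10,000 simulated games as independent trials, the estimate has a standard error of about sqrt(0.18 x 0.82 / 10000), or 0.0038, so the simulated 17.57% differs from 18% by only a little more than one standard error.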
In principle, random normally distributed numbers can be generated from random uniform numbers, but this requires an awkward transformation. Instead, the function RANNOR provides a random observation from a normal distribution with mean 0 and standard deviation 1. To create a random observation from a normal distribution with mean m and standard deviation s, multiply the result from the RANNOR function by s, then add m. The following example creates 50 observations from a normal distribution with mean 74 and standard deviation 6.
data normdata;
  do k=1 to 50;
    x=74 + 6*rannor(92641);
    output;
  end;
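To illustrate the kind of transformation that RANNOR lets you avoid, the sketch below builds normal observations from uniform ones using the Box-Muller method. This is one well-known approach, chosen here for illustration; it is not necessarily what SAS does internally. The dataset name BOXMULL and the variables U1, U2, and Z are hypothetical.

```sas
data boxmull;
  pi=constant('pi');
  do k=1 to 50;
    /* Two consecutive uniform draws on (0,1) */
    u1=ranuni(92641);
    u2=ranuni(92641);
    /* Box-Muller: turns two uniforms into one standard normal value */
    z=sqrt(-2*log(u1))*cos(2*pi*u2);
    /* Rescale to mean 74 and standard deviation 6 */
    x=74 + 6*z;
    output;
  end;
  keep k x;
run;

/* The sample mean and standard deviation should be near 74 and 6 */
proc means data=boxmull n mean std;
  var x;
run;
```

With only 50 observations, the sample mean and standard deviation will vary noticeably around 74 and 6.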
SAS also has automatic random number functions for the binomial, Cauchy, exponential, gamma, Poisson, and triangular distributions. However, SAS uses a random uniform number as the basis for all of these calculations.
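A minimal sketch of these functions follows, generating five observations from each distribution. The seed, the dataset name OTHERDIST, and all parameter values are arbitrary choices made for illustration.

```sas
data otherdist;
  do k=1 to 5;
    b=ranbin(51823, 10, 0.3);  /* binomial: 10 trials, success probability 0.3 */
    c=rancau(51823);           /* Cauchy: location 0, scale 1 */
    e=ranexp(51823);           /* exponential: parameter 1 */
    g=rangam(51823, 2);        /* gamma: shape parameter 2 */
    p=ranpoi(51823, 4);        /* Poisson: mean 4 */
    t=rantri(51823, 0.5);      /* triangular on (0,1): mode 0.5 */
    output;
  end;
run;

proc print data=otherdist;
run;
```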
When designing an experiment, it is proper to randomly assign treatments to subjects or experimental units, rather than following some scheme that you devise. SAS has a facility for scrambling integers called PROC PLAN; the results can be used for random treatment assignment.
Suppose that a doctor wants to compare two treatments (drugs and surgery) for a certain disease. A pilot study, or a small version of the experiment, will be conducted to detect any flaws in the experimental protocol and to see if either of the treatments shows promising results. Ten patients will participate in the study, with five patients receiving each treatment. Patients who volunteer for the study must then be randomly assigned to receive only one of the treatments.
Consider the following SAS program:
proc plan seed=79311;
  factors a=10 of 10;
  output out=exptplan;
proc print data=exptplan;
run;
This produces the following output:
OBS     A
  1     6
  2     8
  3     1
  4     5
  5     3
  6     7
  7     4
  8     2
  9     9
 10    10
This selects 10 numbers from the integers 1 through 10 (10 of 10) and arranges them in random order. So what? Look at the rest of the program.
data patients;
  input initials $ @@;
  datalines;
DBS BJH SJB MAH GNJ JNK FGM JAC KWP KCL
;
/* One-to-one merge: each patient is paired with one scrambled integer */
data all;
  merge patients exptplan;
  if a<=5 then treatmnt='Drug';
  else treatmnt='Surg';
proc print data=all;
run;
This produces the following output:
OBS    INITIALS     A    TREATMNT
  1      DBS        6      Surg
  2      BJH        8      Surg
  3      SJB        1      Drug
  4      MAH        5      Drug
  5      GNJ        3      Drug
  6      JNK        7      Surg
  7      FGM        4      Drug
  8      JAC        2      Drug
  9      KWP        9      Surg
 10      KCL       10      Surg
We now have a list of patients and their randomized treatments. The randomization occurred in such a way that we could not predict what treatment each person would receive, and every patient was equally likely to receive drugs or surgery.
You can also choose some, but not all, of the integers in a certain range. The statements below choose 10 numbers at random from 1, 2, 3, ..., 20. In this case, SAS chose (2 6 16 13 3 7 11 10 12 17).
proc plan seed=17377;
  factors a=10 of 20;
  output out=exptplan;
You can ask for several factors and specify whether or not they should be randomized. For example, suppose that you want to conduct a randomized block experiment. There are four textbooks which could be used for a particular subject, and you want to see which books help the students to have higher test scores. You decide to randomly assign one of the books to each of four sections of the class in the fall semester. However, you realize that the composition of the class changes from one semester to the next, so you decide to use three semesters (fall, spring, and summer) as blocks. The semesters are ordered; we can't randomly assign spring to come after summer, for example. However, within each semester, we can randomize the textbooks to the four sections of the class. Consider the following SAS program.
proc plan seed=27079;
  factors semester=3 ordered book=4 of 4;
  output out=bookplan;
proc print data=bookplan;
run;
The ORDERED option tells SAS that we want the numbers 1, 2, and 3 to appear in order as levels of the variable SEMESTER. Within each of levels 1, 2, and 3 of SEMESTER, we randomly arrange the numbers 1 through 4 to represent the books. The dataset BOOKPLAN looks like this:
OBS    SEMESTER    BOOK
  1        1         3
  2        1         2
  3        1         4
  4        1         1
  5        2         1
  6        2         2
  7        2         3
  8        2         4
  9        3         2
 10        3         3
 11        3         1
 12        3         4
If we had decided in advance that the plan would be applied to the sections in sequential order, then the first section in Semester 1 (Fall) would receive Book 3; the second section, Book 2; and so forth. The following lines of SAS code show how to print a nice version of the book assignments.
proc format;
  value semfmt 1='Fall' 2='Spring' 3='Summer';
  value bookfmt 1='Theory' 2='Methods' 3='Applications' 4='Principles';
/* Number the sections 1 through 4 within each semester */
data bookplan;
  set bookplan;
  by semester;
  retain section;
  if first.semester=1 then section=0;
  section=section+1;
proc print data=bookplan noobs;
  var semester section book;
  format semester semfmt. book bookfmt.;
run;
SAS produces the following output.
SEMESTER    SECTION    BOOK
Fall           1       Applications
Fall           2       Methods
Fall           3       Principles
Fall           4       Theory
Spring         1       Theory
Spring         2       Methods
Spring         3       Applications
Spring         4       Principles
Summer         1       Methods
Summer         2       Applications
Summer         3       Theory
Summer         4       Principles