Lesson +: Random Numbers

This topic is not covered in The Little SAS Book.

Random uniform numbers

SAS has many mathematical functions, including log, sine, square root, etc. SAS also has a function, RANUNI, to generate random numbers which are uniformly distributed between 0 and 1. These numbers are not "random" in the sense that SAS pulls them out of a hat; indeed, SAS uses a complicated but deterministic formula to produce its so-called "random" numbers. Instead, they are "random" in the sense that we cannot easily anticipate the numbers SAS will choose, and the numbers it chooses look as if they came from a uniform distribution.

SAS uses a number that you provide, called a seed, to start its random number generation. The seed determines the first random number; the first random number then becomes the seed for the second; the second, for the third, etc. If you use the seed number 0, then SAS will begin its random number generation based on the time of day, according to the computer clock, at which the program runs, so the results differ from run to run. You can also use positive integers as seed values; a given positive seed produces the same sequence every time.
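
The role of the seed can be checked with a short sketch like the one below. It assumes nothing beyond RANUNI and PROC COMPARE; the dataset names RUN_A and RUN_B are made up for illustration. Both DATA steps use the same positive seed, so they should contain identical values.

```sas
/* Two DATA steps that start from the same positive seed */
data run_a;
 do i=1 to 5;
  x=ranuni(293317);
  output;
 end;
run;

data run_b;
 do i=1 to 5;
  x=ranuni(293317);
  output;
 end;
run;

/* PROC COMPARE reports any values that differ between the two datasets */
proc compare base=run_a compare=run_b;
run;
```

Changing the seed in one of the two steps (or using 0) should make PROC COMPARE report unequal values.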

The following program produces 100 random uniform numbers between 0 and 1 and provides the univariate statistics for the random sample.

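data randdata;
 do i=1 to 100;
  x=ranuni(293317);   /* uniform(0,1) draw; 293317 is the seed */
  output;             /* write one observation per draw */
 end;
run;

/* summary statistics plus stem-and-leaf and box plots */
proc univariate plot data=randdata;
 var x;
run;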

According to statistical theory, the mean and variance of this sample should be approximately 0.5 and 0.083, respectively. Compare these to the results from PROC UNIVARIATE.

Univariate Procedure

Variable=X

                 Moments

 N               100  Sum Wgts        100
 Mean       0.471139  Sum        47.11391
 Std Dev    0.281132  Variance   0.079035
 Skewness   0.135847  Kurtosis   -1.03805
 USS         30.0217  CSS        7.824489
 CV         59.67071  Std Mean   0.028113
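
For reference, the theoretical values quoted above follow from the uniform density on (0, 1). The short derivation below is a standard textbook calculation added here as a check; the last line is the standard error of the mean for a sample of 100.

```latex
\begin{align*}
E[X] &= \int_0^1 x\,dx = \frac{1}{2}, \\
E[X^2] &= \int_0^1 x^2\,dx = \frac{1}{3}, \\
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12} \approx 0.0833, \\
\operatorname{SE}(\bar{X}) &= \sqrt{\frac{1/12}{100}} \approx 0.0289 .
\end{align*}
```

The sample mean of 0.471 is about one standard error below 0.5, and the sample variance of 0.079 is close to 0.0833.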

The stem-leaf and box plots are also consistent with the numbers being uniformly distributed.

   Stem Leaf                     #  Boxplot
      9 789                      3     |
      9 113444                   6     |
      8 67789                    5     |
      8 23                       2     |
      7 89                       2     |
      7 011234                   6     |
      6 566777999                9  +-----+
      6 3                        1  |     |
      5 578                      3  |     |
      5 111112234                9  |     |
      4 5678                     4  |  +  |
      4 00023333                 8  *-----*
      3 55667789                 8  |     |
      3 022344                   6  |     |
      2 677                      3  +-----+
      2 0023                     4     |
      1 5679                     4     |
      1 0133                     4     |
      0 5556677999              10     |
      0 234                      3     |
        ----+----+----+----+
    Multiply Stem.Leaf by 10**-1

So, we can generate random numbers - so what? This actually has several practical applications.

As an example of how random number generation can be used to approximate the answer to a complicated problem, consider the following. The score of a basketball game is Tigers 67, Bears 66 with seconds to play. John, one of the Bears' players, is fouled just before the buzzer and is awarded two free throws. From past experience, we know that John has a 70% chance of making the first free throw. If he makes the shot, he gets some confidence, and his chance of making the second shot is 80%. However, if he misses the first shot, then he gets discouraged and makes the second shot only 40% of the time. What is the probability that John misses both shots, letting the Tigers win?

We know that John makes the first shot 70% of the time. Likewise, a random uniform number falls at or below .70 with probability 70%, so we can use this to decide whether John makes the first shot, as follows:

data basket;
 shot_1=ranuni(472321);
 if shot_1<=.70 then result_1='Good';
 else result_1='Miss';

Then, we can decide what happens on the second shot.

 shot_2=ranuni(493929);
 if (result_1='Good' and shot_2<=.80) or
    (result_1='Miss' and shot_2<=.40)
   then result_2='Good';
  else result_2='Miss';
 if result_1='Miss' and result_2='Miss' then missboth='Yes';
  else missboth='No ';

The dataset BASKET looks like this:

OBS     SHOT_1    RESULT_1     SHOT_2    RESULT_2    MISSBOTH
 1     0.70594      Miss      0.27227      Good         No

In this case, John missed the first shot but made the second shot, sending the game into overtime. Of course, to estimate the probability of missing both shots, we should do this a large number of times. Use a loop, as shown below.

data basket;
 do w=1 to 10000;                       /* 10,000 simulated pairs of free throws */
  shot_1=ranuni(472321);
  if shot_1<=.70 then result_1='Good';  /* first shot: 70% chance */
   else result_1='Miss';
  shot_2=ranuni(493929);
  /* second shot: 80% after a make, 40% after a miss */
  if (result_1='Good' and shot_2<=.80) or
     (result_1='Miss' and shot_2<=.40)
    then result_2='Good';
   else result_2='Miss';
  if result_1='Miss' and result_2='Miss' then missboth='Yes';
   else missboth='No ';
  output;
 end;
run;

proc freq data=basket;
 tables missboth;
run;

The following frequency table is produced.

                                Cumulative  Cumulative
MISSBOTH   Frequency   Percent   Frequency    Percent
------------------------------------------------------
No             8243      82.4        8243       82.4
Yes            1757      17.6       10000      100.0

Thus, we estimate that John would miss both shots 17.57% of the time. The true probability, based on statistical theory, is 18%.
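
The 18% figure comes from multiplying the chance of missing the first shot by the chance of missing the second shot given a first miss. The calculation below is added as a check; the second line is the usual binomial standard error for an estimate based on 10,000 trials.

```latex
\begin{align*}
P(\text{miss both}) &= P(\text{miss 1st}) \cdot P(\text{miss 2nd} \mid \text{miss 1st})
  = (1 - 0.70)(1 - 0.40) = 0.30 \times 0.60 = 0.18, \\
\operatorname{SE} &= \sqrt{\frac{0.18 \times 0.82}{10000}} \approx 0.0038 .
\end{align*}
```

The simulated estimate of 0.1757 is therefore roughly one standard error away from the true value.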

Random numbers from other distributions

Normally distributed random numbers could be generated from random uniform numbers, but this would require an awkward transformation. Instead, the function RANNOR provides a random observation from a normal distribution with mean 0 and standard deviation 1. To create a random observation from a normal distribution with mean m and standard deviation s, multiply the result from the RANNOR function by s, then add m. The following example creates 50 observations from a normal distribution with mean 74 and standard deviation 6.

data normdata;
 do k=1 to 50;
 x=74 + 6*rannor(92641);
 output;
 end;
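
One way to check the result, sketched here with PROC MEANS on the NORMDATA dataset created above, is to print the sample mean and standard deviation. With only 50 observations they should be near, but not exactly equal to, 74 and 6.

```sas
/* Sample size, mean, and standard deviation of the simulated values */
proc means data=normdata n mean std;
 var x;
run;
```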

SAS also has automatic random number functions for the binomial, Cauchy, exponential, gamma, Poisson, and triangular distributions. However, SAS uses a random uniform number as the basis for all of these calculations.
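
A minimal sketch of these functions follows. The dataset name OTHERDIST, the seed, and all parameter values are arbitrary choices for illustration, not recommendations.

```sas
data otherdist;
 do k=1 to 5;
  b=ranbin(51423,10,0.3);   /* binomial: 10 trials, success probability 0.3 */
  c=rancau(51423);          /* Cauchy: location 0, scale 1 */
  e=ranexp(51423);          /* exponential: parameter 1 */
  g=rangam(51423,2);        /* gamma: shape parameter 2 */
  p=ranpoi(51423,4);        /* Poisson: mean 4 */
  t=rantri(51423,0.5);      /* triangular on (0,1): mode at 0.5 */
  output;
 end;
run;

proc print data=otherdist;
run;
```

As with RANNOR, other locations and scales can be obtained by multiplying and adding constants.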

PROC PLAN

When designing an experiment, it is proper to randomly assign treatments to subjects or experimental units, rather than following some scheme that you devise. SAS has a facility for scrambling integers called PROC PLAN; the results can be used for random treatment assignment.

Suppose that a doctor wants to compare two treatments (drugs and surgery) for a certain disease. A pilot study, or a small version of the experiment, will be conducted to detect any flaws in the experimental protocol and to see if either of the treatments shows promising results. Ten patients will participate in the study, with five patients receiving each treatment. Patients who volunteer for the study must then be randomly assigned to receive only one of the treatments.

Consider the following SAS program:

proc plan seed=79311;
 factors a=10 of 10;
 output out=exptplan;

proc print data=exptplan;
run;

This produces the following output:

OBS     A

  1     6
  2     8
  3     1
  4     5
  5     3
  6     7
  7     4
  8     2
  9     9
 10    10

This selects 10 numbers from the integers 1 through 10 (10 of 10) and arranges them in random order. So what? Look at the rest of the program.

data patients;
 input initials $ @@;
 datalines;
 DBS BJH SJB MAH GNJ
 JNK FGM JAC KWP KCL
;

data all;
 merge patients exptplan;
 if a<=5 then treatmnt='Drug';
  else treatmnt='Surg';

proc print data=all;
run;

This produces the following output:

OBS    INITIALS     A    TREATMNT

  1      DBS        6      Surg
  2      BJH        8      Surg
  3      SJB        1      Drug
  4      MAH        5      Drug
  5      GNJ        3      Drug
  6      JNK        7      Surg
  7      FGM        4      Drug
  8      JAC        2      Drug
  9      KWP        9      Surg
 10      KCL       10      Surg

We now have a list of patients and their randomized treatments. The randomization occurred in such a way that we could not predict what treatment each person would receive, and every patient was equally likely to receive drugs or surgery.
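
For comparison, a similar assignment can be sketched with RANUNI and PROC SORT instead of PROC PLAN. This is an alternative illustration, not the method used above; the dataset name SHUFFLE, the variable U, and the seed are assumptions made for the example, and the assignments it produces will differ from those shown above.

```sas
data patients;
 input initials $ @@;
 datalines;
 DBS BJH SJB MAH GNJ
 JNK FGM JAC KWP KCL
;

/* Attach a random uniform number to each patient */
data shuffle;
 set patients;
 u=ranuni(79311);
run;

/* Sorting by the random number puts the patients in random order */
proc sort data=shuffle;
 by u;
run;

/* The first five patients in the random order receive drugs */
data shuffle;
 set shuffle;
 if _n_<=5 then treatmnt='Drug';
  else treatmnt='Surg';
run;

proc print data=shuffle;
run;
```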

You can also choose some, but not all, of the integers in a certain range. The statements below choose 10 numbers at random from 1, 2, 3, ..., 20. In this case, SAS chose (2 6 16 13 3 7 11 10 12 17).

proc plan seed=17377;
 factors a=10 of 20;
 output out=exptplan;

You can ask for several factors and specify whether or not they should be randomized. For example, suppose that you want to conduct a randomized block experiment. There are four textbooks which could be used for a particular subject, and you want to see which books help the students to have higher test scores. You decide to randomly assign one of the books to each of four sections of the class in the fall semester. However, you realize that the composition of the class changes from one semester to the next, so you decide to use three semesters (fall, spring, and summer) as blocks. The semesters are ordered; we can't randomly assign spring to come after summer, for example. However, within each semester, we can randomize the textbooks to the four sections of the class. Consider the following SAS program.

proc plan seed=27079;
 factors semester=3 ordered book=4 of 4;
 output out=bookplan;
proc print data=bookplan;
run;

The ORDERED option tells SAS that we want the numbers 1, 2, and 3 to appear in order as levels of the variable SEMESTER. Within each of levels 1, 2, and 3 of SEMESTER, we randomly arrange the numbers 1 through 4 to represent the books. The dataset BOOKPLAN looks like this:

OBS    SEMESTER    BOOK

  1        1         3
  2        1         2
  3        1         4
  4        1         1
  5        2         1
  6        2         2
  7        2         3
  8        2         4
  9        3         2
 10        3         3
 11        3         1
 12        3         4

If we had decided in advance that the plan would be applied to the sections in sequential order, then the first section in Semester 1 (Fall) would receive Book 3; the second section, Book 2; and so forth. The following lines of SAS code show how to print a nice version of the book assignments.

proc format;
 value semfmt 1='Fall' 2='Spring' 3='Summer';
 value bookfmt 1='Theory' 2='Methods' 3='Applications'
  4='Principles';

data bookplan;
 set bookplan;
 by semester;
 retain section;                       /* carry SECTION from one observation to the next */
 if first.semester=1 then section=0;   /* restart the count in each semester */
 section=section+1;

proc print data=bookplan noobs;
 var semester section book;
 format semester semfmt. book bookfmt.;
run;

SAS produces the following output.

SEMESTER    SECTION    BOOK

 Fall          1       Applications
 Fall          2       Methods
 Fall          3       Principles
 Fall          4       Theory
 Spring        1       Theory
 Spring        2       Methods
 Spring        3       Applications
 Spring        4       Principles
 Summer        1       Methods
 Summer        2       Applications
 Summer        3       Theory
 Summer        4       Principles

