Lesson 14: SAS IML

SAS Manual: Department of Statistics Webpage (www.stat.ufl.edu) => Computing Environment => Online Software Documentation => SAS manuals (UF viewing only) => SAS Procedures Guide => SAS/IML User's Guide.

SAS IML is a general-purpose programming language. Unlike the data-based SAS procedures, IML can be used to write programs that are not data oriented. For example, it is not easy to write a program in base SAS that finds the roots of a cubic equation by a numerical method. In SAS IML, such a program can be written step by step, much as it would be in Fortran, Pascal or C++. The advantages of using SAS IML are:

  1. It handles matrices almost as simply as real numbers. For example, B=INV(A) means that the matrix B is the inverse of the matrix A.
  2. It can be connected to SAS. Thus all SAS procedures can be treated as subroutines for IML.
  3. It is extremely convenient for statistical analysis, because it has built-in distribution functions such as the binomial, z, t, chi-square, F, noncentral t, noncentral chi-square and noncentral F. These functions are not easy to find in other software.
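
As a sketch of the kind of non-data program mentioned above, the following IML code applies Newton's method to one cubic equation. The equation x^3 - 2x - 5 = 0, the module name newton, the starting value, the tolerance and the iteration limit are all assumptions chosen for illustration.

```sas
PROC IML;

/* Newton iteration for f(x) = x**3 - 2x - 5 (an assumed example cubic). */
/* newton is a hypothetical function module, not a built-in routine.     */
START newton(x0, tol, maxit);
   x = x0;  iter = 0;  delta = 1;
   DO WHILE ((ABS(delta) > tol) & (iter < maxit));
      fx    = x##3 - 2#x - 5;      /* value of f at x          */
      dfx   = 3#x##2 - 2;          /* derivative of f at x     */
      delta = fx/dfx;              /* Newton step              */
      x     = x - delta;
      iter  = iter + 1;
   END;
   RETURN (x);
FINISH;

root  = newton(2, 1E-10, 50);
check = root##3 - 2#root - 5;      /* should be close to zero  */
PRINT root[FORMAT=12.8] check;
QUIT;
```

For this cubic, the iteration started at 2 should settle near 2.0946. A different cubic only requires changing the two lines that compute fx and dfx.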

Some commonly used IML Matrix operations

In the following descriptions, capital letters denote matrices or vectors and small letters denote scalars. (This is only for descriptive convenience. SAS IML, like SAS, does not distinguish between capital and small letters.) All of the matrix operations can also be applied to scalars, because a scalar is a 1x1 matrix. (See: SAS/IML User's Guide => Language Reference => Operators.)
Operation              Symbol  Example

addition                 +     A+B
subtraction              -     A-B         
multiplication           *     A*B
Kronecker product        @     A@B (see explanation below)
matrix power             **    A**3=A*A*A
transpose                `     A` (use the backquote `, not the apostrophe ')
maximum                  <>    A<>B (see explanation below)
minimum                  ><    A><B (see explanation below)
element multiplication   #     A#b
element division         /     A/b
element power            ##    A##2 (NOT A*A)

horizontal               ||    A||B = [A B]
concatenation             
vertical                 //    A//B = [A]
concatenation                         [B]

and (logical)            &     IF (a>0) & (a<1)
or  (logical)            |     IF (a>0) | (a<1) 
not (logical)            ^     IF a^=25
less than (or equal to)  < (<=)
greater than (or eq.to)  > (>=)

DO .. TO ;...; END;       DO i=1 TO m; ... ;END;
DO WHILE(...); ...;END;   DO WHILE(i<35);...;END;
DO UNTIL(...); ...;END;   DO UNTIL(i=k);...;END;

IF ... THEN...;ELSE...;   IF a>0 THEN B=A**2;ELSE B=A;
IF ... THEN DO;...;END;   IF a>0 THEN DO; B=A**2;
                             a=a+1;END;

GOTO label                IF X>Y THEN GOTO SKIP;....;
                          SKIP: PRINT 'Why X>Y?'; (Software engineers 
                                discourage the use of this operation)

STOP                      Stops the IML program.
RETURN                    Return to the calling module

Though most of the operations have the same definitions as in linear algebra, some of them need more explanation.
  1. Element operations act on each element of a matrix separately. Thus B=A##2 means $b_{ij} = a_{ij}^2$. This is different from C=A**2, where $c_{ij} = \sum_k a_{ik} a_{kj}$.
  2. Maximum <> (or minimum ><) takes the larger (or smaller) of each pair of corresponding elements of the two matrices. Thus the two matrices must have the same dimensions.
  3. The Kronecker product A@B is a block matrix. Its first block row is $a_{11}B$ $a_{12}B$ ..., its second block row is $a_{21}B$ $a_{22}B$ ..., etc.
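
The three points above can be checked on small matrices. The sketch below uses two 2x2 matrices invented for illustration; the values in the comments were worked out by hand from the definitions above.

```sas
PROC IML;
   A = {1 2, 3 4};          /* example matrices, chosen only for illustration */
   B = {4 1, 0 5};

   sqelem = A##2;           /* elementwise square:  {1 4, 9 16}   */
   sqmat  = A**2;           /* matrix product A*A:  {7 10, 15 22} */
   maxab  = A<>B;           /* elementwise maximum: {4 2, 3 5}    */
   minab  = A><B;           /* elementwise minimum: {1 1, 0 4}    */
   K      = A@I(2);         /* Kronecker product, a 4x4 matrix:
                               {1 0 2 0, 0 1 0 2, 3 0 4 0, 0 3 0 4} */

   PRINT sqelem sqmat, maxab minab, K;
QUIT;
```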

Here are some of the basic matrix functions, which have the structure: function name (arguments). (See: SAS/IML User's Guide => Language Reference => Statements, Functions, and Subroutines.)
Function            Symbol or Example

determinant              X=DET(A)

diagonal                 X=DIAG(A)

eigenvalues              M=EIGVAL(A)

generalized inverse      G=GINV(A)

identity matrix          I=I(n); nxn identity matrix

inverse matrix           B=INV(A)

matrix of identical      B=J(r,c,x); rxc matrix 
values                   with identical entry x

number of rows           NR=NROW(A)

number of columns        NC=NCOL(A)

. (missing value)        x=. sets x to the missing value, printed as .
                         J(2,3,.) is a 2x3 matrix with . values

rank values              R=RANK(A); ties are broken arbitrarily,
                           the smallest number has rank 1. 

rank with ties averaged  R=RANKTIE(A); average score for ties

sum                      s=SUM(A); sum of all the entries

sum of squares           ss=SSQ(A); sum of the square of
                                    all the entries

trace                    t=TRACE(A)

eigenvalues and          CALL EIGEN(M,E,A); M is the column vector of
eigenvectors of a        eigenvalues and E is the matrix of eigenvectors,
symmetric matrix         i.e., A*E=E*DIAG(M).
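
A few of these functions can be tried on a small symmetric matrix. The sketch below uses a 2x2 matrix chosen only for illustration, and the commented values were worked out by hand. The sign and scaling of the eigenvectors in E are not stated because they depend on the routine.

```sas
PROC IML;
   A = {2 1, 1 2};              /* symmetric example matrix (an assumption) */

   d  = DET(A);                 /* determinant: 3                           */
   t  = TRACE(A);               /* trace: 4                                 */
   Dg = DIAG(A);                /* diagonal part of A: {2 0, 0 2}           */
   s  = SUM(A);                 /* sum of all entries: 6                    */
   ss = SSQ(A);                 /* sum of squared entries: 10               */

   CALL EIGEN(M, E, A);         /* eigenvalues 3 and 1 in M, vectors in E   */
   resid = A*E - E*DIAG(M);     /* should be numerically zero               */

   PRINT d t s ss, Dg, M E, resid;
QUIT;
```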

Here are some scalar functions:
Function                 Usage and Example

absolute value           ABS(x)
 
square root              SQRT(x)

integer truncation       INT(x); INT(5.91)=5

modulo                   MOD(x,m); MOD(56, 10)=6

sine, cosine, tangent    SIN(x), COS(x), TAN(x)

arc-sine cosine tangent  ARSIN(x), ARCOS(x), ATAN(x) 

exponential              EXP(x)

natural logarithm        LOG(x)

gamma function           GAMMA(x); e.g., GAMMA(6)=120, GAMMA(0.5)=1.772

normal distribution      PROBNORM(x); PROBNORM(1.96)=0.975

chi-square distribution  PROBCHI(x,df,nc); nc=noncentrality parameter
                         PROBCHI(11.1,5,0)=0.95 

t-distribution           PROBT(x,df,nc); PROBT(1.833,9,0)=0.95

F-distribution           PROBF(x,ndf,ddf,nc)
                         PROBF(3.63,4,9,0)=0.95

inverse normal           PROBIT(p); PROBIT(0.975)=1.96

uniform random number    a=UNIFORM(seed); seed must be an integer
                          (use 3 to 6 digits). The seed is
                          redefined by the function automatically.

normal N(0, 1) random    a=NORMAL(seed)
deviate
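
The sketch below shows how these functions might be combined in a small calculation. The observed t statistic, the noncentrality value and the seed are made-up numbers. The functions act elementwise, so a vector argument returns a vector.

```sas
PROC IML;
   /* A vector of z values gives a vector of probabilities */
   z  = {-1.96 0 1.96};
   pz = PROBNORM(z);                       /* about 0.025 0.5 0.975 */

   /* Two-sided p-value for an observed t statistic (made-up values) */
   tobs = 2.5;  df = 9;
   pval = 2#(1 - PROBT(ABS(tobs), df));

   /* Upper tail probability beyond 3.63 for a noncentral F with 4 and 9
      degrees of freedom; the noncentrality 10 is an assumed value      */
   pwr = 1 - PROBF(3.63, 4, 9, 10);

   /* Five uniform and five N(0,1) numbers: a 5x1 seed matrix returns
      a 5x1 result                                                      */
   u = UNIFORM(J(5,1,12345));
   e = NORMAL(J(5,1,12345));

   PRINT pz[FORMAT=6.3], pval[FORMAT=6.4] pwr[FORMAT=6.4],
         u[FORMAT=6.4] e[FORMAT=7.4];
QUIT;
```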

The following example shows how basic matrix operations are handled by IML. Special attention should be paid to the way a matrix or a vector is defined, and to the PRINT statements, which show how to write output. SAS IML error information is far from satisfactory: an error message may not appear for a term that is undefined because of a misspelling. Thus, it is very helpful to print some intermediate items while writing a program.

Example 1 (input)
PROC IML;

/* DO loop and IF */

   sum=0; m=5; DO i=1 TO m; 
   IF INT(i/2)#2 ^= i THEN sum=sum+i##2; END; 
   PRINT 'Output for 1+9+25', sum;                 

/* Define vectors and their products */

   v={1, 2, 3, 4}; h={1 2 0 1}; p=v*h; m=h*v;      
   PRINT v[FORMAT=3.1] h[FORMAT=3.1] p[FORMAT=3.0] 
         m[FORMAT=3.0];                             

/* Matrix inverse and product */

   a=J(4,4,1.0);a[2,2]=2;a[3,3]=4;a[4,4]=3;
   b=INV(a); c=a*b; d={1.2 2, 3 4, 5 6.7};
   PRINT a[FORMAT=3.1] b[FORMAT=5.2] c[FORMAT=4.1] d[FORMAT=3.1];

/* Note: All the numbers can be replaced by defined symbols. For example
   a=J(r1,c1,const), when r1, c1, and const are well defined. */
                                                   
/* Combine matrices */

   av=a||v;bh=b//h;
   PRINT av[FORMAT=3.1] bh[FORMAT=5.2];             

/* Submatrices  */

   av1=av[1:3,2:4]; av2=av[1:3,]; av3=av[,2:4]; 
   av4=av[{1 3},{2 4 5}];
   PRINT av1[FORMAT=3.1] av2[FORMAT=3.1] av3[FORMAT=3.1] 
   av4[FORMAT=3.1];

/* Note: All the numbers can be replaced by defined symbols. For example
   av4=av[{r1 r2},{col2 col4 col3}], when r1, r2, etc. are well defined. */
                                                    
/* Some matrix functions */

   nc=NCOL(av); nr=NROW(av); sumd=SUM(d); 
   meand=sumd/(NCOL(d)#NROW(d)); ssd=SSQ(a);  /* ssd is the sum of squares of a, not of d */
   PRINT nc nr sumd meand ssd;                    


Example 1 (output)

Output for 1+9+25
      SUM
       35

   V          H                 P              M
 1.0   1.0 2.0 0.0 1.0     1   2   0   1       9
 2.0                       2   4   0   2
 3.0                       3   6   0   3
 4.0                       4   8   0   4

       A                     B                        C                D
1.0 1.0 1.0 1.0    2.83 -1.00 -0.33 -0.50    1.0  0.0  0.0  0.0    1.2 2.0
1.0 2.0 1.0 1.0   -1.00  1.00  0.00  0.00   -0.0  1.0  0.0  0.0    3.0 4.0
1.0 1.0 4.0 1.0   -0.33  0.00  0.33  0.00   -0.0  0.0  1.0  0.0    5.0 6.7
1.0 1.0 1.0 3.0   -0.50  0.00  0.00  0.50   -0.0  0.0  0.0  1.0

        AV                       BH
1.0 1.0 1.0 1.0 1.0    2.83 -1.00 -0.33 -0.50
1.0 2.0 1.0 1.0 2.0   -1.00  1.00  0.00  0.00
1.0 1.0 4.0 1.0 3.0   -0.33  0.00  0.33  0.00
1.0 1.0 1.0 3.0 4.0   -0.50  0.00  0.00  0.50
                       1.00  2.00  0.00  1.00

    AV1              AV2               AV3           AV4
1.0 1.0 1.0   1.0 1.0 1.0 1.0 1.0   1.0 1.0 1.0   1.0 1.0 1.0
2.0 1.0 1.0   1.0 2.0 1.0 1.0 2.0   2.0 1.0 1.0   1.0 1.0 3.0
1.0 4.0 1.0   1.0 1.0 4.0 1.0 3.0   1.0 4.0 1.0
                                    1.0 1.0 3.0

 NC   NR   SUMD  MEAND   SSD
  5    4   21.9   3.65    42

IML begins a subroutine (module) with

START name;

and ends it with

FINISH;

To call the subroutine, write

RUN name;

where name is the name of the subroutine.

Here is a simple example.

Example 2: A simple function F(x)=ax^2+b


PROC IML;
START F(x,a,b,y);
Y=a#x#x+b;
FINISH;
RUN F(2,2,5,y1);x=5; RUN F(x,2,-20,y2);
PRINT y1 y2;
RUN;

The output is y1=13 and y2=30. The arguments of a module can also be matrices.
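
As a sketch of the matrix case, the same module can be called with a 2x2 argument, because # multiplies element by element. The matrix x and the function-module variant G are assumptions added for illustration. G returns its result through RETURN instead of through an output argument.

```sas
PROC IML;
START F(x,a,b,y);
   y = a#x#x + b;            /* elementwise, so x may be a matrix      */
FINISH;

/* G is a hypothetical function-module version of F */
START G(x,a,b);
   RETURN (a#x#x + b);
FINISH;

x = {1 2, 3 4};              /* example matrix, chosen for illustration */
RUN F(x, 2, 5, y);           /* y = {7 13, 23 37}                       */
z = G(x, 2, 5);              /* same values as y                        */
PRINT y z;
QUIT;
```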

Example 3: Regression analysis for Y=Xb+e


PROC IML;  

/* The regression subroutine    */

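START reg(X,Y,B,mse);
  n=NROW(X); k=NCOL(X);     /* n observations, k parameters           */
  XPX=X`*X;                 /* cross-product matrix X`X               */
  XPY=X`*Y;                 /* cross-product vector X`Y               */
  IXPX=INV(XPX);
  B=IXPX*XPY;               /* least squares estimate of b            */

  YHAT=X*B;                 /* fitted values                          */
  e=Y-YHAT;                 /* residuals                              */

  mse=SSQ(e)/(n-k);         /* residual mean square                   */

FINISH;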

/* The main program  */

  X={39, 43, 21, 64, 57, 47, 28, 75, 34, 52};
  C=J(10,1,1);
  T=C||X;
  Y={65, 78, 52, 82, 92, 89, 73, 98, 56, 75};
  RUN REG(T,Y,BETA,mse);

  PRINT BETA[format=6.2] mse[format=6.2];

RUN;

The output from this program is
       BETA       MSE
      40.78     75.75
       0.76  

This result can be confirmed with PROC GLM:


DATA NEW;INPUT X Y @@;
cards;
39 65 43 78 21 52 64 82 57 92 47 89 28 73 75 98 
34 56 52 75
;
PROC GLM;MODEL Y=X;
RUN;

As mentioned at the beginning, IML and SAS procedures can be linked. Here is an example of how this can be done.

Example 4 (input)

     

  %MACRO imlsub(in=,y=,out=);

  PROC IML;  

/* Read SAS data as an IML matrix rm 
   Each row of rm contains three entries [&y p80 p90] */

  USE &in; READ ALL VAR{&y p80 p90} INTO rm; 

  n=NROW(rm); d=J(n,1,0); c=d;
  DO I=1 TO n;
    d[i]=rm[i,3]/rm[i,1];
    c[i]=rm[i,3]-rm[i,2];
  END;

/* Create a new matrix with input and output d and c */

  x=rm||d||c;

  CREATE &out FROM x; APPEND FROM x;

  %MEND imlsub;

/* The main SAS data program */

  DATA n1;INPUT city $ state $ area p80 p90 @@;cards;

  Akron OH 62.1 237 223  Atlanta GA 131 425 394  
  Dallas TX 333 905 1000 Denver CO 107 493 467  
  NY NY 302 7072 7322    Memphis TN 264 646 610 
  Tampa FL 58.7 82 125  Raleigh NC 83.4 150 208 
  ;
/* Call the IML subroutine */

  %imlsub(in=n1,y=area,out=n2);

/* The IML output in main program. */

  PROC PRINT DATA=n2;                                    
  TITLE 'Pure output from IML, numerical values only.'; 

  DATA n3;SET n1;DROP area p80 p90;

/* The IML output and SAS program joined. */

  DATA n4; MERGE n3 n2;
  RENAME COL1=area COL4=density;
  PROC PRINT;                                    
  TITLE 'Output from IML merged with original data.';

RUN;

Example 4 (output)


     Pure output from IML, numerical values only.             
                                        
 OBS     COL1    COL2    COL3      COL4     COL5

  1      62.1     237     223     3.5910     -14
  2     131.0     425     394     3.0076     -31
  3     333.0     905    1000     3.0030      95
  4     107.0     493     467     4.3645     -26
  5     302.0    7072    7322    24.2450     250
  6     264.0     646     610     2.3106     -36
  7      58.7      82     125     2.1295      43
  8      83.4     150     208     2.4940      58

     Output from IML merged with original data.              
                                        
 CITY       STATE     AREA    COL2    COL3    DENSITY    COL5

 Akron       OH       62.1     237     223     3.5910     -14
 Atlanta     GA      131.0     425     394     3.0076     -31
 Dallas      TX      333.0     905    1000     3.0030      95
 Denver      CO      107.0     493     467     4.3645     -26
 NY          NY      302.0    7072    7322    24.2450     250
 Memphis     TN      264.0     646     610     2.3106     -36
 Tampa       FL       58.7      82     125     2.1295      43
 Raleigh     NC       83.4     150     208     2.4940      58
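
The DO loop in the macro fills d and c one row at a time. Because / and - act element by element, the same columns can probably be computed without a loop. The sketch below assumes a matrix rm holding the first three data rows of this example, typed in directly so that it runs on its own.

```sas
PROC IML;
   /* Rows are [area p80 p90] for Akron, Atlanta and Dallas */
   rm = {62.1 237 223, 131 425 394, 333 905 1000};

   d = rm[,3]/rm[,1];        /* p90 divided by area, row by row  */
   c = rm[,3]-rm[,2];        /* p90 minus p80, row by row        */
   x = rm||d||c;             /* same layout as in the macro      */

   PRINT x[FORMAT=9.4];
QUIT;
```

The printed rows should agree with the first three observations of the output above.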

If the renaming is done inside the macro, the output variables will carry the new names instead of COL#. In this example, we can replace 'CREATE &out FROM x; APPEND FROM x;' by

CREATE preout FROM x; APPEND FROM x;
DATA &out;SET preout;
RENAME COL1=area COL4=density;
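
Another way to avoid the COL# names is the COLNAME= option of the CREATE statement, which names the variables when the data set is written. The sketch below is a standalone illustration; the data set name demo and the matrix names are hypothetical, and the two rows are copied from the example data.

```sas
PROC IML;
   x     = {62.1 237 223, 131 425 394};    /* two rows of [area p80 p90] */
   names = {"area" "p80" "p90"};           /* one name per column of x   */

   CREATE demo FROM x[COLNAME=names];      /* demo is a hypothetical name */
   APPEND FROM x;
   CLOSE demo;
QUIT;

PROC PRINT DATA=demo;
RUN;
```

With this approach no separate DATA step with RENAME is needed, at the cost of listing a name for every column.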

Special attention should be paid to how IML data are treated in SAS: the columns are labeled COL#. The following example further illustrates this point. It is a module that takes a random sample from an input data set.

Example 5 (input)

/* The macro sample takes a random sample of the input data:

   in      = input data set name
   popsize = input data (population) size
   seed    = input seed, use 3 to 6 digit integer
   sampsize= input required sample size 
   out     = output data set                                  */
            

%MACRO sample(in=,popsize=,seed=,sampsize=,out=);

  %MACRO one(psize=,s=,ssize=,rarray=);

    PROC IML;
    N=&psize;SEED1=&s;sn=&ssize;
    IF sn>n THEN DO; 
       PRINT '*** ERROR: Sample size > population size.';
       STOP;END;
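    A=J(N,1,1);
    DO i=1 to N;A[i]=UNIFORM(SEED1);END;   /* one uniform number per record */
    R=RANK(A);                             /* ranks are a random ordering of 1..N */
    DO i=1 to N;IF R[i]>sn THEN r[i]=0; ELSE r[i]=1;END;  /* flag ranks 1..sn */
    CREATE &rarray FROM R;APPEND FROM R;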

  %MEND one;
     
  %one(psize=&popsize,s=&seed,ssize=&sampsize,rarray=n1);

  DATA &out; MERGE n1 &in ;
  IF COL1=0 THEN DELETE;DROP COL1;  

%MEND sample;

/* The main program */

DATA cities;INPUT city $ state $ area p80 p90 @@;cards;

Akron OH 62.1 237 223  Atlanta GA 131 425 394  
Dallas TX 333 905 1000 Denver CO 107 493 467  
NY NY 302 7072 7322    Memphis TN 264 646 610 
Tampa FL 58.7 82 125  Raleigh NC 83.4 150 208 
;
   
%sample(in=cities,popsize=8,seed=12345,sampsize=3,out=rcities);
PROC PRINT DATA=rcities;TITLE 'First sample of size 3';

%sample(in=cities,popsize=8,seed=54321,sampsize=3,out=rcities);
PROC PRINT DATA=rcities;TITLE 'Second sample of size 3';

%sample(in=cities,popsize=8,seed=54321,sampsize=10,out=rcities);
PROC PRINT DATA=rcities;TITLE 'Third sample of size 10';

RUN;

Note that, in the output, the STOP statement stops only the IML program. In the third call, IML detects the input error and stops, but the main program continues and prints the sample left over from the previous call.

Example 5 (output)

              First sample of size 3                        
         
OBS    CITY      STATE     AREA     P80     P90

 1     Denver     CO      107.0     493     467
 2     NY         NY      302.0    7072    7322
 3     Tampa      FL       58.7      82     125
                      
              Second sample of size 3                        
                                         
OBS     CITY      STATE     AREA    P80    P90

 1     Memphis     TN      264.0    646    610
 2     Tampa       FL       58.7     82    125
 3     Raleigh     NC       83.4    150    208

        *** ERROR: Sample size > population size.

               Third sample of size 10                       
                                          
 OBS     CITY      STATE     AREA    P80    P90

  1     Memphis     TN      264.0    646    610
  2     Tampa       FL       58.7     82    125
  3     Raleigh     NC       83.4    150    208
 
