SAS Manual: Department of Statistics webpage (www.stat.ufl.edu) => Computing Environment => Online Software Documentation => SAS manuals (UF viewing only) => SAS Procedures Guide => SAS/IML User's Guide.
SAS IML is a general programming language. Unlike the data-oriented SAS procedures, IML can be used to write programs that are not data oriented. For example, it is not easy to write an ordinary SAS program that finds the roots of a cubic equation by a numerical method. In SAS IML, such a program can be written step by step, just as in Fortran, Pascal or C++. The advantages of using SAS IML are:
Operation                   Symbol   Example
addition                    +        A+B
subtraction                 -        A-B
matrix multiplication       *        A*B
Kronecker product           @        A@B (see explanation below)
matrix power                **       A**3 = A*A*A
transpose                   `        A` or T(A) (careful about the direction of the backquote `)
elementwise maximum         <>       A<>B (see explanation below)
elementwise minimum         ><       A><B (see explanation below)
elementwise multiplication  #        A#B
elementwise division        /        A/B
elementwise power           ##       A##2 (squares each entry; NOT A*A = A**2)
horizontal concatenation    ||       A||B = [A B]
vertical concatenation      //       A//B = [A]
                                            [B]
and (logical)               &        IF (a>0) & (a<1)
or (logical)                |        IF (a>0) | (a<1)
not (logical)               ^        IF a^=25
less than (or equal to)     < (<=)
greater than (or equal to)  > (>=)

Statement                            Example
DO ... TO ...; ...; END;             DO i=1 TO m; ...; END;
DO WHILE(...); ...; END;             DO WHILE(i<35); ...; END;
DO UNTIL(...); ...; END;             DO UNTIL(i=k); ...; END;
IF ... THEN ...; ELSE ...;           IF a>0 THEN B=A**2; ELSE B=A;
IF ... THEN DO; ...; END;            IF a>0 THEN DO; B=A**2; a=a+1; END;
GOTO label                           IF X>Y THEN GOTO SKIP; ...;
                                     SKIP: PRINT 'Why X>Y?';
                                     (software engineers discourage the use of GOTO)
STOP                                 Stops the IML program.
RETURN                               Returns to the calling module.
Although most of these operations have the same definitions as in linear algebra, some of them need more explanation.
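The operators @, <> and >< can be illustrated with two small matrices. A and B below are made up for this sketch. A@B replaces each entry a[i,j] of A by the block a[i,j]*B. A<>B and A><B take the larger or the smaller entry, position by position. Note that <> is not the "not equal" operator of some other languages; in IML, "not equal" is ^=. The same symbols also serve as subscript reduction operators, so A[<>] is the largest entry of A.

```sas
PROC IML;
/* Two small matrices made up for this illustration */
A = {1 2,
     3 4};
B = {0 1,
     1 0};
/* Kronecker product: entry a[i,j] of A becomes the block a[i,j]*B */
K = A@B;           /* {0 1 0 2, 1 0 2 0, 0 3 0 4, 3 0 4 0} */
/* Elementwise maximum and minimum, entry by entry */
mx = A<>B;         /* {1 2, 3 4} */
mn = A><B;         /* {0 1, 1 0} */
/* A scalar operand is compared with every entry */
capped = A><2;     /* {1 2, 2 2} */
/* Subscript reduction: largest entry overall and in each row */
overall = A[<>];   /* 4 */
rowmax = A[,<>];   /* {2, 4} */
/* Matrix product versus elementwise product */
mprod = A*B;       /* {2 1, 4 3} */
eprod = A#B;       /* {0 2, 3 0} */
PRINT K, mx mn capped, overall rowmax, mprod eprod;
```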
Function                             Symbol or Example
determinant                          X=DET(A)
diagonal                             X=DIAG(A); diagonal matrix formed from the diagonal of A
                                     (VECDIAG(A) returns the diagonal as a column vector)
eigenvalues                          M=EIGVAL(A)
generalized inverse                  G=GINV(A)
identity matrix                      I=I(n); n x n identity matrix
inverse matrix                       B=INV(A)
matrix of identical values           B=J(r,c,x); r x c matrix with every entry equal to x
number of rows                       NR=NROW(A)
number of columns                    NC=NCOL(A)
. (missing value)                    x=.; assigns a missing value to x, printed as .
                                     J(2,3,.) is a 2 x 3 matrix of missing values
rank values                          R=RANK(A); ties are ranked arbitrarily,
                                     the smallest value has rank 1
rank with ties averaged              R=RANKTIE(A); tied values get their average rank
sum                                  s=SUM(A); sum of all the entries
sum of squares                       ss=SSQ(A); sum of the squares of all the entries
trace                                t=TRACE(A)
eigenvalues and eigenvectors         CALL EIGEN(M,E,A); M is a column vector of the
of a symmetric matrix                eigenvalues (in descending order) and the columns
                                     of E are the corresponding eigenvectors,
                                     i.e., A*E = E*DIAG(M).
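You can check the CALL EIGEN description numerically by confirming that A*E - E*DIAG(M) is a zero matrix. The sketch below uses a made-up 2 x 2 symmetric matrix whose eigenvalues are 3 and 1. It also checks two standard identities: DET(A) equals the product of the eigenvalues, and TRACE(A) equals their sum. M[#] is the subscript reduction operator for the product of all entries.

```sas
PROC IML;
/* A small symmetric matrix made up for this check; its eigenvalues are 3 and 1 */
A = {2 1,
     1 2};
CALL EIGEN(M, E, A);        /* M: eigenvalues (column vector), E: eigenvectors (columns) */
check = A*E - E*DIAG(M);    /* zero matrix up to rounding error */
d = DET(A);                 /* 3 = product of the eigenvalues   */
tr = TRACE(A);              /* 4 = sum of the eigenvalues       */
prodM = M[#];               /* product of all entries of M      */
PRINT M E check, d tr prodM;
```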
Here are some scalar functions:
Function                        Usage and Example
absolute value                  ABS(x)
square root                     SQRT(x)
integer truncation              INT(x); INT(5.91)=5
modulo                          MOD(x,m); MOD(56,10)=6
sine, cosine, tangent           SIN(x), COS(x), TAN(x)
arcsine, arccosine, arctangent  ARSIN(x), ARCOS(x), ATAN(x)
exponential                     EXP(x)
natural logarithm               LOG(x)
gamma function                  GAMMA(x); e.g., GAMMA(6)=120, GAMMA(0.5)=1.772
normal distribution             PROBNORM(x); PROBNORM(1.96)=0.975
chi-square distribution         PROBCHI(x,df,nc); nc = noncentrality parameter
                                PROBCHI(11.07,5,0)=0.95
t-distribution                  PROBT(x,df,nc); PROBT(1.833,9,0)=0.95
F-distribution                  PROBF(x,ndf,ddf,nc); PROBF(3.63,4,9,0)=0.95
inverse normal                  PROBIT(p); PROBIT(0.975)=1.96
uniform random number           a=UNIFORM(seed); seed must be an integer
                                (use 3 to 6 digits). The seed is used only on the
                                first call; later calls continue the same
                                random-number stream.
normal N(0,1) random deviate    a=NORMAL(seed)
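The opening paragraph mentions finding a root of a cubic equation numerically. That task can be written with the DO UNTIL loop and the scalar functions listed above. The sketch below applies Newton's method to f(x) = x**3 - 2x - 5, an arbitrary test cubic chosen only for illustration. The starting value, tolerance and iteration cap are assumptions, not recommendations. IML's built-in POLYROOT function serves as a cross-check. It takes the coefficients in decreasing powers and returns the real and imaginary parts of all roots as two columns.

```sas
PROC IML;
/* Newton's method for a real root of f(x) = x**3 - 2x - 5 = 0 */
x = 2;            /* starting guess (assumption)                    */
tol = 1E-10;      /* stop when the Newton step is smaller than this */
maxit = 50;       /* safety cap on the number of iterations         */
iter = 0;
DO UNTIL(ABS(step) < tol | iter >= maxit);   /* condition checked after each pass */
   f  = x##3 - 2#x - 5;      /* f(x)  */
   fp = 3#x##2 - 2;          /* f'(x) */
   step = f/fp;
   x = x - step;
   iter = iter + 1;
END;
/* Cross-check: all roots of x**3 + 0x**2 - 2x - 5 (columns: real part, imaginary part) */
r = POLYROOT({1 0 -2 -5});
PRINT x[FORMAT=12.8] iter, r;
```

The loop stops either when the step is tiny or after maxit passes. Printing iter shows which of the two happened.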
The following example shows how IML handles basic matrix operations. Pay special attention to the way a matrix or a vector is defined, and to the PRINT statements, which show how to write output. IML's error reporting is far from satisfactory: an error message may not appear when a misspelled name leaves a term undefined. It is therefore very helpful to print some intermediate results while writing a program.
PROC IML;
/* DO loop and IF */
sum=0; m=5; DO i=1 TO m;
IF INT(i/2)#2 ^= i THEN sum=sum+i##2; END;
PRINT 'Output for 1+9+25', sum;
/* Define vectors and their products */
v={1, 2, 3, 4}; h={1 2 0 1}; p=v*h; m=h*v;
PRINT v[FORMAT=3.1] h[FORMAT=3.1] p[FORMAT=3.0]
m[FORMAT=3.0];
/* Matrix inverse and product */
a=J(4,4,1.0);a[2,2]=2;a[3,3]=4;a[4,4]=3;
b=INV(a); c=a*b; d={1.2 2, 3 4, 5 6.7};
PRINT a[FORMAT=3.1] b[FORMAT=5.2] c[FORMAT=4.1] d[FORMAT=3.1];
/* Note: any of these numbers can be replaced by variables. For example,
   a=J(r1,c1,const) works once r1, c1 and const are defined. */
/* Combine matrices */
av=a||v;bh=b//h;
PRINT av[FORMAT=3.1] bh[FORMAT=5.2];
/* Submatrices */
av1=av[1:3,2:4]; av2=av[1:3,]; av3=av[,2:4];
av4=av[{1 3},{2 4 5}];
PRINT av1[FORMAT=3.1] av2[FORMAT=3.1] av3[FORMAT=3.1]
av4[FORMAT=3.1];
/* Note: the indices can also be held in variables. Names inside braces are read
   as character literals, so build the index vectors with || instead, e.g.
   av4=av[r1||r2, c2||c4||c5], when r1, r2, c2, c4 and c5 are defined. */
/* Some matrix functions */
nc=NCOL(av); nr=NROW(av); sumd=SUM(d);
meand=sumd/(NCOL(d)#NROW(d)); ssd=SSQ(a);   /* sum of squares of a (not d) */
PRINT nc nr sumd meand ssd;
Example 1 (output)
Output for 1+9+25
SUM
35
  V     H                     P             M
1.0     1.0 2.0 0.0 1.0       1  2  0  1    9
2.0                           2  4  0  2
3.0                           3  6  0  3
4.0                           4  8  0  4

  A                   B                           C                   D
1.0 1.0 1.0 1.0     2.83 -1.00 -0.33 -0.50     1.0 0.0 0.0 0.0     1.2 2.0
1.0 2.0 1.0 1.0    -1.00  1.00  0.00  0.00    -0.0 1.0 0.0 0.0     3.0 4.0
1.0 1.0 4.0 1.0    -0.33  0.00  0.33  0.00    -0.0 0.0 1.0 0.0     5.0 6.7
1.0 1.0 1.0 3.0    -0.50  0.00  0.00  0.50    -0.0 0.0 0.0 1.0

  AV                       BH
1.0 1.0 1.0 1.0 1.0     2.83 -1.00 -0.33 -0.50
1.0 2.0 1.0 1.0 2.0    -1.00  1.00  0.00  0.00
1.0 1.0 4.0 1.0 3.0    -0.33  0.00  0.33  0.00
1.0 1.0 1.0 3.0 4.0    -0.50  0.00  0.00  0.50
                        1.00  2.00  0.00  1.00

  AV1             AV2                     AV3             AV4
1.0 1.0 1.0     1.0 1.0 1.0 1.0 1.0     1.0 1.0 1.0     1.0 1.0 1.0
2.0 1.0 1.0     1.0 2.0 1.0 1.0 2.0     2.0 1.0 1.0     1.0 1.0 3.0
1.0 4.0 1.0     1.0 1.0 4.0 1.0 3.0     1.0 4.0 1.0
                                        1.0 1.0 3.0
NC NR SUMD MEAND SSD
5 4 21.9 3.65 42
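Many IML loops can be replaced by whole-matrix expressions, which are usually shorter. As a sketch, the odd-squares loop and the mean of d from the example above can be written as follows:

- 1:m is the index operator.
- MOD is applied elementwise.
- A comparison such as (MOD(i,2)=1) yields a 0/1 matrix.
- d[:] is the subscript reduction operator for the mean of all entries.

The variable name total is chosen for this sketch.

```sas
PROC IML;
m = 5;
i = T(1:m);                   /* column vector {1, 2, 3, 4, 5}        */
odd = (MOD(i,2) = 1);         /* 1 where i is odd, 0 where i is even   */
total = SUM(odd # i##2);      /* 1 + 9 + 25 = 35, as with the DO loop  */
d = {1.2 2, 3 4, 5 6.7};
meand = d[:];                 /* mean of all entries: 21.9/6 = 3.65    */
PRINT total meand;
```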
IML defines a subroutine, called a module, that begins with
START name;
and ends with
FINISH;
To call the module, write
RUN name;
The CALL statement (CALL name;) can also be used.
For example,
PROC IML;
START F(x,a,b,y);
   y=a#x#x+b;
FINISH;
RUN F(2,2,5,y1);
x=5;
RUN F(x,2,-20,y2);
PRINT y1 y2;
QUIT;
The output is y1=13 and y2=30 (2*2*2+5=13 and 2*5*5-20=30). The arguments of a module can be matrices.
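A module can also be written as a function that returns its result with RETURN. It can then be used inside an expression instead of passing back an output argument. The module name G is a hypothetical example. It computes the same quantity as F, and because it uses the elementwise operator #, it also accepts a matrix for x.

```sas
PROC IML;
/* Hypothetical function module: returns a*x^2 + b, elementwise in x */
START G(x, a, b);
   RETURN( a#x#x + b );
FINISH;
y1 = G(2, 2, 5);              /* 13, the same value as RUN F(2,2,5,y1) */
y2 = G({1 2 3}, 2, -20);      /* {-18 -12 -2}: x may be a matrix       */
PRINT y1 y2;
```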
PROC IML;
/* The regression subroutine */
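START reg(X,Y,B,mse);
   n=NROW(X); k=NCOL(X);   /* sample size and number of columns of X */
   XPX=X`*X;               /* X'X                                     */
   XPY=X`*Y;               /* X'Y                                     */
   IXPX=INV(XPX);          /* inverse of X'X (symmetric)              */
   B=IXPX*XPY;             /* least-squares estimate inv(X'X)*X'Y     */
   YHAT=X*B;               /* fitted values                           */
   e=Y-YHAT;               /* residuals                               */
   mse=SSQ(e)/(n-k);       /* mean squared error with n-k d.f.        */
FINISH;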
/* The main program */
X={39, 43, 21, 64, 57, 47, 28, 75, 34, 52};
C=J(10,1,1);
T=C||X;
Y={65, 78, 52, 82, 92, 89, 73, 98, 56, 75};
RUN REG(T,Y,BETA,mse);
PRINT BETA[format=6.2] mse[format=6.2];
QUIT;
The output from this program is
  BETA     MSE
 40.78   75.75
  0.77
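An alternative to forming INV(X`*X) explicitly is to solve the normal equations with IML's SOLVE function, which is generally preferable numerically. The sketch below repeats the regression of the main program that way. It then cross-checks the result against the closed-form simple-regression formulas b1 = Sxy/Sxx and b0 = ybar - b1*xbar. The names D, sxy, sxx, b0 and b1 are chosen for this sketch.

```sas
PROC IML;
X = {39, 43, 21, 64, 57, 47, 28, 75, 34, 52};
Y = {65, 78, 52, 82, 92, 89, 73, 98, 56, 75};
D = J(NROW(X), 1, 1) || X;       /* design matrix: intercept column and X  */
beta = SOLVE(D`*D, D`*Y);        /* solves (D`D) beta = D`Y without INV     */
/* Closed-form simple regression as a cross-check */
xbar = X[:];   ybar = Y[:];
sxy = SUM((X - xbar) # (Y - ybar));   /* 1894 */
sxx = SSQ(X - xbar);                  /* 2474 */
b1 = sxy / sxx;                       /* about 0.7656 */
b0 = ybar - b1 # xbar;                /* about 40.784 */
PRINT beta[FORMAT=8.4] b0[FORMAT=8.4] b1[FORMAT=8.4];
```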
This result is confirmed by PROC GLM:
DATA new;
   INPUT x y @@;
   CARDS;
39 65 43 78 21 52 64 82 57 92
47 89 28 73 75 98 34 56 52 75
;
PROC GLM;
   MODEL y=x;
RUN;
QUIT;
As mentioned at the beginning, IML and SAS procedures can be linked. Here is an example of how this can be done.
%MACRO imlsub(in=,y=,out=);
PROC IML;
/* Read SAS data as an IML matrix rm
Each row of rm contains three entries [&y p80 p90] */
USE &in;
READ ALL VAR{&y p80 p90} INTO rm;
n=NROW(rm); d=J(n,1,0); c=d;
DO I=1 TO n;
d[i]=rm[i,3]/rm[i,1];
c[i]=rm[i,3]-rm[i,2];
END;
/* Create a new matrix with input and output d and c */
x=rm||d||c;
CREATE &out FROM x; APPEND FROM x;
%MEND imlsub;
/* The main SAS data program */
DATA n1;INPUT city $ state $ area p80 p90 @@;cards;
Akron OH 62.1 237 223 Atlanta GA 131 425 394
Dallas TX 333 905 1000 Denver CO 107 493 467
NY NY 302 7072 7322 Memphis TN 264 646 610
Tampa FL 58.7 82 125 Raleigh NC 83.4 150 208
;
/* Call the IML subroutine */
%imlsub(in=n1,y=area,out=n2);
/* The IML output in main program. */
PROC PRINT DATA=n2;
TITLE 'Pure output from IML, numerical values only.';
DATA n3;SET n1;DROP area p80 p90;
/* The IML output and SAS program joined. */
DATA n4; MERGE n3 n2;
RENAME COL1=area COL4=density;
PROC PRINT;
TITLE 'Output from IML merged with original data.';
RUN;
Pure output from IML, numerical values only.
OBS COL1 COL2 COL3 COL4 COL5
1 62.1 237 223 3.5910 -14
2 131.0 425 394 3.0076 -31
3 333.0 905 1000 3.0030 95
4 107.0 493 467 4.3645 -26
5 302.0 7072 7322 24.2450 250
6 264.0 646 610 2.3106 -36
7 58.7 82 125 2.1295 43
8 83.4 150 208 2.4940 58
Output from IML merged with original data.
CITY STATE AREA COL2 COL3 DENSITY COL5
Akron OH 62.1 237 223 3.5910 -14
Atlanta GA 131.0 425 394 3.0076 -31
Dallas TX 333.0 905 1000 3.0030 95
Denver CO 107.0 493 467 4.3645 -26
NY NY 302.0 7072 7322 24.2450 250
Memphis TN 264.0 646 610 2.3106 -36
Tampa FL 58.7 82 125 2.1295 43
Raleigh NC 83.4 150 208 2.4940 58
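The loop in %imlsub can be replaced by elementwise operations on whole columns. The COLNAME= option of CREATE can also name the output variables directly, so no RENAME step is needed. The macro name %imlsub2 and the variable name change are hypothetical choices for this sketch. The column order matches the COL1-COL5 output above.

```sas
/* Hypothetical variant of %imlsub: vectorized, with named output columns */
%MACRO imlsub2(in=, y=, out=);
PROC IML;
USE &in;
READ ALL VAR {&y p80 p90} INTO rm;
CLOSE &in;
d = rm[,3] / rm[,1];        /* density = p90 / area, all rows at once */
c = rm[,3] - rm[,2];        /* change  = p90 - p80                    */
x = rm || d || c;
cn = {"&y" "p80" "p90" "density" "change"};   /* names for the columns of x */
CREATE &out FROM x[COLNAME=cn];
APPEND FROM x;
CLOSE &out;
QUIT;
%MEND imlsub2;
```

Called as %imlsub2(in=n1,y=area,out=n2), it should give the same numbers as n2 above, with variables area, p80, p90, density and change.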
If the renaming is done inside the macro, the output variables get the new names instead of COL1, COL2, .... In this example, we can replace
CREATE &out FROM x; APPEND FROM x;
by
CREATE preout FROM x; APPEND FROM x;
DATA &out; SET preout; RENAME COL1=area COL4=density;
Pay special attention to how IML matrices are treated in SAS: when a matrix is written to a SAS data set, its columns become variables named COL1, COL2, .... The following example further illustrates this point. It is a macro that takes a random sample from an input data set.
/* The macro sample takes a random sample of the input data:
in = input data set name
popsize = input data (population) size
seed = input seed, use 3 to 6 digit integer
sampsize= input required sample size
out = output data set */
%MACRO sample(in=,popsize=,seed=,sampsize=,out=);
%MACRO one(psize=,s=,ssize=,rarray=);
PROC IML;
N=&psize;SEED1=&s;sn=&ssize;
IF sn>n THEN DO;
PRINT '*** ERROR: Sample size > population size.';
STOP;END;
A=J(N,1,1);
DO i=1 to N;A[i]=UNIFORM(SEED1);END;
R=RANK(A); DO i=1 to N;IF R[i]>sn THEN r[i]=0; ELSE r[i]=1;END;
CREATE &rarray FROM R;APPEND FROM R;
%MEND one;
%one(psize=&popsize,s=&seed,ssize=&sampsize,rarray=n1);
DATA &out; MERGE n1 &in ;
IF COL1=0 THEN DELETE;DROP COL1;
%MEND sample;
/* The main program */
DATA cities;INPUT city $ state $ area p80 p90 @@;cards;
Akron OH 62.1 237 223 Atlanta GA 131 425 394
Dallas TX 333 905 1000 Denver CO 107 493 467
NY NY 302 7072 7322 Memphis TN 264 646 610
Tampa FL 58.7 82 125 Raleigh NC 83.4 150 208
;
%sample(in=cities,popsize=8,seed=12345,sampsize=3,out=rcities);
PROC PRINT DATA=rcities;TITLE 'First sample of size 3';
%sample(in=cities,popsize=8,seed=54321,sampsize=3,out=rcities);
PROC PRINT DATA=rcities;TITLE 'Second sample of size 3';
%sample(in=cities,popsize=8,seed=54321,sampsize=10,out=rcities);
PROC PRINT DATA=rcities;TITLE 'Third sample of size 10';
RUN;
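Note in the output that STOP stops only the IML program; the rest of the SAS program keeps running. In the third call, IML detects the input error and stops before it creates a new indicator data set n1. The DATA step therefore merges the n1 left over from the second call, so the "third sample" simply repeats the second sample.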
First sample of size 3
OBS CITY STATE AREA P80 P90
1 Denver CO 107.0 493 467
2 NY NY 302.0 7072 7322
3 Tampa FL 58.7 82 125
Second sample of size 3
OBS CITY STATE AREA P80 P90
1 Memphis TN 264.0 646 610
2 Tampa FL 58.7 82 125
3 Raleigh NC 83.4 150 208
*** ERROR: Sample size > population size.
Third sample of size 10
OBS CITY STATE AREA P80 P90
1 Memphis TN 264.0 646 610
2 Tampa FL 58.7 82 125
3 Raleigh NC 83.4 150 208
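A matrix version of %sample is sketched below with the same inputs. The macro %sample2, the data set _flag and the variable selected are hypothetical names. The sketch works as follows:

- UNIFORM(J(N,1,seed)) returns N uniform deviates in one call.
- (RANK(u) <= sn) marks the sn smallest of them.
- Deleting _flag before the IML step guards against stale data. If IML stops on bad input, the DATA step reports an error in the log instead of quietly reusing the indicator from an earlier call.

The random numbers are drawn in a different way than in the loop version, so the samples are not guaranteed to match the ones printed above.

```sas
/* Hypothetical variant of %sample: same inputs, loops replaced by matrix operations */
%MACRO sample2(in=, popsize=, seed=, sampsize=, out=);
/* Remove any indicator data set left over from an earlier call */
PROC DATASETS LIBRARY=work NOLIST;
   DELETE _flag;
RUN;
QUIT;
PROC IML;
N = &popsize;  sn = &sampsize;
IF sn > N THEN DO;
   PRINT '*** ERROR: Sample size > population size.';
   STOP;
END;
u = UNIFORM(J(N, 1, &seed));   /* N uniform deviates from one call  */
flag = (RANK(u) <= sn);        /* 1 marks the sn smallest deviates  */
cn = {"selected"};
CREATE _flag FROM flag[COLNAME=cn];
APPEND FROM flag;
CLOSE _flag;
QUIT;
DATA &out;
   MERGE _flag &in;            /* one-to-one merge: row i with row i */
   IF selected = 1;            /* keep the sampled rows only         */
   DROP selected;
RUN;
%MEND sample2;
```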