Lesson 4: Nice Programs & Nice Output

Writing nice programs

The SAS language is strict about its commands, but it is more lenient about the ways in which those commands are presented in the program. For example, SAS does not distinguish between uppercase and lowercase letters in program statements. (However, in the data, John and JOHN are considered as two different names.) Also, you can insert blank lines and spaces into a program; these can be used to write the program in an outline format that is easy to follow. Likewise, you can write several program statements on one line, as long as each statement ends with a semicolon. Consider the following SAS programs, which create and print a small dataset of graduate students' names, ages, and degree programs.



data students;                   DATA Students;
input name $ age degree $;       INPUT Name $ Age Degree $; CARDS;
datalines;
Mary    21 MA                    Mary    21 MA
John    22 MS                    John    22 MS
Alice   22 MBA                   Alice   22 MBA
Joe     25 PhD                   Joe     25 PhD
;                                ;
proc print data=students;        PROC PRINT Data=Students; Run;
run;

The two programs produce identical results. (CARDS, used in the second program, is an older synonym for DATALINES.) You may want to use these ideas to start organizing your programs in some pleasing manner. For example, it may be easier for you to find errors if the program is divided into "paragraphs," with a blank line between paragraphs.

Notice that the lines of data, unlike the lines of SAS code, must be written exactly as you want them to appear. For example, SAS would consider an "M. S." degree and an "MS" degree to be different from each other.

Comments

Comments are statements that are ignored by the SAS program. Comments may appear anywhere in a SAS program. They have two forms:

*This is a comment statement;

and

/* This is also a comment statement. */

In the first form, the comment begins with an asterisk at the start of a statement and ends with a semicolon. In the second form, the comment begins with a slash-asterisk and ends with asterisk-slash. The first form is useful for writing a brief note to yourself about the program. The second form is useful for writing longer comments that extend over several lines. It can also be used to "block out" long segments of code that you want to retain in the program but do not want to run. A common mistake with the second form is forgetting to end the comment with */ . If this is not supplied, SAS will consider everything after /* to be part of the comment.
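
As a brief illustration, the sketch below applies both comment forms to a shortened version of the student example from earlier in this lesson. The wording of the comments is made up for illustration, and the second PROC PRINT is blocked out so that SAS skips it.

```sas
* Read the graduate student data;
data students;
 input name $ age degree $;
 datalines;
Mary    21 MA
John    22 MS
;

/* Print the full listing.
   This comment extends over two lines. */
proc print data=students;
run;

/* Blocked out for now: a listing of names only.
proc print data=students;
 var name;
run;
*/
```

Because a comment of the second form ends at the first */ that SAS finds, a blocked-out segment should not itself contain comments of that form.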

Comments have several important uses.

  1. They help you keep track of the steps you took to derive an answer.
  2. There may be modifications to the data or to the statistical analyses during a study. Comments can be used to note the history of the investigation.
  3. Very often, someone else will be using your program and will need assistance with understanding how you arrived at your answer. Comments provide invaluable help to others who may use your program.
  4. By blocking out sections of the program with /* and */, you can run partial analyses to save time, and you can remember how you previously analyzed the data if you change your approach.
  5. Studies are never "finished." Journal reviewers, academic advisors, and others may request additional analyses months or years after the data have been analyzed. Comments will refresh your memory about the project.

Variable names

There are many possible SAS variable names (to be exact, there are 27 + 27(37) + 27(37)(37) + ... + 27(37^7) - well, anyway, it's a big number). You will find it helpful to use meaningful variable names. You could name variables A, B, C, etc. in every SAS program you ever write, and SAS wouldn't care. However, using full names, mnemonics and abbreviations will help you and others to keep track of the variables.
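
The count in parentheses can be checked with a short DATA step. This sketch assumes names of at most eight characters, with 27 choices for the first character (a letter or an underscore) and 37 choices for each later character (a letter, a digit, or an underscore). The variable names k and total are arbitrary.

```sas
DATA _NULL_;
 total = 0;
 DO k = 1 TO 8;                       /* k = length of the name */
  total = total + 27 * 37**(k - 1);   /* names of exactly k characters */
 END;
 PUT 'Possible variable names: ' total comma20.;
RUN;
```

Under those assumptions, the value written to the SAS log is roughly 2.6 trillion.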

Suppose that someone conducts an agricultural experiment, analyzes the data, and moves on to something else. Years later, you ask the researcher if you may look at the data for use in another study. The SAS dataset could have these variables:

BLOCK, TREAT, SUBPLOT, REP, Y

or these:

COUNTY, IRRIGAT, FERTILIZ, ROWNUMBR, CORN_YLD

Which would you prefer?


Options

SAS allows you to change the appearance of printed output in several ways. For example, you may want to left-justify the output rather than centering it, or you may want to print as much output as possible on each sheet of paper. These and other changes may be made with an OPTIONS statement in SAS. The OPTIONS statement can appear anywhere in the program, but it usually appears as the first line.

An example of an OPTIONS statement is shown below. It limits each line of output to 80 characters and each page to 54 lines, and it left-justifies the output instead of centering it.

OPTIONS linesize=80 pagesize=54 nocenter;

Other options control such things as the characters used to print tables and how many details are included in the SAS log. To see a list of these options or to change them, click on Globals, then Options, then Global options within the SAS session window.
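
A minimal sketch of the OPTIONS statement at work is shown below. It assumes the STUDENTS dataset created at the start of this lesson. NODATE and NONUMBER, which suppress the date and the page number at the top of each page, are added here for illustration.

```sas
OPTIONS linesize=80 pagesize=54 nocenter nodate nonumber;

PROC PRINT DATA=students;
RUN;

/* Restore centering, the date, and page numbers for later output */
OPTIONS center date number;
```

Options stay in effect until they are changed, which is why the sketch resets them at the end.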

Titles and footnotes

By default, SAS prints the title "The SAS System" at the top of each page. You can replace this with your own title, add subtitles, and add footnotes. You may add titles and footnotes anywhere in the program, but you need to make sure that they match the appropriate content. For example, if you use the titles 'Trends, 1980-1990' and 'Trends, 1990-2000' in one program, you need to make sure that each title matches its decade's data. You need to ensure that both opening and closing quotation marks are furnished; otherwise, SAS will interpret the rest of your program as part of a title.

To add one title, add a statement like this:

TITLE 'Example of a SAS Title';

You may add a title and up to nine subtitles by adding a number from 1 to 10 after the word TITLE in the title statement. An example appears below.

TITLE1 'Example of a SAS Title';
TITLE2 ' ';
TITLE3 'This is a Subtitle';

Here TITLE2 provides an empty line. In most SAS systems, the quotation marks around the title text are optional, so one can simply write

TITLE This is the first output;

Quotation marks are still needed when the title itself contains a semicolon. The following statement will remove all titles.

TITLE;

Footnotes can be added with the same rules above, replacing TITLE with FOOTNOTE.
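
The sketch below combines titles and footnotes in one short program. It assumes the STUDENTS dataset from the start of this lesson, and the title and footnote text is invented for illustration.

```sas
TITLE1 'Graduate students';
TITLE2 'Names, ages, and degree programs';
FOOTNOTE1 'Fictional data for illustration';
FOOTNOTE2 'Prepared for Lesson 4';

PROC PRINT DATA=students;
RUN;

/* Clear titles and footnotes so they do not carry over */
TITLE;
FOOTNOTE;
```

Titles and footnotes remain in effect for later procedures until they are replaced or cleared, so clearing them at the end helps keep each title matched to its own output.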

Labels

Eight characters doesn't provide enough information about many variables. For example, a basketball team's statistician may follow the advice in the section above and use the variable name _3PMPO97 to indicate "three-point field goals made in the playoffs in 1997." This isn't readily apparent to anyone else, though. Fortunately, labels can be attached to variables. When defining a dataset in a DATA step, a LABEL statement is used to define these labels. The labels can be up to 40 characters long, including blanks. SAS will automatically print the labels in commonly-used procedures, but LABEL must be added to the PROC PRINT statement to use the labels in a data listing. An example is shown below.

DATA basket;
 INPUT jerseyno $ name $ _3pmpo97 _3papo97 @@;
 LABEL jerseyno='Jersey number' name='Name' 
  _3pmpo97='3 pt. goals made, 1997 playoffs'
  _3papo97='3 pt. goals attempted, 1997 playoffs';
 DATALINES; 
0  Irlbeck 4 19  00 Morrison 6 9   11 Perkins 1 4
13 Pinson 4 9    14 Camacho 0 0    22 Gragg 0 0
23 Mitchell 0 0  35 Garcia 0 1     42 Cannon 3 3 
44 McGuire 0 0   51 Betts 2 4      55 Galloway 2 4
;
PROC PRINT DATA=basket LABEL;
 TITLE 'Playoff results';
RUN;

The output looks like:

               Playoff results                                                                   
                                            3 pt. goals
                             3 pt. goals     attempted,
        Jersey                 made, 1997        1997
OBS    number    Name          playoffs       playoffs

 1      0       Irlbeck          4              19    
 2      00      Morrison         6               9    
 3      11      Perkins          1               4   
.........................................................

More about PROC PRINT

The program above used LABEL within the PROC PRINT statement. PROC PRINT alone would satisfy the basic request, but it was modified to suit our needs. Many of the statistical procedures work the same way; you can add options to the PROC statements to ask for more output, suppress some output, or change the way in which the analysis is performed.

Strictly speaking, DATA=BASKET is another option. If the name of the dataset is not supplied, SAS will operate on the most recently created dataset. However, you will soon be working with several datasets at once, and including the DATA= option whenever you call for a PROC will ensure that you are analyzing the correct set of data.

Other options and statements for PROC PRINT are illustrated below.

You do not have to print all of the variables or all of the observations in a dataset. For example, if you only needed a roster of the players and their numbers, you could use:

PROC PRINT DATA=basket;
 VAR jerseyno name;
RUN;

VAR specifies the VARiables to be printed. By reversing NAME and JERSEYNO, you would change the order in which they appear on the printout.

Suppose that you wanted to print only the players' names and jersey numbers without the observation numbers. You could use the following:

PROC PRINT DATA=basket noobs;
 VAR name jerseyno;
RUN;

Alternatively, you could use the ID statement to identify the players by name, as shown below.

PROC PRINT DATA=basket;
 VAR jerseyno;
 id name;
RUN;

The two PROC PRINT steps above produce the same listing.

For large datasets, you may want to look at a partial listing of the observations to see if they appear as you had expected. For example, if you wanted to print only the first five observations from the basketball data, you could submit the following:

PROC PRINT DATA=basket(obs=5);
RUN;

PROC PRINT can be modified in other ways; for example, totals and subtotals of numeric variables can be printed. Consult SAS documentation for more details.
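
As one illustration of totals, the sketch below adds a SUM statement to PROC PRINT to total the two three-point variables. It assumes the BASKET dataset created in the Labels section. The title text is made up.

```sas
PROC PRINT DATA=basket NOOBS LABEL;
 VAR name _3pmpo97 _3papo97;
 SUM _3pmpo97 _3papo97;   /* print a total under each column */
 TITLE 'Playoff three-point totals';
RUN;
```

Subtotals additionally require a BY statement and a dataset sorted by the BY variable; sorting is covered later in this lesson.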

Formats

Earlier, you used SAS informats to specify how to read certain types of data. Formats can be used to specify how the data should be printed. The syntax for formats is similar to that for informats; the name of a variable precedes its intended format, and a period is used after the format. Look at the following example involving fictional data from a day care center. Some parents pay more than others, depending on the time the child stays at day care during the week.

OPTIONS nodate nonumber nocenter;
DATA kids;
 INPUT @1 firstnam $11. @12 lastname $11. 
  @23 birthday mmddyy10. @33 gender $1. @34 wklyrate 3.;
 LABEL firstnam='First name' lastname='Last name'
  birthday='Birthday' gender='Gender' 
  wklyrate='Rate';
 DATALINES;
Douglas    Lindgren   08/29/1996M115
Elizabeth  Wilkerson  01/13/1997F95
Evangeline Chambers   03/11/1997F100
Arthur     Hollander  07/19/1996M.
ChristopherKalbfleisch04/13/1995M115
Stacy      Siegel     11/15/1996F100
;
PROC PRINT DATA=kids NOOBS LABEL;
 FORMAT birthday worddate18. wklyrate dollar7.2;
 title 'Day care roster';
RUN;

The FORMAT statement in PROC PRINT instructs SAS to print the birthday in words, using up to 18 characters. The weekly rate is to be expressed in dollars, using up to seven characters. The last two characters to be printed in WKLYRATE fall after the decimal point. SAS produces the following output:

Day care roster

First name   Last name             Birthday   Gender       Rate

Douglas      Lindgren       August 29, 1996     M       $115.00
Elizabeth    Wilkerson     January 13, 1997     F        $95.00
Evangeline   Chambers        March 11, 1997     F       $100.00
Arthur       Hollander        July 19, 1996     M           .  
Christopher  Kalbfleisch     April 13, 1995     M       $115.00
Stacy        Siegel       November 15, 1996     F       $100.00

Many different formats for dates, times, and numerals are available. For example, you can write Social Security numbers with the SSN11. format, or you can write a date in terms of the reigns of the Japanese emperors with the NENGO8. format.
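
A quick way to try out a format is to write a value to the SAS log with a PUT statement. The sketch below is a minimal example using one date and one rate from the day care data; DATE9. and MMDDYY10. are standard SAS formats that do not appear elsewhere in this lesson.

```sas
DATA _NULL_;
 birthday = '29AUG1996'd;      /* SAS date constant */
 rate = 115;
 PUT birthday date9.;          /* 29AUG1996 */
 PUT birthday mmddyy10.;       /* 08/29/1996 */
 PUT birthday worddate18.;     /* August 29, 1996, right-aligned in 18 columns */
 PUT rate dollar7.2;           /* $115.00 */
RUN;
```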

Creating your own formats

You may want to write your own formats for a specific application. For example, using the data above, you may want to spell out "Male" or "Female," and you may want to simply classify rates paid as "Low" (under $100/week) or "High" (at least $100/week). PROC FORMAT allows you to create your own formats. An example is shown below.

PROC FORMAT;
 VALUE $sexfmt 'F'='Female' 'M'='Male';
 VALUE ratefmt 0-<100='Low' 100-HIGH='High' .='Missing';
RUN;

PROC PRINT DATA=kids NOOBS LABEL;
 VAR firstnam lastname gender wklyrate;
 FORMAT gender $sexfmt. wklyrate ratefmt.;
 TITLE 'Day care roster';
RUN;

Notice that PROC FORMAT requires a dollar sign when defining character formats. Also, notice that periods do not appear in the names of the formats in PROC FORMAT, but they do appear when using those formats later in PROC PRINT. The word HIGH in PROC FORMAT is a signal to include all rates from $100 to the maximum rate in the category called "High." This program produces the following output.

Day care roster

First name     Last name      Gender    Rate

Douglas        Lindgren       Male      High
Elizabeth      Wilkerson      Female    Low
Evangeline     Chambers       Female    High
Arthur         Hollander      Male      Missing
Christopher    Kalbfleisch    Male      High
Stacy          Siegel         Female    High

Sorting datasets

You may want the observations in your printouts to be arranged in a certain order. For example, you may want the children listed alphabetically by last name, or you may want to list them in order by the amount their parents spend for day care. In SAS, PROC SORT is used to sort data. The following example sorts the children first by gender, then by birthday.

PROC SORT DATA=kids;
 BY gender birthday;
RUN;

By default, SAS sorts in ascending order: from A to Z for character variables and from smaller to larger for numbers. If you include the word DESCENDING in the BY statement, SAS will sort the data in descending order of the variable that follows the word DESCENDING.
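
A minimal sketch of DESCENDING is shown below. It assumes the KIDS dataset from the Formats section. The OUT= option writes the sorted copy to a new dataset (the name kids_by_rate is hypothetical), so KIDS itself is left unchanged.

```sas
PROC SORT DATA=kids OUT=kids_by_rate;
 BY gender DESCENDING wklyrate;   /* DESCENDING applies only to wklyrate */
RUN;

PROC PRINT DATA=kids_by_rate NOOBS LABEL;
 FORMAT birthday worddate18. wklyrate dollar7.2;
RUN;
```

Within each gender, the highest weekly rates are listed first. Missing values sort as the smallest numbers, so a child with a missing rate appears last in the group.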

In addition to arranging data on a printout, PROC SORT has a very important function in SAS. If you want to analyze several groups of data in the same way, those groups of data must first be sorted so that they appear in ascending order in the dataset. You can ask for separate analyses of groups using a BY statement, but it will not work correctly unless you sort the data first. Using the example above, and assuming that you have sorted the data by gender and birthday as shown above, you could obtain separate printouts of data for the girls and boys with the following:

PROC PRINT DATA=kids NOOBS LABEL;
 VAR firstnam lastname birthday wklyrate;
 BY gender;
 FORMAT birthday worddate18. wklyrate dollar7.2;
 TITLE 'Day care roster';
RUN;

The resulting output is shown below. Notice that, within each group, the children are arranged with the earliest birthdays (older children) listed first.

Day care roster

Gender=F

First name    Last name              Birthday       Rate

Stacy         Siegel        November 15, 1996    $100.00
Elizabeth     Wilkerson      January 13, 1997     $95.00
Evangeline    Chambers         March 11, 1997    $100.00


Gender=M

First name      Last name               Birthday       Rate

Christopher    Kalbfleisch        April 13, 1995    $115.00
Arthur         Hollander           July 19, 1996        .
Douglas        Lindgren          August 29, 1996    $115.00

Customized reports

Recall the way that SAS makes a comma-delimited file for transferring data to a spreadsheet as described in Lesson 3. A DATA _NULL_ statement, followed by FILE and PUT statements, was used to create a text file. The same idea can be applied to print data in a more pleasing format than that produced by PROC PRINT. For example, suppose that the day care provider wishes to print reminders for the children's parents to pay on time. The following lines of code could be used.

PROC FORMAT;
 VALUE $genfmt 'M'='son' 'F'='daughter';
DATA _null_;
 SET kids;
 FILE 'a:\reminder.out';
 PUT lastname 'Family: You owe ' wklyrate dollar7.2 
  ' for your ' gender $genfmt. ' ' firstnam '.';
RUN;

SAS produces the following output in A:\REMINDER.OUT.

Lindgren Family: You owe $115.00 for your son      Douglas .
Wilkerson Family: You owe  $95.00 for your daughter Elizabeth .
Chambers Family: You owe $100.00 for your daughter Evangeline .
Hollander Family: You owe     .   for your son      Arthur .
Kalbfleisch Family: You owe $115.00 for your son      Christopher .
Siegel Family: You owe $100.00 for your daughter Stacy .
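
The blank before the final period appears because list-style PUT writes a blank after each variable. One common remedy, sketched below, is the pointer control +(-1), which backs up one column before the period is written. This variation assumes the KIDS dataset from above, shortens the message, and uses FILE PRINT to send the lines to the Output window instead of a file.

```sas
DATA _NULL_;
 SET kids;
 FILE PRINT;                          /* write to the Output window */
 PUT lastname 'Family: You owe ' wklyrate dollar7.2
  ' for ' firstnam +(-1) '.';         /* +(-1) backs up over the trailing blank */
RUN;
```

The first line of the resulting output should read: Lindgren Family: You owe $115.00 for Douglas.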

Transferring output to a word processor

At some time, you will probably need to transfer SAS output to a word processing program. The most efficient way to do this is to run your SAS program as usual. Then, in the Output window, choose File, then Save As ... Under Save as type:, choose RTF Files. Then, import the RTF file into the word processor. RTF (Rich Text Format) can be interpreted correctly by such programs as WordPerfect and Microsoft Word. In general, this works better than cutting and pasting or reading SAS output stored in ASCII DOS Text format into the word processing program.

