9.0 Oh, The Things You Can Do
In this section, we will touch briefly on
a few of the many tools available to you for classes, papers,
teaching, etc. Each of these tools has entire books and courses devoted
to it. This section makes no attempt to teach you the software;
it only shows you that the tools exist and how they can be used.
9.1 SAS
The SAS System is a software system for data analysis. The goal of SAS
Institute is to provide data analysts one system to meet all their
computing needs. When your computing needs are met, you are free to
concentrate on results rather than on the mechanics of getting them.
Instead of learning programming languages, several statistical packages,
and utility programs, you only need to learn the SAS system.
[Ed's note: Yeah, right]
Although SAS has a graphical front end, it
is better to run it from the command line. You can create SAS procedure
files using your editor and run them by passing them to SAS.
You could create a SAS file in Emacs, called
yield1a.sas for example, and then run
it from an Xterm with the command...
fdrebin@blowfish> sas yield1a
This produces at least one other file, the
.log file (pronounced "dot log").
This file shows information about the specific SAS job such as details,
errors, resources used, files created, etc.
If you have output from SAS procedures, it
will be in the .lst file (pronounced
"dot list"). SAS usually produces much more white space than
you'll want to print. You can (and should) trim the output as much
as you can before printing it. You should also use the pr2up
command to print these files; it
prints two pages side by side on one sheet of paper.
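One way to trim the white space, as a minimal sketch: the shell function
below (trim_lst is a hypothetical name, not a command installed on the
system) deletes form-feed characters and squeezes each run of blank lines
down to a single blank line. It assumes the extra white space in your .lst
file consists of blank lines and page breaks, so check the result before
you print it.

```shell
# Sketch only: trim_lst is a hypothetical helper, not part of SAS or the system.
# Removes form feeds (page breaks) and squeezes repeated blank lines to one.
trim_lst() {
    tr -d '\f' < "$1" | cat -s
}

# Example use (writes a trimmed copy and leaves the original untouched):
#   trim_lst yield1a.lst > yield1a-trimmed.lst
```

Writing to a new file rather than overwriting the .lst file means you can
always go back to the full output.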
You can either use the more
command to look at the .log and .lst
files or open them in Emacs. If the results are not what you wanted, if
you have errors, or if you need to do more analyses, then you can start
the procedure all over again.
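If the log is long, a quick scan for problems can save some paging. The
sketch below assumes that SAS marks problem lines in the log with the
prefixes ERROR and WARNING at the start of the line; check_sas_log is a
hypothetical helper name. It prints the matching lines with their line
numbers and succeeds only when no ERROR lines are found.

```shell
# Sketch only: check_sas_log is a hypothetical helper.
# Assumes problem lines in the log start with ERROR or WARNING.
check_sas_log() {
    # Show every ERROR/WARNING line with its line number (none is fine too).
    grep -n -E '^(ERROR|WARNING)' "$1" || true
    # Exit status: success only if the log contains no ERROR lines.
    ! grep -q '^ERROR' "$1"
}

# Example use:
#   check_sas_log yield1a.log && echo "no errors reported"
```

You should still read the log itself; this only points you at the lines
most likely to matter.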
Sometimes you will see SAS work files with
names like SAS_worka42D1. If you are
not currently running a SAS job, then these are not needed and should be
deleted. They can take up a lot of space unnecessarily.
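The cleanup can be sketched as a small shell function. Everything here is
an assumption to adapt: clean_sas_work is a hypothetical name, the pattern
SAS_work* is taken from the example name above, and the check for a
running job relies on pgrep being available and on the SAS process being
named sas. By default the function only lists what it finds; it deletes
only when you ask it to.

```shell
# Sketch only: clean_sas_work is a hypothetical helper.
# Usage: clean_sas_work [directory] [list|delete]
clean_sas_work() {
    dir=${1:-.}
    mode=${2:-list}
    # Refuse to delete while a process named "sas" is running for this user.
    # (Assumes pgrep exists; without it this check is silently skipped.)
    if [ "$mode" = "delete" ] && pgrep -u "$(id -u)" -x sas >/dev/null 2>&1; then
        echo "A SAS job appears to be running; nothing deleted." >&2
        return 1
    fi
    for f in "$dir"/SAS_work*; do
        [ -e "$f" ] || continue        # the pattern matched nothing
        if [ "$mode" = "delete" ]; then
            rm -rf -- "$f"
        else
            printf '%s\n' "$f"
        fi
    done
}

# Example use:
#   clean_sas_work ~          # list leftover work files in your home directory
#   clean_sas_work ~ delete   # remove them
```

Listing first and deleting second gives you a chance to confirm that the
names really are leftover work files.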
The SAS program file that we will run is in
the System Manager's public directory. Copy it
from there to your directory with the command...
fdrebin@blowfish> cp ~dmarlin/public/orientation/yield1a.sas ~/
You will also need the data file, yield.dat,
in the same directory.
Open the yield1a.sas
file in Emacs and take a look at it. The comments in the file (lines that
begin with *) explain what
the statements that follow them are intended to do.
After looking at the file, run the job from
an Xterm window, using the command...
fdrebin@blowfish> sas yield1a
When the job is done, pull up a directory
listing using the ls command. You should
see four new files:
- yield1a.log, which shows the specifications of the SAS job
- yield1a.lst, which shows the output from the SAS job
- graphout.eps, which is the SAS/GRAPH plot that we created
- fact.ssd01, which is a permanent SAS data set.
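Rather than reading through the directory listing by eye, you could check
for the four files with a short loop. This is only a sketch: check_outputs
is a hypothetical helper, and the file names are the four listed above.

```shell
# Sketch only: check_outputs is a hypothetical helper.
# Reports which of the four expected output files exist in a directory.
check_outputs() {
    dir=${1:-.}
    missing=0
    for f in yield1a.log yield1a.lst graphout.eps fact.ssd01; do
        if [ -e "$dir/$f" ]; then
            echo "found    $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done
    return "$missing"     # success only if all four files are present
}

# Example use (checks the current directory):
#   check_outputs
```

If a file is missing, the .log file is the first place to look for the
reason.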
Load up the .lst
file in an Emacs buffer and look over the results. Let's say we want to
include the output from the Analysis of Variance in our report. We will
want to create a file called anova-output.lst
that contains just the Analysis of Variance data.
There are a variety of ways that this can
be done. One way is to delete everything except what we want and write
the buffer to a new file. To do this we'll need to mark each block of
unwanted material and "kill" it. You can select the unwanted material
with the mouse, or you can set the mark with C-SPC
(<Control-Spacebar>)
at the beginning of the block, move the cursor to the end of the block,
and press C-w to kill it.
After you have removed all the unwanted data,
save the resulting buffer to a file called anova-output.lst.
The command to write a buffer to a new file is C-x C-w,
then specify the new file name in the mini-buffer. Your new file should
look something like this...
Analysis of Variance Procedure

Dependent Variable: YIELD

                            Sum of        Mean
Source            DF       Squares      Square   F Value   Pr > F
Model             14     1339.0249     95.6446      4.87   0.0001
Error             75     1473.7667     19.6502
Corrected Total   89     2812.7916

R-Square        C.V.    Root MSE    YIELD Mean
0.476048    24.04225      4.4329        18.438

Source            DF      Anova SS   Mean Square   F Value   Pr > F
METHOD             2     953.15622     476.57811     24.25   0.0001
VARIETY            4      11.38044       2.84511      0.14   0.9648
METHOD*VARIETY     8     374.48822      46.81103      2.38   0.0241
Figure 26 - Analysis of Variance Output
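The same extraction can also be sketched outside Emacs with awk. The
function below (extract_section is a hypothetical helper) prints from the
first line that matches a start pattern up to, but not including, the next
line that matches a stop pattern. The stop pattern depends on what follows
the Analysis of Variance section in your own .lst file, so the one in the
usage comment is a placeholder, not something taken from the real output.

```shell
# Sketch only: extract_section is a hypothetical helper.
# Usage: extract_section FILE START_PATTERN STOP_PATTERN
# Prints from the first line matching START_PATTERN up to (not including)
# the next line matching STOP_PATTERN. Both patterns are awk regular expressions.
extract_section() {
    awk -v start="$2" -v stop="$3" '
        printing && $0 ~ stop { exit }        # reached the end of the section
        $0 ~ start            { printing = 1 } # section begins on this line
        printing              { print }
    ' "$1"
}

# Example use (the stop pattern here is a placeholder; pick one that
# matches the first line after the section in your own output):
#   extract_section yield1a.lst 'Analysis of Variance Procedure' \
#       'STOP PATTERN HERE' > anova-output.lst
```

Compare the result with the original .lst file to make sure nothing you
wanted was cut off.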
One of the files that was created when we
ran the SAS job earlier was a file called graphout.eps.
This file is an Encapsulated Postscript file. You can view the file using
a program on the system called ghostview
using a command like this...
fdrebin@blowfish> ghostview graphout.eps &
Ghostview will open another window and display
the graph represented by the file graphout.eps.
If you want to print the graph, you can do so from the ghostview program
by choosing the Print option from the File menu.
We are done with this part of the project.
Now let's move on to another bit of analysis and graphics using S-Plus.
9.2 S-plus
S-plus is a language and an interactive programming
environment for data analysis and graphics. ...The primary goal of the
S-plus environment is to enable and encourage good data analysis. The facilities
in S-plus are directed toward this goal.
S-plus is about data: it provides general
and easy-to-use facilities for organizing, storing, and retrieving all
sorts of data.
S-plus is about analysis: that is, computations
you need to understand and use data. S-plus provides numerical methods
and other computational techniques.
S-plus is about programming: you can write
functions in the S-plus language itself. These functions can build on the
power and simplicity of the S language. Because S-plus is highly interactive,
new functions can be designed and tried out much faster than with most
languages. S-plus also provides simple interfaces to other kinds of computing,
such as to commands from the Unix system or to C or Fortran routines.
Especially, S-plus is about graphics: interactive,
informative, flexible ways of looking at data. The graphics capabilities
of S are designed to encourage you to create new tools and try out new
ideas.
A wide range of people are presently using
S-plus in diverse areas (financial analysis, statistics research, management,
academia) for analytical computing, graphics, and data analysis.
S-plus is best used in Emacs with the S-mode.
You can start S-plus in Emacs either from the command line, with the command...
fdrebin@blowfish> emacs -f S &
or from inside Emacs with the command...
When you first start up Emacs in the S-mode,
the minibuffer will ask you for the working directory and give you ~/
(your home) as the default. For now, just press <Return>
to accept the default. If the directory you specify doesn't have a .Data
subdirectory, then the home directory will be used instead. It is generally
bad practice to work in the home directory, since you could easily
wind up with a lot of extra files there.
As you do more with S-plus, you will want
to create a separate working directory for each project or class. You
can do this by creating a .Data subdirectory inside the directory
you are using for the project. This is where S-plus stores all its objects,
most of the time.
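Setting up such a working directory takes one command from an Xterm. In
the sketch below, make_splus_workdir is a hypothetical helper and the
project path in the usage comment is only an illustration.

```shell
# Sketch only: make_splus_workdir is a hypothetical helper.
# Creates the project directory (if needed) and a .Data subdirectory inside it.
make_splus_workdir() {
    mkdir -p "$1/.Data"
}

# Example use (hypothetical project path):
#   make_splus_workdir ~/projects/yield-report
```

You can then give that project directory as the working directory when the
minibuffer asks for it.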
You can read a lot about the S-mode using
the Emacs info documentation reader.
To call up the info docs within Emacs, use
the command...
which will bring up the Info window inside
an Emacs buffer.
As you can see, there are a dozen or so entries
in the menu; most are in some way related to GNU products. You can use
the up and down arrow keys on the keyboard to move through the list. Move down
the list until your cursor is on a line that reads...
* S-mode: (S-mode). Emacs Mode for S/Splus.
Now press <Return>
to call up the S-mode info documentation. That documentation should look
something like Figure 29.
Again, you can use your up/down arrow keys
to scroll through the documentation. Any line that begins with an "*"
is a selectable menu option. You can press the letter "u"
to go up a page (back) and the letter "q"
to quit out of the Info mode.
The first thing we want to do is to read in
the permanent SAS data set (fact.ssd01)
that was created as output from the SAS session earlier. This is done inside
the Emacs S-mode at the command line with a command like:
> yield.df <- sas.get('.','fact')
Type it in and press <Return>.
The "<-"
symbols represent assignment. The output of the sas.get
procedure is a data frame, and it is assigned to the variable yield.df.
The sas.get procedure retrieves a permanent
SAS data set, in this case the one called fact,
from the current directory, i.e. ".".
As an aside, if you want to find out what
the sas.get function does and how it's
used, you can call up the S-plus Help mode and look it up. You'll find
the S-plus Help very handy for looking up function definitions and such.
The command to start up the S-plus Help mode is...
The minibuffer will respond by asking you
what you need help on. For our example, type in the name of the function,
sas.get and press <Return>.
Emacs will respond by opening a new window with the help information available
for that function.
You can go to that window and scroll through
the help. When you're done with the S-plus Help window, simply move back
to the S-plus mode buffer (C-x o) and
continue working.
In order to change the type of one of the
variables, we use the following command...
> yield.df$variety <- as.factor(yield.df$variety)
We can now look at the data frame object with
the command...
This produces a description of yield.df.
Since yield.df is a data frame, the summary is a description of
each variable it contains. Your output should look like this...
method    variety       yield
A:30      1:18      Min.   : 6.00
B:30      2:18      1st Qu.:14.65
C:30      3:18      Median :18.55
          4:18      Mean   :18.44
          5:18      3rd Qu.:22.10
                    Max.   :28.60
We now want to look graphically at the interaction
between variety and method. We do this with the following sequence of commands.
The first thing to do is to turn on the graphics device using the command...
This brings up an xdvi graphics window. Now
we'll attach the data frame to the graphics device using the command...
We'll plot our graph using the command...
> interaction.plot(method,variety,yield)
You should notice that the xdvi graphic window
now displays our graph.
Suppose that we want to include that plot
in our report. We need to save the PostScript code to a file that can be
included in our LaTeX document. There are a number of ways to do this.
We will use the following...
> postscript(file="intplot.eps",horiz=F)
> interaction.plot(method,variety,yield)
Now we detach the data frame with the command...
and close the device which will force it to
write the postscript file...
You should see something like this as it closes
the device...
Generated postscript file "intplot.eps".
openlook
There should now be a file in your working
directory called intplot.eps.
We're done with S-plus now, so to exit out
of the S-mode in Emacs, use the command...
9.3 LaTeX
LaTeX is a generic typesetting system that uses TeX as its formatting
engine. ...Due to its flexibility, ease of use, and professional
typographic quality, LaTeX is presently used in almost all areas of
science and the humanities. Unlike many word processors, LaTeX (and its
underlying formatting engine TeX) comes free of charge and is not linked
to any particular computer architecture or operating system. Since LaTeX
source files are plain text files, it is possible to ship them, and the
packages referenced, from any computer to any other computer in the world
(over electronic networks or via normal mail). The recipient will be able
to obtain a final output copy identical to the one generated at the
sender's site, independently of the hardware used. Thus members of groups,
geographically spread over several sites in different countries, or even
on different continents, can now work together in composing complex
documents where different parts can be dealt with by different
individuals, and then brought together without problems. Moreover, the use
of electronic manuscripts has the potential to speed up the publication of
papers by publishers.
The LaTeX file that we will use to write our
report is located in the System Manager's public directory. You should
copy it using the command:
fdrebin@blowfish> cp ~dmarlin/public/orientation/report.tex ~/
Open this file in Emacs, using the Emacs command...
and typing in the name of the file: report.tex.
As the file loads, you'll notice that many other things load
as well. This is the LaTeX mode that is used with Emacs. It is called AUC-TeX
and is similar in concept to the S-mode used with S-plus.
You'll find that, like the S-mode, there is
an Emacs info documentation file available for this AUC-TeX mode. You'll
remember that you can bring up the info docs using the command:
Feel free to look over the file. You'll see
that LaTeX uses tags embedded in the document to specify what it's supposed
to look like.
There's not much we need to do here so let's
"tex" (used as a verb) this document and see what we get. To
run this document through LaTeX in the Tex-mode, use the command...
You'll notice down in the minibuffer that
the default command is LaTeX, which is what we want, so just press <Return>.
LaTeX will take a moment to compile the document.
If there are any errors, it will tell us in the minibuffer how to see them.
If the compilation ended successfully, it will say so in the minibuffer.
After the compilation has finished and there
are no errors, we want to view the document to see what it looks like.
We can do that using the command...
and in the minibuffer type "v"
for view (probably the default) and press <Return>.
The minibuffer will probably show you something like this...
View command: xdvi report.dvi
You can just press <Return>
to accept this. It will open the xdvi viewer in
a new window and display our formatted report. We can page through the
document to make sure that everything is okay. If not, we can go back to
the report.tex file, fix it, and run it through LaTeX again.
If we are all done and want to print the file,
then we again execute the command
This time the command is Print.
We can just type p to get the Print
command or type the entire word. The minibuffer will ask you for the printer
and then the print command. You can press <Return>
for both; the defaults will be fine.
Your LaTeX report should print.
9.4 Conclusion
This is all you will see in this section.
It is a quick tour with little explanation, sort of like the boat ride
on the chocolate river in Willy Wonka and the Chocolate Factory. Feel
free to explore any of these systems. There are a number of printed and
online resources to help you. Good luck.