
Tutorial on Making Simple R Packages

This tutorial was written against R version 2.7.1 (2008-06-23) on Ubuntu 7.10, set up as described here. Your mileage may vary. The example below will fail to package on Windows unless you have a FORTRAN compilation environment suitable for R.

Goal

The goal of this document is to guide a graduate student or faculty member in taking a collection of R work and creating an R package that can easily be shared with the world. We will use one of Dr. Doss's packages as an example. The example deliberately makes minimal use of the packaging features, because it aims to get you over the hurdle of creating a package. After that big jump, documentation and packaging improvements are more easily made incrementally.

Cookbook

The example package consists of about half a dozen R functions, a FORTRAN source file for run-time speed, and two data files. Most projects will be of similar size and complexity. The exact files are: the R source code npbayes.R, the FORTRAN source code combo.f, the documentation text file Readme.txt, and the data sets bcdat.rad.txt and bcdat.radchem.txt. The completely worked example package is at http://www.stat.ufl.edu/system/r-pkg-tut/example/npbayes. In this example, Readme.txt is a holdover from an older version of R packaging; all the information it contains should really be moved into the R manual pages, or into files named LICENCE, COPYING, NEWS, README, and ChangeLog. Other file names are also expected by convention at the top level of an R package. After you get your basic package working, please skim the chapters relevant to your project in the Writing R Extensions manual, listed in the Resources section below.

Make a directory to work on this tutorial in:

$ cd
$ mkdir r-pkg-tut
$ cd r-pkg-tut
Download the source code, as you would find it before it has been converted to a package:
$ wget -r -l 1 http://www.stat.ufl.edu/system/r-pkg-tut/example/source
$ wget -r -l 2 http://www.stat.ufl.edu/system/r-pkg-tut/example/npbayes
$ mv www.stat.ufl.edu/system/r-pkg-tut/example/source .
$ mv www.stat.ufl.edu/system/r-pkg-tut/example/npbayes npbayes.reference
$ rm -r www.stat.ufl.edu
$ find . -type f -name "index*" -print | xargs rm
Make a directory to create the package in:
$ mkdir npbayes
$ cd npbayes
Create the directory hierarchy that the R packaging system expects. These directory names are part of the R packaging conventions, and you can't change them. A package with more complete documentation and testing would also have other directories, such as demo and tests:
$ mkdir data doc man R src
Copy the different file types into the proper directories:
$ cp ../source/npbayes.R R
$ cp ../source/Readme.txt doc
$ cp ../source/combo.f src
The data sets were coded in plain text files, which look like this:
$ head -n 10 ../source/bcdat.rad.txt 
   left right
1    45   Inf
2     6    10
3     0     7
4    46   Inf
5    46   Inf
6     7    16
7    17   Inf
8     7    14
9    37    44
$ head -n 10 ../source/bcdat.radchem.txt
   left right
47    8    12
48    0    22
49   24    31
50   17    27
51   17    23
52   24    30
53   16    24
54   13   Inf
55   11    13
For inclusion in the package, the data sets are turned into saved R objects (.rda files). The conversion is done like this:
$ R
[...]
> bcdat.rad <- read.table('../source/bcdat.rad.txt', header = TRUE)
> bcdat.radchem <- read.table('../source/bcdat.radchem.txt', header = TRUE)
> save(list='bcdat.rad', file='bcdat.rad.rda')
> save(list='bcdat.radchem', file='bcdat.radchem.rda')
> q()
[...]
$ mv bcdat.rad.rda bcdat.radchem.rda data
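Both files load cleanly with read.table(header = TRUE) because the header line has one field fewer than the data rows; in that case read.table uses the first column (the 1, 2, 3, ... and 47, 48, 49, ... above) as row names. A minimal shell sketch that checks this layout before converting a file follows. The helper name check_table_layout is hypothetical, and it assumes whitespace-separated fields with no embedded spaces.

```shell
#!/bin/sh
# check_table_layout FILE
# Hypothetical helper, not part of R. Checks that a whitespace-separated
# table has a header line with exactly one field fewer than every data row,
# the layout read.table(header = TRUE) reads as "first column holds the row
# names". Prints each offending line and returns 1 if any are found.
check_table_layout() {
  awk '
    NF == 0 { next }                      # ignore blank lines
    FNR == 1 { hdr = NF; next }           # remember the header width
    NF != hdr + 1 {
      printf "%s: line %d has %d fields, expected %d\n", FILENAME, FNR, NF, hdr + 1
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

For example, running check_table_layout ../source/bcdat.rad.txt from the npbayes directory should print nothing and return 0 if every row has the expected shape.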
Create the file DESCRIPTION, which declares many of the properties of your package other than the ones concerned with getting the programming to work:
$ emacs DESCRIPTION				# Normally
$ cp ../npbayes.reference/DESCRIPTION .		# For tutorial purposes only
There are other keywords that belong in a DESCRIPTION file, but these are the minimal set to get the package built:
Package: npbayes
Type: Package
Title: npbayes: Nonparametric Bayesian Analysis using the Gibbs sampler
Version: 0.1
Date: 2008-08-11
Author: Hani J. Doss, Fred W. Huffer
Maintainer: Hani J. Doss <doss@stat.ufl.edu>
Description: Some functions for nonparametric bayesian analysis of survival data
License: GPL2.0
LazyLoad: yes
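Before running check, a short shell sketch can confirm two things. First, that DESCRIPTION has the fields Writing R Extensions lists as mandatory: Package, Version, License, Description, Title, Author, and Maintainer. Newer R versions also accept an Authors@R field in place of Author and Maintainer, which this sketch does not handle. Second, that the version number consists of integers separated by . or -. The helper names missing_fields and check_version are hypothetical.

```shell
#!/bin/sh
# missing_fields DESCRIPTION-FILE
# Hypothetical helper. Prints each mandatory field that does not start a
# line in the given DESCRIPTION file.
missing_fields() {
  for field in Package Version License Description Title Author Maintainer; do
    grep -q "^$field:" "$1" || echo "$field"
  done
}

# check_version DESCRIPTION-FILE
# Hypothetical helper. Succeeds if the Version: value is two or more
# non-negative integers separated by "." or "-", such as 0.1 or 1.2-3.
check_version() {
  grep -Eq '^Version:[[:space:]]*[0-9]+([.-][0-9]+)+[[:space:]]*$' "$1"
}
```

Run from the npbayes directory, missing_fields DESCRIPTION should print nothing for the file above. Newer versions of R CMD check may also flag GPL2.0 as a non-standard license specification; GPL-2 is the standard abbreviation.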
Create the file NAMESPACE, which declares what objects from your source code will be advertised to the package user as being in the package:
$ emacs NAMESPACE				# Normally
$ cp ../npbayes.reference/NAMESPACE .		# For tutorial purposes only
The NAMESPACE file from the example is shown below. The useDynLib line declares the connection between the FORTRAN source code and R. Within R, the FORTRAN subroutines are called with .Fortran. This interface is very picky and does little error checking; details are in the Writing R Extensions manual. If you aren't using FORTRAN, you won't need a line like this. The S3method lines register summary and plot methods for objects of class mixdir, so that summary(out) and plot(out) dispatch to summary.mixdir and plot.mixdir even though those functions are not exported. The export line declares, to some extent, which of your R functions are visible to the package user. You should only list functions you want users to call themselves, not the internal functions that implement your algorithms. Every function listed here is subject to many checks and requirements from the R packaging system, such as having its own R manual page. There are other things that can go in a NAMESPACE file; after you get your basic package working, please read about them and use the ones that apply to you.
useDynLib(npbayes)

S3method(plot, mixdir)
S3method(summary, mixdir)

export(gibbs1,
        ritcen,
        sis1,
        sis2)
This example illustrates the use of FORTRAN subroutines. If you are instead using C, the process is similar.
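Registering a method with S3method is not the whole story. R CMD check also requires each method to have all the arguments of its generic, including ..., in the same order. The "S3 generic/method consistency" WARNING in the check output further down is about exactly this. Below is a rough shell sketch using a hypothetical helper, s3_arg_check. It only looks at summary and plot methods whose function header sits on one line, and it only compares the first argument name (object for summary, x for plot) and the presence of ....

```shell
#!/bin/sh
# s3_arg_check R-FILE...
# Hypothetical helper, a rough sketch only. For summary.* and plot.* methods
# whose header is on one line, as in
#   summary.mixdir <- function(L, prn) {
# it reports a first argument that differs from the generic's
# (object for summary, x for plot) and a missing "..." argument.
s3_arg_check() {
  grep -hE '^(summary|plot)\.[A-Za-z0-9._]+[[:space:]]*(<-|=)[[:space:]]*function[[:space:]]*\(' "$@" |
  while IFS= read -r line; do
    name=$(printf '%s\n' "$line" | sed -E 's/^([A-Za-z0-9._]+).*/\1/')
    args=$(printf '%s\n' "$line" | sed -E 's/^[^(]*\(//')       # text after the first "("
    first=$(printf '%s\n' "$args" | sed -E 's/^[[:space:]]*([A-Za-z0-9._]*).*/\1/')
    case $name in
      summary.*) want=object ;;
      *) want=x ;;                                                # plot.*
    esac
    [ "$first" = "$want" ] ||
      echo "$name: first argument is '$first', generic uses '$want'"
    case $args in
      *...*) ;;
      *) echo "$name: no '...' argument" ;;
    esac
  done
}
```

Running s3_arg_check R/npbayes.R would point at headers such as summary.mixdir <- function(L, prn). Those headers need to become function(object, ...) for summary and function(x, ...) for plot, with the body renamed to match.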

Create an R manual page for each of the functions you listed in the export section of NAMESPACE, and for each of the data objects you put in the data directory, by rephrasing the documentation found in Readme.txt into the R manual page (.Rd) format. R manual pages accept many other sections besides those shown here, and you should check them out. The R packaging system tends to take what you give it and run with it. You are required to document all the parameters of the functions you export, and the parameter names are checked for consistency with the code. If you provide an \examples section for a function, R CMD check will run the examples to see if they work.

$ cd man
$ emacs gibbs1.Rd ritcen.Rd sis1.Rd sis2.Rd bcdat.rad.Rd bcdat.radchem.Rd	# Normally
$ cp ../../npbayes.reference/man/* .						# For tutorial purposes only
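Every name in export() needs a manual page, so a small shell sketch can list exported names that no Rd file declares with \alias. The helper exported_without_alias is hypothetical. It assumes a single export(...) directive containing plain names, as in the NAMESPACE above.

```shell
#!/bin/sh
# exported_without_alias PACKAGE-DIR
# Hypothetical helper, a rough sketch only. Lists the names inside the
# export(...) directive of PACKAGE-DIR/NAMESPACE that no file in
# PACKAGE-DIR/man declares with \alias{name}.
exported_without_alias() {
  pkg=$1
  # Flatten NAMESPACE to one line, pull out the export(...) contents,
  # then split them into one name per line.
  { tr '\n' ' ' < "$pkg/NAMESPACE"; echo; } |
    sed -n 's/.*export(\([^)]*\)).*/\1/p' |
    tr ',' '\n' | sed 's/[[:space:]]//g' | grep -v '^$' |
    while IFS= read -r name; do
      grep -qF "\\alias{$name}" "$pkg"/man/*.Rd 2>/dev/null || echo "$name"
    done
}
```

From the npbayes directory, exported_without_alias . should print nothing once all four manual pages are in place.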
Here is the R manual page for one of the R functions:
$ more gibbs1.Rd
\name{gibbs1}
\alias{gibbs1}
\title{This is the title string for gibbs1}
\description{
  This is the description string for gibbs1.
}
\usage{
gibbs1(data, prior, nrep, control = c(10, 1, 3), seed = NULL,
eps = 0.01, adtim = NULL, domcse = T, ci = 0, nbatch = 20, new = T,
npass.ci = 3, nrep.ci = NULL, tol = 0, reuse = F)
}
\arguments{
  \item{data}{is a matrix containing the data intervals.  The matrix has
two columns.  Each row of the matrix gives one data interval.  The time
value infinity may be coded as NA, Inf, or -99.}

  \item{prior}{is a vector specifying the prior distribution.  It
contains three values a, b, totm0.  The values a, b specify the gamma
mixing distribution.  The value totm0 is the total mass alpha(R) of the
Dirichlet measure.}

  \item{nrep}{is the number of iterations of the Gibbs sampler to
record.}

  \item{control}{is a vector of three values (g0, g1, gextra) which
control the running of the Gibbs sampler.  \code{g0} is the warm-up or
burn-in period.  The first g0-1 iterations of the "complete" Gibbs
sampler are discarded.  \code{g1} is the spacing.  Every g1 iterations of
the "complete" Gibbs sampler, we record various statistics (keep running
totals).  \code{gextra} controls the spacing of the "extra step" which
randomizes the position of the atoms.  The extra step is inserted every
gextra sweeps of the "basic" Gibbs sampler.  The gextra sweeps of the
"basic" Gibbs sampler plus the extra step make up one iteration of the
"complete" Gibbs sampler.}

  \item{seed}{is a vector of three integer seeds for the random number
generators.  The first seed should be between 1 and 32.}

  \item{eps}{is a small positive value.  The program currently does not
handle exact observed death times t, but replaces these by interval
censored observations (t-eps,t).}

  \item{adtim}{is a vector of additional time points at which the
survival function is to be computed.  If this is not specified, then the
survival function is only computed at the endpoints of the data
intervals.}

  \item{domcse}{Needs documentation.}

  \item{ci}{Needs documentation.}

  \item{nbatch}{Needs documentation.}

  \item{new}{Needs documentation.}

  \item{npass.ci}{Needs documentation.}

  \item{nrep.ci}{Needs documentation.}

  \item{tol}{Needs documentation.}

  \item{reuse}{Needs documentation.}
}
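R CMD check compares the names in a page's \usage section with its \arguments entries. A rough shell sketch of that comparison, using a hypothetical helper rd_arg_check, can catch mismatches before a full check run. It assumes one function call per \usage block and no commas or unbalanced parentheses inside quoted default values. It also does not treat ... and \dots as the same name.

```shell
#!/bin/sh
# rd_arg_check RD-FILE
# Hypothetical helper, a rough sketch only. Compares the formal arguments in
# the \usage block with the \item{...} names in the \arguments block and
# returns 1 if they differ.
rd_arg_check() {
  rd=$1
  tmp=$(mktemp -d) || return 2
  # Argument names from \usage: keep the text inside the outer call,
  # drop nested (...) groups such as c(10, 1, 3), split on commas,
  # and keep the part before any "=".
  sed -n '/^\\usage{/,/^}/p' "$rd" | tr '\n' ' ' |
    sed -e 's/^[^(]*(//' -e 's/)[^)]*$//' |
    sed -e ':a' -e 's/([^()]*)//' -e 'ta' |
    tr ',' '\n' |
    sed -e 's/=.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    grep -v '^$' | LC_ALL=C sort -u > "$tmp/usage"
  # Names documented with \item{name} inside \arguments.
  sed -n '/^\\arguments{/,/^}/p' "$rd" | grep -o '\\item{[^}]*}' |
    sed -e 's/^\\item{//' -e 's/}$//' | LC_ALL=C sort -u > "$tmp/items"
  status=0
  for name in $(LC_ALL=C comm -23 "$tmp/usage" "$tmp/items"); do
    echo "undocumented argument: $name"; status=1
  done
  for name in $(LC_ALL=C comm -13 "$tmp/usage" "$tmp/items"); do
    echo "documented but not in usage: $name"; status=1
  done
  rm -r "$tmp"
  return $status
}
```

For the page above, rd_arg_check man/gibbs1.Rd should print nothing, since every argument in the \usage section has an \item entry.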
Here is the R manual page for one of the datasets:
$ more bcdat.rad.Rd 
\name{bcdat.rad}
\docType{data}
\alias{bcdat.rad}
\title{Title string for bcdat.rad}
\description{
   This is the description of the bcdat.rad dataset.
}
\usage{bcdat.rad}
\format{This describes the format of bcdat.rad.}
\source{This gives the source of bcdat.rad}
\references{
   Any references for bcdat.rad go here
}
After you have created all the R manual pages, you are ready to see what the R packaging system thinks of your work. Run the command below and fix the warnings and errors it reports until the only remaining warnings are genuine caveats about the package. Here is example check output from a clean package directory. Because check is run from inside the package here, its output directory ..Rcheck is created inside the package itself, which is what the first WARNING complains about. Running R CMD check npbayes from the parent directory instead puts the output in npbayes.Rcheck, outside the package. As you iterate, fixing errors revealed by check and build, a few additional warnings may appear because of files left behind by previous runs, such as object files. Most of the check output is in the ..Rcheck directory. The check command has options, which can be listed with $ R CMD check --help.
$ cd ..
$ R CMD check .
* checking for working pdflatex ... OK
* using log directory '/home/bb/r-pkg-tut/npbayes/..Rcheck'
* using R version 2.7.1 (2008-06-23)
* using session charset: UTF-8
* checking for file './DESCRIPTION' ... OK
* this is package 'npbayes' version '0.1'
* checking package name space information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking whether package 'npbayes' can be installed ... OK
* checking package directory ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking index information ... OK
* checking package subdirectories ... WARNING
Found the following directory(s) with names of check directories:
  ..Rcheck
Most likely, these were included erroneously.
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the name space can be loaded with stated dependencies ... OK
* checking for unstated dependencies in R code ... OK
* checking S3 generic/method consistency ... WARNING
summary:
  function(object, ...)
summary.mixdir:
  function(L, prn)

plot:
  function(x, ...)
plot.mixdir:
  function(L, band, mult, xlab, ylab, new, lty1, lty2, use.ci, ...)

See section 'Generic functions and methods' of the 'Writing R Extensions'
manual.
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking data for non-ASCII characters ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking line endings in Makefiles ... OK
* checking for portable use of $BLAS_LIBS ... OK
* creating npbayes-Ex.R ... OK
* checking examples ... OK
* creating npbayes-manual.tex ... OK
* checking npbayes-manual.tex using pdflatex ... OK

WARNING: There were 2 warnings, see
  /home/bb/r-pkg-tut/npbayes/..Rcheck/00check.log
for details

$
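The lines that matter in this output are the ones ending in WARNING, ERROR, or NOTE. A minimal shell sketch pulls those lines out of a saved check log and returns non-zero if there are any. The helper check_problems is hypothetical.

```shell
#!/bin/sh
# check_problems LOG-FILE
# Hypothetical helper. Prints the "* checking ..." lines of an R CMD check
# log that ended in WARNING, ERROR or NOTE; returns 1 if any were found.
check_problems() {
  if grep -E '^\* .*\.\.\. (WARNING|ERROR|NOTE)[[:space:]]*$' "$1"; then
    return 1
  fi
  return 0
}
```

For the run above, check_problems ..Rcheck/00check.log would print the two WARNING lines and return 1.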
After the complaints from check have been sufficiently addressed, you are ready to produce the package with the build command. Build does some sanity checks, but far fewer than check does. Note that you must go up a directory to run build. The output from the first build looks like this; it produces the new file npbayes_0.1.tar.gz, which is your R package. The npbayes and 0.1 parts of the file name come from the Package: and Version: values in the DESCRIPTION file:
$ cd ..
$ R CMD build npbayes
* checking for file 'npbayes/DESCRIPTION' ... OK
* preparing 'npbayes':
* checking DESCRIPTION meta-information ... OK
* cleaning src
* removing junk files
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building 'npbayes_0.1.tar.gz'

$ ls
npbayes  npbayes_0.1.tar.gz  npbayes.reference  source
$
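The tarball name is assembled from the Package: and Version: fields, so a script can compute it instead of hard-coding npbayes_0.1.tar.gz. The same script can list the archive with tar to confirm that DESCRIPTION, NAMESPACE, R, and man made it in. Below is a sketch with two hypothetical helpers, tarball_name and missing_from_tarball. It assumes the archive's top-level directory is named after the package.

```shell
#!/bin/sh
# tarball_name PACKAGE-DIR
# Hypothetical helper. Prints <Package>_<Version>.tar.gz using the values
# in PACKAGE-DIR/DESCRIPTION.
tarball_name() {
  pkg=$(sed -n 's/^Package:[[:space:]]*//p' "$1/DESCRIPTION")
  ver=$(sed -n 's/^Version:[[:space:]]*//p' "$1/DESCRIPTION")
  printf '%s_%s.tar.gz\n' "$pkg" "$ver"
}

# missing_from_tarball TARBALL PACKAGE-NAME
# Hypothetical helper. Prints each expected entry that the tarball lacks.
missing_from_tarball() {
  listing=$(tar tzf "$1") || return 2
  for entry in DESCRIPTION NAMESPACE R man; do
    printf '%s\n' "$listing" | grep -qE "^$2/$entry(/|\$)" || echo "$entry"
  done
}
```

From the r-pkg-tut directory, missing_from_tarball "$(tarball_name npbayes)" npbayes should print nothing for a complete build.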

Installation

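You or any R user may install the new package into your personal R environment this way. The repos = NULL argument tells install.packages to install from the local file rather than from a repository. If you already have personally-installed R packages, you may have picked a different value for the lib parameter; if not, the default value shown in the warning below is fine: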
$ R
[...]
> install.packages('npbayes_0.1.tar.gz', repos=NULL)
Warning in install.packages("npbayes_0.1.tar.gz", repos = NULL) :
  argument 'lib' is missing: using '/home/bb/R/i486-pc-linux-gnu-library/2.7'
* Installing *source* package 'npbayes' ...
** libs
gfortran   -fpic  -g -O2 -c combo.f -o combo.o
gcc -std=gnu99 -shared  -o npbayes.so combo.o  -L/usr/lib/gcc/i486-linux-gnu/4.2 -lgfortran -lm -L/usr/lib/R/lib -lR
** R
** data
** preparing package for lazy loading
** help
 >>> Building/Updating help pages for package 'npbayes'
     Formats: text html latex example 
  bcdat.rad                         text    html    latex
  bcdat.radchem                     text    html    latex
  gibbs1                            text    html    latex
  ritcen                            text    html    latex
  sis1                              text    html    latex
  sis2                              text    html    latex
** building package indices ...
* DONE (npbayes)
> q()
Save workspace image? [y/n/c]: n
$ 

Use

The R manual pages you wrote can be viewed with:
$ R
[...]
> help.search('gibbs1')
> help(gibbs1, package = npbayes)
Once the package is attached with library(), its help pages are easier to reach:
> library('npbayes')
> ?gibbs1
And here it is working:
> data('bcdat.rad')
> out <- gibbs1(bcdat.rad,c(2,88,10),10000)
> summary(out)
> plot(out)

Resources

The R packaging system is extensively documented in a manual that ran to 127 pages at the time of writing. It supports many fancy features for special cases, but it is hard to tell which of them are worth the effort. The R language allows the use of subroutines written in C, FORTRAN, and other languages, and the packaging system supports the full generality of cross-platform, cross-OS, cross-language use in all its messy detail. Making use of those features requires a good deal of knowledge about programming language implementation. Famous computer scientist Donald Knuth, who wrote TeX, said: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." Please consider this advice carefully before writing C or FORTRAN to interface with R.

This format of the manual is better for looking up answers:
http://cran.r-project.org/doc/manuals/R-exts.html

This format is better for reading cover to cover:
http://cran.r-project.org/doc/manuals/R-exts.pdf

Bugs

This tutorial does not generate the package's manual in PDF format.

It can be argued that hiding functions with a NAMESPACE file is overly formal and structured for a programming project of this size. Programming intuition will tell people not to use a generic name like "average", which other programmers are likely to have used already. At the time of writing, CRAN does accept packages without NAMESPACE files, and packaging into namespaces can break working code if declarations are omitted.

R has a feature which will build the skeleton of the package, leaving you to just edit the documentation. This is not to my taste because it rewrites your source code to be one function per file. I believe that after you've been living with a piece of source code for a while, massively rearranging it would be disruptive. However, the feature works like this:

$ R
[...]
> source('../source/npbayes.R')
> bcdat.rad <- read.table('../source/bcdat.rad.txt', header = TRUE)
> bcdat.radchem <- read.table('../source/bcdat.radchem.txt', header = TRUE)
> package.skeleton('npbayes', list=ls())

Common Linux shell commands

Linux command                      Action
ls, ls DIRNAME                     what files are in this directory (or in DIRNAME)
cd DIRNAME                         change to a directory
cd ..                              change to directory above
cd                                 change to home directory
mkdir DIRNAME                      make a directory
cp EXISTING-NAME NEW-NAME          copy file
cp -r EXISTING-DIR NEW-DIR-NAME    copy tree of files
mv OLD-NAME NEW-NAME               rename a file or directory
mv OLD-PLACE NEW-PLACE             move a file or directory
R                                  start the R language
more FILENAME                      page through contents of file
rm FILENAME                        delete file
rm -r DIRNAME                      delete whole directory


(C) University of Florida, Gainesville, FL 32611; (352) 392-1941.
This page was last updated Tue Sep 25 00:57:44 EDT 2012