npbayes.R, the FORTRAN source code combo.f, the documentation text file Readme.txt, and the data set bcdat.R. You'll find the completely worked example package here. In this example, Readme.txt is a holdover from an older version of R packaging, and all the information it contains should really be moved into the R manual pages, or into files with the names LICENCE, COPYING, NEWS, README, and ChangeLog. Other file names are also expected by convention in the top-level directory of an R package. After you get your basic package working, please skim the chapters of the Writing R Extensions manual (listed in the Resources section below) that seem relevant to your project.
Make a directory to work on this tutorial in:
$ cd
$ mkdir r-pkg-tut
$ cd r-pkg-tut
Download the source code, as you would find it before it has been converted to a package:
$ wget -r -l 1 http://www.stat.ufl.edu/system/r-pkg-tut/example/source
$ wget -r -l 2 http://www.stat.ufl.edu/system/r-pkg-tut/example/npbayes
$ mv www.stat.ufl.edu/system/r-pkg-tut/example/source .
$ mv www.stat.ufl.edu/system/r-pkg-tut/example/npbayes npbayes.reference
$ rm -r www.stat.ufl.edu
$ find . -type f -name "index*" -print | xargs rm
Make a directory to create the package in:
$ mkdir npbayes
$ cd npbayes
Create the directory hierarchy that the R packaging system expects. These names are part of the R packaging conventions, and you can't change them. A package with more thorough documentation would also have directories such as demo and tests:
$ mkdir data doc man R src
Copy the different file types into the proper directories:
$ cp ../source/npbayes.R R
$ cp ../source/Readme.txt doc
$ cp ../source/combo.f src
The data sets were coded in plain text files, which look like this:
$ head -n 10 ../source/bcdat.rad.txt
left right
1 45 Inf
2 6 10
3 0 7
4 46 Inf
5 46 Inf
6 7 16
7 17 Inf
8 7 14
9 37 44
$ head -n 10 ../source/bcdat.radchem.txt
left right
47 8 12
48 0 22
49 24 31
50 17 27
51 17 23
52 24 30
53 16 24
54 13 Inf
55 11 13
But each data set needs to be turned into a saved R object for inclusion in the package. The conversion is done like this:
$ R
[...]
> bcdat.rad <- read.table('../source/bcdat.rad.txt', header = TRUE)
> bcdat.radchem <- read.table('../source/bcdat.radchem.txt', header = TRUE)
> save(list='bcdat.rad', file='bcdat.rad.rda')
> save(list='bcdat.radchem', file='bcdat.radchem.rda')
> q()
[...]
$ mv bcdat.rad.rda bcdat.radchem.rda data
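The header line of each data file has one fewer field than the rows below it. When that happens, read.table() with header = TRUE uses the first column as row names, which is why the row numbers in the text files do not turn into a data column. If you are preparing your own data files, a quick shell check of that shape before the conversion may save a confusing result. This is a minimal sketch, assuming whitespace-separated files like the ones shown; the function name check_rowname_layout is hypothetical.

```shell
# Hypothetical helper: verify that a whitespace-separated data file has a
# header with exactly one fewer field than every data row. That is the layout
# that makes read.table(header = TRUE) use the first column as row names.
# Returns 0 if the layout holds, 1 otherwise (and prints the offending lines).
check_rowname_layout() {
    awk '
        NR == 1 { hdr = NF; next }          # remember the header width
        NF != hdr + 1 {                     # each data row needs one extra field
            printf "line %d has %d fields, expected %d\n", NR, NF, hdr + 1
            bad = 1
        }
        END { exit bad }                    # unset bad counts as 0 in awk
    ' "$1"
}

# Usage, from the npbayes directory as in the steps above:
# check_rowname_layout ../source/bcdat.rad.txt && echo "row-name layout OK"
```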
Create the file DESCRIPTION, which declares many of the properties of your package other than the ones concerned with getting the programming to work:
$ emacs DESCRIPTION # Normally
$ cp ../npbayes.reference/DESCRIPTION . # For tutorial purposes only
There are other keywords that belong in a DESCRIPTION file, but these are the minimum needed to get the package built:
Package: npbayes
Type: Package
Title: npbayes: Nonparametric Bayesian Analysis using the Gibbs sampler
Version: 0.1
Date: 2008-08-11
Author: Hani J. Doss, Fred W. Huffer
Maintainer: Hani J. Doss <doss@stat.ufl.edu>
Description: Some functions for nonparametric bayesian analysis of survival data
License: GPL2.0
LazyLoad: yes
Create the file NAMESPACE, which declares which objects from your source code will be advertised to the package user as being in the package:
$ emacs NAMESPACE # Normally
$ cp ../npbayes.reference/NAMESPACE . # For tutorial purposes only
The NAMESPACE file from the example is shown below.
The useDynLib line declares the connection between the FORTRAN source code and R. Within R, the FORTRAN subroutines are called with .Fortran. This interface is very picky and does little error checking; the details are in the Writing R Extensions manual. If your package has no compiled code, you won't need a line like this. The S3method lines register plot.mixdir and summary.mixdir as the plot and summary methods for objects of class mixdir. The export line declares, to some extent, the visibility of your R functions to the package user. You should list only the functions you want users to call themselves, not any of the internal functions that implement your algorithms. Any function listed here will have many checks and requirements placed on it by the R packaging system, such as additional documentation in the form of R manual pages. There are other things that can go in a NAMESPACE file; after you get your basic package working, please read about them and use the ones that apply to you.
useDynLib(npbayes)
S3method(plot, mixdir)
S3method(summary, mixdir)
export(gibbs1,
ritcen,
sis1,
sis2)
This example illustrates the use of FORTRAN subroutines. If you are using C instead, the process is similar, and the C functions are called from R with .C.
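Before moving on to the manual pages, it can be worth confirming that DESCRIPTION really contains the fields shown above. This is a rough sketch, assuming the simple "Field: value" layout used in this tutorial (continuation lines are not examined). The field list is taken from the minimal DESCRIPTION above, leaving out Type, Date and LazyLoad, not from the full rules in Writing R Extensions, and the function name check_description is hypothetical.

```shell
# Hypothetical helper: report any field from the minimal DESCRIPTION above
# that is missing. A field counts as present if a line starts with "Name:".
# Returns 0 if all fields are present, 1 otherwise.
check_description() {
    desc=${1:-DESCRIPTION}
    missing=0
    for field in Package Version Title Description Author Maintainer License; do
        if ! grep -q "^$field:" "$desc"; then
            echo "missing field: $field"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from the npbayes directory:
# check_description DESCRIPTION && echo "DESCRIPTION has the basic fields"
```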
Create an R manual page for each of the functions you listed in the export section of NAMESPACE, and also for each of the data objects you put in the data directory, by rephrasing the documentation found in Readme.txt into the R manual page format. There are many other keywords that belong in R manual pages besides those shown here, and you should check them out. The R packaging system tends to take what you give it and run with it. You are required to document all the arguments of the functions you document, and the argument names are checked for consistency with the code. If you provide an \examples section for a function, the packaging system will run it to see if it works.
$ cd man
$ emacs gibbs1.Rd ritcen.Rd sis1.Rd sis2.Rd bcdat.rad.Rd bcdat.radchem.Rd # Normally
$ cp ../../npbayes.reference/man/* . # For tutorial purposes only
Here is the R manual page for one of the R functions:
$ more gibbs1.Rd
\name{gibbs1}
\alias{gibbs1}
\title{This is the title string for gibbs1}
\description{
This is the description string for gibbs1.
}
\usage{
gibbs1(data, prior, nrep, control = c(10, 1, 3), seed = NULL,
eps = 0.01, adtim = NULL, domcse = T, ci = 0, nbatch = 20, new = T,
npass.ci = 3, nrep.ci = NULL, tol = 0, reuse = F)
}
\arguments{
\item{data}{is a matrix containing the data intervals. The matrix has
two columns. Each row of the matrix gives one data interval. The time
value infinity may be coded as NA, Inf, or -99.}
\item{prior}{is a vector specifying the prior distribution. It
contains three values a, b, totm0. The values a, b specify the gamma
mixing distribution. The value totm0 is the total mass alpha(R) of the
Dirichlet measure.}
\item{nrep}{is the number of iterations of the Gibbs sampler to
record.}
\item{control}{is a vector of three values (g0, g1, gextra) which
control the running of the Gibbs sampler. \code{g0} is the warm-up or
burn-in period. The first g0-1 iterations of the "complete" Gibbs
sampler are discarded. \code{g1} is the spacing. Every g1 iterations of
the "complete" Gibbs sampler, we record various statistics (keep running
totals). \code{gextra} controls the spacing of the "extra step" which
randomizes the position of the atoms. The extra step is inserted every
gextra sweeps of the "basic" Gibbs sampler. The gextra sweeps of the
"basic" Gibbs sampler plus the extra step make up one iteration of the
"complete" Gibbs sampler.}
\item{seed}{is a vector of three integer seeds for the random number
generators. The first seed should be between 1 and 32.}
\item{eps}{is a small positive value. The program currently does not
handle exact observed death times t, but replaces these by interval
censored observations (t-eps,t).}
\item{adtim}{is a vector of additional time points at which the
survival function is to be computed. If this is not specified, then the
survival function is only computed at the endpoints of the data
intervals.}
\item{domcse}{Needs documentation.}
\item{ci}{Needs documentation.}
\item{nbatch}{Needs documentation.}
\item{new}{Needs documentation.}
\item{npass.ci}{Needs documentation.}
\item{nrep.ci}{Needs documentation.}
\item{tol}{Needs documentation.}
\item{reuse}{Needs documentation.}
}
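Several arguments in gibbs1.Rd are still marked "Needs documentation." Placeholder text like this is valid Rd, so check will not necessarily complain about it; a grep is a simple way to keep track of what is left to write. A small sketch, assuming the placeholder wording used in this page; the function name list_placeholders is hypothetical.

```shell
# Hypothetical helper: print file:line:text for every Rd line that still
# carries the placeholder. /dev/null is added to the file list so grep
# always prefixes matches with the file name, even if only one .Rd exists.
# Returns 0 if any placeholder was found, 1 if none.
list_placeholders() {
    mandir=${1:-man}
    grep -n 'Needs documentation' "$mandir"/*.Rd /dev/null
}

# Usage, from the npbayes directory:
# list_placeholders man
```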
Here is the R manual page for one of the datasets:
$ more bcdat.rad.Rd
\name{bcdat.rad}
\docType{data}
\alias{bcdat.rad}
\title{Title string for bcdat.rad}
\description{
This is the description of the bcdat.rad dataset.
}
\usage{bcdat.rad}
\format{This describes the format of bcdat.rad.}
\source{This gives the source of bcdat.rad}
\references{
Any references for bcdat.rad go here
}
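Check's "missing documentation entries" step expects an \alias for every exported object and every data set. A rough shell approximation can spot gaps before running the full check. This sketch assumes the NAMESPACE layout shown earlier (all exported names inside a single export(...) call, possibly spread over several lines) and one .rda file per data object, named after the object, as in this tutorial. It is not how R itself implements the test, and the function name check_aliases is hypothetical.

```shell
# Hypothetical helper: approximate check's "missing documentation entries"
# test. Run it on the package directory (the one holding NAMESPACE, data, man).
# Returns 0 if every exported name and data object has an \alias, 1 otherwise.
check_aliases() {
    pkgdir=${1:-.}
    # Join NAMESPACE into one line, keep the text inside export( ... ),
    # then split on commas and strip blanks to get one name per line.
    exports=$(tr '\n' ' ' < "$pkgdir/NAMESPACE" |
        sed -n 's/.*export(\([^)]*\)).*/\1/p' |
        tr ',' '\n' | tr -d ' \t')
    # Data objects: .rda file names in data/ without the extension.
    datasets=$(for f in "$pkgdir"/data/*.rda; do
        if [ -e "$f" ]; then basename "$f" .rda; fi
    done)
    status=0
    for name in $exports $datasets; do
        # Fixed-string search for the literal text \alias{name}.
        if ! grep -qF "\\alias{$name}" "$pkgdir"/man/*.Rd; then
            printf '%s\n' "no alias for: $name"
            status=1
        fi
    done
    return "$status"
}

# Usage, from the npbayes directory:
# check_aliases .
```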
After you have created all the R manual pages, you are ready to see what the R packaging system thinks of your work. Run the check command and fix the warnings and errors it generates until the remaining warnings are correct caveats about the package. Here is example check output when run against a clean package directory. As you iterate, fixing the errors revealed by check and build, a few additional warnings will appear because of work done by previous runs, such as the creation of object files. Most of the check output is in the ..Rcheck directory. The check command has options, which can be listed with R CMD check --help.
$ cd ..
$ R CMD check .
* checking for working pdflatex ... OK
* using log directory '/home/bb/r-pkg-tut/npbayes/..Rcheck'
* using R version 2.7.1 (2008-06-23)
* using session charset: UTF-8
* checking for file './DESCRIPTION' ... OK
* this is package 'npbayes' version '0.1'
* checking package name space information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking whether package 'npbayes' can be installed ... OK
* checking package directory ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking index information ... OK
* checking package subdirectories ... WARNING
Found the following directory(s) with names of check directories:
  ..Rcheck
Most likely, these were included erroneously.
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the name space can be loaded with stated dependencies ... OK
* checking for unstated dependencies in R code ... OK
* checking S3 generic/method consistency ... WARNING
summary:
  function(object, ...)
summary.mixdir:
  function(L, prn)
plot:
  function(x, ...)
plot.mixdir:
  function(L, band, mult, xlab, ylab, new, lty1, lty2, use.ci, ...)
See section 'Generic functions and methods' of the 'Writing R Extensions' manual.
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking data for non-ASCII characters ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking line endings in Makefiles ... OK
* checking for portable use of $BLAS_LIBS ... OK
* creating npbayes-Ex.R ... OK
* checking examples ... OK
* creating npbayes-manual.tex ... OK
* checking npbayes-manual.tex using pdflatex ... OK
WARNING: There were 2 warnings, see
  /home/bb/r-pkg-tut/npbayes/..Rcheck/00check.log
for details
$
After the complaints from check have been sufficiently satisfied, you are ready to produce the package with the build command. This command does some sanity checks, but nothing as thorough as check. Note that you must go up a directory when you use build. The output from the first build looks like this, producing the new file npbayes_0.1.tar.gz, which is your R package. The npbayes and 0.1 portions of the file name come from the Package: and Version: values in the DESCRIPTION file:
$ cd ..
$ R CMD build npbayes
* checking for file 'npbayes/DESCRIPTION' ... OK
* preparing 'npbayes':
* checking DESCRIPTION meta-information ... OK
* cleaning src
* removing junk files
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building 'npbayes_0.1.tar.gz'
$ ls
npbayes  npbayes_0.1.tar.gz  npbayes.reference  source
$
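Since the tarball name is assembled from the Package: and Version: fields, a script that runs build and then installs the result can predict the file name instead of hard-coding it. A small sketch, assuming each field sits on a single line of DESCRIPTION; the function name tarball_name is hypothetical.

```shell
# Hypothetical helper: print <Package>_<Version>.tar.gz as read from a
# DESCRIPTION file. Only lines that start with "Package:" or "Version:"
# are used, so a line such as "Type: Package" is not mistaken for the name.
tarball_name() {
    desc=${1:-DESCRIPTION}
    pkg=$(sed -n 's/^Package:[[:space:]]*//p' "$desc")
    ver=$(sed -n 's/^Version:[[:space:]]*//p' "$desc")
    printf '%s_%s.tar.gz\n' "$pkg" "$ver"
}

# Usage, from the r-pkg-tut directory:
# tarball_name npbayes/DESCRIPTION
```

With the DESCRIPTION from this tutorial, the usage line prints npbayes_0.1.tar.gz, the same name the build output reports.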
repos parameter. If not, the default value is fine:
$ R
[...]
> install.packages('npbayes_0.1.tar.gz', repos=NULL)
Warning in install.packages("npbayes_0.1.tar.gz", repos = NULL) :
argument 'lib' is missing: using '/home/bb/R/i486-pc-linux-gnu-library/2.7'
* Installing *source* package 'npbayes' ...
** libs
gfortran -fpic -g -O2 -c combo.f -o combo.o
gcc -std=gnu99 -shared -o npbayes.so combo.o -L/usr/lib/gcc/i486-linux-gnu/4.2 -lgfortran -lm -L/usr/lib/R/lib -lR
** R
** data
** preparing package for lazy loading
** help
>>> Building/Updating help pages for package 'npbayes'
Formats: text html latex example
bcdat.rad text html latex
bcdat.radchem text html latex
gibbs1 text html latex
ritcen text html latex
sis1 text html latex
sis2 text html latex
** building package indices ...
* DONE (npbayes)
> q()
Save workspace image? [y/n/c]: n
$
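Once installation reports DONE, the package lives in its own directory under the library path named in the install warning, and that directory contains the package's DESCRIPTION file. A minimal sketch for confirming this from the shell, assuming that layout; the function name is_installed is hypothetical, and the path in the usage line is the one from the warning in this example.

```shell
# Hypothetical helper: report whether PKG appears to be installed in LIBDIR
# by looking for the DESCRIPTION file kept in each installed package.
# Returns 0 if found, 1 otherwise.
is_installed() {
    libdir=$1
    pkg=$2
    if [ -f "$libdir/$pkg/DESCRIPTION" ]; then
        echo "$pkg installed in $libdir"
    else
        echo "$pkg not found in $libdir" >&2
        return 1
    fi
}

# Usage, with the library path reported by install.packages in this example:
# is_installed /home/bb/R/i486-pc-linux-gnu-library/2.7 npbayes
```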
$ R
[...]
> help.search('gibbs1')
> help(gibbs1, package = npbayes)
Load the package with library(), and its help pages are easier to reach:
> library('npbayes')
> ?gibbs1
And here it is working:
> data('bcdat.rad')
> out <- gibbs1(bcdat.rad,c(2,88,10),10000)
> summary(out)
> plot(out)
The R packaging system is extensively documented in a 127-page manual. It supports many fancy things for special cases, but it is hard to tell which of those fancy things are worth the effort. The R language allows the use of subroutines written in C, FORTRAN, and other languages, and the packaging system supports the full generality of cross-platform, cross-OS, cross-language use in all its messy detail. Making use of those features requires quite a bit of programming language implementation philosophy and detail. The famous computer scientist Donald Knuth, who wrote TeX, said: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." Please consider this advice at length before writing C or FORTRAN to interface with R.
This format of the manual is better for looking up answers:
http://cran.r-project.org/doc/manuals/R-exts.html
This format is better for reading cover to cover:
http://cran.r-project.org/doc/manuals/R-exts.pdf
Some argue that hiding the visibility of functions with a NAMESPACE file is overly formal and structured for a programming project of this size. Programming intuition will tell people not to use a variable name like "average", which may already be in use because it occurred to other programmers first. CRAN does accept packages without NAMESPACE files, and packaging into namespaces can break working code if declarations are omitted.
R has a feature that builds the skeleton of a package, leaving you to just edit the documentation. It is not to my taste because it rewrites your source code to be one function per file. I believe that after you've been living with a piece of source code for a while, massively rearranging it would be disruptive. However, the feature works like this:
$ R
[...]
> source('../source/npbayes.R')
> bcdat.rad <- read.table('../source/bcdat.rad.txt', header = TRUE)
> bcdat.radchem <- read.table('../source/bcdat.radchem.txt', header = TRUE)
> package.skeleton('npbayes', list=ls())
Linux command                     Action
ls                                what files are in this directory
ls DIRNAME                        what files are in directory DIRNAME
cd DIRNAME                        change to a directory
cd ..                             change to the directory above
cd                                change to home directory
mkdir DIRNAME                     make a directory
cp EXISTING-NAME NEW-NAME         copy file
cp -r EXISTING-DIR NEW-DIR-NAME   copy tree of files
mv OLD-NAME NEW-NAME              rename a file or directory
mv OLD-PLACE NEW-PLACE            move a file or directory
R                                 start the R language
more FILENAME                     page through contents of file
rm FILENAME                       delete file
rm -r DIRNAME                     delete whole directory
(C) University of Florida, Gainesville, FL 32611; (352) 392-1941. This page was last updated Tue Sep 25 00:57:44 EDT 2012