Chulmin Kim,   Dept. of Statistics,   University of Florida.

Unconstrained Models for the Covariance Structure of Multivariate Longitudinal Data.

The constraint that a covariance matrix must be positive definite presents difficulties for modeling its structure. In a series of papers published in 1999 and 2000, Mohsen Pourahmadi proposed a parameterization of the covariance matrix for univariate longitudinal data in which the parameters are unconstrained. This parameterization is based on the modified Cholesky decomposition of the inverse of the covariance matrix into a function of a unique unit lower triangular matrix, with no constraints on its non-trivial elements, and a unique diagonal matrix with positive diagonal entries. The positivity constraint is removed by taking logarithms of the diagonal entries. We extend this idea to multivariate longitudinal data.
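As a rough numerical sketch of the univariate construction (the function names and the use of numpy are mine, and the block version for multivariate data described in the next paragraph is not shown), the map between a covariance matrix and its unconstrained parameters might look like this:

    import numpy as np

    def modified_cholesky(Sigma):
        """Unconstrained parameters of Sigma: a unit lower triangular T and
        log innovation variances log_d, with Sigma^{-1} = T' D^{-1} T and
        D = diag(exp(log_d))."""
        L = np.linalg.cholesky(Sigma)            # Sigma = L L', L lower triangular
        d_half = np.diag(L)                      # square roots of the D entries
        T = np.diag(d_half) @ np.linalg.inv(L)   # unit lower triangular
        return T, 2.0 * np.log(d_half)

    def rebuild_covariance(T, log_d):
        """Any unconstrained (T, log_d) maps back to a positive definite Sigma."""
        Sigma_inv = T.T @ np.diag(np.exp(-log_d)) @ T
        return np.linalg.inv(Sigma_inv)

The point of the parameterization is visible in rebuild_covariance: the strictly lower triangular entries of T and the entries of log_d can be any real numbers, yet the reconstructed matrix is always positive definite.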

We develop a modified Cholesky block decomposition that provides an unconstrained parameterization for the covariance matrix, and we propose parsimonious models within this parameterization. A Newton-Raphson algorithm is developed for obtaining maximum likelihood estimators of the model parameters, assuming that the observations are normally distributed. The results, along with penalized likelihood criteria such as BIC for model selection, are illustrated using a real multivariate longitudinal data set and a simulated data set.

Lisa LaVange,   Dept. of Biostatistics,   University of North Carolina (Chapel Hill).

Regulatory versus Research Studies:   A Statistical Perspective.

The pharmaceutical industry is a major source of career opportunities for statisticians in today's market. A statistician in this industry has the potential to make significant contributions along the pathways of discovery and development that, when traversed successfully, bring molecular entities to the marketplace. The objectivity with which statisticians approach a problem can be particularly valuable in an industry where so much is at stake based on the outcomes of studies. The role of the statistician varies at the different stages of product development and may differ substantially from the role served on a research study being carried out in a non-regulatory environment. A brief overview of the drug development process will be provided in this seminar, with particular emphasis on the statistician's involvement in the process. Aspects of study conduct, including data collection and management, clinical site monitoring, and data structures needed to support product registration, will be described. Examples of statistical issues that are frequently encountered in drug development and ways of addressing them will also be discussed briefly, including multiplicity, missing data, and covariate adjustment. Career opportunities in the pharmaceutical and related industries will be discussed, time permitting.

Lauren McIntyre,   Dept. of Molecular Genetics and Microbiology,   University of Florida.

Statistical Methods for Mapping Reciprocal Effects and Application in Zea mays L.

Reciprocal effects are due to effects of the parents (i.e., maternal and paternal effects), cytoplasmic effects, and parent-of-origin effects. However, determining the extent to which reciprocal effects exist, attributing them to specific underlying components, and mapping them requires the development of new analytic approaches. We develop a statistical analysis to identify and map the contribution of specific nuclear chromosomal regions to reciprocal effects. These methods are then applied to a case study in Zea mays L.

Alex Trindade,   Dept. of Statistics,   University of Florida.

Saddlepoint-Based Bootstrap Inference for Nonlinear Regression Models.

We propose a novel method for making small-sample inference on the nonlinear parameter in a conditionally linear nonlinear regression model. A parametric bootstrap method is developed in which Monte Carlo simulation is replaced by saddlepoint approximation. Saddlepoint approximations to the distribution of the estimating equation whose unique root is the parameter's maximum likelihood estimator (MLE) are obtained, with conditional MLEs substituted for the remaining (nuisance) parameters. A key result of Daniels (1983) enables us to relate these approximations to those for the estimator of interest. The approach may also be viewed as a form of bootstrap score inference, in which saddlepoint approximations for the distribution of a score test statistic, under a family of bootstrap distributions rather than the asymptotic normal distribution, are inverted to produce more accurate small-sample confidence bounds. The method's performance relies on a model reparameterization that orthogonalizes the nonlinear parameter with respect to the nuisance parameters, which also validates substituting conditional MLEs for the latter. Confidence intervals produced by the method are shown to have coverage errors of order O(n^{-1/2}), with an error rate that is reduced by the orthogonalizing parameterization. The methodology also applies to inference on ratios of regression parameters in ordinary linear models. Simulations from some celebrated examples show that the proposed method yields confidence intervals whose lengths and coverage probabilities compare favorably with those from several competing methods.
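To fix ideas, the generic saddlepoint machinery behind such approximations can be written as follows (this is the standard monotone-estimating-equation identity and the Lugannani-Rice tail formula, in my notation, not the authors' exact development):
\[
P(\hat\theta \le t) = P\{\Psi(t) \le 0\}
\quad\text{for an estimating function } \Psi \text{ that is decreasing in } \theta,
\]
\[
P(\bar S \ge s) \approx 1 - \Phi(\hat w) + \phi(\hat w)\Bigl(\tfrac{1}{\hat u} - \tfrac{1}{\hat w}\Bigr),
\qquad
\hat w = \operatorname{sgn}(\hat\lambda)\sqrt{2n\{\hat\lambda s - K(\hat\lambda)\}},
\quad
\hat u = \hat\lambda\sqrt{n K''(\hat\lambda)},
\]
where \(\bar S\) is a mean of \(n\) independent terms with cumulant generating function \(K\) and \(\hat\lambda\) solves \(K'(\hat\lambda) = s\). Roughly, in the proposed method the role of \(\Psi\) is played by the score-type estimating equation for the nonlinear parameter, evaluated under a family of bootstrap distributions.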

Wei Wu,   Dept. of Statistics,   Florida State University.

Statistical Coding in Motor Cortex.

Effective neural motor prostheses require a method for decoding neural activity representing desired movement. In particular, the accurate reconstruction of a continuous motion signal is necessary for the control of devices such as computer cursors, robots, or a patient's own paralyzed limbs. In this talk, I will present our real-time system for such applications, which uses Bayesian inference to estimate hand motion from the firing rates of multiple neurons in a monkey's primary motor cortex. The Bayesian model is formulated as the product of a likelihood and a prior. The likelihood term models the probability of the neural firing rates given a particular hand motion, and the prior term defines a probabilistic model of hand kinematics. Decoding was performed using a Kalman filter as well as a more sophisticated switching Kalman filter. Off-line reconstructions of hand trajectories were relatively accurate, and an analysis of these results provides insight into the nature of neural coding. Furthermore, I will show on-line neural control results in which a monkey exploits the Kalman filter to move a computer cursor with its brain.
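As a rough illustration of the decoding step (a minimal sketch under the standard linear-Gaussian assumptions; the function and variable names are mine, and in practice the model matrices would be fit from training data), a Kalman filter decoder can be written as:

    import numpy as np

    def kalman_decode(Z, A, W, H, Q, x0, P0):
        """Decode hand-kinematic states from binned firing rates.

        Model (illustrative):
            x_t = A x_{t-1} + w_t,  w_t ~ N(0, W)   # prior on hand kinematics
            z_t = H x_t     + q_t,  q_t ~ N(0, Q)   # likelihood of firing rates
        Z is a (T, n_neurons) array of firing-rate observations.
        """
        x, P = x0, P0
        states = []
        for z in Z:
            # Predict: propagate the kinematic state one time step forward.
            x_pred = A @ x
            P_pred = A @ P @ A.T + W
            # Update: correct the prediction with the observed firing rates.
            S = H @ P_pred @ H.T + Q              # innovation covariance
            K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
            x = x_pred + K @ (z - H @ x_pred)
            P = (np.eye(len(x)) - K @ H) @ P_pred
            states.append(x)
        return np.array(states)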

Jim Hobert,   Dept. of Statistics,   University of Florida.

A Theoretical Comparison of the Data Augmentation, Marginal Augmentation and PX-DA Algorithms.

The data augmentation (DA) algorithm is a widely used MCMC algorithm that is based on a Markov transition density of the form
p(x | x′) = ∫_Y ƒ_{X|Y}(x | y) ƒ_{Y|X}(y | x′) dy.
The PX-DA algorithm of Liu & Wu (1999, JASA) and the marginal augmentation (MA) algorithm of Meng & van Dyk (1999, Bka) are alternatives to DA that often converge much faster and are only slightly more computationally demanding. The Markov transition densities of these alternative algorithms can be written in the form
p_R(x | x′) = ∫_Y ∫_Y ƒ_{X|Y}(x | y′) q(y′ | y) ƒ_{Y|X}(y | x′) dy dy′,
where q is a Markov transition density on Y. We show that, under regularity conditions, p_R is more efficient than p in the sense that asymptotic variances in the central limit theorem under p_R are never larger than those under p. These results are brought to bear on a theoretical comparison of the DA, MA and PX-DA algorithms. As an example, we compare Albert & Chib's (1993, JASA) DA algorithm for Bayesian probit regression with the alternative PX-DA algorithm developed by Liu & Wu. (This is joint work with Dobrin Marchev, Baruch College (CUNY), and Vivekananda Roy, University of Florida.)
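For concreteness, here is a minimal sketch of Albert & Chib's DA algorithm for Bayesian probit regression, assuming a flat prior on the regression coefficients (the function name and the use of numpy/scipy are mine); it alternates the two draws that define p:

    import numpy as np
    from scipy.stats import truncnorm

    def probit_da_sampler(X, y, n_iter=5000, rng=None):
        """Albert & Chib (1993) DA sampler for Bayesian probit regression."""
        rng = np.random.default_rng() if rng is None else rng
        n, p = X.shape
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = np.zeros(p)
        draws = np.empty((n_iter, p))
        for t in range(n_iter):
            # y-step: latent z_i | beta is N(x_i' beta, 1) truncated by y_i.
            mu = X @ beta
            lower = np.where(y == 1, -mu, -np.inf)   # z > 0 when y = 1
            upper = np.where(y == 1, np.inf, -mu)    # z < 0 when y = 0
            z = mu + truncnorm.rvs(lower, upper, size=n, random_state=rng)
            # x-step: beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1}).
            beta = rng.multivariate_normal(XtX_inv @ (X.T @ z), XtX_inv)
            draws[t] = beta
        return draws

The PX-DA and MA variants insert an extra move between these two steps (for probit, a draw that rescales the latent vector z); that extra move is the q(y′ | y) term in p_R above.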

Xueli Liu,   Dept. of Statistics,   University of Florida.

Detecting Differentially-Expressed Time Course Gene Expression Profiles.

Among the large amounts of high-throughput biological data, time course gene expression profiles can reveal important dynamic features of cell activities. Yet relatively little effort has been devoted to the key question of detecting differentially expressed time course gene expression profiles. One reason may be that the experimental designs for time course gene expression data are not consistent across subjects (e.g., sampling rates vary, and the total number of time points sampled for each subject is often small). We present a statistical method for detecting significant differential time course gene expression that can be applied when there are few time points per subject or when the time grid is irregular. The idea of our method is to integrate a newly developed principal analysis through conditional expectation method and a nonparametric bootstrap into a hypothesis testing framework. In doing so, each gene is assigned a p-value indicating whether it is differentially expressed. Simulations and an analysis of C. elegans data indicate that the method performs better than two-way ANOVA in identifying differentially expressed genes in the dauer development study.
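As a generic illustration of the hypothesis-testing framework (a schematic nonparametric resampling p-value for a single gene, not the authors' PACE-based procedure; the test statistic and null resampling scheme are placeholders):

    import numpy as np

    def bootstrap_pvalue(stat, group_a, group_b, n_boot=1000, rng=None):
        """Nonparametric bootstrap p-value for one gene.

        group_a, group_b: arrays of per-subject expression profiles (rows).
        stat(a, b): discrepancy between the two groups' profiles (placeholder).
        Resampling is done under the null that both groups share one profile,
        by resampling subjects with replacement from the pooled data.
        """
        rng = np.random.default_rng() if rng is None else rng
        observed = stat(group_a, group_b)
        pooled = np.concatenate([group_a, group_b])
        n_a, n_b = len(group_a), len(group_b)
        exceed = 0
        for _ in range(n_boot):
            a_star = pooled[rng.integers(0, len(pooled), size=n_a)]
            b_star = pooled[rng.integers(0, len(pooled), size=n_b)]
            exceed += stat(a_star, b_star) >= observed
        return (exceed + 1) / (n_boot + 1)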

Yongsung Joo,   Dept. of Epidemiology and Biostatistics,   University of Florida.

Detecting Related Genes Using Bayesian Model-Based Clustering.

With microarray technology becoming one of the most popular tools in genetic research, the demand for clustering methods has increased. In this paper, we propose a method to detect a small number of related genes based on longitudinally observed gene expressions. For many reasons, conventional techniques such as k-means or hierarchical clustering cannot provide satisfactory results. First, these methods do not differentiate between predictors and response variables. Second, even though a goal of cluster analysis is to select a small number of genes that can be investigated further in future research, most clustering algorithms tend to produce large clusters as the end product. To overcome the limits of conventional methods, we use a Bayesian model-based method that detects clusters using a linear mixed model and identifies related genes within those clusters using relevance probabilities.

Raymond Carroll,   Dept. of Statistics,   Texas A & M University.

Measuring Dietary Intake.

Newspaper articles routinely report the results of epidemiological studies of the relationship between what we eat and disease outcomes such as heart disease and various forms of cancer. One of the most-quoted studies is the Nurses' Health Study, which follows the health outcomes of 100,000 nurses and asks them questions about their dietary intakes. While there are exceptions, for the most part one can find a relationship between heart disease and diet, e.g., less fat, more fruits, etc. On the other hand, it is rare that prospective epidemiological studies of human populations find links between cancer and dietary intakes. Perhaps the most controversial of all is the question of the relationship between dietary fat intake and breast cancer. Countries with higher fat intakes tend to have higher rates of breast cancer, and yet no epidemiological study has shown such a link. The puzzle, of course, is to understand the discrepancy.

Obviously, the etiology of disease may explain why heart disease, with its intermediate endpoints such as serum cholesterol, has confirmed links to nutrition while the evidence for cancer is mixed. I will focus instead on a basic question of study design: how do we measure what we eat? Try this out: how many days per year do you eat apples? I am going to review the accumulating evidence suggesting that for complex, subtle diseases such as cancer, with no good intermediate endpoints such as serum cholesterol for heart disease, finding links between disease and nutrient intakes will be the exception rather than the rule, simply because of the way diet is measured. I will close with remarks about the Women’s Health Initiative Dietary Intervention Trial and two new cohort studies, along with my own views of the subject.

Raymond Carroll,   Dept. of Statistics,   Texas A & M University.

General Semiparametric Analysis of Repeated Measures Data.

This talk considers the general problem where the data for an individual are repeated measures in the most general sense, with a parametric component and a nonparametric component. It is easy, although not well known, to handle the problem in the case that the nonparametric component of the likelihood function is evaluated exactly once, e.g., when a baseline variable is modeled nonparametrically. Far more difficult, and non-intuitive, is the case where the nonparametric component is evaluated more than once in the likelihood function. Examples include repeated measures studies, variance component models in which the random effect is related to the predictors, matched case-control studies with a nonparametric component, fixed-effects models in econometrics, etc. I will present a constructive (i.e., computable), semiparametric efficient method for this general problem. The constructive part is important: like most semiparametric efficient methods, ours has an integral equation lurking in the background, but unlike most such methods, the integral equation can be avoided in our approach. An example involving caloric intake and income in China is used to illustrate the methodology, as a means of contrasting a random effects analysis and a fixed effects analysis.
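As one concrete instance of the harder case (my notation, not necessarily the speaker's), consider a partially linear repeated-measures model in which the same unknown function enters the likelihood at every measurement occasion:
\[
Y_{ij} = X_{ij}^{\top}\beta + \theta(Z_{ij}) + \varepsilon_{ij},
\qquad j = 1,\dots,m_i,\quad i = 1,\dots,n,
\]
where \(\theta(\cdot)\) is an unknown smooth function and the errors within subject \(i\) may be correlated. The likelihood contribution of subject \(i\) then evaluates \(\theta\) at all of \(Z_{i1},\dots,Z_{im_i}\), which is what makes the semiparametric efficient analysis non-trivial.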
