Seminar page

Using data collected from fishery-independent surveys in the Chesapeake Bay (eastern U.S.), we compare several methods for estimating relative abundance from catch-per-unit-effort (CPUE) data over a study area that is irregular in shape. The methods are: an approximation to block kriging, approximate block kriging in the presence of trend, and design-based estimation under stratified multistage cluster sampling. We describe a method for estimating the spatial average and its standard error using an approximation to block kriging that incorporates a trend component. What distinguishes this work from universal block kriging is the potential use of covariates other than the spatial indices common in universal kriging, and the use of block kriging over an irregular shape. We show that the kriging variance of the spatial mean under the new method is lower than the variance of the design-based estimator. The method is general and can be applied in other similar situations.
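
As a rough numerical sketch of the block-kriging approximation (the covariance model, parameter values, and function names below are illustrative assumptions, not taken from the talk), the spatial mean over an irregular region can be approximated by kriging against a fine grid of points discretizing the region; the trend component is omitted here:

```python
import numpy as np

def exp_cov(d, sill=1.0, rng=2.0):
    # illustrative exponential covariance; sill and range are arbitrary
    return sill * np.exp(-d / rng)

def block_kriging_mean(xy, z, grid):
    """Approximate ordinary block kriging of the spatial mean, with the
    irregular study area represented by a fine grid of points inside it."""
    n = len(z)
    D = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    C = exp_cov(D)                                  # data-to-data covariance
    Dg = np.linalg.norm(xy[:, None, :] - grid[None, :, :], axis=2)
    cbar = exp_cov(Dg).mean(axis=1)                 # data-to-block average covariance
    Dgg = np.linalg.norm(grid[:, None, :] - grid[None, :, :], axis=2)
    cBB = exp_cov(Dgg).mean()                       # within-block average covariance
    # ordinary kriging system with the unbiasedness constraint (weights sum to 1)
    A = np.block([[C, np.ones((n, 1))], [np.ones((1, n)), np.zeros((1, 1))]])
    b = np.concatenate([cbar, [1.0]])
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    est = w @ z
    krig_var = cBB - w @ cbar - mu                  # block-kriging variance
    return est, krig_var
```

The unbiasedness row forces the weights to sum to one, and the returned variance is the usual block-kriging variance C(B,B) - sum_i w_i cbar_i - mu; the finer the grid, the better the approximation to the integral over the irregular region.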

The relationship between a primary endpoint and features of longitudinal profiles of a continuous response is often of interest. One challenge is that the features of the longitudinal profiles are observed only through the longitudinal measurements, which are subject to measurement error and other variation. A relevant framework assumes that the longitudinal data follow a linear mixed model whose random effects are covariates in a generalized linear model for the primary endpoint. Methods proposed in the literature require a parametric (normality) assumption on the random effects, which may be unrealistic. We propose a conditional likelihood approach, which requires no assumptions on the random effects, and a semiparametric full likelihood approach, which requires only that the random effects have a smooth density. The conditional likelihood approach is straightforward and fast to implement; the semiparametric full likelihood approach is generally implemented via an EM algorithm and carries a greater computational burden. Simulation results show that, in contrast to methods predicated on a parametric (normality) assumption for the random effects, the proposed approaches yield valid inferences under departures from this assumption and are competitive when the assumption holds. The semiparametric full likelihood approach shows some efficiency gains over the other methods and provides an estimate of the underlying random effects distribution. We also illustrate the performance of the approaches by application to a study of bone mineral density and longitudinal progesterone levels in 624 women transitioning to menopause, in which investigators wished to understand the association between osteopenia, characterized by bone mineral density at or below the 33rd percentile, and features of hormonal patterns over the menstrual cycle in peri-menopausal women.
Results of the data analysis offer the analyst assurance of credible estimation of the relationship.
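
The modeling framework can be sketched by simulation (the dimensions, parameter values, and the logistic link below are illustrative assumptions, not the study's):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_joint_model(n=500, m=6):
    """Simulate the framework: longitudinal measurements follow a linear
    mixed model y_ij = b0_i + b1_i * t_j + e_ij, and the subject-specific
    random effects (b0_i, b1_i) are covariates in a logistic model for a
    binary primary endpoint D_i."""
    t = np.linspace(0, 1, m)
    # random intercept and slope for each subject (values illustrative)
    b = rng.multivariate_normal([2.0, -1.0], [[1.0, 0.3], [0.3, 0.5]], size=n)
    # observed longitudinal data: true profile plus measurement error
    y = b[:, [0]] + b[:, [1]] * t + rng.normal(0.0, 0.5, (n, m))
    # endpoint depends on the TRUE random effects, not the noisy measurements
    eta = -0.5 + 1.2 * b[:, 0] + 0.8 * b[:, 1]
    d = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return t, y, d
```

A naive two-stage analysis (per-subject least-squares fits of the profiles, then logistic regression of d on the fitted effects) ignores the measurement error in those fits; the conditional and semiparametric likelihood approaches described above are designed to avoid both that bias and the normality assumption on b.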

Using gene expression data to classify sample types or patient survival has received much research attention recently. To accommodate special features of gene expression data, several new methods have been proposed, including the weighted voting scheme of Golub et al. (1999), the compound covariate method of Hedenfalk et al. (2001) (originally proposed by Tukey (1993)), and the shrunken centroids method of Tibshirani et al. (2002). These methods appear different and are more or less ad hoc. Here we point out a close connection of the three methods with a linear regression model and partial least squares (PLS). Under the general framework of PLS, we propose a penalized PLS (PPLS) method that can handle both categorical (for classification) and continuous (e.g., survival time) responses. Using real data, we show the competitive performance of our proposal compared with other methods. This is joint work with Wei Pan (Biostatistics, U of Minnesota) and Jennifer Hall (Medicine, U of Minnesota).
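
To make the PLS connection concrete, here is a minimal NIPALS implementation of PLS1 for a univariate response (an illustrative sketch, not the talk's method; thresholding or shrinking the weight vector w inside the loop would give a penalized-PLS flavour):

```python
import numpy as np

def pls1(X, y, ncomp=2):
    """PLS1 via NIPALS for a univariate response.  One-component fits of
    this form, with differently shrunken gene weights w, resemble the
    weighted-voting / compound-covariate / shrunken-centroid rules."""
    X = X - X.mean(0)
    y = y - y.mean()
    Xr, yr = X.copy(), y.copy()
    T, W, P, q = [], [], [], []
    for _ in range(ncomp):
        w = Xr.T @ yr                     # gene weights from the covariance with y
        w /= np.linalg.norm(w)            # (penalizing w here -> "penalized PLS")
        t = Xr @ w                        # latent score
        p = Xr.T @ t / (t @ t)            # loading
        qk = yr @ t / (t @ t)
        Xr = Xr - np.outer(t, p)          # deflate
        yr = yr - qk * t
        T.append(t); W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # regression coefficients on centered X: B = W (P'W)^{-1} q
    return W @ np.linalg.inv(P.T @ W) @ q
```

With ncomp equal to the number of (full-rank) predictors, PLS1 reproduces the ordinary least-squares fit, which is one way to see the linear-regression connection noted above.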

A recent proposal for randomization inference with covariance adjustment offers the option to control for baseline imbalances with various regression methods while preserving the framework of a randomization test. When applied to a randomized controlled trial, the method achieves narrower confidence intervals by adjusting for baseline differences. It will be illustrated in a clinical trial of treatments following childhood cancer, with incomplete longitudinal data from a heavy-tailed multivariate distribution.
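
A minimal sketch of such a covariate-adjusted randomization test (assuming OLS adjustment and simple re-randomization by permutation; the actual proposal may differ in both respects):

```python
import numpy as np

def adjusted_randomization_test(y, z, x, n_perm=2000, seed=0):
    """Randomization test for a treatment effect with regression adjustment:
    the statistic is the treatment coefficient from OLS of y on
    (intercept, treatment z, baseline covariates x); its null distribution
    comes from re-randomizing z while holding y and x fixed."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(y), z, x])
    def tstat(zz):
        Xz = X.copy()
        Xz[:, 1] = zz
        return np.linalg.lstsq(Xz, y, rcond=None)[0][1]
    obs = tstat(z)
    null = np.array([tstat(rng.permutation(z)) for _ in range(n_perm)])
    # two-sided Monte Carlo p-value with the usual +1 correction
    pval = (1 + np.sum(np.abs(null) >= np.abs(obs))) / (n_perm + 1)
    return obs, pval
```

Because only the treatment labels are permuted, the test's validity rests on the randomization itself, while the regression adjustment reduces the residual variance and hence sharpens the resulting confidence interval.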

Longitudinal regression analysis is important in a variety of settings when the goal is to characterize changes that occur over time. The focus of this talk is on marginal regression models for longitudinal, categorical response data. I will first discuss a consistency-efficiency tradeoff with semi-parametric modeling when the goal is to estimate the cross-sectional relationship between the response and an exposure, E[Y(t) | X(t)]. Next, I will describe the "marginalized" model class, which permits likelihood-based estimation of marginal regression parameters. I will extend this class to accommodate response dependence that I have seen with long series of response data (the functional form of response dependence has both serial and long-range components). Finally, I will discuss prospective inference with outcome-dependent sampling. One situation where such a sampling scheme might be important is a study where interest is in estimating the relationship between a response and a time-varying exposure, the exposure is expensive to measure, and a number of subjects exhibited no response variation during the study period (e.g., never had symptoms). With this sampling design, under certain conditions, we are able to make valid and efficient inference when we exclude subjects without response variation, as long as we account for the covariate ascertainment mechanism.

Our work is directed towards the analysis of aberrant crypt foci (ACF) in colon carcinogenesis. ACF are morphologically changed colonic crypts that are known to be precursors of colon cancer development. In our experiment, all animals were exposed to a carcinogen, and some were exposed to radiation. The colon is laid out as a rectangle, much longer than it is wide (hence the longitudinal aspect), the rectangle is gridded, and the occurrence of an ACF within each grid cell is noted. The biological question of interest is whether these binary responses occur at random through the colon: if not, this suggests that the effect of environmental exposures is localized in different regions. Assuming that there are correlations in the locations of the ACF, the questions are how strong these correlations are, and whether the correlation structures differ when an animal is exposed to radiation. Initially, we test for the existence of correlation. We derive the score test for conditionally autoregressive (CAR) correlation models, and show that this test arises as well from a modification of the score test for Matérn correlation models. Robust methods are used to lower the sensitivity to regions where there are few ACF. To understand the extent of the correlation, we cast the problem as a spatial binary regression, where binary responses arise from an underlying Gaussian latent process. The use of such latent processes in spatial problems has found widespread acceptance in public health, ecological research and environmental monitoring. Our data are clearly nonstationary, with marginal probabilities of disease depending strongly on the location within the colon: we model these marginal probabilities semiparametrically, using fixed-knot penalized regression splines and single-index models. We also believe that the underlying latent process is nonstationary, and we model this based on the convolution of latent local stationary processes.
The dependency of the correlation function on location is also modeled semiparametrically. We fit the models using pairwise pseudolikelihood methods. Assuming that the underlying latent process is strongly mixing, known to be the case for many Gaussian processes, we prove asymptotic normality of the methods. The penalized regression splines have penalty parameters that must converge to zero asymptotically: we derive rates for these parameters that do and do not lead to an asymptotic bias, and we derive the optimal rate of convergence for them. Finally, we apply the methods to the data from our experiment.

This talk consists of two parts. In the first part, I will talk about my Ph.D. thesis work. We propose a functional convex synchronization model, under the premise that each observed curve is the realization of a stochastic process. Monotonicity constraints on time evolution provide the motivation for a functional convex calculus with the goal of obtaining sample statistics such as a functional mean. We derive a functional limit theorem and asymptotic confidence intervals for functional convex means. This nonparametric time-synchronized algorithm is also combined with an iterative mean updating technique to find an overall representation that corresponds to a mode of a sample of gene expression profiles, viewed as a random sample in function space. In the second part, I will talk about novel statistical methods for the analysis of tissue microarray data. Tissue microarrays (TMAs) are a high-throughput tool for studying protein expression patterns in tissue specimens. In TMA analysis, the tissue is immunohistochemically stained and assigned tumor marker staining scores by a pathologist. It is standard practice to select a single staining cutoff that stratifies the population based on an endpoint of interest. However, if the dichotomized staining score is included in a Cox model that uses the same outcome that was used to dichotomize the staining data, the significance of the biomarkers may be overstated. We introduce a new method (random forest pre-validation) that circumvents this bias problem. The idea is to summarize all staining scores into a single scalar M which can be used as a covariate in a Cox regression model. We demonstrate the use of this method to assess the prognostic significance of eight biomarkers for predicting survival in patients with renal cell carcinoma. Our proposed method avoids problems associated with multi-collinearity and over-fitting.
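
The pre-validation idea can be sketched as follows (a nearest-centroid rule stands in for the random forest, and the downstream Cox regression is omitted; all names and values are illustrative):

```python
import numpy as np

def prevalidate(X, y, fit, predict, k=5, seed=0):
    """Pre-validation: each subject's summary score M is predicted by a
    model trained on folds that EXCLUDE that subject, so M can enter a
    downstream Cox model without using the outcome twice."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % k      # balanced random fold labels
    m = np.empty(len(y))
    for f in range(k):
        tr, te = folds != f, folds == f
        model = fit(X[tr], y[tr])            # train without fold f
        m[te] = predict(model, X[te])        # score fold f out-of-fold
    return m

def centroid_fit(X, y):
    # stand-in classifier: class centroids (the talk uses a random forest)
    return X[y == 0].mean(0), X[y == 1].mean(0)

def centroid_predict(model, X):
    c0, c1 = model
    # larger score = closer to the class-1 centroid
    return np.linalg.norm(X - c0, axis=1) - np.linalg.norm(X - c1, axis=1)
```

Because each M is computed out-of-fold, entering it into a Cox model fit on the same outcome no longer overstates significance the way an in-sample dichotomized score does.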
We also carry out a cross-validation scheme to compare the predictive power of different prognostic models.

Estimating equation approaches are useful for correlated data because the likelihood function is often unknown or intractable. However, estimating equation approaches lack (1) objective functions for selecting the correct root in multiple-root problems, and (2) likelihood-type functions to produce inference functions. In this talk, a general description is given of the quadratic inference function approach, a semiparametric framework defined by a set of mean-zero estimating functions, but differing from the standard estimating function approach in that there are more equations than unknown parameters. The quadratic inference function method provides efficient and robust estimation of parameters in longitudinal data settings, and inference functions for testing. Further, an efficient estimator using a nonparametric regression spline is developed, and a goodness-of-fit test is introduced. The asymptotic chi-squared test is also useful for testing whether coefficients in nonparametric regression are time-varying or time-invariant.

Motivated by a neuroscience experiment which observes spike trains from the primary motor cortex of Macaca mulatta (rhesus monkey), we develop methods for estimating the intensity function of a Poisson point process corresponding to a single spike train, and for estimating families of intensity functions that have a common (unknown) shape or amplitude. Additionally, we provide tests for a breakpoint in an intensity function at a given location. These methods are based on local likelihood smoothing. Asymptotic properties of the intensity estimate and test statistics for breakpoints are discussed. We also present results from simulation studies which describe the power and actual significance levels of our tests. Estimates for families of intensity functions build on Functional Data Analysis methodology, but extend beyond the current procedures. In particular, our methods do not require that the point process be observed on the full support of the intensity function for each member of the family. We show that for this case, local likelihood methodology corresponds to using a local polynomial fit with adjusted kernel weights.
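
In the local-constant case, the local likelihood intensity estimate reduces to edge-corrected kernel smoothing of the spike times; a minimal sketch (the bandwidth, kernel, and names are illustrative):

```python
import numpy as np

def kernel_intensity(spikes, tgrid, h=0.05, T=1.0):
    """Local-constant likelihood estimate of a Poisson process intensity:
    lambda(t) = sum_i K_h(t - s_i) / e(t), where e(t) is the mass of the
    kernel inside the observation window [0, T] (an edge correction)."""
    def K(u):                                   # Gaussian kernel, bandwidth h
        return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2 * np.pi))
    num = K(tgrid[:, None] - spikes[None, :]).sum(axis=1)
    # midpoint-rule approximation of e(t) = integral of K_h(t - u) over [0, T]
    z = (np.arange(2000) + 0.5) * (T / 2000)
    e = K(tgrid[:, None] - z[None, :]).sum(axis=1) * (T / 2000)
    return num / e
```

Higher-order local polynomial fits (with the adjusted kernel weights mentioned above) reduce boundary bias further; the sketch stops at the local-constant estimator for brevity.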
