Seminar Schedule


Seminars are held on Thursdays from 4:00 p.m. to 5:00 p.m. in Griffin-Floyd Hall 100 unless otherwise noted.

Refreshments are available before the seminars from 3:30 p.m. to 4:00 p.m. in Griffin-Floyd Hall 230.

Fall 2008

Date Speaker Title

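Sep 18 Robert Dorazio (USGS & UF) Estimation of Manatee Abundance from Aerial Surveys Using Dual Observers and Removal Sampling
Sep 25 Minjung Kyung (UF) Estimation in Dirichlet Random Effects Models
Oct 2 Guido Consonni (University of Pavia) Tests Based on Intrinsic Priors for the Equality of two Correlated Proportions
Oct 9 Montserrat Fuentes (NCSU) Neighborhood and Environmental Factors Associated with Physical Activity During Pregnancy
Oct 16 Aixin Tan (UF) Practical CLT for MCMC Estimators
Oct 28 (Tue) Partha Lahiri (UMD) On parametric bootstrap methods in multi-level models with applications in small area estimation and related problems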
Nov 3 (Mon) Nan Laird (Harvard) Missing Data Problems in Public Health (Challis Lecture, general audience, Emerson Hall Classroom)
Nov 4 (Tue) Nan Laird (Harvard) Testing Gene-Environment Interactions with Nuclear Family Data (Challis Lecture, Emerson Hall Classroom)
Nov 20 Robert Easterling (Sandia) Passion-Driven Statistics (Marquardt Speakers Program)


Estimation of Manatee Abundance from Aerial Surveys Using Dual Observers and Removal Sampling

Robert Dorazio (USGS & UF)

Predictions of manatee abundance as a function of habitat characteristics are needed for making conservation decisions for this endangered species. The spatial distribution of manatees in southwest Florida is known to vary across the landscape with access to fresh water for drinking, food availability, warm water refuge, and other factors related to habitat. The spatial distribution of abundance also changes seasonally with changes in water temperature and the threat of mortality from cold stress. Consequently, aerial surveys were developed to estimate abundance of manatees in spatially referenced sample units using a combination of sampling protocols. Groups of manatees were detected using dual observers, and the number of manatees in each group was determined by repeated circling to yield a sequence of "removal" counts. Thus, both kinds of counts (i.e., those of groups and those of manatees within groups) were spatially referenced. A hierarchical modeling framework is developed to estimate maps of manatee abundance while accounting for the imperfect detectability of groups and of individuals within groups. A critical component of these models is the functional dependence between the probability of detecting a group and the group's size, which is unknown but estimable.
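As a rough illustration of what "removal" counts can provide, the sketch below implements the classic two-pass removal estimator. It is not the hierarchical model described in the abstract. It assumes a closed group, the same detection probability p on both passes, and that the second pass records only individuals not counted on the first pass. The function name and the example counts are hypothetical.

```python
def two_pass_removal_estimate(first_count, second_count):
    """Classic two-pass removal estimator (illustrative, not the talk's model).

    Under the stated assumptions, E[c1] = N * p and E[c2] = N * (1 - p) * p,
    which gives p_hat = (c1 - c2) / c1 and N_hat = c1**2 / (c1 - c2).
    """
    if first_count <= second_count:
        # The moment equations have no valid solution in this case.
        raise ValueError("estimator undefined unless first_count > second_count")
    p_hat = (first_count - second_count) / first_count
    n_hat = first_count ** 2 / (first_count - second_count)
    return n_hat, p_hat


# Hypothetical counts: 8 animals seen on the first pass, 2 new ones on the second.
n_hat, p_hat = two_pass_removal_estimate(8, 2)
print(n_hat, p_hat)
```

With only two passes the estimator is unstable when the counts are close, which is one reason a model that pools information across groups can be attractive.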

Estimation in Dirichlet Random Effects Models

Minjung Kyung (UF)

We develop a new Gibbs sampler for a linear mixed model with a Dirichlet process random effect term, which is easily extended to a generalized linear mixed model with a probit link function. Our Gibbs sampler exploits the properties of the multinomial and Dirichlet distributions, and is shown to be an improvement, in terms of operator norm and efficiency, over other commonly used MCMC algorithms. We also investigate methods for the estimation of the concentration parameter of the Dirichlet process, finding that maximum likelihood may not be desirable, but a posterior mode is a reasonable approach. Examples are given to show how these models perform on real data. Our results complement both the theoretical basis of the Dirichlet process nonparametric prior and the computational work that has been done to date.

Tests Based on Intrinsic Priors for the Equality of two Correlated Proportions

Guido Consonni (University of Pavia)

Correlated proportions arise in longitudinal (panel) studies. A typical example is the "opinion swing" problem: "Has the proportion of people favoring a politician changed after his recent speech to the nation on TV?" Since the same group of individuals is interviewed before and after the speech, the two proportions are correlated. A natural null hypothesis to be tested is whether the corresponding population proportions are equal. A standard Bayesian approach to this problem has already been considered in the literature, based on a Dirichlet prior for the cell probabilities of the underlying two-by-two table under the alternative hypothesis, together with an induced prior under the null. In the absence of specific prior information, a diffuse (e.g. uniform) distribution may be used. We claim that this approach is not satisfactory, since in a testing problem one should make sure that the prior under the alternative is adequately centered around the region specified by the null, in order to obtain a fairer comparison between the two hypotheses, especially when the data are in reasonable agreement with the null. Following an intrinsic prior methodology, we develop two strategies for the construction of a collection of objective priors increasingly peaked around the null. We provide a simple interpretation of their structure in terms of weighted imaginary sample scenarios. We illustrate our method by means of three examples, carrying out sensitivity analysis and providing comparison with existing results.

This is joint work with Luca La Rocca, University of Modena and Reggio Emilia, Italy.
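To make the testing problem concrete, here is a minimal sketch of a diffuse-prior Bayes factor computed from the discordant pairs only (people who switched opinion in one direction or the other). It is a simplified stand-in and not the intrinsic prior construction from the talk. It assumes that, given m discordant pairs, the number switching in one direction is Binomial(m, theta), with theta = 1/2 under the null and theta uniform on (0, 1) under the alternative. The function name and the counts are hypothetical.

```python
from math import comb


def discordant_pairs_bayes_factor(n12, n21):
    """Bayes factor in favor of equal proportions, using discordant pairs only.

    n12 and n21 are the off-diagonal counts of the two-by-two table.
    Null: theta = 1/2. Alternative: theta ~ Uniform(0, 1) (an assumption here).
    """
    m = n12 + n21
    if m == 0:
        return 1.0  # no discordant pairs, so the data carry no evidence
    # Likelihood of the observed split under theta = 1/2.
    null_likelihood = comb(m, n12) * 0.5 ** m
    # Marginal likelihood under a uniform prior on theta is 1 / (m + 1).
    alt_marginal = 1.0 / (m + 1)
    return null_likelihood / alt_marginal


# Hypothetical data: 15 people switched to "favor", 5 switched away.
print(discordant_pairs_bayes_factor(15, 5))
```

Values above 1 favor the null of equal proportions and values below 1 favor the alternative. The abstract's point is that a diffuse prior of this kind may treat the null unfairly, which motivates priors more concentrated around it.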

Neighborhood and Environmental Factors Associated with Physical Activity During Pregnancy

Montserrat Fuentes (NCSU)

Physical activity has well-documented health benefits for cardiovascular fitness and weight control. For pregnant women, the American College of Obstetricians and Gynecologists currently recommends 30 minutes of moderate exercise on most, if not all, days; however, very few pregnant women achieve this level of activity. Epidemiologists, policy makers, and city planners are interested in whether characteristics of the physical environment in which women live and work influence physical activity levels during pregnancy and in the postpartum period. In this paper we study the associations between physical activity and several factors, including personal characteristics, meteorological/air quality variables, and neighborhood characteristics, in pregnant women in four counties of North Carolina. We simultaneously analyze six types of physical activity and investigate cross-dependencies between these activity types. Exploratory analysis suggests that the associations differ across regions.

Therefore we use a multivariate regression model with spatially-varying regression coefficients. This model includes a regression parameter for each covariate at each spatial location. For our data with many predictors, some form of dimension reduction is clearly needed. We introduce a spatial Bayesian variable selection procedure to identify subsets of important variables. Our stochastic search algorithm determines the probabilities that each covariate's effect is null, non-null but constant across space, and spatially-varying.

This is joint work with Brian Reich (NCSU) and Amy Herring (Biostatistics, UNC, Chapel Hill).

Practical CLT for MCMC Estimators

Aixin Tan (UF)

In Bayesian inference, it is often of interest to inspect posteriors associated with several different priors. Consider, for example, the Bayesian one-way random effects model. Two different (improper) priors are frequently recommended: the standard diffuse prior and the reference prior. Call the resulting posteriors π and π*. Posterior expectations with respect to either of these posteriors are intractable.

If the standard diffuse prior is adopted, there is a simple block Gibbs sampler that can be employed to explore π. As usual, empirical averages can be used to estimate intractable posterior expectations. We have developed a regeneration-based CLT that yields simple, asymptotically valid standard errors for the estimators. The posterior π* is more complicated. Thus, instead of constructing and studying a different MCMC algorithm for π*, we propose using importance sampling estimators based on the block Gibbs sampler for π. We have extended the existing regeneration theory to handle these importance sampling estimators and this yields simple, asymptotically valid standard errors for the importance sampling estimators.

Successful application of our regeneration method rests on the assumption that the block Gibbs sampler for π converges to its stationary distribution at a geometric rate. We have proven that, unless the data set is extremely small and unbalanced, the block Gibbs Markov chain enjoys the desired property.
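The importance sampling idea above can be sketched with a self-normalized estimator: draws from one distribution are reweighted to estimate an expectation under another. This is only an illustration under simplifying assumptions. The draws here are independent normal variates rather than output from a block Gibbs sampler, the target is a shifted normal chosen for convenience, and the regeneration-based standard errors from the talk are not shown. All names are hypothetical.

```python
import math
import random


def self_normalized_is(draws, log_weight, f):
    """Estimate E_target[f(X)] from draws of a different (proposal) distribution.

    log_weight(x) is log(target density / proposal density), up to a constant.
    """
    log_w = [log_weight(x) for x in draws]
    shift = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(wi * f(x) for wi, x in zip(w, draws)) / sum(w)


# Illustrative setup: proposal N(0, 1) plays the role of pi,
# target N(1, 1) plays the role of pi*.
rng = random.Random(0)
draws = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# log N(x; 1, 1) - log N(x; 0, 1) = x - 0.5
estimate = self_normalized_is(draws, lambda x: x - 0.5, lambda x: x)
print(estimate)  # estimates the target mean, which is 1 in this setup
```

Because the weights are normalized by their sum, the target and proposal densities only need to be known up to constants, which suits posteriors with intractable normalizing constants.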

On parametric bootstrap methods in multi-level models with applications in small area estimation and related problems

Partha Lahiri (UMD)

There is a growing demand to assess socio-economic and health status at both national and sub-national levels from complex survey data. However, data availability at the sub-national (small area) level from a survey is limited by cost, and thus analysts must make the best possible use of all available information. Over the last three decades, multi-level models have been frequently used in a wide range of small area applications. Such models offer great flexibility in combining information from various sources and thus are well suited for solving most small area estimation problems. To estimate small area characteristics, empirical best prediction (EBP) methods are routinely used. One major challenge for this approach is the estimation of an accurate mean squared prediction error (MSPE) of EBP that captures all sources of variation. Different parametric bootstrap methods have been proposed in the literature to solve this problem. However, the basic requirements of second-order unbiasedness and non-negativity of the MSPE estimator of an empirical best predictor have led to different complex analytical adjustments to the naïve parametric bootstrap technique for small area estimation. In this talk, we show a way to recover the basic simplicity in the parametric bootstrap method, i.e. replacement of laborious analytical calculations by computer-oriented simple techniques, without sacrificing the basic requirements in an MSPE estimator. The method works for a general class of multi-level models and different techniques of parameter estimation.

The talk is based on my joint work with Professor S. Chatterjee, University of Minnesota.
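For orientation, the sketch below shows a naive parametric bootstrap estimate of the MSPE in a simple area-level model. It illustrates the baseline technique the abstract refers to, not the method presented in the talk. The assumed model is y_i = theta_i + e_i with theta_i ~ N(mu, A) and e_i ~ N(0, D_i), where the sampling variances D_i are known. The moment estimator of A, the function names, and the data are all assumptions made for illustration.

```python
import random
import statistics


def fit(y, d):
    """Moment estimate of A and weighted-mean estimate of mu (illustrative)."""
    a_hat = max(0.0, statistics.variance(y) - statistics.fmean(d))
    weights = [1.0 / (a_hat + di) for di in d]
    mu_hat = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
    return mu_hat, a_hat


def predict(y, d, mu_hat, a_hat):
    """Empirical best predictor: shrink each y_i toward mu_hat."""
    return [mu_hat + a_hat / (a_hat + di) * (yi - mu_hat) for yi, di in zip(y, d)]


def naive_bootstrap_mspe(y, d, n_boot=500, seed=0):
    """Average squared prediction error over bootstrap replicates, per area."""
    rng = random.Random(seed)
    mu_hat, a_hat = fit(y, d)
    sq_err = [0.0] * len(y)
    for _ in range(n_boot):
        # Simulate area effects and direct estimates from the fitted model.
        theta_star = [rng.gauss(mu_hat, a_hat ** 0.5) for _ in y]
        y_star = [rng.gauss(t, di ** 0.5) for t, di in zip(theta_star, d)]
        # Refit and predict on the bootstrap sample.
        mu_star, a_star = fit(y_star, d)
        pred_star = predict(y_star, d, mu_star, a_star)
        for i, (p, t) in enumerate(zip(pred_star, theta_star)):
            sq_err[i] += (p - t) ** 2
    return [s / n_boot for s in sq_err]


# Hypothetical direct estimates and known sampling variances for six areas.
y = [10.2, 12.5, 9.1, 14.0, 11.3, 8.7]
d = [1.0, 0.5, 2.0, 1.5, 0.8, 1.2]
print(naive_bootstrap_mspe(y, d, n_boot=200))
```

The result is non-negative by construction. As the abstract notes, a naive scheme of this kind is not second-order unbiased without further adjustment.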

Missing Data Problems in Public Health

Nan Laird (Harvard)

Missing data arise commonly in many studies in Public Health and Medicine. Here I will review some cases where statistical methodology has contributed to our ability to overcome limitations in technology and sampling and produce good inferences with imperfect data. We will discuss examples from estimation of radon levels from diffusion battery data, longitudinal cohort studies, and testing for genetic effects with family data.

Testing Gene-Environment Interactions with Nuclear Family Data

Nan Laird (Harvard)

The widespread availability of genetic markers for samples of reasonable size has intensified interest in testing for gene-environment interactions with complex diseases. Both traditional case-control and family-based designs are used in genetic association studies, the latter having the advantage of eliminating problems due to population substructure, as well as sensitivity to modeling the genetic effect when testing for genetic effects alone. Here we address the issue of extending the family design to test gene-environment interactions. Robustness to population substructure can be maintained, but robustness to model specification cannot. We also discuss joint tests of gene-environment interaction, which are generally more powerful, as well as completely robust to the genetic model and population substructure.

Passion-Driven Statistics

Robert Easterling (Sandia)

Archie Bunker once told his son-in-law, “Don’t give me no stastistics (sic), Meathead! I want facts!” What he was saying was: We statisticians get our kicks from stastistics (i.e., the technical aspects of statistical data analysis), while our clients and collaborators are turned on by the facts (the subject-matter insights provided by data). We create Archie’s impression of a difference between statistics and facts early on by lifeless, sometimes clueless, textbook examples that seem aimed only at teaching formula plug-in; there are no apparent or interesting facts either driving the investigation or revealed by the analysis. If we want people to be passionate (and intelligent) about the use of statistical methods in their work, we need, from the beginning, to connect their enthusiasm for their chosen fields to an appreciation of statistical methods that will help them learn more about their world. The importance of recognizing subject-matter passion also pertains to our collaborative and consulting work in business and industry. Examples will be given along with an overview of the problem of statistical validation of computer models of physical processes and phenomena.

