Weekly Seminars
Seminars are held on Thursdays from 4:00 p.m. to 5:00 p.m. in Griffin-Floyd Hall 100 (unless otherwise noted).
Refreshments are available before the seminars from 3:30 p.m. to 4:00 p.m. in Griffin-Floyd Hall 230.
Date  Speaker  Comments
Sep 13  George Casella (UF)
Sep 20  No seminar this week (Linux tutorial at this time)
Sep 27  Mark Yang (UF)
Oct 4  Andrew Lawson (USC)  Latent Structure in Space-time Disease Maps: links to environmental hazard events
Oct 11  Bruce Turnbull (Cornell)
Oct 18  Jing Cheng (UF)  Estimation and Inference for the Causal Effect of Receiving Treatment on a Multinomial Outcome
Oct 26  Jinfeng Zhang (FSU)  UF/FSU seminar
Nov 1  Vivekananda Roy (UF)  Convergence rates and asymptotic standard errors for MCMC algorithms for Bayesian probit regression
Nov 8  Jie Yang (UF)  Nonparametric Functional Mapping of Quantitative Trait Loci
Nov 13  Lawrence Brown (UPenn)  Challis Lecture
Nov 14  Lawrence Brown (UPenn)  Challis Lecture (Wednesday)
Nov 29  Gauri Datta (UGA)

Abstracts
Testing for Clusters with Applications in Maize Genetics 
George Casella (UF) The existence of clusters in maize data suggests genetic segregation and could lead to the discovery of new strains. If clustering is suspected, the sample is then subjected to a more detailed, time-consuming, and expensive analysis. Thus, there is a need for a good screening test for the existence of clusters. We model this as a hypothesis test and attempt to compute a Bayes factor to test the null hypothesis of no clusters. This leads us down a path of combinatorics involving Stirling numbers, and the derivation of an algorithm to generate random partitions in a constrained cluster space. Ultimately, we can calculate the Bayes factor using an importance sampler implemented with a hybrid Metropolis-Hastings algorithm.
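For orientation, and using generic notation rather than the specific model from the talk, the Bayes factor for the no-cluster null H_{0} against the clustering alternative H_{1} is the ratio of marginal likelihoods

    B_{01} = m_{0}(y) / m_{1}(y),   where   m_{k}(y) = ∫ f(y | θ_{k}) π_{k}(θ_{k}) dθ_{k}.

The marginal under the clustering model is intractable, and importance sampling approximates it by m_{1}(y) ≈ (1/N) Σ_{j} f(y | θ^{(j)}) π_{1}(θ^{(j)}) / q(θ^{(j)}) with θ^{(1)}, ..., θ^{(N)} drawn from a proposal q; in this problem the quantity being integrated over includes the partition of the sample into clusters, which is presumably where the constrained random partitions and Stirling numbers enter.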
An optimal DNA pooling strategy for progressive fine genetic mapping 
Mark Yang (UF) A basic strategy to optimize DNA pooling for expense reduction while maintaining a desirable resolution in genetic fine mapping is developed. Although any genotyping en masse will lead to a loss of estimation accuracy, the loss may be compensated for by a small increase in sample size, suggesting that we can reduce the overall experimental expense by balancing the costs of genotyping and sample collection. Within a certain range that depends on the parameter scenario, a pooling scheme can reach nearly the same precision as individual genotyping by increasing the number of individuals by only a few percent, while reducing the genotyping effort to a small fraction of what is needed for genotyping without pooling. Further, the genotyping burden can be reduced through multistage pooling. Numerical results are provided for practical guidelines. Finally, we use a set of genetic data on mapping the rice xl(t) gene to demonstrate the feasibility and efficiency of the DNA pooling strategy. Taken together, the results demonstrate that this DNA pooling strategy can greatly reduce the genotyping burden and the overall cost in fine-mapping experiments.
Latent Structure in Space-time Disease Maps: links to environmental hazard events
Andrew Lawson (USC) In the assessment of the linkage between environmental risk gradients and health outcomes, there is often a need to consider the possibility that risk is multifaceted. Many models for disease risk in the spatial or spatiotemporal domain regard the map as being defined by global parameters with single underlying components. However, it is commonly true that unknown risk structures (multiple risk gradients, or population subgroups) could interweave to result in a single realisation of disease. In this talk I will first discuss the context of spatiotemporal modeling of disease. I will then describe two Bayesian approaches to the analysis of latent structure in space-time disease maps: 1) time-dependent latent mixture components with spatial weights; 2) covariate spline interaction models. A comparison will be made with a standard Bayesian space-time model.
Adaptive and Non-Adaptive Group Sequential Trial Designs
Bruce Turnbull (Cornell) Methods have been proposed to redesign a clinical trial at an interim stage in order to increase power. This may be in response to external factors which indicate that power should be sought at a smaller effect size, or it could be a reaction to data observed in the study itself. In order to preserve the type I error rate, methods for unplanned design change have to be defined in terms of non-sufficient statistics, and this calls into question their efficiency and the credibility of conclusions reached. We evaluate schemes for adaptive redesign, assessing the possible benefits of pre-planned adaptive designs by numerical computation of optimal tests; these optimal adaptive designs are concrete examples of the optimal sequentially planned sequential tests proposed by Schmitz (1993). We conclude that the flexibility of unplanned adaptive designs comes at a price, and we recommend that the appropriate power for a study should be determined as thoroughly as possible at the outset. Then, standard error-spending tests, possibly with unevenly spaced analyses, provide efficient designs. However, it is still possible to fall back on flexible methods for redesign should study objectives change unexpectedly once the trial is under way. (This is joint work with Chris Jennison.)
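As a concrete illustration of the error-spending idea mentioned above (a standard construction, not taken from the talk), the Lan-DeMets O'Brien-Fleming-type spending function allocates the overall two-sided type I error α across interim analyses as a function of the information fraction t; a minimal sketch:

    import numpy as np
    from scipy.stats import norm

    def obf_error_spent(t, alpha=0.05):
        # Lan-DeMets O'Brien-Fleming-type spending function (two-sided):
        # cumulative type I error allowed to be spent by information fraction t.
        t = np.asarray(t, dtype=float)
        return 2.0 * (1.0 - norm.cdf(norm.ppf(1.0 - alpha / 2.0) / np.sqrt(t)))

    # Cumulative error spent at four evenly spaced looks; successive differences
    # give the error available at each analysis. Unevenly spaced analyses simply
    # correspond to unevenly spaced values of t.
    print(obf_error_spent([0.25, 0.5, 0.75, 1.0]))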
Estimation and Inference for the Causal Effect of Receiving Treatment on a Multinomial Outcome 
Jing Cheng (UF) Noncompliance is a common problem in randomized clinical trials. When there is noncompliance, there is often interest in estimating the causal effect of actually receiving the treatment compared to receiving the control. There is a large literature on methods of analysis for clinical trials with noncompliance which have continuous or binary outcomes. Clinical trials with multinomial outcomes are common in practice but not much attention has been paid to analyzing these trials with noncompliance. This talk considers the analysis of these trials. We first define the causal effect in these trials and estimate it with the likelihood method, and then conduct inference based on the likelihood ratio test. The likelihood ratio statistic follows a chi-squared distribution asymptotically when the true values of parameters are in the interior of the parameter space under the null. However, when the true values of parameters are on the boundary of the parameter space, the likelihood ratio statistic does not have an asymptotic chi-squared distribution. Therefore, we propose a bootstrap/double bootstrap version of a likelihood ratio test for the causal effect in these trials. The methods are illustrated by an analysis of data from a randomized trial of an encouragement intervention to improve adherence to prescribed depression treatments among depressed elderly patients in primary care practices.
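The following is a minimal, generic sketch of parametric-bootstrap calibration of a likelihood ratio test, the basic idea behind the bootstrap test described above; the talk's specific multinomial-outcome model and its double-bootstrap refinement are not reproduced here, and fit_null, simulate_null, and lr_stat are placeholders to be supplied for the problem at hand.

    import numpy as np

    def bootstrap_lrt_pvalue(data, fit_null, simulate_null, lr_stat, B=999, seed=None):
        # Calibrate a likelihood ratio test by simulation when the usual
        # chi-squared limit is unreliable (e.g., parameters on the boundary
        # of the parameter space under the null).
        rng = np.random.default_rng(seed)
        theta0 = fit_null(data)                  # parameter estimates under the null
        observed = lr_stat(data)                 # observed likelihood ratio statistic
        boot = np.empty(B)
        for b in range(B):
            ystar = simulate_null(theta0, rng)   # data generated from the fitted null
            boot[b] = lr_stat(ystar)
        # p-value: proportion of bootstrap statistics at least as large as observed
        return (1 + np.sum(boot >= observed)) / (B + 1)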
Novel Monte Carlo Methods for Protein Structure Modeling 
Jinfeng Zhang (FSU) Proteins are the machinery of life and are involved in almost all biological processes. A protein functions through a well-defined three-dimensional structure. The next step after genome sequencing projects is to understand the sequence and structure relationship of proteins or genes encoded in the genomes. Such understanding will fundamentally advance biology and public health. The protein structure modeling problem has been formulated as first defining a free energy function for protein structures, which follow a Boltzmann distribution, and then designing a sampling method so that protein structures can be sampled from the distribution. Both defining the energy function and designing sampling methods are very challenging. We developed new Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) methods for studying protein structures. The MCMC method was tested on a simplified protein folding model, the HP model. Finding minimum energies for HP sequences is still an open problem with a 20-year history. The new method significantly outperformed all previously reported methods. SMC was applied to estimate entropy and free energy of ensemble protein structures, one of the most challenging problems in computational chemistry and biophysics. I will present some interesting findings discovered through sampled structures using the method.
Convergence rates and asymptotic standard errors for MCMC algorithms for Bayesian probit regression 
Vivekananda Roy (UF) Consider a probit regression problem in which Y_{1}, ..., Y_{n} are independent Bernoulli random variables such that Pr(Y_{i} = 1) = Φ(x_{i}^{T}β), where x_{i} is a p-dimensional vector of known covariates associated with Y_{i}, β is a p-dimensional vector of unknown regression coefficients, and Φ(·) denotes the standard normal distribution function. We study Markov chain Monte Carlo algorithms for exploring the intractable posterior density that results when the probit regression likelihood is combined with a flat prior on β. We prove that Albert and Chib's (1993) data augmentation algorithm and Liu and Wu's (1999) PX-DA algorithm both converge at a geometric rate, which ensures the existence of central limit theorems (CLTs) for ergodic averages under a second moment condition. While these two algorithms are essentially equivalent in terms of computational complexity, it follows from Hobert and Marchev (2007) that the PX-DA algorithm is theoretically more efficient in the sense that the asymptotic variance in the CLT under the PX-DA algorithm is no larger than that under Albert and Chib's algorithm. We calculate the standard error of the ergodic averages using regenerative simulation. As an illustration, we apply our results to van Dyk and Meng's (2001) lupus data. This particular example demonstrates that huge gains in efficiency are possible by using the PX-DA algorithm instead of Albert and Chib's algorithm.
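For concreteness, here is a minimal sketch of Albert and Chib's (1993) data augmentation sampler for this model with a flat prior on β; the PX-DA variant, the convergence-rate results, and the regenerative standard errors discussed in the talk are not reproduced, and SciPy's truncated-normal sampler is assumed to accept vector bounds and a NumPy Generator.

    import numpy as np
    from scipy.stats import truncnorm

    def probit_da_sampler(y, X, n_iter=5000, seed=0):
        # Albert-Chib data augmentation for probit regression with a flat prior on beta.
        rng = np.random.default_rng(seed)
        n, p = X.shape
        XtX_inv = np.linalg.inv(X.T @ X)
        chol = np.linalg.cholesky(XtX_inv)
        beta = np.zeros(p)
        draws = np.empty((n_iter, p))
        for it in range(n_iter):
            # Draw latent z_i ~ N(x_i'beta, 1) truncated to (0, inf) if y_i = 1
            # and to (-inf, 0) if y_i = 0 (bounds below are standardized by mu).
            mu = X @ beta
            lower = np.where(y == 1, -mu, -np.inf)
            upper = np.where(y == 1, np.inf, -mu)
            z = mu + truncnorm.rvs(lower, upper, size=n, random_state=rng)
            # Draw beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1}).
            beta = XtX_inv @ (X.T @ z) + chol @ rng.standard_normal(p)
            draws[it] = beta
        return draws  # ergodic averages of these draws estimate posterior expectations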

Nonparametric Functional Mapping of Quantitative Trait Loci
Jie Yang (UF) Functional mapping is a powerful tool for mapping quantitative trait loci (QTLs) that control dynamic traits. This method incorporates mathematical aspects of biological processes into the mixture model-based likelihood setting for QTL mapping, thus increasing the power of QTL detection and the precision of parameter estimation. However, in many situations there is no obvious functional form and, in such cases, this strategy will not be optimal. Here we propose to use nonparametric function estimation, typically implemented with B-splines, to estimate the underlying functional form of phenotypic trajectories, and then construct a nonparametric test to find evidence of existing quantitative trait loci. Using the representation of a nonparametric regression as a mixed model, the final test statistic is a likelihood ratio test. We consider two types of genetic maps: dense maps and general maps. The power of nonparametric functional mapping is investigated through simulation studies and demonstrated by examples.
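As a small illustration of the B-spline building block mentioned above, the toy sketch below fits a least-squares cubic spline to a single phenotypic trajectory; the mixture-model likelihood, the mixed-model representation of the spline, and the QTL likelihood ratio test from the talk are not reproduced, and the data and knot placement are arbitrary choices for illustration.

    import numpy as np
    from scipy.interpolate import make_lsq_spline

    rng = np.random.default_rng(1)
    t = np.linspace(0, 10, 25)                      # measurement times (toy data)
    y = np.sin(t) + 0.1 * rng.standard_normal(25)   # one observed phenotypic trajectory
    k = 3                                           # cubic B-spline
    interior = np.linspace(2, 8, 5)                 # interior knots (a modeling choice)
    knots = np.r_[[t[0]] * (k + 1), interior, [t[-1]] * (k + 1)]
    spline = make_lsq_spline(t, y, knots, k=k)      # least-squares spline fit
    fitted = spline(t)                              # smoothed trajectory values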

In-Season Prediction of Batting Averages: A Field Test of Simple Empirical Bayes and Bayes Methodologies
Lawrence Brown (UPenn) Batting average is one of the principal performance measures for an individual baseball player. It has a simple numerical structure as the percentage of successful attempts, “Hits”, as a proportion of the total number of qualifying attempts, “At-Bats”. This situation, with Hits as a number of successes within a qualifying number of attempts, makes it natural to statistically model each player’s batting average as a binomial variable outcome, with a given value of AB_{i} and a true (but unknown) value of p_{i} that represents the player’s latent ability. This is a common data structure in many statistical applications, and so the methodological study here has implications for such a range of applications. We will look at batting records for every Major League player over the course of a single season (2005). The primary focus is on using only the batting record from an earlier part of the season (e.g., the first 3 months) in order to predict the batter’s latent ability, p_{i}, and consequently to predict their batting-average performance for the remainder of the season. Since we are using a season that has already concluded, we can validate our predictive performance by comparing the predicted values to the actual values for the remainder of the season. The methodological purpose of this study is to gain experience with a variety of predictive methods applicable to a much wider range of situations. Several of the methods to be investigated derive from empirical Bayes and hierarchical Bayes interpretations. Although the general ideas behind these techniques have been understood for many decades*, some of these methods have only been refined relatively recently in a manner that promises to more accurately fit data such as that at hand. One feature of all of the statistical methodologies here is the preliminary use of a particular form of variance stabilizing transformation in order to transform the binomial data problem into a somewhat more familiar structure involving (approximately) Normal random variables with known variances. This transformation technique is also useful in validating the binomial model assumption that is the conceptual basis for all our analyses. * A particularly relevant background reference is Efron, B. and Morris, C. (1977), “Stein’s paradox in statistics,” Scientific American 236, 119-127, and the earlier, more technical version (1975), “Data analysis using Stein’s estimator and its generalizations,” Jour. Amer. Stat. Assoc. 70, 311-319.
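For orientation, the classical arcsine variance-stabilizing transformation for binomial data (the talk uses a particular form of this idea, which may differ in detail) maps the count pair (H_{i}, AB_{i}) to

    X_{i} = arcsin( √( H_{i} / AB_{i} ) ),

which is approximately Normal with mean arcsin(√p_{i}) and known variance 1/(4·AB_{i}), so the prediction problem becomes one of estimating many Normal means with known variances, the setting in which the empirical Bayes and hierarchical Bayes methods described above are most naturally formulated.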
Nonparametric Density Estimation via the Root-Unroot Transform, with an Adaptive Wavelet Block Thresholding Implementation
Lawrence Brown (UPenn) Nonparametric density estimation has traditionally been treated separately from nonparametric regression. Here, we propose an approach that first transforms a density estimation problem into a nonparametric regression problem. The algorithm for this involves suitably binning the observations and then transforming the binned data counts via a carefully chosen square-root transformation. Then any suitable nonparametric regression procedure can be used. Here, a wavelet block-thresholding rule is used for the transformed regression problem, and this produces an estimated nonparametric regression function. Finally, an adjusted unroot transform is applied to yield the final nonparametric density estimator. The procedure is easy to implement. It enjoys a high degree of asymptotic adaptivity and is shown in numerical examples to perform well for standard density estimation settings. As time permits, we will also discuss a corresponding procedure to produce confidence bands to accompany the nonparametric regression and density estimators.
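A minimal sketch of the binning-and-rooting pipeline described above follows; the wavelet block-thresholding regression and the adjusted unroot step from the talk are replaced here by a generic smoother placeholder, and the +1/4 inside the square root is a common small-count correction rather than necessarily the exact form used in the talk.

    import numpy as np

    def root_unroot_density(x, n_bins=256, smoother=lambda y: y):
        # Bin the data, root-transform the counts (approximately Normal with
        # near-constant variance), smooth on the root scale, then unroot.
        counts, edges = np.histogram(x, bins=n_bins)
        width = edges[1] - edges[0]
        y = np.sqrt(counts + 0.25)       # root transform of the bin counts
        g = smoother(y)                  # any nonparametric regression estimate
        f = np.maximum(g, 0.0) ** 2      # unroot back to the count scale
        f /= f.sum() * width             # renormalize so the estimate integrates to 1
        centers = 0.5 * (edges[:-1] + edges[1:])
        return centers, f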
Model selection for count data: Poisson or ZIP? 
Gauri Datta (UGA) Count data are often encountered in many studies, most frequently in disease modeling. The Poisson distribution, which is usually adopted to describe a model for such datasets, sometimes does not work well in the presence of many zeros in the data. To account for excessive zeros in count data, a zero-inflated Poisson (ZIP) distribution is suggested in the literature. A ZIP distribution is a mixture of a standard Poisson distribution and a degenerate distribution at zero, with a mixing probability p. The ZIP distribution has been used both for independent and identically distributed (i.i.d.) observations and in the non-i.i.d. case where suitable auxiliary variables are available to model the mean. In the latter case, which is referred to as a ZIP regression model, each count is assumed to have a different distribution depending on some explanatory variable(s), and suitable generalized linear models are fitted to the Poisson parameter and/or to the mixing probability. Although there are a number of frequentist papers discussing statistical inference for such models, the Bayesian contribution to this problem is limited. In this talk, we propose two Bayesian solutions to this problem. In our first solution, treating it as a model selection problem, we rewrite the ZIP model as a mixture of a zero-truncated Poisson distribution and a degenerate distribution at zero. We justify an objective prior for the new parameters. Using this prior and the standard Jeffreys' prior for the Poisson mean, we obtain the Bayes factor for the ZIP model versus the standard Poisson model. In our second approach, in the i.i.d. setup we embed the ZIP model into a larger class of models by suitably extending the parameter space. Our Bayesian test depends on the posterior probability of the hypothesis of zero inflation. Some applications of both solutions and suitable extensions to the regression case will be discussed.
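To make the mixture explicit (standard notation, with the mixing weight p placed on the point mass at zero), the ZIP probability mass function is

    P(Y = 0) = p + (1 − p) e^{−λ},    P(Y = k) = (1 − p) e^{−λ} λ^{k} / k!   for k = 1, 2, ...,

and the rewriting used in the first solution follows by collecting all of the probability at zero: the ZIP distribution is equivalently a two-component mixture that puts weight p + (1 − p) e^{−λ} on the degenerate distribution at zero and the remaining weight on the zero-truncated Poisson distribution with mean parameter λ.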
Past Seminars
Spring 2007  Fall 2006  Spring 2006  Fall 2005 
Spring 2005  Fall 2004  Spring 2004  Fall 2003 
Spring 2003  Fall 2002  Spring 2002  Fall 2001 
Spring 2001  Fall 2000  Spring 2000  Fall 1999 