Galin Jones, University of Florida

Honest Exploration of Intractable Posteriors via Markov chain Monte Carlo Algorithms

Two important questions that must be answered whenever a Markov chain Monte Carlo (MCMC) algorithm is applied are (Q1) What is an appropriate burn-in? and (Q2) How long should the sampling continue after burn-in? Developing rigorous answers to these questions requires studying the convergence properties of the induced Markov chain; specifically, it requires establishing drift and minorization conditions. I will motivate the use of these conditions and their connection to the development of rigorous answers to (Q1) and (Q2), then use a simple Gibbs sampler to illustrate the required calculations. Finally, I will present the results of an application of these techniques in the context of a realistic example. The hope is that this work will serve as a bridge between those developing Markov chain theory and practitioners who would like rigorous answers to (Q1) and (Q2) for their particular MCMC algorithms.
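
For readers unfamiliar with the terminology, one common formulation of these conditions is the following (the notation is standard, not taken from the abstract). The chain with transition kernel $P$ satisfies a drift condition if there exist a function $V \ge 1$ and constants $0 < \lambda < 1$ and $b < \infty$ such that

$$E[V(X_{n+1}) \mid X_n = x] \le \lambda V(x) + b \quad \text{for all } x,$$

and an associated minorization condition if there exist $\varepsilon > 0$ and a probability measure $Q$ such that

$$P(x, \cdot) \ge \varepsilon \, Q(\cdot) \quad \text{for all } x \text{ with } V(x) \le d,$$

for some $d$ (any $d > 2b/(1-\lambda)$ suffices). Together, these conditions yield computable upper bounds on the distance of the chain from its stationary distribution, which is what makes rigorous answers to (Q1) and (Q2) possible.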

Marty Wells, Cornell University

Categorical Data Analysis Using Algebra


Alan Hutson, University of Florida

Data Analysis Problems That Aren't Assigned For Homework

This talk is geared towards graduate and undergraduate students. I will present aspects of seven somewhat interesting biostatistical consulting problems that I have dealt with over the past year. The problems range from pediatric AIDS to nicotine inhalers to malaria in Ghana. I will not present complicated models or formulas. The goal is to get students wondering "how would I do that?"

James Kepner, University of Florida

On Exact Multi-Stage Designs for Detecting an Increase in a Binomial Proportion

A common statistical application is to show that a treatment causes a binomial proportion to be increased. That is, it is of interest to test $H_0: p = p_0$ versus $H_1: p > p_0$, where $p_0$ is known. The required sample size depends on the magnitude of the Type I and Type II errors that the researchers are willing to tolerate. A computationally efficient algorithm for determining the sample size and the rejection region for the exact binomial test is briefly presented. This algorithm is used as a starting point for finding 2-, 3-, and $k$-stage designs, including rejection, acceptance, and indecision regions, for exact tests for detecting $H_1$, subject to certain optimality criteria. A web-based application is presented to help a user determine such designs. A theorem and its remarkable consequences are discussed. The theorem shows that, under very general conditions, it is possible to use exact $k$-stage designs, $k > 1$, whose maximum total sample size is no larger than that required for the exact single-stage test, without violating the Type I and II error constraints used to find the exact single-stage design. This means that it is generally possible to perform exact interim analyses when conducting a one-sided test about a binomial proportion with no more total observations than the exact single-stage test requires, while maintaining the desired size and power constraints of the single-stage test.
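
As a rough illustration of the single-stage starting point, the Python sketch below finds, by brute force, the smallest sample size $n$ and critical value $c$ for the exact test that rejects when $X \ge c$. It is not the computationally efficient algorithm of the talk; the alternative $p_1$ at which power is required, the error rates, and the function name are illustrative assumptions only.

    from scipy.stats import binom

    def single_stage_design(p0, p1, alpha, beta, n_max=1000):
        # Smallest n and critical value c with P(X >= c | p0) <= alpha (size)
        # and P(X >= c | p1) >= 1 - beta (power), for the rule "reject if X >= c".
        for n in range(1, n_max + 1):
            for c in range(n + 1):
                if binom.sf(c - 1, n, p0) <= alpha:   # exact size at p0
                    break
            else:
                continue                              # no admissible c for this n
            if binom.sf(c - 1, n, p1) >= 1 - beta:    # exact power at p1
                return n, c
        return None

    # Example: detect an increase from p0 = 0.2 to p1 = 0.4 with alpha = 0.05 and power 0.8
    print(single_stage_design(0.2, 0.4, alpha=0.05, beta=0.2))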

Malay Ghosh, University of Florida

The Behrens-Fisher Problem Revisited: A Bayes-Frequentist Synthesis

The Behrens-Fisher problem involves inference about the difference of two normal means when the ratio of the two variances is unknown. In this problem, the fiducial interval of Fisher differs drastically from the Neyman-Pearson confidence interval. The frequentist confidence interval proposed by Welch and Aspin has been found to be unsatisfactory from a conditional frequentist perspective, while Fisher's fiducial interval has been criticized by classical frequentists. A prior proposed by Jeffreys leads to a credible interval that is algebraically equivalent to Fisher's fiducial interval, although the two approaches necessarily carry different interpretations. We propose an alternative prior such that the coverage probability of the credible interval for the difference of the two means asymptotically matches the corresponding frequentist coverage probability more accurately than under Jeffreys' prior. Our simulation results indicate excellent matching for small and moderate samples as well. The prior is also justified from the conditional frequentist perspective.
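
In standard notation (mine, not the abstract's), the setup is $X_1, \dots, X_m \sim N(\mu_1, \sigma_1^2)$ and $Y_1, \dots, Y_n \sim N(\mu_2, \sigma_2^2)$, all independent, with $\sigma_1^2 / \sigma_2^2$ unknown, and interest centers on $\delta = \mu_1 - \mu_2$. The intervals under discussion are built from

$$T = \frac{\bar{X} - \bar{Y} - \delta}{\sqrt{s_1^2/m + s_2^2/n}},$$

whose sampling distribution depends on the unknown variance ratio; the fiducial, Welch-Aspin, and Bayesian approaches differ in how they calibrate this quantity.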

Brent Coull, Harvard University

Crossed Random Effect Models for Multiple Outcomes in a Study of Teratogenesis

Studies that investigate the effect of human teratogens on fetal development typically record the presence or absence of a multitude of birth defects for each infant, resulting in data of multivariate binary form. Such studies typically have three objectives: (1) estimate an overall effect of exposure across outcomes, (2) identify subjects affected by exposure, and (3) identify the outcomes that constitute the syndrome so that these can be used as diagnostic tools during future physical examinations. We propose a logistic regression model with a crossed random effect structure to address all three questions simultaneously. Special cases of the model correspond to order-restricted exposure effects, exposure effects clustered according to location (face, head, hands, feet, or body), and estimation of exposure effects via the lasso. We use the proposed models to analyze data from a study investigating the effects of in utero antiepileptic drug exposure on fetal development. (This is joint work with Jim Hobert, Louise Ryan, and Lewis Holmes.)
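
One generic form such a model can take, for infant $i$ and outcome $j$ (the notation and parameterization are illustrative and may differ from those used in the talk), is

$$\operatorname{logit} P(Y_{ij} = 1 \mid b_i, u_j) = \alpha_j + (\theta + b_i + u_j)\, x_i,$$

where $x_i$ indicates exposure, $\theta$ is the overall exposure effect (objective 1), $b_i$ is a subject-level random deviation whose magnitude flags affected infants (objective 2), and $u_j$ is an outcome-level random deviation whose magnitude indicates which outcomes belong to the syndrome (objective 3). The random effects are crossed because every infant-outcome pair combines one $b_i$ with one $u_j$.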

Andre Khuri, University of Florida

Graphical Procedures for ANOVA

This seminar discusses the use of graphical techniques in a variety of situations in the analysis of variance and in designs for variance component estimation. Two types of graphs will be presented: (1) graphs that can be used to assess the adequacy of the method of unweighted means in providing approximate F-tests for unbalanced random models; in particular, these graphs can be used effectively to determine the effects of imbalance and of the variance components on the adequacy of the chi-squared approximation to the distribution of the unweighted sums of squares; and (2) quantile dispersion graphs for the evaluation and comparison of designs for the estimation of variance components in an unbalanced random model.

Brian Caffo, University of Florida

A Markov Chain Monte Carlo Algorithm For Approximating Exact Conditional Probabilities

Conditional inference eliminates nuisance parameters by conditioning on their sufficient statistics. For contingency tables, conditional inference entails enumerating all tables with the same sufficient statistics as the observed data. For moderately sized tables and/or complex models, the computing time needed to enumerate these tables is often prohibitive. Monte Carlo approximations offer a viable alternative, provided it is possible to obtain samples from the correct conditional distribution. This talk presents an MCMC extension of the importance sampling algorithm of Booth and Butler (1999) that uses their rounded normal candidate to update randomly chosen cells while leaving the remainder of the table fixed. This local approximation can greatly increase the efficiency of the rounded normal candidate. By choosing the number of cells to be updated at random, a balance is struck between dependence within the Markov chain and accuracy of the candidate.
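
The generic machinery behind such an update is the Metropolis-Hastings acceptance step (standard notation, not specific to the talk): if the current table is $y$, a candidate $y^*$ that modifies only the chosen cells is drawn from a density $q(\cdot \mid y)$, here the rounded normal candidate restricted to those cells, and is accepted with probability

$$\alpha(y, y^*) = \min\left\{ 1, \; \frac{\pi(y^*)\, q(y \mid y^*)}{\pi(y)\, q(y^* \mid y)} \right\},$$

where $\pi$ denotes the exact conditional distribution of tables given the sufficient statistics; otherwise the chain remains at $y$.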

Sam Wu, University of Florida

Optimal Sequential Allocation with Imperfect Feedback Information

A given number of bullets can be fired in an attempt to destroy a fixed number of targets. The probability of successfully destroying a target at each shot is known. The bullets are fired in sequence and, after each shot, there is a report on the state of the target just fired at. The reports are subject to the usual two types of errors: falsely claiming an intact target as destroyed and falsely claiming a destroyed target as intact. The probabilities of these two types of errors are also known. The goal is to destroy as many targets as possible. This paper shows that the myopic strategy that picks as the next target the one with the highest posterior probability of being intact is optimal. This strategy is also optimal if the criterion is to maximize the probability of destroying all targets, or a weighted sum of the destroyed targets when the targets are weighted by their importance. (This is joint work with Mo, Chen, and Yang.)
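
A minimal simulation sketch of the myopic rule is given below, under assumed values for the kill and reporting-error probabilities (all names and numbers are illustrative, not taken from the paper). After each shot, the posterior probability that the chosen target is still intact is updated by Bayes' rule from the noisy report, and the next shot goes to the target with the highest posterior probability of being intact.

    import numpy as np

    rng = np.random.default_rng(1)
    p_kill = 0.6    # P(a shot destroys an intact target)
    e_fp = 0.1      # P(report "destroyed" | target actually intact)
    e_fn = 0.2      # P(report "intact" | target actually destroyed)
    n_targets, n_bullets = 3, 10

    intact = np.ones(n_targets, dtype=bool)   # true states, hidden from the shooter
    q = np.ones(n_targets)                     # posterior P(intact) for each target

    for _ in range(n_bullets):
        i = int(np.argmax(q))                  # myopic choice: most likely intact
        if intact[i] and rng.random() < p_kill:
            intact[i] = False                  # the shot destroys the target
        q_shot = q[i] * (1 - p_kill)           # P(target i intact) just after the shot
        report_intact = rng.random() < ((1 - e_fp) if intact[i] else e_fn)
        # Bayes update of P(intact) given the report on target i
        if report_intact:
            num, den = (1 - e_fp) * q_shot, (1 - e_fp) * q_shot + e_fn * (1 - q_shot)
        else:
            num, den = e_fp * q_shot, e_fp * q_shot + (1 - e_fn) * (1 - q_shot)
        q[i] = num / den

    print("targets destroyed:", int((~intact).sum()), "posterior P(intact):", np.round(q, 3))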

Herwig Friedl, University of Technology Graz

Computational Aspects of Quadrature Methods in Random Effects Models

Overdispersion in generalized linear models is often interpreted as evidence that other factors, not accounted for in the model but associated with the response, are varying. A simple way of representing this extra variation is through an unobserved random effect in the linear predictor. The EM algorithm is a standard procedure for calculating the maximum likelihood estimate (MLE); here, however, the objective function involves an intractable integral that cannot be maximized directly, and quadrature methods can be used to derive an approximation. While it is already traditional to apply Gaussian quadrature for normal random effects, the non-parametric maximum likelihood estimator (NPMLE), which is a direct extension of Gaussian quadrature, has not gained wide acceptance. It will be shown that the NPMLE can be computed easily using a simple weighted ML estimation on an artificially enlarged data set. Any statistical software that allows a weighted maximum likelihood fit can be used to obtain the NPMLE. Further applications of this technique include variance component modelling and random coefficient models. Various real data examples will be presented, demonstrating the efficiency of the proposed NPMLE procedure.
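
A minimal numpy/scipy sketch of this idea is given below for a Poisson model with one covariate and a random intercept approximated by $K$ mass points; the simulated data, the choice of $K$, and all names are illustrative assumptions, not the speaker's code. The data set is enlarged $K$-fold, the E-step computes posterior component weights, and the M-step is an ordinary weighted ML fit on the enlarged data, with the mass points entering as coefficients of a component indicator.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import poisson

    rng = np.random.default_rng(0)
    n, K = 200, 4                                   # observations, mass points
    x = rng.normal(size=n)
    z_true = rng.choice([-1.0, 1.0], size=n)        # latent random intercept
    y = rng.poisson(np.exp(0.5 * x + z_true))

    # Enlarged data: each observation is copied once per mass point
    y_big, x_big = np.repeat(y, K), np.repeat(x, K)
    comp = np.tile(np.arange(K), n)                 # mass-point label for each row
    X_big = np.column_stack([x_big, (comp[:, None] == np.arange(K)).astype(float)])

    def neg_weighted_loglik(theta, w):
        # Weighted Poisson log-likelihood on the enlarged data; theta = (beta, z_1..z_K)
        eta = X_big @ theta
        return -np.sum(w * (y_big * eta - np.exp(eta)))

    theta = np.concatenate([[0.0], np.linspace(-1.0, 1.0, K)])   # beta and mass points
    pi = np.full(K, 1.0 / K)                                      # mixing proportions

    for _ in range(30):                              # EM iterations
        # E-step: posterior weight of each mass point for each observation
        eta = (X_big @ theta).reshape(n, K)
        dens = poisson.pmf(y[:, None], np.exp(eta)) * pi
        w = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted ML fit on the enlarged data, then update the masses
        theta = minimize(neg_weighted_loglik, theta, args=(w.ravel(),), method="BFGS").x
        pi = w.mean(axis=0)

    print("beta:", theta[0], "mass points:", theta[1:], "masses:", pi)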