Honest Exploration of Intractable Posteriors via Markov Chain
Monte Carlo Algorithms
Two important questions that must be
answered whenever a Markov chain Monte Carlo (MCMC) algorithm is
applied are (Q1) What is an appropriate burn-in? and (Q2) How long
should the sampling continue after burn-in? Developing rigorous
answers to these questions requires one to study the convergence
properties of the induced Markov chain. Specifically, this requires
one to establish drift and minorization conditions. I will motivate
the use of these conditions and their connection to the development of
rigorous answers to (Q1) and (Q2). Then I use a simple Gibbs sampler
to illustrate the required calculations. Finally, I present the
results of an application of these techniques in the context of a
realistic example. The hope is that this work will serve as a bridge
between those developing Markov chain theory and practitioners who
would like rigorous answers to (Q1) and (Q2) for their particular MCMC
algorithms.
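To make the setting concrete, here is a minimal sketch of the kind of two-block Gibbs sampler to which (Q1) and (Q2) apply; the conjugate normal model, prior settings, starting values, and chain lengths are illustrative assumptions, not the example or the drift/minorization bounds discussed in the talk.

# Illustrative Gibbs sampler (not the example from the talk): data y_1,...,y_n
# from N(mu, 1/tau) with conjugate priors mu ~ N(m0, 1/p0) and tau ~ Gamma(a0, rate b0).
# The chain alternates between the two full conditionals; a burn-in is discarded.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.5, size=50)          # toy data
n, ybar = len(y), y.mean()
m0, p0, a0, b0 = 0.0, 0.01, 1.0, 1.0       # hypothetical prior settings

def gibbs(n_iter, burn_in):
    mu, tau = ybar, 1.0                    # starting values
    draws = []
    for t in range(n_iter):
        # full conditional of mu given tau is normal
        prec = p0 + n * tau
        mean = (p0 * m0 + n * tau * ybar) / prec
        mu = rng.normal(mean, 1.0 / np.sqrt(prec))
        # full conditional of tau given mu is gamma
        rate = b0 + 0.5 * np.sum((y - mu) ** 2)
        tau = rng.gamma(a0 + n / 2.0, 1.0 / rate)
        if t >= burn_in:                   # (Q1): discard the first burn_in draws
            draws.append((mu, tau))
    return np.array(draws)                 # (Q2): its length governs Monte Carlo error

samples = gibbs(n_iter=6000, burn_in=1000)
print(samples.mean(axis=0))                # approximate posterior means of (mu, tau)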
Categorical Data Analysis Using Algebra
Data Analysis Problems That Aren't Assigned For Homework
This
talk is geared towards graduate and undergraduate students. I will
present aspects of seven somewhat interesting biostatistical
consulting problems that I have dealt with over the past year. The
problems range from pediatric AIDS to nicotine inhalers to malaria in
Ghana. I will not present complicated models or formulas. The goal
is to get students wondering "how would I do that?"
On Exact Multi-Stage Designs for Detecting an Increase in a
Binomial Proportion
A common statistical application is to show
that a treatment causes a binomial proportion to be increased. That
is, it is of interest to test $H_0: p = p_0$ versus $H_1: p > p_0$
where $p_0$ is known. The required sample size depends on the
magnitude of the Type I and Type II errors that the researchers are
willing to tolerate. A computationally efficient algorithm for
determining the sample size and the rejection region for the exact
binomial test is briefly presented. This algorithm is used as a
starting point for finding 2-, 3-, and $k$-stage designs, including
rejection, acceptance, and indecision regions, for exact tests for
detecting $H_1$, subject to certain optimality criteria. A web-based
application is presented to help a user determine such designs. A
theorem and its remarkable consequences are discussed. The theorem
shows that under very general conditions it is possible to use exact
$k$-stage designs, $k > 1$, whose maximum total sample size is no larger than that required for the exact single-stage test without
violating the Type I and II error constraints used to find the exact
single-stage design. This means that it is generally possible to
perform exact interim analyses when conducting a one-sided test about
a binomial proportion with no more total observations than required
for the exact 1-stage test, while maintaining the desired size and
power constraints for the 1-stage test.
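For orientation, the following is a naive sketch of the exact single-stage calculation that the multi-stage designs build on; the brute-force search, the function name single_stage_design, and the numerical settings are illustrative assumptions and not the computationally efficient algorithm referred to above.

# Find the smallest n and cutoff r so that the exact binomial test of
# H0: p = p0 vs H1: p > p0 (reject if X >= r) has size <= alpha and
# power >= 1 - beta at an assumed alternative p1.
from scipy.stats import binom

def single_stage_design(p0, p1, alpha=0.05, beta=0.20, n_max=1000):
    for n in range(1, n_max + 1):
        for r in range(n + 1):
            size = binom.sf(r - 1, n, p0)       # P(X >= r) under H0
            if size <= alpha:
                power = binom.sf(r - 1, n, p1)  # P(X >= r) under H1
                if power >= 1 - beta:
                    return n, r, size, power
                break                           # larger r only lowers power
    return None

print(single_stage_design(p0=0.20, p1=0.40))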
The Behrens-Fisher Problem Revisited: A Bayes-Frequentist
Synthesis
The Behrens-Fisher problem involves inference about the
difference of two normal means when the ratio of the two variances is
unknown. In this problem, the fiducial interval of Fisher differs
drastically from the Neyman-Pearson confidence interval. The
frequentist confidence interval proposed by Welch and Aspin has been
found to be unsatisfactory from a conditional frequentist perspective,
while Fisher's fiducial interval has been criticized by classical
frequentists. A prior proposed by Jeffreys leads to a credible
interval which is algebraically equivalent to Fisher's fiducial
interval, although the two approaches necessarily carry different
interpretations. We propose an alternative prior such that the
coverage probability of the credible interval for the difference of
the two means matches asymptotically the corresponding frequentist
coverage probability more accurately than Jeffreys' prior. Our
simulation results indicate excellent matching for small and moderate
samples as well. The prior is also justified from the conditional
frequentist perspective.
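As a point of reference, here is a sketch of the Welch (Satterthwaite) approximate interval mentioned above; the proposed probability-matching prior and its credible interval are not reproduced here, and the simulated data are purely illustrative.

# Welch's approximate interval for the difference of two normal means
# with unequal, unknown variances.
import numpy as np
from scipy.stats import t

def welch_interval(x, y, level=0.95):
    n1, n2 = len(x), len(y)
    v1, v2 = x.var(ddof=1), y.var(ddof=1)
    se2 = v1 / n1 + v2 / n2
    # Welch-Satterthwaite approximate degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    q = t.ppf(0.5 + level / 2, df)
    d = x.mean() - y.mean()
    return d - q * np.sqrt(se2), d + q * np.sqrt(se2)

rng = np.random.default_rng(0)
print(welch_interval(rng.normal(1.0, 1.0, 10), rng.normal(0.0, 3.0, 15)))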
Crossed Random Effect Models for Multiple Outcomes in a Study of
Teratogenesis
Studies that investigate the effect of human
teratogens on fetal development typically record the presence or
absence of a multitude of birth defects for each infant, resulting in
data of multivariate binary form. Such studies typically have three
objectives: (1) estimate an overall effect of exposure across
outcomes, (2) identify subjects affected by exposure, and (3) identify
those outcomes that constitute the syndrome so that these can be used
as diagnostic tools during future physical examinations. We propose
the use of a logistic regression model with crossed random effect
structure to address all three questions simultaneously. Special
cases of the model refer to order-restricted exposure effects,
exposure effects clustered according to location (face, head, hands,
feet or body), and estimation of exposure effects via the lasso. We
use the proposed models to analyze data from a study investigating the
effects of in utero antiepileptic drug exposure on fetal development.
(This is joint work with Jim Hobert, Louise Ryan, and Lewis Holmes.)
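A rough, hypothetical data-generating sketch of a logistic model with crossed random effects (one effect per infant crossed with one per outcome) is given below; the notation, parameter values, and exact structure are assumptions for illustration and are not necessarily the model proposed in the talk.

# Simulate multivariate binary birth-defect data with crossed random effects:
# logit P(Y_ij = 1) = mu + a_i + c_j + x_i * (beta + d_j)
import numpy as np

rng = np.random.default_rng(2)
n_infants, n_outcomes = 200, 12
exposure = rng.binomial(1, 0.5, n_infants)        # x_i: in utero exposure indicator
a = rng.normal(0.0, 1.0, n_infants)               # infant random effects
c = rng.normal(0.0, 0.7, n_outcomes)              # outcome random effects
d = rng.normal(0.0, 0.5, n_outcomes)              # outcome-specific exposure deviations
mu, beta = -2.5, 0.8                              # baseline and overall exposure effect

eta = (mu + a[:, None] + c[None, :]
       + exposure[:, None] * (beta + d)[None, :])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))   # multivariate binary outcomes

print(y.shape, y.mean(axis=0).round(2))           # observed defect rates by outcome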
Graphical Procedures for ANOVA
This seminar discusses the use
of graphical techniques in a variety of situations in analysis of
variance and designs for variance components estimation. Two types of
graphs will be presented: (1) graphs that can be used to
assess the adequacy of the method of unweighted means in providing
approximate F-tests for unbalanced random models. In particular, the
graphs can be effectively utilized to determine the effects of
imbalance and the variance components on the adequacy of the
chi-squared approximation of the distribution of the unweighted sums
of squares. (2) Quantile Dispersion Graphs for the evaluation and
comparison of designs for the estimation of variance components in an
unbalanced random model.
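To convey the idea behind graph type (1), here is a small simulation sketch that compares the scaled unweighted sum of squares from an unbalanced one-way random model with its approximating chi-squared distribution; the group sizes and variance components are arbitrary illustrative choices, not settings from the talk.

# Unbalanced one-way random model: ybar_i ~ N(mu, sigma_a^2 + sigma_e^2 / n_i).
# The unweighted sum of squares, scaled by sigma_e^2 + n_h * sigma_a^2
# (n_h = harmonic mean group size), is approximately chi-squared with a - 1 df.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
n_i = np.array([2, 3, 5, 8, 20])                  # unbalanced group sizes
a = len(n_i)
sigma_a2, sigma_e2 = 1.0, 1.0                     # variance components
n_h = a / np.sum(1.0 / n_i)                       # harmonic mean group size

def unweighted_ss():
    means = rng.normal(0.0, np.sqrt(sigma_a2 + sigma_e2 / n_i))  # group sample means
    return n_h * np.sum((means - means.mean()) ** 2)

ss = np.array([unweighted_ss() for _ in range(20000)])
scale = sigma_e2 + n_h * sigma_a2
probs = np.array([0.5, 0.9, 0.95, 0.99])
print(np.quantile(ss / scale, probs).round(2))    # simulated quantiles
print(chi2.ppf(probs, a - 1).round(2))            # chi-squared reference quantiles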
A Markov Chain Monte Carlo Algorithm For Approximating Exact
Conditional Probabilities
Conditional inference eliminates
nuisance parameters by conditioning on their sufficient
statistics. For contingency tables, conditional inference entails enumerating all tables with the same sufficient statistics as the observed data. For moderately sized tables and/or complex models, the
computing time to enumerate these tables is often prohibitive. Monte
Carlo approximations offer a viable alternative provided it is
possible to obtain samples from the correct conditional
distribution. This talk presents an MCMC extension of the importance
sampling algorithm of Booth and Butler (1999) by utilizing their rounded
normal candidate to update randomly chosen cells while leaving the
remainder of the table fixed. This local approximation can greatly
increase the efficiency of the rounded normal candidate. By choosing
the number of cells to be updated at random, a balance is struck
between dependency in the Markov chain and accuracy of the candidate.
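For readers unfamiliar with this reference set, the sketch below samples two-way tables with fixed row and column totals using the classic symmetric +/-1 swap candidate on a 2x2 subtable; this is a standard move, not the rounded normal candidate of Booth and Butler (1999) that the talk extends, and the toy table is purely illustrative.

# MCMC over tables with the observed row and column margins.  Under the
# independence model, P(table | margins) is proportional to 1 / prod(n_ij!),
# so the Metropolis ratio involves only the four changed cells.
import numpy as np

rng = np.random.default_rng(4)

def swap_step(tab):
    r, c = tab.shape
    i1, i2 = rng.choice(r, size=2, replace=False)
    j1, j2 = rng.choice(c, size=2, replace=False)
    new = tab.copy()
    new[i1, j1] += 1; new[i2, j2] += 1             # margins are preserved
    new[i1, j2] -= 1; new[i2, j1] -= 1
    if new[i1, j2] < 0 or new[i2, j1] < 0:
        return tab                                 # proposal outside the table space
    ratio = (tab[i1, j2] * tab[i2, j1]) / ((tab[i1, j1] + 1.0) * (tab[i2, j2] + 1.0))
    return new if rng.random() < min(1.0, ratio) else tab

tab = np.array([[3, 1, 4], [2, 6, 2], [5, 0, 3]])  # toy observed table
for _ in range(10000):
    tab = swap_step(tab)
print(tab, tab.sum(axis=0), tab.sum(axis=1))       # margins unchanged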
Optimal Sequential Allocation with Imperfect Feedback
Information
A given number of bullets can be fired in an attempt
to destroy a fixed number of targets. The probability of successfully
destroying a target at each shot is known. The bullets will be fired
in sequence and, after each shot is fired, there is a report on the state of the target just fired at. The reports are subject to the usual two types of errors: falsely claiming an intact target as destroyed and falsely claiming a destroyed target as intact. The
probabilities of these two types of errors are also known. The goal is
to destroy as many targets as possible. This paper shows that the
myopic decision strategy that picks the next target to be the one with the highest posterior probability of being intact is the optimal strategy. This
strategy is also optimal if the criterion is to maximize the
probability of destroying all targets, or a weighted sum of the
destroyed targets when the targets are weighted by their importance.
(This is joint work with Mo, Chen and Yang.)
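A small simulation sketch of the myopic rule is given below; the kill probability, error probabilities, and numbers of targets and bullets are illustrative values chosen for this example.

# Fire each bullet at the target with the highest posterior probability of
# being intact, then update that posterior by Bayes' rule: first account for
# the shot, then for the (possibly erroneous) report.
import numpy as np

rng = np.random.default_rng(5)
n_targets, n_bullets = 5, 12
p_kill = 0.6      # P(shot destroys an intact target)
e1 = 0.1          # P(report "destroyed" | target intact)
e2 = 0.2          # P(report "intact"    | target destroyed)

def run_once():
    intact = np.ones(n_targets, dtype=bool)       # true states
    post = np.ones(n_targets)                     # P(intact) for each target
    for _ in range(n_bullets):
        j = int(np.argmax(post))                  # myopic choice
        if intact[j] and rng.random() < p_kill:
            intact[j] = False
        # generate the noisy report from the true state
        report_intact = (rng.random() > e1) if intact[j] else (rng.random() < e2)
        # Bayes update
        q = post[j] * (1.0 - p_kill)
        if report_intact:
            post[j] = q * (1.0 - e1) / (q * (1.0 - e1) + (1.0 - q) * e2)
        else:
            post[j] = q * e1 / (q * e1 + (1.0 - q) * (1.0 - e2))
    return np.sum(~intact)                        # number of targets destroyed

print(np.mean([run_once() for _ in range(2000)]))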
Computational Aspects of Quadrature Methods in Random Effects
Models
Overdispersion in Generalized Linear Models is often interpreted as evidence that other factors, not accounted for in the model but associated with the response, are varying. A simple way of representing this extra variation is through
an unobserved random effect in the linear predictor. The EM algorithm
is a standard procedure to calculate the Maximum Likelihood Estimate
(MLE). Its objective function often involves an intractable integral and cannot be maximized directly. However, quadrature methods can be used to derive an approximation. While it is traditional to apply
Gaussian Quadrature for normal random effects, the Non-Parametric
Maximum Likelihood Estimator (NPMLE), which is a direct extension of
Gaussian Quadrature, has not gained wide acceptance. It will be
shown that the NPMLE can be computed easily by simple weighted ML estimation on an artificially enlarged data set. Any statistical
software that allows a weighted maximum likelihood fit can be used for
NPMLE. Further applications of this technique include variance
component modelling and random coefficient models. Various real data
examples will be presented showing the efficiency of the proposed
NPMLE procedure.
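To illustrate the quadrature step that the NPMLE extends, here is a sketch of a Gauss-Hermite approximation to one cluster's marginal likelihood contribution in a random-intercept logistic model; the model, parameter values, and data are assumptions for illustration, not the examples from the talk.

# Approximate integral of f(y_i | b) * phi(b; 0, sigma^2) db by a finite sum
# over Gauss-Hermite nodes, rescaled for a N(0, sigma^2) density.
import numpy as np

def cluster_loglik(y, x, beta0, beta1, sigma, n_nodes=15):
    z, w = np.polynomial.hermite.hermgauss(n_nodes)   # physicists' nodes/weights
    b = np.sqrt(2.0) * sigma * z
    w = w / np.sqrt(np.pi)
    # conditional likelihood of the cluster at each quadrature node
    eta = beta0 + beta1 * x[:, None] + b[None, :]
    p = 1.0 / (1.0 + np.exp(-eta))
    cond = np.prod(np.where(y[:, None] == 1, p, 1.0 - p), axis=0)
    return np.log(np.sum(w * cond))

rng = np.random.default_rng(6)
x = rng.normal(size=8)                      # covariate for one cluster of size 8
y = rng.binomial(1, 0.5, size=8)            # toy binary responses
print(cluster_loglik(y, x, beta0=-0.5, beta1=1.0, sigma=0.8))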