UF Statistics Seminar Schedule
Seminars are held on Thursdays from 4:00 p.m. to 5:00 p.m. in Griffin-Floyd Hall 100 unless otherwise noted.
Refreshments are available before the seminars from 3:30 p.m. to 4:00 p.m. in Griffin-Floyd Hall 230.
Spring 2009
Abstracts
Inference for Quantitation Parameters in Polymerase Chain Reactions via Branching Processes with Random Effects 
Bret Hanlon (Cornell) Quantitative polymerase chain reaction (qPCR) is one of the most widely used tools for gene quantification. Most existing methods for analyzing qPCR data fail to account for the sources of variability present in the PCR dynamics. In this talk, I develop a branching process model with random effects to account for this variability, and describe a new inferential procedure for the quantitation parameters. I illustrate the effectiveness of my methods using both simulated and experimental data. (This talk is based on joint work with Professor Anand Vidyashankar, Cornell University.)
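The talk's model details are not given in the abstract; as a hedged sketch of the kind of branching-process-with-random-effects dynamics it refers to, consider PCR amplification in which each reaction draws its own random efficiency (the function name, efficiency distribution, and parameter values below are all illustrative assumptions, not the speaker's model):

```python
import numpy as np

def simulate_qpcr(n0, cycles, eff_mean=0.9, eff_sd=0.05, seed=None):
    """Toy branching-process view of PCR: each reaction draws a
    random amplification efficiency p (the random effect), and in
    every cycle each molecule is copied independently with
    probability p, so counts grow roughly like n0 * (1 + p)**cycles."""
    rng = np.random.default_rng(seed)
    # per-reaction random effect on the amplification efficiency
    p = float(np.clip(rng.normal(eff_mean, eff_sd), 0.0, 1.0))
    n = n0
    for _ in range(cycles):
        n += rng.binomial(n, p)  # each molecule duplicated w.p. p
    return n

# five independent reactions starting from 100 molecules
counts = [simulate_qpcr(100, 10, seed=s) for s in range(5)]
```

The spread of `counts` across reactions illustrates the between-reaction variability that motivates modeling the efficiency as a random effect rather than a fixed constant.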

Perfect simulation of Matérn Type III point processes 
Mark Huber (Duke) Spatial data are often more widely separated than would be expected if the points were independently placed. Such data can be modeled with repulsive point processes, where the points appear as if they are repelling one another. Various models have been created to deal with this phenomenon. Matérn created three procedures that generate repulsive processes. While the third type allows the most flexibility in modeling, Matérn was unable to resolve the high dimensional integrations needed to utilize the process for inference. In this talk, I will show how to build an algorithm for using Matérn Type III processes that can be used to approximate the likelihood and posterior values for data. First, a Metropolis Markov chain is created using a secondary Poisson process. Next, this chain is used together with bounding chains to obtain perfect draws from the stationary distribution of the chain. Finally, a product estimator is constructed (again using a secondary Poisson process) in order to obtain approximations with provably good error bounds.
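The talk's contribution (perfect simulation via bounding chains and product estimators) is more involved, but the Matérn Type III generative mechanism itself can be sketched: primary Poisson points carry birth times, and a point survives only if no earlier-born survivor is within the hard-core radius. A minimal illustration, assuming a unit-square window and uniform birth marks (function and parameter names are mine):

```python
import numpy as np
from itertools import combinations

def matern_iii(intensity, radius, size=1.0, rng=None):
    """Matérn Type III hard-core process on the square [0, size]^2.

    Primary points come from a homogeneous Poisson process and carry
    i.i.d. uniform birth times; scanning points in birth order, a
    point is retained only if no already-retained point lies within
    `radius` of it."""
    rng = np.random.default_rng() if rng is None else rng
    n = rng.poisson(intensity * size ** 2)   # number of primary points
    pts = rng.uniform(0.0, size, (n, 2))     # primary locations
    births = rng.uniform(0.0, 1.0, n)        # birth marks
    kept = []
    for i in np.argsort(births):             # earliest births first
        if all(np.hypot(*(pts[i] - q)) >= radius for q in kept):
            kept.append(pts[i])
    return np.array(kept)

pts = matern_iii(intensity=200, radius=0.08, rng=np.random.default_rng(1))
```

Every retained pair is at least `radius` apart, which is the repulsion the abstract describes; the inferential difficulty is that the thinned primary points are unobserved.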

Local CQR Smoothing: An Efficient and Safe Alternative to Local Polynomial Regression 
Bo Kai (Penn State) Local polynomial regression is a useful nonparametric regression tool for exploring fine data structures and has been widely used in practice. Although the least squares method is a popular and convenient choice in local polynomial regression, its performance can be adversely affected by departures from normality as well as by the presence of outliers. Motivated by these concerns, we propose a new nonparametric regression technique that is highly efficient, robust and computationally simple. Sampling properties of the proposed estimation procedure are studied. We derive the asymptotic bias, variance and normality of the proposed estimate. The asymptotic relative efficiency of the proposed estimate with respect to local polynomial regression is investigated. It is shown that the proposed estimate can be much more efficient than the local polynomial regression estimate for various non-normal errors, while being almost as efficient as the local polynomial regression estimate for normal errors. A simulation study is conducted to examine the performance of the proposed estimates, and the results are consistent with our theoretical findings. A real data example is used to illustrate the proposed method. This is joint work with Runze Li and Hui Zou.
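The abstract does not spell out the estimator, but local composite quantile regression (CQR) smoothing of this kind can be sketched as minimizing a kernel-weighted sum of check losses over several quantile levels that share a common local slope; the average of the quantile intercepts then estimates the regression function. A minimal sketch under those assumptions (bandwidth, kernel, and optimizer choices are mine, not the paper's):

```python
import numpy as np
from scipy.optimize import minimize

def local_cqr(x, y, x0, h, q=5):
    """Local linear CQR estimate of m(x0): minimize the kernel-weighted
    composite check loss over q equally spaced quantile levels with a
    shared slope b; the mean of the intercepts a_1..a_q estimates m(x0)."""
    taus = np.arange(1, q + 1) / (q + 1)
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # Gaussian kernel weights

    def loss(theta):
        a, b = theta[:q], theta[q]
        r = y[None, :] - a[:, None] - b * (x - x0)[None, :]
        rho = r * (taus[:, None] - (r < 0))  # check (quantile) loss
        return np.sum(w[None, :] * rho)

    theta0 = np.concatenate([np.full(q, y.mean()), [0.0]])
    res = minimize(loss, theta0, method="Nelder-Mead",
                   options={"maxiter": 8000, "xatol": 1e-6, "fatol": 1e-9})
    return res.x[:q].mean()

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(200)
est = local_cqr(x, y, x0=0.25, h=0.08)  # roughly 1 = sin(pi/2), modulo smoothing bias
```

Because the composite loss averages information across quantile levels, the estimate stays efficient under heavy-tailed or contaminated errors where local least squares degrades.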

Nonparametric and distributed decision making methods for functional and spatial data 
XuanLong Nguyen (Duke) I will present two statistical methods for modeling and decision-making with functional and spatial data. The first method is concerned with a class of nonparametric labeling processes for clustering and partitioning curves and surfaces. The labeling process provides a flexible prior for a hierarchical latent variable model, which posits that a collection of curves can be described in terms of a number of typical canonical behaviors, with the allocation of canonical curves driven by the latent labeling process. The spatial dependence of labels is driven by a latent Gaussian process, while label sharing across the curve collection is enabled by a Dirichlet process. A variational Bayesian inference method is proposed to obtain approximate posterior distribution updates embedded within an MCMC algorithm for model fitting. In the variational methodology, posterior inference can be viewed implicitly as optimization over the space of posterior distributions. This view allows us to modify the underlying optimization (e.g., the loss function and/or the search space) to obtain more tractable and robust inference. Our method is illustrated by two specific applications, one arising from the analysis of hormone data, and another arising from image analysis. The second method addresses the issue of efficient statistical inference in a distributed data collection and processing system. This problem is motivated by an application of distributed detection in a wireless sensor network. Here the goal is to infer the local decision rules at individual sensors, as well as the global decision at the base station, so as to minimize a predictive error criterion (e.g., 0-1 loss). I will present a theory of equivalent surrogate loss functions using a link between loss functions and information-theoretic divergence functionals. This theory allows us to place a range of well-known but somewhat heuristic methods in the signal processing literature in a firm statistical decision-theoretic framework. Moreover, it allows us to derive an efficient joint estimation method for both global and local decision functions by considering convex surrogate loss functions that are equivalent to the 0-1 loss.
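As a small, generic illustration of the surrogate-loss idea the abstract invokes (not the talk's specific equivalent-surrogate construction): the 0-1 loss on a classification margin m = y f(x) is non-convex, so it is replaced in practice by convex upper bounds such as the hinge and (suitably scaled) logistic losses.

```python
import numpy as np

# 0-1 loss on the margin m = y * f(x), plus two standard convex
# surrogates; scaling the logistic loss by 1/log(2) makes both
# surrogates pointwise upper bounds of the 0-1 loss.
def zero_one(m):  return (m <= 0).astype(float)
def hinge(m):     return np.maximum(0.0, 1.0 - m)
def logistic(m):  return np.log1p(np.exp(-m)) / np.log(2.0)

m = np.linspace(-3.0, 3.0, 121)
surrogates_dominate = (np.all(hinge(m) >= zero_one(m)) and
                       np.all(logistic(m) >= zero_one(m) - 1e-12))
```

The talk's contribution concerns which such surrogates are *equivalent* to the 0-1 loss in the distributed setting, a sharper property than the upper-bounding shown here.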

Penalized Regression, Standard Errors, and Bayesian Lassos 
Minjung Kyung (UF) Penalized regression methods for simultaneous variable selection and coefficient estimation, especially those based on the lasso of Tibshirani (1996), have received a great deal of attention in recent years, mostly through frequentist models. Properties such as consistency have been studied, and are achieved by different lasso variations. Here we look at a fully Bayesian formulation of the problem, which is flexible enough to encompass most versions of the lasso that have been previously considered. The advantages of the hierarchical Bayesian formulations are many. In addition to the usual ease of interpretation of hierarchical models, the Bayesian formulation produces valid standard errors (which can be problematic for the frequentist lasso), and is based on a geometrically ergodic Markov chain. We compare the performance of the Bayesian lassos to their frequentist counterparts using simulations and data sets that previous lasso papers have used, and see that in terms of prediction mean squared error, the Bayesian lasso performance is similar to and, in some cases, better than, the frequentist lasso.
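A minimal sketch of the kind of hierarchical Gibbs sampler this line of work builds on, in the style of the Park-Casella (2008) Bayesian lasso: the full conditionals below are the standard ones for that hierarchy, but the function should be read as an illustrative sketch rather than the exact algorithm of the talk.

```python
import numpy as np

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=2000, burn=500, seed=0):
    """Gibbs sampler for the Bayesian lasso hierarchy
    beta_j | tau_j^2, sigma^2 ~ N(0, sigma^2 tau_j^2),
    tau_j^2 ~ Exp(lam^2 / 2), with conjugate full conditionals.
    Returns posterior draws of beta after burn-in."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta, tau2, sigma2 = np.zeros(p), np.ones(p), 1.0
    XtX, Xty = X.T @ X, X.T @ y
    draws = []
    for t in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma^2 A^{-1}), A = X'X + diag(1/tau2)
        A_inv = np.linalg.inv(XtX + np.diag(1.0 / tau2))
        beta = rng.multivariate_normal(A_inv @ Xty, sigma2 * A_inv)
        # 1/tau_j^2 | rest ~ Inverse-Gaussian(sqrt(lam^2 sigma^2 / beta_j^2), lam^2)
        mu = np.sqrt(lam**2 * sigma2 / np.maximum(beta**2, 1e-12))
        tau2 = 1.0 / rng.wald(mu, lam**2)
        # sigma^2 | rest ~ Inverse-Gamma((n - 1 + p)/2, scale)
        resid = y - X @ beta
        scale = (resid @ resid + beta @ (beta / tau2)) / 2.0
        sigma2 = scale / rng.gamma((n - 1 + p) / 2.0)
        if t >= burn:
            draws.append(beta.copy())
    return np.array(draws)
```

The posterior draws give standard errors directly (e.g., posterior standard deviations of each coefficient), which is the point the abstract contrasts with the frequentist lasso.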

Inference in Gaussian covariance graph models 
Kshitij Khare (Stanford) Covariance estimation in high-dimensional settings has recently received widespread attention. In this context, graphical models (where dependencies between variables are represented by means of a graph) can act as a tool for regularization and have proven useful for the analysis of high-dimensional data. A subclass of graphical models, known as Gaussian covariance graph models, encode marginal independence among random variables by means of a graph G. These are distinctly different from the traditional concentration graph models (often referred to as covariance selection models). Inference for these models is challenging in both the frequentist and Bayesian frameworks, since the models give rise to a curved exponential family. Maximum likelihood estimation for these models has received much attention recently but is not in general possible when the sample size is smaller than the dimension of the problem. In this talk, we address the issue of Bayesian inference for these models. Since we are now in a curved setting, the Diaconis-Ylvisaker theory is no longer applicable; hence the standard Wishart distributions, or those proposed for concentration graph models, are not useful for analyzing covariance graph models. We propose a rich family of Wishart distributions which act as a conjugate family of priors for our class of models. By studying the appropriate conditional distributions, we derive a block Gibbs sampling procedure to sample from these distributions, and rigorously prove convergence of the block Gibbs sampler. We also present various useful theoretical properties of this class of distributions, which enable Bayesian inference in high dimensions. Our techniques will be illustrated using simulated and real examples.

Nonparametric Inference of Quantile Curves for Nonstationary Time Series 
Zhou Zhou (Chicago) Nowadays, nonstationary time series are frequently collected in many areas, and the scientific questions involving such series generally cannot be solved by traditional stationary time series approaches. In this talk I shall address nonparametric specification tests of quantile curves for a general class of nonstationary processes. Using Bahadur representation and Gaussian approximation results for nonstationary time series, simultaneous confidence bands and integrated squared difference tests are proposed to test various parametric forms of the quantile curves with asymptotically correct type I error rates. A wild bootstrap procedure is implemented to alleviate the problem of slow convergence of the asymptotic results. In particular, our results can be used to test the trends of extremes and the variability of climate variables, an important problem in understanding climate change. An interesting example involves the analysis of the maximum speed of tropical cyclone winds. It was found that an inhomogeneous upward trend in cyclone wind speeds is pronounced at high quantile values, whereas there is no trend in the mean lifetime-maximum wind speed. This example shows the effectiveness of the quantile regression technique.
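The wild bootstrap the abstract mentions can be illustrated generically (this is the standard device, not the talk's specific band construction): residuals from the null fit are flipped by i.i.d. Rademacher signs, which preserves heteroskedastic error structure, and the test statistic is recomputed on each pseudo-sample. Function names and the trend statistic below are illustrative choices.

```python
import numpy as np

def wild_bootstrap_pvalue(x, y, stat, n_boot=500, seed=0):
    """Wild-bootstrap p-value for a trend test against the constant
    (no-trend) null: y* = null_fit + w * resid with Rademacher w."""
    rng = np.random.default_rng(seed)
    null_fit = np.full_like(y, y.mean())   # null model: no trend
    resid = y - null_fit
    observed = stat(x, y)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.choice([-1.0, 1.0], size=len(y))  # Rademacher weights
        boot[b] = stat(x, null_fit + w * resid)
    return (1 + np.sum(boot >= observed)) / (1 + n_boot)

def abs_slope(x, y):
    return abs(np.polyfit(x, y, 1)[0])     # |OLS slope| as trend statistic

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 100))
y = 2.0 * x + 0.3 * rng.standard_normal(100)
p = wild_bootstrap_pvalue(x, y, abs_slope)   # small p: a clear trend
```

Mammen's two-point weights are a common alternative to the Rademacher signs; either way the bootstrap avoids relying on the slow asymptotic convergence the abstract notes.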

Conditional inference for assessing the statistical significance of neural spiking patterns 
Matthew Harrison (CMU)
Conditional inference has proven useful for exploratory analysis of neurophysiological point process data. I will illustrate this approach and then focus on a specific subproblem: random generation of binary matrices with margin constraints. Sequential importance sampling (SIS) is an effective technique for approximate uniform sampling of binary matrices with specified margins. I will describe how to simplify and improve existing SIS procedures using improved asymptotic enumeration and dynamic programming (DP). The DP approach is interesting because it facilitates generalizations.
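For tiny margins, the counting problem that SIS approximates can be solved exactly by the kind of dynamic programming the abstract alludes to: the number of completions depends only on the multiset of remaining column sums, which is what gets memoized. A sketch (function name is mine; this is the exact small-case counterpart, not the speaker's SIS procedure):

```python
from functools import lru_cache
from itertools import combinations

def count_binary_matrices(row_sums, col_sums):
    """Count 0-1 matrices with the given row and column margins by DP
    over rows, memoizing on the sorted tuple of remaining column sums."""
    row_sums = tuple(row_sums)

    @lru_cache(maxsize=None)
    def rec(i, remaining):
        if i == len(row_sums):
            return 1 if all(c == 0 for c in remaining) else 0
        total = 0
        # place row i's ones in any columns that still need them
        avail = [j for j, c in enumerate(remaining) if c > 0]
        for cols in combinations(avail, row_sums[i]):
            nxt = list(remaining)
            for j in cols:
                nxt[j] -= 1
            total += rec(i + 1, tuple(sorted(nxt)))
        return total

    return rec(0, tuple(sorted(col_sums)))

print(count_binary_matrices((1, 1, 1), (1, 1, 1)))  # prints 6 (the permutation matrices)
```

In an SIS scheme, such counts (or asymptotic approximations to them) supply the conditional probabilities used to fill in the matrix one row at a time while tracking importance weights.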

IS ANYBODY OUT THERE? 
Richard Cleary (Bentley)
As a long time Associate Editor for Reviews for journals of the American Statistical Association, I began to wonder if anybody read the reviews I wrote. In the August 2005 issue of The American Statistician, I carried out an experiment to find out. We will analyze the results, which suggest nice applications of some classic problems. We will also consider how with some additional data we could generate a model of the social network of statisticians. This will be a highly interactive presentation with plenty of chances for audience participation.

Multiresponse Surface Models with Block Effects 
André Khuri (UF)
This talk considers linear multiresponse surface models which may contain block effects that can be either fixed or random. The effect of blocking on the estimation of the mean responses, the prediction variance-covariance matrix, and the determination of optimum operating conditions will be addressed. The special case of orthogonal blocking in a multiresponse situation will be discussed.

ϕ-Divergence Classes of Models for Categorical Data 
Maria Kateri (Piraeus) Modelling categorical data is viewed through an information-theoretic perspective. The models are characterized by their distance from the most parsimonious model in the direction of their qualitative substance, which serves as a reference model. In this way, apparently different (and often competing) models are unified into families sharing common properties and characteristics. It can be proved that all of them measure the distance from the same reference model under the same conditions; their only difference lies in the measure applied to express this distance. Hence their difference is not qualitative but merely one of scale, and consequently, if the distances are expressed in terms of a generalized measure, a family of models can be developed having well-known models as its members. As such a generalized measure, we have chosen the ϕ-divergence and have built the corresponding classes of models. For example, when modeling association (i.e., departure from independence), the ϕ-divergence association model for two-way tables is characterized by the property of being the closest model, in terms of ϕ-divergence, to independence. Well-known models in the contingency table literature (such as association models and correlation models) are proved to be special cases of the ϕ-divergence association model. Thus, their properties and features can be studied in a unified way, and model selection can be approached differently. For square contingency tables with commensurable classification variables, the complete symmetry model is the most parsimonious model and serves as the reference model. In this context, the quasi-symmetry (QS) model, under certain conditions, is the closest model to symmetry in terms of the Kullback-Leibler distance. Replacing the Kullback-Leibler distance by the ϕ-divergence, the generalized quasi-symmetry model QS[ϕ] is developed, providing alternative QS-type models. Properties, connections to other models and interpretational aspects studied for the general QS[ϕ] model apply to all its special cases. Logistic regression is another model that can be generalized through the ϕ-divergence to a class of models, unifying alternative approaches.
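For reference, the standard ϕ-divergence between two distributions p and q on the cells of a table is

```latex
D_\phi(p, q) \;=\; \sum_{i,j} q_{ij}\, \phi\!\left(\frac{p_{ij}}{q_{ij}}\right),
\qquad \phi \text{ convex on } (0,\infty),\ \phi(1) = 0,
```

which reduces to the Kullback-Leibler divergence for ϕ(x) = x log x and to the Pearson chi-squared discrepancy for ϕ(x) = (x - 1)^2. In the association models described above, q is taken to be the independence table q_{ij} = p_{i+} p_{+j}.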

The Malaria Atlas Project (MAP): quantifying malaria endemicity, burden and elimination feasibility 
Andy Tatem (UF) Over 2 billion people are estimated to be exposed to the threat of the dangerous Plasmodium falciparum strain of malaria, resulting in around 500 million clinical episodes and 1-3 million deaths a year. In allocating public health resources for control, the guiding principle should be an evidence-based quantification of need. The evidence base for allocating resources for malaria control on a global scale is poor, however, with endemicity levels often unknown or ignored. Endemicity is a measure of the level of malaria challenge in a human population, and determines the average age of first exposure, the rate of development of immunity, and thus the expected clinical spectrum of disease. Therefore, suites of relevant interventions to control malaria should be tailored to these basic epidemiological foundations. The primary goal of the Malaria Atlas Project (MAP, www.map.ox.ac.uk) is to develop the science of malaria cartography, aiming primarily to produce the first empirically derived global maps of transmission limits and endemicity levels within these limits. The interdisciplinary work of MAP will provide the background to my talk, detailing how community surveys, satellite imagery and a wealth of other spatiotemporal data layers are being used to understand, quantify and model malaria transmission, burdens and population distribution spatially. I will focus principally, however, on MAP research recently funded by the Bill and Melinda Gates Foundation aimed at designing spatial tools for local malaria elimination planning. A key part of such tools involves quantifying and describing human population movement patterns in relation to the transport of infections. Novel migration and movement datasets, including microcensus data, mobile phone records and travel history surveys, have been acquired. These in turn open up opportunities for the development of novel statistical approaches to extract valuable information for the strategic planning of malaria elimination.

Past Seminars