UF Statistics Seminar Schedule

(Click here to access the student seminar schedule.)

Seminars are held on Thursdays from 4:00 p.m. - 5:00 p.m. in Griffin-Floyd 100 unless otherwise noted.

Refreshments are available before the seminars from 3:30 p.m. - 4:00 p.m. in Griffin-Floyd Hall 230.

Spring 2009

Date Speaker Title

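Jan 15 Bret Hanlon (Cornell) Inference for Quantitation Parameters in Polymerase Chain Reactions via Branching Processes with Random Effects  
Jan 20 (Tue) Mark Huber (Duke) Perfect simulation of Matérn Type III point processes  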
Jan 22 Bo Kai (Penn State) Local CQR Smoothing: An Efficient and Safe Alternative to Local Polynomial Regression  
Jan 28 XuanLong Nguyen (Duke) Nonparametric and distributed decision making methods for functional and spatial data  
Feb 3 (Tue) Minjung Kyung (UF) Penalized Regression, Standard Errors, and Bayesian Lassos  
Feb 5 Kshitij Khare (Stanford) Inference in Gaussian covariance graph models  
Feb 10 (Tue) Zhou Zhou (Chicago) Nonparametric Inference of Quantile Curves for Non-stationary Time Series  

Feb 12 Matt Harrison (CMU) Conditional inference for assessing the statistical significance of neural spiking patterns  
Feb 19 Richard Cleary (Bentley) (highly accessible to undergrads)  
Mar 26 André Khuri (UF) Multiresponse Surface Models with Block Effects  
Apr 9 Maria Kateri (Piraeus) ϕ-Divergence Classes of Models for Categorical Data  
Apr 16 Andy Tatem (UF) The Malaria Atlas Project (MAP): quantifying malaria endemicity, burden and elimination feasibility   


Inference for Quantitation Parameters in Polymerase Chain Reactions via Branching Processes with Random Effects

Bret Hanlon (Cornell)

Quantitative polymerase chain reaction (qPCR) is one of the most widely used tools for gene quantification. Most of the existing methods for analyzing qPCR data fail to account for the sources of variability present in the PCR dynamics. In this talk, I develop a branching process model with random effects to account for this variability, and describe a new inferential procedure for the quantitation parameters. I illustrate the effectiveness of my methods using both simulated and experimental data. (This talk is based on joint work with Professor Anand Vidyashankar, Cornell University.)

Perfect simulation of Matérn Type III point processes

Mark Huber (Duke)

Spatial data are often more widely separated than would be expected if the points were independently placed. Such data can be modeled with repulsive point processes, where the points appear as if they are repelling one another.  Various models have been created to deal with this phenomenon.  Matérn created three procedures that generate repulsive processes.  While the third type allows the most flexibility in modeling, Matérn was unable to resolve the high dimensional integrations needed to utilize the process for inference.  In this talk, I will show how to build an algorithm for using Matérn Type III processes that can be used to approximate the likelihood and posterior values for data.  First, a Metropolis Markov chain is created using a secondary Poisson process.  Next, this chain is used together with bounding chains to obtain perfect draws from the stationary distribution of the chain.  Finally, a product estimator is constructed (again using a secondary Poisson process) in order to obtain approximations with provably good error bounds.
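
As an illustration of the repulsion described above, here is a minimal Python sketch that forward-simulates a Matérn Type III process on the unit square. It assumes the usual description of the construction: candidate points arrive from a Poisson process with independent uniform birth times, and a candidate is kept only if no earlier retained point lies within a fixed radius. The function names, the unit-square window, and the parameter values are illustrative assumptions, and edge effects are ignored. The sketch covers only forward simulation; the likelihood approximation and perfect sampling discussed in the talk are not shown.

```python
import math
import random


def poisson_count(mean, rng):
    """Draw a Poisson(mean) count by counting rate-1 arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def matern_type_iii(intensity, radius, rng):
    """Forward-simulate a Matérn Type III process on the unit square."""
    n = poisson_count(intensity, rng)  # the unit square has area 1
    # Each candidate has a location (x, y) and an independent uniform birth time.
    candidates = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    candidates.sort(key=lambda c: c[2])  # visit candidates in birth order
    kept = []
    for x, y, _ in candidates:
        # Keep the candidate only if no earlier retained point is within radius.
        if all(math.hypot(x - kx, y - ky) >= radius for kx, ky in kept):
            kept.append((x, y))
    return kept


rng = random.Random(0)
points = matern_type_iii(intensity=200, radius=0.05, rng=rng)
print(len(points))  # number of retained points in this draw
```

Every pair of retained points is at least `radius` apart, which is the "wider than independent placement" separation the abstract refers to.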

Local CQR Smoothing: An Efficient and Safe Alternative to Local Polynomial Regression

Bo Kai (Penn State)

Local polynomial regression is a useful nonparametric regression tool to explore fine data structures and has been widely used in practice. Although the least squares method is a popular and convenient choice in local polynomial regression, its performance can be adversely influenced by departure from normality as well as the existence of outliers. Motivated by these concerns, we propose a new nonparametric regression technique that is highly efficient, robust and computationally simple. Sampling properties of the proposed estimation procedure are studied. We derive the asymptotic bias, variance and normality of the proposed estimate. Asymptotic relative efficiency of the proposed estimate with respect to the local polynomial regression is investigated. It is shown that the proposed estimate can be much more efficient than the local polynomial regression estimate for various non-normal errors, while being almost as efficient as the local polynomial regression estimate for normal errors. Simulation is conducted to examine the performance of the proposed estimates. The simulation results are consistent with our theoretical findings. A real data example is used to illustrate the proposed method. This is joint work with Runze Li and Hui Zou.

Nonparametric and distributed decision making methods for functional and spatial data

XuanLong Nguyen (Duke)

I will present two statistical methods for modeling and decision-making with functional and spatial data. The first method is concerned with a class of nonparametric labeling processes for clustering and partitioning curves and surfaces. The labeling process provides a flexible prior for a hierarchical latent variable model, which posits that a collection of curves can be described in terms of a number of typical canonical behaviors, whereas the canonical curve allocation is driven by the latent labeling process. The spatial dependence of labels is driven by the use of a latent Gaussian process, while label sharing across the curve collection is enabled by the use of a Dirichlet process. A variational Bayesian inference method is proposed to obtain an approximate posterior distribution update embedded within an MCMC algorithm for model fitting. In the variational methodology, posterior inference can be implicitly viewed as optimization over the space of posterior distributions. This view allows us to modify the underlying optimization (e.g., the loss function and/or the search space) to obtain more tractable and robust inference. Our method is illustrated by two specific applications, one arising from the analysis of hormone data, and another arising from image analysis.

The second method addresses the issue of efficient statistical inference that arises in a distributed data collection and processing system. This problem is motivated by an application of distributed detection in a wireless sensor network. Here the goal is to infer the local decision rules at individual sensors, as well as the global decision at the base station, so as to minimize a predictive error criterion (e.g., 0-1 loss). I will present a theory of equivalent surrogate loss functions using a link between loss functions and information-theoretic divergence functionals. This theory allows us to place a range of well-known but somewhat heuristic methods in the signal processing literature in a firm statistical decision-theoretic framework. Moreover, it allows us to derive an efficient joint estimation method for both global and local decision functions by considering convex surrogate loss functions that are equivalent to the 0-1 loss.

Penalized Regression, Standard Errors, and Bayesian Lassos

Minjung Kyung (UF)

Penalized regression methods for simultaneous variable selection and coefficient estimation, especially those based on the lasso of Tibshirani (1996), have received a great deal of attention in recent years, mostly through frequentist models. Properties such as consistency have been studied, and are achieved by different lasso variations. Here we look at a fully Bayesian formulation of the problem, which is flexible enough to encompass most versions of the lasso that have been previously considered. The advantages of the hierarchical Bayesian formulations are many. In addition to the usual ease-of-interpretation of hierarchical models, the Bayesian formulation produces valid standard errors (which can be problematic for the frequentist lasso), and is based on a geometrically ergodic Markov chain. We compare the performance of the Bayesian lassos to their frequentist counterparts using simulations and data sets that previous lasso papers have used, and see that in terms of prediction mean squared error, the Bayesian lasso performance is similar to and, in some cases, better than, the frequentist lasso.
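
To illustrate how a lasso penalty performs variable selection and coefficient estimation at once, here is a minimal Python sketch of the frequentist lasso in the special case of an orthonormal design. Under that assumption, minimizing (1/2)||y - Xb||^2 + lam * ||b||_1 reduces to soft-thresholding each least squares coefficient. The function names and the example coefficients are hypothetical, and the sketch does not implement the Bayesian formulation discussed in the talk.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, and set it to exactly zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """Lasso solution when the design matrix has orthonormal columns."""
    return [soft_threshold(b, lam) for b in ols_coefs]


# Hypothetical least squares coefficients for four predictors.
ols = [3.0, -0.4, 0.9, -2.5]
print(lasso_orthonormal(ols, lam=1.0))  # [2.0, 0.0, 0.0, -1.5]
```

Coefficients smaller than the penalty in absolute value are set exactly to zero (selection), while the rest are shrunk toward zero (estimation).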

Inference in Gaussian covariance graph models

Kshitij Khare (Stanford)

Covariance estimation in high-dimensional settings has recently received widespread attention. In this context, graphical models (where dependencies between variables are represented by means of a graph) can act as a tool for regularization and have proven to be useful for the analysis of high dimensional data. A subclass of graphical models, known as Gaussian covariance graph models, encode marginal independence among random variables by means of a graph G. These are distinctly different from the traditional concentration graph models (often referred to as covariance selection models). Inference for these models is challenging both in the frequentist and Bayesian frameworks, since the models give rise to a curved exponential family. Maximum likelihood estimation for these models has received much attention recently but is not in general possible when the sample size is smaller than the dimension of the problem.

In this talk, we address the issue of Bayesian inference for these models. Since we are now in a curved setting, the Diaconis-Ylvisaker theory is no longer applicable, hence the standard Wishart distributions or those proposed for concentration graph models are not useful for analyzing covariance graph models. We propose a rich family of Wishart distributions which act as a conjugate family of priors for our class of models. By studying the appropriate conditional distributions, we derive a block Gibbs sampling procedure to sample from these distributions, and rigorously prove convergence of the block Gibbs sampler. We also present various useful theoretical properties of this class of distributions, which enable Bayesian inference in high dimensions. Our techniques will be illustrated using simulated and real examples.

Nonparametric Inference of Quantile Curves for Non-stationary Time Series

Zhou Zhou (Chicago)

Nowadays non-stationary time series are frequently collected in various areas and the scientific questions involving such time series generally cannot be solved by traditional stationary time series approaches. In this talk I shall address nonparametric specification tests of quantile curves for a general class of non-stationary processes. Using Bahadur representation and Gaussian approximation results for non-stationary time series, simultaneous confidence bands and integrated squared difference tests are proposed to test various parametric forms of the quantile curves with asymptotically correct type I error rates. A wild bootstrap procedure is implemented to alleviate the problem of slow convergence of the asymptotic results.

In particular, our results can be used to test the trends of extremes and variability of climate variables, an important problem in understanding climate change. An interesting example involves the analysis of the maximum speed of tropical cyclone winds. It was found that an inhomogeneous upward trend for cyclone wind speeds is pronounced at high quantile values. However, there is no trend in the mean lifetime-maximum wind speed. This example shows the effectiveness of the quantile regression technique.

Conditional inference for assessing the statistical significance of neural spiking patterns

Matthew Harrison (CMU)

Conditional inference has proven useful for exploratory analysis of neurophysiological point process data. I will illustrate this approach and then focus on a specific sub-problem: random generation of binary matrices with margin constraints. Sequential importance sampling (SIS) is an effective technique for approximate uniform sampling of binary matrices with specified margins. I will describe how to simplify and improve existing SIS procedures using improved asymptotic enumeration and dynamic programming (DP). The DP approach is interesting because it facilitates generalizations.
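
To make the sampling target concrete, here is a minimal Python sketch that enumerates, by brute force, every binary matrix with specified row and column sums. The uniform distribution over this set is what the sampling procedures above aim to approximate. The function name is hypothetical, and this is not the SIS or dynamic programming approach from the talk: brute force is feasible only for very small matrices, which is one way to see why approximate samplers are needed.

```python
from itertools import combinations, product


def binary_matrices_with_margins(row_sums, col_sums):
    """List all 0/1 matrices whose row and column sums match the given margins."""
    n_cols = len(col_sums)
    # For each row, every way of placing the required number of ones.
    row_choices = [
        [tuple(1 if j in ones else 0 for j in range(n_cols))
         for ones in combinations(range(n_cols), r)]
        for r in row_sums
    ]
    matches = []
    for rows in product(*row_choices):
        # zip(*rows) iterates over the columns of the candidate matrix.
        if all(sum(col) == c for col, c in zip(zip(*rows), col_sums)):
            matches.append(rows)
    return matches


mats = binary_matrices_with_margins([1, 1, 1], [1, 1, 1])
print(len(mats))  # 6, the 3x3 permutation matrices
```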


Richard Cleary (Bentley)

As a long time Associate Editor for Reviews for journals of the American Statistical Association, I began to wonder if anybody read the reviews I wrote. In the August 2005 issue of The American Statistician, I carried out an experiment to find out. We will analyze the results, which suggest nice applications of some classic problems. We will also consider how with some additional data we could generate a model of the social network of statisticians. This will be a highly interactive presentation with plenty of chances for audience participation.

Multiresponse Surface Models with Block Effects

André Khuri (UF)

This talk considers linear multiresponse surface models which may contain block effects that can be either fixed or random. The effect of blocking on the estimation of the mean responses, the prediction variance-covariance matrix, and the determination of optimum operating conditions will be addressed. The special case of orthogonal blocking in a multiresponse situation will be discussed.

ϕ-Divergence Classes of Models for Categorical Data

Maria Kateri (Piraeus)

Modelling categorical data is viewed through an information theoretic perspective. The models are characterized by their distance from the most parsimonious model in the direction of their qualitative substance, which serves as a reference model. This way, apparently different (and often competitive) models are unified in families sharing common properties and characteristics. It can be proved that all of them measure the distance from the same reference model under the same conditions. Their only difference lies in the measure applied to express this distance. Hence their difference is not qualitative but just a scale difference and consequently, if the distances are expressed in terms of a generalized measure, then a family of models can be developed, having well-known models as its members. As such a generalized measure, we have chosen the ϕ-divergence and we have built the corresponding classes of models.

For example, when modeling association (i.e. departure from independence), the ϕ-divergence association model for two-way tables is characterized by the property of being the closest model, in terms of ϕ-divergence, to independence. Well-known models in the literature on contingency tables (such as association models and correlation models) are proved to be special cases of the ϕ-divergence association model. Thus, their properties and features can be studied in a unified way, and model selection can be approached differently.

For square contingency tables with commensurable classification variables, the complete symmetry model is the most parsimonious model that serves as reference model. In this context, the quasi-symmetry (QS) model, under certain conditions, is the closest model to symmetry in terms of the Kullback-Leibler distance. Replacing the Kullback-Leibler distance by the ϕ-divergence, the generalized quasi-symmetry model QS[ϕ] is developed, providing alternative QS-type models. Properties, connections to other models and interpretational aspects studied for the general QS[ϕ] model apply to all its special cases.

Logistic regression is another model that can be generalized through ϕ-divergence to a class of models, unifying alternative approaches.
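
For reference, the ϕ-divergence used throughout this abstract can be written in its standard form as below. The notation is an assumption for illustration: p and q denote two discrete probability vectors, and ϕ is a convex function with ϕ(1) = 0. The two special cases show how the Kullback-Leibler distance mentioned above, and a Pearson-type distance, arise from particular choices of ϕ.

```latex
\begin{align*}
D_\phi(p, q) &= \sum_i q_i \, \phi\!\left(\frac{p_i}{q_i}\right),
  \qquad \phi \text{ convex},\ \phi(1) = 0, \\
\phi(x) = x \log x - x + 1 \ &\Longrightarrow\
  D_\phi(p, q) = \sum_i p_i \log \frac{p_i}{q_i}
  \quad \text{(Kullback-Leibler)}, \\
\phi(x) = \tfrac{1}{2}(x - 1)^2 \ &\Longrightarrow\
  D_\phi(p, q) = \tfrac{1}{2} \sum_i \frac{(p_i - q_i)^2}{q_i}
  \quad \text{(Pearson-type)}.
\end{align*}
```

The Kullback-Leibler case uses the fact that both p and q sum to one, so the linear terms cancel.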

The Malaria Atlas Project (MAP): quantifying malaria endemicity, burden and elimination feasibility

Andy Tatem (UF)

Over 2 billion people are estimated to be exposed to the threat of the dangerous Plasmodium falciparum strain of malaria, resulting in around 500 million clinical episodes and 1-3 million deaths a year. In allocating public health resources for control, the guiding principle should be an evidence-based quantification of need. The evidence base for allocating resources for malaria control on a global scale is poor, however, with endemicity levels often unknown or ignored. Endemicity is a measure of the level of malaria challenge in a human population, and determines the average age of first exposure, the rate of development of immunity, and thus, the expected clinical spectrum of disease. Therefore, suites of relevant interventions to control malaria should be tailored to these basic epidemiological foundations. The primary goal of the Malaria Atlas Project (MAP) is to develop the science of malaria cartography, aiming primarily to produce the first empirically-derived global maps of transmission limits and endemicity levels within these limits.

The interdisciplinary work of MAP will provide the background to my talk, detailing how community surveys, satellite imagery and a wealth of other spatiotemporal data layers are being used to understand, quantify and model malaria transmission, burdens and population distribution spatially. I will focus principally, however, on MAP research recently funded by the Bill and Melinda Gates Foundation aimed at designing spatial tools for local malaria elimination planning. A key part of such tools involves quantifying and describing human population movement patterns in relation to the transport of infections. Novel migration and movement datasets including microcensus data, mobile phone records and travel history surveys have been acquired. These in turn open up opportunities for the development of novel statistical approaches to extract valuable information for the strategic planning of malaria elimination.

