Mark Yang, University of Florida

Using Pilot Study to Increase Efficiency in Clinical Trials

A pilot study is often conducted to determine the sample size required for a clinical trial. Because the sampling environments differ, the pilot data are usually discarded after the sample size calculation; they are not combined with the new data for the final analysis. This paper uses the pilot information to modify the subsequent testing procedure when the t-test is used to compare two treatments. The new test maintains the required Type I error probability but increases the power and consequently reduces the sample size requirement. It loses power only when the pilot study is a bad sample, i.e., one bearing little resemblance to the real data. Because the latter is unlikely, the new approach is a viable alternative to current practice.

This is joint work with Sam Wu.
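As a point of reference, here is a minimal Python sketch of the conventional practice the abstract starts from: using the pilot standard deviation to size a two-sample t-test via a normal approximation. The modified test proposed in the talk is not reproduced here, and the pilot values and target difference are invented.

```python
# Conventional pilot-based sample size calculation for a two-sample t-test.
# This sketches the standard practice the abstract starts from, not the
# authors' modified test; the pilot data and target effect size are made up.
import numpy as np
from scipy.stats import norm

def n_per_arm(pilot_sd, delta, alpha=0.05, power=0.80):
    """Approximate sample size per arm (normal approximation)."""
    z_a = norm.ppf(1 - alpha / 2)   # two-sided critical value
    z_b = norm.ppf(power)           # quantile for the desired power
    return int(np.ceil(2 * (z_a + z_b) ** 2 * pilot_sd ** 2 / delta ** 2))

pilot = np.array([4.1, 5.3, 3.8, 6.0, 4.7, 5.5])  # hypothetical pilot measurements
print(n_per_arm(pilot.std(ddof=1), delta=1.0))    # required n in each treatment arm
```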
Seminar page

Carsten Botts, Iowa State University and University of Florida

A Shrinkage Estimator for Spectral Densities

We propose a shrinkage estimator for spectral densities based on a multilevel normal hierarchical model. The first level captures the sampling variability via a likelihood constructed using the asymptotic properties of the periodogram. At the second level, the spectral density is shrunk towards a parametric time series model. To avoid selecting a particular parametric model for the second level, a third level is added which induces an estimator that averages over a class of parsimonious time series models. The estimator derived from this model, the model averaged shrinkage estimator (MASE), is consistent, is shown to be highly competitive with other spectral density estimators via simulations, and is computationally inexpensive.
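To make the construction concrete, the sketch below (a rough Python illustration, not the MASE estimator itself) shrinks a raw periodogram toward the spectrum of a fitted AR(2) model using a fixed weight; in the hierarchical model above, both the amount of shrinkage and the parametric target are handled through the prior rather than fixed by hand.

```python
# Simplified illustration: shrink the periodogram toward a fitted AR spectrum.
# The fixed weight lam and the AR order are stand-ins; the MASE estimator
# chooses the shrinkage and averages over parametric models hierarchically.
import numpy as np
from statsmodels.regression.linear_model import yule_walker

def ar_spectrum(rho, sigma, freqs):
    """Spectral density of an AR(p) model at angular frequencies freqs."""
    p = len(rho)
    transfer = 1 - sum(rho[k] * np.exp(-1j * (k + 1) * freqs) for k in range(p))
    return sigma ** 2 / (2 * np.pi) / np.abs(transfer) ** 2

x = np.random.randn(512)                      # replace with the observed series
x = x - x.mean()
n = len(x)
freqs = 2 * np.pi * np.arange(1, n // 2) / n  # Fourier frequencies in (0, pi)
pgram = np.abs(np.fft.fft(x)[1:n // 2]) ** 2 / (2 * np.pi * n)

rho, sigma = yule_walker(x, order=2)          # parametric "shrinkage target"
lam = 0.5                                     # fixed weight, illustration only
f_hat = lam * ar_spectrum(rho, sigma, freqs) + (1 - lam) * pgram
```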
Seminar page

Pavlina Rumcheva, University of Florida

Projected Multivariate Linear Models for Directional Data

We consider the spherically projected multivariate linear model for directional data in the general d-dimensional case. This model treats directional observations as projections onto the unit sphere of unobserved responses from a multivariate linear model. A formula for the mean resultant length of the projected normal distribution, which is the underlying distribution of the model, is presented for the general d-dimensional case. We show that maximum likelihood estimates for the model can be computed using iterative methods. We develop a test for equal mean directions of several populations assuming a common unknown concentration. An example with three-dimensional anthropological data is presented to demonstrate the application of the test. We also consider random effects models for directional data, and show how maximum likelihood estimates can be computed.
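For readers unfamiliar with the projected normal setup, here is a minimal simulation sketch: latent multivariate normal responses are projected onto the unit sphere, and the mean resultant length of the resulting directions is computed. The latent mean below is arbitrary, and the sketch does not implement the model fitting or the test described above.

```python
# Minimal sketch of the projected-normal idea: project latent multivariate
# normal responses onto the unit sphere and summarize their concentration.
# The latent mean mu is invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200
mu = np.array([2.0, 1.0, 0.5])                     # hypothetical latent mean
y = rng.multivariate_normal(mu, np.eye(d), size=n)
u = y / np.linalg.norm(y, axis=1, keepdims=True)   # projected (directional) data

resultant = u.mean(axis=0)
mean_resultant_length = np.linalg.norm(resultant)  # concentration summary in [0, 1]
print(mean_resultant_length)
```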
Seminar page

Sounak Chakraborty, University of Florida

Multiclass Cancer Diagnosis Using Bayesian Kernel Machine Models

Precise classification of tumors is critical for cancer diagnosis and treatment. In recent years, several studies have demonstrated successful classification of tumor types using gene expression patterns, so gene expression data are proving to be a very promising tool in cancer diagnosis. However, simultaneous classification across a heterogeneous set of tumor types has not yet been well studied. Usually, such multicategory classification problems are solved with binary classifiers, which may fail in a variety of circumstances. We tackle the problem of cancer classification in the context of multiple tumor types. We develop a fully probabilistic model-based approach, specifically the probabilistic relevance vector machine (RVM), as well as support vector machines (SVM), for multicategory classification. A hierarchical model is also proposed in which the unknown smoothing parameter is interpreted as a shrinkage parameter. We assign a prior distribution to it and obtain its posterior distribution via Bayesian computation. In this way we obtain not only point predictors but also the associated measures of uncertainty.

We also propose a Bayesian variable selection method for selecting the differentially expressed genes, integrated with our RVM and SVM models for improved classification. Our method uses mixture priors and Markov chain Monte Carlo techniques to identify the important predictors (genes) and classify samples simultaneously. We have applied our methods to two different microarray datasets to identify differentially expressed genes and compared their classification performance with existing methods.
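As a rough, non-Bayesian point of comparison, the sketch below runs a standard multiclass SVM with simple univariate gene filtering on simulated expression data; it does not reproduce the probabilistic RVM/SVM models or the mixture-prior gene selection described above.

```python
# Non-Bayesian stand-in for the multicategory task: multiclass SVM with a
# simple univariate gene filter, applied to simulated expression data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2000))     # 60 samples, 2000 genes (simulated)
y = np.repeat([0, 1, 2], 20)        # three tumor classes
X[y == 1, :20] += 1.0               # a few differentially expressed genes
X[y == 2, 20:40] += 1.0

clf = make_pipeline(SelectKBest(f_classif, k=50), StandardScaler(), SVC(kernel="linear"))
print(cross_val_score(clf, X, y, cv=5).mean())
```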
Seminar page

Marinela Capanu, University of Florida

Testing for Misspecification of Parametric Models

Because model misspecification can lead to inconsistent and inefficient estimators and invalid tests of hypotheses, testing for misspecification is critically important. The IOS test recently proposed by Presnell & Boos (2004) is a general-purpose goodness-of-fit test that can be applied to assess the adequacy of a wide variety of parametric models without specifying an alternative model. The test is based on the ratio of in-sample and out-of-sample likelihoods, and can be viewed asymptotically as a contrast between two estimates of the information matrix that are equal under correct model specification. The statistic is asymptotically normally distributed, but the parametric bootstrap is recommended for computing the p-value of the test. Using properties of locally asymptotically normal parametric models, we prove that the parametric bootstrap provides a consistent estimate of the null distribution of the IOS statistic under quite general conditions. Finally, we compare the performance of the IOS test with existing goodness-of-fit tests in several applications and through simulations involving models such as logistic regression, Cox proportional-hazards regression, beta-binomial models, zero-inflated Poisson models, and first-order autoregressive models. While the IOS test is broadly applicable and not restricted to a particular model, the applications show that its performance is comparable to or better than that of other goodness-of-fit tests. Moreover, the IOS test is automatic and easy to employ, and it is useful in situations where the Information Matrix test is the only available competitor.
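The basic in-sample/out-of-sample recipe can be illustrated in a few lines. The sketch below is only a schematic reading of that recipe for a normal model, with a leave-one-out out-of-sample likelihood and a parametric bootstrap p-value; it is not the authors' implementation, and the data are simulated.

```python
# Bare-bones illustration of the in-sample/out-of-sample (IOS) idea for a
# normal model, with a parametric bootstrap p-value.  A sketch of the general
# recipe in the abstract, not the authors' implementation.
import numpy as np
from scipy.stats import norm

def ios_stat(x):
    """Log in-sample likelihood minus log leave-one-out likelihood."""
    mu, sd = x.mean(), x.std(ddof=0)
    in_sample = norm.logpdf(x, mu, sd).sum()
    out_sample = 0.0
    for i in range(len(x)):
        xi = np.delete(x, i)
        out_sample += norm.logpdf(x[i], xi.mean(), xi.std(ddof=0))
    return in_sample - out_sample

rng = np.random.default_rng(2)
x = rng.exponential(size=50)                  # data a normal model misfits
obs = ios_stat(x)
mu, sd = x.mean(), x.std(ddof=0)
boot = np.array([ios_stat(rng.normal(mu, sd, size=len(x))) for _ in range(200)])
print((boot >= obs).mean())                   # parametric bootstrap p-value
```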
Seminar page

David Wilson, University of Florida

Tips For Improving Your Public Speaking Skills

Public speaking is a fundamental skill required for a successful professional career. Since a talk at a professional meeting or job interview can make or break you, it is imperative that you take this task seriously. The goal of this talk is to indicate positive steps for improving your public speaking skills. The talk begins by recounting the 13 worst disasters I have observed, including examples of miserable presentations by prominent researchers. I then indicate positive steps you might take to improve your own presentations by discussing the structure of a well-organized talk and by offering tips for avoiding commonly made mistakes.
Seminar page

Hani Doss, University of Florida

Estimation of Bayes Factors for Nonparametric Bayes Problems via Radon-Nikodym Derivatives

We consider situations in Bayesian analysis where the prior is indexed by a hyperparameter taking on a continuum of values. We distinguish some arbitrary value of the hyperparameter and consider the problem of estimating the Bayes factor for the model indexed by the hyperparameter vs. the model indexed by the distinguished point, as the hyperparameter varies. We assume that we are able to get iid or Markov chain Monte Carlo samples from the posterior distribution for a finite number of the priors. We show that it is possible to estimate the family of Bayes factors if we are able to calculate the likelihood ratios for any pair of priors, and we show how to obtain these likelihood ratios in some nonparametric Bayesian problems. We illustrate the methodology through two detailed examples involving model selection problems.
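In generic notation, introduced here only for illustration: if $\nu_h$ denotes the prior indexed by hyperparameter $h$, $m_h$ the corresponding marginal likelihood, and $h_1$ the distinguished value, then for draws $\theta_1,\dots,\theta_n$ from the posterior under $\nu_{h_1}$,
$$
\frac{m_h(x)}{m_{h_1}(x)} \;=\; \int \frac{d\nu_h}{d\nu_{h_1}}(\theta)\,\pi_{h_1}(d\theta \mid x) \;\approx\; \frac{1}{n}\sum_{i=1}^{n} \frac{d\nu_h}{d\nu_{h_1}}(\theta_i),
$$
so averaging the prior Radon-Nikodym derivatives over posterior draws estimates the Bayes factor for every $h$ at once, which is what makes the family of Bayes factors estimable from samples under only finitely many priors.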
Seminar page

Denis Larocque, Department of Management Sciences, HEC Montréal

Sign and Signed-Rank Tests for Clustered Data

In this talk, I will present some recent results concerning sign and signed-rank tests for the univariate and multivariate one-sample problem with cluster correlated data. The major part of the talk will focus on a new weighted multivariate sign test. Asymptotic properties including Pitman ARE and some simulation results will be presented.
Seminar page

Alex Trindade, University of Florida

Measures of Financial Risk: Connections With Quantile Regression and an Application to the Discovery of Optimal Alloys

This will be a two-part talk focusing on newly discovered connections between seemingly disparate topics: measures of risk (finance), selection differentials (genetics), and the now well-established field of quantile regression, first introduced by Koenker and Bassett (1978) (statistics, econometrics). The first part will look at estimation and discuss asymptotic properties of some common measures of risk. A connection is made between nonparametric estimators of the upper tail mean, the Conditional Value-at-Risk (CVaR), and the mean improvement when selecting the top fraction of subjects in genetic breeding experiments. We will then show that in a regression context, estimators derived from a generalization of least absolute deviations (quantile regression) coincide with estimators obtained by minimizing the residual error distance of the upper tail mean from the overall mean. The second part will be an application to modeling the mechanical properties of steel alloys via quantile regression. Model predictions are shown to give results comparable to manual selection when used to rank steels. We will show how the models can thus be used in conjunction with physical data to find better steels by solving a large constrained optimization problem. The approach may greatly accelerate the discovery of new optimized alloys, hitherto a slow and laborious process relying exclusively on physical experimentation.
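As a small numerical companion to the first part, the sketch below computes an empirical upper-tail mean (a CVaR-type quantity) and fits a quantile regression line with statsmodels; the data and the 90% level are arbitrary choices for illustration, and the alloy optimization of the second part is not touched.

```python
# Two ingredients the first part of the talk connects: the empirical
# upper-tail mean (a CVaR-type measure) and a quantile regression fit.
# Simulated data; the 90% level is an arbitrary choice.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
losses = rng.lognormal(size=1000)
q = np.quantile(losses, 0.90)
cvar_90 = losses[losses >= q].mean()          # mean of the worst 10% of outcomes
print(cvar_90)

x = rng.uniform(0, 10, size=200)              # quantile regression of y on x
y = 1.0 + 0.5 * x + rng.standard_normal(200) * (0.5 + 0.1 * x)
fit = sm.QuantReg(y, sm.add_constant(x)).fit(q=0.90)
print(fit.params)                             # fitted 90th-percentile line
```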
Seminar page

Clyde Schoolfield, University of Florida

Generating a Uniformly Random Configuration from Various Combinatorial Urn Models

In traditional combinatorial urn models, a number of balls are to be randomly distributed into a number of urns. In some models the balls may or may not be distinguishable; and likewise, the urns may or may not be distinguishable. We present a very general algorithm that induces several Markov chains whose stationary distributions are uniform on the set of configurations in many of these models. We will also explore the mixing times of these Markov chains. Specific examples that we will consider are a random walk on the hypercube, the Ehrenfest urn model, a Markov chain on the partitions of a set of objects, and a random walk among Young diagrams.
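One of the examples above, the Ehrenfest urn model (equivalently, the ball count along a lazy random walk on the hypercube $\{0,1\}^d$), is easy to simulate; the short sketch below does so and checks the Binomial(d, 1/2) stationary distribution of the count. It illustrates the example only, not the general algorithm of the talk.

```python
# Simulation of the Ehrenfest urn, viewed as the number of ones along a lazy
# random walk on the hypercube {0,1}^d.
import numpy as np

rng = np.random.default_rng(4)
d, steps = 20, 5000
state = np.zeros(d, dtype=int)   # all balls start in the first urn
counts = []
for _ in range(steps):
    if rng.random() < 0.5:       # lazy step: stay put half the time
        i = rng.integers(d)
        state[i] ^= 1            # move ball i to the other urn
    counts.append(state.sum())

# The stationary distribution of the ball count is Binomial(d, 1/2).
print(np.mean(counts[1000:]), d / 2)
```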
Seminar page

Jim Hobert, University of Florida

Markov Chain Conditions for Admissibility

Suppose that $X$ is a random vector with density $f(x|\theta)$ and that $\pi(\theta|x)$ is a proper posterior density corresponding to an improper prior $\nu(\theta)$. The prior is called strongly admissible if the formal Bayes estimator of every bounded function of $\theta$ is admissible under squared error loss. Eaton (1992, Annals of Statistics) showed that recurrence of a certain Markov chain, $W$, defined in terms of $f$ and $\nu$, implies the strong admissibility of $\nu$. Hobert and Robert (1999, Annals of Statistics) showed that $W$ is recurrent if and only if a related Markov chain, $\tilde{W}$, is recurrent. I will show that when $X$ is an exponential random variable, a fairly thorough analysis of the Markov chain $\tilde{W}$ is possible and this leads to a simple sufficient condition for strong admissibility. I will also explain how the relationship between $W$ and $\tilde{W}$ can be used to establish that certain perturbations of strongly admissible priors retain strong admissibility. (This is joint work with M. Eaton, G. Jones, D. Marchev and J. Schweinsberg.)
Seminar page

Peter Wludyka, University of North Florida

The Analysis of Means

The Analysis of Means (ANOM) is a method for comparing means, rates, and proportions. The basic idea is to create an ANOM decision chart (similar in appearance to a Shewhart process control chart) on which means, rates, or proportions are plotted along with decision lines associated with a particular level of significance. Similarly, the Analysis of Means for Variances (ANOMV) employs a decision chart on which sample variances are plotted. Since analysis-of-means-type tests offer an easy-to-understand decision chart that is useful for conveying results to non-statisticians, they are often a nice alternative to the ANOVA F test, Pearson's chi-squared test for an I by 2 table, one-way Poisson regression, or Bartlett's test. There is also an ANOM version of Levene's test. This talk will present some basics for practitioners who might want to use ANOM in collaborative work. Some SAS examples using PROC ANOM (now available in version 9.1) will be presented. Results from a study comparing ANOM to Pearson's test will be presented, as well as a SAS program that produces accurate power estimates useful in prospective or retrospective applications for both balanced and unbalanced binomial data. The SAS program also estimates the probability that a particular population "signals" by plotting outside the ANOM decision limits. Those wanting more information can visit
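As a rough illustration of how the decision limits work, the Python sketch below computes ANOM-style limits for binomial proportions. The exact critical value h would come from ANOM tables or PROC ANOM; here it is approximated by a Bonferroni-adjusted normal quantile, and the counts are invented.

```python
# Rough sketch of ANOM decision limits for binomial proportions.  The exact
# critical value h comes from ANOM tables (or PROC ANOM); a Bonferroni-adjusted
# normal quantile is used here as an approximation.  Counts are invented.
import numpy as np
from scipy.stats import norm

successes = np.array([18, 25, 30, 12])   # hypothetical events per group
n = np.array([100, 100, 100, 100])       # group sample sizes
k = len(n)
p = successes / n
p_bar = successes.sum() / n.sum()

h = norm.ppf(1 - 0.05 / (2 * k))         # Bonferroni approximation to h
half_width = h * np.sqrt(p_bar * (1 - p_bar)) * np.sqrt((k - 1) / (k * n))
lower, upper = p_bar - half_width, p_bar + half_width
print(np.where((p < lower) | (p > upper))[0])   # groups that "signal"
```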
Seminar page

Christopher Jermaine, CISE, University of Florida

Sampling From Massive Databases

The largest databases are now so large that answering complicated analytic queries can take hours or days. One alternative to providing exact answers to queries is to make use of approximation techniques, with sampling being a prime candidate. The application of sampling to database queries provides a challenging set of problems, since sampling in a database environment is quite different from traditional finite population or survey sampling. For example, in a database nearly all queries compute complicated functions involving operations such as set differences over multiple populations, making inference particularly challenging. Another difference is that in a database, it is possible to pre-compute and store a large number of exact population totals offline, and then use those values to help with subsequent inference problems. This talk will consider some of the challenges involved in applying sampling techniques to the largest databases.
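One of the ideas mentioned above, using pre-computed exact totals to sharpen a sample-based estimate, can be illustrated with the textbook difference estimator. The sketch below is only a toy example with an invented table and columns, not a method from the talk.

```python
# Toy illustration: a pre-computed exact total of a related column can sharpen
# a sample-based estimate of a SUM query (textbook difference estimator).
# The table and columns are invented.
import numpy as np

rng = np.random.default_rng(5)
N = 1_000_000
aux = rng.gamma(2.0, 10.0, size=N)              # auxiliary column; exact total stored offline
target = aux * 1.2 + rng.normal(0, 5, size=N)   # column the query actually sums

idx = rng.choice(N, size=1000, replace=False)   # simple random sample of rows
plain = N * target[idx].mean()                  # expansion estimator of SUM(target)
diff = aux.sum() + N * (target[idx] - aux[idx]).mean()   # uses the stored total

print(abs(plain - target.sum()), abs(diff - target.sum()))
```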
Seminar page

André Khuri, University of Florida

Applications of Dirac's Delta Function in Statistics

The Dirac delta function has been used for many years in mathematical physics. The purpose of this talk is to bring attention to several useful applications of this function in mathematical statistics. Some of these applications include a unified representation of the distribution of a function (or functions) of one or several random variables, which may be discrete or continuous, a proof of a well-known inequality, and a representation of a density function in terms of its noncentral moments.
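One standard instance of the representation referred to above: if $X$ has density $f_X$ (interpreted with respect to a suitable dominating measure, so that discrete and continuous cases are covered alike) and $Y = g(X)$, then
$$
f_Y(y) \;=\; \int \delta\bigl(y - g(x)\bigr)\, f_X(x)\, dx \;=\; E_X\bigl[\delta\bigl(y - g(X)\bigr)\bigr],
$$
so the distribution of a function of one or several random variables can be written in a single unified form.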
Seminar page

Yan Li, Rush University Medical Center

Estimation of the Mediation Effect with a Binary Mediator

A mediator acts as a third variable in the causal pathway between a risk factor and an outcome. We consider the estimation of the mediation effect when the mediator is a binary variable. We give a precise definition of the mediation effect and examine asymptotic properties of five different estimators of the mediation effect. Our theoretical developments, which are supported by a Monte Carlo study, show that the estimators that account for the binary nature of the mediator are consistent for the mediation effect while other estimators are inconsistent. We use these estimators to study the mediation effect of chronic cerebral infarction in the causal relationship between the apolipoprotein E epsilon-4 allele and cognitive function among 233 deceased participants from the Religious Orders Study, a longitudinal, clinical-pathologic study of aging and Alzheimer's disease.
Seminar page

Bradley Efron, Stanford University

Correlation and Large-Scale Significance Testing

Large-scale hypothesis testing problems, with hundreds or thousands of test statistics "z[i]" to consider at once, have become commonplace in current practice. Applications of popular analysis methods such as false discovery rates do not require independence of the z[i]'s but their accuracy can be compromised in high-correlation situations. This talk discusses methods, both theoretical and computational, for assessing the size and effect of correlation in large-scale testing problems. Two microarray examples will be used to illustrate the ideas. The examples show surprisingly large correlations that badly destabilize standard statistical analyses, but newer methods can remedy at least part of the trouble.
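For context, the sketch below runs the standard Benjamini-Hochberg procedure on simulated z-values whose correlation comes from a single shared factor; the diagnostic and correction methods discussed in the talk are not implemented here, and all numbers are invented.

```python
# Many correlated z-values screened with the standard Benjamini-Hochberg
# procedure.  Correlation is induced by one shared factor; the correction
# methods discussed in the talk are not implemented.
import numpy as np
from scipy.stats import norm
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(6)
m = 5000
shared = rng.standard_normal()                # one common factor shared by all z[i]
z = 0.6 * shared + 0.8 * rng.standard_normal(m)
z[:100] += 3.0                                # a few genuinely non-null cases

pvals = 2 * norm.sf(np.abs(z))
reject, qvals, _, _ = multipletests(pvals, alpha=0.1, method="fdr_bh")
print(reject[:100].sum(), reject[100:].sum())  # true and false discoveries
```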
Seminar page

Bradley Efron, Stanford University

Fifty Years of Empirical Bayes

Scientific inference is the process of reasoning from observed data back to its underlying mechanism. The two great schools of statistical inference, Bayesian and frequentist, have competed over the past two centuries, often bitterly, for scientific supremacy. Empirical Bayes, a novel hybrid, appeared in the early 1950's, showing promise of immense possible gains in inferential accuracy. Nevertheless it has languished in the statistics literature, with its gains viewed as suspicious and even paradoxical by Bayesians and frequentists alike. New scientific technology, exemplified by DNA microarrays, has suddenly revived interest in empirical Bayes methods. This talk, which is aimed at a general scientific audience, examines the ideas involved through a series of real examples, and proceeds with a minimum of technical development.
Seminar page

Parimal Mukhopadhyay, Indian Statistical Institute

Some Lower Bounds to the Variances of Estimating Functions

Godambe (1960) obtained a lower bound to the variance of an estimating function under some regularity conditions. We shall obtain here a lower bound to the variance that does not require Wald-Wolfowitz regularity conditions. When there is only one nuisance parameter, we shall, using Bhattacharyya's (1946) scores, obtain another lower bound to the variance of an estimating function for the parameter of interest. The use of these results in obtaining an optimal estimating function in the presence of nuisance parameters will be indicated.
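For reference, Godambe's (1960) bound, stated here for a scalar parameter $\theta$, an unbiased estimating function $g(x,\theta)$, and the usual regularity conditions, is
$$
\frac{E_\theta\bigl[g^2(X,\theta)\bigr]}{\bigl\{E_\theta\bigl[\partial g(X,\theta)/\partial\theta\bigr]\bigr\}^2} \;\ge\; \frac{1}{I(\theta)},
$$
where the left-hand side is the variance of the standardized estimating function and $I(\theta)$ is the Fisher information; equality holds when $g$ is proportional to the score. The talk concerns bounds of this type obtained under weaker conditions and in the presence of nuisance parameters.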
Seminar page

Elias Moreno, Universidad de Granada

Comparison of Bayesian Objective Procedures for Variable Selection in Linear Regression

In the objective Bayesian approach to variable selection in regression, a crucial point is the encompassing of the underlying nonnested linear models. Once the models have been encompassed, one can define objective priors for the multiple testing problem involved in the variable selection problem. There are two natural ways of encompassing: one is to encompass all models into the model containing all possible regressors, and the other is to encompass the model containing only the intercept into each of the other models. In this paper we compare the variable selection procedures that result from each of these two ways of encompassing. Relations with frequentist criteria for model selection, such as those based on the adjusted R2, the lasso, and Mallows' Cp, are provided incidentally.
Seminar page