Seminars are held from 4:00 p.m. to 5:00 p.m. in Griffin-Floyd Hall 100 unless otherwise noted.
Refreshments are available before the seminars from 3:30 p.m. to 4:00 p.m. in Griffin-Floyd Hall 201.
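|Date|Speaker|Title|
|Sep 14|Miles Lopes (University of California, Davis)|Bootstrap Methods for High-Dimensional and Large-Scale Data|
|Sep 21|Gen Li (Columbia University)|A general framework for the association analysis of heterogeneous data|
|Oct 3|Hira Koul (Michigan State University)|Goodness-of-fit Testing of Error Distribution in Linear Measurement Error Models|
|Oct 12|Galin Jones (University of Minnesota)|Bayesian Penalized Regression (and a little MCMC)|
|Oct 26|Mariana Pensky (University of Central Florida)||
|Nov 2|Jason Roy (University of Pennsylvania)||
|Nov 16|Lifeng Lin (Florida State University)|Assessing publication bias in meta-analysis|
|Nov 30|Rebecca Steorts (Duke University)||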
Bootstrap Methods for High-Dimensional and Large-Scale Data
Miles Lopes (University of California, Davis)
Bootstrap methods are among the most broadly applicable tools for statistical inference and uncertainty quantification. Although these methods have an extensive literature, much remains to be understood about their applicability in modern settings, where observations are high-dimensional, or where the quantity of data outstrips computational resources. In this talk, I will present a couple of new bootstrap methods that are tailored to these settings. First, I will discuss the topic of "spectral statistics" arising from high-dimensional sample covariance matrices, and describe a method for approximating the laws of such statistics. Second, in the context of large-scale data, I will discuss a more unconventional application of the bootstrap -- dealing with the tradeoff between accuracy and computational cost for ensemble classifiers. More specifically, I will explain how the bootstrap can be used to decide when an ensemble of classifiers trained by bagging or random forests is sufficiently large. This will include joint work with Alexander Aue and Andrew Blandino.
A general framework for the association analysis of heterogeneous data
Gen Li (Columbia University)
Multivariate association analysis is of primary interest in many applications. Despite the prevalence of high-dimensional and non-Gaussian data (such as count-valued or binary), most existing methods only apply to low-dimensional datasets with continuous measurements. We develop a new framework for the association analysis of two sets of high-dimensional and heterogeneous (continuous/binary/count) data. We model heterogeneous random variables using exponential family distributions, and exploit a structured decomposition of the underlying natural parameter matrices to identify shared and individual patterns for two datasets. We also introduce a new measure of the strength of association, and a permutation-based procedure to test its significance. An alternating iteratively reweighted least squares algorithm is devised for model fitting, and several variants are developed to expedite computation and achieve variable selection. The application to the Computer Audition Lab 500-song (CAL500) music annotation study sheds light on the relationship between acoustic features and semantic annotations, and provides an effective means for automatic annotation and music retrieval.
Goodness-of-fit Testing of Error Distribution in Linear Measurement Error Models
Hira Koul (Michigan State University)
In this talk we shall discuss a class of goodness-of-fit tests for the error density function in linear measurement error regression models, using deconvolution kernel density estimators of the regression model error density. The test statistic is an analog of the Bickel-Rosenblatt test statistic. The asymptotic null distribution of the proposed test statistic is derived for both the ordinary smooth and super smooth cases. The consistency against a fixed alternative and the asymptotic power of the proposed tests against a class of local nonparametric alternatives are also obtained for both cases. A finite-sample simulation study shows that the proposed test has some advantage over the few existing competing tests. Joint work with Weixing Song and Xiaoyu Zhu.
Bayesian Penalized Regression (and a little MCMC)
Galin Jones (University of Minnesota)
I will consider ordinary least squares, lasso, bridge, and ridge regression methods under a unified framework. The particular method is determined by the form of the penalty term, which is typically chosen by cross-validation. The goal is to introduce a fully Bayesian approach that allows selection of the penalty through posterior inference, if desired, and to discuss how a form of model averaging can be used to eliminate the nuisance penalty parameters. Sufficient conditions for the posterior to concentrate near the true regression coefficients as the dimension grows with sample size will be discussed.
The resulting posterior is analytically intractable and requires a component-wise Markov chain Monte Carlo algorithm. The MCMC estimation problem is highly multivariate, an issue which has been largely ignored in the MCMC literature. A new relative-volume simulation termination rule will be introduced and connected to a new concept of effective sample size. This allows termination of the simulation in a principled manner.
Numerical results show that the proposed model and MCMC method tend to select the optimal penalty and perform well in both variable selection and prediction. Examples will be provided.