
Student Seminar Schedule

(Click here to access the faculty seminar schedule.)

Seminars are held on Tuesdays from 4:00 to 5:00 p.m. in Griffin-Floyd 100.

Refreshments will be provided!

Fall 2010

Date and speaker (titles link to the abstracts below):

Sep. 21   Shibasish Dasgupta (University of Florida)
          An Introduction to the Probabilistic Modeling of Text

Sep. 28   Doug Sparks (University of Florida)
          Posterior Consistency in Bayesian Regression Models

Oct. 5    Claudio Fuentes (University of Florida)
          The Receiver Operating Characteristic Curve (A Brief Introduction)

Oct. 12   Meixi Guo (University of Florida)
          Small Area Estimation when Auxiliary Information is Measured with Error

Oct. 19   Jeremy Gaskins (University of Florida)
          The Dirichlet Process and a Multivariate Extension

Oct. 26   Arkendu Chatterjee (University of Florida)
          Bayesian Model Selection For Incomplete Data Using The Posterior Predictive Distribution
          Note: will be held in Room 230

Nov. 2    Nabanita Mukherjee (University of Florida)
          Asymptotic Variance Evaluations in Discrete Markov Chains

Nov. 9    Mike Hyman (University of Florida)
          Note: Mike's seminar is postponed until the spring semester; no student seminar this week.

Nov. 16   Dr. Hani Doss (University of Florida)
          Note: Dr. Doss's seminar is postponed until the spring semester. We are looking for a replacement speaker for this week; if interested, please contact Jason Murray at jtmurray "at" stat "dot" ufl "dot" edu.

Abstracts


Title: An Introduction to the Probabilistic Modeling of Text

Shibasish Dasgupta (Sep. 21)

The management of large and growing collections of information is a central goal of modern statistical science. Data repositories of texts have become widely accessible, thus necessitating good methods of retrieval, organization, and exploration. Probabilistic models have been paramount to these tasks, used in settings such as text classification, information retrieval, text segmentation, and information extraction. These methods entail two stages:

(1) Estimate or compute the posterior distribution of the parameters of a probabilistic model from a collection of texts; and
(2) For new documents, answer the question at hand (e.g., classification, retrieval) via probabilistic inference.

The goal of such modeling is document generalization. Given a new document, how is it similar to the previously seen documents? Where does it fit within them? What can one predict about it? Efficiently answering such questions is the focus of the statistical analysis of document collections. In this talk, I'll consider the problem of modeling text corpora. The goal is to find short descriptions of the members of a collection that enable efficient processing of large collections while preserving the essential statistical relationships that are useful for basic tasks such as classification, novelty detection, summarization, and similarity and relevance judgments. I'll discuss the basic methodology for text corpora that has been successfully deployed in modern Internet search engines.
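As a concrete (if much simplified) illustration of the two-stage recipe above, the sketch below fits a bag-of-words naive Bayes classifier, which is only a stand-in for the models discussed in the talk; the tiny corpus, labels, and smoothing parameter are invented for illustration.

from collections import Counter
import math

corpus = {
    "sports":  ["the team won the game", "a great goal in the match"],
    "finance": ["the market fell today", "stocks and bonds in the market"],
}

alpha = 1.0  # Dirichlet smoothing parameter (illustrative)
vocab = {w for docs in corpus.values() for d in docs for w in d.split()}

# Stage 1: estimate smoothed word probabilities for each class from the corpus
# (maximum a posteriori estimates under a symmetric Dirichlet prior).
word_probs = {}
for label, docs in corpus.items():
    counts = Counter(w for d in docs for w in d.split())
    total = sum(counts.values())
    word_probs[label] = {w: (counts[w] + alpha) / (total + alpha * len(vocab))
                         for w in vocab}

# Stage 2: answer the question at hand (here, classification) for a new
# document via probabilistic inference, assuming equal class priors.
def classify(doc):
    scores = {label: sum(math.log(probs[w]) for w in doc.split() if w in vocab)
              for label, probs in word_probs.items()}
    return max(scores, key=scores.get)

print(classify("the market and the stocks"))  # expected to print "finance"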

schedule


Title: Posterior Consistency in Bayesian Regression Models

Doug Sparks (Sep. 28)

Consistency is among the most fundamental properties that we expect to be satisfied by any reasonable statistical estimator. While Bayesian methods have expanded to cover increasingly diverse types of data, it is often taken for granted that these Bayesian procedures are consistent in the frequentist sense. We will examine the circumstances under which the Bayes estimator is consistent for a wide variety of regression models, including interesting cases where the Bayes estimator is not consistent. Results and concepts from inference and probability will be explained as needed in order to make the topic accessible to all students.
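As a toy illustration of the notion of posterior consistency, not taken from the talk, the following sketch shows the posterior for a normal mean under a conjugate N(0, 1) prior concentrating at the true value as the sample size grows; all parameter values are invented.

# Toy illustration (not from the talk): with a N(0, 1) prior on the mean and
# known sampling variance 1, the posterior is N(n*ybar/(n+1), 1/(n+1)), which
# concentrates at the true mean as n grows.
import numpy as np

rng = np.random.default_rng(0)
true_mu = 2.0
for n in [10, 100, 1000, 10000]:
    y = rng.normal(true_mu, 1.0, size=n)
    post_mean = n * y.mean() / (n + 1)
    post_sd = (1.0 / (n + 1)) ** 0.5
    print(f"n={n:6d}  posterior mean={post_mean:.3f}  posterior sd={post_sd:.4f}")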

schedule


Title: The Receiver Operating Characteristic Curve (A Brief Introduction)

Claudio Fuentes (Oct. 5)

Consider medical tests with results that are not simply positive or negative, but that are measured on a continuous or ordinal scale. Assume that larger values of the test results, say Y, are more indicative of a disease. Then, the values of Y are needed to make a dichotomous decision, namely, whether the disease is present or not. This problem is fundamental to the evaluation of medical tests and the choice of a suitable threshold for the values of Y is crucial. Implicitly, the choice of a threshold depends on the trade-off that is acceptable between failing to detect a disease and falsely identifying the disease with the test. In the context of the problem, the Receiver Operating Characteristic (ROC) curve is among the best developed statistical tools to describe the range of trade-offs that can be achieved by the test and evaluate its performance.

Although ROC curves are nowadays widely used in medicine and related fields, their origin goes back to the 1950s, and since then they have been extensively used in signal detection theory, among other disciplines. In this talk, I will present a brief introduction to the topic, with some emphasis on a few properties and practical difficulties associated with ROC curves.
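The sketch below, not part of the talk, traces an empirical ROC curve by sweeping the threshold c over the rule "declare disease if Y > c" and recording the resulting true and false positive rates; the simulated test values and their distributions are invented for illustration.

# Empirical ROC curve from simulated test values (invented distributions).
import numpy as np

rng = np.random.default_rng(1)
y_diseased = rng.normal(2.0, 1.0, size=200)   # test values for diseased subjects
y_healthy  = rng.normal(0.0, 1.0, size=200)   # test values for healthy subjects

thresholds = np.sort(np.concatenate([y_diseased, y_healthy]))[::-1]
tpr = [(y_diseased > c).mean() for c in thresholds]  # sensitivity at each cutoff
fpr = [(y_healthy  > c).mean() for c in thresholds]  # 1 - specificity at each cutoff

# Area under the curve by the trapezoidal rule; roughly 0.92 for these settings.
auc = np.trapz(tpr, fpr)
print(f"empirical AUC = {auc:.3f}")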

schedule


Title: Small Area Estimation when Auxiliary Information is Measured with Error

Meixi Guo (Oct. 12)

The problem of small area estimation arises when the sample is not large enough to support direct estimates of adequate precision, so one turns to indirect, model-based estimates for producing small area estimates. When the auxiliary information used in the model is measured with error, which is quite common in practice, the usual small area estimator that ignores the measurement error can be worse than the direct estimator. An interesting paper by Ybarra and Lohr (2008) considered such circumstances but did not provide a second-order unbiased estimator of the MSE of the EBLUP. We propose alternative approaches, such as profile likelihood and integrated likelihood, to develop a second-order unbiased MSE estimator of the EBLUP using a Taylor expansion.
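As a toy illustration of the point that ignoring measurement error can hurt, the following simulation, which is not from the talk, compares a naive shrinkage predictor in a Fay-Herriot-type model (with known variances and regression coefficient) against the direct estimator; all parameter values are invented.

# Toy simulation: a shrinkage predictor that plugs in a covariate measured
# with error can have larger MSE than the direct estimator.
import numpy as np

rng = np.random.default_rng(2)
m, reps = 200, 500                                 # number of areas, Monte Carlo replicates
beta, sig_v2, psi, sig_u2 = 2.0, 1.0, 1.0, 1.0     # illustrative parameter values
gamma = sig_v2 / (sig_v2 + psi)                    # usual shrinkage weight, ignoring the error

mse_direct = mse_naive = 0.0
for _ in range(reps):
    x = rng.normal(0.0, 1.0, m)                              # true covariate
    theta = beta * x + rng.normal(0.0, np.sqrt(sig_v2), m)   # small area means
    y = theta + rng.normal(0.0, np.sqrt(psi), m)             # direct estimates
    w = x + rng.normal(0.0, np.sqrt(sig_u2), m)              # covariate observed with error
    naive = gamma * y + (1 - gamma) * beta * w               # ignores the error in w
    mse_direct += np.mean((y - theta) ** 2) / reps
    mse_naive += np.mean((naive - theta) ** 2) / reps

print(f"MSE direct approx {mse_direct:.2f}, MSE naive approx {mse_naive:.2f}")  # ~1.0 vs ~1.5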

schedule


Title: The Dirichlet Process and a Multivariate Extension

Jeremy Gaskins (Oct. 19)

Bayesian nonparametrics is an important and growing field of statistics, which is concerned with making Bayesian-style inference without making strong distributional assumptions about model parameters. The Dirichlet Process has been the foundation of Bayesian nonparametrics, in part, because it encourages clustering of the parameters under consideration. The first portion of the seminar will introduce (or recall) the Dirichlet Process and a few of its key features. Unfortunately, the Dirichlet Process can have undesirable behavior in a multivariate setting. We will introduce the Matrix Stick-Breaking Process (MSBP), introduced by Dunson, Xue, and Carin (2008), as a remedy. The talk is intended to be accessible to students with no previous experience with the Dirichlet Process.
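For readers new to the topic, here is a minimal sketch, not from the talk, of the stick-breaking construction of a Dirichlet Process draw, which makes the clustering behavior visible because the resulting random measure is discrete; the base measure, concentration parameter, and truncation level are invented.

# Stick-breaking construction of a (truncated) Dirichlet Process draw:
# G = sum_k w_k * delta_{phi_k}, with w_k = beta_k * prod_{j<k} (1 - beta_j),
# beta_k ~ Beta(1, alpha), and atoms phi_k drawn from the base measure.
import numpy as np

rng = np.random.default_rng(3)
alpha, K = 1.0, 50                        # concentration parameter, truncation level
betas = rng.beta(1.0, alpha, size=K)      # stick-breaking fractions
remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
weights = betas * remaining
atoms = rng.normal(0.0, 1.0, size=K)      # draws from a N(0, 1) base measure

# Samples from the truncated DP draw cluster: repeated values appear because
# the random measure is discrete.
samples = rng.choice(atoms, size=20, p=weights / weights.sum())
print(np.round(samples, 3))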

schedule


Title: Bayesian Model Selection For Incomplete Data Using The Posterior Predictive Distribution

Arkendu Chatterjee (Oct. 26)

Model choice is a fundamental and much discussed activity in the analysis of data sets. When several parametric models are under consideration, we need to determine how well they fit the observed data.

The model selection problem involves the distributions of various quantities obtained by considering a probability model for the observables Y conditioned on each model "m" and its parameter vector. We choose the model with the best value of the model selection criterion.

We have explored the use of the posterior predictive loss criterion for model selection with incomplete longitudinal data. We show that a straightforward extension of the Gelfand and Ghosh (1998) criterion to incomplete data introduces an extra term, in addition to the goodness-of-fit term and penalty term, that compromises the criterion. We propose an alternative and explore it via simulations and on a real data set.

Key Words: Posterior Predictive Distribution, DIC, Pattern Mixture Model, Selection Model.
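For orientation, the sketch below computes the Gelfand and Ghosh (1998) posterior predictive loss criterion under squared error loss in the complete-data case, as a goodness-of-fit term plus a penalty term; it is not the authors' implementation, and the posterior predictive draws are fabricated purely for illustration.

# Gelfand-Ghosh criterion under squared error loss, complete data:
# D = G + P, with G = sum_i (y_i - E[y_rep,i | y])^2 and P = sum_i Var(y_rep,i | y),
# both computed from posterior predictive replicates of each observation.
import numpy as np

def gelfand_ghosh(y_obs, y_rep):
    """y_obs: (n,) observed data; y_rep: (S, n) posterior predictive draws."""
    mu = y_rep.mean(axis=0)              # posterior predictive means
    G = np.sum((y_obs - mu) ** 2)        # goodness-of-fit term
    P = np.sum(y_rep.var(axis=0))        # penalty term (predictive variances)
    return G + P

# Illustrative use with fabricated draws: the smaller criterion value wins.
rng = np.random.default_rng(4)
y = rng.normal(0.0, 1.0, size=30)
y_rep_model1 = rng.normal(0.0, 1.0, size=(1000, 30))   # well-specified model
y_rep_model2 = rng.normal(2.0, 3.0, size=(1000, 30))   # poorly fitting model
print(gelfand_ghosh(y, y_rep_model1), gelfand_ghosh(y, y_rep_model2))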

schedule


Title: Asymptotic Variance Evaluations in Discrete Markov Chains

Nabanita Mukherjee (Nov. 2)

Markov chain Monte Carlo (MCMC) methods have become widely used in various statistical applications as well as in theoretical approaches to statistical computing. The motivation for this computer-based simulation method is the possibly intractable nature of the distribution of the quantity of interest. Suppose we are interested in the expected value of a function f of a random variable X whose probability distribution is $\pi$; in many cases, $\pi$ is known only up to a normalizing constant.

For a given $\pi$, there are many Markov chains that preserve the same stationary distribution. So, orderings defined on the set of Markov chains with a specified stationary distribution guide us in choosing one Markov chain over another, in terms of lower asymptotic variance.

We propose different methods of constructing a better Markov chain from a given chain, in terms of the Peskun ordering (Peskun, 1973). Because preserving stationarity while constructing a better chain is very delicate, the Metropolis-Hastings algorithm comes to the rescue. We also propose an algorithm for obtaining the optimal transition matrix that does not require knowledge of the normalizing constant of $\pi$.
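As a minimal illustration of the setting described above, not the speaker's algorithm, the sketch below runs a Metropolis-Hastings chain on a small discrete state space to estimate $E_\pi[f(X)]$ when $\pi$ is known only up to a normalizing constant; the target weights and the function f are invented.

# Metropolis-Hastings on a discrete state space with an unnormalized target:
# the acceptance ratio pi(y)/pi(x) never requires the normalizing constant.
import numpy as np

rng = np.random.default_rng(5)
unnorm_pi = np.array([1.0, 4.0, 9.0, 16.0, 25.0])   # target, up to a constant
f = lambda x: x                                      # function whose mean we want
n_states, n_iter = len(unnorm_pi), 200_000

x = 0
total = 0.0
for _ in range(n_iter):
    y = rng.integers(n_states)                       # symmetric uniform proposal
    if rng.random() < min(1.0, unnorm_pi[y] / unnorm_pi[x]):
        x = y                                        # accept the proposed state
    total += f(x)

exact = np.sum(np.arange(n_states) * unnorm_pi / unnorm_pi.sum())
print(f"MCMC estimate: {total / n_iter:.3f}, exact: {exact:.3f}")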

schedule


seminar main page

 

Past Seminars

Fall 2009
Spring 2010