Response Surface Methods for Multiresponse Experiments
The purpose of this seminar is to provide some coverage of
so-called multiresponse surface methodology, an area of the design and
analysis of experiments concerned with multiresponse experiments. In
such experiments, several response variables can be measured, or
observed, for each setting of a group
variables can be measured, or observed, for each setting of a group
of control variables. Quite often, the responses are correlated. It is
therefore more appropriate to use multivariate techniques to analyze
data from such responses. The seminar will review the following topics:
- Estimation of parameters of a multiresponse model.
- A test for lack of fit of a linear multiresponse model.
- Designs for multiresponse models.
- Multiresponse optimization.
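As a quick, hedged illustration of the first topic above, the sketch below fits a multiresponse (multivariate) linear model Y = XB + E by least squares and estimates the error covariance across the correlated responses. All dimensions and data are simulated and are not tied to any particular experiment.

```python
# Hedged sketch: least-squares estimation of a multiresponse linear model
# Y = X B + E, with correlated errors across the q responses.
# All dimensions and data below are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 3, 2                                   # runs, control variables, responses
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, (n, p))])    # design with intercept
B_true = rng.normal(size=(p + 1, q))                 # hypothetical coefficient matrix
E = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], n) # correlated errors
Y = X @ B_true + E

# Multiresponse least squares solves all q responses at once: B_hat = (X'X)^{-1} X'Y
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ B_hat
Sigma_hat = resid.T @ resid / (n - X.shape[1])       # estimated error covariance
print(np.round(B_hat, 2))
print(np.round(Sigma_hat, 2))
```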
Seminar page
Objective Bayes Variable Selection (or What I Did on My Spanish
Vacation)
A novel fully automatic Bayesian procedure for variable selection in normal
regression models is proposed, along with computational strategies for model
posterior evaluation. A stochastic search algorithm is given, based on the
Metropolis-Hastings Algorithm, that has a stationary distribution
proportional to the model posterior probabilities. The procedure is
illustrated on both simulated and real examples. Seminar page
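As a hedged sketch of the kind of stochastic search described above (not the speaker's exact procedure), the code below runs a Metropolis-Hastings walk over variable-inclusion vectors for a simulated normal regression problem, with a BIC-type score standing in for the objective-Bayes model posterior.

```python
# Hedged sketch, not the speaker's exact procedure: Metropolis-Hastings search
# over variable-inclusion vectors for a simulated normal regression problem.
# A BIC-type score stands in for the objective-Bayes model posterior.
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 8
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 3] + rng.normal(size=n)       # only variables 0 and 3 matter

def log_model_score(gamma):
    """Approximate log posterior score of the model indexed by gamma (BIC surrogate)."""
    k = int(gamma.sum())
    Xg = np.column_stack([np.ones(n), X[:, gamma.astype(bool)]])
    beta, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    rss = np.sum((y - Xg @ beta) ** 2)
    return -0.5 * (n * np.log(rss / n) + (k + 1) * np.log(n))

gamma = np.zeros(p, dtype=int)                       # start from the null model
score = log_model_score(gamma)
visits = {}
for _ in range(5000):
    j = rng.integers(p)                              # propose flipping one indicator
    prop = gamma.copy()
    prop[j] = 1 - prop[j]
    prop_score = log_model_score(prop)
    if np.log(rng.uniform()) < prop_score - score:   # symmetric proposal, so a simple MH ratio
        gamma, score = prop, prop_score
    visits[tuple(gamma)] = visits.get(tuple(gamma), 0) + 1

print("most visited model:", max(visits, key=visits.get))
```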
Statistical analysis of a Telephone Call Center: A Queueing Science Perspective
A call center is a service network in which agents provide telephone-based services. Customers that seek such services may be delayed in tele-queues, which are invisible to them. The talk summarizes an analysis of a unique record of call center operations. The data comprise a complete operational history of a small banking call center, call by call, over a full year. Taking the perspective of queueing theory, we decompose the service process into three fundamental components: arrivals, waiting times, and service durations. Each component involves different basic mathematical structures and requires a different style of statistical analysis. Some of the key results will be sketched, along with descriptions of the varied techniques required. In conclusion we survey how the characteristics deduced from the statistical analyses form the building blocks for theoretically interesting and practically useful mathematical models for call center operations. This reports on joint work with Larry Brown, Linda Zhao and Noah Gans from Wharton, and Avishai Mandelbaum, Anat Sakov and Sergey Zeltyn from Technion (Israel). Seminar page
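The three-way decomposition described in the abstract above can be made concrete with a minimal sketch: given a call-by-call log (the column names here are hypothetical, not those of the actual banking call-center record), the arrival, waiting, and service components are extracted and summarized separately.

```python
# Minimal sketch of the three-way decomposition; column names are hypothetical,
# not those of the actual call-center record.
import pandas as pd

log = pd.DataFrame({
    "arrival":       [0.0, 1.2, 1.9, 3.4, 5.0],      # minutes since opening
    "service_start": [0.0, 1.5, 2.8, 3.4, 5.6],
    "service_end":   [1.4, 2.7, 3.3, 5.5, 6.9],
})

interarrival = log["arrival"].diff().dropna()         # arrival process
waiting = log["service_start"] - log["arrival"]       # tele-queue delay
service = log["service_end"] - log["service_start"]   # service durations

print("mean interarrival time:", interarrival.mean())
print("mean waiting time:     ", waiting.mean())
print("mean service duration: ", service.mean())
```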
Microarray Classification: Support Vector Machine, Kernel
Logistic Regression and Import Vector Machine
In the first part of the talk, I will talk about microarray classification.
Classification of patient samples is an important aspect of cancer diagnosis
and treatment. We propose a simple model, penalized logistic regression
(PLR), for the microarray cancer diagnosis problem. A fast algorithm for
solving PLR is described. Often a primary goal in microarray cancer
diagnosis is to identify the genes responsible for the classification,
rather than class prediction. We consider two gene selection methods used
in the literature, univariate ranking (UR) and recursive feature elimination
(RFE). Empirical results indicate that PLR combined with RFE tends to
select fewer genes than other methods and also performs well in both
cross-validation and test samples.
In the second part of the talk, I will talk about the support vector
machine, kernel logistic regression and import vector machine. The support
vector machine (SVM) is known for its good performance in binary
classification, but its extension to multi-class classification is still an
on-going research issue. In this talk, we propose a new approach for
classification, called the import vector machine (IVM), which is built on
kernel logistic regression (KLR). We show that the IVM not only performs as
well as the SVM in binary classification, but also can naturally be
generalized to the multi-class case. Furthermore, the IVM provides an
estimate of the underlying probability. Similar to the ``support points''
of the SVM, the IVM model uses only a fraction of the training data to index
kernel basis functions, typically a much smaller fraction than the SVM.
This gives the IVM a computational advantage over the SVM, especially when
the size of the training data set is large. Seminar page
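As a rough, hedged illustration of the first part of the talk above, the sketch below combines L2-penalized (ridge) logistic regression with recursive feature elimination, using scikit-learn in place of the authors' own PLR algorithm; the "gene expression" matrix is simulated.

```python
# Hedged sketch: ridge-penalized logistic regression plus recursive feature
# elimination, with scikit-learn standing in for the authors' PLR algorithm.
# The "gene expression" matrix below is simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

rng = np.random.default_rng(2)
n, p = 60, 200                                        # samples x genes
X = rng.normal(size=(n, p))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=n) > 0).astype(int)

plr = LogisticRegression(penalty="l2", C=0.1, max_iter=1000)  # penalized logistic regression
selector = RFE(plr, n_features_to_select=10, step=0.1)        # drop 10% of genes per round
selector.fit(X, y)

print("genes retained by PLR + RFE:", np.flatnonzero(selector.support_))
```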
An Overview of Benford's Law with Applications to Auditing
Benford's law proposes a distribution of digits, most notably first digits, in
measurements that span many orders of magnitude. Auditors have begun using Benford's law as part of
fraud detection schemes in a variety of settings. It is well known, however, that Benford's law does
not apply under certain conditions, such as when the data are all of the same order of magnitude. In this
presentation we give an overview of Benford's law and some ways to use it as a teaching tool. We
discuss some diagnostic procedures for deciding when Benford's law should apply, and the use of these
diagnostics in practice by auditors. Seminar page
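For orientation, the sketch below computes the Benford first-digit probabilities, P(d) = log10(1 + 1/d), and runs the kind of simple chi-square check an auditor might use; the "amounts" are simulated, not audit data.

```python
# Illustration only: Benford first-digit probabilities and a simple chi-square
# check of the kind an auditor might run; the "amounts" are simulated.
import numpy as np
from scipy import stats

digits = np.arange(1, 10)
benford = np.log10(1 + 1 / digits)                    # P(first digit = d) = log10(1 + 1/d)

rng = np.random.default_rng(3)
amounts = rng.lognormal(mean=5, sigma=2, size=2000)   # spans many orders of magnitude
first = (amounts / 10 ** np.floor(np.log10(amounts))).astype(int)   # first significant digit
observed = np.array([(first == d).sum() for d in digits])

chi2, pval = stats.chisquare(observed, f_exp=benford * observed.sum())
print("chi-square:", round(chi2, 1), " p-value:", round(pval, 3))
```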
Misspecification Error in Missing Data Models
When a statistical model is incorrect, the MLE is inconsistent,
converging to the minimizer $\theta^*$ of Kullback-Leibler
information. Any difference between the density $f_{\theta^*}$ and
the true density $g$ is error due to model misspecification. We
propose a Monte Carlo method to find $\theta^*$ when there are missing
data and the observed-data likelihood does not have a closed form. The
motivating example comes from models for mutation accumulation data in
statistical genetics.
We prove consistency and asymptotic normality of the Monte Carlo
estimate of $\theta^*$. The method involves generating two samples,
the first for observed data from the true density and the second for
missing data from an importance sampling density. The entire second
sample is used with each member of the first sample. We show that
this results in an asymptotic variance for the estimate smaller than
that obtained by using the first sample only once.
If nature, instead of a computer, generates the first sample, then our
estimate is a Monte Carlo approximation to the MLE. Now its
asymptotic variance reflects sampling variability of the first sample
and Monte Carlo variability of the second sample. Seminar page
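A toy, hedged version of the scheme sketched above is given below: the observed-data likelihood of a simple missing-data model is estimated by importance sampling, with the entire missing-data sample reused for every observed point, and the resulting Monte Carlo log-likelihood is maximized to approximate theta*. The toy model is illustrative only and is not the mutation accumulation model from the talk.

```python
# Toy, hedged version of the Monte Carlo scheme: the model below is illustrative,
# not the mutation accumulation model from the talk.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(4)

# First sample: "observed" data from a true density g that is deliberately
# outside the model family, so the target is theta*, the KL minimizer.
x = rng.standard_t(df=3, size=500) + 1.0

# Second sample: missing data Z drawn once from an importance density h = N(0, 2^2).
z = rng.normal(0.0, 2.0, size=2000)
h_z = stats.norm.pdf(z, 0.0, 2.0)

def mc_neg_loglik(theta):
    # Model: Z ~ N(theta, 1), X | Z ~ N(Z, 1).  The observed-data density
    # f_theta(x) = E_h[ f_theta(x, Z) / h(Z) ] is estimated with the whole
    # z-sample for every observed x_i.
    fz = stats.norm.pdf(z, theta, 1.0)                 # f_theta(z_j)
    fxz = stats.norm.pdf(x[:, None], z[None, :], 1.0)  # f_theta(x_i | z_j)
    fhat = (fxz * (fz / h_z)).mean(axis=1)             # one estimate per x_i
    return -np.mean(np.log(fhat))

res = optimize.minimize_scalar(mc_neg_loglik, bounds=(-5, 5), method="bounded")
print("Monte Carlo estimate of theta*:", round(res.x, 3))   # should land near E_g[X] = 1
```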
Distance-based Model-selection with Application to the Analysis of
Gene Expression Data
Multivariate mixture models provide well known and widely used methods
for density estimation, model-based clustering, and explanations for the
data generation process. However, the problem of choosing the number of
components of a mixture model in a statistically meaningful way is still a
subject of considerable research. I introduce several rules for selecting
a finite mixture model, and hence estimating the number of components,
using quadratic distance functions. In one approach, the goal is to find
the minimal number of components that are needed to adequately describe
the true distribution, where the decision is based on a nonparametric
confidence set for the true distribution. Two alternative approaches for
estimating the number of components are based on density concordance and
risk analysis. In the density concordance approach the resulting
concordance curves have properties which determine the amount of
variability (in the empirical density) explained by the model. In the risk
analysis approach to model selection, I demonstrate how the distance can
be decomposed into two parts pertaining to (1) the lack of fit of the
model and (2) the cost of parameter estimation. Applications of my methods
to the analysis of gene expression data will be presented during the talk. Seminar page
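The quadratic-distance rules from the talk are not reproduced here, but the underlying selection problem can be illustrated with a rough stand-in: fit finite Gaussian mixtures with an increasing number of components and compare them with a standard criterion (BIC). The data below are simulated, not gene expression values.

```python
# Rough stand-in for the selection problem (the quadratic-distance rules are
# not implemented here): fit Gaussian mixtures with k = 1..5 components and
# compare a standard criterion.  Data are simulated, not gene expression values.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(3, 1, (200, 2))])

for k in range(1, 6):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    print(f"k = {k}: BIC = {gm.bic(X):.1f}")
# The k minimizing the criterion is the estimated number of components (here 2).
```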
Rotation of Principal Components: A Penalized Likelihood Approach
Principal component analysis remains a standard tool for exploratory
multivariate analysis, recently finding renewed use in exploration of
functional data. To facilitate interpretation of individual components,
investigators in many applied sciences sometimes choose to perform
a rotation on selected groups of components ---
usually a subspace-preserving orthogonal transformation of the component
direction vectors that brings them into closer alignment
with an easily interpretable orthogonal basis for the space.
Until recently, principal component rotation has
not received much attention from statisticians, perhaps because
it apparently lacks a formal statistical foundation. This talk will introduce
a new framework for rotation via maximizing a penalized profile likelihood
based on the multivariate Gaussian case. Likelihood provides an appropriate
quantification of component ill-definedness, while rotation criteria
like those used in factor analysis can serve as penalty functions for
encouraging component interpretability. A single penalty parameter smoothly
controls the degree of rotation, from the original principal components
to components fully aligned with the axes, with ill-defined components having
the greatest susceptibility to rotation. The connection between
likelihood and approximate confidence regions provides a new way to
measure the
degree to which rotated components are consistent with the data.
Although the problem of maximizing the penalized likelihood is generally
not analytically tractable, numerical solutions can be computed efficiently to
any level of accuracy with the assistance of
some recently introduced algorithms designed for
orthogonality-constrained optimization. Seminar page
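The penalized profile likelihood framework itself is not implemented here, but the rotation-criterion ingredient it borrows from factor analysis can be illustrated: the sketch below applies a standard varimax rotation to a selected pair of principal-component loading vectors from simulated data.

```python
# Illustration of the rotation-criterion ingredient only; the penalized profile
# likelihood framework from the talk is not implemented here.
import numpy as np

def varimax(L, n_iter=100, tol=1e-8):
    """Orthogonally rotate the loading matrix L toward varimax-simple structure."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(n_iter):
        LR = L @ R
        u, s, vt = np.linalg.svd(L.T @ (LR ** 3 - LR @ np.diag((LR ** 2).sum(0)) / p))
        R = u @ vt
        if s.sum() < d * (1 + tol):
            break
        d = s.sum()
    return L @ R

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))   # correlated variables
X -= X.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
loadings = Vt[:2].T                                       # two component direction vectors
print(np.round(varimax(loadings), 2))                     # rotated toward interpretability
```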
Inverse Gaussian and Gaussian Analogies
The Inverse Gaussian (IG) distribution is potentially more
useful in practice than the better known Gaussian distribution. The IG
distribution goes back to Schrödinger (1915), Smoluchowski (1915),
Wald (1947) and Tweedie (1947), whereas the normal distribution can be
traced to De Moivre (1738), about a century before Gauss popularized it.
The two-parameter IG distribution, which arose out of the analysis of
Brownian motion, is ideally suited for modelling non-negative, positively
skewed data and is now used for analyzing data from fields as diverse as
ecology and the internet. The distribution is intriguingly similar to the
normal distribution in many respects, and the inference methods associated
with it use well-known normal-theory entities such
as t, chi-square, and F tests. In this talk, we discuss the Inverse
Gaussian and Gaussian Analogies with some emerging results. Seminar page
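As a small, hedged illustration of the Gaussian-like inference alluded to above, the sketch below simulates inverse Gaussian data and computes the closed-form maximum likelihood estimates of the mean and shape parameters; scipy's parameterization of the IG is noted in a comment.

```python
# Simulated illustration of inference for the IG distribution:
# closed-form maximum likelihood estimates of the mean (mu) and shape (lambda).
import numpy as np
from scipy import stats

mu, lam = 2.0, 4.0
# scipy parameterizes the IG so that invgauss(mu/lam, scale=lam) has mean mu and shape lam
x = stats.invgauss.rvs(mu / lam, scale=lam, size=500, random_state=7)

mu_hat = x.mean()                                     # MLE of the mean
lam_hat = len(x) / np.sum(1 / x - 1 / mu_hat)         # MLE of the shape parameter
print("mu_hat:", round(mu_hat, 3), " lambda_hat:", round(lam_hat, 3))
```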
Hessian Eigenmaps for Nonlinear Dimensionality Reduction
Suppose I look at many pictures of a face gesturing.
I know that underlying each of those pictures there is a set
of face muscles with a set of 'parameters' controlling the
extension of the muscle. Can I learn, simply from looking at
lots of such pictures and with no other assistance, the parameters
underlying such pictures? I'll discuss two articles in the journal 'Science' that introduced
methods
for analysing databases of articulated images... e.g. many pictures
of a face gesturing or many pictures of a hand gesturing or pictures
of a vehicle from many different positions. The methods, ISOMAP
and Local Linear Embedding, claimed to find the hidden `rule'
(parametrization) lying behind the image database. The methods
form part of the large body of techniques for discovering the structure
of data lying on manifolds in high-dimensional
space. I'll discuss what I view to be weaknesses in our understanding
based on the original articles, along with subsequent research that has
clarified these issues. We now have mathematical
theory showing that certain classes
of image databases (e.g. gesturing cartoon faces) can be
analysed perfectly by an improvement of ISOMAP and LLE
that we call the Hessian Eigenmap, which
I will explain and apply. Joint work with Carrie Grimes. Seminar page
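For readers who want to experiment, scikit-learn ships implementations of both ISOMAP and the Hessian variant of locally linear embedding; the sketch below applies them to a standard synthetic manifold (a swiss roll), not to the image databases discussed in the talk.

```python
# Hands-on illustration on a standard synthetic manifold (a swiss roll), not on
# the image databases from the talk.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1500, random_state=8)

iso = Isomap(n_neighbors=12, n_components=2).fit_transform(X)
hess = LocallyLinearEmbedding(n_neighbors=12, n_components=2,
                              method="hessian").fit_transform(X)
print(iso.shape, hess.shape)   # each 3-d point mapped to 2 recovered parameters
```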
Particulate Air Pollution Mortality Time Series Studies
In recent years, there has been much interest in the health effects of
particulate air pollution. Many time series studies have shown that
increases in particulate air pollution increase the expected mortality
rate. In this talk, I will review how a typical
Particulate Air Pollution Mortality Time Series Study is conducted.
I will then focus on two potential problems of such studies: Mortality
displacement and combining particulate air pollution data from multiple
monitors. Seminar page
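The core regression behind a typical study of this kind can be sketched as a Poisson generalized linear model for daily death counts, with the particulate level and a crude seasonal term as covariates; the data and variable names below are simulated, not taken from any of the studies discussed.

```python
# Sketch of the core regression: daily death counts as Poisson with the
# particulate level and a crude seasonal term as covariates.  Data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
days = np.arange(365)
season = np.column_stack([np.sin(2 * np.pi * days / 365),
                          np.cos(2 * np.pi * days / 365)])
pm10 = 30 + 10 * season[:, 0] + rng.normal(0, 5, 365)           # hypothetical exposure series
deaths = rng.poisson(np.exp(np.log(50) + 0.002 * pm10 + 0.05 * season[:, 0]))

X = sm.add_constant(np.column_stack([pm10, season]))
fit = sm.GLM(deaths, X, family=sm.families.Poisson()).fit()
print("estimated log relative rate per unit PM10:", round(fit.params[1], 4))
```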
Local quasi-likelihood method for generalized random curve models
for longitudinal data
We consider a class of generalized random curve models for
continuous or discrete longitudinal data. The subject-specific smooth
curve is decomposed into two components, a population (fixed) curve
and a subject-specific (random) curve. The local quasi-likelihood
method is developed to fit the proposed models. Our modeling
approach allows us to estimate not only the population curve, but also
the individual curves. We establish asymptotic results for the
resulting estimators, from which inference procedures are derived. The
proposed models and methods are applied to a longitudinal binary data
set from an AIDS clinical study. We also conduct a simulation to study
the finite sample properties of the proposed estimators. Seminar page
Block-dependent thresholding in wavelet regression
Nonparametric regression via wavelets is usually implemented
under the assumptions of dyadic sample size, equally spaced and fixed
sample points, and i.i.d. normal errors. By applying linear
transformations to the data and block thresholding to the discrete wavelet
transform of the data, one can still achieve optimal rates of convergence,
fast computational time, and spatial adaptivity for functions lying in
Hölder spaces even for data that do not satisfy the above three
assumptions. The thresholds are dependent on the varying levels of noise
in each block of wavelet coefficients, rather than on a single estimate of
the noise as is usually done. This block-dependent method is compared
against term-by-term wavelet methods with noise-dependent thresholding via
theoretical asymptotic convergence rates as well as by simulations and
comparisons on a well-known data set. Seminar page
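A simplified, hedged variant of block-dependent thresholding (not the exact estimator analyzed in the talk) is sketched below: wavelet coefficients at each level are split into blocks, each block receives its own noise estimate, and soft thresholding is applied block by block before reconstruction.

```python
# Simplified, hedged variant of block-dependent soft thresholding; not the
# exact estimator from the talk.  The signal and noise below are simulated.
import numpy as np
import pywt

rng = np.random.default_rng(10)
n = 1024                                              # dyadic sample size
t = np.linspace(0, 1, n)
signal = np.sin(8 * np.pi * t) * (t > 0.3)
y = signal + 0.3 * rng.normal(size=n)

coeffs = pywt.wavedec(y, "db4", mode="periodization", level=6)
block_len = 16
for lev in range(1, len(coeffs)):                     # leave the coarse approximation alone
    c = coeffs[lev]
    for start in range(0, len(c), block_len):
        block = c[start:start + block_len]
        sigma = np.median(np.abs(block)) / 0.6745     # block-specific noise estimate
        thr = sigma * np.sqrt(2 * np.log(len(c)))
        c[start:start + block_len] = pywt.threshold(block, thr, mode="soft")

estimate = pywt.waverec(coeffs, "db4", mode="periodization")
print("RMSE:", round(float(np.sqrt(np.mean((estimate - signal) ** 2))), 4))
```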
Beyond the Proportional Hazards Models
Survival data have been studied extensively in the statistical
literature, and a variety of regression models have been developed.
Among these models, the most successful one is the Cox
proportional hazards model. In this talk, we will first discuss
the potential limitations of this model and other available models
in practical data analysis. Then we will present a series of works
on alternative modelling strategies and model selection to cope
with these limitations. Data from an actual randomized clinical trial
will be analyzed throughout the talk to demonstrate the new
methodologies. Seminar page
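For orientation only, the sketch below fits the baseline Cox proportional hazards model that the talk above takes as its starting point, using the lifelines package on a tiny made-up data set (the column names are hypothetical).

```python
# Orientation only: the baseline Cox proportional hazards model, fit with the
# lifelines package on a tiny made-up data set (column names are hypothetical).
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "time":      [5, 8, 12, 3, 9, 15, 7, 11],         # follow-up time
    "event":     [1, 0, 1, 1, 0, 1, 1, 0],            # 1 = event observed, 0 = censored
    "treatment": [0, 0, 1, 0, 1, 1, 0, 1],
    "age":       [61, 55, 47, 70, 64, 52, 58, 49],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()                                    # hazard ratios for treatment and age
```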
Covariate-adjusted spatio-temporal cumulative distribution functions
with application to air pollutant data
We provide a fully hierarchical approach to the modeling of
spatial cumulative distribution functions (SCDFs), using a
Bayesian framework implemented via Markov chain Monte Carlo (MCMC)
methods. The approach generalizes the SCDF to accommodate
block-level variables, possibly utilizing a spatial change of
support model within an MCMC algorithm. We then extend our
approach to allow covariate weighting of the SCDF estimate. We
further generalize the framework to the bivariate random process
setting, which allows simultaneous modeling of both the responses
and the weights. Once again MCMC methods (combined with a
convenient Kronecker structure) enable straightforward estimates
of weighted, bivariate, and conditional SCDFs. A temporal
component is added to our model, again implemented with a
Kronecker product covariance structure, corresponding to separable
correlations. We illustrate our methods with two air pollution
data sets, one concerning ozone exposure and race in Atlanta, GA,
and the other recording both NO and NO_2 ambient levels at 67
monitoring sites in central and southern California. Seminar page
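The quantity being modeled above has a simple empirical counterpart worth keeping in mind: a covariate-weighted spatial CDF is the weighted proportion of monitoring sites whose pollutant value falls at or below each threshold. The sketch below computes that empirical version on made-up data; the hierarchical Bayes and MCMC machinery from the talk is not reproduced.

```python
# Empirical counterpart of a covariate-weighted spatial CDF, on made-up data;
# the hierarchical Bayes / MCMC machinery from the talk is not reproduced.
import numpy as np

rng = np.random.default_rng(11)
ozone = rng.gamma(shape=4, scale=10, size=67)          # pollutant values at 67 sites
weights = rng.uniform(0.5, 1.5, size=67)               # covariate-based site weights
weights = weights / weights.sum()

def weighted_scdf(values, w, thresholds):
    """Weighted proportion of sites at or below each threshold."""
    return np.array([(w * (values <= u)).sum() for u in thresholds])

thresholds = np.linspace(0, ozone.max(), 6)
print(np.round(weighted_scdf(ozone, weights, thresholds), 3))
```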
Uncertainty
"Uncertainty", like its complementary cousin, "information",
is a much used but not very well defined concept despite its
intrinsic role in statistics. (Indeed, the latter is often described
as the "science of uncertainty".)
In this talk, I will explore some of the meanings (from
the manuscript written with Constance van Eeden) that are ascribed to that
term and readily discover that seemingly natural questions can have
answers that are either elusive or counter-intuitive. For example,
surprisingly (in answer to one of those questions), the level of
uncertainty (according to one definition) can actually increase rather
than decrease as the amount of information increases. For other
definitions we have not been able to give general answers to that
question.
I will also address the issue of combining information to reduce
uncertainty. Specifically, I will survey some recent work including
that with Malay Ghosh and Constance van Eeden using the weighted
likelihood in conjunction with samples from populations different from,
but similar to, that under study. That resemblance can lead to very
effective trade-offs of bias for precision when it derives from
structural relations among the various population parameters, for
example, when the difference in the population means may be bounded by a
fixed constant. Seminar page
Statistical Multiplexing: Math and Stat Take Over the Internet
When two hosts communicate over the Internet --- for example, when
a Web page is downloaded from a server to a PC --- the two hosts
set up a connection and a file is broken up into packets that are
transmitted over a path made up of routers connected by transmission
links. An Internet link typically carries the packets of many active
connections. The packets of the different connections are intermingled
on the link; for example, if there are three active connections, the
arrival order of 10 consecutive packets by connection number might be 1,
1, 2, 3, 1, 1, 3, 3, 2, and 3. This intermingling is referred to as
``statistical multiplexing'' in the Internet engineering literature,
and as ``superposition'' in the literature of point processes. True, network devices put the packets on Internet links and do the
multiplexing, but then the mathematical and statistical laws of stochastic
processes take over. Extensive empirical and theoretical studies of
detailed packet data, inter-arrival and size time series, reverse the
commonly-held belief that Internet traffic is everywhere
long-range dependent, or bursty. The magnitude of the statistical
multiplexing has a dramatic effect on the statistical properties of the
time series; as the magnitude increases, the burstiness becomes less and
less significant. The magnitude needs to become part of the fundamental
conceptual framework that guides the study of Internet traffic. These
results have critical implications for Internet engineering. Seminar page
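The superposition effect described above can be illustrated with a toy simulation: merge several bursty packet streams onto one link and watch the variability of the interarrival times fall as the number of multiplexed connections grows. The per-connection traffic model below (shifted Pareto gaps) is purely illustrative, not a model of real Internet traffic.

```python
# Toy simulation of statistical multiplexing: interarrival times on the link
# become less variable as more connections are superposed.  The shifted-Pareto
# per-connection gaps are purely illustrative, not a model of real traffic.
import numpy as np

rng = np.random.default_rng(12)

def superposed_interarrivals(n_connections, n_packets=5000):
    streams = [np.cumsum(rng.pareto(2.5, n_packets) + 0.01)   # bursty per-connection gaps
               for _ in range(n_connections)]
    horizon = min(s[-1] for s in streams)                     # keep the fully multiplexed region
    merged = np.sort(np.concatenate(streams))                 # packets intermingled on the link
    return np.diff(merged[merged <= horizon])

for k in [1, 3, 10, 50]:
    ia = superposed_interarrivals(k)
    print(f"{k:3d} connections: interarrival coefficient of variation = {ia.std() / ia.mean():.2f}")
```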