*Perfect Sampling: Basic Ideas and a Recent Result*

A perfect sampler is an algorithm that allows one to use a Markov
chain with stationary density $\pi$ to make exact (or perfect) draws
from $\pi$. A simple, three-state Markov chain is used to explain the
perfect sampling algorithm called coupling from the past (CFTP) (Propp
& Wilson 1996). Extending CFTP to Markov chains with uncountable
state spaces has proved difficult. One success story is Murdoch &
Green's (1998) multigamma coupler, which is based on the fact that a
minorization condition can be used to represent the Markov transition
density as a two-component mixture. The multigamma coupler is
illustrated using a Markov chain from Diaconis & Freedman (1999). Our
main result is a representation of $\pi$ as an infinite mixture that
is based on a minorization condition. When the minorization condition
is of a certain type, it is possible to make exact draws from this
mixture and hence from $\pi$. The resulting algorithm turns out to be
equivalent to the multigamma coupler. (This is joint work with
Christian Robert, Université Paris-Dauphine.)
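The three-state illustration lends itself to a short sketch. The transition matrix below is an assumption (the chain used in the talk is not specified), but the loop is the standard Propp & Wilson construction: run one chain from every starting state, drive them all with the same randomness, and look further and further into the past until they coalesce by time 0.

```python
import random

# Hypothetical three-state transition matrix (each row sums to 1); the
# chain used in the talk is not specified, so this is an assumption.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

def step(state, u):
    """Advance one step by inverting the row CDF with a uniform u.
    Feeding the SAME u to every copy of the chain is the coupling."""
    cum = 0.0
    for j, p in enumerate(P[state]):
        cum += p
        if u < cum:
            return j
    return len(P[state]) - 1

def cftp():
    """Coupling from the past: extend the look-back, reusing the old
    randomness, until chains started from ALL states at time -T have
    coalesced by time 0; the common state at time 0 is an exact draw
    from the stationary distribution."""
    us = []            # us[k] drives the step taken at time -(k+1)
    T = 1
    while True:
        while len(us) < T:
            us.append(random.random())
        states = list(range(len(P)))     # one chain per starting state
        for t in range(T - 1, -1, -1):   # times -T, ..., -1
            states = [step(s, us[t]) for s in states]
        if len(set(states)) == 1:
            return states[0]
        T *= 2         # not coalesced: double the look-back and retry
```

Note that the randomness near time 0 is held fixed across iterations; only fresh uniforms for the earlier past are appended, which is what makes the output exact rather than approximate.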

*Web-Based Clinical Research*

There are a number of well-known issues in the conduct of medical
research that lead to inefficiencies and loss of accuracy in clinical
research data. This talk will summarize some of the current problems in
clinical trials research and the solutions proposed through web-based
technology. A UF-developed web-based clinical research system will be
described. The UF system is being used to conduct the world's largest
and longest-running web-based clinical trial to date: INVEST, a Phase IV
clinical trial in 22,599 hypertension subjects, conducted at 870 primary
care medical sites in nine countries by the Divisions of Cardiology and
Biostatistics. This talk will illustrate how clinical research will be
transformed by the process change made possible through web-based
technology.

*Assessing the Performance of Burg Algorithms in Fitting Multivariate Subset Autoregressions*

We present three new algorithms that extend Burg's
original method for the recursive fitting of univariate autoregressions
on a full set of lags to multivariate modeling on a
subset of lags. The algorithms differ only in the manner in which the
reflection coefficients are computed, one such adjustment leading to
the well-known Yule-Walker method. Using simulated data, we show
that two of these algorithms tend to be superior performers for their
respective fitted models, by averaging higher likelihoods with smaller
variability across a large number of realizations. To better evaluate
this difference in performance, we compare saddlepoint approximations to the distributions of the Yule-Walker and
Burg estimators in a simple univariate setting. In this context, each
estimator can be written as a ratio of quadratic forms in normal
random variables. The smaller bias and variance seen in the distribution of
Burg, particularly at low sample sizes and when the
autoregressive coefficient is close to $\pm 1$, agrees with its
tendency to give higher likelihoods. We speculate that its enhanced
performance may be connected to the discovery that the Burg estimator
for the white noise variance of the process coincides, in many cases,
with that obtained by maximizing the likelihood.
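In the simple univariate AR(1) setting referenced above, the two estimators can be written down directly. The sketch below is an illustration of that special case, not the talk's multivariate subset algorithms: Yule-Walker divides the lag-1 cross-product by the full sum of squares, while Burg uses the sum of forward and backward prediction-error terms in the denominator.

```python
import random

def simulate_ar1(phi, n, sigma=1.0, burn=100):
    """Generate n observations from a stationary AR(1) process."""
    x, out = 0.0, []
    for t in range(n + burn):
        x = phi * x + random.gauss(0.0, sigma)
        if t >= burn:
            out.append(x)
    return out

def yule_walker_ar1(x):
    """Yule-Walker: lag-1 cross-product over the full sum of squares."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(v * v for v in x)
    return num / den

def burg_ar1(x):
    """Burg: minimizes the sum of forward and backward prediction
    errors; the AM-GM bound keeps the estimate inside [-1, 1]."""
    num = 2.0 * sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t] ** 2 + x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den
```

Because the Burg denominator omits the endpoint terms x_1^2 and x_n^2, |Burg| >= |Yule-Walker| on every realization, which is one elementary way to see the reduced shrinkage toward zero when the coefficient is near ±1.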

*Dealing with Discreteness: Making `Exact' Confidence Intervals for
Proportions, Differences of Proportions, and Odds Ratios More Exact*

`Exact' methods for categorical data are exact in terms of using
probability distributions that do not depend on unknown parameters.
However, they are conservative inferentially, having actual error
probabilities for tests and confidence intervals that are bounded
above by the nominal level. We examine the conservatism for interval
estimation and suggest ways of reducing it. We illustrate for several
parameters of interest with contingency tables, including the binomial
parameter, the difference between two binomial parameters, the odds
ratio and relative risk in a $2\times 2$ table, and the common odds
ratio for several such tables. Less conservative behavior results
from devices such as (1) inverting tests using statistics that are
"less discrete," (2) inverting a single two-sided test rather than two
separate one-sided tests of at least half the nominal level each, (3)
using unconditional rather than conditional methods (where
appropriate) and (4) inverting tests using alternative P-values. We
also summarize simple ways of adjusting standard large-sample methods
to improve dramatically their small-sample performance.
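For the binomial parameter, one of the devices above can be sketched directly. The code below is an illustrative grid inversion, not the authors' implementation: it inverts two one-sided binomial tests at half the nominal level each, either with the ordinary exact P-value (giving the Clopper-Pearson interval) or with a mid-P-value that counts only half the probability of the observed count, which yields a shorter, less conservative interval.

```python
from math import comb

def tail_probs(x, n, p):
    """Return (P[X < x], P[X = x], P[X > x]) for X ~ Binomial(n, p)."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    return sum(pmf[:x]), pmf[x], sum(pmf[x + 1:])

def interval(x, n, level=0.95, midp=False, grid=2000):
    """Invert two one-sided binomial tests at level alpha/2 each over a
    grid of p values.  midp=False gives the Clopper-Pearson `exact'
    interval; midp=True uses the mid-P-value instead."""
    alpha = 1.0 - level
    keep = []
    for i in range(grid + 1):
        p = i / grid
        below, at, above = tail_probs(x, n, p)
        w = 0.5 * at if midp else at
        # p stays in the interval if neither one-sided test rejects
        if below + w >= alpha / 2 and above + w >= alpha / 2:
            keep.append(p)
    return min(keep), max(keep)
```

Since the mid-P condition is strictly harder to satisfy, the mid-P interval is always nested inside the Clopper-Pearson interval, which is the sense in which it is less conservative.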

*Comparison of Designs for Generalized Linear Models Using
Quantile Dispersion Graphs*

Designs for generalized linear models depend on the unknown
parameters of the fitted model. The use of any design optimality
criterion would therefore require some prior knowledge of the
parameters. In this talk, a graphical technique is proposed for
comparing and evaluating designs for a logistic regression model using
the so-called quantile dispersion graphs of the scaled mean squared
error of prediction. These plots depict the dependence of a given
design on the model's parameters. They also provide a comprehensive
assessment of the overall prediction capability of the design within
the region of interest. Some examples will be presented to illustrate
the proposed methodology.

*Characterizing Classes of Antiretroviral Therapies by Genotype*

The research presented in this talk is intended to establish a
framework for understanding the complex relationships between HIV-1
genotypic markers of resistance to antiretroviral drugs and clinical
measures of disease progression. Antiretroviral therapies have
demonstrated a powerful ability to lower the level of HIV-1 in plasma
and delay the onset of clinical disease and death. Unfortunately,
resistance to these therapies is often rapidly acquired, reducing or
eliminating their usefulness. Making decisions about the next best
treatment for a patient will inevitably depend on the specific
genotypic and phenotypic characteristics of the infecting viral
population. A new classification scheme based on the probabilities of
how new patients will respond to therapy given the available data is
proposed as a method for distinguishing among groups of viral
sequences. This approach draws from existing cluster analysis,
discriminant analysis and recursive partitioning techniques and
requires a model relating genotypic characteristics to phenotypic
response. A dataset of 2746 sequences and the corresponding Indinavir
and Nelfinavir $IC_{50}$s are described and used for illustrative
purposes.

*Nonlinear Path Models with Continuous or Dichotomous Variables*

Path models are useful for describing inter-relationships among causally
ordered random variables. Because the sequence of variables is assumed
to be causally ordered, each variable can have a direct effect on any
subsequent variable in the chain, an indirect effect through its
influence on intermediate variables within the causal chain, or both.
Classical approaches to analyzing path models allow only linear
equations with continuous variables. In this work, the traditional
methodology for studying and analyzing linear path models with
continuous variables is extended. Methodology to analyze path models
with nonlinear relationships is developed, as well as methodology for
models containing dichotomous variables. By extending classical
methodology, we develop a ``Calculus of Effects'' which is applicable to
a broader scope of models than the traditional ``Calculus of
Coefficients''. An application to path models in the field of maternal
and child health is included.
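The classical linear "Calculus of Coefficients" that the talk generalizes can be stated in a few lines. The sketch below uses made-up coefficients for a three-variable chain X → M → Y with an additional direct path X → Y: the indirect effect is the product of the coefficients along the chain, and the total effect is direct plus indirect.

```python
# Three causally ordered variables X -> M -> Y, plus a direct path
# X -> Y.  All coefficients are made-up illustrative values.
b_xm = 0.5   # path coefficient X -> M
b_my = 0.8   # path coefficient M -> Y
b_xy = 0.3   # direct path X -> Y

# Classical calculus of coefficients for linear path models:
# the indirect effect multiplies coefficients along the chain,
# and the total effect is direct + indirect.
indirect_effect = b_xm * b_my        # 0.5 * 0.8 = 0.4
total_effect = b_xy + indirect_effect
```

With nonlinear equations or dichotomous variables these product-and-sum rules no longer apply as stated, which is the gap the talk's "Calculus of Effects" is built to fill.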

*Data Mining Techniques for Mortality at Advanced Age*

This paper addresses issues and techniques for studying advanced-age
mortality using data mining, a new technology on the horizon with
great actuarial potential. Data mining is an information discovery
process that includes data acquisition, data integration, data
exploration, model building, and model validation. Both expert opinion
and information discovery techniques are integrated to guide each step
in the information discovery process. Seven factors were considered in
this study, and the influences of these factors on the advanced-age
mortality distribution were identified with exploratory data analysis
and a decision tree algorithm. Models to address their effects on
advanced-age mortality were built with logistic regression. These
models are then used for projecting the advanced-age mortality
distribution.

*Functional Mapping of Quantitative Trait Loci Affecting Growth Trajectories*

Growth trajectories, morphological shapes, and norms of reaction are
regarded as infinite-dimensional characters in which the phenotype of an
individual is described by a function, rather than by a finite set of
measurements. We present an innovative statistical strategy for mapping
quantitative trait loci (QTL) underlying infinite-dimensional characters.
This strategy, termed functional mapping, integrates mathematical
relationships of different traits or variables within the statistical
mapping framework. Logistic mapping presented in this talk can be viewed as
an example of functional mapping. Logistic mapping is based on a universal
biological law that growth for each and every living organism follows a
logistic or S-shaped curve with time. A maximum likelihood approach based on
a logistic-mixture model, implemented with the EM algorithm, is developed to
provide the estimates of QTL positions, QTL effects and other model
parameters responsible for growth trajectories. Although logistic mapping is
statistically simple, it displays tremendous potential to increase the power
of QTL detection, the precision of parameter estimation and the resolution
of QTL localization due to the pleiotropic effect of a QTL on growth and/or
residual correlations of growth at different ages. More importantly,
logistic mapping allows for the testing of numerous biologically important
hypotheses concerning the genetic basis of quantitative variation, thus
gaining an insight into the critical role of development in shaping plant
and animal evolution and domestication. The power of logistic mapping is
demonstrated by an example of a forest tree, in which a number of
QTL affecting stem growth processes are detected. The advantages of
functional mapping are discussed.
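The "universal biological law" underlying logistic mapping is a single curve with three interpretable parameters. The sketch below shows that building block and a crude least-squares fit by grid search; this is an illustration only, not the talk's method, which embeds the curve in a QTL mixture likelihood maximized by the EM algorithm.

```python
import math

def logistic(t, a, b, r):
    """Logistic growth law g(t) = a / (1 + b e^{-rt}): asymptotic size
    a, intrinsic growth rate r, and b setting the initial size."""
    return a / (1.0 + b * math.exp(-r * t))

def fit_logistic(ts, ys, a_grid, b_grid, r_grid):
    """Least-squares fit of the logistic curve by brute-force grid
    search over candidate parameter values (illustration only)."""
    best = None
    for a in a_grid:
        for b in b_grid:
            for r in r_grid:
                sse = sum((y - logistic(t, a, b, r)) ** 2
                          for t, y in zip(ts, ys))
                if best is None or sse < best[0]:
                    best = (sse, a, b, r)
    return best[1:]
```

Fitting the same curve separately within each QTL genotype class, and testing equality of the fitted (a, b, r), is the flavor of hypothesis that functional mapping makes available.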

*A Latent Class Model Analysis of Case Ascertainment*

The Florida Legislature mandated and funded development of the Florida Birth Defects Registry (FBDR) in 1997. A consortium of state universities planned and built the registry in 1998-99, under contract with the State Department of Health, Bureau of Environmental Epidemiology. The purpose of the registry was to facilitate detection, investigation, and prevention of birth defects in Florida. The registry was based on retrospective surveillance of four statewide databases: Birth Vital Statistics (BVS), the birth hospitalization discharge database of the Agency for Health Care Administration (AHCA), the Children's Medical Services (CMS) Early Intervention Program (EIP) data system, and the CMS Regionalized Perinatal Intensive Care Centers' (RPICC) data system. Each of these source datasets was searched for diagnostic and/or procedure codes that identified children with birth defects. Cases ascertained in this way were accumulated to form the FBDR. Two goals of the consortium were to investigate the accuracy of case ascertainment by each source, and overall, and to estimate the prevalence of birth defects in Florida. Unfortunately, each ascertainment was subject to error and the true status of each child was unknown, thereby complicating the estimation of prevalence and of the accuracy parameters: sensitivity, specificity, positive predictive value, and negative predictive value. In the absence of a perfect indicator of birth defects, latent class model analysis was used to estimate these parameters based on inter-agreement among the imperfect indicators obtained from each source database. The results showed that only the AHCA dataset had high sensitivity and specificity, 0.82 and 0.96 respectively. BVS, EIP, and RPICC had high specificities, 0.99, 0.99, and 0.99, but low sensitivities, 0.30, 0.15, and 0.16 respectively. Overall, the FBDR surveillance system had an estimated 91% sensitivity and 96% specificity for ascertaining cases correctly. 
These estimates were validated in a separate small-scale study in which the true status of each child was known. The estimated prevalence of birth defects was 2%. Based on the results of the validation study, this appeared to be an underestimate of true prevalence. This talk will focus on the statistical issues faced in this and related situations where several imperfect diagnostic tests are used in the absence of a true indicator of disease status. Keywords are: Diagnostic tests, errors in variables, latent class model analysis, conditional independence, estimated generalized non-linear least squares estimation, Bayes' rule.
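The Bayes' rule calculation listed in the keywords can be made concrete: given a source's sensitivity and specificity and the population prevalence, the positive predictive value follows directly. The sketch below plugs in the AHCA figures and the 2% prevalence estimate quoted above.

```python
def ppv(sens, spec, prev):
    """Positive predictive value via Bayes' rule:
    P(case | flagged) = sens*prev / (sens*prev + (1-spec)*(1-prev))."""
    tp = sens * prev                      # flagged true cases
    fp = (1.0 - spec) * (1.0 - prev)      # flagged non-cases
    return tp / (tp + fp)

# AHCA figures from the abstract: sensitivity 0.82, specificity 0.96,
# estimated prevalence 2%.
ahca_ppv = ppv(0.82, 0.96, 0.02)
```

Even with 96% specificity, the low prevalence drags the PPV below one in three, which is why prevalence estimation and accuracy estimation are so entangled in this problem.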

*CyberStats, an Online Introductory Statistics Course*

Alex is the developer of CyberStats and will use it to
demonstrate Web-based pedagogy in statistics and course
management. CyberStats contains over 500 active simulations and
calculations and hundreds of immediate-feedback practice
items. NSF-supported CyberStats 2.0 reflects extensive and successful
classroom use and is equally applicable to on-campus and to distance
learning courses.

*Intrinsic Priors in Problems with a Change-Point*

The Bayesian formulation of the changepoint problem involves priors for
discrete and continuous parameters. When the prior information is
vague, a default Bayesian analysis may be useful, but it presents some
difficulties that can be solved with the use of intrinsic priors. In
this paper a default Bayesian model selection approach is taken to the
problem of making inferences about the point in a sequence of random
variables at which the underlying distribution changes. Inferences are
based on the posterior probabilities of the possible changepoints.
However, these probabilities depend on Bayes factors, and improper
default priors for the parameters leave the Bayes factors defined only
up to a multiplicative constant. To overcome that difficulty, intrinsic
priors arising from the conventional priors are considered. With
intrinsic priors, the posterior distribution of the changepoint and the
size of the change can be computed. The results are applied to some
common sampling distributions, and illustrations on some much-studied
datasets are given.
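When the pre- and post-change densities are fully specified there is no improper-prior difficulty, and the posterior over the changepoint is elementary. The sketch below covers only that easy case (the talk's contribution is precisely the intrinsic-prior machinery needed when the parameters are unknown): a uniform prior on the changepoint k, with likelihoods normalized in log space.

```python
import math

def changepoint_posterior(xs, f0, f1):
    """Posterior over the changepoint k (observations 1..k follow
    density f0, the rest follow f1) under a uniform prior on k,
    for FULLY SPECIFIED densities f0 and f1."""
    n = len(xs)
    loglik = []
    for k in range(1, n):
        ll = (sum(math.log(f0(x)) for x in xs[:k])
              + sum(math.log(f1(x)) for x in xs[k:]))
        loglik.append(ll)
    m = max(loglik)                        # normalize in log space
    w = [math.exp(l - m) for l in loglik]
    s = sum(w)
    return [v / s for v in w]              # entry i is P(k = i+1 | data)
```

Replacing f0 and f1 with parametric families and improper default priors is exactly where the arbitrary multiplicative constant enters the Bayes factors, motivating the intrinsic priors of the talk.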

*Empirical and Generalized Bayes Ridge Regression Estimators with Minimaxity and Stability*

I will talk about the classical problem of estimating the regression
parameters in a multiple linear regression model when multicollinearity
is present. The least squares estimator (LSE) is unstable, and one
candidate for a stabilized procedure is the ridge regression estimator
with parameter k, which, however, is not minimax. The choice of k is
also arbitrary, so k may be estimated from the data. However, it is
known that such adaptive ridge regression estimators do not satisfy the
conditions for minimaxity under squared loss in the multicollinearity
case (Casella, 1980). In this talk, I will employ a weighted squared
loss suggested by Strawderman (1978) instead of the usual squared loss,
and derive conditions for adaptive ridge regression estimators to be
better than the LSE, namely, minimax. In particular, the empirical
Bayes estimator that estimates the parameter k by the root of the
marginal likelihood equation is shown to satisfy minimaxity and
stability in the multicollinearity case and to have very good risk
performance even under the usual squared loss. The usefulness of the
empirical Bayes estimator will also be illustrated through an example.
As another candidate with stability, I will present the generalized
Bayes estimator against a natural prior and give conditions for its
minimaxity. Hence admissible, minimax, and stabilized estimators can be
provided.
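For a two-predictor design the ridge estimator can be written out by hand. The sketch below shows generic ridge shrinkage, not the talk's empirical or generalized Bayes choice of k: adding k to the diagonal of X'X stabilizes the LSE when the columns are nearly collinear.

```python
def ridge_2d(X, y, k):
    """Ridge estimate (X'X + kI)^{-1} X'y for a two-column design
    matrix X (a list of (x1, x2) rows); k = 0 recovers least squares."""
    a = sum(r[0] * r[0] for r in X) + k
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + k
    u = sum(r[0] * t for r, t in zip(X, y))
    v = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b                   # 2x2 inverse by hand
    return ((d * u - b * v) / det, (a * v - b * u) / det)
```

With nearly collinear columns, X'X is nearly singular and the LSE is wildly variable; the Euclidean norm of the ridge estimate decreases monotonically in k, trading bias for stability, which is the effect the adaptive choices of k in the talk aim to exploit without losing minimaxity.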

*Piecewise Gompertz Model on Solving Cure Rate Problem*

In cancer research, only some of the patients are disease-free
after certain treatments. It is of interest to compare treatment
efficacy in terms of long-term survival rates. One commonly used approach
in analyzing this type of data is to compare the Kaplan-Meier
estimates of the cure rate. Another approach is to apply the
mixture model proposed by Farewell. However, the Kaplan-Meier
estimates are unstable toward the end point, while the mixture
model is computationally too complex.
To overcome these difficulties, we propose a test based on a
piecewise Gompertz model to compare drug efficacy in terms of the cure
rate. The proposed test also accommodates the situation where
patients display different hazard patterns during different
treatment stages. In this work, we have derived the strict
concavity of the log-likelihood function and the existence,
consistency, and asymptotic normality of the maximum likelihood
estimates of the parameters. In addition, our Monte Carlo
simulation study shows that the proposed test is more
computationally feasible and more powerful than the test based on
Farewell's mixture model. Moreover, an example is given to show
the utility of our proposed test. A goodness-of-fit test of our
proposed model is also discussed.
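The cure-rate mechanism of a Gompertz model is easy to exhibit in the single-piece case: with hazard h(t) = a e^{bt} and b < 0, the cumulative hazard plateaus at -a/b, so the survival curve levels off at a positive "cured" fraction exp(a/b). The sketch below shows this one-piece building block only; the proposed model joins several such pieces and adds a formal test.

```python
import math

def gompertz_survival(t, a, b):
    """Survival under a Gompertz hazard h(t) = a e^{bt}:
    S(t) = exp(-(a/b)(e^{bt} - 1))."""
    return math.exp(-(a / b) * (math.exp(b * t) - 1.0))

def cure_fraction(a, b):
    """With b < 0 the cumulative hazard plateaus at -a/b, so the
    survival curve levels off at exp(a/b) > 0: the cure rate."""
    assert b < 0
    return math.exp(a / b)
```

Comparing the cure fractions implied by the fitted pieces under two treatments is the kind of long-term comparison the proposed test formalizes, without the instability of the Kaplan-Meier tail.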

*Signal Identification and Forecasting in Nonstationary Time Series Data*

Traditional time series analysis focuses on finding the optimal model to fit the data in a learning period and using this model to make predictions in a future period. However, many practical applications, such as earthquake time series or epileptic brain electroencephalogram (EEG) time series, may contain only a few meaningful, predictable patterns, which can be used to forecast the occurrence of specific events that follow similar patterns. In these cases, a traditional time series model such as the autoregressive ($AR$) model usually gives poor predictions, since the model is constructed to fit the entire learning period while the pattern useful for prediction may occur during only a small portion of it.
The purpose of this research is to provide a statistical algorithm to identify the most predictable pattern in a given time series and to apply this pattern to make predictions.
In this dissertation, we propose the Pattern Match Signal Identification (PMSI) algorithm to identify the most predictable pattern in a given time series. In this algorithm, the concept of the pattern match is used instead of the generally used value match criterion. The most predictable pattern is then identified by the significance of a test statistic. The feasibility of this algorithm is proved analytically and is confirmed by simulation studies. An epileptic brain EEG time series and the well known Wolf's monthly sunspot time series are used as applications of this algorithm.
A forecasting method based on the pattern identified by the PMSI algorithm is introduced. Multivariate regression models are applied to subsequences in the learning period with the most predictable patterns, and these regression equations are used to make predictions in a future period. The performance of this method is compared with that of autoregressive ($AR$) models. The two applications (EEG and sunspot time series) show that the proposed forecasting method gives significantly better predictions than $AR$ models, especially for multi-step-ahead predictions.

*Two Cheers for P-Values*

P-values are a practical success but a critical failure. Scientists the world over use them, but scarcely a statistician can be found to defend them. Bayesians in particular find them ridiculous, but even the modern frequentist has little time for them. The invention of P-values is often mistakenly ascribed to R. A. Fisher, but in fact they are far older, dating back at least as far as Daniel Bernoulli's significance test of 1734 regarding the inclinations of the planetary orbits. The Bayesian Karl Pearson also used them in his famous paper of 1900 on the chi-square goodness-of-fit test, some 25 years before the publication of Fisher's influential Statistical Methods for Research Workers. Recently there has been a growing campaign against their use in medical statistics. The journal Epidemiology has even banned them. Bayesian critics have drawn attention to the fact that a just-significant result has a moderate replication probability, whilst failing to note that this is a desirable and necessary property shared by Bayesian statements. P-values have even been attacked in the popular press. In this talk I shall consider whether there are any grounds for continuing to use this ubiquitous but despised device.