Job Hunting in Academe: Landing on the Tenure Track!
Abstract: I will talk about some strategies for job hunting in academia. The talk should be helpful for students who are about to plunge into the job market next year.
NOTE: Min Lin and Yuehua Cui will also be there and will share their job-hunting experience and tips.
Statistical Functional Mapping for Genetic Control of Programmed Cell Death
Abstract: The development of any organism represents a complex dynamic process which is controlled by a network of genes and environmental factors. Traditional mapping approaches that analyze phenotypic data measured at a single time point are too simple to describe the genetic control of the development process. A general statistical mapping framework, called functional mapping, has been developed for mapping quantitative trait loci (QTL) that underlie variation of a complex dynamic trait. In this study, I extend the principle of functional mapping to study the relationship between gene actions and a physiological cell suicide process, programmed cell death (PCD). PCD occurs during the development of most organisms. The biological mechanisms of PCD can be mathematically described as exponential growth for the first stage and polynomial death for the second stage. The developed model incorporates these unique PCD features into the functional mapping framework to localize genes that affect this process. Different modeling strategies are derived to model the mixture proportions for various genetic designs. A series of stationary and non-stationary models are applied to model the covariance structure among the phenotypes at different time points. The model is statistically investigated through simulation studies and validated by a real example for tiller number in rice. Our model provides a quantitative and testable framework for assessing the interplay between genes and organism developmental patterns.
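The two-stage description above (exponential growth followed by polynomial death) can be illustrated with a small sketch of a mean curve. The parameterization below is an assumption made for illustration only, not the speaker's model: the names `a`, `r`, `t_star`, and `death_coeffs` are hypothetical, and the polynomial is written in powers of the time elapsed since the transition so that the curve stays continuous at `t_star`.

```python
import math

def pcd_mean(t, a, r, t_star, death_coeffs):
    """Hypothetical two-stage mean curve for a PCD-type trait.

    Stage 1 (t <= t_star): exponential growth a * exp(r * t).
    Stage 2 (t >  t_star): the value reached at t_star plus a polynomial
    in (t - t_star) with no constant term, which keeps the curve continuous.
    All parameter names are illustrative assumptions.
    """
    if t <= t_star:
        return a * math.exp(r * t)
    peak = a * math.exp(r * t_star)   # value at the transition time
    s = t - t_star                    # time elapsed in the death stage
    return peak + sum(c * s ** (k + 1) for k, c in enumerate(death_coeffs))

# Example: growth until t = 5, then a quadratic decline.
curve = [pcd_mean(t, a=1.0, r=0.5, t_star=5.0, death_coeffs=[-1.0, -0.5])
         for t in range(0, 9)]
```

In a functional mapping analysis, a curve of this kind would play the role of a genotype-specific mean, with a separate parameter set for each QTL genotype; the mixture proportions and covariance structure mentioned in the abstract are not sketched here.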
Statistical Methods for Identifying DNA Sequence Variants that Encode Drug Response
Abstract: Drug response is typically a complex trait that is controlled by a network of multifarious genes and environmental factors. With the release of the haplotype map, or HapMap, constructed for the entire human genome based on high-throughput single nucleotide polymorphisms (SNPs), the detection of specific genes affecting responses to drugs can now be made possible. In this talk, I will present a series of statistical models for mapping and identifying genetic variants that are associated with the dynamic features of drug effects. These models are founded on the SNP-based haplotype blocking theory, constructed within the context of maximum likelihood and implemented with the EM algorithm. The incorporation of clinically important mathematical functions for drug response not only makes my models more powerful for gene detection, but also allows for a number of hypothesis tests on the interplay between gene actions/interactions and pharmacological actions. I have performed various simulation studies to test different statistical aspects of my models. The successful detection of DNA sequence variants for drug response in worked examples has validated the usefulness of the models. It can be anticipated that my models will have many implications for elucidating the detailed genetic architecture of drug response and ultimately designing personalized medications based on each patient's genetic blueprint.
Abstract: In the process of model selection and specification it
is of particular importance to have a general and
reliable misspecification test. The IOS test recently
proposed by Presnell and Boos (2004) is a general
purpose goodness-of-fit test which can be applied to
assess the adequacy of any parametric model without
specifying an alternative model. The test is based on
the ratio of in-sample and out-of-sample likelihoods,
and can be viewed asymptotically as a contrast between
two estimates of the information matrix that are equal
under correct model specification. The statistic is
asymptotically normally distributed, but parametric
bootstrap is recommended for computing the p-value of
the test. Using properties of locally asymptotically
normal parametric models we prove that the parametric
bootstrap provides a consistent estimate of the null
distribution of the IOS statistic under quite general
conditions. Finally, we compare the performance of the
IOS test with existing goodness-of-fit tests in
several applications and through simulations involving
models such as logistic regression, Cox
proportional-hazards regression, beta-binomial, and
zero-inflated Poisson models.
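As a rough illustration of the idea described above, the sketch below computes an in-sample versus out-of-sample log-likelihood contrast for the simplest possible case, an i.i.d. Poisson model, and calibrates it with a parametric bootstrap. It assumes the statistic takes the form of a sum over observations of log f(y_i; full-sample MLE) minus log f(y_i; leave-one-out MLE); the function names are hypothetical, and the Poisson sampler is a plain textbook method chosen only to keep the example self-contained.

```python
import math
import random

def ios_poisson(y):
    """In-sample minus out-of-sample log-likelihood for an i.i.d. Poisson model.

    For each y_i, compare the log density at the full-sample MLE (the mean)
    with the log density at the leave-one-out MLE. The log(y_i!) terms cancel.
    Assumes the sample mean and every leave-one-out mean are positive.
    """
    n = len(y)
    total = sum(y)
    lam_full = total / n
    stat = 0.0
    for yi in y:
        lam_loo = (total - yi) / (n - 1)   # MLE with observation i removed
        stat += yi * (math.log(lam_full) - math.log(lam_loo)) - (lam_full - lam_loo)
    return stat

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def ios_bootstrap_pvalue(y, n_boot=200, seed=0):
    """Parametric bootstrap p-value: simulate from the fitted Poisson model."""
    rng = random.Random(seed)
    observed = ios_poisson(y)
    lam_hat = sum(y) / len(y)
    exceed = 0
    for _ in range(n_boot):
        y_star = [sample_poisson(lam_hat, rng) for _ in y]
        if ios_poisson(y_star) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (n_boot + 1)

# Example: counts that are far more variable than a Poisson model allows.
overdispersed = [0] * 50 + [10] * 50
stat, p_value = ios_bootstrap_pvalue(overdispersed)
```

For this one-parameter model the statistic is close to the ratio of the sample variance to the sample mean, so a large value signals overdispersion relative to the Poisson assumption. Models such as logistic or Cox regression would need their own fitting and simulation steps in place of the closed-form mean.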
Rating Teams and Scheduling Games
Abstract: In this talk, I will first discuss several rating methods based on
pairwise comparisons and other kinds of rank data. Then I will
describe the connection between the proportional hazards model for
survival times and the Plackett-Luce model for ranking, which is
an extension of the well-known Bradley-Terry model. A few
asymptotic results will be presented under these models. In
addition, I will describe a few playoff systems, with emphasis on
seeding knockout tournaments.
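To make the Bradley-Terry model mentioned above concrete: it assigns each team i a positive strength p_i and takes the probability that i beats j to be p_i / (p_i + p_j). The sketch below estimates the strengths from a table of win counts with a standard fixed-point (minorization-maximization) iteration. It is an illustration under stated assumptions, not the speaker's method: the function names and the example data are hypothetical, and the iteration assumes every team has at least one win and one loss within a connected schedule.

```python
def fit_bradley_terry(wins, n_iter=1000, tol=1e-12):
    """Estimate Bradley-Terry strengths from wins[i][j] = times i beat j.

    Uses the classical update p_i <- W_i / sum_j n_ij / (p_i + p_j),
    where W_i is the total number of wins of team i and n_ij the number
    of games between i and j, followed by normalization to sum to 1.
    """
    m = len(wins)
    p = [1.0 / m] * m
    for _ in range(n_iter):
        new_p = []
        for i in range(m):
            w_i = sum(wins[i][j] for j in range(m) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(m) if j != i)
            new_p.append(w_i / denom)
        total = sum(new_p)
        new_p = [x / total for x in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p

def win_probability(p, i, j):
    """Model probability that team i beats team j."""
    return p[i] / (p[i] + p[j])

# Hypothetical round-robin results: each pair of teams played six games.
results = [[0, 4, 5],
           [2, 0, 4],
           [1, 2, 0]]
strengths = fit_bradley_terry(results)
```

The fitted strengths give a rating, and `win_probability` turns any pair of ratings into a predicted outcome, which is the quantity a seeding rule for a knockout tournament would typically rely on.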
The Job of Professor
Abstract: This will be an eclectic talk about what takes up a professor's time (teaching, research, etc.), and how to maintain a good mix of activities. There will also be some examples of my own research, and descriptions of some other current projects. If time permits, we can also talk about the editorial process, and what it takes to get a paper published.
Statistical Models for Characterizing Fundamental Patterns Underlying Gene Expression Profiles
Abstract: Recent development of DNA microarray technology that enables the production of massive amounts of genomic data has highlighted the need for powerful pattern recognition techniques that can discover biologically meaningful knowledge in large datasets. In particular, the identification of fundamental patterns or clusters of genes for temporal profiles of their expression can provide a quantitative and testable framework for cutting-edge research on the interplay between gene action and development. Here, we develop a novel statistical model for clustering gene expression profiles based on their underlying physiological functions. This model integrates the Fourier series approximation of time-dependent gene expression with the statistical modelling of the structure of the covariance matrix across the time course within the framework of finite mixture models. By estimating and testing the Fourier coefficients that determine the shapes of temporal profiles, the patterns of gene expression can be compared and their functions determined. The statistical properties of our model are studied through computer simulation.
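The Fourier series approximation mentioned above can be illustrated on a single expression profile. The sketch below covers only the mean-curve piece, not the mixture model or the covariance structure, and rests on simplifying assumptions: the profile is sampled at equally spaced time points covering exactly one period, and the order is less than half the number of points, in which case the least-squares coefficients reduce to simple projection sums. The function names are hypothetical.

```python
import math

def fit_fourier(y, order):
    """Least-squares Fourier coefficients for one equally spaced period.

    Returns (a0, a, b) for the curve
        a0 + sum_k a[k-1] * cos(2*pi*k*u) + b[k-1] * sin(2*pi*k*u),
    where u in [0, 1) is the fraction of the period elapsed.
    Requires 2 * order < len(y) so the basis is orthogonal on the grid.
    """
    n = len(y)
    if 2 * order >= n:
        raise ValueError("order must be less than half the number of points")
    a0 = sum(y) / n
    a, b = [], []
    for k in range(1, order + 1):
        a.append(2.0 / n * sum(y[j] * math.cos(2 * math.pi * k * j / n) for j in range(n)))
        b.append(2.0 / n * sum(y[j] * math.sin(2 * math.pi * k * j / n) for j in range(n)))
    return a0, a, b

def eval_fourier(a0, a, b, u):
    """Evaluate the fitted curve at period fraction u."""
    return a0 + sum(a[k] * math.cos(2 * math.pi * (k + 1) * u)
                    + b[k] * math.sin(2 * math.pi * (k + 1) * u)
                    for k in range(len(a)))

# Hypothetical profile: 12 equally spaced measurements over one cycle.
profile = [2.0 + 3.0 * math.cos(2 * math.pi * j / 12)
           - 1.5 * math.sin(2 * math.pi * 2 * j / 12) for j in range(12)]
a0, a, b = fit_fourier(profile, order=2)
```

Two profiles can then be compared through their coefficient vectors, which is the sense in which the coefficients determine the shape of a temporal pattern.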
Quantile Dispersion Graph for Poisson Regression Models
Abstract: The choice of design for a generalized linear model depends on the unknown parameters of the fitted model. This poses a difficult problem since the purpose of a design is to provide efficient estimates of the model’s parameters. One approach to solving this problem uses the so-called quantile dispersion graphs (QDGs) of the mean-squared error of prediction (MSEP) associated with a given model. These are plots of the maxima and minima, over a parameter space, of the quantiles of the MSEP, which are obtained on concentric surfaces inside a region of interest. The plots provide a comprehensive assessment of the quality of prediction afforded by a given design. They also portray the dependence of the design on the model’s parameters. The application of the QDGs is demonstrated using a model with a logarithmic link function and a Poisson-distributed response variable. Several variants of these conditions are considered, including a square root link in conjunction with the Poisson distribution and several other combinations. The results indicate that the choice of the link function and/or the nature of the response distribution can have an effect on the shape of the QDGs for a given design.
Wavelet and SiZer Analysis of Internet Traffic Data
Abstract: It is important to characterize the burstiness of Internet traffic and to find its causes in order to build models that can mimic real traffic. To achieve this goal, exploratory analysis tools and statistical tests are needed, along with new models for aggregated traffic. This talk introduces statistical tools based on wavelets and SiZer (SIgnificance of ZERo crossings of the derivative). The intricate fluctuations of Internet traffic are explored in various respects, and lessons on long-range dependence and non-stationarities from real data analysis are summarized.
A Joint Model for the Association Between Longitudinal Binary and Continuous Process
Abstract: A joint model for the association of longitudinal binary and continuous processes is proposed. The model is used to analyze an experiment in which moderate-intensity exercise was used as an adjunct to smoking cessation, and weight gain and smoking quit status were measured repeatedly on subjects over 8 weeks, to assess their interrelation across treatments. The main question of interest is the effect of the treatment on the relationship between smoking cessation and weight gain. The model is reparameterized such that the dependence can be characterized by unconstrained regression coefficients. Bayesian variable selection techniques are used to parsimoniously model these coefficients for each treatment. An MCMC algorithm is developed for estimating the parameters by implementing a data augmentation step and a posterior sampling step.