A new test of model misspecification is proposed, based on the ratio of in-sample and out-of-sample likelihoods. The test is broadly applicable, and in simple problems approximates well-known and intuitively appealing methods. Using jackknife influence curve approximations, it is shown that the test statistic can be viewed asymptotically as a multiplicative contrast between two estimates of the information matrix, both of which are consistent under correct model specification. This approximation is used to show that the statistic is asymptotically normally distributed, though it is suggested that p-values be computed using the parametric bootstrap. The resulting methodology is demonstrated with a variety of examples and simulations involving both discrete and continuous data. This is joint work with Dennis Boos.
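
As a rough illustration of the idea, the sketch below computes the log of an in-sample to out-of-sample likelihood ratio for an i.i.d. Poisson model and attaches a parametric bootstrap p-value. The Poisson model, the leave-one-out form of the "out-of-sample" likelihood, and the function names `ios_poisson` and `ios_bootstrap_pvalue` are assumptions made for this example and are not taken from the abstract. The code also assumes the sample is not almost entirely zeros, so that every leave-one-out mean is positive.

```python
import numpy as np

def ios_poisson(y):
    """Log in-sample / out-of-sample likelihood ratio under an i.i.d. Poisson model.

    Illustrative assumption: "out-of-sample" means each observation is scored
    at the MLE computed with that observation deleted.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    lam_hat = y.mean()                    # full-sample MLE of the Poisson mean
    lam_loo = (y.sum() - y) / (n - 1)     # MLE with observation i left out
    # log f(y_i; lam_hat) - log f(y_i; lam_loo[i]); the log(y_i!) terms cancel
    terms = y * (np.log(lam_hat) - np.log(lam_loo)) - (lam_hat - lam_loo)
    return terms.sum()

def ios_bootstrap_pvalue(y, n_boot=999, seed=0):
    """Parametric bootstrap p-value: resample from the fitted Poisson model."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = ios_poisson(y)
    lam_hat = y.mean()
    boot = np.array([
        ios_poisson(rng.poisson(lam_hat, size=y.size)) for _ in range(n_boot)
    ])
    # Large values of the statistic count as evidence against the model
    p_value = (1 + np.sum(boot >= observed)) / (n_boot + 1)
    return observed, p_value
```

In this Poisson case a second-order expansion shows the statistic is close to the sample variance divided by the sample mean, that is, the familiar index of dispersion. This is one example of the statistic approximating a well-known method in a simple problem.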

Seminar page

We first develop asymptotic information bounds and the form of the efficient score and influence functions for coefficients in semiparametric regression models fitted to two-phase stratified samples, and point out the relationship of this work to the information bound calculations of Robins and colleagues. By verifying conditions of Murphy and van der Vaart for a least favorable parametric submodel, we provide asymptotic justification for statistical inference based on profile likelihood.

Using data from the National Wilms Tumor Study, and simulations based on these data, we then demonstrate the advantages of careful selection of the phase two sample and use of an efficient analysis method. One can limit collection of data on an "expensive" covariate to a fraction of the phase one sample, yet almost exactly reproduce the results that would have been obtained with complete data for everyone. The basic principles include: (i) fine stratification of the phase one sample using outcome and available covariates; (ii) use of a near-optimal "balanced" rather than a simple case-control sample at phase two; and (iii) fully efficient estimation. These same basic principles extend to the design and analysis of two-phase exposure-stratified case-cohort studies, which involve a censored "survival" outcome and on which much work is in progress.
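
Principles (i) and (ii) can be sketched in a few lines. The example below assumes that "balanced" means roughly equal numbers drawn from each phase one stratum, with small strata sampled completely. That reading, the function name `balanced_phase_two`, and the round-robin allocation rule are assumptions for illustration, not the design actually used in the study. The efficient estimation step (iii) is not shown.

```python
import numpy as np

def balanced_phase_two(strata, n_phase2, seed=0):
    """Draw a phase two sample with near-equal counts per phase one stratum.

    strata   : one label per phase one subject (e.g. outcome x covariate cell)
    n_phase2 : total number of subjects to measure at phase two
    Returns the selected indices and the sampling fraction in each stratum.
    """
    rng = np.random.default_rng(seed)
    strata = np.asarray(strata)
    labels, counts = np.unique(strata, return_counts=True)

    # Round-robin allocation: add one slot at a time to every stratum that
    # still has unsampled subjects, so exhausted strata are taken in full.
    alloc = np.zeros(labels.size, dtype=int)
    remaining = min(n_phase2, strata.size)
    while remaining > 0:
        open_idx = np.flatnonzero(alloc < counts)
        take = open_idx[:remaining]
        alloc[take] += 1
        remaining -= take.size

    # Simple random sample without replacement inside each stratum
    selected = np.concatenate([
        rng.choice(np.flatnonzero(strata == lab), size=k, replace=False)
        for lab, k in zip(labels, alloc)
    ])
    fractions = dict(zip(labels.tolist(), (alloc / counts).tolist()))
    return np.sort(selected), fractions
```

The returned sampling fractions are known by design, and any subsequent analysis of the phase two data would need them (or the stratum counts) to account for the selection.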

Portions of this work are joint with Nilanjan Chatterjee, Brad McNeney and Jon Wellner.

The NWTSG Data and Statistical Center, located in Seattle for the 33-year duration of the study, played a major role in this effort. Systematic follow-up of surviving patients documented the long-term "costs of cure" and the wisdom of reserving the most toxic treatments for those who actually needed them. Secondary cancers, for example, which once affected 1.6% of Wilms tumor survivors by 15 years from diagnosis, have been much reduced since 60% of patients no longer receive radiation therapy.

Statistical study of the NWTSG database has challenged prevailing theories for the genetic origins of WT and led to new hypotheses for investigation by molecular biologists.

This talk considers three issues: (1) whether all bilateral and multicentric WT are hereditary; (2) whether Asians lack WT caused by loss of imprinting of the insulin-like growth factor gene IGF2; and (3) whether constitutional deletion of the WT gene WT1 in patients with the WT-aniridia (WAGR) syndrome has a less severe effect on renal function than a point mutation in WT1 in patients with the Denys-Drash syndrome. Key factors that facilitated these statistical contributions include a compulsive effort to maintain continuity in data collection and follow-up and a constant search for ways to use the clinical and epidemiologic data to answer questions of basic biological significance.