The University of Florida Statistics Symposium on Selected Topics in Nonparametric Methods
January 22-23, 1999



William Schucany
Southern Methodist University
Many practical problems involve detecting trends. By considering differences in adjacent levels, an increasing sequence of three may be transformed to an equivalent problem of detecting the positive quadrant. The likelihood-ratio test (LRT) of the null hypothesis that a multivariate mean equals zero versus the positive-orthant alternative is reexamined. Perlman (1969) derived the exact null distributions under normality for general cone alternatives. However, because these distributions depend on the unknown covariance matrix, the usable critical points have only been bounds. For important cases the resulting one-sided tests are biased. The disappointing performance of these approximate LRTs has been the subject of several critical articles over the years.
The bootstrap can rescue the LRT by estimating the approximate critical point. Monte Carlo comparisons confirm its superiority to Hotelling's T^{2}, a "half-space" alternative investigated by Tang (1994), and a closely related simple test due to Follmann (1996). Wang and McDermott (1998) show that their conditional LRT is unbiased and uniformly more powerful than these, but they must evaluate integrals numerically. The proposed bootstrap test performs well even if the distribution is not multivariate normal. In addition, it is easy to extend the methodology to general cone-shaped critical regions.
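The idea of estimating a critical point by resampling can be sketched as follows. This is an illustration only, not the test studied in the talk: the statistic below is the LRT statistic for the positive-orthant alternative under the simplifying assumption that the covariance matrix is the identity, and the function names, the centering step used to impose the null hypothesis, and the defaults for `alpha` and `n_boot` are all assumptions of this sketch.

```python
import random


def orthant_statistic(sample):
    """n * sum_j max(mean_j, 0)^2.

    LRT statistic for H0: mu = 0 versus the positive orthant when the
    covariance is the identity (a simplifying assumption of this sketch).
    """
    n = len(sample)
    p = len(sample[0])
    means = [sum(row[j] for row in sample) / n for j in range(p)]
    return n * sum(max(m, 0.0) ** 2 for m in means)


def bootstrap_critical_point(sample, alpha=0.05, n_boot=2000, seed=0):
    """Estimate the level-alpha critical point of orthant_statistic."""
    rng = random.Random(seed)
    n = len(sample)
    p = len(sample[0])
    means = [sum(row[j] for row in sample) / n for j in range(p)]
    # Center the data so that the resampling distribution has mean zero,
    # i.e. it satisfies the null hypothesis.
    centered = [[row[j] - means[j] for j in range(p)] for row in sample]
    stats = sorted(
        orthant_statistic([rng.choice(centered) for _ in range(n)])
        for _ in range(n_boot)
    )
    # Empirical (1 - alpha) quantile of the bootstrap statistics.
    k = min(n_boot - 1, int((1 - alpha) * n_boot))
    return stats[k]
```

The null hypothesis would be rejected when `orthant_statistic(sample)` exceeds the returned value. Because the resamples come from the centered data, no normality assumption enters the critical point.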
Brett Presnell
University of Florida
We introduce a new nonparametric approach to distance-based robust estimation, in which parameters are estimated by evaluating the corresponding functional on a weighted version of the empirical distribution. We call this biased-bootstrap "trimming", although no observations are actually deleted by the procedure. Our approach has two variants, corresponding to two different methods of choosing the weights.
In fixed-dispersion biased-bootstrap trimming, weights are chosen to minimize the distance between the weighted and ordinary empirical distributions, while fixing a measure of the dispersion of the weighted distribution, such as its (generalized) variance. We measure distance by Read and Cressie's power-divergence, though other distances can be used. In practice this approach requires an external robust estimate of scale, but yields a redescending estimator that is competitive with the Huber estimator in location problems.
A dual to the fixed-dispersion approach is fixed-distance trimming. Here a measure of dispersion is minimized while fixing the distance between the weighted and ordinary empirical distributions. Importantly, an appropriate distance can be determined via breakdown considerations, and without reference to the data at hand or its dimension. In the case of location estimation, this yields a robust estimator that requires no external estimate of scale. The method is thus especially useful in multivariate problems where robust estimates are not readily available. In fact, if the measure of dispersion used is the generalized variance, then the fixed-distance approach yields robust and affine equivariant estimates of both location and scale.
Thomas Hettmansperger
Pennsylvania State University
In this talk we consider ways to estimate the mixing proportions in a finite mixture distribution or estimate the number of components of the mixture distribution without making parametric assumptions about the component distributions. We require a vector of observations on each subject. This vector is mapped into a vector of zeros and ones and summed. The resulting distribution of sums can be modelled as a mixture of binomials. We then work with the binomial mixture. Efficiency and robustness of this method are compared to the strategy of assuming multivariate normal mixtures when, typically, the true underlying mixture distribution is different. It is shown that in many cases the approach based on simple binomial mixtures is superior.
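The reduction from a vector of observations to a count can be sketched as below. The abstract does not say how the zeros and ones are defined; this sketch assumes an indicator that each coordinate falls at or below a chosen cut point, and the names `binarize_and_sum`, `count_distribution`, and `cutpoints` are hypothetical.

```python
def binarize_and_sum(observations, cutpoints):
    """Map each subject's vector to 0/1 indicators and sum them.

    Assumption of this sketch: coordinate j is coded 1 when it is at or
    below cutpoints[j], and 0 otherwise.
    """
    return [
        sum(1 for x, c in zip(row, cutpoints) if x <= c)
        for row in observations
    ]


def count_distribution(sums, m):
    """Empirical distribution of the sums on 0, 1, ..., m."""
    n = len(sums)
    return [sums.count(k) / n for k in range(m + 1)]


# Two subjects, three measurements each, common cut point 1.0.
sums = binarize_and_sum([[0.1, 2.0, -1.0], [3.0, 3.0, 3.0]], [1.0, 1.0, 1.0])
freqs = count_distribution(sums, 3)
```

The resulting frequencies on 0, ..., m are what a mixture of binomials would then be fitted to.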
Pranab K. Sen
University of North Carolina, Chapel Hill
The empirical Bayes methodology, in a parametric setup, incorporates a conjugate prior in the estimation of the Bayes (shrinkage) factor that characterizes the estimator. The situation becomes more complex in semiparametric and (even more so in) nonparametric models. Using the Hájek convolution theorem and first-order asymptotic representations for semiparametric and nonparametric estimators, it is shown that the Zellner (Gaussian) g-prior on the regression parameters can be readily adopted to formulate suitable empirical Bayes estimators that are essentially related to the robust Stein-rule estimators. Empirical Bayes statistical functionals and adaptive versions are also considered in the same vein.
Myles Hollander
Florida State University
We consider the problem of nonparametrically estimating the distribution function governing the time to occurrence of a recurrent event based on data accruing from an informative sum-quota stopping rule. Finite-sample and asymptotic properties of the estimators are presented. Furthermore, loss in efficiency is studied for the cases (1) when the right-censored last observation for each unit is ignored and (2) when only the first observation for each unit is used. The procedures are illustrated using the gastroenterology data set in Aalen and Husebye (1991).
P. V. Rao
University of Florida
Permutation tests are proposed for comparing marginal survival functions based on three independent randomly right-censored samples: a sample of size n from a bivariate lifetime distribution and samples of sizes s and k from the two marginals. Exact tests for testing equality of survival functions based on two independent samples and paired samples can be obtained by setting n = 0, s, k > 0 and n > 0, s = k = 0, respectively. Since there exist several tests for these two cases, the proposed tests provide new alternatives. However, there is no satisfactory method for comparing survival functions when the data consist of n complete pairs and s + k singletons (n > 0 and s + k > 0). The proposed tests fill this gap by providing a method for handling this case. The null distribution of the test statistic is asymptotically normal, and simulation results show that the tests have good power for detecting scale and location shifts in exponential and log-logistic distributions.
Dennis Boos
North Carolina State University
New resampling-based tests should be evaluated by Monte Carlo simulation, but the resulting nested computations can be overwhelming. A new generalized jackknife procedure is proposed to reduce the size of the inner resampling loop to be much smaller than would be used in the analysis of real data. A version of the method that uses simple linear extrapolation is shown to perform well in correcting for bias and thus reduces overall computation time in Monte Carlo power studies.
Michael Ernst
University of Florida
Because many bootstrap problems are analytically intractable, the bootstrap is commonly viewed solely as a resampling technique. We show that for the broad class of statistics that are linear combinations of order statistics (L-estimators), exact analytic expressions for the bootstrap mean and variance can be obtained, eliminating the error due to bootstrap resampling. The expressions follow from direct calculation of the bootstrap mean vector and covariance matrix of the whole set of order statistics. We examine the nonnegligible error of the resampling approach for estimating the bootstrap variance using some classical L-estimators such as the trimmed mean and the median on some real data and consider an application of these exact estimates in linear regression. We also consider exact percentiles and moments of more general functions of order statistics.
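For a single order statistic, the exact calculation can be sketched directly. A bootstrap resample draws n values with replacement from the data, so the r-th order statistic of the resample is at most the j-th sorted data value exactly when at least r of the n draws land among the j smallest values, which is a binomial tail probability. The sketch below covers only this special case (for example the median when n is odd), not the full covariance matrix of all order statistics mentioned in the abstract; the function names are hypothetical.

```python
from math import comb


def order_stat_cdf(n, r, j):
    """P(r-th bootstrap order statistic <= j-th sorted data value)."""
    p = j / n
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def exact_bootstrap_moments(data, r):
    """Exact bootstrap mean and variance of the r-th order statistic."""
    xs = sorted(data)
    n = len(xs)
    # Probability that the bootstrap order statistic equals each sorted value.
    weights = [
        order_stat_cdf(n, r, j) - order_stat_cdf(n, r, j - 1)
        for j in range(1, n + 1)
    ]
    mean = sum(w * x for w, x in zip(weights, xs))
    second = sum(w * x * x for w, x in zip(weights, xs))
    return mean, second - mean**2


# Median (r = 2) of three observations: weights are 7/27, 13/27, 7/27.
mean, var = exact_bootstrap_moments([1.0, 2.0, 3.0], 2)
```

No resampling is performed, so the result carries no Monte Carlo error.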
Jana Jurečková
Charles University, Prague, Czech Republic
Jurečková (1981) introduced a tail behavior measure of performance of equivariant estimators of location. He et al. (1990) showed that, for a broad class of estimators, this measure is equivalent to the finite sample breakdown point of Huber and Donoho and extended this measure to the linear regression model.
Koenker and Bassett (1978) introduced regression quantiles to generalize the notion of order statistic from the case of a single sample to the linear regression setting. Since then, most of the one-sample properties of quantiles have found natural analogues in the regression quantile situation. The extreme regression quantiles, analogous to the smallest and largest order statistics, were only recently considered in depth by Smith (1994), who developed some asymptotic results for algebraically tailed error distributions.
We shall consider the tail behavior of a linear form of extreme regression quantiles in the linear regression model with regression matrix of order n × p. In the location submodel, this form coincides with the corresponding extreme order statistic. The tail behavior differs depending on whether the basic distribution of errors has heavy (algebraic) or exponential tails and is generally different from that in the location model. For exponentially tailed distributions, it involves the dimension p, while for heavy-tailed distributions it is more similar to the location situation.
Joseph Gastwirth
George Washington University
The selection of a single method of analysis is problematic when the data could have been generated by one of several possible models. We show that two efficiency robust approaches both depend on the correlation matrix of the standardized optimum test statistics for each of the models. The first procedure, the MERT, uses the linear combination of these statistics that maximizes the minimum efficiency across all models in the family. The second procedure, the MX, uses the maximum of the optimal statistics. We show that both the MERT and MX approaches can be used to obtain efficiency robust procedures for a wide variety of survival analyses and dose-response data, including the combination of ordered tables. We illustrate these with several biomedical data sets. Because the distribution of the maximum of several statistics is more complicated than that of the corresponding MERT, we provide guidelines derived from the correlation matrix of the optimum tests to assist the user in deciding when the MX statistic is preferable to the MERT.
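For the simplest case of two candidate models, the two combinations can be written down explicitly. The sketch below assumes two standardized optimum statistics with known null correlation rho, in which case the MERT is their sum rescaled to unit variance; the general multi-model MERT needs the full correlation matrix and is not attempted here. Function names are hypothetical.

```python
import math


def mert_two(z1, z2, rho):
    """MERT for two standardized statistics with null correlation rho.

    Var(z1 + z2) = 2 * (1 + rho) under the null, so dividing by its
    square root gives a statistic with unit variance.
    """
    return (z1 + z2) / math.sqrt(2.0 * (1.0 + rho))


def mx(statistics):
    """MX: the maximum of the standardized optimum statistics."""
    return max(statistics)


# Example values chosen purely for illustration.
combined = mert_two(1.8, 2.4, 0.5)
largest = mx([1.8, 2.4])
```

The MERT here is referred to a standard normal distribution, whereas the null distribution of the MX depends on rho, which is the added complication the abstract refers to.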
Hannu Oja
University of Jyväskylä, Finland
We consider affine equivariant empirical multivariate sign and rank processes based on the Oja (1983) median. It is shown that the corresponding multivariate sign and rank covariance matrices are then also affine equivariant and carry information about the shape and geometry of the underlying distribution. In the elliptic case, the eigenvalues and eigenvectors of the usual covariance matrix, i.e., the correlation structure, as well as (multivariate multiple) regression coefficients can be estimated using these matrices. Robustness and efficiency properties of the resulting estimates are illustrated with some examples, empirical sensitivity (influence) plots and simulation studies. Finally, tests for multinormality (which compare the sizes, i.e., determinants, of the covariance matrices) are proposed.
Regina Liu
Rutgers University
A data depth is a measure of how deep or central a given point is with respect to a multivariate distribution. It gives rise to a new set of parameters for the distribution, which can easily quantify its many complex multivariate features. These parameters can also be visualized by simple graphs. Furthermore, the center-outward ranking of the sample points provided by a data depth leads to a systematic nonparametric inference scheme. We discuss examples in rank tests, multivariate process control charts, constructions of confidence regions, determinations of P-values for a vast range of tests, and regression analysis.
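One concrete depth function is the simplicial depth, which in two dimensions is the fraction of triangles formed by triples of sample points that contain the point of interest. The abstract does not say which depth is used in the talk, so the choice here is an assumption made for illustration. The sketch also assumes points in general position and counts boundary points as inside.

```python
from itertools import combinations


def _cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, a, b, c):
    """True when p lies in the closed triangle with vertices a, b, c."""
    d = (_cross(a, b, p), _cross(b, c, p), _cross(c, a, p))
    # p is inside when it is on the same side of all three edges.
    return not (any(v < 0 for v in d) and any(v > 0 for v in d))


def simplicial_depth(p, points):
    """Fraction of sample triangles that contain p (2-D only)."""
    triangles = list(combinations(points, 3))
    return sum(in_triangle(p, *t) for t in triangles) / len(triangles)


square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
inner = simplicial_depth((1.0, 0.5), square)   # 2 of the 4 triangles
outer = simplicial_depth((5.0, 5.0), square)   # none of the triangles
```

Sorting the sample points by decreasing depth gives a center-outward ranking of the kind described above.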
John Marden
University of Illinois
In multiple dimensions, the distribution of the multivariate ranks defined using the multivariate signs depends on the distribution of the underlying data, so that procedures based on these ranks are not distribution-free. (They are distribution-free in one dimension.) Several authors have developed multivariate rank-based procedures that are affine invariant and distribution-free under elliptically symmetric distributions.
This talk explores an approach that is conjectured to give distribution-free multivariate ranks under very general conditions. The approach is to take the multivariate ranks, then the multivariate ranks of the ranks, then the multivariate ranks of those, etc., until these ranks converge. The conjecture is that the limiting distribution of these "iterated" ranks is independent of the underlying distribution of the original variables, at least if the underlying variables have a continuous distribution not concentrated on a lower-dimensional plane.
Evidence for the truth of the conjecture arises from taking various data sets and finding the multivariate iterated ranks of the observations. In all examples (as long as the points are not exactly collinear), the iterated ranks appear to converge to a spherically symmetric distribution. Furthermore, when the sample size is not too small, this distribution appears to be approximately the same spherically symmetric one no matter what the initial distribution. For two dimensions, this conjectured limiting distribution has a density over the unit disk that is not uniform but more concentrated at radii around 0.70. In higher dimensions, it appears that the distribution is uniform on the sphere.
If this approach is valid, one can easily perform truly nonparametric analysis in one-way multivariate analysis of variance and regression. Tests of goodness of fit of data to a particular distribution F can be constructed by looking at the iterated ranks relative to the target F. Inverting the iterated rank function would provide a method for simulating observations from F, analogous to the inverse probability transform for univariate distributions.
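The iteration described in this abstract can be sketched as follows. The sketch assumes the multivariate sign of a vector is its unit-length direction and the rank of a point is the average sign of its differences from all sample points (the spatial rank); the abstract does not spell out the definition, and a fixed number of passes stands in for a convergence check. Function names and `n_iter` are hypothetical.

```python
import math


def spatial_ranks(points):
    """Rank of each point: average unit vector pointing to it from the others."""
    n = len(points)
    d = len(points[0])
    ranks = []
    for xi in points:
        total = [0.0] * d
        for xj in points:
            diff = [a - b for a, b in zip(xi, xj)]
            norm = math.sqrt(sum(v * v for v in diff))
            if norm > 0:  # the sign of the zero vector is taken to be zero
                for k in range(d):
                    total[k] += diff[k] / norm
        ranks.append([t / n for t in total])
    return ranks


def iterated_ranks(points, n_iter=20):
    """Apply the rank transformation repeatedly (fixed number of passes)."""
    current = [list(p) for p in points]
    for _ in range(n_iter):
        current = spatial_ranks(current)
    return current
```

Each rank is an average of unit vectors, so every iterate lies in the unit ball, and the ranks sum to zero because the signs of opposite differences cancel.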
Ronald Randles
University of Florida
The assumptions that underlie multivariate sign tests are compared in order to concentrate on weak assumptions that still yield a permutation principle for affine-invariant methods. A transformation-retransformation type of multivariate sign test is proposed, which has a distribution-free property similar to that of the interdirection sign test (Randles, 1989). It is simpler in nature and easier to compute. It can readily be applied to data of any practical dimension. Its performance characteristics are demonstrated.



