ORIE 576 - SPRING 2004
TAKE-HOME PRELIM EXAM - DUE April 21
-----------------------------------

1. Find a multivariate dataset suitable for a multiple regression analysis. The dataset must not be taken from the course textbook. However, it can be from any other statistics text, from a journal, or from data sources on the internet. The dataset must include at least four independent variables (predictors), not counting polynomial and interaction terms. When you find a suitable dataset, send me an email with a brief description of it and a reference for its source. If two students choose the same data, the first one to email me will have priority.

2. Conduct an all-subsets regression analysis to find the best models based on the Mallows' Cp criterion. Compare the Cp results with those based on adjusted R-squared. Also compare your all-subsets results with the model found using the forward stepwise regression procedure.

3. After selecting a model (or a small set of models), use the diagnostic tools discussed in class (such as partial regression, studentized residual, leverage and Cook's distance plots) to check the MLR model assumptions and to identify potential problems (such as influential observations and outliers). In some cases the diagnostics may suggest modifications to your model(s).

4. Type a short report summarizing your findings. Your report should be self-contained, including a description of the data, the type of models considered, and your results. In particular, what does your final model say about the relationships between the response and predictors? Only include relevant statistical summaries and plots in your report. An outline might be as follows:

   Introduction: Description of the data and models considered.
   Model selection: Summary of results of model selection procedures.
   Diagnostic checks: Summary of model diagnostics.
   Conclusions: Have you found a good model for the data? What does the model tell you about the relationships among the variables?

   The target readership for your report should be your classmates, i.e. readers who are not familiar with your data but have a knowledge of statistics similar to your own.
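
As a reference for the quantities named in parts 2 and 3, the following is a minimal sketch, not a required method. It assumes NumPy is available and uses simulated data in place of a real dataset; the function names (fit_ols, all_subsets, diagnostics) are illustrative only. Cp is computed as SSE_p / MSE_full - (n - 2p), where p counts the intercept. Forward stepwise selection and the plots themselves are not covered here.

```python
import itertools
import numpy as np


def fit_ols(X, y):
    """Least-squares fit; returns coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)


def all_subsets(X, y):
    """Score every non-empty predictor subset by Mallows' Cp and adjusted R^2."""
    n, k = X.shape
    ones = np.ones((n, 1))
    _, sse_full = fit_ols(np.hstack([ones, X]), y)
    mse_full = sse_full / (n - k - 1)          # error variance estimate from the full model
    sst = float(((y - y.mean()) ** 2).sum())   # total sum of squares
    rows = []
    for size in range(1, k + 1):
        for subset in itertools.combinations(range(k), size):
            _, sse = fit_ols(np.hstack([ones, X[:, list(subset)]]), y)
            p = size + 1                       # parameters, including the intercept
            cp = sse / mse_full - (n - 2 * p)
            adj_r2 = 1.0 - (sse / (n - p)) / (sst / (n - 1))
            rows.append((subset, cp, adj_r2))
    return rows


def diagnostics(X, y):
    """Leverages, externally studentized residuals and Cook's distances."""
    n = X.shape[0]
    Xd = np.hstack([np.ones((n, 1)), X])
    p = Xd.shape[1]
    beta, sse = fit_ols(Xd, y)
    resid = y - Xd @ beta
    h = np.diag(Xd @ np.linalg.pinv(Xd))       # diagonal of the hat matrix
    mse = sse / (n - p)
    r_int = resid / np.sqrt(mse * (1.0 - h))   # internally studentized residuals
    cooks = r_int ** 2 * h / (p * (1.0 - h))
    r_ext = r_int * np.sqrt((n - p - 1) / (n - p - r_int ** 2))
    return h, r_ext, cooks


# Simulated example (an assumption): only predictors 0 and 2 affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 2.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=60)

results = all_subsets(X, y)
print("Best five subsets by Cp:")
for subset, cp, adj_r2 in sorted(results, key=lambda r: r[1])[:5]:
    print(f"  predictors {subset}: Cp = {cp:.2f}, adjusted R^2 = {adj_r2:.3f}")

best_subset = min(results, key=lambda r: r[1])[0]
h, t, d = diagnostics(X[:, list(best_subset)], y)
print("Largest leverage:", round(float(h.max()), 3))
print("Largest |studentized residual|:", round(float(np.abs(t).max()), 3))
print("Largest Cook's distance:", round(float(d.max()), 3))
```

Models with Cp close to p and small in value are the usual candidates; the subset with the highest adjusted R-squared need not coincide with the one that minimizes Cp, which is the comparison part 2 asks for.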