This project is intended to provide students in STA 6127 with a guide to using SPSS. It covers the material presented up through the first exam, including multiple regression, analysis of variance (ANOVA), and analysis of covariance (ANCOVA). The topics are presented in approximately the same order in which they were introduced in class, with examples taken from the homework problems and the course web page to illustrate SPSS output. The instructions in this guide apply to SPSS versions 9 and 10. Students using earlier versions may wish to consult the guide to using SPSS included in the appendix of the Agresti and Finlay text. Students should also note that SPSS provides both a topics and a tutorial help function. The topics help function provides descriptions and instructions for most SPSS functions and can be accessed by selecting TOPICS from the HELP menu. The tutorial help function guides users through a limited set of SPSS functions, providing both the visual and text instructions needed to complete a desired task. It can be accessed by selecting TUTORIAL from the HELP menu.
Reading Text File: Several of the homework problems require the use of statistical software to analyze and interpret data found on the course web page (http://www.stat.ufl.edu/users/aa/social/data.html). Rather than reentering this data manually, students can have SPSS read it directly. First, save the material on the web page in a text format. To do this, select the SAVE AS option in the FILE menu of your web browser. In the Save as Type box, pull down the menu and select Plain Text (*.txt). Give the file a name and save it to a convenient location on your computer. Second, clean up the saved text file before reading a specific data set into the SPSS editor. To do this, open the text file and delete all text not associated with the specific data set that you wish to examine. SPSS can read variable labels, so these should not be deleted. The student should note that SPSS can read only one data set at a time; thus, if you are interested in the homes data for Gainesville, for example, you should delete all of the text pertaining to other data sets. Make sure to save your changes and close the text file.
Third, read the text file into the SPSS editor. To do this, open the SPSS program and click on the READ TEXT DATA option in the FILE menu. Select the text file that you just saved and click Open; this opens the text import wizard. The first frame asks whether you would like the data read according to a predefined format. Since you have not already created a format to read the data, select No (the default selection) and click Next. The second frame asks whether the data in your text file is delimited or fixed width. Delimited means that the data values are separated by a common character, such as a comma, tab, or space. Fixed width, by contrast, means that the data is arranged in aligned columns, without a common character separating them. The web data is arranged in this latter format, so you should click on the Fixed Width option. If you kept the variable labels in your text file, you should also select Yes for the option asking whether variable names are included at the top of the file, and then click Next. In the third frame, the default options are appropriate for reading the web data. This frame asks you to identify the line on which the data begins (line 2 if you opted to include variable labels), the number of lines per case (in this instance 1), and the number of cases you would like read into the SPSS editor. The fourth frame asks you to identify the column divisions between the variables. Move the lines so that the data for each variable fits between them. Make sure to scroll through your data before moving on to the next frame, as some of the values further down in the columns might cross one of the lines (and would then be split across two variables).
In the fifth frame, you have the option to change your variable labels. You should note that some of the variable labels used in the web data sets are reserved by SPSS (i.e., SPSS will not allow you to use them). If this is the case, the variable labels applied by SPSS are likely to be incomplete and may be in the wrong place. It is, therefore, a good idea to check all of your variables to ensure that they are properly labeled. You should also check that SPSS has made the correct choice in interpreting your data as either string (i.e., text) or numeric. Both of these checks can be done either in the fifth frame of the conversion process or in the SPSS editor itself; in my personal experience, I have found it easier to do the latter. Finally, the sixth frame asks whether you would like to save this format for future use. Clicking Finish reads the data into the SPSS editor. Make sure to check over your data, as unintended spaces in your text file may result in an added variable or case that needs to be deleted. You can also change variable names and formats by clicking on the VARIABLE VIEW tab at the bottom of the page. Now the real fun begins!
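To make the idea of a fixed-width file more concrete, the following is a minimal Python sketch of what the column dividers in the fourth frame accomplish. It is not how SPSS itself reads the file. The function name read_fixed_width, the column offsets, and the sample values are all made up for illustration and are not taken from the course data.

```python
def read_fixed_width(lines, breaks, has_header=True):
    """Split fixed-width text lines into columns.

    `breaks` lists the starting character offset of each column, playing the
    role of the divider lines positioned in the text import wizard.
    """
    ends = breaks[1:] + [None]

    def split(line):
        return [line[start:end].strip() for start, end in zip(breaks, ends)]

    def convert(value):
        # Treat a value as numeric if it parses as a number; otherwise keep
        # it as a string (the numeric versus string distinction).
        try:
            return float(value)
        except ValueError:
            return value

    rows = [split(line) for line in lines if line.strip()]
    if has_header:
        names, rows = rows[0], rows[1:]
    else:
        names = ["var%d" % (i + 1) for i in range(len(breaks))]
    return names, [[convert(v) for v in row] for row in rows]


# Hypothetical three-column file with variable names on the first line.
sample = [
    "price  size  beds",
    "  95.0 1200     3",
    " 120.5 1850     4",
]
names, rows = read_fixed_width(sample, breaks=[0, 6, 12])
print(names)
print(rows)
```

If a divider is placed one character too far to the left or right, part of one value ends up in the neighboring column, which is why the data should be screened before leaving the fourth frame.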
Multiple Linear Regression (no interaction): For multiple linear regression in which both the response and explanatory variables are quantitative, choose REGRESSION from the ANALYZE menu and select the LINEAR suboption. Enter your response variable in the Dependent box and your explanatory variables in the Independent box. Clicking on Statistics in the main dialog box opens up a number of options to the user. The Estimates option (default) provides the parameter estimates in the SPSS output. The Confidence Intervals option creates a 95% confidence interval for these parameter estimates. The Covariance Matrix option provides the variance-covariance matrix of the estimated regression coefficients, along with the correlations between those estimates. The Model Fit option (default) provides many of the values essential to regression analysis, including the multiple correlation (R), coefficient of multiple determination (R²), sums of squares, mean squares, the F value, and p value. The R Squared Change option indicates changes in the R² statistic produced by adding or deleting a variable. The Descriptives option provides means and standard deviations for the individual variables, together with the Pearson correlations between all pairs of variables in the model. The Part and Partial Correlations option lists partial correlations between each individual explanatory variable and the response variable, controlling for the other variables in the model. Select the options desired and then click Continue to return to the main dialog box. Clicking on Plots in the main dialog box and then marking Produce All Partial Plots produces partial regression plots, which are useful for checking that the relationship between an explanatory variable and the response variable is linear (an important assumption in multiple regression) when controlling for all other variables in the model.
An example of an SPSS regression printout is included in Appendix I. This example uses the data for question 5 in chapter 11 of the Agresti and Finlay text. Most of the output should appear familiar to the student in this course; however, there are a few notable differences from a SAS printout. The "ANOVA" table presents F and p values for the regression model. The Regression Sum of Squares in the SPSS printout is the same as the Model Sum of Squares in the SAS printout. Likewise, the Residual Sum of Squares in the SPSS printout is the same as the Error Sum of Squares in the SAS printout. In the "Coefficients" table, the Beta values represent standardized coefficients.
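The relationship between the sums of squares in the ANOVA table and the R², mean square, and F values can be checked by hand. The short Python sketch below shows the arithmetic, assuming a model with an intercept. The function name regression_anova and the numbers in the example are invented for illustration and do not come from the Appendix I printout.

```python
def regression_anova(regression_ss, residual_ss, n_cases, n_predictors):
    """Reproduce the ANOVA-table arithmetic for a regression with an intercept."""
    df_regression = n_predictors
    df_residual = n_cases - n_predictors - 1
    ms_regression = regression_ss / df_regression
    ms_residual = residual_ss / df_residual
    total_ss = regression_ss + residual_ss
    return {
        "r_squared": regression_ss / total_ss,    # share of variation explained
        "mean_square_regression": ms_regression,
        "mean_square_residual": ms_residual,
        "f": ms_regression / ms_residual,         # F statistic for the model
        "df": (df_regression, df_residual),
    }


# Made-up values: 23 cases, 2 explanatory variables.
table = regression_anova(regression_ss=60.0, residual_ss=40.0,
                         n_cases=23, n_predictors=2)
print(table)
```

The same arithmetic applies to a SAS printout once Model is read for Regression and Error is read for Residual.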
Partial correlation values can be obtained as part of the regression analysis (by checking the Part and Partial Correlations option in the Statistics option box) or separately by choosing CORRELATE from the ANALYZE menu and then selecting the PARTIAL suboption. Using the latter method, the student should enter the two variables for which a partial correlation is being sought in the Variables box and the control variable(s) in the Controlling For box.
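For the case of a single control variable, the partial correlation can be computed directly from the three pairwise Pearson correlations. The Python sketch below illustrates that standard first-order formula. It is not a description of how SPSS computes the value, and the function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y, controlling for one variable z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up data in which z is uncorrelated with both x and y, so controlling
# for z leaves the correlation between x and y unchanged.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
z = [1, -1, -1, 1]
print(pearson(x, y), partial_correlation(x, y, z))
```

With more than one control variable, the formula is applied repeatedly or the value is obtained from regression residuals, which is one reason to let SPSS do the work.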
Multiple Linear Regression (with interaction): SPSS is not as user-friendly as SAS in dealing with interaction between explanatory variables in a multiple regression model. Two approaches, however, can be employed within SPSS for addressing interaction. The first approach requires the user to construct the interaction variable(s) within the SPSS data editor. This can be accomplished by selecting the COMPUTE option within the TRANSFORM menu. The student should name the interaction term in the Target Variable box and enter the mathematical formula for the interaction term in the Numeric Expression box (e.g., metro*povrate is the interaction formula for the explanatory variables metro and povrate). Click OK when finished and observe the appearance of a new column in the SPSS editor for the interaction term. This interaction term can then be entered into the multiple linear regression equation using the methods described above. In practice, this first method can be cumbersome and is best suited for cases where there are only a few cross-product (i.e., interaction) variables.
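The COMPUTE step described above does nothing more than multiply the two explanatory variables case by case. The Python sketch below shows the same operation on a small list of cases. The variable names metro and povrate come from the example above, while the function name add_interaction, the target name metro_pov, and the data values are made up for illustration.

```python
def add_interaction(cases, first, second, target):
    """Return copies of the cases with a cross-product variable added."""
    return [dict(case, **{target: case[first] * case[second]}) for case in cases]


# Hypothetical values for two cases.
cases = [
    {"metro": 40.0, "povrate": 10.0},
    {"metro": 75.5, "povrate": 20.0},
]
with_term = add_interaction(cases, "metro", "povrate", "metro_pov")
for case in with_term:
    print(case)
```

Each additional interaction requires another computed column, which is why this approach becomes cumbersome when many cross-product terms are needed.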
A second method for analyzing interaction within a multiple regression model requires the student to use the general linear model function rather than the multiple linear regression function within SPSS. This second method is well suited for handling multiple interaction terms, but it presents the output in a slightly different form than the multiple regression method and offers fewer options for data analysis. Click GENERAL LINEAR MODEL (GLM) in the ANALYZE menu and select the UNIVARIATE suboption. Enter the response variable into the Dependent Variable box and the explanatory variables into the Covariate(s) box. The student should note that the UNIVARIATE GLM function within SPSS can also be used for ANOVA and ANCOVA, for which quantitative variables are always entered in the Covariate(s) box and fixed, qualitative variables are entered in the Fixed Factor(s) box.
In order to test for interaction, click on the Model box and select the option labeled Custom. Under the Build Terms arrow select the option labeled Interaction. Select the terms that you would like included in the model by highlighting them in the Factors and Covariates box and then using the arrow button to move them to the Model box. Students should note that interaction terms are added to the model by highlighting two or more variables in the Factors and Covariates box and then clicking the arrow. The All Two Way, All Three Way, etc. options can also be used to facilitate the construction of interaction terms. Once you have finished entering the main effects and interaction variables into your model, click Continue and return to the main dialog box. In order to display parameter estimates for the model, select the Options box and check the Parameter Estimates option. Click Continue to return to the main dialog box and then OK to perform the regression analysis.
An example of an SPSS output for a multiple linear regression model with interaction is included in Appendix II. This example uses data from question 13 in chapter 11 of the Agresti and Finlay text. Again, the SPSS output differs somewhat from its SAS counterpart, though most of the information should be recognizable to the student in this course. The Corrected Model Type III Sum of Squares is equivalent to the Regression Sum of Squares in the SPSS multiple regression output and the Model Sum of Squares in SAS. Likewise, the Error Type III Sum of Squares is equivalent to the Residual Sum of Squares in the SPSS multiple regression output and the Error Sum of Squares in SAS. The Type III sum of squares for the interaction term (and the F and p values calculated from it) is used to test the null hypothesis that there is no interaction. In the case of a single interaction term, the t value for the interaction term can also be used to test the null hypothesis of no interaction.
One-Way ANOVA: For a one-way ANOVA in which there is a quantitative response variable and a single qualitative explanatory variable, click COMPARE MEANS in the ANALYZE menu and select the ONE-WAY ANOVA suboption. Place the quantitative response variable in the Dependent List box and the qualitative explanatory variable in the Factor box. By clicking on the Post-Hoc box, the student can select the LSD and/or Bonferroni confidence interval options, setting the desired significance level in the Significance Level box (the default value is .05). By clicking on the Options box, the student can elect to have SPSS provide Descriptive statistics and a test for Homogeneity-of-Variance. The test for homogeneity of variance, labeled the Levene Statistic on the SPSS printout, is used to check for equal variances across the groups of the explanatory variable, equal group variances being one of the assumptions of ANOVA. The Levene test reports both an F and a p value, which are interpreted as a test of the null hypothesis that the group population variances are equal. An example of an SPSS output for a one-way ANOVA is included in Appendix III, using data from question 1 in chapter 12 of the Agresti and Finlay text.
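As a check on the one-way ANOVA printout, the between-groups and within-groups sums of squares and the F value can be computed by hand. The Python sketch below shows the arithmetic, along with the usual Bonferroni adjustment of dividing the overall error probability by the number of pairwise comparisons. The function names and the scores are invented for illustration and are not the Appendix III data.

```python
def mean(values):
    return sum(values) / len(values)


def one_way_anova(groups):
    """Return (between_ss, within_ss, f) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    between_ss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    within_ss = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (between_ss / df_between) / (within_ss / df_within)
    return between_ss, within_ss, f


def bonferroni_alpha(alpha, n_groups):
    """Error probability per comparison when all pairs of groups are compared."""
    comparisons = n_groups * (n_groups - 1) // 2
    return alpha / comparisons


# Made-up scores for three groups of three cases each.
scores = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
between_ss, within_ss, f = one_way_anova(scores)
print(between_ss, within_ss, f)          # 42.0 6.0 21.0
print(bonferroni_alpha(0.05, len(scores)))
```

The p value still has to be looked up from the F distribution with the corresponding degrees of freedom, which SPSS reports directly.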
Factorial ANOVA and ANCOVA: SPSS uses the same GLM univariate procedure for handling both factorial ANOVA and ANCOVA. This is also the same procedure used for multiple linear regression with interaction, and thus the student may wish to review that section of this paper. Factorial ANOVA pertains to models with a quantitative response variable and two or more qualitative explanatory variables. ANCOVA is used with models that have a quantitative response variable, one or more quantitative explanatory variables, and one or more qualitative explanatory variables. To perform a factorial ANOVA or an ANCOVA, click on the GENERAL LINEAR MODEL option in the ANALYZE menu and select the UNIVARIATE suboption. Place the response variable in the Dependent Variable box. Qualitative, categorical variables should be placed in the Fixed Factor(s) box and quantitative variables in the Covariate(s) box. The student should be aware that if the categories of the qualitative variable have been converted to dummy variables in the SPSS data editor (i.e., each category of the qualitative variable is set up as a separate variable, with each case assigned a value of 1 or 0), then these dummy variables should be treated as quantitative data and placed in the Covariate(s) box. Further discussion on working with dummy variables is provided in the section on dummy variables below.
As already described in the section on multiple regression with interaction, interaction effects can be evaluated by clicking on the Model option in the main dialog box, selecting the Custom suboption, and then adding main effects and interaction variables into the model to be tested. By clicking on Options in the main dialog box, the student can select options for Descriptive Statistics and Parameter Estimates. The student can also create Bonferroni or LSD confidence intervals for the categorical variables. To do this, select the qualitative variable for which a confidence interval is desired and move it, using the arrow button, to the Display Means For box. Check the Compare Main Effects box and select the preferred confidence interval adjustment from the drop-down list. Indicate the desired significance level in the Significance Level box below. Click Continue to return to the main dialog box and then OK to perform the statistical analysis. An example of an SPSS output for an ANCOVA without interaction is included in Appendix IV. The data for this output was taken from Table 13.1 of the course web page, with income (in thousands of dollars) being the response variable, education the quantitative explanatory variable, and race the qualitative explanatory variable.
A Note on Dummies: In dealing with qualitative, categorical data, the student should be careful to observe how the data has been entered into the SPSS editor. Categorical data can be entered into the SPSS editor using one of two methods. First, categorical data can be entered as a single variable. For example, the variable race might be entered into a single column within the SPSS editor, with the values black, hispanic, and white used to indicate the race of the respondent in the cells within that column. If categorical data has been entered in this manner, then the student should treat this variable as qualitative for purposes of conducting statistical analyses with SPSS. Thus, the student conducting a one-way ANOVA should use the one-way ANOVA procedure outlined above rather than the linear regression procedure. Likewise, in using the general linear model procedure to conduct a factorial ANOVA or an ANCOVA, the student should enter the categorical variable in the Fixed Factor(s) box.
Second, categorical data can also be entered into SPSS using dummy variables. In this instance, the categories of the qualitative variable would be divided up into separate column variables (e.g., a separate column variable would be created for black and for hispanic), with a 1 or 0 used to indicate whether the respondent exhibited the column characteristic. If the data for a given qualitative variable has been entered into the SPSS editor using dummy variables, then the student should treat the set of dummy variables as quantitative data. Thus, the student can use the multiple linear regression procedure to examine the influence of the dummy set on the response variable. If the general linear model procedure is used instead, the student should enter the dummy variable(s) in the Covariate(s) box. The student should recall that the final category of a qualitative variable is not needed for conducting statistical analyses using dummy variables. This category will therefore likely be omitted in the SPSS data editor, and it should be omitted when entering categorical variables into the model using either the multiple regression or general linear model procedures.
In order to better understand the different methods for entering categorical data into the SPSS data editor, it might be helpful to examine Table 13.1 on the course web page. In this example, data for the variable "ethnic group" has been entered twice. First, it was entered as a single variable under the column heading "race". Second, it was also entered as separate dummy variables. A dummy variable ("z1") was constructed to identify respondents who were black. A second dummy variable ("z2") was constructed to identify respondents who were Hispanic. Of course, respondents who scored a zero on both dummy variables would be classified as white. Appendix V presents an example of a multiple regression output using dummy variables. This example uses the same data set as in Appendix IV but uses the race dummy variables rather than the race single category variable. The student should note that the parameter estimates, F value, p value, and individual t values are the same for the two outputs.
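The relationship between the single race variable and the dummy variables z1 and z2 can be sketched in a few lines of Python. The coding rule (z1 for black, z2 for Hispanic, white as the omitted category) follows the description above. The function name dummy_code and the five example cases are made up for illustration and are not the Table 13.1 data.

```python
def dummy_code(values, dummy_names):
    """Build 0/1 dummy variables from a single categorical variable.

    `dummy_names` maps each coded category to the name of its dummy variable.
    Any category left out of the mapping acts as the omitted (reference)
    category and scores 0 on every dummy.
    """
    return {
        name: [1 if value == category else 0 for value in values]
        for category, name in dummy_names.items()
    }


# Hypothetical cases; white is the omitted category.
race = ["black", "hispanic", "white", "white", "black"]
dummies = dummy_code(race, {"black": "z1", "hispanic": "z2"})
print(dummies["z1"])   # [1, 0, 0, 0, 1]
print(dummies["z2"])   # [0, 1, 0, 0, 0]
```

Entering z1 and z2 as covariates carries the same information as entering race as a fixed factor, which is why the two outputs agree.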
Repeated Measures: In order to conduct a repeated measures ANOVA, click on the GENERAL LINEAR MODEL option in the ANALYZE menu and select the REPEATED MEASURES suboption. In the Within-Subject Factor Name box, provide a label that characterizes the shared relationship between the groups (e.g., status). In the Number of Levels box, enter the number of groups to be compared. Click the Add button and then the Define button. The next screen asks you to define the within-subjects variables. Move each of the groups into the Within-Subjects Variables box. By clicking on the Options box, the student can select Descriptive Statistics, Parameter Estimates, and Confidence Intervals. The SPSS output for repeated measures ANOVA provides a number of statistical measures that go beyond the scope of the course (and this project). For the purpose of answering the questions in chapter 13 of the Agresti and Finlay text, the student should examine the table entitled Tests of Within-Subjects Effects and the values labeled Sphericity Assumed. Appendix VI provides an example of part of an SPSS output from a repeated measures ANOVA. The data in this example was taken from question 26 in chapter 12 of the Agresti and Finlay text.
Agresti, Alan and Barbara Finlay. 1997. Statistical Methods for the Social Sciences. 3rd Edition. Upper Saddle River, NJ: Prentice Hall. Pp. 658-666.
Martinez, Michael. 2000. Conversation with Author. Department of Political Science. University of Florida.
SPSS Inc. 1999. SPSS Base 10.0 Applications Guide. Chicago, IL: SPSS Inc. Pp. 117-213.