Data Analysis with IBM SPSS Statistics

Using explore to check subgroup patterns

While explore is useful for looking at the distribution of individual fields, it is particularly helpful for investigating patterns across subsets of the data. We'll look at an example of this approach next. Go back to the Explore dialog box; the HIGHEST YEAR OF SCHOOL COMPLETED field should still be in the upper Dependent List box (if not, add it). In the lower Factor List box, add REGION OF INTERVIEW and click on OK.

The descriptives produced by explore now contain a separate set of results for each of the nine regions used to group the states for the purposes of the survey. Values for New England (Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont) are shown first (see Figure 12) as this region is coded with the value 1 in the data.

This area of the US is relatively well educated, as can be seen from the mean (14.29) and median (14) values in the table:

By comparison, the West South Central region (Arkansas, Louisiana, Oklahoma, and Texas), which is coded 7 in the data, has a lower mean (12.91) and median (12) years of schooling:
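
For readers who want to see what the per-region descriptives amount to outside of SPSS, the following is a minimal sketch in Python using only the standard library. The records are hypothetical values invented for illustration (they are not the survey data), and the function name describe_by_group is an assumption of this sketch rather than anything SPSS provides. It groups the values by region code and reports the count, mean, and median for each group.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (region code, years of schooling) pairs; NOT the survey data.
records = [
    (1, 12), (1, 14), (1, 16), (1, 16),
    (7, 8), (7, 12), (7, 12), (7, 18),
]


def describe_by_group(rows):
    """Return the count, mean, and median of the values for each group code."""
    groups = defaultdict(list)
    for code, years in rows:
        groups[code].append(years)
    return {
        code: {"n": len(values), "mean": mean(values), "median": median(values)}
        for code, values in sorted(groups.items())
    }


summary = describe_by_group(records)
for code, stats in summary.items():
    print(code, stats)
```

The output lists the groups in order of their numeric code, which mirrors the way explore presents the region coded 1 first.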

The stem and leaf plot for the New England region (see the figure below) indicates that there are only two extreme values and a large proportion of individuals with 14 and 16 years of education:


The corresponding plot for the West South Central region, shown in the following figure, has 19 extreme values at the lower end (8 or fewer years of schooling) and another 19 extreme values at the higher end (18 or more years). It is also evident that in this area of the US, people very often finish their education after 12 years, when they complete high school:
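
The idea behind reading such a plot, namely counting how often each value occurs and spotting where the values pile up, can be sketched with a simple tally. This is not the stem and leaf layout that SPSS produces; it is a rough text approximation in Python, and the values below are hypothetical rather than taken from the survey.

```python
from collections import Counter

# Hypothetical years-of-schooling values for one region; NOT the survey data.
years = [8, 10, 12, 12, 12, 12, 12, 13, 14, 14, 16, 16, 18]


def tally(values):
    """Return one text row per distinct value: the value, its count, and a bar."""
    counts = Counter(values)
    return [
        f"{value:>2} {counts[value]:>3} {'*' * counts[value]}"
        for value in sorted(counts)
    ]


rows = tally(years)
print("\n".join(rows))

# The most frequent value shows where the distribution is concentrated.
most_common_value, most_common_count = Counter(years).most_common(1)[0]
print("most common:", most_common_value, "with", most_common_count, "cases")
```

In this made-up sample, the long bar at 12 corresponds to the kind of concentration at high school completion described above.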

The boxplot (following figure) included in the explore output provides an excellent visual depiction of the pattern across the groups and highlights potential areas to address in terms of the distribution of education. At a glance, one can see that five of the regions (New England, Middle Atlantic, South Atlantic, Mountain, and Pacific) have a similar pattern in terms of the median (14), the size of the box, and the small number of extreme values. By contrast, the West North Central and West South Central regions have a lower median value (12), a smaller box indicating a concentration of values just above the median, and several extreme values at both the top and bottom. These patterns are important because the variance across the groups involved in an analysis is assumed to be consistent and, when that is not the case, it can cause problems. The boxplot is a convenient means of comparing the variability of the subgroups in the data visually on a single page:
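
The quantities a boxplot summarizes can also be computed directly, which may help when comparing the spread of groups numerically. The sketch below is in Python with two hypothetical groups (not the survey data). It assumes the common convention of flagging values more than 1.5 box lengths beyond the quartiles, and it uses the standard library's "inclusive" quartile method, so the exact quartiles and flagged points may differ from those SPSS reports. The function name box_summary is an assumption of this sketch.

```python
from statistics import quantiles, variance

# Hypothetical years-of-schooling values for two groups; NOT the survey data.
wide_box = [10, 12, 12, 14, 14, 14, 16, 16, 18]
narrow_box = [8, 12, 12, 12, 12, 12, 13, 13, 20]


def box_summary(values):
    """Return quartiles, box length (IQR), flagged values, and sample variance."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # assumed 1.5 x IQR rule for flagging values
    high_fence = q3 + 1.5 * iqr
    flagged = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "flagged": flagged,
        "variance": variance(values),
    }


wide = box_summary(wide_box)
narrow = box_summary(narrow_box)
print("wide box:  ", wide)
print("narrow box:", narrow)
```

In this made-up example, the second group has a lower median, a much shorter box, and flagged values at both ends, yet its variance is larger than that of the first group. The box length and the variance measure spread differently, so it is worth looking at both when judging whether the groups are comparable.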

The vertical axis was modified to add more values. Chapter 5, Visually Exploring the Data, will discuss how to modify the charts produced by SPSS.