SOME NOTES ON STATISTICAL INTERPRETATION

Below I provide some basic notes on statistical interpretation. These are intended to serve as a resource for the Soci 380 data analysis. The information provided here is not exhaustive. There is more to learn about the assumptions, applications, and interpretation of these procedures. Further information can be obtained in statistics textbooks and statistics courses.

Crosstabs: Crosstab is short for cross-tabulation or cross-classification table. In its basic form it is a bivariate table. Usually the independent variable is represented by the columns and the dependent variable is represented by the rows. One can use variables at any level of measurement in a crosstab, but crosstabs are usually constructed using nominal or ordinal variables. Because interval/ratio variables tend to have many potential values, crosstabs are usually impractical for these levels of measurement. More complex multivariate crosstabs can also be constructed (e.g., where a third variable is controlled), but for the purposes of Soci 380 I suggest that you just focus on bivariate crosstabs.

The data in crosstabs are usually presented either as percentages or as frequencies. Percentages can be calculated for each cell as a proportion of: 1) the column total, 2) the row total, or 3) the overall total. In constructing a crosstabulation for a report you should make clear which of these types of percentages is being calculated. (This can often be done easily by providing a total percentage at the end of the row or column.) In providing a descriptive interpretation of results one can discuss the relative frequency or percentage of cases falling in particular cells. Usually this is done in reference to the column variable, e.g., 35% of women strongly agreed with statement X, while only 15% of men strongly agreed with statement X.
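As a minimal sketch (not part of the original notes), a column-percentage crosstab of this kind could be produced in Python with pandas. The variable names ("gender", "agreement") and the data below are hypothetical, not from the course dataset.

```python
import pandas as pd

# Hypothetical data: gender (independent variable) and agreement with statement X.
df = pd.DataFrame({
    "gender": ["Woman", "Woman", "Man", "Man", "Woman", "Man", "Woman", "Man"],
    "agreement": ["Strongly agree", "Agree", "Disagree", "Strongly agree",
                  "Strongly agree", "Disagree", "Agree", "Agree"],
})

# Independent variable in the columns, dependent variable in the rows;
# normalize="columns" expresses each cell as a percentage of its column total.
table = pd.crosstab(df["agreement"], df["gender"], normalize="columns") * 100
print(table.round(1))
```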
Chi-Square: Technically this is a “test of statistical independence”. That is, if two variables are unrelated then they are independent of one another. If not, they are dependent. Another way of thinking about this is that they are “associated”. Chi-square can be used with nominal and ordinal variables. If the significance value associated with the chi-square test is less than or equal to .05, then the test is deemed to be statistically significant and you can interpret the two variables in the test as being dependent, or associated. There are several limitations to the chi-square test. Two of these are: 1) the test does not tell you about the direction of an association (e.g., positive or negative), and 2) the test does not tell you about the strength of an association. From the chi-square statistic (and its related level of significance) all you can say is whether or not the variables are statistically associated. You can, however, try to interpret the percentages in the related crosstabulation. In Table 1, the chi-square is significant. This means that employment status and gender are statistically associated. The results in the crosstabulation suggest that men are more likely to be employed full-time.
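As a rough illustration (the counts below are hypothetical, not the Table 1 data), a chi-square test of independence could be run in Python with scipy as follows.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: rows are employment status (full-time, part-time),
# columns are men and women.
observed = [[60, 35],
            [20, 45]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")

# If p <= .05 we treat the two variables as statistically associated (dependent).
# Note the test says nothing about the direction or strength of the association;
# for that, inspect the percentages in the crosstab itself.
```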
Pearson’s Correlation: Pearson’s correlation is a bivariate measure of association for interval/ratio level variables. Pearson’s correlation ranges from -1 to +1. A correlation of 0 means that there is no linear statistical association between two variables. A correlation of 1 means that there is a perfect positive correlation (or linear association) between two variables. A correlation of -1 means that there is a perfect negative correlation between two variables. A correlation of .50 means that there is a moderately strong positive correlation between two variables. There is also an associated test of significance. If the significance value (p.) is ≤ .05, then the correlation is deemed to be statistically significant. In Table 2 the correlation between years of education and personal income is .42, and p. is < .01. Thus there is a significant, moderately strong positive correlation between education and income. (Another way of saying this is that there is a significant, moderately strong positive linear association between education and income.) In other words, people with higher levels of education tend to earn higher levels of income, and people with lower levels of education tend to earn lower levels of income.
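A minimal sketch of computing Pearson’s correlation and its significance test in Python with scipy; the education and income values below are made up for illustration and are not the Table 2 results.

```python
from scipy.stats import pearsonr

# Hypothetical data: years of education and personal income for ten people.
education = [12, 16, 14, 18, 11, 20, 13, 15, 17, 12]
income = [28000, 52000, 39000, 61000, 25000, 74000, 31000, 44000, 58000, 30000]

r, p = pearsonr(education, income)
print(f"r = {r:.2f}, p = {p:.4f}")

# A positive r means higher education tends to go with higher income;
# the correlation is treated as statistically significant when p <= .05.
```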
Difference in Means and t-test: When you wish to examine the relationship between an independent variable that is nominal (or ordinal) with two categories, and a dependent variable that is measured at the interval/ratio level, an appropriate procedure is to examine the difference in means and calculate a t-test. To see the direction of the difference in means, just examine the respective means for the two groups. For the t-test there is an associated significance level. If the significance level is ≤ .05, then the difference in means is statistically significant. For example, examine the third row of Table 3. This displays the mean personal income for women and men. Men made an average of $46,968 while women made an average of $24,268. This difference is statistically significant (p. ≤ .01). Thus you can conclude that (for this sample) men make more than women.
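A minimal sketch of an independent-samples t-test in Python with scipy; the two income samples below are hypothetical, not the Table 3 figures.

```python
from scipy.stats import ttest_ind

# Hypothetical personal income for two groups defined by a binary variable.
income_men = [46000, 51000, 39000, 55000, 48000, 43000]
income_women = [24000, 29000, 22000, 27000, 25000, 21000]

print(f"mean (men)   = {sum(income_men) / len(income_men):,.0f}")
print(f"mean (women) = {sum(income_women) / len(income_women):,.0f}")

t, p = ttest_ind(income_men, income_women)
print(f"t = {t:.2f}, p = {p:.4f}")

# If p <= .05, the difference between the two group means is statistically
# significant; the direction of the difference is read off the means themselves.
```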
Univariate Statistics: Frequencies and Percentages: Often it is useful to provide basic univariate statistics describing key variables. For nominal and ordinal variables this can be done by providing frequencies and percentages. (There are also a variety of other useful statistics that will not be discussed here.) Technically, you can also provide frequencies and percentages for interval/ratio variables, but it is usually not practical to do so because there are so many potential values. (Instead, such data are sometimes portrayed in graphs.) When you provide tables of frequencies and percentages you should provide totals. Also, if there are missing data you should indicate this in the table. In Table 4, the response category with the largest number of cases is “strongly agree”. Seven out of 20 people, or 35% of the sample, selected this response.
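A minimal sketch of building a univariate frequency and percentage table, with a total row, in Python with pandas; the 20 responses below are hypothetical, not the Table 4 data.

```python
import pandas as pd

# Hypothetical responses from 20 people to a single Likert-type item.
responses = pd.Series(
    ["Strongly agree"] * 7 + ["Agree"] * 5 + ["No opinion"] * 2 +
    ["Disagree"] * 3 + ["Strongly disagree"] * 3
)

freq = responses.value_counts()
pct = responses.value_counts(normalize=True) * 100

# Combine frequencies and percentages, and add a total row.
table = pd.DataFrame({"Frequency": freq, "Percent": pct.round(1)})
table.loc["Total"] = [freq.sum(), 100.0]
print(table)
```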
Univariate Statistics: Means, Standard Deviations, and N: For interval/ratio level variables, one way of summarizing data is to provide means, standard deviations, and N. The mean is the arithmetic average of the data. The standard deviation is a measure of how dispersed the data are. The N is the number of (valid) cases that were used to calculate these statistics. In row 2 of Table 5 we see that for this sample the mean years of education was 15.36, and the standard deviation was 2.17. These statistics were calculated from 183 cases. Assuming the data are roughly normally distributed, the standard deviation implies that about 68% of the cases fell within one standard deviation of the mean (between 13.19 and 17.53), and about 95% of all the cases fell within two standard deviations (between 11.02 and 19.70).
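A minimal sketch of computing the mean, standard deviation, and N, along with the rough 68% and 95% intervals described above, in Python; the years-of-education values below are made up, not the Table 5 sample.

```python
import statistics

# Hypothetical years-of-education data.
years_education = [12, 16, 14, 18, 11, 20, 13, 15, 17, 12, 16, 15]

n = len(years_education)
mean = statistics.mean(years_education)
sd = statistics.stdev(years_education)  # sample standard deviation

print(f"N = {n}, mean = {mean:.2f}, SD = {sd:.2f}")

# If the distribution is roughly normal, about 68% of cases fall within
# one SD of the mean and about 95% fall within two SDs.
print(f"~68% of cases: {mean - sd:.2f} to {mean + sd:.2f}")
print(f"~95% of cases: {mean - 2 * sd:.2f} to {mean + 2 * sd:.2f}")
```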
Percentage Tables for Multiple Items: Sometimes it is useful to provide tables that summarize multiple variables at the same time. Table 2 does this for some correlations. Table 5 does this for means, standard deviations, and Ns. When you have Likert-type scales it is sometimes useful to present data in the form of a matrix with the response categories across the top (the columns) and the different questionnaire items down the side (the rows). Table 6 does this for the political efficacy items. For example, for item #4, 35% strongly disagreed, 15% disagreed, 0% had no opinion, 20% agreed, and 30% strongly agreed. When the data are displayed this way we can try to discern patterns by comparing across the items. In this particular instance the responses look fairly similar across items, with many responses in the extreme categories and fewer responses in the middle of the scale (especially for “no opinion”).
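A minimal sketch of building such an item-by-category percentage matrix in Python with pandas. The item labels and counts are hypothetical; the row for item 4 was chosen only so it reproduces the percentages quoted above, and the other rows are invented.

```python
import pandas as pd

categories = ["Strongly disagree", "Disagree", "No opinion", "Agree", "Strongly agree"]

# Hypothetical response counts per item (rows) and response category (columns).
counts = pd.DataFrame(
    [[6, 4, 1, 4, 5],
     [7, 2, 0, 5, 6],
     [6, 3, 1, 4, 6],
     [7, 3, 0, 4, 6]],
    index=["Item 1", "Item 2", "Item 3", "Item 4"],
    columns=categories,
)

# Convert each row to percentages so the items can be compared directly.
percentages = counts.div(counts.sum(axis=1), axis=0) * 100
print(percentages.round(0))
```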