## What type of data analysis do you need?

Before beginning an analysis, you must first answer a few basic questions about what approach is best for your data. Review your research question(s) and then select from the following very broad statements to see where you might start your analysis.

#### t Test

A t test is a good method for determining whether the means of two groups differ. For instance, you might want to know whether weight differs by gender. The test is most trustworthy when the group weights (or whatever your scale variable is) are roughly normally distributed and the variance is roughly the same in both groups. It is also very important that the two groups be independent of each other, as with gender. If you are investigating whether mean GPA differs between two schools, and students take classes from both schools at different times, the groups overlap and an independent-samples t test will not give a valid answer.
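As a minimal sketch of the calculation described above, the following computes the pooled-variance (Student's) t statistic using only Python's standard library. The function name `independent_t` and the weight values are hypothetical, invented for illustration. The sketch stops at the t statistic and degrees of freedom; to get a p-value you would compare the statistic against the t distribution, for example with a statistics package such as SciPy's `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for Student's pooled-variance t test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooling assumes both groups have roughly the same variance.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_error, df


# Hypothetical weights (kg) for two unrelated groups of people.
group_a = [82.0, 77.5, 90.1, 85.3, 79.8]
group_b = [68.2, 71.0, 65.4, 74.9, 69.5]

t_stat, df = independent_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {df}")
```

The larger the absolute value of t relative to its degrees of freedom, the stronger the evidence that the two group means differ.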

#### Paired t Test

Use a paired t test to compare two related measurements of the same scale variable. For instance, suppose you perform an experiment in which participants first provide their weight, then go on a test diet, and then provide their weight a second time. You would use a paired t test to find out whether individuals' weights have, on average, changed from Time 1 to Time 2.

Another example would be if you asked teachers to rate two different classroom situations, one in a classroom with windows and one in a classroom without windows. You could use a paired t test to find out whether respondents preferred one situation over the other by comparing their two ratings. In fact, in a situation like this you could not use a standard (between-groups) ANOVA or an unpaired t test, because the two ratings are not independent of each other: they are provided by the same people.
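The paired test works on the per-person differences rather than on the two sets of measurements separately. Here is a small sketch of that idea using only the standard library; `paired_t` and the before/after weights are hypothetical names and values chosen for illustration. A library routine such as SciPy's `scipy.stats.ttest_rel` would also report a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, degrees of freedom) for a paired t test of second - first."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # Each difference comes from one participant measured twice.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    std_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / std_error, n - 1


# Hypothetical weights (kg) for the same five participants.
weight_time1 = [82.0, 90.5, 77.3, 95.0, 68.4]
weight_time2 = [80.1, 88.0, 77.9, 91.2, 67.0]

t_stat, df = paired_t(weight_time1, weight_time2)
print(f"t = {t_stat:.2f}, df = {df}")
```

A negative t here means weights were lower, on average, at Time 2 than at Time 1.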

#### ANOVA

Use analysis of variance (ANOVA) if the following are true about your data:

- You want to know whether a scale variable has different mean values for different categories of a nominal variable
- The residuals (each value minus its category mean), not the raw values, are roughly normally distributed within each category of the nominal variable
- The scale variable has roughly the same variance in each of the categories

Example: Use ANOVA if you want to know whether people from different countries have different average heights.

- Height is the dependent variable, the variable you want to know about. It is a scale (interval-level) variable: a number-based variable for which any value is possible (sometimes any value within a certain range).
- Country is the independent variable, the nominal variable with three or more categories.
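To make the idea concrete, the sketch below computes the one-way ANOVA F statistic by comparing variation between category means with variation within categories. It uses only the standard library; the function name `one_way_anova_f` and the height values are hypothetical. Statistics packages (for example SciPy's `scipy.stats.f_oneway`) perform the same calculation and also return a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical heights (cm) sampled in three countries.
heights_by_country = [
    [170.2, 175.1, 168.4, 180.0],
    [165.3, 162.8, 170.5, 167.1],
    [178.9, 182.3, 176.0, 181.4],
]

f_stat, df_between, df_within = one_way_anova_f(heights_by_country)
print(f"F = {f_stat:.2f}, df = ({df_between}, {df_within})")
```

An F value well above 1 suggests that the category means differ by more than the within-category spread alone would explain.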

#### Correlation or Regression

Use correlation or regression analysis to explore whether higher (or lower) values in one scale variable go along with higher (or lower) values in another scale variable. Correlation describes how strongly the two variables move together; regression estimates how much one changes, on average, when the other changes. You can also use binary (yes/no, male/female) variables in correlation and regression analysis, but interpretation of the results is slightly different.

#### Multiple Regression

Use multiple regression (sometimes loosely called multivariate regression) if you have one outcome variable (some people say "dependent variable") with normally distributed residuals and more than one predictor variable. The predictors may be scale, ordinal, or binary. Using multiple regression you can answer the question, "Which of my predictor variables are associated with the outcome variable, and to what extent?" You can also find out whether the predictor variables together explain most of the variation in the outcome variable, just some of it, or very little of it.

For instance, you might be looking at GPA and wonder how well gender and attendance predict it. You might find that both variables predict GPA, but that gender explains only 1% of the variation in GPA while attendance explains 20%. Knowing that, you might go looking for other variables that account for the remaining 79% of the variation in GPA, if you can.
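The sketch below shows one way a regression with several predictors can be fitted by ordinary least squares, using only the standard library. It solves the normal equations directly, which is fine for a small illustration but is not how statistics packages do it in practice. The helper names `solve_linear_system` and `fit_ols`, and the student data, are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular (for example, when
    two predictors are perfectly collinear).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(predictor_rows, outcome):
    """Return (coefficients with intercept first, R squared)."""
    x = [[1.0] + list(row) for row in predictor_rows]  # add intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, outcome)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)

    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in x]
    y_mean = sum(outcome) / len(outcome)
    ss_residual = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_total = sum((y - y_mean) ** 2 for y in outcome)
    return coefs, 1 - ss_residual / ss_total


# Hypothetical students: (gender coded 0/1, attendance in percent) and GPA.
students = [(0, 70), (1, 85), (0, 90), (1, 60), (0, 80), (1, 95)]
gpa = [2.8, 3.3, 3.6, 2.5, 3.1, 3.7]

coefs, r_squared = fit_ols(students, gpa)
print("intercept, gender, attendance:", [round(c, 3) for c in coefs])
print(f"R squared = {r_squared:.3f}")
```

R squared is the share of variation in the outcome that the predictors explain together, which corresponds to the percentages discussed in the GPA example.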

#### Chi Square Test of Independence

Use the chi square test of independence to find out whether two categorical variables are related to each other. For instance, you might want to find out whether certain diseases are more common in some countries than in others. The two categorical variables are disease type and country. The chi square test of independence can indicate whether country and disease type are related to each other, that is, whether the mix of diseases differs from one country to another.
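As a minimal sketch, the chi square statistic can be computed from a table of counts by comparing each observed count with the count expected if the two variables were unrelated. The function name `chi_square_independence` and the counts below are hypothetical. A library routine such as SciPy's `scipy.stats.chi2_contingency` would also supply a p-value.

```python
def chi_square_independence(table):
    """Return (chi square statistic, degrees of freedom) for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical case counts: rows are countries, columns are disease types.
cases = [
    [30, 12, 8],
    [18, 25, 7],
    [10, 14, 26],
]

chi2, df = chi_square_independence(cases)
print(f"chi square = {chi2:.2f}, df = {df}")
```

A statistic near zero means the observed counts look much like what independence would predict; larger values point toward a relationship between the two variables.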