Today we return to the issue of comparing central tendency. Often, we want to compare more than one pair of means at a time. We may also wish to compare means across many groups when we control for one or more factors. Even a regression can be seen as asking whether varying the independent variable has any effect on the mean value of the dependent variable. All of these situations lend themselves to an **ANOVA**, or **AN**alysis **O**f **VA**riance. Despite its name, ANOVA is usually applied to understanding differences in central tendency. It gets its name because it uses *variance* to address questions about the *mean*.

A thought model helps to understand how an ANOVA works. Suppose that we have a factor that lets us divide our data into several groups (these groups are often called *treatments*). For example, we might have grown plants under several types of fertilizer; fertilizer would be considered a factor, and each type of fertilizer would represent a group (treatment). Factors are categorical variables, in other words, nominal or ordinal variables.

Let’s also suppose that all of the groups come from populations with the same variance, although the means of those populations might differ. Imagine comparing the variation within each group to the total variation when all of the groups are combined. If the means of the individual groups are equal, the total variation will be about the same as the variation within any one group. However, if at least one of the groups has a different mean, the total variance will be greater than the variance within any one of the groups. This is essentially what an ANOVA does: by comparing variance within and among groups, it can test whether the means of the groups differ. An F-test is used to compare the variances, and that comparison is the core of an ANOVA. Specifically, an ANOVA tests whether the variance among the groups is greater than the variance within the groups.

Another way to view this problem is that we could **partition variance**, that is, we could divide the total variance in our data into the different components that produce that variation. For example, some of the variance must be derived from variation among replicates within each group, and the rest of the variance must come from variation among the groups. This is an incredibly important concept within statistics, and for scientists, it is a useful way of looking at the world. When you see variation, you should ask what causes that variation.
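This partitioning can be sketched in a few lines of R. The numbers below are made up for illustration (they are not data from this chapter), but the partition they demonstrate is exact: the among-group and within-group sums of squares always add up to the total sum of squares.

```r
# Three hypothetical groups of measurements (illustrative values only)
x <- c(4.1, 3.8, 4.5,  6.2, 5.9, 6.4,  5.0, 5.3, 4.8)
group <- factor(rep(c("A", "B", "C"), each=3))

grandMean <- mean(x)
groupMeans <- tapply(x, group, mean)
n <- tapply(x, group, length)

# Total variation: every observation vs. the grand mean
ssTotal <- sum((x - grandMean)^2)

# Among-group (explained): group means vs. the grand mean, weighted by group size
ssAmong <- sum(n * (groupMeans - grandMean)^2)

# Within-group (unexplained): each observation vs. the mean of its own group
ssWithin <- sum((x - groupMeans[group])^2)

# The partition is exact: ssAmong + ssWithin equals ssTotal
all.equal(ssAmong + ssWithin, ssTotal)
```

Whatever values you substitute for x, the two components will still sum to the total, which is what lets an ANOVA ask how much of the variation each source accounts for.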

An ANOVA makes three assumptions. First, each population must be randomly sampled. Second, each population must be normally distributed. Third, all of the populations must have the same variance, that is, they must be **homoscedastic**.

The null hypothesis of an ANOVA is that the means of all of the populations are equal, with the alternative hypothesis being that at least one of the means is different.

The simplest type of an ANOVA is called a one-way ANOVA. We use a one-way ANOVA when we have a sample that is grouped by a single factor. For example, we may have measured weight percent strontium on a series of rocks, and we have categorized them by rock type: basalt, andesite, rhyolite. We would want to test whether the mean weight percent strontium is the same for all three rock types. In a one-way ANOVA, we therefore have two potential sources of variation in our data. The variation among our groups is called the explained variation, because rock type will be the factor that accounts for this variation. We also have the variation within our groups, that is, among our replicates within each group, which is called the unexplained variation because we cannot attribute it to the factor.

ANOVA results are presented in a standard table format, and you should become comfortable with interpreting any ANOVA table you come across.

```
Analysis of Variance Table

Response: strontium
          Df  Sum Sq Mean Sq F value Pr(>F)
rocktype   2  2.8350  1.4175  1.4392 0.2753
Residuals 12 11.8189  0.9849
```

The first column in an ANOVA table represents the source of variation, with a row for the explained variation (the among-group variation, also called among-treatments variation), a row for the unexplained variation (the within-group variation, also called among-replicates variation), and a bottom row for the total variation. R omits this bottom row for the total variation.

The second column is the degrees of freedom. In a one-way ANOVA, the degrees of freedom for the among-group (explained) variation is m-1, where m is the number of groups. The degrees of freedom for the unexplained (within-group) variation is N-m, where N is the total number of observations; if all the groups have the same sample size n, then N equals m × n. The total degrees of freedom is the sum of the first two rows, or N-1. In the table above, there are m = 3 rock types and N = 15 observations, giving 2 among-group and 12 within-group degrees of freedom.

The third column is the sum of squares for each source of variation. A sum of squares is the sum of squared deviations from a mean. For the among-groups row, the deviations are of each group's mean from the grand mean of all the data, weighted by that group's sample size. For the within-groups row, the deviations are of each observation from the mean of its own group. The sums of squares in the first two rows add up to the total sum of squares.

The fourth column is the mean squares, calculated by dividing the sum of squares for each row by the degrees of freedom for that row. In other words, this is a column of variances. Therefore, the variance in the first row is the among-group variance, and the variance in the second row is the within-group variance.

The fifth column is the F-ratio, with the among-group variance in the numerator and the within-group variance in the denominator. If there are no differences among groups, the among-group variance will be an estimate of the within-group variance, so their ratio ought to be close to one. If the means of the groups are different, then the among-group variance will grow and the ratio will exceed one. In an ANOVA, the F-test is a one-tailed (right-tailed) test because we expect the ratio to be close to one if the null hypothesis is true and greater than one if the null hypothesis is false.

R (and some other programs) adds a sixth column that lists the p-value for the F-ratio.
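Every column of the table can be reproduced by hand, which is a good way to check your understanding of it. The sketch below uses hypothetical measurements (not the strontium data above) to calculate the degrees of freedom, sums of squares, mean squares, F-ratio, and p-value, and then prints R's own table for comparison:

```r
# Illustrative data: 5 hypothetical measurements in each of 3 groups
y <- c(2.1, 3.0, 2.6, 3.4, 2.8,
       3.9, 3.1, 3.6, 4.2, 3.3,
       2.9, 2.2, 3.1, 2.5, 2.7)
g <- factor(rep(c("basalt", "andesite", "rhyolite"), each=5))

m <- nlevels(g)        # number of groups
N <- length(y)         # total number of observations
dfAmong  <- m - 1      # degrees of freedom, explained variation
dfWithin <- N - m      # degrees of freedom, unexplained variation

groupMeans <- tapply(y, g, mean)
ssAmong  <- sum(tapply(y, g, length) * (groupMeans - mean(y))^2)
ssWithin <- sum((y - groupMeans[g])^2)

msAmong  <- ssAmong / dfAmong     # among-group variance (mean square)
msWithin <- ssWithin / dfWithin   # within-group variance (mean square)

Fratio <- msAmong / msWithin
pValue <- pf(Fratio, dfAmong, dfWithin, lower.tail=FALSE)  # right-tailed

# The same numbers appear in R's own ANOVA table:
anova(lm(y ~ g))
```

Comparing your hand-calculated values against the output of anova() row by row makes the structure of the table concrete.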

Here is an example of a one-way ANOVA in R. First, import the data, then run head() to see the variables and the set-up of the data. Finally, attach() the data frame to simplify accessing the variables.

airQuality <- read.table(file='airQuality.txt', header=TRUE, sep=',')

head(airQuality)

attach(airQuality)

We should always visualize our data before beginning, partly so that we can evaluate the assumptions of the test. In this case, those assumptions are random sampling, normally distributed data within each group, and equal variances among the groups. A strip chart is a good way of doing this.

stripchart(pollutant ~ city, pch=16)

Calling it this way will create a row of pollutant values corresponding to each city. Read pollutant ~ city as pollutant *as a function of* city.

Notice in the plot that the data at Elberton might be left-skewed, and this might be cause for caution, though the sample size is small. The variation at Elberton may be less than at Athens, so we should test for homoscedasticity. We therefore precede our ANOVA with a Bartlett’s test to make sure that the variances of each group are equal.

bartlett.test(pollutant, city)

The Bartlett test is not significant (i.e., the variances are not statistically different). Seeing that the assumptions appear valid (random sampling, normal distribution, equal variances), we run the ANOVA with a combination of the anova() and lm() functions. The lm() function stands for *linear model*, and we will return to this when we cover regression.

anova(lm(pollutant ~ city))

The results are statistically significant, meaning that we can reject the null hypothesis that the means of all groups are equal. At least one mean is different, and to determine which, we can perform a t-test on each pair of cities (i.e., *pairwise t-tests*). The pairwise.t.test() function simplifies what could otherwise be a tedious operation by running all possible t-tests in one command. The p.adj="bonferroni" argument applies a Bonferroni correction, which guards against the inflated type I error rate that comes from performing many tests.

pairwise.t.test(pollutant, city, p.adj="bonferroni")

A second way to perform an ANOVA is with the aov() function, which has a syntax similar to that of the lm() function. One advantage of aov() over anova() is that, if you find a significant difference in your ANOVA, you can run a simpler test of pairwise differences in means called Tukey's range test. Tukey's test is also known as Tukey's honest significance test, Tukey's honestly significant difference test, or just Tukey's HSD. In R, you can run it with the TukeyHSD() command.

myANOVA <- aov(pollutant ~ city)

summary(myANOVA)

TukeyHSD(myANOVA, "city", ordered=TRUE)

Finally, a third way of running a one-way ANOVA is the Welch test, which relaxes the assumption of equal variances in all of the groups. The Welch test is run with the oneway.test() command, using the same formula syntax as the aov() command.

oneway.test(pollutant ~ city)

We use a two-way ANOVA when we have two factors that categorize our sample, such as rock type and geographic region. In a two-way ANOVA, we have three potential sources of variation. First, there is variation caused by the first factor (rock type), which is one source of explained variation. Second, there is variation caused by the second factor (geographic region), another source of explained variation. Finally, there is variation within each combination of factors, which is the unexplained variation. A two-way ANOVA makes the same assumptions as a one-way ANOVA.

There are two versions of the two-way ANOVA. The simple version assumes that there is no interaction effect, that variation in one factor (e.g., rock type) does not change with respect to the other factor (e.g., geographic region). The complicated version allows for interactions between the factors, such as rock type having a different effect in some regions than in others. You can test for interactions between factors only if you have replicates for each of your factor combinations, that is, you must have replicates for every combination of region and rock type.

For a two-way ANOVA, you need to have a balanced design, that is, the same sample size for each factor and each combination of factors. Results from unbalanced designs can be hard to interpret and you should always plan your sampling to achieve a balanced design. Balanced design is less of an issue for one-way ANOVA, but it is critical for two-way and multiway ANOVA.

The two-way ANOVA table is similar to a one-way ANOVA table, but it has a row for each source of explained variation. The only substantial difference is that there are two F-ratios; each has an among-group variance in the numerator and the within-group variance in the denominator. If there is an interaction term, it will be shown on its own line. Each F-ratio has a corresponding p-value, because each source of variation is tested separately. Note that in the example below, the city factor is statistically significant, but the source factor is not. R uses a consistent scheme of asterisks to indicate statistical significance, and these codes are listed at the bottom of the ANOVA table.

```
Analysis of Variance Table

Response: pollutant
          Df Sum Sq Mean Sq F value    Pr(>F)
city       2 30.012  15.006 12.1849 0.0001163 ***
source     1  0.511   0.511  0.4153 0.5239050
Residuals 32 39.409   1.232
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Running a two-way ANOVA is similar to a one-way ANOVA, except that the linear model is that pollutant is a function of city and source. The plus sign indicates that there is no interaction term. Read this model as pollutant as a function of city *and* source, not city *plus* source.

anova(lm(pollutant ~ city + source))

The interaction effect can be investigated by replacing the plus sign with an asterisk. Read this model as pollutant as a function of city *and* source *and their interaction*, not as city *times* source.

anova(lm(pollutant ~ city * source))

The functions aov() and TukeyHSD() will also work with a two-way ANOVA.

There are also ANOVAs for more than two factors, called multiway ANOVAs.

You can perform an ANOVA on a regression, with the regression being the explained source of variation and scatter around the line being the unexplained source. You can think of this as a variant of a one-way ANOVA in which, instead of discrete groups, the variation is a function of a continuously varying independent variable.

The ANOVA for a regression will report an R-squared value, called the coefficient of determination. This is the ratio of the regression sum of squares to the total sum of squares; the latter is not actually shown in the table. It is also equal to the square of the correlation coefficient. The ANOVA also reports an adjusted R-squared. Because R-squared always increases as you add regressors (independent variables), an increase in R-squared does not by itself tell you that the additional variables improve the model. The adjusted R-squared includes a penalty for additional parameters, so an increase in it is informative: it rises only if an additional parameter explains more variation than expected by chance. Report R-squared when you want to convey the percentage of explained variation; report the adjusted R-squared when comparing models that have different numbers of parameters.
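Both quantities can be recovered from the regression's ANOVA table. The sketch below uses simulated data (not data from this chapter) to compute R-squared and adjusted R-squared by hand; both match the values that summary() reports for the fitted model:

```r
# Illustrative regression on simulated data
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)

model <- lm(y ~ x)
a <- anova(model)   # regression (explained) vs. residual (unexplained) rows

# R-squared: regression sum of squares over total sum of squares
ssRegression <- a$"Sum Sq"[1]
ssResidual   <- a$"Sum Sq"[2]
r2 <- ssRegression / (ssRegression + ssResidual)

# Adjusted R-squared penalizes extra parameters
# (n observations, k regressors; here k = 1)
n <- length(y)
k <- 1
r2adj <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Both match summary(model)$r.squared and summary(model)$adj.r.squared
```

The penalty term (n - 1)/(n - k - 1) grows with each added regressor, which is why the adjusted R-squared can fall when an uninformative variable is added even though the raw R-squared rises.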

ANOVA is a common statistical test, especially in ecology, but there is a growing backlash against its use because it reports only a p-value: there is no parameter estimation and no estimation of uncertainty in the parameter. As we have seen, statistically significant results can be achieved simply through large sample size, even when the effect size is small. Many ecologists eschew the use of ANOVA in favor of methods that estimate parameters and their uncertainty. But not all ecologists agree with this, so tread carefully.

Like many non-parametric tests, the Kruskal-Wallis and Friedman tests are performed on ranks and are therefore far less influenced by outliers. The use of ranks, however, comes at some cost in power. These tests assume random sampling but make no assumptions about how the data are distributed.

The Kruskal-Wallis test is used for one-way analysis of variance. It is performed in R with the kruskal.test() function, with a syntax much like that used in a regression.

kruskal.test(pollutant ~ city)

Two-way non-parametric analysis of variance is performed with the friedman.test() function. No replicates are permitted with the Friedman test. Here is an example of how to perform a Friedman test in R, although it returns an error with this particular data set because of the replicates.

friedman.test(pollutant ~ city | source)