Introduction

Analysis of Variance (ANOVA) is a statistical procedure for testing whether the null hypothesis can be accepted when three or more population means are involved. ANOVA is a tool that splits the observed aggregate variability in the data into a systematic component, attributable to the factors under study, and a random component.

ANOVA was developed by the statistician Ronald Fisher and is based on the Law of Total Variance. It uses the F-test to determine whether the difference between the population means is significant.

Two-way ANOVA

The two-way ANOVA tests hypotheses about two factors (variables) at once across the given populations. We use two-way ANOVA when we want to jointly test the effects of two factors (i.e., both column-wise and row-wise) on a dependent variable.

For example, suppose a researcher wishes to test the effects of two different types of plant food and two different types of soil on the growth of certain plants. The two independent variables are the type of plant food and the type of soil, while the dependent variable is plant growth. Other factors, such as water, temperature, and sunlight, are held constant.

Two-way ANOVA follows a randomized block design, in which the experimental units are grouped into blocks and the treatments are assigned at random within each block. This minimizes the systematic error involved in concluding whether a significant difference exists.

Algorithm

The following are the steps to use two-way ANOVA to test for a significant difference among the given populations:

Step 0: Initialize the null hypothesis and the alternative hypothesis.

Step 1: Find the total number of observations, N.

Step 2: Calculate the total of all observations, T.

Step 3: Find the correction factor, T^2/N.

Step 4: Calculate the total sum of squares, SST.

Step 5: Calculate the column sum of squares, SSC.

Step 6: Calculate the row sum of squares, SSR.

Step 7: Calculate the error sum of squares, SSE = SST – SSC – SSR.

Step 8: Find the degrees of freedom between columns as well as between rows, and for the error.

Step 9: Calculate the mean sum of squares MSC (for column), MSR (for row), and MSE (for error).

Step 10: Calculate the F-ratios and compare them with the table values of F to infer whether the null hypothesis can be accepted or rejected.

Note: In steps 4, 5, and 6, we must subtract the correction factor from the corresponding sum of squares.
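For reference, the sums of squares in steps 4 through 7 expand to the following standard computational formulas, where C is the number of columns, R the number of rows, C_j the total of the j-th column, and R_i the total of the i-th row:

SST = (sum of the squares of all observations) – T^2/N
SSC = (C_1^2 + C_2^2 + ... + C_C^2)/R – T^2/N
SSR = (R_1^2 + R_2^2 + ... + R_R^2)/C – T^2/N
SSE = SST – SSC – SSR

The mean sums of squares in step 9 are then MSC = SSC/(C – 1), MSR = SSR/(R – 1), and MSE = SSE/((C – 1)(R – 1)).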

Implementation

Let us consider an example where five breeds of cattle are kept on three different kinds of nutritional diets. The values given below indicate the gain in weight by the cattle:

              Diet 1    Diet 2    Diet 3
  Breed 1       30        26        38
  Breed 2       24        29        28
  Breed 3       33        24        35
  Breed 4       36        31        30
  Breed 5       27        35        33

Using two-way analysis of variance, we can find out whether there is a relation between the cattle breeds and the diet types. We can initialize our null hypotheses as the following two statements:

1) There is no significant difference between diet types.

2) There is no significant difference between cattle.

The alternative hypothesis will conclude otherwise.

Note: Before tabulating the data and finding its sums of squares, it is advisable to use the coding method. In the coding method, we subtract an arbitrary number from every value, which makes the subsequent arithmetic much simpler without changing the final answer. The following table shows the coded values, their row and column totals, and the column-wise sums of squares, after subtracting 30 from each observation (since most of the values are close to 30):

                    Diet 1    Diet 2    Diet 3    Row total
  Breed 1              0        -4         8          4
  Breed 2             -6        -1        -2         -9
  Breed 3              3        -6         5          2
  Breed 4              6         1         0          7
  Breed 5             -3         5         3          5
  Column total         0        -5        14       T = 9
  Sum of squares      90        79       102
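As a small illustration of the coding step, here is a minimal Kotlin sketch, assuming the raw gains are stored column-wise (one DoubleArray per diet) as in the table above. The helper applyCoding and the names rawGains and codedGains are introduced here only for illustration:

// Subtract a constant from every observation (the coding method).
// Shifting all values by a constant leaves the sums of squares about the mean,
// and therefore the F-ratios, unchanged.
fun applyCoding(data: Array<DoubleArray>, k: Double): Array<DoubleArray> =
    Array(data.size) { i -> DoubleArray(data[i].size) { j -> data[i][j] - k } }

val rawGains = arrayOf(
    doubleArrayOf(30.0, 24.0, 33.0, 36.0, 27.0),   // diet 1
    doubleArrayOf(26.0, 29.0, 24.0, 31.0, 35.0),   // diet 2
    doubleArrayOf(38.0, 28.0, 35.0, 30.0, 33.0))   // diet 3
val codedGains = applyCoding(rawGains, 30.0)       // yields the coded table above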

It is evident that using the coding method on this data makes our calculations much simpler. We can now follow the next steps of the algorithm:

Step 1: N = number of rows (R) × number of columns (C), i.e., 5 × 3 = 15.

Step 2: T = 0 + (–5) + 14 = 9 (the sum of the coded column totals).

Step 3: Correction factor = T^2/N = 81/15 = 5.4.

Step 4: SST = 90 + 79 + 102 – 5.4 = 265.6.

Step 5: SSC = (0)^2/5 + (–5)^2/5 + (14)^2/5 – 5.4 = 38.8.

Step 6: SSR = (4)^2/3 + (–9)^2/3 + (2)^2/3 + (7)^2/3 + (5)^2/3 – 5.4 = 52.93.

Step 7: SSE = 265.6 – 38.8 – 52.93 = 173.87.

Step 8: Degrees of freedom between the rows = R – 1 = 4.

Degrees of freedom between the columns = C – 1 = 2.

Error degrees of freedom = (C – 1) × (R – 1) = 8.

We can tabulate these results to easily continue with steps 9 and 10:

  Source of variation        Sum of squares   Degrees of freedom
  Between columns (diets)         38.80               2
  Between rows (cattle)           52.93               4
  Error                          173.87               8
  Total                          265.60              14

From the table above:

Step 9: MSC = 38.8/2 = 19.4, MSR = 52.93/4 = 13.23, and MSE = 173.87/8 = 21.73.

Step 10: The F-ratio is formed with the larger mean sum of squares in the numerator. Since MSE is larger than both MSC and MSR here:

F-ratio between columns = MSE/MSC = 1.12.

F-ratio between rows = MSE/MSR = 1.64.

With MSE in the numerator, the degrees of freedom are (8, 2) for the column ratio and (8, 4) for the row ratio. The table value of F for df = (8, 2) at 5% significance is 19.37, and for df = (8, 4) it is 6.04. The F-test table values for various levels of significance can be found in standard statistical tables.

Since the F-ratio in both cases is less than the corresponding table value, we accept the null hypothesis. This means that there is no significant difference between the cattle breeds or between the diet types.

Let us try to code the example above by creating a function that can return both of the final F-ratio values.

				
fun two_anova(arr: Array<DoubleArray>): DoubleArray {
    val c = arr.size              // number of columns (treatments, e.g. diets)
    val r = arr[0].size           // number of rows (blocks, e.g. cattle breeds)
    val n = r * c                 // total number of observations, N
    var t = 0.0                   // grand total, T
    var sst = 0.0
    var ssc = 0.0
    var ssr = 0.0
    val csum = DoubleArray(c)     // column totals
    val csqarr = DoubleArray(c)   // column-wise sums of squared observations
    val rsum = DoubleArray(r)     // row totals
    for (i in 0 until c) {
        for (j in 0 until r) {
            t += arr[i][j]
            csum[i] += arr[i][j]
            csqarr[i] += arr[i][j] * arr[i][j]
        }
    }
    for (i in 0 until r) {
        for (j in 0 until c) {
            rsum[i] += arr[j][i]
        }
    }
    val cfac = t * t / n          // correction factor, T^2/N
    for (i in 0 until c) {
        sst += csqarr[i]
        ssc += csum[i] * csum[i]
    }
    for (i in 0 until r) {
        ssr += rsum[i] * rsum[i]
    }
    sst -= cfac                   // total sum of squares, SST
    ssc = ssc / r - cfac          // column sum of squares, SSC
    ssr = ssr / c - cfac          // row sum of squares, SSR
    val sse = sst - ssc - ssr     // error sum of squares, SSE
    val msc = ssc / (c - 1)       // mean sum of squares between columns
    val msr = ssr / (r - 1)       // mean sum of squares between rows
    val mse = sse / ((c - 1) * (r - 1))  // mean sum of squares of the error
    // F-ratios, always taking the larger mean square in the numerator
    val fc = if (msc > mse) msc / mse else mse / msc
    val fr = if (msr > mse) msr / mse else mse / msr
    return doubleArrayOf(fc, fr)
}

The table values can be given as input using a two-dimensional array in which each one-dimensional array holds the values of one population (one column of the table). When calling the function, we pass the array after the coding method has been applied to it.

Also, the table value of F for the given degrees of freedom can be found in standard statistical tables.
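If a printed table is not at hand, the critical values can also be computed programmatically. Here is a minimal sketch, assuming the Apache Commons Math 3 library is on the classpath; the helper fCritical is introduced here only for illustration:

import org.apache.commons.math3.distribution.FDistribution

// Critical F value for a right-tailed test at the given significance level.
// df1 is the numerator degrees of freedom, df2 the denominator degrees of freedom.
fun fCritical(df1: Double, df2: Double, alpha: Double = 0.05): Double =
    FDistribution(df1, df2).inverseCumulativeProbability(1.0 - alpha)

println(fCritical(8.0, 2.0))   // df = (8, 2): approximately 19.37
println(fCritical(8.0, 4.0))   // df = (8, 4): approximately 6.04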

				
import java.util.Arrays

// Coded weight gains: each inner array holds one diet (column) across the five breeds (rows).
val arr = arrayOf(
    doubleArrayOf(0.0, -6.0, 3.0, 6.0, -3.0),
    doubleArrayOf(-4.0, -1.0, -6.0, 1.0, 5.0),
    doubleArrayOf(8.0, -2.0, 5.0, 0.0, 3.0))

// Table values of F at the 5% level of significance:
// df = (8, 2) for the column F-ratio and df = (8, 4) for the row F-ratio.
val col_exp = 19.37
val row_exp = 6.04

val ans = two_anova(arr)
println("F values: " + Arrays.toString(ans))
if (ans[0] > col_exp || ans[1] > row_exp) {
    println("Significant difference")
} else {
    println("No significant difference")
}

Output:

				
F values: [1.120274914089347, 1.6423173803526447]
No significant difference
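As a sanity check on the coding method, the two_anova function can also be applied to the raw (uncoded) gains: subtracting a constant from every observation does not change the sums of squares about the mean, so the F-ratios come out the same. A minimal sketch, where rawData is simply the uncoded table from the beginning of this section:

// Raw weight gains (the coded values plus 30), again stored column-wise per diet.
val rawData = arrayOf(
    doubleArrayOf(30.0, 24.0, 33.0, 36.0, 27.0),
    doubleArrayOf(26.0, 29.0, 24.0, 31.0, 35.0),
    doubleArrayOf(38.0, 28.0, 35.0, 30.0, 33.0))
println("F values (raw data): " + java.util.Arrays.toString(two_anova(rawData)))
// Prints the same F values as for the coded data.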