Chapter 11

F Distributions

The distributions that we have looked at so far are the z and the t. Both are used primarily with 1 or 2 samples. The new distribution for this chapter is the F distribution. It is used when:

1. You are comparing 2 population variances

2. You are testing assumptions

3. You have more than 2 groups

The F distribution is a positively skewed distribution that ranges from 0 to infinity. It is a one-tailed distribution. There is a family of these distributions, each determined by two separate parameters (degrees of freedom, just like t, except that F has one for the numerator and one for the denominator).
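The shape described above can be checked with a small simulation. This is only a sketch: it relies on the fact that the ratio of two sample variances, drawn from normal populations with equal variance, follows an F distribution. The function name simulate_f, the choice of 9 and 9 degrees of freedom, the number of draws, and the seed are all assumptions made for illustration.

```python
import random
import statistics


def simulate_f(df_num, df_den, n_draws, seed=0):
    """Draw F values as the ratio of two independent sample variances."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    draws = []
    for _ in range(n_draws):
        # Both samples come from the same normal population, so the
        # population variances are equal by construction.
        sample_1 = [rng.gauss(0, 1) for _ in range(df_num + 1)]
        sample_2 = [rng.gauss(0, 1) for _ in range(df_den + 1)]
        draws.append(statistics.variance(sample_1) / statistics.variance(sample_2))
    return draws


f_values = simulate_f(df_num=9, df_den=9, n_draws=5000)
print("smallest F:", min(f_values))            # never below 0
print("median F:", statistics.median(f_values))
print("mean F:", statistics.mean(f_values))    # pulled above the median by the long right tail
```

A mean that sits above the median is what we expect from a positively skewed distribution, and no simulated F can be negative because it is a ratio of two variances.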

The first use of the F distribution is when you are comparing two population variances. Let's do an example:

A company is interested in purchasing machines that make screws. They sample 10 Brand X machines and find that the mean screw diameter is 12 cm (s=5). They sample 10 Brand Y machines and find the mean screw diameter is 12 cm (s=6). They want to know if the variances are different.

Note that a difference in variances would indicate a difference in the reliability of the machines. If a machine has a high variance, then the machine is putting out screws with widely differing diameters. It is in the best interest of the company to have all of the screws have similar diameters. So let's do the test:

 

1. Ho: Variances are equal

H1: Variances are not equal

2. Alpha=.10 (I will explain later why we use such a high alpha)

3. F= Variance of One population / Variance of the other population

F=36/25 = 1.44 *note that you put the higher variance in the numerator (the variances are the squared standard deviations: 6 squared = 36 and 5 squared = 25)

4. Now we need an F critical. This is a two-tailed test (H1 says only that the variances differ, not which one is larger), so the alpha level is split between the two tails. But because we always put the larger variance in the numerator, only the upper tail of F is ever used, so we look up the critical value at half of our alpha (.10/2=.05). This is the ONLY time we will ever do this.

Next, we look in the F table for our critical value. We need two separate df, one for the numerator and one for the denominator. The df-numerator = n1-1 and the df-denominator = n2-1, so both are equal to 9. Looking up 9 dfnum and 9 dfdenom, we obtain a critical value of 3.18.

5. Since our Fcalc<Fcrit, we fail to reject the null and conclude the variances are not significantly different
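The five steps above can be sketched in a few lines of Python. The function name variance_ratio_test is an assumption made for illustration, and the critical value is typed in from the F table rather than computed.

```python
def variance_ratio_test(s1, s2, f_crit):
    """Return (F, reject) for a two-tailed test of equal variances.

    s1 and s2 are sample standard deviations; f_crit is the table value
    looked up at alpha/2 with the matching degrees of freedom.
    """
    var_1, var_2 = s1 ** 2, s2 ** 2
    # The larger variance goes in the numerator, so F is never below 1.
    f_calc = max(var_1, var_2) / min(var_1, var_2)
    return f_calc, f_calc > f_crit


# Screw-machine example: s = 5 and s = 6 with 10 machines per brand,
# and the table value 3.18 for 9 and 9 degrees of freedom at .05.
f_calc, reject = variance_ratio_test(s1=5, s2=6, f_crit=3.18)
print(f"F = {f_calc:.2f}, reject Ho: {reject}")  # F = 1.44, reject Ho: False
```

When the two samples have different sizes, remember that the numerator df belongs to whichever sample has the larger variance. Here both samples have 10 machines, so it makes no difference.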

 

The second use of the F distribution is to test assumptions. Namely we can test the assumption that our population variances are the same (required for independent measures t-tests). Basically this is the same as the previous use of F. You are testing the variance of one population against the variance of another population.

The third use of the F distribution is the most common use. This is when we have 3 or more groups and want to compare their means, a procedure called analysis of variance (ANOVA). The calculations are numerous, so it becomes important to remember that we are doing a 5 step hypothesis test, so just keep on track.

There are two types of variance in ANOVA: between group variance and within group variance. Between group variance is the difference in groups due to the treatment that you are applying. If group 1 gets a certain drug and group 2 gets a different drug, then you would assume that some of the difference in the groups is due to the effect of the drug. This is between group variance. But the groups will also have variance within. The individuals in group 1 will differ from each other due to sampling error. So we consider the within group variance to be error.

Let's try an example: We administer 3 different drugs to individuals with depression, then we measure their depression level. We should make a distinction here: there are two types of variables, independent variables and dependent variables. Independent variables are grouping variables, how we divide our subjects. So in this case our independent variable is the type of drug. Dependent variables are measuring variables, or in this case, the depression score.

Drug A    Drug B    Drug C
5         5         9
2         4         6
2         3         9

To conduct the hypothesis test:

1. Ho: All means are equal

H1: Not all means are equal (at least one mean differs from the others)

2. alpha = .05

3. F=Between Variance/ Within Variance

Here's where it becomes tricky. The easiest first step is to get the sum of the Xsquared (square each score, then add them up) and the sum of the Xs squared (add up all of the scores, then square the total). If we square all of the numbers and add them up we get 281. If we add all of the Xs and square that number we get 2025 (45*45).
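These two quantities are easy to mix up, so here is a quick check in Python. The variable names are just illustrative; the scores are the nine depression scores from the table.

```python
# Depression scores from the example, keyed by drug.
scores = {
    "Drug A": [5, 2, 2],
    "Drug B": [5, 4, 3],
    "Drug C": [9, 6, 9],
}

all_scores = [x for group in scores.values() for x in group]

# Square each score first, then add them up.
sum_of_x_squared = sum(x ** 2 for x in all_scores)

# Add up all of the scores first, then square the total.
sum_of_xs_squared = sum(all_scores) ** 2

print(sum_of_x_squared, sum_of_xs_squared)  # 281 2025
```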

Ok, now we need to create what is known as a source table. It looks like this:

Source     SS     df     MS     F

Between

Within

Total

 

And we will begin filling in the table:

Starting with SS (sum of squares)

Ssbetween = Sum (Column total squared/number in that column) - Sum of the Xs squared/total N

= (81/3) + (144/3) + (576/3) - (2025/9) = 27+48+192-225 = 42

Sswithin = Sum of Xsquared - Sum (Column total squared/number in the column)

= 281 - 267 = 14

Sstotal = SS between + SS within = 42+14 = 56 (as a check, Sum of Xsquared - Sum of the Xs squared/total N = 281-225 = 56)

Dfbetween = k-1, where k is the # of groups = 3-1=2

Dfwithin = N-k = 9-3=6

Dftotal = N-1 = 9-1=8

Msbetween = SS between / df between = 42/2=21

Mswithin = SS within / df within = 14/6=2.33

F = MS between / MS within = 21/2.33=9.0 (using the unrounded MS within, 14/6)

Now we can fill in that table:

Source     SS     df     MS       F

Between    42     2      21       9.0

Within     14     6      2.33

Total      56     8

4. Now we need a critical value for F. We need Dfnumerator, which is simply DfBetween or 2

We need Dfdenom, which is simply DfWithin or 6

Looking up the critical F for these parameters, we find the Fcrit=5.14

5. Since our Fcalc (9.0) > Fcrit (5.14) we reject the null and conclude that the groups are significantly different.

***NOTE: Even though we rejected and concluded the groups were different, we do NOT know which groups are significantly different from each other. This requires post-hoc testing which is incorrectly done in the book, so you are not responsible for it. The only conclusion you can draw at this point is that the groups are or are not significantly different from each other.
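Because the source table involves a lot of arithmetic, it is worth recomputing it from the raw scores as a check on the hand calculation. The sketch below uses the same computational formulas as the steps above. The function name one_way_anova and the dictionary keys are assumptions made for illustration, and the critical value 5.14 is typed in from the F table rather than computed.

```python
def one_way_anova(groups):
    """Build the one-way ANOVA source table from a dict of score lists."""
    all_scores = [x for g in groups.values() for x in g]
    n_total = len(all_scores)
    k = len(groups)

    # Sum of (column total squared / number in that column).
    column_term = sum(sum(g) ** 2 / len(g) for g in groups.values())
    # Sum of the Xs squared / total N.
    correction = sum(all_scores) ** 2 / n_total

    ss_between = column_term - correction
    ss_within = sum(x ** 2 for x in all_scores) - column_term
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "ss_total": ss_between + ss_within, "df_total": n_total - 1,
        "f": ms_between / ms_within,
    }


drug_scores = {"Drug A": [5, 2, 2], "Drug B": [5, 4, 3], "Drug C": [9, 6, 9]}
table = one_way_anova(drug_scores)
f_crit = 5.14  # table value for 2 and 6 degrees of freedom at alpha = .05

print(f"{'Source':<10}{'SS':>8}{'df':>5}{'MS':>8}{'F':>8}")
print(f"{'Between':<10}{table['ss_between']:>8.2f}{table['df_between']:>5}"
      f"{table['ms_between']:>8.2f}{table['f']:>8.2f}")
print(f"{'Within':<10}{table['ss_within']:>8.2f}{table['df_within']:>5}"
      f"{table['ms_within']:>8.2f}")
print(f"{'Total':<10}{table['ss_total']:>8.2f}{table['df_total']:>5}")
print("reject Ho:", table["f"] > f_crit)
```

The function accepts groups of unequal size, since each column total is divided by the number of scores in that column.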