Degrees of Freedom (df)

Degrees of freedom (df) denotes the number of values that a statistician is free to choose. The concept rests on the idea that, once certain quantities such as a total or a mean are fixed, one cannot freely choose every value.

The concept can be explained by an analogy:

X+Y = 10 (1)

In the above equation you are free to choose a value for X or for Y, but not both, because once you choose one, the other is fixed. If you choose 8 for X, then Y has to be 2. So there is 1 degree of freedom here.

X+Y+Z = 15 (2)

In equation (2), you can choose values for two of the variables but not all three. If you choose 8 for X and 2 for Y, then Z is fixed at 5, so there are 2 degrees of freedom. In many tests, df is calculated by subtracting 1 from the size of each group, but the exact method of calculation varies with the test used.
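The two equations above can be checked with a short Python sketch. The helper name `remaining_value` is hypothetical and introduced only for illustration: once all but one variable are chosen, the constraint fixes the last one.

```python
def remaining_value(total, chosen):
    """Return the one value that is forced once the others are chosen.

    `total` is the required sum; `chosen` holds the freely picked values.
    (Hypothetical helper, for illustration only.)
    """
    return total - sum(chosen)


# Equation (1): X + Y = 10. One free choice (X = 8) fixes Y.
y = remaining_value(10, [8])

# Equation (2): X + Y + Z = 15. Two free choices (X = 8, Y = 2) fix Z.
z = remaining_value(15, [8, 2])

# Degrees of freedom = number of variables minus number of constraints.
df_eq1 = 2 - 1
df_eq2 = 3 - 1

print(y, z, df_eq1, df_eq2)  # prints: 2 5 1 2
```

In both cases the count of free choices is the number of variables minus the one constraint imposed by the fixed sum.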

“Degrees of freedom,” often abbreviated as df, refers to the number of scores that are free to vary. It is needed whenever sample statistics are used to estimate population values. More precisely, it is the number of observations in the data collection that are still free to vary after the sample statistics have been calculated. For example, in calculating a sample standard deviation, we must subtract the sample mean from each of the n data observations in order to get the deviations from the mean. But once we have completed the next-to-last subtraction, the final deviation is automatically determined, since the deviations, prior to squaring, must sum to zero. Therefore, the last deviation is not free to vary; only n-1 are free to vary. As a rule of thumb, every estimate of a population value used in a formula costs 1 degree of freedom.
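As a rough illustration of the standard deviation example, the following Python sketch uses a small made-up data set (the numbers are an assumption, chosen only so the arithmetic is easy to follow). It shows that the deviations sum to zero, that the last deviation can be recovered from the other n-1, and that dividing the sum of squares by n-1 agrees with the standard library's `statistics.stdev`.

```python
import math
import statistics

# Illustrative data only; these values are not from the text.
data = [2, 4, 4, 6, 9]
n = len(data)
mean = sum(data) / n

# Deviations of each observation from the sample mean.
deviations = [x - mean for x in data]

# Because the deviations must sum to zero, the first n-1 of them
# determine the last one: it is not free to vary.
last_from_others = -sum(deviations[:-1])

# Sample standard deviation: divide the sum of squares by df = n - 1.
df = n - 1
sum_of_squares = sum(d * d for d in deviations)
sample_sd = math.sqrt(sum_of_squares / df)

print(sum(deviations))                      # prints: 0.0
print(last_from_others == deviations[-1])   # prints: True
print(sample_sd, statistics.stdev(data))    # the two values should agree
```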

The practical impact of using degrees of freedom is found when one considers small samples. Consider the t distribution, for example. With small samples (few degrees of freedom) it is flatter than the normal curve; another way of looking at it is that more of the area underneath the curve lies in the tails, far from the mean, so the region of rejection starts farther out. This makes tests that use degrees of freedom more conservative and more accurate, since small samples tend to underestimate the population variability.
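To make the "flatter" claim concrete, here is a minimal Python sketch that evaluates the Student's t density from its standard formula and compares it with the standard normal. The function names, the cutoff of 1.96, the degrees of freedom 3 and 30, and the crude midpoint-rule integration are all assumptions chosen for illustration; a statistics library would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def two_tailed_area(pdf, cutoff, upper=100.0, steps=100_000):
    """Approximate P(|X| > cutoff) with a midpoint-rule integral.

    The integral is truncated at `upper` (an arbitrary choice); the area
    beyond that point is negligible for the cases shown here.
    """
    width = (upper - cutoff) / steps
    one_tail = sum(pdf(cutoff + (i + 0.5) * width) for i in range(steps)) * width
    return 2 * one_tail


# Height at the mean: a lower peak means a flatter curve.
peak_t3 = t_pdf(0, 3)
peak_t30 = t_pdf(0, 30)
peak_normal = normal_pdf(0)

# Area beyond +/-1.96, the familiar 5% cutoff for the normal curve.
area_t3 = two_tailed_area(lambda x: t_pdf(x, 3), 1.96)
area_t30 = two_tailed_area(lambda x: t_pdf(x, 30), 1.96)
area_normal = two_tailed_area(normal_pdf, 1.96)

print("peak heights:", peak_t3, peak_t30, peak_normal)
print("tail areas:  ", area_t3, area_t30, area_normal)
```

With fewer degrees of freedom the peak is lower and the tail area beyond the same cutoff is larger, so a test statistic has to fall farther from the mean before it lands in the region of rejection.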

Practical Example:

Consider a situation in which the scores that make up a distribution are unknown, and I ask you to guess the first score in a distribution of 5 numbers. Your guess will be wild because that score could be any number. The same is true for the second, third, and fourth scores. All of these scores have complete freedom to vary. But if I tell you that the first four scores in the distribution are 3, 4, 5, and 6, and I also tell you the mean of the distribution (5, in this case), then the last score in this distribution has to be 7. In other words, if the mean is known, the missing score is determined by your knowledge of the other four. Therefore n-1 scores are free to vary. The (-1) is often called a “restriction.” Note that by making the denominator smaller (n-1 instead of n), the standard deviation becomes larger. A larger standard deviation means that the sampling distribution is flatter, and flatter means more values are farther from the mean. It’s tougher to find significance.
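The practical example can be reproduced in a few lines of Python. This is only a sketch: the variable names are made up for illustration, and `statistics.pstdev` and `statistics.stdev` stand in for the "divide by n" and "divide by n-1" versions of the standard deviation.

```python
import statistics

# The four known scores, the sample size, and the stated mean.
known_scores = [3, 4, 5, 6]
n = 5
mean = 5

# The mean fixes the total (n * mean), so the last score is determined.
missing_score = n * mean - sum(known_scores)

scores = known_scores + [missing_score]

# Same sum of squared deviations, two different divisors.
sd_divide_by_n = statistics.pstdev(scores)          # divisor n
sd_divide_by_n_minus_1 = statistics.stdev(scores)   # divisor n - 1

print(missing_score)                              # prints: 7
print(sd_divide_by_n < sd_divide_by_n_minus_1)    # prints: True
```

Dividing by n-1 rather than n gives the larger of the two standard deviations, which is the adjustment the restriction introduces.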

In conclusion, when a statistical formula is concerned with description, the degrees of freedom equal n. When the formula is concerned with inference, some restrictions apply. The idea is to adjust for a small sample's tendency to underestimate the population parameter. As n gets larger, this becomes less of a problem because the distribution becomes less flat and closer to normal, but we still use the sample formula to calculate the statistic.
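One way to see why the adjustment matters less as n grows: for the same data, the standard deviation computed with n-1 divided by the one computed with n is always the square root of n/(n-1). The sketch below uses a hypothetical helper, `correction_ratio`, and arbitrary sample sizes to show that ratio shrinking toward 1.

```python
import math


def correction_ratio(n):
    """Ratio of the n-1 standard deviation to the n standard deviation.

    Both use the same sum of squares, so the ratio depends only on n.
    (Hypothetical helper, for illustration only.)
    """
    return math.sqrt(n / (n - 1))


# Arbitrary sample sizes chosen for illustration.
for n in (5, 30, 1000):
    print(n, correction_ratio(n))
```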