Examination of Normality

Most parametric tests rest on the assumption of normality. Normality means that the data follow a normal distribution: a symmetric, bell-shaped curve that is fully described by its mean and standard deviation. The standard normal distribution, with mean 0 and standard deviation 1, is one special case. Because the normal distribution is so important for statistical inference, it is desirable to check whether the data plausibly come from a normal distribution. You can use a statistical test, statistical plots, or both to check whether the sample distribution is normal.
Normality can be checked by drawing a normal probability plot. In a normal probability plot, each observed value is paired with its expected value under the normal distribution. If the data are normal, the points are expected to fall close to a straight line. In addition, the deviations from that straight line can be plotted as a detrended normal plot. A detrended normal plot with no visible structure (points scattered randomly around zero) supports normality.
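
The pairing described above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not a reference implementation: the function name normal_plot_points, the sample values, and the plotting position (i - 0.5)/n are assumptions made for the example, and the observed values are standardized with the sample mean and standard deviation so that they are comparable with standard normal quantiles.

```python
from statistics import NormalDist, mean, stdev

def normal_plot_points(sample):
    """Return (expected, observed, deviation) triples for a normal plot.

    expected  : standard normal quantile for the i-th ordered value
    observed  : the i-th ordered value, standardized with the sample
                mean and standard deviation
    deviation : observed - expected (the detrended normal plot value)
    """
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, x in enumerate(sorted(sample), start=1):
        p = (i - 0.5) / n                  # assumed plotting position
        expected = std_normal.inv_cdf(p)   # expected normal quantile
        observed = (x - m) / s             # standardized observed value
        points.append((expected, observed, observed - expected))
    return points

# Hypothetical sample, used only for illustration.
data = [4.1, 5.0, 5.2, 5.9, 6.3, 6.8, 7.4, 8.0]
for expected, observed, deviation in normal_plot_points(data):
    print(f"{expected:6.2f} {observed:6.2f} {deviation:6.2f}")
```

Plotting the first two columns against each other gives the normal probability plot; plotting the third column against the first gives the detrended version.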

Histogram: A histogram gives a rough idea of whether the data follow the assumption of normality.

Q-Q plot: Most researchers use a Q-Q plot to check the assumption of normality. In this method, the observed values are plotted against the values expected under a normal distribution. If the points deviate substantially from a straight line, the data are not normally distributed; if they lie close to the line, the data are consistent with normality.

Box plot: A box plot is used to check whether outliers are present in the data. Outliers and skewness suggest a violation of the assumption of normality.
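
As a rough illustration of how a box plot flags outliers, the sketch below applies the conventional 1.5 × IQR rule (Tukey's fences). The text does not say which rule a given package uses, so the multiplier k = 1.5, the function name tukey_outliers, and the sample values are assumptions for this example; quartile definitions also differ slightly between packages.

```python
from statistics import quantiles

def tukey_outliers(sample, k=1.5):
    """Return the values lying outside the box-plot fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR, where IQR = Q3 - Q1.
    k = 1.5 is the conventional choice and is an assumption here.
    """
    q1, _median, q3 = quantiles(sample, n=4)  # three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in sample if x < lower or x > upper]

# Hypothetical sample with one unusually large value.
data = [4.1, 5.0, 5.2, 5.9, 6.3, 6.8, 7.4, 8.0, 19.5]
print(tukey_outliers(data))  # [19.5]
```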

Besides these visual displays, the commonly used statistical tests are the Shapiro-Wilk test and the Lilliefors test. The Lilliefors test is a modification of the Kolmogorov-Smirnov test for the situation where the mean and variance are not known and are estimated from the data. The Shapiro-Wilk test is more powerful than the other tests in many situations.

• Kolmogorov-Smirnov test

Test based on the largest vertical distance between the normal cumulative distribution function (CDF) and the sample cumulative frequency distribution (commonly called the ECDF – empirical cumulative distribution function). It has poor power to detect non-normality compared to the tests below.
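
The distance described above can be computed directly. The following is a minimal sketch using only the Python standard library; the function name ks_statistic and the sample values are hypothetical. It estimates the mean and standard deviation from the sample, which is the Lilliefors situation mentioned earlier, and it returns only the statistic: turning it into a p-value requires the appropriate tables or a statistics package.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic(sample):
    """Largest vertical distance between the fitted normal CDF and the ECDF.

    The normal CDF uses the sample mean and standard deviation
    (parameters estimated from the data).
    """
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    d = 0.0
    for i, x in enumerate(sorted(sample), start=1):
        cdf = fitted.cdf(x)
        # The ECDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Hypothetical samples: one roughly symmetric, one strongly right-skewed.
symmetric = [-1.6, -1.0, -0.7, -0.4, -0.1, 0.1, 0.4, 0.7, 1.0, 1.6]
skewed = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0, 2.5, 9.0]
print(ks_statistic(symmetric), ks_statistic(skewed))
```

A larger distance means the sample departs further from the fitted normal curve.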

• Anderson-Darling test

Test similar to the Kolmogorov-Smirnov test, except it uses the sum of the weighted squared vertical distances between the normal cumulative distribution function and the sample cumulative frequency distribution. More weight is applied at the tails, so the test is better able to detect non-normality in the tails of the distribution.
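
The weighted sum can be written out with the usual computing formula for the Anderson-Darling statistic, A² = -n - (1/n) Σ (2i - 1) [ln F(x(i)) + ln(1 - F(x(n+1-i)))], where x(1) ≤ ... ≤ x(n) are the ordered values and F is the fitted normal CDF. The sketch below is illustrative only: the function name and the sample values are hypothetical, the parameters are estimated from the sample, and no small-sample correction or critical values are applied.

```python
from math import log
from statistics import NormalDist, mean, stdev

def anderson_darling_statistic(sample):
    """Anderson-Darling A^2 against a normal fitted to the sample.

    Note: an extreme value whose fitted CDF rounds to exactly 0 or 1
    would make log() fail; this sketch does not guard against that.
    """
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    cdf = [fitted.cdf(x) for x in sorted(sample)]
    total = 0.0
    for i in range(1, n + 1):
        # The i-th smallest value is paired with the i-th largest value.
        total += (2 * i - 1) * (log(cdf[i - 1]) + log(1.0 - cdf[n - i]))
    return -n - total / n

# Hypothetical samples: one roughly symmetric, one strongly right-skewed.
symmetric = [-1.6, -1.0, -0.7, -0.4, -0.1, 0.1, 0.4, 0.7, 1.0, 1.6]
skewed = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0, 2.5, 9.0]
print(anderson_darling_statistic(symmetric))
print(anderson_darling_statistic(skewed))
```

The logarithms grow quickly in magnitude as F approaches 0 or 1, which is how the extra weight on the tails arises.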

• Shapiro-Wilk test

A regression-type test that uses the correlation of sample order statistics (the sample values arranged in ascending order) with those of a normal distribution.
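
The idea of correlating order statistics with their normal counterparts can be illustrated with a simplified statistic: the squared correlation between the ordered sample and approximate expected normal order statistics. This is closer to the Shapiro-Francia simplification than to the Shapiro-Wilk W itself, which uses coefficients that also account for the covariances of the order statistics. The function name, the use of Blom's approximation (i - 0.375)/(n + 0.25), and the sample values are assumptions for this sketch.

```python
from math import sqrt
from statistics import NormalDist, mean

def order_statistic_correlation(sample):
    """Squared correlation of ordered data with normal scores.

    A simplified stand-in for a regression-type normality statistic;
    it is NOT the Shapiro-Wilk W and has no p-value attached here.
    """
    n = len(sample)
    xs = sorted(sample)
    std_normal = NormalDist()
    # Blom's approximation to the expected normal order statistics.
    qs = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mq = mean(xs), mean(qs)
    sxq = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    r = sxq / sqrt(sxx * sqq)
    return r * r

# Hypothetical samples: one roughly symmetric, one strongly right-skewed.
symmetric = [-1.6, -1.0, -0.7, -0.4, -0.1, 0.1, 0.4, 0.7, 1.0, 1.6]
skewed = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0, 2.5, 9.0]
print(order_statistic_correlation(symmetric))
print(order_statistic_correlation(skewed))
```

Values close to 1 indicate that the ordered data line up well with the normal scores; clearly smaller values point to non-normality.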

How to interpret the normality test

For each test, the null hypothesis states that the sample comes from a normal distribution, against the alternative hypothesis that it does not. The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. When it is significant (usually less than 0.10 or less than 0.05), you should reject the null hypothesis and conclude that the sample is not normally distributed. When it is not significant (greater than 0.10 or 0.05), there is not enough evidence to reject the null hypothesis; this does not prove normality, so you can only assume the sample is normally distributed. However, as noted above, you should always double-check the distribution using the normal Q-Q plot and the frequency histogram.
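
The decision rule can be summarized in a small helper. This is only a sketch: the function name interpret_normality_test and the example p-values are hypothetical, and the default significance level of 0.05 is one of the two conventional thresholds mentioned above.

```python
def interpret_normality_test(p_value, alpha=0.05):
    """Turn a normality-test p-value into a plain-language decision.

    alpha is the significance level (0.05 or 0.10 are common choices).
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0: evidence of non-normality"
    # A non-significant result does not prove normality.
    return "fail to reject H0: no evidence against normality"

# Hypothetical p-values, for illustration only.
print(interpret_normality_test(0.03))
print(interpret_normality_test(0.03, alpha=0.01))
print(interpret_normality_test(0.40))
```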