When the results of studies or research are reported, important decisions are made on the basis of those results. For example, new varieties are often tested against standard varieties to determine whether the new varieties are more effective. Several manufacturing methods may be compared to select the one that yields the best product. Several pieces of evidence may be examined to determine whether there is a possible link between an activity and an outcome. In studies of this kind, the results are summarized by a statistical test, and the decision about whether a result is significant is based on a p-value. It is therefore important for the reader to understand what a p-value means.
To show how the p-value works, we'll use a common statistical test as an example: Student's t-test for independent groups. In this design, subjects are randomly assigned to one of two groups. One group receives a treatment, and the other acts as a control, receiving either no treatment or a standard treatment. In our example, group one is given a new drug and group two is given the standard drug, and the time to relief is measured for both groups. The test assumes that the outcome is a continuous, normally distributed variable and that its population variance is the same in both groups.
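Random assignment is the step that makes the two groups comparable. A minimal Python sketch of it, assuming 24 subjects with hypothetical IDs and an equal split, shuffles the list and cuts it in half; the function name `randomize` is illustrative, and the seed is fixed only so the example is reproducible.

```python
import random

def randomize(subjects, seed=None):
    """Randomly split subjects into two equal-sized groups (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"S{i:02d}" for i in range(1, 25)]  # 24 hypothetical subject IDs
new_drug, standard_drug = randomize(subjects, seed=42)
print("new drug:     ", new_drug)
print("standard drug:", standard_drug)
```

With an odd number of subjects, this split gives the second group one extra subject.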
In this example, the sample mean for group one is 10 and the sample mean for group two is 12. The sample standard deviations are 1.8 for group one and 1.9 for group two, and each group contains 12 subjects. Entering these data into a statistical program produces a t-statistic and a p-value: t = -2.65 with 22 degrees of freedom (12 + 12 - 2), and p = 0.0147. This is evidence that the mean time to relief for group one differed significantly from that for group two.
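The reported numbers can be reproduced from the summary statistics alone. The pooled variance is (11 × 1.8² + 11 × 1.9²)/22 = 3.425, the standard error of the difference in means is sqrt(3.425 × (1/12 + 1/12)) ≈ 0.756, and t = (10 - 12)/0.756 ≈ -2.65. The Python sketch below does the same arithmetic with only the standard library. The function names are hypothetical, and the two-sided p-value is obtained by integrating the Student t density numerically with Simpson's rule, a simple stand-in for the exact distribution functions a statistical package would use.

```python
import math

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two sample means
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=10_000):
    """P(|T| >= |t|) under H0, via Simpson's rule on the density over [0, |t|]."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    # The density is symmetric, so P(|T| >= |t|) = 1 - 2 * P(0 <= T <= |t|)
    return 1 - 2 * central

t, df = pooled_t_from_summary(10, 1.8, 12, 12, 1.9, 12)
p = two_sided_p(t, df)
print(f"t = {t:.2f}, df = {df}, p = {p:.3f}")  # t = -2.65, df = 22, p = 0.015
```

Passing group one first gives a negative t because group one has the smaller mean; swapping the groups flips the sign of t but leaves the two-sided p-value unchanged.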
To interpret this p-value, you must first know how the test was structured. In the case of this two-sided t-test, the hypotheses are:
H0: μ1 = μ2 (null hypothesis: the population mean time to relief is the same for both groups)
Ha: μ1 ≠ μ2 (alternative hypothesis: the population means of the two groups are not equal)
A low p-value points toward rejecting the null hypothesis. The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one computed from the data, assuming the null hypothesis is true. Here p = 0.015 (0.0147 rounded), so if the population means really were equal, there would be about a 15 in 1,000 chance of obtaining a t-statistic at least as far from zero as the observed value of -2.65. If you judge this to be sufficient evidence against the null hypothesis, you reject it and conclude that the data provide significant support for the alternative hypothesis.
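The "15 in 1,000" interpretation can be checked by simulation. The sketch below is a Monte Carlo illustration, not how statistical software computes p-values: it repeatedly draws two groups of 12 from the same normal population, so the null hypothesis is true by construction, computes the pooled t-statistic each time, and counts how often |t| is at least 2.65. The population mean of 11, standard deviation of 1.85, number of simulations, and seed are arbitrary assumptions; under the test's assumptions, the null distribution of t does not depend on the mean or standard deviation.

```python
import math
import random

def sample_var(v):
    """Sample variance with the n - 1 denominator."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

def pooled_t(x, y):
    """Student's t statistic for two independent samples, equal variances assumed."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * sample_var(x) + (n2 - 1) * sample_var(y)) / (n1 + n2 - 2)
    diff = sum(x) / n1 - sum(y) / n2
    return diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def simulated_p(t_obs, n=12, sims=20_000, seed=1):
    """Fraction of null-hypothesis simulations with |t| >= |t_obs|."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(sims):
        # Both groups come from the SAME population, so H0 is true by construction
        x = [rng.gauss(11, 1.85) for _ in range(n)]
        y = [rng.gauss(11, 1.85) for _ in range(n)]
        if abs(pooled_t(x, y)) >= abs(t_obs):
            extreme += 1
    return extreme / sims

p_sim = simulated_p(-2.65)
print(f"simulated two-sided p = {p_sim:.4f}")
```

The simulated fraction should land near 0.015; with 20,000 simulations its Monte Carlo standard error is about 0.0009, and it shrinks as `sims` grows.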
The researcher decides which significance level to use, that is, the cutoff below which a result is declared significant. The most commonly used significance level is 0.05. At that level, any test yielding a p-value below 0.05 is significant, so here you would reject the null hypothesis in favor of the alternative. Because only two groups are being compared, the sample means show the direction of the difference. The sample mean for group one is the smaller of the two, so you conclude that the new drug provided relief significantly faster, on average, than the standard drug. In an article, this might be reported with a phrase such as: "The mean time to relief for group one was significantly shorter than for group two (two-sided t-test, t(22) = -2.65, p = 0.015)."
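The decision rule and reporting format described above can be written as a small function. This is a hypothetical helper: the name `decide` and its message wording are illustrative, not from any library. It assumes a two-sided test whose direction is read from the sample means, as in the text.

```python
def decide(t, df, p, mean1, mean2, alpha=0.05):
    """Apply a significance level to a two-sided two-sample t-test result."""
    stats = f"two-sided t-test, t({df}) = {t:.2f}, p = {p:.3f}"
    if p >= alpha:
        return f"No significant difference at alpha = {alpha} ({stats})."
    # H0 rejected; with only two groups, the sample means give the direction
    if mean1 < mean2:
        smaller, larger = "group one", "group two"
    else:
        smaller, larger = "group two", "group one"
    return f"The mean for {smaller} was significantly smaller than for {larger} ({stats})."

print(decide(-2.65, 22, 0.0147, 10, 12))
# The mean for group one was significantly smaller than for group two (two-sided t-test, t(22) = -2.65, p = 0.015).
print(decide(-2.65, 22, 0.0147, 10, 12, alpha=0.01))
# No significant difference at alpha = 0.01 (two-sided t-test, t(22) = -2.65, p = 0.015).
```

The second call shows that the same data lead to a different decision under a stricter cutoff, which is why the significance level should be stated alongside the result.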
P-values do not simply provide a yes-or-no answer; they also convey the strength of the evidence against the null hypothesis. The lower the p-value, the stronger the evidence.