Testing of Statistical Hypothesis

Understanding the Results

The null hypothesis in probability and statistics is the starting assumption that nothing other than random chance is operating to create the effect observed in a particular set of data. In other words, it assumes that the measured effect is the same across the independent conditions being tested: there are no differences or relationships between the independent variables and the dependent outcomes, which are treated as equal until proven otherwise.
The null hypothesis is rejected if your data set is unlikely to have been produced by chance alone. The significance of the results is described by the confidence level chosen for the test, which is fixed by the acceptable probability of error, the "alpha level" (confidence level = 1 − α). For example, it is harder to reject the null hypothesis at 99% confidence (α = 0.01) than at 95% confidence (α = 0.05).
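A minimal sketch of why the same result can be significant at one confidence level and not at another. It assumes a two-tailed test whose statistic z follows the standard normal distribution under the null hypothesis; the observed value z = 2.2 is hypothetical, not taken from any data set.

```python
from statistics import NormalDist

def reject_null(z: float, alpha: float) -> bool:
    """Two-tailed z-test: reject H0 when |z| exceeds the critical value z_(alpha/2)."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

# Hypothetical observed test statistic (illustrative only).
z_observed = 2.2

for alpha in (0.05, 0.01):
    confidence = 1 - alpha
    verdict = "reject H0" if reject_null(z_observed, alpha) else "do not reject H0"
    print(f"alpha={alpha} ({confidence:.0%} confidence): {verdict}")
```

The critical value is about 1.96 at α = 0.05 but about 2.58 at α = 0.01, so z = 2.2 clears the first bar and not the second.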
Even if the null hypothesis is rejected at a certain confidence level, no alternative hypothesis is thereby proven. The only conclusion you can draw is that some effect is present; the test alone does not tell you its cause. If the experiment was designed properly, however, the only things that changed were the experimental conditions, so it is logical to attribute the effect to them.
What if the null hypothesis is not rejected? This simply means that you did not find any statistically significant difference. That is not the same as stating that there was no difference. Remember, failing to reject ("accepting") the null hypothesis merely means that the observed differences might have been due to random chance alone, not that they must have been.
Some Concepts Involved in Testing of Hypothesis

In applied investigations or in experimental research, one may wish to estimate the yield of a new hybrid line of corn, but the ultimate purpose will involve some use of this estimate. One may wish, for example, to compare the yield of the new line with that of a standard line and perhaps recommend that the new line replace the standard line if it appears superior. This is a common situation in research. One may wish to determine whether a new method of sealing light bulbs will increase the life of the bulbs, whether a new germicide is more effective in treating a certain infection than a standard germicide, whether one method of preserving foods is better than another as far as the retention of vitamins is concerned, or which of six available varieties of a crop is best in terms of yield per hectare.

Using the light bulb example as an illustration, suppose that the average life of bulbs made under a standard manufacturing procedure is about 1400 hours, and that a new manufacturing procedure is to be tested. Here we are dealing with two populations of light bulbs: those made by the standard process and those made by the proposed process. From past investigations based on sample tests, the mean of the first population is known to be 1400 hours. The question is whether the mean of the second population is greater than or less than 1400 hours. This has to be decided on the basis of observations taken from a sample of bulbs made by the second process.
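A minimal sketch of how the bulb question could be settled with a one-sample z-test. It assumes the lifetimes are normally distributed with a known standard deviation; the value SIGMA = 60 hours and the ten lifetimes are hypothetical, chosen only to make the arithmetic concrete.

```python
import math
from statistics import NormalDist, fmean

MU_STANDARD = 1400.0  # mean life (hours) of bulbs from the standard process

def z_statistic(sample: list[float], mu0: float, sigma: float) -> float:
    """z = (x_bar - mu0) / (sigma / sqrt(n)), assuming the population sigma is known."""
    n = len(sample)
    return (fmean(sample) - mu0) / (sigma / math.sqrt(n))

# Hypothetical lifetimes (hours) of bulbs made by the new process.
new_process = [1452, 1390, 1478, 1431, 1405, 1466, 1423, 1449, 1412, 1460]
SIGMA = 60.0  # assumed known standard deviation, purely illustrative

z = z_statistic(new_process, MU_STANDARD, SIGMA)
critical = NormalDist().inv_cdf(0.975)  # "greater or less" -> two-tailed test at alpha = 0.05
print(f"z = {z:.2f}; reject H0: {abs(z) > critical}")
```

With these numbers the sample mean is 1436.6 hours and z ≈ 1.93, just inside the acceptance region (|z| ≤ 1.96): the sample is higher than 1400 hours, but not convincingly so at the 5% level.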

In making comparisons of the above type, one cannot rely on the mere numerical magnitudes of an index of comparison such as the mean or variance. This is because each group is represented only by a sample of observations, and if another sample were drawn, the numerical value would change. This variation between samples from the same population can at best be reduced in a well-designed experiment, but it can never be eliminated. One is forced to draw inferences in the presence of sampling fluctuations, which affect the observed differences between the groups and cloud the real differences. Hence we need a statistical procedure that can test whether the observed differences are due to chance factors or really due to the treatment.
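The sampling fluctuation described above is easy to see by simulation. This sketch assumes a single normal population (mean 1400 hours, standard deviation 60 hours, both illustrative) and draws several samples of size 10 from it; the sample means differ even though nothing about the population has changed.

```python
import random
from statistics import fmean

def sample_means(pop_mean: float, pop_sd: float, n: int, repeats: int, seed: int = 1) -> list[float]:
    """Draw `repeats` independent samples of size n from one normal population
    and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n)) for _ in range(repeats)]

# Same population every time, yet the sample means are not equal.
means = sample_means(1400.0, 60.0, n=10, repeats=5)
print([round(m, 1) for m in means])
```

Any comparison of two groups has to judge an observed difference against this kind of chance variation.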

Tests of hypotheses are statistical procedures that enable us to decide whether the observed differences can be attributed to chance (sampling fluctuations) or to real effects.

Sample space: The set of all possible outcomes of an experiment is called the sample space, denoted by S. For example, in the experiment of tossing two coins simultaneously, the sample space is S = {HH, HT, TH, TT}, where ‘H’ denotes a head and ‘T’ a tail. In testing of hypotheses, we are concerned with drawing inferences about a population based on a random sample. Let there be N units in a population from which a sample of size n is to be drawn. Then the set of all possible samples of size n is the sample space, and any sample x = (x₁, x₂, …, x_n) is a point of the sample space.
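Both sample spaces mentioned above can be enumerated directly. In this sketch the population of N = 5 units and the sample size n = 2 are illustrative, and samples are assumed to be drawn without replacement with order ignored; other sampling schemes give larger sample spaces.

```python
from itertools import combinations, product

# Sample space for tossing two coins: every ordered pair of outcomes.
coin_space = ["".join(p) for p in product("HT", repeat=2)]
print(coin_space)  # ['HH', 'HT', 'TH', 'TT']

# Sample space for drawing a sample of size n = 2 from N = 5 population units
# (without replacement, order ignored).
population = [1, 2, 3, 4, 5]
samples = list(combinations(population, 2))
print(len(samples))  # 10 possible samples; each one is a point of the sample space
```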

Parameter: A function of population values is known as a parameter; for example, the population mean (μ) and the population variance (σ²).

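Statistic: A function of sample values, say (x₁, x₂, …, x_n), is called a statistic. For example, the sample mean (x̄) and the sample variance (s²), where

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2.$$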

A statistic does not involve any unknown parameter.
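A short sketch computing two statistics from a hypothetical sample. Python's `statistics.variance` uses the divisor n − 1; some texts define s² with divisor n instead, in which case `statistics.pvariance` gives that version. Either way, only the sample values enter the calculation, with no unknown parameter.

```python
from statistics import mean, variance

# A hypothetical sample x = (x1, ..., xn).
x = [12.0, 15.0, 11.0, 14.0, 13.0]

x_bar = mean(x)   # sample mean: sum(x) / n
s2 = variance(x)  # sample variance with divisor n - 1
print(x_bar, s2)  # 13.0 2.5
```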

Statistical Hypothesis: A statistical hypothesis is an assertion or conjecture (tentative conclusion) about either the form of a distribution or its parameters. For example:

i) The normal distribution has mean 20.

ii) The distribution of the process is Poisson.

iii) Effective life of a bulb is 1400 hours.

iv) A given detergent cleans better than any washing soap.

In a statistical hypothesis, the parameters of a distribution may be specified completely or only partly. A statistical hypothesis in which all the parameters of the distribution are completely specified is called a simple hypothesis; otherwise, it is known as a composite hypothesis. For example, for a normal population, the hypotheses

i) Mean (μ) = 20, variance (σ²) = 5 (simple hypothesis)

ii) μ = 20, σ² > 1 (composite hypothesis)

iii) μ = 20 (composite hypothesis)

Null Hypothesis: The statistical hypothesis being tested on the basis of a sample is called the null hypothesis. It usually states that the observations are the result of chance alone. It is denoted by H₀.

Alternative Hypothesis: For every null hypothesis, it is desirable to state what is called an alternative hypothesis, which is complementary to the null hypothesis. Put another way, "the desirable attitude of the statistician about the hypothesis is termed the alternative hypothesis". It usually states that the observations are the result of a real effect plus chance variation, and it is denoted by H₁. For example, if one wishes to compare the yield per hectare of a new line with that of a standard line, the null hypothesis is:

H₀: Yield per hectare of new line (μ₁) = Yield per hectare of standard line (μ₂)

The alternative hypothesis corresponding to H₀ can be one of the following:

i) H₁: μ₁ > μ₂ (right-tail alternative)

ii) H₁: μ₁ < μ₂ (left-tail alternative)

iii) H₁: μ₁ ≠ μ₂ (two-tailed alternative)

(i) and (ii) are called one-tailed tests and (iii) is a two-tailed test. Whether one sets up a one-tailed or a two-tailed test depends upon the conclusion to be drawn if H₀ is rejected. The location of the critical region can be decided only after H₁ has been stated. For example, in testing a new drug, one sets up the hypothesis that it is no better than similar drugs now on the market and tests this hypothesis against the alternative hypothesis that the new drug is superior. Such an alternative hypothesis results in a one-tailed test (right-tail alternative).

If we wish to compare a new teaching technique with the conventional classroom procedure, the alternative hypothesis should allow for the new approach to be either inferior or superior to the conventional procedure. Hence the test is two-tailed.
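The placement of the critical region for each kind of alternative can be sketched for a test whose statistic z is standard normal under H₀ (an assumption; other tests use other distributions), at the illustrative level α = 0.05. A one-tailed test puts all of α in one tail; a two-tailed test splits it, α/2 in each tail.

```python
from statistics import NormalDist

def critical_region(alpha: float, tail: str) -> str:
    """Describe the z-test critical region for a given alternative hypothesis."""
    z = NormalDist()  # standard normal distribution of the statistic under H0
    if tail == "right":  # H1: mu1 > mu2
        return f"z > {z.inv_cdf(1 - alpha):.3f}"
    if tail == "left":   # H1: mu1 < mu2
        return f"z < {z.inv_cdf(alpha):.3f}"
    if tail == "two":    # H1: mu1 != mu2
        return f"|z| > {z.inv_cdf(1 - alpha / 2):.3f}"
    raise ValueError("tail must be 'right', 'left' or 'two'")

for tail in ("right", "left", "two"):
    print(tail, critical_region(0.05, tail))
```

The drug example would use the `"right"` region (z > 1.645), while the teaching-technique example would use the `"two"` region (|z| > 1.960).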

Critical Region: The critical region, denoted C, is the region of the sample space such that H₀ is rejected whenever the sample point belongs to it, even though H₀ may in fact be true.

Suppose the test is based on a sample of size 2 of positive-valued observations. Then the outcome set, or sample space, is the first quadrant of a two-dimensional space, and a test criterion enables us to separate this outcome set into two complementary subsets, C and C̄. If the sample point falls in the subset C, H₀ is rejected; otherwise, H₀ is accepted.
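For the size-2 case, a test criterion is just a rule that splits the first quadrant into C and C̄. The rule below, C = {(x₁, x₂): x₁ + x₂ > k}, and the cut-off K = 3000 are hypothetical choices made for illustration; they are not derived from any particular significance level.

```python
def in_critical_region(point: tuple[float, float], k: float) -> bool:
    """Hypothetical criterion for a sample of size 2: C = {(x1, x2): x1 + x2 > k}."""
    x1, x2 = point
    return x1 + x2 > k

def decide(point: tuple[float, float], k: float) -> str:
    # The first quadrant is split into C (reject H0) and its complement C-bar (accept H0).
    return "reject H0" if in_critical_region(point, k) else "accept H0"

K = 3000.0  # illustrative cut-off, e.g. twice a mean life of 1500 hours
print(decide((1520.0, 1610.0), K))  # reject H0
print(decide((1380.0, 1450.0), K))  # accept H0
```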

The terms acceptance and rejection are, however, not to be taken in their literal senses. Acceptance of H₀ does not mean that H₀ has been proved true; it means only that, so far as the given observations are concerned, there is no evidence to believe otherwise. Similarly, rejection of H₀ does not disprove the hypothesis; it merely means that H₀ does not look plausible in the light of the given observations.

Since we test the null hypothesis by studying a sample instead of the entire population, whatever decision rule we employ, there is always some chance of committing an error in rejecting or accepting the hypothesis. The four possible situations that can arise in any test procedure are given in the following table.

Decision     | H₀ is true        | H₀ is false
Accept H₀    | Correct decision  | Type II error
Reject H₀    | Type I error      | Correct decision

From the table, it is clear that the errors committed in making decisions are of two types.

Type I error: Reject H₀ when H₀ is true.

Type II error: Accept (do not reject) H₀ when H₀ is false.

For example, consider a judge who has to decide whether a person has committed a crime. The statistical hypotheses in this case are:

H₀: The person is innocent; H₁: The person is guilty.

In this situation, two types of errors which the judge may commit are:

Type I error: Innocent person is found guilty and punished.

Type II error: A guilty person is set free.

Since it is more serious to punish an innocent person than to set a criminal free, a Type I error is regarded here as more serious than a Type II error.

Probabilities of the errors:

Probability of Type I error = P(Reject H₀ / H₀ is true) = α

Probability of Type II error = P(Accept H₀ / H₁ is true) = β
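Both probabilities can be computed exactly for a simple case. This sketch assumes a normal population with known σ, H₀: μ = μ₀ against the specific alternative μ = μ₁ > μ₀, and the rule "reject H₀ when the sample mean exceeds c". All numeric values (μ₀ = 1400, μ₁ = 1450, σ = 60, n = 16, c = 1425) are illustrative.

```python
import math
from statistics import NormalDist

def error_probabilities(c: float, mu0: float, mu1: float, sigma: float, n: int) -> tuple[float, float]:
    """Rule: reject H0 (mu = mu0) when the sample mean exceeds c.
    Returns (alpha, beta), with beta evaluated at the alternative mu = mu1."""
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    alpha = 1 - NormalDist(mu0, se).cdf(c)  # P(reject H0 / H0 true)
    beta = NormalDist(mu1, se).cdf(c)       # P(accept H0 / H1 true)
    return alpha, beta

alpha, beta = error_probabilities(1425.0, 1400.0, 1450.0, 60.0, 16)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Because the cut-off c sits exactly halfway between μ₀ and μ₁ here, α and β come out equal (about 0.048 each); moving c changes one at the expense of the other.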

In quality control terminology, Type I error amounts to rejecting a lot when it is good and Type II error may be regarded as accepting a lot when it is bad.

P (Reject a lot when it is good) = α (producer’s risk)

P (Accept a lot when it is bad) = β (consumer’s risk)
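The producer's and consumer's risks can be sketched for a single-sampling inspection plan. The plan (inspect n = 50 items, accept the lot if at most c = 2 are defective) and the defect rates of a "good" lot (2%) and a "bad" lot (10%) are hypothetical, and the number of defectives is modelled as binomial, which assumes the lot is large relative to the sample.

```python
from math import comb

def prob_accept(n: int, c: int, p: float) -> float:
    """P(at most c defectives in a sample of n), binomial model of a large lot."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))

# Hypothetical plan: inspect n = 50 items, accept the lot if at most c = 2 are defective.
n, c = 50, 2
p_good, p_bad = 0.02, 0.10  # illustrative defect rates of a "good" and a "bad" lot

producer_risk = 1 - prob_accept(n, c, p_good)  # alpha: reject a good lot
consumer_risk = prob_accept(n, c, p_bad)       # beta: accept a bad lot
print(f"producer's risk = {producer_risk:.3f}, consumer's risk = {consumer_risk:.3f}")
```

With these numbers the producer's risk is roughly 0.08 and the consumer's risk roughly 0.11.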

Level of significance: The probability of Type I error (α) is called the level of significance. It is also known as the size of the critical region.

The 5% level of significance is commonly taken as a rough line of demarcation; at this level, deviations due to sampling fluctuations alone will be interpreted as real ones in 5% of cases. Hence, inferences about the population based on samples are always subject to some degree of uncertainty. This uncertainty cannot be removed completely, but it can be reduced by choosing a smaller level of significance, such as 1%, at which the chance of interpreting deviations due to sampling fluctuations as real ones is only one in 100.
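The "5 in 100" and "1 in 100" interpretation can be checked by simulation. This sketch generates many samples for which H₀ is actually true (a standard normal population, μ = 0 and σ = 1, chosen for illustration) and counts how often a two-tailed z-test nevertheless declares the deviation real. The trial count, sample size and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean

def false_rejection_rate(alpha: float, trials: int = 10000, n: int = 10, seed: int = 7) -> float:
    """Simulate samples for which H0 (mu = 0, sigma = 1) is TRUE and return the fraction
    in which a two-tailed z-test at level alpha wrongly rejects H0."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        x_bar = fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        z = x_bar / (1.0 / math.sqrt(n))  # (x_bar - mu0) / (sigma / sqrt(n))
        rejections += abs(z) > critical
    return rejections / trials

for alpha in (0.05, 0.01):
    print(alpha, false_rejection_rate(alpha))
```

The simulated rates land close to 0.05 and 0.01 respectively; they are not exact because of simulation noise.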

Power Function of a Test: The probability of rejecting H₀ when H₁ is true is called the power function of the test.

Power function = P(Reject H₀ / H₁ is true)

= 1 − P(Accept H₀ / H₁ is true)

= 1 − β.
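For a right-tailed z-test of H₀: μ = μ₀ with known σ, the power at a true mean μ works out to 1 − Φ(z_α − (μ − μ₀)√n/σ), where Φ is the standard normal distribution function. The sketch below evaluates it at a few points; μ₀ = 1400, σ = 60, n = 16 and α = 0.05 are illustrative values, not taken from the text.

```python
import math
from statistics import NormalDist

def power(mu: float, mu0: float, sigma: float, n: int, alpha: float) -> float:
    """Power of the right-tailed z-test of H0: mu = mu0 when the true mean is mu,
    i.e. P(reject H0) = 1 - beta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    shift = (mu - mu0) / (sigma / math.sqrt(n))
    return 1 - NormalDist().cdf(z_alpha - shift)

for mu in (1400, 1420, 1440, 1460):
    print(mu, round(power(mu, 1400, 60, 16, 0.05), 3))
```

At μ = μ₀ the power equals α itself; it then rises towards 1 as the true mean moves further into the region described by H₁.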

The power function plays the same role in hypothesis testing as the mean square error plays in estimation. It is the usual standard for assessing the goodness of a test or for comparing two tests of the same size. The value of this function at a particular parameter point is called the power of the test at that point.

In testing of hypotheses, the ideal procedure would minimize the probabilities of both types of error. Unfortunately, for a fixed sample size n, the two probabilities cannot be controlled simultaneously: a test that reduces one type of error increases the other. For example, a critical region that makes the probability of Type I error zero must be of the form "always accept H₀", and then the probability of Type II error is one. It is therefore desirable to fix the probability of one of the errors and choose a critical region that minimizes the probability of the other. Since Type I error is considered to be more serious than Type II error, we fix the probability of Type I error (α) and minimize the probability of Type II error (β), thereby maximizing the power of the test.
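The trade-off can be seen numerically. Holding the sample size fixed, this sketch lowers α step by step for a right-tailed z-test and reports the resulting β at one alternative. The values μ₀ = 1400, μ₁ = 1430, σ = 60 and n = 16 are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def beta(alpha: float, mu0: float, mu1: float, sigma: float, n: int) -> float:
    """Type II error probability of the right-tailed z-test of H0: mu = mu0
    when the true mean is mu1 > mu0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))
    return NormalDist().cdf(z_alpha - shift)

# Fixed sample size n = 16; only the level of significance changes.
for alpha in (0.10, 0.05, 0.01, 0.001):
    print(f"alpha = {alpha:<6} beta = {beta(alpha, 1400, 1430, 60, 16):.3f}")
```

As α shrinks from 0.10 to 0.001, β grows from about 0.24 to about 0.86, which is why α is fixed first and β is then made as small as the chosen critical region allows.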

Steps in solving a testing of hypothesis problem:

i) Explicit knowledge of the nature of the population distribution and of the parameters of interest, i.e., those about which the hypothesis is set up.

ii) Setting up the null hypothesis H₀ and the alternative hypothesis H₁ in terms of the range of parameter values each one embodies.

iii) The choice of a suitable statistic, say t = t(x₁, x₂, …, x_n), called the test statistic, which best reflects on H₀ and H₁.

iv) Partitioning the sample space (the set of possible values of the test statistic t) into two disjoint, complementary subsets C and C̄ = A (say), and framing the test as follows:

(a) Reject H₀ if t ∈ C.

(b) Accept H₀ if t ∈ A.

After framing the above test, obtain the experimental sample observations, compute the test statistic and take action accordingly.
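The steps can be strung together in one small function. This is a sketch for one particular case only: step (i) is taken to be "normal population with known σ, parameter of interest μ", and the test statistic is the z-statistic. The six observations, σ = 2 and μ₀ = 20 are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def z_test(sample: list[float], mu0: float, sigma: float, alpha: float = 0.05,
           tail: str = "two") -> tuple[float, str]:
    """Steps (ii)-(iv) for a one-sample z-test, assuming a normal population with known sigma.
    (ii) H0: mu = mu0; H1 is chosen by `tail` ('two', 'right' or 'left')."""
    # (iii) Test statistic t = t(x1, ..., xn).
    t = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    # (iv) Partition the values of t into the critical region C and the acceptance region A.
    std = NormalDist()
    if tail == "two":
        in_c = abs(t) > std.inv_cdf(1 - alpha / 2)
    elif tail == "right":
        in_c = t > std.inv_cdf(1 - alpha)
    elif tail == "left":
        in_c = t < std.inv_cdf(alpha)
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    # (a) Reject H0 if t is in C; (b) accept H0 if t is in A.
    return t, ("reject H0" if in_c else "accept H0")

# Hypothetical observations; sigma = 2 is assumed known.
print(z_test([21.1, 22.4, 20.8, 23.0, 21.9, 22.6], mu0=20.0, sigma=2.0, tail="right"))
```

Here t ≈ 2.41 exceeds the right-tail critical value 1.645, so H₀: μ = 20 is rejected in favour of μ > 20 at the 5% level.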