A Statistical-Why?

Q1. Why do we study samples when we want to know about populations?
Samples that represent the population are preferable to a census because:
Cost: Cost is one of the main arguments in favor of sampling, because often a sample can furnish data of sufficient accuracy and at much lower cost than a census.
Accuracy: Much better control over data collection errors is possible with sampling than with a census, because a sample is a smaller-scale undertaking.
Timeliness: Another advantage of a sample over a census is that the sample produces information faster. This is important for timely decision making.
Amount of Information: More detailed information can be obtained from a sample survey than from a census, because it takes less time, is less costly, and allows us to take more care in the data processing stage.
Destructive Tests: When a test involves the destruction of an item under study, sampling must be used.
Statistical sample-size determination can be used to find the optimal sample size within an acceptable cost.
Q2. Why do we study random samples instead of just any sample?
Random sampling gives each individual member of the population an equal chance of being selected for investigation. Random samples are therefore unbiased representatives of the population under investigation.
Q3. Why is the median sometimes better than the mean as an indicator of the central tendency?
The central tendency is often measured by the mean because the other two measures, namely the median and the mode, are almost the same as the mean for a homogeneous population having a symmetric distribution. However, if the distribution is severely skewed, then one should use the median as the single value representing the population, as with salaries in your organization.
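As a small illustration of the salary example, here is a sketch with made-up figures (the salary list is hypothetical, not data from any real organization). One very large value pulls the mean upward while the median stays near the typical salary.

```python
from statistics import mean, median

# Hypothetical annual salaries in thousands; the last value is one executive.
salaries = [40, 42, 45, 47, 50, 52, 55, 400]

salary_mean = mean(salaries)      # pulled upward by the single large value
salary_median = median(salaries)  # middle of the sorted data

print("mean:", salary_mean)
print("median:", salary_median)
```

Seven of the eight people earn less than the mean, so the median is the more honest single summary here.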

Q4. Why is standard deviation a better measurement of data variation than the range?
The standard deviation uses the entire data set, while the range uses only the two extreme values. Therefore, the range is not only sensitive to outliers but also less stable than the standard deviation.
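The point that the range looks only at the two extremes can be seen in a short sketch. The two data sets below are invented for illustration: they share the same minimum and maximum, so the range cannot tell them apart, while the standard deviation can.

```python
from statistics import pstdev

# Two hypothetical data sets with the same extremes but different spread.
data_a = [0, 5, 5, 5, 5, 10]    # most values sit at the center
data_b = [0, 0, 0, 10, 10, 10]  # every value sits at an extreme

range_a = max(data_a) - min(data_a)
range_b = max(data_b) - min(data_b)

sd_a = pstdev(data_a)  # population standard deviation
sd_b = pstdev(data_b)

print("ranges:", range_a, range_b)
print("standard deviations:", round(sd_a, 3), round(sd_b, 3))
```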

Q5. Why is P(A and B) = P(A)P(B|A) = P(B)P(A|B)?
 It is by definition that P(A|B) = P(A and B)/P(B) provided P(B) is non-zero.
Similarly, P(B|A) = P(A and B)/P(A) provided P(A) is non-zero.
The rest follows by multiplying each equation through by its denominator: P(A and B) = P(A|B)P(B) = P(B|A)P(A).
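The identity can be checked by direct counting. The sketch below assumes one roll of a fair die, with A = "even" and B = "greater than 3"; these events are chosen only for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))              # one roll of a fair die
A = {x for x in outcomes if x % 2 == 0}  # even: {2, 4, 6}
B = {x for x in outcomes if x > 3}       # greater than 3: {4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_a_and_b = prob(A & B)
p_b_given_a = Fraction(len(A & B), len(A))  # restrict attention to A
p_a_given_b = Fraction(len(A & B), len(B))  # restrict attention to B

print(p_a_and_b, prob(A) * p_b_given_a, prob(B) * p_a_given_b)
```

All three printed values agree, as the multiplication rule requires.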

Q6. Why is P(A or B or both) = P(A) + P(B) – P(A∩B)?
 P(A or B or both) = P(only A) + P(only B) + P(both) =
 [P(A) - P(both)] + [P(B) - P(both)] + P(both) =
 P(A) + P(B) - P(both) = P(A) + P(B) – P(A∩B)
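A counting check of the addition rule, as a sketch: it assumes a standard 52-card deck with A = "heart" and B = "face card", events picked purely as an example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely cards

hearts = {card for card in deck if card[1] == "hearts"}
faces = {card for card in deck if card[0] in {"J", "Q", "K"}}

def prob(event):
    """Probability of drawing a card that belongs to the event."""
    return Fraction(len(event), len(deck))

direct = prob(hearts | faces)                                  # count the union
by_rule = prob(hearts) + prob(faces) - prob(hearts & faces)    # addition rule

print(direct, by_rule)
```

Without subtracting the overlap, the three face cards that are also hearts would be counted twice.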

Q7. If in an experiment there are three possible outcomes (a, b, c) and their probabilities are P(a) = .3, P(b) = .4, and P(c) = .5, why must at least two of the three outcomes not be independent of each other?
Since the probabilities sum to 1.2 rather than one, these three events cannot be Simple Events (mutually exclusive and exhaustive). That is, at least one of the events is a composite event that overlaps with at least one of the other events.
Q8. Why do we use Σ(x – x̄)² to measure variability instead of Σ(x – x̄)?
Because if we add up all the positive and negative deviations, we always get zero, i.e., Σ(x – x̄) = 0. To deal with this problem, we square the deviations. Why not use a power of four (three will not work)? Squaring does the trick; why should we make life more complicated than it is?
Notice also that squaring magnifies the large deviations; this works to our advantage when measuring the quality of the data.
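A quick numerical sketch of the argument, using a small invented data set. The raw deviations cancel, the squared deviations do not, and for symmetric data like this the cubed deviations cancel too, which is why a power of three will not work.

```python
# Hypothetical, symmetric data chosen so the arithmetic is exact.
data = [1, 2, 3, 4, 5]
x_bar = sum(data) / len(data)

deviations = [x - x_bar for x in data]  # -2, -1, 0, 1, 2

sum_dev = sum(deviations)                     # positives and negatives cancel
sum_sq = sum(d ** 2 for d in deviations)      # all terms are non-negative
sum_cubed = sum(d ** 3 for d in deviations)   # signs survive, so they cancel again

print(sum_dev, sum_sq, sum_cubed)
```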
Q9. To approximate the binomial distribution, why do we sometimes use the Poisson distribution and sometimes use the normal distribution?
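The Poisson approximation to the binomial is a discrete-to-discrete approximation; therefore it is preferable to the normal approximation. As a rule of thumb, it works well when n is large and p is small, while the normal approximation works well when both np and n(1 – p) are reasonably large. However, just as the binomial table is limited, the Poisson table is limited too in its scope; therefore one may have to approximate both by the normal.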
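To see the two approximations side by side, here is a sketch for one case with many trials and a rare success. The values n = 100, p = 0.02, and k = 3 are assumptions chosen for illustration, and the normal approximation uses the usual continuity correction.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist

n, p, k = 100, 0.02, 3  # hypothetical: many trials, rare success

# Exact binomial probability P(X = k).
binom = comb(n, k) * p ** k * (1 - p) ** (n - k)

# Poisson approximation with the same mean, lambda = n * p.
lam = n * p
poisson = exp(-lam) * lam ** k / factorial(k)

# Normal approximation with continuity correction: P(k - 0.5 < X < k + 0.5).
normal_model = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
normal = normal_model.cdf(k + 0.5) - normal_model.cdf(k - 0.5)

print(f"binomial {binom:.4f}  Poisson {poisson:.4f}  normal {normal:.4f}")
```

In this setting the Poisson value lands much closer to the exact binomial probability than the normal value does; with p near one half the comparison would look different.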

Q10. Why is the (1 – α)100% confidence interval equal to x ± z_{α/2} σ_x?
It is the case of a single observation, i.e., n = 1. If the population is normal with known standard deviation σ_x, then the above confidence interval is correct. For the mean of n observations, σ_x is replaced by σ_x/√n.
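The interval can be computed directly, as in the sketch below. The observation x = 10 and the known standard deviation of 2 are made-up numbers; the critical value comes from the standard normal distribution in Python's statistics module.

```python
from statistics import NormalDist

alpha = 0.05     # gives a 95% confidence interval
x = 10.0         # hypothetical single observation
sigma_x = 2.0    # hypothetical known population standard deviation

# Upper alpha/2 critical value of the standard normal distribution.
z = NormalDist().inv_cdf(1 - alpha / 2)

lower = x - z * sigma_x
upper = x + z * sigma_x

print(f"z = {z:.3f}, interval = ({lower:.2f}, {upper:.2f})")
```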
Q11. Why are stratified random samples “random”?
Whenever we have a mixture of populations, no standard statistical technique is applicable to the mixture as a whole. In such a case one must take a random sample from each stratum and then apply statistical tools to each sub-population. Never mix apples with oranges.

Q12. Why are cluster samples “random”?
It is similar to stratified sampling in its intent; however, in cluster sampling the units are often sampled randomly within each cluster.

Q13. Why do we usually test for Type I error instead of Type II error in hypothesis testing?
Because the null hypothesis is always specified in exact form with an (=) sign, the distribution of the test statistic under it is known, so one can compute the probability of a Type I error and talk about rejecting or not rejecting the null hypothesis. However, if the alternative is also specified in exact form with an (=) sign, then one is able to compute both types of error.

Q14. Why is the “margin of error” often used as a measure of accuracy in estimation?

When estimating a parameter of a population based on a random sample, one has to provide the degree of accuracy.  The accuracy of the estimate is often expressed by a confidence interval with specific confidence level. 

The half-length of the confidence interval is often referred to as the absolute error, absolute precision, or even the margin of error. However, the usual usage of “margin of error” refers to the half-length of a confidence interval with 95% confidence.
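As a sketch of the 95% margin of error, consider an estimated proportion from a survey. The sample size of 1000 and the estimate of 0.5 are assumed values, and the normal approximation to the sampling distribution is taken for granted here.

```python
from math import sqrt
from statistics import NormalDist

n = 1000      # hypothetical number of respondents
p_hat = 0.5   # hypothetical estimated proportion

z = NormalDist().inv_cdf(0.975)                 # 95% two-sided critical value
standard_error = sqrt(p_hat * (1 - p_hat) / n)  # standard error of a proportion
margin_of_error = z * standard_error            # half-length of the interval

print(f"margin of error: {margin_of_error:.3f}")
```

This is where the familiar "plus or minus 3 percentage points" for polls of about a thousand people comes from.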

Q15. Why are there so many statistical tables? Which one to use?
Statistical tables are used to construct confidence intervals in estimation, as well as to reach reasonable conclusions in tests of hypotheses. Depending on the application area, one may, for example, classify the two major statistical tables as follows:
T - Table: expected value of population(s), regression coefficients, and correlation(s).
Z - Table: Similar to the T-table, but for large sample sizes (say, over 30).

Q16.  Why do we use the p-value? What is it?
The p-value is the tail probability of the observed test statistic value given that the null hypothesis is true. Because the p-value is a function of a test statistic, which is itself a function of the sample data, it is a statistic as well as a conditional probability. This is analogous to the method of maximum likelihood parameter estimation, wherein we consider the data to be fixed and the parameter to be variable.
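A minimal sketch of a p-value as a tail probability, assuming a one-sided z-test with known standard deviation. All of the numbers (null mean 100, standard deviation 15, sample of 36 with mean 105) are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 100.0          # hypothetical mean under the null hypothesis
sigma = 15.0         # hypothetical known population standard deviation
n = 36               # hypothetical sample size
sample_mean = 105.0  # hypothetical observed sample mean

# Test statistic: how many standard errors the sample mean lies above mu0.
z = (sample_mean - mu0) / (sigma / sqrt(n))

# Upper-tail probability of the standard normal beyond the observed z.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A different sample would give a different z and hence a different p-value, which is the sense in which the p-value is itself a statistic.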

Q17. Why is linear regression a good model when the range of the independent variable is small?
Most statistical models are not linear; however, if we are interested in only a small range, then almost all non-linear functions can be approximated by a straight line.
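The sketch below illustrates the idea with a tangent line rather than a fitted regression line, which is a simplification. The function exp(x) and the point x0 = 1 are arbitrary choices; the straight line is compared with the curve over a narrow and a wide range.

```python
from math import exp

x0 = 1.0  # hypothetical point around which we approximate

def curve(x):
    """A non-linear function chosen only as an example."""
    return exp(x)

def line(x):
    """Tangent line at x0; the derivative of exp(x) is exp(x)."""
    return curve(x0) + curve(x0) * (x - x0)

def max_error(low, high, steps=200):
    """Largest gap between curve and line on an evenly spaced grid."""
    grid = [low + (high - low) * i / steps for i in range(steps + 1)]
    return max(abs(curve(x) - line(x)) for x in grid)

narrow = max_error(0.9, 1.1)  # small range around x0
wide = max_error(0.0, 2.0)    # ten times wider range

print(f"narrow range: {narrow:.4f}, wide range: {wide:.4f}")
```

The gap is tiny on the narrow range and large on the wide one, so a linear model is a reasonable choice only when the independent variable stays close to where the line was fitted.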

Q18. Why does high correlation not imply causality?
Determination of cause-and-effect is not in the statistician’s job description.
Any specific cause-and-effect relation belongs to a specific area of knowledge and is subject to rigorous experimentation. Correlation measures the strength of a linear numerical relation, called a function. A function simply converts something into something else. Your coffee grinder is a function. The cause in this example is the mechanical force that grinds the coffee beans.

Q19. Why would ANOVA and performing a t-test for each pair of samples not necessarily give the same conclusion at the same confidence level?
It is because any pair-wise comparison of means is never a substitute for the simultaneous comparison of all means. Moreover, it is not an easy task to compute the exact confidence level from the pair-wise confidence levels.
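A rough sketch of how pair-wise tests inflate the overall error rate. It assumes five groups and, unrealistically, that the pair-wise tests are independent; since they share data they are not, which is exactly why the exact overall level is hard to compute. The number is therefore only indicative.

```python
from math import comb

alpha = 0.05   # significance level of each individual t-test
groups = 5     # hypothetical number of samples being compared

pairs = comb(groups, 2)  # number of pair-wise t-tests needed

# Chance of at least one false rejection if the tests were independent
# (an assumption made only to get a ballpark figure).
familywise_error = 1 - (1 - alpha) ** pairs

print(f"{pairs} tests, overall error rate about {familywise_error:.2f}")
```

ANOVA makes one simultaneous comparison at the stated level, so its conclusion need not match that of many separate tests each run at the same level.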