Statistics provides justifiable answers to the following concerns for every consumer and producer:
- What is your, or your customer's, expectation of the product/service you sell or that your customer buys? That is, what is a good estimate for µ?
- Given that expectation, what is the quality of the product/service you sell or that your customers buy? That is, what is a good estimate for quality (e.g., σ, or the coefficient of variation, C.V.)?
- Given that expectation and that quality, how does the product/service compare with other existing similar offerings? That is, how do several µ's, and several σ's, compare?
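A minimal sketch of these three estimates, using Python's standard statistics module. The delivery-time data and the product context are illustrative assumptions, not taken from the text:

```python
# Sketch: estimating a customer's expectation (mu) and quality (sigma, C.V.)
# from a small hypothetical sample of delivery times, in days.
import statistics

delivery_times = [5.1, 4.8, 5.4, 5.0, 4.9, 5.2, 5.6, 4.7]  # made-up data

mu_hat = statistics.mean(delivery_times)       # estimate of mu (expectation)
sigma_hat = statistics.stdev(delivery_times)   # sample standard deviation
cv = sigma_hat / mu_hat                        # coefficient of variation

print(round(mu_hat, 3), round(sigma_hat, 3), round(cv, 3))
```

Comparing several products then amounts to comparing their estimated µ's and σ's (or C.V.'s, which are unit-free and therefore easier to compare across different scales).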
1. Statistical techniques are methods that convert data into information. Descriptive techniques describe and summarize; inferential techniques allow us to make estimates and draw conclusions about populations from samples.
2. We need a large number of techniques because there are numerous objectives and types of data. There are three types of data: quantitative (real numbers), qualitative (categories) and ranked (rating). Each combination of data type and objective requires specific techniques.
3. We gather data by various sampling plans. However, the validity of any statistical outcome is dependent on the validity of the sampling.
4. The sampling distribution is the source of statistical inference. The interval estimator and the test statistic are derived directly from the sampling distribution.
5. All inferences are actually probability statements based on the sampling distribution. Because the probability of an event is defined as the proportion of times the event occurs in the long run, we must interpret confidence interval estimates in these terms.
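The long-run interpretation in point 5 can be sketched by simulation: draw many samples from a population with a known mean, build a 95% interval from each, and count how often the interval captures the true mean. The population values, sample size, and known-σ assumption are all illustrative:

```python
# Sketch of the long-run meaning of a 95% confidence interval.
import random, statistics, math

random.seed(42)
TRUE_MU, TRUE_SIGMA, N, TRIALS = 50.0, 10.0, 40, 2000
Z_95 = 1.96  # normal critical value; sigma is assumed known here

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(N)]
    xbar = statistics.mean(sample)
    half_width = Z_95 * TRUE_SIGMA / math.sqrt(N)
    if xbar - half_width <= TRUE_MU <= xbar + half_width:
        hits += 1

coverage = hits / TRIALS
print(coverage)  # the proportion of intervals that captured TRUE_MU
```

The proportion settles near 0.95 as the number of repetitions grows; that long-run proportion, not any single interval, is what "95% confidence" refers to.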
6. All tests of hypotheses are conducted similarly. We assume that the null hypothesis is true. We then compute the value of the test statistic. If the difference between what we observed (and calculated) and what we expect to observe is too large, we reject the null hypothesis. The standard that decides what is "too large" is determined by the probability of a Type I error.
7. In any test of hypothesis (and in most decisions) there are two possible errors, Type I and Type II errors. The relationship between the probabilities of these errors helps us decide where to set the standard. If we set the standard so high that the probability of a Type I error is very small, we increase the probability of a Type II error. A procedure designed to decrease the probability of a Type II error must have a relatively large probability of a Type I error.
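The trade-off in point 7 can be made concrete for a one-sided Z test. The specific hypotheses, σ, and sample size below are illustrative choices, not from the text:

```python
# Sketch of the Type I / Type II trade-off for a one-sided Z test of
# H0: mu = 0 vs H1: mu = 0.5, with sigma = 1 and n = 25. Shrinking alpha
# (Type I error probability) raises beta (Type II error probability).
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def norm_ppf(p):
    """Inverse of the standard normal CDF by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, true_mu = 25, 0.5
betas = []
for alpha in (0.10, 0.05, 0.01):
    z_crit = norm_ppf(1 - alpha)                        # rejection threshold
    power = 1 - norm_cdf(z_crit - true_mu * math.sqrt(n))
    betas.append(1 - power)                             # P(Type II error)
    print(f"alpha={alpha}: beta={betas[-1]:.3f}")
```

As α drops from 0.10 to 0.01, β rises, which is exactly the relationship the point describes.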
8. The sampling distributions that are used for quantitative data are the Student t, the chi-squared, and the F distributions. We can use the analysis of variance in place of the t-test of two means. We can use regression analysis with indicator variables in place of the analysis of variance. We often build a model to represent relationships among quantitative variables, including indicator variables.
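Point 8's claim that regression with an indicator variable can replace a two-group comparison has a simple algebraic face: with x coded 0/1 for group membership, the least-squares slope equals the difference between the two group means. The data below are made up for illustration:

```python
# Sketch: simple regression on a 0/1 indicator reproduces the difference
# of the two group means as its slope.
import statistics

group0 = [10.0, 12.0, 11.5, 9.5]   # hypothetical responses, group 0
group1 = [14.0, 13.5, 15.0, 13.0]  # hypothetical responses, group 1
x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1

xbar, ybar = statistics.mean(x), statistics.mean(y)
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))

diff_of_means = statistics.mean(group1) - statistics.mean(group0)
print(round(slope, 6), round(diff_of_means, 6))  # the two values agree
```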
9. When you take a sample from a population and compute the sample mean, it will not be identical to the mean you would have gotten had you observed the entire population. Different samples result in different means. The distribution of all possible values of the mean, for samples of a particular size, is called the sampling distribution of the mean.
10. The variability of the distribution of sample means depends on how large your sample is and on how much variability there is in the population from which the samples are taken. As the size of the sample increases, the variability of the sample means decreases. As variability in a population increases, so does the variability of the sample means.
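Points 9 and 10 can be sketched by simulation: the standard deviation of the simulated sample means shrinks as n grows, tracking the theoretical value σ/√n. The population parameters below are illustrative:

```python
# Sketch: the spread of the sampling distribution of the mean decreases
# with sample size, matching sigma / sqrt(n).
import random, statistics, math

random.seed(1)
MU, SIGMA, TRIALS = 100.0, 8.0, 3000

def sd_of_sample_means(n):
    means = [statistics.mean(random.gauss(MU, SIGMA) for _ in range(n))
             for _ in range(TRIALS)]
    return statistics.stdev(means)

sd_small, sd_large = sd_of_sample_means(10), sd_of_sample_means(40)
print(round(sd_small, 3), round(SIGMA / math.sqrt(10), 3))  # n = 10
print(round(sd_large, 3), round(SIGMA / math.sqrt(40), 3))  # n = 40
```

Quadrupling the sample size halves the spread of the sample means, since √40 = 2√10.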
11. A normal distribution is bell-shaped. It is a symmetric distribution in which the mean, median, and mode all coincide. In the population, many variables, such as height and weight, have distributions that are approximately normal. Although normal distributions can have different means and variances, the distribution of the cases about the mean is always the same. You use standard Z scores to locate an observation within a distribution. The mean of standard Z scores is 0, and the standard deviation is 1.
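Standardizing, as described in point 11, is a one-line transformation; the heights below are made-up numbers used only to show that the resulting Z scores have mean 0 and standard deviation 1:

```python
# Sketch: converting raw observations to standard Z scores.
import statistics

heights = [162, 175, 168, 181, 170, 159, 177, 172]  # hypothetical, in cm
mean_h = statistics.mean(heights)
sd_h = statistics.pstdev(heights)  # population SD, matching the Z definition

z_scores = [(h - mean_h) / sd_h for h in heights]

print(statistics.mean(z_scores))    # essentially 0
print(statistics.pstdev(z_scores))  # essentially 1
```

A Z score of, say, 1.5 locates an observation 1.5 standard deviations above the mean, regardless of the original units.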
12. The Central Limit Theorem states that for samples of a sufficiently large size, the distribution of sample means is approximately normal. (That's why the normal distribution is so important for data analysis.)
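The Central Limit Theorem in point 12 can be seen by sampling from a clearly non-normal population; an Exponential(1) population (a stand-in chosen here for illustration) has µ = 1 and σ = 1, so the sample means should center near 1 with spread near 1/√n:

```python
# Sketch: means of samples from a skewed (exponential) population still
# behave approximately normally, per the Central Limit Theorem.
import random, statistics

random.seed(7)
N, TRIALS = 50, 4000
means = [statistics.mean(random.expovariate(1.0) for _ in range(N))
         for _ in range(TRIALS)]

# Theory: center near mu = 1, spread near sigma/sqrt(50) ≈ 0.141.
print(round(statistics.mean(means), 3), round(statistics.stdev(means), 3))
```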
13. A confidence interval provides a range of values that, with a designated likelihood, contains the population parameter of interest (e.g., the population mean µ, or σ).
14. To test the null hypothesis that two population means are equal, you must calculate the probability of seeing a difference at least as large as the one you've observed in your two samples if there is no difference between the populations.
15. The probability of seeing a difference at least as large as the one you've observed, when the null hypothesis is true, is called the observed significance level, or P-value. If the observed significance level is small, usually less than .05, you reject the null hypothesis.
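One concrete way to compute the probability described in points 14 and 15 is a permutation test, used here as an illustrative stand-in for the usual two-sample t-test: if the null hypothesis is true, the group labels are arbitrary, so we shuffle them and see how often a difference at least as large as the observed one appears. The data are made up:

```python
# Sketch: a permutation test for the difference between two sample means.
import random, statistics

random.seed(3)
group_a = [23.1, 25.4, 24.8, 26.0, 25.1, 24.3]  # hypothetical sample A
group_b = [21.0, 22.5, 21.8, 23.0, 22.1, 20.7]  # hypothetical sample B
observed = abs(statistics.mean(group_a) - statistics.mean(group_b))

pooled = group_a + group_b
TRIALS, count = 5000, 0
for _ in range(TRIALS):
    random.shuffle(pooled)  # relabel the 12 observations at random
    diff = abs(statistics.mean(pooled[:6]) - statistics.mean(pooled[6:]))
    if diff >= observed:
        count += 1

p_value = count / TRIALS
print(round(observed, 3), p_value)
```

Here the p-value comes out well below .05, so the null hypothesis of equal population means would be rejected.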
16. If you reject the null hypothesis when it's true, you make a Type I error. If you don't reject the null hypothesis when it's false, you make a Type II error.
17. The techniques used on qualitative data require that we count the number of times each category occurs (e.g., the Runs test). The count is then used to compute statistics. The sampling distributions we use for qualitative data are the Z (i.e., the standard normal) and the chi-squared distributions.
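As one example of the count-then-compute pattern in point 17 (a chi-squared goodness-of-fit statistic rather than the runs test, chosen because it needs no extra machinery; categories and counts are illustrative):

```python
# Sketch: a chi-squared goodness-of-fit statistic from category counts,
# against the null hypothesis of equal category frequencies.
observed = {"red": 30, "green": 22, "blue": 28}  # hypothetical counts
total = sum(observed.values())
expected = total / len(observed)  # equal expected counts under H0

chi_sq = sum((obs - expected) ** 2 / expected for obs in observed.values())
print(round(chi_sq, 3))  # compare with a chi-squared critical value, df = 2
```

The statistic is then compared against the chi-squared distribution with (number of categories − 1) degrees of freedom.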
18. The techniques used on ranked data are based on a ranking procedure. Statisticians call these techniques nonparametric. Because the requirements for nonparametric techniques are weaker than those for parametric procedures, we often use nonparametric techniques in place of parametric ones when any of the required conditions for a parametric test is not satisfied.
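The ranking procedure in point 18 can be sketched with the rank-sum idea behind the Wilcoxon/Mann-Whitney test (chosen here as a representative nonparametric technique): pool both samples, rank them, and sum the ranks of one group. The data are illustrative and tie-free:

```python
# Sketch: the rank-sum statistic for two small samples of ranked data.
sample_x = [12.1, 15.3, 9.8, 14.0]   # hypothetical observations
sample_y = [10.5, 11.2, 13.7]

pooled = sorted(sample_x + sample_y)
rank = {value: i + 1 for i, value in enumerate(pooled)}  # no ties here

rank_sum_x = sum(rank[v] for v in sample_x)
print(rank_sum_x)
```

An unusually large or small rank sum is evidence that the two populations differ in location; only the ranks, never the raw magnitudes, enter the calculation.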
19. We can obtain data through experimentation or by observation. Observational data lend themselves to several conflicting interpretations. Data gathered by an experiment are more likely to lead to a definitive interpretation.
20. Statistical skills enable one to intelligently collect, analyze, and interpret data relevant to decision-making. Statistical concepts and statistical thinking enable one to: solve problems in a diversity of contexts, add substance to decisions, and reduce guesswork.