Statistical Terminology

We often make use of statistical techniques in the analysis of findings. Some are highly sophisticated and complex, but those most often used are easy to understand. The most common are measures of central tendency (ways of calculating averages) and correlation coefficients (measures of the degree to which one variable relates consistently to another). There are three methods of calculating averages, each of which has certain advantages and shortcomings. Take as an example the amount of personal wealth (including all assets such as houses, cars, bank accounts and investments) owned by 13 individuals. Suppose they own the following amounts:

1. £0 (zero)
2. £5,000
3. £10,000
4. £20,000
5. £40,000
6. £40,000
7. £40,000
8. £80,000
9. £100,000
10. £150,000
11. £200,000
12. £400,000
13. £10,000,000

The mean corresponds to the average, arrived at by adding together the personal wealth of all 13 people and dividing the result by 13. The total is £11,085,000; dividing this by 13, we reach a mean of £852,692.31. The mean is often a useful calculation because it is based on the whole range of data provided. However, it can be misleading where one or a small number of cases are very different from the majority. In the above example, the mean is not in fact an appropriate measure of central tendency, because the presence of one very large figure, £10,000,000, skews the picture. Using the mean to summarize this data might give the impression that most of the people own far more than they actually do. In such instances, one of two other measures may be used.

The mode is the figure that occurs most frequently in a given set of data. In our example, it is £40,000. The problem with the mode is that it does not take into account the overall distribution of the data, i.e., the range of figures covered. The most frequently occurring case in a set of figures is not necessarily representative of their distribution as a whole and thus may not be a useful average. In this case, £40,000 is too close to the lower end of the figures.

The third measure is the median, which is the middle value of any set of figures arranged in order; here, this would be the seventh figure, again £40,000. Our example gives an odd number of figures: 13. If there had been an even number (for instance, 12), the median would be calculated by taking the mean of the two middle cases. Like the mode, the median gives no idea of the actual range of the data measured.
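The three averages can be checked with a few lines of Python. This is only a minimal sketch using the standard library's statistics module; the variable names are illustrative and not part of the original text.

```python
from statistics import mean, median, mode

# Personal wealth in pounds of the 13 individuals in the example.
wealth = [0, 5_000, 10_000, 20_000, 40_000, 40_000, 40_000,
          80_000, 100_000, 150_000, 200_000, 400_000, 10_000_000]

total = sum(wealth)                  # all 13 figures added together
wealth_mean = mean(wealth)           # total divided by the number of cases
wealth_mode = mode(wealth)           # most frequently occurring figure
wealth_median = median(wealth)       # middle (seventh) figure of the sorted list

# With an even number of cases, median() takes the mean of the two middle ones.
median_of_first_twelve = median(wealth[:12])

print(f"total  £{total:,}")
print(f"mean   £{wealth_mean:,.2f}")
print(f"mode   £{wealth_mode:,}")
print(f"median £{wealth_median:,}")
```

Dropping the single £10,000,000 case from the list and re-running the sketch is a quick way to see how strongly one extreme figure pulls the mean, while the mode and median stay where they are.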

Sometimes a researcher will use more than one measure of central tendency to avoid giving a deceptive picture of the average. More often, the researcher will calculate the standard deviation for the data in question. This is a way of calculating the degree of dispersal, or spread, of a set of figures around the mean; in this case the figures range from zero to £10,000,000.

Correlation coefficients offer a useful way of expressing how closely connected two (or more) variables are. Where two variables correlate completely, we can speak of a perfect positive correlation, expressed as 1. Where no relation is found between two variables, so that they have no consistent connection at all, the coefficient is zero. A perfect negative correlation, expressed as -1, exists when two variables are in a completely inverse relation to one another. Correlations of the order of 0.6 or more, whether positive or negative, are usually regarded as indicating a strong degree of connection between whatever variables are being analysed. Positive correlations on this level might be found between, say, social class background and voting behaviour.
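Both ideas can be illustrated with a short Python sketch. It rests on several assumptions not stated in the text: the population form of the standard deviation is used, the correlation coefficient is taken to be Pearson's, the helper name pearson is hypothetical, and the small paired data sets are invented purely to produce coefficients of 1, -1 and 0.

```python
from math import sqrt
from statistics import mean, pstdev

# Personal wealth in pounds of the 13 individuals in the example.
wealth = [0, 5_000, 10_000, 20_000, 40_000, 40_000, 40_000,
          80_000, 100_000, 150_000, 200_000, 400_000, 10_000_000]

# Population standard deviation: how widely the figures are dispersed
# around the mean (assumption: all 13 cases are the whole population).
spread = pstdev(wealth)
value_range = max(wealth) - min(wealth)   # distance from lowest to highest figure


def pearson(xs, ys):
    """Pearson correlation coefficient (hypothetical helper for illustration)."""
    mx, my = mean(xs), mean(ys)
    covariation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return covariation / sqrt(var_x * var_y)


# Invented illustrative data, not taken from the text.
xs = [1, 2, 3, 4, 5]
perfect_positive = pearson(xs, [2, 4, 6, 8, 10])   # rises in step with xs
perfect_negative = pearson(xs, [10, 8, 6, 4, 2])   # falls in step with xs
no_relation = pearson(xs, [2, 1, 4, 1, 2])         # no consistent linear link

print(f"standard deviation £{spread:,.0f}, range £{value_range:,}")
print(perfect_positive, perfect_negative, no_relation)
```

A coefficient of zero here only means there is no consistent linear connection between the two sets of figures; real research data will rarely land exactly on 1, -1 or 0.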
