## Describing data

The purpose of descriptive statistics is to give the user an impression of the location of the data and its spread. The statistics used are frequency, percentage, average and dispersion.

### Frequency

The simplest kind of statistical description is a straight frequency count of the number of responses in each category.

For example, 13,337,000 adults drink mineral water in Great Britain. These data can also be represented as a frequency distribution, which shows how frequencies (i.e. numbers in the category) are distributed across a number of categories (in this case, frequency of drinking mineral water). This is shown as a histogram in Figure 10.3.

[Histogram: number of mineral water drinkers in each category. Categories: once a day, once a week, once a month, less than once a month.]

Figure 10.3: Frequency of drinking mineral water. Source: Mintel/TGI
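A frequency distribution of this kind can be tallied directly from raw responses. The following is a minimal sketch in Python; the `responses` list is hypothetical and is not the Mintel/TGI data behind Figure 10.3.

```python
from collections import Counter

# Hypothetical survey responses (illustrative only, not the Mintel/TGI data).
responses = [
    "once a day", "once a week", "once a week", "once a month",
    "less than once a month", "once a day", "once a week",
    "once a month", "once a week", "less than once a month",
]

# Count how many responses fall into each category.
frequency = Counter(responses)

# Print the frequency distribution, most common category first.
for category, count in frequency.most_common():
    print(f"{category:<24}{count}")
```

Each printed row corresponds to one bar of a histogram: the category is the label and the count is the bar height.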

### Percentage

While straight frequency counts are useful to give an idea of the absolute values involved, percentages will indicate what shares of particular markets are concerned, and are useful for making comparisons. In the example in Figure 10.3, the 13.4 million mineral water drinkers in Great Britain were 29.5 per cent of all adults. This is of interest to those involved in the business, particularly when compared with other percentage statistics, such as 47 per cent of buyers bought mineral water because it was natural, and 38 per cent because it was good for them. Information like this can help to frame all kinds of business decisions, from new product development to advertising and promotional messages.
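The link between a frequency and a percentage can be sketched in a few lines of Python. The two figures come from the text; the helper `percentage` and the derived `implied_adults` base are illustrative assumptions, since the text does not state the size of the adult population directly.

```python
def percentage(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


drinkers = 13_337_000      # adults drinking mineral water (from the text)
share_of_adults = 29.5     # per cent of all adults (from the text)

# Assumption: the adult base is not given, so it is inferred from the two
# figures above rather than taken from a published source.
implied_adults = drinkers / (share_of_adults / 100.0)

print(f"Implied adult population: {implied_adults:,.0f}")
print(f"Share of adults: {percentage(drinkers, implied_adults):.1f} per cent")
```

Under these figures the implied base is roughly 45 million adults, which is the denominator that turns the absolute count into a share that can be compared with other percentages.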

### Averages

Averages are a useful device for indicating with just one number roughly where the data as a whole is located.

For example, in October 2001 about 77 per cent of homes bought lamb and the average household consumption of lamb was 4.3 kg a year. Multiplying average consumption by the number of households buying lamb will give the total retail market size for lamb in the UK.
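The market size calculation described above can be sketched as follows. The text does not give the number of UK households, so `total_households` is a hypothetical figure chosen only to make the arithmetic concrete; the sketch also assumes that the 4.3 kg average applies to households that buy lamb.

```python
# Figures from the text.
share_buying = 0.77          # about 77 per cent of homes bought lamb
avg_kg_per_household = 4.3   # average household consumption, kg per year

# Hypothetical: the text does not give the number of UK households.
total_households = 24_000_000

# Assumption: the 4.3 kg average applies to households that buy lamb.
buying_households = share_buying * total_households
market_size_kg = buying_households * avg_kg_per_household

print(f"Buying households: {buying_households:,.0f}")
print(f"Estimated market size: {market_size_kg:,.0f} kg per year")
```

With the assumed household count, the estimate comes to roughly 79 million kg a year. A different household figure would scale the result proportionally.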

The average used in this example is the arithmetic mean, which is the most commonly used and colloquially understood measure of average.

Another useful measure of average is called the mode. This refers to the most frequently occurring figure. The mode is useful in market studies concerning brand usage, or when it is important to know the most frequently mentioned brand in a brand recall survey.

A less frequently used average is the median. This is the middle value when all responses are arranged in order. In considering income, the mean is likely to be higher than the median. This is because the mean will be drawn upwards by very high salaries at the top end of the range, whereas the median will indicate what the middle earner is earning.
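The three averages can be compared using Python's standard `statistics` module. The income figures and brand names below are hypothetical, chosen so that a single very high salary pulls the mean above the median.

```python
import statistics

# Hypothetical annual incomes in pounds; one very high salary skews the mean.
incomes = [18_000, 21_000, 21_000, 24_000, 27_000, 30_000, 150_000]

mean_income = statistics.mean(incomes)      # arithmetic mean
median_income = statistics.median(incomes)  # middle value when sorted
mode_income = statistics.mode(incomes)      # most frequently occurring value

print(f"Mean:   {mean_income:,.0f}")
print(f"Median: {median_income:,.0f}")
print(f"Mode:   {mode_income:,.0f}")

# The mode also works for categories, such as brands named in a recall survey.
# Hypothetical brand mentions.
mentions = ["Brand A", "Brand B", "Brand A", "Brand C", "Brand A", "Brand B"]
top_brand = statistics.mode(mentions)
print(f"Most frequently mentioned brand: {top_brand}")
```

Here the median is 24,000 and the mode is 21,000, while the mean is pulled above 41,000 by the single salary of 150,000.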

### Dispersion

Measures of dispersion give more information about the mean because they indicate the range of values around it. In the example above, the mean consumption of lamb was 4.3 kg a year. A measure of dispersion would give an idea of how much variation exists around this figure.

The simplest measure of dispersion is called the range and is simply the highest minus the lowest value in the data.

In the case of lamb consumption, the range was from 0 to 25 kg per year, i.e. the range value was 25 kg. This indicates that if the mean was only 4.3 kg then most meat eaters are not buying very much lamb since the top end of the range is 25 kg. However, it could be that only one household in the sample actually consumed 25 kg of lamb in a year and the next nearest level of consumption was 15 kg. If this were the case then our interpretation about the variety of consumption behaviour would be inaccurate.
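The sensitivity of the range to a single extreme value can be illustrated with a short Python sketch. The `consumption` figures are hypothetical, constructed to match the scenario described: one household at 25 kg and the next highest at 15 kg.

```python
def value_range(values):
    """Return the highest value minus the lowest value."""
    return max(values) - min(values)


# Hypothetical household lamb consumption in kg per year.
consumption = [0, 1, 2, 3, 3, 4, 4, 5, 6, 15, 25]

full_range = value_range(consumption)

# Drop the single highest household and recompute the range.
without_top = value_range(sorted(consumption)[:-1])

print(f"Range with all households:       {full_range} kg")
print(f"Range without the top household: {without_top} kg")
```

Removing one household out of eleven cuts the range from 25 kg to 15 kg, which shows how little the range says about where most of the values actually lie.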

To avoid this problem, where the values at each end of the range may be extremely unrepresentative of the actual spread of the data, a standard measure of spread is used. This measure indicates the values within which large proportions of the data lie, and is called the standard deviation. When it is calculated from sample data rather than population data it is the sample standard deviation. The term standard error is often used loosely for this, but strictly it refers to the standard deviation of a sample estimate such as the mean. The usefulness of the standard deviation lies in the fact that, for data that are roughly normally distributed, about 68% of all values lie within 1 standard deviation of the mean, 95% lie within 2 standard deviations and 99.7% lie within 3 standard deviations.

For most business purposes it is common to work at the level of 2 standard deviations of the mean, that is, where 95% of all the values lie. In the lamb consumption example, the standard deviation was 1 kg. This means that roughly 95% of all households consuming lamb were eating between 2.3 and 6.3 kg per year. This gives a much better feeling for typical consumption behaviour than either the mean on its own or the use of the range.
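The intervals quoted above follow from simple arithmetic on the mean of 4.3 kg and the spread figure of 1 kg from the lamb example. The sketch below reproduces them; the helper `spread_interval` and the `sample` data are hypothetical, and the coverage shares assume roughly normally distributed data.

```python
import statistics


def spread_interval(mean: float, spread: float, k: float) -> tuple[float, float]:
    """Return the interval from k spread units below the mean to k above it."""
    return (mean - k * spread, mean + k * spread)


mean_kg = 4.3    # mean consumption from the text
spread_kg = 1.0  # spread figure from the text

# Approximate share of values within k units of the mean (normal data only).
coverage = {1: 0.68, 2: 0.95, 3: 0.997}

for k, share in coverage.items():
    low, high = spread_interval(mean_kg, spread_kg, k)
    print(f"about {share:.1%} of values between {low:.1f} and {high:.1f} kg")

# Hypothetical sample, to show the population and sample calculations.
sample = [3.1, 4.0, 4.3, 4.6, 5.5]
population_sd = statistics.pstdev(sample)  # divides by n
sample_sd = statistics.stdev(sample)       # divides by n - 1
print(f"population sd: {population_sd:.2f}, sample sd: {sample_sd:.2f}")
```

The middle row gives the 2.3 to 6.3 kg interval used in the text. The last two lines show that the sample calculation gives a slightly larger figure than the population calculation on the same data, because it divides by n - 1 rather than n.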
