Random sampling

Random sampling, based on statistical probability theory, has two characteristics that make it extremely useful in practice to the decision maker. These are that it is possible to calculate the level of confidence and limits of accuracy of the results.

The 'level of confidence' refers to the fact that from a randomly drawn sample it is possible to work out the statistical probability that the sample is a good one. Very much like that famous toothpaste 'ring of confidence', it is heartening for a decision maker to hear from the researchers, 'these results are correct to the 95 per cent level of confidence'. What they are in effect saying is, 'there is only a one in twenty chance that this sample is a bad one'. The 95 per cent level is the most commonly used level of confidence in research and most decision makers in organizational settings are satisfied with data in which they can be assured of having this level of confidence, simply because so much other uncertainty surrounds the decision-making environment.

The second practical outcome of sound sampling practice referred to is the 'limits of accuracy'. Common sense dictates that when 1000 respondents are used to calculate the views of the 3,000,000 members of the population whom they represent, then the calculation is likely to be only an approximate one. That is, sample statistics can be used to calculate population parameters only within certain limits of accuracy, rather than with spot-on precision. It also makes sense that the more respondents included in the sample, the more accurate the calculation of the population parameters is likely to be, i.e. the larger the size of the sample the narrower the range of limits of accuracy. The relationship, however, is not in direct proportion. From a sample of 1000, results may be within limits of accuracy of + or - 10 per cent. To reduce that by half, i.e. to results of + or - 5 per cent, it would be necessary to include four times as many individuals in the sample, i.e. 4000 respondents. This point about the relationship between sample size and accuracy is considered further in Section 7.5.
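The 'four times the sample for half the limits' relationship can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions not stated in the text: a simple random sample, a result expressed as a proportion, the usual normal approximation, and the worst case of a 50/50 split (p = 0.5). The function name margin_of_error is invented for this example. Under these assumptions the limits for 1000 respondents come out at roughly 3 percentage points rather than the round figures used above, but the halving effect of quadrupling the sample is the same.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate limits of accuracy (as a proportion) for a sample of size n.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to the 95 per cent level of confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (1000, 4000):
    # Quadrupling n halves the limits because n sits under a square root.
    print(n, "respondents: + or -", round(margin_of_error(n) * 100, 2), "per cent")
```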

A further, and important, practical advantage of sampling theory is that the level of confidence and limits of accuracy required in the results can be decided in advance of the survey. Use of the appropriate statistical calculations makes it possible to determine what size of sample will be required to produce findings to those specifications. While it is not essential to be able to perform personally the statistical calculations referred to, it is important to note that the quality and validity of the findings from large-scale quantitative research surveys are determined by the appropriateness with which these calculations are carried out and the statistical concepts applied.

This book aims only to introduce marketing research and does not expect the reader to be equipped with a statistical background. Nor is it felt feasible or desirable to attempt to teach that statistical background in an introductory book. In the authors' experience it is those horrifying pages of statistical calculations that cause managers who started off with an interest in learning more about marketing research to decide that perhaps the subject is not for them. However, the authors are also aware of the desire of managers in very many types of organization to undertake their own research. Advice to the would-be 'do-it-yourself' researcher is that he or she needs to know far more about sampling theory and practice than the introduction given in this chapter will provide. The same advice holds for students of market research. The purpose of the following explanation of sample selection procedures is to provide a basis for understanding why particular selection techniques are used in sample surveys and to judge their appropriateness. It will also give an appreciation of the advantages and limitations of each method.

To summarize, the great advantage of random sampling techniques is that they allow statistical calculation of the appropriate sample size for predetermined levels of confidence (usually set at the 95 per cent level) and limits of accuracy (set to meet requirements of the decision to be made). It must be stressed that these calculations are only possible when random sampling techniques are used because random sampling is the only one of the three methods discussed in this chapter that is based on probability theory, from which the appropriate calculations derive. So, random sampling may be called 'probability sampling'.
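For readers who do want to see what such a calculation looks like, the following is a minimal sketch rather than a recommended procedure. It assumes a simple random sample, a result expressed as a proportion, the normal approximation and a large population; the function name required_sample_size and its parameters are invented for this illustration.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest sample size giving the stated limits of accuracy.

    margin     -- limits of accuracy as a proportion (0.05 means + or - 5 per cent)
    confidence -- level of confidence (0.95 means the 95 per cent level)
    p          -- expected proportion; 0.5 is the most cautious assumption
    """
    # z-value that leaves (1 - confidence) split equally between the two tails
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(required_sample_size(0.05))    # + or - 5 per cent at the 95 per cent level
print(required_sample_size(0.025))   # halving the limits roughly quadruples the sample
```

Real surveys usually need further adjustments (for non-response, clustering and so on), which is one reason why expert advice is recommended.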

The sampling frame

A randomly drawn sample is one in which every member of the population has a calculable chance of being included in the sample. If every member of the population is to have a chance of being included in the sample then it follows that every member of the population must be known in order to have that chance. So, the first step in drawing a random sample is to make a list of all the members of the population; this is referred to as the 'sampling frame'. The term 'frame' is used rather than 'list' because for certain types of sample the frame may be a map rather than a list. It is at this point that many attempts to apply random sampling techniques will founder, simply because a sampling frame cannot be constructed. Commonly used frames are: the electoral register, for sampling households or adults over 18 years of age; trade directories, for sampling manufacturing, retailing and other organizations; and customer lists, for sampling customers of a particular firm. Difficulties arise, however, when the population to be sampled cannot be listed; for example, owners of personal digital assistants, buyers of office supplies or people who have taken more than one holiday in the past 12 months. Even when a sampling frame can be constructed it may turn out to be unusable in practice because of out-of-date addresses, incompleteness, duplication of entries, lack of geographical clustering or poor list organization. The greater any of these problems, the less reliable the frame as a basis for sampling. If the frame is not representative of the population of interest then no sample drawn from it will be representative either.

Simple random sampling

The point of any random sampling procedure is that there should be no personal influence in the selection of individuals to be interviewed. The simplest way to achieve this in theory is to cut up the population list into separate individuals, put all the strips of paper into a large container or tombola drum, give them a good mix round and then draw out the number required. Although straightforward in theory, this procedure is less so in practice, particularly on a windy day! Nowadays the most common method of drawing a simple random sample is to assign a number to every item on the list and to select the required number at random by using random number tables. (These are available in books and electronically. The numbers in them are produced by electronic means and checked for randomness.) There are also a number of computer packages that will generate random numbers.
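As an illustration of the computer-based approach, the sketch below draws a simple random sample from a numbered list using Python's standard random module. The function name simple_random_sample and the toy frame of numbered names are invented for the example; a real frame would be the electoral register, a customer list or similar.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n items from the sampling frame without replacement.

    Every item has the same chance of selection, and no personal
    judgement enters the choice. A seed makes the draw repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical frame: 10,000 numbered entries standing in for a real list.
frame = [f"Name {i}" for i in range(1, 10_001)]
chosen = simple_random_sample(frame, 20, seed=42)
print(chosen[:5])
```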

Systematic (quasi-) random sampling

The most commonly used method of systematic random sampling is to select every nth item from the frame, where n is found by dividing the number of items on the list by the number required in the eventual sample. In the earlier example, if the sampling frame contained 3,000,000 names and if we required 1000 respondents, then dividing 3,000,000 by 1000 would indicate that every 3000th name on the list must be selected. This is achieved by selecting the first name at random out of the first 3000 names and thereafter selecting every 3000th name. So, using a simple random method the first number selected might be 111; thereafter, the second number would be 3111, the third number 6111, and so on, and by the time the end of the list was reached 1000 names would have been drawn from it.
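The procedure just described can be sketched in a few lines. This is an illustrative sketch, not a production routine: the function name systematic_sample is invented, positions are counted from 1 as in the example, and the frame size is assumed to divide exactly by the sample size.

```python
import random

def systematic_sample(frame_size, sample_size, seed=None):
    """Return the list positions (counting from 1) selected systematically."""
    interval = frame_size // sample_size              # 3,000,000 / 1000 = 3000
    # Random starting point within the first interval, e.g. 111
    start = random.Random(seed).randint(1, interval)
    return [start + k * interval for k in range(sample_size)]

positions = systematic_sample(3_000_000, 1000, seed=1)
print(positions[:3], "...", positions[-1])
```

With a starting point of 111 the function would return 111, 3111, 6111 and so on, matching the example in the text.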

Random sample interviewing

Since the whole point of random sampling is that respondents are selected without any personal bias creeping into the process, it is important that the individuals selected are actually included in the sample by being interviewed. In practice, this means that interviewers must call back on the address they are given until they make contact with the individual named, and it is normal procedure to insist that three callbacks are made by the interviewer before giving up on an individual. The need for callbacks in the field is one of the reasons why random sampling is the most expensive form of sampling to use, but if the named individuals are not contacted then the whole point of using a random sampling selection procedure is lost.

Practical limitations of simple random sampling

While simple random sampling methods may be the most elegant in theory, they often turn out to be the most expensive to apply in practice, for several reasons. First, there is the time and expense involved in drawing up the sampling frame in the first place, particularly if it is a large one which has to be compiled from a number of different sources. Second, for a national sample it would be reasonable to expect that the respondents selected would be randomly distributed on a national basis. This will make the cost of fieldwork very high, since separate trips will need to be made to each location and, as has already been mentioned, up to three callbacks may have to be made in each location. To overcome these limitations two refinements in methods of random sampling have been developed. They are widely used in commercial research practice and have been found to yield results of acceptable accuracy, bearing in mind that most research is done to represent the views of typical members of the population under consideration, rather than its unusual members.

Random route sampling

In this form of sampling the district within which an interviewer will work is determined by random methods. However, within that district the interviewers must follow a prescribed 'route'. That is, they are given a set of instructions such as, 'Take the first turning on the right, the third on the left', and so on. The instructions contain details about where to start trying to obtain interviews and how many houses to leave before trying again after a successful interview has been achieved. The practical advantage of this system is that interviewers do not make a callback if they get no reply and this therefore reduces the time and cost involved in the survey. The possibility of bias arising from some individuals being more likely to be found at home than others can be overcome by setting controls on the numbers of, say, working women who must be included in each interviewer's required number of interviews.

Random location sampling

In simple and systematic random sampling procedures the final sampling unit selected is the individual to be interviewed. In random location sampling the sample is selected in such a way that the final unit of selection is a geographical unit rather than an individual. The interviewer must complete a given number of interviews within the geographical area, which is usually an enumeration district of about 150-200 homes. Which actual respondents should be selected within that location is determined by giving the interviewer a target number of individuals meeting specified age, gender, class and other relevant characteristic requirements. These targets are called 'quotas' and are further explained in Section 7.4.2. Random location sampling is a hybrid between random sampling and quota sampling which attempts to combine the best aspects of both sampling methods, i.e. the objectivity of random sampling combined with the cost-efficiency of quota sampling.

Stratification

The point of sampling is to represent the characteristics of the population of interest in the correct proportions but, because a sample is being used, only an estimate of the characteristics of interest can be derived from it rather than a precise value. However, it is often the case that certain characteristics of the population of interest are already known. In the case of the general population these characteristics are known from census data. In industrial and trade research certain characteristics of the population may be known from existing secondary sources or from previous original research data. When the proportions of certain important and relevant characteristics in the population being surveyed are known with certainty, then it makes sense to use this information as a way of improving the quality of the sample. The technique used to do this is 'stratification'. For stratified samples, the sampling frame is rearranged so that particular attributes of the population are grouped together in their known proportions. The sample is then selected by a random method from each group or 'stratum' in the same proportion in which the stratum exists in the population. One of the most common stratification systems used is to stratify by geographical regions, taking account of the population density. The proportion of survey respondents within each region is then calculated in proportion to the percentage of the population living in that region. By ensuring that each known segment of the population of interest is correctly reflected in the make-up of a sample, one possible source of inaccuracy in the sample is avoided.

The method of stratification is commonly applied in sampling wherever possible since it improves the accuracy of the sample. Age, gender, region and social class are four commonly used stratification variables in commercial market research. In industrial research, manufacturers may be stratified by standard industrial classification (SIC) grouping, number of employees or size of turnover. In trade research, retail outlets may be stratified by size of turnover, square footage of floor space, and so on. Whenever the proportions of relevant characteristics about the population are known with certainty it makes sense to apply them. This method of determining the allocation of a sample of respondents is also referred to as 'proportionate sampling'. This is because it has the effect of ensuring that important characteristics of the population are proportionately represented in the sample.
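The allocation of a sample in proportion to known stratum sizes can be sketched as follows. The regional population figures are invented purely for illustration, as is the function name proportionate_allocation. Because the proportions rarely produce whole numbers of respondents, the sketch rounds down and then gives any remaining interviews to the strata with the largest fractions left over; other rounding rules are equally defensible.

```python
def proportionate_allocation(stratum_sizes, sample_size):
    """Share out sample_size across strata in proportion to their population."""
    total = sum(stratum_sizes.values())
    exact = {s: sample_size * size / total for s, size in stratum_sizes.items()}
    allocation = {s: int(x) for s, x in exact.items()}      # round down first
    shortfall = sample_size - sum(allocation.values())
    # Hand the leftover interviews to the strata with the largest remainders.
    by_remainder = sorted(exact, key=lambda s: exact[s] - allocation[s], reverse=True)
    for s in by_remainder[:shortfall]:
        allocation[s] += 1
    return allocation

# Hypothetical regional populations adding up to 3,000,000.
regions = {"North": 1_200_000, "Midlands": 800_000, "South": 1_000_000}
print(proportionate_allocation(regions, 1000))
```

Within each stratum the respondents would then be drawn by a random method, as described above.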

Multi-stage sampling

The method of stratification referred to above is often used as a basis for multi-stage sampling. This is a refinement of the random sampling technique which attempts to reduce the cost of random sampling at both the selection and fieldwork stages, without losing the element of randomness. In the case of sampling UK households, for example, at the first stage of the process a simple random sample may be taken of all parliamentary constituencies in the UK. At the second stage, a list of wards is compiled for each constituency selected and a simple random sample of wards is taken within each constituency. At the third stage each ward is divided into groups of streets known as polling districts and a simple random sample is taken of these polling districts. This technique forms the basis for random location sampling, since the group of streets selected at the third stage forms the area used for interviewers to carry out their quota of interviews. The process could go one stage further and the selection of individual respondents be made from lists of names and addresses for the polling districts identified at the previous stage.

An advantage of multi-stage sampling is that the work of compiling the sampling frame is very much reduced. Even when the final sampling unit is an individual name and address, there is no need for a complete listing of all addresses in the UK in the first place. At the first stage the frame is restricted simply to a list of constituencies, electoral registers of names and addresses only being required for a limited number of areas at the final stage. A second advantage of the technique is that the final interviews end up being geographically clustered. This considerably reduces the administrative and travelling costs of carrying out the fieldwork in these areas.
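The three stages can be sketched as nested random draws. Everything in the example is hypothetical: the toy frame of five constituencies, three wards each and four polling districts per ward, the labels, and the function name multi_stage_sample. The point it illustrates is that lists of wards and polling districts are only consulted for the areas selected at the previous stage.

```python
import random

def multi_stage_sample(constituencies, n_const, n_wards, n_districts, seed=None):
    """Select polling districts in three stages, each by simple random sampling."""
    rng = random.Random(seed)
    selected = []
    # Stage 1: constituencies
    for const in rng.sample(sorted(constituencies), n_const):
        wards = constituencies[const]
        # Stage 2: wards within each selected constituency
        for ward in rng.sample(sorted(wards), n_wards):
            # Stage 3: polling districts within each selected ward
            for district in rng.sample(wards[ward], n_districts):
                selected.append((const, ward, district))
    return selected

# Toy frame: constituency -> ward -> list of polling districts.
frame = {
    f"C{c}": {f"C{c}-W{w}": [f"C{c}-W{w}-PD{d}" for d in range(1, 5)]
              for w in range(1, 4)}
    for c in range(1, 6)
}
for area in multi_stage_sample(frame, n_const=2, n_wards=2, n_districts=1, seed=0):
    print(area)
```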

Weighting

It may be that the research user is not equally interested in the views of members of all subgroups of the population, or that the user is particularly interested in analysing the views of just one small subgroup.

In a general survey about financial services, some information was required about those individuals using pension services. It was estimated that only 5 per cent of individuals in the target sector used pension services, so in the sample of 1000 respondents only 50 could be expected to be within this subgroup, which would not have been sufficient for detailed analysis of their views in isolation. In order to have a minimum of 100 respondents in this minority group for detailed analysis, it would be necessary to start off with an original sample twice as big, that is to say of 2000. This would have obviously undesirable effects on the timing and costs of the survey. An alternative way of overcoming the problem was simply to increase the number of respondents in the minority group to 100 without changing the rest of the sample. So the total sample size was actually 1050 respondents, of whom 100 were users of pension services.

For the purpose of analysing the results of the whole sample, the subgroup of users of pension services would have twice the representation of any other group. To correct this imbalance when analysing the content of the survey as a whole, results from this group were weighted downwards by a factor of two. However, with 100 respondents there was a sufficient number for a limited amount of analysis within the responses of that group. They could therefore be used for separate consideration as a group.

The procedure described is known as 'weighting'. It may be used, as in the example, to explore the particular views of minority groups without overinflating the total size of the sample. It can also be used to reduce the number of respondents selected from particularly large subgroups of the population, so as to reduce the overall cost of the survey. Results from this group are then weighted up by the factor with which they have been underrepresented in the original sample.
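One common way of arriving at weighting factors, sketched below, is to divide each group's share of the population by its share of the sample. This is offered as an illustration rather than as the method used in the survey described; the function name weighting_factors is invented. Applied to the pension example (5 per cent of the population, but 100 of the 1050 respondents), it gives users a weight half that of non-users, which is the 'factor of two' referred to above.

```python
def weighting_factors(population_share, sample_counts):
    """Weight for each group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_share[group] / (sample_counts[group] / total)
        for group in sample_counts
    }

weights = weighting_factors(
    population_share={"users": 0.05, "non_users": 0.95},
    sample_counts={"users": 100, "non_users": 950},
)
print(weights)
# Relative weight of a user compared with a non-user
print(weights["users"] / weights["non_users"])
```

A weight below 1 scales a group down and a weight above 1 scales it up, so the same function covers both uses of weighting described in this section.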

A trade survey of grocers' attitudes was carried out using Nielsen data. This indicated that co-operative and multiple shops accounted for 6 and 14 per cent, respectively, of all grocers, the remaining 80 per cent being independent grocers. The sample included 100 respondents from each type of outlet. In analysing the results for grocers generally, weighting factors were applied to reduce the input from co-operative and multiple shop respondents and to increase that of independent grocers. Each subgroup could also be analysed separately to identify differences between them. Had the survey been intended to take account of differences in turnover rather than number of shops, the weighting factors would have been based on Nielsen data showing that for share of turnover co-operatives represent 11 per cent, multiples 73 per cent and independents 16 per cent.
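The effect of choosing one weighting basis rather than the other can be seen in a small calculation. The shares of shops and of turnover are those quoted in the example; the subgroup results (the percentage of each type of grocer agreeing with some statement) are invented solely for illustration, as is the function name weighted_percentage. Because each subgroup contributed the same number of respondents, the overall figure is simply the share-weighted average of the three subgroup results.

```python
def weighted_percentage(results, shares):
    """Overall result: subgroup results averaged using the given shares."""
    total_share = sum(shares.values())
    return sum(results[g] * shares[g] for g in shares) / total_share

# Shares quoted in the text (per cent)
share_of_shops = {"co-operatives": 6, "multiples": 14, "independents": 80}
share_of_turnover = {"co-operatives": 11, "multiples": 73, "independents": 16}

# Hypothetical survey results: per cent of each subgroup agreeing
agree = {"co-operatives": 40, "multiples": 60, "independents": 20}

print("Weighted by number of shops:", weighted_percentage(agree, share_of_shops))
print("Weighted by turnover:", weighted_percentage(agree, share_of_turnover))
```

With these invented figures the overall result is dominated by independents when weighting by number of shops and by multiples when weighting by turnover, so the choice of basis should follow from the decision the research is meant to inform.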
