Difference between numbers in a series of arrays

Here the answers to a particular question are compared across groups of people with different characteristics. These characteristics include demographics such as geography, household size, the presence of children, income, sex and age. They are cross-tabulated against measures such as awareness or use of a product category or brand, and purchase intent.

EXAMPLE

Question: Are you aware of brand X washing machines? Table 10.1 shows the results.

Table 10.1 Awareness results

         Yes    No   Total
Men       50   200     250
Women    150   300     450
Total    200   500     700

Simple inspection of the data would seem to suggest there is a difference between men and women in terms of awareness level. However, we really want to know if the difference observable in cross-tabulating the data is statistically significant: in other words, what the probability is that observed differences could have occurred by chance. In this particular instance we can use the chi-square test to see if this is the case (we should, however, consult standard statistical texts to appreciate the suitability of this test and to understand its theoretical underpinning).

The chi-square statistic is calculated by the formula:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ = the observed value and $E_i$ = the theoretically expected value, assuming in this case that there is no difference between the groups of respondents being compared, here men and women (see Table 10.2).

We hypothesise that there is no relationship. However, the tabulated value of $\chi^2$ at 1 degree of freedom at the 5% level is 3.84, and because 13.5 > 3.84 the difference noted in the sample is statistically significant. We must reject the null hypothesis and conclude that the difference in levels of awareness between men and women is unlikely to have occurred by chance.

Table 10.2 Breakdown of brand awareness

                        Yes                      No                       Total
Men   (O = observed)    50                       200                      250
      (E = expected)    (250 × 200)/700 = 71     (250 × 500)/700 = 179
Women (O = observed)    150                      300                      450
      (E = expected)    (450 × 200)/700 = 129    (450 × 500)/700 = 321
Total                   200                      500                      700

Note: Expected values are rounded off to the nearest whole number.

$$\chi^2 = \frac{(50 - 71)^2}{71} + \frac{(200 - 179)^2}{179} + \frac{(150 - 129)^2}{129} + \frac{(300 - 321)^2}{321} = 13.5$$

with degrees of freedom $(r - 1)(k - 1) = 1$

where r = number of rows, k = number of columns
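The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the function name `chi_square` and the `round_expected` flag are illustrative choices, not part of any statistics package, and the critical value 3.84 is simply the tabulated figure quoted in the text.

```python
# Observed counts from Table 10.1: rows = Men, Women; columns = Yes, No.
observed = [[50, 200],
            [150, 300]]


def chi_square(table, round_expected=False):
    """Return (statistic, degrees_of_freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            # Expected count if there is no relationship between rows and columns.
            e = row_totals[i] * col_totals[j] / grand_total
            if round_expected:
                e = round(e)  # mimic the whole-number rounding used in Table 10.2
            statistic += (o - e) ** 2 / e
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


stat_rounded, dof = chi_square(observed, round_expected=True)
stat_exact, _ = chi_square(observed)

CRITICAL_5PCT_1DF = 3.84  # tabulated value quoted in the text (assumed, not computed)
print(f"rounded E: {stat_rounded:.1f}, unrounded E: {stat_exact:.1f}, df = {dof}")
print("significant at the 5% level:", stat_rounded > CRITICAL_5PCT_1DF)
```

With the expected values rounded as in Table 10.2 the statistic comes to about 13.5; without rounding it is 14.0. Either way it is well above 3.84, so the conclusion is the same.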

For a 2 × 2 table such as the one in the example, it is usually preferable to use Fisher's exact test, or to include Yates' correction, where there are a small number of cases. In the latter case the formula is modified to:

$$\chi^2 = \sum_i \frac{(|O_i - E_i| - 0.5)^2}{E_i}$$

Where the sample size is so small that the expected value in a cell is less than 5, chi-square should not be used.
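As an illustration, Yates' correction can be applied to the washing machine awareness data, together with a check on the smallest expected value. This is a sketch under assumptions: the function name `yates_chi_square` is hypothetical, and the code takes the usual form of the correction, subtracting 0.5 from each absolute difference, which presumes every |O - E| is at least 0.5.

```python
# Observed counts: rows = Men, Women; columns = Yes, No.
observed = [[50, 200],
            [150, 300]]


def yates_chi_square(table):
    """Return (corrected statistic, smallest expected count) for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    smallest_expected = float("inf")
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            smallest_expected = min(smallest_expected, e)
            # Yates' correction: shrink each absolute difference by 0.5.
            statistic += (abs(o - e) - 0.5) ** 2 / e
    return statistic, smallest_expected


corrected, smallest_expected = yates_chi_square(observed)
print(f"corrected chi-square = {corrected:.2f}")
print(f"smallest expected count = {smallest_expected:.2f}")
print("expected counts large enough for chi-square:", smallest_expected >= 5)
```

With 700 respondents the correction makes little difference to the statistic; it matters more when the cell counts are small.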

Chi-square is a statistical tool used to evaluate the statistical significance of differences between sets of data. Basically, it compares one or more frequency distributions of data to indicate whether there is a real difference. It does this by comparing the actual set of data against a theoretical one that shows what would be expected by chance alone.

EXAMPLE

Chi-square

A market researcher has completed a study of soft drinks. The following table shows the brand purchased most often, broken down by male versus female. The researcher wants to know if there is a relationship between the gender of the purchaser and the brand purchased.

Drink        Male   Female
Coke           66       52
Pepsi          67       48
7-Up           35       38
Dr Pepper      34       21
Sprite         32       41
Lilt           33       31
Blackjack      18       25
Tango          34       24

Chi-square 'male' 'female'

Expected counts are printed below observed counts:

          Male   Female   Total
1           66       52     118
         62.84    55.16
2           67       48     115
         61.24    53.76
3           35       38      73
         38.88    34.12
4           34       21      55
         29.29    25.71
5           32       41      73
         38.88    34.12
6           33       31      64
         34.08    29.92
7           18       25      43
         22.90    20.10
8           34       24      58
         30.89    27.11
Total      319      280     599
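The expected counts in the output above follow directly from the row and column totals. The following Python sketch recomputes them and accumulates the chi-square statistic from the soft drinks data. It is not the package that produced the output; the variable names are illustrative, and the critical value 14.07 is the tabulated figure for 7 degrees of freedom at the 0.05 level rather than something the code derives.

```python
brands = ["Coke", "Pepsi", "7-Up", "Dr Pepper", "Sprite", "Lilt", "Blackjack", "Tango"]
male = [66, 67, 35, 34, 32, 33, 18, 34]
female = [52, 48, 38, 21, 41, 31, 25, 24]

male_total, female_total = sum(male), sum(female)
grand_total = male_total + female_total

chi_sq = 0.0
for brand, m, f in zip(brands, male, female):
    row_total = m + f
    # Expected count = row total x column total / grand total.
    exp_m = row_total * male_total / grand_total
    exp_f = row_total * female_total / grand_total
    # Each cell contributes (observed - expected)^2 / expected.
    chi_sq += (m - exp_m) ** 2 / exp_m + (f - exp_f) ** 2 / exp_f
    print(f"{brand:<10} observed {m:>3} {f:>3}   expected {exp_m:6.2f} {exp_f:6.2f}")

dof = (len(brands) - 1) * (2 - 1)
CRITICAL_5PCT_7DF = 14.07  # tabulated value (assumed, not computed here)
print(f"chi-square = {chi_sq:.2f}, df = {dof}")
print("significant at the 0.05 level:", chi_sq > CRITICAL_5PCT_7DF)
```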

Chi-sq = 0.159 + 0.181 + 0.541 + 0.616 + 0.387 + 0.440 + 0.757 + 0.863 + 1.216 + 1.386 + 0.034 + 0.039 + 1.048 + 1.194 + 0.314 + 0.357 = 9.533

The tabular value of $\chi^2$ at the 0.05 level of significance and (8 - 1)(2 - 1) = 7 degrees of freedom is 14.07. The calculated value (9.533) is lower than the tabular value (14.07), so the null hypothesis of no relationship cannot be rejected. There would not seem to be a significant relationship between the gender of the purchaser and the brand purchased.

Use of similarities between numbers to show cause and effect
