By Ruben Geert van den Berg under Chi-Square Test & Statistics A-Z

A chi-square goodness-of-fit test examines if a categorical variable has some hypothesized frequency distribution in some population. The chi-square goodness-of-fit test is also known as

Example - Testing Car Advertisements

A car manufacturer wants to launch a campaign for a new car. They'll show advertisements -or “ads”- in 4 different sizes. For each ad size, they have 4 ads that try to convey some message such as “this car is environmentally friendly”. They then asked N = 80 people which ad they liked most. The data thus obtained are in this Googlesheet, partly shown below.



So which ads performed best in our sample? Well, we can simply look up which ad was preferred by most respondents: the ad having the highest frequency is the mode for each ad size. So let's have a look at the frequency distribution for the first ad size -ad1- as visualized in the bar chart shown below.

Observed Frequencies and Bar Chart


The observed frequencies shown in this chart are

Safe and Family Friendly: 6;
Luxurious and Masculine: 29;
Environmentally Friendly: 16;
Spacious and Convenient: 29.

Note that ad1 has a bimodal distribution: ads 2 and 4 are both winners with 29 votes. However, our data only hold a sample of N = 80. So can we conclude that ads 2 and 4 also perform best in the entire population? The chi-square goodness-of-fit test answers just that. And for this example, it does so by trying to reject the null hypothesis that all ads perform equally well in the population.

Null Hypothesis

Generally, the null hypothesis for a chi-square goodness-of-fit test is simply

$$H_0: P_{01}, P_{02},...,P_{0m},\; \sum_{i=1}^m\biggl(P_{0i}\biggr) = 1$$

where \(P_{0i}\) denote population proportions for \(m\) categories in some categorical variable. You can choose any set of proportions as long as they add up to one. In many cases, all proportions being equal is the most likely null hypothesis. For a dichotomous variable having only 2 categories, you're better off using

Anyway, for our example, we'd like to show that some ads perform better than others. So we'll try to refute that our 4 population proportions are all equal and -hence- 0.25.

Expected Frequencies

Now, if the 4 population proportions really are 0.25 and we sample N = 80 respondents, then we expect each ad to be preferred by 0.25 · 80 = 20 respondents. That is, all 4 expected frequencies are 20. We need to know these expected frequencies for 2 reasons:

computing our test statistic requires expected frequencies and
the assumptions for the chi-square goodness-of-fit test involve expected frequencies as well.
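The expected frequencies follow directly from the null proportions and the sample size. As a minimal sketch, assuming plain Python and a hypothetical helper name `expected_frequencies` (not part of the original tutorial), this could look like:

```python
import math

def expected_frequencies(null_proportions, n):
    """Return the expected count n * p for each hypothesized proportion p."""
    # The null proportions must add up to one, as required by the null hypothesis.
    if not math.isclose(sum(null_proportions), 1.0):
        raise ValueError("null proportions must add up to 1")
    return [p * n for p in null_proportions]

# Null hypothesis of the example: 4 equally preferred ads, N = 80 respondents.
expected = expected_frequencies([0.25, 0.25, 0.25, 0.25], 80)
print(expected)  # [20.0, 20.0, 20.0, 20.0]
```

The same helper works for unequal null proportions, as long as they sum to one.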


The chi-square goodness-of-fit test requires 2 assumptions [2,3]:

independent observations;
for 2 categories, each expected frequency \(E_i\) should be at least 5. For 3+ categories, each \(E_i\) should be at least 1 and no more than 20% of all \(E_i\) may be smaller than 5.

The observations in our data are independent because they are distinct persons who didn't interact while completing our survey. We also saw that all \(E_i\) are (0.25 · 80 =) 20 for our example. So this second assumption is met as well.
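The rule for expected frequencies can be checked mechanically. The sketch below is one possible encoding of the rule as stated above; the function name `expected_counts_ok` is hypothetical and the independence assumption cannot be checked this way.

```python
def expected_counts_ok(expected):
    """Check the expected-frequency rule of thumb for a goodness-of-fit test."""
    m = len(expected)
    if m < 2:
        raise ValueError("need at least 2 categories")
    if m == 2:
        # 2 categories: every expected frequency should be at least 5.
        return all(e >= 5 for e in expected)
    # 3+ categories: every expected frequency should be at least 1 and
    # no more than 20% of them may be smaller than 5.
    share_below_5 = sum(e < 5 for e in expected) / m
    return all(e >= 1 for e in expected) and share_below_5 <= 0.20

print(expected_counts_ok([20, 20, 20, 20]))  # True
print(expected_counts_ok([12, 3]))           # False
```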


We'll first compute the \(\chi^2\) test statistic as

$$\chi^2 = \sum\frac{(O_i - E_i)^2}{E_i}$$


For ad1, this results in

$$\chi^2 = \frac{(16 - 20)^2}{20} + \frac{(29 - 20)^2}{20} + \frac{(6 - 20)^2}{20} + \frac{(29 - 20)^2}{20} = 18.7 $$
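The same arithmetic can be reproduced in a few lines. This is a minimal sketch in plain Python that uses the observed frequencies for ad1 listed earlier and the equal-proportions null hypothesis; the variable names are illustrative only.

```python
# Observed frequencies for ad1 (N = 80), as listed in the text.
observed = {
    "Safe and Family Friendly": 6,
    "Luxurious and Masculine": 29,
    "Environmentally Friendly": 16,
    "Spacious and Convenient": 29,
}

n = sum(observed.values())        # total sample size
expected = n / len(observed)      # equal null proportions: 80 / 4 = 20 per ad

# Sum of squared deviations, each scaled by the expected frequency.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
print(round(chi_square, 1))  # 18.7
```

The order of the categories does not affect the result because the statistic is a plain sum over categories.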

If all assumptions have been met, \(\chi^2\) approximately follows a chi-square distribution with \(df\) degrees of freedom where

$$df = m - 1$$

for \(m\) frequencies. Since we have 4 frequencies for 4 different ads,

$$df = 4 - 1 = 3$$

for our example data. Finally, we can simply look up the significance level as

$$P(\chi^2(3) > 18.7) \approx 0.00032$$
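The significance level can also be computed instead of looked up. The sketch below assumes only the Python standard library; `chi2_sf` is a hypothetical helper (not from the original tutorial) that evaluates the upper tail of a chi-square distribution with integer degrees of freedom through the standard recurrence for the regularized incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square(df) > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        # Start from df = 2, where the tail probability is exp(-x / 2).
        a, q, term = 1.0, math.exp(-h), h * math.exp(-h)
    else:
        # Start from df = 1, where the tail probability is erfc(sqrt(x / 2)).
        a, q = 0.5, math.erfc(math.sqrt(h))
        term = 2 * math.sqrt(h / math.pi) * math.exp(-h)
    # Each step raises df by 2: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2:
        q += term
        a += 1
        term *= h / a
    return q

p_value = chi2_sf(18.7, 3)
print(f"{p_value:.5f}")  # 0.00032
```

A statistics library would normally be used for this step; the hand-rolled function only keeps the example free of dependencies.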

We ran these calculations in this Googlesheet shown below.


So what does this mean? Well, if all 4 ads are equally preferred in the population, there's a 0.00032 chance of finding observed frequencies that deviate at least as much from the expected frequencies as ours do. Since p < 0.05, we reject the null hypothesis. Right, so it's safe to assume that the population proportions are not all equal. But precisely how different are they? We can express this in a single number: the effect size.

Effect Size - Cohen’s W

The effect size for a chi-square goodness-of-fit test -as well as the chi-square independence test- is Cohen’s W. Some rules of thumb [1] are that

Cohen’s W = 0.10 indicates a small effect size;
Cohen’s W = 0.30 indicates a medium effect size;
Cohen’s W = 0.50 indicates a large effect size.

Cohen’s W is computed as

$$W = \sqrt{\sum_{i = 1}^m\frac{(P_{oi} - P_{ei})^2}{P_{ei}}}$$


where \(P_{oi}\) denote observed proportions and \(P_{ei}\) denote expected proportions under the null hypothesis for \(m\) cells.

For ad1, the null hypothesis states that all expected proportions are 0.25. The observed proportions are computed from the observed frequencies (see screenshot below) and result in

$$W = \sqrt{\frac{(0.2 - 0.25)^2}{0.25} +\frac{(0.3625 - 0.25)^2}{0.25} +\frac{(0.075 - 0.25)^2}{0.25} +\frac{(0.3625 - 0.25)^2}{0.25}}$$

$$W = \sqrt{0.234} = 0.483$$
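Cohen's W can be computed from the observed frequencies in a few lines. The sketch below assumes plain Python and the ad1 frequencies listed earlier. It also checks the result against the identity W = sqrt(chi-square / N), which follows from substituting proportions for frequencies in the test statistic; the value 18.7 is the statistic reported in the text.

```python
import math

observed = [6, 29, 16, 29]                 # observed frequencies for ad1
null_proportions = [0.25, 0.25, 0.25, 0.25]

n = sum(observed)
observed_proportions = [o / n for o in observed]

# Cohen's W: square root of the summed, scaled squared differences in proportions.
w = math.sqrt(sum((po - pe) ** 2 / pe
                  for po, pe in zip(observed_proportions, null_proportions)))
print(round(w, 3))  # 0.483

# Equivalent route via the test statistic: W = sqrt(chi-square / N).
w_from_chi_square = math.sqrt(18.7 / n)
```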

We ran these computations in this Googlesheet shown below.


For ad1, the effect size \(W\) = 0.483. This is close to the 0.50 rule of thumb and thus indicates a large overall difference between the observed and expected frequencies.

Power and Sample Size Calculation

Now that we computed our effect size, we're ready for our last 2 steps. First off, what about power? What's the probability of demonstrating an effect if

we test at α = 0.05;
we have a sample of N = 80;
df = 3 (our outcome variable has 4 categories);
we don't know the population effect size \(W\)?

The chart below -created in G*Power- answers just that.


Some basic conclusions are that

power = 0.98 for a large effect size;
power = 0.60 for a medium effect size;
power = 0.10 for a small effect size.
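These power values can be approximated without G*Power. The sketch below rests on an assumption the text does not spell out: under the alternative hypothesis the test statistic follows a noncentral chi-square distribution with noncentrality N · W², which can be written as a Poisson-weighted mixture of central chi-square distributions. All function names are hypothetical, and the simple series is only meant for moderate noncentrality values such as those in this example.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square(df) > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        a, q, term = 1.0, math.exp(-h), h * math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
        term = 2 * math.sqrt(h / math.pi) * math.exp(-h)
    while a < df / 2:          # each step raises df by 2
        q += term
        a += 1
        term *= h / a
    return q

def chi2_critical(alpha, df):
    """Critical value c with P(chi-square(df) > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def gof_power(w, n, df, alpha=0.05, terms=200):
    """Approximate power for effect size w, sample size n and df degrees of freedom."""
    crit = chi2_critical(alpha, df)
    half_lambda = n * w ** 2 / 2       # half the noncentrality parameter
    weight = math.exp(-half_lambda)    # Poisson probability for j = 0
    power = 0.0
    for j in range(terms):
        power += weight * chi2_sf(crit, df + 2 * j)
        weight *= half_lambda / (j + 1)
    return power

for label, w in [("small", 0.1), ("medium", 0.3), ("large", 0.5)]:
    print(label, round(gof_power(w, n=80, df=3), 2))
```

Under these assumptions the three values should land close to the 0.10, 0.60 and 0.98 read from the chart.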

These outcomes are not too great: we only have a 0.60 probability of rejecting the null hypothesis if the population effect size is medium and N = 80. However, we can increase power by increasing the sample size. So which sample sizes do we need if

we test at α = 0.05;
we want to have power = 0.80;
df = 3 (our outcome variable has 4 categories);
we don't know the population effect size \(W\)?

The chart below shows how required sample sizes decrease with increasing effect sizes.


Under the abovementioned conditions, we have power ≥ 0.80

for a large effect size if N = 44;
for a medium effect size if N = 122;
for a small effect size if N = 1091.
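The required sample sizes can be reproduced by searching for the smallest N that reaches the target power. The sketch below assumes the usual noncentral chi-square model with noncentrality N · W² (an assumption about how such numbers are typically obtained, not something the text states) and uses hypothetical function names throughout.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square(df) > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        a, q, term = 1.0, math.exp(-h), h * math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
        term = 2 * math.sqrt(h / math.pi) * math.exp(-h)
    while a < df / 2:          # each step raises df by 2
        q += term
        a += 1
        term *= h / a
    return q

def chi2_critical(alpha, df):
    """Critical value c with P(chi-square(df) > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if chi2_sf(mid, df) > alpha else (lo, mid)
    return (lo + hi) / 2

def gof_power(w, n, df, alpha=0.05, terms=200):
    """Power as a Poisson-weighted mixture of central chi-square tail probabilities."""
    crit = chi2_critical(alpha, df)
    half_lambda = n * w ** 2 / 2
    weight, power = math.exp(-half_lambda), 0.0
    for j in range(terms):
        power += weight * chi2_sf(crit, df + 2 * j)
        weight *= half_lambda / (j + 1)
    return power

def required_n(w, df, target=0.80, alpha=0.05):
    """Smallest N with power >= target; relies on power increasing with N."""
    hi = 1
    while gof_power(w, hi, df, alpha) < target:
        hi *= 2                # double until the target power is reached
    lo = hi // 2               # the last size known to fall short (or 0)
    while hi - lo > 1:         # integer bisection between lo and hi
        mid = (lo + hi) // 2
        if gof_power(w, mid, df, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi

print(required_n(0.5, df=3))  # 44
print(required_n(0.3, df=3))  # 122
print(required_n(0.1, df=3))  # 1091
```

Doubling followed by bisection keeps the number of power evaluations small, which matters for small effect sizes where N runs into the thousands.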



1. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, New Jersey: Lawrence Erlbaum Associates.
2. Siegel, S. & Castellan, N.J. (1989). Nonparametric Statistics for the Behavioral Sciences (2nd ed.). Singapore: McGraw-Hill.
3. Warner, R.M. (2013). Applied Statistics (2nd ed.). Thousand Oaks, CA: SAGE.