The Chi-square (χ²) test is one of the most widely used non-parametric statistical tests in biostatistics. It is applied to categorical data to determine whether the observed frequencies differ significantly from the frequencies expected under the null hypothesis. Because it does not assume that the data follow a normal distribution, it is well suited to clinical, pharmaceutical, and epidemiological research involving qualitative variables.

When to Use the Chi Square Test?

  • When data is categorical (nominal or ordinal).
  • When sample size is reasonably large (expected frequency ≥ 5 in most cells).
  • When comparing frequencies, proportions, or distributions.
  • When testing relationships between variables.

Types of Chi Square Tests

  • Chi-square Test for Independence – checks association between two categorical variables.
  • Chi-square Goodness-of-Fit Test – checks if observed frequencies match expected frequencies.

1. Chi Square Test for Independence

This test determines if there is a significant association between two variables (e.g., smoking status and lung disease).

Hypotheses

  • H₀: No association exists between the variables (they are independent).
  • H₁: There is an association between the variables.

Formula

χ² = Σ (O − E)² / E

  • O = Observed frequency
  • E = Expected frequency

Calculating Expected Frequency

E = (Row Total × Column Total) / Grand Total
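The two formulas above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names and the 2 × 2 counts (smoking status by lung disease) are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts: rows = smoker / non-smoker, columns = disease / no disease
observed = [[30, 20], [20, 30]]
expected = expected_frequencies(observed)
chi2 = chi_square_statistic(observed, expected)

print(expected)  # [[25.0, 25.0], [25.0, 25.0]]
print(chi2)      # 4.0
```

Every row and column total is 50 and the grand total is 100, so each expected count is 50 × 50 / 100 = 25, and each cell contributes (5)² / 25 = 1 to the statistic.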


2. Chi Square Goodness-of-Fit Test

This test compares observed frequencies with expected frequencies based on a theoretical distribution (e.g., equal proportions, historical data).

Hypotheses

  • H₀: Observed data fits the expected distribution.
  • H₁: Observed data does not fit the expected distribution.

Formula

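The test statistic is the same as for the test of independence:
χ² = Σ (O − E)² / E

Here, E for each category is the total sample size multiplied by the proportion expected for that category.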
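As a rough sketch of the goodness-of-fit calculation, the snippet below turns expected proportions into expected counts and sums (O − E)² / E over the categories. The function name and the counts are hypothetical; equal proportions are assumed purely for illustration.

```python
def goodness_of_fit_statistic(observed, expected_proportions):
    """Chi-square statistic for observed counts against expected proportions."""
    total = sum(observed)
    # Expected count for each category = sample size * expected proportion
    expected = [p * total for p in expected_proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical: 120 observations across four categories assumed equally likely
observed = [40, 30, 25, 25]
chi2 = goodness_of_fit_statistic(observed, [0.25, 0.25, 0.25, 0.25])

print(round(chi2, 2))  # 5.0
```

Each expected count is 120 × 0.25 = 30, so the statistic is (100 + 0 + 25 + 25) / 30 = 5.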


Assumptions of the Chi Square Test

  • Data should be in the form of frequencies (not percentages).
  • Observations must be independent.
  • Expected frequency in each cell should ideally be ≥ 5.
  • Sample size should be adequate to ensure validity.
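The expected-frequency assumption can be checked before running the test. The helper below is a hypothetical sketch that lists every cell whose expected count falls below a threshold (5 by default, following the rule of thumb above); the expected counts are made up for illustration.

```python
def small_expected_cells(expected, minimum=5):
    """Return (row, column, value) for every expected count below `minimum`."""
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]


# Hypothetical expected counts for a 2 x 2 table
expected = [[11.2, 8.8], [2.8, 2.2]]
flagged = small_expected_cells(expected)

print(flagged)  # [(1, 0, 2.8), (1, 1, 2.2)]
```

An empty list means every cell meets the threshold; a non-empty list signals that the χ² approximation may be unreliable for that table.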

Step-by-Step Procedure

  1. State the null and alternative hypotheses.
  2. Make a contingency table (for independence) or list categories (for goodness-of-fit).
  3. Calculate expected frequencies using appropriate formulas.
  4. Compute χ² = Σ (O − E)² / E.
  5. Determine degrees of freedom (df).
  6. Find the critical value from the χ² distribution table for the chosen significance level (commonly α = 0.05) and the df.
  7. Compare calculated value with critical value.
  8. Draw a conclusion (reject or fail to reject H₀).

Degrees of Freedom

  • Goodness-of-Fit: df = k − 1 (where k = number of categories)
  • Test of Independence: df = (r − 1)(c − 1) (where r = number of rows and c = number of columns in the contingency table)
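The two rules translate directly into code. The function names below are hypothetical and exist only to make the arithmetic explicit.

```python
def df_goodness_of_fit(k):
    """Degrees of freedom for k categories."""
    return k - 1


def df_independence(r, c):
    """Degrees of freedom for an r x c contingency table."""
    return (r - 1) * (c - 1)


print(df_goodness_of_fit(4))   # 3  (e.g. four blood groups)
print(df_independence(2, 2))   # 1  (a 2 x 2 table)
print(df_independence(3, 4))   # 6
```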

Interpreting the Chi Square Value

If the calculated χ² value, at the chosen significance level and degrees of freedom, is:

  • Greater than the critical value → Reject H₀ (significant difference/association).
  • Less than or equal to the critical value → Fail to reject H₀ (no significant difference/association).
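The decision rule can be sketched with a small lookup of standard χ² critical values. The table below assumes a significance level of α = 0.05 and covers only df 1 to 5; the names `CRITICAL_VALUES_05` and `decide` are hypothetical.

```python
# Standard chi-square critical values for alpha = 0.05 (assumed significance level)
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def decide(chi2, df):
    """Compare the calculated statistic with the tabulated critical value."""
    critical = CRITICAL_VALUES_05[df]
    return "reject H0" if chi2 > critical else "fail to reject H0"


print(decide(4.0, 1))  # reject H0          (4.0 > 3.841)
print(decide(5.0, 3))  # fail to reject H0  (5.0 <= 7.815)
```

For other significance levels or larger df, the critical value would come from a full χ² table or a statistics library instead of this short lookup.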

Example 1: Chi Square Test for Independence

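A study investigates the association between gender (male/female) and medication adherence (yes/no). After constructing the 2 × 2 contingency table and calculating expected values, the calculated χ² (df = 1) is compared with the critical value to determine whether adherence is associated with gender. A significant result indicates an association, not that gender causes the difference in adherence.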
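A worked version of this example might look like the following. The counts are invented for illustration and are not from any real study. No continuity correction is applied, and the p-value line relies on the fact that a χ² variable with 1 degree of freedom is the square of a standard normal variable, so it is valid only for df = 1.

```python
import math

# Hypothetical counts: rows = male / female, columns = adherent / non-adherent
observed = [[40, 60], [55, 45]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total * column total) / grand total for each cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability of the chi-square distribution; this shortcut holds for df = 1 only
p_value = math.erfc(math.sqrt(chi2 / 2))

print(expected)          # [[47.5, 52.5], [47.5, 52.5]]
print(round(chi2, 2))    # 4.51
print(df)                # 1
print(p_value < 0.05)    # True
```

With these made-up counts the statistic exceeds 3.841, the critical value for df = 1 at α = 0.05, so H₀ would be rejected.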


Example 2: Chi Square Goodness-of-Fit Test

A researcher tests whether the distribution of blood groups (A, B, AB, O) in a sample matches the known distribution of the population. Expected frequencies are obtained by multiplying the sample size by each population proportion, and the calculated χ² (df = 4 − 1 = 3) is compared with the critical value to decide whether the sample distribution fits the population.
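The blood-group example could be worked through as below. Both the population proportions and the sample counts are hypothetical placeholders, not real blood-group statistics, and the critical value assumes α = 0.05.

```python
groups = ["A", "B", "AB", "O"]

# Hypothetical population proportions (placeholders, must sum to 1)
population = {"A": 0.30, "B": 0.30, "AB": 0.10, "O": 0.30}
# Hypothetical sample of 200 people
observed = {"A": 70, "B": 50, "AB": 20, "O": 60}

n = sum(observed.values())
expected = {g: population[g] * n for g in groups}

chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in groups)
df = len(groups) - 1

CRITICAL_05_DF3 = 7.815  # standard chi-square critical value for df = 3, alpha = 0.05
fits = chi2 <= CRITICAL_05_DF3

print(round(chi2, 2))  # 3.33
print(df)              # 3
print(fits)            # True
```

Here the expected counts are 60, 60, 20 and 60, so only groups A and B contribute to the statistic (100 / 60 each). Because 3.33 is below 7.815, H₀ is not rejected for these made-up data.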


Advantages of the Chi Square Test

  • Simple and easy to compute.
  • Useful for categorical data analysis.
  • Does not require normal distribution.
  • Applicable to large datasets.

Limitations

  • Not valid for very small expected frequencies.
  • Cannot be used for continuous data unless grouped.
  • Does not measure strength or direction of association.
