Learning outcomes

  • understand the association between one categorical and one numerical variable
  • compare group centers and spreads
  • choose sensible summaries for grouped comparison
  • interpret differences without forcing causal claims

What does this case look like?

  • One variable is categorical.
  • The other variable is numerical.
Examples:
  • department and marks
  • gender and height
  • hostel/day-scholar status and attendance

How to study this association

  • Compare the numerical variable across categories using:
    • mean
    • median
    • spread (for example, range, interquartile range, or standard deviation)
    • grouped plots if provided
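
The grouped comparison can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the department names, the marks, and the helper name summarize_by_group are all invented for the example, and only the standard library is used.

```python
from statistics import mean, median, stdev

# Hypothetical data: (department, marks) pairs invented for illustration.
records = [
    ("CS", 72), ("CS", 85), ("CS", 90), ("CS", 65),
    ("EE", 58), ("EE", 77), ("EE", 80), ("EE", 61),
]

def summarize_by_group(rows):
    """Group numerical values by category, then summarize each group."""
    groups = {}
    for category, value in rows:
        groups.setdefault(category, []).append(value)

    summary = {}
    for category, values in groups.items():
        summary[category] = {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "range": max(values) - min(values),
            "stdev": stdev(values),  # sample standard deviation; needs n >= 2
        }
    return summary

summary = summarize_by_group(records)
for department, stats in summary.items():
    print(department, stats)
```

Note that only the numerical variable (marks) is averaged; the categorical variable (department) is used solely to split the data into groups.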

Questions to ask

  1. Which group has the larger average?
  2. Which group has more spread?
  3. Are the group differences large or small relative to the spread within each group?
  4. Is the comparison affected by outliers?

Example interpretation

  • “Students in Group A have a higher average score than students in Group B.”
  • “The spread of marks is larger in Group B.”

Exam hints and traps

  • Do not summarize the categorical variable with a mean.
  • The numerical variable is what gets averaged or compared.
  • A difference between groups does not by itself prove causation.
  • The median may be more useful than the mean when outliers are present, because extreme values pull the mean but barely move the median.
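
The last hint can be checked with a small experiment. The sketch below uses made-up marks for a single group and adds one unusually low score; the data and variable names are assumptions chosen only to illustrate the effect.

```python
from statistics import mean, median

# Hypothetical marks for one group, invented for illustration.
marks = [60, 62, 65, 67, 70]
marks_with_outlier = marks + [5]  # add one unusually low score

mean_before, median_before = mean(marks), median(marks)
mean_after, median_after = mean(marks_with_outlier), median(marks_with_outlier)

# How far each summary moved once the outlier was added.
mean_shift = abs(mean_after - mean_before)
median_shift = abs(median_after - median_before)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f} (shift {mean_shift:.2f})")
print(f"median: {median_before:.2f} -> {median_after:.2f} (shift {median_shift:.2f})")
```

In this example the mean moves by about 10 marks while the median moves by 1.5, which is why the median is often the safer center to report when a group contains outliers.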

Quick practice

  1. In “department and marks”, which variable is numerical?
  2. Can you compute a mean for department names?
  3. Which summaries are useful for comparing marks across departments?

Answer key

  1. Marks
  2. No
  3. Mean, median, and spread by department