What is statistics?

Statistics is the science of learning from data. It includes:
  • collecting data
  • organizing and summarizing data
  • analyzing patterns
  • interpreting results to make decisions

Major branches of statistics

Descriptive statistics

Descriptive statistics focuses on organizing, summarizing, and presenting data. Examples:
  • class average marks
  • highest and lowest temperature this week
  • bar chart of students by branch
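
As a minimal sketch of the first two examples, the summaries can be computed with Python's standard `statistics` module. The marks and temperatures below are made-up values for illustration, not real data.

```python
from statistics import mean

# Hypothetical data, invented for illustration only.
marks = [72, 85, 64, 90, 78]                           # marks of five students
temps_c = [31.5, 29.0, 33.2, 30.1, 28.4, 32.0, 30.8]   # daily temperatures this week, in °C

class_average = mean(marks)   # descriptive summary: centre of the data
highest_temp = max(temps_c)   # descriptive summary: largest value
lowest_temp = min(temps_c)    # descriptive summary: smallest value

print(f"Class average: {class_average}")
print(f"Highest: {highest_temp} °C, lowest: {lowest_temp} °C")
```

Nothing here goes beyond the observed values: the code only summarizes what was recorded, which is what makes it descriptive.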

Inferential statistics

Inferential statistics uses sample data to draw conclusions about a population. Because inference involves uncertainty, probability is essential. Examples:
  • surveying 200 voters to estimate support in a city
  • testing a sample of bulbs to estimate defect rate in full production
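
The voter example can be sketched as follows. The count of supporters is invented, and the interval uses the common normal approximation for a proportion, which assumes a simple random sample of reasonable size. It is one possible way to express the uncertainty, not the only one.

```python
import math

# Hypothetical survey result: the supporter count is invented for illustration.
n = 200            # voters surveyed (sample size from the example above)
supporters = 110   # assumed number in the sample who support the candidate

p_hat = supporters / n                           # sample proportion (a statistic)
std_error = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error of p_hat
margin = 1.96 * std_error                        # approximate 95% margin of error

lower, upper = p_hat - margin, p_hat + margin
print(f"Estimated support: {p_hat:.3f}")
print(f"Approximate 95% interval: {lower:.3f} to {upper:.3f}")
```

The interval, rather than the single number, is what makes this inferential: it states how far the city-wide support could plausibly be from the sample value.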

Population and sample

  • Population: the complete set of all units of interest.
  • Sample: a subset of the population used for study.
Example:
  • Population: all students in your college
  • Sample: 120 students chosen for a survey

Census vs sample survey

  • Census: data from every unit in the population
  • Sample survey: data from part of the population
Sample surveys are faster and cheaper than a census, but the sample must be chosen carefully (ideally at random) so that it represents the population.

Parameter vs statistic

  • Parameter: numerical summary of a population (usually unknown)
  • Statistic: numerical summary of a sample (used to estimate the parameter)
Example:
  • Population mean height = parameter
  • Sample mean height = statistic
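
To make the distinction concrete, here is a small simulation. The population is generated artificially (the mean of 165 cm and spread of 8 cm are assumptions chosen for the sketch), so the parameter can be computed directly. In a real study it usually cannot.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Simulated population: heights in cm of 5,000 people (invented, not real data).
population = [random.gauss(165, 8) for _ in range(5000)]

# Sample: 120 people chosen at random without replacement.
sample = random.sample(population, 120)

population_mean = mean(population)  # parameter: usually unknown in practice
sample_mean = mean(sample)          # statistic: computed from the sample

print(f"Population mean (parameter): {population_mean:.2f} cm")
print(f"Sample mean (statistic):     {sample_mean:.2f} cm")
```

The two values are typically close but not equal, and a different sample would give a slightly different statistic while the parameter stays fixed.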

Purpose of statistical analysis

  • If the goal is to describe and summarize observed information, the study is descriptive.
  • If the goal is to use a sample and make conclusions about a population, the study is inferential.
  • A descriptive study can be done on a sample or a population.

What is data?

Data are facts and figures collected for analysis, presentation, and interpretation. Data can be:
  • numbers (exam marks, income)
  • labels (department, blood group)
  • text or categories (feedback type)

Why do we collect data?

We collect data to understand characteristics of groups such as people, places, or things. Typical goals:
  • comparison (which section performed better)
  • prediction (next month sales)
  • decision-making (admit/reject, pass/fail)

Data collection

  • Primary data: collected first-hand for the current study
  • Secondary data: already collected by someone else (reports, government data, publications)
Primary data methods:
  • surveys
  • experiments
  • observations
Secondary data sources:
  • census reports
  • institutional records
  • research articles

Cases and variables

  • Case (observation): a unit from which data are collected.
  • Variable: a characteristic or attribute that can vary across cases.
Example (school dataset):
  • Cases: each student.
  • Variables: name, date of birth, marks, board, etc.
In a data table:
  • Rows represent cases.
  • Columns represent variables.
Note: a missing value (“not available”) and a value of 0 are not the same. Missing means nothing was recorded, while 0 is a recorded value. Also note:
  • variable names should be clear (attendance_percent, final_marks)
  • keep units consistent in one column
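
A small data table can be sketched as a list of dictionaries, one per case. The student names and values are hypothetical. The sketch also shows why a missing value must not be treated as 0.

```python
from statistics import mean

# Each dict is one case (row); each key is one variable (column).
# Names and values are hypothetical.
students = [
    {"name": "Asha",  "attendance_percent": 92, "final_marks": 81},
    {"name": "Ravi",  "attendance_percent": 75, "final_marks": 0},     # scored zero
    {"name": "Meera", "attendance_percent": 88, "final_marks": None},  # marks not available
]

# Average over recorded values only: None is skipped, 0 is kept.
recorded = [s["final_marks"] for s in students if s["final_marks"] is not None]
mean_recorded = mean(recorded)

# Wrongly treating the missing value as 0 drags the average down.
as_zero = [s["final_marks"] or 0 for s in students]
mean_as_zero = mean(as_zero)

print(mean_recorded, mean_as_zero)
```

The two averages differ only because of how the one missing value is handled, so the choice should be made deliberately and stated.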

Categorical and numerical variables

Categorical data (qualitative)

  • Represents labels or groups.
Examples:
  • gender
  • branch (CSE, ECE, ME)
  • grade (A, B, C)
Common types:
  • Nominal: categories with no natural order (blood group)
  • Ordinal: categories with order (poor < average < good)
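
The difference matters when summarizing. In the sketch below (with invented labels), the nominal variable is summarized by its most frequent category, while the ordinal variable can also be sorted to find a middle rating.

```python
from statistics import mode

# Nominal: no order, so the mode (most frequent label) is the natural summary.
blood_groups = ["O", "A", "B", "O", "AB", "O", "A"]        # hypothetical
most_common_group = mode(blood_groups)

# Ordinal: order exists, so ratings can be sorted and a middle rating found.
LEVELS = ["poor", "average", "good"]                        # poor < average < good
ratings = ["good", "poor", "average", "good", "average"]    # hypothetical
ranked = sorted(ratings, key=LEVELS.index)
median_rating = ranked[len(ranked) // 2]                    # odd count, so one middle value

print(most_common_group, median_rating)
```

Sorting blood groups in the same way would run, but the result would carry no meaning, because no blood group is "greater" than another.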

Numerical data (quantitative)

  • Describes numerical properties of cases.
  • Usually recorded in a unit of measurement (counts are recorded as plain numbers).
Examples:
  • age, height, weight
  • number of siblings
  • salary
Common types:
  • Discrete: countable values (0, 1, 2, ...)
  • Continuous: measured values that can take any value within a range (height, time, temperature)

Measurement units

The unit gives meaning to numerical values (for example, kilograms for weight, rupees for price, centimeters for height).
Values in a numerical variable should be recorded in a common unit.
Bad practice: mixing cm and m in one height column without conversion.
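
One way to repair such a column is to store the unit with each value and convert everything to a single unit before analysis. The records and the helper `to_cm` below are hypothetical, written only for this sketch.

```python
# Hypothetical height records with mixed units, invented for illustration.
raw_heights = [(172, "cm"), (1.65, "m"), (158, "cm"), (1.80, "m")]

TO_CM = {"cm": 1, "m": 100}  # conversion factors to centimetres

def to_cm(value, unit):
    """Convert a height to centimetres; fail loudly on an unknown unit."""
    if unit not in TO_CM:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * TO_CM[unit]

# Round to one decimal place to hide floating-point noise from the conversion.
heights_cm = [round(to_cm(v, u), 1) for v, u in raw_heights]
print(heights_cm)
```

Raising an error on an unknown unit is a deliberate choice here: silently guessing the unit would reintroduce the very mistake the conversion is meant to remove.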

Scales of measurement (important)

  • Nominal: labels only
  • Ordinal: rank/order, gaps between ranks not necessarily equal (poor < average < good)
  • Interval: equal gaps, no true zero (temperature in Celsius)
  • Ratio: equal gaps, true zero (height, weight, income)
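
The practical consequence of "no true zero" is that differences are meaningful on an interval scale but ratios are not. A short sketch with illustrative values:

```python
# Interval scale: Celsius has equal gaps but an arbitrary zero.
t1_c, t2_c = 10.0, 20.0

difference_c = t2_c - t1_c   # meaningful: 20 °C is 10 degrees warmer than 10 °C
naive_ratio = t2_c / t1_c    # 2.0, but "twice as hot" is NOT a valid conclusion

# Kelvin is a ratio scale (true zero), so ratios are meaningful there.
t1_k, t2_k = t1_c + 273.15, t2_c + 273.15
kelvin_ratio = t2_k / t1_k   # about 1.035

# Ratio scale: height has a true zero, so this ratio is meaningful.
height_ratio = 180.0 / 90.0  # 180 cm really is twice 90 cm

print(difference_c, naive_ratio, round(kelvin_ratio, 3), height_ratio)
```

The same two temperatures give a ratio of 2.0 in Celsius and about 1.035 in Kelvin, which shows that the Celsius ratio depends on where zero happens to be placed.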

Data classification

  • Categorical
    • Nominal
    • Ordinal
  • Numerical
    • Discrete
    • Continuous

Cross-sectional and time-series data

  • Cross-sectional data: data observed at one point in time across cases.
  • Time-series data: data recorded over time for one case or unit.
  • Time plot: a graph of time-series values in chronological order.
Examples:
  • Cross-sectional: income of 50 households in March 2026
  • Time-series: monthly electricity bill of one hostel from Jan-Dec 2025
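
The two kinds of data have different shapes, as the sketch below illustrates. All values are invented, and only three households and three months are shown to keep it short.

```python
# Cross-sectional: many cases, one point in time (hypothetical incomes, in rupees).
household_income_march_2026 = {"H01": 42000, "H02": 35500, "H03": 51000}

# Time series: one unit, many time points (hypothetical bills, in rupees).
# Deliberately stored out of order to show the sorting step.
hostel_bill_2025 = [
    ("2025-03", 18900), ("2025-01", 18200), ("2025-02", 17650),
]

# A time plot needs chronological order; "YYYY-MM" strings sort correctly as text.
ordered = sorted(hostel_bill_2025)
months = [m for m, _ in ordered]
bills = [b for _, b in ordered]

# Month-to-month change only makes sense for the time series.
changes = [later - earlier for earlier, later in zip(bills, bills[1:])]
print(months, changes)
```

The household dictionary has no natural order, so a "change from H01 to H02" would be meaningless. For the hostel bills, order is the whole point.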

Common mistakes to avoid

  • confusing population with sample
  • doing arithmetic on category labels that are coded as numbers (for example, averaging branch codes)
  • ignoring units while comparing values
  • assuming sample results hold exactly for the population, ignoring sampling uncertainty