Learning outcomes
- define the basic language of statistics
- distinguish population from sample
- distinguish parameter from statistic
- identify cases and variables in a dataset
What is statistics?
- Statistics is the study of collecting, organizing, analyzing, and interpreting data.
- It helps us make sense of information and support decisions.
Two major branches of statistics
Descriptive statistics
- Organizes and summarizes data already collected.
- Uses:
  - tables
  - averages
  - percentages
  - charts and graphs
- Examples:
  - average marks in a class
  - bar chart of students by department
  - minimum and maximum rainfall this month
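
The kind of summary listed above can be sketched in a few lines of Python. The marks are made-up values for illustration, and the 75-mark cutoff is an arbitrary assumption; only the standard library is used.

```python
from statistics import mean

# Hypothetical marks for a small class (illustrative values only).
marks = [82, 75, 91, 68, 74]

# Descriptive summaries: they describe only the data already collected.
summary = {
    "count": len(marks),
    "mean": mean(marks),
    "minimum": min(marks),
    "maximum": max(marks),
}

# Percentage of students scoring 75 or above (the cutoff is an assumption).
share_75_plus = 100 * sum(m >= 75 for m in marks) / len(marks)

print(summary)
print(f"{share_75_plus:.0f}% scored 75 or above")
```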
Inferential statistics
- Uses sample data to draw conclusions about a larger population.
- Because a sample may not match the population exactly, inference includes uncertainty.
- Examples:
  - estimating average income in a city from a survey
  - checking a sample of bulbs to estimate the factory's defect rate
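
To see why inference carries uncertainty, the bulb example can be simulated. This is a minimal sketch under stated assumptions: the production run of 5000 bulbs, the 3% defect rate, the sample size of 200, and the helper name `sample_defect_rate` are all hypothetical. Each random sample gives a somewhat different estimate of the same population rate.

```python
import random

random.seed(1)  # fixed seed so repeated runs draw the same samples

# Hypothetical production run of 5000 bulbs; True marks a defective bulb.
# Exactly 150 defectives (3%) is an assumption made for illustration.
population = [True] * 150 + [False] * 4850
true_rate = sum(population) / len(population)


def sample_defect_rate(units, n):
    """Defect rate in a simple random sample of n units (drawn without replacement)."""
    sample = random.sample(units, n)
    return sum(sample) / n


# Three separate samples of 200 bulbs, each giving its own estimate.
estimates = [sample_defect_rate(population, 200) for _ in range(3)]

print(f"population defect rate: {true_rate:.3f}")
for i, est in enumerate(estimates, start=1):
    print(f"sample {i} estimate: {est:.3f}")
```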
What is data?
- Data are facts, observations, or measurements collected for analysis.
- Data can be:
  - numbers
  - categories
  - labels
  - recorded responses
- Examples:
  - age
  - exam marks
  - blood group
  - gender
  - branch
Population and sample
- Population: the complete set of all units of interest.
- Sample: a subset selected from the population.
- Example:
  - Population: all students in a college
  - Sample: 200 students surveyed from that college
Census and sample survey
- Census: information is collected from every unit in the population.
- Sample survey: information is collected from only a part of the population.
- Compared with a census, a sample survey is usually:
  - faster
  - cheaper
  - easier to manage
Parameter and statistic
- Parameter: numerical summary of a population
- Statistic: numerical summary of a sample
- Examples:
  - population mean = parameter
  - sample mean = statistic
- If the question mentions “all students”, “all households”, or “entire production”, think population and parameter.
- If it mentions “surveyed 100”, “sample of 50”, or “selected units”, think sample and statistic.
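
The distinction can be made concrete with a tiny worked example. The household sizes and the choice of surveyed households below are invented for illustration. The same calculation (a mean) yields a parameter when it uses every unit and a statistic when it uses only the sampled units.

```python
from statistics import mean

# Hypothetical population: household size for ALL 8 households on a street.
population = [3, 5, 4, 2, 6, 4, 3, 5]

# Hypothetical sample: only the households at positions 1, 4 and 6 were surveyed.
sample = [population[i] for i in (1, 4, 6)]

parameter = mean(population)  # summary of the whole population
statistic = mean(sample)      # summary of the sample only

print(f"population mean (parameter): {parameter}")
print(f"sample mean (statistic): {statistic:.2f}")
```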
Cases and variables
- Case or observation: one individual unit from which data are collected.
- Variable: a characteristic measured or recorded for each case.
- In a student dataset:
  - case = one student
  - variables = marks, age, department, attendance
Data table structure
- Rows represent cases.
- Columns represent variables.
| Student | Age | Marks | Department |
|---|---|---|---|
| A | 18 | 82 | CSE |
| B | 19 | 75 | ECE |
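
The same table can be held in code as a list of records, which makes the row/column roles explicit. This is one possible representation, not the only one. The names `students`, `first_case`, `marks_column`, and `variables` are chosen here for illustration.

```python
# Each dict is one case (a row); each key is a variable (a column).
students = [
    {"Student": "A", "Age": 18, "Marks": 82, "Department": "CSE"},
    {"Student": "B", "Age": 19, "Marks": 75, "Department": "ECE"},
]

first_case = students[0]                       # one row: every value recorded for student A
marks_column = [s["Marks"] for s in students]  # one column: Marks for every case
variables = list(first_case.keys())            # the column names

print(first_case)
print(marks_column)
print(variables)
```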
Common mistakes to avoid
- confusing sample with population
- calling every numerical result a parameter
- thinking descriptive statistics is “less important”
- confusing case with variable
Exam hints and traps
- Population means the whole target group, not just the available group.
- A sample is used to learn about the population.
- A statistic comes from a sample; a parameter belongs to the population.
- A dataset may describe a sample even if only descriptive measures are computed.
Quick practice
- A company tests 80 batteries out of 5000. Identify population and sample.
- Is the “average lifetime of the 80 batteries” a parameter or a statistic?
- In a class record sheet, identify one case and three variables.
Answer key
- Population: all 5000 batteries; sample: 80 tested batteries
- Statistic
- Case: one student; variables: roll number, marks, attendance
