Grade 9 Mathematics Unit 9 : Statistics and Probability

Statistical Data

These QuickNotes give you the most important points to remember for the ESSLCE. They are based on the MoE Grade 9 Mathematics Textbook, Unit 9, Chapter 9.1.

Summary
  • Statistics is the science of collecting, organizing, presenting, analyzing and interpreting data.
  • A population is the whole group you want to study, and a sample is the small part of it that you actually examine.
  • A frequency distribution table organizes raw data, and a histogram draws that table as touching bars.
  • The mean, median and mode describe the center of the data. The range, variance and standard deviation describe how spread out it is.

Key Words

  • Statistics: the science of collecting, organizing, presenting, analyzing and interpreting data.
  • Raw data: data that has been collected but not yet organized.
  • Population: the complete collection of people or objects that share a common characteristic.
  • Sample: a small part of the population that you examine instead of the whole group.
  • Frequency: the number of times a value appears in the data.
  • Frequency distribution table: a table that lists each value (or class) together with its frequency.
  • Histogram: a graph of a frequency distribution where the variable is on the x-axis and the frequency is on the y-axis.
  • Mean: the sum of all values divided by how many values there are.
  • Median: the middle value after the data is put in order.
  • Mode: the value that appears most often.
  • Range: the largest value minus the smallest value.
  • Variance: the mean of the squared deviations of each value from the mean, written σ².
  • Standard deviation: the square root of the variance, written σ.

What Is Statistics?

  • Statistics is a science that collects data and turns it into useful information for making decisions.
  • Every statistical study follows the same five stages, and the definition lists them in order.

Collect → Organize → Present → Analyze → Interpret

  • Governments use statistics every day. For example, the Ministry of Education records school dropout rates to plan better policies, and health offices track disease data to direct treatment to the right places.
  • Organizing data into a table of rows and columns is called tabulation. Tabulated data is compact and easy to compare.
  • Presenting data means showing it with graphs and diagrams, such as bar graphs, pie charts and histograms.
  • Interpreting data means drawing a conclusion about a large group (the population) from a small group (the sample).

Types of Data

  • Qualitative data describes a quality whose values are not numbers, such as color, sex, religion or home town.
  • Quantitative data uses numbers that can be measured, such as exam scores, height, weight, age or income. Statistics works mainly with quantitative data.
  • Quantitative data splits again into two kinds. Continuous data can take almost any value on a scale, such as height, temperature or time, because you can always measure it more precisely.
  • Discrete data comes from counting, so it usually takes whole-number values, such as the number of students in a school.
Data                                  Type                        Why
Religion of a person                  Qualitative                 The values are not numbers.
Amount of rainfall in a year          Quantitative (continuous)   It is measured on a scale.
Monthly income of a person            Quantitative                It is a measured number.
Number of academic months in a year   Quantitative (discrete)     It comes from counting.
Sex of a student                      Qualitative                 The values are categories, not numbers.
  • Where data comes from. Primary data is data the researcher collects himself or herself, through surveys, interviews, experiments or direct observation.
  • Secondary data is data that someone else already collected, such as records kept by government institutions or health centers, and the researcher only extracts it.
  • An interview gathers detailed information by talking with people. Direct observation collects data by systematically watching and recording behavior.

Population, Sample and Census

  • The population is every member of the group you want to study. Examining the whole population is usually too expensive and too slow.
  • A census is a count of the whole population, such as a national population census. Because a census costs so much, most countries take one only every 10 years.
  • Instead of the whole group, a researcher studies a sample, which is a small part of the population.
  • In a simple random sample, every member of the population has an equal chance of being chosen.
[Figure: a sample is a small part taken from the population. Population: the whole group. Sample: the part you study.]

Worked example: a researcher randomly selects 1,000 students from different primary schools to find how long Ethiopian primary students study per day. What are the population and the sample?

The population is all primary school students in Ethiopia, because that is the whole group the researcher wants to know about.

The sample is the 1,000 selected students, because they are the part that is actually examined.
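A simple random sample can be sketched in Python with the standard library function random.sample, which draws without replacement so that every member is equally likely to be chosen. The population of 50,000 student ID numbers and the seed value are assumptions made only for illustration; neither comes from the textbook.

```python
import random

# Hypothetical population: ID numbers 1..50_000 stand in for the students.
population = list(range(1, 50_001))

# A fixed seed (an arbitrary choice) makes the draw repeatable.
rng = random.Random(9)

# Draw 1,000 different students; each student has the same chance.
sample = rng.sample(population, k=1000)

print(len(sample))        # how many students are actually examined
print(len(set(sample)))   # the same number, so nobody was picked twice
```

The population is the whole list, and the sample is the smaller list that is actually examined.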

Frequency Distribution Tables

  • Data that is collected but not yet organized is called raw data.
  • A frequency distribution table organizes raw data by listing each value once, together with its frequency, which is the number of times it appears.
  • For example, the weights in kg of 12 people are 55, 62, 49, 67, 55, 62, 62, 49, 67, 62, 55, 49. The organized table looks like this.
Weight in kg (V)      49  55  62  67
Number of people (f)   3   3   4   2
  • The weight is called the variable, and the number of people is the frequency. A quick check: the frequencies 3 + 3 + 4 + 2 add up to 12, the number of people.
  • When the data has many different values, we group it into classes. Each interval, such as 0 ≤ x < 10, is called a class, and the table is called a grouped frequency distribution table.
  • For example, the ages of 20 shop customers grouped into classes of 10 years give frequencies 2, 3, 3, 6, 3, 1 and 2 for the classes from 0 up to 70.
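The ungrouped table for the 12 weights can be rebuilt with a short Python sketch. It uses collections.Counter from the standard library; the variable names are illustrative only.

```python
from collections import Counter

weights = [55, 62, 49, 67, 55, 62, 62, 49, 67, 62, 55, 49]

# Counter maps each value to the number of times it appears.
frequency = Counter(weights)

# List the values in increasing order, as in the table.
table = sorted(frequency.items())
for value, f in table:
    print(value, f)

# Quick check: the frequencies must add up to the number of people.
total = sum(frequency.values())
```

If the total does not match the number of raw values, a value was missed or counted twice.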

Presenting Data with Graphs

  • Bar charts. A bar chart shows categorical data with separate bars. The height of each bar is the frequency of that category, and the bars have gaps between them.
[Figure: bar chart. x-axis: Favorite color (Red, Blue, Green, Yellow, Pink); y-axis: Number of children, scale from 2 to 10.]
  • Pie charts. A pie chart is a circle divided into sectors. The size of each sector is in direct proportion to the frequency of that group.
  • The sector does not show the actual frequency, but you can calculate it from the percentage.

Worked example: 120 children chose their favorite color. White took 50%, purple took 30% and blue took 20%. How many children chose each color?

White: 120 × (50/100) = 60 children.

Purple: 120 × (30/100) = 36 children.

Blue: 120 × (20/100) = 24 children. The three answers add back to 120, which is a good check.

[Figure: pie chart. White: 50% (60 children); Purple: 30% (36 children); Blue: 20% (24 children).]
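The percentage-to-frequency step in the pie chart example can be checked with a few lines of Python. The sector angles are an extra illustration that is not in the worked example; they follow from the fact that a full circle is 360 degrees.

```python
total_children = 120
percentages = {"White": 50, "Purple": 30, "Blue": 20}

# Frequency = total x percentage / 100.
# Integer division is safe here because every result is a whole number.
counts = {color: total_children * pct // 100 for color, pct in percentages.items()}

# Sector angle = the same percentage of the full 360 degrees.
angles = {color: 360 * pct // 100 for color, pct in percentages.items()}

for color in percentages:
    print(color, counts[color], "children,", angles[color], "degrees")
```

The counts add back to 120 and the angles add back to 360, which checks both calculations.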
  • Histograms. A histogram is the graph of a frequency distribution. The variable (the classes) goes on the x-axis and the frequency goes on the y-axis.
  • The bars of a histogram touch each other, because the classes are continuous intervals with no space between them.
  • To draw one, mark the classes on the x-axis, choose a y-axis scale that fits the highest frequency, and draw one bar for each class with the bar height equal to the class frequency.
  • The histogram below shows mathematics scores in classes of 5 marks, with frequencies 2, 2, 5, 10, 6, 2 and 3. The tallest bar is the class 40 ≤ s < 45.
[Figure: histogram of mathematics scores. x-axis: Score, from 25 to 60 in classes of 5 marks; y-axis: Frequency, scale from 2 to 12.]
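As a rough sketch, the same histogram can be printed as text in Python, with one row of marks per class. The class boundaries 25 to 60 are read from the score axis of the figure, and the row length stands in for the bar height; this is only an illustration, not a proper drawing.

```python
# Class boundaries 25, 30, ..., 60 and the frequencies from the text.
boundaries = list(range(25, 65, 5))
frequencies = [2, 2, 5, 10, 6, 2, 3]

# Pair each frequency with its class: lower <= s < upper.
classes = list(zip(boundaries[:-1], boundaries[1:], frequencies))

# One row of '#' marks per class; the row length plays the role of bar height.
for lower, upper, f in classes:
    print(f"{lower} <= s < {upper}: {'#' * f}")

# The tallest bar is the class with the largest frequency.
tallest = max(classes, key=lambda c: c[2])
```

Here tallest holds the lower boundary, upper boundary and frequency of the class with the most scores.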

Mean: the Arithmetic Average

  • The arithmetic mean (the average) is the most popular measure of central tendency. A measure of central tendency is a single value that describes the center of a data set.
  • To find the mean, add all the values and divide by how many values there are. The mean is written x̄, read as “x bar”.

x̄ = (x₁ + x₂ + … + xₙ) / n

  • This formula tells us that the mean is the value each member would get if the total were shared equally.
  • When the data is in a frequency table, multiply each value by its frequency first, then divide by the total frequency.

x̄ = (f₁x₁ + f₂x₂ + … + fₙxₙ) / (f₁ + f₂ + … + fₙ)

Worked example: seven children have 5, 4, 6, 4, 3, 6 and 7 textbooks. Find the mean.

x̄ = (5 + 4 + 6 + 4 + 3 + 6 + 7) / 7 = 35 / 7 = 5.

On average, each child has 5 textbooks.

Worked example: ten people have 6, 5, 6, 5, 3, 6, 6, 10, 10 and 3 pencils. Use a frequency table to find the mean.

The value 3 appears 2 times, 5 appears 2 times, 6 appears 4 times and 10 appears 2 times.

x̄ = (3×2 + 5×2 + 6×4 + 10×2) / 10 = 60 / 10 = 6.
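Both worked examples can be reproduced with a minimal Python sketch. The function names mean and mean_from_table are illustrative choices, not names from the textbook.

```python
from collections import Counter

def mean(values):
    """Sum of all values divided by how many values there are."""
    return sum(values) / len(values)

def mean_from_table(table):
    """Mean from a {value: frequency} table: sum of f*x over sum of f."""
    total = sum(value * f for value, f in table.items())
    n = sum(table.values())
    return total / n

textbooks = [5, 4, 6, 4, 3, 6, 7]
pencils = [6, 5, 6, 5, 3, 6, 6, 10, 10, 3]

textbook_mean = mean(textbooks)               # 35 / 7
pencil_table = Counter(pencils)               # value -> frequency
pencil_mean = mean_from_table(pencil_table)   # 60 / 10
```

Calling mean(pencils) directly gives the same answer as the table method, because the table only groups equal values before adding them.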

  • Properties of the mean. The sum of the deviations of all values from the mean is always zero. A deviation is the difference between a value and the mean.
  • If you add the same number p to every value, the new mean is x̄ + p. If you subtract p from every value, the new mean is x̄ − p.
  • If you multiply every value by a nonzero number p, the new mean is p·x̄.
  • For example, a factory makes 1,000 cars per day on average. After a new machine doubles production, the new mean is 2 × 1,000 = 2,000 cars per day.
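The three properties of the mean can be checked numerically. The sketch below reuses the textbook counts from the earlier worked example; the constant p = 3 is an arbitrary choice for illustration.

```python
def mean(values):
    return sum(values) / len(values)

data = [5, 4, 6, 4, 3, 6, 7]   # textbook counts; the mean is 5
m = mean(data)

# Property 1: the deviations from the mean add up to zero.
deviation_sum = sum(x - m for x in data)

# Property 2: adding p to every value adds p to the mean.
p = 3                          # arbitrary illustrative constant
shifted_mean = mean([x + p for x in data])

# Property 3: multiplying every value by p multiplies the mean by p.
scaled_mean = mean([x * p for x in data])
```

With data whose mean is not a whole number, floating-point rounding may leave a very small nonzero deviation sum, so compare with a tolerance in that case.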

Median: the Middle Value

  • The median is the middle value of a data set that has been arranged in order of size. Always order the data first.
  • If the number of values is odd, the median is the single middle value. If it is even, the median is the average of the two middle values.
  • The median is not affected by extreme values, and it is unique for a given data set.

Worked example: find the median of 65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92.

Ordered: 14, 35, 45, 55, 55, 56, 56, 65, 87, 89, 92. There are 11 values, so the median is the 6th value, which is 56.

With only the first 10 ordered values (that is, without 92), the two middle values are the 5th and 6th, 55 and 56, so the median is (55 + 56) / 2 = 55.5.
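The odd and even cases can be sketched as one small Python function. The name median is an illustrative choice; the standard library's statistics.median follows the same rule.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)      # always order the data first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

median_of_11 = median(scores)        # 6th of the 11 ordered values
median_of_10 = median(scores[:10])   # the same data without 92
```

Sorting inside the function guards against the common mistake of taking the middle of the unordered list.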

Mode: the Most Frequent Value

  • The mode is the value that appears most often in the data.
  • For the data 5, 10, 20, 20, 30, 35, 35, 35, 40, 50, 70, 70, the mode is 35 because it appears three times.
  • The mode is not always unique. A data set can have two modes, and if no value repeats, there is no mode at all.
  • The mode is the only measure of central tendency that also works for qualitative data, such as the most common favorite color.
  • On a histogram, the mode belongs to the tallest bar.
Question                      Mean                         Median                        Mode
How is it found?              Sum of values divided by n   Middle value after ordering   Most frequent value
Affected by extreme values?   Yes                          No                            No
Always unique?                Yes                          Yes                           No: two modes or none are possible
Works for qualitative data?   No                           No                            Yes
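A short Python sketch can show why the mode is not always unique. The function name modes is hypothetical, and it follows the convention used in these notes that data with no repeated value has no mode (Python's own statistics.multimode would return every value in that case). All data sets except the first are made up for illustration.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:              # nothing repeats, so there is no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

one_mode = modes([5, 10, 20, 20, 30, 35, 35, 35, 40, 50, 70, 70])
two_modes = modes([1, 1, 2, 2, 3])            # made-up data
no_mode = modes([1, 2, 3, 4])                 # made-up data
color_mode = modes(["red", "blue", "red"])    # made-up qualitative data
```

The last call works on words, not numbers, which is why the mode is usable for qualitative data while the mean and median are not.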

Range: the Simplest Measure of Spread

  • Measures of dispersion tell us how scattered or spread out the data is around the center. Two data sets can have the same mean but very different spreads.
  • The range R is the simplest measure of dispersion. It is the largest value minus the smallest value: Range = Largest value − Smallest value.
  • For the test scores 6, 7, 5, 8, 3, 8, 9, 5, 4, 5.5, the range is 9 − 3 = 6.
  • Because the range uses only two values, one extreme value can change it completely. That is why it is called a crude measure.
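The range, and its sensitivity to a single extreme value, can be seen in a few lines of Python. The extra score of 20 is made up only to show the effect.

```python
scores = [6, 7, 5, 8, 3, 8, 9, 5, 4, 5.5]
data_range = max(scores) - min(scores)                  # 9 - 3

# One extreme value (a made-up score of 20) changes the range completely.
with_outlier = scores + [20]
outlier_range = max(with_outlier) - min(with_outlier)   # 20 - 3
```

Only the largest and smallest values enter the calculation, so the other nine scores have no influence on the result.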

Variance and Standard Deviation

  • The variance σ² is the mean of the squared deviations of each value from the mean. It measures how far the values sit from the mean on average.
  • The standard deviation σ is the square root of the variance: σ = √(variance).
  • Follow four steps. First, calculate the mean x̄. Second, find each deviation x − x̄. Third, square each deviation. Fourth, average the squared deviations to get the variance, then take the square root for the standard deviation.

σ² = [(x₁ − x̄)² + (x₂ − x̄)² + … + (xₙ − x̄)²] / n

Worked example: a shop sold 13, 8, 4, 9, 7, 12 and 10 televisions on the seven days of a week. Find the variance and the standard deviation.

Mean: x̄ = (13 + 8 + 4 + 9 + 7 + 12 + 10) / 7 = 63 / 7 = 9.

Deviations from 9: 4, −1, −5, 0, −2, 3, 1. Squared: 16, 1, 25, 0, 4, 9, 1.

Variance: σ² = (16 + 1 + 25 + 0 + 4 + 9 + 1) / 7 = 56 / 7 = 8.

Standard deviation: σ = √8 ≈ 2.83.
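The four steps can be followed line by line in Python. This sketch divides by n, as the formula in these notes does; the standard library functions statistics.pvariance and statistics.pstdev use the same definition, whereas statistics.variance divides by n − 1 and would give a different answer.

```python
import math
import statistics

sales = [13, 8, 4, 9, 7, 12, 10]
n = len(sales)

mean = sum(sales) / n                       # step 1: the mean
deviations = [x - mean for x in sales]      # step 2: each deviation
squared = [d ** 2 for d in deviations]      # step 3: square each deviation
variance = sum(squared) / n                 # step 4: average the squares
std_dev = math.sqrt(variance)               # then take the square root

# Cross-check with the standard library (divides by n as well).
check = statistics.pvariance(sales)

print(variance, round(std_dev, 2))
```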

  • Properties. If you add (or subtract) the same constant k to every value, the variance and the standard deviation do not change. The whole data set slides along, but its spread stays the same.
  • If you multiply every value by a constant c, the new variance is c² times the old variance, and the new standard deviation is |c| times the old standard deviation.
Measure              Add k to every value   Multiply every value by c
Mean                 Becomes x̄ + k          Becomes c·x̄
Variance             Does not change        Becomes c²σ²
Standard deviation   Does not change        Becomes |c|σ
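These rules can be checked on the television data from the worked example. The constants k = 5 and c = −2 are arbitrary choices for illustration; a negative c is used to show why the standard deviation scales by |c| and not by c.

```python
import statistics

sales = [13, 8, 4, 9, 7, 12, 10]   # televisions sold; the variance is 8
k, c = 5, -2                        # arbitrary illustrative constants

base_var = statistics.pvariance(sales)
shifted_var = statistics.pvariance([x + k for x in sales])   # add k
scaled_var = statistics.pvariance([c * x for x in sales])    # multiply by c

base_sd = statistics.pstdev(sales)
scaled_sd = statistics.pstdev([c * x for x in sales])

print(base_var, shifted_var, scaled_var)
```

The shifted variance equals the original, the scaled variance is c² times the original, and the scaled standard deviation stays positive even though c is negative.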

Common Mistakes to Avoid

  • Finding the median without ordering the data first. The middle of the unordered list is usually the wrong answer.
  • Thinking the mode must be one single value. A data set can have two modes, and a data set where nothing repeats has no mode.
  • Confusing a bar chart with a histogram. Bar-chart bars have gaps because the categories are separate, but histogram bars touch because the classes are continuous.
  • Changing the variance after adding the same number to every value. Adding k moves the mean but leaves the variance and standard deviation exactly as they were.
  • Stopping at the variance when the question asks for the standard deviation. Always take the square root at the end.

Easy Ways to Remember

  • Mode = MOst. Both words start with the same letters, and the mode is the most frequent value.
  • Median sounds like medium, the middle size. The median is the middle value after ordering.
  • The mean is the fair share: the total divided equally among all members.
  • Adding the same number moves the center but not the spread. Multiplying changes both: the mean by c, the variance by c², the standard deviation by |c|.
  • Histogram bars hold hands (they touch); bar-chart bars keep their distance (they have gaps).

Quiz


1. Seven children have 5, 4, 6, 4, 3, 6 and 7 textbooks. What is the mean number of textbooks?

2. What is the median of the data 5, 7, 3, 10, 1, 5, 9?

3. What is the mode of the data 13, 10, 16, 15, 12, 14, 13, 16?

4. The weights in kg of a family of five people are 35, 40, 85, 65 and 70. What is the range?

5. The variance of a data set is 8. If you add 5 to every value, what is the new variance?

Remember: Statistics collects, organizes, presents, analyzes and interprets data. The mean, median and mode locate the center; the range, variance and standard deviation measure the spread. Adding a constant never changes the spread, but multiplying by c scales the variance by c² and the standard deviation by |c|.