[If some values are taken multiple times, calculations may be simplified by noting that (2+1+3+5+2+3+3)/7 = (1+2×2+3×3+5)/(1+2+3+1).]
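The identity above can be checked with a short Python sketch. The variable names are ours, chosen for illustration; the seven values are the ones in the example.

```python
from collections import Counter

data = [2, 1, 3, 5, 2, 3, 3]

# Count how many times each distinct value is taken: {2: 2, 1: 1, 3: 3, 5: 1}
counts = Counter(data)

# Mean computed directly from the list of values.
plain_mean = sum(data) / len(data)

# Mean computed from the distinct values weighted by their frequencies.
weighted_mean = sum(value * n for value, n in counts.items()) / sum(counts.values())

print(plain_mean == weighted_mean)  # True
```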

Information in histograms

When data is displayed in a histogram, the exact values of the data are lost. It is reasonable to ask how much we know about the original data. For the histogram which we drew from the weights of students,

```
10_|                           _______
|                          |       |
|                   _______|       |
|           _______|       |       |
|          |       |       |       |
5_|          |       |       |       |
|          |       |       |       |
|   _______|       |       |       |
|  |       |       |       |       |
|  |       |       |       |       |_______ _______
__|__|_______|_______|_______|_______|_______|_______|____
|       |       |       |       |       |
100     125     150     175     200     225

weights of students in pounds

Weights of Students in Statistics Course
```
we shall ask four questions:

What is the least (greatest) possible mean of data which could have produced this histogram?

The above histogram specifies that three data are in the range (87.5, 112.5), seven are in the range (112.5, 137.5), eight are in the range (137.5, 162.5), ten are in the range (162.5, 187.5), one is in the range (187.5, 212.5), and one is in the range (212.5, 237.5). If the data were as small as possible consistent with these constraints (and assuming integer values), three data would be equal to 88, seven data would be equal to 113, eight data would be equal to 138, ten data would be equal to 163, one datum would be equal to 188, and one datum would be equal to 213. These values yield the mean 139.67, which is the least possible mean consistent with the above histogram. (The greatest possible mean can be calculated in a similar manner.)
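As a minimal sketch of this calculation, the following Python computes both bounds under the same assumption of integer weights. The class boundaries and counts are read off the histogram; the variable names are ours.

```python
import math

# Lower boundary of each class, the common class width, and the class counts.
lower_edges = [87.5, 112.5, 137.5, 162.5, 187.5, 212.5]
width = 25
counts = [3, 7, 8, 10, 1, 1]
n = sum(counts)  # 30 students

# Smallest and largest integer strictly inside each class (assumes integer data).
smallest = [math.floor(lo) + 1 for lo in lower_edges]         # 88, 113, 138, ...
largest = [math.ceil(lo + width) - 1 for lo in lower_edges]   # 112, 137, 162, ...

# Put every datum at the bottom (top) of its class to get the least (greatest) mean.
least_mean = sum(c * v for c, v in zip(counts, smallest)) / n
greatest_mean = sum(c * v for c, v in zip(counts, largest)) / n

print(round(least_mean, 2), round(greatest_mean, 2))  # 139.67 163.67
```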

What is the least (greatest) possible median of data which could have produced this histogram?

There are 30 data represented in the above histogram, so the median is halfway between the values of the 15th and 16th data in rank order. Three of the data are in the first class, seven of the data are in the second class, and eight of the data are in the third class; in particular the 15th and 16th data must both be in the third class. Therefore the 15th and 16th data are at least 138, and the median must be at least 138. (Similarly, the median is at most 162.)
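The same counting argument can be sketched in Python. The helper `class_of_rank` is a name of our own invention; the sketch assumes integer weights and an even number of data, as in this example.

```python
from itertools import accumulate

counts = [3, 7, 8, 10, 1, 1]
smallest = [88, 113, 138, 163, 188, 213]   # least integer in each class
largest = [112, 137, 162, 187, 212, 237]   # greatest integer in each class

def class_of_rank(rank, counts):
    """Return the index of the class holding the datum with this 1-based rank."""
    for i, running_total in enumerate(accumulate(counts)):
        if rank <= running_total:
            return i
    raise ValueError("rank exceeds the number of data")

n = sum(counts)                 # 30, an even number
lo_rank, hi_rank = n // 2, n // 2 + 1   # the 15th and 16th data

i, j = class_of_rank(lo_rank, counts), class_of_rank(hi_rank, counts)

# The median is halfway between the two middle data, so bound each one separately.
least_median = (smallest[i] + smallest[j]) / 2
greatest_median = (largest[i] + largest[j]) / 2

print(i, j)                           # 2 2  (both in the third class)
print(least_median, greatest_median)  # 138.0 162.0
```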

What is the "best" estimate for the mean of the data which produced this histogram?

For the "best" estimate, we shall assume that the data are uniformly spread within each class. (There are other ways to define "best", which will produce different results.) For purposes of calculating the mean, this is equivalent to putting all the individuals in each class at the class mark. Hence the "best" estimate for the mean is obtained by assuming three individuals have weight 100, seven individuals have weight 125, eight individuals have weight 150, ten individuals have weight 175, one individual has weight 200, and one individual has weight 225. This provides a mean of 151.67.
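A short Python sketch of this estimate, with the class marks and counts read off the histogram (the variable names are ours):

```python
# Class marks (midpoints of the classes) and the number of individuals in each.
class_marks = [100, 125, 150, 175, 200, 225]
counts = [3, 7, 8, 10, 1, 1]

# Treat every individual in a class as if it sat exactly at the class mark.
estimated_mean = sum(m * c for m, c in zip(class_marks, counts)) / sum(counts)

print(round(estimated_mean, 2))  # 151.67
```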

What is the "best" estimate for the median of the data which produced this histogram?

A histogram is constructed so that area is proportional to the number of individuals. Hence seeking the value with half the individuals above it and half the individuals below it amounts to seeking the value where the area under the histogram to the right of it is equal to the area under the histogram to the left of it. Each of the rectangles in the histogram has a base of 25, and the heights are 3, 7, 8, 10, 1, and 1. Thus the total area of the histogram is 3×25+7×25+8×25+10×25+1×25+1×25=750. We want the value with area 750/2=375 to the right of it and 375 to the left of it. The area of the first rectangle is 75, which is less than 375; the area of the first two rectangles is 75+175=250, which is still less than 375; the area of the first three rectangles is 75+175+200=450, which is greater than 375. Therefore we need an area of 375-(75+175)=125 from the third rectangle to reach the middle. Since the height of the third rectangle is 8, we must use 125/8=15.625 of its base to get the requisite area. The third rectangle begins at 137.5; upon adding 15.625 to that we get 153.125, which is the "best" estimate for the median.
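The area argument can be written as a small Python function. This is only a sketch under the stated assumption that the data are uniformly spread within each class and that all classes have the same width; the function name `interpolated_median` is ours.

```python
def interpolated_median(lower_edges, width, counts):
    """Value that splits the area of an equal-width histogram in half."""
    half_area = sum(counts) * width / 2
    if half_area == 0:
        raise ValueError("the histogram contains no data")
    area_so_far = 0
    for lower, height in zip(lower_edges, counts):
        area = height * width
        if area_so_far + area >= half_area:
            # Area still needed from this rectangle, divided by its height,
            # gives the portion of the base to use.
            return lower + (half_area - area_so_far) / height
        area_so_far += area

lower_edges = [87.5, 112.5, 137.5, 162.5, 187.5, 212.5]
counts = [3, 7, 8, 10, 1, 1]

estimated_median = interpolated_median(lower_edges, 25, counts)
print(estimated_median)  # 153.125
```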

Competencies: Give upper and lower bounds, and the best estimates, for the mean and median of the data represented in the following histogram:

```
10_|
|
|
|           _______
|          |       |_______
5_|          |       |       |
|          |       |       |_______
|          |       |       |       |
|   _______|       |       |       |
|  |       |       |       |       |
__|__|_______|_______|_______|_______|
|       |       |       |
125     150     175     200

weights of students in pounds

Weights of Students in Mathematics Course
```

Reflection: What can you say about the maximum, minimum, Q1, and Q3 weights?

Challenge: What can you say about the variance and standard deviation of the weights?