Median

The median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it can be thought of as the middle value. Mathematically, it is the 50% quantile, \(Q(0.5)\).
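To make the "middle value" idea concrete, here is a minimal sketch in plain Kotlin that sorts a sample and picks the middle. The helper name medianOf is hypothetical and is not part of NM Dev. For an even-sized sample it assumes the common convention of averaging the two middle values; other quantile definitions may interpolate differently.

```kotlin
// Hypothetical helper (not part of NM Dev): median of a sample by sorting.
fun medianOf(sample: DoubleArray): Double {
    require(sample.isNotEmpty()) { "median of an empty sample is undefined" }
    val sorted = sample.sortedArray()            // sorted copy; the input is left untouched
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) {
        sorted[mid]                              // odd size: the single middle value
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0    // even size: mean of the two middle values (assumed convention)
    }
}

fun main() {
    println(medianOf(doubleArrayOf(5.0, 1.0, 3.0)))        // 3.0
    println(medianOf(doubleArrayOf(4.0, 1.0, 3.0, 2.0)))   // 2.5
}
```

Half of the values lie at or below the returned number and half at or above it, which is what the 50% quantile expresses.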

While the arithmetic mean is often used to measure central tendency, it is easily influenced by a small proportion of extremely large or small values. For skewed distributions such as income, where a few people’s incomes are far higher than most people’s, the mean may not represent the typical middle value, and a robust statistic such as the median may describe central tendency better.

Income distribution is positively skewed. From the figure above, we can see that the mean is higher than the median: a few outliers with much higher incomes than most people pull the mean upward, so it may not truly reflect the middle income of most people.
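The pull that a few very large values exert on the mean can be reproduced with a tiny sample. The sketch below uses made-up incomes (in thousands), not real data, and a hypothetical helper sampleMedian written in plain Kotlin rather than with NM Dev.

```kotlin
// Hypothetical helper: median by sorting (mean of the two middle values for even sizes).
fun sampleMedian(xs: DoubleArray): Double {
    require(xs.isNotEmpty()) { "median of an empty sample is undefined" }
    val s = xs.sortedArray()
    val mid = s.size / 2
    return if (s.size % 2 == 1) s[mid] else (s[mid - 1] + s[mid]) / 2.0
}

fun main() {
    // Made-up incomes in thousands; the last value is a deliberate outlier.
    val incomes = doubleArrayOf(30.0, 35.0, 40.0, 45.0, 1000.0)
    println("mean   = " + incomes.average())       // mean   = 230.0
    println("median = " + sampleMedian(incomes))   // median = 40.0
}
```

Four of the five incomes are at most 45, yet the mean is 230 because of the single outlier. The median stays at 40, which is much closer to what most people in this sample earn.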

Code

In code, we can use the Quantile class we learned about earlier to find the median, since the median is the 50% quantile. In NM Dev, it can be done as follows.

// create an array of doubles for our dataset
val values = doubleArrayOf(0.0, 1.0, 2.0, 3.0, 3.0, 3.0, 6.0, 7.0, 8.0, 9.0)

// APPROXIMATELY_MEDIAN_UNBIASED
var median = Quantile(values, Quantile.QuantileType.APPROXIMATELY_MEDIAN_UNBIASED)
println("APPROXIMATELY_MEDIAN_UNBIASED Median: " + median.value(0.5))

// NEAREST_EVEN_ORDER_STATISTICS
median = Quantile(values, Quantile.QuantileType.NEAREST_EVEN_ORDER_STATISTICS)
println("NEAREST_EVEN_ORDER_STATISTICS Median: " + median.value(0.5))

Output:

APPROXIMATELY_MEDIAN_UNBIASED Median: 3.0
NEAREST_EVEN_ORDER_STATISTICS Median: 3.0

Thinking Time

When do we use the median over the mean to measure central tendency? Would it be advisable to use the mean to measure the central tendency of a very skewed distribution?

When do we use the median then?

Here are a few examples:

  • Finding the average age of a class
  • Finding the poverty line