Mean, median, and mode come up a lot when working with statistics. I was curious as to how we can calculate these values using SQL Server and T-SQL.
Khan Academy gives these definitions for mean, median, and mode.
Mean: The average value of a group of numbers.
Median: When ordering a group of numbers, the “middle” value, where half of the the numbers are greater than that value, and half are less.
Mode: The most frequently occurring number in a group of numbers.
Calculating With T-SQL:
We’ll set up a temp table and add some numbers to use in our calculations:
CREATE TABLE #temp(number decimal(4,2) NOT NULL); INSERT INTO #temp(number) VALUES (1),(1),(2),(3),(4),(4),(5),(6);
Mean:
Mean is straightforward, we use the AVG function to calculate the average. For our example, the mean is 3.25.
SELECT AVG(number) as mean FROM #temp;
Median:
With an odd number of values to evaluate, we would pick the “middle” value as the Median. Since we have an even number of values, the Median will be the mean of the two numbers in the middle. In this example, with 8 numbers, the values in position 4 and 5 are the values 3 and 4. So the median would be the mean of those two, so 3.5.
In T-SQL, we’ll use the PERCENTILE_CONT function. Using 0.5 as the input parameter will return the value at the 50% point of the sorted values.
SELECT DISTINCT PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY number) OVER () as median_cont FROM #temp;
There is a PERCENTILE_DISC function as well, but it will only return a column value. (The DISC in the function name is for Discrete, CONT for Continuous).
Mode:
For Mode, we need to return the most frequently occurring value. In this table, 1 and 4 both occur twice. Since there is a tie, we want to return both values as the Mode.
SELECT number FROM (
SELECT number, RANK()
OVER (ORDER BY COUNT(*) DESC) as rank_no
FROM #temp
GROUP BY number
) as t
WHERE rank_no = 1
ORDER BY number;
We’ll order the values in descending order of how many times they occur. We’ll take the value(s) in the #1 position. Using the RANK function, it will return the same rank for any values that are tied for the same position.
Links: