One of the first warnings an aspiring analyst hears is, “never take an average of an average”. The reason for this warning may not be intuitive, but it’s important to understand why it’s generally a bad thing to average a bunch of averages. In today’s post, I’ll attempt to explain the problems with using simple averages and how Weighted Averages can help you avoid inaccurate conclusions about your data.

.

**Where Straight Averages can Go Wrong
**

Imagine you manage two sales reps (Jim and Tim). These two sales reps have been working for you for three months. Over the last three months, you’ve collected data on their win rates vs. sales calls. After three months of selling

, you’d like to see which sales rep has been more effective based on win rate.

.

At first glance, you see that Jim’s win rate has been 65%, 85%, and 86% over the last three months. Tim’s win rate has been 65%, 84% and 83%. You’d like to get an average win rate for each sales rep; so you simply average the win rates. From the looks of it, Jim seems to be the better performer. Not only does he have better win rates each month, but his straight average win rate is 78% (slightly better than Tim).

.

But wait. Look at what happens when you look that their numbers from the perspective of total wins for the entire three-month period. Taking the total wins vs. total sales calls, you see that Tim, **with a 76% win rate**, is actually more effective. What the heck!? How can Jim have a better win rate each month, but still trail behind Tim when calculating the total win rate for the month?

.

This strange observation has a name in statistics. It’s called Simpson’s Paradox.

No – not that Simpson.

.

Edward H Simpson was a British statistician who published a paper in 1951 detailing a phenomenon in which a trend present in different groups is reversed when the groups are combined. In other words, one conclusion may be reached when data is analyzed in smaller groups, but an opposite conclusion may be reached when data is analyzed in aggregate.

.

The reason this paradox rears its head is because, often times, there is a “lurking” variable in sample data that isn’t considered in the analysis. In the case of our two sales reps, they have an unequal pool of sales calls (Jim has 89 total sales calls, while Tim has 128). So while each month’s win rate is correct, you can’t simply average the win rates. Doing so, would ignore an important variable (the total number of sales calls during each month).

.

**Weighting Values
**

Often times, the correct analysis is not a straight average, but an average that weighs each value based on another variable in the sample data. When you “weigh” a particular value, you give that value significance in relation to the other values in your sample. That is to say, some values count more than other values. This is similar to the way Final Exams in school would count for a larger part of your grade than Quizzes.

.

A weighted average calculation is essentially:

.

For instance, in the case of Jim (our sales rep), we need to weight his win rates so that they have more significance in months that involved more sales calls. This way, we get a more accurate representation of his overall average win rate.

.

**Using the SUMPRODUCT Function to Calculate Weighted Averages in Excel
**

Using weighted averages in Excel is as simple as entering in the weighted average calculation.

In this example, we are calculating the average price per unit across all transactions.

You can see that the simple average gives us a price per unit of $200. But this number is misleading; given that over 70% of all units were sold at the $50 rate. The weighted average price takes into account the number of units sold at each price, giving us a more accurate representation of our overall average price.

.

An easier way to enter this formula is to use the SUMPRODUCT function. “Product” is another name for the value returned from a multiplication operation. So the SUMPRODUCT returns the sum of multiplication products.

Instead of typing out:

**= (C3*B3) + (C4*B4) + (C5*B5)
**

You can enter:

**= SUMPRODUCT (C3:C5, B3:B5)
**

.

The SUMPRODUCT function will multiply the values in C3:C5 with the values in B3:B5, then return the sum of the products.

.

Here is the same weighted average using the SUMPRODUCT function.

.

Well, that’s it.

I’m off to the buffet.

I can’t wait to increase my weighted average weight while listening to Tom Waits.

Nice post, Mike. All that great good-looking work deserves at least one comment, so here it is:

You should consider putting this series in a book.

Here, I’ve got a title for you:

Excel for Constipated Mathematicians: please don’t work this out with a pencilðŸ˜‰

Thanks Jeff for putting a dollar in my guitar case as I play my happy little tune in the subway.

I enjoyed this. Thank you for posting. And I get to apply this lesson to my daily tasks.

Nice clear explanation of the importance of using weighted averages.

Could you comment on using percents of percents? For example, if profitability went from 5 out of a 100 or 5% to 6 out of a 100 or 6% is it meaningful or misleading to say profitability increased 20% – from 5% to 6%?

Mathematically 6%/5%-1 does indeed amount to a 20% variance. I don’t think it’s incorrect to say that profits increased 20%.

However, it could be misleading. I personally would say that profits increased by 1 percentage point, resulting in X dollars of additional profit.

Superb explanation and examples. I am an alien when it comes to statistics, but these topics makes me very confident!!!