Standard Deviation is one of those statistical terms thrown around the corporate world with vague abandon. Many business analysts don’t truly understand the concept of Standard Deviation. If you’re one of those folks, you can stop living the lie. In today’s Pulitzer worthy post, you’ll learn how this underestimated statistical measure can help you better understand the data you’re working with.

.

**What Standard Deviation Measures**

Imagine you supervise two deli managers who sell bacon. Since you don’t want any bacon to be wasted, it’s important that these two managers hold a steady inventory. In an effort to measure how well they manage inventory, you decide to analyze the boxes of bacon each manager ordered in the last six weeks. Taking the average of the last 6 weeks shows that each manager orders an average of around 32 boxes of bacon per week. On the surface, the averages make it look like they are performing equally.

But if you look closer, you’ll see that one of the managers has weekly orders of 22, 34, 58, 52, 10 and 21 boxes. For this manager, the average may be mathematically correct, but it hides the volatility of his weekly orders. In other words, sometimes the average of a dataset doesn’t do a good job representing the data. This is where Standard Deviation comes in.

.

Standard Deviation gives you a sense of how dispersed (spread out) the data in your sample is from the Mean (Average). Said another way – it lets you know if you can rely on the Mean to give you a meaningful representation of the data.

.

In our example, we use the **STDEV** function in Excel to give us Standard Deviation along with our Mean.

In the case of the first manager, the Standard Deviation is 2. This tells us that each data point in the sample sits an average distance of 2 statistical data points from the Mean (Average). Is that good? Well, think of it this way – a Standard Deviation of 0 would say every data point is exactly equal to Mean of the sample (32.3 in this case). So a Standard Deviation of 2 is not far off from that, indicating that a majority of data points are positioned extremely close to the Mean. The closer the Standard Deviation is to 0, the more reliable the Mean is. More than that though, Standard Deviation close to 0 tells us that there is very little volatility in the sample. With a Stanadard Deviation of 2, the first manager’s weekly orders are remarkably consistent.

.

In the case of the second manager, the Standard Deviation is 18.9. The average distance each data point is from the Mean is 18.9 statistical data points. That’s a huge spread! The further away a Standard Deviation is from 0, the less accurate the Mean for that sample. In this case, a Standard Deviation of 18.9 alerts us that the Average shown for this manager (32.8 boxes per week) is just not reliable. It also indicates that this manager’s weekly orders are extremely volatile. Of course, with only six data points, you can confirm the volatility with your eyeball.

.

That is basic Standard Deviation in a nutshell. Although it doesn’t get the attention afforded to other statistical measures (Mean, Median, Mode, etc.), Standard Deviation is actually critical to many statistical calculations. An understanding of how Standard Deviation works will pave the way for you to do things like: determine the volatility of a stock, normalize comparisons between datasets, identify outliers, create standardized z-scores, and much more.

.

**How Standard Deviation is Calculated**

OK, we know what Standard Deviation shows. Let’s now take a look at how it’s actually calculated.

We’ll start with this set of numbers (in black). As you can see, I’ve already used the STDEV function to calculate the Standard Deviation of 21.6 (in orange).

.

Here are the steps Excel took to calculate 21.6 as the Standard Deviation.

Note these steps are purposefully visual to better relay what’s going on. In reality, all this stuff happens instantly behind the scenes.

.

First, Excel calculates the Mean (the Average) for the Sample. In this case, the Mean is 40. It then calculates the difference between each data point and 40. For instance, the difference between 50 and 40 is +10 statistical data points. The difference between 10 and 40 is -30.

.

.

The next thing Excel does is Squares those differences so that all the differences will be a positive number (+10 would become 100; -30 would become +900).

.

.

Excel then adds up all the Squared Differences to get the **Total Squared Differences**.

Next, Excel uses the Total Squared Difference to calculate the **Sample Variance**. This is done by dividing the Total Squared Differences by the count of data points in the sample minus 1. In this example the count of data points is 7, so we divide the Total Squared Differences by 7-1.

.

.

Finally, Excel calculates the square root of the Sample Variance. This square root becomes our Standard Deviation.

.

.

**Population Standard Deviation vs Sample Standard Deviation**

In statistics, you’ll often hear the terms Population and Sample. These terms refer to the completeness of the data in your possession. The differences between the two are sometimes not all that clear.

.

If you’re using a complete dataset, you’re using a **population**. An example of a population would be if you were analyzing the time in service for all the sales reps in your company. You would have the data for all the sales reps that exist in your company; a complete population.

.

If you are using a partial set of data, or a subset of data, you’re dealing with what is called a **sample**. An example would be if you were analyzing sales data for one quarter of a year. A quarter is merely a subset of an entire year’s data, so in this case, you’re working with a sample.

.

The reason this population and sample designation matters is because the calculation for Standard Deviation changes slightly depending on the nature of the data you’re dealing with. Specifically, the way you calculate the Sample Variance changes. Remember in Step 4 of the calculation steps above? We said that Excel divides the Total Squared Differences by the count of data points in the sample minus 1. Well, this only applies if your data is a sample (a subset of a bigger data pool).

.

If your data is made up of the entire population, the calculation in Step 4 changes to divide the Total Squared Differences by the complete count of data points. In other words, there is no need to subtract 1.

.

This difference in calculation will obviously yield different Standard Deviations.

.

So why the difference in calculation? Well, the difference is not an Excel quirk. It’s an actual statistics tenent called Bessel’s Correction. Bessel’s Correction states that when you use a sample dataset instead of a population, you need to subtract 1 from the count of data points used (written in the statistics world as N*-1*). This correction accounts for the bias introduced by estimating a Mean using a subset of data instead of using the true population Mean. The reasoning behind the need for this correction is admittedly a bit difficult for us non-math geeks to wrap our brains around. I’ll try to explain it in my simplified understanding.

.

When you use a sample, you’re using a subset of data chosen from the true population. In this case, you won’t have the benefit of using the true population Mean. You will need to estimate a new Mean based on data you grabbed in the sample. This estimated Mean is already biased towards fitting the data chosen in the sample, so you need to exclude that one point (the estimated Mean) from calculating the Sample Variance. This is called losing a degree of freedom. Another way to look at it is this. If I said you could have five variables in an Excel formula, but one of them would have to be used to calculate the Mean, you would have only four variables available to you (5-1). The fact that you have to calculate Mean loses you a degree of freedom.

.

If you use a population set instead of a sample set, you have the benefit of the true population Mean. So you’re not losing a degree of freedom by being forced to introduce an estimated Mean. Thus there is no need to subtract 1.

.

.

**Standard Deviation Formulas in Excel**

Excel has the ability to handle Standard Deviation calculations for both population and sample datasets. Simply click in any cell and start to enter a =STDEV. You’ll see a tool-tip dropdown that gives you, what seems to be, a ridulous number of Standard Deviation functions.

Here’s a quick rundown of what each function does.

**STDEV:** Calculates Standard Deviation for a sample using Bessel’s Correction (*N-1*).
**STDEVP:** Calculates Standard Deviation for a population.
**STDEV.S**: Calculates Standard Deviation for a sample using Bessel’s Correction (*N-1*). This function technically replaces the STDEV function.
**STDEV.P**: Calculates Standard Deviation for a population. This function technically replaces the STDEVP function.
**STDEVA**: Calculates Standard Deviation for a sample using Bessel’s Correction (*N-1*). Allows for text and TRUE/FALSE values.
**STDEVPA**: Calculates Standard Deviation for a population. Allows for text and TRUE/FALSE values.

.

I can’t imagine any scenario where I would use the STDEVPA and STDEVA functions. I think you can safely ignore those.

.

As far as I can tell, there is no discernible difference between STDEV.S and STDEV. Microsoft says you should move toward the newer STDEV.S function, but you can technically use those two functions interchangably.

.

Likewise, I don’t see any discernible difference between STDEV.P and STDEVP. Although STDEV.P is the newer function, you can use those two functions interchangably.

.

.

Until next week…Happy Mathing!