Understanding Standard Deviation

Standard Deviation is one of those statistical terms thrown around the corporate world with vague abandon. Many business analysts don’t truly understand the concept of Standard Deviation. If you’re one of those folks, you can stop living the lie. In today’s Pulitzer worthy post, you’ll learn how this underestimated statistical measure can help you better understand the data you’re working with.

.

What Standard Deviation Measures

Imagine you supervise two deli managers who sell bacon. Since you don’t want any bacon to be wasted, it’s important that these two managers hold a steady inventory. In an effort to measure how well they manage inventory, you decide to analyze the boxes of bacon each manager ordered in the last six weeks. Taking the average of the last 6 weeks shows that each manager orders an average of around 32 boxes of bacon per week. On the surface, the averages make it look like they are performing equally.


But if you look closer, you’ll see that one of the managers has weekly orders of 22, 34, 58, 52, 10 and 21 boxes. For this manager, the average may be mathematically correct, but it hides the volatility of his weekly orders. In other words, sometimes the average of a dataset doesn’t do a good job representing the data. This is where Standard Deviation comes in.

.

Standard Deviation gives you a sense of how dispersed (spread out) the data in your sample is from the Mean (Average). Said another way – it lets you know if you can rely on the Mean to give you a meaningful representation of the data.

.

In our example, we use the STDEV function in Excel to give us Standard Deviation along with our Mean.

In the case of the first manager, the Standard Deviation is 2. This tells us that each data point in the sample sits an average distance of 2 statistical data points from the Mean (Average). Is that good? Well, think of it this way – a Standard Deviation of 0 would say every data point is exactly equal to Mean of the sample (32.3 in this case). So a Standard Deviation of 2 is not far off from that, indicating that a majority of data points are positioned extremely close to the Mean. The closer the Standard Deviation is to 0, the more reliable the Mean is. More than that though, Standard Deviation close to 0 tells us that there is very little volatility in the sample. With a Stanadard Deviation of 2, the first manager’s weekly orders are remarkably consistent.

.

In the case of the second manager, the Standard Deviation is 18.9. The average distance each data point is from the Mean is 18.9 statistical data points. That’s a huge spread! The further away a Standard Deviation is from 0, the less accurate the Mean for that sample. In this case, a Standard Deviation of 18.9 alerts us that the Average shown for this manager (32.8 boxes per week) is just not reliable. It also indicates that this manager’s weekly orders are extremely volatile. Of course, with only six data points, you can confirm the volatility with your eyeball.

.

That is basic Standard Deviation in a nutshell. Although it doesn’t get the attention afforded to other statistical measures (Mean, Median, Mode, etc.), Standard Deviation is actually critical to many statistical calculations. An understanding of how Standard Deviation works will pave the way for you to do things like: determine the volatility of a stock, normalize comparisons between datasets, identify outliers, create standardized z-scores, and much more.

.

How Standard Deviation is Calculated

OK, we know what Standard Deviation shows. Let’s now take a look at how it’s actually calculated.

We’ll start with this set of numbers (in black). As you can see, I’ve already used the STDEV function to calculate the Standard Deviation of 21.6 (in orange).

.

Here are the steps Excel took to calculate 21.6 as the Standard Deviation.

Note these steps are purposefully visual to better relay what’s going on. In reality, all this stuff happens instantly behind the scenes.

.

First, Excel calculates the Mean (the Average) for the Sample. In this case, the Mean is 40. It then calculates the difference between each data point and 40. For instance, the difference between 50 and 40 is +10 statistical data points. The difference between 10 and 40 is -30.

.

.

The next thing Excel does is Squares those differences so that all the differences will be a positive number (+10 would become 100; -30 would become +900).

.

.

Excel then adds up all the Squared Differences to get the Total Squared Differences.

 

 

Next, Excel uses the Total Squared Difference to calculate the Sample Variance. This is done by dividing the Total Squared Differences by the count of data points in the sample minus 1. In this example the count of data points is 7, so we divide the Total Squared Differences by 7-1.

.

.

Finally, Excel calculates the square root of the Sample Variance. This square root becomes our Standard Deviation.

.

.

Population Standard Deviation vs Sample Standard Deviation

In statistics, you’ll often hear the terms Population and Sample. These terms refer to the completeness of the data in your possession. The differences between the two are sometimes not all that clear.

.

If you’re using a complete dataset, you’re using a population. An example of a population would be if you were analyzing the time in service for all the sales reps in your company. You would have the data for all the sales reps that exist in your company; a complete population. 

.

If you are using a partial set of data, or a subset of data, you’re dealing with what is called a sample. An example would be if you were analyzing sales data for one quarter of a year. A quarter is merely a subset of an entire year’s data, so in this case, you’re working with a sample.

.

The reason this population and sample designation matters is because the calculation for Standard Deviation changes slightly depending on the nature of the data you’re dealing with. Specifically, the way you calculate the Sample Variance changes. Remember in Step 4 of the calculation steps above? We said that Excel divides the Total Squared Differences by the count of data points in the sample minus 1. Well, this only applies if your data is a sample (a subset of a bigger data pool).

.

If your data is made up of the entire population, the calculation in Step 4 changes to divide the Total Squared Differences by the complete count of data points. In other words, there is no need to subtract 1.

.

This difference in calculation will obviously yield different Standard Deviations.

.

So why the difference in calculation? Well, the difference is not an Excel quirk. It’s an actual statistics tenent called Bessel’s Correction. Bessel’s Correction states that when you use a sample dataset instead of a population, you need to subtract 1 from the count of data points used (written in the statistics world as N-1). This correction accounts for the bias introduced by estimating a Mean using a subset of data instead of using the true population Mean. The reasoning behind the need for this correction is admittedly a bit difficult for us non-math geeks to wrap our brains around. I’ll try to explain it in my simplified understanding.

.

When you use a sample, you’re using a subset of data chosen from the true population. In this case, you won’t have the benefit of using the true population Mean. You will need to estimate a new Mean based on data you grabbed in the sample. This estimated Mean is already biased towards fitting the data chosen in the sample, so you need to exclude that one point (the estimated Mean) from calculating the Sample Variance. This is called losing a degree of freedom. Another way to look at it is this. If I said you could have five variables in an Excel formula, but one of them would have to be used to calculate the Mean, you would have only four variables available to you (5-1). The fact that you have to calculate Mean loses you a degree of freedom.

.

If you use a population set instead of a sample set, you have the benefit of the true population Mean. So you’re not losing a degree of freedom by being forced to introduce an estimated Mean. Thus there is no need to subtract 1.

.

.

Standard Deviation Formulas in Excel

Excel has the ability to handle Standard Deviation calculations for both population and sample datasets. Simply click in any cell and start to enter a =STDEV. You’ll see a tool-tip dropdown that gives you, what seems to be, a ridulous number of Standard Deviation functions.

Here’s a quick rundown of what each function does.

  • STDEV: Calculates Standard Deviation for a sample using Bessel’s Correction (N-1).
  • STDEVP: Calculates Standard Deviation for a population.
  • STDEV.S: Calculates Standard Deviation for a sample using Bessel’s Correction (N-1). This function technically replaces the STDEV function.
  • STDEV.P: Calculates Standard Deviation for a population. This function technically replaces the STDEVP function.
  • STDEVA: Calculates Standard Deviation for a sample using Bessel’s Correction (N-1). Allows for text and TRUE/FALSE values.
  • STDEVPA: Calculates Standard Deviation for a population. Allows for text and TRUE/FALSE values.

.
I can’t imagine any scenario where I would use the STDEVPA and STDEVA functions. I think you can safely ignore those.
.
As far as I can tell, there is no discernible difference between STDEV.S and STDEV. Microsoft says you should move toward the newer STDEV.S function, but you can technically use those two functions interchangably.
.

Likewise, I don’t see any discernible difference between STDEV.P and STDEVP. Although STDEV.P is the newer function, you can use those two functions interchangably.

.

.

Until next week…Happy Mathing!

36 thoughts on “Understanding Standard Deviation

  1. Roger Govier

    Very nice article Mike.
    I wish this had been around when I did Statistics as part of my Economics at University. I did manage to grasp to grasp the techniques back then – but this would have made life so much easier.
    Of course, life would have been much easier if I had Excel available to me back in 1963!!!

  2. Doris Choo

    Sorry using 2 thumbs typing with my IPAD. Click wrong button.
    I meant – put my old stat teacher to shame. All the other Stat books used in college are full of mumbo jumbo, make the subject more complicating than it seems, most all hard to absorb.
    Thanks Mike you are the Best!

  3. Peter Bedson

    Generally, if N> about 20 you can ignore Bessel’s Correction completely because the sample mean and the population mean are effectively the same.

  4. john k

    Very nice explanation – simple yet effective.

    One thing I would add to the standard deviation section is Relative St Dev, where
    RSD(%) = (stdev/mean)*100
    This makes the stdev more meaningful with respect to your mean. Sure, stdev = 2 is close to zero, but if your mean is around 5 (RSD = 40%), your data is much more disperse than if your mean is around 40 (RSD = 5%).

  5. Brad J

    Excellent summary, wish I had you at my school.

    One thing to considering your example is you have to understand the context of the data. The Deli managers are different in how they order with the similar average is what the data tells us. If you know more context about the operation the data may tell you more. Maybe the sales target is 32 boxes and one manager just orders close to that without knowing if it will impact inventory and the other manager is compensating for the first by balancing sales and inventory and doing a better gob of forecasting sales. Alternatively maybe the second manager orders wildly for other reason and isn’t the better performer. Without knowing the context of the data all we can say is they order significantly different but not sure of the reasons or which is better.
    Excellent Summary once again.

  6. Nick Leifeld

    Very nice explanation, I wish Excel had a STDEV function for calculating Cpm in which the average variation around he Target is calculated and not the average variation around the Mean

  7. Gabby

    Thanks for the explanation.
    It helps me a lot to differentiate STDEV from MEAN and how to relate it to my job.

  8. Sivakumar Viswanathan

    This is the simplest & clearest way to explain SD. Wish I had this 2 decades ago. But I come across SD on a daily basis as a quality evaluation tool for our incoming raw materials. I hope many of the technical experts in manufacturing refer to SD without actually grasping the underlying inference

    Now I understand exactly what I am talking about. Also wonder how I topped Statistical Quality Control paper during my engineering days 22 years back. Thanks for this article

  9. Anna

    Million thanks for this great explanation, you made my day! I am just writing my Master’s thesis and was in trouble understanding basic statistics, now I am a step further down on the road! I will share this article with my classmates! Wish you good work and looking forward reading more great articles!

  10. Lilian Salan

    Now I understand the STANDARD DEVIATION. The discussion, explanation, and examples are very clear and simple. Its give real situation for the example.

    Thank you so much!

  11. Nandy

    Bravo, this is the brief and clear explanation i have seen. straight to the point. i wish you could be my stat teacher. i give you five stars.

  12. Samuel David

    Very simple and easy understand example to comprehend Standard Deviation.
    Thanks a lot.
    Regards,
    Samuel David

  13. Samuel David

    Very simple and easy to understand example to comprehend Standard Deviation.
    Thanks a lot.
    Regards,
    Samuel David

  14. Ignacio

    say for example i’ve determined the standard deviation of my samples, and found out the values that are in the extremes (greater than or less than the: mean+ or – stdev), what am i going to do next? delete the extreme values? because by doing so it will affect my existing standard deviation..

  15. Ouattara Sitafa

    Do you have any post like this one on Analyse of Variance (ANOVA) and t-test?

  16. Nik Aidi

    Hi very interesting indeed.

    However, out of curiosity, the 2 deli managers sample you gave. What will be the recommendation or advise to the managers?

    To advise the second manager to order consistently, at an order of 30++ boxes at a time?

    The 1st manager manage his ordering better?

Leave a Reply

Your email address will not be published. Required fields are marked *