Why Excel has Multiple Quartile Functions and How to Replicate the Quartiles from R and Other Statistical Packages

I’m finishing up a book and I’ve been busy, but I’m starting to feel guilty about not posting anything new over the last two week. I guess that means my blog ears are burning.

.

Today I want to talk about reasons why Excel has multiple Quartile functions (QUARTILE, QUARTILE.INC and QUARTILE.EXC).  I also want to share with you a User Defined Function that will replicate the Quartile calculations found in R, SAS, Minitab and other statistical packages.

.

Before I get started, I want to point out Jon Peltier’s very excellent posts on the topic:

Hinge Techniques for Determining Quartiles, Interpolation Methods of Determining Quartiles. Unlike Jon, I’m a rock with lips when it comes to statistics. So I lifted heavily from his articles to make myself look smarter that I am. Don’t judge me.

.

Quartile Basics

A quartile is essentially a division of data into four defined intervals based upon the values of the data and how they compare to the entire set. For example, if we had a population of 10 numbers (see screenshot below); we could sort them smallest to largest, and then manually draw lines to separate the data set into four even divisions.

These sections would each contain 25% of the entire population. Values falling below 25% of all the data analyzed is said to be in the 1st quartile. Values falling between 25.1% and 50% are in the 2nd quartile. Values falling between 51% and 75% belong to the in the 3rd quartile. And the remaining values are allocated in the 4th quartile.

.

When you ask Excel to give you a quartile, it returns only one number. That number effectively represents the demarcation between quartiles.

.

In this example, Q2 (quartile 2) is 12. Although we don’t see a value in the dataset called 12, the demarcation line for the middle of the population of number is located directly between 11 and 13 (what would be the number 12 if we could see it – think of it as trying to take even measurements across a strange ruler where some of the numbers are missing). This means that 50% of the values fall before 12 and 50% fall after 12. Q1(quartile 1) is a bit more difficult to see, it would be about 5.5 because the demarcation line for 25% of the population is sitting about where 5.5 would be. This means that 25% of the values are less than or equal to 5.5.

.

This concept of estimating the quartile demarcation value is called interpolation.

..

Why Does Excel have Multiple Quartile Functions?

Obviously, we would never be able to accurately find quartiles by manually drawing silly lines through our data sets. There are interpolation methods that help us find the quartiles mathematically. Let’s take a look at how this works.

.

Imagine that we take our values and place them side by side. Then underneath, we add a kind of index that gives each value a position. This index can be considered to be an interpolation basis. The screenshot below shows a type of interpolation called N-1 basis. N stands for the count of values and -1 indicates where we start indexing. In this scenario we have 10 values, but we start the interpolation at 0. So we go from 0 to 9.

To find out where we draw the line for the First quartile, we can calculate .25*(N-1) or .25*(10-1).

To find out where we draw the line for the Second quartile, we can calculate .50*(N-1) or .50*(10-1).

To find out where we draw the line for the Third quartile, we can calculate .75*(N-1) or .75*(10-1).

We get the answers Q1 = 2.25, Q2=4.5, Q3=6.75.

This means that the Q1 line is drawn at position 2.25 on our basis index, Q2 is drawn at position 4.5, and Q3 is drawn at position 6.75. Once we have our positions, we can estimate which values from the value line most directly sits at those positions.

Of course all of this arithmetic is done programmatically via Functions. In Excel, the QUARTILE and QUARTILE.INC function would give you the answers you see in the screenshot above. Both of these are calculated on an N-1 basis, so they will give you the same answer. Note that QUARTILE.INC technically replaces the QUARTILE function, but QUARTILE has been kept in Excel for backwards compatibility.

 .

QUARTILE.INC was added in 2010 as an alternative to the QUARTILE.EXC function which was also added in 2010...

.

The QUARTILE.EXC function calculates quartiles differently. It uses the N+1 basis. With this basis, the values are offset to start at position 1 and we add 1 to N (the count of values).

To find out where we draw the line for the First quartile, we can calculate .25*(N+1) or .25*(10+1).

To find out where we draw the line for the Second quartile, we can calculate .50*(N+1) or .50*(10+1).

To find out where we draw the line for the Third quartile, we can calculate .75*(N+1) or .75*(10+1).

We get the answers Q1 = 2.75, Q2=5.5, Q3=8.25. Again, these indexes help estimate which values sit at those positions.

.

.

Now let’s compare the results we get using the QUARTILE.EXC function and the QUARTILE function.

A couple of things to note:

1. The 2nd quartile will always be the same between any quartile methods you use. The 2nd quartile is effectively the Median so that is locked.

2. QUARTILE.EXC excludes the median (middle number) from the population. This function results in quartiles that are a further from the center of the whole. This gives us a better estimate of actual population and gives us a potentially more accurate view of what values should be considered outliers.

3. Both the QUARTILE and QUARTILE.INC include the median (middle number) when determining interpolation. These functions are handy if you require relatively symmetrical quartiles. However, these functions shrink your interquartile range (distance between the upper and lower quartiles), making it more difficult to identify true statistical outliers.

 

.

No Standardized Way to Get to a Quartile?

In 1996 the team of Rob J. Hyndman and Yanan Fan published an article in the American Statistician titled “Sample Quantiles in Statistical Packages”. In it, they evaluated various definitions and methodologies for estimating Quantiles (a representation of quartile). Their goal was to point to a standard methodology that would be adopted by statistical software vendors so that “the same answers are produced by different packages and within each package”. In their article, they outlined nine different methods for the calculation quantiles. Of those nine, 6 are used in statistical software packages for continuous datasets. The matrix below shows each statistical software vendor and the Hyndman-Fan methods they employed before Office 2010. Note that R and Maple employ the full range of methods based on optional settings. Also note that Excel was the only vendor who did not offer a mechanism to calculate using Method 6 (N+1).

Before Office 2010          
  

Excel

R

SAS

Minitab

Maple

JMP

Method 4
(Linear CDF)

  

type=4

PCTLDEF=1

  

method=3

  

Method 5
(Piecewise Linear)

  

type=5

  

  

method=4

  

Method 6
(N+1)

  

type=6

PCTLDEF=4

Default

method=5

Default

Method 7
(N-1)

QUARTILE

type=7

  

  

method=6

  

Method 8
(N+1/3)

  

type=8

  

  

method=7

  

Method 9
(N + 1/4)

  

type=9

  

  

method=8

  

.

.

With the release of Microsoft Office 2010, Excel finally offered its users Method 6 (N+1 basis) via the QUARTILE.EXC function. Although it went largely unnoticed by many analysts, this is actually an incredible add. Not only does it offer some flexibility into how quartiles are calculated, but it brings us as close as we’ll ever get to the Hyndman-Fan dream of a standard method to calculate quantiles. Using QUARTILE.EXC, our quartile calculations will now match those in Minitab or JMP, or any other statistical package that has Method 6 available.

.

By the way, when Hyndman-Fan concluded their article, they recommend Method 8 be adopted as the standard. According to the two authors, with Method 8 “quantile estimates are approximately median-unbiased regardless of the distribution, making this method the most viable for calculating quantiles”. Sorry fellas, looks like Method 6 will be the first “standard” method across applications.

 

Replicating the Quartile Calculations for any Method using VBA

I’ll end this post with a User Defined Function that allows you to replicate any of the six Hyndman-Fan methods listed in the table above. No matter what statistical package you encounter, if their quartile calculation is based on any one of these methods, you can replicate their quartile this function. Also, if you have Excel 2007, you won’t have the QUARTILE.EXC function available to you. You can use this User Defined Function and get the same functionality.

.

This User Defined Function is based on a procedure originally posted by Jerry W. Lewis (a former Excel MVP) in 2008. I kept the core math, which I don’t truly understand (Don’t Judge Me), and built around it. Simply copy this code to a standard module.

  1. Function Quantile(MyRange As Range, p As Double, Optional m As Variant)
  2. 'Mike Alexander: www.datapigtechnologies.com
  3. 'Based on Code originally posted by Jerry W. Lewis (former Excel MVP)
  4. '***************************************************************************
  5. 'This function will replicate various quantile calcuations found in statistical software packages.
  6. 'Calculation is determined by the Hyndman-Fan method used.
  7.  
  8. 'Hyndman-Fan Method 4 Replicates: SAS(PCTLDEF=1), R(type=4), Maple(method=3)
  9. 'Hyndman-Fan Method 5 Replicates: R(type=5), Maple(method=4)
  10. 'Hyndman-Fan Method 6 Replicates: Excel(QUARTILE.EXC), SAS(PCTLDEF=4), R(type=6), Minitab, SPSS, BMDP, JMP, Maple(method=5)
  11. 'Hyndman-Fan Method 7 Replicates: Excel (QUARTILE and QUARTILE.INC), R(type=7), S-Plus, Maxima, Maple(method=6)
  12. 'Hyndman-Fan Method 8 Replicates: R(type=8), Maple(method=7)
  13. 'Hyndman-Fan Method 9 Replicates: R(type=9), Maple(method=8)
  14. '***************************************************************************
  15. 'Call function from Excel Spreadhseet by entering
  16. '=Quantile(Range, p, m)
  17.  
  18. 'Enter p as the fraction of the population (.25 for quartile 1, .75 for quartile 3, etc....)
  19. 'Enter m as the Hyndman-Fan Quantile method number (4, 5, 6, 7, 8 or 9)
  20. 'If m is left blank, the function will use method 6 by default
  21. '***************************************************************************
  22.  
  23. Dim n As Long
  24. Dim i As Long
  25. Dim QDef As Double
  26. Dim x As Double
  27.  
  28. 'Identify method and set the interpolation basis used
  29. Select Case m
  30. Case Is = 4
  31. QDef = 0
  32. Case Is = 5
  33. QDef = 0.5
  34. Case Is = 6
  35. QDef = p
  36. Case Is = 7
  37. QDef = 1 - p
  38. Case Is = 8
  39. QDef = (p + 1) / 3
  40. Case Is = 9
  41. QDef = (p + 1.5) / 4
  42. Case Else 'Use Hyndman-Fan 6 by default
  43. QDef = p
  44. End Select
  45.  
  46. 'Count values within MyRange and calculate the required position index
  47. n = WorksheetFunction.Count(MyRange)
  48. x = n * p + QDef
  49. i = WorksheetFunction.Max(WorksheetFunction.Min(Fix(x), n), 1)
  50.  
  51. 'Perform interpolation and return answer
  52. If (x - i) >= 0 And i < n Then
  53. Quantile = (1 - (x - i)) * WorksheetFunction.Small(MyRange, i) + (x - i) * WorksheetFunction.Small(MyRange, i + 1)
  54. Else
  55. Quantile = WorksheetFunction.Small(MyRange, i)
  56. End If
  57.  
  58. End Function

Notes: I tested this function in Excel with Hyndman-Fan Methods 6 and 7. It seems to fully replicate the results I get with QUARTILE.EXC and QUARTILE.INC. I don’t have R, SAS, or any other statistical package so I can’t fully test the other four methods (4, 5, 8, and 9). Be sure to do some self-testing if you plan on using this function. Feel free to let me know if you find that it needs tweaks.

 

One you’ve pasted the code into a standard module in your workbook, you can enter this formula syntax:

=Quantile (Range, p, m)

Enter p as the fraction of the population (to get quartile 1 enter .25, to get quartile 3 enter .75, etc…)

Enter m as the Hyndman-Fan method number (4, 5, 6, 7, 8 or 9)

.

Here’s a table showing the function calculating quartiles using every method. Notice how Method 8 (Hyndman and Fan’s recommended method) calculates quartiles that fall nicely between Method 6 and Method 7. It does indeed, look like Method 8 gives us a balanced set of quartiles.

..

Well I guess that’s it.

Now that I’ve bored the math out of you, have a great weekend!

17 thoughts on “Why Excel has Multiple Quartile Functions and How to Replicate the Quartiles from R and Other Statistical Packages

  1. Jon Peltier

    Watch out for those funny row heights. I couldn’t figure out why most of the methods returned 12.00 even though the 5th of 11 values was 11.00. Finally I realized that row 11 was too short to display its value.

    Method 4 is strange, in that it’s not symmetric like the rest.

  2. datapig Post author

    Jon: I noticed that too. Method 4 and 5 calculate straight linear estimations of quartile. That’s why they seem to be lopsided. I’m sure there’s a use for these, but I can’t imagine what.

  3. Brenda Myers

    Thanks for explaining the difference between the Quartile.exc & Quartile.inc functions. The inscrutable explanation from Microsoft in Excel help is just another example of Microsoft blowing off the customer which is normal these days.

  4. Jason G

    This is great. I often do conditional quartiles on large datasets. Is there a way to speed those up. Should I be taking the data into vba building my array with the conditions and then calculating it.

  5. Luis

    Is it possible to do this in MS Access with the VBA written above? I’m trying to calculate the different quartiles for a numeric field in one of my queries/table. I’ve coded into my database, but I keep getting errors, one for each row in the field. I think the issue is declaring the field as a range. Thanks.

  6. Gagandeep Singh

    Using the code, I am not getting the results as shown in the post.

    I changed the following:

    If (x – i) >= 0 And i < n Then

    to:

    If (x – i) >= 0 And i < n Then

    So,

    Is it that I am missing something here? and not getting the right results?

  7. Ulrike

    Thank you for this very good explanation. Before I read your article, I was ripping my hair off in despair over different results.

  8. MJD

    Thanks for this explanation. I teach an intro math class where the quartiles are defined in the book as the median of the upper and lower halves of the data(excluding the median if it is in the data set) and now see that as I had guessed I need to use the mean of the two quartile functions in Excel to replicate what the book is asking them to do.

  9. Katy

    Thank you for this explanation – really wasn’t clear in Excel what the difference between the two Excel functions was and I was in danger of marking all my students wrong….

  10. DaleW

    Excel has multiple quartiles because their original quartiles disagreed with everyone elses. If your population of interest is 10 numbers, 9 of 10 high school math teachers agree that the two quartiles are simply the median of the upper 5 and of the lower 5 numbers. Thus, the 3rd (5) and the 8th (19) for your ten prime numbers. Simple & intuitive. Excel’s N-1 method instead of an N-based method just muddied the waters.

    Granted, it gets much more complicated when we move from simple descriptive statistics to inferential statistics for population parameters given a sample. Then, to be as precise as possible, the most likely InterQuartile Range gets slightly wider, similarly if not quite as simply as going from an N-based population stdev.p() to an N-1 inferential sample stdev.s(). It turns out that the classic N+1 interpolative method 6 averages out to the correct population IQR for a uniform distribution, but the adjustment is less for most other distributions. Unfortunately when Excel added the useful N+1 interpolative method 6, they further confused everyone by calling them exclusive, when that already refered to a variation on N-based methods for odd population sizes where teachers couldn’t agree whether to exclude the median or not. Thanks, Microsoft.

  11. Pingback: Como o Excel calcula os Quartis? – Aprendendo Gestão

Leave a Reply

Your email address will not be published. Required fields are marked *