Array Formula Heresy

July 22, 2009 by datapig Leave a reply »

If you want to commit Excel heresy, tell people you don’t like Array formulas. Oh the blasphemy – the sacrilege!

The truth is I really do try to avoid using array formulas when there is a reasonable non-array alternative. The long-and-short of it is that I think array formulas are difficult to understand and too easy to break.

I’m man enough to admit that at least part of the reason I avoid array formulas is because I’m not smart enough to understand all of the array formulas I see. In fact, I have to wrap duct tape around my head before looking at some of the fancy array creations I see other gurus employ.

I remember Bill Jelen once told me that Aladin Akyurek creates the most amazing array formulas to solve most problems. He tells people that if they ever get a formula from Aladin, just copy and paste it into the cell. Don’t worry how it works…it will just work.

While I appreciate the charm of Bill’s quip, it would frustrate me on some level to have a formula I don’t fully understand working in my spreadsheet.

Also array formulas can be broken so easily. Enter a cell with an array formula and tab out of it. Poof, it’s gone. In order to restore it, you have to enter Shift+Ctrl+Enter.

Let’s face it, the vast majority of the Excel users don’t know (and probably don’t care) how array formulas work. Chances are they won’t know the magical Shift+Ctrl+Enter maneuver. So why would I fill their spreadsheets with annoyingly delicate array formulas that can be broken with a simple action such as entering the cell.

Let me go through a classic example:

.

I need to get an average of the numbers shown in column B, but the AVERAGE function is including the zeros, giving me 54 as the answer. I need the average of all the numbers that are not zero.

arrayheresy1

.
.
So I decide to use the simple {=AVERAGE(IF(B1:B15,B1:B15))} array formula. This gives me the correct answer. However, for the average Excel user, It’s a mystery exactly how Excel knows to exclude the zeros using this formula. This formula doesn’t even have a zero in it – how does Excel know what you need it to do?
arrayheresy2

.
.
Now let’s say I’m a curious monkey and I decide to enter the cell, just to see how the formula is working.

arrayheresy3

.
.
This looks good so I exit the cell. As soon as I exit the cell, the formula errors out. It seems as though my douchery has broken the formula. Again, this can be fixed re-entering the cell and pressing Shift-Ctrl-Enter, but some people won’t know that.

arrayheresy4

.
.
So my clever array solution is essentially an accident waiting to happen.

.
.
A non-array solution to this problem is to enter the formula:

=SUM(B1:B15)/MAX(1,COUNT(B1:B15)-COUNTIF(B1:B15,0))

This formula provides the same answer and it won’t hand me a turd if I simply enter the cell and exit it – without having to enter the mystical Shift+Ctrl+Enter. Also anyone with reasonable Excel and math skills can break apart this formula to determine exactly how it works.

arrayheresy5

.
.
Another bonus is that my formula won’t break when text is introduced into the range. Not so for the array formula.

arrayheresy6

.
.
All that being said, I have to say I completely respect the things some people can accomplish with array formulas. I’m sure many of you can point out situations where an array formula is the only solution. When I encounter those situations, I’ll put my big-girl panties on and I’ll use the array solution. Until then, I’ll stick to my non-array alternatives.

Now if you’ll excuse me, I’ve got to go buy a helmet.

Advertisement

16 Responses

  1. gary says:

    just wondering why this wouldn’t work.

    =SUM(B1:B15)/COUNTIF(B1:B15,”>0″)

  2. DataPig says:

    Gary: That formula will error out if there are no numbers > 0. Not probable, but possible.

    But you’re right in that I could simplify my formula to :

    =SUM(B1:B15)/MAX(1,COUNTIF(B1:B15,”>0″))

  3. mduff says:

    One thing you forgot to add is that array formulas are Memory hogs add about 10 of them on a sheet and a large data set and watch excel crash…. I try to avoid them for that reason
    BTW I also don’t understand a lot of the ones I have used but I’m not that picky I can live with this qoute (and have) ….
    “He tells people that if they ever get a formula from Aladin, just copy and paste it into the cell. Don’t worry how it works…it will just work”

  4. dermotb says:

    In this case, I would probably use one cell for the total and one cell for the count, then it is easy to follow, especially if you add descriptive labels
    B18= SUMIF(B1:B15,”>0?)
    B19= COUNTIF(B1:B15,”>0?)
    B20 = IF(B19=0,0,B18/B19)

  5. sam says:

    There are situations when you cant do away with array formulas

    Ex: To find an Amount next to the Nth instance of a Name in a Table with Names and Amounts where Names are repeated more than once

  6. Another very important disadvantage of array formulas is that they may be much slower than setting up a few columns with intermediate calcs.
    I have once speeded up a workbook of a client by a factor 10 (!) by converting the array formulas to normal columns with formulas calculating intermediate results.
    The new model was also easier to audit.

    That being said, don’t you love array formulas? So elegant to all the calculations of your entire model in a singe cell!

  7. Colin Banfield says:

    In this simple case, AVERAGEIF(B1:B15,”>0″) returns the correct answer. Array formulas can be powerful, especially for extracting data from a range that meets certain criteria. They can be powerful for difficult lookup problems, although I always try to find a SUMPRODUCT solution first in these instances.

    One way to help in understanding array formulas is using the Formula Evaluator (which I use often to debug tricky formulas). For example, in the Formula Evaluator, {=AVERAGE(IF(B1:B15,B1:B15)} displays =AVERAGE({10,20,30,FALSE,50,70,FALSE,FALSE,90,FALSE,FALSE,120,130,140,150}). The AVERAGE function ignores logical values (FALSE in this case).

  8. Tony Rose says:

    Colin – the averageif formula is only available in Excel 2007. Also, your example using it would not work correctly if there were negative numbers in the data set.

    Array formula, IMHO, are the most powerful functions outside of maybe macros and pivots in Excel. By using them you can drastically cut down on the number of columns you need to add to setup the criteria for a single formula. Basically, you take a 5 step formula and do it all in one.

    Yes they are a PITA and if you don’t do the ctrl + shift + enter, they blow up. I am a big fan.

    Good post Mike!

  9. datapig says:

    Jan Karel: “So elegant to do all the calculations of your entire model in a singe cell!”

    Ha. I do think that some people strive to build the most elelgant formulas they can. Unfortunately this makes it difficult to audit their model.

    Not all array formulas are memory hogs, but I get your meaning. I have seen workbooks grind to a halt because of the number of array formulas present.

    Colin: Good point about the Formula Evaluator.

    Tony: I don’t mind risking a few extra helper columns if it will help in making my data model more stable and transparent.

  10. Colin Banfield says:

    Tony, good point about AVERAGEIF and Excel 2007. To account for negative numbers, the AVERAGEIF formula would be changed to AVERAGEIF(B1:B15,”0″).

    When practical, my preference is to use helper columns. However, there are times when adding helper columns aren’t practical or make no difference to the problem at hand. In such situations, array formulas are great. Like good VBA code, I’ve gotten into the habit of documenting tricky formulas (which include virtually all array formulas), either via cell comments or on a separate worksheet.

  11. Jared says:

    I have to lock the array formula cells so the monkeys I work with can’t mess it up.

  12. Colin Banfield says:

    Hmmm, appears that the modified formula didn’t show up correctly in my previous post, so I’ll spell out the condition in caps. AVERAGEIF(B1:B15,”NOT_EQUAL_TO 0″)

  13. Colin Banfield says:

    “I have to lock the array formula cells so the monkeys I work with can’t mess it up.”

    So it’s OK for the monkeys to mess up your non-array formulas? :^)

  14. Bill McIver says:

    Array formulas themselves are not that difficult a concept; they seem to be the Excel formula version of the Visual Basic “For….Next” loop.
    They get complicated (as in the example here) when they contain shorthand boolean formulas.
    All this example does is run through the cells B2 to B16 replacing zeroes with the logical value false and then counting and averaging the “non false” values.
    This would be clearer if you used the more explicit:
    {=AVERAGE(IF(B2:B16<>0,B2:B16,FALSE))}
    Of course they are too easily broken so, for that reason only, are probably best avoided.

  15. JamesDX says:

    Anyone know how to do things like this?

  16. Usually I don’t browse post on blog, however , I wish to say that this write-up incredibly forced me to try and do it! Your posting style has been shocked me. Many thanks, really excellent article.

Leave a Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

*

* Copy this password:

* Type or paste password here:

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>