Generating Normally Distributed Random Numbers – Sample Data that Makes Sense

July 16th, 2014 by datapig 7 comments »

Like many of you, I'm often asked to develop mockups (working demos of dashboards and reports).

In my zeal to make my mockup as realistic as possible, I spend a good bit of time generating sample data.

.

For example, if I'm mocking up an HR dashboard, I'll dedicate some brain power to generating salary data. Now in order to make salary data realistic, there has to be a believable mix of high salary and low salary records. I can't just use the RAND and RANDBETWEEN functions Excel provides. These functions generate random data in a "Uniform Distribution". That is to say, every number has an equal chance of occurring with an equal frequency.

.

Let's say I use this formula and copy it down to 500 rows.

=RANDBETWEEN(1000,200000)

I instantly create 500 salary numbers, but there's a problem.

Plotting my newly created data in a histogram reveals the issue.

In this chart, the columns represent the count of records that fall within each salary range. Notice that there are just as many records in the 191k-200k range as in the 51k-60k range. This is definitely not something that holds true in the real world.

.

In reality, salary data has a distribution closer to the one shown in the histogram below.

There are lots people with salaries between 40k and 80k; and far fewer with salaries above 150k.

So if the RAND and RANDBETWEEN functions fail, how does an aspiring Excel nerd easily create sample data that makes sense for its intended purpose?

.

Well, after years going through the pain of creating sample data myself, I finally had enough and pulled together this function called RandNormalDist. This function, based on code by Donald Knuth, allows you to automatically generate random data that falls within a normal distribution. You can copy the code shown here and paste it into a Standard Module.

.

  1. Function RandNormalDist(Optional sMean As Double = 1, Optional sDeviation As Double = 1, Optional sPosSkew As Single = 0)
  2. 'Mike Alexander: www.datapigtechnologies.com
  3. 'Based on code originally published by Donald Knuth
  4. '***************************************************************************
  5. 'This function will generate a random number that falls within a normal distribution based on a starting Mean and Standard Deviation.
  6.  
  7. 'You can call this function from an Excel Spreadhseet by entering
  8. '=RandNormalDist(sMean, sDeviation, sPosSkew)
  9.  
  10. 'Example Usage 1:
  11. 'This generates a random number that falls within a distribution where the Mean is 20
  12. '=RandNormalDist(20)
  13.  
  14. 'Example Usage 2:
  15. 'This generates a random number that falls within a distribution where the Mean is 20 and the Standard Deviation is 3
  16. '  =RandNormalDist(20, 3)
  17.  
  18. 'Example Usage 3:
  19. 'This generates only a POSITIVE random number that falls in a distribution where the Mean is 20 and the Standard Deviation is 3
  20. '  =RandNormalDist(20, 3, 1)
  21. '***************************************************************************
  22. Dim x   As Double
  23. Dim y   As Double
  24. Dim z   As Double
  25. Dim m   As Double
  26.  
  27. 'Calculate initial coordinate pairs
  28. Start:
  29.     z = 2
  30.     Do Until z < = 1
  31.         x = 2 * Rnd - 1
  32.         y = 2 * Rnd - 1
  33.         z = x ^ 2 + y ^ 2
  34.     Loop
  35.    
  36. 'Do the math to get base multiplier
  37.    z = Sqr((-2 * Log(z)) / z)
  38.    
  39. 'Account for sPosSkew Positive request and factor in the other user inputs
  40.    If sPosSkew = 1 And (sDeviation * x * z + 1) < 0 Then
  41.     GoTo Start
  42.     End If
  43.    
  44.     RandNormalDist = Int(((sDeviation * x * z + 1)) * sMean)
  45.    
  46. End Function

.

Once you have the code in your workbook, you can call the function by entering it into your spreadsheet.

The idea is to enter the desired formula and copy it down to as many samples as you need.

.

This function accepts three optional arguments:

Mean: The average value you would like your sample data to have. If omitted, the function uses 1 as the Mean.

Standard Deviation: The number of standard deviations you would like your sample data values to be from the Mean. If omitted, the function uses 1 as the standard deviation.

Skew Positive: A tag that tells the function to only return positive numbers. If omitted, the function includes negative numbers in the resulting random numbers.

.

Here are a few examples of using the RandNormalDist function.

.

Entering RandNormalDist(20) will give you a tight group of numbers that fall around the number 20. As you can see, both negative and positive numbers are returned in a normal distribution.

.

Entering RandNormalDist(20,3) will give you a more dispersed (spread out) set of numbers. Again, you will see both negative and positive numbers returned in a relatively normal distribution.

.

Entering RandNormalDist(20,2,1) will give you a distributed group of ONLY POSITIVE numbers that taper away from the Mean. This kind of distribution is ideal for creating sample data where the numbers need to start around a Mean and move into a handful of higher positive values. This is ideal for generating random samples such as Salary data, Commission data, Help Desk Ticket data, etc.

.

This function has saved me tons of time and made my mock data so much better.

Hopefully it will do the same for you!

Bonus: "Yo momma" joke for Statisticians:

Lazy Summer Update

July 9th, 2014 by datapig 4 comments »

Hey there….remember me? I've been having a lazy summer; cooking my obscene body in the hot South Carolina sun. I'll be back with regular posts next week. I just wanted to get this update out.

.

New Book in August

Dick Kusleika and I wrote an Excel Formulas book. This book is for anyone who wants to jump head first into formulas. Here, we offer step-by-step instructions for creating and using 101 of the most commonly requested formulas. We include tips, tricks and detailed explanations along with actual working formulas. This is targeted at beginning to intermediate Excel users. Feel free to click on the book to see more details.

.

.

The Next Excel Business Intelligence Boot Camp

I've been getting a ton of emails asking when the next Excel Business Intelligence Boot Camp will be held. I have an answer. April 21 - April 23 in Dallas, Texas. At this event, we'll cover dashboards, visualizations, Excel's Power BI tools, and much more. I'll have more info on that in a future post.

.

.

If You Can't Wait for the Boot Camp

If you can't wait for the next Excel BI boot camp, then do the next best thing. Pick up the book at Amazon. It has an excellent plot and it's based on a true story.


.

.

A Random Summer Song to Finish out the Week

This song is about riding on the back of a Jet-Ski with an orangutan.

8 VBE Tips Every Programmer Should Know

June 19th, 2014 by datapig 13 comments »

I've covered these time-saving VBE (Visual Basic Editor) tricks across several posts over the years. I thought it would be helpful to get them all into a single post.

.

Whether you're a fresh-faced analyst new to programming, or a jaded veteran living on sunflower seeds and Mountain Dew, these tips will make programming with Excel and Access so much easier.

.

Tip 1: Block Comments

Are you still commenting code one line at a time?

Well you can save time by using the Comment Block command on the Edit toolbar.

This command lets you highlight entire sections of code and comment them at the same time.

.

To get to the Edit toolbar, go up to the VBE menu and click View->Toolbars->Edit.

Once you've activated the Edit toolbar, you can dock it to the top of your screen.

.

.

Tip 2: Quick Copy Blocks of Code

Copy whole blocks of code by holding down the Ctrl key on your keyboard while dragging the block where you need it.

This is an old Windows trick that works even when you drag across modules.

.

.

Tip 3: Jumping between Modules and Procedures

Press Ctrl+Tab to quickly move between modules.

Press Ctrl+Page Up and Ctrl+Page Down lets you move between procedures within a module.

.

.

Tip 4: Jumping directly to your Functions/Variables

If you encounter a Function or a Variable that you would like to explore, simply click on the Function/Variable name and press Shift+F2.

This will take you right to it. Pressing Ctrl+Shift+F2 will take you back.

.

.

Tip 5: Staying in Right Procedure

Have you ever tried to scroll through your code only to find that you've scrolled into the next procedure?

You try to remedy this by scrolling up, but you overshoot again into the wrong procedure.

To avoid this nonsense, use the Procedure View button on the low left hand corner of the VBE.

Clicking this button will limit scrolling to only the function or procedure you're in.

The button next to it will take you back to full module view.

.

.

Tip 6: Dragging the Yellow Arrow

You know that yellow arrow you see when you debug your code?

Well, you can drag it up or down to the particular line of code you're interested in running.

You can use this technique to easily skip bits of code while debugging.

.

.

Tip 7: See the Beginning and End of Variable Values

Hovering over your variables (while in debug mode) activates a tooltip that shows you the values that are being passed in and out of them - very useful.

By default, these tooltips will show around 70 or so characters (including the variable name).

If the value in your variable is too long, it gets cut off.

.

To see the last 70 or so characters, simply hold down the CTRL key while you hover.


.

.

Tip 8: Turning off Auto Syntax Check

Often times, while working on some code, you find that you need to go to another line to copy something.

You're not done with the line – you just need to leave it for a second.

But VBE immediately stops you in your tracks to tell you something you already know.

After a half-day of this, you're ready to clean your ears out with a gun.

Well, you can save your bullets and your sanity by turning off Auto Syntax Check.

Go up to the VBE menu and select Tools -> Options.

The options dialog box will activate, exposing the Auto Syntax Check option under the Editor tab.

Simply uncheck the property and the VBE will keep quiet.

The neat thing is that your code still turns red if you goof up, providing a visual indication that something is wrong.

.

Feel free to comment on any other tip I may have missed.

Enjoy your Mountain Dew and Sunflower seeds.

Rounding to Significant Digits

June 16th, 2014 by datapig 6 comments »

I was recently asked to show some financial figures in significant digits. I'll be honest and say I've never had to do this in my many years of reporting, so I didn't exactly know what that meant. Apparently, the idea is when you're dealing with numbers in the millions, there is no need to inundate a report with superfluous numbers for the sake of showing precision down to the tens, hundreds and thousands place.

.

As an example, instead of showing the number 883,788, you could choose to round the number to one significant digit. This would mean displaying the same number as 900,000. Rounding 883,788 to two significant digits would show the number as 880,000.

.

In essence, you're deeming that a particular numbers place is significant enough to show. The rest of the number can be replaced with zeros. This may feel like it could introduce problems, but when dealing with large enough numbers, any number below a certain significance would be inconsequential.

.

The screenshot below demonstrates the formula I used to round my values to a given number of significant digits.

.

How it Works

We all know that Excel's ROUND function is used to round a given number to a specified number of digits (the ROUND function takes two arguments: the original value and the number of digits to round to). What you may not know is that passing a negative number as the second argument tells Excel to round based on significant digits to the left of the decimal point. For instance, the following formula returns 9500.

=ROUND(9489,-2)

Changing the significant digits argument to -3 will returns a value of 9000.

=ROUND(B14,-3)

The works great, but what if you have numbers on differing scales? That is to say, what if some of your numbers are millions while others are hundreds of thousands? If you wanted to show them all with 1 significant digit, you would need to build a different ROUND function for each number to account for the differing scales.

.

To help solve this issue, I replaced the hard-coded significant digits argument with a formula that calculates what that number should be. Imagine that your number is -2330.45. You can use this syntax as the significant digits argument in your ROUND function.

LEN(INT(ABS(-2330.45)))*-1+2

This first wraps your number in the ABS function, effectively removing any negative symbol that may exist. It then wraps that result in the INT function, stripping out any decimals that may exist. It then wraps that result in the LEN function to get a measure of how many characters are in the number without any decimals or negation symbols. In this example, this part of the formula results in the number 4. If you take the number -2330.45 and strip away the decimals and negative symbol, you have 4 characters left. This number is then multiplied by -1 to make it a negative number, and then added to the number of significant digits we you are looking for. In this example, that calculation looks like this: 4*-1+2 = -2.

.

I used this as the second argument for the ROUND function.

Test it out yourself. Enter this formula into Excel, and you'll round -2330.45 to 2300 (2 significant digits).

=ROUND(-2330.45, LEN(INT(ABS(-2330.45)))*-1+2)

.

As you can see in my screenshot, I simply replaced the hard-coded numbers with a cell reference that points to the actual value to be rounded and a cell reference that holds the number of desired significant digits.

=ROUND(B5,LEN(INT(ABS(B5)))*-1+$E$3)

.

Again, I'm not sure I'll ever need to use this significant digits nonsense in the future, but at least I now know how.

5 Years of Bacon Bits

June 9th, 2014 by datapig 18 comments »

On June 10th 2014 the Bacon Bits blog officially hits the 5-year mark.

Yes, this humble blog of mine has been changing the lives of countless nerds for 5 years.

So today, you get to watch me stroke my own ego with this self-congratulatory navel-gazing post. Enjoy!

» Read more: 5 Years of Bacon Bits

Using Pictures to Create Info-Graph Charts

June 6th, 2014 by datapig 4 comments »

I've been playing around with using pictures in my Excel info-graphics. I rarely use pictures in charts, but with an info-graphic, it seems like pictures could be appropriate for this type of visualization.

Here are a couple of examples I pulled together to illustrate the kinds of things you could do.

.

In this example, I created a waffle chart and then filled the two data series with their own pictures. » Read more: Using Pictures to Create Info-Graph Charts

Aligning Custom Formatting Icons

June 3rd, 2014 by datapig 12 comments »

Hello there Excel nerds.

Well it's been a few weeks since my last post. I've been wrapping up my last book of the year and doing some other projects. I'm ready to get back into the blog saddle.

.

Today, I'd like to show you a quick tip for fixing the alignment of Custom Formatting Icons. If you don't know what those are, see the post I wrote on Improving the Default Percent Formatting.

.

In that post, I show » Read more: Aligning Custom Formatting Icons

Creating a Slide-Out Menu in Excel

May 15th, 2014 by datapig 12 comments »

 I've been toying with the idea of an animated Slide-Out Menu in Excel.  As you can see, a Slide-Out Menu is simply a macro-driven way to hide your slicers away, until you need them.

 

 

 

 

 

 

 

 

 

 

 

.

.

.

» Read more: Creating a Slide-Out Menu in Excel

Getting Fancy with your Excel Slicers

April 30th, 2014 by datapig 17 comments »

During last week's BI Boot Camp, I demonstrated how to customize Excel Slicers to better fit into the color scheme and theme of a dashboard. The process of fully customizing a slicer is a bit of a drag. But once you have the hang of it, you'll never use the default slicers again. In this post, I'll give you a few ideas for your slicers and show you how the customization works.

.

First, let's go through a few examples of some fancy slicers.

.

Tabs: With a little customization and a few tricks, I made this slicer look like tabs behind my pivot table.

.

Clean Dashboard Look: I customized this slicer so that any values with no data are kept subdued with no buttons around them. The modern color scheme gives it a clean dashboard feel.

» Read more: Getting Fancy with your Excel Slicers

A Better Way to Show Empty Values in a Pivot Table

April 28th, 2014 by datapig 12 comments »

It's not uncommon to have empty values in a pivot table.

When you do, you get gaps in your pivot report like the ones you see here.

The problem with the empty values is that the gaps can make it difficult to keep track of the row and column you're looking at; especially when showing lots of data.

.

For this reason, Excel lets you replace empty cells with some character. » Read more: A Better Way to Show Empty Values in a Pivot Table