Like many of you, I'm often asked to develop mockups (working demos of dashboards and reports).

In my zeal to make my mockup as realistic as possible, I spend a good bit of time generating sample data.

.

For example, if I'm mocking up an HR dashboard, I'll dedicate some brain power to generating salary data. Now in order to make salary data realistic, there has to be a believable mix of high salary and low salary records. I can't just use the RAND and RANDBETWEEN functions Excel provides. These functions generate random data in a "Uniform Distribution". That is to say, every number has an equal chance of occurring with an equal frequency.

.

Let's say I use this formula and copy it down to 500 rows.

**=RANDBETWEEN(1000,200000)**

I instantly create 500 salary numbers, but there's a problem.

Plotting my newly created data in a histogram reveals the issue.

In this chart, the columns represent the count of records that fall within each salary range. Notice that there are just as many records in the 191k-200k range as in the 51k-60k range. This is definitely not something that holds true in the real world.

.

In reality, salary data has a distribution closer to the one shown in the histogram below.

There are lots people with salaries between 40k and 80k; and far fewer with salaries above 150k.

So if the RAND and RANDBETWEEN functions fail, how does an aspiring Excel nerd easily create sample data that makes sense for its intended purpose?

.

Well, after years going through the pain of creating sample data myself, I finally had enough and pulled together this function called **RandNormalDist**. This function, based on code by Donald Knuth, allows you to automatically generate random data that falls within a normal distribution. You can copy the code shown here and paste it into a Standard Module.

.

- Function RandNormalDist(Optional sMean As Double = 1, Optional sDeviation As Double = 1, Optional sPosSkew As Single = 0)
- 'Mike Alexander: www.datapigtechnologies.com
- 'Based on code originally published by Donald Knuth
- '***************************************************************************
- 'This function will generate a random number that falls within a normal distribution based on a starting Mean and Standard Deviation.
- 'You can call this function from an Excel Spreadhseet by entering
- '=RandNormalDist(sMean, sDeviation, sPosSkew)
- 'Example Usage 1:
- 'This generates a random number that falls within a distribution where the Mean is 20
- '=RandNormalDist(20)
- 'Example Usage 2:
- 'This generates a random number that falls within a distribution where the Mean is 20 and the Standard Deviation is 3
- ' =RandNormalDist(20, 3)
- 'Example Usage 3:
- 'This generates only a POSITIVE random number that falls in a distribution where the Mean is 20 and the Standard Deviation is 3
- ' =RandNormalDist(20, 3, 1)
- '***************************************************************************
- Dim x As Double
- Dim y As Double
- Dim z As Double
- Dim m As Double
- 'Calculate initial coordinate pairs
- Start:
- z = 2
- Do Until z < = 1
- x = 2 * Rnd - 1
- y = 2 * Rnd - 1
- z = x ^ 2 + y ^ 2
- Loop
- 'Do the math to get base multiplier
- z = Sqr((-2 * Log(z)) / z)
- 'Account for sPosSkew Positive request and factor in the other user inputs
- If sPosSkew = 1 And (sDeviation * x * z + 1) < 0 Then
- GoTo Start
- End If
- RandNormalDist = Int(((sDeviation * x * z + 1)) * sMean)
- End Function

.

Once you have the code in your workbook, you can call the function by entering it into your spreadsheet.

The idea is to enter the desired formula and copy it down to as many samples as you need.

.

This function accepts three optional arguments:

**Mean:** The average value you would like your sample data to have. If omitted, the function uses 1 as the Mean.

**Standard Deviation:** The number of standard deviations you would like your sample data values to be from the Mean. If omitted, the function uses 1 as the standard deviation.

**Skew Positive:** A tag that tells the function to only return positive numbers. If omitted, the function includes negative numbers in the resulting random numbers.

.

Here are a few examples of using the **RandNormalDist** function.

.

Entering **RandNormalDist(20) **will give you a tight group of numbers that fall around the number 20. As you can see, both negative and positive numbers are returned in a normal distribution.

.

Entering **RandNormalDist(20,3) **will give you a more dispersed (spread out) set of numbers. Again, you will see both negative and positive numbers returned in a relatively normal distribution.

.

Entering **RandNormalDist(20,2,1) **will give you a distributed group of ONLY POSITIVE numbers that taper away from the Mean. This kind of distribution is ideal for creating sample data where the numbers need to start around a Mean and move into a handful of higher positive values. This is ideal for generating random samples such as Salary data, Commission data, Help Desk Ticket data, etc.

.

This function has saved me tons of time and made my mock data so much better.

Hopefully it will do the same for you!

**Bonus:** "Yo momma" joke for Statisticians: