Ten Simple Tricks to Speed up Your Excel VBA Code

April 13th, 2015 by datapig 9 comments »

I'm currently on a project to improve the performance of a particular workbook.

Part of this task involves speeding up a few of the macros in the workbook.

While I'm thinking about this stuff, I'd like to share a few simple tips for speeding up your Excel VBA code.

Some of you will know all of these and more. So feel free to comment and share any other performance tips you may have.

.

Halt Sheet Calculations

Each time a cell that affects any formula in your spreadsheet is changed or manipulated, Excel recalculates the entire worksheet. In worksheets that have a large amount of formulas, this behavior can drastically slow down your macros. If your workbook is formula intensive, you may not want Excel to trigger a recalculation every time a cell value is altered by your macro. You can use the Application.Calculation property to tell Excel to switch to manual calculation mode. When a workbook is in manual calculation mode, the workbook will not recalculate until you explicitly trigger a calculation. The idea is to place Excel into manual calculation mode, run your code, and then switch back to automatic calculation mode. Setting the calculation mode back to xlCalculationAutomatic will automatically trigger a recalculation of the worksheet.

Application.Calculation = xlCalculationManual

'Place your macro code here

 

Application.Calculation = xlCalculationAutomatic

.

Disable Sheet Screen Updating

You may notice that when your macros run, your screen does a fair amount of flickering. This flickering is Excel trying to redraw the screen in order to show the current state the worksheet is in. Unfortunately, each time Excel redraws the screen, it takes up memory resources. In most cases, you don't need Excel using up resources to redraw the screen each time your macro performs some action. In addition to setting the calculation mode to manual, you can use the Application.ScreenUpdating property to disable any screen updates until your macro has completed. This saves time and resources, allowing your macro to run a little faster. Once you macro code is done running, you can turn screen updating back on.

Application.Calculation = xlCalculationManual

Application.ScreenUpdating = False

'Place your macro code here

Application.Calculation = xlCalculationAutomatic

Application.ScreenUpdating = True

.

Turn Off Status Bar Updates

The status bar normally displays the progress of certain actions in Excel. For example, if you copy/paste a range, Excel will show the progress of that operation on the status bar. Often times, the action is performed so fast, you don't see the status bar progress. However, if your macro is working with lots of data, the status bar will take up some resources. It's important to note that turning off screen updating is separate from turning off the status bar display. That is to say, the status bar will continue to be updated even if you disable screen updating. You can use the Application.DisplayStatusBar property to temporarily disable any status bar updates, further improving the performance of your macro.

Application.Calculation = xlCalculationManual

Application.ScreenUpdating = False

Application.DisplayStatusBar = False

'Place your macro code here

Application.Calculation = xlCalculationAutomatic

Application.ScreenUpdating = True

Application.DisplayStatusBar = True

.

Tell Excel to Ignore Events

Let's say you have a Worksheet_Change event implemented for Sheet1 of your workbook. Any time a cell or range is altered on Sheet1, the Worksheet_Change event will fire. So if you have a standard macro that manipulates several cells on Sheet1, each time a cell on that sheet is changed, your macro has to pause while the Worksheet_Change event runs. You can imagine how this behavior would slow down your macro. You can You can add another level of performance boosting by using EnableEvents property to tell Excel to ignore events while your macro runs. Simply set the EnableEvents property False before running your macro. Once you macro code is done running, you can set the EnableEvents property back to True.

Application.Calculation = xlCalculationManual

Application.ScreenUpdating = False

Application.DisplayStatusBar = False

Application.EnableEvents = False

'Place your macro code here

Application.Calculation = xlCalculationAutomatic

Application.ScreenUpdating = True

Application.DisplayStatusBar = True

Application.EnableEvents = True

.

Although disabling events can indeed speed up your macros, you may actually need some events to trigger while your macro runs. Be sure to think about our specific scenario and determine what will happen if your worksheet or workbook events are turned off while your macro runs.

.

Hide Page Breaks

Another opportunity for a performance boost can be found in Page Breaks. Each time your macro modifies the number of rows, modifies the number of columns, or alters the Page Setup of a worksheet, Excel will be forced to take up time recalculating where the page breaks are shown on the sheet. You can avoid this by simply hiding the page breaks before starting your macro. Set the DisplayPageBreaks sheet property to False in order to hide page breaks. If you want to continue to show page breaks after your macro runs, you can set the DisplayPageBreaks sheet property back to True.

Application.Calculation = xlCalculationManual

Application.ScreenUpdating = False

Application.DisplayStatusBar = False

Application.EnableEvents = False

Activesheet.DisplayPageBreaks = False

'Place your macro code here

Application.Calculation = xlCalculationAutomatic

Application.ScreenUpdating = True

Application.DisplayStatusBar = True

Application.EnableEvents = True

Activesheet.DisplayPageBreaks = True

.

Suspend Pivot Table Updates

If your macro manipulates pivot tables that contain large data sources, you may experience poor performance when doing things like dynamically adding or moving pivot fields. This is because each change you make to the structure of the pivot table requires Excel to recalculate all values in the pivot table for each pivot field your macro touches. You can improve the performance of your macro by suspending the recalculation of the pivot table until all your pivot field changes have been made. Simply set the PivotTable.ManualUpdate property to True to defer recalculation, run your macro code, and then set the PivotTable.ManualUpdate property back to False to trigger the recalculation.

ActiveSheet.PivotTables("PivotTable1").ManualUpdate=True

'Place your macro code here

ActiveSheet.PivotTables("PivotTable1").ManualUpdate=False

.

Steer Clear of Copy and Paste

It's important to remember that while the Macro Recorder saves time by writing VBA code for you, it doesn't always write the most efficient code. A prime example of this is how the Macro Recorder captures any copy and paste action you perform while recording. If you were to copy cell A1 and paste it into cell B1 while recording a macro, the Macro Recorder would capture this:

Range("A1").Select

Selection.Copy

Range("B1").Select

ActiveSheet.Paste

While this code will indeed copy from cell A1 and paste into B1, it forces Excel to utilize the clipboard which adds a kind of middle man where there does not need to be one. You can give your macros a slight boost by cutting out the middle man and performing a direct copy from one cell to a destination cell. This alternate code uses the Destination argument to bypass the clipboard and copy the contents of cell A1 directly to cell B1.

Range("A1?).Copy Destination:=Range("B1?)

If you only need to copy values (not formatting or formulas), you can improve performance even more by avoiding the Copy method all together. Simply set the value of the destination cell to the same value found in the source cell. This method is about approximately 25 times faster than using the Copy method.

Range("B1?).Value = Range("A1?).Value

If you need to copy only the formulas from one cell to another, (not values or formatting), you can set the formula of the destination cell to the same formula contains in the source cell.

Range("B1?).Formula = Range("A1?).Formula

.

Use the With Statement

When recording macros, it's not uncommon to manipulate the same object more than once. For example, your code may change the formatting of cell A1 so that it is underlined, italicized, and formatted bold. If you were to record a macro applying these formatting options to cell A1, you would get something like this.

Range("A1").Select

Selection.Font.Bold = True

Selection.Font.Italic = True

Selection.Font.Underline = xlUnderlineStyleSingle

Unfortunately, this code is not as efficient as it could be because it forces Excel to select and then change each property separately. You can save time and improve performance by using the With statement to perform several actions on a given object in one shot. The With statement utilized in this example tells Excel to apply all the formatting changes at one time. Getting into the habit of chunking actions into With statements will not only keep your macros running faster, but it will also help to more easily read your macro code.

With Range("A1").Font

.Bold = True

.Italic = True

.Underline = xlUnderlineStyleSingle

End With

.

Don't Explicitly Select Objects

The Macro Recorder is quite fond of using the Select method to explicitly select objects before taking actions on them. If you were to record a macro while entering the value 1000 in cell A1 for multiple sheets, you would end up with code that looks similar to this.

Sheets("Sheet1").Select

Range("A1").Select

ActiveCell.FormulaR1C1 = "1000"

Sheets("Sheet2").Select

Range("A1").Select

ActiveCell.FormulaR1C1 = "1000"

Sheets("Sheet3").Select

Range("A1").Select

ActiveCell.FormulaR1C1 = "1000"

While this code will run fine, it's not at all efficient. It forces Excel to take the time to explicitly select each object that is being manipulated. There is generally no need to select objects before working with them. In fact, you can dramatically improve macro performance by not using the Select method. Make it a habit to remove the Select method from any generated code. In this case, the optimized code would look like this. Note that the nothing is being selected. The code simply uses the object hierarchy to apply the needed actions.

Sheets("Sheet1").Range("A1").FormulaR1C1 = "1000"

Sheets("Sheet2").Range("A1").FormulaR1C1 = "1000"

Sheets("Sheet3").Range("A1").FormulaR1C1 = "1000"

.

Avoid Excessive Trips to the Worksheet

Another way to speed up your macros is to limit the amount of times you reference worksheet data in your code. It is always less efficient to grab data from the worksheet than from memory. That is to say, your macros will run much faster if they do not have to repeatedly interact with the worksheet. For instance, this simple code forces VBA to continuously return to Sheets("Sheet1").Range("A1") to get the number needed for the comparison being performed in the If statement.

For ReportMonth = 1 To 12

If Range("A1").Value = ReportMonth Then

MsgBox 1000000 / ReportMonth

End If

Next ReportMonth

 

A much more efficient way is to save the value in Sheets("Sheet1").Range("A1") into a variable. This way, the code references the variable instead of the worksheet.

Dim MyMonth as Integer

MyMonth = Range("A1").Value

For ReportMonth = 1 To 12

If MyMonth = ReportMonth Then

MsgBox 1000000 / ReportMonth

End If

Next ReportMonth

.

Hope that helps. Again, feel free to add your own tips to the comments.

VBA to See Who Last Updated a Workbook

April 7th, 2015 by datapig 3 comments »

Have you always wanted to prove that your co-workers are morons intent on making you look bad? Well today, I'll share a trick that will make all you paranoid schizophrenics very happy.

.

A friend of mine asked if there is a way to easily see who last updated a particular workbook.

Apparently, he works in an environment where multiple people are editing workbooks on a share drive.

Of course, he knows that the workbook properties will show who last saved changes, but he wants to see that info as soon has he opens the file.

.

My answer to him was to hijack the Excel Title Bar. The Title Bar usually only shows the name of the workbook. But with a bit of simple VBA, you can have it also show the last person who updated the file and the date of the last save.

.

The basis for this code is relatively straight forward.

We can use the document properties (last author and last save time) to augment the Title Bar with more useful information.

There is a workbook property called BuiltinDocumentProperties which lets you get to the document properties for your workbook.

.

To get the name of the username of the last person who updated and saved the file, you can use:

ThisWorkbook.BuiltinDocumentProperties("Last Author")

.

To get the date of the last save, you can use:

ThisWorkbook.BuiltinDocumentProperties("Last Save Time")

.

As you can see in the code below, we are using these two statements to add these properties to the window caption of "ThisWorkbooks".

You can either paste this code in a standard module that you can run on demand, or you can get fancy and place this code in the workbook's OPEN event so that it automatically runs on open.

  1. Private Sub Workbook_Open()
  2.  
  3. ThisWorkbook.Windows(1).Caption = ThisWorkbook.Windows(1).Caption & _
  4. "   Last Updated By: " & _
  5. ThisWorkbook.BuiltinDocumentProperties("Last Author") & _
  6. " on " & _
  7. ThisWorkbook.BuiltinDocumentProperties("Last Save Time")
  8.  
  9. End Sub

.

Morons beware! We have our eyes on you now.

Excel Power BI Boot Camp in May

March 23rd, 2015 by datapig 1 comment »

Hi Folks. I've been busy wrapping up the final touches to a book that will come out in May.

I don't have time for a full post, but there's always time for a crass commercial message.

.

I wanted to give you a heads up on our new BI training scheduled for May in Dallas Texas.

This 3-day event is aimed squarely at Excel analysts who find it increasingly necessary to become more efficient at working with the new Microsoft BI tools like Power Pivot and Power Query. 

Bill Jelen (MrExcel) and I will guide you through the mysterious world of Microsoft's new BI Tools from a business analyst's point of view, introducing you to the rich set of tools and reporting capabilities that can be leveraged to more effectively synthesize data into Business Intelligence Dashboards.

 Learn how to:

 * Analyze large amounts of data and report those results in a meaningful way
 * Get better visibility into data with new Data Mining tools
 * Perform advanced Clustering and Market Basket analytics
 * Add interactive controls to your dashboards without VBA
 * Automate repetitive data cleansing and transformation tasks
 * Create eye-catching visualizations and Dashboards
 * Create map-based dashboards
 * Access external data sources to expand your message.

   
If you or any of your team members register with the discount code BUDDY, you'll get a $100 discount off registration.

Click Here to Register for this Event

 

Pull your Global Address Book into Excel

March 6th, 2015 by datapig 5 comments »

A friend of mine recently asked if there is a way to create an Excel table with employee information from the Outlook Properties dialog box for all employees in his Global Address Book. The Outlook Properties dialog box (seen below) gets its data from the company Active Directory. Active Directory is a special kind of database that is typically used to store organizational data such as employee names, user IDs and location information.

.

It's often useful to have a table of employee names and contact info. I have pulled this kind of table together in the past with some VBA code. But today, we don't have to. We can use Microsoft's free Power Query Add-in to reach into the Active Directory and extract data like email address, employee, department, and telephone number.

.

Today, I'll show you the steps you need to take to create a refreshable alpha roster of employees and contact info.

.

Step 1: Install Power Query

First, you will need to download and install the free Power Query Add-in.

Note that Microsoft offers Power Query for both Excel 2010 and Excel 2013 in both 32 and 64 bit platforms. Be sure to download the version that matches your version of Excel as well as the platform (32 or 64 bit) of Office your PC is running. Once installed, you'll need to activate the Add-in by following these steps:

  • Open Excel and look for a Power Query tab on the Excel Ribbon. If you see it, then the Power Query Add-In is already activated. You can skip the remaining steps.
  • Go up to the Excel Ribbon and click File->Options.
  • Choose the Add-Ins option on the left then look at the bottom of the dialog box for the Manage dropdown. Select COM Add-Ins from that dropdown and then click Go.
  • Look for Power Query for Excel in the list of available COM Add-Ins. Check the box next to each one of these options and click OK.
  • Close Excel and restart.

A successful install will result in a new Power Query tab on the Excel Ribbon.

.

.

Step 2: Understand the Active Directory Model

The Active Directory database is huge; it has dozens of tables. It can be difficult to know which field stores which piece of data – especially since the Active Directory field names are often cryptic. To help, I've overlaid the Active Directory field names on top of the Outlook properties dialog box to give you an idea which fields we need to capture. It turns out that these fields can be pulled from just three of the tables in Active Directory.


.

.

Step 3: Start a Power Query Data Extract

Click on the Power Query tab and select the From Other Sources command. There, you will find the Active Directory option.


.

Power Query will recognize your Domain and populate the server name for your Global Address book


.

Pressing the OK button will populate the Navigator pane with all the available tables in your Active Directory. You'll see tons of tables here. Ignore all tables except the user table. Double-click the user table to pull it into the Power Query window.


.

At this point, you'll have a table that looks like this in your Power Query window. This table is made up of related data sets that can be drilled into.


.

.

Step 4: Remove Unnecessary Columns

The first thing you'll want to do is delete all columns except for:

  • displayName
  • organizationalPerson
  • person
  • mailRecipient


.

.

Step 5: Select Needed Fields from organizationalPerson

Now it's time to select the fields you need.

Start by clicking the drill button next to organizationPerson.

This will open up a filter menu.

Click the (Select All Columns) option to clear all the check boxes.

.

Now go down the list of fields and place a check next to:

  • co
  • company
  • department
  • employeeType
  • givenName
  • initials
  • l
  • mail
  • physicalDeliveryOfficeName
  • postalCode
  • st
  • streetAddress
  • title

.

.

Step 6: Select Needed Fields from Person

Next, click the drill button next to Person.

This will open up a filter menu.

Click the (Select All Columns) option to clear all the check boxes.

.

Now go down the list of fields and place a check next to:

  • sn
  • telephoneNumber

.

.

Step 7: Select Needed Fields from mailRecipient

Click the drill button next to mailRecipient.

This will open up a filter menu.

Click the (Select All Columns) option to clear all the check boxes.

.

Now go down the list of fields and place a check next to:

  • mailNickName
  • msExchangeAssistantName

.

.

Step 8: Remove "Dummy" Service Accounts

Because we want this extract to contain only employees, we will want to remove any service account. That is to say, dummy accounts used f6or group email boxes which aren't tied to individual people. To remove these accounts, find the employeeType field and click the drill down arrow. This will open up a filter menu where you can uncheck the (null) value.

.

.

Step 9: Load Result to Excel

At this point, you can choose to rename and reorder the columns as you see fit.

Once you're done, click the Close & Load command on the Home tab of the Power Query window.

.

After a few moments, the entire contents of your Global Address book will be imported to a new worksheet. Best of all, you can refresh this table by simply right-clicking anywhere inside this table and choosing the Refresh option.

.

Automatically Delete Pivot Table Drilldown Sheets

February 24th, 2015 by datapig 5 comments »

I don't know about you, but my favorite thing to do is to pass off a built in Excel feature as one of my own genius report development ideas.

Me: "I made it so that you can double-click on any number inside the Pivot Table to drill into details"

Client: "ooh….nice feature Mike. You're a genius!"

Me: "Well what can I say...I give because I love."

.

Well that nifty trick of double-clicking inside a Pivot Table to drill down has a side effect; it leaves behind lots of worksheets. In most cases, we don't want to keep these sheets. In fact, they often become a nuisance, forcing us to take the time to clean them up by deleting them. This is especially a problem when you distribute pivot table reports to users who frequently drill into details. There is no guarantee they will every clean up the drilldown sheets. Although it is unlikely these sheets will cause issues, they can clutter up the workbook.

.

Today, I'll share with you a little VBA that removes the junk drill-down sheets left behind from double-clicking on a Pivot Table.

.

The basic premise of this VBA is actually very simple. When the user clicks into details, outputting a drill down sheet, the macro follows along and simply renames the output sheet so that the first 10 characters are "PivotDrill". Then before the workbook closes, the macro will find any sheet that starts with "PivotDrill" and deletes it. The implementation does get a bit tricky because you essentially have to two pieces of code. One piece goes in the Worksheet_BeforeDoubleClick event, while the other piece goes into the Workbook_BeforeClose event.

.

The first thing you'll need to do is enter this code into the Worksheet_BeforeDoubleClick event (for the sheet your Pivot Table sits in).

.

  1. Private Sub Worksheet_BeforeDoubleClick(ByVal Target As Range, Cancel As Boolean)
  2.  
  3. 'Declare you Variables
  4. Dim pt As String
  5.  
  6. 'Exit if Double-Click did not occur on a pivot table
  7.  
  8. On Error Resume Next
  9.  
  10. If IsEmpty(Target) And ActiveCell.PivotField.Name <>"" Then
  11.  
  12. Cancel = True
  13.  
  14. Exit Sub
  15.  
  16. End If
  17.  
  18. 'Set the pivot table object
  19.  
  20. pt = ActiveSheet.Range(ActiveCell.Address).PivotTable
  21.  
  22. 'If Drilldowns are Enabled, Drill down
  23.  
  24. If ActiveSheet.PivotTables(pt).EnableDrilldown Then
  25.  
  26. Selection.ShowDetail = True
  27.  
  28. ActiveSheet.Name = Replace(ActiveSheet.Name, "Sheet", "PivotDrill")
  29.  
  30. End If
  31.  
  32. End Sub

.

Next, we set up the Worksheet_BeforeClose event. As the name suggests, this code will run when the workbook closes.

.

  1. Private Sub Workbook_BeforeClose(Cancel As Boolean)
  2.  
  3. 'Declare you Variables
  4.  
  5. Dim ws As Worksheet
  6.  
  7. 'Loop through worksheets
  8.  
  9. For Each ws In ThisWorkbook.Worksheets
  10.  
  11. 'Delete any sheet that starts with PivotDrill
  12.  
  13. If Left(ws.Name, 10) = "PivotDrill" Then
  14.  
  15. Application.DisplayAlerts = False
  16.  
  17. ws.Delete
  18.  
  19. Application.DisplayAlerts = True
  20.  
  21. End If
  22.  
  23. Next ws
  24.  
  25. End Sub

.

.

At this point, each time the workbook closes, the "PivotDrill" sheets will be deleted.

No more junk sheets left over from overzealous managers clicking all over your Pivot Tables.

Text Match and Fuzzy Lookup

January 28th, 2015 by datapig 7 comments »

I would bet that every one of us has seen Customer lists from separate sources that can't be matched up because of spelling mistakes and inconsistent naming conventions and abbreviations. In fact, there are consulting firms whose entire business revolves around helping organizations reconcile their "Master Data". I can't count how many times I've seen entire projects launched to create a "gold source for master data". Yet here we are; manually matching up addresses, customer names, or people names from different data sources.

.

Well today, I'd like to introduce you to a free tool from Microsoft that can help ease your pain a little. This tool is a free Excel Add-in called Fuzzy Lookup. The Fuzzy Lookup Add-in enables you to perform similarity analysis to match textual strings that mean the same thing but are not spelled exactly the same. With this tool, you can match things like customer addresses from two separate data sources, or even find imperfect duplicates in a single dataset.

.

Understanding the Jaccard Index of Similarity

The magic behind the Fuzzy Lookup Add-in comes from the Jaccard Index of Similarity.

The Jaccard index, also known as the Jaccard similarity coefficient, was developed by Paul Jaccard as a way to document the distribution of different types of flora (yawn). Jaccard's index gave him a statistical way to measure similarities between sample sets. The gist of Jaccard's index is this:

.

You take two sample sets. You count the attributes shared by both samples (call this Z). You count the attributes observed in only one sample (call this X). You then count the attributes observed in only the other sample (call this Y). Finally, you can calculate the similarity index by using the three counts in this operation: Z / (Z+X+Y).

.

Confused? Don't worry. I have an example.

Let's say we want to compare the similarities between these two Excel nerds.

We define some attributes, then we tag each nerd with a 1 when each attribute evaluates to TRUE.

.

We can now get the count of the intersections and then perform the math.

As you can see, the similarity index is .57 or 57%.

The more similar two sample sets are, the closer Jaccard's index will be to 1.

 

In its simplest form, the same kind of analysis can be done on Textual Strings using the Jaccard Index. You can see in this example, the words hair and hare result in an index of .60.

By the way, in terms of textual matches, 60% is not a great score. In my experience, most textual matches below 70% index is suspect.

 

Understanding the Fuzzy Lookup Transformations

Ok. Now you know that the Fuzzy Lookup algorithm is based on the Jaccard Index of Similarity. But it's important to note that Fuzzy Lookup does not run on your raw data. Before running the Jaccard Index, Fuzzy Lookup applies some built-in transformation algorithms designed to improve its matching results.

 

The first thing to understand is that Fuzzy Lookup converts data using something called a Tokenizer. For example, the record {"DataPig", "Bacon Boulevard"} might be tokenized into the set, {" DataPig", "Bacon", "Boulevard"}. Each word is considered to be a token. Tokens are assigned weights based on frequency of use. High weights are applied to tokens that are uncommon. Low weights are applied to tokens that come up frequently. For example, frequent words such as "Boulevard" are given a lower weight than less frequent words such as "DataPig". Fuzzy Lookup gives you the opportunity to override the default token weights by supplying your own weights via an Excel Table.

.

Fuzzy Lookup also has algorithms and built-in dictionaries that allow for the automatic correction of spelling mistakes, word merge scenarios, string split scenarios, and string prefix corrections. All of these automatic transformation can be turned on and off by via the Fuzzy Lookup Configure dialog box. You can add custom Transformation rules by pointing the TransformationRowsetName property to your own Excel Table of rules.

Note: The sample workbook installed with the Fuzzy Lookup Add-in includes a Customization tab that shows you how to set up your own custom transformation table.

.

Using the Fuzzy Lookup Add-In

To use the Fuzzy Lookup Add-in, you'll need to first convert your datasets into Excel Tables (click anywhere inside your data and press Ctrl+T).

In this example, we have two Excel Tables (IndProfile and CustInfo). We want to match up the customer names.

.

Click the Fuzzy Lookup command button to launch the tool.

.

In the Fuzzy Lookup dialog box, do the following:

1. Select the two tables you want matched.

2. Select the columns that hold the data you want matched.

3. Click the join button to drop your two columns down to the Match Columns section.

4. Select all the columns you want included in your output. Be sure to include the FuzzyLookupSimilarity column.

5. Define how many matches you want per record. I recommend setting this to 1.

6. Define the Similarity Threshold. This is the Jaccard Index number. As mentioned before, in my experience, any match with an index of less than .70 is typically not that accurate. I usually set this to .60 just to see the results.

.

Once you have defined your selections, be sure to place your cursor in a new blank worksheet. Be warned, Fuzzy Lookup WILL OVERWRITE DATA.

Press the Go button to output the results.

You will see a new table that includes the Output Columns you selected (in step 4 above). The Similarity column will show you the Jaccard index for each match. Any row with no matches will get an index of 0.

.

It's important to review any matches with a similarity index less than 1.

Don't take for granted that Fuzzy Lookup will get everything right.

For example, you can see in Row 7 below, Fuzzy Lookup gave us a .86 index score on the match for NVISION Inc. and WORLD VISION INC. An index score of .86 is typically a good match. However, we can see that in this case, the two names are not the same.

.

If you want to perform a Fuzzy search for duplicates within the same table, simply reference only that table when setting up the search parameters. For instance, in this example we are looking for Fuzzy duplicates in the Org Name column of the IndProfile table.

.

I guess that's about it.

Fuzzy Lookup is one of those tools you really need to experiment with to get a feel for how helpful it can be.

Although it's not perfect, it has saved me hours of scripting and manual matching.

So before you burn out your eyeballs staring at two data sets again, take some time to play with Fuzzy Lookup. It may become your new best friend.

Business Intelligence vs Data Science

January 26th, 2015 by datapig No comments »

Hello all! Belated Happy New Year! I've been busy working on a new book and a few other projects. It's time to get back to blogging.

.

I recently saw this article explaining the difference between BI Analysts and Data Scientists.

On the surface, this article highlights the operational differences between BI Analysts and Data Scientists. But at the core, it's a view into how the BI industry has changed over the last few years. Here's a graphic from that article.

.

As I look at this graphic, I personally recognize many of the tasks outlined here.

I can remember a time when I felt the needs of my customers change from simple trending dashboards to more predictive analytics. In the last few years, many Excel analysts have been asked to do more Data Science'y kind of stuff. Stuff like:

  • Analyze the correlation between key demographic data and transactional metrics such as number of purchases, purchase amounts, and frequency of visits
  • Score customers based on customer satisfaction levels, likelihood to recommend, and capacity utilization
  • Analyze association rules; for example, if Customer A buys Product A, we can predict (with 90% confidence level) that there is an 85% likelihood that this customer will buy Product B.
  • Predict outcomes based on certain independent variables found through regression analysis; for example, a machine's likelihood to break down given a number of interrelated variables.

.

In the past, business intelligence primarily consisted of dashboards showing historic data on an agreed upon set of key performance metrics. Organizations today are demanding more dynamic predictive analysis, the ability to iteratively perform data discovery, and the freedom to take the hard left and right turns on data presentation. These managers often turn to Excel analysts to provide the needed analytics and visualization tools.

.

Over the new few months, I'd like to focus my blog posts here on some of the more advanced Data Science'y kind of stuff that we'll all be asked to do.

.

This leads me to the Excel Power BI Boot Camp I'm hosting with Bill Jelen (Mr. Excel) in Dallas from May 20th – May 22nd.

.

This 3-day event is aimed squarely at Excel analysts who find it increasingly necessary to:

  • Analyze large amounts of data and report those results in a meaningful way
  • Get better visibility into data with new Data Mining tools
  • Perform advanced Clustering and Market Basket analytics
  • Add interactive controls to your dashboards without VBA
  • Automate repetitive data cleansing and transformation tasks
  • Create eye-catching visualizations and Dashboards
  • Create map-based dashboards
  • Access external data sources to expand your message

.

Bill and I only have 25 seats left in the class. If you sign up before March 1st, you'll get a $100 discount off registration.

.

I'll be back tomorrow with an exciting new series of posts on the SQL Server Data Mining Add-in.

See you then!

 

My Take on Linkedin

December 23rd, 2014 by datapig 6 comments »

The Case Against Maths

December 19th, 2014 by datapig 11 comments »

I didn't want to finish out the last full work week of the year without a blog entry.

So let me bring up something that has been bothering me for quite some time.

Take a look at these two statements, then point to the one that looks right to you.

.

If you're American, you probably chose the statement on the right. If you're English, I bet you chose the statement on the left. In researching articles for some of my statistics posts, I often encounter variations on how the abbreviated word for mathematics is used. American authors use the word math as in – "Do the Math". The English use the word maths as in – "Do the Maths".

.

As a proud overweight American, I find it odd to say "Do the Maths".

What's bothers me the most is the seemingly arbitrary plural designation given to the abbreviation for mathematics.

.

Here is the way I see it. Mathematics is not a pluralized word for Mathematic. You don't have 1 Mathematic and 5 Mathematics. You simply have Mathematics. In fact, the word Mathematics (the overarching name for the study of all things mathematical) is used solely in singular verb forms.

You would say "Mathematics is useful in business".

You would not say "Mathematics are useful in business".

.

Even the English agree.

They say "Maths is fun".

No one says "Maths are fun"

.

That's right, even though there is an S in Maths, it's not treated as a plural.

It's obvious that the English use the abbreviated term "maths" (with an S at the end) to account for the s in Mathematics. Ok, I see that. So the question now is why is there an S at the end of Mathematics in the first place? After all, we don't say Literatures or Musics.

.

Well, according to Wikipedia, The most probable reason is that the origin of the word Mathematics is the Greek word mathematika. In Greek, when a word ends in the letter A, that typically denotes that the word is plural. So when mathematika was being translated to English, the letter S was added to the end.

,

In short, the term Maths is a singular word that looks plural.

And that, my friends, gives me the heebee-jeebees.

.

Ahhh…it feels good to get that off my chest…finally.

Well, have a great weekend – and remember:

Office Update Breaks ActiveX Controls

December 11th, 2014 by datapig 7 comments »

I checked my email today and saw a bevy of emails from friends and clients claiming that their workbooks broke over-night. When one person contacts me saying something is broken, my general attitude is "they're on crack". But when I get a flood of emails, that's a horse of another color.

.

Apparently, a recent Office Update introduced security "fixes" that break any workbook or code involving ActiveX Controls. I have to admit, I didn't even notice it. I guess my workbooks are sorely lacking in ActiveX Controls.

.

The ever amazing Jan Karel Pieterse shows us how to fix the issue at Daily Dose of Excel.

.

His fix addresses the set of .exd files that prevent the addition or use of ActiveX Controls. These .exd files are added to your machine through the update.

The answer is to delete these .exd files (which are located under C:\users\[your name]\App Data\local\temp).

Here are the ones I found on my machine. I deleted the nasty buggers and my Excel is back to normal.

.

I bet it's a great day to be on the Microsoft Help Desk today.

.

Thanks Jan Karel for the fix!