Using Power Query to Combine Data from Multiple Excel Files into One Table

A common task/nightmare that most Excel Analysts have faced at one time or another is to combine data from multiple Excel files into a single table. Without a solid knowledge of Excel VBA programming, this task typically entails opening each file, copying the data, and then pasting the data into single workbook.

.

Today, I want to show you a relatively easy way to do this using a Power Query function. The basis for this tip comes from an article Hilmar Buchta posted about consolidating Hadoop files.

.

Before you start, you’ll need to have a few things:

  1. You need to have the Power Query Add-in installed
  2. You’ll need your multiple Excel files in a single directory
  3. Each file must have a similarly-named tab containing the needed data (in the same table structure)

.

The secret to this trick is Power Query’s formula language (also known as “M” language). Each action you take when working with Power Query results in line of code which is written into a “query step”. Query steps are embedded lines of M code that allow your actions to be repeated each time you refresh your Power Query data. You can extend the utility of that embedded code with your own custom function.

.

The idea is relatively simple. Build a starter query via the Power Query Editor, and then wrap the resulting M code in your own function.

.

In this example scenario, I’ll walk through the steps for combining the data from a set of files in the C:\Temp directory. Each file contains a table (with the same table structure) on worksheet named MySheet.

.

Step 1: Build a query that connects to just one of the Excel files

On the Power Query tab, select the From Excel connection option.

.

Browse to the directory which contains all the Excel files and choose just one of them.

After a few seconds, the Navigation pane will activate. In the Navigator pane, choose the sheet that holds the data which needs to be consolidated and then click the Edit button to open the Query Editor.

.

.

Step 2: Apply any needed transformation actions

Use the Query Editor to apply some basic transformation actions. For example, you’ll probably want to make sure that the first row in the data is used as the header row. To do so, you can right-click on any column and select the “Use First Row as Headers” option.

.

Apply any other needed actions. You can see your actions in the Query Settings pane.

In this example, you can see that First Row was promoted to column headers and a few unneeded columns were removed.

.

.

Step 3: Edit the embedded code to create your function

Once you’ve applied any needed transformations, open the Advanced Editor window by clicking the View tab and selecting the Advanced Editor command.

.

As you can see, while you leisurely build your query in the Query Editor, Power Query diligently creates the bulk of the code for you.

Note in the portion of the code highlighted in gray (for illustration) Power Query hard-coded the file path and the file name for the Excel file originally selected. The idea is to wrap this starter code in a custom function which will pass a dynamic file path and file name.

.

Wrap the starter code with your function tags, specifying a FilePath parameter and a FileName parameter. Replace the hard-coded file path and file name with your dynamic parameters.

.

When completed, close the Advanced Editor.

In the Query Settings pane, change name of the query in the Name input box. The goal here is to give your function a reasonably descriptive name (in this scenario, fGetMyFiles).

.

Click the Home tab of the Query Editor and select the Apply & Close button.

.

Power Query creates a fairly useless looking table.

Unfortunately, there is no way to create and use a custom function without creating an associated table. So as worthless as the table seems, you can’t delete it.

At this point, the custom function is ready to be used.

.

.

Step 4: Use your newly created function to combine all Excel files

Now, we’ll leverage the function in a new query.

Click the Power Query tab in Excel and select the From Folder connection option.

.

In the From Folder dialog box, provide Power Query with the file path of the target directory.

.

The Query Editor window will activate to show you a table containing a record for each file in the chosen directory. The fields we’re particularly interested in are the Folder Path and Name fields. These will provide our function with the needed FilePath and FileName parameters.

.

Right-click on any column header and select the Insert Custom Column action. In the Insert Custom Column dialog box, invoke your function and pass the Folder Path and Name fields as parameters separated by a commas.

.

Once you confirm your changes, Power Query will trigger the function for each row in the data table.

The function itself will grab the data from each file and return a table array.

.

Click the Custom column header to see a list of fields included in each table array. Here, you can choose which fields in the table array to show, click the Expand radio button, and then click the OK button.

.

With each table array expanded, Power Query exposes the columns pulled from each Excel file and adds the detailed records to the data preview.

.

At this point, you can click the Apply & Close command to output the final combined table.

.
As long as the Excel files are in the same location with the same naming conventions, you can right-click the output and click the Refresh option to have Power Query automatically re-combine any fresh data. This means you can run through this setup once, then simply refresh the query to re-combine your Excel files.
.
One small annoyance is that Power Query functions apply only to the workbook in which they reside. If you start a new workbook, you’ll need to recreate your function in that new workbook.
.
.
Now, this may seem like a lot of steps, but think about it.

For all the steps required to accomplish this task, very little effort was actually expended on writing the code for the function. Power Query wrote the code for the core functionality, and we simply wrapped that code into a function.

.

The take-away here is that you don’t have to be an expert on Power Query’s M language to pull together effective and useful custom functions. You can leverage the Query Editor to create some base code, then just customize from there.

.

.

Like this tip? Well guess what:

Bill Jelen (Mr. Excel) and I will be putting on a new Business Intelligence Boot Camp April 22-24 in Bentonville Arkansas, where 75 lucky registrants will be getting tips and trick like the one you just learned.

This 3-day event is aimed squarely at business analysts and managers who find it increasingly necessary to become more efficient at working with the new Microsoft BI tools like Power Pivot and Power Query. We’ll guide you through the mysterious world of Microsoft’s new BI Tools from a business analyst’s point of view, introducing you to the rich set of tools and reporting capabilities that can be leveraged to more effectively synthesize data into Business Intelligence Dashboards.

Click here to get more details

48 thoughts on “Using Power Query to Combine Data from Multiple Excel Files into One Table

  1. AlexJ

    I REALLY like what you have here. I’ve been trying to figure out an approach like this with PQ off and on for a while.

    Now:
    Is there a way to use the natural parameter names in the resulting query instead of “Custom.ParameterName”? (I know that the name “Custom” can be changed).

    Since I already have a number of files which consume allogmerated (!) data like this (UNION queries), I would like to switch to the PQ approach without modifying all the parameter names.

  2. AlexJ

    Mike,
    I meant that the resulting column headers this method provides read as “Custom.Region.”, “Custom.Market.”,etc. I wanted to get “Market”, “Region”, etc, without necessarily editing every column name in PowerQuery so that my existing pivot tables can consume the PQ data without significant modification.

    Hope this isn’t a dumb question. Perhaps I am under powered in PQ techniques.

  3. Joe

    Thank you for this tip Mr Mike.

    I have also been trying figure out a way to do this with PowerPivot for a while now.

    Will definitely be giving this a try

  4. datapig

    AlexJ:

    Oh I see. That’s pretty easy. After you output the results, go back in and Edit the query for the output.

    In the Advanced editor, find the line of code that expands the table array. It will look something like this:

    #"Expand Custom" = Table.ExpandTableColumn(InsertedCustom, "Custom", {"State", "Job", "JobDescription", "Area", "Region"}, {"Custom.State", "Custom.Job", "Custom.JobDescription", "Custom.Area", "Custom.Region"})

    Notice there are two sets of column names, the original names and the funky output names starting with Custoem.

    All you have to do is edit this line to change the second set of column names to the ones you want.
    In this example, I simply removed the silly Custom. prefix from my final headers.

    #"Expand Custom" = Table.ExpandTableColumn(InsertedCustom, "Custom", {"State", "Job", "JobDescription", "Area", "Region"}, {"State", "Job", "JobDescription", "Area", "Region"})

  5. Eltjo

    Very nice! I do struggle with this part: “Wrap the starter code with your function tags, specifying a FilePath parameter and a FileName parameter. Replace the hard-coded file path and file name with your dynamic parameters.” How should I do that? When I type exactly as written, I end up with a “Invoke Function” option. I can then enter the folder and file, but it cannot find the file.

    Thank you,

  6. Eltjo

    Just noticed that the Invoke Function does exactly what it is supposed to do. Feels like magic!

  7. Saul Espinoza

    Mike, thx for this one
    I tried this one from multiple CSV files converted from excel, but I am still trying to find a way to remove the repeated headers that populated to the entire table. I guess since my new table contains more than 100000 rows if cannot reach to those text values (headers).

    https://www.youtube.com/watch?v=QZXB9dB8SlA

  8. Saul Espinoza

    You rule Mike!!!, thx a bunch again, I was finally able to figure out your “how to wise guidance”, BTW I listened to you thru Chandoo’s podcast!!! Great Session. Cheers and greetings from Lima – Peru

  9. rajeev

    This is very interesting tool, however what if someone need to add data of two table side-by side instead of one below other. I am seeking this as my data has one table with primary key and half columns in other file without primary key

  10. stu4stew

    This is exactly what I want to do but I have hit a snag I am struggling at step 3 where I have to input my own FilePath&FileName. Ex. my file is C:\File1 and the excel documents are located in that file are Doc1.xlsx and Doc2.xlsx what would I insert instead of the grey box?

  11. Ghazal

    I get to Step 4, but run into a problem at the Insert Custom Column. I get an error message:

    Expression.Error: Cannot convert a value of type Table to type Function.
    Details:
    Value=Table
    Type=Type

    Any help will be much appreciated

  12. James

    Please expand further with an example of “wrapping the starter code with dynamic parameters”. What do I put in the gray box? I am brand new to Power Query. What goes where the hard coded file name was to make the command dynamic? I have a long list of excel files from which I want to lift data and combine on a single summary table. All files reside on a shared drive.

    Thanks so much…great article. Hope to get past Step 3!

  13. rajeev

    This helped me with my work at university for long, but since few weeks all the files I created with merge options never refresh correctly. they keep on consuming memory while refresh and ultimately crashes. Any idea what might be going wrong. or is this trick broken with new updated power query.?

  14. Lisa

    I’m getting that invoke function as well. Instead of creating the dummy table that you have, I’m seeing nothing in the workbook and the query window saying “Load is Disabled”. Any ideas?

  15. Zachary Goldman

    Several comments have asked about the grey box. In Step 2, you take a single sample file and carry out the steps you need to in order to pull in the data from that file and clean and structure the data however you want. (This same process is what will then be applied to all of the files).

    In Step 3, you are turning this procedure that you had used for one specific file into a general function that can be used to import/clean data from any file later on. Initially, the value that had been in the grey box was the path to the single example file you had written the procedure on. When turning this into a general formula, the grey box gets changed to variables instead of a constant path.

    The header of the function “let GetMyFiles = (FilePath, FileName) =>” specifies that when the GetMyFiles function is used, two variables will need to be passed to it– the FilePath and FileName for any file you wish to apply this function to. The grey box shows where these variables are actually used in the body of the function itself.

    You could directly call this function with any FilePath and FileName; GetMyFiles(“C:\”,”File1″).

    Step 4 shows you how to generate a list of files in a particular folder and pass EACH of these files to this function in order to pull in the data from each file and process using the function you created and then aggregate all of the resulting tables into a single big table.

  16. Leslie

    Is it possible that this wouldn’t work if you’re trying to pull the files off a shared network? I’m trying to set this up and it seems to work great when the files are saved to my desktop, but when I try to reference a location on a network the file path won’t work.

  17. Dave

    For “Load is Disabled” error, right click the query in the Queries list and select Load To…

    If you set for current worksheet or other option, this error will clear.

  18. kurt d

    I’m getting a ‘out of memory error’ all the time..even when importing just 2 small excel files… using the latest powerquery version in Excel 2013 32 bit

  19. micRey2015

    Hi,

    I want to know if there is some posibility to work with relative path (instead of Absolute path) in Power Pivot and Power Query with Excel 2010 and Excel 2013

    Thanks in advance

  20. Laura

    This is awesome! Thank you very much.

    I did try to modify it to read in sheets, but didn’t make it very far. However, this will work, as I end up downloading a new spreadsheet from the website, anyway.

    While playing around, it looked like there might be some other options now available in Excel 2013, but I didn’t really have time to play with them (yet!)

  21. Harvey French

    That really worth knowing. However, sometimes I think using UNIX is so much more elegant…pretty much all of the above can be achieved using

    cat FolderName\*.csv > MynewFile.csv

    or similar (I’ve not used UNIX for a while).

    The use MyNewFile.csv in excel…

    Just saying.

  22. Søren Damgaard Jensen

    Hi,

    Is there anyway to make a similar approach to loop Excel sources in the new Power BI Desktop – I have created the function but kinda gets stuck there.

  23. Matt Gaskins

    Thanks for sharing! Got this to work in current version of Power Query; only a couple button names had changed. This saves me SO much time!

  24. David

    I loves this! I’m working on a tender project with over 300 suppliers, this will save me 2-3 days. 🙂

  25. Pingback: Consolidate Multiple Excel Files in SharePoint using PowerQuery | ExceletorbiExceletorbi

  26. Valeri

    When I go through Step 4 I get an Error in the column, saying that:

    An error occurred in the ‘fGetMyFiles’ query. Expression.Error: 2 arguments were passed to a function which expects 1.
    Details:
    Pattern=
    Arguments=List

    I cannot understand how I can get around this error. Any help would be appreciated.

  27. Valeri

    And when I input only one argument, I get this:

    Expression.Error: 1 arguments were passed to a function which expects 2.

    I am getting really desperate here….

  28. pradeep

    Hi,

    A quick help needed:

    I have One workbook and have 5 sheets of data.

    Each Sheet has data in a different RANGE. Eg: Sheet 1: Data is in D4 till L22 – all cells have data. Sheet2: Data is in B1 till N46 and all cells have data. Sheet3: P2 till H48 and all cells have data.

    How do i use Power query Or Power pivot to collate data into one view?

    TIA

  29. LT

    I have an interesting dilemma after applying this method to my files. The files I am using have row headers spread across a few rows. I ended up creating two functions: first to concatenate the header rows and second to pull the data for the columns. Then I append the queries and use the first row as column headers. This works well if the file is have the same number of columns and they do not change. But in reality, as you may have guessed, the columns in my files are all over the place.

    Is it possible to merge the two functions into one function?

    LT

  30. Omer Eški?

    Great help, but I need some more help because I have lots of similar files but with few different WorkSheet names, is it possible to put somehow OR operator in Source statement so that I could use few different WorkSheet names?

    Thanks a lot

  31. jaroszrek

    Hi Guys,

    heads-up – in below code part:

    “let
    Source = Excel.Workbook(File.Contents(FolderPath&”\”&Name))”

    had to change & to &”\”&

  32. Pingback: Consolidate Multiple Excel Files in SharePoint using PowerQuery - Excelerator BI

  33. Leah

    I am getting Token Comma expected error when using the Advance Editor and changing final part to

    in GetMyFiles

    When I click on the error it says the word ‘in’ is the problem.

    Help!

Leave a Reply

Your email address will not be published. Required fields are marked *