Cut the Size of Your Pivot Table Workbooks in Half

In honor of the upcoming 4th of July weekend, I’ve declared this week to be Pivot Table week. What do Pivot Tables have to do with Independence day? I’ll let you know when I find out.

In the meantime, are you ‘that guy’ who keeps clogging everyone’s email with 20 megabyte files.  I want to share a simple trick to cut the size of your Pivot Table workbooks in half.  

First a few words about the Pivot Cache:  When you initiate the creation of a pivot table report, Excel takes a snapshot of your dataset and stores it in a pivot Cache. A pivot Cache is nothing more than a special memory subsystem in which your data source is duplicated for quick access. That is to say, Excel literally makes a copy of your data, and then stores it in a cache that is attached to your workbook.

Each pivot table report you create from a separate data source will create its own pivot Cache that will increase your file size. The increase in file size depends on the size of the original data source that is being duplicated to create the pivot Cache.

Of course, the benefit you get from a pivot cache is optimization. Any changes you make to the pivot table report, such as rearranging fields, adding new fields, or hiding items, are made rapidly and with minimal overhead.

The down side of the pivot cache is that is basically doubles the size of your workbook. So every time you make a new pivot table from scratch, you essentially add to the file size of your workbook.

 

How to limit the size of your pivot table workbook:  
 
1. Copy and Paste instead of creating from scratch.  Sometimes you need to make multiple pivot tables from the same data source. Instead of making each pivot table from scratch (which adds to the file’s size) copy and paste the pivot table. When you copy an existing pivot table and paste it, you essentially create a pivot table that reads from the same pivot cache. That is, you create a new pivot table without creating another memory container to add to the file size.

Note that pivot tables which share the same pivot cache will also share calculated fields, calculated items, and groupings.

 
2. Delete your source data tab.
I often see workbooks that have a pivot table tab and a ‘Raw Data’ tab. If your workbooks have both, you’re wasting space, essentially distributing two copies of the same data.

You can delete your source data and your pivot table will function just fine.  After deleting the source data, saving will shrink the file. 

The only functionality you will lose is the ability to “Refresh” the pivot data, as the source data is not there. But DataPig, how do I get back to the raw data if I delete it?

Simple. Just double click the intersection of the row and column totals. This will create a new sheet with all the data currently contained inside the pivot cache.

 

3. Use a Duplicate Cache Finder.
For those of you who already have a workbook filled with pivot tables,  you can use a Duplicate Cache Finder to find and fix all duplicate caches.  Debra Dalgleish has a nifty workbook that will do just that:  PivotCacheFix.zip

On this 4th of July weekend, you should check out Debra’s site www.contextures.com. She’s Canadian, but she’s smart like an American.

47 thoughts on “Cut the Size of Your Pivot Table Workbooks in Half

  1. E

    To reduce the file size I uncheck ‘Save data with table layout’ in PivotTable Options. If I want the underlying data to be available the next time I open the spreadsheet I click the ‘Refresh on open’ box in PivotTable Options.

  2. jeffrey weir

    Or you can reduce the size of your cache by orders of magnitude by modifying your ‘get data’ query so that it only actually gets the data you need for the specific task.

    For instance, if you’re sucking every single line item from every single order for every single customer from a database to Excel, only to aggregate the data up to monthy totals across major product groups, then you’ve got MAJOR redundancy in your pivot cache.

    You should filter and aggregate at the database end wherever possible. An exception might be when you’re examining data with fresh eyes, and you’re just pulling in a whole heap of different fields to “see what you can see”. This is what I’ve been doing at the place I’m currently contracted to, because I’m trying to find the best metrics and the best patterns from scratch. But even then you should go back later and filter out the ‘boring bits’ and just keep the stuff you actually need in the cache.

  3. Tracy

    I’m a bit late reading this – but love the tip for deleting the source sheet. didn’t realize we could do that. thanks!

  4. stotsky

    I’ve got 6 Sheets in the Excel file and each of them has pivot table. The sourse of data for each of these pivot tables is the same table in MC Access. Here comes the qestion, how can I avoid duplication??
    Thank You.
    PS Sorry for my poor englis, I’m from Russia

  5. AndyA

    Hi there
    I was directed to your site thru the Excel G forum, i have a a workbook with two pivot tables – however I cannot find th PT1 ( i know it exists as the wizard for PT2 says Pivot table to in thenoptions dialogue box. I tried the Pivot Cache Fix but it did not work for me – is there anyother way of deleting a pivot table and its cache?

  6. datapig

    Andy: The PivotTable name in the Options dialog box is not a true index number. You very well could have only one pivot table named PivotTable2. In other words, the name field has nothing to do with the number of PivotTables that actually exist.

  7. Doran

    Is there a way to clear the pivot table cache ?

    I would be interested in clear the pivot table cache before a pivot table refresh, that way, I would be assured that the data received would be in datasource order, especially if using ‘none’ as your pivot field sort order.

    I am trying to avoid using the sort option, as it sometimes interferes with the grouping option (causes it to sort in correctly).

    Any suggestions on clearing the pivot table cache?

    Thanks in advance.

  8. deadeye

    You may want to reconsider deleting the data tab after creating a pivot table in Excel 2003 if you have cells with more than 255 characters.

    Excel will truncate the contents of the cached version to 255 characters. I don’t know if this is the case with newer versions of Excel.

  9. Johnathon Zircher

    They notice changes in the market trends and recreate their internet marketing tools as well as website content periodically. They put emphasis on the core areas of their business that bring the most benefits. A lot many customers come in to buy when they know that they get the best value for their money. Internet business marketing is also about establishing individual contacts with your prospects as well as existing customers.

  10. Jenny

    Hi, thanks for the video tutorial.

    Is there an add-in for excel 2007? If so, where can I download this from?

  11. jeff weir

    Hi Mike. Regarding your comment “The increase in file size depends on the size of the original data source that is being duplicated to create the pivot Cache.” do you know exactly how?

    Recently I’ve created a couple of pivot tables and noticed that their file sizes are much LESS than a spreadsheet containing just the raw data. For instance, if I create 3 complete rows of =randbetween(0,10000) in excel 2007, then here’s the resulting file size from those 3.14 million cells:
    raw data only 26.4 MB
    Pivot table only 12.6 MB
    Pivot table and data 39.1 MB

    This really surprises me. Do you know why this might be?

  12. datapig

    Jeff: The difference is formatting. In a pivot cache, only the raw text needs to be stored. In a spreadsheet, you have formatting which can take up space.

  13. jeff weir

    Thanks That’s a heck of a lot of unwanted formatting. I was naughty and cross-posted a question about this over at http://www.dailydoseofexcel.com/archives/2004/11/26/creating-a-simple-pivot-table/#comment-48929 (not because you’re not the font of all excel knowledge, mind! Dick thought the same thing, although I’ve just commented that there’s still some stuff puzzling about all this to me…check this out:
    On my home pc, using excel 2007 if I create 3 complete rows of =rand() then convert them to values, then:
    1: when I save the workbook, I get a file size of 50 MB.
    2. If I then create a pivot table on a second sheet from this data, but don’t explode it (i.e. don’t put anything in the data or column/row areas) I get 83.6 MB for pivot and data. So it seems the pivot alone adds 34 MB.
    3. If I then explode the pivot table (i.e. put all three columns in the row area) so that the pivot resembles the original data source, I get 163 MB for pivot and data, which is quite some increase. I’d expect to actually see 50 MB (raw data) plus 34 MB (Pivot Cache) plus perhaps another 50 MB MAX for the extra resource required to store the cell formatting info for the range used by the exploded pivot. So the most I’d expect this to take up would be 134 MB, but it runs to 163 MB.
    4. If I delete the raw data on Sheet 1, the file size drops 50 MB to 113 MB for the exploded pivot only, as you’d expect.
    5. If I drag the three columns out of the pivot, then I get 52 MB for the unexploded pivot – which is almost the same file size as for the data alone. (Note I deleted all cells below and to the right of the pivot in order to try to make sure excel’s used range was limited to the pivot only. Not sure that this worked, given this high file size) So my pivot only seems to have grown from a 34 MB footprint when created to 52 MB just because I exploded it then contracted it.
    6. If I reopen the file from (2), delete the data, and save the unexploded pivot, I get a file size of 34 MB…which was what I was expecting in step 5.

    It seems very odd that if you expand a pivot table out, then contract it, the file size doesn’t decrease accordingly.

    Maybe this means that if you’re playing around with a pivot that at some time you’ve exploded, then you might want to recreate your final product from scratch to cut down on some file size.

  14. General Ledger

    Excellent Article! I live and breathe pivot tables (PT) and did not understand why they made the files so big. Knowing there is a cache explains why you need to Refresh your PT when you change the source data.

    You suggested copying and pasting a PT, rather than creating a new PT, in order to keep the size down. I assume using the Excel Move or Copy sheet option (displayed after right clicking on the sheet tab) does exactly the same.

    Excel also provides a tip on saving space when creating more than one PT in the same file. If you start the PT wizard and select the same source data range as was used for another PT, Excel displays a message. It says you can save space by basing the new PT on the old PT data. You are offered a Yes or No option to do so.

    Please bring your road show to New Jersey so I can thank you in person for your outstanding contributions to the average guy.

  15. Why

    Will having multiple pivot tables point to the same pivot cache when connecting to an OLAP datasouce be a problem?

  16. SteveT

    Never thought you could delete the data. I will have to try it out on a large (30mb) pivot table file i have.

  17. Susan

    the delete source data doesn’t work. If I delete the tab containing this data, the file still stays the same size because it’s linking back to this source.

  18. Pablo

    So what happens if I :
    1. delete the datasource tab and then I need to update the data?
    this is beacuse I have a dynamic database that I add data to on a daily basis.
    2. delete the datasorce tab AND clean out & refresh the pivot cache using VB code everytime I make a database query fromm my dashboard output.
    In the event of “2.” does the pivot cache still contain all my data, pivot items, and calculated fields?

  19. Elle

    She’s a Canadian but smart like an American?????? Wow. You had my undivided interest all the way up to that ridiculous nationalist comment.

  20. jeff Weir

    You’re right, Elle. Good-natured ribbing has no place in an Excel blog.

    Oh well…on the upside, at least the good-natured ribbing didn’t take place until right at the end of this excellent post, so you might even have learnt something up until that final point where Mike lost your undivided attention.

  21. Robin

    Hi, my excel work book is now 163 meg big…. there are 21 pivots …. some bigger than others, there are quite a few data sheets as well ….. how on earth can i educe the size of the work book??? i have tried

    @E says:
    July 2, 2009 at 9:32 pm
    To reduce the file size I uncheck ‘Save data with table layout’ in PivotTable Options. If I want the underlying data to be available the next time I open the spreadsheet I click the ‘Refresh on open’ box in PivotTable Options.

    But it made very little difference to the size ….. I now have the problem that my laptop is battling to function with this size work sheet and if I open any other work sheets etc it crashes…. all advice welcome – PS dumb it down for me please – I only know the basics.

  22. jeff Weir

    @Robin –
    1. what type of file extension is it at the moment, and what version of Excel do you have?
    2. Do your pivots draw solely on data from within the workbook, or do they get their data from an external database?
    3. See my comment of July 4 2009 above. Best way to reduce your workbook size is to throw out data that you don’t desperately need.

    Once I hear back from you regarding the above, I can probably give you some pointers

  23. lazyvba

    Hello,

    Having the issue on regular basis of pivot table that I can’t send by email due to its size, or because all values can’t be shared, I wrote a macro that recreate a lighter version of the pivot, by recreating an aggregated source only containg visible fields and unfiltered values. That could help if you need to share your pivot, or if the performance is affected by the size:

    http://lazyvba.blogspot.co.uk/2012/08/little-big-pivot.html

    Try on a copy of your workbook, and let me know if it is working for you.

    Have a good day

  24. shirley

    What’s up to every body, it’s my first pay a quick visit of this web site;
    this website consists of remarkable and in fact fine stuff
    designed for readers.

  25. Atul

    Just copy and paste the coding in macro and see the magic it worked for me

    Sub PTReduceSize()
    Dim wks As Worksheet
    Dim PT As PivotTable

    For Each wks In ActiveWorkbook.Worksheets
    For Each PT In wks.PivotTables
    PT.RefreshTable
    PT.CacheIndex = 1
    PT.SaveData = False
    Next
    Next
    End Sub

  26. Casey

    Hi All
    I have a P&L constructed in a form of pivot table with a # of calculated fields and calculated items. When dragging the fields around in the pivot – usually 2 — it takes about 3-5 min to complete refreshing (because it seems to calculate everything, not jus the fields I am filtered on). Adding a 3rd field can take 20 + mins. How can I speed this process up? turning off option DEFER LAYOUT UPDATE in table options doesn’t work here….when I make the update with it off it still takes the same time. The source data has about 10,000 rows only..If anybody has an advice here I would really appreciate it!

  27. Bob

    Hello, thank you for the tips. I have many pivot tables. One at a time, I would like to save their settings/filters (not the data), so that I can recreate each of them on the fly, choosing from a drop down say. How can I do this in VBA? thanks!

  28. Saul Espinoza

    I have decided to keep my data source (beyond 150 rows and 25 columns – 25MB) in a separate Excel file and use Ms query to connect my new pivot table from a brand new Excel workbook externally to the data source and uncheckbox Save source data with file. From here I copy the same pivot table to different sheets for caculations and I use the columns and rows needed for my dashboard-report and I keep the new excel file about 500KB, and when new data is added to the data source excel file then i refresh the existing PT in my new workbook.

  29. Saul Espinoza

    Any one please opine on whether my approach is okay? Will any conflict surface soon?

  30. datapig Post author

    Saul: Very smart. Bringing the data directly into a pivot table greatly reduces the size of your workbook, as Excel doesn’t have to allocate space for the data or the formatting that goes with the data.

    This way, you have a very small workbook to distribute, while your working file can be as big as it needs to be.

    Great idea!

  31. Doug Glancy

    The part about one pivot cache each separate creation doesn’t sound right to me. I just created a dead simple source (three rows) in Excel 2010. Then I did Insert Pivot Table twice. Entering ? Thisworkbook.PivotCaches.Count in the Immediate Window yields 1.

  32. datapig Post author

    Doug: I believe Microsoft fixed this in Excel 2010. Excel no longer creates a new cache for every pivot table as long as the data you’re pointing to is used in an existing pivot.

  33. HowHardCanItBe

    datapig,
    Thanks for the great article, but you’ve ruined my social life. I can’t stop binge reading this site. This is worse than when I started watching Breaking Bad.

  34. Carlitos Barboza

    hehe Saul Espinoza, glad we met myfriend. If you get a notification of this reply, please send me an email ! take care man.

    Mike, you are the man! Greetings from Peru!

  35. Ian9090

    I managed to reduce a data+pivot table s/sheet recently from 30mb to about 2mb simply by first importing the data into access (this creates a large dbase you just keep somewhere on your drive) then linking a pivot to access – I was astonished at the compression – good tip if you want to really save space when sending pivots via email!

  36. Pingback: Pivot Cache in Excel - What Is It and How to Best Use It

  37. tessy

    Is there a way we can reduce the size of a excel data by filtering in pivot and further on the filtering only should show the filtered data .

Leave a Reply

Your email address will not be published. Required fields are marked *