Extracting raw data from a PowerPivot model using Python

前端 未结 3 1062
名媛妹妹
名媛妹妹 2021-01-31 20:53

What seemed like a trivial task turned into a real nightmare when I had to read in some data from a PowerPivot model using Python. I believe I\'ve researched this very well over

3条回答
  •  情深已故
    2021-01-31 21:19

    I got in touch with Tom Gleeson (aka Gobán Saor) who was kind enough to let me post his emails here. There are some interesting nuggets in them, so hopefully others will also find them useful.

    Email #1

    When you say Python, you mean running Python.NET as a standalone exe? If that’s the case, you’re out of luck with Excel PP models (different story for Power BI desktop though). I’ve accessed PP models (2010+) successfully from both VBA, and from Python.NET (via AMO) using similar code to that in your SO question. The difference being (in both VBA & .NET version) is that my code is running in-process within Excel using Excel’s various add-in technologies. (Likely Tableau is also running as an add-in or has embedded Excel within itself enabling similar behaviour). DAX Studio (a useful C# code base to learn the how-tos of PP access) runs both as an Excel add-in and as a standalone EXE, but only as an add-in can it access Excel based PP models.

    Email #2

    You might find the process of using Python.NET for this somewhat challenging. You would need to embed a Python engine using C#/VB.NET Excel add-in code. I’ve used Excel-DNA (a fantastic open source project) rather than MS’s highly cumbersome "official" method for developing such .NET addins in the past, but I mainly stick to VBA where at all possible.

    Using VBA you’ll not be able to access the .NET-only AMO (so no ability to create calculated columns on the fly), but by loading the resulting dataset into an ADO recordset you should be able to output to a worksheet OR to a corporate-database/MS Access OR to a flat-file/CSV etc.

    Unlike the 1M worksheet limit, for a flat-file or database output memory (RAM) will be the limiting factor, but, assuming you’re using 64bit Excel and have enough memory to hold the compacted model and the workspace for the largest of the model’s tables in un-compacted form (i.e. a row based rather than column based format that’ll result from a DAX Query), multiplied by 2ish (one instance within PP workspace the other within VBA’s ADO workspace) you should be okay.

    Having said that, I’ve never attempted extracting a very large dataset, and using models as a dataset exchange medium is not one of PP’s "use-cases"; so, very large tables might hit some other bug/constraint!

提交回复
热议问题