Power Query: Steroids for MS Excel and Power BI
In this article I want to talk about some of the features of the free and extremely useful, but as yet little-known MS Excel add-in called Power Query .
Power Query allows you to collect data from a variety of sources (such as csv, xls, json, text files, folders with these files, a variety of databases, various api like Facebook opengraph, Google Analytics, Yandex.Metrica, CallTouch and much more) , create repeatable sequences of processing this data and load them into Excel tables or the data model itself.
And under the cut you can find the details of all this splendor of possibilities.
Compatibility and technical details
Power Query is available for free for all versions of Windows Excel 2010, 2013 and is built-in by default in Windows Excel 2016. For MacOS X users, Power Query is not available (however, even without this, Mac Excel is disgusting to the touch and advanced users, including me, most often work with normal Excel through Parallels or running it on a remote Windows machine).
Also, Power Query is integrated into a new business intelligence product - Power BI , and there are also rumors that Power Query will appear in other Microsoft products. Those. Power Query has a bright future and it’s time for adherents of Microsoft technologies (and not only) to take up its development.
How does it work
After installing Power Query, a separate tab of the same name appears in the Excel 2010-2013 interface.
In the new Excel 2016, Power Query functionality is available on the Data tab, in the “Get & Transform” block.
First, in the Excel interface, we select a specific data source, from where we need to get it, and before us opens the window of Power Query itself with a preview of the first lines of the downloaded data (area 1). At the top of the window is a Ribbon with data processing commands (area 2). And on the right side of the screen (area 3) we have a panel with the sequence of all actions that apply to the data.
Power Query Features
Power Query has a lot of features and I want to dwell on some of my favorites.
As I wrote above, Power Query is remarkable in that it allows you to connect to a variety of data sources. So it allows you to load data from CSV, TXT, XML, json files. Moreover, the process of selecting options for downloading the same CSV files is more flexible and more convenient than it is implemented by regular Excel tools: the encoding is automatically often selected correctly and you can specify the column separator character.
Combining files in a folder
Power Query can take data from a specified folder and combine its contents into single tables. This can be useful, for example, if you periodically receive some specialized reports for a separate period of time, but the data for analysis is needed in a common table. GIF
For columns from text in Power Query, by clicking on the buttons on the Ribbon, you can apply functions such as:
- Divide the column by character or number of characters. And unlike Excel, you can specify the maximum number of columns, as well as the direction from which to read the characters - left, right.
- Change the case of cells in a column
- Count the number of characters in the cells of a column.
For columns with numerical values by clicking on the buttons on the Ribbon you can apply:
- Arithmetic operations
- Raise to degrees, calculate logarithms, factorials, roots
- Trigonometric operations
- Round to set values
- Determine parity, etc.
Functions for working with dates, time and duration
For columns with date and time values by clicking on the buttons on the Ribbon, you can apply:
- Automatically detect the format of the entered date (in excel, this is a big pain)
- Extract in one click the number of the month, day of the week, the number of days or hours in the period, etc.
Unpivot - Pivot
The Power Query interface has the “Unpivot” function, which in one click allows you to bring the data with one metric arranged in columns by periods to a form that will be convenient for use in pivot tables (I understand that it is difficult to write - see an example). Also, there is a function with the reverse action of Pivot. GIF
Operation Merge - Death of the VLOOKUP
The VLOOKUP function is one of the most used functions in MS Excel. It allows you to pull data into one table from another table with a single key. And just for this function in Power Query there is a much more convenient alternative - the Merge operation. With this operation, joining tables of several tables into one by key (by simple or compound key, when a match must be found by several columns) is performed in literally 7 mouse clicks without keyboard input.
The Merge operation is an analogue of join in sql, and it can be configured so that join is of different types - Inner (default), Left Outer, Right Outer, Full Outer.
Upd. I was prompted here that Power Query is not able to do Aproximate join, but can. True, out of the box there are no alternatives. GIF
Connection to various databases. Query Folding.
Power Query is also remarkable for being able to cling to a wide variety of databases - from MS SQL and MySQL to Postgres and HP Vertica. In this case, you do not even need to know SQL or another database language, because data preview is displayed in the Power Query interface and all those operations that are performed in the interface are transparently translated into the database query language.
And in Power Query there is the concept of Query Folding: if you are connected to a compatible database (currently it is MS SQl), then Power Query will try to perform heavy data processing operations on the server side and take only processed data to itself. This feature dramatically improves the performance of many treatments.
Programming Language “M”
Power Query add-on is an interpreter of a new, scripted, data-specialized programming language M.
For each action that we perform with data in the Power Query GUI, a new line of code is written to the script. Reflecting this, in a panel with a sequence of actions (area 3), a new step is created with a talking name. Due to this, using the panel with a sequence of actions, we can always see how our data looks at each processing step, we can add new steps, change the settings of the applied operation at a specific step, change their order or delete unnecessary steps. GIF
Also, we can always see and edit the code of the written script itself. And it will look something like this:
The M language, unfortunately, is neither like the formula language in Excel nor MDX and, fortunately, is not like Visual Basic. However, it is very easy to learn and offers tremendous opportunities for manipulating data that are not available using the graphical interface.
Downloading data from Yandex.Metrica, Google Analytics and other Api
Having mastered the “M” language a bit, I was able to write programs in Power Query that can connect to the Yandex.Metrica and Google Analytics APIs and pick up data with specified settings from there. The programs PQYandexMetrika and PQGoogleAnalytics I put in open source on github under the GPL license. I urge you to use. And I will be very happy if these programs will be finalized by enthusiasts.
For Google Analytics, there are quite a lot of exporters of this kind in different implementations, but for Yandex.Metrica, as far as I know, my exporter was the first publicly available, and even free :)
Power Query can generate headers for post and get requests and take data from the Internet. Thanks to this, with the right level of dexterity, Power Query can be connected to almost any API. In particular, for my research, I pull data on phone calls of clients from the CallTouch API, from the Rescuetime computer activity monitoring service API, I am engaged in parsing the web pages I need to retrieve relevant information.
Once again about repeatability and application options
As I wrote above, the Power Query script is a repeatable sequence of manipulations applied to data. This means that once you configure the processing you need, you can apply it to new files by changing just one step in the script - indicating the path to the new file. Thanks to this, you can get rid of a huge amount of routine and free up time for productive work - data analysis.
I do web analytics and contextual advertising. And it so happened that from the moment I met with Power Query in its interfaces I spend more time than in Excel itself. It's more comfortable for me. At the same time, my consumption of another great add-in in MS Excel, PowerPivot, also increased.
Here are some of the tasks I do using Power Query:
- I analyze the semantics for Thick projects,
- I make frequency dictionaries,
- I create web analytic dashboards and reports to analyze specific slices,
- Restoring goal achievement in web analytics systems, if they are not configured on the project,
- I smooth the probability forecast by the methods of Andrey Belousov (+ Bayes :),
- I do an audit of contextual advertising on data from K50 statistics,
- And many other different ad-hoc analysis tasks that need to be done only once
Here is the bi system that I talked about at Yac / M 2015 ( video ) I made completely with Power Query and loaded the data inside PowerPivot.
A few words about localization
On the Microsoft website for users from Russia, by default, Power Query with the interface translated into Russian is downloaded. Fortunately, the localizers did not get to the translation into Russian of the programming language (as it was done with the formula language in excel), but the lives of users with ambiguous translations made it very difficult. And I urge you to download, install and use the English version of Power Query. Believe me, it will be much clearer.
Official resources in English:
- Download English version of Power Query
- M Formula Reference
- Microsoft Technet Forum thread about Power Query
At the time of writing the article in Russian, there is practically no information on Power Query, and what I found is given below:
- Import data from tables into Google Spreadsheets
- Power BI Getting data from the REST API
- Excel, Power Pivot, and Power Query Facebook Advanced Community
- Updated page in Marketing-wiki about Power Query