Tuesday, June 5, 2018

Using Machine Learning

Data

Microsoft released a rather intriguing capability in their May Power BI Desktop update that allows users to quickly gain competitive intelligence and pricing information. The update is called Get Data from Web by Example and enables the Power BI Desktop user to “scrape” data from websites using machine learning. 

Once you provide a couple of examples by typing text from the web site, the Query Editor in Power BI reads the remainder of the web page and places the text in the specified columns. Then once you have the data in a table, you can then use the myriad of transformation capabilities in the Query Editor to further shape your data. The previous link provides a step-by-step example using the Microsoft Store.

In this blog post I highlight the capability to support a relatively common use case of gaining competitive intelligence and pricing information in the retail industry. One of our clients in the retail pet supplies industry wanted to gain pricing and product information from one of their largest online competitors. Although there are many third-party services and tools that can support this need, they were interested to learn if Power BI could provide them with a fast and easy way of obtaining current products and prices from this competitor.   

Our client provided me with a list of the top 10 pet care brands in which they were interested in gaining competitive pricing information. Using this new feature of Power BI, I created the following report containing information on the top 10 brands (1,611 individual products) offered by their competitor.   

Below is a step-by-step summary of how I accomplished this:

1. I performed a search on the competitor’s website for each brand.

2. I used the Get Data from Web capability to establish a base table by brand.

3. I duplicated the initial query for the base table and then extracted data from succeeding web pages by pasting the new link over the initial link in the Advanced Editor.

4. I then used the Append Query capability to combine all tables from the same brand together.

5. Next I used the replace values and split column functionality to create columns for Brand and Size.

6. Once I completed all 10 brands, I then appended all the base brand queries together into one Product table (see below) which I then applied to the Power BI Data Model.  

BI Product Table

7. I then created the report highlighted above and published it to the Power BI Service.

8. Since the Power BI Service does not support scheduled refresh for web sources, I used a third-party tool, Power Update, to keep the Power BI Desktop file updated with daily pricing and automatically publish a new version of the report each day.

Summary and Considerations

In just under two hours, I was able to create a report on competitive pricing for more than 1,600 retail products using Power BI, without writing any custom functions or code. It’s important to be aware that Get Data from Web by Example does not work on all web sites and some sites may block any automated web scraping solution. However, the speed of development, versatility in data shaping, and integration with Power BI makes this a very handy tool.  I am excited to see what future machine learning and artificial intelligence capabilities Microsoft will incorporate in Power BI.

Neal Levin is the Vice President of Business Intelligence & Analytics at Rightpoint. Follow Neal on Twitter and LinkedIn.