XaocCPS September 15, 2014 at 12:00

Introduction to machine learning and a quick start with Azure ML

Translation

This is a translation of an article by Rafal Lukawiecki of Project Botticelli Ltd, which offers online training and courses on various technologies, including machine learning, Power BI and others. The original article can be found at

The Azure Machine Learning service is currently in public preview and available to anyone with an Azure account (or at least trial access). If you are wondering why I have been so excited about this technology, see my overview article written a month ago, or read on: in this post I will tell you everything.

In short, to perform predictive analytics tasks with Azure Machine Learning, you just need to follow these steps:

1. Upload or import any current or historical data online (for example, your customers' demographics and their overall spending).
2. Build and validate a model (e.g. predict spending based on demographics).
3. Create a web service that uses your model to make fast, real-time predictions (e.g. decide which offers to present to a new customer based on their demographics).
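To make the three steps concrete, here is a toy sketch in plain Python. It is not Azure ML code: the customer data is invented, and a one-feature least-squares line stands in for whatever algorithm you would actually pick in ML Studio. It only shows the shape of the workflow: load data, fit on some rows and validate on held-out rows, then expose a scoring function of the kind a prediction web service wraps.

```python
# Toy illustration of the upload -> build/validate -> predict workflow.
# The data and the choice of a simple linear model are assumptions for illustration.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(model, x):
    a, b = model
    return a + b * x

def mean_absolute_error(model, xs, ys):
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)

# Step 1: "uploaded" data as hypothetical (age, yearly spend) pairs.
customers = [(22, 310.0), (25, 350.0), (31, 420.0), (38, 505.0),
             (45, 580.0), (52, 660.0), (60, 745.0), (67, 820.0)]
train, holdout = customers[:6], customers[6:]

# Step 2: build the model on the training rows, validate on the held-out rows.
model = fit_line([age for age, _ in train], [spend for _, spend in train])
mae = mean_absolute_error(model, [age for age, _ in holdout],
                          [spend for _, spend in holdout])

# Step 3: the function a prediction web service would call for a new customer.
def score_new_customer(age):
    return predict(model, age)

print(f"holdout MAE: {mae:.1f}, predicted spend at age 40: {score_new_customer(40):.1f}")
```

In Azure ML the same three stages are assembled visually in ML Studio rather than written by hand; the sketch just names what each stage does.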

The Azure ML service (also known as Project Passau) consists of two conceptual components, Experiments and Web Services, and one development tool, ML Studio. You can invite other people who have a Microsoft account (Live ID) to collaborate in your workspaces in ML Studio, and they do not even need to pay for an Azure subscription to work with you.

Experiments can be thought of as data-flow configurations describing what you would like to do with your data and your models. As an Azure ML data scientist, you focus on experiments and can spend all your time in ML Studio, just rebuilding experiments, changing parameters, algorithms and validation criteria, periodically making changes to the data, and so on. ML Studio is a web application that looks like the Azure Management Portal (as of this writing, mid-2014). The interface is clean and pleasant, and it works well not only in IE but also in Firefox and Chrome, albeit with some caveats; but this is only the first preview version.
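The data-flow idea can be sketched in a few lines of plain Python. This is only an analogy, not how ML Studio is implemented: each "module" (the names `drop_duplicates`, `make_cap` and `run_experiment` are hypothetical) is a function from a table to a table, an experiment is an ordered list of modules (the simplest case of the graph you draw in ML Studio), and "rebuilding" an experiment means changing one module's parameter and rerunning the same flow.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Module = Callable[[List[Row]], List[Row]]

def drop_duplicates(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of each identical row."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def make_cap(column: str, upper: float) -> Module:
    """Build a module that caps one column at `upper` (the parameter you tweak)."""
    def cap(rows: List[Row]) -> List[Row]:
        return [{**row, column: min(row[column], upper)} for row in rows]
    return cap

def run_experiment(modules: List[Module], rows: List[Row]) -> List[Row]:
    """Push the data through each module in order, like connected tasks."""
    for module in modules:
        rows = module(rows)
    return rows

data = [{"age": 30, "spend": 500}, {"age": 30, "spend": 500}, {"age": 45, "spend": 2500}]
result = run_experiment([drop_duplicates, make_cap("spend", 1000)], data)

# "Rebuilding" the experiment: same flow, one parameter changed, rerun.
result_v2 = run_experiment([drop_duplicates, make_cap("spend", 2000)], data)
```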

ML Studio is where you start your work, deciding which data sources you want to use: data sets you have uploaded, or live data available through the Reader module from a web page, OData, SQL Azure, Azure Tables, Hive or Azure blobs. Then you may need to perform some Data Transformations, such as grouping, renaming columns, joining, removing duplicates, or the very useful binning (discretisation) operation. You can also take advantage of other, more interesting transformations, for example Finite and Infinite Impulse Response (FIR and IIR) filters, which are used in signal processing. They can also be applied more broadly to economic data, which can be viewed as complex waves (especially time series). This is part of the work of identifying seasonality, which often involves finding frequencies, somewhat like musical ones, within those seasonal patterns. In addition, if you are just starting your project and are not quite sure which data columns to include, the Feature Selection filters may be useful, offering a good choice of correlation measures. In practice, however, in later stages you will want to specify the set of columns manually to achieve maximum accuracy.
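Several of the transformations mentioned above can be illustrated with textbook versions in plain Python. These are not Azure ML's own implementations, whose exact options may differ: equal-width binning is one common discretisation scheme, a moving average is the simplest FIR filter (output depends only on a finite window of inputs), exponential smoothing is the simplest IIR filter (output feeds back into itself, so every past input keeps a decaying influence), and ranking columns by absolute Pearson correlation stands in for correlation-based feature selection. All function names are hypothetical.

```python
import math

def equal_width_bins(values, n_bins):
    """Discretise values into n_bins equal-width bins, returning bin indices."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def fir_moving_average(signal, taps):
    """FIR filter: each output is the equal-weight mean of the last `taps` inputs."""
    return [sum(signal[i - taps + 1 : i + 1]) / taps
            for i in range(taps - 1, len(signal))]

def iir_exponential_smoothing(signal, alpha):
    """IIR filter: y[n] = alpha*x[n] + (1 - alpha)*y[n-1], starting from y[0] = x[0]."""
    out = [signal[0]]
    for x in signal[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(columns, target):
    """Order column names by absolute correlation with the target, strongest first."""
    return sorted(columns, key=lambda name: abs(pearson(columns[name], target)),
                  reverse=True)

bins = equal_width_bins([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3)
smoothed_fir = fir_moving_average([1, 2, 3, 4, 5], 3)
smoothed_iir = iir_exponential_smoothing([10, 0, 0], 0.5)
ranking = rank_features({"other": [3, 1, 4, 1], "double": [1, 2, 3, 4]}, [2, 4, 6, 8])
```

As the text warns, a correlation ranking is only a starting point; it is not a substitute for choosing columns deliberately later on.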

[Figure: Azure ML machine learning tasks]

Now we move on to what we have been waiting for: real machine learning, which means Initializing (defining) the model, Training it on some data, Evaluating its performance and validity and, if everything is OK, Scoring it (making predictions with it). Azure ML offers many algorithms for classification tasks, including Multiclass and Two-Class Decision Forests, Decision Jungles (developed by Microsoft Research), Logistic Regression and Neural Networks, as well as the Two-Class Averaged Perceptron, Bayes Point Machine, Boosted Decision Trees and Support Vector Machines (SVM). Clustering uses a variation of the standard K-Means approach. Regression algorithms include Bayesian Linear, Boosted Decision Tree, Decision Forest and, of course, Linear Regression, as well as Neural Network, Ordinal and Poisson Regression. And this is only version 1.
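The Initialize / Train / Score / Evaluate sequence can be sketched with one of the algorithms listed above, the two-class averaged perceptron, in its textbook form. This is not Azure ML's implementation (its options and internals are not described here); the functions `initialize`, `train`, `score` and `evaluate` and the tiny data set are hypothetical. The "averaged" part means the final weights are the mean of the weight vector over every training step, which tends to be more stable than the last weight vector alone.

```python
def initialize(n_features):
    """'Initialize Model': all-zero weights and bias."""
    return {"w": [0.0] * n_features, "b": 0.0}

def train(model, X, y, epochs=10):
    """Averaged perceptron training; labels must be -1 or +1."""
    w, b = list(model["w"]), model["b"]
    w_sum, b_sum, steps = [0.0] * len(w), 0.0, 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified: nudge weights toward yi
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
            # Accumulate after every example, mistake or not, for the average.
            w_sum = [s + wj for s, wj in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return {"w": [s / steps for s in w_sum], "b": b_sum / steps}

def score(model, X):
    """'Score Model': predict +1 or -1 for each row."""
    return [1 if sum(wj * xj for wj, xj in zip(model["w"], xi)) + model["b"] > 0 else -1
            for xi in X]

def evaluate(predicted, actual):
    """'Evaluate Model': plain accuracy."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

X_train = [(2, 3), (3, 3), (3, 4), (4, 5), (-1, -2), (-2, -1), (-3, -3), (-2, -4)]
y_train = [1, 1, 1, 1, -1, -1, -1, -1]
X_test = [(5, 5), (-4, -2), (1, 2), (-1, -3)]
y_test = [1, -1, 1, -1]

trained = train(initialize(2), X_train, y_train)
predictions = score(trained, X_test)
accuracy = evaluate(predictions, y_test)
```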

You can also use a range of statistical functions in your experiments, including common elementary ones such as calculating deviations. Try it yourself: start by applying the Descriptive Statistics task to your data and Visualise the results (use the output connection point on the task). Enjoy the boxplots in your visualizations, something that has long been missing from all Microsoft BI tools, even Excel...
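For reference, here is what a boxplot summarises, computed with Python's standard `statistics` module. The quartile convention (`method="inclusive"`) and the 1.5 × IQR outlier rule are common choices but assumptions here; Azure ML's Descriptive Statistics task may use a different quartile method, so numbers can differ slightly on small samples. The `describe` function and the sample values are hypothetical.

```python
import statistics

def describe(values):
    """Basic descriptive statistics plus the numbers a boxplot draws."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        # Whiskers reach the most extreme values that are not outliers.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

summary = describe([2, 4, 4, 5, 5, 6, 7, 9, 30])
print(summary)
```

Here the value 30 falls above the upper fence (Q3 + 1.5 × IQR = 11.5), so a boxplot would draw it as a separate point rather than stretching the whisker to it.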

One cool example of how Azure ML brings external research into your experiments can be found in the Text Analytics task section. The Named Entity Recognition task lets you process input text (called stories, for example emails, typed descriptions of situations, or tweets), extract named entities from it and automatically classify them as People, Locations or Organizations. There is also support for the Vowpal Wabbit project, backed by Yahoo and Microsoft Research, which you can use to generate hashes for entities on demand. I expect more tools and capabilities in this area in the future, since Microsoft obviously has a huge amount of knowledge stored inside Bing.
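"Getting hashes for entities" refers to the hashing trick that Vowpal Wabbit is known for: instead of keeping a dictionary of every word, each token is hashed straight into one of a fixed number of feature slots. The sketch below shows the general idea only. Vowpal Wabbit uses its own hash function, so the indices here will not match VW's; MD5 is used just because, unlike Python's built-in `hash`, it is stable across runs. The function name and bucket count are hypothetical.

```python
import hashlib
import re

def hash_features(text, n_buckets=2 ** 10):
    """Map each token to a bucket index and count occurrences per bucket."""
    counts = {}
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % n_buckets
        counts[index] = counts.get(index, 0) + 1
    return counts

vector = hash_features("Seattle seattle rain", n_buckets=16)
print(vector)
```

The trade-off is that different tokens can collide in the same bucket; a larger `n_buckets` makes collisions rarer at the cost of a wider (but still sparse) feature vector.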

Deep R language support

And on top of that, you can use R inside Azure ML. By my count, Azure ML today contains about 410 preinstalled packages on top of R 3.1.0 (surprisingly, the latest version). Among them are ggplot2 (yes!), plyr and dplyr, car, datasets, Hmisc, MASS and all the other most commonly used data mining packages, such as rpart, nnet, survival, boot and so on.

[Figure: which R packages come with Azure ML]

If you want to see the list of packages included in Azure ML, just create a small experiment like mine, shown here, execute some R code and save the resulting CSV to your computer. Column 1 will list all the included packages.
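Once the CSV is on your computer, checking it for the packages you care about is easy to script. A minimal sketch, assuming (as described above) that column 1 holds the package names and that the first row is a header; the file name `packages.csv` and the function names are hypothetical, and the sample rows below are made up for the example.

```python
import csv
import io

def packages_from_csv(csv_file, has_header=True):
    """Return the set of names found in column 1 of the downloaded CSV."""
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row, if the export has one
    return {row[0].strip() for row in reader if row}

def missing_packages(wanted, available):
    """Packages you want that are not in the preinstalled set."""
    return sorted(set(wanted) - available)

# Stand-in for: with open("packages.csv", newline="") as f: ...
sample = io.StringIO('"Package"\n"ggplot2"\n"plyr"\n"MASS"\n')
available = packages_from_csv(sample)
missing = missing_packages(["ggplot2", "ROCR", "nleqslv"], available)
print(missing)
```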

What if your favorite R package (e.g. ROCR or nleqslv) is not listed? The documentation may actually confuse you. It says that “currently” there is no way to install your own packages, but then it describes a workaround that lets you use your package via a zip file. You can find a description of this approach at the bottom of this link, which shows how to call install.packages() with a reference to the file passed to the Execute R Script task.

In my opinion, the key to understanding why it matters that R is part of Azure ML is not only that the platform gives you access to the de facto lingua franca of statistics and analytics, but also how fast and painless it makes processing your data. This is especially noticeable given that R itself is not a particularly convenient tool for data manipulation. So instead of using the venerable RODBC (included) inside your R script, consider using Azure ML for all the heavy data processing (sorry, plyr fans) and passing the data to the R script as an Azure ML Dataset Data Table, which becomes available as a native R data frame. The data will magically appear inside your script as an object called dataset. You can connect multiple data sources.

I have not finished my performance tests yet, but anything that improves R's performance on large amounts of data can only be warmly welcomed. These features also look like an obvious advantage of a cloud provider over a conventional boxed product. And I imagine Microsoft uses a number of tricks to boost performance when Azure datasets are linked to the Azure ML service, even bearing in mind the current 10GB limit.

[Figure: Azure ML API]

With or without R, you can have a working experiment that you use as a building block inside your web application. Imagine that you have just built a recommendation system. In Azure ML terms, you have an experiment that uses the Scoring (prediction) task. You decide which of the input ports should be used as the Publish Input for your web service and, accordingly, which should be the Publish Output. They are shown as small green and blue markers on the outline of the task. You run your experiment once more and use ML Studio to publish it as an Azure ML Web Service. Now you can consume the results through the Azure ML REST API, as a simple web service or an OData endpoint. This API offers a Request Response Service (RRS) for synchronous, low-latency predictions, and an asynchronous Batch Execution Service (BES) for retraining the model, say, with your future fresh data. The API also comes with automatically generated sample code that you can simply copy and paste into a Python, R or C# application, or use anywhere else, since it is all just based on REST and JSON.
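Because RRS is plain REST and JSON, a call can be built with nothing but Python's standard library. Treat this strictly as a sketch of the shape of such a call: the request body layout (`Inputs` / `input1` / `ColumnNames` / `Values`, `GlobalParameters`) and the `Authorization: Bearer <key>` header follow the style of Azure ML Studio's generated sample code as I understand it, but the format has changed between API versions, so copy the exact URL and body from your own service's API help page. The URL, key, and column names below are hypothetical placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request

def build_rrs_request(service_url, api_key, column_names, rows):
    """Build (but do not send) a synchronous RRS scoring request."""
    payload = {
        "Inputs": {
            # "input1" and this nested layout are assumptions; check your
            # service's API help page for the exact shape it expects.
            "input1": {"ColumnNames": column_names, "Values": rows},
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        service_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The API access key from the web service dashboard.
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )

def call_rrs(request):
    """Send the request and decode the JSON response (needs a live service)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

request = build_rrs_request(
    "https://example.invalid/execute",  # hypothetical endpoint URL
    "hypothetical-api-key",
    ["age", "income"],
    [["34", "52000"]],
)
```

Keeping request construction separate from sending makes it easy to inspect the payload, or to compare it against the generated sample code, before calling a live endpoint with `call_rrs(request)`.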

[Figure: testing a prediction]

There is a handy little test page that lets you enter the required values for a freshly published service and make a test prediction.

The service also has additional features intended for production use, for example, preventing Microsoft from automatically updating any of the components (tasks, etc.) of your experiment, since changes to them could alter or even break your work. That is the right call, Microsoft: unexpected changes are something no IT professional who supports web systems likes to face. You can test service updates in staging and configure security through the API access key.

Cost

How much does it all cost? Bearing in mind that this is preview pricing, it looks very attractive. There are two kinds of charges, per hour of active compute and per web-service API call, and both are prorated. The hourly rate is lower while you work in ML Studio ($0.38/hour) and somewhat higher in production through the ML API Service ($0.75/hour). API calls are not charged while you work in ML Studio, and cost $0.18 per 1,000 predictions in production. In any case, this is an interesting and extremely simple pricing model, unlike some others Microsoft has had. I am very curious to know what my developer clients think of the great opportunity to effectively resell Azure ML as part of their own web applications, with minimal support effort and without having to build the entire system themselves.
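As a rough worked example of this pricing model, here is a small calculator. The three rates are the 2014 preview prices quoted above (long since superseded), the usage figures are invented, and it assumes, as the text says, that both charges are prorated, so fractional hours and partial thousands of calls are billed proportionally. The function name is hypothetical.

```python
# Preview-era rates quoted in the article (USD).
STUDIO_PER_HOUR = 0.38
PRODUCTION_PER_HOUR = 0.75
PER_1000_PREDICTIONS = 0.18

def estimate_cost(studio_hours, production_hours, predictions):
    """Prorated estimate: compute hours plus production API calls.

    API calls made from ML Studio are free, so only production
    predictions are passed in.
    """
    compute = studio_hours * STUDIO_PER_HOUR + production_hours * PRODUCTION_PER_HOUR
    api_calls = predictions / 1000 * PER_1000_PREDICTIONS
    return round(compute + api_calls, 2)

# Hypothetical month: 20 h in Studio, 100 h of production compute, 250,000 predictions.
total = estimate_cost(20, 100, 250_000)
print(total)
```

For this hypothetical month the estimate is $7.60 + $75.00 + $45.00 = $127.60, and at volume the per-call charge quickly becomes a significant share of the bill.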

Where to begin?

Visit azure.microsoft.com, subscribe, and create a workspace under New / Data Services / Machine Learning. Then go to the Dashboard and click the Sign-in to ML Studio link. After reviewing the tasks that make up an Experiment, I would advise you to pick one of the many samples, make a copy of it and run it. If it works, follow the steps above to publish it as your first prediction web service.

Of course, make sure you do not miss our upcoming videos and articles on this topic: become a member of the site to receive our information-packed newsletter. If you want to get started quickly, take a look at our Data Mining Training, especially the modules dedicated to data preparation, since those concepts, particularly cases and input and output columns, will definitely come in handy when working with Azure ML.

Enjoy learning machine learning!

Announcement! All readers of this translated article are offered a 10% discount on the author's courses with the code MSBI2014RU. The discount is valid until the end of October 2014.

Links

Try Azure for 30 days for free


Are you interested in Machine Learning?

Yes, I use it in practice: 24.2% (60 votes)
Yes, I want to get acquainted with it to solve problems in the future: 59.9% (148 votes)
Yes, it's just interesting: 32.7% (81 votes)
No: 2% (5 votes)
