By Melody Zacharias, ClearSight Solutions
To make it easy for you to work with Azure ML, Microsoft built the Azure Machine Learning Studio (ML Studio). This is a drag-and-drop environment where you go to build, test and run your predictive analytics.
The data can be loaded in one of several ways. One way is to use the Reader module. The Reader module allows quick and easy access to a variety of persistent storage locations, including your Azure SQL database.
The easier way, assuming your data is already available in a saved dataset, is to simply drag and drop it onto the experiment canvas. However, data is rarely clean and well defined enough to simply drag and drop. That is why Microsoft provides you with a Data Transformation tool. This tool makes it easy to clean, normalize, partition, or sample data. You can even use this tool to combine multiple datasets. Once your data is ready, you can take advantage of drag and drop.
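To make two of those transformations concrete, here is a minimal Python sketch of normalizing a column and partitioning rows into training and test sets. It is only an illustration of the ideas, not how ML Studio implements its modules; the function names, the 70/30 split, and the sample values are all assumptions made for this example.

```python
import random


def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def partition(rows, train_fraction=0.7, seed=0):
    """Shuffle rows and split them into a training set and a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample data, invented for illustration.
ages = [20, 30, 40, 60]
normalized_ages = min_max_normalize(ages)

train_rows, test_rows = partition(range(10), train_fraction=0.7)
```

With these made-up values, the ages become 0.0, 0.25, 0.5 and 1.0, and the ten rows are split seven to three.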
Once your data is in ML Studio, you can use the built-in tools to engineer the best predictive model. Normally, when building a model, you would include all the features you consider relevant and exclude those you consider irrelevant. While it is intuitive to include as much data, and as many datasets, as you perceive to be relevant, this doesn’t necessarily create the best predictive model.
All too often, the best solutions are counterintuitive. Using ML Studio, you can quickly and easily run a variety of experiments to find the counterintuitive correlations that give you the best predictive model.
Again, you simply drag and drop the analysis module that you want to run. With your data and analysis module connected on the canvas, you run the experiment in ML Studio. Once it has run, you can save the results, edit your experiment, and run a new experiment for comparison.
Using ML Studio, you can create multiple copies of your data. Then, using the Execute R Script module, you can create a variety of derived features to add to your base dataset. You can then choose the appropriate built-in algorithm to analyze each augmented dataset. If appropriate, you can adjust many of the parameters of the algorithm you choose.
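As a rough picture of what a derived feature is, the sketch below adds two computed columns to each row of a small dataset. It is written in plain Python rather than R, and it does not use any ML Studio API; the column names (price, sqft), the 2000 threshold, and the sample rows are hypothetical and chosen only for illustration.

```python
def add_derived_features(row):
    """Return a copy of the row with two derived features added."""
    augmented = dict(row)  # copy, so the base dataset stays unchanged
    # Ratio feature: combines two existing columns into one.
    augmented["price_per_sqft"] = row["price"] / row["sqft"]
    # Flag feature: turns a numeric column into a yes/no indicator.
    augmented["is_large"] = 1 if row["sqft"] > 2000 else 0
    return augmented


# Hypothetical base dataset, invented for illustration.
base_dataset = [
    {"price": 300000, "sqft": 1500},
    {"price": 500000, "sqft": 2500},
]

augmented_dataset = [add_derived_features(row) for row in base_dataset]
```

Keeping the base dataset untouched and building an augmented copy mirrors the idea of working on multiple copies of your data, so each set of derived features can be analyzed separately.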
Once you have run the algorithm, you can test the results against known outcomes from a separate dataset to choose which set of derived features leads to the best predictions. Using the built-in Score Model and Evaluate Model modules, you can quickly determine which data produced the most accurate predictions.
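The comparison itself can be sketched in a few lines of Python. This is a simplified stand-in for the idea, not a reproduction of what Score Model and Evaluate Model compute; the feature-set names, the predictions, and the known outcomes are made-up values, and plain accuracy is assumed as the only metric.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the known outcomes."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and the same length")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Known outcomes from a separate, held-out dataset (hypothetical values).
known_outcomes = [1, 0, 1, 1, 0]

# Predictions produced by models trained on different feature sets
# (hypothetical names and values).
predictions_by_feature_set = {
    "base": [1, 0, 0, 1, 1],
    "base_plus_ratio": [1, 0, 1, 1, 1],
}

scores = {
    name: accuracy(predicted, known_outcomes)
    for name, predicted in predictions_by_feature_set.items()
}
best_feature_set = max(scores, key=scores.get)
```

With these invented numbers, the base features score 0.6 and the augmented set scores 0.8, so the augmented set would be the one to keep.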
When you are happy with the results, ML Studio will help you publish your model as a web service so that others can use it.
There are plenty of sample datasets already loaded in ML Studio. There are also sample experiments that you can use as a template, or for learning.