By Melody Zacharias, ClearSight Solutions
It is time to play!
A good place to start is by downloading the Microsoft Azure Machine Learning Studio Capabilities Overview diagram. Keep it handy as you navigate your way around the Studio.
As mentioned in a previous post, you can try Azure ML for free! Sign in with your Microsoft account and you get 10 GB of storage to play with, plus access to scripts and predictive web services.
Now it is time to create and run an experiment.
Click +New in the ML Studio, select EXPERIMENT, then Blank Experiment. To run an experiment you will, of course, need some data. Microsoft provides several sample datasets for you to play with; you can find them in the palette on the left of the window. Drag a dataset onto the experiment canvas.
You can view the data by clicking the output port at the bottom of the dataset and choosing Visualize. Your dataset will likely contain some missing values, so it will need to be cleaned by removing any row that has one. If a column has a lot of missing data, you can remove that column first by adding the Project Columns module and choosing to “exclude” it by name. Then add the Clean Missing Data module to remove any row that still has missing data in it: in the Properties pane, choose Remove entire row under Cleaning mode. Double click the module and enter the comment, “Remove missing value rows”.
Then click RUN in the experiment canvas to complete the cleaning tasks.
Once the data has been cleaned, you can view it again or proceed to the next preparation step.
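The two cleaning steps above can be sketched in Python to show what they do to the rows. This is a minimal sketch, assuming each row is a dict and a missing value is represented as None; the column names and data are hypothetical, and it illustrates the idea behind the Project Columns and Clean Missing Data modules rather than how the Studio implements them.

```python
def exclude_columns(rows, excluded):
    """Drop the named columns from every row (the idea behind Project Columns with "exclude")."""
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


def remove_rows_with_missing(rows):
    """Keep only rows with no missing values (the idea behind "Remove entire row")."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Hypothetical data: "warranty" is mostly empty, so it is excluded before rows are cleaned.
rows = [
    {"price": 899, "speed_ghz": 2.4, "weight_kg": 1.3, "warranty": None},
    {"price": 1299, "speed_ghz": None, "weight_kg": 2.1, "warranty": None},
    {"price": 649, "speed_ghz": 1.8, "weight_kg": 1.6, "warranty": 2},
    {"price": 1099, "speed_ghz": 3.0, "weight_kg": 8.5, "warranty": None},
]

cleaned = remove_rows_with_missing(exclude_columns(rows, {"warranty"}))
print(len(cleaned))  # 3: only the row with a missing speed_ghz is removed
```

Excluding the sparse column first matters: had the rows been cleaned first, the mostly empty "warranty" column would have removed three of the four rows.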
Each measurable property in your dataset is a “feature” in machine learning. Each row of data might, for example, represent a computer, and each column a “feature” of that computer, such as price, speed and weight. When preparing your predictive model, some features will be more valuable than others. For example, the weight of a desktop computer likely plays very little role in the purchase decision. Since a feature like weight would only add noise to your model, you may as well exclude it.
Drag another Project Columns module onto the canvas and connect it to the left output port of the Clean Missing Data module. Double click the module and type “Select features for prediction”. Choose Launch column selector in the Properties pane. Choose No columns for Begin With, and then choose Include and column names in the filter row. Enter the names of the columns you want to include. Click the check mark at the bottom right to accept the input.
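Conceptually, the column selector set to No columns plus Include behaves like the function below. This is a minimal Python sketch with hypothetical column names and data. Note that the column you want to predict (price, in this sketch) has to be kept as well, because Train Model needs to select it later.

```python
def select_columns(rows, included):
    """Start from no columns and include only the named ones, in the order given."""
    return [{name: row[name] for name in included} for row in rows]


# Hypothetical cleaned rows; weight_kg adds little to the purchase decision, so leave it out.
rows = [
    {"price": 899, "speed_ghz": 2.4, "ram_gb": 8, "weight_kg": 1.3},
    {"price": 649, "speed_ghz": 1.8, "ram_gb": 4, "weight_kg": 1.6},
]

# Keep the label column (price) alongside the features.
selected = select_columns(rows, ["speed_ghz", "ram_gb", "price"])
print(selected[0])  # {'speed_ghz': 2.4, 'ram_gb': 8, 'price': 899}
```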
Now it is time to choose the learning algorithm that best suits your data and what you are trying to predict. If your prediction is one of a defined set of values, such as amount of RAM, you would use a classification module. If, on the other hand, your prediction comes from a continuous range of values, such as weight, you would use a regression module. Drag the module you choose onto the canvas.
Drag the Split Data module onto the canvas and connect it to the output of the Project Columns module. Split Data divides your cleaned data into two separate sets: you use one to train your model and the other to test it. If you set Fraction of rows in the first output dataset to 0.75, 75% of the rows go to the first (left) output and are used to train the model, and the remaining 25% go to the second (right) output and are used for testing. Run your experiment.
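If you are unsure which group to pick, a rough rule of thumb is to look at the values in the column you want to predict. The function below is a hypothetical heuristic, not something the Studio provides: it suggests classification when the label is non-numeric or has only a handful of distinct values, and regression otherwise. The threshold of 10 distinct values is an arbitrary assumption, and the question you are trying to answer should drive the final decision.

```python
def suggest_task(label_values, max_classes=10):
    """Hypothetical rule of thumb for choosing between classification and regression."""
    distinct = set(label_values)
    # bool is a subclass of int in Python, so treat True/False as categories.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    )
    if not numeric or len(distinct) <= max_classes:
        return "classification"
    return "regression"


ram_gb = [4, 8, 16, 8, 4, 32, 16]                             # a few fixed sizes
weight_kg = [round(1.0 + 0.17 * i, 2) for i in range(25)]     # many distinct values

print(suggest_task(ram_gb))     # classification
print(suggest_task(weight_kg))  # regression
```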
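What Split Data does with Fraction of rows in the first output dataset set to 0.75 can be sketched as a shuffled partition into two disjoint sets. This is a minimal Python sketch assuming a random split with a fixed seed; the function name is hypothetical, and the Studio module has its own options for how the split is made.

```python
import random


def split_rows(rows, fraction=0.75, seed=42):
    """Shuffle a copy of the rows, then cut it into two disjoint parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)  # number of rows in the first (training) output
    return shuffled[:cut], shuffled[cut:]


row_ids = list(range(100))
train, test = split_rows(row_ids, fraction=0.75)
print(len(train), len(test))  # 75 25
```

Every row lands in exactly one output, so the model is never tested on rows it was trained on.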
Drag and drop the Train Model module onto the canvas. Connect the left input to the output of your regression or classification module. Connect the right input port to the training data output (left port) of the Split Data module. In the Train Model module, choose Launch column selector in the Properties pane and then select the column you want your model to predict.
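Train Model combines an untrained algorithm with the training rows and the label column you selected. As a stand-in for whichever regression module you chose, the sketch below fits a straight line by ordinary least squares on a single feature. It is a minimal Python illustration with hypothetical column names and data, not the Studio's own implementation of any algorithm.

```python
def train_linear_regression(rows, feature, label):
    """Fit label = slope * feature + intercept by ordinary least squares (one feature)."""
    xs = [row[feature] for row in rows]
    ys = [row[label] for row in rows]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"feature": feature, "label": label, "slope": slope, "intercept": intercept}


# Hypothetical training output of Split Data.
training_rows = [
    {"speed_ghz": 1.5, "price": 500.0},
    {"speed_ghz": 2.0, "price": 700.0},
    {"speed_ghz": 2.5, "price": 900.0},
    {"speed_ghz": 3.0, "price": 1100.0},
]
model = train_linear_regression(training_rows, feature="speed_ghz", label="price")
print(model["slope"], model["intercept"])  # 400.0 -100.0
```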
Now that your model is trained, you need to find out how well it predicts the desired outcome. To do this, drag the Score Model module onto the canvas. Connect its left input port to the output of the Train Model module, and its right input port to the test data output (right port) of the Split Data module. To see your results, click the output port of the Score Model module and choose Visualize.
Your last step in preparing the experiment is to drag the Evaluate Model module onto the canvas and connect its left input port to the output of the Score Model module.
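Scoring applies the trained model to each held-out test row and appends the prediction next to the actual value. This minimal Python sketch assumes a hypothetical one-feature linear model (price = 400 * speed_ghz - 100) and hypothetical test rows; the `Scored Labels` column name is used for illustration only.

```python
def score(model, rows):
    """Return copies of the test rows with the model's prediction appended."""
    scored = []
    for row in rows:
        prediction = model["slope"] * row[model["feature"]] + model["intercept"]
        scored.append({**row, "Scored Labels": prediction})
    return scored


# Hypothetical trained model and test output of Split Data.
model = {"feature": "speed_ghz", "label": "price", "slope": 400.0, "intercept": -100.0}
test_rows = [{"speed_ghz": 1.8, "price": 640.0}, {"speed_ghz": 2.2, "price": 800.0}]

for row in score(model, test_rows):
    print(row["price"], round(row["Scored Labels"], 2))  # actual vs. predicted
```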
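For a regression model, Evaluate Model reports metrics such as mean absolute error, root mean squared error and the coefficient of determination. The sketch below computes those three from actual and predicted values using their standard definitions; the numbers are hypothetical, and the Studio reports further metrics that are not shown here.

```python
import math


def evaluate_regression(actual, predicted):
    """Standard regression metrics computed from paired actual/predicted values."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot                         # 1.0 means a perfect fit
    return {
        "mean_absolute_error": mae,
        "root_mean_squared_error": rmse,
        "coefficient_of_determination": r2,
    }


metrics = evaluate_regression(actual=[640.0, 800.0, 1000.0], predicted=[620.0, 780.0, 1040.0])
for name, value in metrics.items():
    print(name, round(value, 3))
```

Lower error values and a coefficient of determination closer to 1 indicate a better fit, which gives you something concrete to compare when you change features or algorithms.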
Now run the experiment. If you are not happy with the results, you can go back and select different features, adjust your learning algorithm's settings or switch to a different algorithm, or even add more algorithms to your experiment.
Play with your data until you find the experiment that is right for your data and what you are trying to predict. Once you have found the sweet spot, you can deploy it as a web service directly from the Studio.