Machine Learning Requires Multiple Steps

Article By : M. Tim Jones, Mouser Electronics Inc.

Deploying machine learning is a multi-step process. This article discusses the steps and breaks them down for a better understanding of machine learning models.

Deploying machine learning is a multi-step process. It involves selecting a model, training it for a specific task, validating it with test data, and then deploying and monitoring the model in production. Here, we’ll discuss these steps and break them down to introduce you to machine learning.

Machine learning refers to systems that, without explicit instruction, are capable of learning and improving. These systems learn from data to perform a particular task or function. In some cases, learning, or more specifically training, occurs in a supervised manner, where incorrect outputs result in an adjustment of the model to nudge it toward the correct output. In other cases, learning is unsupervised, and the system organizes the data to reveal previously unknown patterns. Most machine-learning models follow one of these two paradigms (supervised vs. unsupervised learning).
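To make the supervised case concrete, here is a minimal sketch of a single perceptron learning the logical AND function. The function names, learning rate, and toy dataset are illustrative assumptions rather than part of any particular framework; the point is only that the model changes when its output is wrong.

```python
def train_perceptron(samples, epochs=50, lr=0.1):
    """Train one perceptron on (inputs, label) pairs with labels 0 or 1."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1 if activation > 0 else 0
            # error is zero when the output is correct, so the update below
            # only changes the model after an incorrect output.
            error = label - prediction
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, inputs):
    """Apply the trained weights to a new input."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


# Toy labelled dataset (an assumption for illustration): logical AND
AND_DATA = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(AND_DATA)
```

After training, calling predict(weights, bias, [1, 1]) should return 1, while the other three input pairs should return 0.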

Let’s now dig into what is meant by a “model” and then explore how data becomes the fuel for machine learning.

Machine-Learning Model

A model is an abstraction of a solution for machine learning. The model defines the architecture which, once trained, becomes an implementation. Therefore, we don’t deploy models; we deploy implementations of models that are trained from data (more on this in the next section). So models plus data plus training equal instances of machine-learning solutions (Figure 1).

Figure 1 From Machine Learning Model to Solution.
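The distinction between a model and its trained implementation can be sketched in a few lines. The class below is a hypothetical illustration, not any framework's API: the class itself is the architecture (a straight line, y = w * x + b), and only after training on data does it hold the values that make it a usable solution. Ordinary least squares stands in for training here.

```python
class LinearModel:
    """The model: an architecture y = w * x + b with no learned values yet."""

    def __init__(self):
        self.w = None
        self.b = None

    def train(self, xs, ys):
        """Fit w and b to the data by ordinary least squares."""
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var = sum((x - mean_x) ** 2 for x in xs)
        self.w = cov / var
        self.b = mean_y - self.w * mean_x
        return self

    def predict(self, x):
        if self.w is None:
            raise RuntimeError("model has not been trained")
        return self.w * x + self.b


# Model + data + training = a solution we can actually deploy
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
solution = LinearModel().train(xs, ys)
```

An untrained LinearModel() raises an error on predict, whereas solution.predict(4.0) returns a value because it carries parameters learned from the data.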

Machine-learning solutions represent a system. They accept inputs, perform computation of different types within the network and then provide an output. The input and output represent numerical data which means that, in some cases, translation is required. For example, feeding text data into a deep-learning network requires an encoding of words into a numerical form that is commonly a high-dimensional vector given the variety of words that could be used. Similarly, outputs might require translation from a numerical form back into a textual form.
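As a rough illustration of this translation step, the sketch below encodes words as one-hot vectors and decodes them back into text. The helper names and the two-sentence corpus are assumptions made for the example. Production systems typically use denser, learned encodings, but the vector length here still grows with the vocabulary, which is why these vectors become high-dimensional.

```python
def build_vocab(sentences):
    """Assign each distinct word an integer index in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def one_hot(word, vocab):
    """Vector with one slot per vocabulary word and a single 1."""
    vector = [0] * len(vocab)
    vector[vocab[word]] = 1
    return vector


def encode(sentence, vocab):
    """Text in, list of numerical vectors out."""
    return [one_hot(word, vocab) for word in sentence.lower().split()]


def decode(vectors, vocab):
    """Numerical vectors in, text out (picks the largest entry of each vector)."""
    index_to_word = {index: word for word, index in vocab.items()}
    return " ".join(index_to_word[v.index(max(v))] for v in vectors)


vocab = build_vocab(["the cat sat", "the dog sat"])
encoded = encode("the dog sat", vocab)
decoded = decode(encoded, vocab)
```

With this four-word vocabulary each word becomes a four-element vector; a realistic vocabulary of tens of thousands of words would produce vectors of that length.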

Machine-learning models come in many types, such as neural network models, Bayesian models, regression models, clustering models, and many more. The model that you choose is based upon the problem at hand.


In the context of neural networks, models range from shallow multi-layer networks to deep neural networks that include many layers of specialized neurons (processing units). Deep neural networks also have a range of models available based upon your target application. For example:

  • If your application is focused on identifying objects within images, then the Convolutional Neural Network (CNN) is an ideal model. CNNs have been applied to skin-cancer detection, where they have been reported to outperform the average dermatologist.
  • If your application involves predicting or generating complex sequences (such as human language sentences), then Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks are ideal models. LSTMs have also been applied to machine translation of human languages.
  • If your application involves describing the contents of an image in human language, then a combination of a CNN and an LSTM can be used (where the image is fed into the CNN and the output of the CNN represents the input to the LSTM, which emits the word sequences).
  • If your application involves the generation of realistic images (such as landscapes or faces), then a Generative Adversarial Network (GAN) represents the current state-of-the-art model.
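To give a feel for what a CNN layer computes when it looks for objects in an image, here is a minimal sketch of a single convolution followed by a ReLU activation, written in plain Python. The kernel is hand-picked for illustration (an assumption; in a real CNN the kernel values are learned during training), and the tiny image is made up.

```python
def convolve2d(image, kernel):
    """Valid (no padding) sliding-window product, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses, zero out the rest."""
    return [[max(0, value) for value in row] for row in feature_map]


# A 4x4 image that is dark on the left and bright on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to a dark-to-bright vertical edge
VERTICAL_EDGE = [[-1, 1], [-1, 1]]
feature_map = relu(convolve2d(image, VERTICAL_EDGE))
```

The resulting feature map is non-zero only in the middle column, where the edge sits. Stacking many such layers, each working on the previous layer's feature maps, is what lets a deep network respond to progressively more complex shapes.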

These models represent some of the more popular deep neural network architectures in use today. Deep neural networks are popular because they can accept unstructured data such as images, video, or audio information. The layers within the network construct a hierarchy of features that allows the network to classify very complex information. Deep neural networks have demonstrated state-of-the-art performance over a wide range of problem domains. But like other machine-learning models, their accuracy is dependent upon data. Let’s explore this aspect next.

Data and Training

Data is the fuel that drives machine learning, not just in operation but also in the construction of a machine-learning solution through model training. When assembling training data for deep neural networks, it’s important to consider both quantity and quality.

Deep neural networks require large amounts of data for training; one rule of thumb for image-based classification is 1,000 images per class. The right number, however, depends upon the complexity of the model and the tolerance for error. Examples from production machine-learning solutions show a spectrum of dataset sizes. A facial detection and recognition system required 450,000 images, and a question-and-answer chat-bot was trained with 200,000 questions paired with 2 million answers. Smaller datasets can also suffice, depending upon the problem being solved. A sentiment analysis solution (which determines the polarity of opinion from written text) required only tens of thousands of samples.

The quality of the data is just as important as the quantity. Given the large datasets required for training, even small amounts of erroneous training data can lead to a poor solution. Depending upon the type of data necessary, your data might go through a cleansing process. This ensures that the dataset is consistent, accurate, and complete, with no duplicate, invalid, or incomplete records. Tools exist that can support this process. Validating data for bias is also important to ensure that the data does not lead to a biased machine-learning solution.
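A cleansing pass can be as simple as the sketch below, which drops incomplete records, normalizes text for consistency, and removes duplicates. The record layout (a text field and a label field) and the function names are assumptions for illustration. The label count at the end is only a crude first look at balance, not a full bias audit.

```python
from collections import Counter


def clean_records(records, required_fields):
    """Drop incomplete records, normalize values, and remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        # Completeness: skip records with a missing or empty required field
        if any(record.get(field) in (None, "") for field in required_fields):
            continue
        # Consistency: trim whitespace and lowercase every required field
        normalized = {
            field: str(record[field]).strip().lower() for field in required_fields
        }
        # Duplicates: keep only the first record with a given set of values
        key = tuple(normalized[field] for field in required_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    return cleaned


def label_counts(records, label_field="label"):
    """Count samples per label as a rough check for class imbalance."""
    return Counter(record[label_field] for record in records)


# Made-up sentiment samples for illustration
raw = [
    {"text": "Great product", "label": "positive"},
    {"text": "great product ", "label": "Positive"},   # duplicate once normalized
    {"text": "", "label": "negative"},                 # incomplete: empty text
    {"text": "Broke in a week", "label": None},        # incomplete: no label
    {"text": "Broke in a week", "label": "negative"},
]
cleaned = clean_records(raw, ["text", "label"])
counts = label_counts(cleaned)
```

Of the five raw records, two survive: one positive and one negative sample.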

Machine-learning training operates on numerical data, so a pre-processing step can be required depending upon your solution. For example, if your data is human language, it must first be translated into a numerical form before it can be processed. Images can be pre-processed for consistency. For example, images fed into a deep neural network would be resized and smoothed to remove noise (among other operations).
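The image case might look like the following sketch, which resizes a grayscale image with nearest-neighbor sampling, smooths it with a 3x3 mean filter, and scales pixel values into the range 0 to 1. These particular choices are assumptions for illustration; image libraries offer more capable versions of each step.

```python
def resize_nearest(image, new_h, new_w):
    """Resize a 2D grayscale image by nearest-neighbor sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def smooth(image):
    """3x3 mean filter; border pixels average over the neighbors that exist."""
    h, w = len(image), len(image[0])
    smoothed = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            row.append(sum(window) / len(window))
        smoothed.append(row)
    return smoothed


def preprocess(image, size):
    """Resize to size x size, smooth, then scale 0..255 pixels to 0..1."""
    resized = resize_nearest(image, size, size)
    return [[value / 255.0 for value in row] for row in smooth(resized)]


# A single bright "noise" pixel on a dark background
noisy = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
denoised = smooth(noisy)
```

The mean filter spreads the lone bright pixel across its neighbors, so its value falls from 255 to 255/9, which is the noise-reduction effect described above.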

One of the biggest problems in machine learning is acquiring a dataset to train your machine-learning solution. Depending upon your problem, this could be the largest part of the effort, since a suitable dataset might not exist and capturing one could require a separate project.

Finally, the dataset should be segmented between training data and test data. The training portion is used to train the model, and once trained, the test data is used to validate the accuracy of the solution (Figure 2).

Figure 2 Dataset Splitting for Training and Validation.
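A minimal sketch of this split, using only the Python standard library, is shown below. The function name, the 80/20 ratio, and the fixed seed are assumptions chosen for illustration.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and split it into (train, test) lists."""
    shuffled = list(samples)
    # A seeded generator makes the split reproducible between runs
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


dataset = list(range(100))  # stand-in for 100 labelled samples
train, test = split_dataset(dataset)
```

Shuffling before splitting matters when the dataset is ordered, for example by class or by collection date, because otherwise the test portion may not resemble the training portion.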

Tools exist to accomplish this process and most frameworks include “split” functions to segregate training and test data. Let’s now explore some of the frameworks that simplify the construction of machine-learning solutions.

Framework

It’s no longer necessary to build your machine-learning model from the ground up. Instead, you can rely on a framework that includes these models and other tools to prepare data and validate your solution. This same framework also provides the environment through which you’ll deploy your solution for production. Choosing a framework is typically done based upon familiarity, but if you’re starting out you can choose one that fits your application and the model that you intend to use.

TensorFlow is one of the most widely used deep-learning frameworks. It supports all of the popular models (CNN, RNN, LSTM, etc.) and allows you to develop in Python or C++. You can deploy TensorFlow solutions on everything from high-end servers down to mobile devices. If you’re just starting out, TensorFlow is a good place to begin, if for nothing else than its tutorials and breadth of documentation.

CAFFE started out as an academic project, but after being released as open source, it has grown into a popular deep-learning framework. CAFFE is written in C++, but also supports Python for model development. Like TensorFlow, it supports a wide range of deep-learning models.

Facebook began work on a derivative of CAFFE called Caffe2, which included new models, but rather than bifurcate the CAFFE project, Caffe2 was instead merged into another framework called PyTorch. PyTorch is another good choice based upon the wealth of information available, including hands-on tutorials to build different types of solutions.

The R language and environment is a popular tool for machine learning and data science. It’s interactive, which allows you to prototype and build a solution incrementally while seeing the results in stages. Along with Keras (an open-source neural-network library), you can build CNNs and RNNs with minimal development.

Model Auditing

Once your model is trained and meets your accuracy requirement, you deploy it in production. But once there, you’ll need to audit your solution to ensure it continues to meet your requirements. This is particularly important when the decisions made by your model can affect people.

Some machine-learning models, such as decision trees, are transparent and can be understood. But other models, such as deep neural networks, are considered “black boxes”: their decisions are made by millions of calculations that cannot be explained by the model itself. Therefore, while periodic auditing was once acceptable, continuous auditing is quickly becoming the norm in these black-box situations because mistakes are inevitable. Once a mistake is discovered, this information can be used as data to tweak the model.

The other consideration is the lifetime of the solution. Models decay, and input data can evolve, resulting in changes in the model’s performance. Therefore, accepting that a solution will become brittle over time, machine-learning solutions must change along with the world around them.
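One simple way to watch for this kind of decay is to track accuracy over a sliding window of recent predictions and flag the model when it drops. The class below is a hypothetical sketch; the window size and threshold are assumptions, and it presumes you eventually learn the correct answer for each prediction.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent predictions in production."""

    def __init__(self, window=100, threshold=0.9):
        # deque with maxlen discards the oldest result automatically
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, prediction, actual):
        """Store whether one production prediction turned out to be correct."""
        self.results.append(prediction == actual)

    def accuracy(self):
        """Fraction of correct predictions in the window, or None if empty."""
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_retraining(self):
        """True only once the window is full and accuracy is below threshold."""
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.accuracy() < self.threshold


monitor = AccuracyMonitor(window=10, threshold=0.8)
for _ in range(10):
    monitor.record("cat", "cat")      # the model starts out correct
healthy = monitor.needs_retraining()
for _ in range(5):
    monitor.record("cat", "dog")      # the input data drifts
drifted = monitor.needs_retraining()
```

Here the monitor reports a healthy model after ten correct predictions, then flags it once half of the most recent ten are wrong. The flagged mistakes are exactly the kind of information that can be fed back as new training data.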

Summary

To deploy a machine-learning solution, we start with a problem and then consider possible models that solve it. Acquiring data is next, and once the data is properly cleansed and segmented, the model can be trained and validated using a machine-learning framework. Not all frameworks are the same, and based upon your model and experience, one of many can be selected and applied. This framework is then used to deploy the machine-learning solution, and with proper auditing, the solution operates in the real world with live data.

For more information, visit Mouser Electronics.

 

About the Author

M. Tim Jones is a veteran embedded firmware architect with over 30 years of architecture and development experience. Tim is the author of several books and many articles across the spectrum of software and firmware development. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and protocol development.

 

 
