Machine learning is transforming business. But even as the technology advances, companies still struggle to take advantage of it, largely because they don't understand how to strategically implement machine learning in service of business goals. Hype hasn't helped, sowing confusion over what exactly machine learning is, how well it works and what it can do for your company.

Here, we provide a clear-eyed look at what machine learning is and how it can be used today.

What is machine learning?

Machine learning is a subset of artificial intelligence that enables systems to learn and predict outcomes without explicit programming. It is often used interchangeably with the term AI because it is the AI technique that has made the greatest impact in the real world to date, and it's what you're most likely to use in your business. Chatbots, product recommendations, spam filters, self-driving cars and a huge range of other systems leverage machine learning, as do "intelligent agents" like Siri and Cortana.

Instead of writing algorithms and rules that make decisions directly, or trying to program a computer to "be intelligent" using sets of rules, exceptions and filters, machine learning teaches computer systems to make decisions by learning from large data sets.
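The difference is easiest to see in miniature. Below is a toy Python sketch of a spam filter that learns word statistics from labelled examples instead of relying on hand-written rules; the messages and the scoring scheme are invented purely for illustration:

```python
from collections import Counter

# Toy training data: instead of hand-writing rules ("block anything
# containing 'free'"), we let simple word statistics emerge from examples.
spam = ["win free money now", "free prize claim now"]
ham = ["meeting moved to noon", "lunch at noon tomorrow"]

def word_counts(messages):
    c = Counter()
    for m in messages:
        c.update(m.split())
    return c

spam_counts, ham_counts = word_counts(spam), word_counts(ham)

def spam_score(message):
    """Crude likelihood ratio: how much more 'spam-like' each word is."""
    score = 1.0
    for w in message.split():
        # Add-one smoothing so unseen words don't zero out the score.
        p_spam = (spam_counts[w] + 1) / (sum(spam_counts.values()) + 1)
        p_ham = (ham_counts[w] + 1) / (sum(ham_counts.values()) + 1)
        score *= p_spam / p_ham
    return score

print(spam_score("claim your free prize"))  # scores above 1 lean spam
print(spam_score("see you at lunch"))       # scores below 1 lean ham
```

A real system applies the same idea at scale, with far more data and better statistics, but the decision boundary still comes from the examples, not from hand-written rules.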
Rule-based systems quickly become fragile when they have to account for the complexity of the real world; machine learning can create models that represent and generalize patterns in the data you use to train it, and it can use those models to interpret and analyze new information.

Machine learning is suitable for classification, which includes the ability to recognize text and objects in images and video, as well as finding associations in data or segmenting data into clusters (e.g., finding groups of customers). Machine learning is also adept at prediction, such as calculating the likelihood of events or forecasting outcomes. Machine learning can also be used to generate missing data; for example, the latest version of CorelDRAW uses machine learning to interpolate the smooth stroke you're trying to draw from multiple rough strokes you make with the pen tool.

At the heart of machine learning are algorithms. Some, such as regressions, k-means clustering and support vector machines, have been in use for decades. Support vector machines, for example, use mathematical methods to represent how a dividing line can be drawn between things that belong in separate categories. The key to effective use of machine learning is matching the right algorithm to your problem.

Neural networks

A neural network is a machine learning algorithm built on a network of interconnected nodes; it works well for tasks like recognizing patterns.

Neural networks aren't a new algorithm, but the availability of large data sets and more powerful processing (especially GPUs, which can handle large streams of data in parallel) have only recently made them useful in practice. Despite the name, neural networks are based only loosely on biological neurons. Each node in a neural network has connections to other nodes that are triggered by inputs. When triggered, each node adds a weight to its input to mark the probability that it does or doesn't match that node's function.
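A single node's behavior, weighting its inputs, summing them and squashing the total into a 0-1 score, can be sketched in a few lines of Python; the weights and bias here are illustrative values, where a real network would learn them during training:

```python
import math

def node(inputs, weights, bias):
    """One neural-network node: weight each input, sum them,
    then squash the total through a sigmoid into a 0-1 score."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

# Two inputs feeding one node; the parameters are illustrative, not learned.
score = node([0.5, 0.8], weights=[1.2, -0.7], bias=0.1)
print(round(score, 3))  # a value strictly between 0 and 1
```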
The nodes are organized in fixed layers that the data flows through, unlike the brain, which creates, removes and reorganizes synapse connections regularly.

Deep learning

Deep learning is a subset of machine learning based on deep neural networks: neural networks with many layers, which perform learning in multiple steps. Convolutional deep neural networks often perform image recognition by processing a hierarchy of features, where each layer looks for more complicated objects. For example, the first layer of a deep network that recognizes dog breeds might be trained to find the shape of the dog in an image, the second layer might look at textures like fur and teeth, with other layers recognizing ears, eyes, tails and other characteristics, and the final layer distinguishing different breeds. Recurrent deep neural networks are used for speech recognition and natural language processing, where sequence and context are important.

There are many open source deep learning toolkits available that you can use to build your own systems. Theano, Torch and Caffe are popular choices, and Google's TensorFlow and Microsoft Cognitive Toolkit let you use multiple servers to build more powerful systems with more layers in your network.

Microsoft's Distributed Machine Learning Toolkit packages up several of these deep learning toolkits with other machine learning libraries, and both AWS and Azure offer VMs with deep learning toolkits pre-installed.

Machine learning in practice

Machine learning results are a percentage certainty that the data you're looking at matches what your machine learning model is trained to find.
So, a deep network trained to identify emotions from photographs and videos of people's faces might score an image as "97.6% happiness, 0.1% sadness, 5.2% surprise, 0.5% neutral, 0.2% anger, 0.3% contempt, 0.01% disgust, 12% fear." Using that information means working with probabilities and uncertainty, not exact results.

Probabilistic machine learning uses the concept of probability to enable you to perform machine learning without writing algorithms at all. Instead of the set values of variables in standard programming, some variables in probabilistic programming have values that fall in a known range while others have unknown values. Treat the data you want to understand as if it were the output of such a program and you can work backwards to fill in what those unknown values would have to be to produce that result. With less coding, you can do more prototyping and experimenting; probabilistic machine learning is also easier to debug.

This is the technique the Clutter feature in Outlook uses to filter messages that are less likely to be interesting to you, based on which messages you've read, replied to and deleted in the past. It was built with Infer.NET, a .NET framework you can use to build your own probabilistic systems.

Cognitive computing is the term IBM uses for its Watson offerings, because back in 2011, when an earlier version won Jeopardy, the term AI wasn't fashionable; over the decades it's been worked on, AI has gone through alternating periods of hype and dismissal.

Watson isn't a single tool. It's a mix of models and APIs that you can also get from other vendors such as Salesforce, Twilio, Google and Microsoft. These give you so-called "cognitive" services, such as image recognition (including facial recognition), speech (and speaker) recognition, natural language understanding, sentiment analysis and other recognition APIs that look like human cognitive abilities.
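From a developer's point of view, calling one of these services usually means sending an HTTP request and getting back JSON scores. Here is a minimal Python sketch of handling such a response; the field names and numbers are invented for illustration and don't match any particular vendor's schema:

```python
import json

# An invented sentiment-analysis response; real schemas vary by vendor.
raw = '{"scores": {"positive": 0.91, "neutral": 0.07, "negative": 0.02}}'

scores = json.loads(raw)["scores"]
label = max(scores, key=scores.get)  # take the highest-scoring label
print(label, scores[label])          # prints: positive 0.91
```

The important point is that the result is a set of probabilities to interpret, not a single definitive answer; your application decides what confidence threshold is good enough to act on.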
Whether it's Watson or Microsoft's Cognitive Services, the cognitive term is really just a marketing brand wrapped around a collection of (very useful) technologies. You could use these APIs to create a chatbot from an existing FAQ page that can answer text queries and also recognize photos of products to give the right support information, or use photos of shelf labels to check stock levels.

Many "cognitive" APIs use deep learning, but you don't need to know how they're built, because many work as REST APIs that you call from your own app. Some let you create custom models from your own data: Salesforce Einstein has a custom image recognition service, and Microsoft's Cognitive APIs let you create custom models for text, speech, images and video.

That's made easier by transfer learning, which is less a technique and more a useful side effect of deep networks. A deep neural network that has been trained to do one thing, like translating between English and Mandarin, turns out to learn a second task, like translating between English and French, more efficiently. That may be because the very long numbers that represent, say, the mathematical relationships between words like big and large are to some degree common between languages, but we don't really know.

Transfer learning isn't well understood, but it may enable you to get good results from a smaller training set.
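The idea can be sketched with a toy Python model: freeze a "pretrained" feature extractor and train only a small head on a handful of labelled examples. Everything here is synthetic; the random projection is just a stand-in for a real pretrained network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a network pretrained on another task: its weights
# stay frozen, and we only reuse the features it extracts.
W_pretrained = rng.normal(size=(4, 8))

def extract_features(x):
    return np.tanh(x @ W_pretrained)  # frozen feature extractor

# A small labelled set -- far fewer examples than training from scratch needs.
X = rng.normal(size=(20, 4))
y = (X[:, 0] > 0).astype(float)  # label depends on the first input feature

# Train only a tiny logistic-regression "head" on top of the frozen features.
w = np.zeros(8)
feats = extract_features(X)
for _ in range(500):
    pred = 1 / (1 + np.exp(-(feats @ w)))      # sigmoid
    w -= 0.5 * feats.T @ (pred - y) / len(y)   # gradient step

accuracy = ((1 / (1 + np.exp(-(feats @ w))) > 0.5) == y).mean()
print(accuracy)  # fit on just 20 examples, thanks to the reused features
```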
The Microsoft Custom Vision Service uses transfer learning to train an image recognizer in just a few minutes using 30 to 50 images per category, rather than the thousands usually needed for accurate results.

Build your own machine learning system

If you don't want pre-built APIs, and you have the data to work with, there's an enormous range of tools for building machine learning systems, from R and Python scripts, to predictive analytics using Spark and Hadoop, to specific AI tools and frameworks.

Rather than set up your own infrastructure, you can use machine learning services in the cloud to build data models. With cloud services, you don't need to install a range of tools, and these services build in more of the expertise needed to get successful results.

Amazon Machine Learning offers several machine learning models you can use with data stored in S3, Redshift or RDS, but you can't export the models, and the training set size is rather limited. Microsoft's Azure ML Studio has a wider range of algorithms, including deep learning, plus R and Python packages and a graphical user interface for working with them. It also offers the option to use Azure Batch to periodically load extremely large training sets, and you can use your trained models as APIs to call from your own programs and services. There are also machine learning features, such as image recognition, built into cloud databases like Azure Data Lake, so that you can do your machine learning where your data is.

Supervised learning

Many machine learning techniques use supervised learning, in which a function is derived from labelled training data. Developers choose and label a set of training data, set aside a proportion of that data for testing, and score the results from the machine learning system to help it improve.
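That loop, label the data, hold some of it out, train, then score on the held-out portion, can be sketched end to end in Python; a deliberately trivial threshold "model" and synthetic data stand in for a real learner:

```python
import random

random.seed(1)

# Synthetic labelled examples: (feature, label), with label 1 when feature > 0.5.
data = [(x, int(x > 0.5)) for x in (random.random() for _ in range(100))]

# Set aside 20% for testing; train only on the rest.
random.shuffle(data)
train, test = data[:80], data[80:]

# "Training": place a threshold midway between the two class means.
mean0 = sum(x for x, y in train if y == 0) / sum(1 for _, y in train if y == 0)
mean1 = sum(x for x, y in train if y == 1) / sum(1 for _, y in train if y == 1)
threshold = (mean0 + mean1) / 2

# Scoring on held-out data shows how well the model generalizes.
accuracy = sum((x > threshold) == y for x, y in test) / len(test)
print(accuracy)
```

Scoring on data the model never saw during training is what catches a model that has merely memorized its examples.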
The training process can be complex, and results are often probabilities, with a system being, for example, 30 percent confident that it has recognized a dog in an image, 80 percent confident it's found a cat, and maybe even 2 percent certain it's found a bicycle. The feedback developers give the system is typically a score between zero and one, indicating how close the answer is to correct.

It's important not to train the system too precisely to the training data; that's called overfitting, and it means the system won't be able to generalize to cope with new inputs. If the data changes significantly over time, developers will need to retrain the system, due to what some researchers refer to as "ML rot."

Machine learning algorithms — and when to use them

If you already know what the labels for all the items in your data set are, assigning labels to new examples is a classification problem. If you're trying to predict a result like the selling price of a house based on its size, that's a regression problem, because house price is a continuous rather than discrete category. (Predicting whether a house will sell for more or less than the asking price is a classification problem, because those are two distinct categories.)

If you don't know all the labels, you can't use them for training; instead, you score the results and leave your system to devise rules that make sense of the answers it gets right or wrong, in what's known as unsupervised learning. The most common unsupervised learning algorithm is clustering, which derives the structure of your data by looking at relationships between variables in the data. Amazon's product recommendation system, which tells you what people who bought an item also bought, uses unsupervised learning.

With reinforcement learning, the system learns as it goes by seeing what happens. You set up a clear set of rewards so the system can judge how successful its actions are.
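That reward-driven loop can be sketched as a classic multi-armed bandit in Python, here using the common epsilon-greedy strategy; the three actions and their hidden success rates are invented for illustration:

```python
import random

random.seed(0)

# Three possible actions with hidden success rates; the system only
# learns about them through the rewards its own choices earn.
true_rates = [0.2, 0.5, 0.8]
counts = [0, 0, 0]
values = [0.0, 0.0, 0.0]  # running estimate of each action's reward

for step in range(2000):
    # Epsilon-greedy: mostly exploit the best-known action,
    # but explore a random one 10% of the time.
    if random.random() < 0.1:
        action = random.randrange(3)
    else:
        action = values.index(max(values))
    reward = 1 if random.random() < true_rates[action] else 0
    counts[action] += 1
    # Incrementally update the running reward average for this action.
    values[action] += (reward - values[action]) / counts[action]

best = values.index(max(values))
print(best)  # the learner settles on the highest-paying action
```

The exploration step matters: a learner that only ever exploited its first lucky guess would never discover that a different action pays better.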
Reinforcement learning is well suited to game play because there are obvious rewards. Google's DeepMind AlphaGo used reinforcement learning to learn Go, Microsoft's Project Malmo system allows researchers to use Minecraft as a reinforcement learning environment, and a bot built with OpenAI's reinforcement learning algorithm recently beat several top-ranked players at Valve's Dota 2 game.

The complexity of creating accurate, useful rewards has limited the use of reinforcement learning, but Microsoft has been using a specific form of reinforcement learning called contextual bandits (based on the multi-armed bandit problem, named for slot machines) to significantly improve click-through rates on MSN. That system is now available as the Microsoft Custom Decision Service API. Microsoft is also using a reinforcement learning system in a pilot where customer service chatbots monitor how useful their automated responses are and offer to hand you off to a real person if the information isn't what you need; the human agent also scores the bot to help it improve.

Combining machine learning algorithms for best results

Often, it takes more than one machine learning method to get the best result; ensemble learning systems use multiple machine learning techniques in combination. For example, the DeepMind system that beat expert human players at Go uses not only reinforcement learning but also supervised deep learning to learn from thousands of recorded Go matches between human players.
That combination is sometimes known as semi-supervised learning.

Similarly, the machine learning system that Microsoft Kinect uses to recognize human movements was built with a combination of a discriminative system and a generative system. To build the discriminative system, Microsoft rented a Hollywood motion-capture suite, extracted the position of the skeleton and labelled the individual body parts to classify which of the various known postures each was in; the generative system then used a model of the characteristics of each posture to synthesize thousands more images, giving the system a large enough data set to learn from.

Predictive analytics often combines different machine learning and statistical techniques: one model might score how likely a group of customers is to churn, with another model predicting which channel you should use to contact each person with an offer that might keep them as a customer.

Navigating the downsides of machine learning

Because machine learning systems aren't explicitly programmed to solve problems, it's difficult to know how a system arrived at its results. This is known as the "black box" problem, and it can have consequences, especially in regulated industries.

As machine learning becomes more widely used, you'll need to explain why your machine learning-powered systems do what they do. Some markets, such as housing, financial decisions and healthcare, already have regulations requiring you to give explanations for decisions. You may also want algorithmic transparency so that you can audit machine learning performance. Publishing details of the training data and the algorithms in use isn't enough: there are many layers of non-linear processing going on inside a deep network, making it very difficult to understand why a deep network makes a particular decision.
A common technique is to use another machine learning system to describe the behavior of the first.

You also need to be aware of the dangers of algorithmic bias, such as when a machine learning system reinforces the bias in a data set that associates men with sports and women with domestic tasks, because all its examples of sporting activities show men and all the people pictured in kitchens are women. Or when a system that correlates non-medical information makes decisions that disadvantage people with a particular medical condition.

Machine learning can only be as good as the data it trains on to build its model and the data it processes, so it's important to scrutinize the data you're using. Machine learning also doesn't understand the data, or the concepts behind it, the way a person might; for example, researchers can create pictures that look like random static but get recognized as specific objects.

There are plenty of recognition and classification problems that machine learning can solve more quickly and efficiently than humans, but for the foreseeable future, machine learning is best thought of as a set of tools to support people at work rather than replace them.