What is machine learning? Imagine a machine learning algorithm (an algorithm is nothing but a set of rules or steps to achieve some outcome) being trained to play the classic Atari game Breakout. Ten minutes into the game, it is clumsy and misses the ball. Give it some more time, and it plays better than a human! The difference here is that instead of programming a traditional if-then-else construct which explicitly directs the machine to make rules-based decisions, we create algorithms that allow machines to learn how to perform a particular task optimally, getting better and better with each iteration.
Now the first thing to note is that machine learning is not new. Check out this 1959 definition by Arthur Samuel: "Field of study that gives computers the ability to learn without being explicitly programmed."
Arthur Samuel is perhaps the father of machine learning; he certainly coined the term. The Samuel checkers-playing program is probably the world's first self-learning program, and an early demonstration of a broader concept: artificial intelligence.
A very small history lesson
Reading Samuel's 1959 quote, you may have guessed that machine learning, and AI in general, are oldish. Which period do you think marks the beginning of AI? The 1930s, '40s, '50s? Before reading on, take a guess.
The conception of AI is actually ancient. Thousands of years ago, the Greeks discussed the possibility of placing a mind inside a mechanical body. This automaton was exemplified by Talos, a legendary mechanical man created to protect the land from pirates and invaders.
As early as the seventeenth century, intellectuals including Gottfried Leibniz, Thomas Hobbes and Rene Descartes imagined that all thought could be represented as mathematical symbols (an idea which is fundamental to neural networks).
Ancient and medieval roots aside, the true 'modern' birth of AI can be attributed to Alan Turing's publication of 'Computing Machinery and Intelligence' in 1950. The paper gave rise to the famous Turing test. This test involves a computer, a human player and a human judge in a game. If the judge is unable to differentiate the human from the computer based on their interactions, the computer wins the game.
Over the next several decades, there were many milestones in the development of machine learning. These ranged from the conceptualization of the first neural network (called the Perceptron), an algorithm inspired by how neurons behave in the brain, to IBM Watson's defeat of the world champions at Jeopardy! in 2011. This was a triumph of natural language processing and represented a significant leap forward for 'cognitive' technologies, as Jeopardy! is not a mathematically precise rules-based game like chess, where the number of possible moves is limited.
Besides these big milestones, think about how we encounter machine learning every day, without even realising it. From the more obvious virtual assistants like Siri and Cortana (notice how they seem to get smarter after every interaction with you), to chatbots that can easily be mistaken for human service agents, recommender systems on search engines and websites, your ride-sharing app telling you the ETA of your next ride, the autopilot on the next flight you take, and your robo vacuum cleaner.
Machine learning, and in a broader sense AI, is creeping into virtually every aspect of our lives.
So what is the difference between AI, machine learning, and the even more mysterious deep learning?
You may have noticed that I use the term 'machine learning' instead of AI. Most experts agree that 'true AI' is still very far from reality. In other words, the Terminator is not going to be around blowing things up any time soon, unless of course it is sent from the very distant future :-)
As a rule of thumb, machine learning is a subset of AI. And another form of highly specialised machine learning, called deep learning, is a subset of machine learning.
Deep learning deals with learning data representations, rather than task-specific algorithms. Deep neural networks, deep belief networks and recurrent neural networks have shown remarkable results in computer vision, speech recognition, natural language processing, machine translation and bioinformatics. Deep learning requires greater computational power than simpler machine learning algorithms, and has come to the fore largely because of advances in computational technology such as the creation of GPUs and CPU-GPU architectures.
Now that you have a small taste of what machine learning is and some of its examples, let's get you started on understanding your very first ML algorithm: linear regression.
Let's take an example of housing prices in relation to the size of a house in square feet. We have a table as follows:

Size in square feet (X) -> Price ($) in 1000s (Y)
2104 -> 460
1416 -> 232
1534 -> 315
852 -> 178
...X -> ...Y

The goal of linear regression is to fit a straight line to this dataset which most accurately represents the relationship between X and Y, and then to be able to predict the price of a house ('Y') given its size in square feet ('X'). The linear regression hypothesis can be represented in many ways. One such way is:

HB(x) -> Y [a function which takes x and some parameters as its input to compute the value of Y]
HB(x) = B0 + B1*x1 + B2*x2 + ... + Bn*xn

B0 through Bn in this case are the weights or parameter values (with B0 as the intercept term), whereas x1 through xn represent the features.
Though the housing example above is a simple univariate linear regression (i.e. with just one variable, the size in square feet, to predict the price of the house), in a case where we use multiple variables (e.g. location of the house, age, and number of bedrooms), the weight assigned to each variable becomes more important. For example, how much weight (or importance) should be attributed to the size of the house, as compared to the location of the house, in determining its price?
The goal of training the model in this case is to minimize the difference between the actual prices of the houses and the values predicted by our algorithm. This difference is measured by a cost function, which training drives as close to 0 as possible.
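To make this concrete, here is a minimal sketch of univariate linear regression fitted to the four housing data points above, using the closed-form least-squares formulas. The helper names (`fit_line`, `predict`) are illustrative, not from any library, and the mean squared error stands in for the cost function discussed above.

```python
sizes  = [2104, 1416, 1534, 852]   # X: size in square feet
prices = [460, 232, 315, 178]      # Y: price in $1000s

def fit_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
         / sum((x - x_mean) ** 2 for x in xs)
    b0 = y_mean - b1 * x_mean      # intercept: line passes through the means
    return b0, b1

b0, b1 = fit_line(sizes, prices)

def predict(x):
    """The hypothesis: intercept plus weighted feature."""
    return b0 + b1 * x

# Mean squared error: the 'cost' that training tries to drive toward 0
mse = sum((predict(x) - y) ** 2 for x, y in zip(sizes, prices)) / len(sizes)

print(f"estimated price of a 1650 sq ft house: ~${predict(1650):.0f}k")
```

With only one feature the best-fit weights have this closed form; with multiple features (location, age, bedrooms), the same idea generalizes, and iterative methods such as gradient descent are typically used to find the weights.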
A cost of exactly 0 would mean a perfect linear regression model for predicting the price of a house given its size in square feet, though with real data the fit is rarely perfect and we settle for the smallest cost we can achieve.
Too simple? Remember that the intent of machine learning, and particularly deep learning, is to enable machines to learn complex concepts by breaking them down into simpler concepts. Companies like PayPal use linear regression, in combination with other algorithms, to detect fraud.