Recently, a friend informed me that he was still working on an artificial intelligence (AI) model for an automated stock trading platform – a project I recalled him saying would be completed a few months earlier. My natural proclivity was to ask when it would be done, so I could obtain a copy of the code and quit my day job. However, my friend complained that his model had “overfitted” and was not performing well on real, live data.
Consequently, I began thinking about the broader problem of how the type of machine learning (ML) model and the training data influence the model’s accuracy – no big surprise there. However, given the complexity of today’s models, this simple fact is easy to overlook. Incorrect output is bad enough when you are trying to beat the stock market, but as we come to rely unquestioningly on AI models with little or no human intervention, it could have far-reaching effects in other areas. Since ML will take over many routine tasks going forward, much is at stake: from the inconvenience of wrongly denied loan applications, to fatal accidents caused by autonomous vehicles that fail to recognize a pedestrian with a darker skin tone.
While the terms “overfitting” and “underfitting” may not be commonly known, we are all familiar with the term “bias” and with how bias affects our daily lives. Some of the bias we experience is conscious, but sometimes we fail to recognize the unconscious biases we harbor. The human brain, despite its sophistication, often takes shortcuts to make quick decisions and, unfortunately, this leads to simplifications like stereotypes. (This is an intriguing topic in cognitive neuropsychology that I won’t address here; I refer the reader to other sources.) Because of these simplifications, all learning models are prone to developing biases, which calls for extra caution in designing AI models that affect lives daily.
Recognizing bias in decision making
Here is another example of the undesirable effects of failing to recognize bias in our own decision making – one I’m fond of because it has the elements of a thriller, with the “usual suspects,” and even a twist at the end…
Approximately 10 years ago, a major city decided to collect data from drivers with smartphones every time they hit a pothole. The sensors in the smartphones at the time were sensitive enough to record these jolts. In principle, it seemed like a good idea, since the data would indicate which roads with potholes were more frequently used and, hence, which ones needed urgent repairs. In a well-meaning effort to allow the data to do the talking, their decision on which roads to repair was purely driven by the generated data.
Unfortunately, city officials were accused of favoritism when roads in more affluent neighborhoods were repaired, while those located in poorer neighborhoods lay in blatant disrepair. But how could this be a case of human bias? In the end, the manner of data collection was identified as the culprit.
Recall that, over a decade ago, mostly residents of richer neighborhoods could afford smartphones. This meant that the collected data came predominantly from affluent areas and was not representative of the city’s pothole problems as a whole. This was a clear case of measurement bias.
Other types of bias
A related problem, sample bias, occurs when the sampled data is not representative of the full population. Measurement bias and sample bias are two of the four kinds of bias that should be taken into consideration when creating a new model.
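The pothole story is, at heart, a sampling problem. Here is a minimal Python sketch (with made-up numbers and names) contrasting a convenience sample – only residents who happen to own smartphones report potholes – with a stratified sample that draws equally from each neighborhood type:

```python
import random

random.seed(0)

# Hypothetical population: 70% smartphone ownership in "affluent" areas
# versus 20% in "poorer" areas (illustrative numbers only).
population = (
    [{"area": "affluent", "has_phone": random.random() < 0.7} for _ in range(5000)]
    + [{"area": "poorer", "has_phone": random.random() < 0.2} for _ in range(5000)]
)

# Convenience sample: only residents with smartphones can report potholes.
convenience = [p for p in population if p["has_phone"]]

# Stratified sample: draw the same number of residents from each area,
# regardless of phone ownership.
per_stratum = 1000
stratified = (
    random.sample([p for p in population if p["area"] == "affluent"], per_stratum)
    + random.sample([p for p in population if p["area"] == "poorer"], per_stratum)
)

def share(sample, area):
    """Fraction of the sample that comes from the given area."""
    return sum(p["area"] == area for p in sample) / len(sample)

print(f"affluent share, convenience sample: {share(convenience, 'affluent'):.2f}")
print(f"affluent share, stratified sample:  {share(stratified, 'affluent'):.2f}")
```

The convenience sample over-represents the affluent areas (roughly three reports from there for every one from a poorer area), while the stratified sample holds each area at 50% by construction – which is exactly what the city would have needed.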
Going back to my friend’s dilemma with his misbehaving ML model… When he used the term overfitted, it immediately brought back memories of statistics classes in college, and of how this problem is just as relevant today as it was to the good old days of curve fitting and regression analysis.
To be clear, statisticians and data scientists use the term bias slightly differently. Bias indicates how close the predicted values are, on average, to the actual values: if the average prediction is far from the actual values, bias is said to be high; if it is close, bias is low. Another model property, “variance,” measures how sensitive the model is to the particular training data it saw; a high-variance model does well on the training data but poorly on data it hasn’t seen.
The source of the output error is sometimes rooted in the mathematical limitations of the ML model itself. Parametric models, like linear regression, logistic regression or linear discriminant analysis, are prone to this type of algorithmic bias. A problem occurs when the ML model underfits the data: it has too little complexity (high bias, low variance) and makes too many generalizations about the input dataset. In other words, it hasn’t completely captured the signal in the data. The opposite problem, overfitting, occurs when the ML model is too complex (low bias, high variance) and fits both the signal and the noise in the data, producing a model tuned too tightly to the training set. Either way, this leads to incorrect results when the ML model is exposed to a novel, more comprehensive dataset.
Ideally, we would like to have both low bias and low variance. Unfortunately, reducing one tends to increase the other, so we usually must settle for a compromise between the two. These are algorithmic limitations of the model itself, which is why picking the right model for complex data is crucial.
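To make the trade-off concrete, here is a small sketch using NumPy’s polyfit on synthetic data – the quadratic signal, noise level and polynomial degrees are all illustrative, not from my friend’s model. A degree-1 polynomial underfits the quadratic signal (high bias), while a degree-10 polynomial starts chasing the noise (high variance):

```python
import numpy as np

rng = np.random.default_rng(42)

# Noisy samples of a quadratic signal (illustrative data only).
def make_data(n):
    x = rng.uniform(-3, 3, n)
    y = x**2 + rng.normal(0, 1.0, n)  # true signal plus noise
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

def train_test_mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for degree in (1, 2, 10):
    train_err, test_err = train_test_mse(degree)
    print(f"degree {degree:2d}: train MSE {train_err:7.2f}, test MSE {test_err:7.2f}")
```

In runs like this, the degree-1 model typically shows high error on both sets, while the high-degree model fits the training set more tightly than it fits new data – the training error alone tells you nothing about how the model will generalize.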
The last bias I want to talk about is prejudice bias. This is a particularly tricky one to identify, because its effects are hidden in the data. Let’s say you did everything correctly, picked the correct ML model for the complexity, identified a representative training dataset and were extra careful to avoid both sample and measurement bias. You still wouldn’t know if the resulting dataset was inherently biased based on historical decision-making principles.
The challenges faced by AI designers are exemplified by the lessons learned from Microsoft’s short-lived AI chatbot Tay, which very quickly picked up on human prejudices online and had to be shut down in a matter of hours. Another example of prejudice bias is when an HR filtering algorithm learns from its training data that more men hold higher-paying jobs, and hence avoids recommending women as candidates for high-paying jobs. Although this may have been true historically, we’ve made great strides as a society, and we risk losing those gains if we are not vigilant about what our machines are learning from our history books.
Often this bias occurs when there isn’t enough data on the various groups represented within the dataset. For training datasets involving people, make sure there is an equal distribution of data for each gender, race, and so on. This may be possible in some cases and not in others, which is why many AI luminaries have called for training data transparency: much as with open source code, others would then be free to examine the data used to train your models.
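As a simple first check before training, you can audit how each group is represented – and labeled – in your data. A minimal sketch, with a tiny, hypothetical HR dataset:

```python
from collections import Counter

# Hypothetical HR training records; "label" is 1 for a past high-paying hire.
records = [
    {"gender": "F", "label": 1}, {"gender": "F", "label": 0},
    {"gender": "M", "label": 1}, {"gender": "M", "label": 1},
    {"gender": "M", "label": 1}, {"gender": "F", "label": 0},
]

# How often does each group carry the positive label in the training data?
totals = Counter(r["gender"] for r in records)
positives = Counter(r["gender"] for r in records if r["label"] == 1)
rates = {g: positives[g] / totals[g] for g in totals}
print(rates)  # a large gap between groups is a red flag before training
```

A model trained on data like this would learn that being male predicts a high-paying hire – not because it is true, but because that is what the historical labels say.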
Minimizing bias errors
Alas, there is no silver bullet that eliminates all our ML woes, but here is a summary of what you can do to minimize the errors.
- Pick the right ML model for the complexity of the data. This is a trial-and-error process, and you should typically try several models and weigh trade-offs such as speed and accuracy, among other factors.
- Understand your data (easier said than done), and then – to the best of your abilities – pick a representative dataset. If you must sample, make sure you’re giving equal representation to the various elements upon which decisions will be made. Again, the more you know about your data going in, the better off you’ll be in spotting shoddy output, especially in cases of ethical violations.
- Always test your ML models against live data. This should never be a one-off process; in fact, a periodic audit of machine versus human decisions should be conducted wherever possible.
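The periodic audit in the last point can be as simple as measuring how often the model agrees with human reviewers on a fresh batch of live cases. A minimal sketch – the function name, sample decisions and the 90% threshold are all illustrative:

```python
# Hypothetical periodic audit: compare model decisions against human
# reviewers on a fresh sample of live cases.
def audit(model_decisions, human_decisions, threshold=0.9):
    """Return (agreement rate, whether it meets the threshold)."""
    assert len(model_decisions) == len(human_decisions)
    agree = sum(m == h for m, h in zip(model_decisions, human_decisions))
    rate = agree / len(model_decisions)
    return rate, rate >= threshold

model = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
human = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]
rate, ok = audit(model, human)
print(f"agreement: {rate:.0%}, within tolerance: {ok}")
```

If the agreement rate drifts downward between audits, that is a signal the live data has moved away from the training data and the model needs retraining – long before any single bad decision makes headlines.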
To learn more
You can find more information on topics referenced in this blog at the following links:
To learn more about unlocking the value of data with artificial intelligence systems, explore Dell EMC AI Solutions and Dell EMC HPC Solutions.