In today's enterprises, machine learning has come of age. No longer a niche application, ML is now used for mission-critical applications and executive decision-making. Business leaders, for example, are using ML algorithms to determine new markets to enter, understand how best to support their customers and identify opportunities to make strategic financial investments.

Given the growing importance of ML to the enterprise, CIOs need to be sure their ML algorithms are producing accurate, trustworthy insights that are free from data set bias. Unfortunately, this is often not the case.

A growing body of research has identified cases where ML algorithms discriminate based on classes such as race and gender. In a 2018 study, for instance, researchers found that a facial recognition tool used by law enforcement misidentified 35 percent of dark-skinned women as men, while the error rate for light-skinned men was only 0.8 percent.

So how did we get here? The simple version of the story is that algorithms learn from both data and the labels on that data. If there is a problem in the data or the labels, the algorithm will likely learn an incorrect pattern.

This is a growing problem. A study by a trio of researchers from MIT and Amazon found evidence of pervasive label errors in test sets. These errors, triggered by human mistakes, cause ML algorithms to learn erroneous patterns, which in turn destabilizes ML benchmarks.

Here's one example of what's at risk. When algorithms are used in the hiring process, data sets from previous hiring years can include more male hires than female hires due to historical bias. If this goes undetected, algorithms are likely to suggest that more males be hired than females.

Clearly, this is much more than an academic issue. Erroneous pattern recognition in ML algorithms can lead to discrimination against groups of people, affecting their livelihoods.
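The hiring example can be made concrete with a short sketch. The records, group names and numbers below are hypothetical; the point is simply that a skewed historical data set hands an algorithm a skewed "pattern" to learn, and that the skew is easy to measure before training ever starts.

```python
from collections import Counter

# Hypothetical historical hiring records: (group, hired) pairs.
records = [
    ("male", True), ("male", True), ("male", False), ("male", True),
    ("female", False), ("female", True), ("female", False), ("female", False),
]

def hire_rate_by_group(records):
    """Return the fraction of positive (hired) outcomes per group."""
    totals, hires = Counter(), Counter()
    for group, hired in records:
        totals[group] += 1
        hires[group] += hired  # True counts as 1
    return {g: hires[g] / totals[g] for g in totals}

print(hire_rate_by_group(records))  # → {'male': 0.75, 'female': 0.25}
```

A model trained on these records would reproduce the 0.75-versus-0.25 gap as if it were a legitimate signal.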
Errant algorithms also put businesses at risk if they do not assess the quality of the data sets their models learn from. This failure opens businesses up to the threat of litigation and audits, with significant legal and financial consequences.

The way forward

For CIOs who want to help their enterprises move beyond the threat of bias in ML data sets, we propose a series of steps that can help ensure the integrity of the results produced by algorithms.

Step 1. Educate your people.

To fix a problem, companies and their employees have to first recognize that they have one. With that in mind, work to educate your people on the bias challenge. Raise awareness that bias is a threat in all data sets, and explain the importance of mitigating it.

Step 2. Build data-set analysis into your processes.

To ward off bias in data sets, you can't leave things to chance. Processes make perfect. To that end, make data-set analysis part of your ML development processes. Put systems in place to check for unwanted associations in your data, such as cases where a specific group is more likely to be tied to a specific result due to bias. And check for equal representation of all concerned parties within a data set.

Step 3. Establish a review committee.

To ensure the integrity of your ML development processes, establish an independent review committee tasked with assessing bias in your data sets. This committee should ensure that data sets are auditable, so they can be revisited after they have been put into service.

When you take these steps, your IT team can have greater confidence in the outcomes of your machine learning algorithms, knowing they are not unintentionally biased. Your management team, in turn, can make better-informed decisions based on the insights generated by your ML applications.
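The two checks described in Step 2 — representation and unwanted associations — can be sketched as a simple audit pass over labeled records. Everything here (the function name, the thresholds, the toy data) is a hypothetical illustration rather than a standard API; production pipelines would typically reach for purpose-built fairness tooling, but the underlying logic looks like this.

```python
from collections import Counter

def audit_dataset(records, rate_gap_threshold=0.2, representation_threshold=0.3):
    """Flag two simple bias signals in (group, outcome) records:
    under-representation of any group, and large gaps in
    positive-outcome rates between groups. Thresholds are illustrative."""
    totals, positives = Counter(), Counter()
    for group, outcome in records:
        totals[group] += 1
        positives[group] += outcome
    n = sum(totals.values())
    warnings = []
    # Representation check: is any group a small share of the data set?
    for g, count in totals.items():
        if count / n < representation_threshold:
            warnings.append(f"under-represented group: {g} ({count}/{n})")
    # Association check: do positive-outcome rates differ sharply by group?
    rates = {g: positives[g] / totals[g] for g in totals}
    if max(rates.values()) - min(rates.values()) > rate_gap_threshold:
        warnings.append(f"outcome-rate gap across groups: {rates}")
    return warnings

# Example: a skewed data set trips both warnings.
data = [("a", 1)] * 8 + [("b", 0)] * 2
print(audit_dataset(data))
```

Wiring a check like this into the development process means a skewed data set raises a flag before a model is trained on it, rather than after it is in service.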
And over time, these steps will help your organization build fair data sets from the ground up.

To learn more

To explore innovative solutions for machine learning in the enterprise with Dell Technologies, visit "Go from AI-possible to AI-ready."

Learn more about analytics solutions from Dell Technologies and Intel.

Proceedings of Machine Learning Research, "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification," 2018.

Preprint under review, via arXiv, "Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks," accessed May 31, 2021.