What do humans and HDDs have in common?

Dell Technologies

For the DELL EMC data science team, one of the holy grails has always been to develop a model for predicting drive failures. Rotating magnetic drives are still the main technology for storing digital information, and at DELL EMC they can number in the thousands in a single server or data center. As part of a broader trend, automated solutions for predicting hardware failures introduce innovation and AI into DELL EMC core products. The ultimate goal is obviously to avoid data loss and improve the customer experience, but in a pilot run in a DELL data center we were able to show that such capabilities also save substantially on technician costs, and that with the level of accuracy achieved, ROI can reach millions of dollars per year. This is an example of Predictive Maintenance, which is based on the premise that if we monitor and analyze all the different attributes that make up a machine or a process, we can predict most incidents in advance and drive great value for the business.

A recent project of the team involved developing People Analytics capabilities for a DELL strategic customer. In general, People Analytics strives to capitalize on HR data, using data science tools, in order to retain quality employees in the organization. This led me to the question: in terms of data science, which is the ideal combination of business insight and statistics, what is the difference between humans and hard drives?

The huge potential of People Analytics

The main goal of this relatively new type of analytics is to help managers make data-driven decisions concerning their employees. Example questions that People Analytics can answer are: “how well does an employee fit the requirements for a position?” and “which employees are at the highest risk of leaving the organization?” Human resources are an organization’s most valuable and hardest-to-replace asset. Research shows that the cost of hiring and onboarding a new employee is estimated at $15,000, and the loss from a bad hire can easily reach $850,000. No wonder organizations such as IBM, the US Army and Google invest more than $400 billion per year in developing their own predictive tools. A key investment area is HR-related data collection and the integration of tagged digital information into one data lake, providing a new centralized and reliable source of information. This broader view is critical to achieving some form of Predictive Maintenance, and it allows for significant changes in policies and decision making.

The first step towards a comprehensive decision support tool is to define the data sources one aims to include in the analysis. Since employees, and therefore employee data, are among the central and most sensitive assets of any organization, it is likely that the data is collected and maintained in individual systems that don’t necessarily communicate with each other. This may include demographics, salary, performance ratings, attendance and more. As this is a relatively new field for most companies, it is not yet common practice to record all this data at sufficient resolution, but this should be the aspiration in any Data Science project: your insights can only be as good as your data. All the available data should be obtained, organized and tagged in a way that allows for seamless access and analytics.
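To make the integration step concrete, here is a minimal sketch of merging extracts from separate HR systems into one record per employee. The three source systems, the field names and the `integrate` helper are all hypothetical and chosen for illustration; a real project would work against the organization's own systems and data lake tooling.

```python
from collections import defaultdict

# Hypothetical extracts from three separate HR systems, keyed by employee ID.
payroll = [
    {"employee_id": "E1", "salary": 70000},
    {"employee_id": "E2", "salary": 82000},
]
performance = [
    {"employee_id": "E1", "rating": 4},
    {"employee_id": "E3", "rating": 3},
]
attendance = [
    {"employee_id": "E1", "absence_days": 2},
    {"employee_id": "E2", "absence_days": 7},
]


def integrate(sources):
    """Merge rows from several systems into one tagged record per employee."""
    merged = defaultdict(dict)
    for source_name, rows in sources.items():
        for row in rows:
            record = merged[row["employee_id"]]
            for field, value in row.items():
                if field != "employee_id":
                    record[field] = value
            # Tag each record with the systems that contributed to it.
            record.setdefault("sources", []).append(source_name)
    return dict(merged)


employees = integrate(
    {"payroll": payroll, "performance": performance, "attendance": attendance}
)

# Report which fields are still missing for each employee.
expected_fields = ["salary", "rating", "absence_days"]
missing = {
    employee_id: [f for f in expected_fields if f not in record]
    for employee_id, record in employees.items()
}
print(missing)
```

The `missing` report is the useful by-product here: it shows where the separate systems fail to cover the same employees, which is often the first data-quality problem to resolve.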

[Figure: data-driven decision support. Image: Dell Technologies]

One fundamental difference between HDDs and employees is that, unlike HDDs, humans at times, if not often, act irrationally. Thus, the assumption that the collected data can tell us the entire story does not necessarily hold. To address this, the original data is enriched with calculated features based on domain-specific knowledge of the system. For example, to predict attrition in organizations that stress competition and personal excellence, data scientists can enrich the data with comparative features, such as how often an employee is promoted or how highly he/she is rated relative to co-workers.
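As an illustration of such comparative features, the sketch below computes how each employee's rating and promotion count differ from the average of his/her teammates. The sample records, the choice of "team" as the peer group and the `add_relative_features` helper are assumptions made for this example, not the features used in the project.

```python
from statistics import mean

# Hypothetical employee records; "team" defines the peer group.
employees = [
    {"id": "E1", "team": "A", "rating": 4.0, "promotions": 2},
    {"id": "E2", "team": "A", "rating": 3.0, "promotions": 0},
    {"id": "E3", "team": "A", "rating": 5.0, "promotions": 1},
    {"id": "E4", "team": "B", "rating": 2.0, "promotions": 1},
    {"id": "E5", "team": "B", "rating": 4.0, "promotions": 3},
]


def add_relative_features(rows, fields=("rating", "promotions")):
    """Add '<field>_vs_peers': the employee's value minus the teammates' mean."""
    by_team = {}
    for row in rows:
        by_team.setdefault(row["team"], []).append(row)

    enriched = []
    for row in rows:
        # Peers are the other members of the same team.
        peers = [p for p in by_team[row["team"]] if p["id"] != row["id"]]
        new_row = dict(row)
        for field in fields:
            # With no peers, fall back to the employee's own value (difference 0).
            peer_mean = mean(p[field] for p in peers) if peers else row[field]
            new_row[f"{field}_vs_peers"] = row[field] - peer_mean
        enriched.append(new_row)
    return enriched


enriched = add_relative_features(employees)
for row in enriched:
    print(row["id"], row["rating_vs_peers"], row["promotions_vs_peers"])
```

A negative value flags an employee who is rated or promoted below his/her peers, which in a competitive culture may carry more signal than the raw rating itself.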

Enabling data driven business decisions

An example output of a Machine Learning model trained on HR data is the probability that an employee will leave the organization in a specific time window. However, this output by itself is not enough: the goal of data science in an organizational environment is to provide insights that can be acted upon. Should a 60% chance of leaving entail a different action by the organization than a 70% chance? This is why model explainability is key. An additional output is the set of reasons motivating this employee to leave or stay. For this we used SHAP values, which constitute a sorted list of signed, normalized weights, one for each of the features considered by the model when making the prediction for a specific employee (Learn more on explainable AI and SHAP values here). This helps managers understand who the individual is and what his/her main drivers for action are. Such a data-based depiction of an employee’s motivations tells the manager what to focus on during engagements with the employee.
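To show what a per-employee list of signed contributions looks like, here is a minimal sketch that computes exact Shapley values by brute force for a toy model. It does not use the SHAP library or the model from the project: the logistic scoring function, its weights, the feature names and the baseline employee are all hypothetical, and enumerating every feature subset is only practical for a handful of features.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical attrition model: a hand-written logistic score that stands in
# for a trained Machine Learning model.
WEIGHTS = {
    "rating_vs_peers": -0.8,
    "years_since_promotion": 0.5,
    "salary_vs_peers": -0.6,
}
BIAS = -1.0


def predict_leave_probability(features):
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-score))


def shapley_values(predict, employee, baseline):
    """Exact Shapley values: features outside a subset take the baseline value."""
    names = list(employee)
    n = len(names)

    def value(subset):
        mixed = {
            name: employee[name] if name in subset else baseline[name]
            for name in names
        }
        return predict(mixed)

    contributions = {}
    for name in names:
        others = [other for other in names if other != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        contributions[name] = total
    return contributions


# Hypothetical "typical" employee and the employee being explained.
baseline = {"rating_vs_peers": 0.0, "years_since_promotion": 1.0, "salary_vs_peers": 0.0}
employee = {"rating_vs_peers": -1.5, "years_since_promotion": 4.0, "salary_vs_peers": -0.5}

contributions = shapley_values(predict_leave_probability, employee, baseline)

# Sort by absolute size so the strongest drivers come first; the sign shows
# whether a feature pushes the employee towards leaving (+) or staying (-).
ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
for name, contribution in ranked:
    print(f"{name}: {contribution:+.3f}")
```

The contributions add up to the gap between this employee's predicted probability and the baseline's, which is what lets a manager read the list as a breakdown of the prediction instead of a set of unrelated scores.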

[Figure: probability-to-leave chart. Image: Dell Technologies]


So, are employees comparable to hard drives when applying Data Science solutions? The human brain contains roughly 100 billion nerve cells, with each cell making about 1,000 connections to other neurons. This translates to 100 terabytes of information, assuming each connection holds 1 byte, which is comparable to the amount of information a machine or a cluster of machines can hold nowadays. But still, human behavior and motivations are very complicated to capture, and in many cases the raw collected data will not tell the entire story. This makes the case for having a data-based decision support system in place to assist managers and policy makers in making the right, unbiased business decisions relating to the human assets driving their organization.
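The back-of-the-envelope estimate above can be checked in a few lines, taking the figure of 100 billion neurons and the article's simplifying assumption of 1 byte per connection. The variable names are illustrative, and a terabyte is taken here as 10^12 bytes.

```python
NEURONS = 100_000_000_000        # about 100 billion nerve cells
CONNECTIONS_PER_NEURON = 1_000   # about 1,000 connections per cell
BYTES_PER_CONNECTION = 1         # simplifying assumption from the text

total_bytes = NEURONS * CONNECTIONS_PER_NEURON * BYTES_PER_CONNECTION
total_terabytes = total_bytes / 10**12  # decimal terabytes

print(f"{total_terabytes:.0f} TB")  # prints "100 TB"
```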

To summarize, People Analytics opens up numerous business questions to explore. Managers should adopt this new tool by first investing in the collection of high-quality HR data, then applying both Machine Learning and domain expertise to capture employees’ behaviors and develop meaningful Data Science solutions with proven value to the organization.

Copyright © 2020 IDG Communications, Inc.