Europe’s new privacy law, called the General Data Protection Regulation, is scheduled to go into effect in May 2018. It requires companies to give E.U. citizens “meaningful information about the logic” of automated decision-making processes. In many cases, global companies that do business in Europe will need to disclose what factors go into the algorithms they use. They are not required, however, to provide a complex explanation of the algorithm used or disclose the formula or source code involved.
In the U.S., disclosure rules like this are already in place for the financial services industry. For example, if a lender uses information in a consumer report to provide credit on terms that are materially less favorable than the most favorable terms available to a substantial proportion of its consumers, then it must disclose that a consumer report includes information about the consumer’s credit history. The lender must also describe the type of information included in that credit history, and must state that it reviewed the consumer’s account using information from the report and that the annual percentage rate on the account has been increased based on that information.
If a credit score was used in increasing the annual percentage rate, then the lender must include a statement that a credit score was used to set the terms of the loan and that a credit score is a number that takes into account the information in a consumer report. The lender must also disclose the credit score used in making the credit decision, the range of possible credit scores, and all the key factors that adversely affected the credit score, not to exceed four.
To be clear, companies need not disclose the formula used to calculate the score, just the major factors that determine its level. This protects valuable trade secrets and rewards efforts to improve these risk scores.
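To make the shape of such a disclosure concrete, here is a minimal sketch in Python. It assumes a simple additive scorecard in which each factor earns points up to a maximum. The factor names, point values, base score, and the functions `adverse_key_factors` and `build_disclosure` are all hypothetical and do not reflect any real scoring model. The point is only that the notice reports the score, its range, and at most four adverse factors, never the formula itself.

```python
from dataclasses import dataclass

@dataclass
class Factor:
    name: str
    points: float      # points the applicant earned on this factor (hypothetical)
    max_points: float  # most points any applicant could earn on it (hypothetical)

def adverse_key_factors(factors, limit=4):
    """Return up to `limit` factor names, ordered by how many points were lost."""
    lost = [(f.max_points - f.points, f.name) for f in factors]
    lost = [(gap, name) for gap, name in lost if gap > 0]  # keep only adverse factors
    lost.sort(key=lambda pair: pair[0], reverse=True)      # largest shortfall first
    return [name for _, name in lost[:limit]]

def build_disclosure(base, factors):
    """Assemble the disclosed items: score, possible range, and key adverse factors."""
    score = base + sum(f.points for f in factors)
    top = base + sum(f.max_points for f in factors)
    return {
        "credit_score": score,
        "score_range": (base, top),
        "key_factors": adverse_key_factors(factors),
    }

# Hypothetical applicant on an assumed 300-840 scale.
applicant = [
    Factor("Payment history", 150, 200),
    Factor("Credit utilization", 90, 150),
    Factor("Length of credit history", 70, 80),
    Factor("Recent credit inquiries", 45, 50),
    Factor("Mix of account types", 40, 40),
    Factor("Number of open accounts", 18, 20),
]

notice = build_disclosure(300, applicant)
for item, value in notice.items():
    print(item, value)
```

In this sketch the weights and the scoring formula stay inside the lender’s systems; only the outputs named in the rule appear in the notice. A fifth adverse factor exists in the example data but is dropped by the four-factor cap.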
Something similar is contained in the new E.U. privacy law, but it applies more broadly, to all companies that control data on European citizens and use that data to make automated decisions. How broad the requirement is depends on the interpretation of automated decisions, which will be the subject of another column. According to a group of European data protection authorities charged with enforcing the new law, these companies must find “simple ways to tell the data subject about the rationale behind, or the criteria relied on in reaching the decision without necessarily always attempting a complex explanation of the algorithms used or disclosure of the full algorithm.” It is very good news that, according to this regulatory group, companies will not be required to disclose the formula used to calculate a score used for decision making, or the source code that might implement this formula on a computer, not even to the regulators.
In an illustrative example that reflects current U.S. law, E.U. regulators say that if a company uses credit scoring to deny an individual’s loan application, it must provide the “details of the main characteristics considered in reaching the decision, the source of this information and the relevance…(including)…the information provided by the data subject on the application form; information about previous account conduct, including any payment arrears; and official public records information such as fraud record information and insolvency records.”
Could this need to provide explanations impede improvements in accuracy derived from machine learning, which produces models that are notoriously inscrutable, even to their developers? Could companies be barred from using these more accurate but less intelligible algorithms because their functioning cannot be explained to consumers?
No, and the U.S. company FICO shows us why. FICO uses machine learning techniques to analyze the data typically available in consumer reports to generate the credit scoring model that most accurately fits the data. The innovation is to allow the data to speak, rather than impose on it the data scientist’s own selection of factors, interactions, weights, and functional form of the model.
Not surprisingly, credit models derived this way are much more accurate than traditional models. Using machine learning techniques, FICO discovered a “powerful interaction between recency and frequency of card usage.” Including this interaction effect led to a 10% improvement in model performance. A more nuanced version of this same interaction effect increased performance by an additional 15%. This improvement did not sacrifice the ability to explain the output of the algorithm as driven by this complex but understandable combination of factors.
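A toy sketch in Python shows why an interaction term of this kind can remain explainable. It is not FICO’s model: the logistic form, the weights, the bias, and the function names below are all assumptions chosen purely for illustration. Each term, including the recency-times-frequency product, contributes a separate, nameable amount to the score, so the largest adverse contribution can still be reported as a reason.

```python
import math

# Hypothetical weights for illustration only; not taken from any real scorecard.
WEIGHTS = {
    "recency": -0.30,              # months since the card was last used
    "frequency": 0.20,             # card uses per month
    "recency_x_frequency": -0.15,  # interaction between the two
}
BIAS = 0.5

def features(months_since_last_use, uses_per_month):
    """Build the model inputs, including the explicit interaction term."""
    return {
        "recency": months_since_last_use,
        "frequency": uses_per_month,
        "recency_x_frequency": months_since_last_use * uses_per_month,
    }

def contributions(x):
    """Per-term contribution to the log-odds: weight times feature value."""
    return {name: WEIGHTS[name] * value for name, value in x.items()}

def log_odds(x):
    return BIAS + sum(contributions(x).values())

def probability(x):
    """Logistic transform of the log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds(x)))

x = features(months_since_last_use=2, uses_per_month=3)
parts = contributions(x)
main_reason = min(parts, key=parts.get)  # most negative contribution

print(parts)
print("largest adverse term:", main_reason)
print("probability:", round(probability(x), 3))
```

With these made-up numbers the interaction term is the largest adverse contribution, and it can be described in plain words as the combination of how recently and how often the card was used. Real machine-learned models are far more complex, but the same principle of attributing the output to understandable factor combinations applies.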
As FICO points out, explainable AI is needed to fully comply with the new E.U. data protection law. But companies need not sacrifice the benefits of machine learning accuracy to do this.