by Thor Olavsrud

Compliance Dictionary aims for a simpler life

Jul 05, 2016

With the assistance of machine learning, the UCF's Compliance Dictionary seeks to simplify the process of creating common controls with a lexicon that maps the connection between terms in authority documents.

Compliance is hard. Globalization, an ever-growing corpus of regulations and increasing business complexity all conspire to make it challenging to understand, implement and prove regulatory compliance. With the Compliance Dictionary, Unified Compliance Framework (UCF) is aiming to change that.

Most authority documents — laws, regulations, international standards, contractual obligations, etc. — use custom terms. For instance, ‘Personally Identifiable Information’ (PII) was defined legally in a 2007 memorandum from the Executive Office of the President, Office of Management and Budget (OMB) and later adopted in the National Institute of Standards and Technology (NIST) Guide to Protecting the Confidentiality of Personally Identifiable Information (SP 800-122). But other regulatory and standards bodies frequently refer to PII as ‘identifying information,’ ‘personal information’ or ‘private information.’ In the European Union, EU directive 95/46/EC refers to it as ‘personal data.’

While this variation in terminology may seem innocuous, it’s no laughing matter for auditors or employees responsible for implementing compliance mandates. Small variances in language, even misspellings and typos, can make it difficult or even impossible to properly configure automated compliance tools. If one compliance control refers to an “active recovery site” and a new control refers to a “mirrored site” (in reality the same thing), companies have to start from scratch each time a new regulation is introduced, even if the issue was already addressed in a previous requirement.

Complying in harmony

For more than a decade, UCF has pursued the idea of “harmonized compliance” by mapping authority documents to identify overlaps between compliance mandates, thereby dramatically simplifying the process of scoping, defining and maintaining compliance.

“Over 75 percent of the authority documents mapped in the last decade by the Unified Compliance Framework team contain terms unique unto that document, that are not defined in the document, nor were they defined anywhere else at the time of the document’s authoring,” says Dorian Cougias, lead analyst of the UCF and author of The Compliance Book: A Unified Framework for IT Controls and Regulations. “It seems that authority document authors are so caught up in wanting to make their specific point and wanting to create terms of art that they often forget they are writing documents to be shared by a world-wide community. These documents call organizations to action while at the same time they also create maximum opportunities for misinterpretation.”

UCF’s answer is the Compliance Dictionary, a lexicon that standardizes and unifies compliance terms and governance requirements. The idea is to create a concrete methodology to determine when a citation’s mandate can (or can’t) be mapped to a common control — a shared compliance requirement written in plain language and connected to the original mandates an organization must follow.

A mandate can be connected to a common control only if the verbs and nouns in the mandate are related. And this is where the multiple variations in terms in authority documents become a problem. If your common control states, “protect Personally Identifiable Information” and the citation’s mandate states, “safeguard private information,” you have to prove the terms are connected. In this case, you must show that “protect” and “safeguard” are synonyms and that the nouns “private information” and “Personally Identifiable Information” match.
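As a minimal sketch of that check, assuming a tiny hand-built lexicon, the comparison could look like this in Python. The `CANONICAL` table and the `canonical` and `can_map` helpers are hypothetical illustrations built from the article's own examples; they are not the UCF's data or its patented method.

```python
# Hypothetical lexicon: maps each variant (lowercased) to a canonical term.
# The entries mirror the article's examples; they are not real UCF data.
CANONICAL = {
    "protect": "protect",
    "safeguard": "protect",
    "personally identifiable information": "personally identifiable information",
    "private information": "personally identifiable information",
    "personal data": "personally identifiable information",
}


def canonical(term):
    """Return the canonical form of a term, or None if the lexicon lacks it."""
    return CANONICAL.get(term.strip().lower())


def can_map(mandate, control):
    """Check whether a (verb, noun) mandate matches a (verb, noun) control.

    Both the verbs and the nouns must resolve to the same canonical term.
    An unknown term never matches, so a gap in the lexicon fails closed.
    """
    m_verb, m_noun = (canonical(t) for t in mandate)
    c_verb, c_noun = (canonical(t) for t in control)
    if None in (m_verb, m_noun, c_verb, c_noun):
        return False
    return m_verb == c_verb and m_noun == c_noun


control = ("protect", "Personally Identifiable Information")
print(can_map(("safeguard", "private information"), control))  # True
print(can_map(("encrypt", "private information"), control))    # False
```

The second call fails because "encrypt" is not in the lexicon: without evidence that the verbs are related, the mandate is not mapped.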

The Compliance Dictionary uses a combination of Natural Language Processing (NLP), Part of Speech Tagging and Named Entity Recognition engines to map citations to common controls in a repeatable, scientific way.

“Corralling the horse out of the barn is a fairly accurate way of explaining what the UCF team does with terms of art it finds in published authority documents,” Cougias says. “During the citation mapping phase, the team of mappers use a combination of software and processes to scrape each citation for terms that already exist in its dictionary. The process then follows a patented procedure and uses additional patented tools to decipher whether the new terms should stand on their own, or whether they are simply additional non-standard forms of already accepted terms.”
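The first step Cougias describes, scanning a citation for terms the dictionary already knows, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `KNOWN_TERMS` table and `find_known_terms` function are hypothetical, the sample entries come from the article's examples, and simple phrase matching stands in for whatever the UCF's patented tools actually do.

```python
import re

# Hypothetical dictionary: accepted term -> known non-standard forms.
KNOWN_TERMS = {
    "active recovery site": ["mirrored site"],
    "personally identifiable information": [
        "private information",
        "personal information",
    ],
}


def find_known_terms(citation):
    """Return {accepted term: [forms of it found in the citation]}."""
    text = citation.lower()
    found = {}
    for accepted, variants in KNOWN_TERMS.items():
        for form in [accepted] + variants:
            # Word boundaries keep a phrase from matching inside a longer word.
            if re.search(r"\b" + re.escape(form) + r"\b", text):
                found.setdefault(accepted, []).append(form)
    return found


citation = "Maintain a mirrored site and safeguard private information."
print(find_known_terms(citation))
```

Any phrase in the citation that the scan does not recognize would then go to a human mapper to decide whether it is a genuinely new term or another non-standard form.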

At that point, Cougias says, each new term is tagged by the mapper who then creates a “new term process” derived directly from the citation it was taken from.

“Think of it, most of the terms you’ve learned to use in your life, you learned in context and weren’t given definitions for them,” says Vicki McEwen, the UCF’s head lexicographer. “It’s the same process we use to scrape authority document citations.”

If the authority document’s citation doesn’t provide enough evidence of a well-formed definition, the rest of the citations are also scoured for recurring usage and additional clues. Various additional dictionaries are then accessed through another patented structure for even more clues to the definition. Finally, if no authoritative sources prove useful, the team performs multiple Google searches before sending the term of art and its newly associated definition to the UCF’s lexicographer.

“Term definitions and making the distinction between allowing a term to stand on its own or be associated as a non-standard derivation of an existing term takes a great deal of effort,” McEwen says.

Machine learning keeps definitions clean

Machine learning helps the team establish good definitions and differentiate between genuinely new terms and sloppy nonstandard uses of known terms.

The dictionary tracks exactly which term was pulled from which authority document, and each element is linked, so users can search for terms and see how everything connects. Users can identify which authority documents apply to their organization, and the dictionary will display which mandates apply and where they overlap.
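To make the overlap idea concrete, here is a small sketch assuming each mandate has already been mapped to a common control. The `MAPPINGS` records, the document names and the `overlaps` function are all hypothetical; the real dictionary's schema is not described in this article.

```python
from collections import defaultdict

# Hypothetical mappings: (authority document, mandate text, common control).
MAPPINGS = [
    ("Document A", "safeguard private information", "protect PII"),
    ("Document B", "protect personal data", "protect PII"),
    ("Document B", "maintain a mirrored site", "maintain active recovery site"),
]


def overlaps(selected_docs):
    """Return the common controls required by more than one selected document."""
    docs_by_control = defaultdict(set)
    for doc, _mandate, control in MAPPINGS:
        if doc in selected_docs:
            docs_by_control[control].add(doc)
    return {c: sorted(d) for c, d in docs_by_control.items() if len(d) > 1}


print(overlaps({"Document A", "Document B"}))
# {'protect PII': ['Document A', 'Document B']}
```

An organization subject to both documents would satisfy the shared control once rather than treating each mandate as a separate project.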

The Compliance Dictionary will also aid regulators in writing authority documents. The UCF team was recently in Washington, D.C. to train banking regulators to run their content through the UCF’s mapping and dictionary process before releasing authority documents.

“This gives the regulators the opportunity to check their language and check their mandates prior to releasing it to the general public,” says Steven Piliero, UCF’s CTO.

“The UCF team is making its mapping processes and dictionary processes available to any and all regulators and international standards organizations so that the collective dictionary can be leveraged to make their language clearer and more standardized,” Cougias adds. “This two-pronged approach to harmonizing compliance language pays off in more ways than you’d initially think. The UCF’s dictionary goes far beyond a ‘what does it mean’ definition. This is the dictionary of the future, today. The UCF’s dictionary includes definitions, check. Non-standard and other variations of a term’s spelling. Check. And in order to work with Natural Language Processing engines, each term is linked to its form variants such as plurals, possessives and verb tenses. Check. But what it does great is that it extends well beyond linking a term to its synonyms and antonyms (check again). It links each term to its hypernyms (things that are a part of it), meronyms (things that belong to it) and troponyms (other associations such as ‘referenced by it,’ ‘enforced by it,’ etc.).”

That last part, Cougias notes, is important because it gives the Compliance Dictionary the potential to help with artificial intelligence and cognitive learning.
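The kind of entry Cougias describes can be pictured as a record with one field per type of link. The sketch below is an assumption-laden illustration: the `TermEntry` class, its field names and the `build_index` helper are hypothetical, and the sample definition is a placeholder, not text from the UCF's dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    """One dictionary entry; the field names are illustrative assumptions."""
    term: str
    definition: str
    nonstandard_forms: list = field(default_factory=list)
    form_variants: list = field(default_factory=list)  # plurals, possessives, tenses
    synonyms: list = field(default_factory=list)
    antonyms: list = field(default_factory=list)
    relations: dict = field(default_factory=dict)      # e.g. {"enforced by": [...]}

    def surface_forms(self):
        """Every string that should resolve to this entry, lowercased."""
        forms = [self.term] + self.nonstandard_forms + self.form_variants
        return {f.lower() for f in forms}


def build_index(entries):
    """Map each surface form to its entry so lookups accept any variant."""
    return {form: e for e in entries for form in e.surface_forms()}


entry = TermEntry(
    term="safeguard",
    definition="Placeholder definition for illustration.",
    form_variants=["safeguards", "safeguarded", "safeguarding"],
    synonyms=["protect"],
)
index = build_index([entry])
print(index["safeguarding"].term)  # safeguard
```

Keeping form variants on the entry itself is what lets a language-processing step resolve "safeguarding" in a citation back to the single accepted term.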