Organizations have been rapidly adopting Hadoop and other big data technologies over the past several years, but that adoption has been accompanied by a steady undercurrent of concern about the state of enterprise-grade security.
Hadoop distribution vendors and the open source community have been working to add security and governance features to Hadoop. Meanwhile, Redwood City, Calif.-based startup BlueTalon has been developing a policy engine intended to span an organization's entire data infrastructure. It provides fine-grained access control and data masking for Hadoop clusters, relational database management systems (RDBMS), NoSQL data stores and more, whether on-premises, in the cloud or in hybrid cloud environments.
"We're the only company that's coming at this at an enterprise-wide basis," says Eric Tilenius, CEO of BlueTalon and formerly executive-in-residence at Scale Venture Partners. "Companies don't have one data system. Being able to have one consistent access control is really important to data-centric security. We work across all the various data sources."
Today, BlueTalon announced that Hadoop distribution vendor Cloudera has certified the BlueTalon Policy Engine 2.0 with Impala or Hive as part of Cloudera Enterprise.
The BlueTalon Policy Engine integrates with Impala and Hive as part of Cloudera Enterprise to achieve the following results:
- Provide filtering with fine-grained access control at the row, column, cell or partial cell levels.
- Dynamically mask data and allow users to utilize sensitive data in queries without revealing it.
- Provision precise data access by enabling role- and purpose-based data access policies to be authored from a central, easy-to-use graphical user interface.
- Enforce consistent data access policies across users, applications and data repositories.
- Audit data access to ensure compliance with industry regulations such as HIPAA and PCI, and to quickly spot anomalous data requests before significant data leakage occurs.
Tilenius notes that organizations are increasingly putting their data in massive repositories like data lakes. While there are tremendous potential benefits in doing so, concentrating data this way also increases risk.
"Businesses nowadays run on data," he says. "It's not OK to just have one guy in the inner sanctum who tells you what the data is. People want direct access to the data. But Hadoop is among the least hardened systems in the enterprise."
"You say security and people think about things in black and white," he adds. "Authentication, Kerberos, encryption — people look at the perimeter. But when attacks come from compromised credentials, none of that protects you. It's not sufficient anymore. It's more important than ever to have a data-centric approach — what is the data and who should be access it and what can they see?"
To that end, he says, it is essential to give users access to the data they need "and not a byte more." That's where role-based access, attribute-based access and dynamic data masking come in. Dynamic data masking even extends Hadoop's capabilities, Tilenius notes: other access control systems within Hadoop return an error when a user issues a query that includes data they're not authorized to access. BlueTalon instead allows the query to run but masks any data the user is not authorized to see.
For instance, Tilenius explains, a banker might be authorized to see Social Security numbers for his or her direct clients, but not for other clients. A query that includes both would return results, but the banker would see Social Security numbers only for direct clients.
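The banker example can be illustrated with a minimal Python sketch. This is not BlueTalon's implementation or API; the `Client` record, the `query_clients` function and the mask string are hypothetical names invented here to show the general idea of returning every row while masking a sensitive field the requester is not entitled to see.

```python
from dataclasses import dataclass

# Hypothetical placeholder shown in place of an unauthorized value.
MASK = "***-**-****"


@dataclass(frozen=True)
class Client:
    name: str
    ssn: str
    banker_id: str  # the banker who serves this client directly


def query_clients(clients, requesting_banker):
    """Return all rows, masking the SSN unless the requester is the direct banker.

    The query never fails because of unauthorized rows; it only hides the
    sensitive value in those rows.
    """
    results = []
    for client in clients:
        authorized = client.banker_id == requesting_banker
        results.append({
            "name": client.name,
            "ssn": client.ssn if authorized else MASK,
        })
    return results


# Illustrative data only.
clients = [
    Client("Alice", "123-45-6789", "banker_1"),
    Client("Bob", "987-65-4321", "banker_2"),
]

rows = query_clients(clients, "banker_1")
for row in rows:
    print(row)
```

Here the banker identified as `banker_1` gets both rows back, but only the direct client's number is readable. A system that denied the whole query would instead return an error as soon as the second row was involved.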
In addition, the policy engine creates a full audit trail.
"One of the things we do that's really unique is we audit all the activity into or out of a database," Tilenius says. "Because we're a policy engine, for any user we'll kinow what they tried to do and what policies or rules they triggered. We can see both the query they requested as well as the data they received back at a metadata level."