Big Data Security, Privacy Concerns Remain Unanswered

Approaches to storing, managing, analyzing and mining Big Data are new, introducing security and privacy challenges within these processes. Big Data transmits and processes an individual's PII as part of a mass of data--millions to trillions of entries--flowing swiftly through new junctions, each with its own vulnerabilities.

By David Geer
Thu, December 05, 2013

CSO — Approaches to storing, managing, analyzing and mining Big Data are new, introducing security and privacy challenges within these processes. Big Data transmits and processes an individual's PII as part of a mass of data--millions to trillions of entries--flowing swiftly through new junctions, each with its own vulnerabilities.

[Chomsky, Gellman talk Big Data at MIT conference]

Deidentification masks PII, separating information that identifies someone from the rest of his or her data. The hope is that this process protects people's privacy, keeping information that would kindle biases and other misuse under wraps.

Reidentification science, which pieces PII back together reattaching it to the individual thwarts deidentification approaches that would protect Big Data, making it unrealistic to believe that deidentification can really maintain the security and privacy of personal information in Big Data scenarios.

Vulnerabilities, Exposure and Deidentification

Enterprises manage Big Data using large, complex systems that must execute hand-offs from system to system. "Typically an ETL procedure (extract, transfer, load) loads Big Data from a traditional RDBMS data warehouse onto a Hadoop cluster. Since most of that data is unstructured, the system runs a job in order to structure the data. Then the system hands it off to a relational database to serve it up, to a BI analyst, or to another data warehouse running Hadoop for storage, reference, and retrieval," explains Brian Christian, CTO, Zettaset. Any Big Data hand-offs or moves cross vulnerable junctions.

Creators of Big Data solutions never intended many of them to do what they do today. Take map reduce, for example. "Google invented map reduce to store public links so people can search them," says Christian. There were no worries about security because these were public links. Now enterprises use map reduce and NoSQL systems on medical and financial records, which should remain private. Because security is not inherent, enterprises and vendors have to retrofit these systems with security. "That's a big problem," says Christian, "vendors did not design firewalls and IDS for distributed computing architectures." These architectures tend to scale up to extremes beyond what traditional firewalls and IDS can natively address.

[Can we use Big Data to stop healthcare hacks?]

According to the Stanford Law Review article, vulnerabilities that expose PII subject people to scrutiny, raising concerns about acts of profiling, discrimination and exclusion based on an individual's demographics. These abuses can lead to loss of control for the individual. While brands use PII to market to customers to their benefit, those same vendors as well as law enforcement, government agencies and other third parties could also interpret and apply that personal data to the individual's detriment.

Continue Reading

Originally published on www.csoonline.com. Click here to read the original story.
Our Commenting Policies