Graph Databases Find Answers for the Sick and Their Healers

The Neo4j graph database is proving popular in the medical community for connecting different entities

A novel form of database that focuses on connections between entities, called a graph database, is finding a home in the health care industry.

"In health care, it turns out, there are quite a number of problems that involve understanding the connections between things," said Philip Rathle, vice president of products at Neo Technology, which sells support subscriptions to its open source Neo4j graph database.

Diseases may have multiple symptoms. Doctors may belong to multiple health care networks. There are also relationships between different types of organizations, such as insurance companies and hospitals. In the realm of bioinformatics, multiple connections exist among genes and proteins.

"There are a lot of connections happening, and graphs are good at matching connections," Rathle said.

Neo has landed a number of enterprise customers in the health care space, including the Curaspan Health Group, GoodStart Genetics, SharePractice and Janssen Pharmaceuticals, among others.

These customers have used Neo4j for tasks such as patient management, drug research, clinical trials, genomics and marketing.

The health care industry is not alone in adopting graph databases: Neo4j has also been used in telecommunications, financial services and hospitality, by organizations including Cisco, Accenture, eBay and Walmart. The health care industry, however, seems to benefit especially from understanding connections between different entities.

A graph database differs from a typical relational database in that it stores the relationships between entities in addition to the entities themselves and their properties. As a result, database operations can move quickly across different, though related, entities. In a relational database, that kind of traversal can be a headache to orchestrate and so computationally intensive that such searches are infeasible in many cases.
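As a rough illustration of that difference, the Python sketch below keeps relationships as first-class data in an adjacency list and follows them hop by hop. The TinyGraph class, the node names and the relationship types are all hypothetical, invented for this example. It is a toy model under those assumptions, not a description of how Neo4j stores or queries data.

```python
from collections import defaultdict, deque


class TinyGraph:
    """Toy in-memory property graph (hypothetical; not Neo4j's storage model)."""

    def __init__(self):
        self.nodes = {}                 # node id -> dict of properties
        self.edges = defaultdict(list)  # node id -> [(relationship type, neighbour id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, src, rel_type, dst):
        # Record the relationship on both ends so it can be walked either way.
        self.edges[src].append((rel_type, dst))
        self.edges[dst].append((rel_type, src))

    def within(self, start, max_hops):
        """Breadth-first walk: {node id: hop count} for nodes within max_hops."""
        seen = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if seen[current] == max_hops:
                continue  # do not expand beyond the hop limit
            for _rel_type, neighbour in self.edges[current]:
                if neighbour not in seen:
                    seen[neighbour] = seen[current] + 1
                    queue.append(neighbour)
        del seen[start]
        return seen


# Made-up entities of the kinds the article mentions.
g = TinyGraph()
g.add_node("dr_lee", kind="doctor")
g.add_node("network_a", kind="health care network")
g.add_node("hospital_x", kind="hospital")
g.add_node("insurer_y", kind="insurance company")
g.relate("dr_lee", "BELONGS_TO", "network_a")
g.relate("network_a", "INCLUDES", "hospital_x")
g.relate("hospital_x", "CONTRACTS_WITH", "insurer_y")

reachable = g.within("dr_lee", 2)
print(reachable)  # {'network_a': 1, 'hospital_x': 2}
```

Each additional hop here is one more pass over a neighbour list. In a relational schema, the same question would typically require one more join per hop.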

"Most databases are designed for storing and retrieving individual bits of information," Rathle said. "But graph databases are designed to navigate and manage connected data."

Neo designed its database to be highly scalable. The company has customers running production databases, using a cluster of servers, with billions of relationships among different entities. The database comes with its own query language, called Cypher, an SQL-like query language designed for determining relationships between entities.

HealthUnlocked is one of its health care customers. The London-based social networking outfit built a new service, called Health Graph, based on the Neo4j database. A graph database was a natural fit because it could link across a voluminous vocabulary describing all manner of symptoms and conditions in multiple languages.

A medical question may be asked using any one of a number of different terms, depending on the asker's level of medical education. So the system needs to make connections across many different terms in order to make a match, said Alex Trofymenko, HealthUnlocked's head of technology.
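The term-matching idea can be sketched in a few lines of Python. This is a minimal illustration, assuming each term is linked to a shared concept node. The link and equivalents functions and the concept identifier "C_MI" are hypothetical, and nothing here reflects HealthUnlocked's actual implementation.

```python
from collections import defaultdict

concept_of = {}               # normalized term -> concept id
terms_of = defaultdict(set)   # concept id -> set of normalized terms


def link(term, concept):
    """Connect a term (lay, clinical or foreign-language) to a concept node."""
    key = term.lower()
    concept_of[key] = concept
    terms_of[concept].add(key)


def equivalents(term):
    """Return the other terms that share a concept with the given term."""
    key = term.lower()
    concept = concept_of.get(key)
    if concept is None:
        return set()  # unknown term: nothing to match
    return terms_of[concept] - {key}


# "C_MI" is a made-up identifier for this example.
link("heart attack", "C_MI")
link("myocardial infarction", "C_MI")
link("MI", "C_MI")
link("infarto de miocardio", "C_MI")

matches = equivalents("Heart attack")
print(sorted(matches))  # ['infarto de miocardio', 'mi', 'myocardial infarction']
```

A question phrased as "heart attack" can then be matched against discussions that use the clinical or Spanish term, because all of them are one hop away from the same concept.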

HealthUnlocked's user input is stored in a standard relational database, while the connections between the entities that people discuss are stored in Neo4j. Neo4j provides "a much better way" of visualizing these connections, Trofymenko said.

In another medical use of the graph database, clinical diagnostics company GoodStart Genetics, specializing in inherited diseases, uses it to aggregate genetic carrier screening data from multiple sources, so it can be queried by scientists looking for signs of inherited diseases.

Life sciences analytics service Zephyr Health uses Neo4j to provide a query service for discovering new connections between data from multiple sources. The company found the database's flexibility and scalability to be instrumental in building its service.

Another user is Doximity, a professional network of over 300,000 U.S. physicians. Doximity uses the database as the basis of a recommendation service for the physicians, allowing them to contribute to and draw information from the service.

First developed in 2000, Neo4j is considered to be the most widely used graph database, though others exist in the market as well, such as GraphBase, HyperGraphDB, and Oracle Spatial and Graph.

Facebook has also built a graph database, called Tao, to map the connections across its 1.28 billion users. Social networks seem to be another natural home for graph databases.

Neo released the latest version of the Neo4j database last month. It is available both in a free, downloadable, open-source community edition and a paid, supported edition with additional features.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
