An IPFS addressable storage model for healthcare with blockchain

P2P file sharing is fun and easy. IPFS holds the power to create a P2P network of medical records — easy to share and access. Let’s explore three ideas that look around the corner into the future.


What to do about the problem of medical record link rot in healthcare? What is link rot? Link rot is the process by which hyperlinks on individual websites or the internet in general point to web pages, servers or other resources that have become permanently unavailable. 

Medical information exists, but getting access to it proves difficult. Assembling a unified medical profile has become an accepted frustration: patients are required to collect pieces of their medical records scattered across providers around the country.

The lack of interoperability affects patients. IPFS presents a new approach for connecting information — a potential fix for medical record link rot.

A Napster for healthcare

BitTorrent, LimeWire (gasp), Napster, FastTrack, eDonkey, Gnutella and Vuze were among the best-known peer-to-peer (P2P) file-sharing clients and protocols. BitTorrent and Gnutella allow users to download files of any type: videos, music and software can all be swapped over the internet using these clients.

Napster, however, used a centralized model. Peer computers registered with a central server and provided their lists of shared files. Peers sent queries to the central server, which kept a master index of all available files, and the server connected the querying computer to a peer that held the requested files.

The strength of this approach was that the central server always knew where the files were located, searching was fast and efficient, and the answers were guaranteed to be correct. The challenge with the Napster model (aside from the obvious we all know about) was that it depended on a centralized server, which resulted in a single point of failure. The problem was magnified because the centralized server needed to have adequate computing power to handle all queries — a setup that could have resulted in unreliable service, if not appropriately scaled.

P2P sharing for a healthier tomorrow

The limitations of centralized servers spawned Gnutella. Developed by Justin Frankel and Tom Pepper of Nullsoft shortly after AOL acquired Nullsoft, Gnutella was the first major decentralized P2P network. This is where the story takes a turn.

The original developers announced the launch on Slashdot, but AOL quickly halted distribution over legal concerns. By then thousands had already downloaded the decentralized peer-to-peer network client; the protocol was reverse-engineered, and open-source clones quickly surfaced across the internet. AOL later released the source code under the GNU General Public License (GPL).

Why does it matter? Today, one reason records can't be shared is that medical information is stored in many different systems, databases and formats. A P2P approach for healthcare would inch us closer to a unified medical record that heterogeneous systems can all access.

Idea 1: Distributed access to medical records, with heterogeneous systems sharing the same file structure.

IPFS: a foundation for the internet upgrade

How do we make data available within today's unstructured swamp of medical information? We have petabytes of individual genomic records that need hosting, and real-time media streams to capture. How will these massive data sets be linked?

IPFS, the InterPlanetary File System, is a peer-to-peer distributed file system that seeks to connect all computing devices with the same system of files. Specifically, IPFS is a content-addressable, P2P hypermedia distribution protocol. IPFS is the protocol to upgrade the web.

Nodes do not need to trust each other, and there is no single point of failure. Does this sound like something we need in healthcare? Storing medical information on IPFS would mean no trust requirement between providers and no single point of failure standing between patients and their medical records.

Merkle trees, or hash trees, are a core reason blockchains add value. In a Merkle tree, every leaf node is labeled with the hash of a block of data, and every nonleaf node is labeled with the hash of the labels of its child nodes. Huh? Each parent depends on everything beneath it, much as a branch depends on its leaves: change a single leaf and every label on the path up to the root (the top hash) changes too.
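To make the definition concrete, here is a minimal sketch of computing a Merkle root in Python, assuming SHA-256 as the hash and a binary tree that pairs the last label with itself when a level has an odd count (one common convention, not the only one). The record contents are made-up placeholders. Real systems add details this sketch omits, such as hashing leaves and internal nodes differently.

```python
from __future__ import annotations

import hashlib


def sha256(data: bytes) -> bytes:
    """Return the SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Label each leaf with the hash of its data, then label each parent with
    the hash of its two children's labels until one root label remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last label with itself
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


records = [b"2015 allergy list", b"2016 lab panel", b"2017 MRI report"]
root = merkle_root(records)
edited = merkle_root([b"2015 allergy list", b"2016 lab panel (edited)", b"2017 MRI report"])
print(root != edited)  # True: changing one leaf changes the root
```

The root is a single 32-byte value that commits to every record beneath it, which is why comparing roots is enough to detect that something, somewhere, has changed.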

Not familiar with hashes or Merkle trees? Not to worry. I’ll be covering these in a future post. For our discussion today, we’ll focus on IPFS, and its impact and macro principles to create a new spiderweb of connections on top of the Internet.

IPFS is a global distributed file system that forms a generalized Merkle-DAG: a directed acyclic graph whose objects are linked to each other by their cryptographic hashes (a unique ID of sorts).
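A Merkle-DAG generalizes the tree: any object can link to any other object by hash, as long as no cycles form. The sketch below is a hypothetical, in-memory store that only illustrates the idea. It assumes each object is a `data` string plus a list of named `links`, and that its address is the hex SHA-256 of a canonical JSON encoding; real IPFS uses its own object encodings and multihash-based CIDs, not this scheme. The class name `MerkleDAG` and its methods are illustrative, not an IPFS API.

```python
from __future__ import annotations

import hashlib
import json


class MerkleDAG:
    """Hypothetical in-memory, content-addressed object store."""

    def __init__(self) -> None:
        self.objects: dict[str, dict] = {}

    def put(self, data: str, links: dict[str, str] | None = None) -> str:
        """Store an object and return its address (hash of its content and links)."""
        obj = {
            "data": data,
            "links": [{"name": n, "hash": h} for n, h in sorted((links or {}).items())],
        }
        encoded = json.dumps(obj, sort_keys=True).encode()
        address = hashlib.sha256(encoded).hexdigest()
        self.objects[address] = obj  # identical content always lands on the same key
        return address

    def get(self, address: str) -> dict:
        return self.objects[address]

    def resolve(self, root: str, path: str) -> dict:
        """Follow named links from root, e.g. 'labs/2017'."""
        obj = self.get(root)
        for name in path.split("/"):
            links = {link["name"]: link["hash"] for link in obj["links"]}
            obj = self.get(links[name])  # KeyError if the link does not exist
        return obj


dag = MerkleDAG()
lab = dag.put("2017 lipid panel")
labs = dag.put("", {"2017": lab})
record = dag.put("patient record", {"labs": labs})
print(dag.resolve(record, "labs/2017")["data"])  # 2017 lipid panel
```

Because a parent's address covers the hashes of its children, editing the lab result would produce new addresses for `labs` and `record` while leaving the old versions intact.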

IPFS and the maze of healthcare records

This data structure is a successor to the generic authenticated data structure (ADS). The key advantage of an ADS is that its operations can be carried out by an untrusted prover, whose answers a verifier can check against a small trusted digest. Understanding the historical value of ADS makes the benefits of IPFS easier to understand.
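The untrusted-prover idea can be illustrated with a Merkle membership proof. In this minimal sketch (SHA-256, binary tree, odd levels padded by repeating the last node), a hypothetical `prove` function plays the untrusted prover holding all the data, and a hypothetical `verify` function plays the client that trusts only the root digest. This shows the general ADS principle, not code from IPFS or from any particular ADS paper.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return every level of a binary Merkle tree, leaf hashes first, root last."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # odd level: repeat the last node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(leaves, index):
    """Untrusted prover: return the sibling hashes on the path from leaf to root."""
    proof = []
    for level in build_levels(leaves)[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # the last node was paired with itself
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(root, leaf, proof):
    """Verifier: recompute the path using only the leaf, the proof and the root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


records = [b"allergies", b"labs", b"imaging"]
root = build_levels(records)[-1][0]  # the only value the verifier must trust
proof = prove(records, 2)
print(verify(root, b"imaging", proof))           # True
print(verify(root, b"imaging (forged)", proof))  # False
```

The verifier never sees the whole tree; the proof grows only with the tree's height, and a prover cannot forge a record without finding a hash collision.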

Why does it matter? Every patient could create a list of his or her medical records from any provider, all with an address, similar to bookmarking addresses in your favorite internet browser (only securely).

Idea 2: Medical records are addressable with permanent addresses (medical records with a permanent home).

Versioning of your medical records

IPFS is a distributed version-controlled filesystem of hashes. The fundamental principle of IPFS is that all data is part of the same Merkle DAG, a content-addressed block storage model with content-addressed hyperlinks. It doesn't matter where your information is located. As long as the address structure is standardized, this information could be accessible from any platform, database or system.
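Why location stops mattering can be shown in a few lines: if the address is derived from the content itself, a client can fetch a record from any holder and check the bytes against the address. This sketch assumes the address is simply the hex SHA-256 of the bytes (IPFS itself uses multihash-based CIDs), and the stores `hospital` and `clinic_cache` are hypothetical dictionaries standing in for nodes.

```python
from __future__ import annotations

import hashlib


def address_of(content: bytes) -> str:
    """Content address: the hex SHA-256 of the bytes (a simplification of a CID)."""
    return hashlib.sha256(content).hexdigest()


def fetch(address: str, stores: list[dict[str, bytes]]) -> bytes:
    """Try each store in turn; accept the first copy whose hash matches the address."""
    for store in stores:
        content = store.get(address)
        if content is not None and address_of(content) == address:
            return content
    raise KeyError(f"no valid copy of {address} found")


report = b"Discharge summary, 2017-03-02"
addr = address_of(report)
hospital = {addr: b"tampered copy"}  # a corrupted or malicious copy
clinic_cache = {addr: report}        # an honest copy held somewhere else
print(fetch(addr, [hospital, clinic_cache]) == report)  # True
```

The client does not need to trust either store; the address itself is the integrity check, which is what lets any platform, database or system serve the record.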

Unlike traditional networked provider-to-provider systems, IPFS has no privileged nodes. It combines ideas from distributed hash tables (DHTs), BitTorrent, Git and Self-Certified Filesystems (SFS). The IPFS protocol is divided into a stack of eight subprotocols that synthesize these earlier peer-to-peer concepts into the backbone of IPFS.

1. Identities: peer node identification.

2. Network: governs the connections to other peers.

3. Routing: maintains the information needed to locate peers and stored objects.

4. Exchange: a protocol that manages how blocks are distributed.

5. Objects: Merkle-DAG, content-addressable immutable objects and links.

6. Files: a version-controlled file system.

7. Naming: a self-certifying, mutable name system that points to content-addressed DAG objects.

8. Applications: any application running over IPFS to leverage the new connected web.
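The Exchange layer (item 4) decides when a node sends blocks to a peer. The original IPFS whitepaper proposes one example BitSwap strategy based on a debt ratio r = bytes_sent / (bytes_recv + 1) and a sending probability P(send | r) = 1 - 1 / (1 + exp(6 - 3r)). The sketch below simply evaluates those two formulas; the function names are hypothetical, and deployed BitSwap implementations may use different strategies.

```python
import math


def debt_ratio(bytes_sent: int, bytes_recv: int) -> float:
    """r: how much we have sent a peer relative to what it has sent us (+1 avoids division by zero)."""
    return bytes_sent / (bytes_recv + 1)


def p_send(r: float) -> float:
    """Probability of sending to a peer with debt ratio r; exactly 0.5 at r = 2."""
    return 1 - 1 / (1 + math.exp(6 - 3 * r))


for sent, recv in [(0, 0), (1_000, 999), (2_000, 999), (4_000, 999)]:
    r = debt_ratio(sent, recv)
    print(f"r={r:.1f}  P(send)={p_send(r):.3f}")
```

Peers that reciprocate keep a low ratio and are almost always served, while the probability falls off quickly once a peer has taken about twice what it has given, which discourages freeloading without a central referee.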

The IPFS stack combines eight elements: identities (of each node) + network + routing (distributed hash tables) + exchange (BitTorrent) + objects (the Merkle-DAG, as in Git) + files (a versioned file system, also Git-like) + naming (Self-Certified Filesystems) + applications (the web). Together these elements form the IPFS stack, a stack that will be used to standardize the accessibility of medical records. IPFS is the most impactful data structure you haven't yet discovered. Building a new application? You should consider IPFS compatibility. Establishing a new data-centric innovation offering? If you're not discussing IPFS, you're heading in the wrong direction.

Building the internet of data structures (IoDS)

Identities ensure that connecting peers exchange public keys; each node stores its public and private keys encrypted with a passphrase. The Network layer handles the transport protocol, reliability, connectivity, integrity and authenticity (checking the sender's public key). Routing lets peers find other peers and learn which peers can serve which objects, using a distributed sloppy hash table (DSHT) based on S/Kademlia and Coral. The Exchange layer sends and receives blocks of data with a BitTorrent-inspired protocol called BitSwap.

The Objects layer sits above the distributed hash table and BitSwap (a peer-to-peer system for storing and distributing blocks quickly). In the Merkle-DAG, objects are linked by cryptographic hashes of the targets embedded in the sources. Git's data structures offer much of this, but Merkle-DAGs add the benefits of content addressing (including links), tamper resistance and deduplication. The Files layer defines a set of objects: blocks of data, lists of blocks, trees (collections of blocks, lists and other trees) and commits (version history snapshots). Naming builds on objects that are permanent and always retrievable by their hash, adding mutable names that point to them, among other properties. Lastly, applications can run over IPFS and leverage its principles and features to create a web of Merkle links connecting data (objects and blocks) for business applications.
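The Routing layer's Kademlia heritage rests on one idea: node IDs and content keys live in the same ID space, and "closeness" is the bitwise XOR of two IDs. This sketch assumes node IDs are the SHA-256 of a public key and that the placeholder byte strings stand in for real keys; it takes a global view of the network and skips the iterative, bucket-based lookups (and the "sloppy" storage of S/Kademlia and Coral) that a real DHT performs. All function names are hypothetical.

```python
from __future__ import annotations

import hashlib


def node_id(public_key: bytes) -> int:
    """Identity: derive a node ID by hashing the node's public key."""
    return int.from_bytes(hashlib.sha256(public_key).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's metric: the distance between two IDs is their bitwise XOR."""
    return a ^ b


def closest_peers(target: int, peers: list[int], k: int = 3) -> list[int]:
    """Return the k peers closest to target; in a Kademlia-style DHT these are the
    peers expected to hold records (such as provider lists) for that key."""
    return sorted(peers, key=lambda p: xor_distance(p, target))[:k]


# Hypothetical network: ten peers identified by placeholder public keys.
peers = [node_id(f"peer-{i}-public-key".encode()) for i in range(10)]
block_key = int.from_bytes(hashlib.sha256(b"2017 MRI report").digest(), "big")
for p in closest_peers(block_key, peers):
    print(hex(p)[:14], xor_distance(p, block_key).bit_length())
```

Because every participant computes the same distances, any peer can work out where a key "belongs" without asking a central index, which is the Napster bottleneck this design avoids.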

Why does any of this matter? Every change, update or tweak would be recorded in order. Patients could be informed of, and have access to, every addition to their personal medical record: a longitudinal medical record history that is never misplaced.
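An ordered, tamper-evident history can be sketched as a chain of commits, each pointing to its parent by hash, loosely in the spirit of the commit objects in Git and the IPFS Files layer. The `Commit` fields, the JSON encoding and the helper names are hypothetical choices for illustration, not an IPFS format.

```python
from __future__ import annotations

import hashlib
import json
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    parent: str | None  # address of the previous version (None for the first)
    snapshot: str       # content address of the record at this version
    note: str           # hypothetical free-text description of the change

    def address(self) -> str:
        payload = json.dumps([self.parent, self.snapshot, self.note]).encode()
        return hashlib.sha256(payload).hexdigest()


def commit(store: dict[str, Commit], parent: str | None, record: bytes, note: str) -> str:
    """Record a new version that links back to its parent; return its address."""
    c = Commit(parent, hashlib.sha256(record).hexdigest(), note)
    addr = c.address()
    store[addr] = c
    return addr


def history(store: dict[str, Commit], head: str | None) -> Iterator[str]:
    """Walk parent links from the newest version back to the first."""
    while head is not None:
        c = store[head]
        yield c.note
        head = c.parent


store: dict[str, Commit] = {}
v1 = commit(store, None, b"record v1", "initial intake")
v2 = commit(store, v1, b"record v2", "added lab results")
v3 = commit(store, v2, b"record v3", "added MRI report")
print(list(history(store, v3)))  # ['added MRI report', 'added lab results', 'initial intake']
```

Since each address covers its parent's address, silently rewriting an early entry would change every later address, so a patient holding the latest address can detect any alteration of the history.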

Idea 3: IPFS will impact every healthcare application deployed by 2020.

Healthcare is going to experience a transformation in data structures that hold our patient and clinical information. The transformation won’t be by disruptive technology; it will be by foundational technology. Think about how your business and technical foundations are shifting and begin the transformation today within your organization.

Copyright © 2017 IDG Communications, Inc.
