by Paula Rooney

The Library of Congress goes digital

Nov 19, 2021
Cloud Computing, Digital Transformation, Government IT

CIO Judith Conklin discusses the ongoing cloud migration and digitization of the world’s largest library — a massive endeavor to make more of its 170 million assets available to all.

Credit: Magdalena Petrova

CIO Judith Conklin has a tall task: migrating the world’s largest library to the cloud.

Conklin, who was promoted from deputy CIO in September after former CIO Bernard Barton retired, is leading the Library of Congress’ five-year digital transformation, which will see the institution migrating millions of books, historical collections, and congressional materials to a complex hybrid cloud environment. The move is part of a strategic IT plan launched in 2019 to digitize and make available much of the LOC’s more than 170 million physical assets to the public from any device.

“As the publishing world and library world in general goes more digital, the Library of Congress is going more digital,” says Conklin, who oversees roughly 400 employees in the Office of the CIO, including around 200 contractors.

The Library of Congress — which is housed in three buildings on Capitol Hill, the Madison, Adams, and Jefferson buildings — “ingests” new physical and digital data and metadata continuously. While the goal isn’t to digitize 100% of its materials, the transformation remains vast and complex, Conklin says. “There’s data we’ll keep on premises and then there are some that we want to gain the efficiencies … and elasticity … of the cloud,” she adds.

George Westerman, a principal research scientist and senior lecturer at the MIT Sloan School of Management, says the ambitious undertaking benefits all of society.

“It’s impressive how LOC is aiming to ‘throw open the treasure chest’ through digital, so it can make the library’s diverse artifacts available to citizens, teachers, and innovators around the country without requiring them to come to DC,” Westerman says.

Transforming the Library of Congress

The LOC initially brought in Accenture to help plan its now-complete data center transformation. This three-year effort involved moving more than 130 library IT systems and applications out of an “obsolete” data center in the Madison building to a state-of-the-art Tier III data center outside of Washington, DC, as well as to other data centers and cloud services managed by the library and connected via a multi-path WAN.

With this enterprise cloud environment in place, the library is now focusing on the Enterprise Copyright System (ECS) for the Copyright Office, the Integrated Research and Information System (IRIS) project for the Congressional Research Service (CRS), and various projects to improve how the library accepts, manages, and delivers collections material, including an audio-visual content management system and a new library content platform.

The ECS project, which assigned copyright data to one of the big three cloud providers (Conklin declined to specify which), will make the process of applying for copyrights easier and more transparent. “More and more people want to register their materials for copyright,” says Conklin, and that growing demand makes the system a prime candidate for the scale and efficiency of the cloud.

The US Copyright Office, which comprises several divisions, including licensing recommendations and public records, relies on a mix of manual processes and those that have been automated through IT systems that must be modernized. Congress appropriated $60 million for this task and the library has a “very strict deadline” for its completion, Conklin says. The new system is expected to go live in October 2024.

Conklin, who is revamping the library’s project management procedures for the digital era, has also embarked on a five-year digital storage plan, which includes “ingesting” or absorbing many “born digital” collections that come into the library in digital format from a variety of sources, as well as digitized content from both houses of Congress.

The library has been storing digitized data for decades in traditional legacy systems, including many important historical documents and collections. Some digitized documents will remain on-premises, and not everything will be for public view.

“It’s not a goal to digitize 100% of our collections, and some people are dismayed by that,” says Conklin. The library budget does not allow for an infinite digital data warehouse, she notes, though Congress is loosening up on making more data public following passage of a law two years ago.

The US Constitution, for example, will not be going up on the Library of Congress website, Conklin says. However, the library’s digital transformation has had an impact on our understanding of the Constitution, as an in-house preservationist used spectral analysis of a digitized draft of the Constitution to uncover previously unknown edits.

“They analyzed layer after layer of this draft copy of the Constitution and they say they found edits … to the Constitution that weren’t known about,” Conklin says, comparing the discovery to retroactively turning on track changes.

The library is also dabbling in experimental artificial intelligence technologies such as computer vision, machine learning, and applications that focus on audio clips and visual art, much of which is made available as open source software.

Because the data and metadata coming into the Library of Congress is never ending, the job of digital transformation will never really be done. “That’s the struggle for every CIO,” Conklin says.

But no doubt, the LOC is light years ahead of where it was when it started its digital transformation.