The government's release of the 1940 census will give researchers access to 130 million records. Ancestry.com has been preparing for the expected spike in traffic at its website, while applying artificial intelligence to help people find ancestors in its giant databases.

For Ancestry.com, big data is about to get even bigger. The subscription-based website for finding long-lost relatives already has 6.7 billion historical records and 4.8 billion people named in family trees. But now it’s adding the 1940 United States Federal Census, which the federal government will release on Monday.

The National Archives has turned the 1940 census paperwork into more than 3.8 million digital images. The online archive, being released after a 72-year waiting period, will be a gold mine for people just beginning to compile their family history, though it will become easier to use once the images are indexed. When Ancestry.com’s database and index are complete, users will be able to search more than 130 million census records using fields such as name, street address, county and state.

Scott Sorensen, Ancestry.com’s senior vice president of engineering and its top IT executive, says that his staff has been busily preparing systems for the expected deluge of search requests. The company learned its lesson two years ago from a huge spike in website traffic during the TV show “Who Do You Think You Are?”, in which celebrities such as Sarah Jessica Parker discover clues about their ancestors. At the first commercial break, many inspired viewers apparently dashed to their computers to try their hand at family research.

Ancestry.com had prepared for a 300 percent spike in traffic from TV viewers, but the website was slammed by traffic that was (in some cases) 21 times the usual pattern, which “brought us to our knees,” Sorensen says. Since then, the company has added servers and beefed up its network and infrastructure to support bigger surges in traffic, he says.
The company has nearly 5,000 servers at its data center and uses a variety of tools to handle its big data work, including the distributed data-processing framework Hadoop; traditional relational database software; statistical software called R; algorithms that employ machine learning, a form of artificial intelligence; and MongoDB, database software that creates linkages among the public family trees posted on the site.

The Provo, Utah-based company had about $400 million in sales last year and has about 1,000 employees, according to Hoovers.com. It currently has 1.7 million subscribers.

The key business goal at Ancestry.com is to broaden its customer base to include people who are curious about their ancestors but aren’t experienced researchers. Sorensen’s job is to use technology to make the discovery of ancestors as easy as possible, so that first-time searchers don’t go away disappointed. Consequently, his technology group works to improve customer metrics such as “time to first discovery” and (for long-time subscribers) “number of discoveries in a week.” The company continues to enhance the “power-user tools” for sophisticated researchers, too, Sorensen says.

Three years ago, most ancestor discoveries were made through the company’s custom search engine, but now more discoveries are made through “hinting,” whereby Ancestry.com’s artificial intelligence technology suggests likely connections or records.

“We take the massive amounts of data we have, and the billions of records that people have attached to the family trees, to do record linking and record matching,” Sorensen says. “So you start with 40 million Smith names, and then 4 million John Smiths, but what you want are the four records about your great-great-grandfather John Smith. Our record-linking technology will try to surface those four records and give you a hint,” he explains. “We try to make those discoveries more automatic.”

What does the future hold?
Sorensen says he envisions a time when the company adds socio-economic data to the classic genealogical data to provide more colorful information and context about ancestors. He offered this example: “I can see [from the 1930 census] that my great-great-grandfather had a radio, and was the only person on the block to have a radio. Then [with socio-economic data] here’s the additional color that shows what percentage of people had a radio in that time and place.” Mitch Betts is CIO magazine’s executive editor. Follow him on Twitter: @mitchbetts.