Census Bureau Takes a Novel Technology Approach

The circus came to town. The big top appeared virtually overnight. In just a few months, the ringleaders set up 520 offices, hired more than half a million temporary ticket punchers and processed over a billion pieces of paper. And for good measure, they counted every person in the United States.

Welcome to Census 2000, the greatest show in government. Largely inactive and ignored for most of the decade, in the past few years the U.S. Census Bureau undertook the largest peacetime mobilization of people and technology ever. With limited funds and a short window of execution, the decennial count challenged all those who strove to manage it. If you could graph the bureau’s budget and effort over the decade, it would look like the human cannonball’s EKG in the moments before, during and after his harrowing performance: steady and flat in a state of anxious readiness, a short but violent spike as he is shot through the air, and then a return to calmness as the adrenaline fades.

The years after the last census, in 1990, were quiet. Around 1995 the strategizing for Census 2000 got under way. That year some of what would prove to be the key decisions were first discussed, and by 1997 the blueprint for Census 2000 was largely complete. It called for complicated, innovative technology and a massive effort with little margin for error. Counting heads millennium-style involved an unprecedented level of IT. To collect and process information on the approximately 275 million people in America, the bureau relied on 10,000 PCs, a network with more than 600 routers and over 33 terabytes of storage at the satellite sites alone. The deployment of technology was a departure from previous censuses, when most data was processed manually. For all the data-management advantages this IT intensity brought, though, it also created one serious drawback: no safety nets. Most problems in earlier censuses could be overcome simply by assigning more workers. That solution wouldn’t work for software, however; an inaccuracy in code would be replicated a millionfold. By automating Census 2000, the bureau sacrificed its safety nets: even critical and complex steps had no backup plans.

There was a simple explanation for this seeming deficiency, says Census Bureau CIO Rick Swartz from his headquarters in Suitland, Md. “With a project of this magnitude, plan B’s are pretty expensive.” And with a tight budget and no time for mistakes, most of the time “plan B was simply to make plan A work.”

With just a three-month window to collect and process the bulk of the data, a major failure at any step of the project could have been fatal. The bureau realized early on that the only way to assure a perfect performance was constant, comprehensive rehearsal. Each aspect of the hardware and software involved was tested until the bureau was certain it would work, and then it was tested some more. In some cases, testing lasted three years. Still, failure seemed a certainty early on. Both the federal inspector general and the General Accounting Office issued reports predicting disaster. Besides the obligatory Y2K scare, these overseers couldn’t envision how the Census Bureau’s blend of experimental technologies and convention-defying uses of established technologies could possibly succeed. Yet somehow Swartz and his staff got to the far pole without slipping.

Stand and Be Counted

A census has taken place every 10 years since 1790. Then, as now, the primary use of census data was reapportioning seats in the House of Representatives. Then, as now, there were concerns that some residents went uncounted. But where the initial questionnaire comprised six questions, Census 2000 asked more than 50 questions on the long form, which was sent to one in every six households in America. Many of these questions come from other federal agencies, which submit them to the Census Bureau and use the answers to design their strategies and allocate funding.

The Constitution mandates that the census be taken as of April 1 and that the final results reach the president by the end of December. That’s not much time to count a rapidly growing population and analyze its trends. The other federal agencies’ eagerness to ask questions doesn’t help; the race and ethnicity question alone, present on both the long and short forms, has 128 possible combinations. To plan for this data load, the Census Bureau had to commit to several critical decisions (how to process returned forms, manage the temporary workers and disseminate the completed data) years before the decisions’ results would be known.
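
One way a figure like 128 can arise (an illustrative reading only; the article does not break down the categories) is that a check-all-that-apply question with seven boxes allows 2^7 = 128 response patterns, counting the blank one. A quick sketch:

```python
# Illustrative only: a check-all-that-apply question with n boxes has
# 2**n possible response patterns, including leaving all boxes blank.
n_boxes = 7              # assumed for illustration, not the bureau's actual count
print(2 ** n_boxes)      # -> 128
```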

In past censuses, the bureau handled all the major phases of the count: the data collection, data processing and data dissemination. It was clear early on that this wouldn’t be feasible for Census 2000, however. “We set up our own processing centers last time. We wrote all the software and installed all the hardware,” says Gary Doyle, the bureau’s manager of systems integration. “But between 1991 and 1995 we couldn’t hire anybody. We actually lost people because the government was downsizing. And without staff you can’t do anything.”

The bureau would have its hands full just managing the administrative aspects of the count. Setting up the 520 local offices and managing the estimated 700,000 census takers, a workforce nearly twice that of General Motors, was a challenge equal to or greater than administering the count and processing the returned data. For their first act of daring, the Census Bureau’s directors decided that the capture of information from completed forms should be outsourced. In March 1997, Lockheed Martin, with its new data capture system built on optical character recognition (OCR) and optical mark recognition (OMR) technology, became the bureau’s partner.

This was a giant risk. The Census Bureau was counting on a recognition rate of at least 50 percent, which would leave half or less of the data load to be keyed by hand. But at the time of the decision, no OCR system had ever come close to even a 40 percent success rate. “There was a lot of debate about the OCR,” recalls Swartz. “There was no fail-back.” And for a while, there was some uncertainty. “A lot of the guys who didn’t like the imaging would bring me articles saying another Lockheed Martin rocket blew up. That scared the heck out of us.”

Another alarm was sounded during a dress rehearsal in 1998, in which the OCR’s accuracy was tested against a series of control forms. The event soon turned ugly. “Two forms would go through at once and it would get mixed up,” Swartz says. “The machines would jam; there were a lot of problems. And you think to yourself, ‘If this doesn’t work we have a billion pieces of paper waiting, and there is no plan B.’”

But Doyle and his team of testers were up to the challenge. At each stage of development they maximized the OCR’s capabilities. “We put it in a simulated production environment and it worked. Then we gave it crappy feeds and crappy writing and it still worked,” he says. Through constant iterative fine-tuning, the bureau’s confidence in the technology grew.

Ultimately, 83 percent of the data in the short forms and 63 percent of the data in the long forms were recognized with a 99 percent accuracy rate, leaving only a fraction of the work to be keyed by hand. By sticking with the decision to use OCR/OMR technology, the Census Bureau saved money, time and people. And since the OCR/OMR was able to capture the forms so successfully, the bureau didn’t need to save completed forms, as in the past. Instead, captured forms could be destroyed immediately, saving millions in storage costs.
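
Those recognition rates translate directly into avoided hand keying. A rough sketch of the arithmetic (the field volumes below are assumed round numbers for illustration, not bureau figures):

```python
# Illustrative estimate of how much hand keying automated capture avoided.
# The recognition rates come from the article; the field volumes are
# assumed round numbers, not bureau figures.

SHORT_FORM_RECOGNITION = 0.83   # share of short-form data captured automatically
LONG_FORM_RECOGNITION = 0.63    # share of long-form data captured automatically

short_form_fields = 100_000_000  # assumed volume of short-form data fields
long_form_fields = 20_000_000    # assumed volume of long-form data fields

auto = (short_form_fields * SHORT_FORM_RECOGNITION
        + long_form_fields * LONG_FORM_RECOGNITION)
by_hand = (short_form_fields + long_form_fields) - auto

print(f"Captured automatically: {auto:,.0f} fields")
print(f"Left for hand keying:   {by_hand:,.0f} fields "
      f"({by_hand / (short_form_fields + long_form_fields):.0%} of the total)")
```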

Take Up a Collection

Before the data could be captured it had to be collected, and planning for and supervising the data collection presented another challenge. Managing the giant workforce across its disparate locations was a project management nightmare. In past censuses, scheming employees had found ways to cheat the bureau. In 1990, census takers at one location circulated the same completed forms before they were found out and dismissed. Others would return from an eight-hour day with only two completed forms, or with stacks of forms that all contained the same information. These scams weren’t detected until weeks had gone by, and they cost the bureau dollars and time.

To avoid such scenarios in Census 2000, the directors decided that the payroll system, which tallied not only the hours each employee worked but also the number of forms collected, should be updated daily. While each piece of software in the system worked fine on its own, making them work together required an unprecedented level of integration. This, according to Doyle, wasn’t so much a technological leap of faith as it was a “leap of execution.”
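
The payoff of a daily cycle was faster detection of exactly the scams described above. A minimal sketch of that kind of daily reconciliation (the record fields and thresholds here are hypothetical, not the bureau’s actual payroll schema):

```python
# Hypothetical daily reconciliation over enumerator payroll records.
# Flags the patterns the bureau worried about: near-zero productivity
# and batches of forms that are suspiciously identical.

from collections import Counter
from dataclasses import dataclass

@dataclass
class DailyRecord:
    employee_id: str
    hours_worked: float
    form_checksums: list[str]   # one checksum per completed form turned in

def flag_suspicious(records: list[DailyRecord],
                    min_forms_per_hour: float = 0.5) -> list[str]:
    flags = []
    all_forms: Counter = Counter()
    for rec in records:
        # Very low productivity, e.g. two forms after an eight-hour day.
        if rec.hours_worked > 0 and len(rec.form_checksums) / rec.hours_worked < min_forms_per_hour:
            flags.append(f"{rec.employee_id}: low productivity")
        # A stack of identical forms turned in by one worker.
        if len(set(rec.form_checksums)) < len(rec.form_checksums):
            flags.append(f"{rec.employee_id}: duplicate forms in own batch")
        all_forms.update(set(rec.form_checksums))  # dedupe within each batch
    # The same completed form turned in by more than one worker.
    shared = sum(1 for count in all_forms.values() if count > 1)
    if shared:
        flags.append(f"{shared} form(s) submitted by more than one worker")
    return flags

records = [DailyRecord("E-101", 8.0, ["a1", "a1", "b2"]),
           DailyRecord("E-102", 8.0, ["c3"])]
print(flag_suspicious(records))
```

Because the records arrive every day rather than every week, a pattern like two forms per eight-hour shift surfaces the next morning instead of weeks later.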

But calling it a leap of execution didn’t make the daily payroll any less risky. “If you keyed the stuff daily you had no room for error,” says Swartz. “If you paid weekly you could have a couple of days to fix a problem. But you don’t have that luxury with a daily payroll. It is an incredible workload and, if it didn’t work, it would have been a disaster.” If the network failed or the payroll application had any glitches, time that the Census Bureau could ill afford to lose would be gone forever. Again the answer lay in testing.

The testing challenge facing the bureau was different from that of private IT shops. Beta testing was an impossibility. The bureau’s years of preparation telescoped into a critical three-month period, from April to June, that would make or break the 2000 census, Swartz says. With such a short window, every day was crucial; no branch could afford to lose productivity because of faulty software. Releasing a mostly finished product and hoping nothing went wrong was out of the question. “We can’t have things fail when we have a half million people working temporarily,” says Doyle. He, Swartz and their superiors had to have 100 percent certainty that everything would work flawlessly.

“We have a very controlled testing environment. That’s the only way to have a plan A/no plan B mentality,” Doyle continues. “You can’t say, ‘Looks good enough, OK, let’s go with it.’ It has to work in the lab over time. And we don’t just test the initial system but everything thereafter.” Between October 1998 and January 2000, Doyle tested 1,259 pieces of software that the bureau’s developers had deemed ready for release and still rejected over one-third. Making sure everything worked before it was released was only part of the challenge. To ensure against future breakdowns, the bureau had to find glitches before they happened; relying on people to report problems would mean acting too late. In 1997, a special lab, the Network Operations Center, was established to monitor and maintain the census network.

According to the center’s chief, Gary Sweely, his staff tested every conceivable combination of programs and configurations. When an incompatibility or heavy traffic was detected, the problem could be fixed before it affected the network. All in all, the Network Operations Center spent about 15 percent of its budget on testing and quality assurance. Sweely is convinced that this commitment to testing was the key to Census 2000’s success.
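“Every conceivable combination of programs and configurations” is easiest to picture as a combinatorial sweep. A small sketch of the idea (the component names and the compatibility check are purely illustrative, not the center’s actual inventory or tooling):

```python
# Illustrative configuration sweep: enumerate every combination of
# components and record which ones fail a compatibility check before
# they ever reach the production network.

from itertools import product

# Hypothetical component options; a real inventory would be far larger.
os_images = ["workstation-a", "workstation-b"]
payroll_builds = ["payroll-1.2", "payroll-1.3"]
router_configs = ["wan-default", "wan-compressed"]

def compatible(os_image: str, build: str, router: str) -> bool:
    """Stand-in check; the real lab ran actual tests against each setup."""
    return not (build == "payroll-1.2" and router == "wan-compressed")

combos = list(product(os_images, payroll_builds, router_configs))
failures = [c for c in combos if not compatible(*c)]

print(f"{len(failures)} of {len(combos)} combinations flagged for follow-up:")
for combo in failures:
    print("  ", combo)
```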

While the approach to data collection was largely an aggressive one, caution ruled in one notable area. In 1995, the Census Bureau decided against using the Internet for data capture. There were too many competing standards at the time, and security was a major concern. In early 1998, as the Web boomed, that decision was reexamined, and the bureau set up a proof-of-concept data collection site as an alternative to mailing back the short form. To keep traffic to a minimum, however, the site was minimally advertised, and only about 70,000 households filed their census forms online. In every case the site worked flawlessly. None of the security breaches that seemed so threatening in 1995 occurred. The experiment went so well that Swartz is convinced a fully funded Web-based collection system could have supported 8 million users a day. The Internet is ready for a major data capture role in the 2010 census, he says, and the success of the test site vaulted the Web to the lead of Census 2000’s dissemination effort.
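
That 8-million-users-a-day claim is easier to weigh as a sustained request rate. A hedged conversion (the peak-to-average multiplier is an assumption, not a figure from the bureau):

```python
# Rough conversion of "8 million users a day" into a request rate the
# site would have to sustain. The peak multiplier is an assumption.

users_per_day = 8_000_000
seconds_per_day = 24 * 60 * 60
peak_to_average = 4            # assumed: submissions bunch up in the evening

average_rate = users_per_day / seconds_per_day
print(f"Average: {average_rate:.0f} submissions/sec")
print(f"Assumed evening peak: {average_rate * peak_to_average:.0f} submissions/sec")
```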

The Great Data Dump

Prior to 1990, census information was released through volumes of printed books and reams of magnetic tape. The 1990 census pioneered the use of CD-ROMs. This year, books and CD-ROMs will still be available, but the primary means of data dissemination will be the Internet, via the American FactFinder (AFF) website (factfinder.census.gov).

The Web offers the first real chance to make census data available to the audience from which it’s gathered, says E. Enrique Gomez, manager of the data access and dissemination systems (DADS) program and the head of the AFF project. Relatively few people used the data in tape and book form, and in 1990 few Americans could use CD-ROMs. Making the information available over the Internet, says Gomez, serves many different people in many different ways. “The American public, and that includes anyone from academia to businesses to libraries, will be able to use this data to meet their needs.”
