Inside a Network Operations Center

Harvard's NOC uses tools from Top Layer and Q1 Labs to keep an eye out for security problems.

I recently had a chance to visit Harvard University's network surveillance center. One doesn't normally see the words "university" and "network surveillance" in the same sentence, because surveillance of any kind is usually seen as being at odds with the tradition of academic freedom present at most universities. Unfortunately, higher education has long been associated with Internet-related computer crime, both as a victim and as the home of many perpetrators. As a result, many universities have had to make significant investments in various kinds of network monitoring.

What makes Harvard's network surveillance notable is not the fact that the university engages in network surveillance but the scale and technical sophistication of its monitoring operations. Harvard has 6-gigabit connections to both Tier 1 Internet providers and Internet2. Between 10 and 20 terabytes of data move across Harvard's border every day. What's more, traffic is frequently routed asymmetrically: packets travel across different border routers depending on whether they are leaving Harvard or returning. This is one of the unfortunate consequences of "hot potato routing," in which each network hands traffic to its neighbor at the nearest exit point, so the outbound and return paths can differ.

Yet despite this complexity, Harvard manages to categorize and record information about practically every packet crossing its borders.
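The 10 to 20 terabytes per day cited above are easier to judge as bit rates. The short calculation below is a sketch that assumes decimal units (1 TB = 10^12 bytes) and traffic spread perfectly evenly over 24 hours, which real traffic never is, so actual peaks will sit well above these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_gbps(terabytes_per_day: float) -> float:
    """Average rate in gigabits per second, using decimal units (1 TB = 10**12 bytes)."""
    bits_per_day = terabytes_per_day * 10**12 * 8
    return bits_per_day / SECONDS_PER_DAY / 10**9


for tb in (10, 20):
    print(f"{tb} TB/day is about {average_gbps(tb):.2f} Gbit/s on average")
```

This prints roughly 0.93 and 1.85 Gbit/s. On average, then, the border links are far from saturated; the harder problem is capturing every packet when the two directions of a conversation can cross different routers.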

To find out how Harvard works this magic, I met with Jay Tumas, Harvard's network operations manager. It wasn't a long walk: Jay's office at University Information Systems is just a block down the street from my office at the School of Engineering and Applied Sciences.

No Packet Left Behind

Harvard's connections to the Internet and Internet2 take place in three physical locations: two in Boston and one in Cambridge. But rather than deploy intrusion and anomaly-detection systems at the border, Tumas has built a dedicated monitoring system that takes all critical traffic, makes a copy of every packet and sends those copies to the network surveillance center on 10-gigabit optical fibers. There the flows are reassembled using Cisco switches and sorted according to protocol family using a cluster of Top Layer 4508 IDS Balancers.

This architecture both lets Harvard split the load among multiple systems—it’s too much data for one IDS—and lets each IDS be configured with only the signatures that it actually needs, which makes each IDS run faster than it would if it were responsible for the full protocol suite.
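The load-splitting idea can be modeled in a few lines of Python. This is not how the Top Layer hardware works internally; it is a minimal sketch that assumes port-based classification and uses hypothetical sensor names. Two details matter. The flow key ignores direction, so a reply that returns through a different border router still reaches the same IDS. And the hash is stable across runs (CRC-32 rather than Python's per-process salted `hash()`).

```python
import zlib
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str  # "tcp", "udp", "icmp", ...


# Hypothetical sensor pools: each sensor loads only its family's signatures.
SENSOR_POOLS = {
    "web": ["ids-web-1", "ids-web-2"],
    "mail": ["ids-mail-1"],
    "other": ["ids-other-1", "ids-other-2"],
}


def protocol_family(pkt: Packet) -> str:
    """Classify by well-known port (a simplification of real protocol detection)."""
    ports = {pkt.src_port, pkt.dst_port}
    if pkt.proto == "tcp" and ports & {80, 443, 8080}:
        return "web"
    if pkt.proto == "tcp" and ports & {25, 110, 143, 587}:
        return "mail"
    return "other"


def flow_key(pkt: Packet) -> str:
    """Direction-independent key: both halves of a conversation produce the same string."""
    a, b = sorted([(pkt.src_ip, pkt.src_port), (pkt.dst_ip, pkt.dst_port)])
    return f"{pkt.proto}|{a[0]}:{a[1]}|{b[0]}:{b[1]}"


def choose_sensor(pkt: Packet) -> str:
    pool = SENSOR_POOLS[protocol_family(pkt)]
    # CRC-32 is stable across runs, unlike Python's salted built-in hash() for strings.
    return pool[zlib.crc32(flow_key(pkt).encode()) % len(pool)]


# An outbound request and its reply, possibly seen on different border routers.
out = Packet("10.0.0.5", 51515, "192.0.2.80", 80, "tcp")
back = Packet("192.0.2.80", 80, "10.0.0.5", 51515, "tcp")
print(choose_sensor(out), choose_sensor(back))  # the same sensor both times
```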

"Last year we had over 10 million IDS hits," says Tumas. But instead of sending out an alert for each hit or just tabulating them in some log file that nobody ever really reads, Harvard has built a reactive system that rates the severity of each IDS hit, judges the chance of a false positive and then automatically alerts the responsible security manager.

The Harvard Network Operations Center has a database of between 1,500 and 2,000 registered system and network managers. When the IDS detects a "hit," the system tries to correlate it with other hits. If enough tests pass, the system automatically sends an alert to the responsible manager. Last year roughly 10,000 such messages went out. "We want people to treat the auto alerts as gospel," says Network Security Manager David LaPorte, who works for Tumas.
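The article does not describe Harvard's scoring rules, so the following is only a hypothetical sketch of the general shape of such a pipeline: group hits by host, weight each hit's severity by the chance that it is a true positive, require agreement from several distinct signatures, then look up the responsible manager in a registry. The registry, signature names, severity scale and thresholds are all invented for illustration, and a real correlator would also consider timing.

```python
from __future__ import annotations

import ipaddress
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    host: str       # internal IP address that triggered the signature
    signature: str  # IDS rule name
    severity: int   # hypothetical scale: 1 (low) to 5 (critical)
    fp_rate: float  # estimated false-positive probability for this rule, 0 to 1


# Hypothetical registry mapping networks to the manager responsible for each one.
REGISTRY = {
    ipaddress.ip_network("10.1.0.0/16"): "manager-a@example.edu",
    ipaddress.ip_network("10.2.0.0/16"): "manager-b@example.edu",
}


def responsible_manager(host: str) -> str | None:
    addr = ipaddress.ip_address(host)
    for net, contact in REGISTRY.items():
        if addr in net:
            return contact
    return None


def triage(hits: list[Hit], min_distinct: int = 2, min_score: float = 6.0) -> dict[str, list[str]]:
    """Alert a manager only when several distinct, credible signatures fire on one host."""
    by_host = defaultdict(list)
    for hit in hits:
        by_host[hit.host].append(hit)

    alerts = defaultdict(list)
    for host, host_hits in by_host.items():
        distinct = {h.signature for h in host_hits}
        # Weight each hit's severity by the chance that it is a true positive.
        score = sum(h.severity * (1 - h.fp_rate) for h in host_hits)
        manager = responsible_manager(host)
        if len(distinct) >= min_distinct and score >= min_score and manager:
            alerts[manager].append(host)
    return dict(alerts)


hits = [
    Hit("10.1.2.3", "trojan-beacon", 5, 0.1),
    Hit("10.1.2.3", "irc-botnet-join", 4, 0.1),
    Hit("10.2.0.9", "port-scan", 5, 0.5),
]
print(triage(hits))  # {'manager-a@example.edu': ['10.1.2.3']}
```

Only the host with two independent, low-false-positive hits generates a message; the single noisy port-scan hit does not.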

Real-time alerts are an important part of network surveillance, but without the ability to look back in time, alerts are of limited use. It's important to find systems that have been compromised, but once you've found them, it's equally important to evaluate the damage that's been done. For example, says Tumas, Harvard's IDS recently discovered a Microsoft Active Directory domain controller that had been hacked. Not surprisingly, none of the system's logs had been turned on.

To find out what had happened to the system, Tumas and his team turned to QRadar, a security monitoring system sold by Q1 Labs. QRadar monitors multiple sources of information, including packet traces, network flows and security events; builds a model of the network; uses the real-time information to update the model; and archives information as necessary to permit event reconstruction at some future time.

Just as every packet in and out of Harvard gets evaluated by the IDS systems, every packet also gets processed by QRadar. The system analyzes the packets, reconstructs the UDP and TCP streams, decodes the protocols, determines whether protocols are running on the correct port and updates a database of what it has learned in real time. The system can also be programmed to record part or all of every packet that it sees, although doing so obviously requires a significant amount of storage for a network the size of Harvard's.
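Checking whether a protocol runs on the "correct" port means identifying it from the payload rather than trusting the port number. The sketch below assumes a tiny, simplified set of payload prefixes; it illustrates the idea and is not QRadar's decoder, which the article does not describe.

```python
from __future__ import annotations

# Simplified payload prefixes for a few protocols (real decoders inspect far more).
SIGNATURES = {
    "http": (b"GET ", b"POST ", b"HEAD ", b"HTTP/1."),
    "ssh": (b"SSH-",),
    "tls": (b"\x16\x03",),  # TLS handshake record: content type 22, version 3.x
}
EXPECTED_PORTS = {"http": {80, 8080}, "ssh": {22}, "tls": {443}}


def identify(payload: bytes) -> str | None:
    """Guess the protocol from the first bytes the client sent."""
    for proto, prefixes in SIGNATURES.items():
        if payload.startswith(prefixes):
            return proto
    return None


def port_mismatch(payload: bytes, server_port: int) -> str | None:
    """Return a description if a recognized protocol is running on an unexpected port."""
    proto = identify(payload)
    if proto is None or server_port in EXPECTED_PORTS[proto]:
        return None
    return f"{proto} on port {server_port}"


print(port_mismatch(b"SSH-2.0-OpenSSH_4.3\r\n", 443))  # ssh on port 443
print(port_mismatch(b"GET / HTTP/1.1\r\n", 80))        # None
```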

"We data-mined every single connection that this system created across the border, then went through and picked out the things that were not typical command-and-control bot traffic—anything that we couldn't identify," Tumas says.
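The filtering step Tumas describes amounts to removing everything already explained and ranking what remains. Here is a minimal sketch, assuming connection records have already been extracted and that the known command-and-control endpoints are given as a set. The addresses are documentation placeholders, and port 21 for the large transfer is a hypothetical choice, since the article does not say which protocol was used.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Conn:
    src: str
    dst: str
    dst_port: int
    bytes_out: int


def unexplained(conns: list[Conn], host: str, known: set[tuple[str, int]]) -> list[Conn]:
    """Connections from `host` not already attributed to known traffic, largest first."""
    rest = [c for c in conns if c.src == host and (c.dst, c.dst_port) not in known]
    return sorted(rest, key=lambda c: c.bytes_out, reverse=True)


# Hypothetical records; 198.51.100.7:6667 stands in for an already-identified IRC controller.
conns = [
    Conn("10.9.0.4", "198.51.100.7", 6667, 40_000),
    Conn("10.9.0.4", "198.51.100.7", 6667, 38_000),
    Conn("10.9.0.4", "203.0.113.50", 21, 350_000_000),
    Conn("10.9.0.4", "192.0.2.10", 53, 1_200),
]
known_cnc = {("198.51.100.7", 6667)}
for c in unexplained(conns, "10.9.0.4", known_cnc):
    print(c.dst, c.dst_port, c.bytes_out)
```

Sorting by volume puts a transfer like the one described next at the top of the list.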

It turned out that the compromised system had participated in a 350-megabyte file transfer with a computer system at another university. This was a matter of great concern, so Harvard contacted the other university and asked it to examine the system on its end. The administrators at the other school found the files: 350 megabytes of French music. "They weren't in [the system] long enough to discover the value of what they had," Tumas surmises.

In another case, a network administrator at Harvard Medical School called up to complain that its network was under attack. The operators in the Network Operations Center logged in to the QRadar system and immediately saw that the medical school was experiencing a "smurf" denial-of-service attack. The team then put a few additional rules on the Harvard border routers and the attack ended.
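In a smurf attack, the attacker sends ICMP echo requests to a broadcast address with the victim's address forged as the source, so every host on that network replies to the victim. The telltale sign is a flood of echo replies, from many sources, that the victim never asked for. The sketch below counts exactly that over one time window; the thresholds are hypothetical and this is not a description of QRadar's detection logic.

```python
from __future__ import annotations

from collections import Counter, defaultdict

ICMP_ECHO_REPLY = 0
ICMP_ECHO_REQUEST = 8


def smurf_victims(packets, min_unsolicited: int = 1000, min_sources: int = 50) -> list[str]:
    """packets: iterable of (src_ip, dst_ip, icmp_type) tuples observed in one time window.

    Flags hosts receiving many echo replies, from many sources, that they never requested.
    """
    replies = Counter()
    repliers = defaultdict(set)
    requests_sent = Counter()
    for src, dst, icmp_type in packets:
        if icmp_type == ICMP_ECHO_REPLY:
            replies[dst] += 1
            repliers[dst].add(src)
        elif icmp_type == ICMP_ECHO_REQUEST:
            requests_sent[src] += 1
    return sorted(
        dst
        for dst, n in replies.items()
        if n - requests_sent[dst] >= min_unsolicited and len(repliers[dst]) >= min_sources
    )


# Synthetic window: 60 amplifiers each send 20 replies to one victim; one host pings normally.
window = [(f"198.51.100.{i}", "10.5.0.1", ICMP_ECHO_REPLY) for i in range(60) for _ in range(20)]
window += [("10.6.0.2", "192.0.2.1", ICMP_ECHO_REQUEST)] * 3
window += [("192.0.2.1", "10.6.0.2", ICMP_ECHO_REPLY)] * 3
print(smurf_victims(window))  # ['10.5.0.1']
```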

"I've never come across a tool that has been able to give the pivot views of data as quickly as QRadar," says Tumas. The system lets Tumas quickly see the total levels of traffic and then break them down according to different categories, such as network protocol, administrative controls, geographical location, time or security severity.
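At its core, a pivot view is a group-by: choose a dimension, then total a measure for each value of it. The minimal sketch below works over a handful of hypothetical flow records and says nothing about how QRadar indexes its data to make these views fast.

```python
from collections import defaultdict

# Hypothetical flow summaries; a real system would hold millions of these.
flows = [
    {"protocol": "http", "country": "US", "severity": "none", "bytes": 5_000},
    {"protocol": "smtp", "country": "US", "severity": "low", "bytes": 1_200},
    {"protocol": "http", "country": "FR", "severity": "none", "bytes": 3_000},
    {"protocol": "ssh", "country": "FR", "severity": "high", "bytes": 800},
]


def pivot(records, dimension, measure="bytes"):
    """Total `measure` for each distinct value of `dimension`, largest first."""
    totals = defaultdict(int)
    for record in records:
        totals[record[dimension]] += record[measure]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


print(pivot(flows, "protocol"))  # {'http': 8000, 'smtp': 1200, 'ssh': 800}
print(pivot(flows, "country"))   # {'US': 6200, 'FR': 3800}
```

Switching from one breakdown to another is just a different `dimension` argument over the same records.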

The QRadar system runs on a dedicated dual-processor Linux server. The packets and databases are stored on a 6-terabyte storage area network connected over Fibre Channel. When I spoke with Tumas, the system was recording the first 64 bytes of every packet, which translated to roughly 30 days' worth of data. It turns out, though, that storing the first 64 bytes of each packet isn't tremendously useful: you can't reassemble images or webpages from them, for example. The plan is to reconfigure the system so that it keeps only metadata about each network connection and discards the packets themselves. With this change, the system should be able to keep six months' worth of forensic information.
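The storage figures imply a simple budget. Using only the numbers above, 6 TB holding about 30 days of packet prefixes means roughly 200 GB written per day; stretching the same 6 TB to six months (taken here as 182 days) requires cutting the daily volume about sixfold. The 100-byte connection record at the end is a hypothetical size chosen for illustration.

```python
TB = 10**12
GB = 10**9


def daily_budget(capacity_bytes: float, retention_days: float) -> float:
    """Bytes per day that can be written if `capacity_bytes` must last `retention_days`."""
    return capacity_bytes / retention_days


# Figures from the article: 6 TB held roughly 30 days of 64-byte packet prefixes.
prefix_per_day = daily_budget(6 * TB, 30)
# Goal: about six months (assumed here to be 182 days) of connection metadata in the same 6 TB.
metadata_per_day = daily_budget(6 * TB, 182)

print(f"64-byte prefixes: about {prefix_per_day / GB:.0f} GB/day")
print(f"metadata budget:  about {metadata_per_day / GB:.1f} GB/day")
print(f"required reduction: about {prefix_per_day / metadata_per_day:.1f}x")
# Hypothetical 100-byte record per connection:
print(f"about {metadata_per_day / 100 / 1e6:.0f} million connection records per day")
```

This prints about 200 GB/day, 33.0 GB/day, a 6.1x reduction and about 330 million records per day.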

Like many modern security appliances, QRadar is accessed over the Internet using a Java applet that runs inside a Web browser. The system at Harvard has been set up so that individual network managers can view the data associated with their own networks. This allows managers to solve their own problems without bothering the team at the network operations center. It also means that QRadar can be used for network debugging and even performance tuning, not solely for security management.
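Restricting each manager to their own networks is essentially an address-based access filter. Here is a minimal sketch with a hypothetical mapping of users to address blocks. A user with no entry sees nothing, which is the safer default. How QRadar actually enforces this is not described in the article.

```python
import ipaddress

# Hypothetical assignment of address blocks to the network managers who run them.
SCOPES = {
    "alice": [ipaddress.ip_network("10.1.0.0/16")],
    "bob": [ipaddress.ip_network("10.2.0.0/16"), ipaddress.ip_network("10.3.4.0/24")],
}


def visible_flows(user, flows):
    """Return only flows with at least one endpoint inside the user's own networks."""
    networks = SCOPES.get(user, [])

    def in_scope(ip):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in networks)

    return [f for f in flows if in_scope(f["src"]) or in_scope(f["dst"])]


flows = [
    {"src": "10.1.5.5", "dst": "192.0.2.8"},
    {"src": "203.0.113.9", "dst": "10.3.4.20"},
    {"src": "10.2.7.1", "dst": "10.1.5.5"},
]
print(len(visible_flows("alice", flows)))  # 2
print(len(visible_flows("bob", flows)))    # 2
print(visible_flows("mallory", flows))     # []
```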

Needs Improvement

For all of this power, there are at least two problems with the QRadar system that were evident to me during my tour: one that is a current limitation of the system, and one that is not a limitation at all.

The annoying limitation with QRadar is that the system really doesn't understand how packets are routed on the Internet: it knows nothing about Internet autonomous systems, peering relationships or the Border Gateway Protocol (BGP). When QRadar sees traffic leaving Harvard, it knows the destination network, but it doesn't necessarily know the destination organization. If QRadar understood BGP, it could build a map of the networks that an outbound packet would traverse. The Harvard network operations group would like to see this deficiency addressed, and the sooner, the better.
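The kind of BGP awareness being asked for starts with a longest-prefix match against a routing table that carries AS paths. In the sketch below, the table is hard-coded and hypothetical; in practice it would have to come from a live BGP feed. Prefixes and AS numbers are taken from the ranges reserved for documentation. The rightmost AS in a path is the origin, which is what identifies the destination organization.

```python
from __future__ import annotations

import ipaddress

# Hypothetical routing table: prefix -> AS path, from the first-hop neighbor to the origin AS.
# Prefixes and AS numbers come from the ranges reserved for documentation (RFC 5737, RFC 5398).
ROUTES = {
    ipaddress.ip_network("203.0.113.0/24"): [64500, 64510],
    ipaddress.ip_network("203.0.113.128/25"): [64501, 64511],
    ipaddress.ip_network("198.51.100.0/24"): [64500, 64505, 64509],
}
AS_NAMES = {64509: "Example University", 64510: "Example Hosting", 64511: "Example ISP"}


def route_for(dst: str) -> tuple[str, list[int], str] | None:
    """Longest-prefix match: return (prefix, AS path, origin organization) for a destination."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    path = ROUTES[best]
    origin = path[-1]
    return str(best), path, AS_NAMES.get(origin, f"AS{origin}")


print(route_for("203.0.113.200"))  # ('203.0.113.128/25', [64501, 64511], 'Example ISP')
print(route_for("203.0.113.5"))    # ('203.0.113.0/24', [64500, 64510], 'Example Hosting')
print(route_for("192.0.2.1"))      # None
```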

But a deeper problem is that QRadar does make it possible to engage in a kind of surveillance that really isn't appropriate at a university. Out of the box, the system exhibits all kinds of intrusive behavior that would be inappropriate, at least at Harvard. For example, the system can build a profile of the IP addresses of computers at Harvard that are going to porn sites, Internet gambling sites, job boards and so on. This data could trivially be cross-tabulated against authentication logs or Ethernet media access control (MAC) addresses to produce detailed reports on each user at the university. At the same time, the system does not keep detailed logs about its own users. It knows when they log in and log out, but it doesn't keep audits of who is searching for what kind of data.

Although it's tremendously important that organizations have the ability to reconstruct what's happened in the past, it's also important to be able to detect when this ability is abused. One way to do that is to have surveillance systems automatically generate logs and reports of their own use. We use a similar approach in government, where surveillance requests are reviewed in detail both before and after the surveillance takes place. The Administrative Office of the U.S. Courts publishes an annual wiretap report that details summary information for every court-ordered wiretap in the United States. Organizations that have surveillance equipment should institute similar procedures, and surveillance tools such as QRadar should generate immutable logs that record not just who logged in and out but also what they did.
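One standard way to approximate such an audit trail is a hash chain: each entry includes the hash of the previous one, so editing, reordering or deleting an earlier entry breaks every hash after it. The sketch below illustrates the technique and does not describe any QRadar feature. A hash chain makes a log tamper-evident rather than truly immutable: someone who can rewrite the whole file can recompute the chain, and dropping the newest entries goes unnoticed, unless the latest hash is also copied to a separate, write-once location.

```python
import hashlib
import json
import time

GENESIS = "0" * 64


class AuditLog:
    """Append-only, hash-chained record of what each operator did."""

    def __init__(self):
        self.entries = []
        self._last_hash = GENESIS

    @staticmethod
    def _digest(body):
        # Canonical JSON (sorted keys) so the same entry always hashes the same way.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, user, action, detail, ts=None):
        body = {
            "ts": time.time() if ts is None else ts,
            "user": user,
            "action": action,
            "detail": detail,
            "prev": self._last_hash,
        }
        entry = dict(body, hash=self._digest(body))
        self.entries.append(entry)
        self._last_hash = entry["hash"]

    def verify(self):
        """Recompute the chain; altering, reordering or deleting any entry but the last fails."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.record("operator1", "login", "")
log.record("operator1", "query", "flows where dst_port = 6667, last 7 days")
print(log.verify())  # True
log.entries[1]["detail"] = "flows where dst_port = 25"  # simulate tampering
print(log.verify())  # False
```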

This story, "Inside a Network Operations Center," was originally published by CSO.


Copyright © 2007 IDG Communications, Inc.
