Diary of a Product Testing Nightmare

What really goes on in a serious high-end product review?


Then things got even worse. To make our testing more meaningful, we actually mixed in a wide variety of traffic: viruses, spam, HTML, text, ZIP files, Microsoft Office documents and images of the Three Stooges. When David clicked that into place, we discovered that the Avalanches generated traffic about 10% slower than before. We had to drop this important component out of our testing, and we instead stuck with HTTP traffic that I knew wasn't going to exercise the IPS very much.

So our test plan was a shambles. We were 2 1/2 days into a three-day test, and we knew almost nothing. We could tell that the box we were testing was faster than the test gear we had available; we knew that the vendor wasn't lying about its ability to push large UDP packets; and we couldn't do the part of the test where we push our most meaningful traffic mix through. These were not great moments in testing history.

We decided that we were going to need to break into our emergency day, Thursday, just to finish this test.

Depressed, we worked our way to the Spirent cafeteria, where a swarthy man behind the counter offered us the choice of two unappetizing main courses, or a third choice with a long and unpronounceable Greek name. Not being total idiots (although we feel that way), we opted for the third choice and enjoyed a fantastic lunch.

Feb. 4, 1:30 p.m.

After lunch, David and I resolved to get some actual testing done. Since we had not used NAT on the firewall up to this point, I clicked the "NAT" box in NSM, pushed policy, and David ran another test, with essentially identical results. That's interesting, but not very interesting; it just shows that the SRX didn't have any really stupid design flaws. Remember that we are under-testing this box with our 80Gbps test load -- and it can go nearly twice as fast as that.

Our next test was designed to see how fast the IPS is. Now that I had the ability to push a valid policy, we were ready to go for it. We were more cheerful at this point because Juniper does not advertise performance above 30Gbps with the IPS -- so now we had a chance to be faster than the system we were testing.

David clicked "Run" and, sure enough, we discovered that the IPS slowed the system down very dramatically. Juniper had been honest here, as well: our tests came in just under the vendor's 30Gbps numbers. We had a quick smile as we realized that the margin of safety Juniper gave itself on the firewall (about 5%) doesn't also apply to the IPS -- they've got zero margin there; in fact, we got results which were a tad lower than the data sheet advertises.

As we were testing, I did some debugging just to check the results and found my own problem: we were not really NAT'ing the traffic! I double-checked the configuration in NSM and repushed policy, sacrificed a rubber chicken and spun a Tibetan prayer wheel. No good. Something was very wrong. I shot off a message to Rob at Juniper and we put NAT aside for a bit to turn to the next phase of our testing, behavior under attack.

This was a crucial part of the test for us, because we wanted to know how the SRX would fare if you had it hanging out on a huge Internet pipe. The kind of company that buys an SRX is also the kind of company that's going to have a huge Internet presence, so we assumed customers would come under a variety of denial-of-service and distributed DoS attacks. Thus, we wanted to know how the SRX behaved not just when it was seeing good traffic, but when things got ugly.

Of course, no one knows how much crap is floating around the Internet and how much traffic is attack traffic as opposed to just malformed packets and broken software. However, when I did an analysis last year, the estimates I found put "bad" traffic at between 1% and 3%, so we were aiming for something around those numbers. Since the firewall was handling 30Gbps with IPS turned on, I had a goal of about 600Mbps of attack traffic -- about 2%.

Spirent had ordered two of its ThreatEx security assessment boxes for us to test with, and I dragged the cardboard containers into the conference room, unpacked them and got them configured. Because we were doing a 10Gbps test, each box had 10Gbps interfaces as well as extra 1Gbps interfaces.

Unfortunately, things were not going well here, either. The boxes weren't seeing the interfaces, and we couldn't run tests without them. Fortunately, Mike Jack, Spirent's local ThreatEx specialist, stopped by to help. He pointed at the 10Gbps interfaces and asked: "What's that?" Suddenly, I was worried. Hadn't this guy ever seen a 10Gbps card before?

Well, no, he hadn't -- at least not in a ThreatEx box. That's because ThreatEx doesn't support 10Gbps interfaces. But things didn't work well even without the 10Gbps interfaces. After we pulled the 10Gbps card and the extra 1Gbps card out of one of the ThreatEx boxes, it didn't work at all. During first boot, something seemed to have gotten confused, and the box was dead. We attempted to recover it by reinstalling the ThreatEx software from scratch, but that wasn't working right either.

Fortunately, I hadn't actually booted up the second ThreatEx system. We carefully (think Mission Impossible, a bomb with color-coded wires, and Jim Phelps with sweaty palms, wire cutters and a Gila Monster gnawing at his big toe) removed the unsupported extra cards and booted it up to discover a working system. Well, mostly working -- it didn't have a license on it. Fortunately, Spirent tech support promised to turn the license around pronto.

Feb. 4, 4 p.m.

While I was waiting for a license for the ThreatEx systems, I checked my mail and discovered a reply from Rob on the NAT question. Yes, of course the SRX supports NAT. No, it's not handled in the security policy the way it is on every other firewall Juniper makes; you do it the JunOS way, not the NetScreen way. Now, this is one of those nightmares of NSM. Here's what I wrote about NSM and the SRX regarding setting IP addresses: "[With NSM,] it's impossible to get something as simple as a list of interfaces and their IP addresses. You have to find the physical interface, and then click through a series of submenus just to find out what the IP address is -- nine of them. And if you know the IP address but can't remember which port it's connected to, you might as well give up and use the command line to figure it out, since NSM would make you click through eight levels of menus just to see each IP address."

Well, NAT is just as hard. In fact, it's harder and more confusing and less documented. Rob had a better idea: type the commands in at the command line, and then re-import the device into NSM. I tried this and it worked. Rob got a mental "high five" and we now had working NAT. Pay attention to this detail, because this is where everything goes wrong later.
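
For the record, the CLI side isn't many lines once you know where it lives. A rough sketch of interface-based source NAT in JunOS -- the zone names here are placeholders, not our actual configuration, and the exact hierarchy can vary by release -- looks something like this:

    set security nat source rule-set trust-to-untrust from zone trust
    set security nat source rule-set trust-to-untrust to zone untrust
    set security nat source rule-set trust-to-untrust rule nat-all match source-address 0.0.0.0/0
    set security nat source rule-set trust-to-untrust rule nat-all then source-nat interface
    commit

Commit that, re-import the device into NSM, and you're back in sync.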

David launched another test while I turned back to the ThreatEx boxes.

Then the lights dimmed. Not figuratively, but literally. We had blown a breaker in the test lab. David and I scrambled around to find some additional plugs. Fortunately, the lab had plenty of power in it, so we managed to get things going without too much delay.

Feb. 4, 5 p.m.

Because the SRX has 10Gbps interfaces and our ThreatEx box only had 1Gbps interfaces, we needed to adapt between the two. Fortunately, David had predicted this sort of problem -- he's smart that way, you know -- and brought his own 10Gbps Juniper switch as a way of patching through. David is the kind of guy who thinks traveling with a 10Gbps switch is minimum required equipment, the way most of us try to remember to bring toothpaste (I carry a switch around too; it only runs at Fast Ethernet rates, but it's the size of a deck of cards and it cost $9). I broke open the box and started to learn Juniper EX switch configuration.
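
The patch-through itself isn't rocket science: put the 1Gbps ports facing the ThreatEx box and a 10Gbps uplink toward the SRX into the same VLAN. A minimal sketch on the EX -- the interface names and VLAN are invented for illustration -- runs along these lines:

    set vlans attack-net vlan-id 100
    set interfaces ge-0/0/10 unit 0 family ethernet-switching vlan members attack-net
    set interfaces ge-0/0/11 unit 0 family ethernet-switching vlan members attack-net
    set interfaces xe-0/1/0 unit 0 family ethernet-switching vlan members attack-net
    commit

The switch then forwards the 1Gbps attack streams out the 10Gbps port toward the SRX at whatever rate they arrive.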

Feb. 4, 5:30 p.m.

ThreatEx guru Mike Jack was back in the house, and we were trying to come up with a configuration that would stress the SRX. This was harder than it sounds. In the IPS business, the CVE database is commonly used as a way of mapping between different vendors' products. While not every attack or signature maps one-to-one to a single CVE number, you can at least get a good idea of what is going on and the type of attack when a CVE number is used as a common reference point.

Juniper's NSM, however, was under-documented in the extreme: fewer than half of its attack signatures had CVE numbers assigned. Mike suggested we start with UDP attacks, since they would get our attack level up to the 600Mbps goal. This meant I had to find UDP attacks that were both in the ThreatEx system and in Juniper's set of recommended signatures. Oh, and we also needed ones that Juniper actually triggered on -- there's no guarantee that, just because an attack is in the ThreatEx box and the Juniper IPS has a signature for it, the two would agree on whether it's a real attack.

Since we were at the end of the day, I decided to take a shortcut: I picked about 10 or 15 attacks that I knew were pretty important and we simply ran them through the SRX, hoping that some of them were attacks the SRX also recognized. Then I would keep passing through the signature set, looking for ones that Juniper detected, until we got a nice assortment.

This is where I hit the wall with NSM, and was very tempted to throw the NSM box through the wall. In a pre-SRX Juniper IPS deployment, there was this beautiful workflow where IPS events show up in NSM, which lets you analyze and understand what's going on and then feed that information back into the policy. When you put an SRX IPS into the picture, you lose that -- the SRX doesn't send its IPS alerts to NSM. It'll send them to a syslog server, but that's pretty useless, since you have no aggregation and analysis tools over there, plus you're missing all of the packet dump data.
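
If you do want the alerts somewhere, the best the SRX will offer is streaming its security logs to an external collector. A rough sketch -- the addresses are invented, and the exact knobs vary by JunOS release -- might look like:

    set security log mode stream
    set security log format sd-syslog
    set security log source-address 192.0.2.1
    set security log stream ips-events host 192.0.2.50

That gets the raw events onto a syslog box, but, as noted, without any of NSM's aggregation, analysis or packet-capture data.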

So now we had a situation where I was trying to find out what the IPS alerted on, but the IPS doesn't send alerts, at least not to Juniper's own network management platform. I was confused and incredulous, but Rob from Juniper confirmed it: NSM doesn't get SRX alerts. In other words, what we had here was a security device that wouldn't tell us how, or even whether, the network was under attack.

In disgust, David and I left the lab around 7 p.m. Mike Jack stayed behind, fine-tuning the ThreatEx configurations.

Feb. 5, 9 a.m.

David and I re-entered the lab with panic on our faces. We were on Day 4 of a three-day test, and we still didn't have a story. Our boss, the Dragon Lady, was not going to have any pity for us, since she's the one paying our travel expenses out of a slimmed-and-trimmed testing budget. We'd better have results, or die trying. I advocated option two, but David talked me out of it.

In our original test plan, we had the goal of showing how the IPS operates with many different configurations: client-protecting, server-protecting, critical signatures, recommended signatures, and so on. So far, we had exactly one test with Juniper's recommended signatures, and that was it.

I decided to add some signatures into the mix. Rather than just accept Juniper's "Recommended" set, I pushed down into the system all of the major and critical signatures. This included client-protecting signatures, which are notoriously expensive in performance terms. Of course, we ran into a wall with NSM: the policy push into the IPS threw many error messages. While I was fiddling with NSM to make the error messages go away, David ran a quick test and we found out how to truly bring an SRX to its knees: performance was down to around 8Gbps, a tiny fraction of the maximum for this platform.

Unfortunately, I could not get NSM to push the policy without errors. In addition, I was suspicious about which signatures had actually been loaded into the IPS. Because we were not getting alerts, I had very little confidence that I knew what was truly going on. Rob from Juniper gave me some key information on finding the signature set that was supposed to be pushed into the IPS, but there was no way, even in the much-loved JunOS command line, to actually see what the IPS was running. Count one more strike for Juniper's IPS management system, and another Tylenol for our testers. At this point, we were popping Tylenol the way a 6-year-old pops M&Ms the day after Halloween.

Because this part of the test plan looked like it could take forever, and because we couldn't really tell what was happening inside the SRX, we moved on to the ThreatEx tests. I reset the SRX to our "known good" policy. We rewired the SRX so that David had control of seven pairs of 10Gbps interfaces, and I got the last, eighth, pair. Because ThreatEx and Avalanche are not integrated, it seemed simpler to keep things entirely separate at the network layer as well. We knew that the IPS would top out at 30Gbps or so, which meant that seven pairs of interfaces (70Gbps of Avalanche power) were plenty to keep the IPS stressed.

I ran a quick test with the IPS turned off (thank heaven for JunOS' "rollback" command, which let me switch between IPS on and IPS off very quickly) and we got a cheery 660Mbps of attack traffic out of our ThreatEx boxes. This was great -- we didn't have to fiddle with the attacks to get the speed up; we could just launch them and see what happened.
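
For anyone who hasn't used it, "rollback" is as simple as it sounds: JunOS keeps prior committed configurations around, so flipping between the IPS-on and IPS-off setups is just a matter of re-committing an older one. Roughly -- the prompt and hostname here are illustrative:

    [edit]
    user@srx# rollback 1
    user@srx# commit

"rollback 1" loads the previous committed configuration into the candidate, and "commit" makes it live; do the same thing again and you're back where you started, since the configuration you just replaced becomes the new rollback 1.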

I knew in the back of my head that there was a flaw here, because we didn't know which of the 660Mbps of attacks the IPS would identify, so we didn't actually know how much attack traffic we were sending, from the point of view of the IPS. But rather than work that detail out, I wanted to get a test out the door.

Feb. 5, 11 a.m.
