Enterprise applications built using Web services and other component-based architectures have many advantages, including interoperability, reusability and flexibility. But those advantages can become problematic for the operations department, which needs to monitor and track the software's performance.
In short: if all you have is piece parts, how do you know when the software is working? Or, more important, how can you tell if it's failed? That was the problem that Clear Channel's solution architect Curt Smith and enterprise architect John Szurek stared in the face.
Clear Channel is among the world's largest media companies, specializing in mobile and on-demand entertainment and information services for local communities. The company's businesses include radio (it owns 8 percent of the radio stations in the U.S.) and billboards (more formally called outdoor displays).
On the IT side, Clear Channel has a primarily Web-based environment built on Microsoft platforms, with a lot of investment in Web services and cloud computing. "Almost everything we build—certainly our core business applications—are Web apps," explained Smith.
For the last few years, the company has been improving its user interfaces (UIs) to make its applications more consistent and unified. Concurrently, Smith and Szurek have been doing their best to make the delivery, management and monitoring of those applications work more smoothly.
Part of creating a standardized UI is to treat the back-end systems as a commodity engine, Smith explained. Three years ago, they decided to adopt Web services, with the notion of a loosely-coupled world with a common interface. Great. The model works. But...
"From the beginning, some of us—Curt and I at the forefront—realized that there was trouble brewing in paradise," Szurek explained. "These SOAP-type systems that we hate so much turn out to be pretty nice after all. They define what the pieces are." While the business appreciates the agility of loosely-coupled architectures, and programmers were happy with it, said Szurek, "The monitoring people don't want that. The operations people don't want that. And with virtual machines in the data center... there could ultimately be chaos."
Chaos? Hey, isn't loosely-coupled software design supposed to make things simpler? For developers, sure. But for IT departments worried about monitoring a whole enterprise's collection of applications, the difficulty is understanding how it all interconnects and having visibility from one end of an application stream to the other.
You can see if an individual Web service is up or down, explained Smith, but what an operations department cares about is the uptime and performance of the system as a whole. If an application is no longer a monolithic construct, but composed of synthetic transactions, how do you define what the application is? The question is especially hard when the functionality is workflow-oriented, and thus may gate differently (that is, change based on user behavior) depending on how the software is employed.
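The gap between "each service is up" and "the transaction completes" can be sketched in a few lines. The example below is an illustration only, written in Python rather than with Clear Channel's .NET tools; the stage names and the `trace_transaction` helper are hypothetical and do not come from the company's systems. It pushes one test record through every stage and reports the first stage that fails to pass it on.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    """One hop in a hypothetical application pipeline."""
    name: str
    handle: Callable[[dict], bool]  # True if the stage passed the record on


def trace_transaction(stages: List[Stage], record: dict) -> Optional[str]:
    """Send one test record through every stage in order.

    Returns the name of the first stage that fails, or None if the
    record makes it all the way through.
    """
    for stage in stages:
        try:
            ok = stage.handle(record)
        except Exception:
            ok = False  # a crash counts as a failure of that stage
        if not ok:
            return stage.name
    return None


# Hypothetical pipeline: the front end accepts the form, but the record
# is blocked further down the line.
pipeline = [
    Stage("web_form", lambda r: True),
    Stage("contract_service", lambda r: True),
    Stage("downstream_queue", lambda r: False),
]

test_record = {"contract_id": "TEST-1"}
front_end_ok = pipeline[0].handle(test_record)       # what the user sees
failed_at = trace_transaction(pipeline, test_record)  # what operations needs
```

Here the front-end check alone reports success, while the end-to-end trace names `downstream_queue` as the point of failure.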
"We were ill equipped to deal with it," explained Smith. The company's first SOA application was fairly simplistic: a contract commitment product for a line-of-business department. "When that system was first brought up, it broke for 18 hours—and no one knew," said Smith. Because users could fill out the Web form and submit the page, getting an appropriate system response that the data was sent, they were happy. But the data was blocked further down the pipeline, and nothing identified that a failure had occurred. "If you aren't looking at [the transaction] all the way through, you don't see the problem," said Smith. "The interdependency model becomes much, much more complex."
Their newfound awareness led Smith and Szurek to work with Microsoft's Operations Manager (MOM), more formally known today as System Center Operations Manager 2007, and other components of Microsoft's System Center family of products, including the company's Configuration Manager, Data Protection Manager and Virtual Machine Manager (the last of which is still in beta). The IT organization manages the entire platform (750 servers in the data center, and another 500-600 spread out at other locations) from one central location.
"Any solution for a company is a combination of people, process and technology," said Bob Kelly, Microsoft corporate vice president for Infrastructure Server Marketing. "Often what's forgotten is process."
"Along with MOM came this idea of a 'management pack,'" said Smith, which let the department keep an eye on the health of any particular system. But for the architecture to work, it has to be integrated across the entire application development lifecycle. That starts with the programmers.
For the System Center tools to accurately watch Web services and business processes, the application's developers have to plan for monitoring while they're building the software. Developers have to build in the metadata to describe the relationship between parts, explained Smith. As they bring the application to production, development can deliver a management pack that tells the system how to judge the application's health. "Those pieces intuitively fall into place," said Smith.
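As a rough illustration of what "metadata to describe the relationship between parts" might look like, the sketch below declares which components depend on which and rolls individual status up into one application-level answer. It is a simplified Python model, not the management pack format; the component names, the `DEPENDS_ON` table and the `is_healthy` function are all assumptions made for the example.

```python
from typing import Dict, List, Set

# Hypothetical dependency metadata: each component lists the parts it relies on.
DEPENDS_ON: Dict[str, List[str]] = {
    "contract_app": ["web_form", "contract_service"],
    "contract_service": ["database", "message_queue"],
    "web_form": [],
    "database": [],
    "message_queue": [],
}


def is_healthy(component: str,
               own_status: Dict[str, bool],
               depends_on: Dict[str, List[str]]) -> bool:
    """A component is healthy only if it and everything it depends on is up."""
    visited: Set[str] = set()
    stack = [component]
    while stack:
        name = stack.pop()
        if name in visited:
            continue  # already checked; also guards against cycles
        visited.add(name)
        if not own_status.get(name, False):  # unknown status counts as down
            return False
        stack.extend(depends_on.get(name, []))
    return True


# Every part reports "up" except the queue that sits behind the service.
status = {name: True for name in DEPENDS_ON}
status["message_queue"] = False

app_ok = is_healthy("contract_app", status, DEPENDS_ON)
form_ok = is_healthy("web_form", status, DEPENDS_ON)
```

With the queue down, the Web form still reports healthy on its own, but the application as a whole does not.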
This might sound like more work for developers. But, Smith pointed out, they have to worry about it in any case. "If there's a subtle problem in production, it ends up back on the developers' desks anyway," he said. "They're helping themselves to accurately identify and diagnose problems. [With the new system,] they don't have to be as involved in production systems that aren't working."
And, Smith claimed, it's not any more onerous than building unit tests. Building a management pack that defines the health of the system, using Visual Studio Team System (Clear Channel is a .NET shop), doesn't require programmers to learn a whole new language or to acquire any new concepts. "It's easy to say, 'You know this code; how would you tell if it's working correctly?'" Smith said. The developers put in the hooks to expose that measurement or metric to the outside world. It's mostly an issue of awareness, like making a good developer aware of error trapping, he said.
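A hook of that kind can be very small. The following Python sketch shows one hypothetical way a component might answer the question "how would you tell if it's working correctly?" and expose the result to a monitoring tool. The `ContractSubmitter` class, its failure threshold and its `health` method are invented for illustration; they are not part of System Center or Visual Studio Team System.

```python
from dataclasses import dataclass


@dataclass
class ContractSubmitter:
    """Hypothetical component that tracks whether its deliveries succeed."""
    max_consecutive_failures: int = 3  # assumed threshold for this example
    delivered: int = 0
    consecutive_failures: int = 0

    def record(self, delivered: bool) -> None:
        """Called by the component after each delivery attempt."""
        if delivered:
            self.delivered += 1
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    def health(self) -> dict:
        """The hook: a snapshot an outside monitor can poll."""
        return {
            "delivered": self.delivered,
            "consecutive_failures": self.consecutive_failures,
            "healthy": self.consecutive_failures <= self.max_consecutive_failures,
        }


submitter = ContractSubmitter(max_consecutive_failures=2)
submitter.record(True)
for _ in range(3):
    submitter.record(False)  # downstream delivery keeps failing
snapshot = submitter.health()
```

The developer decides what "working correctly" means (here, no more than two failed deliveries in a row), and the monitoring side only has to read the snapshot.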
One positive effect of adopting the System Center technology and integrating it with Visual Studio tools, said Microsoft's Kelly, is that it lets the software take on the burden of IT rather than requiring individuals to make all the decisions. Doing so, he said, "moves individuals from feeling like they're a cost center (a break/fix department) into a strategic asset."
So far, this sounds like a technology story. But the real win has been in corporate culture—or perhaps corporate attention to soft skills has enabled the technology to be used effectively.
As Smith explained, "It's no longer about infrastructure. It's about developing solutions that are built for operations, and having an understanding that goes all the way through the chain." To accomplish that goal, he said, "You can no longer have a dev group and an ops group."
Clear Channel broke down barriers between IT architects and developers and now maintains a tight connection between the team designing and building applications, and the team responsible for the systems architecture. At least, that's how the PR department might describe it. According to Szurek, the cultural evolution doesn't mean that everyone likes each other. "We don't have too much hugging and kissing going on," he said. "But people are learning to be more respectful of each other." And they're behaving in ways that benefit everyone.
In day-to-day terms, said Smith, developers and operations people understand that "this problem is a common problem." They describe it as a problem ownership issue rather than one of personal friendship. The software belongs to both teams: "We're not doing each other a favor," he said. The attitude is, "This is my work, too."
Behind the consciousness-raising, said Smith and Szurek, is a cultural awareness in the Clear Channel IT community, spearheaded by the company's CIO, David Wilson. Using techniques to help people improve how they talk and listen to one another, explained Smith, has encouraged everyone to be aware of how they work together. "And to get better at it," Szurek added.
Doing so was a necessity. Clear Channel had grown swiftly from mergers and acquisitions. Naturally, the company wanted to keep the top-notch IT people, but the M&A activity generated "a whole bunch of strong, aggressive people with overlapping job functions," said Szurek, as well as uncertainty about where one person's responsibility began and ended. So the company put an emphasis on identifying roles, improving communication skills, and offering training in conflict resolution.
It's worked. People listen to understand, Smith said. Not to agree, not to wait for their turn to talk. But to understand—and good decisions follow. "You can taste the difference in the conversation," he said. That means that IT conflicts are resolved faster, with potential fires stomped out before the flames start.
Previously, problems would be escalated before they were resolved, sometimes up to the CIO level. Now, Szurek said, they're being resolved at the peer level. Smith said, "We, like everyone, had situations where inappropriate, ranting e-mail messages went out. I can't tell you how long it's been since I've seen one of those!"