by CIO Staff

How SOA Really Works

Aug 17, 2005

Pretty frequently, I talk to people who are really smart, articulate and who tell me things that I could never hope to fit into a feature for the magazine. I want to start featuring them here. Hossein Moiin, vice president of technical strategy for T-Mobile International, is one of those rare people who can go deep on the technology but also pull back and put that technology into a business context and a strategy context. Notice how he talks about his SOA. It’s not merely in the context of reusable internal integration—this is a two-pronged architecture that is as focused on external revenue-generating opportunities as it is on internal integration. That is a critical strategic emphasis that seems to be missing from many SOA efforts today. Notice how he understands the different speeds, sophistication levels and adoption tolerance of technology processes versus business processes and management processes. I think we tend to assume that they can all move at the same speeds and have the same degree of sophistication. The truth is business and management processes lag far behind the sophistication, speed and adoption rates of technology processes. Let me know what you think.

Q: How did you map out the SOA for T-Mobile?

A: We designed four planes or four layers, if you will: applications, access, consumption and support. So what we’ve done is we’ve gone holistically into these four planes and said, what are the processes, what are the functionalities, required within each of these planes to deliver the whole gamut of communication information, entertainment and even transactional services to our end users?

Now what we’ve done, based on that, is broken down these four planes into further layers. So you start at the very top with four layers, then go down a little bit deeper. You end up with a number of network-specific and network-agnostic functionalities. So within the application plane, you end up with streaming video, video clips, voice itself (voice calls are one such application), etcetera.

Then you build those applications and combine them with the core capabilities of a mobile phone company, which include things like presence and location, messaging and browsing, as well as the ability to charge for the service and to build databases and data warehouses regarding the customer’s interaction with the ecosystem. And not to forget, you need the right network and the right devices.

Each of these exposes in one way or another some key functionality which can be utilized by the application. And the goal of our service-oriented architecture is to understand and articulate the capabilities of each of these planes and sub-planes in a manner that can be used by external and internal application developers, so that they don’t have to rewrite the code and reinvent the wheel.

I’ll give you an example. Many devices have very different characteristics. Simple display characteristics, such as screen size and whether the device shows color or black-and-white pictures, are important for a set of application providers. Another one would probably be the speed of the downlink guaranteed or provided by the network. If these features are not made available to the application developers, they need to be discovered and then used by the applications themselves.
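As a rough illustration of what exposing device and network characteristics to application developers could look like, here is a minimal Python sketch. Everything in it is hypothetical: the names `DeviceCapabilities`, `CAPABILITY_REGISTRY`, `lookup_capabilities` and `pick_video_variant`, the sample handsets and the 200 kbps threshold are assumptions for illustration, not T-Mobile’s actual API or data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceCapabilities:
    """Hypothetical description of what a handset and its connection can do."""
    screen_width: int
    screen_height: int
    color_display: bool
    downlink_kbps: int


# Illustrative data only; a real operator would fill this from its own systems.
CAPABILITY_REGISTRY = {
    "handset-a": DeviceCapabilities(128, 160, False, 40),
    "handset-b": DeviceCapabilities(240, 320, True, 384),
}


def lookup_capabilities(device_id: str) -> Optional[DeviceCapabilities]:
    """Return the known capabilities for a device, or None if it is unknown."""
    return CAPABILITY_REGISTRY.get(device_id)


def pick_video_variant(caps: DeviceCapabilities) -> str:
    """Choose a content variant from exposed capabilities instead of probing the device."""
    if not caps.color_display:
        return "mono-still-images"
    if caps.downlink_kbps >= 200:  # assumed threshold, for illustration
        return "color-streaming"
    return "color-clip-download"
```

The point of the sketch is that the application asks a service for the characteristics rather than discovering them itself, which is the reuse the answer describes.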

T-Mobile’s infrastructure has all this information available. It’s just that it’s not exposed properly and at the right level of abstraction. So what we did was we said, “Okay, we want to have a viable and vibrant ecosystem. And that ecosystem will be composed of the users with their devices, the mobile phone company and its internal applications, and the third parties, who will provide most of the content.”

So within this framework, what we’ve done is, we said, “What are the business processes that we need to follow?” We need to have a contract with the third party. We need to have the right handset with our end users. We need to have the right functionality within the networks and the supporting infrastructure. And we capture all of this information and these business processes, which really are nothing more than contractual agreements, but they need to be translated and boiled down into actual services that are then delivered to the end user. And this is the basic idea behind our SOA.

Now we, like other providers, have realized that Web services play an important role in this game; however, they’re not an end unto themselves. They are not complete as of today. They’re getting better all the time, but they still leave some holes to be filled. And these holes will be filled whenever possible by standards.

So we can take some examples to be more down to earth. Let’s say, for example, that we want to do a deal with [Big-Eared Mouse, Inc.] to send cartoons to our phones. One of the key elements of our overall end-to-end architecture is a third-party gateway. This third-party gateway allows the third parties to interact with the services provided by T-Mobile. So it performs a number of key functions. One of them is that it authenticates and authorizes the third party, so that end users, when they interact with this third party, are secure from things like viruses or malfunctioning software.

The second thing it does is allow the third party to discover those functionalities and services provided to the outside world. The third thing it does is dispatch: it gathers requests and information from the third party and sends them to the appropriate units within T-Mobile. It then gathers the internal responses and sends them back to the third party.

And it also allows the content that’s moved from the third party to the end user to have additional checks along the way. For example, suitability for adult consumption. Because we made a promise to the U.K. government that we will not allow adult material to be viewed by underage people. And we know how old our users are, because they have a contract. So this is one of the key and very specific elements of our architecture.

And the functionality that it performs is truly based on a service-oriented architecture. And as you can see, the services that it provides are authentication, authorization and even accounting. In addition, it does access control, allows for discovery, allows for dispatching and gathering of responses, and ingestion of the content.
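The gateway functions listed above (authentication, authorization, accounting, discovery, dispatch and a content suitability check) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `ThirdPartyGateway`, `is_suitable`, the shared-secret check, the partner name and the adult age of 18 are all hypothetical and are not drawn from T-Mobile’s implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class ThirdPartyGateway:
    """Hypothetical gateway: authenticate, authorize, account, discover, dispatch."""
    secrets: Dict[str, str] = field(default_factory=dict)      # partner -> shared secret
    grants: Dict[str, Set[str]] = field(default_factory=dict)  # partner -> allowed services
    services: Dict[str, Callable[[dict], dict]] = field(default_factory=dict)
    audit_log: List[Tuple[str, str]] = field(default_factory=list)

    def discover(self, partner: str) -> List[str]:
        """List the registered services this partner is allowed to see."""
        return sorted(self.grants.get(partner, set()) & set(self.services))

    def call(self, partner: str, secret: str, service: str, request: dict) -> dict:
        # Authentication: is this really the partner it claims to be?
        if partner not in self.secrets or self.secrets[partner] != secret:
            raise PermissionError("authentication failed")
        # Authorization: may this partner use this service?
        if service not in self.discover(partner):
            raise PermissionError(f"not authorized for {service!r}")
        # Accounting: keep a record so problems can be traced back later.
        self.audit_log.append((partner, service))
        # Dispatch to the internal system and hand back its response.
        return self.services[service](request)


def is_suitable(adult_only: bool, subscriber_age: int, adult_age: int = 18) -> bool:
    """Content check; the threshold of 18 is an assumption for illustration."""
    return (not adult_only) or subscriber_age >= adult_age


# Example wiring with made-up data.
gateway = ThirdPartyGateway(
    secrets={"cartoon-studio": "s3cret"},
    grants={"cartoon-studio": {"charge"}},
    services={"charge": lambda req: {"status": "ok", "amount": req["amount"]}},
)
```

The audit log is what would make enforcement possible after the fact: each dispatched call is recorded against the authenticated partner.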

There are many other elements like the third-party gateway which are embedded in our system. And basically all of them expose APIs so that they can be used across the board. Meaning that they can be used internally or externally. And if they need to be used externally by other members of the ecosystem, then there will be access to this third-party gateway.

Q: So let’s pretend I’m a developer at what would be a content provider that you guys work with.

A: So basically, if a developer at [Big-Eared Mouse] wants to work with us, the very first thing is that we have a set of APIs and a set of documentation that we would provide them with. The documentation and APIs are a basic system description of our Web services layer, and sometimes beyond that, which they can take to tailor their code according to these APIs.

The next stage, once the code is done and tested entirely, is that it will be sent to us. We will put the code on our test systems. These are exactly like our production systems, except that no one external can access them. The code will be tested by automated as well as manual means, utilizing the various types of phones and devices that we actually use, so that if something is not right, we can let the developer at [Big-Eared Mouse] know.

And once those series of tests are passed, then that code is put onto the pre-production platform. We have normal releases scheduled once every other week or so.

Now in parallel to this technical track, obviously there’s a business track that needs to be followed. Meaning we need to decide together with [Big-Eared Mouse] what is the revenue share between us. It’s not entirely accurate to say that they sell the contents to us and then we sell it to the end user, but that’s a good model to work with.

So it’s probably not a bad idea to say that we actually are a reseller of [Big-Eared Mouse]’s content. And we add value because of the network, the device, the billing, etcetera. But we agree with [Big-Eared Mouse] at a final price. And then we take a cut of it, and we pass on their share after it’s been collected from the end user. And with that contract, there’s also the element of responsibility of [Big-Eared Mouse] as well as our responsibility.

So obviously there’s some financial responsibility, but also there’s some quality responsibility. So the content that [Big-Eared Mouse] sends needs to be, again, not a virus, not malfunctioning software that does terrible things to your phone, etcetera. And while we cannot fully guarantee that, we have a way of identifying the offending party. If a third party’s content has been found to cause a virus on a customer’s phone, for example, then because we have a record of who’s been authorized, how it’s been accessed, etcetera, we can actually track it back to them. So we have not only a prevention but also an enforcement mechanism for all the safety and security that we need to provide.

So again, what’s interesting is, in the past, the process was more or less the same. And this probably goes back to the return on investment. What happened was, we used to have the content from [Big-Eared Mouse] tested on many different platforms. Because each of them was independent, they did their own thing in their own specific way. And the content from [Big-Eared Mouse] would also be slightly different for each of the, say, national operators that we own.

So for example, in the U.K. and Germany, they would have two slightly different versions of the same content. So it’s the same cartoon, but to access the billing, to access the charging there will be some differences in the codes.

Q: Because the content is the same, but the code surrounding it is different?

A: Yes, exactly, because of all the differences in capabilities of national operators. So what we did was in the case of charging and billing, we built a service on top of it which exposes a unique interface to the third party. So the code is internationalized. And all the differences are internally handled by this third party gateway.
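One way to picture a single charging interface that hides national differences is the facade-over-adapters sketch below, in Python. The class names (`NationalBilling`, `UkBilling`, `GermanBilling`, `ChargingService`), the country codes and the receipt strings are invented for illustration; the interview does not describe how the real service is built.

```python
from abc import ABC, abstractmethod
from typing import Dict


class NationalBilling(ABC):
    """Hypothetical per-country billing backend, each with its own quirks."""

    @abstractmethod
    def charge(self, subscriber: str, amount_minor_units: int) -> str:
        ...


class UkBilling(NationalBilling):
    def charge(self, subscriber: str, amount_minor_units: int) -> str:
        return f"UK:{subscriber}:{amount_minor_units}p"


class GermanBilling(NationalBilling):
    def charge(self, subscriber: str, amount_minor_units: int) -> str:
        return f"DE:{subscriber}:{amount_minor_units}ct"


class ChargingService:
    """The one interface a third party codes against; differences stay internal."""

    def __init__(self, backends: Dict[str, NationalBilling]):
        self._backends = backends

    def charge(self, country: str, subscriber: str, amount_minor_units: int) -> str:
        try:
            backend = self._backends[country]
        except KeyError:
            raise ValueError(f"no billing backend for {country!r}") from None
        return backend.charge(subscriber, amount_minor_units)


charging = ChargingService({"UK": UkBilling(), "DE": GermanBilling()})
```

With this shape, a content provider writes `charging.charge(...)` once, and adding another national operator means adding one backend rather than another version of the provider’s code.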

We have around 1,000 content providers working with us, some on an international scale, such as [Big-Eared Mouse], Time Warner and the Bertelsmann Group. Others, based in individual countries or cities, are very small content providers. Perhaps in the Czech Republic or Germany or the U.K., you will find that more than half are small content providers.

Q: So what’s in the documentation you’re sending to the content providers?

A: It tells them what are the functionalities that we provide, what are the APIs to interact with our systems, and what is expected of them, and what is the process. The functionality fits into our four planes of applications, access, consumption and support. For example, the APIs for the billing functionality are in the support plane. And this other API is actually regarding the device, which is the consumption plane. So it’s there, but it’s not explicitly pointed out. Because that level of architectural discussion perhaps is not appropriate for developers. That was our feeling when we did this.

We do not advertise native APIs to the general public, so for those you need to talk to the engineering or development department of T-Mobile. And then we allow you to access them and use them.

Q: People are talking about enterprise service bus, both on the conceptual level and as a product that middleware vendors are selling; sort of everybody’s converging on this space from different places. Do you have something like that, that you have built or that you have from a vendor?

A: Within our information systems, we have a product from a vendor which effectively does our business integration. And this is a bus architecture. So basically what we do is, we build adaptors to this bus for various platforms, and the information is placed on that bus and exchanged among the various segments.

So this is actually, I guess, perhaps the only one that we actually use today. The rest of them are in a truly distributed manner, and they make point-to-point calls, or they go through a star system and a star architecture. So there’s a centralized place, and then that gets distributed and dispatched to the appropriate receiver.

Q: The idea that companies are buying these buses from vendors— Does this shoot another hole in the whole conceptual SOA framework, saying that you can’t really build this stuff on your own, and that there are too many holes in the standards for you to create a truly unique SOA without one of these tools?

A: To me, this is another implementation of service-oriented architecture.

So it is not based on the, quote-unquote, “sanitized” definition of SOA. But from a theoretical perspective, it is really doing the job that service-oriented architecture will do. Because our whole philosophy and belief is that the business logic is within the application, but all the other elements of the application can be done outside of the application, and reused by other applications that will need the same functionality. And that’s what we’ve really emphasized within our overall architecture.

Now how you get there, whether it’s sanitized SOA or using a bus architecture, or even a hub and spoke, that’s a matter of convenience, timing, and readiness of product. So for some very complex systems, like our information systems, we felt that this sanitized SOA is not there today, whereas a bus architecture would give us more benefits. So it was really a timing issue that made us choose this, versus the other way.

Q: So in your mind, within the overall umbrella of SOA, you can implement in a number of different ways, right?

A: Yes.

Q: So pure Web services, bus or hub and spoke—it sounds like you’ve got a mixture of all those.

A: Yes, we do.

Q: But it’s still all focused on this overall theory of exposing things as services, ultimately?

A: Exactly. And the reason for doing so is very business-focused. The reason is to minimize rework, and reuse existing functionality. Now how you do that depends on the particulars of the actual systems you’re discussing, how closely they need to be tied together, efficiency, performance requirements, readiness, availability of products and standards. Once you take all of these considerations into it, then you may come to a different conclusion than we have. But the whole picture eventually will converge towards an SOA architecture. And I think the vendors themselves have realized that this is the way to go, as well. So they’re trying to offer another option.

Q: What is missing from hub and spoke and the bus concept that prevents it from being used as a clean sort of SOA implementation?

A: So the hub and spoke has the difficulty of inefficiency. So you really depend on the hub to be as efficient as possible, and so it creates some performance bottlenecks.

Q: Can you give me an example?

A: So that was exactly what I was going to talk about. Now we have not done any detailed study on this, but this is our general feeling. So for example, the third-party enabling gateway itself is effectively a hub. It takes requests from many sources and then distributes them to many systems internally, collects the responses and sends them back. So it acts as a hub.

Now generally, you would like this to be as fast and efficient as possible. To do that, you need very large systems. And we’ve seen the kind of systems that we use are not mainframe, but not far from mainframes. And that’s all they do, primarily: in addition to authentication-authorization, their function is to grab a request and feed it to the right substation within the telecom’s network. So this doesn’t look very efficient to us. So the problem with hub and spoke is the inefficiency of the hub.

Now exact numbers, etcetera: we have never really done this, but when we move to SOA, which we have decided to do, we need to justify it, and the business case needs to be made. That will be done, and I feel confident that such a case can be made.

The second area, regarding the bus architecture, involves the fact that you need to write these adaptors. And while these adaptors are not too difficult to write when you talk about 5 or 10 systems, when you talk about thousands of partners, clearly that’s not going to work. And because these adaptors are specific to one vendor, that moves away from openness and from the heart of SOA.

Now if you have full control over your environments like we do within our information systems, clearly this is acceptable. But in the future perhaps, even that needs to be rethought and justified so that it can move into that pure SOA architecture. So those are the two key issues that I see.

Q: How are you getting rid of the inefficiencies?

A: So when we move to a pure SOA, the adaptors specific to this vendor’s bus will be eliminated. And the bus goes away also; that’s the beauty of it. When you actually have everything distributed, then you can make many point-to-point connections within the system. Whereas here, you’re making one-to-many connections. And therefore this central point has some nice properties like control, etcetera. But that also creates inefficiency.

Q: Yeah. It’s interesting, it seems almost full circle. I did a piece on integration architectures back in the mid-nineties, and at that time hub and spoke was the best answer anyone had. Mostly, I guess, due to the lack of maturation of the technology. But at that time what you wanted to get rid of was all these point-to-point, one-off integrations. And you do that by doing the hub and spoke, which as you say, sort of starts to build up into an incredible burden to manage. But at least it’s cutting a lot of extra work out of all these point-to-point connections. Now it sounds like we’re coming full circle and saying, “Well really, what you’d want to do is the point-to-point, where individuals are handling the connections between two applications. But because you’re doing it with a Web service, you don’t have the issues of rework when you tear apart those connections.”

A: Exactly. So we have really not gone full circle, per se. But it may appear so, and this is not unusual. In any business, you see industries that are vertically integrated, like telecoms used to be in the States (AT&T is the prime example), become more or less horizontally integrated. And then they move back to being vertically integrated.

Within the realm of architecture, this is also possible. We realize some benefits of centralized control, etcetera. But we also realize there are benefits of a distributed environment, and understand perhaps not a lack of control, but the higher performance that it can afford us. Our aim is to improve over some period of time (again, a decade is perhaps a good thing), to move from one paradigm to the next paradigm, but not forgetting the lessons of the past. And that’s what I think we’re seeing at the moment. So it’s not that we’re going back to where we were, but we’ve moved on. And now we think that distributed way of doing things was the right way, but it didn’t have all the right attributes and characteristics. We worked on the right attributes and characteristics, but we ended up in a different place. Now we want to go back to that old place, but with a different set of attributes.

Q: Let’s talk about governance. When you have these distributed services, how do you govern who gets to use them and how do you prevent one group that offers a popular service from having its network overwhelmed with traffic?

A: So when we say “you’re learning from your past mistakes” and keeping the attributes that you find desirable in the future systems, one of the key difficulties with distributed systems and SOA is the fact that you actually lose that physical sense of control.

So today in a hub and spoke environment, the hub actually acts not only as a distribution point and a point of centralization for APIs. Much more fundamentally, it acts as a control point. So throughout all the systems, it can manage the access, it can give priorities, etcetera. This needs to be addressed within the SOA framework, and it has not been to date.

How do you actually define your policies and distribute that policy to every single point? It has not been done yet.

Q: There is also the business governance issue of risk and reward, right? You talked about a service that you offered from servers in your Czech Republic subsidiary that was accessed primarily by other countries—not by the Czech Republic company. And the technology executives in the Czech Republic started to wonder what was in this for them, right?

A: Yes, it’s a download server. So it serves ring tones, icons, wallpapers and the like. So it downloads content to the mobile phones. Now what I remember, and the problem that we had, was indeed the fact that the Czech guys don’t get as much out of this as they put into it. However, from a global perspective this is the optimal way of providing this service, meaning that the Czech Republic, because of its lower cost and very high level of capabilities, is an ideal place to have this service.

What’s not ideal is that the resources being used there cannot be compensated by their local demand. So you have this global demand that they satisfy in a very satisfactory manner to the group. But the right rewards are not being put in place so incentives are not there. And we have not resolved this issue, I don’t believe, to any satisfactory conclusions, other than ordering them, “You have to do this, because it comes from the headquarters.” Which may not be the most satisfactory way of doing it.

But then again, in the case of Czech Republic, at least they are part of our group, so their financial numbers are mixed with ours. But still, from their own financial control perspective, it does not make sense.

Q: So then, what are your thoughts on how you create the sort of SLAs for managing and governing these services that you create?

A: I think you’ve introduced a very interesting point—this notion of chargeback for each time the service is used. And that clearly adds some complexity.

So the way I like to move forward is actually along an evolutionary path. Today, our Czech colleagues, sadly they have to do this without realizing any true benefits of it. This obviously works for a little time. In the long run, however, you need to introduce fair and balanced processes into it.

One way of doing so is actually measuring the amount of utility that you provide to the group, and charging back the group for that utility. This can be done based on measuring the resource utilization and coming up with a fair price, so that the compensation is fair and a mandate from HQ is no longer required to do things.
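The chargeback idea can be made concrete with a small worked example in Python. The sketch assumes the simplest possible rule, splitting a service’s total cost in proportion to each consumer’s measured usage; the function name `chargeback`, the usage units and the figures are hypothetical, and a real scheme would need whatever pricing the finance and IT organizations agree on.

```python
from typing import Dict


def chargeback(usage_by_consumer: Dict[str, float], total_cost: float) -> Dict[str, float]:
    """Split total_cost across consumers in proportion to measured usage."""
    total_usage = sum(usage_by_consumer.values())
    if total_usage <= 0:
        raise ValueError("no usage recorded")
    return {
        consumer: total_cost * usage / total_usage
        for consumer, usage in usage_by_consumer.items()
    }


# Made-up monthly download counts served by one shared server.
usage = {"UK": 600.0, "DE": 300.0, "CZ": 100.0}
bill = chargeback(usage, total_cost=10_000.0)
```

Under these made-up numbers, the hosting unit would recover 90 percent of the server’s cost from the other two consumers instead of absorbing it all locally.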

This can be done, but I believe that I also would be hesitant to introduce it today. A reason being, this perception of complexity and lack of appropriate processes. I believe the technology is there, but I believe our mindset, at least within my company, is not there.

Q: Explain what you mean by that.

A: So again, we have good technology that can measure performance at micro-levels. What I don’t believe is that we have the right processes within our finance and even IT organizations to translate those measurements into fair value for both the users and providers of services. And that’s the evolution that needs to take place. So the genesis is there, but the evolution is not; it may take some time. But I will be a proponent of such an approach. However, I have no plans to introduce it at the moment.

Q: Now do you guys have a repository set up, or a website where you go and access these services if you’re a developer? What’s the control mechanism for the services?

A: Yes. Internally, there are websites and developers in different systems can utilize them to access the services. And to external parties, we try to minimize our exposure of internal functionality, such as billing, such as customer care, such as customer relationship management.

For those, we provide a very abstracted view within our content provider API. So they can uniquely identify a customer and they can charge the customer, for example, but they wouldn’t be able to get access to his or her records. Whereas internally, our developers can.

Q: There seems to be a challenge there, too, when you talk about Web services and exposing them internally versus externally. You’ve got obvious privacy-security issues.

A: Absolutely.

Q: So that essentially you can’t have “one size fits all” service. You need separate repositories for internal and external.

A: Yes, we do. So basically, the way we’ve classified them is “trusted applications” and “internal applications.” Internal is at even a higher level of trust than “trusted.” And there are physical paths that we can choose to say which one is which.

So for example, if someone comes in through an ISDN connection or a dedicated line, that is a trusted party. You have a contractual agreement with them, you know exactly what port they’re coming in on, you know everything about them. And they can access some more functionality (not a whole lot more, because of privacy issues) than someone accessing the general Web page.
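The tiering described here (general Web access, trusted parties on a dedicated line, and internal applications above both) can be sketched as an ordered trust level checked per operation. In this Python sketch the names `Trust`, `REQUIRED_TRUST`, `trust_for_path` and `may_call`, the path labels and the operation names are all assumptions made for illustration.

```python
from enum import IntEnum


class Trust(IntEnum):
    """Ordered trust levels; a higher value means more access."""
    PUBLIC = 0
    TRUSTED = 1
    INTERNAL = 2


# Hypothetical operations and the minimum trust each one requires.
REQUIRED_TRUST = {
    "browse_catalog": Trust.PUBLIC,
    "identify_customer": Trust.TRUSTED,
    "charge_customer": Trust.TRUSTED,
    "read_customer_record": Trust.INTERNAL,
}

# Hypothetical mapping from the physical path a caller arrives on to a trust level.
PATH_TRUST = {
    "internal_network": Trust.INTERNAL,
    "dedicated_line": Trust.TRUSTED,
}


def trust_for_path(path: str) -> Trust:
    """Anything not arriving on a known path is treated as public."""
    return PATH_TRUST.get(path, Trust.PUBLIC)


def may_call(path: str, operation: str) -> bool:
    """Allow the call only if the caller's path carries enough trust."""
    return trust_for_path(path) >= REQUIRED_TRUST[operation]
```

So a content provider on a dedicated line could identify and charge a customer, while reading the customer’s record would stay limited to internal applications.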

Q: Some people say the ultimate goal of SOA is a screen that comes up on a business analyst’s desk and it shows the services that you offer internally, externally. And they pull them out, drag connections to other services on the screen and away they go. Whatever connections are necessary are done automatically. Do you envision getting there at some point, or do you think there’s something wrong with that vision, ultimately?

A: To be honest with you, I don’t think there is anything wrong with the vision. And I think it’s an excellent idea to provide all these services that we do (simple things like security and authentication, more complex things such as rendering of content) and have those available to be used by developers whose job is ultimately to provide content for an application, so they can really concentrate on their core activity. I don’t think there is anything wrong with the vision. I believe that we’re at a point in software development where we can begin to see this happen. But we’re not there yet.

And I think the other aspect of it is: a lot of these activities will be done or must be done in an automated fashion. To date, that has meant introducing some inefficiencies. With Moore’s law really kicking in, we see that inefficiency may no longer really matter. But within telecoms, it still does matter because inefficiencies equal cost. And when you talk about a scale of someone like T-Mobile with 80 million customers, that little inefficiency translates into a lot of cost.

So I believe that we may not be there yet, but I’m optimistic (perhaps “certain” is too strong a word) that we’ll get there.

Q: So how does this introduce inefficiencies now?

A: So for example, compare actually developing code with picking bits and pieces from a diagram and connecting them together in a way that you feel would serve your purpose. Those connections are ultimately going to create machine code, which runs on the systems, and more of it than if you had a developer making those connections. So as Moore’s Law continues, ultimately it probably won’t matter. Today, it does. It costs money to accommodate the extra code and the extra points of inefficiency.