TD Ameritrade’s big data push 1 yr. later: Benefits coming from all corners

Data quality is improving, personalization capabilities are emerging, and the pace of innovation is on the rise

Network World Editor in Chief John Dix first spoke to Derek Strauss a year ago, when he was about three years into his new role as TD Ameritrade’s first Chief Data Officer. He had built a new group, the Enterprise Data and Analytics Group, and had just finished 18 months of work to stand up nine new platforms, including a Hadoop data store and a metadata repository. Dix recently visited Strauss to see how this massive undertaking is working out.

Derek Strauss, Chief Data Officer, TD Ameritrade

Where do we start for an update on what you’ve achieved since we last spoke?

I’ve got a long list of things we’ve been tracking in terms of value, so I can hit some of the high spots, and then it might be good to step back and look at some of the other things we’re gearing up for that are only possible because of the foundation we’ve laid.  We’re going to be embarking on a pretty aggressive timeline for these new initiatives, and I feel good about being aggressive because the foundation is in place. 

You mentioned the Hadoop effort so why don’t we start there.  The drive with Hadoop is around personalization so our clients feel like we know them and we can provide useful insights and education without it feeling creepy.  The focus is to be like Amazon’s suggestions, where you go, “Wow, I like what they’re suggesting, that’s really useful.” 

We’re calling the Hadoop environment the data marshalling yard.  Why?   Because that’s what is typically upstream from a warehouse.  Think about raw materials being brought together to be manufactured into something.  They will often be transported by rail and come into a marshalling yard where they’ll be sorted for delivery to various factories and warehouses downstream, and you perform analytics on the raw material as it stands.  So it seemed like a natural analogy to call it a data marshalling yard. 


What have we done with that? A couple of key things. We have mainly focused on pulling in chat information and emails, a lot of textual stuff, to try to understand client behavior so we can optimize the client experience in terms of scenarios. We’re also looking at what our clients are talking about and reading. When they phone us, what do they want to talk about? Putting all of that together with their activity on our site, we figure out this client is really interested in certain types of asset classes, and we can then look to see if there are any reports by third parties, by government, by whoever, and say, “It seems like this is an area you’re interested in. Are you aware these resources have just been published? Here’s a link to them.” All of that is around personalization.

So we’re realizing analytics benefits, but there are also benefits around data and data management. 

Let’s take a simple example of a codes table.  A code could be anything, but let’s look at country codes.  South Africa is ZA.  USA is United States of America. When it comes to programmers writing programs, if there isn’t one country code table everyone can refer to as the authoritative table, everyone hard codes the table into their program.  But any large organization has hundreds of systems, so you’ve probably got 100 country code tables hanging around, or worse, one for every program. 

Master data management is all about trying to solve that.   Country code is just one simple example, but when we started looking at this it was amazing how many times people have created redundant tables, and that can lead to all sorts of regulatory and compliance problems and a lot of inaccuracies.

Take me, for example.  I was born in Rhodesia.  Rhodesia doesn’t exist anymore, but if you’re looking for Derek’s birthplace, are you going to know Rhodesia is now Zimbabwe? Keeping that memory of geographical stuff centralized is something every organization needs and no one really has. 

We implemented a master data management capability and the first thing we tackled was country codes.  Now our application development teams know they can go to one authoritative source to find it.  They’re not continuing to perpetuate the redundancy and the inaccuracies in the data, plus if something changes, they don’t have to remember to update their program because someone in the business now owns and is responsible for updating that data. 
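To make the idea of one authoritative codes table concrete, here is a minimal Python sketch. It is an illustration only, not TD Ameritrade’s implementation: the names `COUNTRY_CODES`, `HISTORICAL_NAMES`, and `lookup_country_code` are hypothetical, and a real master data management system would hold this data in a governed store with a business owner rather than in source code.

```python
# Hypothetical authoritative table: current country names mapped to
# ISO 3166-1 alpha-2 codes. Only a few rows are shown for illustration.
COUNTRY_CODES = {
    "South Africa": "ZA",
    "United States of America": "US",
    "Zimbabwe": "ZW",
}

# Hypothetical history table: former country names mapped to current names,
# so the "memory of geographical stuff" lives in one place.
HISTORICAL_NAMES = {
    "Rhodesia": "Zimbabwe",
}


def lookup_country_code(name: str) -> str:
    """Return the code for a current or former country name."""
    cleaned = name.strip()
    # Translate a former name to its current equivalent, if there is one.
    current = HISTORICAL_NAMES.get(cleaned, cleaned)
    if current not in COUNTRY_CODES:
        raise KeyError(f"Unknown country name: {name!r}")
    return COUNTRY_CODES[current]
```

Every program would call the shared lookup, for example `lookup_country_code("Rhodesia")`, instead of carrying its own copy of the table, so a change made by the data owner reaches all callers at once.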

Those kinds of efficiencies are huge and very often get overlooked. When you think of the Chief Data Officer role, people just think about the sizzle of the analytics side, but there’s a very real efficiency side on the data side, which is a big plus for any organization.

Once you have this master data management capability, I presume you go around looking for duplication of effort and multiple versions of the truth?

Right.  And when you find it you need to find someone to own it.  That’s the data governance side of things.  You find an owner and that owner points to the data steward who is normally someone who is already doing work trying to fix the problem, and you say, “Here’s a tool where you can analyze all the different values you’ve got today, harmonize them, create one source of the truth and you own that and you make sure that is up to date and everyone else starts using that.”  That makes a big difference.

But there are literally hundreds and hundreds of instances where this would apply and it’s a question of working with the business groups who are constantly tripping over these things, prioritizing them, and just picking them off one at a time and working through it. 

The big elephant in the room is the client, because we, like many financial organizations, have grown up being account-centric.  So John, let’s open an account for you.  Oh, and you’d like to try something else?  Well, let’s open another account for you, and another, and another. Every time we open an account for you we redundantly create information about you in that account record.  We don’t have one central record about you. 

Behind the scenes, for financial firms to be able to deal with you as a client and understand your total business with us and treat you accordingly, we’ve got a thousand gnomes running around all night trying to bring all this information together. 

I’m exaggerating for effect, of course, but it’s a big thing because it’s like open heart surgery for the organization and you’ve got to really know that you’re going to be successful and you’ve got to plan the creation of a client master very carefully. We now have an opportunity to address that head-on because we’ve put a lot of the building blocks in place.  I’ll come back to that one.  That was just sowing the seed.  Master data management is a key benefit and it’s all about efficiency.  

Data quality improvement is another key benefit.  The Patriot Act stipulated a bunch of things about anti-money laundering, and there are about five major attributes of client that are critical and have to be in good order.  One of them is date of birth.

How could there be any fluctuation around that?

Any company that has grown through acquisition has had to make some decisions where expediency won out over guarantees for the highest quality of data. For example, if we had acquired a book of business with a couple thousand clients and their date-of-birth records were incomplete, we might have decided to bring them in with the conversion day’s date as the date of birth, with the idea that we would go back and fix it over time. The expedient thing was to get the conversion done. Other times the programs capturing the data in the companies we acquired didn’t have the right sort of edits, so you had people with birth dates in the 1800s instead of the 1900s, or birth dates in the future. Just crazy stuff.
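The kinds of problems described above lend themselves to simple automated checks. The following Python sketch is an assumption about how such edits might look, not the checks TD Ameritrade actually runs; the function name `check_date_of_birth`, the 1900 cutoff, and the `conversion_date` parameter are all hypothetical choices made for illustration.

```python
from datetime import date
from typing import List, Optional


def check_date_of_birth(
    dob: Optional[date],
    conversion_date: Optional[date] = None,
    today: Optional[date] = None,
) -> List[str]:
    """Return a list of data-quality issues found for one date of birth."""
    today = today or date.today()
    if dob is None:
        return ["missing"]
    issues = []
    if dob > today:
        issues.append("in the future")
    # Illustrative cutoff: a birth year before 1900 is treated as a likely
    # century typo (for example 1885 keyed instead of 1985).
    if dob.year < 1900:
        issues.append("before 1900")
    # A birth date equal to the day the records were converted suggests a
    # placeholder value that was meant to be fixed later.
    if conversion_date is not None and dob == conversion_date:
        issues.append("matches conversion date")
    return issues
```

Running a check like this across every source system is one way to learn the extent of the problem before deciding how to remediate it.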

We saw all those things and thought, “Okay, this is going to be interesting.  We’re going to have to do some real work analyzing these and figuring out the root causes and figuring out the best way of remediation.”

In the past we didn’t know the extent of the problem. We stumbled on it occasionally and have had problems running various types of reports, and we’ve had to rush back and try to figure out what was going on. Now we know what’s going on. Now we know where the problems are. Now we’re actually going back and working to fix it, which is huge. That’s all the authorities want from any organization they audit. They know it’s not perfect. What matters is what you’re doing about it and whether you understand the risk.

And all of these things, of course, have spinoff advantages to the analytics group because they’re starting to work with data that is in better shape, and of course if you’re working off data that’s got high integrity your decisions are going to be stronger and it’s going to be easier. 

Are you bringing all the data into one place to improve the quality, or trying to improve it where it sits?

We’re trying to fix it where it is, at the actual source.  But that’s a good point because, as we start thinking about creating a client master, ideally in the fullness of time we’ll have just one place where that data is and it will be good data.  But because we’ve started fixing it at the source now, when we do create that client master we’re going to be creating it with good data as opposed to data that we have to go fix.

But it’s complicated. If there are seven different sources for this particular thing, say, for date of birth, which of those would we consider to be the authoritative source? If we really wanted to save ourselves the trouble of trying to fix all seven of them, which one would we fix now? We’re trying to do that thinking as well.
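One way to start on that question is to measure how much the sources disagree. The Python sketch below is a hedged illustration rather than a description of TD Ameritrade’s process: the function `compare_sources` and the source names in the usage note are hypothetical, and the most frequent value is reported only as a hint, since choosing the authoritative source remains a business decision.

```python
from collections import Counter
from datetime import date
from typing import Dict, Optional


def compare_sources(values: Dict[str, Optional[date]]) -> dict:
    """Summarize how several sources agree on one attribute for one client."""
    # Ignore sources that hold no value at all for this attribute.
    present = [v for v in values.values() if v is not None]
    counts = Counter(present)
    top = counts.most_common(1)
    return {
        "distinct_values": len(counts),
        "missing_sources": sorted(s for s, v in values.items() if v is None),
        # Most frequent value across sources; None if every source is empty.
        "most_frequent_value": top[0][0] if top else None,
    }
```

For example, calling `compare_sources({"crm": date(1960, 5, 1), "trading": date(1960, 5, 1), "legacy": date(1860, 5, 1)})` would report two distinct values, which flags the client for review and shows which sources drift from the rest.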
