When Chris Smith joined Ticketmaster as vice president of data science, the 40-year-old event ticketing business was facing unique challenges in its data-science program.
“We had tech debt that was older than most of the companies I’d worked at,” he says.
Ticketmaster had achieved early data-science successes through custom data integrations with its various IT systems — and there were plenty of those. After 40 years of acquisitions and internal software development, the company had around 300 IT systems, each on its own island of data.
With its move into new markets such as ticket resale and the provision of management and reporting tools for venues, Ticketmaster needed a way to integrate these data silos and make its data available to the entire organization, without rewriting all of its software.
“We needed a lingua franca that we could teach each of the different systems, that would be just a small amount of work in order to get data out of that system and available to everyone,” says Smith. And that change had to be made with minimal disruption to the existing systems. “One of the dangers of taking an old system and applying it to a new problem is that it’s a bad design fit,” he says. “It actually makes the system more unstable, less performant and also more difficult to evolve over time.”
Initially, Ticketmaster experimented with Hadoop to batch-process its big data, and then with Apache Storm, an open-source streaming computation system.
Elsewhere in the business, Ticketmaster was using another open-source stream-processing platform, Kafka, which Smith was familiar with from previous roles in ad tech.
Kafka was originally developed by staff at LinkedIn, and later donated to the Apache Software Foundation. Its creators went on to found Confluent, a company that employs the majority of Kafka’s developers and offers enterprise services built around the platform.
Like a message queue or enterprise messaging system, Kafka allows users to publish and subscribe to streams of records, but it can also store those streams in a fault-tolerant way, and process the records as they arrive.
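The difference from a plain message queue can be illustrated with a toy model. The sketch below is not the Kafka client API: it is a minimal in-memory stand-in, with hypothetical names (`InMemoryLog`, `publish`, `poll`, `seek`) and made-up event fields. It shows only the idea of a stored, append-only stream that several groups of consumers read independently, and it does not model replication or fault tolerance.

```python
from collections import defaultdict


class InMemoryLog:
    """Toy stand-in for one topic: an append-only list of records."""

    def __init__(self):
        self._records = []                # records stay stored after being read
        self._offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, record):
        """Append a record to the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the group's unread records and advance that group's offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Move a group's read position, for example to replay stored records."""
        self._offsets[group] = offset


# Hypothetical events; the field names are illustrative only.
log = InMemoryLog()
log.publish({"event": "ticket_sold", "ticket_id": "T1"})
log.publish({"event": "ticket_scanned", "ticket_id": "T1"})

reporting = log.poll("reporting")  # one group reads the stream
abuse = log.poll("abuse")          # another group sees the same records
```

Because reading does not delete anything, a team that connects later can call `seek` to start from an earlier offset and replay the stored stream, which a queue that discards delivered messages cannot offer.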
Smith recognized that Kafka could push Ticketmaster’s data out to a stream with a very small footprint, and enable his team to iterate on data-science problems more rapidly.
“That’s like catnip for data scientists,” he says. “They become much more interested once they have access to all that data.”
To sell Kafka to Ticketmaster’s chief data officer, Smith had to demonstrate the system’s high availability and scalability.
“Every time we put our tickets on sale, it’s essentially a distributed denial of service attack that we launched on purpose against ourselves, so this creates some very interesting scaling challenges,” he says. “If you go down in the middle of a bunch of people trying to get to a concert or to a sporting event, this is not going to be a happy conversation for anybody involved. It’s of utmost importance that things work.”
Another benefit of Kafka was the support that Confluent provided, including the larger data platform it has built around Kafka. One component it developed is the Schema Registry, which Smith highly recommends.
“A lot of people, when they start off, they don’t use the Schema Registry right away, and that creates a sort of technical dead end that is very difficult to get out of,” he says. Using the registry, companies can evolve the schemas of the records they send through Kafka while maintaining backward compatibility with existing applications. “You can maintain that rapid rate of iteration that otherwise might get you stuck into a very fragile and difficult to change environment.”
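To make backward compatibility concrete, here is a simplified sketch of the kind of rule a schema registry can enforce. It is an illustration under stated assumptions, not Confluent's implementation: schemas are modeled as plain dictionaries, the helper names (`is_backward_compatible`, `read_with`) and the ticket fields are hypothetical, and real compatibility checks cover more cases, such as type promotion. The rule shown is that a new schema can still read old records if every field it adds carries a default and no existing field changes type.

```python
def is_backward_compatible(old_schema, new_schema):
    """Return True if a reader using new_schema can read old_schema records."""
    for name, spec in new_schema.items():
        if name in old_schema:
            # A field kept from the old schema must not change its type.
            if old_schema[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            # A newly added field needs a default, since old records lack it.
            return False
    return True


def read_with(schema, record):
    """Project a record onto a reader schema, filling in defaults."""
    return {
        name: record.get(name, spec.get("default"))
        for name, spec in schema.items()
    }


# Hypothetical schema versions for a ticket-sale record.
v1 = {"ticket_id": {"type": "string"}, "price_cents": {"type": "int"}}
v2 = dict(v1, currency={"type": "string", "default": "USD"})  # adds a default
v3 = dict(v1, seat={"type": "string"})                        # no default

old_record = {"ticket_id": "T1", "price_cents": 5000}
upgraded = read_with(v2, old_record)
```

In this sketch `v2` passes the check and `v3` fails it, so a registry applying the rule would reject `v3` before any producer could publish records that older data cannot satisfy.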
Working with business units
Although Ticketmaster now uses Kafka across the enterprise, Smith’s group started small, working with the fan data team to help it create a single view of fans who bought tickets through Ticketmaster, and from there with three or four other teams to build key integrations.
“We would go to those teams and say, look, we need your data. We want you to use this new technology that you haven’t used before, but we will provide you with expertise to show you how to use it. And of course, those teams had expertise on how their systems work, so it was a joint operation,” he says.
Those initial, low-risk integration efforts created a network effect, he says: “Teams start coming to you instead of the other way around because there’s so much value in being able to integrate the individual components’ data with the larger data ecosystem that we created.”
Clients using Ticketmaster’s ticket resale and venue check-in services benefited from that with a new feature that takes down a resale listing once the ticket has been used to enter a venue. That’s useful when a fan offers to resell their ticket, doesn’t get any buyers and decides at the last minute to go to the show, and then someone else offers to buy the now-unavailable ticket. “Then you’ve got a disappointing experience for the fan that could have been solved by a lot of point-to-point integrations between our different systems,” Smith says. “But since we already had the relevant data going through the system, it became very easy for the people who are managing those resale listings to integrate their data into Kafka” and get the automatic delisting feature out the door.
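The delisting logic described above can be sketched as a small consumer that reads resale and check-in events from one shared stream. This is an assumption-laden illustration, not Ticketmaster's code: the event types (`listing_created`, `listing_cancelled`, `ticket_scanned`), the field names and the function `apply_events` are all hypothetical, and a plain Python list stands in for the stream.

```python
def apply_events(events):
    """Replay events in order and return the listings that are still active."""
    active_listings = {}  # ticket_id -> asking price in cents
    for event in events:
        kind = event["type"]
        ticket_id = event["ticket_id"]
        if kind == "listing_created":
            active_listings[ticket_id] = event["price_cents"]
        elif kind in ("listing_cancelled", "ticket_scanned"):
            # A scan at the venue door delists the ticket, just as an
            # explicit cancellation by the seller would.
            active_listings.pop(ticket_id, None)
    return active_listings


# Hypothetical stream: the fan holding T1 lists the ticket, then attends anyway.
events = [
    {"type": "listing_created", "ticket_id": "T1", "price_cents": 9000},
    {"type": "listing_created", "ticket_id": "T2", "price_cents": 7500},
    {"type": "ticket_scanned", "ticket_id": "T1"},
    {"type": "ticket_scanned", "ticket_id": "T3"},  # never listed: ignored
]
still_for_sale = apply_events(events)
```

The point of the design is that the resale system only needs to subscribe to scan events already flowing through the shared stream, with no direct integration against the venue check-in system.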
While the move to Kafka started as a way to reduce tech debt by eliminating the need for custom data integrations, getting a critical mass of real-time data on the system has opened up many real-time processing opportunities.
“That’s how our identity-based ticketing platform works. It’s how our abuse systems work. It’s how we’re able to provide real-time views of not just our own sales, but also the attendance process of events,” Smith says. “In real time, clients can see where people are struggling to get into their venue and notice when a VIP is entering into the venue so that they can reach out to them and give them that VIP experience.”
Dealing with privacy
Not all the company’s systems are connected through Kafka, though. Some are legacy systems that weren’t a priority, and are now being decommissioned. Others came with startups or competitors that Ticketmaster acquired recently, and haven’t been around long enough to be integrated.
But, says Smith, if you buy a ticket from Ticketmaster’s website today, data about your purchase will flow through Kafka.
“Our key data systems are absolutely integrated in and a lot of other systems that are, you would normally think of as less important, are integrated in as well because there was so much value to that integration.”
Unexpectedly, Kafka also became a key tool for dealing with privacy.
“It was already linked into most of the systems that had personally identifiable information and we just did a little bit of extra work to get the rest of them integrated into it,” Smith says. “It became the sort of central nervous system for managing any privacy requests or concerns that might come our way.”
One thing Smith would do differently next time is turn to Confluent much sooner for help training staff on Kafka. “We did do that eventually,” he says.
The training support became particularly important once demand for access to Kafka took off within the organization. “We did spend an extended period of time in between, … where we hadn’t yet done the training, but we were doing the adoption and that was, um, a little bit more chaotic than I would have liked it to be. I’m still cleaning up some of the messes from that time,” he says.