Amazon Redshift is a fully hosted, petabyte-scale data warehouse service available as part of the Amazon Web Services platform. It entered beta in November 2012, with a full release in February 2013. Unlike the Amazon RDS hosted database offering, which runs conventional row-oriented relational database engines, Redshift is a column-oriented DBMS able to handle large-scale datasets using massively parallel processing.
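To make the row-versus-column distinction concrete, here is a toy sketch in Python. The table and field names are hypothetical, and this is only the general idea behind a columnar layout, not a description of how Redshift stores data internally.

```python
# Toy illustration of row-oriented vs. column-oriented storage.
# The records and field names below are made up for the example.

# Row-oriented layout: all fields of a record are kept together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 75.5},
    {"order_id": 3, "region": "EU", "amount": 30.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of a field are kept together.
columns = to_columns(rows)

# Summing one field in the row layout means visiting every record.
total_from_rows = sum(record["amount"] for record in rows)

# In the column layout the same sum reads a single list and
# never looks at order_id or region.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Analytic queries tend to aggregate a few fields across many records, which is why a layout that reads only the needed columns suits a data warehouse.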
Here are seven examples of companies that have used Amazon Redshift to transform their business.
The venerable Financial Times (FT) newspaper may be 126 years old this year, but that doesn’t mean it’s out of date. The international daily has embraced the digital age. Even so, FT CTO John O’Donovan says the paper needed a way to increase the speed, performance and flexibility of its data analysis. The paper now uses AWS Redshift, allowing it to run 450,000 online queries 98 percent faster than it could in its previous traditional data center, while reducing infrastructure costs by 80 percent.
“We can show much more clearly where the revenue is coming from,” O’Donovan says. “We can see what people are liking on the site. We can start to look at how things might be trending and we can correlate that with other interesting information as well so you start to layer up information in a very powerful way.”
Late in 2012, Nokia’s data volumes “literally broke the database,” says Greg Johnson, head of analytics at Nokia. Nokia’s Xpress Internet Services platform provides mobile Internet services to its markets in India, Asia Pacific, Africa and South America. The platform runs on 2,200 servers and collects 800 GB of log data daily.
“We were no longer able to cost-effectively scale the database or do anything useful in terms of queries,” Johnson says. “I started exploring using Redshift to replace our traditional database. In the course of two months, we were able to migrate most of the high data usage from our traditional relational database over to Redshift.”
Johnson says Nokia can now run queries twice as fast and can use its business intelligence tools to mine and analyze its data at a 50 percent cost savings.
HauteLook, acquired by Nordstrom in 2011, runs private, members-only, limited-time sale events that offer premium fashion and lifestyle brands at 50 percent to 75 percent off. With more than 20 new sale events beginning each morning and more than 14 million members, HauteLook CTO Kevin Diamond says the data warehouse is essential.
When deciding which way to go with a data warehouse, Diamond says he looked at some competitors and determined they all required software, implementation and “big” hardware. He elected to skip the RFP and jump directly into the Redshift public beta.
ETL was the hardest part, he says, but HauteLook is now “saving a ton” due to having no hardware costs and no maintenance or overhead costs. In fact, he says, the annual costs of Redshift are equivalent to just the annual maintenance of some of the cheaper on-premises options for data warehouses.
The Foursquare location-based social app has 40 million users worldwide, and more than 1.5 million businesses use its Foursquare Merchant Platform. It streams hundreds of millions of application logs each day. The company relies on analytics to report its daily usage, evaluate new offerings and perform long-term trend analysis. But its database system required a lot of staff time to keep it running and came with high annual licensing costs, says Jon Hoffman, a Foursquare software engineer.
“We needed a solution that freed us from licensing fees and let us use our staff time more strategically,” he says.
The company turned to Redshift and BI tool Tableau.
“With Amazon Redshift and Tableau, anyone in the company can set up any queries they like — from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts had in different areas,” Hoffman says. “It’s very flexible.”
He also notes that the company is saving tens of thousands of dollars in licensing costs alone.
Based in Chicago, VivaKi specializes in developing services, tools and next-generation digital platforms. Its parent, Publicis Groupe, has nearly 50 agencies in its worldwide network that create digital advertising campaigns. VivaKi is responsible for working with ad servers, publishers and data management platforms (DMPs) to pull data and provide daily campaign effectiveness summaries to the agencies.
“Our customers have specific requests around audience groups and other targets that ad servers cannot provide,” says Zhong Hong, vice president of Infrastructure and Operations at VivaKi. “Furthermore, gathering data to identify how effective campaigns are by geographic location is critical and this data isn’t available in the reports.”
VivaKi built a new DMP called SkySkraper to suit its needs and uses Redshift to pull data into the DMP.
“We needed to load six months’ worth of data, about 10 TB of data, for a campaign,” says Hong. “That type of load would have taken about 20 days with our previous solution. By using Amazon Redshift, it only took six hours to load the data.”
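As a quick back-of-the-envelope check on those figures, the snippet below works out the implied speedup and load rate. It uses only the rounded numbers Hong quotes ("about 20 days," "six hours," "about 10 TB"), so the results are approximate.

```python
# Rough arithmetic on the load times quoted above.
previous_load_hours = 20 * 24   # "about 20 days" with the previous solution
redshift_load_hours = 6         # "six hours" with Redshift
data_tb = 10                    # "about 10 TB" of campaign data

# How many times faster the load completed.
speedup = previous_load_hours / redshift_load_hours

# Average load rate implied by the Redshift figure.
tb_per_hour = data_tb / redshift_load_hours

print(f"speedup: {speedup:.0f}x")
print(f"load rate: {tb_per_hour:.2f} TB/hour")
```

On those numbers, the load ran roughly 80 times faster, at a little under 2 TB per hour.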
When media site Upworthy started getting serious about capturing and storing its data, it ran head-on into a problem, says Daniel Mintz, director of business intelligence. It had built the site on MongoDB. The NoSQL database was an excellent choice for flexibility in content management, but it lacks support for JOINs.
“So even a relatively simple question like ‘How many posts about gender diversity did our freelancers curate in April?’ isn’t easily answerable, because it would require joining post data with curator data,” Mintz says.
Upworthy was able to use a tool called MoSQL to map its Mongo collections to PostgreSQL tables, but it was clear its PostgreSQL database wouldn’t be able to handle the volume of data Upworthy wanted to tackle. The solution was Redshift.
“From the analysts’ perspective, Redshift was just a slightly modified version of PostgreSQL,” Mintz says. “Except it was blazingly fast, even when running complex queries with several Common Table Expressions and a bunch of JOINs on many millions of rows. And from the engineers’ perspective, Redshift was great because maintenance was handled by AWS, it was easily scalable way beyond where we’re at now, and reserved instances brought our cost per TB down to about $2,000 per year.”
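The kind of query Mintz describes, a Common Table Expression feeding a JOIN between post data and curator data, might look something like the sketch below. Everything here is an assumption made for illustration: the table names, columns, sample rows and the year are invented, and Python's built-in SQLite stands in for Redshift so the example runs anywhere. Upworthy's actual schema is not described in the source.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a SQL data warehouse.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and sample data.
conn.executescript("""
    CREATE TABLE curators (
        id INTEGER PRIMARY KEY,
        name TEXT,
        is_freelancer INTEGER
    );
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        curator_id INTEGER,
        topic TEXT,
        published_on TEXT
    );
    INSERT INTO curators VALUES (1, 'Ana', 1), (2, 'Ben', 0), (3, 'Cy', 1);
    INSERT INTO posts VALUES
        (10, 1, 'gender diversity', '2014-04-03'),
        (11, 1, 'climate',          '2014-04-09'),
        (12, 2, 'gender diversity', '2014-04-15'),
        (13, 3, 'gender diversity', '2014-04-21'),
        (14, 3, 'gender diversity', '2014-05-02');
""")

# The CTE narrows posts to the topic and month of interest;
# the JOIN then brings in curator data to keep only freelancers.
query = """
    WITH april_posts AS (
        SELECT id, curator_id
        FROM posts
        WHERE topic = 'gender diversity'
          AND published_on >= '2014-04-01'
          AND published_on <  '2014-05-01'
    )
    SELECT COUNT(*)
    FROM april_posts AS p
    JOIN curators AS c ON c.id = p.curator_id
    WHERE c.is_freelancer = 1
"""

freelancer_post_count = conn.execute(query).fetchone()[0]
print(freelancer_post_count)
```

Without JOIN support, the same question would require fetching both collections and matching them up in application code.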
Z2 is a creator of multiplayer, free-to-play mobile games. The company estimates that more than 55 million people have played its games on smartphones and tablets, generating roughly 3 billion gameplay sessions. The company has created custom software and meters to monitor operational data in real time.
“I need to know that databases and code are functioning correctly and that customers are having a good time,” says Markus Schweig, director of Live Operations.
To capture data for business analytics, Z2 uses an ETL pipeline and Amazon Simple Queue Service (SQS) to process hundreds of TB of event data and store it in Redshift.
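The source does not describe how Z2's pipeline is built, but the general pattern of queueing events and loading them in batches can be sketched as follows. This is a minimal illustration under loud assumptions: Python's standard-library `queue.Queue` stands in for SQS, the event fields are invented, and the function names are hypothetical. A real pipeline would typically stage each batch as a file and bulk-load it into the warehouse, for example with Redshift's COPY command.

```python
import json
import queue


def drain_batch(event_queue, max_batch_size):
    """Pull up to max_batch_size events off the queue without blocking."""
    batch = []
    while len(batch) < max_batch_size:
        try:
            batch.append(event_queue.get_nowait())
        except queue.Empty:
            break  # queue is empty; return whatever was collected
    return batch


def to_json_lines(batch):
    """Serialize a batch as newline-delimited JSON, one event per line."""
    return "\n".join(json.dumps(event, sort_keys=True) for event in batch)


# Stand-in for the message queue, filled with made-up gameplay events.
events = queue.Queue()
for session_id in range(5):
    events.put({"session_id": session_id, "event": "level_complete"})

# Each batch would become one staged file for a bulk load.
first_batch = drain_batch(events, max_batch_size=3)
second_batch = drain_batch(events, max_batch_size=3)
payload = to_json_lines(first_batch)
print(payload)
```

Batching matters because a warehouse like Redshift is generally far more efficient at bulk loads than at many single-row inserts.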
“We could do this before, but using Amazon Redshift in combination with a newly designed pipeline allowed us to reduce our analytics-related AWS hosting cost significantly,” Schweig says.