When media site Upworthy started getting serious about capturing and storing its data, it ran head-on into a problem, says Daniel Mintz, director of business intelligence. It had built the site on MongoDB. The NoSQL database was an excellent choice for flexibility in content management, but it lacks support for JOINs.
“So even a relatively simple question like ‘How many posts about gender diversity did our freelancers curate in April?’ isn’t easily answerable, because it would require joining post data with curator data,” Mintz says.
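To make the question concrete: once post and curator data live in relational tables, it reduces to a single JOIN. The sketch below uses Python's built-in sqlite3 module as a stand-in for a relational database. The table names, column names, sample rows and the year are all hypothetical, chosen for illustration; they are not Upworthy's actual schema or data.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE curators (
        id INTEGER PRIMARY KEY,
        name TEXT,
        is_freelancer INTEGER
    );
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        curator_id INTEGER REFERENCES curators(id),
        topic TEXT,
        published_on TEXT
    );
    INSERT INTO curators VALUES (1, 'Staff A', 0), (2, 'Freelancer B', 1);
    INSERT INTO posts VALUES
        (1, 1, 'gender diversity', '2014-04-03'),
        (2, 2, 'gender diversity', '2014-04-10'),
        (3, 2, 'gender diversity', '2014-05-01'),
        (4, 2, 'climate',          '2014-04-15');
""")


def freelancer_posts(conn, topic, start, end):
    """Count posts on `topic` curated by freelancers in [start, end)."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM posts AS p
        JOIN curators AS c ON c.id = p.curator_id  -- the JOIN the question needs
        WHERE c.is_freelancer = 1
          AND p.topic = ?
          AND p.published_on >= ?
          AND p.published_on < ?
        """,
        (topic, start, end),
    ).fetchone()
    return row[0]


# Posts about gender diversity curated by freelancers in April (assumed year).
print(freelancer_posts(conn, "gender diversity", "2014-04-01", "2014-05-01"))
```

The filter on `is_freelancer` lives in one table and the filter on `topic` in the other, which is why the count cannot be computed without combining the two.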
Upworthy was able to use a tool called MoSQL to map its Mongo collections to PostgreSQL tables, but it was clear its PostgreSQL database wouldn’t be able to handle the volume of data Upworthy wanted to tackle. The solution was Redshift.
“From the analysts’ perspective, Redshift was just a slightly modified version of PostgreSQL,” Mintz says. “Except it was blazingly fast, even when running complex queries with several Common Table Expressions and a bunch of JOINs on many millions of rows. And from the engineers’ perspective, Redshift was great because maintenance was handled by AWS, it was easily scalable way beyond where we’re at now, and reserved instances brought our cost per TB down to about $2,000 per year.”