by Thor Olavsrud

Teradata releases data lake platform to open source

Mar 08, 2017
Analytics, Big Data, Data Mining

The Kylo data lake management software platform, available via the Apache 2.0 license, aims to help organizations address common challenges in data lake implementation.

Teradata today released its data lake management software platform to the open source community. The project aims to help organizations address common challenges in data lake implementation, including skill shortages for engineers and administrators, learning and implementing governance best practices and driving data lake adoption beyond engineers.

Teradata is offering the new open source Kylo project under the Apache 2.0 license, and plans to offer services and support for the platform.

Data lakes built on Apache

Kylo evolved from code developed by Teradata company Think Big Analytics over eight years of engagements with Fortune 1000 customers on more than 150 data lake projects. It was built using open source capabilities including Apache Hadoop, Apache Spark and Apache NiFi.

“Open source software has an appeal to users seeking independence, cooperative learning, experimentation and flexibility for customized deployments,” Rick Farnell, president of Think Big, said in a statement today.

“Our contribution is all about helping companies build a scalable data lake foundation that can continuously evolve with their business, technology data and analytical goals. We are removing impediments to use data to solve complex business problems and encouraging analytical users to contribute to the growing Kylo community. Going forward, our primary focus as a company is to help our customers create business value through analytics, rather than commodity capabilities. Kylo, along with our Teradata Everywhere approach to software and services, is a great example of our innovative strategy for the future.”

Teradata says data lakes take too long to build: over the average six- to 12-month build cycle, users find that use cases often become out of date. In addition, while the software costs associated with data lakes may be lower, Teradata says engineering costs can mount quickly. And when data lakes are successfully created, users often find them difficult to explore.

Great data value and productivity

Teradata says Kylo will help organizations address these challenges because it integrates and simplifies pipeline development and common data management tasks. As a result, the company says, organizations that use Kylo achieve faster time-to-value and greater user adoption and developer productivity. Teradata says Kylo doesn’t require coding, and it offers an intuitive user interface that enables self-service data ingest. Meanwhile, reusable templates help increase productivity.

One major telecommunications company recently implemented Kylo after a large team of 30 data engineers spent months hand-coding data ingestion pipelines. With Kylo, a single individual was able to ingest, cleanse, profile and validate the same data in less than a week, Teradata says.

The Kylo software, documentation and tutorials are now available via the Kylo project website and on GitHub. Think Big is offering optional services around Kylo, including the following:

  • Kylo support
  • Kylo implementation services
  • Kylo training
  • Kylo managed services