Big data technologies have evolved at a torrid pace that shows every sign of continuing in 2015. MapR CEO and co-founder John Schroeder predicts five major developments will dominate big data technology in the new year.
By Thor Olavsrud
In just a few short years, big data technologies have gone from the realm of hype to one of the core disruptors of the new digital age. 2014 saw big data initiatives inside the enterprise increasingly move from test to production. In 2015, big data will push further into the enterprise with even more use cases — specifically real-time use cases — says John Schroeder, CEO and co-founder of Hadoop distribution specialist MapR.
“This is the year that organizations move big data deployments beyond initial batch implementations and into real time,” Schroeder says. “This will be driven by the realization of the huge strides that existing industry leaders and soon-to-be new leaders have already made by incorporating new big data platforms into their analytics with ‘in-flight’ data to impact business as it happens.”
Schroeder says five major developments will dominate 2015.
1. Data Agility Emerges as a Top Focus
Data agility has been one of the big drivers behind the development of big data technologies, as the processes around legacy databases and data warehouses have proven too slow and inflexible for many business needs. In 2015, Schroeder says data agility will become even more central as organizations shift their focus from simply capturing and managing data to actively using it.
“Legacy databases and data warehouses are so expensive that DBA resources are required to flatten, summarize and fully structure the data,” he says. “Upfront DBA costs delay access to new data sources and the rigid structure is very difficult to alter over time. The net result is that legacy databases are not agile enough to meet the needs of most organizations today.”
“Initial big data projects focused on the storage of target data sources,” he adds. “Rather than focus on how much data is being managed, organizations will move their focus to measuring data agility. How does the ability to process and analyze data impact operations? How quickly can they adjust and respond to changes in customer preferences, market conditions, competitive actions and the status of operations? These questions will direct the investment and scope of big data projects in 2015.”
2. Organizations Move from Data Lakes to Processing Data Platforms
In some ways, 2014 was the year of the data lake (or data hub), an object-based storage repository that stores raw data in its native format — whether structured, unstructured or semi-structured — until it’s ready for use. Data lakes have a strong value proposition in that they represent a scalable infrastructure that’s economically attractive (with a reduced per-terabyte cost) and extremely agile.
Schroeder says that the data lake will continue to evolve in 2015 with the capability to bring multiple compute and execution engines to the data lake to process the data in place. That’s not only more efficient; it also creates a single point of governance and a single point of security.
“In 2015, data lakes will evolve as organizations move from batch to real-time processing and integrate file-based, Hadoop and database engines into their large-scale processing platforms,” he says. “In other words, it’s not about large-scale storage in a data lake to support bigger queries and reports; the big trend in 2015 will be around the continuous access and processing of events and data in real time to gain constant awareness and take immediate action.”
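To illustrate the contrast Schroeder draws between batch reporting and continuous event processing, here is a minimal Python sketch. It is not taken from MapR or any Hadoop component; the event data, the `ALERT_THRESHOLD` value and both function names are hypothetical, chosen only to show that a streaming-style loop can act on an event the moment it arrives while a batch job summarizes after the fact.

```python
from collections import defaultdict

# Hypothetical event stream: (sensor_id, temperature reading).
EVENTS = [("s1", 70.0), ("s2", 71.5), ("s1", 98.2), ("s2", 72.0), ("s1", 99.1)]
ALERT_THRESHOLD = 95.0  # assumed threshold, for illustration only


def batch_report(events):
    """Batch style: wait for the full data set, then summarize it once."""
    readings = defaultdict(list)
    for sensor, value in events:
        readings[sensor].append(value)
    return {sensor: max(values) for sensor, values in readings.items()}


def process_stream(events):
    """Streaming style: update state per event and react right away."""
    running_max = {}
    alerts = []
    for position, (sensor, value) in enumerate(events):
        running_max[sensor] = max(value, running_max.get(sensor, value))
        if value > ALERT_THRESHOLD:
            # The alert is recorded at the event that caused it,
            # not when a later report is generated.
            alerts.append((position, sensor, value))
    return running_max, alerts


report = batch_report(EVENTS)
running_max, alerts = process_stream(EVENTS)
```

Both paths arrive at the same per-sensor maximums, but only the streaming loop flags the first hot reading while later events are still arriving, which is the "constant awareness" the quote describes.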
3. Self-Service Big Data Goes Mainstream
Advances in big data tools and services mean that 2015 will be the year that IT can ease away from being a bottleneck to the access of data by business users and data scientists, Schroeder says.
“In 2015, IT will embrace self-service big data to allow business users self-service to big data,” he says. “Self-service empowers developers, data scientists and data analysts to conduct data exploration directly.”
“Previously, IT would be required to establish centralized data structures,” he adds. “This is a time-consuming and expensive step. Hadoop has made the enterprise comfortable with structure-on-read for some use cases. Advanced organizations will move to data bindings on execution and away from a central structure to fulfill ongoing requirements. This self-service speeds organizations in their ability to leverage new data sources and respond to opportunities and threats.”
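The structure-on-read idea can be made concrete with a minimal Python sketch. This is an illustration under stated assumptions, not how Hadoop or any MapR product implements it: the raw records, the field names and the `read_with_schema` helper are all hypothetical. The point is only that raw data is stored untouched, and each consumer binds its own structure at the moment it reads.

```python
import json

# Raw events stored as-is (hypothetical sample data); no central schema
# was imposed when they were written.
RAW_LINES = [
    '{"user": "a1", "amount": "19.99", "country": "US"}',
    '{"user": "b2", "amount": "5.00"}',
    '{"user": "c3", "amount": "7.50", "country": "DE", "coupon": "X"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time (hypothetical helper).

    `schema` maps field name -> (converter, default). Fields outside the
    schema are ignored; missing fields receive the default value.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else default
            for field, (convert, default) in schema.items()
        }


# Two consumers bind different structures to the same raw data.
finance_schema = {"user": (str, None), "amount": (float, 0.0)}
marketing_schema = {"user": (str, None), "country": (str, "unknown")}

finance_rows = list(read_with_schema(RAW_LINES, finance_schema))
marketing_rows = list(read_with_schema(RAW_LINES, marketing_schema))
total_amount = sum(row["amount"] for row in finance_rows)
```

Neither consumer had to wait for a central team to define a shared table: adding a third view of the same records would only require another schema dictionary.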
4. Hadoop Vendor Consolidation: New Business Models Evolve
In early 2013, Intel made a splash with the introduction of its own Hadoop distribution, saying that it would differentiate itself by taking a ground-up approach in which Hadoop was baked directly into its silicon. But just a year later, Intel ditched its distribution and threw its weight behind Hadoop distribution vendor Cloudera instead.
At the time, Intel noted that customers were sitting on the sidelines to see how the Hadoop market would shake out. The number of Hadoop options was muddying the waters. Schroeder believes Hadoop vendor consolidation will continue in 2015 as the also-rans discontinue their distributions and focus elsewhere in the stack.
“We’re now 20 years into open source software (OSS) adoption that has provided tremendous value to the market,” Schroeder says. “Technologies mature in phases. The technology lifecycle begins with innovation and the creation of highly differentiated products and ends when products are eventually commoditized. [Edgar F.] Codd created the relational database concept in 1969 with innovation leading to the Oracle IPO in 1986 and commoditization beginning with the first MySQL release in 1995. So historically, database platform technology maturity took 26 years of innovation prior to seeing any commoditization.”
“Hadoop is early in the technology maturity lifecycle with only 10 years passing since the seminal MapReduce white papers were published by Google,” he adds. “Hadoop adoption globally and at scale is far beyond any other data platform just 10 years after initial concept. Hadoop is in the innovation phase, so vendors mistakenly adopting ‘Red Hat for Hadoop’ strategies are already exiting the market, most notably Intel and soon EMC Pivotal.”
Schroeder believes 2015 will see the evolution of a new, more nuanced model of OSS that combines deep innovation with community development.
“The open source community is paramount for establishing standards and consensus,” he says. “Competition is the accelerant transforming Hadoop from what started as a batch analytics processor to a full-featured data platform.”
5. Enterprise Architects Separate the Big Hype from Big Data
2015 will see enterprise architects take center stage as their improving understanding of the Hadoop technology stack leads to a better defined and more sophisticated statement of requirements for big data applications, including elements like high availability and business continuity.
“As organizations move quickly beyond experimentation to serious adoption in the data center, enterprise architects move front and center into the big data adoption path,” Schroeder says. “IT leaders will be vital in determining the underlying architectures required to meet SLAs, deliver high availability, business continuity and meet mission-critical needs. In 2014 the booming ecosystem around Hadoop was celebrated with a proliferation of applications, tools and components. In 2015 the market will concentrate on the differences across platforms and the architecture required to integrate Hadoop into the data center and deliver business results.”