Analytics in the Cloud: 5 Lessons Learned

In this opinion column, consider some lessons learned from IT leaders who are crunching large data sets in the cloud using Greenplum's technology.

By Merv Adrian

Mon, June 15, 2009 (CIO)

Every company, from the smallest start-up to the largest firm, needs to be agile in today's market to respond to changing dynamics and new competition. But these days it's often the smaller companies that are better positioned to adapt: as the barriers to entry have fallen, emerging companies now have access to data streams, and techniques for analyzing them, that used to be the exclusive province of the largest companies. At the same time, the CIOs of larger organizations now find themselves as much bound by their legacy systems and data as they are empowered by them. The costs of managing these legacy systems are getting in the way: too much of the budget goes to maintenance, and not enough is left over for new development and technologies.

Nowhere is this dynamic more apparent than in Business Intelligence (BI). As BI once again rises to the top of priority and wish lists, CIOs are struggling to meet internal demand within their budgets while still finding time for innovation. The costs of proprietary servers and storage devices, and of the space and energy to run them, are off the charts and highly visible to every CFO, CTO and procurement professional. Proliferating copies of data into multiple one-off analytical systems, seemingly one for every question to be asked, only adds to the costs, and even new "data appliances" can cost tens of millions of dollars to scale up as requirements grow.

Clearly, new approaches are needed to cost-effectively scale BI systems while meeting the demand for information on the front lines. Here are some examples of how forward-looking organizations are doing large-scale analytics in the cloud to break the logjam. (For more background on the cloud technology being discussed here, see CIO.com's recent article, "Greenplum Spins 'Enterprise Data Cloud' Vision".)

1. Hold the line with commodity hardware.

Most new analytic data engines run on inexpensive commodity hardware, transforming IT cost models and conventional wisdom about the costs of new systems. As Mark Dunlap, a consultant with Evergreen Technologies and a veteran of massive data warehouse projects at Amazon and Fox Interactive, puts it, "If you're using proprietary hardware, you're in a losing battle. Sooner or later, whatever company's developing that technology will not be able to keep up. We've seen it over and over and over again—they won't keep pace with what commodity systems are doing."

2. Buy capacity when you need it, not according to a closed appliance size.

Clint Johnson, VP of Business Intelligence at Zions Bancorporation, says his team is avoiding locked-in purchase models as it tackles massive data challenges. "We like the ability to add hardware easily, incrementally," says Johnson. "Specialized appliances we looked at scaled in very specific size increments." Each of those increments is a large purchase that may substantially exceed near-term needs, and payment is based on total capacity, not on usage.
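To make Johnson's point concrete, here is a small Python sketch comparing how much capacity sits idle when it can be bought only in large fixed steps versus one commodity node at a time. All the numbers (the demand curve, the 50 TB appliance step, the 4 TB node) are hypothetical, chosen only for illustration; they do not come from Zions or from any vendor's price list.

```python
def capacity_owned(demand: list[int], step: int) -> list[int]:
    """Capacity owned in each period when capacity can only be bought in
    whole multiples of `step` and is never sold back (hypothetical model)."""
    owned, history = 0, []
    for need in demand:
        required = -(-need // step) * step  # round `need` up to a multiple of `step`
        owned = max(owned, required)        # capacity only ever grows
        history.append(owned)
    return history


def idle_capacity(demand: list[int], step: int) -> int:
    """Total capacity paid for but unused, summed over all periods."""
    return sum(c - d for c, d in zip(capacity_owned(demand, step), demand))


# Hypothetical demand, in terabytes, over six quarters.
demand = [12, 18, 25, 31, 40, 46]

appliance = capacity_owned(demand, 50)  # hypothetical 50 TB appliance increment
commodity = capacity_owned(demand, 4)   # hypothetical 4 TB commodity node

print("appliance:", appliance, "idle TB-quarters:", idle_capacity(demand, 50))
print("commodity:", commodity, "idle TB-quarters:", idle_capacity(demand, 4))
```

Under these assumptions the appliance path leaves 128 idle TB-quarters against 8 for the node-at-a-time path, and because payment tracks total capacity, that idle capacity is paid for up front. A real comparison would also weigh per-unit prices, which this sketch deliberately ignores.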

3. Unused server power is a priceless resource -- use it.

Capacity utilization on distributed servers used for BI applications or data marts is often 20 percent or below, leaving substantial system power unused. Newer software can harness that idle power through flexible provisioning. Brian Dolan, Director of Research Analytics at the Fox Audience Network, says, "With my Greenplum [cloud-based] database, I get to share 40 nodes with the production system. I use them when I need them, and then I give them back." Building "sandboxes" as needed, by mapping servers (or cores) and data stores into the configuration a task requires, addresses the task at hand efficiently. A well-designed server pool, with the right software for flexible provisioning, becomes your internal "cloud."
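The borrow-and-return pattern Dolan describes can be modeled as a simple lease on a shared pool of nodes. The Python sketch below is a hypothetical illustration: `NodePool`, `lease`, and `release` are invented names, not Greenplum's provisioning interface, and the 8-node production load is an assumed figure (20 percent of 40 nodes, matching the utilization level cited above).

```python
from dataclasses import dataclass, field


@dataclass
class NodePool:
    """Hypothetical shared server pool: owners lease nodes and give them back."""
    size: int
    free: set[int] = field(init=False)
    leases: dict[str, set[int]] = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        # Every node starts out idle.
        self.free = set(range(self.size))

    def lease(self, owner: str, count: int) -> set[int]:
        """Hand `count` idle nodes to `owner`, or fail if too few are free."""
        if owner in self.leases:
            raise ValueError(f"{owner} already holds a lease")
        if count > len(self.free):
            raise RuntimeError(f"{count} nodes requested, only {len(self.free)} free")
        nodes = set(sorted(self.free)[:count])
        self.free -= nodes
        self.leases[owner] = nodes
        return nodes

    def release(self, owner: str) -> None:
        """Return all of `owner`'s nodes to the idle set."""
        self.free |= self.leases.pop(owner)

    def utilization(self) -> float:
        """Fraction of nodes currently leased."""
        return (self.size - len(self.free)) / self.size


pool = NodePool(40)
pool.lease("production", 8)          # assumed steady production load
before = pool.utilization()          # 0.2
pool.lease("analytics-sandbox", 30)  # borrow idle nodes for an analysis sandbox
peak = pool.utilization()            # 0.95
pool.release("analytics-sandbox")    # give them back when the work is done
after = pool.utilization()           # 0.2
```

Because the sandbox's nodes return to the idle set on release, the same hardware serves production and exploratory work at different times instead of each group buying its own. A production-grade scheduler would also need priorities and preemption so that analytics never starves production; the sketch omits both.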

4. Keep asking, keep changing -- and keep the data.

New practices in BI echo the "agile" methodologies programmers have found effective: complex techniques, statistical analyses, and new analytical models emerge and disappear. For example, Ryan Hawk, T-Mobile's Director of Information Management, and his analytics team needed to build models of telecom usage (propensity to churn, revenue generation, and more) but were constrained because "data is a business case: we have to decide what we can afford to store on our MPP systems," Hawk says. "The hardest thing is having to purge data every 60 days. You can't do much trending."

By shifting its data warehouse onto an agile, virtualized infrastructure, T-Mobile now has flexible access to more data and can analyze and rethink at will. Like Fox, the team can build analytical "sandboxes" on demand to discover new questions: grab data to explore those questions, tear the sandbox down, and do it again. Data is the other element of the "cloud": keep it where you need it, and use it as appropriate.

5. Run programs "close to the data."

Dolan's team at Fox might work with two weeks' worth of data: 100 billion lines, tens of terabytes. Exporting, transforming, moving and distributing that data in chunks (extract, transform and load, or ETL, style), constrained by bandwidth and system load, used to take three to four days. Rebuilding all the joins, indexes and other structures within the data would consume another day or two. With in-database analytics, Fox can instead run programs directly in the database, eliminating the bottlenecks between Dolan's team and business insights. According to Dolan, "Inside our Greenplum database, setting up two weeks' worth of data takes us about 20 minutes."
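The difference between exporting data and running programs "close to the data" shows up even at toy scale. The Python sketch below uses the standard library's `sqlite3` module purely as a stand-in for an analytic database; it is not Greenplum and shows nothing about MPP parallelism. The `impressions` table and its contents are hypothetical. The first function pulls every row out to the client and aggregates there (the export-then-process pattern), while the second pushes the `GROUP BY` into the engine so only summary rows leave it.

```python
import sqlite3
from collections import defaultdict


def build_demo_db() -> sqlite3.Connection:
    """In-memory table of hypothetical impressions: 14 days x 3 segments x 100 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE impressions (day INTEGER, segment TEXT, clicks INTEGER)")
    rows = [
        (day, segment, (day + i) % 5)
        for day in range(14)
        for segment in ("sports", "news", "music")
        for i in range(100)
    ]
    conn.executemany("INSERT INTO impressions VALUES (?, ?, ?)", rows)
    return conn


def clicks_exported(conn: sqlite3.Connection) -> tuple[dict[str, int], int]:
    """Export style: ship every row to the client, then aggregate there."""
    totals = defaultdict(int)
    rows_moved = 0
    for _day, segment, clicks in conn.execute("SELECT day, segment, clicks FROM impressions"):
        totals[segment] += clicks
        rows_moved += 1
    return dict(totals), rows_moved


def clicks_in_database(conn: sqlite3.Connection) -> tuple[dict[str, int], int]:
    """In-database style: the engine aggregates; one row per segment leaves it."""
    result = conn.execute(
        "SELECT segment, SUM(clicks) FROM impressions GROUP BY segment"
    ).fetchall()
    return dict(result), len(result)


conn = build_demo_db()
exported, moved_out = clicks_exported(conn)
in_db, moved_in = clicks_in_database(conn)
print(f"export: {moved_out} rows moved; in-database: {moved_in} rows moved")
```

Both functions return the same totals, but the export path moves 4,200 rows to get them and the in-database path moves 3. With 100 billion lines, as in Fox's case, avoiding that row-by-row transfer is what matters; an MPP engine would additionally spread the aggregation across nodes, which this single-process sketch does not show.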

Exploiting information has become an imperative for all businesses, and it will only grow more important as data growth accelerates and new streams of valuable information emerge. Supporting the teams that will provide an agile response is a competitive necessity.

Merv Adrian is principal at consulting firm IT Market Strategy.

