Could Google's 'dataspaces' Reshape Search?

By Chris Kanaracus
Mon, May 19, 2008

IDG News Service —

Google, the company most identified with Web search, is not the leading player behind the firewall. It says about 9,000 customers use its enterprise search products, while independent search vendor Autonomy says it has 17,000.

Still, in his recent report "Beyond Search," for Gilbane Group, analyst Stephen Arnold portrays the company as a quietly humming engine of activity, with work under way that could "leapfrog" the current generation of search technology.

Arnold, who closely tracks Google's patent applications, is especially interested in a concept called "dataspaces," which stems from the work of Google researcher Alon Halevy. Dataspaces, in Arnold's view, take "content processing into a new dimension."

"A dataspace should contain all of the information relevant to a particular organization regardless of its format and location, and model a rich collection of relationships between data repositories," Halevy wrote along with two co-authors in a December 2005 paper. "Hence, we model a dataspace as a set of participants and relationships."

"The participants in a dataspace are the individual data sources: they can be relational databases, XML repositories, text databases, Web services and software packages," the paper states at another point. "A dataspace should be able to model any kind of relationship between two (or more) participants."
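The paper's "participants and relationships" model can be made concrete with a minimal Python sketch. The class names, the `kind` labels and the `related_to` lookup are hypothetical illustrations, not Google's or Halevy's design. The only ideas taken from the paper are that a dataspace is a set of data sources of any format, plus relationships that may link two or more of them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Participant:
    """One data source in the dataspace (fields are hypothetical)."""
    name: str
    kind: str  # e.g. "relational", "xml", "text", "web-service", "package"


@dataclass(frozen=True)
class Relationship:
    """A labeled relationship among two or more participants."""
    label: str
    members: tuple  # participant names


@dataclass
class Dataspace:
    """A dataspace modeled as a set of participants and relationships."""
    participants: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_participant(self, p: Participant) -> None:
        self.participants[p.name] = p

    def relate(self, label: str, *names: str) -> Relationship:
        # The paper allows relationships between "two (or more)" participants.
        if len(names) < 2:
            raise ValueError("a relationship needs at least two participants")
        missing = [n for n in names if n not in self.participants]
        if missing:
            raise KeyError(f"unknown participants: {missing}")
        rel = Relationship(label, tuple(names))
        self.relationships.append(rel)
        return rel

    def related_to(self, name: str) -> set:
        """Names of all participants that share a relationship with `name`."""
        out = set()
        for rel in self.relationships:
            if name in rel.members:
                out.update(m for m in rel.members if m != name)
        return out


ds = Dataspace()
ds.add_participant(Participant("hr_db", "relational"))
ds.add_participant(Participant("wiki", "text"))
ds.add_participant(Participant("crm_api", "web-service"))
ds.relate("describes-same-employees", "hr_db", "wiki")
ds.relate("replica-of", "crm_api", "hr_db")
print(sorted(ds.related_to("hr_db")))  # ['crm_api', 'wiki']
```

Keeping relationships as a separate list, rather than folding them into each source, mirrors the paper's point that a dataspace should be able to model any relationship across sources without first forcing them into one schema.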

While other vendors are pursuing similar goals, they cannot compete on scale with Google, according to Arnold.

"Even the most robust content processing systems have not been engineered to handle Google-level content flows. The implication of scale means Google is operating largely without competition from the companies profiled in this study," he wrote in "Beyond Search."

Meanwhile, Google indeed appears to have ambitious search and content-processing projects in the patent pipeline that echo the dataspaces concept.

One in particular, U.S. patent application No. 20070198481, "Automatic Object Reference Identification and Linking in a Browseable Fact Repository," describes an invention that crunches together a wide range of data on an individual or topic into a kind of dossier.

Google declined to comment on patent applications or make Halevy available for an interview.

"We file patent applications on a variety of ideas that our employees come up with," a company spokesman said via e-mail. "Some of those ideas later mature into real products or services, some don't."

But a company executive was willing to describe Google's approach to enterprise search in general terms.

"Inside an enterprise, and maybe unlike the Internet, you can know a lot about a user," such as who they report to, said Matthew Glotzbach, director of product management for Google's enterprise division. "There's a lot of empirical information you can derive. All of that can be used to create a very, very rich profile about the user, which can then be used to create a really rich search experience."

Do not expect Google to suddenly bring a game-changing product to market, according to Glotzbach.

"The model is not these kind of big-bang approaches where we work for multiple years and then roll something out. In terms of what we do in enterprise search, you'll see a constant flow, as opposed to one sort of big bang -- here's a whole new thing," he said.
