OpenPipeline Seeks to Ease Document Prep for Search

By Chris Kanaracus
Wed, April 30, 2008

IDG News Service —

Enterprise search vendor Dieselpoint is behind a new open-source project centering on a document "pipeline" -- or as the Chicago company's CEO, Chris Cleveland, puts it, "all the boring stuff you need to make enterprise search work."

Enterprise search implementations often cover an array of document sources and components; pipelines allow companies to standardize the processing of information before it gets pushed into a search-engine indexer.

"We're connecting the crawler companies to the text analytic companies to the search engine companies," Cleveland said.

Dieselpoint was having trouble integrating its own pipeline with third-party document analyzers and content connectors, and has open-sourced it as a basis for the project, which is dubbed OpenPipeline.

Its Web site is scheduled to open to the public on Monday, and a fully functional version of the software will be downloadable under the Apache 2.0 license. It is available under a commercial license as well, according to the site.

The software features a point-and-click user interface and ships with several connectors, including Web and SQL crawlers. It also supports commercial connectors for products such as SharePoint and Exchange, as well as various portals.

Dieselpoint is pursuing the project both to make bigger, more complex implementations easier and in hopes that it will draw some customers to its search engine.

"The single biggest barrier to adoption of enterprise search is doing integration," Cleveland said. "Of course, it means enormous consulting engagements, so it's a source of revenue for the industry, but it's a deterrent."

While major search vendors have pipelines, they are "all proprietary and all closed," he said.

A number of other vendors and consultants have signed on to the effort's advisory board. They include Alias-i, Applied Relevance and Raritan Technologies. Cleveland is anticipating more companies will join soon.

Conceptually, an open-source pipeline makes sense for the industry on the whole "because each component is worthless on its own," he suggested.

Guy Creese, an analyst with Burton Group, compared OpenPipeline to an existing project.

"IBM attempted to fix this issue with UIMA [Unstructured Information Management Architecture], its framework for letting multiple vendors work together on a text analytics pipeline. However, UIMA has not done especially well in the market," he said via e-mail. "It's unclear whether that's due to the complexity of UIMA or the fact that the market isn't quite there yet (I believe it's the latter)."

"In short, OpenPipeline is an interesting, open-source alternative to UIMA. However, its appeal will still remain small in the market, as many enterprises aren't at the point where they need to mix and match text analytics modules," he added.

But Cleveland countered that even basic aspects of an enterprise search implementation can involve a lot of "drudgery," which OpenPipeline can help alleviate: "It's the simple stuff. 'Can I get [data] out of the system, add security to it and send it to the search engine?'"
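The three steps Cleveland describes (pull data out of a source, attach security information, hand the result to the search engine) can be illustrated with a minimal sketch. This is not OpenPipeline's actual API; the `Document` class, the stage functions, the "everyone" access rule and the list standing in for an indexer are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Document:
    """Hypothetical unit of content passed between pipeline stages."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


# A stage takes a document and returns the (possibly modified) document.
Stage = Callable[[Document], Document]


def normalize_whitespace(doc: Document) -> Document:
    # Collapse runs of spaces and newlines so every source looks the same.
    doc.text = " ".join(doc.text.split())
    return doc


def add_security(doc: Document) -> Document:
    # Assumed rule for illustration: documents without an access list
    # are readable by everyone.
    doc.metadata.setdefault("acl", ["everyone"])
    return doc


def run_pipeline(docs: Iterable[Document],
                 stages: List[Stage],
                 index: Callable[[Document], None]) -> None:
    # Apply each stage in order, then push the result to the indexer.
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
        index(doc)


# A plain list stands in for the search-engine indexer.
indexed: List[Document] = []
run_pipeline(
    [Document("1", "  Quarterly   report \n draft ")],
    [normalize_whitespace, add_security],
    indexed.append,
)
```

Because every stage shares one signature, a crawler, a text-analytics module or an indexer from different vendors could in principle be swapped in without changing the loop, which is the kind of standardization the project is aiming for.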
