You plan to do business in Minnesota. You need to search the state’s records for soil pollution rules. Good luck. Type the words “soil pollution regulations” into the search box on the state’s North Star portal, and you’ll find more than 70,000 results, including a seemingly endless stream of outdated government studies, arcane court cases and links to several agency databases.
But wait, a column on the browser’s left side promises hope. There, you see a list of “related topics,” including a “natural resources>pollution>soil pollution” heading. You click on it and—Eureka!—only 97 results, including connections to specific, relevant agency fact sheets.
Welcome to the new world of search technology, where results are measured not only in terms of information depth, but also by real-world relevance. “We now have the power to build a topic structure, including topics, subtopics and browsable subcategories,” says Eileen Quam, information architect for the Minnesota Office of Technology in St. Paul. “People can walk their way through our information.”
For enterprises with rapidly expanding websites and portals, the need to make unstructured information—that is, data that hasn’t been formatted, tagged or indexed for fast retrieval—more manageable is undeniable. “It’s a well-known statistic in the search business that 80 percent of corporate data is unstructured versus 20 percent that’s contained in databases and ERP systems,” says Tammy Alairys, a partner with business and technology consultancy Accenture. Alairys leads Accenture’s information management practice, which provides consulting services on enterprise content management (ECM), business intelligence (BI), and search and collaboration technologies.
It’s the promise of getting that 80 percent of data into the hands of people who can put it to good use that’s driving a growing interest in search technology. “There’s an awful lot of information sitting out there that is very difficult to extract value from,” says Alairys. For CIOs, search technology provides a fast and efficient way (and sometimes the only way) to locate and retrieve vital intelligence. Yet, as more enterprises turn to search tools, CIOs are discovering that the technology also comes with strings attached, particularly in the areas of usability and security.
With search engines becoming increasingly powerful and useful, search engine companies are discovering the truth behind the old axiom, “Knowledge is power, and power is money.” That’s why the search engine business is suddenly red hot. Google is making serious enterprise moves. And an array of tech vendors—including giants such as IBM, Microsoft and Yahoo, along with smaller players such as Endeca Technologies, Verity, Vivisimo and X1 Technologies—are all hoping to snag at least a snippet of Google’s success with their own search products. Heightened competition has also encouraged the companies to give away software, such as desktop search tools, in an effort to bring in more customers.
But despite the flurry of activity, Forrester Research analyst Laura Ramos believes that the enterprise search market stands at a crossroads, with each path leading to consolidation into either ECM or BI software. “Its role in ECM is to help organize and retrieve the content under management,” observes Ramos. “In BI, it is the logical equivalent to data mining for text.” Ramos feels that “search vendors must pick a path and strike the right deals or partnerships to move forward.” For CIOs, consolidation promises to ease costs and simplify deployment and management by integrating search technology into a larger product.
Blame It on the Web
Why the heightened stature of search? The democratizing power of the Web has encouraged enterprises to place vast amounts of unstructured online information into the hands of employees, customers and others. But this development comes as a double-edged sword. “People find it very easy to create documents and content using computers, but they have a much harder time finding it later,” says Hadley Reynolds, director of research for Delphi Group. (Of course, you could always control it all with a digital asset management system such as the one used by public TV station WGBH. See “From Tapes to Bits.” But such systems are costly, complicated and ill-suited for day-to-day content such as e-mail messages.)
Basic search engines have no trouble finding information, but they aren’t so good at placing results into context. That’s why a simple search for “anthrax” will uncover links about vaccines, homeland security and a heavy metal band. And searchers are increasingly demanding. “If people don’t find value in a search technology, they stop using it pretty quickly,” says Alairys. To attract and retain satisfied searchers and enterprises, vendors are jockeying to prove that their tools go beyond simply matching keywords to links. “It’s not good enough to give someone a search result,” says Laura Ramos, a search technology analyst with Forrester Research. “You want to guide them through the self-service process.”
Follow the Leader
Search engine developers rely on secret mixtures of algorithms, user interfaces and other technologies to create unique tools that generate fast, relevant searches. Verity, the company that supplies the Minnesota Office of Technology’s search software, aims to guide searchers with an add-on feature, the Content Classification Engine (CCE), that can be built into its Ultraseek search platform.
The CCE tightly integrates searching and browsing functions. While viewing topics, users can conduct focused queries by searching within a subject area. “The search engine populates specific themes with pages from all the different agencies,” says Minnesota Office of Technology’s Quam. That means if a user types in “disabled education,” the individual isn’t simply directed to documents pulled from the Department of Education’s website, but to relevant content on all state databases.
XML and Security
XML, the markup language that underpins many Web services, can play a big role in streamlining and improving the accuracy of user searches. XML allows document designers to classify discrete pieces of data, essentially turning unstructured documents into structured documents. The advantage of structured data is that it lets users fine-tune their searches using concepts instead of keywords or phrases. For instance, it is helpful to limit a search for Jaguar to the category transportation>autos>U.K., rather than mammals>cats. And once users see a result, they can use classifiers to browse within a category or subcategory for other results that may be conceptually similar, such as X-Type or XKE, even if they don’t contain the keyword Jaguar.
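The Jaguar example can be made concrete with a minimal sketch. It assumes a hypothetical XML catalog in which each document carries a category attribute. The element names, attribute names and sample titles are invented for illustration and do not come from any vendor's product.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: element names, attribute names and titles are
# illustrative assumptions, not a real vendor schema.
CATALOG_XML = """
<documents>
  <doc category="transportation/autos/U.K.">
    <title>Jaguar XKE buyer's guide</title>
  </doc>
  <doc category="transportation/autos/U.K.">
    <title>X-Type service bulletin</title>
  </doc>
  <doc category="mammals/cats">
    <title>Jaguar habitat in the Amazon basin</title>
  </doc>
</documents>
"""


def load_docs(xml_text):
    """Parse the catalog into (category, title) pairs."""
    root = ET.fromstring(xml_text)
    return [(doc.get("category"), doc.findtext("title")) for doc in root.iter("doc")]


def in_category(doc_category, wanted):
    """True if doc_category equals wanted or is one of its subcategories."""
    return doc_category == wanted or doc_category.startswith(wanted + "/")


def search(docs, keyword, category=None):
    """Keyword match on titles, optionally restricted to one category."""
    return [
        title
        for cat, title in docs
        if keyword.lower() in title.lower()
        and (category is None or in_category(cat, category))
    ]


def browse(docs, category):
    """List everything filed under a category, with or without the keyword."""
    return [title for cat, title in docs if in_category(cat, category)]


docs = load_docs(CATALOG_XML)
keyword_only = search(docs, "jaguar")
autos_only = search(docs, "jaguar", "transportation/autos/U.K.")
related = browse(docs, "transportation/autos")
```

Here the keyword-only search returns both the car guide and the wildlife page, the category-restricted search returns only the car guide, and browsing the autos category also surfaces the X-Type bulletin even though its title never mentions Jaguar.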
But while XML promises to help untangle online information, it’s no magic elixir. That’s because the technology has yet to be widely applied. “Industry must publish more content with XML structure so that engines can then extract it and find it,” says Forrester’s Ramos. Therefore, while an enterprise can ensure better searches of in-house databases with XML-tagged documents, external searching remains problematic.
On the other hand, there is much information—such as trade secrets and employee payroll records—that enterprises don’t want everyone to access. Yet whenever enterprise databases are made public, there’s always the chance that critical information can inadvertently slip out. Fortunately, virtually all enterprise search engine systems have controls that grant or restrict information access based on the user’s name, title, division, location and other key criteria.
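As a rough illustration of how such controls work, the sketch below filters search hits by the searcher's division before anything is returned. The User and Document classes, the division names and the sample documents are all hypothetical. Real products enforce far richer policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    division: str


@dataclass(frozen=True)
class Document:
    title: str
    # Hypothetical rule: an empty set means the document is public.
    allowed_divisions: frozenset = frozenset()


def can_view(user, doc):
    """Grant access if the document is public or lists the user's division."""
    return not doc.allowed_divisions or user.division in doc.allowed_divisions


def secure_search(user, docs, keyword):
    """Return only the matching titles that this user is cleared to see."""
    return [
        doc.title
        for doc in docs
        if keyword.lower() in doc.title.lower() and can_view(user, doc)
    ]


docs = [
    Document("Payroll records 2004", frozenset({"HR"})),
    Document("Payroll calendar for all staff"),
]
hr_hits = secure_search(User("Pat", "HR"), docs, "payroll")
sales_hits = secure_search(User("Sam", "Sales"), docs, "payroll")
```

The HR user sees both documents, while the sales user sees only the public calendar. The design point is that the filter runs inside the search path, so a restricted title never appears in a results list for someone who cannot open it.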
Still, access controls aren’t foolproof, and carelessness and misunderstandings can lead to potentially disastrous security gaps. Google’s desktop search tool, for instance, was found last year to have a serious security flaw (which was quickly patched). Ultimately, it’s up to the CIO to establish firm guidelines on access controls and the type of search software that may be installed on enterprise systems. “There are a lot of [inappropriate] things that can be done if documents are not stored in the right place or if they’re not saved with the appropriate security attributes,” says Alairys.
As snowballing information makes it increasingly difficult to conduct useful searches, a consensus is gradually forming that it may be impossible for any single vendor to provide a search solution that meets the needs of all enterprises and all searchers in all situations. A layered approach, leveraging the strengths of various search tools to build an aggregate solution, may be the best option for enterprises that need to balance the search requirements of employees, customers, business partners and others. “The whole idea of search being an enterprise utility, like water coming out of the tap, is very much still a myth,” says Delphi’s Reynolds.
America Online, for example, bases its search feature on Google but enhances it with Vivisimo technology, which organizes results into categories to make them easier to browse. Many enterprises also utilize multiple search engines, such as law firms that give their staff attorneys access to Google for general searches and Lexis for prowling through legal citations. Yet deploying and managing several engines can lead to greater up-front and maintenance costs, as well as possible technology conflicts, if the engines aren’t thoroughly tested under various real-world conditions. Many enterprises use Web services at the data level and a portal at the user interface level to tie engines together for different user search scenarios.
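A layered setup of this kind can be sketched in a few lines. The example assumes two stand-in engines that return canned hits, merges their results, drops duplicates by URL and then groups what remains by category. The engine functions, URLs and categories are invented for illustration and do not represent how any named product works.

```python
from collections import defaultdict


# Stand-in engines returning canned results (hypothetical data).
def general_engine(query):
    return [
        {"url": "http://example.org/a", "title": "Patent basics", "category": "overview"},
        {"url": "http://example.org/b", "title": "Patent case summary", "category": "case law"},
    ]


def legal_engine(query):
    return [
        {"url": "http://example.org/b", "title": "Patent case summary", "category": "case law"},
        {"url": "http://example.org/c", "title": "Patent appeal ruling", "category": "case law"},
    ]


def federated_search(query, engines):
    """Query each engine in turn and keep the first hit seen for each URL."""
    seen = set()
    merged = []
    for engine in engines:
        for hit in engine(query):
            if hit["url"] not in seen:
                seen.add(hit["url"])
                merged.append(hit)
    return merged


def group_by_category(hits):
    """Organize merged hits into browsable categories."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["category"]].append(hit["title"])
    return dict(groups)


hits = federated_search("patent", [general_engine, legal_engine])
grouped = group_by_category(hits)
```

Even in this toy form, the costs mentioned above are visible: every added engine is another result format to normalize and another source of duplicates to reconcile.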
As search technology improves, the need to combine individual search tools may diminish. Artificial intelligence-based enhancements, such as natural language interfaces, will eventually enable users to ask questions like, “How old will President George W. Bush be on Nov. 21, 2006?” and receive a precise answer instead of links. (The president, incidentally, will be 60 years old on that date.)
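The arithmetic behind that answer is simple once the question has been understood, which is the hard part. As a minimal sketch, assuming the engine has already extracted the president's birth date (July 6, 1946, a fact supplied here rather than taken from the article) and the target date, the calculation looks like this:

```python
from datetime import date


def age_on(birth_date, on_date):
    """Whole years completed between birth_date and on_date."""
    years = on_date.year - birth_date.year
    # Subtract a year if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# Birth date is an outside fact, not stated in the article.
bush_age = age_on(date(1946, 7, 6), date(2006, 11, 21))
print(bush_age)  # 60
```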
Search researchers have been promising “semantic searches” for years. The World Wide Web Consortium has an entire project dedicated to the effort (www.w3.org/2001/sw), but the complex technology has so far eluded the best efforts of the world’s most skilled search experts and remains years away from fruition. For now, enterprises are simply wondering how to use the tools at hand to help people find the information they need. “The problem is,” says Reynolds, “they can’t turn to a search engine to find the answer.”
John Edwards is a freelance writer based in Arizona. He can be reached at firstname.lastname@example.org.