Categorization Software Improves Search Capabilities
At one time, researchers speculated that solving such search problems might require artificial intelligence: systems that simulated human thought and could behave like skilled reference librarians. But there is an easier solution: ordering data into categories and subcategories and then having users interact with that structure before looking at the raw results. Consider a hungry New Yorker looking for a place to eat. A search under "New York AND restaurant" that returned only a list of actual eateries would produce a list too long to use. On the other hand, if the results came packaged in an easy-to-scan collection of restaurant types (Italian, French, Asian and, if necessary, subtypes under those: Korean, Japanese, Vietnamese and so on), the whole set of New York restaurants would suddenly become navigable.
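The restaurant example can be sketched in a few lines of Python. This is only an illustration of the general idea, not how any particular product works: the restaurant names, the category paths and the helper functions (`build_tree`, `count`, `outline`) are all invented for the example.

```python
# Hypothetical search results: (restaurant name, category path).
# Names and categories are made up for illustration.
RESULTS = [
    ("Trattoria Uno", ("Italian",)),
    ("Chez Exemple", ("French",)),
    ("Seoul Table", ("Asian", "Korean")),
    ("Sushi Sample", ("Asian", "Japanese")),
    ("Pho Placeholder", ("Asian", "Vietnamese")),
    ("Kimchi Corner", ("Asian", "Korean")),
]


def new_node():
    """One category: the items filed directly under it plus its subcategories."""
    return {"items": [], "children": {}}


def build_tree(results):
    """File every result under its category path, creating nodes as needed."""
    root = new_node()
    for name, path in results:
        node = root
        for label in path:
            node = node["children"].setdefault(label, new_node())
        node["items"].append(name)
    return root


def count(node):
    """Number of results in a category, including all of its subcategories."""
    return len(node["items"]) + sum(count(c) for c in node["children"].values())


def outline(node, depth=0):
    """Return the category tree as indented lines with result counts."""
    lines = []
    for label in sorted(node["children"]):
        child = node["children"][label]
        lines.append("  " * depth + f"{label} ({count(child)})")
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(build_tree(RESULTS))))
```

The searcher sees a short outline of categories with counts first and opens only the branch of interest, rather than scanning the full list.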
Categorization also helps with other issues. It solves the overview problem by formatting different categories (restaurant types, locations, price ranges, ratings) side by side, presenting the searcher with a multifaceted, top-down perspective. The same formatting trick helps searchers who don’t quite know what they want by letting them examine query results from several angles at once, interactively.
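A minimal sketch of that side-by-side view, again in Python and again under stated assumptions: the records, the facet names (`cuisine`, `area`, `price`) and the functions `facet_counts` and `narrow` are hypothetical and chosen only to illustrate the approach.

```python
from collections import Counter

# Hypothetical records; every value here is invented for illustration.
RESTAURANTS = [
    {"name": "Trattoria Uno", "cuisine": "Italian", "area": "Midtown", "price": "$$"},
    {"name": "Chez Exemple", "cuisine": "French", "area": "SoHo", "price": "$$$"},
    {"name": "Seoul Table", "cuisine": "Korean", "area": "Midtown", "price": "$$"},
    {"name": "Sushi Sample", "cuisine": "Japanese", "area": "SoHo", "price": "$$$"},
    {"name": "Pasta Placeholder", "cuisine": "Italian", "area": "SoHo", "price": "$"},
]

# The category dimensions shown side by side (an assumed set).
FACETS = ("cuisine", "area", "price")


def facet_counts(items, facets=FACETS):
    """For each facet, count how many items fall under each of its values."""
    return {facet: Counter(item[facet] for item in items) for facet in facets}


def narrow(items, **selected):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item[facet] == value for facet, value in selected.items())
    ]


if __name__ == "__main__":
    # Overview: every facet summarized at once.
    for facet, counts in facet_counts(RESTAURANTS).items():
        print(facet, dict(counts))

    # Interactive step: the searcher picks one value and the counts are recomputed.
    soho = narrow(RESTAURANTS, area="SoHo")
    print(facet_counts(soho, facets=("cuisine", "price")))
```

Recomputing the counts after each selection is what lets a searcher who starts without a clear goal approach the same result set from several angles.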
Category trees are not new. Until recently, however, IT applications required paid humans to think up the category names, define their relationships and write the rules that channeled data into the proper boxes. As a result, the technique was limited to fields with big budgets, such as financial analysis or defense. During the past few years, however, several developments have made it much easier to automate or at least semiautomate categorization, sparking a small revolution in the sophistication of enterprise-level search engines and the number and kinds of users a system can help.
These systems, however, are not exactly plug and play (at least today), and establishing the rules that ultimately create the final categories may take significant time. But with proper investment, autocategorization tools can yield significant benefits.
Parsing Parts
In 2000, components distributor Arrow Electronics built and started to sell subscriptions to Ubiquidata, a components database made up of information about more than 23 million items, each with as many as 50 related data elements. The company initially marketed the product to purchasing and material planning professionals within original equipment manufacturers (OEMs). For clients such as those, searching the huge data set was no problem, since they usually knew exactly what they were after, often right down to the manufacturer’s part number.



