Merely storing information in a data warehouse does a company little good, no matter how neatly the data is stacked and organized. Getting information out of the warehouse is what allows organizations to reap the benefits of data warehousing, and data mining is one of the best ways to extract meaningful trends and patterns from a vast pile of data.

Although data mining is still in its infancy, companies in a wide range of industries -- including retail, finance, medicine, manufacturing, transportation and aerospace -- are already using data mining tools and techniques to take advantage of historical data gathered internally or acquired from other organizations. By using pattern recognition technologies and statistical and mathematical techniques to sift through warehoused information, data mining helps analysts recognize significant facts, relationships, trends, patterns, exceptions and anomalies that might otherwise go unnoticed (see boxed examples). Companies also can use mining techniques to visualize the data, or present it in an easily digestible format, as well as to check for holes in the underlying data store.

The demands of data mining can easily overwhelm today's technology and products. Conventional mainframes are seldom able to run brute-force multiple queries on large data sets, and although the memory and processing power of many PCs and workstations are increasing, the available software is not always up to the task. Most analysis routines can handle only small samples of data at a time, making wide-ranging analysis difficult and time-consuming.

A methodical approach to data mining increases the chances of overcoming those barriers. Here are the main steps of Gartner Group Inc.'s data mining methodology:

1. Database selection and preparation. To mine data effectively, the warehouse must be set up properly. The first step is to identify the databases and factors to be explored. If possible, a live data dictionary should be created from which required records can be retrieved into the flat files needed by most analysis routines. This step is very complex: The databases of interest may be maintained by multiple departments, on various hardware platforms and operating systems, or in separate locations.

Data preparation involves filling in missing values and correcting errors. The referential integrity controls of modern relational databases have improved data quality, but legacy databases may be incomplete or full of errors. Interpolating missing data can be dangerous, particularly when dealing with small samples (a simple imputation sketch appears after step 2 below).

2. Clustering and feature analysis. The large database groups defined during the preparation phase are divided further using clustering techniques. That is followed by a more detailed feature analysis to find the factors that most obviously contribute to the formation of the clusters and to determine which factors are involved in attaining particular business goals. Clustering and feature analysis can pare down the problem scope in terms of the number of factors or records to examine, as the second sketch below illustrates.
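To make step 1's data preparation concrete, here is a minimal sketch of one common choice, filling gaps with a column median. The field names and values are hypothetical, and the sketch assumes records have already been pulled from the warehouse into a flat structure:

```python
# Minimal data-preparation sketch: median imputation of missing values.
# Field names and values are hypothetical.
import pandas as pd

# Records extracted from the warehouse into a flat file often arrive
# with gaps, especially when legacy databases are involved.
records = pd.DataFrame({
    "customer_id":   [101, 102, 103, 104, 105],
    "age":           [34, None, 51, 28, None],
    "monthly_spend": [220.0, 180.0, None, 95.0, 310.0],
})

# Median imputation is a conservative default; as noted above, even this
# can mislead when the sample is small.
for col in ("age", "monthly_spend"):
    records[col] = records[col].fillna(records[col].median())

print(records)
```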
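For step 2, the sketch below clusters the prepared records with k-means (one of many clustering techniques) and then performs a crude feature analysis, ranking each factor by how far the cluster's mean deviates, in standard-deviation units, from the overall mean. The synthetic data, factor names and two-cluster setup are illustrative assumptions, not part of Gartner Group's methodology:

```python
# Minimal clustering and feature-analysis sketch using k-means.
# The data and factor names are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
factors = ["age", "monthly_spend", "visits_per_month"]

# Synthetic records drawn from two hypothetical customer segments.
X = np.vstack([
    rng.normal([30.0, 100.0, 2.0], [5.0, 20.0, 1.0], (100, 3)),
    rng.normal([55.0, 400.0, 8.0], [5.0, 50.0, 2.0], (100, 3)),
])

# Divide the records into clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Crude feature analysis: a factor whose cluster mean sits far from the
# overall mean (in standard-deviation units) contributes most obviously
# to the formation of that cluster.
overall_mean, overall_std = X.mean(axis=0), X.std(axis=0)
for c in range(km.n_clusters):
    members = X[km.labels_ == c]
    deviation = (members.mean(axis=0) - overall_mean) / overall_std
    ranked = sorted(zip(factors, deviation), key=lambda p: -abs(p[1]))
    print(f"cluster {c} ({len(members)} records): {ranked}")
```

In practice, the ranked factors would guide which records and fields to carry into the later steps, paring down the problem scope as described above.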
3. Tool selection. Many data mining tools are available, but most are incomplete and may have to be combined with techniques or systems already developed within an enterprise. Before acquiring a tool or a technology, Gartner Group recommends conducting a thorough analysis.

4. Hypothesis testing and knowledge discovery. This step is most often associated with the term "data mining." During this process, hypotheses are formed and tested, new relationships are discovered and what-if analyses may be performed. Many issues come into play, such as sample size, processing time, complexity of the data and degree of confidence. The output of the data mining process depends on the product and technology used and often takes the form of rules, correlations, prediction models, relationship graphs or decision trees.

5. Knowledge application. In most cases, tested rules created during the discovery process can be added directly to procedural code or -- if there are many rules and updates are likely -- to a knowledge-based system. Prediction models can often be integrated directly into application code, a particularly easy process with products that output their models in common languages such as C. A sketch of a mined rule recast as procedural code appears at the end of this article.

Data mining requires substantial human effort and interaction. The mechanisms for steering the process are still relatively new and are inadequate for dealing with the myriad factors and interactions in large data stores. Shrink-wrapped packages may promise enticing ease of use, but in many cases both technology-specific expertise and relevant domain knowledge are necessary.

Regardless of the technology underlying the data mining process, the value of discovered data -- especially in retail marketing and finance -- is time-sensitive. The first enterprises to exploit the data will have the upper hand in serving and attracting customers.
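As an illustration of step 5, here is a minimal sketch of a discovered rule recast as ordinary procedural code. The rule, thresholds and field names are hypothetical; the point is only that this kind of mining output can be dropped straight into an application:

```python
# Minimal knowledge-application sketch: a mined decision rule rewritten
# as procedural code. Rule, thresholds and field names are hypothetical.
def likely_to_churn(months_active: int, visits_per_month: float,
                    complaints: int) -> bool:
    """A rule as a mining tool might emit it:
    IF months_active < 6 AND visits_per_month < 1.0 THEN churn;
    IF complaints >= 3 THEN churn.
    """
    if months_active < 6 and visits_per_month < 1.0:
        return True
    return complaints >= 3

# The surrounding application simply calls the generated function.
print(likely_to_churn(months_active=4, visits_per_month=0.5, complaints=0))   # True
print(likely_to_churn(months_active=24, visits_per_month=3.0, complaints=1))  # False
```

If the rules multiply or change frequently, the same output would instead be loaded into a knowledge-based system, as the step notes.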