After considering tentative solutions to Retro's three data mining problems, you decide to look into the data infrastructure requirements of a full-scale project. Retro's data infrastructure includes the computer hardware and software needed to implement a particular project or initiative. Because you have already considered some of the issues related to Retro's warranty service problem, you believe that continuing with this project will help you better understand what a data mining project involves.

Simon Bigelow, Retro's director of information technology (IT), is happy to talk with you over the telephone to discuss these issues. He indicates that Retro's data warehouse receives data from several legacy databases. Statistical analysis programs run off the same server that houses the data warehouse. Financial data is stored and analyzed on a separate server.

Although Simon is supportive of data mining and all that it can do for the company, he is concerned about how it will affect the IT department, which is already stretched to its limits. "I'm worried," Simon tells you, "about running data mining software on existing Retro servers. It could stretch the data warehouse too far. We may not be able to spare the computer cycles to run data mining software on it as well. We already run a number of online analytical processing (OLAP) activities to process data for Retro."

Your discussion with Simon highlights how hardware and software issues may constrain the implementation of a data mining project. You begin gathering information about the technical requirements of data mining.

From: Simon Bigelow, Director of Information Technology
Subject: Retro's Data Infrastructure

Please see attached file.

1. Given the scope of the warranty service problem, will Retro need to acquire additional hardware and software over and above that which it already owns?

Yes, and let's look at the evidence for this.

Data mining has been defined as "The nontrivial extraction of implicit, previously unknown, and potentially useful information from data" [1] and "The science of extracting useful information from large data sets or databases" [2]. Although the term usually refers to the analysis of data, data mining, like artificial intelligence, is an umbrella term whose meaning varies across a wide range of contexts.

A simple example of data mining is its use in a retail sales department. If a store tracks a customer's purchases and notices that the customer buys a lot of silk shirts, the data mining system will associate that customer with silk shirts. The sales department can then use that information to begin direct mail marketing of silk shirts to that customer. In this case, the store's data mining system discovered information about the customer that was previously unknown to the company.

Used in the technical context of data warehousing and analysis, the term data mining is neutral. However, it sometimes has a more pejorative usage that implies imposing patterns (and particularly causal relationships) on data where none exist. This imposition of irrelevant, misleading or trivial attribute correlation is more properly criticized as "data dredging" in the statistical ...

