Data Mining

Differentiate between the following terms:

A. Independent data mart and dependent data mart
B. Fact table and dimension table

Chapter 7

1. Differentiate between the following terms:

A. Validation data and test set data
B. Positive correlation and negative correlation
C. Control group and experimental group

2. For each of the following scenarios, state what the type I and type II errors are. Also, decide whether a model that commits fewer type I errors or fewer type II errors would be the better choice. Justify each answer.

A. A model for predicting if it will snow
B. A model for selecting customers likely to purchase a television
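
As a reference point for question 2, the following is a minimal sketch of how the two error types could be counted once a model's predictions are compared with the actual outcomes. It assumes the common convention that a type I error is a false positive (the model predicts the event, such as snow or a purchase, and it does not happen) and a type II error is a false negative; the function name count_errors and the sample data are hypothetical and only serve the illustration.

```python
def count_errors(actual, predicted, positive=True):
    """Return (type_i, type_ii) counts for paired actual/predicted labels.

    Assumed convention: type I = false positive, type II = false negative.
    """
    pairs = list(zip(actual, predicted))
    # Type I: model predicted the positive class, but the actual outcome was negative.
    type_i = sum(1 for a, p in pairs if p == positive and a != positive)
    # Type II: model predicted the negative class, but the actual outcome was positive.
    type_ii = sum(1 for a, p in pairs if p != positive and a == positive)
    return type_i, type_ii


# Hypothetical data: True means "it snowed" / "model predicted snow".
actual = [True, False, False, True, False, True, False]
predicted = [True, True, False, False, False, True, True]

type_i, type_ii = count_errors(actual, predicted)
print("type I errors:", type_i)    # false alarms
print("type II errors:", type_ii)  # missed events
```

Which count matters more depends on the relative cost of a false alarm versus a missed event in each scenario, which is what the justification in question 2 asks for.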

Chapter 8

2. Section 8.1 describes two methods for categorical data conversion. Explain how you would use each method to convert the categorical attribute income range, with possible values 10-20k, 20-30k, 30-40k, 40-50k, and 90-100k, to numeric equivalents. Which method is more appropriate? The two methods are Neural Networks and Sigmoid function.

3. The average number technique is sometimes used to explain the results of an unsupervised neural network clustering.

A. List the advantages and disadvantages of this approach
B. Do you see similarities between this explanation technique and the K-Means algorithm?
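
For the income range attribute in question 2 of this chapter, the sketch below shows two ways categorical ranges are commonly mapped to numbers in the interval [0, 1] before being fed to a neural network. Both are assumptions made for illustration and are not necessarily the two methods Section 8.1 has in mind: equal_spacing assigns evenly spaced codes by rank, while midpoint_scaling rescales the midpoint of each range so that the gap between 40-50k and 90-100k is preserved. The function names are hypothetical.

```python
ranges = ["10-20k", "20-30k", "30-40k", "40-50k", "90-100k"]


def equal_spacing(values):
    """Assign evenly spaced codes in [0, 1] based only on rank order."""
    step = 1 / (len(values) - 1)
    return {v: i * step for i, v in enumerate(values)}


def midpoint_scaling(values):
    """Min-max scale the midpoint of each range into [0, 1]."""
    mids = {}
    for v in values:
        low, high = v.rstrip("k").split("-")
        mids[v] = (float(low) + float(high)) / 2
    lo, hi = min(mids.values()), max(mids.values())
    return {v: (m - lo) / (hi - lo) for v, m in mids.items()}


for name, mapping in [("equal spacing", equal_spacing(ranges)),
                      ("midpoint scaling", midpoint_scaling(ranges))]:
    print(name)
    for value, code in mapping.items():
        print(f"  {value:>8} -> {code}")
```

Under equal spacing the jump from 40-50k to 90-100k looks the same as any other step, whereas midpoint scaling keeps that larger distance, which is one consideration when judging which conversion is more appropriate.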

