ABSTRACT: This paper reports on conceptual development in the areas of database mining and knowledge discovery in databases (KDD). The authors' efforts have also led to a prototype implementation, called MOTC, for exploring hypothesis space in large and complex data sets. Their KDD conceptual development rests on two main principles. First, they use the crosstab representation for working with qualitative data. This representation is by now standard in on-line analytical processing (OLAP) applications, and the authors give additional reasons in its favor. Second, and innovatively, they use prediction analysis as a measure of goodness for hypotheses. Prediction analysis is an established statistical technique for analyzing associations among qualitative variables. It generalizes and subsumes a large number of other measures of association, each of which corresponds to specific assumptions the user is willing to make. As such, it provides a useful framework for exploring hypothesis space in a KDD context. The paper illustrates these points with an extensive discussion of MOTC.
Key words and phrases: data mining, data visualization, hypotheses exploration, knowledge discovery in databases, OLAP, prediction analysis
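
As an illustration of how prediction analysis can score a hypothesis against a crosstab, the following minimal sketch computes the del measure in its standard unweighted form: one minus the ratio of the observed error rate to the error rate expected if the two variables were independent. It is not taken from MOTC. The function name `del_measure`, the representation of a hypothesis as a set of "error cells", and the sample counts are assumptions made only for this example.

```python
def del_measure(counts, error_cells):
    """Unweighted del for a hypothesis on a two-way crosstab.

    counts      -- list of rows of observed cell counts (hypothetical data)
    error_cells -- set of (row, col) index pairs that the hypothesis
                   predicts should NOT occur (assumed representation)
    """
    n = sum(sum(row) for row in counts)
    if n == 0:
        raise ValueError("crosstab is empty")

    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]

    # Proportion of cases that actually fall in the error cells.
    observed = sum(counts[i][j] for i, j in error_cells) / n

    # Proportion expected in those cells if rows and columns were independent.
    expected = sum(row_totals[i] * col_totals[j] for i, j in error_cells) / (n * n)
    if expected == 0:
        raise ValueError("hypothesis has no expected errors; del is undefined")

    return 1.0 - observed / expected


# Hypothetical 2x2 crosstab; the hypothesis says "row 0 goes with column 0
# and row 1 goes with column 1", so the off-diagonal cells are errors.
sample = [[40, 10],
          [10, 40]]
errors = {(0, 1), (1, 0)}
score = del_measure(sample, errors)
```

A value near 1 means the hypothesis removes most of the error expected under independence, a value of 0 means it does no better than independence, and negative values mean it does worse. Weighted error cells, which the general technique allows, are omitted from this sketch.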