Journal of Management Information Systems

Volume 16, Number 1, 1999, pp. 37-62

Choosing Data-Mining Methods for Multiple Classification: Representational and Performance Measurement Implications for Decision Support

William E. Spangler, Jerrold H. May, and Luis G. Vargas

ABSTRACT: Data-mining classification techniques are typically designed for problems in which each observation is a member of one and only one category. The authors formulate ten data representations that could be used to extend those methods to problems in which observations may be full members of multiple categories. They propose an audit matrix methodology for evaluating the performance of three popular data-mining techniques (linear discriminant analysis, neural networks, and decision tree induction) using the representations that each technique can accommodate. They then test their approach empirically on a real surgical data set. Tree induction gives the lowest rate of false positive predictions, and a version of discriminant analysis yields the lowest rate of false negatives for multiple-category problems, but neural networks give the best overall results for the largest multiple-classification cases. All techniques leave substantial room for improvement in overall performance.

Key words and phrases: data mining, decision support systems, decision tree induction, neural networks, statistical classification
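The abstract compares techniques by their false positive and false negative rates when an observation may belong to several categories at once. As a hedged illustration only (this is not the authors' audit matrix, whose layout is not given here), the sketch below compares each observation's actual set of categories with its predicted set, tallies true/false positives and negatives per category, and pools them into two rates. It assumes labels are available as Python sets and assumes the rates are defined per (observation, category) pair: false positives over actual negatives, and false negatives over actual positives. The paper may define them differently. The function names `tally` and `error_rates` and the toy categories "A", "B", "C" are hypothetical.

```python
from typing import Dict, List, Set, Tuple


# Hypothetical helper: classifies every (observation, category) pair as TP, FP, FN, or TN.
def tally(actual_sets: List[Set[str]],
          predicted_sets: List[Set[str]],
          categories: List[str]) -> Dict[str, Dict[str, int]]:
    """Return per-category counts of TP, FP, FN, and TN."""
    if len(actual_sets) != len(predicted_sets):
        raise ValueError("actual_sets and predicted_sets must be the same length")
    counts = {c: {"tp": 0, "fp": 0, "fn": 0, "tn": 0} for c in categories}
    for actual, predicted in zip(actual_sets, predicted_sets):
        for c in categories:
            if c in actual and c in predicted:
                counts[c]["tp"] += 1  # correctly assigned to c
            elif c in predicted:
                counts[c]["fp"] += 1  # assigned to c but not a true member
            elif c in actual:
                counts[c]["fn"] += 1  # true member of c but missed
            else:
                counts[c]["tn"] += 1  # correctly left out of c
    return counts


# Assumed definitions, pooled over all categories:
#   FP rate = FP / actual negatives, FN rate = FN / actual positives.
def error_rates(counts: Dict[str, Dict[str, int]]) -> Tuple[float, float]:
    totals = {k: sum(row[k] for row in counts.values()) for k in ("tp", "fp", "fn", "tn")}
    negatives = totals["fp"] + totals["tn"]
    positives = totals["fn"] + totals["tp"]
    fp_rate = totals["fp"] / negatives if negatives else 0.0
    fn_rate = totals["fn"] / positives if positives else 0.0
    return fp_rate, fn_rate


# Toy example with hypothetical categories; observations 2 and 4 have multiple memberships.
categories = ["A", "B", "C"]
actual = [{"A"}, {"A", "B"}, {"C"}, {"B", "C"}]
predicted = [{"A"}, {"A", "C"}, {"B", "C"}, {"B", "C"}]

counts = tally(actual, predicted, categories)
fp_rate, fn_rate = error_rates(counts)
print(f"FP rate {fp_rate:.3f}, FN rate {fn_rate:.3f}")  # FP rate 0.333, FN rate 0.167
```

Reporting the two rates separately, rather than a single accuracy figure, reflects the abstract's observation that one technique can do best on false positives while another does best on false negatives.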