Data Mining -- An Optimization Problem

| Major Steps in Data Mining
Applications
Collection of History Data
Separability Test
Data Preprocessing: Data pretreatment, or conditioning, is used to find the local views of a process or to delete outliers (noise) from the original data set. It draws on methods from mathematical statistics and robust estimation theory, such as pattern recognition, the least median of squares (LMS) method, the least trimmed squares (LTS) method, the reweighted least squares (RLS) method, outlier diagnosis, single-case diagnosis, the hat matrix, genetic algorithms, and robust regression.
Factor Selection - not a textbook approach:
Strictly controlled factors - many factors that matter in principle are already under close control and vary very little in the production process, so they should not be treated as important factors in the analysis. Pay more attention to the other factors.
Common vs. specific factors - common factors taken from textbooks or common-sense knowledge may not apply to a specific process in a specific case. Attention should be directed to finding the factors that are specific to the process under study. These specific factors are more important than the common factors.
Evolutionary factors - even for the same process in the same plant, the priority of the identified factors may change. A factor that is decisive in solving a bottleneck problem may be less important than other factors in solving a product quality problem in the same plant.
Minimum set of effective factors - this is the smallest set of factors that can be used to represent the system under study. An interactive and iterative technique has been developed in MasterMiner to identify this factor set, and it has proved highly effective and efficient in many industrial applications.
Factor Multiplicity - a concept borrowed
from molecular chemistry that describes the multiplicity in the phase change of a substance. In an optimization problem, Y = f(X1, X2, X3, ..., Xi, ..., Xn), the factors X1, X2, X3, ..., Xi, ..., Xn and the target Y are interchangeable. For instance, Xi = g(X1, X2, X3, ..., Y, ..., Xn) describes another optimization problem for the same system. This means that the factors are not fixed in a problem. The challenge is to identify the best factors for one specific problem using an efficient and effective method in addition to expert knowledge and experience.
Pattern Recognition
Model Building
Once MasterMiner has calculated the inequalities for the optimal subspace, a back-propagation-based neural network module can be used to accurately estimate the model parameters from the clean data of one cluster in the subspace. The resulting parameter estimates are superior to those calculated from the mixed data of two or more clusters. A genetic algorithm is built into MasterMiner to fine-tune the neurons and weights in the neural network model.
Prediction and Control
The above picture shows the
cross-sectional maps of the 4-dimensional optimal |
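The factor multiplicity idea described earlier, where the target Y and a factor Xi can trade places, can be illustrated with a small sketch. This is not MasterMiner's method. It is a minimal, hypothetical example that assumes a noise-free linear system Y = 2*X1 + 3*X2, and the helper name fit_two_factor_linear and the sample data are invented for illustration. The same data set is fitted twice: once with Y as the target, and once with X1 as the target and Y as a factor.

```python
def fit_two_factor_linear(u, v, target):
    """Least-squares fit of target ~ a*u + b*v (no intercept).

    Solves the 2x2 normal equations with Cramer's rule.
    Hypothetical helper for illustration only.
    """
    suu = sum(x * x for x in u)
    svv = sum(x * x for x in v)
    suv = sum(x * y for x, y in zip(u, v))
    sut = sum(x * t for x, t in zip(u, target))
    svt = sum(x * t for x, t in zip(v, target))
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("factors are collinear; the fit is not unique")
    a = (sut * svv - svt * suv) / det
    b = (svt * suu - sut * suv) / det
    return a, b


# Assumed sample data: two factors and a target generated from Y = 2*X1 + 3*X2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [2.0 * p + 3.0 * q for p, q in zip(x1, x2)]

# Problem 1: Y = f(X1, X2), with Y as the target.
a1, b1 = fit_two_factor_linear(x1, x2, y)

# Problem 2: X1 = g(Y, X2), the same system with Y and X1 interchanged.
a2, b2 = fit_two_factor_linear(y, x2, x1)

print("Y  ~", a1, "* X1 +", b1, "* X2")
print("X1 ~", a2, "* Y  +", b2, "* X2")
```

Because the assumed data are noise-free, the second fit is simply the algebraic rearrangement of the first (X1 = 0.5*Y - 1.5*X2). With noisy plant data the two fits would generally not be exact inverses of each other, which is one reason the choice of target and factors has to be made for each specific problem.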
Copyright © 1997 - 2000, ZAPTRON Systems, Inc.