|
The Data Evolution:
Data -> Databases -> Data Patterns -> Data Mining/Data Fusion -> Data Models ->
Data
Related Technologies:
- Correlation, association
- Clustering
- Factor analysis
- Linear discrimination, logistic regression
- Trend prediction & forecasting
- Neural networks
- Genetic algorithms
- Fuzzy logic
- Uncertainty reasoning (Dempster-Shafer, rough sets)
- Bayesian nets
- Hyperspace data mining
Wide Applications:
- Internet (cookies, profiler, shopping cart)
- Circuit design & optimization (EDA)
- Traffic prediction/scheduling (wireless nets)
- Semiconductor process design & optimization
- Machine and equipment diagnostics
- Customer support and warranty forecast
- Advanced materials/metals design
- Petrochemical, chemical - chemometrics
- Biomedical, pharmaceutical
- Defense and space applications
|
Financial database applications
- Portfolio/Investment analysis
- Consumer price/futures prediction
- Credit/bank/insurance fraud detection
- Automated fraud explanation
- Consumer preference analysis/forecast
- Customer interest profiling
- Market research & services
- Stock prediction (hard!)
- Econometrics
|
The Common Issue - find a model to
describe the relationships in the data

The Catch-22 Problem:
Data Pattern <--?--> Data Model
Can you see it? If not, mining in a hyperspace is needed.

Principal Component Analysis (PCA)
- Not suited for nonlinear cases:
In general, neither traditional PCA nor Fisher's method separates the data well in
nonlinear cases. See the first picture below, where the red box is the 2-D space that
contains both red (good) and blue (bad) data points. A model developed from the data in
the red box would not represent the underlying process well, because both good and bad
data points are used in building it.
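
The failure mode described above can be reproduced with a small sketch. The data here is synthetic and purely an assumption for illustration (good points on an inner ellipse, bad points on a surrounding one); it is not the data behind Fig-1. The helper names `first_principal_axis` and `project` are hypothetical. The sketch computes the first principal axis in closed form for 2-D data and counts how many bad points land inside the range of the good points after projection.

```python
import math

def first_principal_axis(points):
    """Return (unit axis, centroid) of largest variance for 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta)), (mx, my)

def project(points, axis, center):
    """Project 2-D points onto the line through `center` along `axis`."""
    return [(x - center[0]) * axis[0] + (y - center[1]) * axis[1]
            for x, y in points]

# Synthetic data (an assumption): good points on an inner ellipse,
# bad points on a surrounding outer ellipse.
angles = [2.0 * math.pi * k / 40 for k in range(40)]
good = [(2.0 * math.cos(t), math.sin(t)) for t in angles]
bad = [(6.0 * math.cos(t), 3.0 * math.sin(t)) for t in angles]

axis, center = first_principal_axis(good + bad)
good_proj = project(good, axis, center)
bad_proj = project(bad, axis, center)

# Bad points whose projection falls inside the range spanned by good points.
bad_inside = sum(min(good_proj) <= b <= max(good_proj) for b in bad_proj)
print("bad points hidden among good projections:", bad_inside)
```

Because the bad projections extend beyond the good ones on both sides and also fall among them, no single threshold on the projected axis isolates the good points in this example.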

Fig-1 PCA - No separation of data. |
Principal Component Analysis (PCA) or Karhunen-Loève Transform:
Projection onto the directions of maximum variance. Good for linear, Gaussian cases
without noise. All data are used in building a model.
Fisher's Method:
Line projection that maximizes the distance between clusters relative to the scatter
within each cluster. Result is similar to that of PCA.
An example is in Fig-1. |
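
As a rough illustration of Fisher's line projection, the sketch below computes the standard two-class Fisher direction w = Sw^-1 (m_a - m_b) for 2-D points, where Sw is the summed within-class scatter matrix. The two small clusters and the helper names (`mean2`, `scatter2`, `fisher_direction`) are assumptions made up for this example. This is the linear case where the method works; it does not address the nonlinear case discussed above.

```python
def mean2(points):
    """Centroid of a list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def scatter2(points, m):
    """Scatter matrix entries (sxx, sxy, syy) of points about centroid m."""
    sxx = sum((x - m[0]) ** 2 for x, _ in points)
    sxy = sum((x - m[0]) * (y - m[1]) for x, y in points)
    syy = sum((y - m[1]) ** 2 for _, y in points)
    return sxx, sxy, syy

def fisher_direction(class_a, class_b):
    """Fisher direction w = Sw^-1 (m_a - m_b) for two 2-D classes."""
    ma, mb = mean2(class_a), mean2(class_b)
    axx, axy, ayy = scatter2(class_a, ma)
    bxx, bxy, byy = scatter2(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy  # determinant of the 2x2 matrix Sw
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Apply the closed-form inverse of Sw to the difference of the means.
    return ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)

# Two made-up clusters that overlap on each coordinate axis taken alone.
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (4.0, 5.0), (5.0, 5.0)]
class_b = [(1.0, 0.0), (2.0, 1.0), (3.0, 1.0), (4.0, 3.0), (5.0, 2.0)]

w = fisher_direction(class_a, class_b)
proj_a = [w[0] * x + w[1] * y for x, y in class_a]
proj_b = [w[0] * x + w[1] * y for x, y in class_b]
print("separated on Fisher axis:", min(proj_a) > max(proj_b))
```

In this example the clusters share the same x values and overlap in y, so neither coordinate alone separates them, while the projection onto `w` does.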

Fig-2 Good separation by hidden projection of MasterMiner |
MasterMiner: Based on a projective geometry (hidden projection) method. It
is well suited for nonlinear (or linear), non-Gaussian (or Gaussian) cases with noise.
Only a subset of the data is used in building a model for the underlying process. For the
same data, its data separability is superior to that of either PCA or Fisher's method.
For comparison with PCA, see result in Fig-2. |
Mathematical model built from the
data in the red box generated by MasterMiner:
When good separation is achieved, a mathematical model for the process
data can be readily generated from the data points in the red box shown in Fig-2 above.
Linear regression is used to model these data points in a sub-space. The picture
below shows 4 inequalities that separate the space and 2 linear equations that represent
the data model.
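
The general idea of bounding a sub-space with inequalities and fitting a linear model inside it can be sketched as follows. This is not MasterMiner's algorithm: the rectangular region, its bounds, the data, and the helper names `in_region` and `fit_line` are all assumptions for illustration, and only one linear equation is fitted here.

```python
def in_region(point, x_lo, x_hi, y_lo, y_hi):
    """True if the point satisfies all four bounding inequalities."""
    x, y = point
    return x_lo <= x <= x_hi and y_lo <= y <= y_hi

def fit_line(points):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

# Made-up data: five points on y = 2x + 1 plus four points outside the box.
data = [(1.0, 3.0), (1.5, 4.0), (2.0, 5.0), (2.5, 6.0), (3.0, 7.0),
        (1.0, 9.0), (3.0, 0.5), (5.0, 4.0), (0.0, 5.0)]

# Hypothetical region: 0.5 <= x <= 3.5 and 2 <= y <= 8 (four inequalities).
selected = [p for p in data if in_region(p, 0.5, 3.5, 2.0, 8.0)]
slope, intercept = fit_line(selected)
print("points used:", len(selected), "slope:", slope, "intercept:", intercept)
```

Fitting only the selected points recovers the underlying line, whereas a fit over all nine points would be pulled away from it by the points outside the region.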

tree
|