
Market Fluctuation Forecast
A Fuzzy Time Series Analysis Approach to Data Mining
Abstract: Fuzzy time series analysis offers an innovative way to forecast with higher
accuracy than classical methods. Fuzzy logic allows linguistic concepts (unpredictable
factors) to be included in data modeling, and the prediction results are fuzzy numbers
(soft values) rather than hard numeric values. Defuzzification then yields a highly
accurate numeric prediction. This method has been built into the DataX™ software suite,
which is used in the example below to show the efficacy of the method.

The Issue of Forecast (Prediction)
Databases, data marts, and data warehouses are
increasingly used in business, finance, engineering, medicine and many other fields.
They contain large volumes of business activity data that can be used to forecast
(predict) the future trend of various systems, for example in market forecasting, stock
price prediction, and risk management. According to IBM research, only 7%
of corporate data are ever used; the rest is left untouched. It is believed
that the remaining 93% of the data contain valuable information characteristic of the
business activity, and that they can be used for forecasting and prediction.


Time Series Analysis
A time series is a collection of historical data for one
system, such as a stock price, a consumer price, or the monthly (or yearly or daily)
revenue of a company. Classical time series theory deals only with the randomness
of the data, not with the fuzziness in the underlying system model.
A time series can be used in two ways, for different purposes:

Looking backward: use historical data to analyze the previous
behaviors of a system. Applications include diagnosis or recognition
of machine faults or human diseases.

Looking forward: use data to predict (forecast) the future
behaviors of a system. Applications include stock or price prediction and
market demand forecasting.


Why Combine Fuzzy Logic with Time Series Analysis?
Human descriptions of physical systems are imperfect at best.
Fuzziness (uncertainty) exists in many aspects of a system model, including missing
or partial knowledge, incomplete or imperfect data collection,
inconsistencies or discrepancies in data representation (such as numeric, logic, or
symbolic), and various human or natural factors such as promotional
activities to hit a sales goal, changes in business rules, regulations and policies,
unexpected competition, weather changes, natural disasters, international conflicts and
wars, and financial market crashes.
Fuzzy logic works in the gray
area between "Yes" and "No." Fourier analysis is used for
component analysis or cyclic analysis of a deterministic (certain)
physical system, such as the harmonic analysis of a speech signal using an HP
spectrometer. Most practical systems, however, involve uncertain (but not
random) factors, which call for special methodologies such as fuzzy logic to model and
analyze them for better results. Examples of uncertain factors include fluctuations of
consumer prices, stock prices, and sales revenues.
Scientists at Zaptron Systems, Inc.
have developed a proprietary technology that combines fuzzy logic theory with Fourier
analysis and time series forecasting. What distinguishes this technology is that
it produces fuzzy numbers as prediction results,
rather than the numeric values produced by most other methods. A defuzzification
procedure then converts these fuzzy numbers into highly accurate predictions. This
technology has been built into the Tool Base of the DataX™ software suite. It
has been validated with real-world data, and the average prediction accuracy is 97%.
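
To make the idea of a fuzzy prediction concrete, the sketch below models a forecast as a triangular fuzzy number and defuzzifies it with the centroid method. This is only an illustration: the text does not say which membership shape or defuzzification rule DataX uses, so the class name TriangularFuzzyNumber, the triangular shape, the centroid rule, and the sample price values are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularFuzzyNumber:
    """Hypothetical fuzzy forecast: membership rises from `low` to `peak`,
    then falls back to zero at `high` (requires low <= peak <= high)."""
    low: float
    peak: float
    high: float

    def membership(self, x: float) -> float:
        """Degree (0.0 to 1.0) to which x belongs to this fuzzy number."""
        if x <= self.low or x >= self.high:
            # Outside the support; only a degenerate spike can still match.
            return 1.0 if x == self.peak else 0.0
        if x <= self.peak:
            return (x - self.low) / (self.peak - self.low)
        return (self.high - x) / (self.high - self.peak)

    def defuzzify(self) -> float:
        """Centroid (center of gravity) of the triangular membership function."""
        return (self.low + self.peak + self.high) / 3.0


# Illustrative values only: "the price will be about 10, possibly as high as 12.5".
forecast = TriangularFuzzyNumber(low=9.0, peak=10.0, high=12.5)
crisp_prediction = forecast.defuzzify()
print(crisp_prediction)  # 10.5
```

The centroid shifts the hard prediction toward the wider side of the triangle, which is one simple way an asymmetric "soft" forecast can inform the final number.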


Simplified Mathematics of Fuzzy Time Series Analysis Techniques
Given a time series X(t) for a system, where t is the time index and X(t) is the value
(number) the system takes at time t, X(t) can in general be represented as a linear
sum of the following 4 terms (components):

X(t) = Fl(t) + Fc(t) + Fu(t) + e(t)

where
Fl(t) is a linear function of time, representing the long-term component (the growth trend)
Fc(t) is the cyclic component, representing the seasonal or periodic fluctuations of the data
Fu(t) is the uncertain component, representing the fuzziness in the data model
e(t) is a model noise term, representing the randomness of the system
By Fourier series theory, the above model can
represent any time series X(t) with great accuracy. In addition, the
fuzzy term Fu(t) makes it possible to incorporate human experience and expertise (knowledge)
into the model. Using Fourier series theory, one can find the harmonics, that is, the
principal components (periods), that exist within the data.
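
As a rough illustration of the deterministic part of this model, the sketch below fits Fl(t) as a straight line and Fc(t) as a sum of sine and cosine terms at given periods, using ordinary least squares. It is not the DataX algorithm: the fuzzy term Fu(t) and the noise e(t) are simply left in the residual, the function names (design_row, solve_linear_system, fit_trend_and_cycles, predict) are hypothetical, and the test series is synthetic.

```python
import math


def design_row(t, periods):
    """Regressors at time t: constant, linear trend, and cos/sin per period."""
    row = [1.0, float(t)]
    for period in periods:
        angle = 2.0 * math.pi * t / period
        row += [math.cos(angle), math.sin(angle)]
    return row


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_trend_and_cycles(series, periods):
    """Least-squares coefficients for Fl(t) + Fc(t) via the normal equations."""
    rows = [design_row(t, periods) for t in range(len(series))]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, series)) for i in range(k)]
    return solve_linear_system(ata, aty)


def predict(coeffs, periods, t):
    """Evaluate the fitted Fl(t) + Fc(t) at any time index t (past or future)."""
    return sum(c * v for c, v in zip(coeffs, design_row(t, periods)))


# Synthetic monthly series: level 5, growth 0.1 per month, 12-month cycle.
series = [5.0 + 0.1 * t + 2.0 * math.sin(2.0 * math.pi * t / 12.0) for t in range(36)]
coeffs = fit_trend_and_cycles(series, periods=[12.0])
forecast_at_40 = predict(coeffs, [12.0], 40)
```

On this noise-free series the fit recovers the generating coefficients, so the same predict call can be used both to regenerate past values and to extrapolate forward.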


Discover Seasonal Components in Business Activity Data
Business activity data contain information that reflects the seasonal
or periodic behaviors of business activities. These seasonal
characteristics are represented by the hidden harmonics (frequencies fi)
that exist in the data sequence, for example harmonics with periods of 6 months or
18 months. From Fourier theory, the first few harmonics typically contain most of the
energy of a data sequence, and they are sufficient to represent the information in the
data. For most practical purposes, the first n
harmonics, f1, f2, ... and fn, are
used. Fisher's criterion is applied iteratively to determine the number n
of hidden harmonics in a data sequence, usually at a confidence level of 95%.
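
The sketch below shows one common way such a search can be carried out: compute a periodogram, apply Fisher's g test to its largest ordinate, and repeat after removing each significant peak. It is a simplified reading of "iteratively applied", not a description of DataX internals; the function names, the limit of 3 harmonics, and the synthetic data are assumptions. The 95% level in the text corresponds to alpha = 0.05 here.

```python
import math
import random


def periodogram(series):
    """Return (period, power) pairs at Fourier frequencies k/N, k = 1..(N-1)//2."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    result = []
    for k in range(1, (n - 1) // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        result.append((n / k, (re * re + im * im) / n))
    return result


def fisher_g_test(powers):
    """Fisher's g statistic and its p-value under a Gaussian white-noise null."""
    m = len(powers)
    g = max(powers) / sum(powers)
    p = 0.0
    for j in range(1, int(1.0 / g) + 1):
        term = math.comb(m, j) * max(0.0, 1.0 - j * g) ** (m - 1)
        p += term if j % 2 == 1 else -term
    return g, min(max(p, 0.0), 1.0)


def significant_periods(series, alpha=0.05, max_harmonics=3):
    """Repeatedly test the largest periodogram peak; stop at the first non-significant one."""
    remaining = periodogram(series)
    found = []
    while remaining and len(found) < max_harmonics:
        _, p_value = fisher_g_test([power for _, power in remaining])
        if p_value > alpha:
            break
        peak = max(remaining, key=lambda item: item[1])
        found.append(peak[0])
        remaining.remove(peak)
    return found


# Synthetic 4-year monthly series with a 12-month cycle plus seeded noise.
rng = random.Random(0)
monthly = [10.0 + 3.0 * math.sin(2 * math.pi * t / 12) + rng.gauss(0.0, 0.5)
           for t in range(48)]
g_stat, p_value = fisher_g_test([power for _, power in periodogram(monthly)])
periods = significant_periods(monthly)
```

The periodogram only resolves periods of the form N/k, so a short record gives a coarse grid of candidate periods; longer histories sharpen the estimate.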


Application Example One (real-world data):
Problem description: We were given data for 5 consumer
prices (chicken, cucumber, eggs, beef and pork) between Jan. 1994 and Jan. 1996 for
one city in Asia. The original prices are shown in the upper-left picture below.
The goal was to use DataX™ to build
computer models that can accurately predict these 5 prices.


Solution: DataX was used to perform the following tasks (see
picture below):

1) Principal component
analysis on each price series, giving up to 3 principal harmonics. The
upper-right picture below shows the first harmonic of each series. The picture shows that
the period of the change in the price

(a) for chicken (blue) is 11 months,

(b) for cucumber (green) is 13 months,

(c) for eggs (red) is 18 months,

(d) for beef (light blue) is 36 months (curve is edged due to lack of
consumption data), and

(e) for pork (yellow) is 6 months.



2) Model Building & Validation
(wave analysis): Based on the given price data (original data), DataX was used to build a
model for each price series separately (assuming no correlation between
the 5 price series). These models were then used to regenerate the price data in order to
validate the models. If the regenerated data (simulated data) are close to the given data,
the model is acceptable and can be used to predict future prices. The
lower-left picture below shows the original data and the regenerated data for chicken.
They are very close, with error (red) near 0.
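
The validation step can be sketched as a comparison between the original and the regenerated series. The text does not state how the error is measured, so the mean absolute percentage error used below is an assumption, as are the function name and the three sample prices (they are not the data from this example).

```python
def mean_absolute_percentage_error(actual, simulated):
    """Average of |actual - simulated| / |actual|, expressed in percent."""
    if not actual or len(actual) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("percentage error is undefined for zero actual values")
    total = sum(abs((a - s) / a) for a, s in zip(actual, simulated))
    return 100.0 * total / len(actual)


# Made-up prices for illustration only.
original_prices = [10.0, 20.0, 40.0]
regenerated_prices = [10.5, 19.0, 40.0]
error_percent = mean_absolute_percentage_error(original_prices, regenerated_prices)
print(f"{error_percent:.2f}%")  # 3.33%
```

The same measure can be applied to held-out data to score forecasts, which keeps the validation and prediction errors directly comparable.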


3) Forecast (Prediction):
The models built with DataX for the 5 consumer prices were used to predict
future prices, and the prediction results are shown as 5 colored curves in
the lower-right picture below. The real-world data for July 1995 to March 1996 are also
displayed as black curves in the same picture. The picture
shows that DataX™ can produce very good
predictions, with an average prediction error of less than 3%.


Explanation of the 4 pictures shown below:

Upper-left: original data for 5 commodity prices for 5 years (5 data series
from history)

Upper-right: principal component analysis (first harmonic for each
of the 5 data series)

Lower-left: build a model for each data series and reconstruct the data with the
models (green), error < 3%

Lower-right: forecasted prices (color) vs. real-world prices (black) of
the 5 types of goods.



