Market Fluctuation Forecast
-- A  Fuzzy Time Series Analysis Approach to Data Mining

Abstract: Fuzzy time series analysis offers an innovative way of forecasting with higher accuracy than classical methods. Fuzzy logic allows linguistic concepts (unpredictable factors) to be included in data modeling, and prediction results are fuzzy numbers (soft values) rather than hard numeric values. Defuzzification then yields a highly accurate prediction. This method has been built into the DataX™ software suite, which is used in the example below to show the efficacy of the method.

The Issue of Forecast (Prediction)
Databases, data marts, and data warehouses are increasingly used in business, finance, engineering, medicine and many other fields. They contain large amounts of business activity data that can be used to forecast (predict) future trends in applications such as market forecasting, stock price prediction, and risk management. According to IBM research, only 7% of corporate data is ever used; most of it goes untouched. It is believed that the unused data contains valuable information characteristic of the business activity and can be used for forecasting and prediction.

Time Series Analysis
A time series is a collection of historical data for one system, such as a stock price, a consumer price, or the monthly (or yearly or daily) revenue of a company. Classical time series theory deals only with the randomness of the data, not with the fuzziness in the underlying system model. A time series can be used in two ways for different purposes:

looking backward - use historical data to analyze the past behavior of a system. Applications include diagnosis or recognition of machine faults or human diseases.

looking forward - use data to predict (forecast) the future behavior of a system. Applications include stock or price prediction and market demand forecasting.

Fuzzy Logic Combined with Time Series Analysis?
Human descriptions of physical systems are imperfect at best. Fuzziness (uncertainty) exists in many aspects of a system model, including missing or partial knowledge, incomplete or imperfect data collection, inconsistencies or discrepancies in data representation (such as numeric, logic, or symbolic), and various human or natural factors such as promotional activities to hit sales goals, changes in business rules, regulations and policies, unforeseen competition, weather changes, natural disasters, international conflicts and wars, and financial market crashes.

Fuzzy logic works in the gray area between "Yes" and "No." Fourier analysis is used for component analysis or cyclic analysis of a deterministic (certain) physical system, such as the harmonic analysis of a speech signal using an HP spectrometer. But most practical systems involve uncertain (but not random) factors, which call for special methodologies such as fuzzy logic to model and analyze them for better results. Examples of uncertain factors include fluctuations of consumer prices, stock prices, and sales revenues.

Linear Programming Problem - Scientists at Zaptron Systems, Inc. have developed a proprietary technology that combines fuzzy logic theory with Fourier analysis and time series forecasting. What distinguishes this technology is that it produces fuzzy numbers as prediction results, rather than the numeric values produced by most other methods. A defuzzification procedure then converts each fuzzy number into a highly accurate numeric prediction. This technology has been built into the Tool Base of the DataX™ software suite. It has been validated with real-world data, and the average prediction accuracy is 97%.
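To make the idea of a fuzzy prediction and its defuzzification concrete, here is a minimal Python sketch. It is not the DataX implementation, which is proprietary and not described here. The triangular shape of the fuzzy number, the centroid (center-of-gravity) rule, and the names TriangularFuzzyNumber, membership and centroid are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularFuzzyNumber:
    """Assumed shape of a fuzzy forecast: a triangle over (low, peak, high)."""
    low: float   # smallest plausible value (membership 0)
    peak: float  # most plausible value (membership 1)
    high: float  # largest plausible value (membership 0)

    def membership(self, x: float) -> float:
        """Degree (0..1) to which x belongs to this fuzzy number."""
        if x <= self.low or x >= self.high:
            # Outside the support, except for a degenerate spike at the peak.
            return 1.0 if x == self.peak else 0.0
        if x <= self.peak:
            return (x - self.low) / (self.peak - self.low)
        return (self.high - x) / (self.high - self.peak)

    def centroid(self) -> float:
        """Centroid defuzzification: the center of gravity of the triangle."""
        return (self.low + self.peak + self.high) / 3.0


# Hypothetical fuzzy forecast of a price: "about 10, possibly 9 to 12.5".
forecast = TriangularFuzzyNumber(low=9.0, peak=10.0, high=12.5)
print(forecast.membership(9.5), forecast.centroid())
```

The centroid pulls the crisp value toward the longer side of the triangle, which is one simple way a soft forecast can carry asymmetric uncertainty into the final number.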

Simplified Mathematics of Fuzzy Time Series Analysis Techniques
Consider a time series X(t) for a system, where t is the time index and X(t) is the value (number) the system takes at time t. In general, X(t) can be represented as the sum of the following 4 terms (components):

X(t) = Fl(t) + Fc(t) + Fu(t) + e(t)


Fl(t) is a linear function of time, representing a long-term component (the growth trend)

Fc(t) is the cyclic component representing the seasonal or periodic fluctuations of data

Fu(t) is the uncertain component representing the fuzziness in the data model

e(t) is a model noise term representing the randomness of the system

By Fourier series theory, the above model can represent any time series X(t) with great accuracy. In addition, by introducing the fuzzy term Fu(t), it is possible to incorporate human experience and expertise (knowledge) into the model. Using Fourier series theory, one can find the harmonics, that is, the principal components (periods), that exist within the data.
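As a rough illustration of the model X(t) = Fl(t) + Fc(t) + Fu(t) + e(t), the following Python sketch fits only the linear trend Fl(t) and the cyclic part Fc(t) by ordinary least squares, leaving Fu(t) + e(t) as the residual. It assumes the periods are already known and does not model the fuzzy term. The function names (solve_linear_system, design_row, fit_trend_and_cycles, evaluate) and the synthetic series are hypothetical, not part of DataX.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def design_row(t, periods):
    """Basis at time t: intercept, slope, and a cos/sin pair per period."""
    row = [1.0, float(t)]
    for p in periods:
        w = 2.0 * math.pi * t / p
        row += [math.cos(w), math.sin(w)]
    return row


def fit_trend_and_cycles(x, periods):
    """Least-squares coefficients of Fl(t) + Fc(t) via the normal equations."""
    rows = [design_row(t, periods) for t in range(len(x))]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atx = [sum(r[i] * xi for r, xi in zip(rows, x)) for i in range(k)]
    return solve_linear_system(ata, atx)


def evaluate(coef, t, periods):
    """Model value Fl(t) + Fc(t) at time t (also usable for future t)."""
    return sum(c * b for c, b in zip(coef, design_row(t, periods)))


# Synthetic monthly series: trend plus one 12-month cycle (made-up numbers).
series = [10 + 0.05 * t + 2 * math.cos(2 * math.pi * t / 12) for t in range(36)]
coef = fit_trend_and_cycles(series, periods=[12])
residual = [x - evaluate(coef, t, [12]) for t, x in enumerate(series)]
print(coef, max(abs(r) for r in residual))
```

Fitting trend and harmonics jointly, rather than one after the other, avoids the bias that arises because a straight line and a sinusoid are not orthogonal over a finite sample.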

Discover Seasonal Components in Business Activity Data
Business activity data contain information that reflects the seasonal or periodic behavior of business activities. These seasonal characteristics are represented by the hidden harmonics fi that exist in the data sequence, with periods such as 6 months, 18 months, etc. From Fourier theory, the first few harmonics contain most of the energy of a data sequence, and they are sufficient to represent the information in the data. For most practical purposes, the first n harmonics, f1, f2, ... and fn, are used. Fisher's criterion is applied iteratively to determine the number n of possible hidden harmonics in a data sequence, usually at a confidence level of 95% (a significance level of 5%).
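The harmonic search described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the DataX procedure: it computes a plain periodogram, applies Fisher's g test with its exact p-value under a white-noise null hypothesis, and iterates by removing the largest periodogram ordinate and re-testing the rest. The names periodogram, fisher_g_test and significant_periods, the cap of 3 harmonics, and the synthetic series are hypothetical.

```python
import cmath
import math
import random


def periodogram(x):
    """Return [(period, power)] at Fourier frequencies k = 1 .. (N-1)//2."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    out = []
    for k in range(1, (n - 1) // 2 + 1):
        coef = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                   for t, v in enumerate(centered))
        out.append((n / k, abs(coef) ** 2 / n))
    return out


def fisher_g_test(powers):
    """Fisher's g statistic and its exact p-value for m periodogram ordinates."""
    m = len(powers)
    g = max(powers) / sum(powers)
    p = 0.0
    # P(g > x) = sum_{j=1}^{floor(1/x)} (-1)^(j-1) C(m, j) (1 - j x)^(m-1)
    for j in range(1, int(1.0 / g) + 1):
        term = math.comb(m, j) * (1.0 - j * g) ** (m - 1)
        p += term if j % 2 == 1 else -term
    return g, min(max(p, 0.0), 1.0)


def significant_periods(x, alpha=0.05, max_harmonics=3):
    """Repeat the g test, dropping the strongest ordinate each round."""
    spectrum = periodogram(x)
    found = []
    while spectrum and len(found) < max_harmonics:
        _, p_value = fisher_g_test([power for _, power in spectrum])
        if p_value >= alpha:
            break  # the largest remaining peak is not significant
        best = max(spectrum, key=lambda item: item[1])
        found.append(best[0])
        spectrum.remove(best)
    return found


# Synthetic monthly data with a 12-month cycle plus noise (made-up numbers).
rng = random.Random(0)
prices = [5 + 3 * math.cos(2 * math.pi * t / 12) + rng.gauss(0, 0.3)
          for t in range(48)]
print(significant_periods(prices))
```

With 48 monthly points, a 12-month cycle falls exactly on a Fourier frequency (k = 4), so it is reported first. Periods that do not divide the series length evenly leak into neighboring ordinates, which is one reason a production tool would likely refine the estimate beyond this sketch.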

Application Example One (real-world data):
Problem description - We were given data for 5 consumer prices (chicken, cucumber, eggs, beef and pork) between Jan. 1994 and Jan. 1996 for one city in Asia. The original prices are shown in the upper-left picture below. The goal was to use DataX™ to build computer models that can accurately predict these 5 prices.

Solution - DataX was used to perform the following tasks (see picture below):

1) Principal component analysis on each price series, giving up to 3 principal harmonics. The upper-right picture below shows the first harmonic of each series. The picture shows that the period of the change in the price

(a) for chicken (blue) is 11 months

(b) for cucumber (green) is 13 months

(c) for eggs (red) is 18 months,

(d) for beef (light blue) is 36 months (the curve is jagged due to lack of consumption data), and

(e) for pork (yellow) is 6 months

2) Model Building & Validation (wave analysis) - Based on the given price data (original data), DataX was used to build a model for each price series separately (assuming no correlation between the 5 price series). These models were then used to re-generate the price data in order to validate the models. If the re-generated data (simulated data) are close to the given data, the model is acceptable and can be used to predict future prices. The lower-left picture below shows the original data and the re-generated data for chicken. They are very close, with error (red) near 0.

3) Forecast - Prediction: The models built with DataX for the 5 consumer prices were used to predict future prices, and the prediction results are shown as 5 colored curves in the lower-right picture below. The real-world data for July 1995 to March 1996 are also displayed as black curves in the same picture. The comparison shows that DataX™ produces very good predictions, with an average prediction error of less than 3%.
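The text does not say how the average prediction error is computed. One common choice, assumed here purely for illustration, is the mean absolute percentage error (MAPE) between predicted and real prices. The function name and the price values in this Python sketch are hypothetical and are not taken from the study above.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up monthly prices: real-world values versus model forecasts.
real_prices = [10.0, 10.5, 11.0]
forecast_prices = [10.2, 10.4, 10.8]
error_pct = mean_absolute_percentage_error(real_prices, forecast_prices)
print(f"average prediction error: {error_pct:.2f}%")
```

Under this measure, a claim such as "error less than 3%" corresponds to a MAPE below 3.0 over the forecast months, and the matching accuracy figure is 100 minus the MAPE.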

Explanation of the 4 pictures shown below:

Upper-left - original data for 5 commodity prices for 5 years (5 historical data series)

Upper-right - principal component analysis (first harmonic for each of the 5 data series)

Lower-left - build a model for each data and reconstruct data with the models (green), error < 3%

Lower-right - forecasted prices (color) vs. real-world prices (black) of the 5 types of goods.

[Figure: w11.gif, the four panels described above]


Last updated September 05, 2000.
Copyright 1997-2000, ZAPTRON Systems, Inc. All rights reserved.