Trading System Development: Data Transforms

By: Murray Ruggiero

The following is an excerpt from Murray Ruggiero's Cybernetic Trading Strategies

There is an almost infinite number of different types of transforms you can use to preprocess data.  Let’s discuss some of the general types of transforms in detail.

Standard Technical Indicators

Standard technical indicators and proprietary indicators used by market analysts are great sources for data transforms for preprocessing.  The most popular indicators to use in a neural network are MACD, stochastics, and ADX.  Usually, these three are used together because stochastics work in trading range markets and MACD works in trending markets.  ADX is used to identify when the market is trending versus not trending.

Another use of technical indicators as data transforms is to further transform the output of one or more indicators to extract what an indicator is trying to reveal about the market.  When using technical indicators in a neural network, I have found that the intermediate calculations used to make the indicators are often powerful inputs for a model.  As an example, let’s look at RSI, which is calculated as follows: RSI = 100 – (100/(1 + RS)), where RS = (average of net up closes for a selected number of days)/(average of net down closes for the same number of days).  We can use each of these averages as input to a neural network.
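As a rough sketch of how the intermediate RSI averages could be exposed as separate inputs, the function below returns the average up move, the average down move, and the resulting RSI. The function name, the default of 14 days, and the use of simple (not Wilder-smoothed) averages are assumptions made for illustration.

```python
def rsi_components(closes, n=14):
    """Return (avg_up, avg_down, rsi) computed over the last n close-to-close changes.

    Assumption: simple averages over n changes, not Wilder's smoothing.
    """
    if len(closes) < n + 1:
        raise ValueError("need at least n + 1 closes")
    # Pair each close with the next one to get the last n changes.
    changes = [b - a for a, b in zip(closes[-n - 1:-1], closes[-n:])]
    avg_up = sum(c for c in changes if c > 0) / n
    avg_down = sum(-c for c in changes if c < 0) / n
    if avg_down == 0:
        rsi = 100.0  # no down closes: RS is unbounded, so RSI sits at its ceiling
    else:
        rs = avg_up / avg_down
        rsi = 100.0 - 100.0 / (1.0 + rs)
    return avg_up, avg_down, rsi
```

Under these assumptions, avg_up and avg_down can be fed to a model alongside, or instead of, the finished RSI value.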

Data Normalization

Data normalization is a very important data transform in developing preprocessing for almost every modeling method.  Let’s now examine two classic methods of normalization:

  • Normalize the data from 0 to 1 and use the formula as follows:
    X = (Value – Lowest(Value, n))/(Highest(Value, n) – Lowest(Value, n))

    If you want to scale between -1 and 1, subtract 0.5 and then multiply by 2.

  • Normalize relative to the mean and standard deviation.  An example of this calculation is as follows:
    X = (Value – Average(Value, n))/StdDev(Value, n)
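The two normalization methods can be sketched as follows, assuming a plain list of values and a lookback of n bars. The function names, the use of the population standard deviation, and the fallback values for a flat window are assumptions, not part of the original text.

```python
from statistics import mean, pstdev


def minmax_normalize(values, n):
    """Scale the latest value to the 0..1 range of the last n values."""
    window = values[-n:]
    lo, hi = min(window), max(window)
    if hi == lo:
        return 0.5  # assumption: a flat window maps to the midpoint
    return (values[-1] - lo) / (hi - lo)


def to_signed(x):
    """Rescale a 0..1 value to -1..1: subtract 0.5, then multiply by 2."""
    return (x - 0.5) * 2


def zscore_normalize(values, n):
    """Express the latest value in standard deviations from the n-bar mean."""
    window = values[-n:]
    sd = pstdev(window)  # assumption: population standard deviation
    if sd == 0:
        return 0.0  # assumption: a flat window maps to zero
    return (values[-1] - mean(window)) / sd
```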

Percent or Raw Differences

One of the most common transforms used in developing any predictive model is the difference or momentum type of transform.  There are several widely used transforms of this type, as follows:
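As a hedged illustration of difference-type transforms, the sketch below computes a raw difference, a percent difference, and a log ratio against the value n bars earlier. These three forms and the function names are assumptions chosen for illustration; dividing by the earlier value in the percent form is also a choice made here.

```python
import math


def raw_difference(values, n):
    """Latest value minus the value n bars earlier."""
    return values[-1] - values[-1 - n]


def percent_difference(values, n):
    """Change over n bars as a fraction of the earlier value (an assumption)."""
    return (values[-1] - values[-1 - n]) / values[-1 - n]


def log_ratio(values, n):
    """Natural log of the latest value over the value n bars earlier."""
    return math.log(values[-1] / values[-1 - n])
```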

Percent or Raw Differences Relative to the Mean

Percent or raw differences from the mean are also popular data transforms.  There are many different variations on these transforms, and many different types of moving averages can even be used.  Some of the basic variations for this type of transform are shown in the table below.  The moving average (MA) can be any type or length.

X = Value – MA

X = Log(Value/MA)

X = MAShort – MALong

X = Log(MAShort/MALong)

X = (Value – MA)/Value
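The five variations listed above can be sketched with a simple moving average, although the text allows any type or length. The function names, the dictionary keys, and the choice of the long average as "MA" are assumptions for illustration.

```python
import math


def sma(values, length):
    """Simple moving average of the last `length` values."""
    if len(values) < length:
        raise ValueError("not enough data for the requested length")
    return sum(values[-length:]) / length


def ma_transforms(values, short_len, long_len):
    """Return the five mean-relative transforms for the latest bar."""
    value = values[-1]
    ma = sma(values, long_len)         # plays the role of MA and MALong
    ma_short = sma(values, short_len)  # plays the role of MAShort
    return {
        "value_minus_ma": value - ma,
        "log_value_over_ma": math.log(value / ma),
        "short_minus_long": ma_short - ma,
        "log_short_over_long": math.log(ma_short / ma),
        "pct_from_ma": (value - ma) / value,
    }
```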

Another variation on this theme is the difference between price and a block moving average.  An example of this is as follows: X = Value1 – MA[centered n], where MA[centered n] is a moving average centered n days ago.

We can also have a log transform or percent transform on this theme:

X = Log(Value1/MA[centered n])
X = (Value1 – MA[centered n])/MA[centered n]
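One possible reading of MA[centered n] is a simple average over an odd-length window whose middle bar sits n bars before the latest bar. The sketch below follows that reading; the window construction, the function names, and the requirement that the window not reach past the latest bar are all assumptions.

```python
import math


def centered_ma(values, n, length):
    """Average of an odd-length window centered n bars before the latest bar."""
    half = length // 2
    if length % 2 == 0 or half > n:
        raise ValueError("length must be odd and fit within n bars of the latest bar")
    center = len(values) - 1 - n
    if center - half < 0:
        raise ValueError("not enough data")
    window = values[center - half:center + half + 1]
    return sum(window) / length


def centered_transforms(values, n, length):
    """Return (raw difference, log ratio, percent difference) versus the centered MA."""
    value = values[-1]
    ma = centered_ma(values, n, length)
    return value - ma, math.log(value / ma), (value - ma) / ma
```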

Multibit Encoding

The next type of data transform we will discuss is multibit encoding, a type of encoding that is valuable in many different types of transforms.  One good use of it is for encoding day of week or month of year.  When developing a model, you should not code days of the week by using a single number.  For instance, you should not use 2 for Tuesday because a 3 for Wednesday might be considered (incorrectly) a higher value for output.  The effects of the day of week coding are not based on the actual day’s values.  Instead, these values need to be encoded into discrete values, as shown here:

    M    T    W    T    F
    0    1    0    0    0

This encoding would be for a Tuesday.
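A minimal sketch of this day-of-week encoding, assuming a five-day trading week and Python's standard `datetime.date.weekday()` numbering (Monday is 0). The function name is hypothetical.

```python
import datetime


def encode_weekday(date):
    """Return a five-element list with a single 1 marking the trading day (Mon..Fri)."""
    idx = date.weekday()  # Monday = 0 ... Sunday = 6
    if idx > 4:
        raise ValueError("weekend dates are not encoded")
    return [1 if i == idx else 0 for i in range(5)]
```

For example, `encode_weekday(datetime.date(2024, 1, 2))` covers a Tuesday and yields the pattern shown above.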

Another type of encoding uses a thermometer-type scale.  Here is how we would encode ADX into a thermometer-type encoding.  This encoding would represent an ADX value between 30 and 40:

    >10    >20    >30    >40
     1      1      1      0

This type of encoding works well when the critical levels of raw input are known.  This encoding also makes a good output transform because if the encoded output from a model does not follow the pattern, we know that the forecast may not be reliable.  For example, if the model output 1, 0, 1, 0, we could not be sure that ADX is really over 30, because that pattern does not follow our encoding.  The reliability of a forecast can often be judged by designing a multiple-bit output: for example, two outputs, one of which is the opposite of the other.  We would take the predictions only when they agree.  This method is frequently used in neural network applications.
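The thermometer encoding and the two consistency checks described above might be sketched like this. The function names and the 0.5 cutoff for the paired outputs are assumptions; the thresholds are the ADX levels from the example.

```python
def thermometer_encode(value, thresholds=(10, 20, 30, 40)):
    """Set one bit per threshold that the value exceeds."""
    return [1 if value > t else 0 for t in thresholds]


def is_valid_thermometer(bits):
    """A valid pattern never has a 1 after a 0 (all 1s come first)."""
    return all(a >= b for a, b in zip(bits, bits[1:]))


def paired_outputs_agree(primary, opposite, cutoff=0.5):
    """Two opposite outputs agree when exactly one of them is above the cutoff.

    The cutoff of 0.5 is an assumption for outputs scaled 0..1.
    """
    return (primary > cutoff) != (opposite > cutoff)
```

A model output that fails `is_valid_thermometer`, or a pair that fails `paired_outputs_agree`, would be treated as an unreliable forecast and skipped.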

Pre-filtering Raw Data before Further Processing

One of my best methods, when I am predicting short-term time frames all the way from daily to intraday data, is to first process the data using a low-lag filter, before applying other data transforms.  Low-lag filters are a special type of adaptive moving average.  For example, a Kalman filter is a moving average with a predictive component added to remove most of the lag.

In my research in this area, I use a moving average called the Jurik AMA, developed by Jurik Research.  I apply it to the raw data with a very fast smoothing constant of three.  This removes a lot of the noise and induces only about one bar of lag.  After applying the Jurik AMA, I then transform the data normally.  When developing a target for these models, I use the smoothed data.
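The filter-first workflow can be sketched as a two-step pipeline. The Jurik AMA is a proprietary filter and is not reproduced here; the sketch swaps in a plain exponential moving average purely as a stand-in, and it does not have the same low-lag behavior. The function names and the alpha parameter are assumptions.

```python
def ema_filter(values, alpha):
    """Plain exponential smoothing; a stand-in, NOT the Jurik AMA."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


def filtered_momentum(values, alpha=0.5, n=1):
    """Smooth first, then take an n-bar raw difference of the smoothed series."""
    smoothed = ema_filter(values, alpha)
    # The first n bars have no earlier smoothed value to difference against.
    return [None] * n + [smoothed[i] - smoothed[i - n] for i in range(n, len(smoothed))]
```

Any other transform in this article could take the place of the momentum step; the point of the sketch is only the ordering, filter first and transform second.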

Trading System Signals

Another data transform you can use is the trading signals from simple trading systems.  For example, you can use the signals generated by some of the systems found in my book as inputs to a neural network.  To illustrate, if your system is long, output a 1; if it is short, output a -1.  You could also output a 2 for the initial long signal, a -2 for the initial short signal, and just a 1 or -1 for staying long or short.  Even if you don’t use the signals themselves, the components that generate the signals are often very good transforms for your models.
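A small sketch of the second signal coding, assuming the trading system's position is already available as a list of 1 (long) and -1 (short) values, one per bar. The function name is hypothetical, and treating the first bar as an initial signal is an assumption.

```python
def encode_positions(positions):
    """Map a +1/-1 position series to 2/-2 on entry bars and 1/-1 on holding bars."""
    encoded = []
    previous = None  # assumption: the first bar counts as an initial signal
    for p in positions:
        if p not in (1, -1):
            raise ValueError("positions must be 1 (long) or -1 (short)")
        encoded.append(2 * p if p != previous else p)
        previous = p
    return encoded
```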

Correlation Analysis

Intermarket relationships are a very powerful source of preprocessing for your models.  Though intermarket relationships do not always work, by using Pearson’s correlation we can judge how strong the relationship currently is and how much weight to put on it.  One of the classic ways to do this is to take the correlation between another transform and your target shifted back in time, and use that correlation as an input to your neural network.
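One way to build such a correlation input is a rolling Pearson correlation between a lagged intermarket transform and the target. The sketch below pairs the transform from `lag` bars earlier with the target at each bar, over a trailing window. The function names, the window and lag parameters, and this particular alignment are assumptions, and both series are assumed to have the same length.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a flat series is treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def rolling_lagged_correlation(transform, target, window, lag):
    """Correlate transform[t - lag] with target[t] over a trailing window.

    Returns None for bars without enough history.
    """
    out = []
    for t in range(len(target)):
        start = t - window + 1
        if start - lag < 0:
            out.append(None)
            continue
        xs = transform[start - lag:t - lag + 1]
        ys = target[start:t + 1]
        out.append(pearson(xs, ys))
    return out
```

The resulting series can then be passed to the model as one more input, so that the network sees how strong the relationship currently is.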

Outputs of Various Other Modeling Methods

Another powerful method is to use the output produced by various other modeling methods.  For example, you can use the dominant cycle, prediction, or phase produced by the maximum entropy method (MEM), and then apply data transforms to those data.  One valuable transform is to take a simple rate of change in the dominant cycle.  This transform is valuable because, without it, when the dominant cycle is getting longer, you will be too early in predicting turning points, and when it is getting shorter, you will be too late.
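A simple rate of change in the dominant cycle could be computed as below, assuming the dominant cycle lengths have already been produced by some external cycle-analysis routine (not shown here). The function name and the n-bar difference form are assumptions.

```python
def rate_of_change(series, n=1):
    """n-bar change of a series such as the dominant cycle length, in bars."""
    # The first n entries have no earlier value to compare against.
    return [None] * n + [series[i] - series[i - n] for i in range(n, len(series))]
```

A positive value means the cycle is lengthening, and a negative value means it is shortening.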