Using Machine Induction for Developing Trading Rules

By: Murray Ruggiero

The following is an excerpt from Murray Ruggiero's Cybernetic Trading Strategies

Machine induction methods extract rules from data.  The classic use of machine induction is to develop sets of rules that classify a target output based on input variables.  We can use these methods to build trading strategies by developing rules that predict the output class of a target.  For example, we can predict whether the market will be higher or lower five days from now.  When developing rules using a machine induction method such as C4.5 or rough sets, both of which are based on supervised learning, the process is very similar to the one used for a neural network.  Our first step is to select the target we want to predict.  Next, we need to develop preprocessing that is predictive of that target.  The main difference between developing a machine induction application and one using neural networks is that both the inputs and outputs must be made into discrete variables.

When developing our output classes, we use human expertise to either select a discrete set of outputs or convert a continuous output into a series of discrete values.  One very simple but useful method is to use the sign of a standard continuous output such as a five-day percentage change: for example, negative output = -1 and positive output = +1.  When using more complex output classes, we should limit the number of classes to fewer than one class per 500 training cases.
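The sign-based target described above can be sketched in a few lines of Python.  This is only a minimal illustration: the function name, the sample closes, and the choice to put a zero change in the -1 class are assumptions, since the text specifies only the negative and positive cases.

```python
def five_day_sign_target(closes, horizon=5):
    """Label each bar +1 or -1 by the sign of the horizon-day percentage change.

    Bars without a close `horizon` days ahead get None (no target available).
    Assumption: a change of exactly zero is placed in the -1 class.
    """
    labels = []
    for i, close in enumerate(closes):
        if i + horizon >= len(closes):
            labels.append(None)
            continue
        pct_change = (closes[i + horizon] - close) / close * 100.0
        labels.append(1 if pct_change > 0 else -1)
    return labels


# Made-up closing prices, for illustration only.
closes = [100.0, 101.0, 99.5, 102.0, 103.0, 104.0, 98.0, 97.0]
labels = five_day_sign_target(closes)
```

With eight closes and a five-day horizon, only the first three bars receive a label; the rest have no future close to compare against and must be dropped from the training cases.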

Let’s now discuss how we can use a human expert to develop discrete values for our input variables.  A human expert might set the number of classes and their levels, or just the number of classes.  We can then use statistical analysis or machine learning methods to find the correct levels.  As an example, if we were using Slow K as an input, we could break it into three classes: (1) overbought, (2) neutral, and (3) oversold.  We could set the levels based on domain expertise.  Normally, in this case, we would use 30 and 70.  We could also use various statistical and machine learning methods to find these levels.  For example, when analyzing the T-Bond market, I found that the critical levels for Slow K are 30 and 62.  These are close to our standard values, and if we were to collect these statistics across multiple markets, they would probably be even closer to the standard levels, but we can fine-tune performance in a given market using these analytical methods.
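The three-class breakdown of Slow K might look like the following sketch.  The function name and the handling of values that fall exactly on a level are assumptions; the levels themselves (30 and 70 as the standard values, 30 and 62 for T-Bonds) come from the text.

```python
def classify_slow_k(value, oversold=30.0, overbought=70.0):
    """Map a Slow K reading to one of three discrete classes.

    Assumption: readings exactly on a level belong to the outer class.
    """
    if value >= overbought:
        return "overbought"
    if value <= oversold:
        return "oversold"
    return "neutral"


# The same reading lands in different classes under the standard levels
# and under the T-Bond levels quoted in the text.
standard_class = classify_slow_k(65.0)
tbond_class = classify_slow_k(65.0, oversold=30.0, overbought=62.0)
```

Keeping the levels as parameters makes it easy to start from the expert's standard values and then substitute market-specific levels found by statistical analysis.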

One problem with using automatic methods for generating classes is that sometimes the classes will cover a very small range of values and be based on a statistical artifact without any cause-and-effect relationship.  For example, if we have 10 cases when stochastics were between 51 and 53, and all 10 times the market rose, our statistical methods might develop an input class with a range between 51 and 53.  A human expert would know that this is just curve fitting and would not use such a class.  Machine-generated classes are usually good, but they should be filtered with the expertise of a human trader.
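A simple screen can at least bring classes like the 51-to-53 example to a trader's attention.  The sketch below is hypothetical: the function name, the minimum width, the minimum case count, and the case counts in the sample data are all made up for illustration, and it is meant as an aid to the human review described above, not a replacement for it.

```python
def flag_suspect_classes(classes, min_width=10.0, min_cases=30):
    """Return classes that are too narrow or rest on too few cases.

    Each class is a (low, high, n_cases) tuple.  The default thresholds
    are arbitrary placeholders, not recommended values.
    """
    suspects = []
    for low, high, n_cases in classes:
        if (high - low) < min_width or n_cases < min_cases:
            suspects.append((low, high, n_cases))
    return suspects


# Hypothetical machine-generated stochastic classes: (low, high, cases).
candidate_classes = [
    (0.0, 30.0, 412),
    (30.0, 51.0, 655),
    (51.0, 53.0, 10),   # the narrow class from the example in the text
    (53.0, 100.0, 890),
]
suspects = flag_suspect_classes(candidate_classes)
```

Only the narrow 51-to-53 class is flagged here; whether to merge it into a neighbor or discard it remains a judgment call for the trader.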

After developing our input and output classes, we need to divide our data into development, testing, and out-of-sample sets.  We then apply the machine learning method to the data.  If we are using C4.5 or another decision-tree type method, we can develop these trees and then prune their leaves in order to develop the best rules with the fewest terms.  These rules then need to be tested further.  If we are using rough sets, which are currently available only as DataLogic/R from Reduct Systems, we need to assign a roughness level and a precision to our model.  The roughness level controls the complexity of the rules, and the precision determines the accuracy of the rules in identifying the output classes.  If we use a high level of roughness, such as .90, we will develop very simple rules with few terms, and these should generalize well.  A high roughness level also removes the pruning step that is normal when developing rules with a commercial decision-tree type product.
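The three-way division of the data can be sketched as follows.  The text does not say how the sets are chosen; this example assumes a chronological split (oldest data for development, newest held back as out-of-sample), and the 60/20/20 proportions and function name are placeholders.

```python
def split_chronologically(rows, dev_pct=60, test_pct=20):
    """Split time-ordered rows into development, testing, and out-of-sample sets.

    Assumption: rows are sorted oldest to newest, and the most recent
    data is reserved as the out-of-sample set.
    """
    n = len(rows)
    dev_end = n * dev_pct // 100
    test_end = dev_end + n * test_pct // 100
    return rows[:dev_end], rows[dev_end:test_end], rows[test_end:]


# Stand-in for 1,000 time-ordered training cases.
rows = list(range(1000))
development, testing, out_of_sample = split_chronologically(rows)
```

Splitting by position rather than at random keeps every out-of-sample case later in time than the cases used to develop and test the rules.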

After we have developed our rules, we will select the best candidates for trading.  When we have finished this selection process, we need to develop a trading strategy.  For example, if we are predicting whether a market will be higher or lower five days from now, we must decide how we are going to exit the trade.  We can use any number of classic exit methods, or we can just hold our position for the look-ahead period of the target.  We must be careful not to write our exits so that we exit and reenter in the same direction and at the same price on the same day.  This would happen, for example, if we exited on the next open after being in a trade for five days, and our rule for entering the market also generated our exit.  We also must realize that when we predict a classic target like percentage change and use it as a part of a trading strategy, the number of trades will be much smaller than the number of supporting cases for the rules, because a rule will often be true for several days in a row.
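The gap between supporting cases and trades can be seen in a small simulation.  This is a sketch under stated assumptions: the function name and signal series are made up, positions are held for the five-day look-ahead period, and if the rule is still true when the holding period ends, the position is simply kept open rather than closed and reopened, which is one possible way to avoid the same-day exit and reentry described above.

```python
def count_trades(signals, hold_days=5):
    """Count distinct trades produced by a daily True/False rule signal.

    Assumptions: one position at a time, held for `hold_days` bars; if the
    rule is still true at the scheduled exit, the holding period is
    extended instead of exiting and re-entering on the same bar.
    """
    trades = 0
    in_position = False
    exit_index = 0
    for i, fired in enumerate(signals):
        if in_position and i >= exit_index:
            if fired:
                exit_index = i + hold_days  # stay in; no new trade counted
            else:
                in_position = False
        elif not in_position and fired:
            trades += 1
            in_position = True
            exit_index = i + hold_days
    return trades


# Made-up signal: the rule is true on five days, in two separate runs.
signals = [True, True, True, False, False, False, False, False,
           True, True, False, False, False, False, False, False]
supporting_cases = sum(signals)
trades = count_trades(signals)
```

Here five supporting cases collapse into two trades, because the repeated signals inside each run arrive while a position is already open.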