Backtesting/Optimization

 

                Now that you have a strategy and some data, it is time to enter the backtesting/optimization phase. In my opinion, this is where Matlab trounces the competition. Writing code to backtest your strategy in Matlab is very intuitive, but there are a few pitfalls that I have fallen into in the past, and I would like to take a minute to warn you about them.

 

 

Discretizing the Data

                So you have your data read into Matlab (if you don’t, click on the data acquisition tab above), but if it is intraday data then it is probably not in a useful form. Backtesting on data sampled at uniform time intervals is preferable to backtesting on data that is not, such as tick-by-tick intraday data. I suggest you bundle the data based on the granularity your strategy requires. That is, a lower frequency strategy might need the data bundled into 1 minute intervals, while a higher frequency strategy may need the data bundled into 1 second intervals.

                Bundling the data allows you to implement time based strategies more efficiently. A classic example is the moving average strategy. Let’s say you want to go long when the 5 minute moving average crosses over the 20 minute moving average: if you do not bundle your data, this is going to be very hard to test. Bundling data also allows you to obtain a more precise Sharpe Ratio, as you can easily annualize your returns based on the size of your bundle.
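As a rough illustration of why bundled data makes the moving average example easy, here is a minimal sketch of the 5 minute versus 20 minute crossover on 1 minute bars. The price series is synthetic and the variable names (price, fastN, slowN) are made up for this example; it only relies on the base Matlab function filter.

```matlab
% Sketch: 5-bar vs 20-bar moving average crossover on 1 minute bars.
% 'price' is a hypothetical column vector of 1 minute prices (synthetic here).
rng(0);                                   % reproducible synthetic data
price = 100 + cumsum(0.05 * randn(390, 1));

fastN = 5;                                % 5 minute window
slowN = 20;                               % 20 minute window

% Trailing averages: filter uses only the current and past bars.
fastMA = filter(ones(fastN, 1) / fastN, 1, price);
slowMA = filter(ones(slowN, 1) / slowN, 1, price);

% Ignore the warm-up period where the slow window is not yet full.
valid = (1:numel(price))' >= slowN;

% Long (1) while the fast average is above the slow one, flat (0) otherwise.
position = double(fastMA > slowMA & valid);

% A crossover is the bar where the position flips from 0 to 1.
crossUp = [false; diff(position) == 1];
fprintf('Long entries signalled: %d\n', sum(crossUp));
```

Because every bar covers the same amount of time, a window of 5 rows really is 5 minutes, which is not true of raw tick data.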

                Bundled data should look something like (date, price, volume). The way you calculate the price and volume is up to you. Your strategy might call for the price to be the last price executed before a specific time, or a WAP (weighted average price) of the stock over the time between two bundles. Different strategies call for different bundling techniques.
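One possible way to do the bundling is with accumarray. The sketch below uses a handful of made-up ticks (tickTime, tickPrice and tickVol are hypothetical names) and builds 1 minute bundles with both the last price and a volume-weighted average price. It assumes the ticks are sorted by time and that every bundle contains at least one tick; empty bundles would need extra handling.

```matlab
% Hypothetical tick data: time in seconds since the open, price, size.
tickTime  = [3; 20; 59; 61; 95; 130; 170; 179];
tickPrice = [10.00; 10.02; 10.01; 10.05; 10.04; 10.10; 10.08; 10.09];
tickVol   = [100; 200; 100; 300; 100; 400; 100; 500];

barSeconds = 60;                              % bundle size
barIdx = floor(tickTime / barSeconds) + 1;    % bundle number of each tick

% Total volume in each bundle.
barVol = accumarray(barIdx, tickVol);

% Last traded price in each bundle (ticks are assumed sorted by time,
% and every bundle is assumed to contain at least one tick).
lastTick = accumarray(barIdx, (1:numel(tickTime))', [], @max);
barLast  = tickPrice(lastTick);

% Volume-weighted average price in each bundle.
barWAP = accumarray(barIdx, tickPrice .* tickVol) ./ barVol;

% One row per bundle: (end time in seconds, last price, WAP, volume).
bars = [(1:numel(barVol))' * barSeconds, barLast, barWAP, barVol];
disp(bars);
```

Changing barSeconds to 1 gives the 1 second bundles a higher frequency strategy might want.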

 

Look Ahead Bias

                This is a common mistake that every trader has made at least once. Essentially, the look ahead bias comes from making a strategy that trades on information that would not have been available at the time of the trade. A super simple example: if your program buys the SPY when it is 5% below the day’s high and sells at the day’s close, then you are committing a look ahead crime, because it is impossible to know what the day’s high will be before the close. Look ahead bias is just trading in the present with data from the future. It can be easily avoided by asking yourself whether the information behind each trade would actually have been available at the time of the trade.
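To make the day's high example concrete, here is a small sketch with made-up prices that compares the look ahead version of the rule with a version that only uses the highest price seen so far. It uses cummax, which is available in Matlab R2014b and later; the prices and variable names are invented for illustration.

```matlab
% Hypothetical intraday prices for one day, in time order.
price = [100; 102; 101; 96; 97; 104; 98];

% Look ahead version: uses the high of the whole day, which is only
% known after the close.
dayHigh     = max(price);
biasedEntry = price <= 0.95 * dayHigh;

% Causal version: uses only the highest price seen up to each bar.
highSoFar = cummax(price);
fairEntry = price <= 0.95 * highSoFar;

% Columns: look ahead signal, causal signal.
disp([biasedEntry, fairEntry]);
```

In this toy series the look ahead rule fires on bars 4, 5 and 7, while the causal rule fires only on bars 4 and 7. The extra entry on bar 5 exists only because the backtest already knew the 104 print that arrives on bar 6.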

 

Over Fitting Bias

                The over fitting bias goes by many names: the data snooping bias, the data mining bias, the type three error bias, the data dredging bias. Simply put, the over fitting bias is when you over fit a model to a certain amount of data. That is, you arrive at a strategy that gives you a very high Sharpe ratio only because you have tested a huge number of hypotheses against a dataset. The resulting strategy does not have any value in the real world, but worked in the past simply because something had to have worked and you have snooped around the data long enough to find it!

 

Cross Validation: the Over Fitting Bias Cure

                There are two ways to be sure you are not committing the over fitting bias: the scientific one and the common sense one. The common sense one is easy and should be employed first. Ask yourself this question: why does your strategy make money? That is, why should your strategy work? If you can’t answer this question then you potentially have a big problem. There should be a logical basis for every trade your program makes other than “it worked in the past.” I should note here that sometimes strategies do work for reasons that we can’t figure out at the present time. We can separate these strategies from the happy accidents using the methodology described below.

                The scientific way to determine whether your program is fraught with data fitting bias is called cross-validation. That means testing your strategy on out-of-sample data. The best way to do this in the real world is to backtest/optimize your strategy on 70% or so of the data you have, then test it on the 30% that you held out to see how it performs there. If you are not completely satisfied by the performance on the last 30%, then you can paper trade (forward test) your strategy to see how it fares in the real world. I actually recommend that you always paper trade a strategy before you go live with it, just to make sure there are no unexpected kinks. You could also take the data you are going to use to backtest/optimize your strategy and cut it into four pieces based on time, so that each of the four datasets covers one quarter of the original time span. Then backtest/optimize with one of these four and check with the other three.
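The two splits described above might look something like the following sketch. The data matrix is synthetic and names such as inSample, outSample and pieces are chosen for this example only. The key point is that both splits are made by time rather than at random, so the held-out data is strictly later than the data used for optimization.

```matlab
% Hypothetical bundled data, oldest row first: (time index, price).
rng(0);
nBars = 1000;
data  = [(1:nBars)', 100 + cumsum(randn(nBars, 1))];

% 70/30 split by time: optimize on the first part, hold out the rest.
splitAt   = round(0.7 * nBars);
inSample  = data(1:splitAt, :);          % backtest/optimize on this
outSample = data(splitAt + 1:end, :);    % only used for the final check

% Alternative: four equal pieces by time.
edges  = round(linspace(0, nBars, 5));
pieces = cell(4, 1);
for k = 1:4
    pieces{k} = data(edges(k) + 1:edges(k + 1), :);
end

fprintf('In sample: %d rows, out of sample: %d rows\n', ...
        size(inSample, 1), size(outSample, 1));
```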

                Another way to test for a possible data mining bias in your strategy is to test it on other sample data (from the same time but from a different origin). By way of example, if you have a program that buys Coke when a particular volume event occurs, then try it on Pepsi and see what happens there, though the strategy might need to be tweaked based on differences between the two stocks.

 

Transaction Costs

                A lot of people omit transaction costs when they are backtesting a strategy (especially academics, shame on you Ph.D. students). I think this is a dangerous practice, as a strategy’s cumulative profit and loss can change dramatically when transaction costs are introduced. What I like to do is put in transaction costs as a percentage of the entry and exit prices. So I will add a few basis points to the entry price and subtract a few from the exit price (assuming a long strategy here; for a short one, reverse it). I also like to add in a counter that tracks the number of times my strategy trades.
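A minimal sketch of that adjustment for a long-only strategy is shown below. The trade prices are invented, and the 5 basis points per side (costBps) is only a placeholder assumption; use whatever matches your broker and market.

```matlab
% Hypothetical long-only trades: entry and exit prices, one row per trade.
entryPrice = [100.00; 50.00; 20.00];
exitPrice  = [101.00; 49.50; 20.40];

costBps = 5;                      % assumed cost per side, in basis points
cost    = costBps / 10000;

% Pay a little more on the way in, receive a little less on the way out.
netEntry = entryPrice * (1 + cost);
netExit  = exitPrice  * (1 - cost);

grossRet = exitPrice ./ entryPrice - 1;   % return per trade before costs
netRet   = netExit  ./ netEntry  - 1;     % return per trade after costs

nTrades = numel(entryPrice);              % trade counter
fprintf('Trades: %d, gross return: %.4f, net return: %.4f\n', ...
        nTrades, sum(grossRet), sum(netRet));
```

The more often a strategy trades, the more these few basis points add up, which is why the trade counter is worth keeping next to the profit and loss.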

 

Things to Look for in Backtesting

                When backtesting you want to maximize your cumulative profit and Sharpe Ratio while minimizing drawdowns and the number of times you trade (within reason). Remember, the highest Sharpe does not always mean the highest cumulative profit.
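These quantities are straightforward to compute from a vector of per-bundle strategy returns. The sketch below uses synthetic returns and assumes 1 minute bundles, 390 bundles per trading day, 252 trading days per year and a risk-free rate of zero; all of these are assumptions for the example rather than recommendations.

```matlab
% Hypothetical per-bundle strategy returns on 1 minute bundles.
rng(1);
ret = 0.0001 + 0.001 * randn(390 * 20, 1);    % 20 days of 390 bundles

% Annualize using the bundle size: 390 bundles a day, 252 days a year.
barsPerYear = 390 * 252;
sharpe = sqrt(barsPerYear) * mean(ret) / std(ret);  % risk-free rate taken as 0

% Cumulative P&L (simple sum of returns), starting from zero.
cumPnl = [0; cumsum(ret)];

% Maximum drawdown: largest drop from a running peak of the P&L curve.
runningPeak = cummax(cumPnl);
maxDrawdown = max(runningPeak - cumPnl);

fprintf('Annualized Sharpe: %.2f, cumulative P&L: %.4f, max drawdown: %.4f\n', ...
        sharpe, cumPnl(end), maxDrawdown);
```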

 Here is some sample output from a two parameter strategy that I backtested using Matlab.

 

SH =

    6.5973    4.8659    3.2746    2.1934    2.2125
    8.4949    5.9482    3.9258    3.0108    3.1914
    8.8918    6.2442    4.7281    3.3945    4.0030
    8.7027    5.9408    3.6884    3.1583    3.3426
    8.4008    5.7474    3.7208    3.4291    3.6619
    7.9230    5.0652    3.1812    3.0746    4.1792
    7.5886    4.8614    3.5246    2.4515    4.2918
    7.4725    4.8202    2.8709    2.4900    4.8525
    7.6883    4.5197    3.2909    2.5665    4.7693
    7.5066    4.7778    3.2504    2.5638    3.7130
 
A matrix of the Sharpe Ratios

 

trades =

    23    12     6     6     4
    22    10     6     6     4
    19     8     6     4     4
    19     8     6     4     4
    19     8     6     4     4
    19     7     4     4     4
    21     7     4     2     4
    21     7     4     2     4
    21     7     6     2     4
    21     7     6     2     4

 A matrix of how many times each strategy trades.
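With the two matrices above in the workspace, picking out the parameter combination with the best Sharpe and looking up how often it trades takes only a couple of lines. This is just one way to do it, using max and ind2sub on the values printed above.

```matlab
% The Sharpe and trade-count matrices printed above.
SH = [6.5973 4.8659 3.2746 2.1934 2.2125
      8.4949 5.9482 3.9258 3.0108 3.1914
      8.8918 6.2442 4.7281 3.3945 4.0030
      8.7027 5.9408 3.6884 3.1583 3.3426
      8.4008 5.7474 3.7208 3.4291 3.6619
      7.9230 5.0652 3.1812 3.0746 4.1792
      7.5886 4.8614 3.5246 2.4515 4.2918
      7.4725 4.8202 2.8709 2.4900 4.8525
      7.6883 4.5197 3.2909 2.5665 4.7693
      7.5066 4.7778 3.2504 2.5638 3.7130];

trades = [23 12 6 6 4
          22 10 6 6 4
          19  8 6 4 4
          19  8 6 4 4
          19  8 6 4 4
          19  7 4 4 4
          21  7 4 2 4
          21  7 4 2 4
          21  7 6 2 4
          21  7 6 2 4];

% Locate the largest Sharpe and convert its linear index to (row, column).
[bestSharpe, linearIdx] = max(SH(:));
[row, col] = ind2sub(size(SH), linearIdx);

fprintf('Best Sharpe %.4f at row %d, column %d (%d trades)\n', ...
        bestSharpe, row, col, trades(row, col));
% prints: Best Sharpe 8.8918 at row 3, column 1 (19 trades)
```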

 

 

[Figure: a 3-D graph of those Sharpes]

[Figure: the cumulative Profit and Loss of the best Sharpe]


Simple Optimization

                The simplest optimization in Matlab is done using nested for loops (one for each parameter). This can take a long time, so if you are lucky enough to have the Parallel Computing Toolbox you should convert those for loops to parfor loops (if possible) to save some time. If it is not possible to use the Parallel Computing Toolbox then you just have to wait it out. I have had optimizations that have taken over 10 hours to run completely!
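Here is a rough sketch of what such a nested loop can look like for a two parameter moving average strategy. Everything in it is illustrative: the price series is synthetic, the parameter grids (fastGrid, slowGrid) are arbitrary, and the Sharpe is left per-bundle rather than annualized. The loop body is written so that the outer for could be changed to parfor if the Parallel Computing Toolbox is available.

```matlab
% Hypothetical price series and parameter grids.
rng(2);
price = 100 + cumsum(0.05 * randn(2000, 1));
ret   = [0; diff(price) ./ price(1:end-1)];   % per-bundle returns

fastGrid = 2:2:10;        % candidate fast windows
slowGrid = 15:5:40;       % candidate slow windows
SH = nan(numel(slowGrid), numel(fastGrid));

for i = 1:numel(slowGrid)                 % outer loop: candidate for parfor
    rowSharpe = nan(1, numel(fastGrid));
    for j = 1:numel(fastGrid)
        fastMA = filter(ones(fastGrid(j), 1) / fastGrid(j), 1, price);
        slowMA = filter(ones(slowGrid(i), 1) / slowGrid(i), 1, price);

        pos = double(fastMA > slowMA);    % long when fast is above slow
        pos(1:slowGrid(i)) = 0;           % skip the warm-up bundles

        % Trade on the previous bundle's signal to avoid look ahead bias.
        stratRet = [0; pos(1:end-1)] .* ret;

        rowSharpe(j) = mean(stratRet) / std(stratRet);  % not annualized
    end
    SH(i, :) = rowSharpe;                 % one row per slow window
end

disp(SH);
```

Collecting each row in a temporary vector and assigning it once per outer iteration keeps SH a simple sliced output, which is the pattern parfor expects.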

                I should also note that another way you can optimize a strategy (depending on what kind of trading you are doing) is by using Weka. Weka is an open source project from the University of Waikato (not a made up place, it’s in New Zealand) which allows you to easily build factor models and do a bunch of other stuff. It’s free, so you should check it out (http://www.cs.waikato.ac.nz/ml/weka/).