99 Problems But A Backtest Ain’t One

Backtesting is a critical step in strategy development. But if you have ever gone through the full strategy development cycle, you may have realized how difficult it is to backtest a strategy properly.

People use different tools to implement a backtest depending on their expertise and goals. For those with a programming background, quantstrat (R), Zipline or PyAlgoTrade (Python), and TradingLogic (Julia) are likely favorites. For those preferring a retail product that involves less conventional programming, TradeStation or Trading Blox are common options.

One of the problems with using a third-party solution is often the lack of flexibility. This doesn't become apparent until one tries to backtest a strategy that requires more esoteric details. Obviously this will not be an issue when backtesting classics like moving-average or Donchian-channel type strategies, but I am sure some of you have hit your head on the backtest complexity ceiling more than once. There is also the issue of fill assumptions. Most backtests I see posted on the blogosphere (including the ones present on this humble website) assume trading at the close price as a simplifying assumption. While this works well for the purpose of entertaining a conversation on the internet, it is not robust enough to be used as the basis for decision making when deploying significant capital.
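To make the point concrete, here is a small Python sketch showing how the same signal can produce different results depending on whether fills are assumed at the signal-day close or at the next day's open. The prices are synthetic and the 50-day moving-average signal is a toy, neither taken from any actual strategy.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices and a toy 50-day moving-average signal (illustration only).
rng = np.random.default_rng(0)
n = 500
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, n))))
open_ = close.shift(1) * np.exp(rng.normal(0, 0.003, n))  # next open gaps away from prior close
open_.iloc[0] = close.iloc[0]

signal = (close > close.rolling(50).mean()).astype(int)

# Assumption A: filled at the close the signal is computed on (the usual simplification).
pnl_close_fill = (signal.shift(1) * close.pct_change()).fillna(0)

# Assumption B: filled at the next day's open instead. Day t pnl = old position over the
# overnight gap plus the new position from the open to the close.
open_to_close = close / open_ - 1
prior_close_to_open = open_ / close.shift(1) - 1
pnl_open_fill = (signal.shift(1) * open_to_close
                 + signal.shift(2) * prior_close_to_open).fillna(0)

print("close-fill annualized mean return:", round(pnl_close_fill.mean() * 252, 4))
print("open-fill  annualized mean return:", round(pnl_open_fill.mean() * 252, 4))
```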

The first time one can actually see how good (or bad) a chosen backtesting solution is comes when the strategy is traded live. However, I am always amazed at how little attention some traders pay to how closely their backtest matches their live results. To some, live trading is simply the step that follows the backtest. I think this misses a crucially important part of the trading process, namely the feedback loop. There is a lot to be learned in figuring out where the differences between simulation and live implementation come from. In addition to the obvious bugs that may have slipped through testing, it will quickly become apparent whether your backtest assumptions are any good and whether or not they must be revisited.

Ideally, backtested results and live results for the period over which they overlap should match closely. If they do not, one should be asking serious questions and trying to figure out where the discrepancies come from. In a properly designed simulation on low-frequency data (think daily or longer) you should be able to reconcile both to the penny. If the backtester is well designed, the difference will probably center on the fill price at the closing auction being different from the last traded price, which is typically what gets reported as the close price. I always pay particular attention to the data I use to generate live signals and compare it to the data fed to the simulation engine to find potential signal differences, as I often find that the live implementation trades off data that doesn't always match the simulation dataset.

Obviously, as the time frame shrinks the problems are magnified. Backtesting intraday trading strategies is notoriously difficult and beyond the scope of this blog. Let's just say that a good intraday backtester is a great competitive advantage for traders and firms willing to put in the development time and money.
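The reconciliation itself does not need to be fancy. Here is a minimal sketch of the idea, with made-up fill data standing in for the engine's fill log and the broker's execution report:

```python
import pandas as pd

# Hypothetical fills: simulated fills assume the reported close, live fills come
# from the closing auction. All numbers below are invented for illustration.
sim = pd.DataFrame({
    "date":   ["2016-03-01", "2016-03-02", "2016-03-03"],
    "symbol": ["SPY", "SPY", "QQQ"],
    "qty":    [100, -100, 50],
    "price":  [198.11, 199.00, 104.25],
})
live = pd.DataFrame({
    "date":   ["2016-03-01", "2016-03-02", "2016-03-03"],
    "symbol": ["SPY", "SPY", "QQQ"],
    "qty":    [100, -100, 50],
    "price":  [198.13, 199.00, 104.22],
})

recon = sim.merge(live, on=["date", "symbol"], suffixes=("_sim", "_live"))
recon["px_diff"] = recon["price_live"] - recon["price_sim"]
recon["qty_diff"] = recon["qty_live"] - recon["qty_sim"]
recon["cost_impact"] = recon["px_diff"] * recon["qty_live"]

# Anything that does not reconcile to the penny deserves a closer look.
print(recon[recon["px_diff"].abs() > 0.005])
```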

It would be negligent of me to complain about backtesting problems without offering some of the processes that I use to improve their quality and, ultimately, their usability. First, I personally chose not to use a third-party backtesting solution. I use software that I write, not because it is better than other solutions out there but because it allows me to fully customize every aspect of the simulation in a way that is intuitive to me. That way I can tune any backtest, as part of the feedback loop I was referring to earlier, to more accurately model live trading. Furthermore, as I refined the backtester over time, it slowly morphed into an execution engine that, with the proper adapters, could be used to trade live markets. Effectively I have a customized backtest for each strategy, but they all share a common core of code that forms the base of the live trading engine. I also spend quite some time looking at live fills versus simulated fills and trying to reconcile the two.
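To illustrate the shared-core idea, here is a bare-bones sketch. The class and method names are invented for the example and are not my actual engine; the structure is the point: the strategy logic is identical in simulation and live trading, and only the broker adapter changes.

```python
from abc import ABC, abstractmethod

class Broker(ABC):
    @abstractmethod
    def execute(self, symbol: str, qty: int) -> float:
        """Send an order and return the fill price."""

class SimulatedBroker(Broker):
    def __init__(self, prices):
        self.prices = prices                    # e.g. {"SPY": iterator of closes}
    def execute(self, symbol, qty):
        return next(self.prices[symbol])        # fill at the simulated close

class LiveBroker(Broker):
    def execute(self, symbol, qty):
        raise NotImplementedError("route the order to the real execution API here")

class MovingAverageStrategy:
    """The same logic runs against either broker adapter."""
    def __init__(self, broker: Broker):
        self.broker = broker
    def on_bar(self, symbol, fast_ma, slow_ma, position):
        target = 100 if fast_ma > slow_ma else 0
        delta = target - position
        if delta:
            fill = self.broker.execute(symbol, delta)
            print(f"traded {delta:+d} {symbol} @ {fill}")
        return target

# Usage with the simulated adapter:
strategy = MovingAverageStrategy(SimulatedBroker({"SPY": iter([200.0, 201.5])}))
pos = strategy.on_bar("SPY", fast_ma=101.0, slow_ma=100.0, position=0)
pos = strategy.on_bar("SPY", fast_ma=99.0, slow_ma=100.0, position=pos)
```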

Please do not think that I am trying to tell you that a do-it-yourself solution is the best; I am simply saying that it is the one that fits me best. The point I am trying to make is that no matter what solution you decide to use, it is valuable to examine the differences between simulated and live results. Who knows, perhaps it will make you appreciate the process even more. I would be tremendously interested to hear what readers think on the subject, so please share some insight in the comment section below so everybody can benefit.


Hello Old Friend

Reports of my death have been greatly exaggerated ~Mark Twain

Wow, it has been a while. Roughly four years have gone by since my last post. It might seem like a long time to some, but coming out of college and hitting the ground running as a full-time trader made it feel like the blink of an eye for me. That being said, I have recently come to miss it and intend to start blogging again, albeit on an irregular schedule that will evolve based on my free time.

What to expect

Obviously, since I have been trading full time my skill set has evolved, so I can only imagine that the new perspective I hope to bring to the analysis moving forward will be more insightful.

You will notice a few changes, the biggest one being that I no longer use R as my main language. I have all but fully moved my research stack to Python, so you can expect to see a lot more of it moving forward. As for the content, I think the focus will remain the same for the most part: algorithmic trading for the equities markets.

Special Thank You

Finally, I want to take the time to thank the readers who kept emailing and staying in touch during my absence from the blogosphere. I can only hope that the number of people who somehow find value in these short articles will grow over time and that I will meet other interesting people. You are, after all, the reason I write these notes. So a big thank you to all of you.


2012 Wishes

As we look at 2011 through the rear-view mirror, I take a quick break from my recent lethargy to send my best wishes for 2012 to all readers. May 2012 be a trading year filled with opportunities you capitalize on.

See you in the new year.


Ensemble Building 101

Continuing from the last post, I will be looking at building a robust ensemble for the S&P 500 ETF and mini-sized future. The goal here is to set up a framework that will (hopefully) be of good use to readers interested in combining systems when creating strategies. In order to make it more tangible, I want to create a simplified example that we will build on as development moves forward. I would encourage readers to actively comment and contribute to the whole process.

Before we do anything, as with any project, I want to have an idea of the scope so I don't get entangled in an infinite loop of development where I am never satisfied and continuously swap indicator after indicator for a marginal increase in CAGR on the backtest. For this example, I want to use four indicators (two with a momentum flavour and two with a mean-reversion flavour), a volatility filter and an environmental filter. The strategy will also have an adaptive layer to it.

Now that the foundation has been laid, let's examine the mechanics at a high level; keep in mind that this is only one way you could go about it. Basically, I will be evaluating each strategy individually, looking at performance under different volatility environments and also with the global environment filter. Based on historical performance and the current environment, exposure will be determined.
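As a rough preview, here is a minimal Python skeleton of one way such a framework could be organized (the series itself will use R, as noted below). The indicator definitions, lookbacks and volatility-regime logic are placeholders for illustration, not the final specification, and the global environment filter is omitted for brevity.

```python
import numpy as np
import pandas as pd

def momentum_1(close):        # placeholder: 50-day rate of change
    return np.sign(close.pct_change(50))

def momentum_2(close):        # placeholder: close vs. 200-day moving average
    return np.sign(close - close.rolling(200).mean())

def mean_rev_1(close):        # placeholder: fade the 5-day move
    return -np.sign(close.pct_change(5))

def mean_rev_2(close):        # placeholder: fade the distance from the 10-day mean
    return -np.sign(close - close.rolling(10).mean())

def vol_regime(close, lookback=20):
    """Split history into low/high realized-volatility regimes."""
    vol = close.pct_change().rolling(lookback).std()
    return (vol > vol.expanding().median()).map({True: "high", False: "low"})

def exposures(close, signals, regime):
    """Weight each signal by its historical performance within each regime."""
    fwd = close.pct_change().shift(-1)
    weights = {}
    for name, sig in signals.items():
        perf = (sig * fwd).groupby(regime).mean()   # average daily pnl per regime
        weights[name] = perf.clip(lower=0)          # drop signals that lose money there
    w = pd.DataFrame(weights)
    return w.div(w.sum(axis=1), axis=0).fillna(0)   # normalize within each regime

# Usage with synthetic prices:
rng = np.random.default_rng(1)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000))))
signals = {"mom1": momentum_1(close), "mom2": momentum_2(close),
           "mr1": mean_rev_1(close), "mr2": mean_rev_2(close)}
print(exposures(close, signals, vol_regime(close)))
```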

This series will have R code for the backtest (I am still debating whether to use quantstrat/blotter), and the simple example will be available for download. My point is to provide readers with a way to replicate the results and extend the framework significantly. More to follow!


One Size Does Not Fit All

In our eternal search for the trading Holy Grail, it is often tempting to try to find the "ultimate" signal (indicator) and apply it to as many instruments as we can. This single-solution approach for the most part fails miserably; think of a carpenter with only a hammer in his (or her; QF is an advocate for equal opportunity) toolbox. While making signals adaptive is definitely an improvement, I think we sometimes miss the point.

Instead of harassing and optimizing a signal ad absurdum to improve the backtest, one would be better served by looking at the big picture. One signal in itself only contains so much information. While there are a lot of good indicators available that perform very well on their own (the blogosphere is a really rich ecosystem in that regard), their power is only magnified when combined with other signals containing different information. The secret is in the signal aggregation: in other words, in how we form and use an ensemble of signals isolating different pieces of information to build a profitable strategy (note the use of the word strategy as opposed to system; careful wordsmithing aside, the difference is paramount). This is a topic I have been taking a close look at recently, and I think the blogosphere is a perfect tribune to share some of my findings. As a starter, here are some points I will be touching on in the upcoming series of posts.
1. What are the basic intuitions behind ensembles and why can they help in building trading strategies?

2. How do we isolate and quantify specific pieces of information, and then observe their effect on the instruments we trade?

3. How do we evaluate the current pertinence of the signals?

4. Finally, how do we aggregate all the useful information and build a strategy from the ground up?

The mechanics are going to be explained using a simplified example for readers to follow along, but the intuition will be the same as the one behind the first QF strategy to be tracked in real time on the blog. I still don't have a fancy name for it, but it'll get one for its official launch.
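To give a flavour of point 1 before the series starts, here is a tiny simulation (pure noise, no market data) of the core ensemble intuition: several weak, imperfectly correlated signals combined by a plain average track the underlying information better than any of them alone.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_signals = 2000, 4

true_edge = rng.normal(size=n_obs)                  # the "information" we want to capture
signals = np.column_stack([
    true_edge + rng.normal(scale=3.0, size=n_obs)   # each signal = edge + lots of noise
    for _ in range(n_signals)
])
combined = signals.mean(axis=1)                     # the simplest possible aggregation

for i in range(n_signals):
    print(f"signal {i + 1} correlation with edge: "
          f"{np.corrcoef(signals[:, i], true_edge)[0, 1]:.2f}")
print(f"combined  correlation with edge: {np.corrcoef(combined, true_edge)[0, 1]:.2f}")
```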


Update, Milestone, and Unfinished Business

First of all, let me apologize for being off the grid for so long and not providing you with any of my geek prose recently. My final semester of university classes comes to an end today, and I will resume regular posting after finals. However, rest assured that my absence from the blogosphere has not caused quant power atrophy; it was merely a by-product of how busy I was with school and interviewing. Thank you all for sticking with me during this dry spell.

I recently passed 50,000 views. While not an incredibly big number compared to the blogosphere's behemoths (see blogroll), I personally am really happy with it. I never really thought that this blog would take on these proportions, and it keeps surprising me by bringing opportunities I never thought would be possible; for that I must thank you.

Unfinished Business
I know some of you were eagerly waiting for the TAA system post I kept saying would be coming soon; I am sorry to tell you that it will not be coming. My services were hired by a private firm and the intellectual property developed is protected by a non-disclosure agreement. I might, however, discuss the intuition at a high level if the interest is still there.

Finally, some of you might have noticed the new "QF Strategies" tab up top. For now it shines by its emptiness, but that will not last much longer. I will be tracking strategies there soon; bear with me.


Model Scalability

When designing a model, an aspect that I often overlook is scalability. First a definition from Investopedia: “A characteristic of a system, model or function that describes its capability to cope and perform under an increased or expanding workload. A system that scales well will be able to maintain or even increase its level of performance or efficiency when tested by larger operational demands.”

Now, most of you probably wonder why I would overlook such a crucial aspect of model building. The reason is very simple: I never had to. Most of the models I design are for my personal trading, and since I don't have millions of dollars in capital to trade (yet!), the scalability requirements are minimal. Most of my trading is in the mini futures and I only trade a few contracts per symbol. Keeping this in mind, I don't have to worry too much about slippage when I place my orders, since the effect of the order book is negligible. However, chances are I am not going to design models solely for my personal trading during my career.

The scalability requirements for a hedge fund, for example, are very different. Imagine trading a high-turnover strategy on a single symbol; for the sake of example, consider an RSI2 strategy. It is a very short-term strategy with a relatively high turnover for an end-of-day strategy. Trading this strategy with 50k is feasible (not optimal, but not too bad); now think about trading the RSI2 signal on a single symbol with 100mm: very impractical. Think how much slippage would affect the strategy. At the time of writing, the SPY opened at 133.02, so trading 100mm would mean roughly 751,766 shares at the open quote. Admittedly, I don't have the exact number, but I doubt that the order book opened three-quarters of a million deep at the ask, so we would expect some slippage. Presumably it would be quite significant and would materially change our return expectations for the RSI2 strategy.

Now, I know that few hedge funds would trade an RSI2 on the SPY alone; it is only a conceptual example to support understanding, and I am not making a claim either way about whether RSI2 is scalable. To evaluate scalability, we need robust backtesting that clearly estimates the impact of the order book, latency (if intraday), and other relevant factors on returns. Another angle to consider is scalability across assets. Following our RSI2 example, if we allocate 50% each to the SPY and the QQQQ, in theory we reduce the weighted impact of slippage and other transaction costs on a given symbol (i.e., the marginal transaction cost per symbol decreases when we diversify across assets). However, that effect is not necessarily a linear one, as nicely explained by Joshua in the comment section below.
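For a back-of-the-envelope feel of the argument, here is a small sketch using a common square-root market-impact rule of thumb. The impact coefficient and the average-daily-volume figures are placeholders, not measured values, so the numbers are illustrative only.

```python
def impact_bps(order_notional, adv_notional, coeff_bps=10.0):
    """Rough impact in basis points ~ coeff * sqrt(order size / average daily $ volume)."""
    return coeff_bps * (order_notional / adv_notional) ** 0.5

capital = 100e6
spy_price = 133.02
spy_adv_notional = 20e9        # placeholder average daily $ volume
qqq_adv_notional = 10e9        # placeholder

shares = capital / spy_price
print(f"Shares of SPY for $100mm at {spy_price}: {shares:,.0f}")

# Whole book in SPY:
single = impact_bps(capital, spy_adv_notional)

# 50/50 split between SPY and QQQQ: each leg is smaller, but because impact scales
# with the square root, the weighted cost falls by less than half.
split = (0.5 * impact_bps(capital / 2, spy_adv_notional)
         + 0.5 * impact_bps(capital / 2, qqq_adv_notional))

print(f"Estimated impact, all in SPY : {single:.2f} bps")
print(f"Estimated impact, 50/50 split: {split:.2f} bps")
```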

Other avenues to consider in scalability are left to the interested reader, who can always contact me via email; I have recently been paying closer attention to the issue myself. Furthermore, for readers with career ambitions similar to mine, remember that model scalability is directly related to employability!