99 Problems But A Backtest Ain’t One

Backtesting is a very important step in strategy development. But if you have ever gone through the full strategy development cycle, you may have realized how difficult it is to backtest a strategy properly.

People use different tools to implement a backtest depending on their expertise and goals. For those with a programming background, Quantstrat (R), Zipline or PyAlgoTrade (Python), and TradingLogic (Julia) are sure to be favorite options. For those preferring a retail product that involves less conventional programming, Tradestation or TradingBlox are common options.

One of the problems with using a third party solution is often the lack of flexibility. This doesn’t become apparent until one tries to backtest a strategy that requires more esoteric details. Obviously this will not be an issue when backtesting classics like moving average or Donchian channel type strategies, but I am sure some of you have hit your head on the backtest complexity ceiling more than once. There is also the issue of fill assumptions. Most backtests I see posted on the blogosphere (including the ones present on this humble website) assume that trades fill at the close price as a simplifying assumption. While this works well for the purpose of entertaining a conversation on the internet, it is not robust enough to be used as the basis for a decision to deploy significant capital.
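
To make the fill assumption concrete, here is a minimal Python sketch that runs the same target positions through two fill rules: filling at the close of the bar that produced the signal, and filling at the open of the following bar. The `Bar` class, the `run_backtest` function and the toy prices are all hypothetical names and numbers made up for illustration, and the next-open rule is itself only another assumption, not a model of a real closing auction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    open: float
    close: float


def fill_at_close(bars, t):
    """Optimistic shortcut: the order decided on bar t fills at bar t's close."""
    return bars[t].close


def fill_at_next_open(bars, t):
    """Alternative assumption: the order fills at the open of bar t+1, if any."""
    return bars[t + 1].open if t + 1 < len(bars) else None


def run_backtest(bars, targets, fill_price):
    """Trade toward targets[t] after each bar and return the final P&L.

    fill_price(bars, t) returns the execution price for an order decided
    on bar t, or None when the order cannot be filled.
    """
    position, cash = 0, 0.0
    for t, target in enumerate(targets):
        if target == position:
            continue
        price = fill_price(bars, t)
        if price is None:
            continue  # no later bar to fill on, keep the old position
        cash -= (target - position) * price
        position = target
    # Mark any remaining position to the last close.
    return cash + position * bars[-1].close


# Toy data: go long after bar 0, go flat after bar 2.
bars = [Bar(100.0, 101.0), Bar(102.0, 103.0), Bar(103.0, 102.0), Bar(101.0, 100.0)]
targets = [1, 1, 0, 0]

print(run_backtest(bars, targets, fill_at_close))      # 1.0
print(run_backtest(bars, targets, fill_at_next_open))  # -1.0
```

On this made-up series the same signals go from a small gain to a small loss purely because of where the fills are assumed to happen.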

The first time one can actually realize how good (or bad) a chosen backtesting solution is comes when the strategy is traded live. However, I am always amazed at how little attention some traders pay to how closely their backtest matches their live results. To some, it is as if live trading were simply the step that follows the backtest. I think this misses a crucially important part of the trading process, namely the feedback loop. There is a lot to be learned from figuring out where the differences between simulation and live implementation lie. In addition to the obvious bugs that may have passed through testing, it will quickly become apparent whether your backtest assumptions are any good and whether or not they must be revisited.

Ideally, backtested results and live results for the period over which they overlap should be very similar. If they are not, one should be asking serious questions and trying to figure out where the discrepancies come from. In a properly designed simulation on slow frequency data (think daily or longer) you should be able to reconcile both to the penny. If the backtester is well designed, the difference is probably going to center on the fill price at the closing auction being different from the last traded price, which is typically what gets reported as the close price. I always like to pay particular attention to the data I use to generate live signals and compare it to the data fed to the simulation engine to find potential signal differences, as I often find that the live implementation trades off data that doesn’t always match the simulation dataset. Obviously, as the time frame shrinks, the problems are magnified. Backtesting intraday trading strategies is notoriously difficult and beyond the scope of this blog. Let’s just say that a good intraday backtester is a great competitive advantage for traders/firms willing to put in the development time and money.
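
As an illustration of what reconciling simulated and live results can look like on daily data, here is a minimal Python sketch. It assumes at most one fill per date and symbol, and the `Fill` record, the `reconcile` function and the sample fills are hypothetical, invented for this example rather than taken from any particular platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fill:
    date: str
    symbol: str
    quantity: int   # signed: positive for buys, negative for sells
    price: float


def reconcile(simulated, live, price_tol=0.0):
    """Compare simulated and live fills keyed by (date, symbol).

    Returns a list of (key, issue, detail) tuples. For a price mismatch,
    detail is the cash cost of the difference: positive means the live
    fill was worse than the simulated one.
    """
    sim = {(f.date, f.symbol): f for f in simulated}
    liv = {(f.date, f.symbol): f for f in live}
    report = []
    for key in sorted(sim.keys() | liv.keys()):
        sim_fill, live_fill = sim.get(key), liv.get(key)
        if sim_fill is None:
            report.append((key, "live only", None))
        elif live_fill is None:
            report.append((key, "simulated only", None))
        elif sim_fill.quantity != live_fill.quantity:
            report.append((key, "quantity mismatch",
                           live_fill.quantity - sim_fill.quantity))
        elif abs(live_fill.price - sim_fill.price) > price_tol:
            cost = (live_fill.price - sim_fill.price) * live_fill.quantity
            report.append((key, "price mismatch", round(cost, 2)))
    return report


simulated = [Fill("day-1", "XYZ", 100, 50.00), Fill("day-2", "XYZ", -100, 51.00)]
live = [Fill("day-1", "XYZ", 100, 50.05)]

for row in reconcile(simulated, live):
    print(row)
```

Each line of the report is a question to chase down: a "simulated only" row points at a signal the live system never traded, while a "price mismatch" row points at the fill assumption.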

It would be negligent of me to complain about backtesting problems without offering some of the processes that I use to improve the quality and ultimately the usability of my backtests. First, I personally chose not to use a third party backtesting solution. I use software that I write, not because it is better than other solutions out there but because it allows me to fully customize all aspects of the simulation in a way that is intuitive to me. That way I can tune any backtest as part of the feedback loop I was referring to earlier to more accurately model live trading. Furthermore, as I refined the backtester over time, it slowly morphed into an execution engine that could be used with proper adapters to trade live markets. Effectively I have a customized backtest for each strategy, but they all share a common core of code that forms the base of the live trading engine. I also spend quite some time looking at live fills vs simulated fills and trying to reconcile the two.
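
One way to picture a common core with swappable adapters is the Python sketch below. It is only an illustration of the general idea, not the actual engine described here: `ExecutionAdapter`, `SimulatedAdapter`, `Engine` and the fixed per-share `penalty` are all hypothetical names and simplifications.

```python
from typing import Protocol


class ExecutionAdapter(Protocol):
    def submit(self, symbol: str, quantity: int, reference_price: float) -> float:
        """Execute a signed order and return the fill price."""


class SimulatedAdapter:
    """Backtest adapter: fills at the reference price plus a fixed penalty."""

    def __init__(self, penalty: float = 0.0):
        self.penalty = penalty

    def submit(self, symbol, quantity, reference_price):
        side = 1 if quantity > 0 else -1
        # Buys fill higher and sells fill lower than the reference price.
        return reference_price + side * self.penalty


class Engine:
    """Shared core: tracks position and cash, delegates fills to the adapter.

    A live adapter (not shown) would implement the same submit() method by
    routing the order to a broker and returning the reported fill price.
    """

    def __init__(self, adapter: ExecutionAdapter):
        self.adapter = adapter
        self.position = 0
        self.cash = 0.0

    def rebalance(self, symbol, target, reference_price):
        delta = target - self.position
        if delta == 0:
            return
        fill = self.adapter.submit(symbol, delta, reference_price)
        self.cash -= delta * fill
        self.position = target


engine = Engine(SimulatedAdapter(penalty=0.25))
engine.rebalance("XYZ", 10, 100.0)   # buy 10, filled at 100.25
engine.rebalance("XYZ", 0, 101.0)    # sell 10, filled at 100.75
print(engine.position, engine.cash)  # 0 5.0
```

Because the strategy only ever talks to `Engine`, tuning a fill model after comparing it with live fills means changing the adapter and nothing else.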

Please do not think that I am trying to tell you that a do-it-yourself solution is the best. I am simply saying that it is the one that fits me best. The point I am trying to make is that no matter what solution you decide to use, it is valuable to consider the difference between simulated and live results. Who knows, perhaps it will make you appreciate the process even more. I would be tremendously interested to hear what readers think on the subject, so please share some insight in the comment section below so everybody can benefit.


15 thoughts on “99 Problems But A Backtest Ain’t One”

  1. I would be interested to know how you would model limit order fills without access to order book data. Conservatively you can wait for a trade-through, and optimistically you can wait for a trade-at, but in between there seems to be some room for creativity…

    1. Hi Experquisite,

      Thank you for commenting. Backtesting limit orders is obviously difficult. I generally tend to err on the side of caution and wait for a trade-through when my data doesn’t really allow me to estimate my position in the book. I have in the past dabbled in modelling the fill on trades at my limit price using point processes to model aggressive order arrival. Based on the arrival intensity I would then only model a fill once a certain intensity threshold was met. While I found this approach very interesting, I did not like it much as far as producing useful results. I haven’t looked into that problem again in a while since I have access to book data now.
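
The two bounds mentioned in this thread, trade-at (optimistic) and trade-through (conservative), can be sketched in a few lines of Python. The function name `first_fill_index` and the trade prices are hypothetical, and the sketch ignores size, queue position and partial fills entirely.

```python
def first_fill_index(trade_prices, limit_price, side, require_trade_through):
    """Return the index of the first trade that would fill a resting limit order.

    With require_trade_through=False (trade-at), a print at the limit price
    is enough. With require_trade_through=True, the market has to trade
    strictly through the limit price. Returns None if the order never fills.
    """
    for i, price in enumerate(trade_prices):
        if side == "buy":
            hit = price < limit_price if require_trade_through else price <= limit_price
        else:
            hit = price > limit_price if require_trade_through else price >= limit_price
        if hit:
            return i
    return None


trades = [10.03, 10.02, 10.00, 10.01, 9.99]

print(first_fill_index(trades, 10.00, "buy", require_trade_through=False))  # 2
print(first_fill_index(trades, 10.00, "buy", require_trade_through=True))   # 4
```

Anything more creative, such as the intensity-based approach described above, would decide between these two indices rather than outside them.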


  2. Whoa, welcome back JP. I always laugh when I see closing price used as the backtest assumption. It is one of the most costly assumptions.

    1. Hey Henry,

      Thank you for commenting. The fill at close assumption is probably the second worst thing I see in backtests after look-ahead bias. Most people are simply unaware of the mechanics of the closing auction and are surprised when their results differ from what they expected. Far too often I see them write it off as “slippage” when in fact that slippage is purely caused by bad fill assumptions.


  3. I guess people who were able to write good backtests – don’t trade anymore ))

    Once you fix all the silly mistakes like looking into the future and trading on the day’s close, and add realistic execution of market and limit orders and trading commission calculation to your backtester, you’ll quickly realize that none of your intraday algorithms can beat the market. You should then either look at much lower frequency trades, for which you won’t have enough statistics, or at tick level HFT, for which you won’t have the speed or the commission discount.

    1. Hi Sergey,

      Thank you for commenting. I think it is not as bad as it sounds. There are plenty of strategies that are profitable, it’s only a matter of applying good scientific method when researching. But there is no doubt in my mind that if a strategy wouldn’t be profitable, I would much rather know at the backtesting stage than at any other time when there is real money involved!


  4. Good post, I do exactly the same things, i.e. write my own software and reconcile live fills against backtest and paper trades.

    1. Hi Craig,

      Thank you for commenting. Do you find that it is difficult to reconcile or are you having good success with it? What types of strategies do you find the trickiest?


  5. I couldn’t agree more with your post. I come from a development background (having spent a number of years building execution platforms commercially), so the decision to build my own tools was a no-brainer for me. Working through different papers, I find it hard to imagine that there are ready-made platforms out there that could give me the flexibility I require.

    Moving beyond the initial simplified strategy development, i.e. incorporating transaction costs, remaining cashflow positive, determining your sizing, etc, is hard. It takes a lot of time, and persistence, but it’s an essential part of the process which cannot be overlooked, and I’m sure I’d have hit a number of significant blockers along the way if I wasn’t writing my own tools.

    1. Hi Conor,

      Thank you for commenting. I completely agree. Initially those complexity ceilings are what drove me to write my own. I always ended up spending more time trying to hack the tool than backtesting strategies. Drove me absolutely nuts! Obviously it wasn’t easy to get started, but as time went by it became better and better, to the point where I am now comfortable with the results and my time to market has been significantly reduced.


  6. Hey QF,
    Great post. Stumbled across your blog while implementing some SVM trading strategies for a project and have maybe skimmed through almost all the posts.
    I would really love to read your take on Bayesian Market Making algos, something along the line of the Hanson scheme and their effectiveness in a non simulated environment with liquidity challenges.

  7. I have a problem at the moment going from good looking backtested, WFO developed strategies into live trading. I have enormous “slippage” at times. I trade mostly the most liquid futures markets. There is not so much slippage in the index futures, but going into softs, energy and a few others it can be huge when the market takes off. I am currently struggling with how I should handle this. My strats are still profitable, but not by much. I should add that my strats are intraday. I would really appreciate some tips on how to lower this “slippage”, or whatever I should call it… I am currently implementing more limit orders and a faster internet connection, and starting to develop strategies on higher timeframes in the hope that this will lower it. What are your tricks?

    Thanks for a great article by the way.



    1. Hi Sam,

      Thanks for commenting. You pose an interesting question to which I have no clear cut answer. I define slippage as the difference between my desired price and my live filled price. In that sense, reducing slippage will entail having better execution algorithms.

      However, from my understanding of your question, you want to reduce the difference between your backtested fills and your live fills, as I mention in the post. I think that if you see big differences between your backtested fills and your executed fills, but your fills are obtained where your algorithm sent them using live data, then the problem is with the backtest assumptions. I don’t have tricks per se, but I always like to capture the live data so I can run the backtest on it and see what differences there might be with my normal research data source. Backtest results are a deterministic function of the input. Obviously if the input is different there will be discrepancies. Most times when I see big differences, the data difference is the cause. Running paper trades on live data typically shows that, and you can then revise your expectations as necessary.

      Hopefully that helps,
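
As a small illustration of comparing captured live data against a research data source, the Python sketch below diffs two series of daily closes. The function name `diff_closes`, the date labels and the prices are hypothetical, and a real comparison would likely cover more fields than the close.

```python
def diff_closes(research, live, tol=1e-9):
    """Return {date: (research_close, live_close)} where the sources disagree.

    A date present in only one source is reported with None for the other.
    """
    diffs = {}
    for date in sorted(research.keys() | live.keys()):
        research_close, live_close = research.get(date), live.get(date)
        if (research_close is None or live_close is None
                or abs(research_close - live_close) > tol):
            diffs[date] = (research_close, live_close)
    return diffs


research = {"day-1": 100.0, "day-2": 101.5, "day-3": 102.0}
live = {"day-1": 100.0, "day-2": 101.45}

print(diff_closes(research, live))
# {'day-2': (101.5, 101.45), 'day-3': (102.0, None)}
```

Since the backtest is a deterministic function of its input, any date flagged here is a candidate explanation for a signal or fill that differs between simulation and live trading.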


