It is human nature to seek the path of least resistance. While this might be good in some instances, when dealing with my capital I try to keep it simple while always steering clear of intellectual laziness.
Many top-tier bloggers have mentioned the traps of assumptions and the limitations of parametric statistics when dealing with market data. Before using a method or model, it is always a good idea to do a little research on its underlying assumptions. If they don't fit our data, we know we have to be more careful; however, the method can still be quite useful, so do not automatically disregard it just because its assumptions aren't met.
For example, consider the great workhorse of econometrics: the least squares model. It is widely used in academia; it is actually quite hard to find a finance paper without a mention of regression in some way, shape, or form. That's just what we do: we like to model phenomena in simple and elegant ways. It is often used in its simplest form, the ordinary least squares (OLS) model, which some of you may know as linear regression. I am sure that most readers of this blog have used it before in some fashion. I also suspect that some may have used it without really paying attention to its assumptions:
1. Population regression function is linear in parameters
2. The independent variables and the errors are independent
3. Homoscedasticity (ie. constant variance) of the errors
4. No autocorrelation between the errors
5. The regression model is correctly specified, all relevant variables are included
6. The error is normally distributed
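To make assumptions 3 and 4 concrete, here is a minimal numpy sketch (the simulated data, coefficients, and the 0.7 AR parameter are all illustrative assumptions, not anything from a real market). It fits OLS on data whose errors are deliberately autocorrelated, then measures the lag-1 autocorrelation of the residuals; a value far from zero flags a violation of assumption 4.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data: y depends linearly on x, but the errors follow an
# AR(1) process, deliberately violating assumption 4 (no autocorrelation).
n = 500
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()  # autocorrelated errors
y = 1.0 + 2.0 * x + e

# Fit OLS: solve min ||y - X beta||^2
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Lag-1 autocorrelation of the residuals; should be near 0 if
# assumption 4 holds, but here it recovers the AR(1) structure
rho1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(f"slope estimate: {beta[1]:.2f}, residual lag-1 autocorr: {rho1:.2f}")
```

Note that the slope estimate itself is still close to the true value of 2; autocorrelated errors mainly distort the standard errors and inference, which is exactly why the violation is easy to miss if you never look at the residuals.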
Now with this in mind, we see that OLS has some assumptions we need to address before we blindly apply it. The big two for financial time series are numbers 3 and 4. See the post series on GARCH modeling for a more specific discussion on the matter here.
The point here is not to invalidate the least squares method; I use it frequently. The point is to show that assumptions can sometimes be quite restrictive and need to be considered regardless of what method or model you want to use, and to remember that in trading, the path of least resistance is not always the best. A good habit when stumbling upon a promising new tool for your trader's toolbox is to dig a bit deeper and understand the underlying process and the assumptions you make every time you use it. It is also a nice plug for non-parametric and non-linear statistical methods, which usually tend to have looser base assumptions.
Here is a nice video I found on the Quantitative Finance Collector website. Have a look if you want to use R and Interactive Brokers to trade automatically. People already familiar with algorithmic trading might want to start the video at the 15:00 mark.
The amount of noise in raw financial data makes it very hard to model. This is partly why we use indicators: they summarize information in a concise way that is easier to interpret than the raw market data. A lot of them are actually transforms borrowed from other fields and adapted to financial markets. Engineering is an obvious one, considering the ever-growing number of engineers hired by hedge funds and trading firms. Indicators are also quite appealing to the system designer, as they give a systematic way to code trading rules. I have also found that designing indicators is one of the best exercises to deepen your understanding of a market phenomenon. Trying to find a way to process financial data to expose a certain type of information in a novel way forces us to stop and think about the underlying mechanics of the market, which can only be beneficial for a serious investor. It is this process we will look at in the remainder of the post.
In the most basic sense, indicators are nothing more than a particular transform applied to the data to bring out selected aspects of it. Take the relative strength index (RSI) for example: it does nothing more than isolate and normalize the magnitude and velocity of recent price movements. Also note that there are many ways to extract the same type of information; think DVO.
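As a quick illustration of the RSI's "isolate and normalize" transform, here is a minimal Python sketch. Note it uses a simple average of gains and losses rather than Wilder's smoothing (an assumption for brevity; the classic RSI uses exponential smoothing), and the price series in the usage comments are made up.

```python
import numpy as np

def rsi(prices, period=14):
    """Relative Strength Index with simple (non-Wilder) averaging.

    Separates recent price changes into gains and losses, then maps
    their ratio onto a bounded 0-100 scale.
    """
    prices = np.asarray(prices, dtype=float)
    deltas = np.diff(prices)
    gains = np.where(deltas > 0, deltas, 0.0)
    losses = np.where(deltas < 0, -deltas, 0.0)
    avg_gain = gains[-period:].mean()
    avg_loss = losses[-period:].mean()
    if avg_loss == 0:
        return 100.0  # only gains in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

# A steadily rising series pins the RSI at its upper bound,
# while a series alternating equal up and down moves sits at 50
print(rsi(range(1, 20)))   # -> 100.0
print(rsi([10, 11] * 10))  # -> 50.0
```

The interesting part is not the arithmetic but the normalization: whatever the asset or its price level, the output lives on the same 0-100 scale, which is what makes it usable in systematic rules.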
The first thing we need to do when developing indicators or systems is to consider the statistical attributes of the process we want to model, in this case the market. Here are a few commonly accepted ones to get you started (by no means an exhaustive list):
- Values tend to center around primary clusters (think market regimes)
- Outliers exist between clusters (think fat tails).
- Common statistical measures greatly diverge from their historical tendencies in times of crisis. Think about volatility clustering or correlation converging to 1 when the market goes south.
All these patterns (and many more, but that is beyond the scope of this post) need to be considered when developing indicators or systems (see Michael Stokes' TAA series for a good example). By doing this kind of analysis of the underlying process we want to model or evaluate, we gain valuable insight and are better prepared to tackle the creative thinking process, knowing more about what we are after. It also offers a framework for thinking about the market in general, which in turn can only spark more ideas and keep the creative juices flowing!
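The third attribute above, correlations converging toward 1 in a crisis, can be sketched in a few lines of Python. The two "assets" below are simulated (all parameters, including the crisis window and loadings, are made up for illustration), but the mechanics mirror what a rolling correlation shows on real data: low and noisy in calm regimes, spiking when a common shock dominates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two simulated assets: weakly exposed to a common factor in calm
# times, heavily exposed (and less idiosyncratic) during a "crisis".
n = 750
crisis = slice(500, 625)
common = rng.normal(size=n)
a = 0.3 * common + rng.normal(size=n)
b = 0.3 * common + rng.normal(size=n)
a[crisis] = 0.9 * common[crisis] + 0.3 * rng.normal(size=125)
b[crisis] = 0.9 * common[crisis] + 0.3 * rng.normal(size=125)

def rolling_corr(x, y, window=60):
    """Trailing correlation between two return series."""
    return np.array([np.corrcoef(x[t - window:t], y[t - window:t])[0, 1]
                     for t in range(window, len(x) + 1)])

rc = rolling_corr(a, b)
print(f"calm corr ~ {rc[:300].mean():.2f}, crisis peak ~ {rc.max():.2f}")
```

Running this, the calm-period rolling correlation hovers near zero while the crisis windows push it close to 0.9, which is exactly the diversification-vanishing behavior an indicator or system built on historical correlations has to anticipate.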
First of all, apologies for the lack of posts these past weeks. I have several things in the works and slightly less time to produce posts, with midterms and all. For now, I thought I would answer some frequently asked questions I get by e-mail and let you know about some things I have in the works for the blog.
As some of you may know, I use R for most of my analysis and research, though I do use Excel from time to time. I also use Python for some applications, trading in particular. When the porting to Python 3.x is at a more advanced stage, you can expect some of the research here to be in Python (though I am by no means getting rid of R!).
My data sources
I mostly use free data from Yahoo! or Pi Trading. If the analysis requires more specific data I tend to use Bloomberg professional or Thomson Reuters’ Datastream.
In the works at Quantum Financier
In the short term, I want to focus on new things coming to the blog. From the start, most of what I have been doing is taking ideas published elsewhere and taking another look at them, or writing introductory posts on different tools and techniques one can use in one's own analysis. I want to slightly diverge from that and focus more on bringing new things to the table. I won't stop what I was doing before, but I will put more emphasis on new approaches. Hopefully this will appeal to a broader audience and make the blog a more valuable resource for its readers.
In the medium term, I plan to start tracking a couple of strategies I am developing with TimerTrac and also tracking them on the blog with real-time signals, if they perform in line with my expectations. I might also tweak some of the strategies already introduced to improve them and offer a similar service. This might be done on a separate page on the blog, or via the whole Twitter thing, which I am still not a part of; whichever is more convenient!
That's it for now, but expect a post soon on the random forest: a machine learning algorithm that hasn't been discussed much around the blogosphere and that is part of a strategy I currently have in the works as one of my medium-term projects.