Earlier this year we worked with a couple of data scientists on the prediction of financial markets. Our project, nicknamed Back2Future, aimed to get a better understanding of price movements in the foreign exchange (Forex) markets.
With team members scattered around the globe, we needed a simple infrastructure for communicating effectively and for sharing our data, code and project updates:
- we used Slack as our collaboration platform;
- we used GitLab for version control;
- we shared a database containing historical open-high-low-close (OHLC) data on several currency pairs on Amazon Web Services (AWS), using this procedure;
- we used Skype for regular project updates.
We decided to timebox the project to 12 weeks, dividing it into four timeboxes of three weeks, each with its own deliverables. Finally, we decided to use R as the common scripting language for data management and modeling.
A summary of the project approach follows:
- data: daily prices for the following 5 currency pairs: EUR/USD, AUD/USD, GBP/JPY, GBP/USD and USD/CAD;
- to keep the process flexible, we built a data pipeline consisting of a sequence of scripts, as shown in figure 1. Each step passed its data to the next by temporarily storing objects on disk in R's RDS format;
Figure 1: data pipeline
- we trained 3 autoregressive models: a time series model (ARIMA), a neural net (ANN) and a simple exponential smoother (ETS);
- we also combined the 3 models mentioned above into a random forest ensemble, as shown in figure 2;
- each model was tuned on three hyper-parameters: the transformation function, the look-back period and the buy/sell threshold.
Figure 2: ensemble model
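To make the three hyper-parameters listed above more concrete, here is a minimal sketch of how a transformation function, a look-back period and a buy/sell threshold could fit together. The project itself used R; this sketch uses Python purely for illustration. The function name `make_signal`, its defaults and the "mean recent change" forecast are assumptions made for this example, standing in for the ARIMA, ANN and ETS models, and are not the project's actual code.

```python
import math


def make_signal(prices, lookback=6, threshold=0.001, transform=math.log):
    """Turn recent prices into a 'buy', 'sell' or 'hold' signal.

    All names and default values are hypothetical, not the project's settings.
    """
    if len(prices) < lookback + 1:
        raise ValueError("need at least lookback + 1 prices")

    # 1. Transformation function: applied to each price in the window.
    window = [transform(p) for p in prices[-(lookback + 1):]]

    # 2. Look-back period: only the last `lookback` one-step changes are used.
    changes = [later - earlier for earlier, later in zip(window, window[1:])]

    # Stand-in forecast (assumption): the mean recent change.
    # A real model (ARIMA, ANN, ETS) would replace this line.
    forecast = sum(changes) / len(changes)

    # 3. Buy/sell threshold: trade only when the forecast is large enough.
    if forecast > threshold:
        return "buy"
    if forecast < -threshold:
        return "sell"
    return "hold"


rising = [1.00, 1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.07]
print(make_signal(rising))
```

Raising the threshold makes the rule trade less often, while shortening the look-back period makes it react faster to recent price changes; tuning amounts to searching over combinations of these three settings.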
We arrived at the following conclusions:
- Forex time series are very noisy. However, all investigated currency pairs (except AUD/USD) seem to contain some signal. The out-of-sample predictions for some models give positive cumulative results, outperforming the majority of the random "gamblers" used as a baseline (see figure 3);
- the random forest (RF) ensemble did not improve on the predictions of the individual models, except for GBP/USD, where the ensemble does seem to add some predictive power;
- for most currency pairs, short look-back periods (around 6 days) result in better predictions than longer look-back periods;
- for most currency pairs, ANN and ARIMA outperform ETS. For GBP/JPY, ANN clearly outperforms both ARIMA and ETS.
Figure 3: out-of-sample cumulative returns
Figure 3 shows the out-of-sample cumulative results for EUR/USD; the gray lines show the cumulative results for 100 gamblers who simply place buy/sell orders at random moments in time. Although the ANN clearly outperforms the luckiest gambler, long-term cumulative returns seem insufficient to justify the risk of losses caused by unpredictable price movements.
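The random-gambler baseline described above can be sketched as follows. The project itself used R; this sketch uses Python purely for illustration. It assumes that each gambler independently chooses, every day, to buy, sell or stay out with equal probability, and that results are accumulated by simple summation of daily returns. Both choices, along with the function names and the sample numbers, are assumptions made for this example and not necessarily how the project simulated its gamblers.

```python
import random


def gambler_cumulative_returns(daily_returns, n_gamblers=100, seed=0):
    """Simulate cumulative results for gamblers who trade at random.

    Hypothetical helper: returns one cumulative-return path per gambler.
    """
    rng = random.Random(seed)  # fixed seed so the baseline is reproducible
    paths = []
    for _ in range(n_gamblers):
        total = 0.0
        path = []
        for r in daily_returns:
            # Assumption: -1 = sell, 0 = stay out, +1 = buy, equally likely.
            position = rng.choice((-1, 0, 1))
            total += position * r
            path.append(total)
        paths.append(path)
    return paths


def share_outperformed(model_final, paths):
    """Fraction of gamblers whose final result is below the model's."""
    finals = [path[-1] for path in paths]
    return sum(f < model_final for f in finals) / len(finals)


# Made-up daily returns, for illustration only.
daily_returns = [0.002, -0.001, 0.003, -0.004, 0.001]
paths = gambler_cumulative_returns(daily_returns)
print(share_outperformed(0.005, paths))
```

Plotting each path as a gray line, with a model's cumulative result on top, gives the kind of comparison shown in figure 3.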