In this post we will first design an algorithmic forex binary options trading strategy. After designing the strategy we will backtest it and calculate its accuracy along with a few other important statistics, including the very important kappa statistic. We will use the Random Forest algorithm and try to predict the price 5 minutes ahead. We can change the prediction horizon to 10, 15, 30 or 60 minutes simply by changing the n variable in the model. **Watch this 23 minute video on a binary options strategy that has got 90% average winrate**. We build this strategy model using the R language, so you should be familiar with it.

### The Rise Of Algorithmic Trading

Algorithmic trading is on the rise. Today more trades are placed by algorithms than by human traders. In algorithmic trading we build machine learning models that can predict the market. These machine learning models are usually built using R or Python. R is a powerful open source language for machine learning and data analysis. You should download and install it on your computer, along with RStudio, an IDE for R programming.

MT4 provides a MetaEditor that can be used to code EAs and indicators in the MQL4 language. MQL4 is pretty limited when it comes to data analysis; there are essentially no libraries that implement the different machine learning and data analysis algorithms in MQL4. How do we overcome this limitation? By using R or Python. Python is another powerful data science and machine learning language that you can use. In this post we use R for designing our forex binary options strategy and then backtesting it. Read the post on **how to predict the weekly candle high, low and close using a neural network**. Basically we will be building an algorithmic trading system. Watch the video below and discover how popular algorithmic trading has become on Wall Street.

### Algorithmic Forex Binary Options Strategy For 5 Minute Expiry Explained

Before we continue, let's explain the forex binary options strategy. At the end of every minute, we want to predict the closing price 5 minutes ahead. If our strategy tells us that the closing price after 5 minutes will be 2 pips or more above the present closing price, we buy a 5-minute call binary option. In the same manner, if our strategy tells us that the closing price after 5 minutes will be 2 pips or more below the present closing price, we buy a put option. If our strategy predicts that price will stay within 2 pips above or below the present price, we don't open any trade. **Learn this candlestick trading strategy that makes 100-200 pips with 10-20 pip stop loss**.
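The decision rule above can be sketched as a tiny helper function. This is only an illustration of the logic, not part of the actual model code; the function name signal and its input pip_move are hypothetical:

```r
# Sketch of the trading rule: classify the predicted 5-minute move in pips.
# 'signal' and 'pip_move' are illustrative names, not from the model code.
signal <- function(pip_move, pip = 2) {
  if (pip_move >= pip)  return("call")   # 2 pips or more above: buy a 5-minute call
  if (pip_move <= -pip) return("put")    # 2 pips or more below: buy a 5-minute put
  "no trade"                             # inside the +/- 2 pip band: stay out
}

signal(3.1)    # "call"
signal(-2.5)   # "put"
signal(0.7)    # "no trade"
```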

So in a nutshell our trading strategy is very simple: if price is predicted to close 2 pips or more above the present price after 5 minutes, we buy a 5-minute call, and if it is predicted to close 2 pips or more below, we buy a put option. We are interested in knowing the winrate of our strategy before we actually start trading live with it. We use the Random Forest algorithm to build our predictive model. Random Forest is an ensemble learning method: it builds many decision trees and combines their votes into a single prediction. Watch the video below that explains Random Forests!

Now that we have watched the video tutorial on random forests, let's run our model in R. We download the 1 minute EURUSD CSV file from MT4 and read it into R. We then use the quantmod library to compute the predictors: RSI, MACD, Aroon, Bollinger Bands, Williams %R, CCI, ATR, volatility etc. These indicators are then used by the Random Forest algorithm to build the decision trees. **Discover a double in a day strategy that makes 100% gain in a day.**

```
> ### Random Forest Classification With 3 Classifiers ###
> # import the data
> data <- read.csv("E:/MarketData/EURUSD1.csv", header = FALSE)
> colnames(data) <- c("Date", "Time", "Open", "High",
+                     "Low", "Close", "Volume")
> x1 <- nrow(data)
> # convert this data to the n-minute timeframe
> n <- 5
> # define the lookback
> lb <- 500
> # define the minimum pips
> pip <- 2
> # define a new data frame
> data1 <- data.frame(matrix(0, ncol = 6, nrow = 300))
> colnames(data1) <- c("Date", "Time", "Open", "High",
+                      "Low", "Close")
> # run the sequence to convert to the new timeframe
> for (k in (1:lb))
+ {
+   data1[k, 1] <- as.character(data[x1 - lb*n + n*k - 1, 1])
+   data1[k, 2] <- as.character(data[x1 - lb*n + n*k - 1, 2])
+   data1[k, 3] <- data[x1 - lb*n + n*(k - 1), 3]
+   data1[k, 6] <- data[x1 - lb*n + n*k - 1, 6]
+   data1[k, 4] <- max(data[(x1 - lb*n + n*(k - 1)):(x1 - lb*n + k*n - 1), 4:5])
+   data1[k, 5] <- min(data[(x1 - lb*n + n*(k - 1)):(x1 - lb*n + k*n - 1), 4:5])
+ }
> library(quantmod)
> data2 <- as.xts(data1[, -(1:2)], as.POSIXct(paste(data1[, 1], data1[, 2]),
+                 format = '%Y.%m.%d %H:%M'))
> # add the technical indicators as predictors
> data2$rsi <- RSI(data2$Close)
> data2$MACD <- MACD(data2$Close)
> data2$will <- williamsAD(data2[, 2:4])
> data2$cci <- CCI(data2[, 2:4])
> data2$STOCH <- stoch(data2[, 2:4])
> data2$Aroon <- aroon(data2[, 2:3])
> data2$ATR <- ATR(data2[, 2:4])
> data2$SMI <- SMI(data2[, 2:4])
> data2$BB <- BBands(data2[, 2:4])
> data2$ChaikinVol <- Delt(chaikinVolatility(data2[, 2:3]))
> data2$CLV <- EMA(CLV(data2[, 2:4]))
> data2$Volatility <- volatility(data2[, 1:4], calc = "garman")
> data2$Pips <- 10000 * diff(data2$Close)
> # shift the pip move back one row so each bar holds the NEXT bar's move
> for (i in (1:(lb - 1)))
+ {
+   data2[i, 20] <- data2[i + 1, 20]
+ }
> data3 <- as.data.frame(data2)
> # convert the pip moves into factors
> nn <- ncol(data3)
> data3$Direction <- ifelse(data3[, nn] > pip, 1,
+                    ifelse(data3[, nn] > -pip, 0, -1))
> data3 <- data3[, -nn]
> data3$Direction <- factor(data3$Direction)
> # load the randomForest library
> library("randomForest")
> # define the number of trees that randomForest will build
> num.trees <- 1000
> # train a randomForest model
> fit <- randomForest(Direction ~ ., data = data3[100:(lb - 1), 1:nn],
+                     ntree = num.trees, importance = TRUE,
+                     proximity = TRUE)
> # print the model details
> print(fit)

Call:
 randomForest(formula = Direction ~ ., data = data3[100:(lb - 1), 1:nn],
     ntree = num.trees, importance = TRUE, proximity = TRUE)
               Type of random forest: classification
                     Number of trees: 1000
No. of variables tried at each split: 5

        OOB estimate of  error rate: 29.75%
Confusion matrix:
   -1   0  1 class.error
-1 40  35  6   0.5061728
0  12 202 17   0.1255411
1   8  41 39   0.5568182

> # predict the direction of the next 5-minute candle
> pred <- predict(fit, newdata = data3[lb, 1:(nn - 1)], type = "class")
> pred
2016-11-16 07:31:00
                  0
Levels: -1 0 1
> # predict the probability of each class
> pred <- predict(fit, newdata = data3[lb, 1:(nn - 1)], type = "prob")
> pred
                       -1     0    1
2016-11-16 07:31:00 0.007 0.813 0.18
attr(,"class")
[1] "matrix" "votes"
> data1[lb, 2]
[1] "07:31"
> data1[lb, 6]
[1] 1.07478
```

R built the model and then made the prediction. It also reported the OOB (out of bag) error, which is 29.75% in this case. This means the predictive accuracy of our model should be around 70%. Predictive accuracy above 70% is good: it means our algorithmic trading strategy will be winning more than losing. If this holds up, we have a good model. What we need is a robust strategy with a high winrate, at least something like 70%, before we start trading live with it.
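As a quick sanity check on those numbers, the 29.75% OOB error and the per-class errors can be recomputed from the confusion matrix that print(fit) displayed (rows are the actual classes, columns the out-of-bag predictions). This is only an illustrative check, not part of the model code:

```r
# OOB confusion matrix as printed by randomForest: rows = actual, cols = predicted
cm <- matrix(c(40,  35,  6,
               12, 202, 17,
                8,  41, 39),
             nrow = 3, byrow = TRUE,
             dimnames = list(actual = c("-1", "0", "1"),
                             predicted = c("-1", "0", "1")))

# overall OOB error = 1 - (correct predictions / total OOB predictions)
oob_error <- 1 - sum(diag(cm)) / sum(cm)
round(oob_error, 4)      # 0.2975, the OOB estimate reported by print(fit)

# per-class error = 1 - (correct in each row / row total)
class_error <- 1 - diag(cm) / rowSums(cm)
round(class_error, 4)    # 0.5062, 0.1255, 0.5568, matching the class.error column
```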

So we need to carry out a backtest of our strategy and see whether the average winrate really is around 70%. We do that below by running the model 500 times. At the end of each minute, the model makes a prediction for the price close 5 minutes ahead, which is then compared with the actual price after 5 minutes. From these 500 predictions we build a confusion matrix and use it to calculate the predictive accuracy of the strategy, which is the average winrate. **Discover a 1 hour forex strategy that is set and forget**.

```
> ### Random Forest Classification With 3 Classifiers: Strategy Backtest ###
> # import the data
> data <- read.csv("E:/MarketData/EURUSD1.csv", header = FALSE)
> colnames(data) <- c("Date", "Time", "Open", "High",
+                     "Low", "Close", "Volume")
> x11 <- nrow(data)
> x12 <- x11 - 500
> # load the libraries
> library(quantmod)
> library("randomForest")
> # convert this data to the n-minute timeframe
> n <- 5
> # define the lookback
> lb <- 500
> # define the minimum pips
> pip <- 2
> # define a new data frame
> data1 <- data.frame(matrix(0, ncol = 6, nrow = 300))
> colnames(data1) <- c("Date", "Time", "Open", "High",
+                      "Low", "Close")
> # data frame that stores the predicted and actual moves for each run
> Results <- data.frame(matrix(0, ncol = 2, nrow = 500))
> colnames(Results) <- c("Predicted", "Actual")
> Results$Predicted <- factor(Results$Predicted, levels = c(-1, 0, 1))
> # walk forward: refit the model and predict 500 times
> for (j in (1:500))
+ {
+   x1 <- x12 - 1000 - j + 1
+   # run the sequence to convert to the new timeframe
+   for (k in (1:lb))
+   {
+     data1[k, 1] <- as.character(data[x1 - lb*n + n*k - 1, 1])
+     data1[k, 2] <- as.character(data[x1 - lb*n + n*k - 1, 2])
+     data1[k, 3] <- data[x1 - lb*n + n*(k - 1), 3]
+     data1[k, 6] <- data[x1 - lb*n + n*k - 1, 6]
+     data1[k, 4] <- max(data[(x1 - lb*n + n*(k - 1)):(x1 - lb*n + k*n - 1), 4:5])
+     data1[k, 5] <- min(data[(x1 - lb*n + n*(k - 1)):(x1 - lb*n + k*n - 1), 4:5])
+   }
+   data2 <- as.xts(data1[, -(1:2)], as.POSIXct(paste(data1[, 1], data1[, 2]),
+                   format = '%Y.%m.%d %H:%M'))
+   # add the technical indicators as predictors
+   data2$rsi <- RSI(data2$Close)
+   data2$MACD <- MACD(data2$Close)
+   data2$will <- williamsAD(data2[, 2:4])
+   data2$cci <- CCI(data2[, 2:4])
+   data2$STOCH <- stoch(data2[, 2:4])
+   data2$Aroon <- aroon(data2[, 2:3])
+   data2$ATR <- ATR(data2[, 2:4])
+   data2$SMI <- SMI(data2[, 2:4])
+   data2$BB <- BBands(data2[, 2:4])
+   data2$ChaikinVol <- Delt(chaikinVolatility(data2[, 2:3]))
+   data2$CLV <- EMA(CLV(data2[, 2:4]))
+   data2$Volatility <- volatility(data2[, 1:4], calc = "garman")
+   data2$Return <- 10000 * diff(data2$Close)
+   # shift the pip move back one row so each bar holds the NEXT bar's move
+   for (i in (1:(lb - 1)))
+   {
+     data2[i, 20] <- data2[i + 1, 20]
+   }
+   data3 <- as.data.frame(data2)
+   # convert the pip moves into factors
+   rr1 <- pip
+   nn <- ncol(data3)
+   data3$Direction <- ifelse(data3[, nn] > rr1, 1,
+                      ifelse(data3[, nn] > -rr1, 0, -1))
+   data3 <- data3[, -nn]
+   data3$Direction <- factor(data3$Direction)
+   # define the number of trees that randomForest will build
+   num.trees <- 1000
+   # train a randomForest model
+   fit <- randomForest(Direction ~ ., data = data3[100:(lb - 1), 1:nn],
+                       ntree = num.trees, importance = TRUE,
+                       proximity = TRUE)
+   # predict the direction of the next candle
+   pred <- predict(fit, newdata = data3[lb, 1:(nn - 1)], type = "class")
+   Results[j, 1] <- pred[1]
+   Results[j, 2] <- 10000 * (data[x1 + n, 6] - data[x1, 6])
+ }
> # convert the actual moves into the same three classes
> Results$Direction <- factor(ifelse(Results[, 2] > rr1, 1,
+                             ifelse(Results[, 2] > -rr1, 0, -1)))
```

Above is the backtesting R code. It took R about 30 minutes to backtest the data. The end result is a data frame Results with the columns Predicted and Actual. We now use this data frame to build the confusion matrix, using the table command.

```
> table(Results$Predicted, Results$Direction)

      -1   0   1
  -1   9  31  15
  0   62 247  58
  1   13  43  22
```

The above table shows how many predictions were correct and how many were false. Did you notice the class imbalance? Class 0 has far more members than classes 1 and -1. The predictive accuracy is simply the sum of the diagonal entries (the correct predictions) divided by the total number of observations, which in this case is 500. This comes out to 55.6%. The error rate is one minus the predictive accuracy, which comes out to 44.4%. So our model has a predictive accuracy of about 56%, which is only slightly better than random chance. Now let's use the gmodels library, which produces a much more detailed cross table than the table above.
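The accuracy calculation can be reproduced in a few lines of base R from the table above (predictions in rows, actual directions in columns):

```r
# Backtest confusion matrix: rows = predicted class, cols = actual class
cm <- matrix(c( 9,  31, 15,
               62, 247, 58,
               13,  43, 22),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = c("-1", "0", "1"),
                             actual = c("-1", "0", "1")))

accuracy <- sum(diag(cm)) / sum(cm)   # (9 + 247 + 22) / 500
round(accuracy, 3)                    # 0.556
error_rate <- 1 - accuracy            # 0.444
```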

```
> library(gmodels)
> CrossTable(Results$Predicted, Results$Direction)

   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  500

                  | Results$Direction
Results$Predicted |        -1 |         0 |         1 | Row Total |
------------------|-----------|-----------|-----------|-----------|
               -1 |         9 |        31 |        15 |        55 |
                  |     0.006 |     0.526 |     1.981 |           |
                  |     0.164 |     0.564 |     0.273 |     0.110 |
                  |     0.107 |     0.097 |     0.158 |           |
                  |     0.018 |     0.062 |     0.030 |           |
------------------|-----------|-----------|-----------|-----------|
                0 |        62 |       247 |        58 |       367 |
                  |     0.002 |     0.550 |     1.973 |           |
                  |     0.169 |     0.673 |     0.158 |     0.734 |
                  |     0.738 |     0.769 |     0.611 |           |
                  |     0.124 |     0.494 |     0.116 |           |
------------------|-----------|-----------|-----------|-----------|
                1 |        13 |        43 |        22 |        78 |
                  |     0.001 |     1.000 |     3.479 |           |
                  |     0.167 |     0.551 |     0.282 |     0.156 |
                  |     0.155 |     0.134 |     0.232 |           |
                  |     0.026 |     0.086 |     0.044 |           |
------------------|-----------|-----------|-----------|-----------|
     Column Total |        84 |       321 |        95 |       500 |
                  |     0.168 |     0.642 |     0.190 |           |
------------------|-----------|-----------|-----------|-----------|
```

As you can see, the above cross table is much more detailed than the first table we made. But don't conclude yet that we have a good strategy. **Watch this video that reveals a simple trick that increases the winrate**.

### What Is Kappa?

We use the caret library to calculate the kappa statistic. Kappa is an important statistic that adjusts predictive accuracy for chance agreement. Remember the class imbalance we pointed out a few paragraphs above: most of the observations belong to the 0 class, which corresponds to price staying within 2 pips above or below the present price. Just by predicting this class most of the time, a model can achieve a seemingly good predictive accuracy. Watch the video below that explains kappa in more detail.

The above video explains how kappa helps us take chance out of the predictive accuracy of our model. We use the caret library in R to calculate kappa for us. A kappa above 70% is considered acceptable, a kappa below 40% is considered poor, and between 40% and 70% it is average.
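Before running caret, it is instructive to compute Cohen's kappa by hand from the backtest confusion matrix. Kappa is (po - pe) / (1 - pe), where po is the observed accuracy and pe is the accuracy expected by chance given the row and column totals:

```r
# Backtest confusion matrix: rows = predicted class, cols = actual class
cm <- matrix(c( 9,  31, 15,
               62, 247, 58,
               13,  43, 22),
             nrow = 3, byrow = TRUE)

n  <- sum(cm)                                # 500 observations
po <- sum(diag(cm)) / n                      # observed accuracy: 0.556
pe <- sum(rowSums(cm) * colSums(cm)) / n^2   # chance agreement: ~0.519
kappa <- (po - pe) / (1 - pe)
round(kappa, 4)                              # 0.0763, matching the caret output below
```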

```
> library(caret)
Loading required package: lattice
Loading required package: ggplot2

Attaching package: ‘ggplot2’

The following object is masked from ‘package:randomForest’:

    margin

> confusionMatrix(Results$Predicted, Results$Direction)
Confusion Matrix and Statistics

          Reference
Prediction  -1   0   1
        -1   9  31  15
        0   62 247  58
        1   13  43  22

Overall Statistics

               Accuracy : 0.556
                 95% CI : (0.5112, 0.6001)
    No Information Rate : 0.642
    P-Value [Acc > NIR] : 0.999969

                  Kappa : 0.0763
 Mcnemar's Test P-Value : 0.005323

Statistics by Class:

                     Class: -1 Class: 0 Class: 1
Sensitivity             0.1071   0.7695   0.2316
Specificity             0.8894   0.3296   0.8617
Pos Pred Value          0.1636   0.6730   0.2821
Neg Pred Value          0.8315   0.4436   0.8270
Prevalence              0.1680   0.6420   0.1900
Detection Rate          0.0180   0.4940   0.0440
Detection Prevalence    0.1100   0.7340   0.1560
Balanced Accuracy       0.4983   0.5495   0.5467
```

Kappa for this strategy is just 0.0763, which is very poor indeed. A kappa above 0.7 is considered good; ours is only about 8%, so the model is essentially predicting by chance. **Watch these videos on how to set up your day trading workstation.** So this was it. By reading this post you should have a good idea of how to design and then test your own algorithmic trading strategy.

### How To Improve The Algorithmic Trading Strategy?

Can we improve this algorithmic trading strategy? We can try. Data analysis and machine learning is a difficult science, and most of the time you will face disappointment when your model doesn't perform as well in practical trading as you thought it would. We include a few more indicators in our model and increase the pip threshold to 3 pips, then run the backtesting model again. It again took R about 30 minutes to do the calculations. Below are the results:

```
> table(Results$Predicted, Results$Direction)

      -1   0   1
  -1   3  24   1
  0   33 355  38
  1    0  35  11

> library(gmodels)
> CrossTable(Results$Predicted, Results$Direction)

   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  500

                  | Results$Direction
Results$Predicted |        -1 |         0 |         1 | Row Total |
------------------|-----------|-----------|-----------|-----------|
               -1 |         3 |        24 |         1 |        28 |
                  |     0.480 |     0.029 |     1.157 |           |
                  |     0.107 |     0.857 |     0.036 |     0.056 |
                  |     0.083 |     0.058 |     0.020 |           |
                  |     0.006 |     0.048 |     0.002 |           |
------------------|-----------|-----------|-----------|-----------|
                0 |        33 |       355 |        38 |       426 |
                  |     0.177 |     0.015 |     0.497 |           |
                  |     0.077 |     0.833 |     0.089 |     0.852 |
                  |     0.917 |     0.857 |     0.760 |           |
                  |     0.066 |     0.710 |     0.076 |           |
------------------|-----------|-----------|-----------|-----------|
                1 |         0 |        35 |        11 |        46 |
                  |     3.312 |     0.250 |     8.904 |           |
                  |     0.000 |     0.761 |     0.239 |     0.092 |
                  |     0.000 |     0.085 |     0.220 |           |
                  |     0.000 |     0.070 |     0.022 |           |
------------------|-----------|-----------|-----------|-----------|
     Column Total |        36 |       414 |        50 |       500 |
                  |     0.072 |     0.828 |     0.100 |           |
------------------|-----------|-----------|-----------|-----------|

> library(caret)
> confusionMatrix(Results$Predicted, Results$Direction)
Confusion Matrix and Statistics

          Reference
Prediction  -1   0   1
        -1   3  24   1
        0   33 355  38
        1    0  35  11

Overall Statistics

               Accuracy : 0.738
                 95% CI : (0.6971, 0.776)
    No Information Rate : 0.828
    P-Value [Acc > NIR] : 1.0000

                  Kappa : 0.0686
 Mcnemar's Test P-Value : 0.4673

Statistics by Class:

                     Class: -1 Class: 0 Class: 1
Sensitivity            0.08333   0.8575   0.2200
Specificity            0.94612   0.1744   0.9222
Pos Pred Value         0.10714   0.8333   0.2391
Neg Pred Value         0.93008   0.2027   0.9141
Prevalence             0.07200   0.8280   0.1000
Detection Rate         0.00600   0.7100   0.0220
Detection Prevalence   0.05600   0.8520   0.0920
Balanced Accuracy      0.51473   0.5160   0.5711
```

This time the winrate, or predictive accuracy, has improved to 73%. Did we succeed? We have a model with a predictive accuracy above 70%, which is precisely what we wanted. But kappa has gone even lower, to about 7%, which means the model is still predicting essentially by chance. Since the 0 class is by far the biggest class, the model achieves 73% accuracy mostly by predicting it, which is why kappa has dropped even further. Sensitivity gives the proportion of actual cases of each class that the model predicted correctly; in the results above you can see the model largely failed to predict the -1 and 1 classes. Specificity gives the proportion of negative cases of each class that the model correctly identifies.
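The per-class sensitivity and specificity figures in the caret output above can be reproduced from the confusion matrix with a few lines of base R, which makes their definitions concrete:

```r
# Improved-model confusion matrix: rows = predicted class, cols = actual class
cm <- matrix(c( 3,  24,  1,
               33, 355, 38,
                0,  35, 11),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = c("-1", "0", "1"),
                             actual = c("-1", "0", "1")))

# sensitivity: of all actual cases of a class, the fraction predicted correctly
sensitivity <- diag(cm) / colSums(cm)
round(sensitivity, 4)   # 0.0833, 0.8575, 0.2200, as in the caret output

# specificity: of all cases NOT in a class, the fraction not predicted as that class
specificity <- sapply(1:3, function(i) sum(cm[-i, -i]) / sum(cm[, -i]))
round(specificity, 4)   # 0.9461, 0.1744, 0.9222
```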
