Launching Gekkoga on a high-end EC2 Spot machine

So now that we know how to launch an EC2 instance from an Amazon EC2 AMI with batched gekko/gekkoga app/conf deployment, we want to run it on a machine with much more CPU power, at a good price (Amazon EC2's Spot feature), so that we can essentially brute-force all possible parameters and inputs of a given trading strategy, using Gekkoga's genetic algorithm.

The main Amazon documentation we will use:

As explained in the first of Amazon's documents, we first create a new role, AWSServiceRoleForEC2Spot, in our AWS web console. This takes just a few clicks; please read their doc.

Handling Amazon's automatic shutdown of Spot instances

Next we need to take care of Amazon's automatic shutdown of Spot instances, which depends on the market price, or on the fixed usage duration you specified in your instantiation request. When Amazon decides to shut down an instance, it sends a technical notification to the VM, which we can watch and query through a URL endpoint. Yes, it means we need to embed a new script in our EC2 VM & reference AMI to handle such a shutdown event and execute appropriate actions before the VM is stopped (Amazon announces a 2-minute delay between the notification and the effective shutdown, which is short).

The way I chose to do it (but there are others) is to launch a backgrounded, recurrent, permanently deployed script at VM boot through our already modified rc.local; this script polls the appropriate metadata every 5 seconds using a curl call. If Amazon publishes the right metadata, we then execute a customized, specific shutdown script, which needs to be embedded in the customized package our VM automatically downloads from our reference server at boot time.

So, as we saw in the previous article, we insert these 3 lines just before the last "exit 0" instruction in our /etc/rc.local file:
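The exact lines are a screenshot in the original post; a plausible sketch, assuming the script path used below and a log location under ec2-user's $HOME/AWS/logs, would be:

# launch the Spot termination watcher in the background
/etc/rc.termination.handling >> /home/ec2-user/AWS/logs/termination.log 2>&1 &
echo "termination watcher started" | logger -t rc.local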

Then we create our /etc/rc.termination.handling script, based on the indications in Amazon's documentation:
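Again, the original script is a screenshot; here is a minimal sketch built on Amazon's documented Spot metadata endpoint. The upload command and the refserver name are placeholders for your own backup logic:

#!/bin/bash
# poll the Spot termination notice every 5 seconds; Amazon flips this
# endpoint from 404 to 200 about two minutes before the shutdown
while true; do
  CODE=$(curl -s -o /dev/null -w '%{http_code}' http://169.254.169.254/latest/meta-data/spot/termination-time)
  if [ "$CODE" = "200" ]; then
    # termination announced: push the latest Gekkoga results home
    scp -r /home/ec2-user/gekko/gekkoga/results gekko@refserver:AWS/$(date +%s)_results
    break
  fi
  sleep 5
done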

We make it executable:
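Presumably the standard command (matching the permissions used elsewhere in this series):

sudo chmod 755 /etc/rc.termination.handling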

We will now test that this works. The only thing we can't test right now is the real URL endpoint with termination information. First, we reboot our EC2 reference VM and verify that our rc.termination.handling script is running in the background:

Now we will test its execution, but we need to slightly change its trigger: our VM is not yet a Spot instance, so the URL we check won't contain any termination information and will return a 404 error. I also disabled the loop so that it executes just once.

We manually execute it, and we check the output log in $HOME/AWS/logs:

Now we check on our Reference server @home whether the results were uploaded by the EC2 VM.

That’s perfect !

Don't forget to revert the modifications you made to rc.termination.handling for testing.

Checking Spot Instance prices

First we need to know what kind of VM we want, and then we check the Spot price trends to decide on a price.

For my first test, I will choose a c5.2xlarge: it has 8 vCPUs and 16 GB of memory. That should be enough to run Gekkoga with 7 concurrent threads.

Then we check the price trends, and we see that the current market price is around $0.14; this will be our base price in our request, as we just want to test for now.

It is also interesting to look at the whole price trend over a few months: we can see it actually increased a lot. Maybe we could get a better machine for the same price.

Let's check the price for the c5.4xlarge:
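The same information is available from the AWS CLI; a sketch (the date and output format are illustrative):

aws ec2 describe-spot-price-history --instance-types c5.4xlarge \
  --product-descriptions "Linux/UNIX" --start-time 2019-02-01T00:00:00 \
  --query 'SpotPriceHistory[].[Timestamp,AvailabilityZone,SpotPrice]' --output table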

Conclusion: for $0.01 more, we can have a c5.4xlarge with 16 vCPUs and 32 GB of RAM instead of a c5.2xlarge with 8 vCPUs and 16 GB of RAM. Let's go for it.

Requesting a “one-time” Spot Instance from AWS CLI

On our Reference server @home, we will next use this AWS CLI command (I've embedded it in a shell script called start_aws.sh). For details on the JSON file to provide with the request, see Amazon's documentation.
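The exact command is an image in the original post; a hedged reconstruction of what start_aws.sh can contain, matching the notes below (the price and the 60-minute block duration are illustrative):

aws ec2 request-spot-instances --dry-run --instance-count 1 \
  --type "one-time" --block-duration-minutes 60 --spot-price "0.15" \
  --launch-specification file://$HOME/AWS/spot_specification.json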

Note:

  • The --dry-run parameter asks the CLI to simulate the request and display any errors instead of actually trying to launch a Spot instance.
  • This time, for my first test, I used a "one-time" VM with a fixed-duration execution time, to be sure of how long it will run; the price is therefore not the same as the one we saw above, it is higher!
  • Once our test is successful, we will use "regular" VMs at prices we can "decide" depending on the market, but with a run time we can't anticipate (the VM may be stopped at any time by Amazon if our bid price drops below the market price; otherwise you will have to stop it yourself).

Then we create a $HOME/AWS/spot_specification.json file and fill in the appropriate data, especially our latest AMI reference:
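A sketch of the file's shape; every ID below is a placeholder, and the authoritative key list is in Amazon's request-spot-instances documentation:

cat > $HOME/AWS/spot_specification.json <<'EOF'
{
  "ImageId": "ami-0123456789abcdef0",
  "KeyName": "gekko",
  "InstanceType": "c5.4xlarge",
  "SubnetId": "subnet-0123456789abcdef0",
  "SecurityGroupIds": ["sg-0123456789abcdef0"],
  "IamInstanceProfile": { "Name": "gekko-role" }
}
EOF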

We try the above AWS CLI command line to simulate a request…

Seems all good. Let's remove the --dry-run and launch it.

On the EC2 web console we can see our request, and it is already active, which means the VM was launched!

Now in the main dashboard we can see it running, and we get its IP (we could do it via AWS CLI also):

Let's SSH to it and check the running processes:

It seems to be running. You can't see it here, but it actually runs well: I keep receiving emails with each new "top result" found by Gekkoga, as I activated the email notifications. This is also a good way to make sure you back up the latest optimum parameters found (once again: this is a backtest, and in no way does it mean those parameters are good for the live market).

Let’s check the logs:

It's running well! Now I'll just wait one hour to check that our termination detection process runs correctly and backs up the results to our Reference server @home.

While we are waiting, though, let's have a look in the logs at the computation time:

Epoch #13: 87.718s. It depends on the epoch and on the computations requested by the Strategy, but last time we looked, epoch #1 took 1280.646s to complete, for the same Strat we customized as an exercise.

One hour later, I managed to come back 2 minutes before the estimated termination time, and I could manually check with curl that the termination metadata was indeed published by Amazon to the VM.

Our termination script performed well and uploaded the data to my Reference server @home, both in ~/AWS/<timestamp>_results and in the gekkoga/results dir, so that Gekkoga will reuse it at its next launch.

Using a Spot instance with a “called” market price

So, we previously used a "one-time" Spot instance with a fixed duration, which guaranteed it would run for the specified time, but at a higher, fixed price. Because we managed to back up the data before its termination, and because Gekkoga knows how to reuse it, we will now use a lower-priced Spot instance, with no guarantee it will run long.

Let's modify our AWS CLI request: we will test a c5.18xlarge (72 vCPUs, 144 GB of RAM) at a market price of $0.73 per hour.

Amazon started it almost immediately. Let's SSH to it and check the number of processes launched.

So, this is quite interesting: parallelqueries in gekkoga/config/config-MyMACD-backtester.js is indeed set to 71, BUT the number of node processes launched seems to cap at 23. I don't know why yet! But until we find out, it means there is no need to launch a 72-vCPU VM; a 16- or 32-vCPU one may be enough for now.

It seems populationAmt is what limits the number of threads launched. Setting it to 70, still with parallelqueries at 71, makes the number of threads increase and stabilize around 70, but with some dips down to 15 threads. It would be interesting to graph and study this. Maybe there is also a bottleneck in Gekko's UI/API, which has to handle a lot of connections from all of Gekkoga's backtesting threads.

Now I'll need to read a bit more literature on this subject to find a good tweak. Anyway, right now I'm using 140 for populationAmt and I still see some "low peaks" down to 15 concurrent nodejs threads.

Result

After 12 hours without any interruption (so, total cost = $0.73 * 12 = $8.76), here are all the emails I received during this first test with our previously customized MyMACD strategy.

As you can see, if the screenshot is not too small, on this data from the past, with this strategy, the profit is much better with long candles and long signals.

Again, this needs to be challenged on smaller datasets, especially "stalling" (sideways) markets, or bullish markets like today's. This setup and test automation will not guarantee, in any way, that you will earn money.

A few notes after more tests

Memory usage:

  • I tried several kinds of machines; right now I'm using a c4.8xlarge, which still has good CPUs but less RAM than the c5 family. I also started testing another customized Strat, and encountered a few crashes.
    • I initially thought it was because CPU usage was capping at 100% as I increased parallelqueries and populationAmt. I had to cancel my Spot requests to kill the VMs.
    • Using the EC2 console, I checked the console logs and could clearly see some OOM (Out of Memory) errors just before the crash.
    • I went into my Strat code and tried to simplify everything I could, using 'let' declarations instead of 'var' (to reduce the scope of some variables), and managed to remove one or two variables I could handle differently. I also commented out every condition displaying logs; I like having logs in my console when Gekko is trading live, but for backtesting, avoid it: no logs at all, and remove every condition you can.
  • I also reduced the max_old_space_size parameter to 4096 in gekko/gekkoga/start_gekkoga.sh. It has a direct impact on Node's garbage collector: it makes the GC run about twice as often as with the 8192 I previously configured.
  • Since those two changes, I've been running a session on a c4.8xlarge for a few hours, using 34 parallelqueries vs 36 vCPUs. The CPUs are constantly about 85% busy, which seems good to me.

Improvements: in the next article I will detail two small changes I made to my Reference AMI:

  • The VM sends me an email when it starts, with its details (IP, hostname, etc.), or when a termination is announced.
  • I added a file-monitoring utility to detect live any change in Gekkoga's results directory and upload it immediately to my Reference server @home. I had to do this because I noticed that when you ask Amazon to cancel a Spot request with a running VM attached, it immediately kills the VM, without any announced termination, so previous results were not synced to my home server (though I had the details of the Strat configuration by email).

Also, important things to remember:

  • Amazon EC2 launches your Spot Instance when the maximum price you specified in your request exceeds the Spot price and capacity is available in Amazon's cloud. The Spot Instance runs until it is interrupted or you terminate it yourself. If your maximum price is exactly equal to the Spot price, your instance may or may not keep running, depending on demand.
  • You can't change the parameters of your Spot Instance request, including your maximum price, after you've submitted it; but you can cancel a request whose status is either open or active.
  • Before you launch any request, you must decide on your maximum price and what instance type to use. To review Spot price trends, see Amazon's Spot Instance Pricing History.
  • For our usage, you should use "one-time request" instances, not "persistent" requests (we only used that for testing), which means you need to embed a way for your EC2 VM to give you feedback about the latest optimized parameters found for your Strat (by email, for example, or by tweaking Gekkoga to send live results; note for later: TODO).

And remember: nothing is free, you will be charged for this service, and there is NO GUARANTEE that you will earn money after your tests.

v2 – How to create an Amazon EC2 “small” VM and automate Gekko’s deployment

Note (18/02/2019): this is an updated version of the initial post about automating the launch of an Amazon EC2 Instance.

We tried Gekkoga's backtesting and noticed it is a CPU drainer. I had never used Amazon EC2 and its ability to quickly deploy servers, but I was curious to try it, as it could be a perfect fit for our needs: on-demand renting of high-capacity servers, using Amazon's "Spot instance" feature. Beware: on EC2 only the smallest VM can be used for free (almost). The servers I would like to use are not free.

Our first step is to learn how to create an Amazon EC2 VM and deploy our basic software on it. Then we will manage the automatic deployment of all the packages we need to make Gekko & Gekkoga run and automatically start with the Strat we want to test. We will test this on a small VM (the t2.micro) using the standard AMI (Amazon Machine Image, the OS) "Amazon Linux 2".

Once this step is complete, we will make a new AMI based on the one we deployed, including custom software and part of its configuration.

Next we will try to automate, in a simple batch file, the request, ordering, and execution of a new instance based on our customized AMI, with automatic Gekkoga launch & results gathering. This batch file will be used from my own personal/home gekko server, which I use to modify and quickly test new Strats.

Launching a new free Amazon EC2 t2.micro test VM

I won't explain everything here. First you need to create an account, and yes, you will need to enter some credit card info: most of the services can be used for free at the beginning, but some of them will charge a few cents when used (e.g. map an Elastic IP to a VM and then release it: while it is not in use, you are charged; it's cheap, but you will be charged). Also, your free small VM is only allowed a few hours, so you need to stop it as soon as you can and keep it running only when you need it; this is Amazon's "on-demand" policy, like it or don't use it :)

Then we choose the AMI, and then the smallest VM available, as allowed in Amazon's "free" package.

At the bottom of the page, click “Next: configure instance details”. On the following page, you can use all default values, but check:

  • The Purchasing option: you can ask for a Spot Instance; this is Amazon's marketplace where you request your VM to run at a fixed price you provide, assuming Amazon has free resources and allows your VM to run at that price (it needs to be higher than the demand).
  • The Advanced Details at the bottom.

The User data field is a place where we can supply a shell script that the VM executes at boot. As the VM can be started whenever Amazon decides it should be (e.g. Spot instances), this is a very convenient hook to make your instance download specific configuration when it boots, for example our Gekko strats and conf files, to automagically launch our backtests. We will try this later (I have not tried it myself at the time of writing, but it is well documented by Amazon).

Next we want to configure the storage, as Amazon allows us to use 30 GB on the free VMs instead of the default 8 GB.

Next, I will add a tag explaining the purpose of this VM and storage (not sure about its exact future utility yet but whatever …).

Next, we configure a security group. As I had already played a little with another VM, I created a customized Security Group which allows ports 22 (SSH), 80 (HTTP) and 443 (HTTPS). I select it, but you can also do this later and map your own security group to your VM.

Next screen is a global review before VM creation and launching by Amazon. I won’t copy/paste it, but click on Launch at the bottom.

Next is a CRITICAL STEP. Amazon will create SSH keys that you need to store and use to connect to the VM through SSH. Do not lose them. You can reuse the exact same key for other VMs you create, so one key can serve all your VMs.

As I already generated one for my other VM (called gekko), I reuse it.

And next is a simple status page explaining the instance is launching and linking to a few documentation pages, that you should of course read.

Now when we click "View instance" we are redirected to the EC2 console (you will use it a lot) and we can see that our new instance is launched; its name is the tag we defined earlier during setup (you can also see my other VM, stopped).

Next we will connect to the VM shell by SSH. On my laptop running W10 I'll use PuTTY. I assume you downloaded your private key. With PuTTY, the PEM file needs to be converted with PuTTYgen to generate a .ppk file it can use.

You’ll also need to grab the public IPv4 address from EC2 console, by clicking on your instance and copying the appropriate field.

Now in PuTTY you just have to save a session with your private .ppk key configured and ec2-user@<public IPv4 hostname grabbed from the console> as the host. Keep in mind that this hostname and its associated IP can change: if you can no longer connect to your VM, the first thing to do is check its hostname in your EC2 console.

We launch the session. Putty will ask you if you want to trust the host, click Yes.

Woohoo! We are connected! That was fast and simple.

Updating the VM & deploying our software

OK, so now we need to deploy all the basics we saw in previous posts, plus a few more things like Nginx to protect access to Gekko's UI. Later we will implement a way for the VM to automagically download updated Strats to run.

The goal is to deploy everything we need to launch a functional Gekkoga VM; then we will create a customized AMI to be reused on a better VM specialized in CPU computation. Note that EC2 can also supply VMs with specific hardware like GPUs if you run software able to offload computation to GPU cards; that is unfortunately not our case here, but it might be someday, as I would like to start experimenting with AI.

I won't explain everything below; it can all be put in a shell script, and you can use the links to my blog to download a few standard files that don't compromise security, but there are some private parts you will need to tweak yourself, especially the SSH connection to my home servers.

The steps below require no manual operations, but some are customized for my own needs; read the comments.

First we update the VM and deploy generic stuff.
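The package list is an image in the original post; a hedged equivalent for Amazon Linux 2 (the Node 8 repository and the production-only install match what Gekko v0.6 recommended at the time, but treat the whole block as an assumption):

sudo yum update -y
sudo yum install -y git gcc-c++ make
curl -sL https://rpm.nodesource.com/setup_8.x | sudo bash -
sudo yum install -y nodejs
sudo npm install -g pm2
git clone https://github.com/askmike/gekko.git
cd gekko && npm install --only=production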

Next we deploy NGinx which will act as a Reverse Proxy to authenticate requests made to Gekko’s UI.
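The actual configuration is not reproduced here; a minimal sketch of the idea, assuming the amazon-linux-extras nginx topic and Gekko's UI on port 3000 (TLS setup omitted):

sudo amazon-linux-extras install -y nginx1    # topic name may differ by AL2 release
sudo yum install -y httpd-tools               # provides htpasswd
sudo tee /etc/nginx/conf.d/gekko.conf <<'EOF'
server {
  listen 80;
  auth_basic "Restricted";
  auth_basic_user_file /etc/nginx/.htpasswd;
  location / {
    proxy_pass http://127.0.0.1:3000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;   # Gekko's UI uses websockets
    proxy_set_header Connection "upgrade";
  }
}
EOF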

Now we need to define some very customized stuff. I won't explain it all, as this article is not a complete how-to; you need sysadmin knowledge.

  • Create a user/password to be used by the Nginx reverse proxy.
  • To automate downloading files from our home server using scp, or launching actions on it through ssh (for example to automatically make a tarball of our gekko strats before downloading them), we need to import our home server user's SSH key into /home/ec2-user/.ssh/; don't forget to change its permissions with chmod 600.

This is an example of what you could do once your reference server’s ssh key was successfully imported on your EC2 instance:
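Something along these lines (hostname, user and key path are placeholders):

# ask the reference server to build a tarball of the strats, then fetch it
ssh -i ~/.ssh/id_rsa gekko@home.example.com "tar czf /tmp/strats.tgz gekko/strategies"
scp -i ~/.ssh/id_rsa gekko@home.example.com:/tmp/strats.tgz /home/ec2-user/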

Now we just need to launch nginx, and optionally save the pm2 sessions so that they are relaunched at boot.
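For instance (assuming systemd and an already configured pm2):

sudo systemctl enable nginx && sudo systemctl start nginx
pm2 save    # pm2 will resurrect the saved process list at boot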

Testing the VM

If everything went OK (and yes, I know a lot of parts could have gone wrong for you, but for me, at the time I tested it, it was OK), you should be able to point your favorite web browser at https://<Your VM FQDN> and see a login prompt. Enter the login/password you defined in /etc/nginx/.htpasswd.

You should now see this …

My test dataset was correctly downloaded and is well detected by Gekko. I will just give it a little update by asking gekko to download data from 2019-01-07 22:30 to now, and then upload it back to my reference server at home.

Next, let’s give a try to the strats we downloaded from our reference server at home …

All is running well …

We now have a good base to clone as an AMI and use as a template for higher-end VMs. We will need to make it:

  • Able to download up-to-date data from markets
  • Able to download up-to-date strats from our reference server @home
  • Launch one particular Gekkoga startup script
  • Upload or send the results somewhere

Please remember to stop your VM either from command line or from Amazon EC2 console so that it won’t drain all your “free” uptime credits !

Playing with AWS CLI

First, we need to install AWS CLI (Amazon Command Line Interface). On my server I had to install pip for Python.

Now we can install AWS CLI using pip, as explained in Amazon's documentation. The --user flag installs it in your $HOME.

We add the local AWS binary directory to our user PATH so that we can launch it without its full path. I'm using Debian, so I'll add it to .profile:
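A sketch of those two steps:

pip install awscli --user
echo 'export PATH=$HOME/.local/bin:$PATH' >> ~/.profile
. ~/.profile && aws --version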

Now we need to create an IAM admin user & group from our EC2 console to be able to use AWS CLI. Please follow Amazon's documentation "Creating an Administrator IAM User and Group (Console)". Basically you will create a Group, a Security Policy, and an Administrator user. At the end, you must obtain and use an Access Key ID and a Secret Access Key for your Administrator user. If you lose them, you won't be able to retrieve those keys, but you can create new ones for this user (and propagate the change to every system using them). So keep them safe.

Then we will use those keys on our VM, and on the home/reference server from which we want to control our instances. You can also specify the region Amazon assigned you (hint: do not use the trailing letter of the availability zone; e.g. if your VM runs in us-east-2c, enter 'us-east-2').
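The interactive way to store them is aws configure; the values below are AWS's documented example placeholders:

aws configure
# AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
# AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# Default region name [None]: us-east-2
# Default output format [None]: json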

Let's test it with a few examples from the docs (hedged sketches of all four follow the list):

  • Fetch a JSON list of all our instances, with a few keys/values requested:
  • Stopping an instance
  • Starting an instance
  • Ask for the public IP of our running VM (we need to know its InstanceID):
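Instance IDs below are placeholders:

# 1. JSON list of instances with a few keys/values
aws ec2 describe-instances \
  --query 'Reservations[].Instances[].{ID:InstanceId,State:State.Name,IP:PublicIpAddress}'
# 2. stop an instance
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
# 3. start an instance
aws ec2 start-instances --instance-ids i-0123456789abcdef0
# 4. public IP of a running VM
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].PublicIpAddress' --output text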

To send remote commands to a specific VM, you will need to create a new IAM role in your EC2 Console and make your VM use it, so that your remote calls are authorized.

Give your VM an IAM role with the Administrator group you defined before, which also contains the Administrator user whose keys we use with AWS CLI. Now we should be able to access the VM, send it commands, and request data (sketches follow the list):

  • To make the VM execute 'ifconfig':
  • To check the output, we use the CommandId in another request:
  • And, taken from the doc (I just added the jq at the end), if we want to combine both queries:
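Assuming the Systems Manager (SSM) Run Command route, which is what that IAM role enables, sketches of the three calls:

# run ifconfig on the VM
aws ssm send-command --instance-ids i-0123456789abcdef0 \
  --document-name "AWS-RunShellScript" --parameters 'commands=["ifconfig"]'
# fetch the output using the CommandId returned above
aws ssm list-command-invocations --command-id "<CommandId>" --details
# combine both, extracting the output with jq
CID=$(aws ssm send-command --instance-ids i-0123456789abcdef0 \
  --document-name "AWS-RunShellScript" --parameters 'commands=["ifconfig"]' \
  --query 'Command.CommandId' --output text)
sleep 2
aws ssm list-command-invocations --command-id "$CID" --details \
  | jq -r '.CommandInvocations[].CommandPlugins[].Output'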

Making a new AMI from our base VM & instantiating it

Creating a new AMI

First we stop our VM.

Now in EC2 Console we will create a new AMI from our instance.

By default the images you create are private; you can change that if you want and share your AMI within the region you are using in Amazon's cloud.

The real deployment scenario

We will request the instance creation and control the launch and stop of our VM remotely, from a remote server or workstation.

When a new instance is created, we would like it to automatically execute a script at boot, using user-data, to eventually download fresh data from our reference server. User-data is nothing more than a shell script executed once. As you can see by clicking the previous links, this is pretty well documented by Amazon. User-data is only executed at the very first boot of your newly created instance, not at subsequent boots.

Therefore, we will also need something else to make our instance execute tasks at every boot: we will use a basic /etc/rc.local script which rsyncs our whole Gekko installation directory from our reference server, tweaks it a little, and then launches Gekkoga.

We will also need a background script to carefully monitor Amazon's indicators about the incoming shutdown of our instance: Spot instances are automatically shut down by Amazon, with at most a 2-minute delay after the announcement. This will be detailed in the next article.

The whole process is:

Instantiating & executing actions at first boot

We want the new instance of this image to execute a shell script at its very first boot; this could be very useful later. First we create this script on our local reference server, put a few commands in it, and also activate logging on the VM (output goes both to /var/log/user-data.log and to /dev/console).

I create a script called 0.user_data.sh in a $HOME/AWS directory on my reference server, and put this inside:
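The script body is an image in the original post; only the logging line below is the one from Amazon's documentation, the rest is placeholder:

#!/bin/bash
# send all output to /var/log/user-data.log and to /dev/console (from Amazon's doc)
exec > >(tee /var/log/user-data.log | logger -t user-data -s 2>/dev/console) 2>&1
echo "first boot: $(date)"
# one-shot actions go here (e.g. fetch an initial dataset from the reference server)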

We request the creation & launch of a new instance based on our image ID. Note that I use the name of the key I defined earlier (gekko); I used the same subnet as my previous VM (I don't know whether that is mandatory; to be tested); the security group ID can be found in the EC2 console's "Security Groups" menu; and we also specify which IAM role should be allowed to control the VM with AWS CLI (you created it earlier, as it was mandatory for some CLI commands to run).

Also note that we pass our previously created 0.user_data.sh bash script as a parameter: its content is transmitted to Amazon, which makes it run at the first boot of the instance. If you want anything performed at the very first boot, just add it to this script.
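A hedged reconstruction of the request (every ID is a placeholder, and the IAM instance profile name is mine):

aws ec2 run-instances --image-id ami-0123456789abcdef0 --count 1 \
  --instance-type t2.micro --key-name gekko \
  --subnet-id subnet-0123456789abcdef0 --security-group-ids sg-0123456789abcdef0 \
  --iam-instance-profile Name=gekko-role \
  --user-data file://$HOME/AWS/0.user_data.sh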

Our new InstanceId is i-0c6d1148adebf33c3. From the EC2 console I can see it is launched. I want to check whether my user-data script was executed.

This is quite good! I also double-checked on my reference server that I could see incoming SSH connections, by adding an ssh execution + scp download command to the script, and it's OK: 2 connections as expected (one for the ssh, the other for the scp).

We have a working "first time script" that the VM executes upon instantiation, and that we can customize later to perform one-shot specific actions. Now we want our VM to connect to our reference server at each boot, make it prepare a package, download it, untar it, and execute a start.sh script that may be embedded inside.

Automatically download a Gekko/Gekkoga installation, tweak it, launch it, at each boot

First, on our EC2 reference VM (the one from which we created a new AMI; so yes, either we create another AMI later, or you perform this step while still preparing the first AMI), we do this:

Then we will edit /etc/rc.local (which is a symlink to /etc/rc.d/rc.local) and add this:
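The block itself is an image in the original post; here is a sketch consistent with the comments below (hostname, user and UIconfig handling are assumptions):

# added to /etc/rc.local before the final "exit 0"
TS=$(date +%Y%m%d%H%M%S)
LOG=/home/ec2-user/AWS/logs/${TS}_package.log
su - ec2-user -c "rsync -az gekko@home.example.com:gekko/ /home/ec2-user/gekko/" > $LOG 2>&1
# sqlite bindings are platform-specific: rebuild native modules after the sync
su - ec2-user -c "cd gekko && npm rebuild" >> $LOG 2>&1
# this instance sits behind Nginx: swap in the EC2-specific UIconfig files
su - ec2-user -c "cp AWS/UIconfig.js.ec2 gekko/web/vue/dist/UIconfig.js" >> $LOG 2>&1
systemctl restart nginx
su - ec2-user -c "cd gekko && ./start_ui.sh" >> $LOG 2>&1
su - ec2-user -c "cd gekko/gekkoga && ./start_gekkoga.sh" >> $LOG 2>&1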

A few comments:

  • During my tests I encountered a problem with sqlite; it seems linked to the type of platform used. To avoid it, I automatically rebuild the dependencies after the rsync synchronization.
  • I update the UIConfig files, as on the EC2 instance I use Nginx, which I don't use on my reference server @home.
  • I added a line to restart Nginx, as I noticed I had to relaunch it manually before I could access Gekko's UI. I didn't investigate further; maybe later.
  • As some of you may have noticed, we are syncing a remote Gekko installation into a local $HOME/gekko one. Therefore we need to delete the Gekko installation we previously made on our EC2 instance; it was only deployed for testing 🙂

On our Reference server @home:

  • We create a $HOME/gekko/start_ui.sh script, if this is not already the case (first sketch below).
  • We create a $HOME/gekko/gekkoga/start_gekkoga.sh (second sketch below).
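Both scripts are images in the original post; these are hedged sketches consistent with the remarks that follow (the pm2 invocations are my reconstruction):

#!/bin/bash
# $HOME/gekko/start_ui.sh (sketch): launch Gekko's UI under pm2
cd $HOME/gekko
pm2 start gekko.js --name gekkoUI -- --ui

#!/bin/bash
# $HOME/gekko/gekkoga/start_gekkoga.sh (sketch)
TOLAUNCH=config-MyMACD-backtester.js   # the only variable you should need to change
cd $HOME/gekko/gekkoga
# keep one vCPU free for the OS and synchronization tasks
NCPU=$(( $(nproc) - 1 ))
sed -i "s/parallelqueries: .*/parallelqueries: ${NCPU},/" config/${TOLAUNCH}
# --no-autorestart: on a 1-vCPU VM, NCPU becomes 0, Gekkoga fails, and pm2 must not loop on it
pm2 start run.js --name gekkoga --no-autorestart \
  --node-args="--max_old_space_size=4096" -- -c config/${TOLAUNCH}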

Remarks:

  • You shouldn't have to modify the start_ui.sh script.
  • In the start_gekkoga.sh script:
    • You should only have to modify the TOLAUNCH variable, passing it the name of the Gekkoga config file to use; that's all.
    • I wanted to keep one vCPU free for synchronization and other OS tasks, so I dynamically check the number of CPUs on the machine, reduce it by 1, and change the appropriate line in Gekkoga's config file.
    • This has a side effect: on a 1-CPU machine, and this is the case for the smaller EC2 VMs, the value becomes "0" and Gekkoga fails to start, while pm2 keeps trying to relaunch it. This is why I added pm2's "--no-autorestart" option on the last line of the script.

We reboot our EC2 reference instance:

After a few seconds, we check the rc.local log on our EC2 instance to confirm the download of our reference package from our reference server. In rc.local, we redirected the logs to $HOME/AWS/logs/<date>_package.log:

Seems all good. Let’s check pm2’s status:

Gekkoga's error is probably normal, as we requested it to run with 0 parallel queries. Let's check its logs:



And yes, I can confirm after a quick test with 0 parallelqueries on my Reference server @home that this error is raised in that case. Good!

One more thing, let’s check if Gekko’s UI is remotely reachable:

Seems perfect!

Now, before we make a new version of our reference AMI (it won't be the last one :)), I will:

  • Do some cleaning in AWS/logs and AWS/logs/old, but also (on my Reference server @home) in gekko/history, gekko/strategies, gekko/gekkoga/config and gekko/gekkoga/results, as I made a lot of tests.
  • Add a shell script to automatically update the dynamic DNS entry handling my Gekko EC2 FQDN. As it is 99% personal, I won't detail it here: it checks the machine's external IP, compares it to the last known one, and if it differs, updates the FQDN's A record on the DNS server.

To create a new AMI from your reference VM, you know the procedure: we already did it above, as well as instantiating it through the AWS CLI installed somewhere.

Next step will be to try to launch a Gekkoga backtesting session by instantiating our AMI on a much better VM in terms of CPU and memory.

But be warned, this will be charged by Amazon !

This will be next article’s topic.

How to create an Amazon EC2 “small” VM and automate Gekko’s deployment

Note (18/02/2019): a simpler deployment process is currently being written up.

We tried Gekkoga's backtesting and noticed it is a CPU drainer. I had never used Amazon EC2 and its ability to quickly deploy servers, but I was curious to try it, as it could be a perfect fit for our needs: on-demand renting of high-capacity servers, using Amazon's "Spot instance" feature. Beware: on EC2 only the smallest VM can be used for free (almost). The servers I would like to use are not free.

Our first step is to manage the automatic deployment of all the packages we need to make Gekko & Gekkoga run and automatically start with the Strat we want to test. I want a one-command process. We will test this on a small VM (the t2.micro) using the standard AMI (Amazon Machine Image, the OS) "Amazon Linux 2".

Once this step is complete, we will make a new AMI based on the one we deployed, including custom software and part of its configuration.

Next we will try to automate, in a simple batch file, the request, ordering, and execution of a new instance based on our customized AMI, with automatic Gekkoga launch & results gathering. This batch file will be used from my own personal/home gekko server, which I use to modify and quickly test new Strats.

Launching a new free Amazon EC2 t2.micro test VM

I won't explain everything here. First you need to create an account, and yes, you will need to enter some credit card info: most of the services can be used for free at the beginning, but some of them will charge a few cents when used (e.g. map an Elastic IP to a VM and then release it: while it is not in use, you are charged; it's cheap, but you will be charged). Also, your free small VM is only allowed a few hours, so you need to stop it as soon as you can and keep it running only when you need it; this is Amazon's "on-demand" policy, like it or don't use it :)

Then we choose the AMI, and then the smallest VM available, as allowed in Amazon's "free" package.

At the bottom of the page, click “Next: configure instance details”. On the following page, you can use all default values, but check:

  • The Purchasing option: you can ask for a Spot Instance; this is Amazon's marketplace where you request your VM to run at a fixed price you provide, assuming Amazon has free resources and allows your VM to run at that price (it needs to be higher than the demand).
  • The Advanced Details at the bottom.

The User data field is a place where we can supply a shell script that the VM executes at boot. As the VM can be started whenever Amazon decides it should be (e.g. Spot instances), this is a very convenient hook to make your instance download specific configuration when it boots, for example our Gekko strats and conf files, to automagically launch our backtests. We will try this later (I have not tried it myself at the time of writing, but it is well documented by Amazon).

Next we want to configure the storage, as Amazon allows us to use 30 GB on the free VMs instead of the default 8 GB.

Next, I will add a tag explaining the purpose of this VM and storage (not sure about its exact future utility yet but whatever …).

Next, we configure a security group. As I had already played a little with another VM, I created a customized Security Group which allows ports 22 (SSH), 80 (HTTP) and 443 (HTTPS). I select it, but you can also do this later and map your own security group to your VM.

Next screen is a global review before VM creation and launching by Amazon. I won’t copy/paste it, but click on Launch at the bottom.

Next is a CRITICAL STEP. Amazon will create SSH keys that you need to store and use to connect to the VM through SSH. Do not lose them. You can reuse the exact same key for other VMs you create, so one key can serve all your VMs.

As I already generated one for my other VM (called gekko), I reuse it.

And next is a simple status page explaining the instance is launching and linking to a few documentation pages, that you should of course read.

Now when we click "View instance" we are redirected to the EC2 console (you will use it a lot) and we can see that our new instance is launched; its name is the tag we defined earlier during setup (you can also see my other VM, stopped).

Next we will connect to the VM shell by SSH. On my laptop running W10 I'll use PuTTY. I assume you downloaded your private key. With PuTTY, the PEM file needs to be converted with PuTTYgen to generate a .ppk file it can use.

You’ll also need to grab the public IPv4 address from EC2 console, by clicking on your instance and copying the appropriate field.

Now in PuTTY you just have to save a session with your private .ppk key configured and ec2-user@<public IPv4 hostname grabbed from the console> as the host. Keep in mind that this hostname and its associated IP can change: if you can no longer connect to your VM, the first thing to do is check its hostname in your EC2 console.

We launch the session. Putty will ask you if you want to trust the host, click Yes.

Woohoo! We are connected! That was fast and simple.

Updating the VM & deploying our software

OK, so now we need to deploy all the basics we saw in previous posts, plus a few more things like Nginx to protect access to Gekko's UI; later we will implement a way for the VM to automagically download updated Strats to run. First, let's make sure we can run a simple backtest with Gekko.

The goal is to deploy everything we need to launch a functional Gekkoga VM; then we will create a customized AMI to be reused on a better VM specialized in CPU computation. Note that EC2 can also supply VMs with specific hardware like GPUs if you run software able to offload computation to GPU cards; that is unfortunately not our case here, but it might be someday, as I would like to start experimenting with AI.

I won't explain everything below; it can all be put in a shell script, and you can use the links to my blog to download a few standard files that don't compromise security, but there are some private parts you will need to tweak yourself, especially the SSH connection to my home servers.

The steps below require no manual operations, but some are customized for my own needs; read the comments.

First we update the VM and deploy generic stuff.

Next we deploy NGinx which will act as a Reverse Proxy to authenticate requests made to Gekko’s UI.

Now we need to define some very customized stuff. I won't explain it all, as this article is not a complete how-to; you need sysadmin knowledge.

  • Create a user/password to be used by the Nginx reverse proxy.
  • To automate downloading files from our home server using scp, or launching actions on it through ssh (for example to automatically make a tarball of our gekko strats before downloading them), we need to import our home server user's SSH key into /home/ec2-user/.ssh/; don't forget to change its permissions with chmod 600.

This is an example of what you could do once your reference server’s ssh key was successfully imported on your EC2 instance:

Now we just need to launch nginx, and optionally save the pm2 sessions so that they are relaunched at boot.

Testing the VM

If everything went OK (and yes, I know a lot of parts could have gone wrong for you, but for me, at the time I tested it, it was OK), you should be able to point your favorite web browser at https://<Your VM FQDN> and see a login prompt. Enter the login/password you defined in /etc/nginx/.htpasswd.

You should now see this …

My test dataset was correctly downloaded and is well detected by Gekko. I will just give it a little update by asking gekko to download data from 2019-01-07 22:30 to now, and then upload it back to my reference server at home.

Next, let’s give a try to the strats we downloaded from our reference server at home …

All is running well …

We now have a good base to clone as an AMI and use as a template for higher-end VMs. We will need to make it:

  • Able to download up-to-date data from markets
  • Able to download up-to-date strats from our reference server @home
  • Launch one particular Gekkoga startup script
  • Upload or send the results somewhere

Please remember to stop your VM either from command line or from Amazon EC2 console so that it won’t drain all your “free” uptime credits !

Playing with AWS CLI

Now we want to control the launch and stop of our VM remotely, from a remote server or workstation, and we would like it to automatically execute a script at boot, using user-data, to download fresh data from our reference server. As you can see by clicking the previous links, this is pretty well documented by Amazon.

First, we need to install AWS CLI (Amazon Command Line Interface). On my server I had to install pip for Python.

Now we can install AWS CLI using pip, as explained in Amazon's documentation. The --user flag installs it in your $HOME.

We add the local AWS binary directory to our user PATH so that we can launch it without its full path. I'm using Debian, so I'll add it to .profile:

Now we need to create an IAM admin user & group from our EC2 console to be able to use AWS CLI. Please follow Amazon's documentation "Creating an Administrator IAM User and Group (Console)". Basically you will create a Group, a Security Policy, and an Administrator user. At the end, you must obtain and use an Access Key ID and a Secret Access Key for your Administrator user. If you lose them, you won't be able to retrieve those keys, but you can create new ones for this user (and propagate the change to every system using them). So keep them safe.

Then we will use those keys on our VM, and on the home/reference server from which we want to control our instances. You can also specify the region Amazon assigned you (hint: do not use the trailing letter of the availability zone; e.g. if your VM runs in us-east-2c, enter 'us-east-2').

Let's test it with a few examples from the docs:

  • Fetch a JSON list of all our instances, with a few keys/values requested:
  • Stopping an instance
  • Starting an instance
  • Ask for the public IP of our running VM (we need to know its InstanceID):

To send remote commands to a specific VM, you will need to create a new IAM role in your EC2 Console and make your VM use it, so that your remote calls are authorized.

Give your VM an IAM role with the Administrator group you defined before, which also contains the Administrator user whose keys we use with AWS CLI. Now we should be able to access the VM, send it commands, and request data.

  • To make the VM execute 'ifconfig':
  • To check the output, we use the CommandId in another request:
  • And, taken from the doc (I just added the jq at the end), if we want to combine both queries:

Making a new AMI from our base VM & instantiating it

Creating a new AMI

First we stop our VM.

Now in EC2 Console we will create a new AMI from our instance.

By default the images you create are private; you can change that if you want and share your AMI within the region you are using in Amazon's cloud.

Instantiating & executing actions at first boot

We want the new instance of this image to execute a shell script at its very first boot; this could be very useful later. First we create this script on our local reference server, put a few commands in it, and also activate logging on the VM (output goes both to /var/log/user-data.log and to /dev/console).

I create a script called 0.user_data.sh in a $HOME/AWS directory on my reference server, and put this inside:

We request the creation & launch of a new instance based on our image ID. Note that I use the name of the key I defined earlier (gekko); I used the same subnet as my previous VM (I don't know whether that is mandatory; to be tested); the security group ID can be found in the EC2 console's "Security Groups" menu; and we also specify which IAM role should be allowed to control the VM with AWS CLI (you created it earlier, as it was mandatory for some CLI commands to run).

Our new InstanceId is i-0c6d1148adebf33c3. From the EC2 console I can see it is launched. I want to check whether my user-data script was executed.

This is quite good! I also double-checked on my reference server that I could see incoming SSH connections, by adding an ssh execution + scp download command to the script, and it's OK: 2 connections as expected (one for the ssh, the other for the scp).

We have a working "first time script" that the VM executes upon instantiation, and that we can customize later to perform one-shot specific actions. Now we want our VM to connect to our reference server at each boot, make it prepare a package, download it, untar it, and execute a start.sh script that may be embedded inside.

Automatically download an updated Gekko customized package & deploy it, at each boot

First, on our EC2 reference VM (the one from which we created a new AMI; so yes, either we create another AMI later, or you perform this step while still preparing the first AMI), we do this:

Then we will edit /etc/rc.local (which is a symlink to /etc/rc.d/rc.local) and add this:

Note that I also added a line to restart Nginx, as I noticed I had to relaunch it manually before I could access Gekko's UI. I didn't investigate further; maybe later.

On our reference server:

  • We also have to create the same directories.
  • And create the $HOME/AWS/1.make_package.sh shell script; this is the script called by our EC2 instance at each boot (first sketch below).
  • And create a $HOME/AWS/3.package_start.sh shell script; this is the script embedded in the package built by 1.make_package.sh, which the EC2 instance executes locally after downloading & unpacking the package (second sketch below).
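Neither script is reproduced in this archive; these are purely illustrative sketches of their shape (paths and contents are assumptions):

#!/bin/bash
# $HOME/AWS/1.make_package.sh (sketch), run on the reference server via ssh:
# builds a tarball of what the instance needs, tagged with the caller's timestamp
TS=${1:-$(date +%Y%m%d%H%M%S)}
cd $HOME/AWS
rm -rf package && mkdir package
cp 3.package_start.sh package/start.sh
cp -r $HOME/gekko/strategies $HOME/gekko/gekkoga/config package/
tar czf package_${TS}.tgz package > logs/${TS}_make_package.log 2>&1

#!/bin/bash
# $HOME/AWS/3.package_start.sh (sketch), executed on the instance after unpacking;
# note it takes its own local time instead of the caller's timestamp (see the note below)
TS=$(date +%Y%m%d%H%M%S)
cp -r $HOME/package/strategies $HOME/gekko/ > $HOME/AWS/logs/${TS}_package_start.log 2>&1
cp -r $HOME/package/config $HOME/gekko/gekkoga/ >> $HOME/AWS/logs/${TS}_package_start.log 2>&1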

Note that we could improve it: it is the only one of our scripts which does not produce a log file prefixed with the same timestamp as the others. It takes the local time of execution instead of receiving the timestamp as a parameter, as 1.make_package.sh does.

We reboot our EC2 reference instance:

After a few seconds, we check the rc.local log on our EC2 instance to confirm the download of our reference package from our reference server. In rc.local, we redirected the logs to $HOME/AWS/logs/<date>_package.log:

Seems all good. Now, still on our EC2 VM, we check the 3.package_start.sh logfile:

Seems also good, no error. Of course we will have to tweak this batch file later so that the files included in the downloaded package are deployed to the right place before launching Gekko or other actions.

Now on our reference server at home:

Seems also good, no error. Same thing here: our 1.make_package.sh script on the reference server will need to be tweaked so that we include in the locally built package all the files we need; the package is then downloaded by the EC2 instance at next boot, including the 3.package_start.sh script which the EC2 instance will execute.

Now, before we make a new version of our reference AMI (it won’t be the last one :)), I will :

  • Do some cleaning in the AWS/logs, AWS/logs/old, AWS/package and AWS/package/old dirs, but also in gekko/history, gekko/strategies, gekko/gekkoga/config and gekko/gekkoga/results, as I made a lot of tests of the various shell scripts above, and also because I plan to deploy through my reference server only what Gekko needs to perform a backtest: the history database, its conf file, the strategy and the indicators it needs, and the same for Gekkoga. But now it's up to you to decide what you want on your reference AMI!
  • Add a shell script to automatically update the dynamic DNS entry handling my Gekko EC2 FQDN. As it is 99% personal, I won't detail it here: it checks the machine's external IP, compares it to the last known one, and if it differs, updates the FQDN's A record on the DNS server.

To create a new AMI from your reference VM, you know the procedure: we already did it above, as well as instantiating it through the AWS CLI installed somewhere.

Next step will be to try to launch a Gekkoga backtesting session by instantiating our AMI on a much better VM in terms of CPU and memory.

But be warned, this will be charged by Amazon !

This will be next article’s topic.

Automate Gekko’s Strats parameters backtesting (with Gekkoga)

We saw in previous posts how to install gekko, use it, and customize our first strategy.

But, as we figured out, every strategy, whether your own custom one or any Strat you find on the Internet with excellent backtest results shown by its creator, also needs to be tweaked for a specific market, currency and asset, which means we need to find the right parameters to use with that specific Strat. You will need a lot of backtesting, then new tests on the live market with simulated orders (paperTrader mode), before launching it "live".

Note that finding the perfect parameters for a backtest (the ones which give you the best profit and best sharpe ratio) does not mean the Strat will perform well on a live market, as trends and volumes simply cannot be known in advance. That would be too easy. The tools we use here therefore have a strong limitation: they help you find the best parameters for a Strat using data from the past, but in no way does that mean it will perform well in the future (see overfitting or curve fitting).

So, first of all, we need to define a good backtest strategy, whatever the way (automated or not) we find and test parameters. IMO a good testing strategy, and this is what is done in AI training & testing phases, is to split your backtest dataset into several parts: one long dataset to run a general backtest and reach good profit & sharpe, then smaller datasets, still from the same market/currency/asset, but with different kinds of trends. This way we can understand how well the Strat performs with parameters X or Y on each kind of trend.

We could also run Gekkoga sessions on datasets "specialized" in one kind of trend, and check whether and how much the optimized parameters change between datasets. With that kind of knowledge, we could imagine implementing a strat which dynamically changes and auto-adapts its parameters to the current trend, if it is a long-term one. Remember the parameters won't depend only on the trend; depending on the indicators used, they could also depend on market prices or other things.

In any case, each "good" test should get a stronger manual analysis from you: study the trades (when they were made). Are they accurate? Were the large losses controlled by a stop-loss implementation? If you change one parameter slightly, does it make your Strat less profitable on your past dataset, but also less risky and more profitable for the future? The main key is probably to control large market losses; then add some bonuses to the Strat.

Let's come back to this post: I wanted to complement my theoretical study of various indicators (to understand them better and eventually find an appropriate way to mix them) with technical tools to improve the backtesting phase. When I test a Strat, I need to test it many times, so I naturally searched for tools to automate that, and found, among others, Gekkoga.

Gekkoga is described as a Genetic Algorithm (GA) trainer, which means it will:

  1. Automagically test random parameters using controlled backtests launched through your Gekko installation,
  2. Automagically mix some of the parameters which seemed to perform well, launch new tests, study the results, and/or mutate them or others,
  3. Log the best result found in terms of profit (which may not be the most accurate target!) together with the parameters used during the test,
  4. Until… I don't know yet whether it ever actually ends! I have not checked the code to find out; I simply used it.

Gekkoga Installation

Enough talk… Let's install it. It's quite simple, BUT you need a fully functional Gekko. Do not try to use Gekkoga if you don't have a working Gekko or don't yet master its use.

cd <gekko_installdir>

git clone https://github.com/gekkowarez/gekkoga.git && cd gekkoga

Now we need to deploy a fix to make Gekkoga compatible with the latest Gekko v0.6x we installed previously, as some changes were made in its API.

git fetch origin pull/49/head:49
git checkout 49

We manually download a fixed index.js to support nested Gekko parameters and to fix something in mutations:

mv index.js index.js.orig
curl -L -O https://raw.githubusercontent.com/gekkowarez/gekkoga/stable/index.js

We manually download a fixed package.json to support nested config parameters:

mv package.json package.json.orig
curl -L -O https://raw.githubusercontent.com/gekkowarez/gekkoga/stable/package.json

Then we install it. Once again, beware: don’t run ‘npm audit fix’ as suggested at the end of the npm install command below; it would break things.

npm install

Note: Gekkoga needs either Gekko's full UI mode to be launched (use the PM2 startup script start_ui.sh we created in Gekko's installation post) or the API server found in <gekko_installdir>/web, and it will make intensive use of it. This is why, in Gekko's installation post, I recommended raising the stock timeouts in <gekko_installdir>/web/vue/dist/UIconfig.js and in <gekko_installdir>/web/vue/public/UIconfig.js to 600000.

Gekkoga Configuration

Gekkoga's configuration files are located in <gekko_installdir>/gekkoga/config/. We will copy the sample one to a new file dedicated to our previously customized strategy (MyMACD) and symlink it to the name of the config file we defined in the start.sh script. This will make our life easier later when we have new strats to backtest: we will just copy into gekkoga/config one config file whose filename contains the name of the strat used, and update the symbolic link config/config-backtester.js to point to that specific config file.

cp <gekko_installdir>/gekkoga/config/sample-config.js <gekko_installdir>/gekkoga/config/config-MyMACD-backtester.js

ln -s <gekko_installdir>/gekkoga/config/config-MyMACD-backtester.js <gekko_installdir>/gekkoga/config/config-backtester.js

Now we will edit <gekko_installdir>/gekkoga/config/config-MyMACD-backtester.js. It is not complicated, BUT we need to define EXACTLY the same parameters as in your gekko config file or toml file; otherwise Gekkoga will start but produce no trades if anything is wrong. Beware of typos, and beware of the data types you use (integer vs float) and their eventual decimals.

Hint: use as many integers as possible in your Strat parameters, and avoid floats when you can. This is why, in our customized MACD Strat, I defined the stop-loss percentage as an integer, which MyMACD.js divides by 100 when it needs to use it. If we used a float to allow a very accurate stop-loss, it would force us to tell Gekkoga to generate randomized floats, and even if we try to fix the number of decimals used, the number of possible combinations and subsequent backtests would grow exponentially. Also, the .toFixed(2) we will sometimes use in the Gekkoga conf file is an artifact: the library used to generate the random numbers actually produces floats with much higher precision than 2 decimals, which we artificially truncate or round to 2 digits. It means that Gekkoga will indeed perform many backtests with the same float rounded to 2 digits, because the floats actually generated in the backend were not equal.

First we change the config section; once again, we want it to reflect EXACTLY our gekko config file: same parameters, same values.

const config = {
  stratName: 'MyMACD',
  gekkoConfig: {
    watch: {
      exchange: 'kraken',
      currency: 'EUR',
      asset: 'ETH'
    },

We use the scan functionality to automatically detect the daterange of the dataset to use, as we only have one dataset for kraken, and for now we want to test Gekkoga on the whole dataset. Later, once we know Gekkoga works, you can change this to reduce the dataset and reflect the testing strategy I explained before.

daterange: 'scan',

/*
daterange: {
from: ‘2018-01-01 00:00’,
to: ‘2018-02-01 00:00’
},
*/

Now we update our balance and fees.

simulationBalance: {
  'asset': 0,
  'currency': 100 // note that I changed this since the initial confs in other posts
},

slippage: 0.05,
feeTaker: 0.16,
feeMaker: 0.26,
feeUsing: 'taker', // maker || taker

The apiUrl should be OK:

apiUrl: 'http://localhost:3000',

We won’t change the standard populationAmt, variation, mutateElements, minSharpe or mainObjective.

parallelqueries needs to be updated to reflect your CPU configuration, as it is the number of parallel backtests Gekkoga will launch: the more CPUs you have, the better. But align it with your number of CPUs. If you have 4 CPUs or vCPUs, use 3 or 4; with 4, Gekkoga will fill your whole CPU capacity, which can make your computer almost unusable for other tasks while it runs (and your CPU fan will start making noise). If you have a dedicated Gekkoga computer this is fine; if you don't, this may be a problem, so consider a lower value. It's up to you.

In my case,

  • On my regular laptop, I have 2 physical CPUs but the OS sees 4 thanks to hyper-threading, so I'll use 3, as I want one CPU to stay available for other tasks;
  • At the time I'm writing this article, I tried to run Gekkoga on an Amazon EC2 t2.micro with this setting at 1; I lost control of the VM and had to restart it;
  • For this test I will launch it on my Intel NUC VM, powered by 2 vCPUs, but I'll keep the setting at 1 so as not to stress it too much, as a NUC is not designed for intensive CPU computation (I'm afraid the fan won't cool the case & CPU enough).

parallelqueries: 1,

I don't use email notifications for now, so I leave it at false.

Now we enter the interesting part. We describe to Gekkoga all the parameters our Strat must be filled with, and their values. If you enter a fixed value, Gekkoga will use that value all the time, in every backtest; it won't change it. But we can also define:

  • Some ranges, by using arrays, e.g. [5,10,15,30,60,120,240]
  • Randomized values, by using functions such as randomExt.integer(max, min) or randomExt.float(max, min).toFixed(2)

Did you notice the .toFixed(2)? It forces the randomized float to be rounded to 2 decimals, and this rounded value is what Gekko's backtest uses. But keep in mind that the float is still generated with more digits: 2.0003 and 2.0004 will both be rounded to 2.00. This leads to duplicate backtests, which is why I recommended using as many integers as possible instead of floats.

First, the candleValues. I'm not really confident about short candles, but as the tool will test them for us, why not. Let's extend the list a little and remove a few values:

candleValues: [5,15,30,60,120,240,480,600,720],

Now the Strat parameters …

getProperties: () => ({

  historySize: randomExt.integer(50, 0),

  short: randomExt.integer(30, 5),
  long: randomExt.integer(100, 15),
  signal: randomExt.integer(20, 6),

  thresholds: {
    // up: randomExt.float(20,0).toFixed(2),
    // down: randomExt.float(0,-20).toFixed(2),
    up: randomExt.integer(400, 0) / 100,
    down: randomExt.integer(0, -400) / 100,
    persistence: randomExt.integer(9, 0),
    stoploss: randomExt.integer(50, 0),
  },

Let's give it a try… We will need to check the console output carefully.

gekko@bitbot:~/gekkoga/gekkoga$ node run.js -c config/config-MyMACD-backtester.js
No previous run data, starting from scratch!
Starting GA with epoch populations of 20, running 1 units at a time! node run -c config/config-MyMACD-backtester.js

Woohoo! It started. Now I stop it and create a nice PM2 startup script, as I want to easily run it in the background and easily get information about it.

echo '#!/bin/bash' > start.sh
echo 'rm logs/*' >> start.sh
echo 'pm2 start run.js --name gekkogaMyMACD --log-date-format="YYYY-MM-DD HH:mm Z" -e logs/err.log -o logs/out.log --node-args="--max_old_space_size=8192" -- -c config/config-MyMACD-backtester.js' >> start.sh

chmod 755 start.sh

We restart it using this script …

./start.sh

Let’s check its logs …

It's running in the background; good. Now let's have a look at Gekko's UI logs, as it is supposed to receive API calls from Gekkoga:

We can see some calls to the backtest API; perfect. Now let's check other information while we wait for the first results:

Conclusion: Gekkoga is pushing hard on one of the 2 CPUs, which matches what we defined in the conf. Memory consumption is low.

And finally, 21 minutes later, the first epoch completed:

You can find an explanation of an epoch here. What we see here is the winner of this epoch; Gekkoga keeps running to compute more candidates and compare them. It logs the best combination found in <gekkoga_installdir>/results in JSON format, so to display it we will use jq (run 'apt-get install jq' as root if you don't have it yet):
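For example, to pretty-print the most recent result file (the filename pattern is whatever Gekkoga wrote for your strat):

cd ~/gekkoga/gekkoga
jq . results/$(ls -t results/ | head -1)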

So we see here that the winner for now used a long value of 65, short 29, signal 9, a 5-minute candle size, a history size of 15 candles, an up threshold of 2.44, a down threshold of -2.13, a stop-loss of 16% and a persistence of 0. Our Gekkoga config file using only integers is well formatted!

The sharpe is very high, and in terms of estimated profit on this whole dataset we actually performed better (1101%) than the market (817%), with 68 trades.

The problem now is that it will take a very, very… very long time to run. So for now I don't have much more to say; we have to wait. In the meantime we can continue improving our knowledge of indicators and how they work, and imagine improvements to our Strats.

The only way to optimize the runtime seems to be to run it on a higher number of CPUs and adapt the parallelqueries setting. I had never done that before, but it gave me the idea of trying to run it on an Amazon EC2 machine. This will be detailed in another article.
