Launching Gekkoga on a high-end EC2 Spot machine

Now that we know how to launch an EC2 instance from an Amazon EC2 AMI with an automated gekko/gekkoga app/conf deployment, we want to run it on a machine with more CPU power, at a good price (the Amazon EC2 Spot feature), so that we can essentially brute-force all possible parameters and inputs of a given trading strategy using Gekkoga's genetic algorithm.

The main Amazon documentation we will rely on:

As explained in Amazon's first document, we start by creating the AWSServiceRoleForEC2Spot role in the AWS web console. It only takes a few clicks; please read their documentation.

Handling Amazon's automatic shutdown of Spot instances

Next we need to take care of Amazon's automatic shutdown of Spot instances, which happens depending on the market price or on the fixed usage duration you specified in your request. When Amazon decides to shut down an instance, it sends a technical notification to the VM, which we can watch and query through a URL endpoint. This means we need to embed a new script in our EC2 VM and Reference AMI to handle such a shutdown event and execute the appropriate actions before the VM is stopped (Amazon announces a two-minute delay between the notification and the effective shutdown, which is short).

The way I chose to do it (there are others) is to launch a permanently deployed background script at VM boot through our already modified rc.local; this script polls the relevant instance metadata every 5 seconds with a curl call. If Amazon publishes the termination metadata, we then execute a custom shutdown script, which has to be embedded in the package our VM automatically downloads from our Reference server at boot time.

So, as you saw in the previous article, we insert these three lines just before the final "exit 0" instruction in our /etc/rc.local file:
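The exact lines were shown as a screenshot in the original post; here is a minimal sketch of what they could look like:

    # launch the Spot termination watcher in the background (sketch, not the original lines)
    if [ -x /etc/rc.termination.handling ]; then
        /etc/rc.termination.handling &
    fi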

Then we create our /etc/rc.termination.handling script, based on the guidance in Amazon's documentation:
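The original script was shown as a screenshot; below is a minimal sketch of the idea, following the polling pattern described in Amazon's documentation. The paths, the log file name and the name of the custom shutdown script are assumptions:

    #!/bin/bash
    # /etc/rc.termination.handling (sketch)
    # Poll the EC2 instance metadata every 5 seconds. While the instance is not marked
    # for termination, the spot/termination-time endpoint returns a 404; once Amazon
    # schedules the shutdown, it returns HTTP 200 with the termination timestamp.
    HOME_DIR=/home/ubuntu                      # assumption: default Ubuntu AMI user
    LOG="$HOME_DIR/AWS/logs/termination.log"
    mkdir -p "$HOME_DIR/AWS/logs"

    while true
    do
        CODE=$(curl -s -o /dev/null -w '%{http_code}' \
            http://169.254.169.254/latest/meta-data/spot/termination-time)
        if [ "$CODE" = "200" ]; then
            echo "$(date) - Spot termination notice received" >> "$LOG"
            # custom shutdown actions (upload Gekkoga results to the Reference server, etc.)
            # the script name below is an assumption; it is part of our deployed package
            "$HOME_DIR/AWS/rc.termination.actions" >> "$LOG" 2>&1
            break
        fi
        sleep 5
    done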

We make it executable:
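    sudo chmod 755 /etc/rc.termination.handling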

We will now test that this works. The only thing we cannot test right now is the real URL endpoint with actual termination information. First, we reboot our EC2 reference VM and verify that our rc.termination.handling script is running in the background:
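For instance, something like:

    ps aux | grep [r]c.termination.handling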

Now we will test its execution, but we need to change its trigger slightly: our VM is not yet a Spot instance, so the URL we check won't contain any termination information and will return a 404 error. I also disabled the loop so that it executes only once.
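A possible test version of the check (a sketch: it fires on the 404 we get outside of a Spot context, and the surrounding while loop is removed):

    # test trigger: outside of a Spot instance the endpoint returns 404, so we
    # temporarily run the shutdown actions on 404 instead of 200, with no loop
    CODE=$(curl -s -o /dev/null -w '%{http_code}' \
        http://169.254.169.254/latest/meta-data/spot/termination-time)
    if [ "$CODE" = "404" ]; then
        echo "$(date) - simulated termination notice" >> "$LOG"
        "$HOME_DIR/AWS/rc.termination.actions" >> "$LOG" 2>&1
    fi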

We manually execute it, and we check the output log in $HOME/AWS/logs:
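For example (the log file name is the assumption used in the sketch above):

    sudo /etc/rc.termination.handling
    cat ~/AWS/logs/termination.log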

Now we check, on our Reference server @home, whether the results were uploaded by the EC2 VM.

That's perfect!

Don't forget to revert the modifications you made to rc.termination.handling for testing.

Checking Spot Instance prices

First we need to decide what kind of VM we want, and then check the Spot price trends to settle on a price.

For my first test I will choose a c5.2xlarge, which has 8 vCPUs and 16 GB of memory. That should be enough to run Gekkoga with 7 concurrent threads.

Then we check the price trends and see that the market price is currently around $0.14; this will be the base price in our request, as we just want to test for now.

It is also interesting to look at the price trend over a few months: it has actually increased quite a lot. Maybe we could get a better machine for roughly the same price.

Let's check the price of the c5.4xlarge:

Conclusion: for $0.01 more, we can have a c5.4xlarge with 16 vCPUs and 32 GB of RAM instead of a c5.2xlarge with 8 vCPUs and 16 GB of RAM. Let's go for it.

Requesting a “one-time” Spot Instance from the AWS CLI

On our Reference server @home, we will use the following AWS CLI command (I've embedded it in a shell script called start_aws.sh). For details on the JSON file to provide with the request, see Amazon's documentation.
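A sketch of what start_aws.sh can look like (the price and duration below are placeholders, not the actual values used):

    #!/bin/bash
    # start_aws.sh (sketch) - request a one-time, fixed-duration Spot instance
    # --block-duration-minutes is what makes it a fixed-duration ("defined duration") instance
    aws ec2 request-spot-instances \
        --spot-price "0.35" \
        --instance-count 1 \
        --type "one-time" \
        --block-duration-minutes 60 \
        --launch-specification file://$HOME/AWS/spot_specification.json \
        --dry-run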

Note:

  • The --dry-run parameter asks the CLI to simulate the request and display any errors instead of actually trying to launch a Spot instance.
  • For this first test, I used a “one-time” VM with a fixed execution duration, so I know exactly how long it will run; as a consequence the price is not the one we saw above, it is higher.
  • Once this test is successful, we will switch to “regular” Spot VMs with a price we can “decide” depending on the market, but with a run time we cannot anticipate (Amazon may stop the instance at any time if our bid price falls below the market price; otherwise you have to stop it yourself).

Then we create a $HOME/AWS/spot_specification.json file with the appropriate data, in particular our latest reference AMI ID:
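A hypothetical example of the file (replace the AMI ID with your latest reference AMI; the key pair, security group and availability zone are placeholders):

    {
      "ImageId": "ami-0123456789abcdef0",
      "InstanceType": "c5.4xlarge",
      "KeyName": "my-ec2-keypair",
      "SecurityGroupIds": [ "sg-0123456789abcdef0" ],
      "Placement": { "AvailabilityZone": "eu-west-1a" }
    }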

We run the AWS CLI command above to simulate the request…

Everything looks good; let's remove the --dry-run and launch it for real.

In the EC2 web console we can see our request, and it is already active, which means the VM was launched!

Now in the main dashboard we can see it running and get its IP (we could also do this via the AWS CLI):
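Via the CLI, something like this would do (a sketch):

    aws ec2 describe-instances \
        --filters "Name=instance-lifecycle,Values=spot" "Name=instance-state-name,Values=running" \
        --query "Reservations[].Instances[].PublicIpAddress" \
        --output text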

Let's SSH into it and check the running processes:
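For example (the key pair name and IP are placeholders):

    ssh -i ~/.ssh/my-ec2-keypair.pem ubuntu@<public-ip>
    ps aux | grep [n]ode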

It seems to be running. You can't see it here, but it is running well: I keep receiving emails with new “top results” found by Gekkoga, since I activated its email notifications. This is also a good way to make sure you keep a backup of the latest optimal parameters found (once again: this is a backtest, and in no way does it mean those parameters are good for the live market).

Let’s check the logs:

It's running well! Now I'll just wait an hour to check whether our termination-detection process works properly and backs up the results to our Reference server @home.

While we are waiting, let's have a look at the computation time in the logs:

Epoch #13 took 87.718s. It depends on the epoch and on the computations requested by the strategy, but the last time we looked at this, epoch #1 took 1280.646s to complete for the same Strat we customized as an exercise.

One hour later, I managed to come back two minutes before the estimated termination time, and I could manually check with curl that the termination metadata had been published by Amazon to the VM.
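The check is a simple curl against the instance metadata:

    # returns the scheduled termination time (HTTP 200) once Amazon has marked the instance
    curl -s http://169.254.169.254/latest/meta-data/spot/termination-time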

Our termination script performed well and uploaded the data to my Reference server @home, both into ~/AWS/<timestamp>_results and into the gekkoga/results directory, so that it will be reused the next time Gekkoga is launched.

Using a Spot instance with a “called” market price

We previously used a “one-time” Spot instance with a fixed duration, which guaranteed it would run for the specified time, but at a higher, fixed price. Because we managed to back up the data before its termination, and because Gekkoga knows how to reuse it, we will now use a lower-priced Spot instance, with no guarantee of how long it will run.

Let's modify our AWS CLI request: we will test a c5.18xlarge (72 vCPUs, 144 GB of RAM) at a bid price of $0.73 per hour.
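A sketch of the modified request (no fixed duration this time, and a $0.73 bid):

    aws ec2 request-spot-instances \
        --spot-price "0.73" \
        --instance-count 1 \
        --type "one-time" \
        --launch-specification file://$HOME/AWS/spot_specification.json

    # and in spot_specification.json:
    #     "InstanceType": "c5.18xlarge",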

Amazon started it almost immediately. Let's SSH into it and check the number of processes launched.
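For example:

    # count the node (Gekkoga backtest) processes currently running
    ps aux | grep [n]ode | wc -l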

This is quite interesting: parallelqueries in gekkoga/config/config-MyMACD-backtester.js is indeed set to 71, but the number of node processes launched seems to cap at 23. I don't know why yet, but until we find out, there is no need to launch a 72-vCPU VM; a 16- or 32-vCPU one may be enough for now.

It seems it is populationAmt that limits the number of threads launched. Raising it to 70, still with parallelqueries at 71, makes the number of threads increase and stabilize around 70, with some periods dropping down to 15 threads. It would be interesting to graph and study this. Maybe there is also a bottleneck on Gekko's UI/API, which has to handle a lot of connections from all of Gekkoga's backtesting threads.
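For reference, the two settings discussed here live in the Gekkoga configuration file; a sketch of the relevant excerpt (values from this section, not a recommendation):

    // excerpt of gekkoga/config/config-MyMACD-backtester.js (sketch)
    populationAmt: 140,     // size of the genetic population; in practice it bounds concurrency
    parallelqueries: 71,    // maximum number of concurrent backtests sent to Gekko's API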

I'll need to read a bit more literature on this subject to find a good tweak. Anyway, right now I'm using 140 for populationAmt and I still see occasional dips down to 15 concurrent node.js threads.

Result

After 12 hours without any interruption (total cost: $0.73 × 12 = $8.76), these are all the emails I received for this first test with our previously customized MyMACD strategy.

As you can see (if the screenshot is not too small), on this historical data and with this strategy, the profit is much better with long candles and long signals.

Again, this needs to be challenged on smaller datasets, especially “stalling” markets or bullish markets like the current one. This setup and test automation does not guarantee, in any way, that you will earn money.

A few notes after more tests

Memory usage:

  • I tried several kinds of machines; right now I'm using a c4.8xlarge, which still has good CPUs but less RAM than the c5 family. I also started testing another customized Strat, and I encountered a few crashes.
    • I initially thought it was because CPU usage was capping at 100% as I increased parallelqueries and populationAmt. I had to cancel my Spot requests to kill the VMs.
    • Using the EC2 Console, I checked the console logs, and I could clearly see some OOM (Out of Memory) errors just before the crash.
    • I went through my Strat code and tried to simplify everything I could: I used ‘let’ declarations instead of ‘var’ (to reduce the scope of some variables) and managed to remove one or two variables I could handle differently. I also commented out every condition that displays logs; I like having logs in my console when Gekko is trading live, but for backtesting, avoid them. No logging at all, and remove every condition you can.
  • I also reduced the max_old_space_size parameter to 4096 in gekko/gekkoga/start_gekkoga.sh. This has a direct impact on Node's garbage collector: it will collect roughly twice as often as with the 8096 I had previously configured (see the sketch after this list).
  • Since those two changes, I've been running a session on a c4.8xlarge for a few hours, using 34 parallelqueries for 36 vCPUs. The CPUs stay around 85% busy, which seems good to me.
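A sketch of what the relevant line in gekko/gekkoga/start_gekkoga.sh can look like (the script itself comes from the previous articles, and the invocation is assumed to be the usual "node run.js -c <config>"; only the --max-old-space-size value matters here):

    #!/bin/bash
    # start_gekkoga.sh (sketch)
    cd ~/gekko/gekkoga
    # 4096 MB heap limit: Node's GC runs roughly twice as often as with the 8096 used before
    node --max-old-space-size=4096 run.js -c config/config-MyMACD-backtester.js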

Improvements: in a future article I will detail two small changes I made to my Reference AMI:

  • The VM now sends me an email when it starts, with its details (IP, hostname, etc.), and when a termination is announced.
  • I added a file-monitoring utility to detect, live, any change in Gekkoga's results directory and upload it immediately to my Reference server @home. I had to do this because I noticed that when you ask Amazon to cancel a Spot request with a running VM attached to it, it kills the VM immediately, without an announced termination, so the previous results were not synced to my home server (though I had the details of the Strat configuration by email).

Also, important things to remember:

  • Amazon EC2 launches your Spot Instance when the maximum price you specified in your request exceeds the Spot price and capacity is available in Amazon's cloud. The Spot Instance runs until it is interrupted or you terminate it yourself. If your maximum price is exactly equal to the Spot price, there is a chance your Spot Instance remains running, depending on demand.
  • You can't change the parameters of a Spot Instance request, including the maximum price, after you've submitted it, but you can cancel the request while its status is either open or active.
  • Before you launch any request, you must decide on your maximum price and on the instance type to use. To review Spot price trends, see Amazon's Spot Instance Pricing History.
  • For our usage, you should request “one-time” instances, not “persistent” requests (we only used it for testing), which means you need to embed a way for your EC2 VM to give you feedback about the latest optimized parameters found for your Strat (by email, for example, or by tweaking Gekkoga to send live results; note for later: TODO).

And remember: nothing is free, you will be charged for this service, and there is NO GUARANTEE that you will earn money after your tests.

1 thought on “Launching Gekkoga on a high-end EC2 Spot machine”

  1. Nice article. I recall that the inventor of Gekko also noted that it isn't optimal for small candle sizes. I've noticed most neural nets behave terribly with short timeframes as well. On the other hand, candles can be aggregated (5 + 5 = a 10-minute candle), so it then comes down to optimal smoothing effects. One would think a good algorithm would find profit across all smoothing ranges; if it doesn't, then the algorithm is just a best fit to that specific data (i.e. it won't work as well if you change the time ranges or time scales; it just happened to fit that particular graph).
    In those cases we lack the knowledge of what we are doing, and are just trying random fits of math.
