Backtesting is very slow. Can we speed it up?


#1

On a small data set (2 days of minute data), Catalyst normally takes about 30 seconds to run a very basic strategy.

  • Is there a way to speed up testing?
  • Will there ever be the possibility of using GPU acceleration?
  • Is there a way to re-execute only part of the code without unloading and re-loading the data, so that memory is managed better?

I’m running the Catalyst code from another piece of Python code and will be re-running it hundreds or thousands of times (using an evolutionary algorithm to tune parameters).


#2

Hey,
I am totally with you as we are facing the same issue. Testing different parameters with 2 years of minute data takes a long time while daily data is not detailed enough. Is there any plan to provide hourly data? That would be a good compromise between calculation time and detail.
Jan


#3

As far as I know, Python only runs on one CPU core at a time (because of the global interpreter lock), which makes it slow.

There are ways to run work in parallel, but it is complex and I don’t know how it’s done.

Maybe you can start here: https://stackoverflow.com/questions/2846653/how-to-use-threading-in-python
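For CPU-bound work such as repeated backtests, separate processes tend to help more than threads, since threads still share the one interpreter lock. Below is a minimal sketch of a parallel parameter sweep using the standard library. `run_backtest` and its toy score are hypothetical stand-ins, not Catalyst code; you would replace the body with a call into your own backtest.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_backtest(params):
    """Hypothetical stand-in for one complete backtest run.

    Replace the body with a call into your own backtest; the toy score
    below only exists so the sketch runs on its own.
    """
    short_window, long_window = params
    score = -abs(long_window - 3 * short_window)
    return params, score


def sweep(param_grid, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Run every parameter set in its own worker and return the best one.

    ProcessPoolExecutor sidesteps the GIL for CPU-bound work. Passing
    ThreadPoolExecutor is handy for debugging but will not speed up
    pure-Python number crunching.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # pool.map keeps the results in the same order as param_grid.
        results = list(pool.map(run_backtest, param_grid))
    # Highest score wins; ties go to the earliest entry in param_grid.
    return max(results, key=lambda item: item[1])
```

When using processes, call `sweep(grid)` from under an `if __name__ == "__main__":` guard, and keep `run_backtest` at module level so it can be pickled and sent to the workers.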

Hope it helps :slight_smile:


#4

I figured out that Catalyst spends quite a bit of time providing the values via data.current().

So I exported all the minute data I need to a JSON file, which I then load into my own data structure in the initialize() function.
Now I just read these values in the handle_data() function instead of asking Catalyst for them.

I just tested this for one month and managed to save more than 50% of computation time.
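A rough sketch of that approach, under stated assumptions: the JSON layout (timestamp string mapped to a close price), the `EXPORTED_JSON` constant, and the `demo()` driver are all made up for illustration, and `data.current_dt` is assumed to give the current bar time as it does in zipline-style `handle_data` callbacks. It runs without Catalyst installed by using stand-in context/data objects.

```python
import json
from datetime import datetime
from types import SimpleNamespace

# Assumption: the export is a JSON object mapping "YYYY-MM-DDTHH:MM" to a
# close price. In a real run you would read this from your exported file.
EXPORTED_JSON = '{"2018-01-01T00:00": 13400.5, "2018-01-01T00:01": 13410.0}'


def initialize(context):
    # Parse once up front so handle_data only does dictionary lookups.
    raw = json.loads(EXPORTED_JSON)
    context.prices = {ts: float(px) for ts, px in raw.items()}
    context.seen = []


def handle_data(context, data):
    # Look the bar up by timestamp instead of calling data.current().
    key = data.current_dt.strftime("%Y-%m-%dT%H:%M")
    price = context.prices.get(key)
    if price is None:
        return  # no exported bar for this minute
    context.seen.append(price)


def demo():
    """Drive the two callbacks by hand with stand-in context/data objects."""
    context = SimpleNamespace()
    initialize(context)
    for minute in (0, 1, 2):
        data = SimpleNamespace(current_dt=datetime(2018, 1, 1, 0, minute))
        handle_data(context, data)
    return context.seen
```

The trade-off is that the strategy now depends on the exported file covering the same date range as the backtest; minutes missing from the export are silently skipped here.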


#5

@Thomas, I was thinking of the same thing. I am just starting with Catalyst and getting reacquainted with Python. Would you mind sharing an example of what you are describing? :slight_smile:


#6

Found the following:

I can grab OHLC data. Then I discovered that catalyst ingest can import from CSV.

There IS hope. :slight_smile:


#7

Do we know the reason? This looks to be a deal breaker for me.


#8

Thank you all for your reports and sorry for the delay in our response.
@Thomas, your findings sound interesting. Could you please elaborate on the improvements you made? How many fields/symbols were read in a single data.current call?
I also suggest we move this discussion to GitHub: could one of you please open an issue there?


#9

I found a speed improvement proposed for Quantopian on GitHub. It basically skips the performance tracker on each iteration of handle_data and only collects the portfolio value. That means you have to calculate all performance metrics from the portfolio values at the end of the backtest, but I got a speedup of about 3-5x.

Unfortunately the link to the branch is no longer available, but I have an implementation of the code in Catalyst which I’ll put on GitHub.
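To illustrate the second half of that idea (this is not the implementation mentioned above, just a sketch), here is how a few common metrics could be derived afterwards from the recorded portfolio values alone. The function name, the choice of metrics, a zero risk-free rate, and the default of 525,600 minute bars per year (a market that trades around the clock) are all assumptions.

```python
import math
from statistics import mean, stdev


def metrics_from_portfolio_values(values, periods_per_year=365 * 24 * 60):
    """Compute summary metrics from a sequence of portfolio values.

    Assumes one value per bar, a zero risk-free rate, and (by default)
    minute bars on a market that trades 24/7.
    """
    if len(values) < 2:
        raise ValueError("need at least two portfolio values")

    # Simple per-bar returns between consecutive portfolio values.
    returns = [later / earlier - 1.0 for earlier, later in zip(values, values[1:])]
    total_return = values[-1] / values[0] - 1.0

    # Max drawdown: largest drop from a running peak, as a negative fraction.
    peak = values[0]
    max_drawdown = 0.0
    for value in values:
        peak = max(peak, value)
        max_drawdown = min(max_drawdown, value / peak - 1.0)

    # Annualized Sharpe ratio; undefined when volatility is zero.
    volatility = stdev(returns) if len(returns) > 1 else 0.0
    if volatility > 0:
        sharpe = mean(returns) / volatility * math.sqrt(periods_per_year)
    else:
        sharpe = float("nan")

    return {
        "total_return": total_return,
        "max_drawdown": max_drawdown,
        "sharpe": sharpe,
    }
```

For example, `metrics_from_portfolio_values([100.0, 110.0, 99.0, 120.0])` gives a total return of about 0.2 and a max drawdown of about -0.1.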


#10

Thanks for sharing your improvement @wafram!
Can’t wait to see it.

I just opened a GitHub issue for this discussion.


#11

Hi, I have only just started playing around with Catalyst, but I had the same question. Hourly data would be very nice to have.
Is there any plan (or ETA) to add hourly resolution?


#12

If you are OK with trades only executing hourly, you can use schedule_function instead of handle_data, and backtesting will go much quicker.