Placing orders using zscore - statistical arbitrage. Help needed


#1

Firstly, great job thus far! I’ve been looking for a way to implement my already back-tested strategy, and as I’m a Python novice the Catalyst engine seems like an ideal solution.

My problem: I’m trying to run a quick and dirty version of a statistical arbitrage strategy, just to familiarize myself with Catalyst and see if the backtest results match those in my other non-Catalyst script. That other script screens the Poloniex universe of cryptos, runs pairwise analysis to find cointegrated pairs, runs a quick backtest to find pairs with a CAGR >10% and Sharpe >0.5, and spits out the combinations that meet these criteria. It’s one of these pairs I’m now trying to test using Catalyst, but I can’t make it open orders! I feel like it’s probably something simple I’m missing, but I’ve been fiddling with it for 2 days now and can’t get anywhere, so any help would be appreciated.

FYI I’m looking at BTCBCN & BTCLTC, 1m frequency, poloniex (2 days at the start of July).


# -*- coding: utf-8 -*-
"""
Created on Tue Sep 18 16:34:41 2018

@author: Alex

removed kalman filter regression for state means - too slow.


"""

# for collecting data & running catalyst

import pandas as pd

from datetime import datetime
from catalyst.api import order, record, symbol, symbols
from catalyst.utils.run_algo import run_algorithm

#for statistical tests
import statsmodels.tsa.stattools as ts
import statsmodels.api as sm
import matplotlib.pyplot as plt
from pykalman import KalmanFilter
import numpy as np
from numpy import log, polyfit, sqrt, std, subtract



def initialize(context):
    
    context.asset = symbol('ltc_btc')
    context.asset2 = symbol('bcn_btc')
    
    #context.entryZscore = 2 
    #context.exitZscore = 0
    
def handle_data(context, data):
    
    context.i = 0
    
    ltcbtchist = data.history(context.asset,
                              'price',
                              bar_count=2800,
                              frequency = '1T')
    
    bcnbtchist = data.history(context.asset2,
                              'price',
                              bar_count=2800,
                              frequency = '1T')
    
    context.ltcbtchist = ltcbtchist
    
    #ltcbtc = data.current(context.asset, 'price')
    #bcnbtc = data.current(context.asset2, 'price')
    
    
    #print('Data: {}'.format(data.current_dt))
    #print('LTCBTC: {}'.format(ltcbtc))
    #print('BCNBTC: {}'.format(bcnbtc))
    
#%%

#Spread
    
    
    est = sm.OLS(bcnbtchist, ltcbtchist)
    est = est.fit()
    hr = -est.params[0]
    context.spread = bcnbtchist + (ltcbtchist * hr)

   
#%%

#Halflife
    spread = context.spread
    
    spread_lag = spread.shift(1)
    spread_lag.iloc[0] = spread_lag.iloc[1]
    
    spread_ret = spread - spread_lag
    spread_ret.iloc[0] = spread_ret.iloc[1]
    
    spread_lag2 = sm.add_constant(spread_lag)
     
    model = sm.OLS(spread_ret,spread_lag2)
    res = model.fit()
    context.halflife = int(round(-np.log(2) / res.params[1],0))
 
    if context.halflife <= 0:
        context.halflife = 1    
        
#%%
        
    #Zscore
    context.meanspread=spread.rolling(window=context.halflife).mean()
    context.stdspread = spread.rolling(window=context.halflife).std()
    
    context.zscore = (spread-context.meanspread)/context.stdspread
    
    
#%% 

# Trading logic
    
    context.i += 1
    if context.i < context.halflife:
        return

    if context.zscore >= 1:
        order(context.asset, amount=1)
        order(context.asset2, amount= -1)
        
    elif context.zscore <= -1:
        order(context.asset, amount=-1)
        order(context.asset2, amount= -1)
        
    
    


def analyze(context, perf):
    
   
    
    ax1 = plt.subplot()
    perf.portfolio_value.plot(ax=ax1)
    
    
    #print(context.ltcbtchist)
    #print(context.spread.shape)
    #print(context.halflife)
    print(context.zscore)
    #print(context.meanspread)
    #print(context.stdspread)
    #print(context.spreadlength)
    
    
results = run_algorithm(initialize=initialize,
                        handle_data=handle_data,
                        analyze=analyze,
                        live=False,
                        start=pd.to_datetime('2018-7-1', utc=True),
                        end=pd.to_datetime('2018-7-2', utc=True),
                        exchange_name='poloniex',
                        data_frequency='minute',
                        quote_currency ='btc',
                        capital_base=10000 )
    

The zscore seems to be computed: it’s printable (context.zscore), and if you print it you’ll see plenty of values above 1 and below -1 (the sell-the-spread and buy-the-spread thresholds), but no orders get opened. Any advice would be greatly appreciated.


#2

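It looks like it isn’t ordering because context.i always equals 1: handle_data resets it to 0 at the top of every bar and then increments it once, so the check context.i < context.halflife returns before the order logic whenever halflife is greater than 1.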
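To make the counter issue concrete, here is a minimal sketch that does not use Catalyst at all. The `run` helper and the two handler names are made up for illustration; the only point is where `context.i` gets initialised.

```python
from types import SimpleNamespace


def initialize(context):
    # Set the counter once, before the first bar is processed.
    context.i = 0


def handle_data_buggy(context, data):
    context.i = 0   # reset on every bar, as in the script above
    context.i += 1  # so the counter can never get past 1


def handle_data_fixed(context, data):
    context.i += 1  # no reset here, so it keeps counting across bars


def run(handler, n_bars):
    """Hypothetical stand-in for the backtest loop: call the handler once per bar."""
    context = SimpleNamespace()
    initialize(context)
    for _ in range(n_bars):
        handler(context, data=None)
    return context.i


buggy_count = run(handle_data_buggy, 5)
fixed_count = run(handle_data_fixed, 5)
print(buggy_count, fixed_count)  # 1 5
```

Moving `context.i = 0` into `initialize` is the usual way to keep state between bars.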


#3

Also, a couple of other issues:

  • If you remove the halflife requirement, it will give an error message, because context.zscore is a whole series and can’t be compared to a number inside an if statement. You want to use the current zscore, which would be context.zscore[-1].

  • Your order logic potentially has issues. First, for bcn_btc, both branches only sell BCN: in backtesting it would short BCN, but never buy it. This is a problem because you can’t short using Catalyst, nor is BCN eligible for margin trading on Poloniex. Also, the order logic doesn’t check for available funds, and with order() it would buy or sell 1 LTC and/or 1 BCN on every transaction. Did you mean to have it buy or sell 100% or 0% when it meets those requirements? Finally, your strategy relies on market orders, which is fine for backtesting but doesn’t work in live trading on Poloniex, where you would have to use limit orders. You would probably need another rule that cancels open orders should one not fill (either within a certain timeframe, or within a certain price range); otherwise, in live trading, your strategy could just get “stuck” if the price moves too far away from your order price.
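As a rough sketch of the first point, the rolling z-score can be reduced to a single number before it is compared with the thresholds. The helper name `latest_zscore` and the toy spread values are invented for illustration; only standard pandas calls are used.

```python
import pandas as pd


def latest_zscore(spread, window):
    """Return the most recent rolling z-score of `spread` as a plain float."""
    mean = spread.rolling(window=window).mean()
    std = spread.rolling(window=window).std()
    zscore = (spread - mean) / std
    # .iloc[-1] selects the last bar by position, whatever the index is.
    return float(zscore.iloc[-1])


# Toy spread values, not market data.
spread = pd.Series([0.0, 1.0, 0.0, 1.0, 0.0, 3.0])
z = latest_zscore(spread, window=4)

if z >= 1:
    signal = "zscore above +1"
elif z <= -1:
    signal = "zscore below -1"
else:
    signal = "no signal"
print(round(z, 4), signal)
```

Writing `if zscore >= 1:` with the whole series instead raises a ValueError, because pandas cannot reduce many True/False values to one.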

Anyway, I made the following changes and ran a backtest for 7-1-18 to 7-2-18, and it saw about a 2% gain. Then I realized that you didn’t have trading fees accounted for, ran it again, and it actually went up to 5%.

# Trading logic (goes inside handle_data)

#context.i += 1

#if context.i < context.halflife:
    #return

# Requires: from catalyst.api import order_target_percent
# context.zscore is a series, so compare only its most recent value.
current_z = context.zscore.iloc[-1]

if current_z >= 1 and context.asset not in context.portfolio.positions:
    order_target_percent(context.asset, 1)
    order_target_percent(context.asset2, 0)
    
elif current_z <= -1 and context.asset2 not in context.portfolio.positions:
    order_target_percent(context.asset, 0)
    order_target_percent(context.asset2, 1)


#4

Hi @SOG35, thanks for the help and the pointers.

I’ve yet to include any of my trading logic (other than what you see) or money management, I was just smashing my head against the wall trying to get it to open any orders at all which, thankfully, now I can do! I didn’t realise you can’t short using catalyst, that might throw a spanner in the works. As for the order size, the idea would be to open equal ($ or btc value) buys and sells. I’m thinking I’ll take this up to daily frequency - before using catalyst the lowest timeframe I’d used was 5m. It’s a work in progress with a long way to go but the examples here are great for grabbing snippets so now I’ve got some orders opening I can get cracking with padding it out.
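For the equal-value sizing idea, a minimal sketch might look like the following. The function `equal_value_amounts` is hypothetical and the prices are made-up numbers, not market data; it only shows the arithmetic of turning one BTC budget per leg into unit amounts.

```python
def equal_value_amounts(btc_per_leg, price_a_btc, price_b_btc):
    """Return unit amounts of two assets that are each worth `btc_per_leg` BTC."""
    if price_a_btc <= 0 or price_b_btc <= 0:
        raise ValueError("prices must be positive")
    return btc_per_leg / price_a_btc, btc_per_leg / price_b_btc


# Illustrative prices only (quoted in BTC), chosen to keep the numbers readable.
ltc_amount, bcn_amount = equal_value_amounts(
    btc_per_leg=0.5,
    price_a_btc=0.0125,
    price_b_btc=0.00000025,
)
print(ltc_amount, bcn_amount)
```

In the script above, the two prices would presumably come from data.current(context.asset, 'price') and data.current(context.asset2, 'price'), as in the commented-out lines of handle_data.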

Thanks again and if you’ve got any more advice, chuck it my way!


#5

I’ve been working with zipline (first Quantopian, then Catalyst) for about a year and a half, and I still don’t have something that I’m confident enough to live trade with, but I’ve learned a ton. My advice:

  • Don’t trade using 1m prices. There are 1440 minutes in a day and much of that is noise. You can trade on a minute by minute basis, but I’d use larger bars.

  • For the most accurate backtesting results, pick markets that have high volume. You may be able to make money on low volume pairs, but the backtesting results will be extremely exaggerated. For example, Poloniex USDT pairs have virtually no volume in 2015 and most of 2016, so without slippage being accounted for, you can just write something that can go from 1 BTC to like 900 million (not kidding) in a year, which clearly isn’t realistic.
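One way to act on the first point is to compute signals on larger bars built from the minute prices. The sketch below uses plain pandas resampling; the helper name `to_larger_bars`, the 15-minute bar size and the synthetic prices are all assumptions for illustration.

```python
import pandas as pd


def to_larger_bars(minute_prices, rule="15min"):
    """Collapse 1-minute closing prices into larger bars, keeping the last price in each."""
    return minute_prices.resample(rule).last().dropna()


# Synthetic 1-minute series covering one hour (values are just 0..59).
index = pd.date_range("2018-07-01", periods=60, freq="min", tz="UTC")
minute_prices = pd.Series(range(60), index=index, dtype=float)

bars = to_larger_bars(minute_prices)
print(bars)
```

The strategy can still be evaluated every minute in handle_data; only the series fed into the regression and z-score would be the coarser one. Catalyst’s data.history may also accept a larger frequency string directly, but that is not checked here.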