Marketcap-weighted long-only sample strategy


#1

Hey everyone! This sample strategy should help new users get started with the Catalyst platform and infrastructure, and give a flavor of the wide array of data sources offered via the decentralized data marketplace.

Background
This strategy is a marketcap-weighted long-only strategy that holds 10 cryptoassets at a time, rebalancing every 30 days.
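
To make the weighting concrete, here is a minimal sketch of how marketcap weights can be computed with pandas. The three market caps are made-up numbers for illustration only, not real data; the calculation mirrors the `market_caps / market_caps.sum()` step in the algorithm.

```python
import pandas as pd

# Hypothetical market caps, for illustration only (not real data).
market_caps = pd.Series(
    {'btc': 600.0, 'eth': 300.0, 'xrp': 100.0},
    name='market_cap',
)

# Each asset's target weight is its share of the combined market cap,
# so the weights always sum to 1 (a fully invested, long-only portfolio).
weights = market_caps / market_caps.sum()

print(weights)
```

With these toy numbers the largest asset gets 60% of the portfolio, which shows how concentrated a marketcap-weighted basket can become.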

Algorithm

import os
import tempfile
import time

import pandas as pd
from logbook import Logger

from catalyst import run_algorithm
from catalyst.api import symbol, record, order_target_percent, get_dataset
from catalyst.exchange.utils.stats_utils import get_pretty_stats
from catalyst.utils.paths import ensure_directory

# We give a name to the algorithm which Catalyst will use to persist its state.
# In this example, Catalyst will create the `.catalyst/data/live_algos`
# directory. If we stop and start the algorithm, Catalyst will resume its
# state using the files included in the folder.
NAMESPACE = 'marketcap_weighting'
log = Logger(NAMESPACE)


# To run an algorithm in Catalyst, you need two functions: initialize and handle_data. You can add any additional
# helper functions to refactor code.

def rebalance(context, data):
    # The marketcap df is indexed by timestamp, so we floor the current time to the start of the day
    # and use the marketcap snapshot stored for that timestamp
    df = context.enigma_marketcap_df.loc[context.datetime.floor('1D')]

    # Find valid tradable symbols on this exchange for this date
    symbols = [a.symbol for a in context.exchange.assets if a.start_date < context.datetime]
    assets = []
    # The df was sorted by marketcap (descending) in initialize, so we walk from largest to smallest
    for currency, _ in df['market_cap'].items():
        if len(assets) >= context.n_assets:
            break

        for quote_currency in context.quote_currencies:
            s = '{}_{}'.format(currency.decode('utf-8'), quote_currency)
            if s in symbols:
                # Append this valid trading pair to assets list
                assets.append(symbol(s))
                break

    asset_base_currencies = [asset.base_currency.encode() for asset in assets]
    # Determine the assets that were previously in the portfolio, but not in the top 10 anymore and remove them
    removed_assets = list(set(context.previous_assets) - set(assets))
    for removed_asset in removed_assets:
        order_target_percent(removed_asset, target=0)
        record(f'{removed_asset.base_currency}_pct', 0)
        log.info(f'Removing {removed_asset.base_currency} from portfolio')

    # Determine each asset's respective weighting to construct rebalanced portfolio
    market_caps = df.loc[asset_base_currencies, 'market_cap']
    market_caps.drop_duplicates(inplace=True)
    contribution_pct = market_caps/market_caps.sum()
    for asset in assets:
        alloc_pct = contribution_pct.loc[asset.base_currency.encode()]
        # Set order target percentage to be the asset's marketcap-based weighting
        order_target_percent(asset, target=alloc_pct)
        record(f'{asset.base_currency}_pct', alloc_pct)
        log.info(f'Ordering {asset.base_currency} at a marketcap-weighted portfolio percentage of {alloc_pct:.3f}')
    context.previous_assets = assets


def initialize(context):
    # This initialize function sets any data or variables that you'll use in
    # your algorithm.  For instance, you'll want to define the trading pair (or
    # trading pairs) you want to backtest.  You'll also want to define any
    # parameters or values you're going to use.

    # We create a marketcap-weighted long-only index comprised of 10 cryptoassets
    context.n_assets = 10

    # Obtain initial dataset to determine index holdings for remaining simulation
    context.enigma_marketcap_df = get_dataset('coinmarketcap historical data')

    # Data cleaning of marketcap data - remove nan and non-numeric values
    data_clean_mask = (context.enigma_marketcap_df['market_cap'] != '-') & \
                      (context.enigma_marketcap_df['market_cap'] != b'-') & \
                      (~context.enigma_marketcap_df['market_cap'].isnull())
    context.enigma_marketcap_df = context.enigma_marketcap_df[data_clean_mask]
    context.enigma_marketcap_df['market_cap'] = context.enigma_marketcap_df['market_cap'].astype(int)
    context.enigma_marketcap_df.sort_values(by=['market_cap'], ascending=False, inplace=True)
    context.enigma_marketcap_df.reset_index(level=1, inplace=True)
    # Lowercase symbols to make indexing into the dataframe and symbol generation easier from now on
    context.enigma_marketcap_df['symbol'] = context.enigma_marketcap_df['symbol'].str.lower()
    context.enigma_marketcap_df.set_index(['symbol'], append=True, inplace=True)
    context.exchange = context.exchanges[next(iter(context.exchanges))]

    # Set quote currencies to try for a usdt denomination, otherwise btc
    context.quote_currencies = ['usdt', 'btc']
    context.previous_assets = []

    context.rebalance_period = 30
    context.i = 0


def handle_data(context, data):
    # Check if counter indicates a rebalance date (every 30 bars, i.e. every 30 days with daily data)
    is_rebalance_day = context.i % context.rebalance_period == 0
    if is_rebalance_day:
        log.info(f'Rebalancing on date ({context.datetime})')
        rebalance(context, data)
    record('rebalanced', is_rebalance_day)
    context.i += 1


def analyze(context=None, perf=None):
    perf.to_hdf('./marketcap_weighted_perf.h5', 'df')
    stats = get_pretty_stats(perf)
    print('the algo stats:\n{}'.format(stats))
    perf.loc[:, ['portfolio_value']].plot()


if __name__ == '__main__':
    # The execution mode: backtest or live
    live = False

    if live:
        run_algorithm(
            capital_base=1000,
            initialize=initialize,
            handle_data=handle_data,
            analyze=analyze,
            exchange_name='poloniex',
            live=True,
            algo_namespace=NAMESPACE,
            base_currency='usdt',
            live_graph=False,
            simulate_orders=False,
            stats_output=None,
        )

    else:
        folder = os.path.join(
            tempfile.gettempdir(), 'catalyst', NAMESPACE
        )
        ensure_directory(folder)

        timestr = time.strftime('%Y%m%d-%H%M%S')
        out = os.path.join(folder, '{}.p'.format(timestr))

        run_algorithm(
            capital_base=1000,
            data_frequency='daily',
            initialize=initialize,
            handle_data=handle_data,
            analyze=analyze,
            exchange_name='poloniex',
            algo_namespace=NAMESPACE,
            base_currency='usdt',
            start=pd.to_datetime('2017-01-01', utc=True),
            end=pd.to_datetime('2018-03-29', utc=True),
            # Persist the performance stats to the path logged below
            output=out,
        )
        log.info('saved perf stats: {}'.format(out))
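
The pair-selection loop in `rebalance` is easier to follow in isolation. Below is a minimal sketch in plain Python with no Catalyst dependency; `pick_pairs` is a hypothetical helper name and the currency and symbol lists are invented for illustration. It walks currencies from largest to smallest marketcap and, for each one, takes the first quote currency that forms a tradable pair.

```python
def pick_pairs(ranked_currencies, tradable_symbols, quote_currencies, n_assets):
    """Return up to n_assets pair names, trying quote currencies in order.

    Hypothetical helper for illustration; not part of the Catalyst API.
    """
    tradable = set(tradable_symbols)
    pairs = []
    for currency in ranked_currencies:
        if len(pairs) >= n_assets:
            break
        for quote in quote_currencies:
            pair = '{}_{}'.format(currency, quote)
            if pair in tradable:
                # First matching quote currency wins; skip the remaining ones.
                pairs.append(pair)
                break
    return pairs


# Made-up inputs: currencies ranked by marketcap and the exchange's symbols.
ranked = ['btc', 'eth', 'xrp', 'ltc']
tradable = ['btc_usdt', 'eth_usdt', 'eth_btc', 'ltc_btc']

selected = pick_pairs(ranked, tradable, ['usdt', 'btc'], n_assets=3)
print(selected)
```

Note that a currency with no tradable pair (here `xrp`) is skipped, so the portfolio can end up holding assets ranked below the nominal top N.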

Results

Hopefully this serves as a good launch point for many of you looking to get started!


#2

Is it the Bitwise HOLD 10 Index Fund, without the 2.5% management fee? :wink:


#3

Hey adi,
thanks a lot for the code. I am a total beginner at coding in Python, so this helped me a lot to understand Python as well as Catalyst.

However, I receive the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\User\.catalyst\data\marketplace\coinmarketcap historical data\__rootdirs__'

As far as I understand, I need to get the data from the marketplace first, but this is the point where I am stuck … can you help me understand how to download the data via the Enigma marketplace first? And do I need to pay for the data, or is it still free?

Thanks a lot in advance; your help would be highly appreciated.
Jan


#4

I'm running into the same problem.


#5

Hey, sorry for the delay, especially for you @janor!

Reading marketplace data into your algorithm is a two-step process. First, make sure you’ve subscribed to the dataset, which does cost ENG tokens; the amount is set by the data provider. You can subscribe to a dataset using the command:

catalyst marketplace subscribe --dataset="coinmarketcap historical data"

Once you’ve subscribed, ingest the data to your local machine (it is compressed and stored efficiently!) using the command:

catalyst marketplace ingest --dataset="coinmarketcap historical data"

This is the step that creates the file whose absence produces the error above. We are releasing some fairly decent updates to the Python environment recommended/needed to run the marketplace in a couple of days, so be on the lookout for that and use the new release to run the commands above. Let me know if you have any additional questions regarding either of these steps, and feel free to PM me on Discord if need be!
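
As a quick sanity check before running the algorithm, something like the sketch below can tell you whether the ingest step has been done. It assumes the dataset lives under `~/.catalyst/data/marketplace/<dataset name>`, which is inferred from the path in the error message above rather than from any documented layout; `dataset_looks_ingested` is a hypothetical helper, not a Catalyst function.

```python
import os


def dataset_looks_ingested(dataset_name, catalyst_root=None):
    """Return True if a non-empty local folder exists for the dataset.

    Hypothetical helper; the folder layout is an assumption based on the
    path shown in the FileNotFoundError earlier in this thread.
    """
    if catalyst_root is None:
        catalyst_root = os.path.join(os.path.expanduser('~'), '.catalyst')
    dataset_dir = os.path.join(
        catalyst_root, 'data', 'marketplace', dataset_name
    )
    return os.path.isdir(dataset_dir) and len(os.listdir(dataset_dir)) > 0


name = 'coinmarketcap historical data'
if not dataset_looks_ingested(name):
    print('Dataset not found locally; subscribe, then run:')
    print('catalyst marketplace ingest --dataset="{}"'.format(name))
```

This only checks that the folder exists and is not empty, so it cannot tell a complete ingest from a partial one.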