Accuracy of Enigma Catalyst Data (Differences found)


#1

Hello,
I have been extracting minute-interval OHLCV data from Binance using Binance's API since before I found Enigma Catalyst. It seems that there are some differences between the data I have extracted and the data from Enigma Catalyst. Usually the OHLC values vary a little, but the volume can vary significantly. Has anyone encountered similar issues?
Also, what is the source (and calculation) of Enigma Catalyst’s Binance minute interval data?
Thank you!


#2

Hi,

The Binance minute data is retrieved from the exchange and saved using the CCXT library.
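For reference, a minimal sketch of the kind of CCXT call involved (the pair is the one from this thread, and this is an illustration rather than the exact Catalyst code):

import ccxt

# Fetch one batch of 1-minute OHLCV candles for IXC/BTC from Binance via CCXT.
# Each candle is [timestamp_ms, open, high, low, close, volume (base currency)].
exchange = ccxt.binance()
since = exchange.parse8601('2018-08-12T23:54:00Z')
candles = exchange.fetch_ohlcv('IXC/BTC', timeframe='1m', since=since, limit=60)
for ts, o, h, l, c, v in candles:
    print(exchange.iso8601(ts), o, h, l, c, v)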
Could you please elaborate more on the differences you have observed?

Thanks,

Lena


#3

Hello Lena,

May I know how the Binance minute interval data is retrieved, please? Is it OHLCV taken directly from the Binance API, historical trades from the Binance API, or processed real-time observations of Binance?

Regarding the differences I have observed: I took 59-minute interval data from Catalyst, and for this example, IXCBTC at 8/12 23:54.

Then, highlighting the 59 cells of minute volume from the Binance API data that I extracted myself:
(restriction for new users: I cannot put more than one image here)

The sum of those 59 minute volumes should have matched the volume of the 59-minute bar, but it didn't.
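Roughly, the check I am doing looks like this (the file name is a placeholder for my own Binance export, which holds minute OHLCV rows indexed by timestamp):

import pandas as pd

# 'binance_minute.csv' is a placeholder for the minute OHLCV I pulled from the
# Binance API myself; the 'volume' column is in the base currency.
binance = pd.read_csv('binance_minute.csv', index_col=0, parse_dates=True)

# Sum the 59 one-minute volumes starting at 2018-08-12 23:54 UTC and compare
# the result against the volume of the single 59-minute bar from Catalyst.
window = binance.loc['2018-08-12 23:54':'2018-08-13 00:52', 'volume']
print(window.sum())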


#4

The image missing from the last post:
[image]


#5

Thanks for the information.

Catalyst retrieves the OHLCV data directly from the exchange (minute interval for the minute bundles and daily interval for the daily bundles).
I have checked the timestamp you mentioned - 2018-08-12 23:54 - and the data in the bundles is consistent with what is returned from the exchange.

From the bundles:

                           close        low       open    volume
2018-08-12 23:54:00+00:00  0.0001085  0.0001085  0.0001085      0.00 

Directly from Binance:

[1534118040000,"0.00010850","0.00010850","0.00010850","0.00010850","0.00000000",1534118099999,"0.00000000",0,"0.00000000","0.00000000","0"]

Please note that the volume in the bundles is quoted in the base currency, as we are following the CCXT standard.
So perhaps the inconsistency that you've observed in the volume is due to the different currencies?
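To illustrate, a base-currency volume can only be reconciled with a quote-currency volume approximately, for example with a helper like this (not part of Catalyst, just an illustration):

def approx_quote_volume(base_volume, close_price):
    # Rough per-bar conversion from base-currency volume (e.g. IXC) to
    # quote-currency volume (BTC). The exact quote volume depends on the
    # prices of the individual trades inside the bar, so this is approximate.
    return base_volume * close_price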

Lena


#6

Hello Lena, thank you for looking into this.
However, I believe the differences are still unresolved, and that they are not due to a difference in currencies.
The screenshots are for a 59-minute interval, whose volume should be the sum of 59 minutes of volume starting at 23:54. That explains why in my screenshot I am getting a volume larger than 0.
Minute by minute, I observe only very occasional differences, so a larger interval is more likely to show a difference, since it aggregates the data of every minute in the interval.


#7

If there is a way of reading the ingested data directly into dataframes or CSVs, I might be able to find more instances of the differences.


#8

How are you performing the comparison?

The ingested data is stored using the bcolz format, which is highly optimized for handling columnar data.
Catalyst reads it into dataframes; you can look at this section of the documentation to see how to save them to a CSV.
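As a rough sketch, here is a fragment meant to be dropped into an existing Catalyst script (the symbol spelling is an assumption; adjust it to your bundle):

from catalyst.api import symbol

def initialize(context):
    # Assumed symbol spelling for the pair discussed in this thread.
    context.asset = symbol('ixc_btc')

def handle_data(context, data):
    # Pull the last 59 one-minute OHLCV bars and dump them to a CSV file so
    # they can be compared against data exported from the Binance API.
    bars = data.history(
        context.asset,
        ['open', 'high', 'low', 'close', 'volume'],
        bar_count=59,
        frequency='1T',
    )
    bars.to_csv('catalyst_minute.csv')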


#9

I have been doing pandas df.to_csv(output) on data.history queries of the OHLCV data, similar to what is shown in the documentation. For some reason the code in the documentation doesn't output anything when I try to get OHLCV, but I guess that doesn't matter for now.

If there is code readily available to read bcolz in Python, that would help save some time too.
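Something along these lines might work, though the path below is only a guess at wherever Catalyst keeps the ingested minute bundle on disk:

import bcolz

# The path is a placeholder; it should point at one of the bcolz directories
# that Catalyst created when the minute bundle was ingested.
table = bcolz.open('path/to/ingested/minute/bundle', mode='r')
df = table.todataframe()  # works when the on-disk object is a ctable
print(df.head())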

I think I have figured out what might be at least part of the cause of the issue. During paper trading, if more than 1000 candlesticks are requested (for Binance), it gives bad results for any candlesticks past 1000. I guess this is because of how CCXT and the Binance API work. Maybe Catalyst could make more than one query to Binance when more than 1000 candlesticks are required, or, even better, ingest data into the local bcolz store regularly and serve the part of the query past 1000 candlesticks locally?
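A client-side workaround might look roughly like this (the pair and dates are just the ones from this thread):

import ccxt

# Sketch of paginating past Binance's 1000-candles-per-request limit by
# issuing several fetch_ohlcv calls and advancing `since` each time.
exchange = ccxt.binance()
since = exchange.parse8601('2018-08-12T00:00:00Z')
candles = []
while len(candles) < 3000:  # e.g. three pages of up to 1000 candles
    batch = exchange.fetch_ohlcv('IXC/BTC', timeframe='1m', since=since, limit=1000)
    if not batch:
        break
    candles += batch
    since = batch[-1][0] + 60 * 1000  # start the next request after the last candle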


#10

Hi,

You are raising several important points and I’ll try to address them.

  1. In paper trading and live trading, Catalyst fetches OHLCV candles directly from each exchange - using CCXT - and is limited by the restrictions imposed by the exchanges' APIs.
    This means that if you request more than 1000 candles (the Binance limit), you should not receive more than that amount. The behaviour you are describing is a bug, and I've opened a GitHub issue for it. Thanks for reporting it.

  2. Feel free to open a new GitHub issue or submit a PR for the proposed enhancement to support more candles than the exchange's limit.

  3. I am not sure what issue you encountered while trying to save the pricing data to a CSV file, but you should be able to save the Pandas Series/DataFrame/Panel returned by data.history or the float/Series/DataFrame returned by data.current.

Thanks,

Lena


#11

Hi Lena,

I have been testing on paper trading data for the past day, and it seems that I am not encountering the issue anymore now that I don't query past 1000 candles. I would need to query more than 1000 candlesticks for my actual algorithm though, so I will have to find a workaround for the moment.

I must have been unclear in the previous message. Just to clarify, I have managed to output a pandas DataFrame to CSV (which is how I did the data validation), but I previously didn't get the code from the documentation to work (it was probably some PyCharm issue, but I haven't tried again since).

Thank you very much for your help, Lena.