Adding additional dataset for training #40

Open
wants to merge 30 commits into
base: main


@ferdinandl007 ferdinandl007 commented Apr 25, 2022

I want to train Spacetimeformer on a custom dataset; however, I'm struggling a little to understand the training code, as there is not much documentation yet.
I have the following columns, which I want to predict at each next time step for a university project.

[
'ETH_open',
'ETH_high',
'ETHT_low',
'ETH_close',
'Volume BTC',
'Volume USDT',
'ETH_tradecount',
'BTC_open',
'BTC_high',
'BTC_low',
'BTC_close',
'BTC_tradecount',
'LTC_open',
'LTC_high',
'LTC_low',
'LTC_close',
'Volume LTC',
'LTC_tradecount'
]

Any help would be greatly appreciated!

A sample notebook that creates the dataset can be found here:
https://colab.research.google.com/drive/19PKi0gQvVbtI7eZOELNby1mveiSXMhNX?usp=sharing

@jakegrigsby jakegrigsby self-assigned this Apr 25, 2022
@ferdinandl007
Author

@jakegrigsby thanks for picking up the issue. I'm excited to see it work; let me know if I can be of assistance in any way!

@jakegrigsby
Member

Hey @ferdinandl007, in general our training script was meant more as a way to replicate the paper results than to help with new datasets. It's all pretty hardcoded, as you can see. I figured people would typically have their own training/eval loops and use the spacetimeformer_model.nn.Spacetimeformer PyTorch module directly.

That being said, you should probably be able to make a CSV and hack the training script in much the same way the ASOS weather dataset is currently handled. The only catch I can think of is that you may need to name your CSV's time column exactly "Datetime".
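As a rough illustration of the "Datetime" naming point, here is a minimal sketch that renames a CSV's time column using only the Python standard library. The helper name `rename_time_column`, the original column name "timestamp", and the two sample rows are all hypothetical; this is not code from the repository.

```python
import csv
import io


def rename_time_column(src, dst, old_name, new_name="Datetime"):
    """Copy CSV rows from src to dst, renaming a single header column."""
    reader = csv.reader(src)
    header = next(reader)
    if old_name not in header:
        raise ValueError(f"column {old_name!r} not found in {header}")
    header = [new_name if col == old_name else col for col in header]
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(reader)  # data rows are copied unchanged


# Hypothetical sample; "timestamp" is an assumed original column name.
raw = io.StringIO(
    "timestamp,ETH_close,BTC_close\n"
    "2022-04-25 00:00:00,2950.1,39800.5\n"
    "2022-04-25 01:00:00,2961.7,39910.0\n"
)
out = io.StringIO()
rename_time_column(raw, out, old_name="timestamp")
converted = out.getvalue()
```

For a real file, the two `StringIO` objects would be replaced by files opened with `open(path, newline="")`.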

Can you provide the raw csv here? I'll try to run it if I have time in the next few days.

@ferdinandl007
Author

@jakegrigsby thank you for your reply! I got it working now; the issue was missing values in the dataset.
I have one question, though: how can I limit the prediction plots to one particular column while still predicting all the other columns to give more training signal?
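The thread does not say how the missing values were handled. One common option, sketched below under that assumption, is to forward-fill each column so that every gap takes the most recent valid value. The helper names and the sample column are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def forward_fill(values):
    """Replace missing entries with the most recent valid value.

    Leading gaps have no earlier value to copy, so they stay None.
    """
    filled = []
    last = None
    for value in values:
        if is_missing(value):
            filled.append(last)
        else:
            filled.append(value)
            last = value
    return filled


def count_missing(values):
    return sum(1 for value in values if is_missing(value))


# Hypothetical column with gaps at the start and in the middle.
column = [None, 10.0, float("nan"), None, 12.5]
filled = forward_fill(column)
```

Counting the missing entries per column before and after filling is a cheap way to confirm that nothing slipped through; rows with leading gaps would still need to be dropped.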

@ferdinandl007
Author

ferdinandl007 commented Apr 27, 2022

I now have it running pretty well; however, the results I'm getting are still not particularly great. Would you have an idea what the issue could be?
python train.py spacetimeformer crypto --run_name spatiotemporal_temporal_crypto_loss --start_token_len 3 --context_points 160 --target_points 40 --start_token_len 8 --grad_clip_norm 1 --gpus 0 1 --batch_size 64 --d_model 200 --d_ff 800 --enc_layers 3 --dec_layers 3 --local_self_attn none --local_cross_attn none --base_lr 1e-3 --l2_coeff 1e-2 --dropout_emb .1 --time_resolution 1 --dropout_ff .2 --n_heads 8 --trials 1 --embed_method temporal --early_stopping --wandb --attn_plot --plot

I have also attached a sample dataset.

Kind regards,
crypto_converted-5.csv
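To make the `--context_points 160 --target_points 40` flags in the command above more concrete, here is a minimal sketch of sliding-window sampling. It assumes, based only on the flag names, that each training example pairs a context window of past steps with the target window that immediately follows it. The function `make_windows` is hypothetical and is not the repository's data loader.

```python
def make_windows(series, context_points, target_points):
    """Split a sequence into (context, target) pairs with a stride of 1."""
    total = context_points + target_points
    windows = []
    for start in range(len(series) - total + 1):
        context = series[start:start + context_points]
        target = series[start + context_points:start + total]
        windows.append((context, target))
    return windows


# Tiny stand-in series; the command above would use 160 and 40 instead.
series = list(range(10))
windows = make_windows(series, context_points=4, target_points=2)
```

Under this assumption, each example in the run above would span 200 consecutive hourly rows, so neighbouring windows overlap heavily.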

@jakegrigsby
Member

@ferdinandl007 I added a demo of how to use this dataset and plot specific variables in #41.
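The general idea of plotting one variable while still predicting all of them can be sketched as follows: the model outputs every column, and only the column of interest is sliced out before plotting. This is an illustrative sketch, not the code added in #41; `select_variable` and the forecast values are hypothetical.

```python
def select_variable(rows, columns, name):
    """Return one variable's values from a (time step x variable) table."""
    idx = columns.index(name)  # raises ValueError if the name is unknown
    return [row[idx] for row in rows]


columns = ["ETH_close", "BTC_close", "LTC_close"]

# Hypothetical forecast: one row per future time step, one value per column.
forecast = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
]

# The loss would still be computed on all columns; only this slice is plotted.
eth_only = select_variable(forecast, columns, "ETH_close")
```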

I was only able to mess around with your dataset for a couple of runs, but I was seeing a lot of overfitting. Here is an easy demo command to test:

python train.py lstm crypto --teacher_forcing_anneal_steps 400 --context_points 200 --target_points 40 --run_name lstm_crypto --gpus 0 --batch_size 64 --wandb --plot

@ferdinandl007
Author

ferdinandl007 commented Apr 30, 2022

I have now constructed a larger dataset with key indicators such as sentiment, covering two years at hourly intervals.

New features

'ETH_open', 'ETH_high', 'ETHT_low', 'ETH_close',
'Volume ETH', 'Volume USDT', 'ETH_tradecount', 'BTC_open', 'BTC_high',
'BTC_low', 'BTC_close', 'Volume BTC', 'BTC_tradecount', 'LINK_open',
'LINK_high', 'LINK_low', 'LINK_close', 'Volume LINK', 'LINK_tradecount',
'EOS_open', 'EOS_high', 'EOS_low', 'EOS_close', 'Volume EOS',
'EOS_tradecount', 'XMR_open', 'XMR_high', 'XMR_low', 'XMR_close',
'Volume XMR', 'XMR_tradecount', 'NEO_open', 'NEO_high', 'NEO_low',
'NEO_close', 'Volume NEO', 'NEO_tradecount', 'LTCUSDT_open',
'LTCUSDT_high', 'LTCUSDT_low', 'LTCUSDT_close', 'Volume LTC',
'LTCUSDT_tradecount', 'sntiments'

This might help with the overfitting. It also includes more cryptocurrencies, which are known to move in relation to each other.

crypto_converted.csv.zip
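One way to check the claim that these coins move together is to compute the Pearson correlation of their hourly returns. The sketch below uses only the standard library; the price lists are made-up values chosen so that the two series are perfectly correlated, not data from the attached file.

```python
import math


def returns(prices):
    """Simple period-over-period returns: (p[t+1] - p[t]) / p[t]."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Made-up prices: the second series is exactly 10x the first,
# so the returns are identical and the correlation is 1.
eth = [100.0, 110.0, 99.0, 108.9]
btc = [1000.0, 1100.0, 990.0, 1089.0]
corr = pearson(returns(eth), returns(btc))
```

Correlating returns rather than raw prices avoids the inflated values that two trending price series tend to produce.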
