According to the original paper, it contains "hourly time series of the electricity consumption of 370 customers", and we can see that there are 370 time series in the train part, with different start dates and a common end date. We could also guess that feat_static_cat holds the customer's id and is equal to item_id:
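from datetime import datetime, timedelta
from gluonts.dataset.repository.datasets import get_dataset

dataset = get_dataset("electricity_nips", regenerate=False)

print('Series count in train', len(dataset.train))
unique_end_dates = []
for elem in list(dataset.train)[:10]:
    # start is an hourly pd.Period; start + len(target) hours is the hour just after the last observation
    end_date = datetime.strptime(str(elem['start']), "%Y-%m-%d %H:%M") + timedelta(hours=len(elem['target']))
    unique_end_dates.append(end_date)
    print(elem['item_id'], len(elem['target']), elem['start'], end_date, elem['feat_static_cat'], sep="\t")
print(set(unique_end_dates))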
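Both observations above can be checked over the entire split rather than only the first ten entries: the common end date across train series, and feat_static_cat being equal to item_id. The sketch below uses only the standard library and runs on plain dicts shaped like GluonTS entries. The helper names and toy data are hypothetical. Real entries carry a `pd.Period` start, which would first need converting, for example with `.to_timestamp()`. The sketch also computes `start + (len(target) - 1)` steps, which is the timestamp of the last observation. The script above uses `start + len(target)`, which is the hour just after it.

```python
from datetime import datetime, timedelta


def end_timestamp(entry, step=timedelta(hours=1)):
    """Timestamp of the last observation: start + (len(target) - 1) steps."""
    return entry["start"] + step * (len(entry["target"]) - 1)


def check_train_split(entries):
    """Return the set of end timestamps and whether feat_static_cat[0] == item_id everywhere."""
    end_dates = {end_timestamp(e) for e in entries}
    ids_match = all(int(e["feat_static_cat"][0]) == int(e["item_id"]) for e in entries)
    return end_dates, ids_match


# Hypothetical toy split: two customers, different start dates, same end date.
toy_train = [
    {"item_id": 0, "feat_static_cat": [0], "start": datetime(2014, 3, 19, 9), "target": [1.0] * 48},
    {"item_id": 1, "feat_static_cat": [1], "start": datetime(2014, 3, 20, 9), "target": [2.0] * 24},
]
ends, ids_match = check_train_split(toy_train)
print(len(ends) == 1, ids_match)  # True True
```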
Now look at the test split. The number of series in the test part of the dataset is much bigger: 2590.
If my guess that feat_static_cat is the customer id is correct, then there are several time series per customer id, with overlapping periods:
print('Series count in test', len(dataset.test))
for elem in list(dataset.test):
    if elem['feat_static_cat'][0] == 0:
        end_date = datetime.strptime(str(elem['start']), "%Y-%m-%d %H:%M") + timedelta(hours=len(elem['target']))
        unique_end_dates.append(end_date)
        print(elem['item_id'], len(elem['target']), elem['start'], end_date, elem['feat_static_cat'], sep="\t")
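You can count the windows per customer directly by grouping test entries on `feat_static_cat[0]`. This is a standard-library sketch on hypothetical toy entries. On the real split the arithmetic is 2590 / 370 = 7 entries per customer.

```python
from collections import Counter


def windows_per_customer(entries):
    """Count the test entries that share each feat_static_cat[0] value."""
    return Counter(int(e["feat_static_cat"][0]) for e in entries)


# Hypothetical toy test split: 3 customers with 7 windows each.
toy_test = [{"feat_static_cat": [cid], "target": []} for cid in range(3) for _ in range(7)]
counts = windows_per_customer(toy_test)
print(sorted(counts.items()))  # [(0, 7), (1, 7), (2, 7)]
print(len(toy_test) // len(counts))  # 7
```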
Each interval is shifted by 24 hours relative to the previous one, and the overlapping values are identical across all intervals. This can be checked with a quick script:
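i = 0
for elem in list(dataset.test):
    if elem['feat_static_cat'][0] == 0:
        # assuming window i starts 24*i hours after window 0, offset (7 - i) * 24 points at the same timestamp in every window
        print(elem['item_id'], len(elem['target']), "\t", *list(elem['target'][(7 - i) * 24:200]))
        i += 1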
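The same overlap check can be made explicit instead of comparing printed rows by eye. This is a minimal sketch with these assumptions: each window is a dict with a `datetime` `start` and a list `target` at a fixed step, and one customer's windows are given in order. It compares every timestamp shared by two consecutive windows. The helper name and toy data are hypothetical.

```python
from datetime import datetime, timedelta


def check_shifted_windows(windows, shift=timedelta(hours=24), step=timedelta(hours=1)):
    """Return True if each window starts `shift` after the previous one and all
    values at timestamps shared by two consecutive windows are identical."""
    for prev, cur in zip(windows, windows[1:]):
        if cur["start"] - prev["start"] != shift:
            return False
        offset = int(shift / step)  # position of cur's first point inside prev
        n = min(len(prev["target"]) - offset, len(cur["target"]))
        if n > 0 and prev["target"][offset:offset + n] != cur["target"][:n]:
            return False
    return True


# Hypothetical toy data: one hourly series cut into 3 windows of 72 points,
# each starting 24 hours after the previous one.
base = list(range(200))
t0 = datetime(2014, 9, 1)
toy_windows = [
    {"start": t0 + timedelta(hours=24 * i), "target": base[24 * i: 24 * i + 72]}
    for i in range(3)
]
print(check_shifted_windows(toy_windows))  # True
```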
Since I'm going to build my own custom dataset for iTransformers in the future, I have several questions. Please give me a link to read, or keywords to search for, about this unusual dataset structure:
1. There are seven overlapping intervals in the test dataset for each customer. How do I choose this number for my custom dataset? What does it mean from a mathematical point of view? How can I set this parameter when creating a GluonTS dataset through ListDataset?
2. In the GluonTS documentation, the test dataset is described as the train dataset plus the extra points we are going to test the model on. Where can I read more detailed information about the requirement to have these overlapping intervals?
3. In the GluonTS documentation, the test dataset has the same start datetime as the train one. Is this a strict requirement? May I assign different start dates to the train and test datasets? If so, are there any restrictions on these dates? Do the two datasets need to share any time intervals at all?
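import numpy as np
import pandas as pd
from gluonts.dataset.common import ListDataset

# hypothetical placeholders so the snippet runs: hourly data, 24-step horizon, 10 series of 720 points
freq = "H"
prediction_length = 24
custom_dataset = np.random.rand(10, 720)

# train dataset: cut the last window of length "prediction_length", add "target" and "start" fields
start_1111111 = pd.Period("2019-01-01", freq=freq)
start_2222222 = pd.Period("2019-12-21", freq=freq)
train_ds = ListDataset(
    [{"target": x, "start": start_1111111} for x in custom_dataset[:, :-prediction_length]],
    freq=freq,
)
# test dataset: use the whole dataset, add "target" and "start" fields
test_ds = ListDataset(
    [{"target": x, "start": start_2222222} for x in custom_dataset], freq=freq
)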
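In the ListDataset snippet with `start_1111111` and `start_2222222`, each row `x` of `custom_dataset` gets two different time axes. Position `k` is timestamped `start_1111111 + k` periods in `train_ds` and `start_2222222 + k` periods in `test_ds`. The sketch below works through that arithmetic with the standard library. It assumes an hourly frequency, as for the electricity data above, and uses hypothetical lengths. The helper `timestamp_at` is illustrative only. The values themselves are untouched. What shifts, by the gap between the two start dates, is the timestamps of the forecast window and any time features a model derives from `start` and `freq`.

```python
from datetime import datetime, timedelta

FREQ = timedelta(hours=1)  # assumed hourly frequency


def timestamp_at(start, index, freq=FREQ):
    """Timestamp of position `index` in a series that begins at `start`."""
    return start + index * freq


series_length = 720      # hypothetical length of each row of custom_dataset
prediction_length = 24   # hypothetical horizon
train_start = datetime(2019, 1, 1)   # plays the role of start_1111111
test_start = datetime(2019, 12, 21)  # plays the role of start_2222222

# Timestamp of the last training point vs. the first point the test entry asks to forecast.
last_train = timestamp_at(train_start, series_length - prediction_length - 1)
first_forecast = timestamp_at(test_start, series_length - prediction_length)

# If the test entry simply continued the train entry, first_forecast would equal last_train + one step.
gap = first_forecast - (last_train + FREQ)
print(gap)  # 354 days, 0:00:00
```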
I'm using the electricity_nips dataset to learn the gluonts library, but my questions apply to many other built-in datasets as well.
dataset = get_dataset("electricity_nips", regenerate=False)
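For the first question (how many overlapping windows, and how to express that through ListDataset), one possible approach is to build the overlapping test windows yourself. You then hand the resulting list of dicts to `ListDataset`, which takes entries rather than a window count, so the number of windows is simply how many entries you generate per series. The sketch below assumes the shifted-window layout described earlier in this post: consecutive windows start `prediction_length` steps apart, and each carries a fixed `context_length` of history. `make_rolling_test_entries` and all its parameters are hypothetical and not part of GluonTS. The sketch uses only the standard library, so it runs without GluonTS installed.

```python
from datetime import datetime, timedelta


def make_rolling_test_entries(values, start, train_length, prediction_length,
                              windows, context_length, freq=timedelta(hours=1)):
    """Hypothetical helper: build `windows` overlapping test entries for one series.

    Window w forecasts values[cut : cut + prediction_length] with
    cut = train_length + w * prediction_length, and keeps up to
    `context_length` points of history before the cut. When enough history
    exists, consecutive windows start `prediction_length` steps apart.
    """
    entries = []
    for w in range(windows):
        cut = train_length + w * prediction_length
        lo = max(0, cut - context_length)
        hi = cut + prediction_length
        if hi > len(values):
            raise ValueError(f"window {w} needs {hi} points, series has {len(values)}")
        entries.append({"start": start + lo * freq, "target": values[lo:hi]})
    return entries


# Hypothetical usage: 14 days of hourly toy data, 7 days for training,
# then 7 windows of 24 hours each.
series = [float(i) for i in range(24 * 14)]
test_entries = make_rolling_test_entries(
    series, datetime(2014, 1, 1), train_length=24 * 7, prediction_length=24,
    windows=7, context_length=24 * 7,
)
print(len(test_entries))  # 7
print(test_entries[1]["start"] - test_entries[0]["start"])  # 1 day, 0:00:00
```

With a 24-hour step and seven windows, this mirrors the 7 x 24-hour layout observed for electricity_nips. The matching train entry would be `values[:train_length]` with the same start. Each list should then be usable as input to `ListDataset(entries, freq=...)`, though that last step is not exercised here.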