torch IterableDataset are not fully supported #594
Didn't know about this one. I always wondered whether Datasets really needed `__len__`. Splitting the way skorch (or rather, sklearn) does it can't be easily supported with `IterableDataset`. We could think about a wrapper class that allows splitting an `IterableDataset`.
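For illustration, such a wrapper could simply materialize the stream so that index-based splitting works again. This is only a sketch of the trade-off involved; `MaterializedDataset` is a made-up name, and buffering everything in memory is just one possible strategy:

```python
import torch


class MaterializedDataset(torch.utils.data.Dataset):
    """Hypothetical wrapper: buffer an IterableDataset in memory so that
    __len__ and __getitem__ exist and Subset-based splitting works again."""

    def __init__(self, iterable_dataset):
        # Materializing gives up streaming, so this only makes sense
        # when the data fits into memory.
        self._items = list(iterable_dataset)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, idx):
        return self._items[idx]
```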
Did you mean
If I was a user of
I agree. Currently, is there an issue with passing an `IterableDataset` directly?

```python
import numpy as np
import torch
from sklearn.datasets import make_classification
from skorch import NeuralNetClassifier


class MyDataset(torch.utils.data.IterableDataset):
    def __init__(self, X, y):
        super().__init__()
        self.X = X
        self.y = y

    def _generator(self):
        # Yield one (X, y) pair at a time until the data is exhausted.
        for i in range(len(self.X)):
            yield self.X[i], self.y[i]

    def __iter__(self):
        return self._generator()


X, y = make_classification(1000, 20, n_informative=10, random_state=0)
X = X.astype(np.float32)
dataset = MyDataset(X, y)
# ClassifierModule: a classifier module as in the skorch examples, defined elsewhere.
net = NeuralNetClassifier(ClassifierModule, train_split=None)
net.fit(dataset, y=None)
```

Moving forward, we can raise an error when `train_split` is not None and an `IterableDataset` is passed.
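A minimal sketch of what such a guard could look like; the standalone `check_iterable_dataset` helper is an assumption, and the real check would live inside skorch's fit/validation code path:

```python
import torch


def check_iterable_dataset(dataset, train_split):
    # Hypothetical guard (names are assumptions, not skorch API):
    # an IterableDataset has no __getitem__, so any train_split that
    # relies on Subset-style indexing must be rejected up front.
    if isinstance(dataset, torch.utils.data.IterableDataset) and train_split is not None:
        raise ValueError(
            "train_split must be None when passing an IterableDataset, "
            "since it cannot be split by indexing."
        )
```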
I meant
I agree to both.
I don't think it's too hacky. Maybe this could be added to
I agree to this too.
With PyTorch 1.2.0 came `IterableDataset`, which only implements `__iter__` but no `__len__` and certainly no `__getitem__`. This is definitely a problem since we use `Subset` to split the input dataset: `Subset` wraps the original dataset, introduces `__getitem__`, and delegates the call to the wrapped dataset, which doesn't implement that method since it is only iterable. The simplest solution to this is to not split `IterableDataset` in any way. What do you think?
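To make the failure mode concrete, here is a minimal reproduction of what happens when `Subset` wraps an iterable-only dataset. This is a sketch, not skorch's actual code path, and the `Stream` class is made up:

```python
from torch.utils.data import IterableDataset, Subset


class Stream(IterableDataset):
    # Iterable-only dataset: __iter__ exists, __len__/__getitem__ do not.
    def __iter__(self):
        return iter(range(10))


subset = Subset(Stream(), indices=[0, 1, 2])
# Subset.__getitem__ delegates to Stream.__getitem__, which is not
# implemented, so indexing fails (NotImplementedError or TypeError,
# depending on the PyTorch version):
subset[0]
```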