[MNT] windows compatibility #1623
The libuv issue seems to be introduced by torch 2.4.0.
Source: https://pytorch.org/tutorials/intermediate/TCPStore_libuv_backend.html Let me try to figure out how to configure it correctly with libuv... |
So according to the tutorial, it should be possible to switch libuv off by:
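A minimal sketch of that switch-off, assuming the environment-variable route the linked TCPStore tutorial describes (the variable must be set before the process group and its store are initialized):

```python
import os

# Assumption based on the linked TCPStore tutorial: setting USE_LIBUV to "0"
# before torch.distributed initializes its store makes it fall back to the
# legacy, non-libuv TCPStore backend.
os.environ["USE_LIBUV"] = "0"
```

Whether pytorch-lightning's DDP strategy picks this up depends on when it creates the store, which is part of what would need checking.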
Another option would be not to test with the DDP strategy, or to downgrade PyTorch. Unfortunately, I have no windows system right now, so I cannot produce a minimal example to perhaps create an issue at pytorch-lightning so that they might expose the relevant parameters. |
I do have a windows system. Can you specify what we'd need - just an MRE for the failure, or something more specific? |
You might check if this is failing with PyTorch 2.4.0:

```python
import numpy as np
import pytorch_lightning as pl
import torch
import torch.nn as nn
from torch.nn import MSELoss
from torch.optim import Adam
from torch.utils.data import DataLoader, Dataset


class SimpleDataset(Dataset):
    def __init__(self):
        X = np.arange(10000)
        y = X * 2
        X = [[_] for _ in X]
        y = [[_] for _ in y]
        self.X = torch.Tensor(X)
        self.y = torch.Tensor(y)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return {"X": self.X[idx], "y": self.y[idx]}


class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(1, 1)
        self.criterion = MSELoss()

    def forward(self, inputs_id, labels=None):
        outputs = self.fc(inputs_id)
        loss = 0
        if labels is not None:
            loss = self.criterion(outputs, labels)
        return loss, outputs

    def train_dataloader(self):
        dataset = SimpleDataset()
        return DataLoader(dataset, batch_size=1000)

    def training_step(self, batch, batch_idx):
        input_ids = batch["X"]
        labels = batch["y"]
        loss, outputs = self(input_ids, labels)
        return {"loss": loss}

    def configure_optimizers(self):
        optimizer = Adam(self.parameters())
        return optimizer


if __name__ == '__main__':
    model = MyModel()
    trainer = pl.Trainer(
        max_epochs=1,
        accelerator="cpu",
        strategy="ddp")
    trainer.fit(model)

    X = torch.Tensor([[1.0], [51.0], [89.0]])
    _, y = model(X)
    print(y)
```
Hopefully this reproduces the issue with the strategy. |
I can reproduce the error on windows 11. |
Ok, I would propose to open an issue at PyTorch Lightning, and to either remove the DDP strategy from testing, at least for windows, or set the environment variable to use the old store instead of the libuv one. |
I've added a skip here #1631, but haven't closed the issue, as the skip of course does not causally solve this... |
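A hypothetical sketch of such a platform-conditional skip (the marker name, wording, and exact condition used in #1631 may differ):

```python
import sys

import pytest

# Hypothetical marker: skip DDP tests on windows until the libuv/TCPStore
# incompatibility introduced around torch 2.4 is resolved upstream.
skip_windows_ddp = pytest.mark.skipif(
    sys.platform == "win32",
    reason="DDP strategy fails on windows with torch>=2.4 (libuv TCPStore)",
)


@skip_windows_ddp
def test_ddp_training():
    ...
```

This silences the CI failure without addressing the underlying incompatibility, which is why the issue stays open.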
Tests currently fail on windows (windows-latest) due to libuv issues, see [MNT] CI matrix extended to windows-latest #1622. We should check (a) whether this is CI specific or a deeper compatibility issue, and (b) fix it.