Bug following changes to the tensor creation API in torch v2.2 #100
Comments
@jwallwork23, @ElliottKasoar, @TomMelt I would appreciate your thoughts on this, as people who are a bit more in touch with C++ than I am. The installed version of libtorch on CESM that we are currently trying to build against is v2.2.1.
From a quick test with libtorch 2.2.1 and gcc 7, 9, and 11, it does indeed seem to be resolved by updating to gcc >= 9, which matches the requirement for building from source in the latest release notes and, more generally, the requirement for full C++17 compatibility that we noted. It's interesting that there's a slight mismatch between the check and the README, which suggests it should actually be gcc >= 9.4.0, and that it doesn't raise the same error I saw. I don't think there's an obvious solution or issue within FTorch itself.
Oh nice, v2.2.2 was released 2 hrs ago!! The release notes suggest gcc >= 9.0: https://github.com/pytorch/pytorch/releases/tag/v2.2.2 I suggest we resolve this by enforcing gcc >= 9.0 in our CMake. Now I need to try and resolve the issue of CUDA wanting gcc <= 11 on Derecho -_-
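For illustration, the same kind of minimum-version guard can also be sketched at the source level in C++. This is not the CMake change suggested above, only a hedged equivalent: `version_at_least` is a hypothetical helper name, and the 9.0 threshold is taken from the release notes discussed in this thread.

```cpp
// Hypothetical helper (not part of FTorch or libtorch): true if the version
// (major, minor) is at least (min_major, min_minor).
constexpr bool version_at_least(int major, int minor,
                                int min_major, int min_minor) {
    return major > min_major || (major == min_major && minor >= min_minor);
}

// Only apply the guard to real gcc; clang also defines __GNUC__.
#if defined(__GNUC__) && !defined(__clang__)
static_assert(version_at_least(__GNUC__, __GNUC_MINOR__, 9, 0),
              "gcc >= 9.0 is assumed to be required for libtorch >= 2.2");
#endif
```

With a guard like this, an older gcc would fail at compile time with a readable message rather than at runtime. Doing the check in CMake, as suggested above, has the advantage of failing even earlier, at configure time.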
I think that's just the release notes stating that for this release the
A couple lines below there is this comment:
Which, to me, implies they also support the Windows C++ compiler... though don't quote me on it 😳
When building against libtorch v2.2.1 I am able to build FTorch successfully.
I am also able to subsequently build the examples.
However, I get a runtime error:
It seems that this is possibly an issue with the move from torch v2.1 to v2.2 - see here.
This is confirmed if I build against libtorch v2.1 obtained from e.g. (this is v2.1, CUDA 11.8 compatible).
It appears that the error is being thrown from here (torch source).
The Torch docs note that the `torch.layout` argument is "beta and subject to change" (see here), but I can't see that anything recent has happened.
I have tried amending our source to be explicit with the `layout` in `TensorOptions` by amending this line to be as indicated from the Tensor creation API docs.
This builds OK as before, but still produces the same error at runtime.
So currently I am unsure whether this is something we need to address and change, or something that is perhaps fixed upstream and coming in v2.2.2, as the closure of this PyTorch issue (maybe??) suggests.
If I understand correctly, this can be resolved by requiring `gcc >= 9`.
Next steps would be building against the nightly release from https://download.pytorch.org/libtorch/nightly/cu118/libtorch-cxx11-abi-shared-with-deps-latest.zip to see if there is indeed an upstream fix. However, I can't do this on Derecho as it has a fairly limited software stack. I may try on CSD3, which is broader.
Need either cuda/11.8 + gcc/9-11 or cuda/12.1
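The module requirement above can be written down as a small check. This is only a sketch: `Toolchain` and `toolchain_ok` are hypothetical names, and it assumes that gcc >= 9 is still needed alongside cuda/12.1, since that lower bound comes from libtorch rather than from CUDA.

```cpp
// Hypothetical description of a loaded toolchain (illustrative names only).
struct Toolchain {
    int cuda_major;
    int cuda_minor;
    int gcc_major;
};

// Returns true if the combination matches the note above:
//   cuda/11.8 with gcc 9 to 11, or cuda/12.1.
// Assumption: gcc >= 9 is also required with cuda/12.1 (libtorch requirement).
bool toolchain_ok(const Toolchain& t) {
    const bool gcc_new_enough = t.gcc_major >= 9;
    if (t.cuda_major == 11 && t.cuda_minor == 8) {
        return gcc_new_enough && t.gcc_major <= 11;
    }
    if (t.cuda_major == 12 && t.cuda_minor == 1) {
        return gcc_new_enough;
    }
    return false;  // other CUDA versions are not covered by the note
}
```

For example, `toolchain_ok(Toolchain{11, 8, 12})` is rejected because CUDA 11.8 wants gcc <= 11, while `toolchain_ok(Toolchain{12, 1, 12})` is accepted under the stated assumption.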