Bug following changes to the tensor creation API in torch v2.2 #100

Open
jatkinson1000 opened this issue Mar 27, 2024 · 5 comments
Labels
bug Something isn't working


@jatkinson1000
Member

jatkinson1000 commented Mar 27, 2024

When building against libtorch v2.2.1 I am able to build FTorch successfully.
I am also able to subsequently build the examples.
However, I get a runtime error:

[ERROR]: Unknown layout

It seems that this is possibly an issue with the move from torch v2.1 to v2.2 - see here.
This is confirmed by building against libtorch v2.1, where the error does not occur. It can be obtained with, e.g. (this is v2.1.2, CUDA 11.8 compatible):

wget https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.1.2%2Bcu118.zip
unzip libtorch-cxx11-abi-shared-with-deps-2.1.2%2Bcu118.zip

It appears that the error is being thrown from here (torch source).

The Torch docs note that the torch.layout argument is "beta and subject to change" (see here), but I can't see that anything has changed recently.
I have tried making the layout explicit in TensorOptions by changing this line to

torch::TensorOptions().dtype(get_dtype(dtype)).layout(torch::kStrided)).to(get_device(device));

as indicated in the Tensor creation API docs.
This builds OK as before, but still produces the same error at runtime.

So, as things stand, I am unsure whether this is something we need to address on our side, or something that is fixed upstream and coming in v2.2.2, as the closure of this PyTorch issue (maybe??) suggests.

If I understand correctly, this can be resolved by requiring gcc >= 9.

The next step would be building against the nightly release from https://download.pytorch.org/libtorch/nightly/cu118/libtorch-cxx11-abi-shared-with-deps-latest.zip to see if there is indeed an upstream fix. However, I can't do this on Derecho as it has a fairly limited software stack. I may try on CSD3, which has a broader one.
This needs either cuda/11.8 with gcc/9-11, or cuda/12.1.

@jatkinson1000
Member Author

@jwallwork23, @ElliottKasoar, @TomMelt I would appreciate your thoughts on this, as people who are a bit more in touch with C++ than I am.

The installed version of libtorch on CESM that we are currently trying to build against is v2.2.1.

@ElliottKasoar
Contributor

ElliottKasoar commented Mar 28, 2024

From a quick test with libtorch 2.2.1 and gcc 7, 9, and 11, it does indeed seem to be resolved by updating to gcc >= 9. This matches the requirement for building from source in the latest release notes and, more generally, the requirement for full C++17 compatibility that we noted.

It's interesting that there's a slight mismatch between the check and the README, which suggests it should actually be gcc>=9.4.0, and that it doesn't raise the same error I saw.
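
To illustrate the mismatch, here is a minimal C++ sketch of how a major-only check (gcc >= 9) and a full-version check (gcc >= 9.4.0) can disagree. The `Version` struct and `at_least` helper are hypothetical names for illustration only, not taken from PyTorch or FTorch, and the example versions are arbitrary.

```cpp
// Hypothetical sketch: compare a compiler version against a minimum.
// None of these names come from PyTorch or FTorch.
struct Version {
    int maj;  // major version, e.g. 9
    int mnr;  // minor version, e.g. 4
    int pat;  // patch level, e.g. 0
};

// True when v is the same as or newer than minimum, comparing
// major first, then minor, then patch.
constexpr bool at_least(Version v, Version minimum) {
    return v.maj != minimum.maj ? v.maj > minimum.maj
         : v.mnr != minimum.mnr ? v.mnr > minimum.mnr
         : v.pat >= minimum.pat;
}

// gcc 9.3.0 passes a "major >= 9" style check but fails a ">= 9.4.0" check.
static_assert(at_least(Version{9, 3, 0}, Version{9, 0, 0}), "9.3.0 >= 9.0.0");
static_assert(!at_least(Version{9, 3, 0}, Version{9, 4, 0}), "9.3.0 < 9.4.0");
// gcc 7.x fails both, gcc 11.x passes both.
static_assert(!at_least(Version{7, 5, 0}, Version{9, 0, 0}), "7.5.0 < 9.0.0");
static_assert(at_least(Version{11, 2, 0}, Version{9, 4, 0}), "11.2.0 >= 9.4.0");
```

On GCC, the version of the compiler doing the build could be passed in as `Version{__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__}`. Only versions between 9.0.0 and 9.4.0 are treated differently by the two checks, so the tests with gcc 7, 9, and 11 above would not necessarily expose the difference.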

I don't think there's an obvious solution/issue within FTorch itself.

@jatkinson1000
Copy link
Member Author

Oh nice, v2.2.2 was released 2 hrs ago!!

The release notes suggest >= 9.0: https://github.com/pytorch/pytorch/releases/tag/v2.2.2

Suggest we resolve this by enforcing gcc >= 9.0 in our CMake.
And this implies that FTorch has to be built with g++, not icpc?? I know we've been back and forth on this one a bit... #54

Now I need to try and resolve the issue of cuda wanting <= 11 on Derecho -_-

@TomMelt
Contributor

TomMelt commented Mar 28, 2024

I think that's just the release notes stating that, for this release, the gcc requirement has changed to gcc >= 9.0, not that you have to use gcc.
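
A minimal sketch of that reading in C++: a minimum-version rule that only applies when the compiler really is GCC, and passes vacuously otherwise. The `SKETCH_*` macros and `gcc_rule_satisfied` are hypothetical names, not part of FTorch or libtorch. Whether compilers other than GCC actually work with libtorch 2.2 is not established here.

```cpp
// Hypothetical sketch; none of these names come from FTorch or libtorch.

// Work out whether this is genuine GCC. Order matters: clang and the Intel
// compilers also define __GNUC__ for compatibility, so rule them out first.
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
#define SKETCH_IS_REAL_GCC 0
#define SKETCH_GCC_MAJOR 0
#elif defined(__clang__) || defined(_MSC_VER)
#define SKETCH_IS_REAL_GCC 0
#define SKETCH_GCC_MAJOR 0
#elif defined(__GNUC__)
#define SKETCH_IS_REAL_GCC 1
#define SKETCH_GCC_MAJOR __GNUC__
#else
#define SKETCH_IS_REAL_GCC 0
#define SKETCH_GCC_MAJOR 0
#endif

// The rule "if the compiler is GCC, its major version must be >= min_major".
// Any non-GCC compiler satisfies the rule vacuously.
constexpr bool gcc_rule_satisfied(bool is_real_gcc, int gcc_major, int min_major) {
    return !is_real_gcc || gcc_major >= min_major;
}

// Worked cases: an old GCC fails, a new GCC passes, a non-GCC compiler passes.
static_assert(!gcc_rule_satisfied(true, 7, 9), "gcc 7 is too old");
static_assert(gcc_rule_satisfied(true, 11, 9), "gcc 11 is new enough");
static_assert(gcc_rule_satisfied(false, 0, 9), "rule does not apply to non-GCC");
```

To apply the rule to the compiler doing the build, one could add `static_assert(gcc_rule_satisfied(SKETCH_IS_REAL_GCC != 0, SKETCH_GCC_MAJOR, 9), "GCC >= 9 required");`, which would only ever fail under an old g++.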

@TomMelt
Contributor

TomMelt commented Mar 28, 2024

A couple lines below there is this comment:

Fix building from source on Windows source MSVC 14.38 - VS 2022 (#122120)

Which, to me, implies they also support the Windows C++ compiler... though I wouldn't quote me on it 😳

jwallwork23 added the bug label Jul 15, 2024