Bug following changes to the tensor creation API in torch v2.2 #100

jatkinson1000 · 2024-03-27T00:00:05Z

When building against libtorch v2.2.1 I am able to build FTorch successfully.
I am also able to subsequently build the examples.
However, I get a runtime error:

[ERROR]: Unknown layout

This appears to be an issue with the move from torch v2.1 to v2.2 - see here.
This is confirmed if I build against libtorch v2.1 instead, which can be obtained with, e.g. (this is v2.1, CUDA 11.8 compatible):

wget https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.1.2%2Bcu118.zip
unzip libtorch-cxx11-abi-shared-with-deps-2.1.2%2Bcu118.zip

It appears that the error is being thrown from here (torch source).

The Torch docs note that the torch.layout argument is "beta and subject to change" (see here), but I can't see that anything has changed recently.
I have tried making the layout explicit in TensorOptions by amending this line to be

torch::TensorOptions().dtype(get_dtype(dtype)).layout(torch::kStrided)).to(get_device(device));

as indicated in the Tensor creation API docs.
This builds OK as before, but still produces the same error at runtime.

So the current state is that I am unsure whether this is something we need to address and change, or something that has been fixed upstream and is coming in v2.2.2, as the closure of this PyTorch issue (maybe??) suggests.

If I understand correctly this can be resolved by requiring gcc >= 9.

Next steps would be building against the nightly release from https://download.pytorch.org/libtorch/nightly/cu118/libtorch-cxx11-abi-shared-with-deps-latest.zip to see if there is indeed an upstream fix. However, I can't do this on Derecho as it has a fairly limited software stack. I may try on CSD3 which is broader.
This needs either cuda/11.8 + gcc/9-11, or cuda/12.1.


jatkinson1000 · 2024-03-27T00:05:09Z

@jwallwork23, @ElliottKasoar, @TomMelt I would appreciate your thoughts on this, as people who are a bit more in touch with C++ than I am.

The installed version of libtorch on CESM that we are currently trying to build against is v2.2.1.

ElliottKasoar · 2024-03-28T01:01:24Z

From a quick test with libtorch 2.2.1 and gcc 7, 9, and 11, it does indeed seem to be resolved by updating gcc >= 9, which matches the requirement for building from source in the latest release notes, and more generally the requirement for full C++17 compatibility that we noted.

It's interesting that there's a slight mismatch between the check and the README, which suggests it should actually be gcc>=9.4.0, and that it doesn't raise the same error I saw.
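To make the 9.0 versus 9.4.0 mismatch concrete, here is a minimal sketch of the comparison, assuming versions are compared as (major, minor, patch) triples. `Version` and `meets_minimum` are hypothetical names used only for illustration; they are not part of FTorch or PyTorch.

```cpp
#include <tuple>

// Hypothetical helper type (illustration only): a compiler version triple.
struct Version {
    int major_num;
    int minor_num;
    int patch_num;
};

// Returns true when `found` is at least `required`.
// std::tie builds tuples of references, which compare lexicographically:
// major first, then minor, then patch.
inline bool meets_minimum(const Version& found, const Version& required) {
    return std::tie(found.major_num, found.minor_num, found.patch_num) >=
           std::tie(required.major_num, required.minor_num, required.patch_num);
}
```

Under this comparison a gcc 9.3.0 toolchain would pass a check written as `>= 9.0` but fail one written as `>= 9.4.0`, which is the gap between the two stated requirements.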

I don't think there's an obvious solution/issue within FTorch itself.

jatkinson1000 · 2024-03-28T01:09:04Z

Oh nice, v2.2.2 was released 2 hrs ago!!

The release notes suggest >= 9.0: https://github.com/pytorch/pytorch/releases/tag/v2.2.2

Suggest we resolve this by enforcing gcc >= 9.0 in our CMake.
And this implies that FTorch has to be built with g++, not icpc?? I know we've been back and forth on this one a bit... #54
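As a rough illustration of what such a guard could look like at the source level (this is a C++ sketch, not the CMake check proposed above, where the analogous test would presumably compare `CMAKE_CXX_COMPILER_VERSION`): the names `kMinGccMajor` and `gcc_major_is_supported` are hypothetical, and the minimum of 9 is an assumption taken from this thread. The guard only fires for genuine GCC, so it would not by itself force the use of g++.

```cpp
// Assumption from this thread: libtorch 2.2 needs GCC 9 or newer.
constexpr int kMinGccMajor = 9;

// Hypothetical helper: true when a GCC major version is new enough.
constexpr bool gcc_major_is_supported(int gcc_major) {
    return gcc_major >= kMinGccMajor;
}

// Clang and the classic Intel compilers also define __GNUC__ for
// compatibility, so they are excluded; only genuine GCC is constrained.
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
static_assert(gcc_major_is_supported(__GNUC__),
              "GCC >= 9 appears to be needed when building against libtorch 2.2");
#endif
```

A compile-time failure with a clear message would be easier to diagnose than the "Unknown layout" error, which only shows up at runtime.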

Now I need to try and resolve the issue of cuda wanting <= 11 on Derecho -_-

TomMelt · 2024-03-28T17:41:22Z

I think that's just the release notes stating that, for this release, the gcc requirement has changed to gcc >= 9.0, not that you have to use gcc.

TomMelt · 2024-03-28T17:42:24Z

A couple lines below there is this comment:

Fix building from source on Windows source MSVC 14.38 - VS 2022 (#122120)

Which, to me, implies they also support the windows c++ compiler.... though I wouldn't quote me on it 😳

jatkinson1000 mentioned this issue Mar 27, 2024: Potential pytorch incompatibility #37 (Open)

jwallwork23 added the bug (Something isn't working) label Jul 15, 2024
