[Build] compilation error: invalid instruction mnemonic 'vcvtneeph2ps' #22519

Open
saiden89 opened this issue Oct 21, 2024 · 6 comments
Labels
build build issues; typically submitted using template contributions welcome lower priority issues for the core ORT teams

Comments

@saiden89

Describe the issue

I am attempting to compile ONNX Runtime on the LUMI supercomputer, a Cray system.

The configuration step completes without any issues. However, the compile phase failed when I used the default CC and cc Cray compiler wrappers, which apply Cray-specific optimizations. To bypass this, I explicitly specified the AMD compilers (amdclang and amdclang++) instead of the wrappers.

System Details:

  • GPU: AMD MI250X (gfx90a)
  • CPU: AMD EPYC 7A53 "Trento"

Now, I’m encountering a compile-time error possibly related to the AVX512 instruction set: error: invalid instruction mnemonic 'vcvtneeph2ps', but I’m not familiar enough with all this to diagnose the issue. I would appreciate any guidance on how to address this.

Urgency

Not urgent, but would be nice to have since I have a big inference job on a project.

Target platform

AMD MI250X

Build script

The build script relies on specific modules being loaded to target the correct architecture and to load the correct programming environment. Full reproducibility might be limited because of the exotic nature of the system, but I am more than happy to try any suggestions myself.

module purge

module load PrgEnv-amd
module load rocm/6.0.3
module load craype-accel-amd-gfx90a craype-x86-trento

cd /tmp || exit
git clone --single-branch --branch main --recursive https://github.com/Microsoft/onnxruntime onnxruntime
cd onnxruntime || exit

mamba install rust -y
pip install cmake

./build.sh --config Release \
    --build_wheel \
    --update \
    --build \
    --parallel \
    --use_rocm \
    --rocm_home "$ROCM_PATH" \
    --cmake_extra_defines CMAKE_HIP_ARCHITECTURES=gfx90a \
    --cmake_extra_defines CMAKE_C_COMPILER=amdclang \
    --cmake_extra_defines CMAKE_CXX_COMPILER=amdclang++

pip install build/Linux/Release/dist/*

Error / output

log.txt

Visual Studio Version

No response

GCC / Compiler Version

AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-6.0.3 24012 af27734ed982b52a9f1be0f035ac91726fc697e4)

@saiden89 saiden89 added the build build issues; typically submitted using template label Oct 21, 2024
@github-actions github-actions bot added the ep:ROCm questions/issues related to ROCm execution provider label Oct 21, 2024
@edgchen1
Contributor

Here's the error from the log for convenience:

Building ASM object CMakeFiles/onnxruntime_mlas.dir/tmp/onnxruntime/onnxruntime/core/mlas/lib/x86_64/cvtfp16Avx.S.o
/tmp/onnxruntime/onnxruntime/core/mlas/lib/x86_64/cvtfp16Avx.S:60:9: error: invalid instruction mnemonic 'vcvtneeph2ps'
        vcvtneeph2ps ymm0, ymmword PTR [rdi]
        ^~~~~~~~~~~~

I think this code was added in this PR:
#21183

@eralmual do you have any pointers on how to fix this?

@eralmual
Contributor

Hi! Thank you for reaching out!

It seems the vcvtneeph2ps instruction is not recognized by the compiler. I did a quick search, and the instruction has been supported in Clang since v16.0 as part of the AVX-NE-CONVERT ISA. You are using v17.0, so it should work fine.

If it's not working for Clang in general I can do a quick patch to prevent the compiler error while we find a solution, just let me know.
In the meantime, I think you should be able to safely delete the following if block and everything inside it:

if(CMAKE_CXX_COMPILER_VERSION GREATER_EQUAL 13.1 AND NOT(APPLE))

That should fix the compiler issue.

Let me know if it works!

@snnn
Member

snnn commented Oct 22, 2024

It is more about whether your assembler (such as gas) can recognize this instruction. Instead of detecting the compiler name/version, we should write a test program to check it: https://cmake.org/cmake/help/latest/module/CheckSourceCompiles.html

Contributions are welcome.
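
The assembler check described in this comment can be tried by hand before any CMake change. The following is a minimal shell sketch, not the project's actual check: the function name assembler_accepts and the file name probe.S are hypothetical, and it assumes the compiler driver accepts -c on a .S file, as gcc and clang do. It assembles a single instruction in Intel syntax, matching the failing line in cvtfp16Avx.S, and reports whether the toolchain accepted it.

```shell
#!/bin/sh
# Sketch: probe whether a compiler driver's assembler accepts one instruction.
# Usage: assembler_accepts <compiler> <instruction text>
# (assembler_accepts and probe.S are illustrative names, not part of ONNX Runtime.)
assembler_accepts() {
    compiler=$1
    instruction=$2
    workdir=$(mktemp -d) || return 2

    # Intel syntax, as in the line the build failed on.
    printf '.intel_syntax noprefix\n%s\n' "$instruction" > "$workdir/probe.S"

    # Assemble only; the object file is discarded afterwards.
    if "$compiler" -c "$workdir/probe.S" -o "$workdir/probe.o" >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi

    rm -rf "$workdir"
    return "$status"
}

# Probe the compiler named by CC (falls back to cc if CC is unset).
if assembler_accepts "${CC:-cc}" 'vcvtneeph2ps ymm0, ymmword PTR [rdi]'; then
    echo "assembler accepts vcvtneeph2ps"
else
    echo "assembler rejects vcvtneeph2ps"
fi
```

On the system from this issue, CC would be set to amdclang before running the script. A result of "rejects" points at the toolchain rather than the source file, which is the case a configure-time probe would need to detect.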

@snnn snnn added contributions welcome lower priority issues for the core ORT teams and removed ep:ROCm questions/issues related to ROCm execution provider labels Oct 22, 2024
@saiden89
Author

Thank you @eralmual for the suggestion; your proposed solution solves the problem. However, as the compilation continues, I am greeted by many more errors.

/pfs/lustrep2/projappl/project_465000941/compartments/onnxruntime/build/Linux/Release/amdgpu/onnxruntime/core/providers/rocm/tensor/cast_op.cc:295:1: error: explicit instantiation of 'ComputeInternal' that occurs after an explicit specialization has no effect [-Werror,-Winstantiation-after-specialization]
SPECIALIZE_IMPL(MLFloat16)

/pfs/lustrep2/projappl/project_465000941/compartments/onnxruntime/onnxruntime/core/providers/rocm/nn/conv_impl.cu:24:21: error: implicit conversion loses integer precision: 'size_t' (aka 'unsigned long') to 'int' [-Werror,-Wshorten-64-to-32]
  fast_divmod fdm_c(bias_size);
              ~~~~~ ^~~~~~~~~

Any further insights are deeply appreciated, thanks!

@snnn
Member

snnn commented Oct 23, 2024

Please add "--compile_no_warning_as_error" to your build command.
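
For reference, a sketch of the build command from the original post with that flag appended. The run wrapper and the DRY_RUN variable are hypothetical helpers added here so the command can be reviewed before it is executed, and the /opt/rocm fallback is an assumption for when ROCM_PATH is not set by the rocm module.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# ROCM_PATH normally comes from the rocm module; /opt/rocm is an assumed fallback.
rocm_home=${ROCM_PATH:-/opt/rocm}

# Same options as the original build script; only the last flag is new.
run ./build.sh --config Release \
    --build_wheel \
    --update \
    --build \
    --parallel \
    --use_rocm \
    --rocm_home "$rocm_home" \
    --cmake_extra_defines CMAKE_HIP_ARCHITECTURES=gfx90a \
    --cmake_extra_defines CMAKE_C_COMPILER=amdclang \
    --cmake_extra_defines CMAKE_CXX_COMPILER=amdclang++ \
    --compile_no_warning_as_error
```

Running it with DRY_RUN=0 from the onnxruntime checkout executes the build instead of printing the command.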

@snnn
Member

snnn commented Oct 23, 2024

We don't use clang to build our CUDA code, so we didn't see these warnings. You can help us fix or suppress them if you'd like. Contributions are welcome. Thanks.
