
Examples in the integration test reported segmentation fault on my machine #4635

Closed
16 tasks
xuan112358 opened this issue Jul 10, 2024 · 16 comments
Labels
Bugs Bugs that only solvable with sufficient knowledge of DFT


@xuan112358
Collaborator

Describe the bug

When running ABACUS built from code after commit #4622, I found that I can't run the examples in the integration tests.
They all report a segmentation fault.
Take “tests/integrate/201_NO_15_pseudopots” for example; the log.txt was:
image

Expected behavior

No response

To Reproduce

No response

Environment

compiler: -DMPI_CXX_COMPILER=mpiicpc -DCMAKE_CXX_COMPILER=icpc
build with deepks and libri

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Verify the issue is not a duplicate.
  • Describe the bug.
  • Steps to reproduce.
  • Expected behavior.
  • Error message.
  • Environment details.
  • Additional context.
  • Assign a priority level (low, medium, high, urgent).
  • Assign the issue to a team member.
  • Label the issue with relevant tags.
  • Identify possible related issues.
  • Create a unit test or automated test to reproduce the bug (if applicable).
  • Fix the bug.
  • Test the fix.
  • Update documentation (if necessary).
  • Close the issue and inform the reporter (if applicable).
@dyzheng
Collaborator

dyzheng commented Jul 10, 2024

What is your compile command?

@xuan112358
Collaborator Author

@dyzheng
cmake -B build -DCMAKE_INSTALL_PREFIX=./ \
  -DELPA_DIR=/home/xuan/03_library/elpa/elpa-2021.05.002 \
  -DELPA_INCLUDE_DIRS=/home/xuan/03_library/elpa/elpa-2021.05.002/include/elpa-2021.05.002 \
  -DCEREAL_INCLUDE_DIR=/home/xuan/03_library/cereal/include \
  -DMPI_CXX_COMPILER=mpiicpc -DCMAKE_CXX_COMPILER=icpc \
  -DTorch_DIR=/home/xuan/03_library/libtorch/share/cmake/Torch/ \
  -Dlibnpy_INCLUDE_DIR=/home/xuan/03_library/libnpy-new/include \
  -DENABLE_DEEPKS=1 \
  -DLibxc_DIR=/home/xuan/03_library/libxc/libxc-5.2.3 \
  -DENABLE_LIBRI=1 \
  && cmake --build build -j30 && cmake --install build

@Cstandardlib
Collaborator

I will check with DeePKS and LibRI enabled on my machine right away.

@dyzheng
Collaborator

dyzheng commented Jul 10, 2024

I have tested in both GNU and Intel environments and cannot reproduce your error.

@dyzheng
Collaborator

dyzheng commented Jul 10, 2024

The Intel environment can reproduce your error when built with LibRI and DeePKS.

@dyzheng
Collaborator

dyzheng commented Jul 10, 2024

The error comes from DeePKS with the Intel compiler.

@dyzheng
Collaborator

dyzheng commented Jul 10, 2024

Can you try the newest commit? I tested it and got no error.

@mohanchen mohanchen added the Bugs Bugs that only solvable with sufficient knowledge of DFT label Jul 10, 2024
@xuan112358
Collaborator Author

@dyzheng
I can't compile with the command above after commit #4613.
The error is:
image

@dyzheng
Collaborator

dyzheng commented Jul 11, 2024

@xuan112358 Can you rerun the toolchain to rebuild libtorch on your machine?

@xuan112358
Collaborator Author

@dyzheng It seems I can just download the LibTorch zip file and don't need to build it? I downloaded the new LibTorch v2.3.1 from https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.3.1%2Bcpu.zip and tried again. The error was the same.

@caic99
Member

caic99 commented Jul 11, 2024

Duplicate of #4636. Let's track the problem there.

@Cstandardlib
Collaborator

Building #4622 with IntelLLVM 2024.1.0, I encountered a compilation error:
image
Building the latest code with IntelLLVM 2024.1.0, cmake generated the following warnings:
image
and the build failed with this error:
image
Building command:
CXX=mpiicpx cmake -B build -DBUILD_TESTING=1 -DENABLE_LIBRI=ON -DENABLE_DEEPKS=1 -DTorch_DIR=~/develop/libtorch/share/cmake/Torch/ -Dlibnpy_INCLUDE_DIR=~/develop/libnpy/include
with LibTorch from https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.3.1%2Bcpu.zip

@caic99
Member

caic99 commented Jul 11, 2024

@Cstandardlib Thanks for testing. Would you please apply the modifications in #4644?

@Cstandardlib
Collaborator

@caic99 Thanks! I'll update my test results in #4636.

@caic99
Member

caic99 commented Jul 12, 2024

Hi @xuan112358,
I've updated the build process. Would you try compiling and running it again with the latest code? Thanks.

@xuan112358
Collaborator Author

@caic99 I've tried it and everything works well now! I've also tried compiling and running with the Intel oneAPI toolkit 2024, and it worked well too. Thanks very much!


5 participants