Changes that should not cause crash, but do. #525

Open
danpovey wants to merge 1 commit into base: master


danpovey
Collaborator

No description provided.

@danpovey
Collaborator Author

danpovey commented Dec 18, 2020

With this change applied, running build/bin/cu_intersect_test crashes on this line:

K2_CHECK_LT(backward_loglike, -src_state_forward_loglike + 2.0);

with output like that shown below. By printing intermediate values I found that the crash is due to atomicMax() not working: the compiler appears to be picking up the __host__ version of atomicMax() that I have declared, instead of the CUDA one.
I am using CUDA toolkit version 10.1.

I am creating this pull request to demonstrate the issue to the NVIDIA guys (I think it is a compiler problem).

[ -100.39 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf ]
] }
[F] [F] [F] [F] /ceph-dan/k2/k2/csrc/intersect_pruned.cu:lambda [](signed int)->void::operator()(signed int)->void:1045 /ceph-dan/k2/k2/csrc/intersect_pruned.cu:lambda [](signed int)->void::op\
erator()(signed int)->void:1045 /ceph-dan/k2/k2/csrc/intersect_pruned.cu:lambda [](signed int)->void::operator()(signed int)->void:1045 /ceph-dan/k2/k2/csrc/intersect_pruned.cu:lambda [](signe\
d int)->void::operator()(signed int)->void:1045 block:[0,0,0], thread: [8,0,0] block:[0,0,0], thread: [42,0,0] block:[0,0,0], thread: [43,0,0] block:[0,0,0], thread: [61,0,0] Check failed: Che\
ck failed: Check failed: Check failed: backward_loglikebackward_loglikebackward_loglikebackward_loglike    <<<<    -src_state_forward_loglike + 2.0-src_state_forward_loglike + 2.0-src_state_fo\
rward_loglike + 2.0-src_state_forward_loglike + 2.0 ( ( ( (220.366562149.052094158.026581159.356491 vs.  vs.  vs.  vs. 220.320801147.468399147.468399155.392899) ) ) )
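
If the cause is indeed a user-declared `__host__` atomicMax() competing with the CUDA built-in, one possible way to sidestep it is a wrapper with its own name that dispatches on `__CUDA_ARCH__`. The following is only a minimal sketch under that assumption: `SKETCH_HOST_DEVICE`, `AtomicMaxSketch` and `CheckHostFallback` are hypothetical names, not part of k2 or CUDA, and the host branch is a plain non-atomic fallback that is only safe in single-threaded host code.

```cpp
#include <cassert>

// Expand to CUDA execution-space qualifiers only when compiled by nvcc,
// so the same file also builds with a plain C++ compiler.
#ifdef __CUDACC__
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Hypothetical wrapper (not part of k2 or CUDA). It has its own name, so it
// does not share a name with the built-in atomicMax().
// Returns the value stored at *address before the update, like atomicMax().
SKETCH_HOST_DEVICE inline int AtomicMaxSketch(int *address, int val) {
#ifdef __CUDA_ARCH__
  // Device compilation pass: call the CUDA built-in.
  return atomicMax(address, val);
#else
  // Host pass: plain, NON-atomic fallback; only safe single-threaded.
  int old = *address;
  if (val > old) *address = val;
  return old;
#endif
}

// Host-side usage check of the fallback path.
inline void CheckHostFallback() {
  int best = 3;
  int prev = AtomicMaxSketch(&best, 7);  // 7 > 3, so best is raised to 7
  assert(prev == 3 && best == 7);
  prev = AtomicMaxSketch(&best, 5);      // 5 < 7, so best is unchanged
  assert(prev == 7 && best == 7);
}
```

Because the device branch is selected by the preprocessor rather than by overload resolution, the call to atomicMax() is only ever seen during the device compilation pass. Whether this avoids the behaviour reported here has not been verified.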

@luitjens

Have you tried with CUDA 11.1 to see if the issue persists?

@zhu-han

zhu-han commented Dec 29, 2020

"build/bin/cu_intersect_test" still crashes with CUDA 11.1. The output of "cu_intersect_test" is in the attached log:
cu_intersect_test.log

@danpovey
Collaborator Author

Thanks!!

3 participants