[WIP] Fix default allocator #1127

Open · wants to merge 6 commits into master

Conversation
@pkufool (Collaborator) commented Dec 7, 2022

The purpose of this PR is to fix default_context.cu so that we can use k2 without PyTorch. There are still some issues to be fixed:

  • nvToolsExt.h not found error (if K2_ENABLE_NVTX=ON). I found that the CUDA environment was handled by PyTorch before; I am reading FindCUDA.cmake in PyTorch and trying to extract some script from there.
  • About cudaStream: I am still a little confused about how k2 and PyTorch handle CUDA streams. For now I always allocate memory on the default stream (i.e. stream 0).
  • The caching strategy (using the default for now; it might need some tuning and benchmarking).
  • Should there be a global caching allocator, or should each Context have its own allocator instance? Which one is better?

Even though this is a small fix, the allocator is a fundamental part and I may have missed something important. I am hoping for your suggestions.

```diff
   }

   cudaStream_t GetCudaStream() const override {
-    return g_stream_override.OverrideStream(stream_);
+#ifdef K2_WITH_CUDA
+    return g_stream_override.OverrideStream(0);
```
@pkufool (Collaborator, Author) commented on this diff:

Here, in pytorch_context.cu, we get the stream via c10::cuda::getCurrentCUDAStream. I read the PyTorch source code: if we never call setCurrentCUDAStream, the returned stream is always the default stream (i.e. stream 0).

So, can we use the default stream 0 here? (A bare 0 might not be good; we could add a macro, something like kDefaultStream.)

```c++
#ifdef K2_WITH_CUDA
  DeviceGuard guard(gpu_id_);
  // the default stream is 0
  auto ret = allocator_->DeviceAllocate(&p, bytes);
```
@pkufool (Collaborator, Author) commented on this diff:

Here the allocator allocates memory on the default stream 0; we could pass a stream to it as well.

@csukuangfj (Collaborator) commented:

> The purpose of this PR is to fix the default_context.cu so that we can use k2 without Pytorch

What is the use case? That is, how will we use k2 with CUDA but without PyTorch?

@pkufool (Collaborator, Author) commented Dec 7, 2022

> The purpose of this PR is to fix the default_context.cu so that we can use k2 without Pytorch
>
> What is the use case? That is, how will we use k2 with cuda but without PyTorch?

For decoding purposes: if we use onnxruntime or TensorRT as the inference engine, I don't think we should depend on PyTorch at that point. The inference engine might have its own allocator, though; I am not sure.

@csukuangfj (Collaborator) commented:

> The inference engine might have allocator, though, I am not sure.

In that case, I would suggest creating another context that wraps the allocator from the given framework.

@pkufool (Collaborator, Author) commented Dec 7, 2022

> > The inference engine might have allocator, though, I am not sure.
>
> In that case, I would suggest creating another context that wraps the allocator from the given framework.

Yes, we will see. For now this PR just fixes default_context.cu, which is not working at the moment.
