-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add thread local cache for brgemm #350
base: main
Are you sure you want to change the base?
Conversation
@@ -137,24 +150,38 @@ void dnnl_brgemm_tilerelease() { | |||
void dnnl_brgemm_execute(int64_t kernel_idx, void *A, uint64_t A_offset, | |||
void *B, uint64_t B_offset, void *C, uint64_t C_offset, | |||
int num) { | |||
auto it = tl_cache.find(kernel_idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better not define tl_cache as global static, define it here as a function static.
if (it != tl_cache.end()) { | ||
desc_ptr = &it->second.desc; | ||
kernel = it->second.kernel; | ||
} else { | ||
read_lock_guard_t g(g_brgemm_lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's thread local, do we still need this lock?
struct brgemmInfo { | ||
brgemm_desc_t desc; | ||
brgemm_kernel_t *kernel; | ||
char *palette; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest not to store raw pointer managed by smart pointer, maybe change to shared_ptr
?
@@ -57,6 +57,14 @@ static std::vector<brgemm_desc_t> g_brgemm_desc_list; | |||
static std::vector<brgemm_kernel_t *> g_brgemm_kernel_list; | |||
static std::vector<std::unique_ptr<char[]>> g_brgemm_palette; | |||
|
|||
struct brgemmInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest change the name to brgemm_cache_info_t
to align with naming convention in this file, and avoid confusion
struct brgemm_cache_info_t { | ||
std::shared_ptr<brgemm_desc_t> desc; | ||
std::shared_ptr<brgemm_kernel_t> kernel; | ||
std::shared_ptr<char> palette; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could use shared_ptr<char[]>
for palette
, and for desc
and kernel
we don't need to change, since they are not managed by smart ptr, and storing ptr to vector elements is dangerous as well
std::shared_ptr<char> palette; | ||
|
||
brgemm_cache_info_t() = default; | ||
brgemm_cache_info_t(brgemm_desc_t *d, brgemm_kernel_t *k, char *p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally we need to change the unique_ptr
in global palette pool to shared_ptr
as well, and pass the shared_ptr
of palette here for construction
struct brgemm_cache_info_t { | ||
std::shared_ptr<brgemm_desc_t> desc; | ||
std::shared_ptr<brgemm_kernel_t> kernel; | ||
std::shared_ptr<char> palette; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we created brgemm_cache_info_t
to store desc/kernel/palette together thread locally, would it be a better manner to also use brgemm_cache_info_t
for global management?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good idea, we can unify the struct used in both thead-local and global
Tracking Issue#323