Pull requests: NVIDIA/TransformerEngine
[PyTorch] Integration test for Megatron-LM
bug
Something isn't working
#1329
opened Nov 13, 2024 by
timmoon10
9 of 14 tasks
[COMMON/JAX] Support sliding window on THD format
#1327
opened Nov 11, 2024 by
zlsh80826
6 of 13 tasks
[PyTorch] Remove special handling for FP8 params in FP8 recipe infrastructure
#1326
opened Nov 9, 2024 by
timmoon10
8 of 13 tasks
[PyTorch] Fix ONNX export bug with operation-based API
bug
Something isn't working
#1320
opened Nov 7, 2024 by
timmoon10
8 of 13 tasks
TP communication overlap: enable the overlap between GEMM chunk at Ho…
#1311
opened Nov 4, 2024 by
erhoo82
1 of 13 tasks
Improving communication overlap for the case of multi kernel queue usage
#1308
opened Nov 2, 2024 by
youngeunkwon0405
13 tasks
[PyTorch] Add heuristics for intializing FP8 params
enhancement
New feature or request
#1300
opened Oct 30, 2024 by
timmoon10
8 of 13 tasks
[PyTorch] Fix get_swa_mask() for padding masks
#1281
opened Oct 21, 2024 by
cyanguwa
6 of 13 tasks
attention_mask fill with -inf for UnfusedDotProductAttention
#1268
opened Oct 18, 2024 by
Agoniii
1 of 13 tasks
Draft: reduce cudagraph mem via preoallcations
#1253
opened Oct 15, 2024 by
JimmyZhang12
13 tasks
Save CUDA Graph memory by reusing input and output tensors
#1234
opened Oct 9, 2024 by
buptzyb
5 of 13 tasks
Draft: Use fused push_send_recv kernel for TP AG and RS overlaps
#1200
opened Sep 24, 2024 by
erhoo82
13 tasks