Pull requests: NVIDIA/TransformerEngine


Pull requests list

[Dummy] Testing branch for #1326 (label: invalid)
#1330 opened Nov 13, 2024 by timmoon10 Draft
13 tasks
[PyTorch] Integration test for Megatron-LM (label: bug)
#1329 opened Nov 13, 2024 by timmoon10
9 of 14 tasks
[PyTorch] Fix GQA error message 1.13.0
#1328 opened Nov 12, 2024 by cyanguwa
8 of 13 tasks
[COMMON/JAX] Support sliding window on THD format
#1327 opened Nov 11, 2024 by zlsh80826
6 of 13 tasks
Build with uv instead of just pip
#1324 opened Nov 8, 2024 by jennifgcrl
5 of 13 tasks
[PyTorch] Fix ONNX export bug with operation-based API (label: bug)
#1320 opened Nov 7, 2024 by timmoon10
8 of 13 tasks
TP communication overlap: enable the overlap between GEMM chunk at Ho…
#1311 opened Nov 4, 2024 by erhoo82
1 of 13 tasks
[PyTorch] Add heuristics for intializing FP8 params (label: enhancement)
#1300 opened Oct 30, 2024 by timmoon10
8 of 13 tasks
Offloading example
#1299 opened Oct 29, 2024 by sanandaraj5597
[PyTorch] Fix get_swa_mask() for padding masks
#1281 opened Oct 21, 2024 by cyanguwa
6 of 13 tasks
[PyTorch] Fix autocast deprecation warnings
#1277 opened Oct 21, 2024 by yaox12
13 tasks
attention_mask fill with -inf for UnfusedDotProductAttention
#1268 opened Oct 18, 2024 by Agoniii
1 of 13 tasks
Draft: reduce cudagraph mem via preoallcations
#1253 opened Oct 15, 2024 by JimmyZhang12
13 tasks
fused out correction in CP
#1248 opened Oct 14, 2024 by xiaoyao0115
12 tasks
Save CUDA Graph memory by reusing input and output tensors
#1234 opened Oct 9, 2024 by buptzyb
5 of 13 tasks
Support CUDA Graph for MoE models
#1233 opened Oct 9, 2024 by buptzyb
6 of 13 tasks
[PyTorch] Improve CP P2P efficiency
#1208 opened Sep 26, 2024 by yenchenlin
1 of 6 tasks
Draft: Use fused push_send_recv kernel for TP AG and RS overlaps
#1200 opened Sep 24, 2024 by erhoo82
13 tasks