Does each parameter take 1 byte, 2 bytes, or 4 bytes? The memory-efficiency story does not seem clear.
Answered by ptrendx, May 19, 2023
Currently the FP8 weights are only internal, so the actual model weights take the same amount of memory as without FP8 execution (e.g. 2 bytes per parameter for FP16+FP8 training). We are working together with Meta on exposing FP8 tensors in PyTorch, which will enable storing only the FP8 weights, resulting in memory savings over the base model as well as e.g. faster communication in FSDP, but this is currently in the PoC stage.
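To make the arithmetic in the answer concrete, here is a hypothetical back-of-envelope sketch (not a Transformer Engine API) of weight memory per parameter: today the weights stay in FP16 (2 bytes each, with FP8 copies cast internally during execution), while FP8-native storage would cut that to 1 byte each. Optimizer and gradient state are excluded for simplicity.

```python
def weight_bytes(n_params: int, fp8_native: bool = False) -> int:
    """Bytes used by the model weights alone (optimizer state excluded).

    Hypothetical helper for illustration; mirrors the answer above.
    """
    if fp8_native:
        # Future scenario: weights stored directly as FP8 tensors -> 1 byte each.
        return n_params * 1
    # Current behavior: weights kept in FP16 (2 bytes each);
    # the FP8 copies are internal casts, not the stored weights.
    return n_params * 2

n = 7_000_000_000  # e.g. a 7B-parameter model
print(f"FP16 storage:     {weight_bytes(n) / 2**30:.1f} GiB")
print(f"FP8-only storage: {weight_bytes(n, fp8_native=True) / 2**30:.1f} GiB")
```

The halving of weight memory is also what would speed up FSDP: sharded weights would be communicated as 1-byte FP8 values instead of 2-byte FP16 values.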
Answer selected by ksivaman