
[Fix] fix initialization of ref_llm for full param dpo training with zero-3 #778

Merged · 9 commits · Jul 19, 2024

Conversation

@xu-song (Contributor) commented on Jun 17, 2024

ref_llm cannot be instantiated during full-parameter DPO training with ZeRO-3.

This pull request enables instantiation of ref_llm with the following config:

# Excerpt from the XTuner DPO config; the imports (torch, AutoModelForCausalLM, DPO)
# and the dpo_loss_type / loss_beta / label_smoothing / use_varlen_attn /
# pretrained_model_name_or_path variables are defined earlier in the config file.
model = dict(
    type=DPO,
    loss_type=dpo_loss_type,
    use_varlen_attn=use_varlen_attn,
    beta=loss_beta,
    label_smoothing=label_smoothing,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
    ),
    ref_llm=dict(  # explicit initialization of ref_llm, enabled by this PR
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
    ),
)
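For context, here is a minimal sketch of what the ref_llm entry is meant to achieve: build a second copy of the base model, freeze it, and use it only to compute reference log-probabilities in the DPO loss. The helper name and the freezing details below are illustrative assumptions, not the exact code added in this PR.

import torch
from transformers import AutoModelForCausalLM

def build_ref_llm(pretrained_model_name_or_path: str) -> torch.nn.Module:
    # Hypothetical helper: instantiate the reference model the same way as
    # the policy model, then freeze it so it is never updated.
    ref_llm = AutoModelForCausalLM.from_pretrained(
        pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
    )
    ref_llm.requires_grad_(False)  # reference log-probs only, no gradients
    ref_llm.eval()
    return ref_llm

Under ZeRO-3 full-parameter training, instantiating this second model is the step that previously failed; the PR makes the ref_llm config above usable in that setting.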

@xu-song changed the title from "[Fix] fix initialization of ref_llm" to "[Fix] fix initialization of ref_llm for full param dpo training with zero-3" on Jun 18, 2024
@RangiLyu (Contributor) commented:
Thank you for your contribution! It appears that this fix might not fully address the issue when using var_len_attention. I noticed another solution in PR #781, which seems to be a more comprehensive approach.

@xu-song (Contributor, Author) commented on Jun 18, 2024

@RangiLyu I checked the PR you mentioned above. However, it's not a good idea to create another SupervisedFinetune instance there, since it pulls in some irrelevant operations.
Following your advice, var_len_attention is now supported in this PR.

@pppppM (Collaborator) commented on Jul 9, 2024

@xu-song Great PR! There are conflicts that need to be resolved.

@xu-song (Contributor, Author) commented on Jul 9, 2024

Conflicts have been resolved.

@HIT-cwh merged commit ba7afc7 into InternLM:main on Jul 19, 2024 (2 of 3 checks passed).