
Ensembling over layers #259

Open · wants to merge 79 commits into main
Conversation

@lauritowal (Collaborator) commented Jun 19, 2023

Ensembling from the middle layer to the last layer.

@CLAassistant commented Jun 19, 2023

CLA assistant check: all committers have signed the CLA.

@@ -41,6 +41,73 @@ def to_dict(self, prefix: str = "") -> dict[str, float]:
return {**auroc_dict, **cal_acc_dict, **acc_dict, **cal_dict}


def calc_auroc(y_logits, y_true, ensembling, num_classes):
@lauritowal (Collaborator, Author) commented:

add annotation

@lauritowal changed the title from "Ensembling layer" to "Ensembling over layers" on Jun 19, 2023
@lauritowal (Collaborator, Author) left a comment:

Tests run forever on my machine. Need to check what is wrong there.

@AlexTMallen (Collaborator) left a comment:

Mainly just fix the handling of the multi-dataset case:

❯ elk elicit gpt2 imdb amazon_polarity --max_examples 10 300 --debug --num_gpus 1

y_logits_collection.append(y_logits)

# get logits and ground_truth from middle to last layer
middle_index = len(layer_outputs) // 2
Collaborator:

In some ways I think we should allow the layers over which we ensemble to be configurable, e.g. sometimes the last layers perform worse.

Collaborator (Author):

Yeah, it makes sense to make it configurable. However, I'm curious: how would you decide which layers to pick?
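For reference, a minimal sketch of what a configurable layer range could look like; the config fields and helper below are hypothetical, not part of this PR:

```python
from dataclasses import dataclass


@dataclass
class LayerEnsemblingConfig:
    """Hypothetical option: which layers to ensemble over.

    None keeps the PR's current default of ensembling from the middle
    layer to the last layer.
    """

    start_layer: int | None = None
    end_layer: int | None = None


def select_layer_indices(num_layers: int, cfg: LayerEnsemblingConfig) -> range:
    start = cfg.start_layer if cfg.start_layer is not None else num_layers // 2
    end = cfg.end_layer if cfg.end_layer is not None else num_layers
    return range(start, end)


# Default behaviour: layers 12..23 of a 24-layer model.
assert list(select_layer_indices(24, LayerEnsemblingConfig())) == list(range(12, 24))
```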

middle_index = len(layer_outputs) // 2
y_logits_stacked = torch.stack(y_logits_collection[middle_index:])
# layer prompt_ensembling of the stacked logits
y_logits_stacked_mean = torch.mean(y_logits_stacked, dim=0)
Collaborator:

It seems like the ensembling is done by taking the mean over layers, rather than concatenating. This isn't super clear from the comments/docstrings, and it's hard to tell from reading the code because the shapes aren't commented.
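For reference, a shape-annotated sketch of the averaging step; the tensor shapes are assumptions for illustration, not taken from the PR:

```python
import torch

# Assumed shapes: 12 layers, each with logits of shape
# (n_examples=10, n_variants=3, n_classes=2).
y_logits_collection = [torch.randn(10, 3, 2) for _ in range(12)]

middle_index = len(y_logits_collection) // 2

# (n_layers_used, n_examples, n_variants, n_classes)
y_logits_stacked = torch.stack(y_logits_collection[middle_index:])

# Mean over the layer dimension -> (n_examples, n_variants, n_classes).
# Note: this averages logits across layers rather than concatenating them.
y_logits_stacked_mean = torch.mean(y_logits_stacked, dim=0)
```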

from enum import Enum


class PromptEnsembling(Enum):
Collaborator:

I think it's fine.

@@ -53,7 +54,7 @@ def apply_to_layer(
layer: int,
devices: list[str],
world_size: int,
) -> dict[str, pd.DataFrame]:
) -> tuple[dict[str, pd.DataFrame], list[dict]]:
Collaborator:

Same comment here regarding the return type.
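A minimal sketch of how the new tuple return type might be consumed; apply_to_layer below is a simplified stand-in with made-up contents, not the PR's actual implementation:

```python
import pandas as pd


def apply_to_layer(layer: int) -> tuple[dict[str, pd.DataFrame], list[dict]]:
    """Stand-in: per-dataset eval dataframes plus the raw layer outputs
    that the layer-ensembling step consumes afterwards."""
    df_dict = {"imdb": pd.DataFrame(), "amazon_polarity": pd.DataFrame()}
    layer_output = [{"layer": layer}]
    return df_dict, layer_output


layer_outputs: list[list[dict]] = []
for layer in range(12):
    df_dict, layer_output = apply_to_layer(layer)  # unpack the new tuple
    layer_outputs.append(layer_output)  # collected across layers for ensembling
```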

elk/run.py Outdated
try:
for df_dict in tqdm(mapper(func, layers), total=len(layers)):
for k, v in df_dict.items():
for df_dict, layer_output in tqdm(
Collaborator:

This doesn't write all the appropriate lines for:

❯ elk elicit gpt2 imdb amazon_polarity --max_examples 10 300 --debug --num_gpus 1

There should be evaluation results for both imdb and amazon_polarity in the layer_ensembling_results.csv
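To illustrate the expected output, a hedged sketch of layer_ensembling_results.csv containing a row for each evaluated dataset; the column names and placeholder values here are assumptions:

```python
import csv

# Assumption: after layer ensembling there should be at least one row per
# evaluated dataset, i.e. rows for both imdb and amazon_polarity.
rows = [
    {"dataset": "imdb", "ensembling": "full", "auroc": 0.0},
    {"dataset": "amazon_polarity", "ensembling": "full", "auroc": 0.0},
]

with open("layer_ensembling_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["dataset", "ensembling", "auroc"])
    writer.writeheader()
    writer.writerows(rows)
```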
