What parameters are trained in linear probe (LP) exactly? #10

Open

zhilif opened this issue May 23, 2023 · 0 comments

zhilif commented May 23, 2023

Adopting your notation in Figure 5(d), you initialize the final linear layer W with the text embeddings V, and you also keep the visual projection W_v. When you do LP, do you train both W and W_v, or just W? From your code, it seems that you set requires_grad to False for visual.proj, so I guess you only trained W.
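For concreteness, here is a minimal sketch of what I understand the setup to be (the names, shapes, and values are my own placeholders, not taken from your code):

```python
import torch
import torch.nn as nn

# Hypothetical shapes: D = joint embedding dimension, C = number of classes.
D, C = 512, 1000

# V: class text embeddings from the text encoder, shape (C, D) (placeholder values).
V = torch.randn(C, D)

# Language-init: the linear-probe head W is initialized from V.
head = nn.Linear(D, C, bias=False)
with torch.no_grad():
    head.weight.copy_(V)

# The visual projection W_v (visual.proj) stays frozen.
visual_proj = nn.Linear(768, D, bias=False)
for p in visual_proj.parameters():
    p.requires_grad = False

# Only W reaches the optimizer, i.e. only the head is trained.
optimizer = torch.optim.SGD(head.parameters(), lr=1e-3)
```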

In Section 5.1, you reach the conclusion that "with the proposed language-init method, one can ensure that few-shot performance is always better than zero-shot". This is not precise: if you only train W, the optimization problem is convex, so the initialization should not matter once the problem is solved to optimality (it can matter only through the optimizer's inductive bias). So I suspect the reason LP beats zero-shot in Figure 6 is the implicit regularization from the optimizer (for example, early stopping acts roughly like L2 regularization toward the initialization). In the CLIP paper, they used L-BFGS to solve LP, so initialization really didn't matter for them.
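To illustrate the convexity point, here is a sketch in the style of the CLIP paper's evaluation (random placeholder features and an arbitrary regularization strength, not your data or code): when only W is trained on frozen features, LP reduces to multinomial logistic regression, which L-BFGS drives to the global optimum regardless of how W is initialized.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder few-shot data: frozen image features and labels (random here).
rng = np.random.default_rng(0)
features = rng.normal(size=(160, 512))   # N x D image embeddings
labels = rng.integers(0, 10, size=160)   # N class labels

# With W as the only trainable parameter on frozen features, LP is a convex
# logistic-regression problem. L-BFGS solves it to the global optimum, so the
# initialization of W cannot affect the final solution; only explicit
# regularization (the C parameter here) does.
clf = LogisticRegression(solver="lbfgs", C=1.0, max_iter=1000)
clf.fit(features, labels)
print(clf.score(features, labels))
```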
