> Using cross-entropy, does it use the last-layer embedding for the top-k retrieval evaluation?
Yes indeed.
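For context, here is a minimal sketch of what such a retrieval evaluation typically looks like, assuming unit-normalized last-layer features and cosine similarity. The function name and details are illustrative, not taken from the repo:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def topk_retrieval_accuracy(embeddings, labels, k=5):
    """Recall@k: a query counts as correct if any of its k nearest
    neighbours (cosine similarity, excluding itself) shares its label."""
    z = F.normalize(embeddings, dim=1)          # unit-norm last-layer features
    sim = z @ z.t()                             # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))           # exclude self-matches
    knn = sim.topk(k, dim=1).indices            # indices of k nearest neighbours
    hits = (labels[knn] == labels[:, None]).any(dim=1)
    return hits.float().mean().item()
```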
> Do you see any benefits of using a multi-label cross-entropy head loss vs. a series of single-label losses?
Training a classifier with multi-label cross-entropy arguably makes downstream predictions easier, since the output is unambiguous. If the end task is not classification but representation learning, it's a bit trickier. In both cases, the two essential terms we discuss in the paper (tightness and contrastive) are present. Computationally, I believe both have similar time/memory complexity. From a purely geometric intuition, I would say treating the problem as a series of binary problems is "harder", in the sense that it is less permissive about the configurations it allows in the feature space (cf. my rough drawing below). Therefore, treating the problem as one-vs-all may take longer to converge, but may lead to even better-clustered regions. It would be interesting to try this :)
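To make the comparison concrete, here is a minimal sketch (not from the repo) contrasting the two formulations on the same logits: standard softmax cross-entropy vs. a one-vs-all decomposition into per-class binary losses. Since both are computed from the same logit tensor, the similar time/memory complexity mentioned above is apparent:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(32, 10)                  # batch of 32, 10 classes
targets = torch.randint(0, 10, (32,))

# (a) multi-class softmax cross-entropy: classes compete through the softmax,
# so pushing one logit up implicitly pushes the others down.
ce = F.cross_entropy(logits, targets)

# (b) one-vs-all: each class is an independent binary problem, so every logit
# must individually clear a threshold -- a stricter geometric constraint on
# the feature space, matching the intuition above.
onehot = F.one_hot(targets, num_classes=10).float()
ova = F.binary_cross_entropy_with_logits(logits, onehot)

print(ce.item(), ova.item())
```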
Hello,
Regarding the paper, I just have a few questions.
Using cross-entropy, does it use the last-layer embedding for the top-k retrieval evaluation?
Do you see any benefits of using a multi-label cross-entropy head loss vs. a series of single-label losses?
Thanks, and happy new year!