Debugging Segmentation Task #62
I have implemented the model output and gradient for a segmentation model. Since the max-entropy loss is simply a sum over every pixel's loss, this was a straightforward extension of the proposed multi-class loss. In practice, however, I got seemingly random attribution scores that I couldn't quite debug. Observations I made:
I checked that the data loading is ordered and that checkpoint loading is correct, and I'm averaging over 80 checkpoints for a standard segmentation task. Is there something more I could look into?
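For concreteness, the "sum over every pixel's loss" extension described above can be sketched as follows. This is a minimal NumPy sketch with illustrative shapes and a hypothetical function name, not the actual model code:

```python
import numpy as np

def summed_pixel_cross_entropy(logits, labels):
    """Sum of per-pixel cross-entropy losses (hypothetical sketch).

    logits: (H, W, C) float array of per-pixel class scores.
    labels: (H, W) int array of ground-truth class indices.
    Returns the scalar segmentation loss = sum over all pixels.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    rows, cols = np.indices((h, w))
    # Per-pixel negative log-likelihood of the true class, summed over pixels.
    return -log_probs[rows, cols, labels].sum()
```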
Replies: 1 comment
Thanks for bringing this up! Summing over all pixels' loss seems like a reasonable approach. I'm not sure I can offer much help without more details about the data/code/etc. I'm putting a few guesses below :)
Based on the first point, there might be numerical issues with computing the attribution scores. Often that happens during the matrix inverse computation of $(X^\top X)^{-1}$. A simple check to see if that's the culprit is to set the `lambda_reg` parameter of your `TRAKer` instance to some positive number (e.g., you can start with `lambda_reg=1e-4`).

Another possible source of instability could be the `out_to_loss_grad` method. A simple check to test that is to simply skip it, or mo…

Hope that helps!
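To see why the `lambda_reg` check above can help, here is a small standalone NumPy illustration (not TRAK's implementation): when the feature columns of $X$ are nearly collinear, $X^\top X$ is ill-conditioned and its inverse amplifies noise into seemingly random scores, while adding a small ridge term $\lambda I$ restores a well-behaved inverse.

```python
import numpy as np

# Standalone illustration (not the TRAK code): (X^T X)^{-1} is numerically
# unreliable when columns of X are nearly collinear, and a small ridge
# term lambda * I fixes the conditioning.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
X[:, 1] = X[:, 0] + 1e-9 * rng.normal(size=100)  # nearly duplicate column

XtX = X.T @ X
lam = 1e-4  # plays the role of lambda_reg here
XtX_reg = XtX + lam * np.eye(XtX.shape[0])

print(f"cond(X^T X)         = {np.linalg.cond(XtX):.2e}")      # huge
print(f"cond(X^T X + lam*I) = {np.linalg.cond(XtX_reg):.2e}")  # modest
```

A condition number near machine precision's reciprocal means the inverse (and anything downstream of it, such as attribution scores) is dominated by rounding noise; the regularized matrix inverts stably.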