SimCLR training vs test sets configuration #39
Could you check out the CSV files containing the features and labels?
The CSVs seem correct... Here are some screenshots of embeddings extracted using your pretrained model model_v2.pth, found at https://drive.google.com/drive/folders/1_mumfTU3GJRtjfcJK_M0fWm048sYYFqi, on patches extracted with a threshold of 19. However, comparing your features with mine, the number of rows is different... so is it possible that the number of patches is influencing the results? Here are the patch counts for 5 different slides using different background thresholds:
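Before digging deeper, it may help to diff the shapes of the two feature sets side by side. A minimal stdlib sketch; it assumes each per-slide CSV has one header row and one patch per row, which may not match the repo's exact format:

```python
import csv

def csv_shape(path):
    """Return (num_rows_excluding_header, num_cols) of a CSV file."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    if not rows:
        return (0, 0)
    return (len(rows) - 1, len(rows[0]))

def same_shape(mine, theirs):
    """True when both feature files have the same patch count and feature dims."""
    return csv_shape(mine) == csv_shape(theirs)
```

If the patch counts differ per slide, the background threshold or extraction magnification is the first thing to compare.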
Maybe the image quality is not correct for your embedder? Here is an example of a patch extracted at level=0, magnification=20. With this configuration the MIL training stays under 0.7 AUC.
The feature values look strange. There are some abnormal values > 10. Did you use BatchNorm or InstanceNorm consistently in training and feature computation?
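The mismatch matters because the two layers normalize over different axes, so running a network trained with one through the statistics of the other rescales activations and can inflate feature magnitudes. A minimal NumPy illustration of the difference (not the repo's code):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each channel over the whole batch: x has shape (N, C, H, W)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def instance_norm(x, eps=1e-5):
    """Normalize each channel per individual sample."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```

On the same input the two produce different outputs whenever per-sample statistics differ from batch statistics, which is exactly the case for heterogeneous histology patches.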
I took your embedder directly, without training, and passed it to the compute_feats script with InstanceNorm2d since it is the default parameter.
Have you tried model_v0.pth and model_v1.pth? Did they also not work?
Not yet... I considered the v2 model the best one.
It turns out that Camelyon16 consists of mixed magnifications, so after some experimenting I found the correct configuration:
In this way the magnification becomes x10, right? Is your embedder trained at this magnification? Since it is inside the folder called x20, I didn't expect it.
I think it is still 20x, because the base magnification has ~0.25 micron/pixel, which corresponds to 40x for the Aperio scanner (the FDA standard). A 20x magnification corresponds to ~0.5 micron/pixel. Camelyon16 uses a mixture of magnifications with different micron/pixel values. Notice how their 20x and 40x scanners have almost the same micron/pixel? You would call the "20x" RUMC image a "40x" image for UMCU. So it is better to just use the FDA standard.
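The convention the maintainer describes can be written down directly. A sketch assuming the Aperio/FDA convention of ~0.25 µm/pixel at 40x (function name and defaults are mine, not the repo's):

```python
def nominal_magnification(mpp, base_mpp=0.25, base_mag=40):
    """Map microns-per-pixel to nominal magnification under the Aperio/FDA
    convention: ~0.25 um/px = 40x, so ~0.5 um/px = 20x, ~1.0 um/px = 10x."""
    return base_mag * base_mpp / mpp
```

Under this convention the vendor-reported "20x" of a scanner with 0.25 µm/px is really 40x, which is why comparing slides by the scanner's label rather than by µm/px leads to mixed effective magnifications.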
Ok! I'm trying it now, and inside the "temp" folder the patches are stored in a "10" folder (I imagine it refers to the magnification). Anyway, thank you very much for your replies! I'll run the entire pipeline again with these new patches and report the results as soon as possible.
It worked!! But I still have problems :( ... I'm opening a new issue for that, since it is not related to the dataset but to the embedder.
Hi @binli123 ,
I'm trying to replicate your results on Camelyon16 without success. I set the number of classes to 1 and also tried the published weights for computing the feats on both the training and test sets. Even with that, I still obtain an AUC of only 0.7... So I started thinking about how my data organization differs from yours. I downloaded the data from here: https://ftp.cngb.org/pub/gigadb/pub/10.5524/100001_101000/100439/CAMELYON16/
The data is divided into training and test sets. I used a threshold of 25 for filtering out background, and I used only the training set for training the self-supervised model.
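For context on what such a background threshold might mean: one common approach is to drop patches that are close to uniformly white. A sketch of that idea, which is a guess at the criterion and not necessarily what the repo's patch extractor does:

```python
import numpy as np

def is_tissue(patch, threshold=25):
    """Keep a patch if its mean distance from pure white exceeds the threshold.

    patch: HxWx3 uint8 RGB array. A hypothetical criterion for illustration;
    the actual extractor may use saturation, entropy, or another measure.
    """
    return (255 - patch.astype(np.float32)).mean() > threshold
```

Under a rule like this, raising the threshold discards more near-white patches, which directly changes the per-slide patch counts discussed above.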
After that, even with the model you published on Drive, I extracted feats with the compute_feats script for both training and test (especially with the fusion option). Finally, I modified train_tcga to use them as sources for the training and test sets (270/130 bags). However, if I instead use the features precomputed by you, the MIL model works. So the problem could be how I split the data or how I extract the embeddings. What am I missing?
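One way to sanity-check the split is to regenerate the bag lists fed to training from the feature directories themselves. A stdlib sketch; the filename-prefix labeling convention here is hypothetical (Camelyon16 test labels actually come from a reference file), so it only illustrates the shape of the list, not the real labeling:

```python
import csv
import pathlib

def write_bag_list(feat_dir, out_csv):
    """Write a CSV of (feature_file, label) pairs, one bag per row.

    Labels are inferred from a hypothetical 'tumor_*' filename prefix;
    adapt this to however the slides are actually labeled.
    """
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for p in sorted(pathlib.Path(feat_dir).glob("*.csv")):
            label = 1 if p.name.startswith("tumor") else 0
            writer.writerow([str(p), label])
```

Counting the rows of the generated train and test lists against the expected 270/130 bags is a quick way to catch slides that were silently dropped or misassigned.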