SimCLR training vs test sets configuration #39
Could you check out the CSV files containing the features and labels?
The CSVs seem correct... Here are some screenshots of embeddings extracted using your pretrained model model_v2.pth, found at https://drive.google.com/drive/folders/1_mumfTU3GJRtjfcJK_M0fWm048sYYFqi, on patches extracted with a threshold of 19. However, comparing your features with mine, the number of rows is different... so is it possible that the number of patches is influencing the results? Here are the patch counts for 5 different slides using different background thresholds:
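Before digging deeper, it may help to diff the shapes of the two feature sets side by side. A minimal stdlib sketch; it assumes each per-slide CSV has one header row and one patch per row, which may not match the repo's exact format:

```python
import csv

def csv_shape(path):
    """Return (num_rows_excluding_header, num_cols) of a CSV file."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    if not rows:
        return (0, 0)
    return (len(rows) - 1, len(rows[0]))

def same_shape(mine, theirs):
    """True when both feature files have the same patch count and feature dims."""
    return csv_shape(mine) == csv_shape(theirs)
```

If the patch counts differ per slide, the background threshold or extraction magnification is the first thing to compare.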
Maybe the image quality is not correct for your embedder? Here is an example of a patch extracted at level=0, magnification=20. With this configuration the MIL training stays under 0.7 AUC.
The feature values look strange. There are some abnormal values > 10. Did you use BatchNorm or InstanceNorm consistently in training and feature computation?
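The mismatch matters because the two layers normalize over different axes, so running a network trained with one through the statistics of the other rescales activations and can inflate feature magnitudes. A minimal NumPy illustration of the difference (not the repo's code):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each channel over the whole batch: x has shape (N, C, H, W)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def instance_norm(x, eps=1e-5):
    """Normalize each channel per individual sample."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```

On the same input the two produce different outputs whenever per-sample statistics differ from batch statistics, which is exactly the case for heterogeneous histology patches.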
I took your embedder directly, without training, and passed it to the compute_feats script with InstanceNorm2d since it is the default parameter.
Have you tried model_v0.pth and model_v1.pth? Did they also not work?
Not yet... I considered the v2 model the best one.
It turns out that Camelyon16 consists of mixed magnifications, so after some experimenting I found the correct configuration:
In this way the magnification becomes x10, right? Is your embedder trained at this magnification? Since it is inside the folder called x20, I didn't expect it.
I think it is still 20x, because the base magnification has ~0.25 micron/pixel, which corresponds to 40x for the Aperio scanner (the FDA standard). A 20x magnification corresponds to ~0.5 micron/pixel. Camelyon16 uses a mixture of magnifications with different micron/pixel values. Notice how their 20x and 40x scanners have almost the same micron/pixel? You would call the "20x" RUMC image a "40x" image for UMCU. So it is better to just use the FDA standard.
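The convention the maintainer describes can be written down directly. A sketch assuming the Aperio/FDA convention of ~0.25 µm/pixel at 40x (function name and defaults are mine, not the repo's):

```python
def nominal_magnification(mpp, base_mpp=0.25, base_mag=40):
    """Map microns-per-pixel to nominal magnification under the Aperio/FDA
    convention: ~0.25 um/px = 40x, so ~0.5 um/px = 20x, ~1.0 um/px = 10x."""
    return base_mag * base_mpp / mpp
```

Under this convention the vendor-reported "20x" of a scanner with 0.25 µm/px is really 40x, which is why comparing slides by the scanner's label rather than by µm/px leads to mixed effective magnifications.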
Ok! I'm trying it now, and inside the "temp" folder the patches are stored in a "10" folder (I imagine it refers to the magnification). Anyway, thank you very much for your replies! I'll run the entire pipeline again with these new patches and report the results as soon as possible.
It worked!! But I still have problems :( ... I'm opening a new issue for that, since it is not related to the dataset but to the embedder.
Hi @binli123 ,
I'm trying to replicate your results on Camelyon16 without success. I set the number of classes to 1 and also tried the published weights for computing the feats on both the training and test sets. Even with that, I still obtain an AUC of only 0.7... So I started thinking about how my data organization differs from yours. I downloaded the data from here: https://ftp.cngb.org/pub/gigadb/pub/10.5524/100001_101000/100439/CAMELYON16/
The data is divided into training and test sets. I used a threshold of 25 for filtering out background, and I used only the training set for training the self-supervised model.
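For context on what such a background threshold might mean: one common approach is to drop patches that are close to uniformly white. A sketch of that idea, which is a guess at the criterion and not necessarily what the repo's patch extractor does:

```python
import numpy as np

def is_tissue(patch, threshold=25):
    """Keep a patch if its mean distance from pure white exceeds the threshold.

    patch: HxWx3 uint8 RGB array. A hypothetical criterion for illustration;
    the actual extractor may use saturation, entropy, or another measure.
    """
    return (255 - patch.astype(np.float32)).mean() > threshold
```

Under a rule like this, raising the threshold discards more near-white patches, which directly changes the per-slide patch counts discussed above.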
After that, even with the model you published on Drive, I extracted feats with the compute_feats script for both training and test (especially with the fusion option). Finally, I modified train_tcga to use them as sources for the training and test sets (270/130 bags). However, if I instead use the features precomputed by you, the MIL model works. So the problem could be how I split the data or how I extract the embeddings. What am I missing?
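One way to sanity-check the split is to regenerate the bag lists fed to training from the feature directories themselves. A stdlib sketch; the filename-prefix labeling convention here is hypothetical (Camelyon16 test labels actually come from a reference file), so it only illustrates the shape of the list, not the real labeling:

```python
import csv
import pathlib

def write_bag_list(feat_dir, out_csv):
    """Write a CSV of (feature_file, label) pairs, one bag per row.

    Labels are inferred from a hypothetical 'tumor_*' filename prefix;
    adapt this to however the slides are actually labeled.
    """
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for p in sorted(pathlib.Path(feat_dir).glob("*.csv")):
            label = 1 if p.name.startswith("tumor") else 0
            writer.writerow([str(p), label])
```

Counting the rows of the generated train and test lists against the expected 270/130 bags is a quick way to catch slides that were silently dropped or misassigned.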