

Model training based on pretrained classification model does not work as expected #916

Open
1 task done
curtinmjc opened this issue Nov 6, 2024 · 5 comments
Labels: classify (Image Classification issues, PRs), HUB (Ultralytics HUB issues), question (Further information is requested)

Comments

@curtinmjc

Search before asking

Question

Background:

  1. I have a large dataset that exhausts the available Google Colab system RAM even with the A100 runtime, so I decided to try splitting the dataset into multiple pieces.
  2. To eventually have a model that covers the entire dataset, I thought I could use the Ultralytics HUB capability of basing a new model on a previously trained model.
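Splitting is easier to reason about when every piece still contains every class. A minimal sketch of a stratified round-robin split, assuming the dataset has already been read into a mapping from class name to image paths (for example from the `train/<class>/` folders of a classify dataset). `split_into_shards` is a hypothetical helper, not an Ultralytics or HUB function:

```python
from collections import defaultdict
from typing import Dict, List


def split_into_shards(
    files_by_class: Dict[str, List[str]], num_shards: int
) -> List[Dict[str, List[str]]]:
    """Split a class -> files mapping into num_shards pieces, round-robin per class.

    Every shard receives files from every class, as long as the class has at
    least num_shards files; smaller classes will be missing from some shards.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    shards: List[Dict[str, List[str]]] = [defaultdict(list) for _ in range(num_shards)]
    for class_name, files in sorted(files_by_class.items()):
        # Deal the files of this class out to the shards like cards.
        for i, path in enumerate(sorted(files)):
            shards[i % num_shards][class_name].append(path)
    # Convert defaultdicts to plain dicts for a predictable return type.
    return [dict(shard) for shard in shards]
```

Sorting the class names and file paths makes the split deterministic, so the same shards come out on every run. Writing each shard back to its own `train/<class>/` folder tree is left out.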

What I did and learned:

  1. In Train a Model - Step 2 of 3 (1st screenshot), I selected the Custom tab and picked my previous model that was trained on the first dataset. This displayed a panel showing that the prior model had Pretrained = Yes, and the Advanced Model Configuration for the new model had the Pre-trained toggle On (2nd screenshot).
  2. I then went through the steps of starting new model training using Google Colab (3rd screenshot).
  3. After training started, I went back to the Ultralytics model web page, which showed that training had started. It also showed the training parameter settings, which included Pretrained = No (4th screenshot).
  4. After training of the new model was complete, I did inference testing using the Ultralytics model's Preview tab. In all cases, images from the original pretrained model's data (the "parent" of the new model) were incorrectly classified as one of the classes used to train the new model. In other words, what the "parent" model had learned did not carry over to the new model.
    [Screenshots 1 to 4 (2024-11-06): Train a Model Step 2 of 3; Advanced Model Configuration; starting training in Google Colab; model page training parameters]

Additional

Questions:

  1. Should the images of the pretrained "parent" model flow through to the "child" model?
  2. Can you explain why the model's page (4th screenshot) shows Pretrained = No when the parameters specified in model creation Step 2 of 3 have Pretrained = Yes?
@curtinmjc added the question label on Nov 6, 2024
@UltralyticsAssistant added the classify and HUB labels on Nov 6, 2024
@UltralyticsAssistant
Member

👋 Hello @curtinmjc, thank you for raising an issue about Ultralytics HUB 🚀! Your contribution helps us improve and address any potential issues.

For more insights into the Ultralytics HUB, please explore our comprehensive HUB Docs:

As this seems like a 🐛 Bug Report, we would appreciate it if you could provide a minimum reproducible example (MRE) to help us understand the issue better. This can include a detailed description along with code snippets or configuration files that showcase the problem.

On your specific concerns:

  1. The expected behavior of pretrained models, and whether the "parent" images should affect the "child" model, is something an Ultralytics engineer will look into for you.
  2. The discrepancy between the Pretrained parameter settings in your workflow and the final model training page will also be addressed.

We aim to resolve all issues promptly and appreciate your patience. An Ultralytics engineer will follow up with you soon to provide further assistance. Thank you for your understanding! 😊

@curtinmjc
Author

As far as a minimum reproducible example (MRE) is concerned, the first training run used the Official YOLO11n classify architecture with my first custom dataset. The second training run started from the Custom model I had trained initially, this time using my second custom dataset. I can provide you with the two Classify datasets, but I do not believe there is anything special about them.
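For the MRE, listing the class folders of both datasets makes the relationship between the two training runs explicit. A minimal sketch, assuming the usual classify layout `<root>/train/<class_name>/`; the helper names are hypothetical. A difference in class lists is worth stating in the report, since fine-tuning on a new dataset is generally expected to fit the classifier head to that dataset's classes.

```python
from pathlib import Path
from typing import Dict, List, Set


def list_classes(dataset_root: str, split: str = "train") -> List[str]:
    """Return the sorted class folder names under <dataset_root>/<split>/."""
    split_dir = Path(dataset_root) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing split directory: {split_dir}")
    # Each sub-directory of the split is one class in the classify layout.
    return sorted(p.name for p in split_dir.iterdir() if p.is_dir())


def compare_classes(parent_root: str, child_root: str) -> Dict[str, List[str]]:
    """Report which class names are shared by, or unique to, two datasets."""
    parent: Set[str] = set(list_classes(parent_root))
    child: Set[str] = set(list_classes(child_root))
    return {
        "shared": sorted(parent & child),
        "only_parent": sorted(parent - child),
        "only_child": sorted(child - parent),
    }
```

For example, `compare_classes("dataset_part1", "dataset_part2")` returns the shared classes and those unique to each dataset; both folder names are placeholders.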

@pderrenger
Member

Hello @curtinmjc,

Thank you for providing additional context about your training process. It sounds like you're using a well-structured approach with the YOLO11n classify architecture and custom datasets. To address the issue you're experiencing, here are a few steps you can take:

  1. Verify with Latest Versions: Ensure that you're using the latest version of the Ultralytics packages and the Ultralytics HUB. Updates often include bug fixes and improvements that might resolve your issue.

  2. Check Model Configuration: Double-check the configuration settings for your second model training. Ensure that the pretrained model option is correctly set and that any changes in the configuration are saved before starting the training.

  3. Review Training Logs: Examine the training logs for any discrepancies or warnings that might indicate why the pretrained setting is not being applied as expected.

  4. Reproduce the Issue: If possible, try to reproduce the issue with a smaller subset of your data. This can help isolate the problem and make it easier to identify any specific causes.

  5. Community Support: While I can't provide private support, I encourage you to share your findings and any additional questions on our GitHub Discussions or join our Discord community for further assistance from other users and the Ultralytics team.
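Step 4 above suggests reproducing the issue on a smaller subset. A minimal sketch of drawing a fixed number of images per class with a seeded random generator, so the same subset can be regenerated and shared; it assumes the dataset has been read into a mapping from class name to image paths, and `sample_subset` is a hypothetical helper:

```python
import random
from typing import Dict, List


def sample_subset(
    files_by_class: Dict[str, List[str]], per_class: int, seed: int = 0
) -> Dict[str, List[str]]:
    """Pick up to `per_class` files from each class, reproducibly."""
    rng = random.Random(seed)  # local RNG, so the subset is repeatable for a given seed
    subset: Dict[str, List[str]] = {}
    for class_name in sorted(files_by_class):
        files = sorted(files_by_class[class_name])
        k = min(per_class, len(files))  # small classes are kept whole
        subset[class_name] = sorted(rng.sample(files, k))
    return subset
```

Keeping every class in the subset, rather than sampling images globally, preserves the class structure that the full run sees.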

Your feedback is invaluable, and we're here to help you get the most out of Ultralytics HUB. If you have any more details or questions, feel free to share them. 😊

@curtinmjc
Author

Thank you for your response. My answers to your listed steps:

  1. Verify with Latest versions -- the training is happening in Google Colab. The versions are as follows: Ultralytics 8.3.28 🚀 Python-3.10.12 torch-2.5.0+cu121 CUDA:0 (NVIDIA A100-SXM4-40GB, 40514MiB)
  2. Check Model Configuration -- I have screenshots of all sections of the web page Train a Model Step 2 of 3 and nothing was changed besides number of Epochs being reduced from 100 to 30 to save on training costs. (Note: I had the same issue with the behavior of pretrained models in prior days when I had Epochs set to the default 100.)
  3. Review Training Logs -- I replicated the model training today to check the Google Colab logs. There is nothing in the logs after "Kernel started..." and nothing unusual in the Colab Output pane.
  4. Reproduce the Issue -- I already did this before opening the GitHub Issue yesterday using a smaller subset of the data. The datasets used only contain three classes each.
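The version line quoted in item 1 appears to be the environment banner Ultralytics prints at the start of training. A minimal stdlib sketch for collecting similar information without starting a training run; the package list is an assumption, and packages that are not installed are reported rather than raising an error:

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report(names: Iterable[str] = ("ultralytics", "torch")) -> str:
    """One-line summary suitable for pasting into an issue."""
    parts = [f"Python-{platform.python_version()}"]
    for name, version in package_versions(names).items():
        parts.append(f"{name}-{version if version else 'not installed'}")
    return " ".join(parts)
```

Running `print(environment_report())` in the same Colab session as the training gives a line that can be pasted directly into the issue.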

@pderrenger
Member

Hello @curtinmjc,

Thank you for the detailed follow-up! It sounds like you've been thorough in your troubleshooting process. Let's see how we can further assist you:

  1. Version Check: Your setup with Ultralytics 8.3.28 and the latest CUDA and PyTorch versions looks good. It's always a good idea to ensure compatibility, and it seems you're up-to-date. 🚀

  2. Model Configuration: Since you've verified the settings and the issue persists even with different epoch settings, it might be worth checking if there are any cached configurations or settings that could be affecting the training process. Sometimes clearing the cache or starting a fresh session can help.

  3. Training Logs: The absence of detailed logs can be tricky. Ensure that logging is enabled in your Colab environment. You might want to add some print statements or logging commands in your training script to capture more detailed outputs. This can help identify if the pretrained model is being loaded correctly.

  4. Reproduce the Issue: Given that you've already tried with a smaller dataset, it might be beneficial to test with a completely different dataset or a different model architecture to see if the issue persists. This can help determine if the problem is specific to your current setup or more general.
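Item 3 suggests adding logging to confirm which pretrained weights are used. A minimal stdlib sketch, assuming training is launched from a notebook cell or script where the checkpoint path is known: it logs to both the cell output and a file, and records the checkpoint's size and SHA-256 before training starts. This only confirms which file is picked up, not how the trainer uses it. The helper names and paths are hypothetical, and the training call appears only as a comment.

```python
import hashlib
import logging
from pathlib import Path


def setup_logging(log_file: str = "train_debug.log") -> logging.Logger:
    """Log INFO and above to both the console (stderr) and a file."""
    logger = logging.getLogger("train_debug")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers when a Colab cell is re-run
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        for handler in (logging.StreamHandler(), logging.FileHandler(log_file)):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger


def describe_checkpoint(path: str, logger: logging.Logger) -> str:
    """Log existence, size and SHA-256 of a weights file; return the digest."""
    p = Path(path)
    if not p.is_file():
        logger.error("checkpoint not found: %s", p)
        raise FileNotFoundError(p)
    digest = hashlib.sha256(p.read_bytes()).hexdigest()
    logger.info("checkpoint %s: %d bytes, sha256=%s", p, p.stat().st_size, digest)
    return digest


# Hypothetical usage (placeholder paths):
# log = setup_logging()
# describe_checkpoint("parent_model.pt", log)
# If training locally with the ultralytics package, e.g.:
# YOLO("parent_model.pt").train(data="dataset_part2", epochs=30)
```

Comparing the logged digest with that of the parent model's downloaded weights shows whether the second run started from the intended file.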

If the issue continues, consider sharing a minimal reproducible example with the community on GitHub Discussions or Discord. This can provide more insights and allow others to replicate and diagnose the problem.

Thank you for your patience and for working with the community to resolve this. If you have any more questions or need further assistance, feel free to reach out. 😊
