

Is it possible to retrain your model? #10

Open
sonia-auv-private opened this issue Mar 21, 2018 · 22 comments

Comments

@sonia-auv-private

Hi,

I was wondering if there is any method that would let us retrain this model using Pascal VOC annotation files and images?

@gustavz
Owner

gustavz commented Mar 21, 2018

Yes, of course. Just use the scripts train.py and eval.py provided by TensorFlow's Object Detection API, like you would with any other model.
In stuff/ssd_mobilenet_checkpoints you will find the same checkpoint files I used, but they are the original ones provided by TensorFlow.
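A minimal sketch of the two invocations, assuming the 2018 layout of the TF Object Detection API, where `train.py` and `eval.py` sit directly under `object_detection/` and are run from `models/research`. The flags are the ones documented for those legacy scripts; the config and directory paths are hypothetical placeholders. The pipeline config's `fine_tune_checkpoint` field is where you would point training at the checkpoint from `stuff/ssd_mobilenet_checkpoints`.

```python
import shlex
from typing import List


def train_cmd(pipeline_config: str, train_dir: str) -> List[str]:
    """Build argv for the legacy object_detection/train.py script."""
    return [
        "python", "object_detection/train.py",
        "--logtostderr",
        f"--pipeline_config_path={pipeline_config}",
        f"--train_dir={train_dir}",  # checkpoints are written here
    ]


def eval_cmd(pipeline_config: str, train_dir: str, eval_dir: str) -> List[str]:
    """Build argv for the legacy object_detection/eval.py script."""
    return [
        "python", "object_detection/eval.py",
        "--logtostderr",
        f"--pipeline_config_path={pipeline_config}",
        f"--checkpoint_dir={train_dir}",  # reads the checkpoints train.py wrote
        f"--eval_dir={eval_dir}",
    ]


if __name__ == "__main__":
    # Hypothetical paths; replace with your own pipeline config and directories.
    cfg = "training/ssd_mobilenet_v1_coco.config"
    print(shlex.join(train_cmd(cfg, "training/")))
    print(shlex.join(eval_cmd(cfg, "training/", "eval/")))
```

The lists can be passed to `subprocess.run`, or the printed lines pasted into a shell.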

@gauthiermartin

Thank you

@uzbhutta

Hi,
Just to clarify: I must train with the TensorFlow Object Detection API only on 600x600px or 300x300px images for it to work with the config file, then place my trained ckpt file under stuff/ssd_mobilenet_checkpoints and run your scripts as usual. Is this correct?

Thanks so much.

@gustavz
Owner

gustavz commented Apr 26, 2018

Hey @uzbhutta,

I suggest you first take a closer look at TensorFlow's original Object Detection API. Try to understand how training and inference work and which scripts to use. After that, take a look at my code and what it does.

To give you a short overview:
It does not matter what size your training images are: when you train with TF's API, they are always resized to a fixed size that you set in the config file, and for SSD this size is normally 300x300.
You can of course train a network on 600x600 if you like, but then you won’t be able to use a pretrained model as a starting point, as the weights are bound to the input dimensions you train on.
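The labels survive that resize too: when Pascal VOC annotations are converted to TFRecords (the API's `create_pascal_tf_record.py` does this), box corners are stored divided by the image width and height, so they stay valid at any input size. A minimal sketch of that arithmetic; `normalize_box` and `to_resized_pixels` are illustrative names, not part of the API.

```python
from typing import Tuple

# Box corners as (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]


def normalize_box(box: Box, width: int, height: int) -> Box:
    """Pixel coordinates in the original image -> [0, 1] coordinates."""
    xmin, ymin, xmax, ymax = box
    return (xmin / width, ymin / height, xmax / width, ymax / height)


def to_resized_pixels(norm_box: Box, out_w: int = 300, out_h: int = 300) -> Box:
    """[0, 1] coordinates -> pixels in the fixed-shape (e.g. 300x300) input."""
    xmin, ymin, xmax, ymax = norm_box
    return (xmin * out_w, ymin * out_h, xmax * out_w, ymax * out_h)


if __name__ == "__main__":
    # A box in a 600x400 VOC image mapped onto a 300x300 network input.
    # Note that a fixed-shape resize does not preserve the aspect ratio.
    norm = normalize_box((100, 50, 300, 250), width=600, height=400)
    print(norm)
    print(to_resized_pixels(norm))
```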

So while training, you get several checkpoints, saved at an interval that you also set in the config.

And finally, when you want to use my API for inference, you need to export one of those checkpoints to a frozen model in the .pb format.

This frozen model can then be included in my API and addressed correctly in my config.yml.
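A minimal sketch of the export step, assuming the API's `object_detection/export_inference_graph.py` script (run from `models/research`), which writes `frozen_inference_graph.pb` into its output directory. The helper picks the newest `model.ckpt-<step>` prefix from a training directory listing; `tf.train.latest_checkpoint` does a similar job by reading the `checkpoint` file. All paths and the listing are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Training writes files such as model.ckpt-2000.index / .meta / .data-00000-of-00001.
_CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint_prefix(filenames: Iterable[str], train_dir: str) -> Optional[str]:
    """Return '<train_dir>/model.ckpt-<highest step>', or None if none is found."""
    steps = []
    for name in filenames:
        match = _CKPT_INDEX.match(name)
        if match:
            steps.append(int(match.group(1)))
    if not steps:
        return None
    return f"{train_dir.rstrip('/')}/model.ckpt-{max(steps)}"


def export_cmd(pipeline_config: str, ckpt_prefix: str, output_dir: str) -> List[str]:
    """Build argv for object_detection/export_inference_graph.py."""
    return [
        "python", "object_detection/export_inference_graph.py",
        "--input_type=image_tensor",
        f"--pipeline_config_path={pipeline_config}",
        f"--trained_checkpoint_prefix={ckpt_prefix}",
        f"--output_directory={output_dir}",  # frozen_inference_graph.pb lands here
    ]


if __name__ == "__main__":
    # Hypothetical listing; in practice pass os.listdir(train_dir).
    listing = ["checkpoint", "model.ckpt-1000.index",
               "model.ckpt-2000.index", "model.ckpt-2000.meta"]
    prefix = latest_checkpoint_prefix(listing, "training")
    print(" ".join(export_cmd("training/pipeline.config", prefix, "exported_model")))
```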

One more thing: make sure to use my checkpoint files as the starting point, because my speed hack (the split model + multithreading) only works if your model has exactly the same layer names as mine.
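One way to check this before enabling the split model is to compare the node names of your exported graph with those of the frozen graph shipped in `models/ssd_mobilenet_v11_coco/`. A minimal sketch: `load_node_names` assumes the TF 1.x API (`tf.GraphDef`, `tf.gfile.GFile`), while the comparison itself is plain Python; the toy name sets below are for illustration only.

```python
from typing import Iterable, List, Set, Tuple


def load_node_names(pb_path: str) -> Set[str]:
    """Read a frozen .pb and return its node names (TF 1.x API assumed)."""
    import tensorflow as tf  # imported lazily so the comparison works without TF

    graph_def = tf.GraphDef()
    with tf.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return {node.name for node in graph_def.node}


def compare_node_names(reference: Iterable[str],
                       candidate: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (names only in reference, names only in candidate), both sorted."""
    ref, cand = set(reference), set(candidate)
    return sorted(ref - cand), sorted(cand - ref)


if __name__ == "__main__":
    # Toy name sets; with real graphs call load_node_names(...) on both .pb files.
    ref = ["image_tensor", "Preprocessor/map/strided_slice"]
    mine = ["image_tensor", "Preprocessor/map/TensorArray_2"]
    missing, extra = compare_node_names(ref, mine)
    print("missing from mine:", missing)
    print("only in mine:", extra)
```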

I hope I could clarify some things for you.

Cheers
Gustav

@David-Lee-1990

David-Lee-1990 commented Jun 6, 2018

@gustavz Where is your checkpoint file? I trained on my own labeled data with TensorFlow's Object Detection API using your config file located in models/ssd_mobilenet_v11_coco/. After training, I replaced the frozen graph in models/ssd_mobilenet_v11_coco/ with mine.

When running inference, I get this error:

ValueError: Node 'Preprocessor/map/TensorArray_2': Unknown input node 'Preprocessor/map/strided_slice'

I wonder why my frozen graph has the Node 'Preprocessor/map/TensorArray_2' but your frozen graph does not.
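`import_graph_def` raises this error when a node in the GraphDef being imported lists an input that no node in that GraphDef defines, for example after some nodes have been removed from it. A minimal sketch that finds such dangling references before importing, on a plain `{node_name: [input, ...]}` mapping; with the TF 1.x API that mapping can be built as `{n.name: list(n.input) for n in graph_def.node}`. Input strings may carry a `^` prefix (control dependency) or a `:N` output suffix, which are stripped. The toy graph is illustrative only.

```python
from typing import Dict, List, Tuple


def source_node(input_name: str) -> str:
    """'^foo' (control input) or 'foo:1' (output 1 of foo) -> 'foo'."""
    return input_name.lstrip("^").split(":")[0]


def dangling_inputs(nodes: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (node, missing_input) pairs, i.e. what 'Unknown input node' reports."""
    problems = []
    for name, inputs in nodes.items():
        for inp in inputs:
            src = source_node(inp)
            if src not in nodes:
                problems.append((name, src))
    return problems


if __name__ == "__main__":
    # Toy graph mirroring the error message: strided_slice is not defined,
    # but TensorArray_2 still consumes it.
    graph = {
        "image_tensor": [],
        "Preprocessor/map/Shape": ["image_tensor"],
        "Preprocessor/map/TensorArray_2": ["Preprocessor/map/strided_slice"],
    }
    print(dangling_inputs(graph))
```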

@gustavz
Owner

gustavz commented Jun 6, 2018

@David-Lee-1990
Which version of the model zoo did you use (which date is appended at the end of the name)?
TensorFlow seems to have changed some layer names in versions newer than the one I used
(2017_11_17).
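Model zoo archives carry their release date as a name suffix, for example `ssd_mobilenet_v1_coco_2017_11_17`. A small sketch that reads that suffix and compares it with the 2017_11_17 version mentioned here; `zoo_date` and `matches_reference` are illustrative helper names.

```python
import re
from datetime import date
from typing import Optional

REFERENCE = date(2017, 11, 17)  # model zoo version used for this project


def zoo_date(model_name: str) -> Optional[date]:
    """Parse a trailing YYYY_MM_DD, optionally followed by .tar.gz."""
    match = re.search(r"(\d{4})_(\d{2})_(\d{2})(?:\.tar\.gz)?$", model_name)
    if match is None:
        return None
    year, month, day = (int(g) for g in match.groups())
    return date(year, month, day)


def matches_reference(model_name: str) -> bool:
    """True if the archive name carries the same date as REFERENCE."""
    return zoo_date(model_name) == REFERENCE


if __name__ == "__main__":
    for name in ["ssd_mobilenet_v1_coco_2017_11_17.tar.gz", "ssd_mobilenet_v1_coco"]:
        print(name, zoo_date(name), matches_reference(name))
```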

My checkpoint file is inside the model dir of ssd_mobilenet: https://github.com/GustavZ/realtime_object_detection/tree/master/models/ssd_mobilenet_v11_coco

With this checkpoint it should work, at least it did for my retrainings.

I hope i could help you!

@David-Lee-1990

@gustavz I retrained on my data using the config file and the model.ckpt files in your ssd_mobilenet model dir, but I still encounter the same problem (node 'Preprocessor/map/TensorArray_2'). Could this be caused by a difference in TensorFlow versions? My TensorFlow version is 1.8.

@gustavz
Owner

gustavz commented Jun 7, 2018

Yes, pretty sure.
There are so many changes between versions, which lead to strange behavior and errors.
I also keep switching versions whenever I run into errors.

Try TF 1.4; that’s where I started this project.

@David-Lee-1990

TF 1.4 no longer works for training with TensorFlow's Object Detection API because of "AttributeError: module 'tensorflow.contrib.data' has no attribute 'parallel_interleave'".

I tried TF 1.5 to retrain the model, but the resulting graph still has the node 'Preprocessor/map/TensorArray_2'.
This drives me crazy!

@AnthonyLabaere

AnthonyLabaere commented Jun 15, 2018

Hi @gustavz,

First of all thanks for your work. It's really great.

However I have the same problem :/

Traceback (most recent call last):
  File "...\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\importer.py", line 489, in import_graph_def
    graph._c_graph, serialized, options)  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Node 'Preprocessor/map/TensorArray_2': Unknown input node 'Preprocessor/map/strided_slice'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "...\realtime_object_detection-2.0\run_objectdetection.py", line 178, in <module>
    config.NUM_CLASSES,config.SPLIT_MODEL, config.SSD_SHAPE).prepare_od_model()
  File "...\realtime_object_detection-2.0\rod\model.py", line 157, in prepare_od_model
    self.load_frozenmodel()
  File "...\realtime_object_detection-2.0\rod\model.py", line 129, in load_frozenmodel
    tf.import_graph_def(remove, name='')
  File "...\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\util\deprecation.py", line 432, in new_func
    return func(*args, **kwargs)
  File "...\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\importer.py", line 493, in import_graph_def
    raise ValueError(str(e))
ValueError: Node 'Preprocessor/map/TensorArray_2': Unknown input node 'Preprocessor/map/strided_slice'

I trained my model with TF 1.8, replaced your configuration and model with mine, and tried a run.
The same issue occurs when I try a run with your release 1.0.

For information :

  • I ran object_detection.py on your release 1.0 and run_objectdetection.py on 2.0: both worked with your default configuration.
  • I ran object_detection_tutorial (from object_detection) with my model and it worked.

@AnthonyLabaere

OK, my bad: I turned off SPLIT_MODEL and it works now.

@gustavz
Owner

gustavz commented Jun 15, 2018

Don't use v2.0.
Use master.
I will update that next week.

@David-Lee-1990

@AnthonyLabaere Hi, so after turning on SPLIT_MODEL your model works now? Is the ValueError for node 'Preprocessor/map/TensorArray_2' gone?

@gustavz
Owner

gustavz commented Jun 15, 2018

Again: the split_model speed hack will ONLY work with ssd_mobilenet_v1 models that are exported from the exact same checkpoint that I used and published in /models.
TensorFlow, and also the SSDMetaArch inside models/object_detection, keep changing.

I have no insight into this, as I am not working with SSD anymore.
If you want to apply the speed hack to those models, you need to investigate on your own. Sorry.

But if you find a solution, you are very welcome to contribute / file a PR.

Gustav

@David-Lee-1990

@gustavz ok, thanks!

@AnthonyLabaere

AnthonyLabaere commented Jun 15, 2018

@David-Lee-1990 I just managed to get it working with my model on my computer (on Windows) and on my Raspberry (with some updates).
And yes, the issue with 'Preprocessor/map/TensorArray_2' is gone, because this part (with SPLIT_MODEL true) concerns the GPU.

@gustavz If I find a "real" solution I will make a PR, but for now I haven't found anything :/ OK, I will use master in the future.

@David-Lee-1990

@AnthonyLabaere Is your model trained with TensorFlow's Object Detection API? What do you mean by "'Preprocessor/map/TensorArray_2' is gone because this part concerns the GPU"?
I checked the frozen graph generated by TensorFlow and found that after the node 'TensorArray_2', the graph goes directly to the Batch-NMS nodes without feature extraction.

@AnthonyLabaere

@David-Lee-1990 Yes, it is trained with TensorFlow's Object Detection API.
Well, concerning 'Preprocessor/map/TensorArray_2', I spoke too fast. I don't know why the problem is gone, sorry.

How do you see that? With TensorBoard?
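TensorBoard can show this visually; a plain-text alternative is to walk the graph downstream from a node and list what consumes it. A minimal sketch on a `{node_name: [input, ...]}` mapping, which with the TF 1.x API can be built as `{n.name: list(n.input) for n in graph_def.node}`; the toy graph and its node names are illustrative only.

```python
from collections import deque
from typing import Dict, List


def consumers_map(nodes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert {node: inputs} into {node: nodes that read from it}."""
    out: Dict[str, List[str]] = {name: [] for name in nodes}
    for name, inputs in nodes.items():
        for inp in inputs:
            src = inp.lstrip("^").split(":")[0]  # drop control '^' and ':N' suffix
            out.setdefault(src, []).append(name)
    return out


def downstream(nodes: Dict[str, List[str]], start: str, max_depth: int = 3) -> List[str]:
    """Breadth-first list of nodes reachable from `start`, up to max_depth hops."""
    consumers = consumers_map(nodes)
    seen = {start}
    order: List[str] = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in consumers.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order


if __name__ == "__main__":
    toy = {
        "Preprocessor/map/TensorArray_2": [],
        "A": ["Preprocessor/map/TensorArray_2"],
        "B": ["A:0"],
        "C": ["B", "^A"],
    }
    print(downstream(toy, "Preprocessor/map/TensorArray_2"))
```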

@naisy

naisy commented Jun 16, 2018

Hi,

The split model hack is only available for ssd_mobilenet_v1 with a 300x300 input.
The node 'Preprocessor/map/TensorArray_2' appears when training with 600x600 images.

Set your ssd_mobilenet_v1_coco.config to a 300x300 size:

    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
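A quick way to confirm what a pipeline config actually sets is to read the `fixed_shape_resizer` values out of its text. A minimal sketch; the regex assumes `height` appears before `width`, as in the fragment above, and it is not a full protobuf text-format parser (`google.protobuf.text_format` together with the API's `pipeline_pb2` would be the more robust route).

```python
import re
from typing import Optional, Tuple

# Assumes "height" precedes "width" inside the fixed_shape_resizer block.
_RESIZER = re.compile(
    r"fixed_shape_resizer\s*\{\s*height:\s*(\d+)\s*width:\s*(\d+)")


def fixed_shape(config_text: str) -> Optional[Tuple[int, int]]:
    """Return (height, width) from the first fixed_shape_resizer block, if any."""
    match = _RESIZER.search(config_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    snippet = """
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    """
    size = fixed_shape(snippet)
    print(size, "is 300x300" if size == (300, 300) else "is not 300x300")
```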

See also:
https://github.com/tensorflow/models/issues/3270

@David-Lee-1990

@naisy Hi, have you tried this 300x300 config? In fact, my config has been set to 300x300 all along, but the error still occurs.

@naisy

naisy commented Jun 18, 2018

Hi @David-Lee-1990,

I checked the config just now: the config in the master branch has changed.
Please use the r1.5 branch for ssd_mobilenet_v1.

--- r1.5	2018-06-18 01:43:31.752331891 +0000
+++ master	2018-06-18 01:43:18.056376250 +0000
@@ -108,12 +108,10 @@
     loss {
       classification_loss {
         weighted_sigmoid {
-          anchorwise_output: true
         }
       }
       localization_loss {
         weighted_smooth_l1 {
-          anchorwise_output: true
         }
       }
       hard_example_miner {
@@ -193,5 +191,4 @@
   label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
   shuffle: false
   num_readers: 1
-  num_epochs: 1
 }
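A diff like the one above can be produced with Python's standard `difflib`, for example to compare your own pipeline config against the r1.5 sample before training. A minimal sketch; the labels default to the branch names used above, and the toy config strings are illustrative.

```python
import difflib
from typing import List


def config_diff(old_text: str, new_text: str,
                old_name: str = "r1.5", new_name: str = "master") -> List[str]:
    """Unified diff of two pipeline configs, one string per output line."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_name, tofile=new_name, lineterm=""))


if __name__ == "__main__":
    old = "weighted_sigmoid {\n  anchorwise_output: true\n}\n"
    new = "weighted_sigmoid {\n}\n"
    print("\n".join(config_diff(old, new)))
```

For real files, read both configs with `open(path).read()` and pass the strings in; an empty result means the configs are identical.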

My own training is here:
https://github.com/naisy/train_ssd_mobilenet

@David-Lee-1990

@naisy Thank you for your tips. Problem solved!

Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants