How to continuously evaluate a TensorFlow object detection model in parallel to training with model_main

Posted by 本秂侑毒 on 2020-01-01 17:27:21

Question


I successfully trained an object detection model with custom examples using train.py and eval.py. Running both programs in parallel, I was able to visualize training and evaluation metrics in TensorBoard during training.

However, both programs were moved to the legacy folder, and model_main.py seems to be the preferred way to run training and evaluation (by executing only a single process). When I start model_main.py with the following pipeline.config:

train_config {
  batch_size: 1
  num_steps: 40000
  ...
}
eval_config {
  # entire evaluation set
  num_examples: 821
  # for continuous evaluation
  max_evals: 0
  ...
}

With INFO logging enabled, the output of model_main.py shows that training and evaluation are executed sequentially (as opposed to concurrently, as before with two processes), and that a complete evaluation takes place after every single training step.

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 35932: ...
INFO:tensorflow:Saving checkpoints for 35933 into ...
INFO:tensorflow:Calling model_fn.
...
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-08-30-10:06:47
...
INFO:tensorflow:Restoring parameters from .../model.ckpt-35933
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [82/821]
...
INFO:tensorflow:Evaluation [738/821]
INFO:tensorflow:Evaluation [820/821]
INFO:tensorflow:Evaluation [821/821]
...
INFO:tensorflow:Finished evaluation at 2018-08-30-10:29:35
INFO:tensorflow:Saving dict for global step 35933: ...
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 35933: .../model.ckpt-35933
INFO:tensorflow:Saving checkpoints for 35934 into .../model.ckpt.
INFO:tensorflow:Calling model_fn.
...
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-08-30-10:29:56
...
INFO:tensorflow:Restoring parameters from .../model.ckpt-35934

This of course slows training down so much that almost no progress is made. When I reduce the evaluation steps to 1 with model_main's command line parameter --num_eval_steps, training is as fast as it was before (using train.py and eval.py), but the evaluation metrics become useless (e.g. the DetectionBoxes_Precision/mAP... values become constant and take values like 1, 0 or even -1). It seems to me that it keeps computing these values for the same single image.
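
To put a number on the slowdown, the timestamps in the log above can simply be subtracted. The snippet below is a minimal sketch that uses only the three timestamps and the example count quoted in this question; the variable names are illustrative.

```python
from datetime import datetime

# Timestamp format used by the "Starting/Finished evaluation at ..." log lines.
FMT = "%Y-%m-%d-%H:%M:%S"

eval_start = datetime.strptime("2018-08-30-10:06:47", FMT)       # first "Starting evaluation"
eval_end = datetime.strptime("2018-08-30-10:29:35", FMT)         # "Finished evaluation"
next_eval_start = datetime.strptime("2018-08-30-10:29:56", FMT)  # next "Starting evaluation"
NUM_EXAMPLES = 821  # num_examples from eval_config

# Duration of one full pass over the evaluation set.
eval_secs = (eval_end - eval_start).total_seconds()
# Time between the end of one evaluation and the start of the next,
# which is all that is left for checkpointing and the single training step.
gap_secs = (next_eval_start - eval_end).total_seconds()

print(f"one evaluation: {eval_secs:.0f} s ({eval_secs / NUM_EXAMPLES:.2f} s per example)")
print(f"gap until the next evaluation: {gap_secs:.0f} s")
```

So in this log roughly 23 minutes of evaluation are followed by about 21 seconds of everything else before the next evaluation begins.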

So what is the right way to start model_main.py such that it makes reasonably fast progress and, in parallel, computes the evaluation metrics from the entire evaluation set?


Answer 1:


Inside training.py there is a class EvalSpec, which is instantiated in model_lib.py. Its constructor has a parameter called throttle_secs, which sets the interval between consecutive evaluations and defaults to 600 seconds; model_lib.py never passes a different value. If you want a specific value, you can simply change the default, but the better practice is of course to pass it as a parameter of model_main.py, which then feeds EvalSpec through model_lib.py.
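
As a rough way to choose a value, here is a back-of-the-envelope sketch. It is a simplified model, not TensorFlow's actual scheduling logic: it assumes that evaluation blocks training while it runs (as the log in the question suggests) and that a new evaluation starts no sooner than throttle_secs after the previous one started. The function name training_fraction is made up for this illustration. The 1368 s figure is the evaluation duration from the question's log (10:06:47 to 10:29:35).

```python
def training_fraction(eval_secs: float, throttle_secs: float) -> float:
    """Share of wall-clock time left for training in a simplified model.

    Assumptions (not taken from the TensorFlow source):
      * evaluation blocks training while it runs;
      * the next evaluation starts throttle_secs after the previous one
        started, or right after it finishes if it took longer than that.
    """
    cycle = max(eval_secs, throttle_secs)  # time from one evaluation start to the next
    return (cycle - eval_secs) / cycle


EVAL_SECS = 1368  # duration of one full evaluation in the question's log

for throttle in (600, 1800, 3600, 7200):
    share = training_fraction(EVAL_SECS, throttle)
    print(f"throttle_secs={throttle:>5}: about {share:.0%} of the time is spent training")
```

Under these assumptions, any throttle_secs below the evaluation duration leaves essentially no time for training, which matches the behaviour described in the question with the default of 600.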

In more detail: define it as another input flag in model_main.py with flags.DEFINE_integer('throttle_secs', <DEFAULT_VALUE>, 'EXPLANATION'), pass it on as throttle_secs=FLAGS.throttle_secs, then change model_lib.create_train_and_eval_specs to also accept throttle_secs and, inside it, add it to the call of tf.estimator.EvalSpec.
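
The plumbing can be sketched as follows. To keep the snippet runnable without TensorFlow installed, it uses stand-ins: argparse takes the place of the flags module used by model_main.py, and EvalSpecStandIn is a hypothetical dataclass that mimics only three fields of tf.estimator.EvalSpec. The helper create_eval_spec is likewise made up for illustration; in the real patch the extra argument would go into model_lib.create_train_and_eval_specs.

```python
import argparse
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalSpecStandIn:
    """Hypothetical stand-in for tf.estimator.EvalSpec (only three fields)."""
    input_fn: Callable
    steps: int = 100
    throttle_secs: int = 600  # default mentioned in the answer above


def create_eval_spec(eval_input_fn, throttle_secs, eval_spec_cls=EvalSpecStandIn):
    """Forward throttle_secs to the EvalSpec constructor.

    In the real code this argument would be added to
    model_lib.create_train_and_eval_specs, with eval_spec_cls being
    tf.estimator.EvalSpec.
    """
    return eval_spec_cls(input_fn=eval_input_fn, throttle_secs=throttle_secs)


def parse_args(argv=None):
    # Stand-in for flags.DEFINE_integer('throttle_secs', ...) in model_main.py.
    parser = argparse.ArgumentParser()
    parser.add_argument("--throttle_secs", type=int, default=600,
                        help="Minimum number of seconds between evaluations.")
    return parser.parse_args(argv)


# Equivalent of running: python model_main.py --throttle_secs=3600 ...
args = parse_args(["--throttle_secs", "3600"])
eval_spec = create_eval_spec(lambda: None, throttle_secs=args.throttle_secs)
print(eval_spec.throttle_secs)
```

The point of the design is that the value travels from the command line to the EvalSpec constructor without editing any default inside the library.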

EDIT: I found out that you can also set eval_interval_secs in the eval_config of the .config file. If this works (not all flags are supported since the move from eval.py to model_main.py), it is obviously the simpler solution. If not, use the solution above.

EDIT 2: I tried eval_interval_secs in eval_config and it did not work, so you should use the first solution.



Source: https://stackoverflow.com/questions/52099724/how-to-continouosly-evaluate-a-tensorflow-object-detection-model-in-parallel-to
