Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [585,1024,3], [batch]: [600,799,3]

Submitted by ╄→гoц情女王★ on 2020-12-10 04:07:19

Question


I am trying to train a model. At first I had a dataset of 5,000 images and training worked fine. I have since added some more images, and my dataset now contains 6,423 images. I am using Python 3.6.1 on Ubuntu 18.04 with TensorFlow 1.15 and NumPy 1.16 (I had the same versions before, and it worked fine). Now when I run:

python model_main.py --logtostderr --pipeline_config_path=training/faster_rcnn_resnet50_coco.config --model_dir=training

It spends a couple of minutes setting up, and after these lines:

INFO:tensorflow:Saving checkpoints for 0 into training/model.ckpt. 
I1123 10:26:21.548237 140482563244160 basic_session_run_hooks.py:606] Saving checkpoints for 0 into training/model.ckpt. 
2019-11-23 10:28:30.801453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0 

I get the following errors:

2019-11-23 10:08:38.843259: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_3_hash_table_2/N10tensorflow6lookup15LookupInterfaceE does not exist.               
2019-11-23 10:08:38.843323: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_1_hash_table_1/N10tensorflow6lookup15LookupInterfaceE does not exist.               
2019-11-23 10:08:38.843345: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_2_hash_table/N10tensorflow6lookup15LookupInterfaceE does not exist.                 
2019-11-23 10:08:38.851405: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_3_hash_table_2/N10tensorflow6lookup15LookupInterfaceE does not exist.               
2019-11-23 10:08:38.851488: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_1_hash_table_1/N10tensorflow6lookup15LookupInterfaceE does not exist.               
2019-11-23 10:08:38.851512: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_2_hash_table/N10tensorflow6lookup15LookupInterfaceE does not exist.                 
2019-11-23 10:08:38.851807: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_1_hash_table_1/N10tensorflow6lookup15LookupInterfaceE does not exist.               
2019-11-23 10:08:38.851848: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_2_hash_table/N10tensorflow6lookup15LookupInterfaceE does not exist.                 
2019-11-23 10:08:38.851899: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_3_hash_table_2/N10tensorflow6lookup15LookupInterfaceE does not exist.               
Traceback (most recent call last):                                                                                                                                                                                                             
File "/usr/local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call                                                                                                                                 
 return fn(*args)                                                                                                                                                                                                                           
File "/usr/local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn                                                                                                                                  
 target_list, run_metadata)                                                                                                                                                                                                                 
File "/usr/local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun                                                                                                                      
 run_metadata)                                                                                                                                                                                                                            
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.                                                                                                                                                           
(0) Invalid argument: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [585,1024,3], [batch]: [600,799,3]                                                                                                   
[[{{node IteratorGetNext}}]]                                                                                                                                                                                                                 
[[ToAbsoluteCoordinates_118/Assert/AssertGuard/Assert/data_0/_5709]]                                                                                                                                                                  
(1) Invalid argument: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [585,1024,3], [batch]: [600,799,3]                                                                                                   
[[{{node IteratorGetNext}}]]                                                                                                                                                                                                        
0 successful operations.                                                                                                                                                                                                                     
0 derived errors ignored. 

and training stops.


Answer 1:


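It seems that some of the new images you've added end up with a different shape from the rest: the error shows one image tensor of shape [585,1024,3] being added to a batch whose images are [600,799,3] (height, width, channels). All images in a batch must have the same shape.

If so, then the solution is to resize these new images so that every image has the same dimensions.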
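
To find which images are the odd ones out before resizing, it may help to tally the dimensions across the dataset. The sketch below is only an illustration and not part of the original answer: the `find_size_outliers` helper, the `example_sizes` mapping, and the file names are all hypothetical, and in practice the (height, width) pairs would be read from the image files with an imaging library.

```python
from collections import Counter


def find_size_outliers(sizes):
    """Return the most common (height, width) and the files that differ from it.

    `sizes` maps a file name to its (height, width) tuple.
    """
    if not sizes:
        return None, []
    counts = Counter(sizes.values())
    most_common_size, _ = counts.most_common(1)[0]
    # Collect every file whose size is not the dominant one.
    outliers = sorted(
        name for name, size in sizes.items() if size != most_common_size
    )
    return most_common_size, outliers


# Hypothetical dataset: two images share a size, one does not.
example_sizes = {
    "img_0001.jpg": (600, 799),
    "img_0002.jpg": (600, 799),
    "img_5001.jpg": (585, 1024),
}
common_size, odd_files = find_size_outliers(example_sizes)
print(common_size, odd_files)
```

Any file reported as an outlier is a candidate for resizing or for closer inspection.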




Answer 2:


Changing the batch_size to 1 fixed this issue for me.




Answer 3:


If you need a batch size greater than 1, you can resize the images to a uniform size by choosing a suitable image_resizer in the config. The available resizers are defined in the image_resizer protobuf file, which I assume is what is used to parse that part of the config.

For example (borrowed from another source):

image_resizer {
  fixed_shape_resizer {
    height: 600
    width: 800
  }
}

This seems to fix the problem for me.
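
The two shapes in the error message are consistent with a resizer that preserves aspect ratio rather than forcing a fixed shape. The sketch below is a simplified, pure-Python approximation of that behaviour. It assumes the config originally used a keep_aspect_ratio_resizer with min_dimension 600 and max_dimension 1024, which the question does not show. The function names and the input sizes are hypothetical, chosen only to reproduce the shapes from the error.

```python
def keep_aspect_ratio_shape(height, width, min_dim=600, max_dim=1024):
    """Approximate output (height, width) of an aspect-ratio-preserving resize."""
    # Scale so that the shorter side equals min_dim.
    scale = min_dim / min(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    # If the longer side would overshoot max_dim, scale to max_dim instead.
    if max(new_h, new_w) > max_dim:
        scale = max_dim / max(height, width)
        new_h, new_w = round(height * scale), round(width * scale)
    return new_h, new_w


def fixed_shape(height, width, out_height=600, out_width=800):
    """A fixed-shape resize ignores the input size entirely."""
    return out_height, out_width


# Hypothetical source images with different aspect ratios.
inputs = [(1200, 1598), (1170, 2048)]
print([keep_aspect_ratio_shape(h, w) for h, w in inputs])
# [(600, 799), (585, 1024)]
print([fixed_shape(h, w) for h, w in inputs])
# [(600, 800), (600, 800)]
```

Under these assumptions, the aspect-ratio-preserving resize leaves the two images with different shapes, so they cannot share a batch, while the fixed-shape resize maps both to 600x800.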



Source: https://stackoverflow.com/questions/59006696/cannot-add-tensor-to-the-batch-number-of-elements-does-not-match-shapes-are
