Estimator.predict() has Shape Issues?

Anonymous (unverified), submitted 2019-12-03 01:25:01

Question:

I can train and evaluate a TensorFlow Estimator model without any problems. When I run prediction, this error occurs:

    InvalidArgumentError (see above for traceback): output_shape has incorrect number of elements: 68 should be: 2
         [[Node: output = SparseToDense[T=DT_INT32, Tindices=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ToInt32, ToInt32_1, ToInt32_2, bidirectional_rnn/bidirectional_rnn/fw/fw/time)]]

All of the model functions use the same architecture:

    def _train_model_fn(features, labels, mode, params):
        features = _network_fn(features, mode, params)
        outputs = _get_output(features, params["output_layer"],
                              params["num_classes"])
        predictions = {
            "outputs": outputs
        }

        ... # loss initialization and whatnot

    def _eval_model_fn(features, labels, mode, params):
        features = _network_fn(features, mode, params)
        outputs = _get_output(features, params["output_layer"], params["num_classes"])
        predictions = {
            "outputs": outputs
        }

        ... # loss initialization and whatnot

    def _predict_model_fn(features, mode, params):
        features = _network_fn(features, mode, params)
        outputs = _get_output(features, params["output_layer"], params["num_classes"])
        predictions = {
            "outputs": outputs
        }

        ...

Here's the predict code:

    def predict(params, features, checkpoint_dir):
        estimator = tf.estimator.Estimator(model_fn=_predict_model_fn,
                                           params=params,
                                           model_dir=checkpoint_dir)
        predictions = estimator.predict(input_fn=_input_fn(features))
        for i, p in enumerate(predictions):
            print(i, p)

I also printed the output shape of every layer during training and during prediction. The shapes are identical:

Training:

    conv2d                [1, 358, 358, 16]
    max_pool2d            [1, 179, 179, 16]
    collapse_to_rnn_dims  [1, 179, 2864]
    birnn                 [1, 179, 64]

Prediction:

    conv2d                [1, 358, 358, 16]
    max_pool2d            [1, 179, 179, 16]
    collapse_to_rnn_dims  [1, 179, 2864]
    birnn                 [1, 179, 64]
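The numbers in these traces are consistent with one another. The following pure-Python sketch reproduces the arithmetic under assumptions the question does not state: 2x2 max pooling with stride 2, the width axis kept as the RNN time axis with height and channels flattened together, and 32 units per RNN direction. The helper functions are hypothetical and only model shapes.

```python
# Hypothetical shape arithmetic for the layer trace (no TensorFlow needed).
# Assumptions: 2x2 'valid' max pooling with stride 2, width kept as the time
# axis, and a bidirectional RNN that concatenates two 32-unit outputs.

def max_pool_shape(shape, pool=2, stride=2):
    """Output shape of a 'valid' max pool over the two spatial axes (NHWC)."""
    n, h, w, c = shape
    return [n, (h - pool) // stride + 1, (w - pool) // stride + 1, c]

def collapse_to_rnn_dims_shape(shape):
    """[batch, height, width, channels] -> [batch, width, height * channels]."""
    n, h, w, c = shape
    return [n, w, h * c]

def birnn_shape(shape, units_per_direction=32):
    """Forward and backward outputs concatenated on the feature axis."""
    n, t, _ = shape
    return [n, t, 2 * units_per_direction]

conv = [1, 358, 358, 16]
pooled = max_pool_shape(conv)
rnn_in = collapse_to_rnn_dims_shape(pooled)
rnn_out = birnn_shape(rnn_in)
print(pooled, rnn_in, rnn_out)
```

Under these assumptions the sketch prints [1, 179, 179, 16] [1, 179, 2864] [1, 179, 64], matching both traces (179 * 16 = 2864), so the decoder sees 179 time steps in every mode.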

Here are the SparseTensors I passed to sparse_to_dense:

Training:

SparseTensor(indices=Tensor("CTCBeamSearchDecoder:0", shape=(?, 2), dtype=int64), values=Tensor("CTCBeamSearchDecoder:1", shape=(?,), dtype=int64), dense_shape=Tensor("CTCBeamSearchDecoder:2", shape=(2,), dtype=int64)) 

Evaluation:

SparseTensor(indices=Tensor("CTCBeamSearchDecoder:0", shape=(?, 2), dtype=int64), values=Tensor("CTCBeamSearchDecoder:1", shape=(?,), dtype=int64), dense_shape=Tensor("CTCBeamSearchDecoder:2", shape=(2,), dtype=int64)) 

Prediction:

SparseTensor(indices=Tensor("CTCBeamSearchDecoder:0", shape=(?, 2), dtype=int64), values=Tensor("CTCBeamSearchDecoder:1", shape=(?,), dtype=int64), dense_shape=Tensor("CTCBeamSearchDecoder:2", shape=(2,), dtype=int64)) 

They are all identical.
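The static shapes above say that `indices` is (N, 2), `values` is (N,), and `dense_shape` is (2,). N, the number of decoded labels, varies with the input, while `dense_shape` always has exactly two entries, [batch_size, max_decoded_length]. A minimal pure-Python model of how the three components combine into a dense matrix follows; the function name and the example data are invented for illustration.

```python
# Pure-Python model of a rank-2 SparseTensor's components (no TensorFlow).
# The example labels are made up for illustration.

def sparse_to_dense_rows(indices, values, dense_shape, default_value=0):
    """Scatter values into a dense nested list of shape dense_shape (rank 2)."""
    rows, cols = dense_shape
    dense = [[default_value] * cols for _ in range(rows)]
    for (r, c), v in zip(indices, values):
        dense[r][c] = v
    return dense

# Two sequences in the batch: labels [3, 1, 4] and [1, 5].
indices = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]  # shape (N, 2), N = 5
values = [3, 1, 4, 1, 5]                             # shape (N,)
dense_shape = [2, 3]                                 # shape (2,): [batch, max_len]

dense = sparse_to_dense_rows(indices, values, dense_shape)
print(dense)  # [[3, 1, 4], [1, 5, 0]]
```

Shorter sequences are padded with `default_value`, which is why the dense result needs the two-element `dense_shape` rather than the N-element `values`.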

Any idea why this is happening? Shouldn't _predict_model_fn work, given that it follows the same architecture as the other model_fns?

Here's the full stacktrace:

    INFO:tensorflow:Using default config.
    INFO:tensorflow:Using config: {'_log_step_count_steps': 100, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_is_chief': True, '_service': None, '_save_summary_steps': 100, '_model_dir': 'checkpoint\\model-20180419-150303', '_task_id': 0, '_evaluation_master': '', '_tf_random_seed': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x00000091F58B3080>, '_num_ps_replicas': 0, '_master': '', '_save_checkpoints_secs': 600, '_session_config': None, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_global_id_in_cluster': 0, '_num_worker_replicas': 1}
    INFO:tensorflow:Calling model_fn.
    INFO:tensorflow:Done calling model_fn.
    INFO:tensorflow:Graph was finalized.
    INFO:tensorflow:Restoring parameters from checkpoint\model-20180419-150303\model.ckpt-1
    INFO:tensorflow:Running local_init_op.
    INFO:tensorflow:Done running local_init_op.
    Process Process-2:
    Traceback (most recent call last):
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1361, in _do_call
        return fn(*args)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _run_fn
        target_list, status, run_metadata)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
        c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.InvalidArgumentError: output_shape has incorrect number of elements: 68 should be: 2
         [[Node: output = SparseToDense[T=DT_INT32, Tindices=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ToInt32, ToInt32_1, ToInt32_2, bidirectional_rnn/bidirectional_rnn/fw/fw/time)]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Users\asus.11\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
        self.run()
      File "C:\Users\asus.11\Anaconda3\lib\multiprocessing\process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\train_ocr.py", line 42, in evaluate_model
        evaluate(architecture_params, images, labels, checkpoint_dir)
      File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 82, in evaluate
        predict(params, features, checkpoint_dir)
      File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 90, in predict
        for i, p in enumerate(predictions):
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 492, in predict
        preds_evaluated = mon_sess.run(predictions)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 546, in run
        run_metadata=run_metadata)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1022, in run
        run_metadata=run_metadata)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1113, in run
        raise six.reraise(*original_exc_info)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\six.py", line 693, in reraise
        raise value
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1098, in run
        return self._sess.run(*args, **kwargs)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1170, in run
        run_metadata=run_metadata)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 950, in run
        return self._sess.run(*args, **kwargs)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 905, in run
        run_metadata_ptr)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1137, in _run
        feed_dict_tensor, options, run_metadata)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1355, in _do_run
        options, run_metadata)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1374, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: output_shape has incorrect number of elements: 68 should be: 2
         [[Node: output = SparseToDense[T=DT_INT32, Tindices=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ToInt32, ToInt32_1, ToInt32_2, bidirectional_rnn/bidirectional_rnn/fw/fw/time)]]

    Caused by op 'output', defined at:
      File "<string>", line 1, in <module>
      File "C:\Users\asus.11\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
        exitcode = _main(fd)
      File "C:\Users\asus.11\Anaconda3\lib\multiprocessing\spawn.py", line 119, in _main
        return self._bootstrap()
      File "C:\Users\asus.11\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
        self.run()
      File "C:\Users\asus.11\Anaconda3\lib\multiprocessing\process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\train_ocr.py", line 42, in evaluate_model
        evaluate(architecture_params, images, labels, checkpoint_dir)
      File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 82, in evaluate
        predict(params, features, checkpoint_dir)
      File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 90, in predict
        for i, p in enumerate(predictions):
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 479, in predict
        features, None, model_fn_lib.ModeKeys.PREDICT, self.config)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 793, in _call_model_fn
        model_fn_results = self._model_fn(features=features, **kwargs)
      File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 217, in _predict_model_fn
        outputs = _get_output(features, params["output_layer"], params["num_classes"])
      File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 134, in _get_output
        return _sparse_to_dense(decoded, name="output")
      File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 38, in _sparse_to_dense
        name=name)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\ops\sparse_ops.py", line 791, in sparse_to_dense
        name=name)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_sparse_ops.py", line 2401, in _sparse_to_dense
        name=name)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
        op_def=op_def)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3271, in create_op
        op_def=op_def)
      File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1650, in __init__
        self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

    InvalidArgumentError (see above for traceback): output_shape has incorrect number of elements: 68 should be: 2
         [[Node: output = SparseToDense[T=DT_INT32, Tindices=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ToInt32, ToInt32_1, ToInt32_2, bidirectional_rnn/bidirectional_rnn/fw/fw/time)]]

Update

When I used the same architecture in a different training run, I got a slightly different shape error:

    InvalidArgumentError (see above for traceback): output_shape has incorrect number of elements: 69 should be: 2
         [[Node: output = SparseToDense[T=DT_INT32, Tindices=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ToInt32, ToInt32_1, ToInt32_2, bidirectional_rnn/bidirectional_rnn/fw/fw/time)]]

The problem appears to lie in ctc_beam_search_decoder, but switching to ctc_greedy_decoder doesn't help either. Why is this happening?
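Both decoders return a SparseTensor whose `values` length is the number of decoded labels, so that count depends on the data and on the trained weights, not only on the architecture. The simplified greedy (best-path) decoder below, written in pure Python, illustrates this. It is not TensorFlow's implementation; it assumes, as TensorFlow's CTC decoders do, that the blank is the last class, and the helper names are hypothetical.

```python
# Simplified greedy (best-path) CTC decoding in pure Python; an illustration,
# not TensorFlow's implementation. Assumption: the blank label is the last
# class (num_classes - 1), and repeats are merged before blanks are dropped.

def greedy_ctc_decode(batch_scores):
    """batch_scores[b][t][k] is the score of class k at time step t of sequence b.

    Returns (indices, values, dense_shape) describing a rank-2 sparse result.
    """
    indices, values, max_len = [], [], 0
    for b, seq in enumerate(batch_scores):
        num_classes = len(seq[0])
        blank = num_classes - 1
        # Best class per time step.
        best = [max(range(num_classes), key=lambda k: step[k]) for step in seq]
        decoded, prev = [], None
        for label in best:
            # Keep a label only if it differs from the previous step and is not blank.
            if label != prev and label != blank:
                decoded.append(label)
            prev = label
        for t, label in enumerate(decoded):
            indices.append((b, t))
            values.append(label)
        max_len = max(max_len, len(decoded))
    return indices, values, [len(batch_scores), max_len]

def one_hot(k, n=3):
    return [1.0 if i == k else 0.0 for i in range(n)]

# Two single-sequence inputs with four time steps each (classes 0, 1; blank = 2).
run_a = [[one_hot(0), one_hot(0), one_hot(2), one_hot(1)]]
run_b = [[one_hot(0), one_hot(2), one_hot(0), one_hot(1)]]

ia, va, sa = greedy_ctc_decode(run_a)
ib, vb, sb = greedy_ctc_decode(run_b)
print(len(va), sa)  # 2 [1, 2]
print(len(vb), sb)  # 3 [1, 3]
```

With the same number of time steps, the two inputs decode to 2 and 3 labels, while `dense_shape` has two entries in both cases. That pattern is consistent with the first number in the error message changing from 68 to 69 between runs while the expected value stays at 2.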

More updates

I have uploaded the reproducible example: https://github.com/selcouthlyBlue/ShapeErrorReproduce

Answer 1:

I finally figured out the error. The problem was in how I called sparse_to_dense: I passed the arguments in the wrong order, with the values before the shape:

    return tf.sparse_to_dense(tf.to_int32(decoded[0].indices),      # sparse_indices
                              tf.to_int32(decoded[0].values),       # lands in output_shape
                              tf.to_int32(decoded[0].dense_shape),  # lands in sparse_values
                              name="output")

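tf.sparse_to_dense expects (sparse_indices, output_shape, sparse_values). With the values in the output_shape slot, the op received a shape with one element per decoded label (68) instead of the two-element [batch_size, max_decoded_length] shape it needs for rank-2 indices, which is exactly what the error reports. The correct order puts the shape before the values: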

    return tf.sparse_to_dense(tf.to_int32(decoded[0].indices),      # sparse_indices
                              tf.to_int32(decoded[0].dense_shape),  # output_shape
                              tf.to_int32(decoded[0].values),       # sparse_values
                              name="output")
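The failure can be reproduced without TensorFlow. The following hypothetical pure-Python stand-in (not the TF op) takes its arguments in the same positional order as TF 1.x `tf.sparse_to_dense(sparse_indices, output_shape, sparse_values, default_value=0)` and imitates the shape check, which shows why swapping the last two arguments reports a count of values where a rank was expected.

```python
# Hypothetical pure-Python stand-in (not TensorFlow code) that mirrors the
# positional order of TF 1.x tf.sparse_to_dense:
#   (sparse_indices, output_shape, sparse_values, default_value=0)
# Only rank-2 inputs are handled, matching the CTC decoder output.

def sparse_to_dense_sketch(sparse_indices, output_shape, sparse_values, default_value=0):
    rank = len(sparse_indices[0])  # number of columns in the (N, rank) indices
    if len(output_shape) != rank:
        # Wording imitates the TensorFlow error from the question.
        raise ValueError("output_shape has incorrect number of elements: "
                         f"{len(output_shape)} should be: {rank}")
    rows, cols = output_shape
    dense = [[default_value] * cols for _ in range(rows)]
    for (r, c), v in zip(sparse_indices, sparse_values):
        dense[r][c] = v
    return dense

indices = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]  # 5 decoded labels
values = [3, 1, 4, 1, 5]
shape = [2, 3]  # [batch_size, max_decoded_length]

# Correct order: indices, shape, values.
correct = sparse_to_dense_sketch(indices, shape, values)
print(correct)  # [[3, 1, 4], [1, 5, 0]]

# Swapped order, as in the original call: values land in the output_shape slot.
try:
    sparse_to_dense_sketch(indices, values, shape)
    swapped_error = None
except ValueError as err:
    swapped_error = str(err)
print(swapped_error)  # output_shape has incorrect number of elements: 5 should be: 2
```

Here five decoded labels produce a five-element "shape", just as 68 decoded labels produced the original message. Passing keyword arguments (`sparse_indices=`, `output_shape=`, `sparse_values=`), or converting the decoder output with `tf.sparse_tensor_to_dense` (TF 1.x), removes the dependence on positional order.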

