How to restore a TensorFlow v1.1.0 saved model in v1.13.1

Posted by 你离开我真会死。 on 2021-01-27 16:08:46

Question


I'm trying to restore the pretrained model provided here and continue training on a different dataset. The pretrained models available there were trained with tensorflow_gpu-1.1.0, but I have tensorflow_gpu-1.13.1. When I try to restore the model, I get the error below.

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint.

Is it possible to convert the model to the current TensorFlow version? I tried a script provided here, but had no luck.

If conversion is not possible, I'm fine with using the older TensorFlow version. However, I can't get the old version installed properly either. The command given on the GitHub page is:

pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp27-none-linux_x86_64.whl

But when I install TensorFlow with the above command and then import it, I get the error below:

Python 2.7.16 |Anaconda, Inc.| (default, Aug 22 2019, 16:00:36) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 51, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcublas.so.8.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

If I install tensorflow-1.1.0 using conda, the import works, but restoring the model fails again with the same error:

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint.

Kindly help!


Answer 1:


I am really not sure whether it is possible to port a model, but I will share how I solved this issue.

First, you should be able to build the whole graph independently of the TensorFlow version; if any errors occur there, they should be minor. Then you can simply copy all variables from your old model to the new one with:

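import tensorflow as tf  # TF 1.x API

RESTORE_VARS_CKPT = '/path/to/old/model.ckpt'  # placeholder: prefix of the old checkpoint
RESTORE_VARS_BLACKLIST = ['dont', 'load', 'this']  # checkpoint variable names to skip

# Map every variable name stored in the checkpoint to its shape.
ckpt_shapes = dict(tf.train.list_variables(RESTORE_VARS_CKPT))
assign_ops = []
for dst_var in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES):
    name = dst_var.name.split(":")[0]  # strip the ":0" output suffix
    # Copy only variables whose name and shape both match the checkpoint.
    if (name in ckpt_shapes and name not in RESTORE_VARS_BLACKLIST
            and dst_var.shape.as_list() == list(ckpt_shapes[name])):
        value = tf.train.load_variable(RESTORE_VARS_CKPT, name)
        assign_ops.append(tf.assign(dst_var, value))

# Run the assign ops in a session (`sess` is assumed to be an existing tf.Session
# for the newly built graph)
sess.run(assign_ops)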

At the end, just save your new model.
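
Note that the loop above silently skips any graph variable whose name or shape does not match the checkpoint, so it can help to check what would actually be copied before relying on the result. The following is only a rough sketch of such a check in plain Python: the helper name `diff_variables` is hypothetical, and the variable names in the toy example are made up rather than taken from the real model. The inputs are assumed to be lists of (name, shape) pairs, which is the format `tf.train.list_variables` returns for a checkpoint.

```python
def diff_variables(ckpt_vars, graph_vars, blacklist=()):
    """Split graph variables into copyable, missing and shape-mismatched names.

    ckpt_vars:  iterable of (name, shape) pairs, e.g. tf.train.list_variables(ckpt)
    graph_vars: iterable of (name, shape) pairs built from the new graph, e.g.
                [(v.name.split(":")[0], v.shape.as_list())
                 for v in tf.global_variables()]
    blacklist:  names that should be ignored entirely
    """
    ckpt_shapes = {name: list(shape) for name, shape in ckpt_vars}
    copyable, missing, mismatched = [], [], []
    for name, shape in graph_vars:
        if name in blacklist:
            continue
        if name not in ckpt_shapes:
            missing.append(name)        # no variable with this name in the checkpoint
        elif ckpt_shapes[name] != list(shape):
            mismatched.append(name)     # same name, different shape
        else:
            copyable.append(name)       # safe to copy with tf.assign
    return copyable, missing, mismatched


# Toy example with made-up variable names (not taken from the real model).
ckpt = [("conv1/weights", [3, 3, 1, 8]), ("conv1/biases", [8]), ("fc/weights", [128, 10])]
graph = [("conv1/weights", [3, 3, 1, 8]), ("conv1/bias", [8]), ("fc/weights", [128, 5])]
copyable, missing, mismatched = diff_variables(ckpt, graph)
print("copyable:  ", copyable)
print("missing:   ", missing)
print("mismatched:", mismatched)
```

Names that end up in `missing` are the likely candidates behind the NotFoundError; they would stay at their initial values unless they are mapped to the checkpoint names by hand.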



Source: https://stackoverflow.com/questions/57816305/how-to-restore-tensorflow-v1-1-0-saved-model-in-v1-13-1
