Google cloudml Always Gives Me The Same Results

Submitted by 只谈情不闲聊 on 2019-12-06 07:30:33
rhaertel80

There are several possible causes of this issue. The first that comes to mind is that the weights in your model are being initialized to zero when it is imported. This can happen if an initialization op is defined in the graph (cf. the loader). To check for this, run the following:

from tensorflow.contrib.session_bundle import session_bundle

session, _ = session_bundle.load_session_bundle_from_path("/path/to/model")
print(session.graph.get_collection("serving_init_op"))

If there is something in that collection, make sure that it isn't initializing variables.

If there are no initializers, make sure the weights themselves look reasonable, e.g.,

session, _ = session_bundle.load_session_bundle_from_path("/path/to/model")
print(session.run("name_of_var:0"))
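
As a rough way to judge whether the weights "look reasonable", the sketch below summarizes a flat list of values and flags the obvious failure modes (all zeros, NaN or infinite entries). The helper name summarize_weights is hypothetical and not part of TensorFlow; it assumes you have already flattened the array returned by session.run into plain Python floats, for instance with .ravel().tolist() on a NumPy array.

```python
import math


def summarize_weights(values):
    """Return basic statistics for a flat sequence of weight values.

    Hypothetical helper for illustration; not part of any library.
    """
    values = [float(v) for v in values]
    if not values:
        raise ValueError("no values to summarize")

    # Separate out NaN/inf entries so they do not poison the statistics.
    finite = [v for v in values if math.isfinite(v)]
    if finite:
        mean = sum(finite) / len(finite)
        variance = sum((v - mean) ** 2 for v in finite) / len(finite)
        lo, hi, std = min(finite), max(finite), math.sqrt(variance)
    else:
        mean = lo = hi = std = float("nan")

    return {
        "count": len(values),
        "non_finite": len(values) - len(finite),
        # True only when every entry is exactly zero.
        "all_zero": all(v == 0.0 for v in values),
        "min": lo,
        "max": hi,
        "mean": mean,
        "std": std,
    }
```

With a loaded session, something like summarize_weights(session.run("name_of_var:0").ravel().tolist()) would produce the summary. A variable that reports all_zero as True, or a standard deviation of exactly zero, suggests the trained values were overwritten or never restored.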

If all of that checks out, then you'll probably want to pay attention to the inputs to the graph and the output after transforming those inputs. To this end, you can use session.run to run parts of the graph. For instance, you can feed a jpeg string and view the output of various steps along the way by using the appropriate feeds and fetches in a call to session.run.

For example, using the example from this post, we can load a JPEG from disk, feed it to the graph, and see what the data looks like after resizing and after scaling:

INPUT_PLACEHOLDER = 'Placeholder:0'
DECODE_AND_RESIZE = 'map/TensorArrayPack_1/TensorArrayGather:0'
SCALED = 'Mul:0'

# Read in a sample image, preferably with small dimensions.
with open("/tmp/testing22222.jpg", "rb") as f:
    jpg = f.read()

session, _ = session_bundle.load_session_bundle_from_path("/path/to/model")
resized, scaled = session.run([DECODE_AND_RESIZE, SCALED], feed_dict={INPUT_PLACEHOLDER: [jpg]})

By strategically placing the names of tensors in your graph in the fetch list, you can inspect what is going on in any given layer of the neural net, although the most likely problems reside with the inputs and/or variables.

The tricky part is figuring out the names of tensors. You can use the name property when defining most operations, which might be helpful. You can also use something like:

print([o.name for o in session.graph.get_operations()])

This lists the operations in the graph so you can inspect them.
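
A real graph can contain hundreds of operations, so it may help to filter the list of names rather than read it in full. The following is a minimal sketch: find_names is a hypothetical helper, and example_names is a made-up list standing in for the output of the list comprehension over session.graph.get_operations().

```python
import re


def find_names(names, pattern):
    """Return the names matching a regular expression, ignoring case.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if regex.search(name)]


# Made-up operation names, standing in for
# [o.name for o in session.graph.get_operations()].
example_names = [
    "Placeholder",
    "map/TensorArrayPack_1/TensorArrayGather",
    "Mul",
    "final_result",
]

# Keep only the names that look like inputs or scaling steps.
matches = find_names(example_names, r"placeholder|mul")
print(matches)
```

Note that these are operation names. The name of an operation's first output tensor, which is what session.run expects in feeds and fetches, is the operation name with :0 appended.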

Finally, you may also want to try running the graph locally in order to minimize the feedback cycle while debugging. Check out local_predict.py in the samples for an example of how to do this. This will help you iterate quickly to identify issues with the model itself.

It might also be that your inputs need to be scaled. If you have one input whose magnitude overwhelms everything else, the optimization may converge poorly. This is likely what is happening if the result you get is close to the mean of the target variable.

This is less likely in your particular case because your inputs are images, so your input values are probably similarly scaled, but it is more common if you are training from, say, CSV files.
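
For tabular inputs such as CSV rows, one common way to put features on a similar scale is to standardize each column to zero mean and unit standard deviation. The sketch below uses only the Python standard library; standardize_columns and the sample rows are illustrative assumptions, not code from the original answer or from Cloud ML.

```python
import statistics


def standardize_columns(rows):
    """Scale each column to zero mean and unit standard deviation.

    Hypothetical helper for illustration. Constant columns are left at
    zero instead of triggering a division by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # pstdev returns 0.0 for a constant column; fall back to 1.0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


# Made-up data: the second feature is several orders of magnitude larger
# than the first and would dominate an unscaled model.
rows = [
    [1.0, 100000.0],
    [2.0, 300000.0],
    [3.0, 200000.0],
]
scaled = standardize_columns(rows)
```

If you scale this way during training, the same means and standard deviations computed from the training data would need to be applied to inputs at prediction time.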

Google published a blog post on the image recognition task, along with some associated code. It starts from the retrain.py example you mentioned, but makes all the modifications needed for it to run on Cloud ML.
