I've been digging around on this for a while. I have found a ton of articles, but none really show just tensorflow inference as a plain inference. It's always "use the serving engine…"
While it's not as cut and dried as model.predict(), it's still really trivial.
In your model you should have a tensor that computes the final output you're interested in; let's name that tensor output. You may currently just have a loss function. If so, create another tensor (an op in the graph) that actually computes the output you want.
For example, if your loss function is:
tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=last_layer_activation)
And you expect your outputs to be in the range [0,1] per class, create another tensor:
output = tf.sigmoid(last_layer_activation)
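For instance, both tensors can live in the same graph and share the same forward pass. Here's a rough sketch (the placeholder shapes, the tf.layers.dense stand-in for your network, and the names X and labels are just illustrative, not anything your model has to use):

import tensorflow as tf   # TF 1.x style, matching the sess.run() usage below

X = tf.placeholder(tf.float32, [None, 20])        # model input; 20 features is just an example
labels = tf.placeholder(tf.float32, [None, 5])    # training targets; 5 classes is just an example
last_layer_activation = tf.layers.dense(X, 5)     # stand-in for your network's final (logit) layer
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=last_layer_activation))
output = tf.sigmoid(last_layer_activation)        # the tensor to fetch at inference time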
Now, when you call sess.run(...), just request the output tensor. Don't request the optimization op you normally would to train it. When you request this tensor, tensorflow will do the minimum work necessary to produce the value (e.g. it won't bother with backprop, loss functions, and all that, because a simple feed-forward pass is all that's necessary to compute output).
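To make the contrast concrete (assuming a training op like train_op = tf.train.AdamOptimizer(1e-3).minimize(loss), plus batch_x/batch_y holding a training batch; those names are illustrative):

sess.run(train_op, feed_dict={X: batch_x, labels: batch_y})   # training: runs loss + backprop
predictions = sess.run(output, feed_dict={X: batch_x})        # inference: forward pass only, rest of the graph is pruned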
So if you're creating a service to return inferences from the model, you'll want to keep the model loaded in memory/GPU and repeatedly call:
sess.run(output, feed_dict={X: input_data})
You won't need to feed it the labels because tensorflow won't bother to compute ops that aren't needed to produce the output you are requesting. You don't have to change your model or anything.
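Putting it together, a bare-bones serving loop might look something like this. It's a sketch, not a drop-in implementation: it assumes the graph above, a checkpoint saved with tf.train.Saver, a placeholder checkpoint path, and hypothetical request_stream/handle helpers standing in for whatever your service framework provides:

saver = tf.train.Saver()

with tf.Session() as sess:
    # restore the trained weights once and keep the session alive
    saver.restore(sess, "/path/to/model.ckpt")        # placeholder checkpoint path

    # answer requests with one lightweight forward pass each
    for input_data in request_stream():               # hypothetical request source
        predictions = sess.run(output, feed_dict={X: input_data})
        handle(predictions)                           # hypothetical response handler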
While this approach might not be as obvious as model.predict(...), I'd argue that it's vastly more flexible. If you start playing with more complex models you'll probably learn to love this approach. model.predict() is like "thinking inside the box."