TensorFlow model serving on Google AI Platform online prediction too slow with instance batches
问题 I'm trying to deploy a TensorFlow model to Google AI Platform for Online Prediction. I'm having latency and throughput issues . The model runs on my machine in less than 1 second (with only an Intel Core I7 4790K CPU) for a single image. I deployed it to AI Platform on a machine with 8 cores and an NVIDIA T4 GPU. When running the model on AI Platform on the mentioned configuration, it takes a little less than a second when sending only one image. If I start sending many requests, each with