google-cloud-ml

tf.data input_fn getting an error after 1 epoch

南楼画角 Submitted on 2019-12-13 03:39:54
Question: So I am trying to switch to an input_fn() built on tf.data, as described in this question. While I have been getting superior steps/sec from tf.data with the input_fn() below, I appear to run into an error after 1 epoch when running this experiment on GCMLE. Consider this input_fn():

    def input_fn(...):
        files = tf.data.Dataset.list_files(filenames).shuffle(num_shards)
        dataset = files.apply(tf.contrib.data.parallel_interleave(
            lambda filename: tf.data.TextLineDataset(filename).skip(1),
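An error after exactly one pass is commonly a tf.errors.OutOfRangeError from a dataset that is never repeated. A minimal sketch of a repeating version, assuming TF 1.x and a hypothetical parse_csv row parser (neither the parser nor the parameter values come from the excerpt above):

    import tensorflow as tf

    def input_fn(filenames, num_shards, batch_size=128, num_epochs=None):
        # Sketch: interleave sharded CSV files, then repeat so training can
        # continue past a single epoch (num_epochs=None repeats indefinitely).
        files = tf.data.Dataset.list_files(filenames).shuffle(num_shards)
        dataset = files.apply(tf.contrib.data.parallel_interleave(
            lambda f: tf.data.TextLineDataset(f).skip(1),  # skip CSV header
            cycle_length=4))
        dataset = dataset.map(parse_csv)   # parse_csv: hypothetical row parser
        dataset = dataset.shuffle(10000)
        dataset = dataset.repeat(num_epochs)
        dataset = dataset.batch(batch_size)
        return dataset.make_one_shot_iterator().get_next()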

Segmentation fault: 11 after back-porting TensorFlow script from Python 3 to Python 2

百般思念 Submitted on 2019-12-13 02:48:38
Question: After having created a TensorFlow 1.4 model for Python 3, I have now found that Google Cloud ML Engine currently only supports Python 2.7. Back-porting my Python 3 code at first seemed simple enough: some scripts still work as expected when I replace their shebang #!/usr/bin/env python3 with #!/usr/bin/env python (python -V reports 2.7.10 in my macOS environment). Yet one script does not react so gracefully. When I run it now, it produces a Segmentation fault: 11 without any previous
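One low-risk way to back-port is to keep a single codebase that runs under both interpreters by pinning Python 3 semantics with __future__ imports. A minimal sketch of that idea (it says nothing about what triggers this particular segfault, which the excerpt does not show):

    # Opt into Python 3 behavior while still running under Python 2.7.
    from __future__ import absolute_import, division, print_function

    import sys

    print('Running under Python %d.%d' % sys.version_info[:2])
    print(7 / 2)  # prints 3.5 under both interpreters thanks to `division`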

Creating a serving graph separately from training in TensorFlow for Google Cloud ML deployment?

早过忘川 Submitted on 2019-12-13 00:03:39
Question: I am trying to deploy a tf.keras image classification model to Google Cloud ML Engine. Do I have to include code that creates a serving graph separately from training in order to serve my models in a web app? I already have my model in SavedModel format (saved_model.pb and the variables files), so I'm not sure if I need this extra step to get it to work. For example, this is code directly from the GCP "Deploying models" TensorFlow documentation:

    def json_serving_input_fn():
        """Build the serving inputs."""
        inputs
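For reference, in the cloudml-samples of that era the function continues roughly as in the sketch below; INPUT_COLUMNS is an assumed module-level list of feature columns, not something defined in the excerpt:

    import tensorflow as tf

    def json_serving_input_fn():
        # Sketch: one placeholder per feature, fed from the JSON request body.
        inputs = {}
        for feat in INPUT_COLUMNS:  # assumed list of tf.feature_column objects
            inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)
        return tf.estimator.export.ServingInputReceiver(inputs, inputs)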

Using a nightly TensorFlow build for training with Cloud ML Engine

痴心易碎 Submitted on 2019-12-12 15:36:21
Question: If I need to use a nightly TensorFlow build in a Cloud ML Engine training job, how do I do it?

Answer 1: Download a nightly build from https://github.com/tensorflow/tensorflow#installation. To pick the right build: use "Linux CPU-only" or "Linux GPU" depending on whether you need GPUs for training, and use the Python 2 build. Rename the .whl file, for example:

    mv tensorflow-1.0.1-cp27-cp27mu-linux_x86_64.whl \
       tensorflow-1.0.1-cp27-none-linux_x86_64.whl

(Here we renamed the cp27mu tag to none.
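Once the renamed wheel is attached to the job (for example via the --packages flag of gcloud ml-engine jobs submit training), it is worth confirming inside the trainer that the nightly actually got installed; a small sketch:

    import tensorflow as tf

    # Sketch: log the TensorFlow build the Cloud ML Engine workers really run,
    # so a silently ignored wheel shows up immediately in the job logs.
    tf.logging.set_verbosity(tf.logging.INFO)
    tf.logging.info('TensorFlow version: %s', tf.VERSION)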

Docker container for Google Cloud ML on Compute Engine: authenticating to mount a bucket

爷,独闯天下 Submitted on 2019-12-12 13:51:22
Question: I have been working with Google's machine learning platform, Cloud ML. Big picture: I'm trying to figure out the cleanest way to get their Docker environment up and running on Google Compute Engine instances, with access to the Cloud ML API and my storage bucket. Starting locally, I have my service account configured:

    C:\Program Files (x86)\Google\Cloud SDK>gcloud config list
    Your active configuration is: [service]
    [compute]
    region = us-central1
    zone = us-central1-a
    [core]
    account = 773889352370
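As a quick check that credentials are visible from inside the container, the bucket can be listed with the service account key directly; a sketch using the google-cloud-storage client (the key path and bucket name are placeholders, not values from the question):

    from google.cloud import storage

    # Sketch: authenticate with an explicit service-account key file and list
    # a few objects, to confirm the container can reach the bucket at all.
    client = storage.Client.from_service_account_json('/keys/service-account.json')
    bucket = client.bucket('my-training-bucket')  # hypothetical bucket name
    for blob in bucket.list_blobs(max_results=5):
        print(blob.name)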

Why am I getting an “Error loading notebook” error when trying to set up Datalab and do image classification on Cloud ML Engine?

≡放荡痞女 Submitted on 2019-12-12 04:57:46
Question: I am following the tutorial here: https://codelabs.developers.google.com/codelabs/cloud-ml-engine-image-classification/index.html?index=..%2F..%2Findex#0, which claims it will let me do image classification on Google Cloud. I follow the instructions, but when I get to step 4, "Start a datalab notebook", it tells me to open the docs folder in Google Cloud Datalab and then open the file called Hello World.ipynb. When I open this file I get a really weird error that I

Different outputs from predictions using TensorFlow on the same data?

て烟熏妆下的殇ゞ Submitted on 2019-12-12 04:57:44
Question: I am caught in a problem when I try to take predictions from my trained model. The scenario is: I train a neural network model to learn and classify pictures using TensorFlow. When I train on Gcloud, it returns different results than when I train locally. Even using the same OS, libraries, and code, it returns different outputs. Some questions came to mind (the data I am talking about is the pictures I am using to train): I) Considering that you are in the same machine, every
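Before blaming the environment, it is worth ruling out ordinary unseeded randomness (weight initialization, data shuffling), which makes two runs differ even on one machine. A minimal TF 1.x sketch of pinning the obvious seeds (this narrows nondeterminism but does not fully eliminate it, especially on GPUs):

    import numpy as np
    import tensorflow as tf

    SEED = 42
    np.random.seed(SEED)        # NumPy-side shuffling / augmentation
    tf.set_random_seed(SEED)    # graph-level seed for TF initializers

    # Op-level seeds pin individual random ops on top of the graph seed:
    init = tf.truncated_normal([128, 10], stddev=0.1, seed=SEED)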

Google Cloud ML scipy.misc.imread returning <PIL.JpegImagePlugin.JpegImageFile>

戏子无情 Submitted on 2019-12-12 04:48:25
Question: I am running the following snippet:

    import tensorflow as tf
    import scipy.misc
    from tensorflow.python.lib.io import file_io

    file = file_io.FileIO('gs://BUCKET/data/celebA/000007.jpg', mode='r')
    img = scipy.misc.imread(file)

If I run that snippet in Cloud Console, I get back a proper array. But when that same snippet runs in Cloud ML, the img object is <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=178x218 at 0x7F1F8F26DA10>. This Stack Overflow answer suggests that libjpeg was not
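If the decode path through scipy/PIL is the problem, one workaround is to let TensorFlow itself decode the JPEG, since that path does not depend on libjpeg being visible to PIL; a sketch reusing the bucket path from the question:

    import tensorflow as tf
    from tensorflow.python.lib.io import file_io

    # Sketch: read the raw bytes and decode them with TF's own JPEG decoder,
    # producing a (height, width, 3) uint8 array.
    data = file_io.FileIO('gs://BUCKET/data/celebA/000007.jpg', mode='rb').read()
    with tf.Session() as sess:
        img = sess.run(tf.image.decode_jpeg(data, channels=3))
    print(img.shape, img.dtype)  # expected: (218, 178, 3) uint8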

Number of examples in each tfrecord

不打扰是莪最后的温柔 Submitted on 2019-12-12 04:47:42
Question: I am running the sample.sh script in Google Cloud Shell to call the preprocessing below on a set of images, following the steps of the flowers example: https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/flowers/trainer/preprocess.py. Preprocessing was successful on both the eval set and the train set, but the generated .tfrecord.gz files do not seem to match the image counts in eval/train_set.csv. I.e., eval-00000-of-00157.tfrecord.gz says there are 158 tfrecord while there are 35227 rows in eval
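Shard count and record count are different things: the 158 shards should together contain all the rows. One way to verify is to count records across every shard directly; a TF 1.x sketch (the gs:// glob is a placeholder modeled on the filenames in the question):

    import tensorflow as tf

    # Sketch: count records in all GZIP-compressed TFRecord shards; the
    # per-shard counts should sum to the row count of eval/train_set.csv.
    options = tf.python_io.TFRecordOptions(
        tf.python_io.TFRecordCompressionType.GZIP)
    total = 0
    for shard in tf.gfile.Glob('gs://BUCKET/preproc/eval*.tfrecord.gz'):
        n = sum(1 for _ in tf.python_io.tf_record_iterator(shard, options=options))
        print(shard, n)
        total += n
    print('total records:', total)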

Memory Leak in TensorFlow Google Cloud ML Training

≡放荡痞女 Submitted on 2019-12-12 04:35:45
Question: I've been trying the TensorFlow tutorial scripts on Google Cloud ML. In particular, I've used the cifar10 CNN tutorial scripts at https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10. When I run this training script in Google Cloud ML, there is a memory leak of around 0.5% per hour. I have not made any changes to the scripts other than packaging them into the required GCP format (as described in https://cloud.google.com/ml-engine/docs/how-tos/packaging-trainer) and setting
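To quantify a suspected leak independently of the Cloud ML dashboard, a session hook that logs resident memory can be added to the tutorial's MonitoredTrainingSession hook list; a sketch (psutil is an assumed extra dependency, not part of the tutorial):

    import os

    import psutil
    import tensorflow as tf

    class MemoryLoggerHook(tf.train.SessionRunHook):
        # Sketch: log the process's resident set size every N steps so memory
        # growth over time shows up directly in the training logs.
        def __init__(self, every_n_steps=1000):
            self._every_n = every_n_steps
            self._step = 0

        def after_run(self, run_context, run_values):
            self._step += 1
            if self._step % self._every_n == 0:
                rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1e6
                tf.logging.info('step %d: RSS %.1f MB', self._step, rss_mb)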