Question
I have the code below that I want to submit to Google Cloud ML. I already tested their example and got results.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
import numpy as np
# Data sets
I_TRAINING = "/home/android/Desktop/training.csv"
I_TEST = "/home/android/Desktop/test.csv"
# Load datasets.
training_set = tf.contrib.learn.datasets.base.load_csv(filename=I_TRAINING, target_dtype=np.int)
test_set = tf.contrib.learn.datasets.base.load_csv(filename=I_TEST, target_dtype=np.int)
# Specify that all features have real-value data
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=2)]
# Build 3 layer DNN with 10, 20, 10 units respectively.
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20, 10],
                                            n_classes=2,
                                            model_dir="/tmp/my_model")
# Fit model.
classifier.fit(x=training_set.data, y=training_set.target, steps=2000)
# Evaluate accuracy.
accuracy_score = classifier.evaluate(x=test_set.data, y=test_set.target)["accuracy"]
print('Accuracy: {0:f}'.format(accuracy_score))
# Classify two new flower samples.
#new_samples = np.array(
# [[6.4, 3.2, 4.5, 1.5], [5.8, 3.1, 5.0, 1.7]], dtype=float)
#y = classifier.predict(new_samples)
#print('Predictions: {}'.format(str(y)))
It's code to train and create a DNN model in TensorFlow. I already tested it locally and got results. I put this code in a folder named trainer along with an __init__.py file, and uploaded the folder to gs://bucket-ml/second_job/trainer. Here, second_job is the JOB_NAME.
Then, when I want to submit this as a job, I do this and get the following error:
gcloud beta ml jobs submit training ${JOB_NAME} \
--package-path=trainer \
--module-name=trainer.trainer \
--staging-bucket="${TRAIN_BUCKET}" \
--region=us-central1 \
--train_dir="${TRAIN_PATH}/train"
ERROR: (gcloud.beta.ml.jobs.submit.training)
Packaging of user python code failed with message:
running sdist
running egg_info
creating trainer.egg-info
writing trainer.egg-info/PKG-INFO
writing top-level names to trainer.egg-info/top_level.txt
writing dependency_links to trainer.egg-info/dependency_links.txt
writing manifest file 'trainer.egg-info/SOURCES.txt'
error: package directory 'trainer' does not exist
Try manually writing a setup.py file at your package root
and rerunning the command
I am not sure whether the package-path and module-name are correct. Please advise me on what to do. Thanks and regards.
Answer 1:
The --package-path argument to the gcloud command should point to a directory that is a valid Python package, i.e., a directory that contains an __init__.py file (often an empty file). Note that it should be a local directory, not one on GCS.
The --module-name argument should be the fully qualified name of a valid Python module within that package. You can organize your directories however you want, but for the sake of consistency, the samples all have a Python package named trainer with the module to be run named task.py.
The directory structure of the samples looks like:
trainer/
    __init__.py
    task.py
__init__.py will likely be an empty file. task.py contains your code. Then you can submit your job as follows:
gcloud beta ml jobs submit training ${JOB_NAME} \
--package-path=trainer \
--module-name=trainer.task \
--staging-bucket="${TRAIN_BUCKET}" \
--region=us-central1 \
-- \
--train_dir="${TRAIN_PATH}/train"
You can choose whatever names you want for your package and modules; just make sure the names on disk and the gcloud arguments match up: the top-level directory is --package-path, and the file with your code to run is --module-name (without the .py suffix).
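As an aside, the error message in the question suggests writing a setup.py at the package root. With the layout above, gcloud attempts to package the code for you (that is what the sdist/egg_info output shows), but if you do end up supplying your own, a minimal sketch, assuming your package directory is named trainer, could look like:
# setup.py, placed next to the trainer/ directory (not inside it).
from setuptools import setup, find_packages

setup(
    name='trainer',            # placeholder package name
    version='0.1',             # placeholder version
    packages=find_packages(),  # discovers trainer/ via its __init__.py
)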
A few notes:
- Note the extra '-- \'. It indicates that all following arguments should be passed through to your program; that is, --train_dir is NOT an argument to gcloud beta ml jobs submit training and will be passed as a flag to your program.
- If you intend to use train_dir, you'll need to add some flag parsing to your code, e.g., using argparse (see the sketch after this list).
- Files you read in the cloud need to be on GCS.
- Although flag parsing gives you more flexibility, it's not required. You can hard-code paths to filenames; just make sure they point to objects on GCS (and then remove --train_dir from the gcloud call).
Source: https://stackoverflow.com/questions/40281299/submitting-a-training-job-to-google-cloud-ml