Question
I have the code below that I want to submit to Google Cloud ML. I already tested their example and got results.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
import numpy as np
# Data sets
I_TRAINING = "/home/android/Desktop/training.csv"
I_TEST = "/home/android/Desktop/test.csv"
# Load datasets.
training_set = tf.contrib.learn.datasets.base.load_csv(filename=I_TRAINING, target_dtype=np.int)
test_set = tf.contrib.learn.datasets.base.load_csv(filename=I_TEST, target_dtype=np.int)
# Specify that all features have real-value data
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=2)]
# Build 3 layer DNN with 10, 20, 10 units respectively.
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20, 10],
                                            n_classes=2,
                                            model_dir="/tmp/my_model")
# Fit model.
classifier.fit(x=training_set.data, y=training_set.target, steps=2000)
# Evaluate accuracy.
accuracy_score = classifier.evaluate(x=test_set.data, y=test_set.target)["accuracy"]
print('Accuracy: {0:f}'.format(accuracy_score))
# Classify two new flower samples.
#new_samples = np.array(
# [[6.4, 3.2, 4.5, 1.5], [5.8, 3.1, 5.0, 1.7]], dtype=float)
#y = classifier.predict(new_samples)
#print('Predictions: {}'.format(str(y)))
It's code to train and create a DNN model in TensorFlow. I already tested it locally and got results. I put this code in a folder named trainer along with an __init__.py file, and uploaded the folder to gs://bucket-ml/second_job/trainer. Here, second_job is the JOB_NAME.
Then, when I want to submit this as a job, I do this and get the following error:
gcloud beta ml jobs submit training ${JOB_NAME} \
--package-path=trainer \
--module-name=trainer.trainer \
--staging-bucket="${TRAIN_BUCKET}" \
--region=us-central1 \
--train_dir="${TRAIN_PATH}/train"
ERROR: (gcloud.beta.ml.jobs.submit.training)
Packaging of user python code failed with message:
running sdist
running egg_info
creating trainer.egg-info
writing trainer.egg-info/PKG-INFO
writing top-level names to trainer.egg-info/top_level.txt
writing dependency_links to trainer.egg-info/dependency_links.txt
writing manifest file 'trainer.egg-info/SOURCES.txt'
error: package directory 'trainer' does not exist
Try manually writing a setup.py file at your package root
and rerunning the command
I am not sure whether the package-path and module-name are correct. Please advise me on what to do. Thanks and regards.
Answer 1:
The --package-path argument to the gcloud command should point to a directory that is a valid Python package, i.e., a directory that contains an __init__.py file (often an empty file). Note that it should be a local directory, not one on GCS.
The --module-name argument should be the fully qualified name of a valid Python module within that package. You can organize your directories however you want, but for the sake of consistency, the samples all have a Python package named trainer with the module to be run named task.py.
The directory structure of the samples looks like:
trainer/
    __init__.py
    task.py
__init__.py will likely be an empty file. task.py contains your code. Then you can submit your job as follows:
gcloud beta ml jobs submit training ${JOB_NAME} \
--package-path=trainer \
--module-name=trainer.task \
--staging-bucket="${TRAIN_BUCKET}" \
--region=us-central1 \
-- \
--train_dir="${TRAIN_PATH}/train"
You can choose whatever names you want for your package and modules; just make sure the names on disk and the gcloud arguments match up: the top-level directory is --package-path, and the file with your code to run is --module-name (without the .py suffix).
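As an aside, the error message in the question suggests writing a setup.py at the package root. With the layout above, gcloud attempts to package the code for you (that is what the sdist/egg_info output shows), but if you do end up supplying your own, a minimal sketch, assuming your package directory is named trainer, could look like:
# setup.py, placed next to the trainer/ directory (not inside it).
from setuptools import setup, find_packages

setup(
    name='trainer',            # placeholder package name
    version='0.1',             # placeholder version
    packages=find_packages(),  # discovers trainer/ via its __init__.py
)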
A few notes:
- Note the extra '-- \'. It indicates that all following arguments should be passed through to your program; that is, --train_dir is NOT an argument to gcloud beta ml jobs submit training and will be passed as a flag to your program.
- If you intend to use train_dir, you'll need to add some flag parsing to your code, e.g., using argparse (see the sketch after this list).
- Files you read in the cloud need to be on GCS.
- Although flag parsing gives you more flexibility, it's not required. You can hard-code paths to filenames; just make sure they point to objects on GCS (and then remove --train_dir from the gcloud call).
Source: https://stackoverflow.com/questions/40281299/submitting-a-training-job-to-google-cloud-ml