Question
After having created a TensorFlow 1.4 model for Python 3, I have now found that Google Cloud ML Engine currently only has support for Python 2.7.
Back-porting my Python 3 code at first seemed simple enough: some scripts still work as expected when I replace their shebang #!/usr/bin/env python3 with #!/usr/bin/env python. (python -V reports 2.7.10 in my macOS environment.)
Yet one script does not react so gracefully. When I run it now, it produces a Segmentation fault: 11 without any previous warnings or other diagnostic output.
How can I find the root cause, so that I know what else to change to make that script palatable to Python 2 as well?
UPDATE: The segmentation fault apparently occurs during a call to session.run(get_next), where get_next is obtained from a tf.data.Iterator as follows:
iterator = dataset.make_initializable_iterator()
get_next = iterator.get_next()
Answer 1:
There are two issues here: one is about Python 3 support and the other is about the segfault.
Python 3 Support
Cloud ML Engine now supports Python 3, via the pythonVersion field when submitting jobs (see the API reference docs). If you are using gcloud, you will need to create a config file like this (let's name it config.yaml):
trainingInput:
  pythonVersion: "3.5"
When you submit your job, point gcloud to that file, e.g.
gcloud ml-engine jobs submit training --config=config.yaml ...
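If the config file is generated from a script rather than written by hand, a minimal sketch could look like the following. The helper name write_training_config is hypothetical, and the file contains only the pythonVersion field discussed above; any other job settings are left out.

```python
def write_training_config(path, python_version="3.5"):
    """Write a minimal job config that selects the Python version.

    Hypothetical helper: it only emits the trainingInput.pythonVersion
    field; add further fields by hand if the job needs them.
    """
    lines = [
        "trainingInput:",
        # The two-space indent nests pythonVersion under trainingInput.
        '  pythonVersion: "%s"' % python_version,
    ]
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return path
```

Calling write_training_config("config.yaml") produces a file that can then be passed to gcloud through --config.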
Segfault
This may be caused by running out of memory. Please check the memory usage in the console for that job. That said, if the job dies abruptly, memory usage at the time of failure may not be accurately reflected for that job.
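To narrow down a silent segfault locally, one option is to have the interpreter dump the Python stack when the crash signal arrives and to log peak memory along the way. The sketch below is not taken from the original script. It assumes the faulthandler module is available (it is part of the standard library from Python 3.3, and on Python 2.7 a backport of the same name would have to be installed from PyPI), and the helper names enable_crash_tracebacks and peak_memory are made up for illustration.

```python
import faulthandler
import resource
import sys


def enable_crash_tracebacks(stream=None):
    """Dump the Python-level stack of all threads on a fatal signal
    such as SIGSEGV. Returns True once the handler is active."""
    if stream is None:
        stream = sys.stderr
    # The stream must stay open for as long as the handler is enabled.
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


def peak_memory():
    """Return the peak resident set size of this process so far.

    The unit is platform-dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


if __name__ == "__main__":
    # Enable the handler first, before importing TensorFlow or building
    # the graph, so that a crash in native code still reports the Python
    # line that triggered it.
    enable_crash_tracebacks()
    print("peak memory so far: %d" % peak_memory())
```

If the traceback points at session.run(get_next) while peak_memory() stays modest, memory exhaustion becomes less likely and the input pipeline itself is the next thing to inspect.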
Source: https://stackoverflow.com/questions/47943039/segmentation-fault-11-after-back-porting-tensorflow-script-from-python-3-to-pyt