Cloud ML Unable to find the file on Google Cloud Storage

Submitted by 爱⌒轻易说出口 on 2019-12-11 11:53:19

Question


I am reading my data file using the following commands:

data_dir = arguments['data_dir']
data = pd.read_csv(data_dir + "/train.csv")

I am using this data to train my model on Google Cloud ML. The job is scheduled successfully, but I get the following IOError when it fetches the file:

IOError: File gs://cloud-bucket/data/train.csv does not exist

The file path is correct: I uploaded the file to the above-mentioned bucket using the console. Cloud ML is also running in the same region and is configured with the same project as my bucket.


Answer 1:


GCS is not a POSIX file system, so you typically cannot use "regular" file libraries to manipulate files on GCS. This includes convenience functions such as pd.read_csv when they are given a plain path.

In the case of pandas, you can pass a file handle instead of a path. I recommend using TensorFlow's file_io.FileIO wrapper, which can read from both GCS and standard POSIX file systems, so the same code runs locally and in the cloud:

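import pandas as pd
from tensorflow.python.lib.io import file_io

data_dir = arguments['data_dir']
# FileIO accepts gs:// paths as well as local paths and returns a file-like object.
with file_io.FileIO(data_dir + "/train.csv", mode='r') as f:
    data = pd.read_csv(f)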

It might also be helpful to test your code by running it locally and passing in GCS filenames before submitting a cloud job.
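
When testing locally, it can also help to print the exact path the job will try to open. The sketch below is not part of the original answer, and the helper names join_gcs_path and split_gcs_uri are hypothetical. It uses only the standard library and assumes data_dir is a gs:// URI such as the one in the question. GCS object names are flat strings, so a trailing slash on data_dir would turn data_dir + "/train.csv" into a name containing "//", which refers to a different object.

```python
def join_gcs_path(data_dir, filename):
    """Join a directory and a file name with exactly one slash between them."""
    return data_dir.rstrip("/") + "/" + filename.lstrip("/")


def split_gcs_uri(uri):
    """Split 'gs://bucket/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError("not a gs:// URI: " + uri)
    bucket, _, object_name = uri[len(prefix):].partition("/")
    return bucket, object_name


# Example values taken from the question; the trailing slash is deliberate.
path = join_gcs_path("gs://cloud-bucket/data/", "train.csv")
print(path)                 # gs://cloud-bucket/data/train.csv
print(split_gcs_uri(path))  # ('cloud-bucket', 'data/train.csv')
```

The bucket and object name printed here can then be compared against what the console shows for the uploaded file.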



Source: https://stackoverflow.com/questions/47942299/cloud-ml-unable-to-find-the-file-on-google-cloud-storage
