Question
I'm looking for simplification/encapsulation so that my existing programs that use (sic) open("my_file.txt") can be ported to Colaboratory with minimal change to the existing logic flow. I'm happy to have some cut-and-paste setup logic before my existing logic.
The mental model I understand from Google (here) is that I have to complete these prerequisites to get my file loaded:
- upload to Google Drive
- download to Python (the VM, probably into /tmp)
Then I can execute my existing code without change.
Therefore I suspect/propose that what would work for me (but not just me!) is an interface/function as follows:
- inputs (from the local computer)
    - source_file_dir
    - source_file_name
    - (of course, authentication inputs are implicitly required)
- output
    - python_vm_file_dir (a directory I can use in my program; /tmp is fine)
    - (implicitly, I expect the same dest_file_name)
With this code snippet, I could easily move code into Colaboratory.
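A minimal sketch of the proposed interface in Python, assuming the source directory is already visible to the VM as a local path (for example, a Google Drive folder mounted as in Answer 2); authentication is not handled here. The name `fetch_to_vm` is hypothetical. It copies the file into `python_vm_file_dir` under the same name and returns the new path.

```python
import os
import shutil


def fetch_to_vm(source_file_dir: str, source_file_name: str,
                python_vm_file_dir: str = "/tmp") -> str:
    """Hypothetical helper: copy source_file_dir/source_file_name into
    python_vm_file_dir, keeping the same file name, and return the new path.

    Assumes source_file_dir is already readable as a local path on the VM
    (e.g. a mounted Google Drive folder); authentication is out of scope.
    """
    src = os.path.join(source_file_dir, source_file_name)
    os.makedirs(python_vm_file_dir, exist_ok=True)
    dest = os.path.join(python_vm_file_dir, source_file_name)
    shutil.copyfile(src, dest)  # raises FileNotFoundError if src is missing
    return dest


# Usage: one setup call, then the existing open() logic stays the same.
# path = fetch_to_vm("drive/Colab Notebooks", "my_file.txt")
# with open(path) as f:
#     data = f.read()
```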
Has anyone already created this?
Thank you.
Answer 1:
I've been tackling similar questions. In terms of simplicity, I found keeping data files in Google Cloud Storage the easiest. It's explained well in the tutorial: https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb
I've found the easiest thing to do is to insert cells that copy the data to the VM running the notebook:
!gsutil cp gs://{bucket_name}/to_upload.txt /tmp/gsutil_download.txt
That way I can generally leave the 'active' code blocks that I run locally unchanged.
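The same copy can be wrapped in a plain Python setup cell so that the code that follows only ever sees a local path. This is a sketch under several assumptions: `prepare_data_file` and `gsutil_cp_command` are hypothetical names, Colab is detected by whether `google.colab` can be imported, `gsutil` is installed and already authenticated inside Colab, and outside Colab the file is expected to sit in `local_dir` under its object name. Unlike the gsutil cell above, it keeps the object's name for the local copy.

```python
import os
import subprocess

# Assumption: google.colab is importable only when running inside Colab.
try:
    import google.colab  # noqa: F401
    IN_COLAB = True
except ImportError:
    IN_COLAB = False


def gsutil_cp_command(bucket_name, object_name, dest_path):
    """Build the argument list for `gsutil cp gs://bucket/object dest`."""
    return ["gsutil", "cp", f"gs://{bucket_name}/{object_name}", dest_path]


def prepare_data_file(bucket_name, object_name, local_dir=".", vm_dir="/tmp"):
    """Hypothetical helper: return a path the unchanged code can open().

    In Colab, copy the object from GCS into vm_dir first; elsewhere, assume
    the file already exists in local_dir under the same name.
    """
    if not IN_COLAB:
        return os.path.join(local_dir, object_name)
    dest = os.path.join(vm_dir, object_name)
    # check=True raises CalledProcessError if gsutil fails
    subprocess.run(gsutil_cp_command(bucket_name, object_name, dest), check=True)
    return dest
```

With this in place, the existing code can call `open(prepare_data_file("my-bucket", "to_upload.txt"))`, where `my-bucket` is a placeholder bucket name.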
I use a Chromebook when I'm out and about, so I like to keep as much in the cloud as possible. It's quite easy to set up a 'mapped network drive' (in Windows terms) to a GCS bucket for moving files around, and it's also very easy on Linux. On Windows, I found this utility really handy: https://www.cloudberrylab.com/drive/google-cloud.aspx (not an advert, I'm just a fan).
Answer 2:
Upload your file to Google Drive. Here is a code snippet to access it directly:
# Install the prerequisites for add-apt-repository
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
# Print the authorization URL; open it and paste the verification code at the prompt below
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
# Complete authorization with the pasted verification code
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
Now create a drive directory and mount Google Drive on it:
!mkdir -p drive
!google-drive-ocamlfuse drive
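If you prefer to keep the mount step in Python, the two shell commands above can be sketched with `os.makedirs` and `subprocess.run`. The function name `mount_drive` and the injectable `run` parameter are hypothetical conveniences (the latter lets you inspect the command without executing it). This assumes the authorization cells above have already succeeded.

```python
import os
import subprocess


def mount_drive(mount_dir="drive", run=subprocess.run):
    """Hypothetical helper equivalent to `!mkdir -p drive` followed by
    `!google-drive-ocamlfuse drive`. Returns the command it ran."""
    os.makedirs(mount_dir, exist_ok=True)  # same effect as mkdir -p
    cmd = ["google-drive-ocamlfuse", mount_dir]
    run(cmd, check=True)  # with subprocess.run, raises CalledProcessError on failure
    return cmd
```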
You can then access any file in Google Drive as drive/Filename.
E.g.:
import pandas
df = pandas.read_hdf("drive/Colab Notebooks/S2C5_complete_cleaned_by_me_10percent.h5")
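To port code that calls open("my_file.txt") with minimal change, a small path helper can point it at the mounted folder and fail with a clearer message when the mount is missing. `drive_file` is a hypothetical name; its default mount directory `drive` matches the `mkdir -p drive` step above.

```python
import os


def drive_file(filename, mount_dir="drive"):
    """Hypothetical helper: return mount_dir/filename, or raise a
    FileNotFoundError that hints at a missing Drive mount."""
    path = os.path.join(mount_dir, filename)
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"{path!r} not found; is Google Drive mounted at {mount_dir!r}?")
    return path


# Usage: the existing logic only changes in the argument to open().
# with open(drive_file("my_file.txt")) as f:
#     text = f.read()
```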
Also, you only need to do this once, in a single notebook; after that you can access the data from other notebooks as well.
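Because the mount is meant to be set up only once, a later notebook (or a re-run of this one) can check for it before repeating the steps above. This sketch uses `os.path.ismount`, which should report True for a FUSE mount point. The name `drive_is_mounted` is hypothetical, and the check only tells you that something is mounted at that path, not that it is Google Drive.

```python
import os


def drive_is_mounted(mount_dir="drive"):
    """Hypothetical helper: True if mount_dir is currently a mount point.
    Returns False (without raising) if the directory does not exist."""
    return os.path.ismount(mount_dir)


if drive_is_mounted():
    print("drive/ is mounted; skipping the mount steps")
else:
    print("drive/ is not mounted; run the mount steps first")
```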
Source: https://stackoverflow.com/questions/47949235/wish-to-use-colaboratory-what-is-simplest-way-to-do-a-get-drive-file-to-python