How to upload and save large data to Google Colaboratory from local drive?

放肆的年华 submitted on 2019-12-09 12:07:42

Question


I have downloaded large image training data as zip from this Kaggle link

https://www.kaggle.com/c/yelp-restaurant-photo-classification/data

How do I efficiently achieve the following?

  1. Create a project folder in Google Colaboratory
  2. Upload zip file to project folder
  3. Unzip the files

Thanks

EDIT: I tried the code below, but it crashes on my large zip file. Is there a better, more efficient way to do this, where I can just specify the location of the file on my local drive?

from google.colab import files
uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Answer 1:


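!pip install kaggle
import json
import os
import zipfile

# Replace USERNAME and API_KEY with your own Kaggle credentials.
api_token = {"username": "USERNAME", "key": "API_KEY"}

# Create the config directory first; open() fails if it does not exist.
os.makedirs('/content/.kaggle', exist_ok=True)
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(api_token, file)
!chmod 600 /content/.kaggle/kaggle.json
# Assumption: on current Colab runtimes the home directory may not be /content,
# so point the Kaggle CLI at the config directory explicitly.
os.environ['KAGGLE_CONFIG_DIR'] = '/content/.kaggle'
!kaggle config set -n path -v /content
!kaggle competitions download -c jigsaw-toxic-comment-classification-challenge
os.chdir('/content/competitions/jigsaw-toxic-comment-classification-challenge')
# Extract every zip archive in the competition folder.
for file in os.listdir():
    if file.endswith('.zip'):
        with zipfile.ZipFile(file, 'r') as zip_ref:
            zip_ref.extractall()

This is the script from https://gist.github.com/jayspeidell/d10b84b8d3da52df723beacc5b15cb27 with one minor change: the line "!kaggle config set -n path -v /content", without which I was encountering an error. I couldn't add this as a comment because I don't have enough reputation.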




Answer 2:


You may refer to these threads:

  • Import data into Google Colaboratory
  • Load local data files to Colaboratory

Also check out the I/O example notebook. For example, to access xls files, you'll want to upload the file to Google Sheets. Then you can use the gspread recipes in the same I/O example notebook.




Answer 3:


You may need to use the kaggle-cli module to help with the download.

It’s discussed in this fast.ai thread.




Answer 4:


I just wrote this script that downloads and extracts data from the Kaggle API to a Colab notebook. You just need to paste in your username, API key, and competition name.

https://gist.github.com/jayspeidell/d10b84b8d3da52df723beacc5b15cb27

The manual upload function in Colab is kind of buggy now, and it's better to download files via wget or an API service anyway because you start with a fresh VM each time you open the notebook. This way the data will download automatically.
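As a rough illustration of the "download automatically on a fresh VM" idea, here is a minimal sketch using only the Python standard library. The function name fetch_and_extract, the default archive name data.zip, and any URL you pass in are assumptions made for this example; they do not come from the gist above.

```python
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url, project_dir, archive_name="data.zip"):
    """Download a zip archive into project_dir (if missing) and extract it there.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    project = Path(project_dir)
    project.mkdir(parents=True, exist_ok=True)  # create the project folder

    archive = project / archive_name
    if not archive.exists():
        # urlretrieve writes the response to disk in chunks,
        # so the archive is not held in memory all at once.
        urllib.request.urlretrieve(url, str(archive))

    with zipfile.ZipFile(archive) as zf:
        zf.extractall(project)
        return zf.namelist()  # names of the extracted members
```

Calling it from the first cell of the notebook, with a direct download URL of your own and a folder such as /content/my_project, would re-create the data each time the VM is reset; when the archive is already present the download step is skipped.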




Answer 5:


Another option is to upload the data to Dropbox (if it fits) and get a download link. Then, in the notebook, run:

!wget link -O new-name && ls
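
One thing to watch for: Dropbox share links usually carry a dl=0 query parameter, which, as far as I know, serves a preview page rather than the file itself, while dl=1 requests a direct download. The sketch below rewrites that parameter before the link is passed to wget. The function name dropbox_direct_link and the example link are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dropbox_direct_link(shared_url):
    """Return shared_url with its dl query parameter set to 1.

    Hypothetical helper; assumes the usual dl=0 / dl=1 link convention.
    """
    parts = urlsplit(shared_url)
    # Keep every query parameter except dl, then append dl=1.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "dl"]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Made-up link, for illustration only.
print(dropbox_direct_link("https://www.dropbox.com/s/abc123/data.zip?dl=0"))
```

Quote the resulting link when handing it to wget so the shell does not interpret the ? and & characters.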


Source: https://stackoverflow.com/questions/48860586/how-to-upload-and-save-large-data-to-google-colaboratory-from-local-drive
