How to upload a dataset in Google Colaboratory?

Submitted by 匆匆过客 on 2019-12-19 10:07:16

Question


I need to upload a dataset of images to Google Colaboratory. The dataset has subfolders that contain the images. Everything I found online covers only a single file:

from google.colab import files

uploaded = files.upload()

Is there any way to do it?


Answer 1:


For uploading data to Colab, you have four methods.

Method 1

You can upload a file or directory directly in the Colab UI.

The data is saved on the Colab virtual machine's local disk. In my experience, three things are worth noting: 1) The upload speed is good. 2) The directory structure is preserved, but a zip archive is not extracted automatically. You need to run this code in a Colab cell:

!mkdir -p {dir_name}
!unzip {zip_file} -d {dir_name}

3) Most importantly, when the Colab runtime crashes, the data is deleted.
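
The same extraction can also be done from Python rather than with shell commands. A minimal sketch using the standard-library zipfile module; the function name extract_dataset and the paths in the usage comment are hypothetical and not part of the original answer.

```python
import os
import zipfile


def extract_dataset(zip_path, dest_dir):
    """Extract zip_path into dest_dir, keeping the archive's subfolders.

    Returns the sorted list of extracted file names (relative to dest_dir).
    """
    os.makedirs(dest_dir, exist_ok=True)  # same effect as `mkdir -p`
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        # Directory entries end with "/"; keep only real files.
        return sorted(n for n in archive.namelist() if not n.endswith("/"))


# Hypothetical usage in a Colab cell:
# extracted = extract_dataset("/content/images.zip", "/content/images")
```

Returning the file list makes it easy to check that every subfolder arrived before training starts.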

Method 2

Execute this code in a Colab cell:

from google.colab import files
uploaded = files.upload()

In my experience, running the cell displays an upload button. Choose a file while the cell's execution indicator is still running. 1) After execution, the file name appears in the output panel. 2) Refresh the Colab file browser and you will see the file. 3) Alternatively, execute !ls and you should see your file. If not, the file was not uploaded successfully.
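
For a dataset with subfolders, one possible workaround is to zip the folder on your own machine, upload the archive with files.upload(), and extract it from the returned dictionary, which maps each file name to its contents as bytes. A minimal sketch under that assumption; the helper name unpack_uploaded and the destination path are hypothetical.

```python
import io
import os
import zipfile


def unpack_uploaded(uploaded, dest_dir):
    """Unpack the dict returned by files.upload() (name -> bytes).

    Zip archives are extracted into dest_dir with their subfolders;
    any other file is written to dest_dir unchanged.
    Returns the number of uploaded entries processed.
    """
    os.makedirs(dest_dir, exist_ok=True)
    for name, data in uploaded.items():
        if name.lower().endswith(".zip"):
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                archive.extractall(dest_dir)
        else:
            with open(os.path.join(dest_dir, name), "wb") as fh:
                fh.write(data)
    return len(uploaded)


# Hypothetical usage in a Colab cell:
# from google.colab import files
# uploaded = files.upload()
# unpack_uploaded(uploaded, "/content/dataset")
```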

Method 3

If your data comes from Kaggle, you can use the Kaggle API to download it to a Colab local directory.
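
A rough sketch of what this could look like, assuming the kaggle package is installed in the runtime and that you have an API token from your Kaggle account page. The helper names, the placeholder credentials, and the dataset slug "owner/dataset-name" are all hypothetical.

```python
import json
import os


def write_kaggle_credentials(username, key, config_dir="~/.kaggle"):
    """Write kaggle.json to the directory the Kaggle CLI reads by default."""
    config_dir = os.path.expanduser(config_dir)
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, "kaggle.json")
    with open(path, "w") as fh:
        json.dump({"username": username, "key": key}, fh)
    os.chmod(path, 0o600)  # restrict access; the CLI warns otherwise
    return path


def download_command(dataset, dest_dir):
    """Build the CLI call; `dataset` has the form "owner/dataset-name"."""
    return ["kaggle", "datasets", "download", "-d", dataset,
            "-p", dest_dir, "--unzip"]


# Hypothetical usage in a Colab cell (placeholder values):
# import subprocess
# write_kaggle_credentials("your_username", "your_api_key")
# subprocess.run(download_command("owner/dataset-name", "/content/data"), check=True)
```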

Method 4

Upload the data to Google Drive, using either 1) the Google Drive web interface or 2) the Drive API (https://developers.google.com/drive/api/v3/quickstart/python). To access the Drive data, use the following code in Colab.

from google.colab import drive
drive.mount('/content/drive')

I would recommend uploading data to Google Drive because it persists after the Colab runtime is gone.
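
Once Drive is mounted, the dataset's subfolders behave like ordinary directories, so they can be walked with the standard library. A small sketch for checking that every subfolder is visible; the function name count_images, the extension list, and the Drive path in the usage comment are assumptions for illustration.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".gif")


def count_images(root):
    """Return {subfolder path relative to root: number of image files}."""
    counts = {}
    for folder, _subdirs, filenames in os.walk(root):
        n = sum(1 for f in filenames if f.lower().endswith(IMAGE_EXTENSIONS))
        if n:
            counts[os.path.relpath(folder, root)] = n
    return counts


# Hypothetical usage after drive.mount('/content/drive'):
# print(count_images('/content/drive/MyDrive/dataset'))
```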




Answer 2:


You need to copy your dataset into Google Drive, then obtain the DATA_FOLDER_ID. The best way to do so is to open the folder in Google Drive and copy the last part of the URL. For example, the folder ID for the link:

https://drive.google.com/drive/folders/xxxxxxxxxxxxxxxxxxxxxxxx is xxxxxxxxxxxxxxxxxxxxxxxx

Then you can create local folders and download each file from Drive into them recursively.

DATA_FOLDER_ID = 'xxxxxxxxxxxxxxxxxxxxxxxx'
ROOT_PATH = '~/your_path'
!pip install -U -q PyDrive
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Choose a local (Colab) directory to store the data.
local_root_path = os.path.expanduser(ROOT_PATH)
os.makedirs(local_root_path, exist_ok=True)  # no error if it already exists

def ListFolder(google_drive_id, destination):
  """Recursively download the contents of a Drive folder into destination."""
  file_list = drive.ListFile({'q': "'%s' in parents and trashed=false" % google_drive_id}).GetList()
  counter = 0
  for f in file_list:
    # If it is a folder, create the local directory and download its contents.
    if f['mimeType'] == 'application/vnd.google-apps.folder':
      folder_path = os.path.join(destination, f['title'])
      os.makedirs(folder_path, exist_ok=True)
      print('creating directory {}'.format(folder_path))
      ListFolder(f['id'], folder_path)
    else:
      # Otherwise it is a regular file: download it to the local path.
      fname = os.path.join(destination, f['title'])
      f_ = drive.CreateFile({'id': f['id']})
      f_.GetContentFile(fname)
      counter += 1
  print('{} files were downloaded to {}'.format(counter, destination))

ListFolder(DATA_FOLDER_ID, local_root_path)


Source: https://stackoverflow.com/questions/50525568/how-to-upload-dataset-in-google-colaboratory
