Question
I need to upload a dataset of images to Google Colaboratory. The dataset has subfolders that contain the images. Everything I found online was for uploading a single file:
from google.colab import files
uploaded = files.upload()
Is there any way to do it?
Answer 1:
There are four ways to get data into Colab.
Method 1
You can upload a file directly through the Colab UI (the Files panel in the left sidebar). To upload a directory, compress it into a zip archive first.
The data is saved on the Colab VM's local disk. In my experience this method has three properties:
1) The upload speed is good.
2) The directory structure is preserved inside the zip archive, but the archive is not unzipped automatically. To extract it, run this in a Colab cell:
!mkdir -p {dir_name}
!unzip {zip_file} -d {dir_name}
3) Most importantly, the data is deleted when the Colab runtime crashes or is reset.
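The same extraction can also be done from Python with the standard-library zipfile module instead of shell commands. This is a minimal sketch; zip_file and dir_name play the same role as the placeholders in the commands above, and extract_archive is a hypothetical helper name.

```python
import os
import zipfile


def extract_archive(zip_file: str, dir_name: str) -> list[str]:
    """Extract zip_file into dir_name, keeping the archive's folder layout.

    Returns the sorted list of extracted file paths, relative to dir_name.
    """
    # Equivalent of `mkdir -p {dir_name}`; harmless if it already exists.
    os.makedirs(dir_name, exist_ok=True)
    with zipfile.ZipFile(zip_file) as zf:
        # Equivalent of `unzip {zip_file} -d {dir_name}`.
        zf.extractall(dir_name)
        # Directory entries in a zip end with '/'; report files only.
        names = [n for n in zf.namelist() if not n.endswith('/')]
    return sorted(names)
```

For example, extract_archive('dataset.zip', 'dataset') would recreate the subfolders of a hypothetical dataset.zip under ./dataset and return the image paths it wrote.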
Method 2
Run the following code in a Colab cell:
from google.colab import files
uploaded = files.upload()
In my experience, running the cell displays an upload button; while the cell is still executing, click it and choose a file. You can confirm the upload in three ways: 1) after execution, the file name appears in the output panel; 2) after refreshing the Colab Files panel, the file appears there; 3) running !ls lists the file. If you do not see the file, the upload did not succeed.
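files.upload() returns a dict that maps each uploaded file name to its contents as bytes. Since the question is about a folder with subfolders, one option (an assumption, not part of the answer above) is to zip the folder on your own machine, upload that single archive, and extract it straight from memory. extract_uploaded_zips is a hypothetical helper written for this sketch.

```python
import io
import os
import zipfile


def extract_uploaded_zips(uploaded: dict, dest: str) -> list[str]:
    """Extract every .zip in the dict returned by files.upload() into dest.

    Non-zip uploads are written to dest unchanged. Returns the sorted list
    of paths (relative to dest) that were written.
    """
    os.makedirs(dest, exist_ok=True)
    written = []
    for name, data in uploaded.items():
        if name.lower().endswith('.zip'):
            # Read the archive from memory; its folder layout is preserved.
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                zf.extractall(dest)
                written.extend(n for n in zf.namelist() if not n.endswith('/'))
        else:
            # A plain file: save its bytes as-is.
            with open(os.path.join(dest, name), 'wb') as fh:
                fh.write(data)
            written.append(name)
    return sorted(written)
```

In Colab this would be called as extract_uploaded_zips(files.upload(), '/content/dataset'), where '/content/dataset' is a placeholder destination.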
Method 3
If your data comes from Kaggle, you can use the Kaggle API to download it to a local directory on the Colab VM.
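The Kaggle API client reads an API token from ~/.kaggle/kaggle.json (the file you download from your Kaggle account settings) and warns if that file is readable by other users. A minimal sketch of putting the token in place, assuming you have already uploaded kaggle.json to the VM (for example with files.upload()); install_kaggle_token is a hypothetical helper.

```python
import os
import shutil
import stat


def install_kaggle_token(token_path: str, config_dir: str = '~/.kaggle') -> str:
    """Copy kaggle.json into the Kaggle config directory with 600 permissions.

    token_path is wherever the token was uploaded. Returns the installed path.
    """
    config_dir = os.path.expanduser(config_dir)
    os.makedirs(config_dir, exist_ok=True)
    target = os.path.join(config_dir, 'kaggle.json')
    shutil.copyfile(token_path, target)
    # Owner read/write only, the equivalent of `chmod 600`.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

After that, a cell of the form !kaggle datasets download -d <owner>/<dataset> -p /content/data --unzip should download and extract a dataset; <owner>/<dataset> and /content/data are placeholders, and it is worth checking kaggle datasets download --help for the exact flags in your installed version.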
Method 4
Upload the data to Google Drive using either 1) the Google Drive web interface or 2) the Drive API (https://developers.google.com/drive/api/v3/quickstart/python). To access the Drive data from Colab, run:
from google.colab import drive
drive.mount('/content/drive')
I recommend uploading data to Google Drive because, unlike the VM's local disk, it persists across Colab sessions.
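Reading many small image files directly from the mounted Drive is often slower than reading from the VM's local disk, so a common pattern (an assumption, not part of the answer above) is to copy the dataset folder to local disk once per session. copy_from_drive is a hypothetical helper; shutil.copytree copies the whole tree, including subfolders.

```python
import os
import shutil


def copy_from_drive(src: str, dst: str) -> int:
    """Copy a dataset folder (with all subfolders) from mounted Drive to local disk.

    Returns the number of files under dst after the copy.
    """
    # dirs_exist_ok=True (Python 3.8+) lets the copy be re-run into an existing dst.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    # Count every file in every subfolder of the local copy.
    return sum(len(files) for _, _, files in os.walk(dst))
```

After drive.mount('/content/drive'), a call such as copy_from_drive('/content/drive/MyDrive/my_dataset', '/content/my_dataset') would copy the whole tree; both paths are placeholders, and the MyDrive folder name depends on how Colab mounts your Drive.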
Answer 2:
First copy your dataset into Google Drive, then obtain its DATA_FOLDER_ID. The easiest way is to open the folder in Google Drive and copy the last part of the URL. For example, the folder ID for the link
https://drive.google.com/drive/folders/xxxxxxxxxxxxxxxxxxxxxxxx
is xxxxxxxxxxxxxxxxxxxxxxxx
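Extracting the ID can also be done in code. This is a minimal sketch that assumes folder links have the /folders/<id> shape shown above, possibly followed by a query string such as ?usp=sharing; folder_id_from_url is a hypothetical helper.

```python
from urllib.parse import urlparse


def folder_id_from_url(url: str) -> str:
    """Return the folder ID (last path segment) of a Google Drive folder URL.

    Query strings such as '?usp=sharing' and a trailing '/' are ignored.
    """
    path = urlparse(url).path.rstrip('/')
    if '/folders/' not in path:
        raise ValueError('not a Drive folder URL: %r' % url)
    return path.rsplit('/', 1)[-1]
```

For the example link above, folder_id_from_url returns 'xxxxxxxxxxxxxxxxxxxxxxxx'.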
Then you can recreate the folder structure locally and copy each file into it recursively.
DATA_FOLDER_ID = 'xxxxxxxxxxxxxxxxxxxxxxxx'
ROOT_PATH = '~/your_path'

!pip install -U -q PyDrive
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# 2. Choose a local (Colab) directory to store the data.
local_root_path = os.path.expanduser(ROOT_PATH)
os.makedirs(local_root_path, exist_ok=True)

# 3. Recursively copy the Drive folder google_drive_id into destination.
def ListFolder(google_drive_id, destination):
    file_list = drive.ListFile({'q': "'%s' in parents and trashed=false" % google_drive_id}).GetList()
    counter = 0
    for f in file_list:
        # If it is a folder, create it locally and copy the files inside it
        if f['mimeType'] == 'application/vnd.google-apps.folder':
            folder_path = os.path.join(destination, f['title'])
            os.makedirs(folder_path, exist_ok=True)
            print('creating directory {}'.format(folder_path))
            ListFolder(f['id'], folder_path)
        else:
            # Otherwise download the file into the current destination
            fname = os.path.join(destination, f['title'])
            f_ = drive.CreateFile({'id': f['id']})
            f_.GetContentFile(fname)
            counter += 1
    print('{} files were downloaded to {}'.format(counter, destination))

ListFolder(DATA_FOLDER_ID, local_root_path)
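To check the result, you can count the files in each local directory and compare the numbers with the per-folder counts ListFolder prints. This is a minimal sketch using only the standard library; files_per_directory is a hypothetical helper, and like ListFolder's counter it counts only the files directly inside each directory.

```python
import os


def files_per_directory(root: str) -> dict:
    """Map each directory under root (relative path, '.' for root) to its file count."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        # Files directly inside dirpath, not in its subfolders.
        counts[os.path.relpath(dirpath, root)] = len(filenames)
    return counts
```

For example, files_per_directory(local_root_path) returns one entry per subfolder of the downloaded dataset.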
Source: https://stackoverflow.com/questions/50525568/how-to-upload-dataset-in-google-colaboratory