How to Upload Many Files to Google Colab?

☆樱花仙子☆ submitted on 2019-11-30 09:36:21

You can put all your data into your Google Drive and then mount the drive. This is how I did it; here are the steps.

Step 1: Transfer your data into your Google Drive.

Step 2: Run the following code to mount your Google Drive.

# Install a Drive FUSE wrapper.
# https://github.com/astrada/google-drive-ocamlfuse
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

# Generate auth tokens for Colab.
from google.colab import auth
auth.authenticate_user()

# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
# Print the authorization URL, then paste the verification code from that page.
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

# Create a directory named Drive and mount Google Drive on it.
!mkdir -p Drive
!google-drive-ocamlfuse Drive

!ls Drive/

# Create a file in Drive.
!echo "This newly created file will appear in your Drive file list." > Drive/created.txt

Step 3: Run the following line to check that your data is visible in the mounted drive.

!ls Drive
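If you expect a specific set of files, a short Python check can confirm that all of them reached the mounted folder instead of scanning the `ls` output by eye. This is a minimal sketch: `missing_files` is a hypothetical helper (not part of any Colab API), and the folder name `Drive` and the file names are taken from this answer's example.

```python
from pathlib import Path


def missing_files(folder, expected):
    """Return the expected file names that are NOT present in `folder`.

    `missing_files` is a hypothetical helper written for this example.
    """
    # Collect the names of regular files directly inside `folder`.
    present = {p.name for p in Path(folder).iterdir() if p.is_file()}
    # Report missing names in sorted order for a stable result.
    return sorted(name for name in expected if name not in present)


# Usage (assumes Drive was mounted on ./Drive as in Step 2):
# print(missing_files("Drive", ["train.xlsx", "test.xlsx", "cv.xlsx"]))
```

An empty list means every expected file is there.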

Step 4: Now load your data. My training, cross-validation (cv), and test data were in Excel files, so I loaded them into pandas DataFrames as follows:

import pandas as pd

train_data = pd.read_excel(r'Drive/train.xlsx')
test = pd.read_excel(r'Drive/test.xlsx')
cv = pd.read_excel(r'Drive/cv.xlsx')
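With many files, one loop can replace the repeated `read_excel` lines. The sketch below assumes the same `Drive` folder and `<split>.xlsx` naming used above; `load_splits` is a hypothetical helper, and the `reader` parameter is an illustrative addition so that other formats (for example `pandas.read_csv`) can be plugged in.

```python
from pathlib import Path


def load_splits(folder, names=("train", "test", "cv"), ext=".xlsx", reader=None):
    """Read one file per split name from `folder` into a dict keyed by split.

    `load_splits` is a hypothetical helper. By default it uses
    pandas.read_excel, as in the answer above; pass another callable
    (e.g. pandas.read_csv) as `reader` for other formats.
    """
    if reader is None:
        import pandas as pd  # imported lazily so custom readers don't need pandas
        reader = pd.read_excel
    # Build e.g. Drive/train.xlsx, Drive/test.xlsx, Drive/cv.xlsx and read each one.
    return {name: reader(Path(folder) / f"{name}{ext}") for name in names}


# Usage (assumes the mounted Drive folder from Step 2):
# data = load_splits("Drive")
# train_data, test, cv = data["train"], data["test"], data["cv"]
```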

I hope this helps.

Edit

To save data from the Colab notebook environment into your Drive, you can run the following code.

# Install the PyDrive wrapper & import libraries.
# This only needs to be done once in a notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
# This only needs to be done once in a notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Create & upload a file; 'title' sets the file name shown in Drive.
uploaded = drive.CreateFile({'title': 'data.xlsx'})
uploaded.SetContentFile('data.xlsx')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))
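To save many files rather than one, the same `CreateFile` / `SetContentFile` / `Upload` calls can run in a loop. A minimal sketch, assuming `drive` is the authenticated PyDrive client created above; `upload_folder` is a hypothetical helper, and the folder name in the usage line is made up.

```python
from pathlib import Path


def upload_folder(drive, folder, pattern="*"):
    """Upload every file in `folder` matching `pattern`; return {name: file_id}.

    `drive` is assumed to be the authenticated pydrive GoogleDrive client;
    `upload_folder` itself is a hypothetical helper.
    """
    ids = {}
    for path in sorted(Path(folder).glob(pattern)):
        if not path.is_file():
            continue  # skip sub-directories that happen to match the pattern
        f = drive.CreateFile({'title': path.name})  # Drive metadata: file name
        f.SetContentFile(str(path))                 # local file to send
        f.Upload()
        ids[path.name] = f.get('id')                # ID assigned by Drive
    return ids


# Usage in Colab ("results" is a hypothetical local folder):
# print(upload_folder(drive, "results", "*.xlsx"))
```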

Here are a few steps to upload a large dataset to Google Colab:

1. Upload your dataset to a free cloud storage service such as Dropbox or openload (I used Dropbox).
2. Create a shareable link to your uploaded file and copy it.
3. Open your notebook in Google Colab and run this command in one of the cells:

    !wget your_shareable_file_link
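A Dropbox share link usually ends in `?dl=0`, which serves a preview web page, so `wget` may save HTML instead of your dataset; setting `dl=1` asks Dropbox for the file itself. A small sketch of that rewrite; `dropbox_direct_link` is a hypothetical helper and the URLs in the comments are made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dropbox_direct_link(share_url):
    """Return `share_url` with dl=1 so Dropbox sends the file, not a preview page.

    `dropbox_direct_link` is a hypothetical helper; any other query
    parameters (such as rlkey) are kept in their original order.
    """
    parts = urlsplit(share_url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"  # force a direct download
    return urlunsplit(parts._replace(query=urlencode(query)))


# Example with a made-up link:
# dropbox_direct_link("https://www.dropbox.com/s/abc123/data.zip?dl=0")
```

In a Colab cell you could then pass the result to wget, e.g. `!wget -O data.zip "{url}"`, where `url` holds the returned link and `data.zip` is an assumed output name.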

That's it!
You can also compress your dataset into a zip file and extract it in Google Colab after downloading, using this command (unzip only reads zip archives; a rar file needs a tool such as unrar):

    !unzip downloaded_filename -d destination_folder
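If you would rather stay in Python, the standard-library `zipfile` module does the same job as the `unzip ... -d ...` command. A minimal sketch; `extract_zip` is a hypothetical helper, and `downloaded_filename` / `destination_folder` are the same placeholders as in the command above.

```python
import zipfile


def extract_zip(archive, destination):
    """Extract every member of `archive` into `destination` and return the member names.

    `extract_zip` is a hypothetical helper; extractall creates
    `destination` and any sub-folders it needs.
    """
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
        return zf.namelist()


# Usage with the placeholders from the command above:
# extract_zip("downloaded_filename", "destination_folder")
```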
Deepak Ravi

Zip your file first, then upload it to Google Drive.
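To create the archive from Python on your own machine before uploading, the standard-library `shutil.make_archive` is one option. A minimal sketch; `zip_folder` is a hypothetical helper, and the folder name `models` in the usage line is only assumed to match the example below.

```python
import shutil
from pathlib import Path


def zip_folder(folder, archive_base=None):
    """Zip `folder` into `<archive_base>.zip` and return the archive path.

    `zip_folder` is a hypothetical helper. The archive stores the folder
    itself (entries start with `<folder name>/`), so unzipping recreates it.
    By default the archive is written next to the folder.
    """
    folder = Path(folder).resolve()
    base = archive_base if archive_base is not None else str(folder)
    return shutil.make_archive(base, "zip", root_dir=folder.parent, base_dir=folder.name)


# Usage ("models" is a hypothetical local folder):
# zip_folder("models")  # writes models.zip beside the folder
```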

Then unzip it with this simple command:

!unzip {file_location}

Example:

!unzip drive/models.zip

You may want to try the kaggle-cli module, as discussed here

Step 1: Mount the Drive by running the following command:

from google.colab import drive
drive.mount('/content/drive')

This will output a link. Click the link, allow access, copy the authorization code, and paste it into the box labeled "Enter your authorization code:" in the Colab cell. This step simply gives Colab permission to access your Google Drive.

Step 2: Upload your folder (zipped or unzipped, depending on its size) to Google Drive.

Step 3: Now navigate through the Drive directories to locate your uploaded folder or zip file.

This process might look like this. When you start, the current working directory in Colab is /content/. To confirm, run the following command in a cell:

!pwd

It shows the directory you are currently in (pwd stands for "print working directory"). Then use commands such as:

!ls

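to list the files and directories in the current directory, and:

%cd /directory/name/of/your/choice

to move into a directory and locate your uploaded folder or .zip file. Use the %cd magic rather than !cd: each ! command runs in its own subshell, so !cd does not change the notebook's working directory.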
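The same navigation can be done from Python's standard library, which also lets you search the whole mounted Drive for your upload instead of stepping through folders one by one. A sketch; `find_uploads` is a hypothetical helper, and `/content/drive` is assumed to be where `drive.mount('/content/drive')` put your Drive.

```python
import os
from pathlib import Path


def find_uploads(root, pattern="*.zip"):
    """Recursively list files under `root` whose names match `pattern`.

    `find_uploads` is a hypothetical helper; results are sorted path strings.
    """
    return sorted(str(p) for p in Path(root).rglob(pattern) if p.is_file())


print(os.getcwd())      # like !pwd
print(os.listdir("."))  # like !ls
# os.chdir(path) changes the working directory for the whole notebook, like %cd.

# Usage in Colab (assumes Drive is mounted at /content/drive):
# print(find_uploads("/content/drive"))
```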

And just like that, you are ready to get your hands dirty with your Machine Learning model! :)

Hopefully, these simple steps will keep you from spending unnecessary time figuring out how Colab works, when you should be spending most of your time on the machine learning model, its hyperparameters, preprocessing, and so on.
