Can't load my dataset to train my model on Google Colab

Submitted by 做~自己de王妃 on 2020-06-09 05:23:28

Question


I am currently facing a problem with a large dataset: I cannot download it directly into Google Colab because of the limited disk space Colab provides (37 GB in my case). From the research I have done, the available space seems to depend on the GPU you get assigned, so some people may get more. So my question is: can I download the dataset onto a server such as Google Cloud and then load it from the server? The dataset is roughly 20 GB. The reason 37 GB is not enough is that when you download a zip file, extracting it requires an additional 20 GB; but if I download and extract the file on the server, I would only use 20 GB on Google Colab. Any other suggestion is welcome; my end goal is to find a solution for training a model on the COCO dataset.
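For reference, the zip-plus-extraction doubling described above can be sidestepped by streaming the archive straight into an extractor, so the zip itself is never written to disk. A minimal sketch for a Colab cell (the URL is the official COCO 2017 training-images archive; bsdtar, from libarchive, can unpack a zip read from stdin, which the stock unzip tool cannot):

# Stream-extract so the 20 GB zip never touches the Colab disk;
# only the extracted images are stored.
!apt-get -qq install libarchive-tools
!curl -sL http://images.cocodataset.org/zips/train2017.zip | bsdtar -xf - -C /content/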


Answer 1:


One more approach could be uploading just the annotations file to Google Colab; there is no need to download the image dataset at all. We will make use of the PyCoco API (pycocotools). Then, when preparing an image, instead of accessing the image file from Drive or a local folder, you can read it directly from its URL!

# The normal method: read the image from a local folder / Drive
from skimage import io   # scikit-image; dataDir, dataType and img come from the COCO API
I = io.imread('%s/images/%s/%s' % (dataDir, dataType, img['file_name']))

# Instead, use this! Read the image directly from its COCO URL
I = io.imread(img['coco_url'])
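To get the annotations onto the Colab disk in the first place, one option is to fetch just the annotations archive (this is the official COCO 2017 annotations URL; the archive is roughly 250 MB compressed, versus about 20 GB for the images):

# Download and unpack only the annotations; the images stay on the COCO servers
!wget -q http://images.cocodataset.org/annotations/annotations_trainval2017.zip
!unzip -q annotations_trainval2017.zip   # creates an 'annotations/' folder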

This method will save you plenty of space, download time, and effort. However, you will need a working internet connection during training to fetch the images (which, of course, you have, since you are using Colab).
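Putting the pieces together, a minimal end-to-end sketch, assuming pycocotools is installed (pip install pycocotools if it is not already available) and that the annotation path below (hypothetical) matches where you unpacked the archive:

# Load annotations locally, then fetch the pixels over HTTP
from pycocotools.coco import COCO
from skimage import io

annFile = 'annotations/instances_train2017.json'   # hypothetical path
coco = COCO(annFile)

# As an example query, pick an image that contains a 'person'
catIds = coco.getCatIds(catNms=['person'])
imgIds = coco.getImgIds(catIds=catIds)
img = coco.loadImgs(imgIds[0])[0]

# No local image files needed: read straight from the image's COCO URL
I = io.imread(img['coco_url'])
print(I.shape)   # e.g. (480, 640, 3) for an RGB image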

If you are interested in exploring the COCO dataset further, you can have a look at my post on Medium.



Source: https://stackoverflow.com/questions/61237760/cant-load-load-my-dataset-to-train-my-model-on-google-colab
