How to speed up the “ImageFolder” for ImageNet

£可爱£侵袭症+ 提交于 2021-01-27 17:12:10

问题


I am in an university, and all the file system are in a remote system, wherever I log in with my account, I could aways access my home directory. even though I log into the GPU servers through SSH command. This is the condition where I employ the GPU servers to read data.

Currently, I use the PyTorch to train ResNet from scratch on ImageNet, my codes only use all the GPUs in the same computer, I found that the "torchvision.datasets.ImageFolder" will take almost two hours.

Would you please provide some experiences in how to speed up "torchvision.datasets.ImageFolder"? Thanks very much.


回答1:


Why it takes so long?
Setting up an ImageFolder can take a long time, especially when the images are stored on a slow remote disk. The reason for this latency is that the __init__ function for the dataset goes over all files in the image folders and check whether this file is an image file. For ImageNet that can take quite a while as there are over 1 million files to check.

What can you do?
- As Kevin Sun already pointed out, copying the dataset to a local (and possibly much faster) storage can significantly speed up things.
- Alternatively, you can create a modified dataset class that does not read all the files, but relies on a cached list of files - a cached list that you prepare only once in advance and to be used for all runs.



来源:https://stackoverflow.com/questions/54207204/how-to-speed-up-the-imagefolder-for-imagenet

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!