Most efficient way to use a large data set for PyTorch?

爱一瞬间的悲伤  2021-02-09 08:18

Perhaps this question has been asked before, but I'm having trouble finding relevant info for my situation.

I'm using PyTorch to create a CNN for regression with image data.

3 Answers
  •  花落未央
    2021-02-09 08:51

    Here is a concrete example to demonstrate what I meant. This assumes that you've already dumped the images into an hdf5 file (train_images.hdf5) using h5py.

    import h5py
    hf = h5py.File('train_images.hdf5', 'r')
    
    group_key = list(hf.keys())[0]
    ds = hf[group_key]
    
    # load only one example
    x = ds[0]
    
    # load a slice of n examples (n is a placeholder for however many you want)
    arr = ds[:n]
    
    # this loads the entire dataset into memory and should be avoided
    arr = ds[:]
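
    If train_images.hdf5 doesn't exist yet, the dumping step mentioned above can be sketched roughly as follows. This is only an illustration: the in-memory array `images` and the dataset key 'images' are assumptions, so adapt them to however your raw data is actually loaded.

    import h5py
    import numpy as np
    
    # hypothetical: raw images already loaded as an (N, H, W, C) float32 array
    images = np.random.rand(1000, 64, 64, 3).astype('float32')
    
    with h5py.File('train_images.hdf5', 'w') as hf:
        # write the array to disk as a single dataset named 'images'
        hf.create_dataset('images', data=images)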
    

    In simple terms, ds can now be used as an iterator that yields images on the fly, so it never holds the full dataset in memory, only the example currently being read. This keeps memory usage and start-up time low even for very large datasets.

    for idx, img in enumerate(ds):
        # do something with `img`; for example, check its shape
        print(idx, img.shape)
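
    To plug this into a PyTorch training loop, one option is to wrap the hdf5 file in a torch.utils.data.Dataset and let a DataLoader handle batching and shuffling. The sketch below is just one way to do it; the class name, the dataset key 'images', and the loader settings are assumptions for illustration. Opening the file lazily inside __getitem__ avoids sharing a single h5py handle across DataLoader worker processes.

    import h5py
    import torch
    from torch.utils.data import Dataset, DataLoader
    
    class HDF5ImageDataset(Dataset):
        def __init__(self, path, key='images'):
            self.path = path
            self.key = key
            self.hf = None
            # read only the length up front; the images themselves stay on disk
            with h5py.File(path, 'r') as hf:
                self.length = len(hf[key])
    
        def __len__(self):
            return self.length
    
        def __getitem__(self, idx):
            # open the file lazily so each worker process gets its own handle
            if self.hf is None:
                self.hf = h5py.File(self.path, 'r')
            img = self.hf[self.key][idx]  # reads just this one image from disk
            return torch.from_numpy(img)
    
    loader = DataLoader(HDF5ImageDataset('train_images.hdf5'),
                        batch_size=32, shuffle=True, num_workers=2)
    
    for batch in loader:
        # batch is a tensor of shape (batch_size, H, W, C) here
        pass

    Whether num_workers actually helps depends on your storage and on the hdf5 chunk layout, so it's worth benchmarking with and without worker processes.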
    
