import pandas as pd
import numpy as np
import cv2
from torch.utils.data.dataset import Dataset
class CustomDatasetFromCSV(Dataset):
    def __init__(self, csv_path,
The PyTorch Subset class is what the random_split function returns: each split is a Subset that holds a list of indices into the original dataset. Note that the same index-based idea underlies SubsetRandomSampler.
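As a minimal sketch of what random_split hands back, the snippet below uses a tiny TensorDataset as a stand-in (toy_ds and the 7/3 split are assumptions for illustration, not part of the original example). It only shows that each part is a Subset pointing at the parent dataset through its indices.

```python
import torch
from torch.utils.data import Subset, TensorDataset, random_split

# Toy stand-in dataset (an assumption for illustration): 10 samples.
toy_ds = TensorDataset(torch.arange(10))

# random_split returns one Subset per requested length.
part_a, part_b = random_split(toy_ds, [7, 3])

print(isinstance(part_a, Subset))   # each part is a Subset
print(part_a.dataset is toy_ds)     # it refers to the parent dataset, it does not copy it
print(len(part_a), len(part_b))

# Together the two index lists cover every sample exactly once.
all_indices = sorted(int(i) for i in list(part_a.indices) + list(part_b.indices))
print(all_indices)
```

The int(...) conversion is there because, depending on the PyTorch version, indices may be a tensor or a plain list.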
For MNIST, if we use random_split:
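import torch
import torchvision
from torch.utils.data import DataLoader

loader = DataLoader(
    torchvision.datasets.MNIST('/data/mnist', train=True, download=True,
                               transform=torchvision.transforms.Compose([
                                   torchvision.transforms.ToTensor(),
                                   torchvision.transforms.Normalize(
                                       (0.5,), (0.5,))
                               ])),
    batch_size=16, shuffle=False)

print(loader.dataset.data.shape)
# Split the 60000 training samples into random subsets of 50000 and 10000.
test_ds, valid_ds = torch.utils.data.random_split(loader.dataset, (50000, 10000))
print(test_ds, valid_ds)
print(test_ds.indices, valid_ds.indices)
# .indices was a tensor in the PyTorch version used here; in newer versions
# it may be a plain list, in which case use len(...) instead of .shape.
print(test_ds.indices.shape, valid_ds.indices.shape)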
We get:
torch.Size([60000, 28, 28])
tensor([ 1520, 4155, 45472, ..., 37969, 45782, 34080]) tensor([ 9133, 51600, 22067, ..., 3950, 37306, 31400])
torch.Size([50000]) torch.Size([10000])
Our test_ds.indices and valid_ds.indices will be drawn at random from the range (0, 59999). But if I would like to get sequential indices, (0, 49999) for one split and (50000, 59999) for the other, random_split cannot do that at the moment, unfortunately. The only way is to use the Subset class directly with explicit index ranges.
This is handy when you run the MNIST benchmark, where it is predefined which samples form the test set and which form the validation set.
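The sequential split described above can be sketched by building Subset objects by hand. To avoid a download, full_ds below is a hypothetical 60000-sample TensorDataset standing in for the MNIST training set; with the real dataset you would pass loader.dataset instead.

```python
import torch
from torch.utils.data import Subset, TensorDataset

# Stand-in for the 60000-sample MNIST training set (an assumption, avoids a download).
full_ds = TensorDataset(torch.arange(60000))

# Explicit, sequential index ranges instead of a random permutation.
test_ds = Subset(full_ds, range(0, 50000))
valid_ds = Subset(full_ds, range(50000, 60000))

print(len(test_ds), len(valid_ds))                # 50000 10000
print(test_ds.indices[0], test_ds.indices[-1])    # 0 49999
print(valid_ds.indices[0], valid_ds.indices[-1])  # 50000 59999
```

Subset only stores the indices and a reference to the parent dataset, so the two splits share the underlying data and can each be wrapped in their own DataLoader.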