How do I split a custom dataset into training and test datasets?

前端未结

关注

 5  1299

遇见更好的自我 2020-12-07 09:50

import pandas as pd
import numpy as np
import cv2
from torch.utils.data.dataset import Dataset

class CustomDatasetFromCSV(Dataset):
    def __init__(self, csv_path,


      
      
        
          5条回答        

        
                    
            
            
                         
                
              
              
                
                   [愿得一人]
                                             
                
                
                (楼主)
            
              
              
                2020-12-07 10:30
              

            
            
                        
This is the PyTorch Subset class attached holding the random_split method. Note that this method is base for the SubsetRandomSampler.



For MNIST if we use random_split:

loader = DataLoader(
  torchvision.datasets.MNIST('/data/mnist', train=True, download=True,
                             transform=torchvision.transforms.Compose([
                               torchvision.transforms.ToTensor(),
                               torchvision.transforms.Normalize(
                                 (0.5,), (0.5,))
                             ])),
  batch_size=16, shuffle=False)

print(loader.dataset.data.shape)
test_ds, valid_ds = torch.utils.data.random_split(loader.dataset, (50000, 10000))
print(test_ds, valid_ds)
print(test_ds.indices, valid_ds.indices)
print(test_ds.indices.shape, valid_ds.indices.shape)


We get:

torch.Size([60000, 28, 28])
 
tensor([ 1520,  4155, 45472,  ..., 37969, 45782, 34080]) tensor([ 9133, 51600, 22067,  ...,  3950, 37306, 31400])
torch.Size([50000]) torch.Size([10000])


Our test_ds.indices and valid_ds.indices will be random from range (0, 600000). But if I would like to get sequence of indices from (0, 49999) and from (50000, 59999) I cannot do that at the moment unfortunately, except this way. 

Handy in case you run the MNIST benchmark where it is predefined what should be the test and what should be the validation dataset.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它5个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复