Separate pandas dataframe using sklearn's KFold

前端 未结 1 2021
萌比男神i
萌比男神i 2020-12-08 11:37

I had obtained the index of training set and testing set with code below.

df = pandas.read_pickle(filepath + filename)
kf = KFold(n_splits = n_splits, shuffl         


        
相关标签:
1条回答
  • 2020-12-08 12:01

    You need DataFrame.iloc for select rows by positions:

    Sample:

    np.random.seed(100)
    df = pd.DataFrame(np.random.random((10,5)), columns=list('ABCDE'))
    df.index = df.index * 10
    print (df)
               A         B         C         D         E
    0   0.543405  0.278369  0.424518  0.844776  0.004719
    10  0.121569  0.670749  0.825853  0.136707  0.575093
    20  0.891322  0.209202  0.185328  0.108377  0.219697
    30  0.978624  0.811683  0.171941  0.816225  0.274074
    40  0.431704  0.940030  0.817649  0.336112  0.175410
    50  0.372832  0.005689  0.252426  0.795663  0.015255
    60  0.598843  0.603805  0.105148  0.381943  0.036476
    70  0.890412  0.980921  0.059942  0.890546  0.576901
    80  0.742480  0.630184  0.581842  0.020439  0.210027
    90  0.544685  0.769115  0.250695  0.285896  0.852395
    

    from sklearn.model_selection import KFold
    
    #added some parameters
    kf = KFold(n_splits = 5, shuffle = True, random_state = 2)
    result = next(kf.split(df), None)
    print (result)
    (array([0, 2, 3, 5, 6, 7, 8, 9]), array([1, 4]))
    
    train = df.iloc[result[0]]
    test =  df.iloc[result[1]]
    
    print (train)
               A         B         C         D         E
    0   0.543405  0.278369  0.424518  0.844776  0.004719
    20  0.891322  0.209202  0.185328  0.108377  0.219697
    30  0.978624  0.811683  0.171941  0.816225  0.274074
    50  0.372832  0.005689  0.252426  0.795663  0.015255
    60  0.598843  0.603805  0.105148  0.381943  0.036476
    70  0.890412  0.980921  0.059942  0.890546  0.576901
    80  0.742480  0.630184  0.581842  0.020439  0.210027
    90  0.544685  0.769115  0.250695  0.285896  0.852395
    
    print (test)
               A         B         C         D         E
    10  0.121569  0.670749  0.825853  0.136707  0.575093
    40  0.431704  0.940030  0.817649  0.336112  0.175410
    
    0 讨论(0)
提交回复
热议问题