Pandas .loc or .iloc to select columns from a dataset

暗喜 2020-12-19 15:08

I have been trying to select a particular set of columns from a dataset for all rows. I tried something like the following:

train_features = train_df.loc[,[0,4,5         
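The attempt above cannot run as written: Python does not allow an empty slot before the comma inside square brackets, and the "all rows" selector is written as `:`. Also, `.loc` selects by label, while `.iloc` selects by integer position. Below is a minimal sketch that assumes a small, hypothetical `train_df` with default integer column labels. Because the question's column list is cut off, it uses only the first three column numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's train_df: 3 rows, 6 integer-labelled columns
train_df = pd.DataFrame(np.arange(18).reshape(3, 6))

# Leaving the row slot empty is not valid Python syntax
try:
    compile("train_df.loc[,[0, 4, 5]]", "<example>", "eval")
    empty_slot_is_valid = True
except SyntaxError:
    empty_slot_is_valid = False

# ":" means "every row"; .iloc reads [0, 4, 5] as column positions
train_features = train_df.iloc[:, [0, 4, 5]]
print(train_features.shape)  # (3, 3)
```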


        
2 answers
  •  Happy的楠姐
    2020-12-19 15:14

    You can access the column values via the underlying NumPy array.

    Consider the DataFrame df:

    import numpy as np
    import pandas as pd
    df = pd.DataFrame(np.random.randint(10, size=(5, 20)))
    df
    

    You can slice the underlying array:

    slc = [0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
    df.values[:, slc]
    
    array([[1, 3, 9, 8, 3, 2, 1, 6, 6, 0, 3, 9, 8, 5, 9, 9],
           [8, 0, 2, 3, 7, 8, 9, 2, 7, 2, 1, 3, 2, 5, 4, 9],
           [1, 1, 9, 3, 5, 8, 8, 8, 8, 4, 8, 0, 5, 4, 9, 0],
           [6, 3, 1, 8, 0, 3, 7, 9, 9, 0, 9, 7, 6, 1, 4, 8],
           [3, 2, 3, 3, 9, 8, 3, 8, 3, 4, 1, 6, 4, 1, 6, 4]])
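The answer lists every index in slc by hand. A shortcut that is not part of the original answer is NumPy's `np.r_`, which builds the same list from a scalar followed by a slice. The sketch below assumes the same random 5×20 frame. It also shows that the slice is a plain ndarray with no row or column labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(5, 20)))

# np.r_ concatenates scalars and slices: 0, then 4 through 18
slc = np.r_[0, 4:19]
print(slc.tolist())  # [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

# Fancy indexing on the array keeps every row and the 16 chosen columns
arr = df.values[:, slc]
print(type(arr).__name__, arr.shape)  # ndarray (5, 16)
```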
    

    Or you can reconstruct a new DataFrame from this slice:

    pd.DataFrame(df.values[:, slc], df.index, df.columns[slc])
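As a quick sanity check, assuming the same all-integer frame, the rebuilt DataFrame should match positional selection with `.iloc`. This sketch compares the two with `pandas.testing.assert_frame_equal`, and it passes the index and columns by keyword for readability:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(5, 20)))
slc = [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

# Rebuild a labelled frame from the sliced array, reusing the original
# row index and the matching column labels
rebuilt = pd.DataFrame(df.values[:, slc], index=df.index, columns=df.columns[slc])

# For an all-integer frame this matches positional selection exactly
pd.testing.assert_frame_equal(rebuilt, df.iloc[:, slc])
```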
    

    This is not as clean or intuitive as positional selection with .iloc:

    df.iloc[:, slc]
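Here is one concrete reason the round trip through the array is less clean. `.values` has to produce a single NumPy dtype, so a frame with mixed column types becomes an `object` array, and the rebuilt frame loses the original dtypes. `.iloc`, by contrast, keeps each column's dtype. A minimal sketch with a hypothetical three-column frame:

```python
import pandas as pd

# Hypothetical frame mixing integer, float and string columns
df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5], "c": ["x", "y"]})
slc = [0, 2]

# .values needs one dtype for all columns, so it falls back to object;
# the rebuilt frame's columns therefore come back as object too
via_values = pd.DataFrame(df.values[:, slc], df.index, df.columns[slc])

# .iloc selects the columns themselves and keeps each column's dtype
via_iloc = df.iloc[:, slc]

print(df.values.dtype)         # object
print(via_iloc["a"].dtype)     # int64
print(via_values["a"].dtype)   # object
```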
    

    You could also use slc to slice the df.columns object and pass the resulting labels to df.loc:

    df.loc[:, df.columns[slc]]
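This matters when the column labels are not 0, 1, 2, and so on. With the default integer labels in the example above, labels and positions coincide, so `df.loc[:, slc]` would happen to work too. The sketch below assumes hypothetical string labels `col0` to `col19`:

```python
import numpy as np
import pandas as pd

# Hypothetical string column labels, so labels and positions no longer coincide
df = pd.DataFrame(np.random.randint(10, size=(5, 20)),
                  columns=[f"col{i}" for i in range(20)])
slc = [0, 4, 5]

# .loc is label-based: these integers are not labels of this frame
try:
    df.loc[:, slc]
    positions_work_with_loc = True
except KeyError:
    positions_work_with_loc = False

# Translating positions into labels first makes .loc pick the same columns as .iloc
by_label = df.loc[:, df.columns[slc]]
print(list(by_label.columns))  # ['col0', 'col4', 'col5']
```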
    
