How to convert a pandas DataFrame subset of columns AND rows into a numpy array?

野性不改 2020-12-04 10:31

I'm wondering if there is a simpler, memory-efficient way to select a subset of rows and columns from a pandas DataFrame.

For instance, given this dataframe:

3 Answers
  •  轻奢々
    2020-12-04 10:48

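    .loc accepts row and column selectors simultaneously (so does .iloc, which works on integer positions; the old .ix did too, but it was deprecated in pandas 0.20 and removed in 1.0). The selection is done in a single pass.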

    In [1]: import numpy as np
       ...: import pandas as pd
       ...: df = pd.DataFrame(np.random.rand(4, 5), columns=list('abcde'))
    
    In [2]: df
    Out[2]: 
              a         b         c         d         e
    0  0.669701  0.780497  0.955690  0.451573  0.232194
    1  0.952762  0.585579  0.890801  0.643251  0.556220
    2  0.900713  0.790938  0.952628  0.505775  0.582365
    3  0.994205  0.330560  0.286694  0.125061  0.575153
    
    In [5]: df.loc[df['c']>0.5,['a','d']]
    Out[5]: 
              a         d
    0  0.669701  0.451573
    1  0.952762  0.643251
    2  0.900713  0.505775
    

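    A runnable sketch of the same selection, assuming pandas 1.x or later. It contrasts the single .loc call with chained indexing (df[mask][cols]), which first builds an intermediate frame holding every column of the matching rows, and adds a positional .iloc version. The seed and the names mask, cols and positions are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative data; the seed is arbitrary so the result is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 5)), columns=list("abcde"))

mask = df["c"] > 0.5  # boolean Series aligned on df.index
cols = ["a", "d"]

# One .loc call: rows and columns are selected together.
single = df.loc[mask, cols]

# Chained indexing: df[mask] first copies all five columns of the
# matching rows, then [cols] builds a second frame from that copy.
chained = df[mask][cols]

# Positional equivalent: .iloc does not accept a boolean Series as a mask,
# so pass the underlying NumPy array and translate labels to positions.
positions = [df.columns.get_loc(c) for c in cols]
positional = df.iloc[mask.to_numpy(), positions]

# All three selections hold the same rows, columns and values.
pd.testing.assert_frame_equal(single, chained)
pd.testing.assert_frame_equal(single, positional)
```

    Both checks pass silently; the one-call form reaches the same result without materialising the intermediate all-columns frame.
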
    If you want a NumPy array, take .values (though the frame itself should pass directly to scikit-learn as is, since DataFrames support the array interface):

    In [6]: df.loc[df['c']>0.5,['a','d']].values
    Out[6]: 
    array([[ 0.66970138,  0.45157274],
           [ 0.95276167,  0.64325143],
           [ 0.90071271,  0.50577509]])
    

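    A sketch of the conversion step, assuming pandas 0.24 or later for DataFrame.to_numpy(), which the pandas documentation recommends over .values. Because boolean row selection returns a copy, the resulting array is not a view of df. The dtype argument converts in the same call; float32 here is only an example, useful if whatever consumes the array accepts it.

```python
import numpy as np
import pandas as pd

# Illustrative data; the seed is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 5)), columns=list("abcde"))

# Row mask and column list in a single .loc call.
sub = df.loc[df["c"] > 0.5, ["a", "d"]]

arr = sub.to_numpy()                    # 2-D float64 array, shape (n_rows, 2)
arr_values = sub.values                 # older spelling, same contents
arr_iface = np.asarray(sub)             # through the array interface
arr32 = sub.to_numpy(dtype=np.float32)  # converted copy, half the bytes

assert np.array_equal(arr, arr_values)
assert np.array_equal(arr, arr_iface)
```
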