How to select DataFrame columns based on partial matching?

暖寄归人 2020-12-14 22:48

I was struggling this afternoon to find a way of selecting a few columns of my pandas DataFrame by checking whether a certain pattern occurs in their name (label?).

3 answers
  • 2020-12-14 23:00

    I think df.keys().tolist() is the thing you're searching for.

    A tiny example:
    
    import pandas as pd
    
    d = pd.DataFrame({'somename': [1, 2, 3], 'othername': [4, 5, 6]})
    
    # Column labels as a plain Python list
    names = d.keys().tolist()
    
    for n in names:
        print(n)
        print(type(n))
    

    Output (Python 3):

    somename
    <class 'str'>
    othername
    <class 'str'>
    

    Then with the strings you got, you can do any string operation you want.
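
    As a rough illustration of "any string operation", the sketch below keeps only the labels that contain a substring and indexes the DataFrame with them. The extra `value` column and the substring `'name'` are made up for this example.

```python
import pandas as pd

# 'value' is a hypothetical extra column, added so the filter has something to drop
d = pd.DataFrame({'somename': [1, 2, 3], 'othername': [4, 5, 6], 'value': [7, 8, 9]})

# Keep only the labels that contain the substring 'name'
matching = [n for n in d.keys().tolist() if 'name' in n]

# Index the DataFrame with the resulting list of labels
subset = d[matching]
print(subset.columns.tolist())
```

    The same list comprehension works with `startswith`, `endswith`, or `re.search` in place of the `in` test.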

  • 2020-12-14 23:04

    Selecting columns by partial string match can be done simply with filter:

    df.filter(like='hello')  # select columns which contain the word hello
    

    And to select rows by partial string match, you can pass axis=0 to filter:

    df.filter(like='hello', axis=0) 
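
    A small self-contained sketch of both calls, plus the `regex` parameter of `DataFrame.filter` for anchored patterns. The column and index labels here are invented for illustration only.

```python
import pandas as pd

# Hypothetical data: labels chosen only to demonstrate the matching
df = pd.DataFrame(
    {'hello_a': [1, 2], 'b_hello': [3, 4], 'other': [5, 6]},
    index=['hello_row', 'plain_row'],
)

cols = df.filter(like='hello')          # columns whose label contains 'hello'
rows = df.filter(like='hello', axis=0)  # rows whose index label contains 'hello'
starts = df.filter(regex='^hello')      # columns whose label starts with 'hello'

print(cols.columns.tolist())
print(rows.index.tolist())
print(starts.columns.tolist())
```

    Note that `filter` matches on labels only; it never looks at the values in the cells.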
    
  • Your solution using map is very good. If you really want to use str.contains, it is possible to convert Index objects to Series (which have the str.contains method):

    In [1]: df
    Out[1]: 
       x  y  z
    0  0  0  0
    1  1  1  1
    2  2  2  2
    3  3  3  3
    4  4  4  4
    5  5  5  5
    6  6  6  6
    7  7  7  7
    8  8  8  8
    9  9  9  9
    
    In [2]: df.columns.to_series().str.contains('x')
    Out[2]: 
    x     True
    y    False
    z    False
    dtype: bool
    
    In [3]: df[df.columns[df.columns.to_series().str.contains('x')]]
    Out[3]: 
       x
    0  0
    1  1
    2  2
    3  3
    4  4
    5  5
    6  6
    7  7
    8  8
    9  9
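
    In reasonably recent pandas versions the Index itself exposes the `.str` accessor, so the `to_series()` step can probably be skipped. A minimal sketch, assuming the same x/y/z frame as in the session above:

```python
import pandas as pd

# Rebuild the 10-row frame from the session above
df = pd.DataFrame({'x': range(10), 'y': range(10), 'z': range(10)})

# Boolean mask with one entry per column label
mask = df.columns.str.contains('x')

# Select the matching columns with a boolean column indexer
selected = df.loc[:, mask]
print(selected.columns.tolist())
```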
    

    UPDATE: I just read your last paragraph. According to the documentation, str.contains treats the pattern as a regular expression by default, e.g. str.contains('^myregex').
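
    To make the regex behaviour concrete, here is a sketch comparing an anchored pattern, a literal match with `regex=False`, and the same pattern read as a regex. The column labels are hypothetical and chosen so that the `.` metacharacter makes a visible difference.

```python
import pandas as pd

# Hypothetical labels: 'a.b' and 'axb' differ only at the position of the dot
df = pd.DataFrame({'my_a': [1], 'not_my_b': [2], 'a.b': [3], 'axb': [4]})

# '^my' is a regex: only labels that start with 'my' match
prefixed = df.loc[:, df.columns.str.contains('^my')]

# regex=False treats the pattern as a plain substring
literal = df.loc[:, df.columns.str.contains('a.b', regex=False)]

# With the default regex=True, '.' matches any single character
as_regex = df.loc[:, df.columns.str.contains('a.b')]

print(prefixed.columns.tolist())
print(literal.columns.tolist())
print(as_regex.columns.tolist())
```

    If the pattern comes from user input and should be matched literally, passing `regex=False` (or escaping it with `re.escape`) avoids surprises from metacharacters.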
