Pandas query function not working with spaces in column names

前端 未结 5 938
-上瘾入骨i
-上瘾入骨i 2020-11-30 10:45

I have a dataframe with spaces in column names. I am trying to use query method to get the results. It is working fine with \'c\' column but getting error for \

相关标签:
5条回答
  • 2020-11-30 11:27

    Pandas 0.25+

    As described here:

    DataFrame.query() and DataFrame.eval() now supports quoting column names with backticks to refer to names with spaces (GH6508)

    So you can use:

    a.query('`a b`==5')
    

    Pandas pre-0.25

    You cannot use pd.DataFrame.query if you have whitespace in your column name. Consider what would happen if you had columns named a, b and a b; there would be ambiguity as to what you require.

    Instead, you can use pd.DataFrame.loc:

    df = df.loc[df['a b'] == 5]
    

    Since you are only filtering rows, you can omit .loc accessor altogether:

    df = df[df['a b'] == 5]
    
    0 讨论(0)
  • 2020-11-30 11:32

    From pandas 0.25 onward you will be able to escape column names with backticks so you can do

    a.query('`a b` == 5') 
    
    0 讨论(0)
  • 2020-11-30 11:35

    I am afraid that the query method does not accept column name with empty space. In any case you can query the dataframe in this way:

    import pandas as pd
    a = pd.DataFrame({'a b':[1,2,3,4], 'c':[5,6,7,8]})
    a[a['a b']==1]
    
    0 讨论(0)
  • 2020-11-30 11:36

    Instead of using the pandas.query function I would create a condition in this case to lookup values and where the condition is True. For example:

    import pandas as pd
    a = pd.DataFrame(columns=["a b", "c"])
    a["a b"] = [1,2,3,5]
    a["c"] = [5,6,7,8]
    #a.query('a b==5') Remove the query because it cannot lookup columns with spaces in the name.
    
    condition = a['a b'] == 5
    print(a['a b'][condition])
    
    output:
    
        3    5
    

    We see that at index 3 your condition evaluates to True (if you want the specific index and not Series of Boolean values).

    0 讨论(0)
  • 2020-11-30 11:49

    It is not possible yet. Check github issue #6508:

    Note that in reality .query is just a nice-to-have interface, in fact it has very specific guarantees, meaning its meant to parse like a query language, and not a fully general interface.

    Reason is for query need string to be a valid python expression, so column names must be valid python identifiers.

    Solution is boolean indexing:

    df = df[df['a b'] == 5]
    
    0 讨论(0)
提交回复
热议问题