Selecting with complex criteria from pandas.DataFrame

孤街浪徒 2020-11-22 11:12

For example, I have a simple DataFrame:

import pandas as pd
from random import randint

df = pd.DataFrame({'A': [randint(1, 9) for x in xrange(10)],
                         


        
4 Answers
  •  温柔的废话
    2020-11-22 11:42

    Sure! Setup:

    >>> import pandas as pd
    >>> from random import randint
    >>> df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
                       'B': [randint(1, 9)*10 for x in range(10)],
                       'C': [randint(1, 9)*100 for x in range(10)]})
    >>> df
       A   B    C
    0  9  40  300
    1  9  70  700
    2  5  70  900
    3  8  80  900
    4  7  50  200
    5  9  30  900
    6  2  80  700
    7  2  80  400
    8  5  80  300
    9  7  70  800
    

    We can apply column operations and get boolean Series objects:

    >>> df["B"] > 50
    0    False
    1     True
    2     True
    3     True
    4    False
    5    False
    6     True
    7     True
    8     True
    9     True
    Name: B, dtype: bool
    >>> (df["B"] > 50) & (df["C"] == 900)
    0    False
    1    False
    2     True
    3     True
    4    False
    5    False
    6    False
    7    False
    8    False
    9    False
    dtype: bool
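
A small, reproducible sketch of the same idea, assuming fixed values (the first four rows of the frame above) instead of randint. The names small, mask_and and mask_or are only illustrative. Each comparison needs its own parentheses because & and | bind more tightly than > and ==:

```python
import pandas as pd

# Fixed values instead of randint so the result is reproducible.
small = pd.DataFrame({"A": [9, 9, 5, 8],
                      "B": [40, 70, 70, 80],
                      "C": [300, 700, 900, 900]})

# Element-wise AND / OR of two boolean Series.
mask_and = (small["B"] > 50) & (small["C"] == 900)
mask_or = (small["B"] > 50) | (small["C"] == 300)

print(mask_and.tolist())  # [False, False, True, True]
print(mask_or.tolist())   # [True, True, True, True]
```

Using the Python keywords and / or here would not work, since they try to reduce each whole Series to a single truth value.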
    

    [Update: switched to new-style .loc.]

    We can then use these boolean Series to index into the object. For read access, you can chain indices:

    >>> df["A"][(df["B"] > 50) & (df["C"] == 900)]
    2    5
    3    8
    Name: A, dtype: int64
    

    but for write access, chained indexing can get you into trouble: the intermediate result may be a copy rather than a view, so the assignment may never reach the original DataFrame. Use .loc instead:

    >>> df.loc[(df["B"] > 50) & (df["C"] == 900), "A"]
    2    5
    3    8
    Name: A, dtype: int64
    >>> df.loc[(df["B"] > 50) & (df["C"] == 900), "A"].values
    array([5, 8], dtype=int64)
    >>> df.loc[(df["B"] > 50) & (df["C"] == 900), "A"] *= 1000
    >>> df
          A   B    C
    0     9  40  300
    1     9  70  700
    2  5000  70  900
    3  8000  80  900
    4     7  50  200
    5     9  30  900
    6     2  80  700
    7     2  80  400
    8     5  80  300
    9     7  70  800
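
The same write can be sketched on a tiny frame with fixed values, which makes the result easy to check by eye. The names frame and mask are hypothetical. The point is that .loc takes the row mask and the column label in a single indexing call, so the assignment acts on the original object:

```python
import pandas as pd

frame = pd.DataFrame({"A": [5, 8, 7],
                      "B": [70, 80, 50],
                      "C": [900, 900, 200]})

mask = (frame["B"] > 50) & (frame["C"] == 900)

# One indexing call: rows selected by the mask, column by its label.
frame.loc[mask, "A"] *= 1000

print(frame["A"].tolist())  # [5000, 8000, 7]
```

The chained form, frame["A"][mask] *= 1000, is the one to avoid: depending on the pandas version it may emit a warning or leave frame unchanged.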
    

    Note that I accidentally typed == 900 and not != 900, or ~(df["C"] == 900), but I'm too lazy to fix it. Exercise for the reader. :^)
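
For anyone attempting the exercise, here is a minimal sketch, assuming a small frame with fixed values (frame, not_900_a and not_900_b are illustrative names). It shows that the two spellings mentioned above select the same rows:

```python
import pandas as pd

frame = pd.DataFrame({"A": [5, 8, 7, 9],
                      "B": [70, 80, 50, 70],
                      "C": [900, 900, 200, 700]})

# Two equivalent ways to say "C is not 900".
not_900_a = (frame["B"] > 50) & (frame["C"] != 900)
not_900_b = (frame["B"] > 50) & ~(frame["C"] == 900)

print(not_900_a.equals(not_900_b))         # True
print(frame.loc[not_900_a, "A"].tolist())  # [9]
```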
