Convert a pandas Series containing strings to boolean

慢半拍i 2020-12-01 07:58

I have a DataFrame named df:

  Order Number       Status
1         1668  Undelivered
2        19771  Undelivered
3    100032108  Undelivered
4         


        
4 answers
  • 2020-12-01 08:15

    You can just use map:

    In [7]: df = pd.DataFrame({'Status':['Delivered', 'Delivered', 'Undelivered',
                                         'SomethingElse']})
    
    In [8]: df
    Out[8]:
              Status
    0      Delivered
    1      Delivered
    2    Undelivered
    3  SomethingElse
    
    In [9]: d = {'Delivered': True, 'Undelivered': False}
    
    In [10]: df['Status'].map(d)
    Out[10]:
    0     True
    1     True
    2    False
    3      NaN
    Name: Status, dtype: object
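
    Because unmapped values come back as NaN, the result above has dtype object rather than bool. The following is a minimal sketch of two ways to get a real boolean column from it; treating unknown statuses as False is an assumption made for illustration, not something the question specifies.

```python
import pandas as pd

df = pd.DataFrame({"Status": ["Delivered", "Delivered", "Undelivered",
                              "SomethingElse"]})
d = {"Delivered": True, "Undelivered": False}

# map() leaves NaN where a value has no key in d, so the dtype is object.
mapped = df["Status"].map(d)

# Option 1: keep the gap explicit with pandas' nullable boolean dtype.
as_nullable = mapped.astype("boolean")

# Option 2 (assumption: unknown statuses should count as False):
# fill the missing entries, then convert to a plain NumPy bool column.
filled = as_nullable.fillna(False).astype(bool)

print(as_nullable)
print(filled)
```

    Which option fits depends on whether an unrecognized status should stay visible as missing or be folded into False.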
    
  • 2020-12-01 08:16

    Here is an example of using the replace method to replace values only in a specified column (C2), returning the result as a DataFrame.

    import pandas as pd
    df = pd.DataFrame({'C1':['X', 'Y', 'X', 'Y'], 'C2':['Y', 'Y', 'X', 'X']})
    
      C1 C2
    0  X  Y
    1  Y  Y
    2  X  X
    3  Y  X
    
    df.replace({'C2': {'X': True, 'Y': False}})
    
      C1     C2
    0  X  False
    1  Y  False
    2  X   True
    3  Y   True
    
  • 2020-12-01 08:21

    Expanding on the previous answers:

    Map method explained:

    • pandas looks up each row's value in the dictionary d and replaces every value that matches a key with the corresponding value from d.
    • Values with no matching key in d become NaN. This can be handled afterwards with fillna().
    • Mapping with a dictionary like this is a pd.Series operation, so it works on one column at a time, not on several columns at once.
    • Documentation: pd.Series.map
    d = {'Delivered': True, 'Undelivered': False}
    df["Status"].map(d)
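
    If several columns need the same mapping, one possible workaround is to apply map column by column. This is only a sketch: the second column, Previous, is a hypothetical name invented for the example and does not come from the question.

```python
import pandas as pd

# "Previous" is a made-up column used only to show the multi-column case.
df = pd.DataFrame({
    "Status": ["Delivered", "Undelivered"],
    "Previous": ["Undelivered", "SomethingElse"],
})
d = {"Delivered": True, "Undelivered": False}

# DataFrame.apply passes each column to the function as a Series,
# so Series.map can be reused once per column.
result = df[["Status", "Previous"]].apply(lambda col: col.map(d))

print(result)
```

    As with a single column, any value that has no key in d ends up as NaN.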
    

    Replace method explained:

    • pandas looks up each row's value in the dictionary d and replaces every value that matches a key with the corresponding value from d.
    • Values with no matching key in d are retained unchanged.
    • Works with single and multiple columns (pd.Series or pd.DataFrame objects).
    • Documentation: pd.DataFrame.replace
    d = {'Delivered': True, 'Undelivered': False}
    df["Status"].replace(d)
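
    To make the difference in handling unmatched values concrete, here is a small sketch that runs both methods on the same data. The sample values, including SomethingElse, are reused from the first answer.

```python
import pandas as pd

s = pd.Series(["Delivered", "Undelivered", "SomethingElse"], name="Status")
d = {"Delivered": True, "Undelivered": False}

# replace() keeps values that have no key in d.
replaced = s.replace(d)

# map() turns values that have no key in d into NaN.
mapped = s.map(d)

print(replaced.tolist())
print(mapped.tolist())
```

    Note that the replaced Series still holds a mix of booleans and strings, so it stays dtype object until the leftover values are dealt with.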
    

    Overall, replace is the more flexible of the two: it works on both pd.Series and pd.DataFrame objects and leaves unmatched values as they are, whereas map turns them into NaN.

  • 2020-12-01 08:26

    You've got everything you need. You'll be happy to discover replace:

    df.replace(d)
    