Creating a new column based on whether the letter 'l' or 'L' is in the string of another column

Question

I am working with the Open Food Facts dataset, which is very messy. There is a column called quantity that holds information about the quantity of the respective food. The entries look like:

365 g (314 ml)  
992 g  
2.46 kg  
0,33 litre  
15.87oz  
250 ml   
1 L    
33 cl

... and so on (very messy!). I want to create a new column called is_liquid. My idea is that if the quantity string contains an 'l' or 'L', the is_liquid field in that row should be 1, and otherwise 0. Here is what I've tried. I wrote this function:

def is_liquid(x):
    if x.str.contains('l'):  
        return 1  
    elif x.str.contains('L'):  
        return 1  
    else: return 0

(BTW: if something is measured in 'oz', is it a liquid?)

Then I tried to apply it:

df['is_liquid'] = df['quantity'].apply(is_liquid)

But all I get is this error:

AttributeError: 'str' object has no attribute 'str'

Could someone help me out?

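Answer 1:

The error occurs because Series.apply passes each element to the function as a plain Python str, which has no .str accessor. Instead, call Series.str.contains on the whole column with case=False to get a boolean mask, and convert it to integers with Series.astype (the sample frame here names the column liquids rather than quantity):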

df['is_liquid'] = df['liquids'].str.contains('L', case=False).astype(int)
print(df)
          liquids  is_liquid
0  365 g (314 ml)          1
1           992 g          0
2         2.46 kg          0
3      0,33 litre          1
4         15.87oz          0
5         250 ml           1
6             1 L          1
7           33 cl          1
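
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It uses the column name quantity from the question and the sample values listed there. The trailing None row is an assumption added only to show how missing values behave, and na=False is one possible choice for treating them as "not liquid". The element-wise is_liquid function shows how the original apply-based attempt could be repaired; it is not necessarily how the answerer would write it.

```python
import pandas as pd

# Sample values copied from the question. The final None is a hypothetical
# missing entry added for illustration; it is not part of the original data.
df = pd.DataFrame({
    "quantity": [
        "365 g (314 ml)", "992 g", "2.46 kg", "0,33 litre",
        "15.87oz", "250 ml", "1 L", "33 cl", None,
    ]
})

# Vectorized version: case-insensitive test for the letter "l".
# na=False makes missing values count as non-matches, so the mask is
# purely boolean and can be cast to int.
df["is_liquid"] = df["quantity"].str.contains("l", case=False, na=False).astype(int)


def is_liquid(x):
    """Element-wise version: apply passes x as a plain str (or a missing value)."""
    return int(isinstance(x, str) and "l" in x.lower())


# Repaired apply-based version; it should agree with the vectorized column.
df["is_liquid_apply"] = df["quantity"].apply(is_liquid)

print(df)
```

The vectorized form is usually preferable on a large column, since it avoids calling a Python function once per row. Note that both versions flag any string containing the letter, which matches the rule stated in the question but not necessarily every real unit.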

Source: https://stackoverflow.com/questions/51811914/creating-new-column-based-on-whether-the-letter-l-or-l-is-in-the-string-of-a

Tags

python

regex

pandas

apply

feature-engineering
