Question
Hello, I have a problem looping over a column: I want to search it for a list of words and create a Boolean column that is True if any word in the list is found. Here is my code:
# Code naf related to sport.
code = ["3230Z","4764Z","7721Z","8551Z","9311Z", "9312Z", "9313Z", "9319Z",
"9329Z", "364Z" "524W", "714B", "804C", "926A", "926C", "930L", "927C",
"923K"]
# check keywords of code into "Code_Naf" column
for branch in code:
    df_codeNaf["topNAF"] = df_codeNaf["Code_NAF"].str.contains("3230Z" or "4764Z" or "7721Z" or "8551Z"
                                                               or "9311Z" or "9312Z" or "9313Z" or "9319Z"
                                                               or "9329Z" or "364Z" "524W" or "714B" or
                                                               "804C" or "926A" or "926C" or "930L" or
                                                               "927C" or "923K")
When I look at the topNAF column I find only 2 True values, but in reality there are more than two. What's wrong with my code? Thanks
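Answer 1:
Here is a method using a lambda (note that it tests each value for an exact match against the list, not for a substring):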
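# NAF codes related to sport (comma added between "364Z" and "524W";
# without it Python concatenates them into the single string "364Z524W")
code = ["3230Z", "4764Z", "7721Z", "8551Z", "9311Z", "9312Z", "9313Z", "9319Z",
        "9329Z", "364Z", "524W", "714B", "804C", "926A", "926C", "930L", "927C",
        "923K"]
# `x in code` is already a Boolean, so no `True if ... else False` is needed
df_codeNaf["topNAF"] = df_codeNaf["Code_NAF"].apply(lambda x: x in code)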
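For exact matches like the lambda above, pandas also has a vectorized `Series.isin`. The following is a minimal sketch rather than the answerer's method; the sample DataFrame and the shortened code list are hypothetical and only there to make it runnable.

```python
import pandas as pd

# Shortened list of NAF codes, for illustration only
code = ["3230Z", "4764Z", "9311Z", "364Z", "524W"]

# Hypothetical sample data standing in for the real df_codeNaf
df_codeNaf = pd.DataFrame({"Code_NAF": ["9311Z", "0111Z", "524W", None]})

# True where the whole cell value equals one of the codes; missing values give False
df_codeNaf["topNAF"] = df_codeNaf["Code_NAF"].isin(code)
print(df_codeNaf)
```

Like the lambda, this does not find a code embedded in a longer string; for that, a substring search with `str.contains` is needed.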
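Answer 2:
Your problem is that you reassign df_codeNaf['topNAF'] on every iteration over code instead of accumulating the matches. Note also that "3230Z" or "4764Z" or ... evaluates to just "3230Z" in Python, and branch is never used, so only that one code is searched. Your code can be fixed by: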
# start with all False, then OR in the matches for each code
df_codeNaf['topNAF'] = False
for branch in code:
    df_codeNaf['topNAF'] = df_codeNaf['topNAF'] | df_codeNaf['Code_NAF'].str.contains(branch)
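As a side check on why the question's version matched so few rows, two plain Python behaviours can be verified in isolation: `or` between non-empty strings returns the first one, and adjacent string literals with no comma between them are concatenated. The shortened values below are taken from the question's list.

```python
# `or` returns its first truthy operand, so the whole chain is just "3230Z"
expr = "3230Z" or "4764Z" or "7721Z"
print(expr)

# Missing comma between "364Z" and "524W": the two literals are joined into one string
code = ["9329Z", "364Z" "524W", "714B"]
print(code)
print(len(code))
```

So `str.contains` in the question only ever receives "3230Z", and the list, as written there, has no separate "364Z" or "524W" entry until the comma is restored.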
But better yet, you can use a regex with contains in one line:
pattern = '|'.join(code)
df_codeNaf['topNAF'] = df_codeNaf['Code_NAF'].str.contains(pattern)
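A slightly more defensive variant of the regex approach might look like the sketch below. The sample DataFrame and shortened code list are hypothetical; `re.escape` and `na=False` are additions not present in the answer, included on the assumption that the codes should be matched literally and that missing values should count as False.

```python
import re
import pandas as pd

# Shortened list of NAF codes, for illustration only
code = ["3230Z", "9311Z", "364Z", "524W"]

# Hypothetical sample data standing in for the real df_codeNaf
df_codeNaf = pd.DataFrame({"Code_NAF": ["9311Z", "0111Z", "x524W", None]})

# Escape each code so any regex metacharacters are treated literally,
# then join with "|" so the pattern matches any one of the codes
pattern = "|".join(re.escape(c) for c in code)

# na=False turns missing values into False instead of NaN
df_codeNaf["topNAF"] = df_codeNaf["Code_NAF"].str.contains(pattern, na=False)
print(df_codeNaf)
```

Because this is a substring search, "x524W" is flagged as well; if only whole-cell matches are wanted, an exact-match test is the better fit.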
Source: https://stackoverflow.com/questions/57890764/search-in-column-a-list-of-word-and-create-a-boolean-column-if-a-word-is-found