Keep elements with pattern in pandas series without converting them to list

Submitted by 让人想犯罪 __ on 2021-01-28 06:25:35

Question


I have the following dataframe:

df = pd.DataFrame(["Air type:1, Space kind:2, water", "something, Space blu:3, somethingelse"], columns = ['A'])

and I want to create a new column that contains for each row all the elements that have a ":" in them. So for example in the first row I want to return "type:1, kind:2" and for the second row I want "blu:3". I managed by using a list comprehension in the following way:

df['new'] = [[y for y in x  if ":" in y] for x in df['A'].str.split(",")]

But my issue is that the new column contains list elements.

    A                                                       new
0   Air type:1, Space kind:2, water                         [Air type:1, Space kind:2]
1   something at the start:4, Space blu:3, somethingelse    [something at the start:4, Space blu:3]

I have not used Python a lot, so I am not 100% sure whether I am missing a more pandas-specific way to do this. If there is one, I am more than happy to learn about it and use it. If this is a correct approach, how can I convert the elements back into strings in order to run regexes on them? I tried the approach from "How to concatenate items in a list to a single string?", but it is not working as I would like it to.


Answer 1:


You can use pd.Series.str.findall here.

df['new'] = df['A'].str.findall(r'\w+:\w+')

                                 A               new
0            type:1, kind:2, water  [type:1, kind:2]
1  something, blu:3, somethingelse           [blu:3]

EDIT:

When an element contains multiple words, try:

df['new'] = df['A'].str.findall(r'[^\s,][^:,]+:[^:,]+').str.join(', ')

                                      A                       new
0        Air type:1, Space kind:2, water  Air type:1, Space kind:2
1  something, Space blu:3, somethingelse               Space blu:3
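One way to sanity-check the pattern from the edit is to run it through Python's re module on the sample strings before applying it to the column; pandas' str.findall applies re.findall to each element, so the matches should be the same. The sketch below does that. The third sample string is a hypothetical addition, not part of the question's data, included to illustrate a caveat: because the pattern needs at least two characters before the colon, a one-character key such as "a:1" is not matched.

```python
import re

# Pattern from the edit above, written as a raw string.
pattern = r'[^\s,][^:,]+:[^:,]+'

samples = [
    "Air type:1, Space kind:2, water",
    "something, Space blu:3, somethingelse",
    "a:1, bb:2",  # hypothetical row, not in the original question
]

# One list of matches per sample string.
matches = [re.findall(pattern, s) for s in samples]

for sample, found in zip(samples, matches):
    print(repr(sample), "->", found)
```

If one-character keys can occur in the real data, the pattern would need adjusting; whether that matters depends on the data.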



Answer 2:


You can use findall with join:

import pandas as pd
df = pd.DataFrame(["type:1, kind:2, water", "something, blu:3, somethingelse"], columns = ['A'])
df['new'] = df['A'].str.findall(r'[^\s:,]+:[^\s,]+').str.join(', ')
df['new']
# => 0    type:1, kind:2
# => 1             blu:3

The regex matches

  • [^\s:,]+ - one or more chars other than whitespace, : and ,
  • : - a colon
  • [^\s,]+ - one or more chars other than whitespace and ,.


The .str.join(', ') call joins all the found matches for a row into one string, separated by a comma followed by a space.
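As a rough illustration of how this pattern behaves on the multi-word rows from the question, the sketch below applies it to that dataframe. Because neither character class allows whitespace, only the word touching the colon is kept, which corresponds to the "type:1, kind:2" result the question describes rather than "Air type:1, Space kind:2".

```python
import pandas as pd

# The multi-word dataframe from the question.
df = pd.DataFrame(
    ["Air type:1, Space kind:2, water", "something, Space blu:3, somethingelse"],
    columns=["A"],
)

# findall returns a list of matches per row; str.join turns each list into one string.
df["new"] = df["A"].str.findall(r"[^\s:,]+:[^\s,]+").str.join(", ")

result = df["new"].tolist()
print(result)
# ['type:1, kind:2', 'blu:3']
```

Which of the two patterns to prefer depends on whether the words preceding the key (such as "Air" or "Space") should be kept.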



Source: https://stackoverflow.com/questions/64958329/keep-elements-with-pattern-in-pandas-series-without-converting-them-to-list
