Question
I asked a similar question yesterday, "Keep elements with pattern in pandas series without converting them to list", and now I am faced with the opposite problem.
I have a pandas dataframe:
import pandas as pd
df = pd.DataFrame(["Air type:1, Space kind:2, water, wood", "berries, something at the start:4, Space blu:3, somethingelse"], columns = ['A'])
and I want to pick all elements that don't have a ":" in them. What I tried is the following regex which seems to be working:
df['new'] = df.A.str.findall(r'(^|\s)([^:,]+)(,|$)')
                                                               A                                     new
0                          Air type:1, Space kind:2, water, wood            [( , water, ,), ( , wood, )]
1  berries, something at the start:4, Space blu:3, somethingelse  [(, berries, ,), ( , somethingelse, )]
If I understand this correctly, findall searched for three groups (the ones I have in parentheses) and returned every match as a tuple, with the tuples wrapped in a list. Is there a way to avoid this and return only the middle group? That is, for the first row: water, wood; for the second row: berries, somethingelse.
I also tried the opposite approach:
df.A.str.replace(r'[^\s,][^:,]+:[^:,]+', '', regex=True).str.replace(r'\s*,', '', regex=True)
which comes close to what I want (only the commas between the elements are missing), but I am wondering whether I am missing something that would make my life easier.
Answer 1:
You may use this regex:
>>> df['new'] = df.A.str.findall(r'(?:^|,)([^:,]+)(?=,|$)')
>>> print (df)
                                                   A                       new
0              Air type:1, Space kind:2, water, wood            [ water, wood]
1  berries, something at the start:4, Space blu:3...  [berries, somethingelse]
Regex used is:
(?:^|,) : Match start or comma
([^:,]+) : Match 1+ of any character that is not a ":" and not a ","
(?=,|$) : Lookahead to assert that we have either a "," or end of line ahead
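Since the question asks for a plain string such as "water, wood" rather than a list, here is a minimal sketch of one way to finish the job with the standard re module. The helper name plain_items and the rows list are made up for illustration and are not part of the original answer. Because the pattern consumes only the comma, every item after the first keeps its leading space, so the sketch strips each match before joining.

```python
import re

# Pattern from the answer above: a single capturing group,
# so findall returns plain strings instead of tuples.
PATTERN = re.compile(r'(?:^|,)([^:,]+)(?=,|$)')

def plain_items(text):
    """Return the comma-separated items of `text` that contain no ':'.

    Hypothetical helper, named here only for illustration.
    """
    # Each match may start with the space that followed the comma, so strip it.
    return [item.strip() for item in PATTERN.findall(text)]

rows = [
    "Air type:1, Space kind:2, water, wood",
    "berries, something at the start:4, Space blu:3, somethingelse",
]
for row in rows:
    print(", ".join(plain_items(row)))
# water, wood
# berries, somethingelse
```

On the dataframe from the question, something like df['new'] = df['A'].map(lambda s: ", ".join(plain_items(s))) should give the same strings per row.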
Answer 2:
You can use the following regex, which uses non-capturing groups, (?:...):
df.A.str.findall(r'(?:^|\s)([^:,]{2,})(?:,|$)')
This returns the following output:
0               [water, wood]
1    [berries, somethingelse]
Name: A, dtype: object
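To see why the non-capturing groups matter, here is a small sketch using the standard re module, whose findall is what Series.str.findall applies to each element. It compares the three-group pattern from the question with a variant in which only the outer groups are changed to (?:...). Note that this variant keeps the question's + quantifier instead of the {2,} used above, purely to isolate the effect of the group type.

```python
import re

text = "Air type:1, Space kind:2, water, wood"

# Three capturing groups (the pattern from the question):
# findall returns one 3-tuple per match.
with_capturing = re.findall(r'(^|\s)([^:,]+)(,|$)', text)

# Outer groups made non-capturing, one capturing group left:
# findall returns just the text of that group.
with_non_capturing = re.findall(r'(?:^|\s)([^:,]+)(?:,|$)', text)

print(with_capturing)      # [(' ', 'water', ','), (' ', 'wood', '')]
print(with_non_capturing)  # ['water', 'wood']
```

Because the leading \s is consumed outside the capturing group, the items come back without the leading space, so no extra strip step is needed for this pattern.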
Source: https://stackoverflow.com/questions/64981401/python-regex-to-pick-all-elements-that-dont-match-pattern