Question
I have a pandas DataFrame whose column col1 contains dates in various formats:
col1
Q2 '20
Q1 '21
May '20
June '20
25/05/2020
Q4 '20+Q1 '21
Q2 '21+Q3 '21
Q4 '21+Q1 '22
I want to replace the values in col1 that match a pattern. For values that combine two quarters with "+", I want to return a season name as a string followed by the first year in the pattern. All other values should stay as they are.
For example:
1) Q4 '20+Q1 '21 should be 'Winter 20'
2) Q2 '21+Q3 '21 should be 'Summer 21'
3) Q4 '21+Q1 '22 should be 'Winter 21'
Desired output:
col1
Q2 '20
Q1 '21
May '20
June '20
25/05/2020
Winter 20
Summer 21
Winter 21
I have tried a few methods such as replace, split and extract, but none of them solved the problem. Using a dictionary would not help because the DataFrame is quite big, with many variants of Q4 'XX+Q1 'XX and Q2 'XX+Q3 'XX.
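A dictionary can still work if it is keyed on the quarter pair rather than on the full string, because then it needs only one entry per season, whatever the year. A minimal vectorized sketch, assuming only values laid out exactly as Qa 'YY+Qb 'YY need converting (the names season_by_pair and parts are illustrative, not from the thread):

```python
import pandas as pd

df = pd.DataFrame({'col1': [
    "Q2 '20", "Q1 '21", "May '20", "June '20", "25/05/2020",
    "Q4 '20+Q1 '21", "Q2 '21+Q3 '21", "Q4 '21+Q1 '22"]})

# One entry per quarter pair, independent of the year (illustrative mapping)
season_by_pair = {'Q4Q1': 'Winter', 'Q1Q2': 'Spring',
                  'Q2Q3': 'Summer', 'Q3Q4': 'Autumn'}

# Columns 0, 1, 2 = first quarter, first year, second quarter; NaN where no match
parts = df['col1'].str.extract(r"^(Q[1-4]) '(\d{2})\+(Q[1-4]) '\d{2}$")
season = (parts[0] + parts[2]).map(season_by_pair)

# "Season YY" where a season was found, the original value everywhere else
df['col1'] = (season + ' ' + parts[1]).fillna(df['col1'])
print(df)
```

Unknown pairs (for example Q1+Q3) map to NaN and therefore keep their original value.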
Answer 1:
You could do it by matching multiple patterns, one for each season:
import pandas as pd

df = pd.DataFrame({'col1': [
    "Q2 '20",
    "Q1 '21",
    "May '20",
    "June '20",
    "25/05/2020",
    "Q4 '20+Q1 '21",
    "Q2 '21+Q3 '21",
    "Q4 '21+Q1 '22"]})

# One regex per season; \1 in the replacement is the first year captured
seasons = {
    r"Q4 '(\d*)\+Q1 .*": r'Winter \1',
    r"Q1 '(\d*)\+Q2 .*": r'Spring \1',
    r"Q2 '(\d*)\+Q3 .*": r'Summer \1',
    r"Q3 '(\d*)\+Q4 .*": r'Autumn \1'
}
df.col1.replace(seasons, regex=True)
Output:
0 Q2 '20
1 Q1 '21
2 May '20
3 June '20
4 25/05/2020
5 Winter 20
6 Summer 21
7 Winter 21
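The sample data only contains winter and summer pairs. A quick check, using hypothetical extra rows, that the spring and autumn entries of the same regex dictionary behave the same way:

```python
import pandas as pd

# Same regex -> replacement mapping as in the answer above
seasons = {
    r"Q4 '(\d*)\+Q1 .*": r'Winter \1',
    r"Q1 '(\d*)\+Q2 .*": r'Spring \1',
    r"Q2 '(\d*)\+Q3 .*": r'Summer \1',
    r"Q3 '(\d*)\+Q4 .*": r'Autumn \1',
}

# Hypothetical rows covering the pairs missing from the sample data
extra = pd.Series(["Q1 '22+Q2 '22", "Q3 '22+Q4 '22", "Q2 '20"])
result = extra.replace(seasons, regex=True)
print(result.tolist())  # ['Spring 22', 'Autumn 22', "Q2 '20"]
```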
Alternatively, here is a version that matches only one regex, which I think is more efficient. However, it relies on global variables (seasons and pattern), so I am not sure which version is better.
import re

# Maps the two quarters of a pair to a season name
seasons = {
    'Q4Q1': 'Winter',
    'Q1Q2': 'Spring',
    'Q2Q3': 'Summer',
    'Q3Q4': 'Autumn'
}
# Groups: first quarter, first year, second quarter
pattern = re.compile(r"(Q\d) '(\d*)\+(Q\d) .*")

def change_to_season(row):
    match = pattern.match(row)
    if match:
        season = seasons[match.group(1) + match.group(3)]
        year = match.group(2)
        return season + ' ' + year
    else:
        return row

df.col1.apply(change_to_season)
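On the global-variable concern: one option is to keep the lookup inside a function and pass a callable replacement to Series.str.replace, which (with regex=True) calls it once per match, like re.sub. A sketch under that assumption; to_season and repl are hypothetical names, and unknown quarter pairs are left unchanged instead of raising KeyError:

```python
import pandas as pd

df = pd.DataFrame({'col1': [
    "Q2 '20", "Q1 '21", "May '20", "June '20", "25/05/2020",
    "Q4 '20+Q1 '21", "Q2 '21+Q3 '21", "Q4 '21+Q1 '22"]})

def to_season(col):
    """Replace values like "Qa 'YY+Qb 'YY" in a Series with "Season YY"."""
    seasons = {'Q4Q1': 'Winter', 'Q1Q2': 'Spring',
               'Q2Q3': 'Summer', 'Q3Q4': 'Autumn'}

    def repl(m):
        season = seasons.get(m.group(1) + m.group(3))
        # Unknown quarter pairs keep the full matched text
        return f'{season} {m.group(2)}' if season else m.group(0)

    # A callable replacement requires regex=True
    return col.str.replace(r"^(Q[1-4]) '(\d+)\+(Q[1-4]) '\d+$", repl, regex=True)

df['col1'] = to_season(df['col1'])
print(df)
```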
Answer 2:
'''
col1
Q2 '20
Q1 '21
May '20
June '20
25/05/2020
Q4 '20+Q1 '21
Q2 '21+Q3 '21
Q4 '21+Q1 '22
'''
import pandas as pd
df = pd.read_clipboard(sep="!")
print(df)
Output:
col1
0 Q2 '20
1 Q1 '21
2 May '20
3 June '20
4 25/05/2020
5 Q4 '20+Q1 '21
6 Q2 '21+Q3 '21
7 Q4 '21+Q1 '22
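pd.read_clipboard only works if the text above is on the clipboard. pandas documents it as passing the clipboard text on to read_csv, so the same DataFrame can be built reproducibly from a string. A sketch assuming io.StringIO as the source:

```python
import io
import pandas as pd

# Same data as the text block above, held in a string instead of the clipboard
data = """col1
Q2 '20
Q1 '21
May '20
June '20
25/05/2020
Q4 '20+Q1 '21
Q2 '21+Q3 '21
Q4 '21+Q1 '22
"""

# sep="!" (a character absent from the data) keeps each line as one field
df = pd.read_csv(io.StringIO(data), sep="!")
print(df)
```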
import re

def regex_filter(val):
    # Groups: first quarter, first year, second quarter, second year
    regex = re.compile(r"([Q][1-4])+ '(\d+)\+([Q][1-4])+ '(\d+)")
    # split() returns the captured groups (plus empty strings) for a match,
    # or [val] unchanged when the value does not match
    result = [part for part in regex.split(val) if part]
    if 'Q3' in result:
        result = 'Summer ' + result[1]
    elif 'Q1' in result:
        result = 'Winter ' + result[1]
    else:
        result = ''.join(result)
    return result

df['col1'] = df['col1'].apply(regex_filter)
print(df)
Output:
col1
0 Q2 '20
1 Q1 '21
2 May '20
3 June '20
4 25/05/2020
5 Winter 20
6 Summer 21
7 Winter 21
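This answer relies on re.split keeping capturing groups in its result: when the whole value matches, the list holds the four groups padded with empty strings (which the list comprehension drops); otherwise split returns the value unchanged in a one-element list. A small check with the same pattern:

```python
import re

# Same pattern as regex_filter; every capturing group appears in split()'s result
regex = re.compile(r"([Q][1-4])+ '(\d+)\+([Q][1-4])+ '(\d+)")

print(regex.split("Q4 '20+Q1 '21"))  # ['', 'Q4', '20', 'Q1', '21', '']
print(regex.split("May '20"))        # ["May '20"]
```

Because regex_filter only tests for 'Q3' and 'Q1', a hypothetical "Q3 '22+Q4 '22" would come out as 'Summer 22' and "Q1 '22+Q2 '22" as 'Winter 22'. That is fine for the Q4+Q1 and Q2+Q3 pairs in the question, but a lookup keyed on the quarter pair, as in Answer 1, is safer if other combinations can occur.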
Source: https://stackoverflow.com/questions/62008620/replace-certain-values-based-on-pattern-and-extract-substring-in-pandas