Question
Background
I have a sample DataFrame, df, whose Text column contains zero, one, or more than one ABC number per row:
import pandas as pd
df = pd.DataFrame({'Text' : ['Jon J Mmith ABC: 1111111 is this here',
'ABC: 1234567 Mary Lisa Rider found here',
'Jane A Doe is also here',
'ABC: 2222222 Tom T Tucker is here ABC: 2222222 too'],
'P_ID': [1,2,3,4],
'N_ID' : ['A1', 'A2', 'A3', 'A4']
})
#rearrange columns
df = df[['Text','N_ID', 'P_ID']]
df
Text N_ID P_ID
0 Jon J Mmith ABC: 1111111 is this here A1 1
1 ABC: 1234567 Mary Lisa Rider found here A2 2
2 Jane A Doe is also here A3 3
3 ABC: 2222222 Tom T Tucker is here ABC: 2222222... A4 4
Goal
1) Change the ABC numbers in the Text column (e.g. ABC: 1111111) to ABC: **BLOCK**
2) Create a new column Text_ABC containing this output
Desired Output
Text N_ID P_ID Text_ABC
0 Jon J Mmith ABC: 1111111 is this here A1 1 Jon J Mmith ABC: **BLOCK** is this here
1 ABC: 1234567 Mary Lisa Rider found here A2 2 ABC: **BLOCK** Mary Lisa Rider found here
2 Jane A Doe is also here A3 3 Jane A Doe is also here
3 ABC: 2222222 Tom T Tucker is here ABC: 2222222 A4 4 ABC: **BLOCK** Tom T Tucker is here ABC: **BLOCK**
Question
How do I achieve my desired output?
Answer 1:
If every number in the text should be replaced, you can do:
df['Text_ABC'] = df['Text'].replace(r'\d+', '***BLOCK***', regex=True)
But if you want to be more specific and replace only the numbers that follow ABC:, you can use this:
df['Text_ABC'] = df['Text'].replace(r'ABC: \d+', 'ABC: ***BLOCK***', regex=True)
Giving you:
df
Text P_ID N_ID Text_ABC
0 Jon J Smith ABC: 1111111 is this here 1 A1 Jon J Smith ABC: ***BLOCK*** is this here
1 ABC: 1234567 Mary Lisa Rider found here 2 A2 ABC: ***BLOCK*** Mary Lisa Rider found here
2 Jane A Doe is also here 3 A3 Jane A Doe is also here
3 ABC: 2222222 Tom T Tucker is here ABC: 2222222... 4 A4 ABC: ***BLOCK*** Tom T Tucker is here ABC: ***BLOCK...
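As a regex, \d+ means "match one or more consecutive digits", so passing it to replace with regex=True says "replace each run of one or more consecutive digits with ***BLOCK***". The second pattern, ABC: \d+, additionally requires the literal prefix "ABC: ", so digits elsewhere in the text are left alone.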
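The difference between the two patterns only shows up when a row contains digits that are not part of an ABC number. The following is a minimal sketch of that case; the last row ('Room 42 ...') is a hypothetical addition for illustration and is not in the question's data, and the variable names all_digits and only_abc are likewise made up here.

```python
import pandas as pd

df = pd.DataFrame({
    'Text': [
        'Jon J Mmith ABC: 1111111 is this here',
        'Jane A Doe is also here',
        'ABC: 2222222 Tom T Tucker is here ABC: 2222222 too',
        'Room 42 has no ABC number',  # hypothetical row, not in the question
    ]
})

# Broad pattern: every run of digits is replaced, wherever it appears.
all_digits = df['Text'].replace(r'\d+', '***BLOCK***', regex=True)

# Narrow pattern: only digits that directly follow the literal "ABC: ".
only_abc = df['Text'].replace(r'ABC: \d+', 'ABC: ***BLOCK***', regex=True)

print(all_digits[3])  # the 42 is blocked
print(only_abc[3])    # the 42 is left unchanged
```

Rows without any match (such as 'Jane A Doe is also here') pass through unchanged under both patterns, and a row with several ABC numbers has each of them replaced.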
Source: https://stackoverflow.com/questions/57031837/alter-number-string-in-pandas-column