Alter text in pandas column based on names

Question

Background

I have the following sample df

import pandas as pd
df = pd.DataFrame({'Text': ['Jon J Mmith is Here from **BLOCK** until **BLOCK**',
                            'No P_Name Found here',
                            'Jane Ann Doe is Also here until **BLOCK** ',
                            '**BLOCK** was **BLOCK** Tom Tcker is Not here but **BLOCK** '],
                   'P_ID': [1, 2, 3, 4],
                   'P_Name': ['Mmith, Jon J', 'Hder, Mary', 'Doe, Jane Ann', 'Tcker, Tom'],
                   'N_ID': ['A1', 'A2', 'A3', 'A4']})

# rearrange columns
df = df[['Text', 'N_ID', 'P_ID', 'P_Name']]
df


    Text                                                         N_ID  P_ID  P_Name
0   Jon J Mmith is Here from **BLOCK** until **BLOCK**           A1       1  Mmith, Jon J
1   No P_Name Found here                                         A2       2  Hder, Mary
2   Jane Ann Doe is Also here until **BLOCK**                    A3       3  Doe, Jane Ann
3   **BLOCK** was **BLOCK** Tom Tcker is Not here but **BLOCK**  A4       4  Tcker, Tom

Goal

1) In the Text column, replace the name (e.g. Jon J Mmith) that corresponds to the value in P_Name with **BLOCK**

Desired Output

    Text                                                         N_ID  P_ID  P_Name
0   **BLOCK** is Here from **BLOCK** until **BLOCK**             A1       1  Mmith, Jon J
1   No P_Name Found here                                         A2       2  Hder, Mary
2   **BLOCK** is Also here until **BLOCK**                       A3       3  Doe, Jane Ann
3   **BLOCK** was **BLOCK** **BLOCK** is Not here but **BLOCK**  A4       4  Tcker, Tom

The result can either overwrite the Text column or go into a new column (new_col).

Question

How do I achieve my desired output?

Answer 1:

One way:

>>> df['Text'].replace(df['P_Name'].str.split(', *').apply(lambda l: ' '.join(l[::-1])).tolist(),'**BLOCK**',regex=True)
0           **BLOCK** is Here from **BLOCK** until **BLOCK**
1                                 No P_Name Found here
2                  **BLOCK** is Also here until **BLOCK**
3    **BLOCK** was **BLOCK** **BLOCK** is Not here but **...

You can use inplace=True to do this in place, or create a new column with df['new_col'] = the above. What this does is split the P_Name column on the comma, join the parts in reverse order with a space, and replace each resulting name in your Text column with **BLOCK**.
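The same split, reverse, and replace idea can also be sketched row by row, so that each Text is only checked against the P_Name in its own row. This is a minimal sketch rather than the answer's method: the helper name block_name is hypothetical, it assumes every P_Name has the form 'Last, First' with exactly one comma, and it uses plain str.replace instead of a regex so that characters such as '.' in a name are matched literally.

```python
import pandas as pd

df = pd.DataFrame({
    'Text': ['Jon J Mmith is Here from **BLOCK** until **BLOCK**',
             'No P_Name Found here',
             'Jane Ann Doe is Also here until **BLOCK** ',
             '**BLOCK** was **BLOCK** Tom Tcker is Not here but **BLOCK** '],
    'P_Name': ['Mmith, Jon J', 'Hder, Mary', 'Doe, Jane Ann', 'Tcker, Tom'],
})


def block_name(text, p_name):
    """Replace the 'First Last' form of p_name in text with **BLOCK**.

    Hypothetical helper; assumes p_name looks like 'Last, First'.
    """
    # 'Mmith, Jon J' -> last='Mmith', first='Jon J'
    last, first = [part.strip() for part in p_name.split(',', 1)]
    full_name = f'{first} {last}'
    # Literal (non-regex) replacement; text without the name is returned unchanged
    return text.replace(full_name, '**BLOCK**')


# Pair each Text with the P_Name from the same row
df['new_col'] = [block_name(t, n) for t, n in zip(df['Text'], df['P_Name'])]
```

Unlike passing every name to replace at once, this only blocks a name in the row it belongs to, which may or may not be what you want if one person can be mentioned in another person's text.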

Source: https://stackoverflow.com/questions/57029538/alter-text-in-pandas-column-based-on-names

Tags

regex

python-3.x

pandas

text

nlp