How do I create a new column in pandas from the difference of two string columns?

Question

How can I create a new column in pandas that is the result of the difference of two other columns consisting of strings?

I have one column titled "Good_Address" which has entries like "123 Fake Street Apt 101" and another column titled "Bad_Address" which has entries like "123 Fake Street". I want the output in column "Address_Difference" to be "Apt 101".

I've tried doing:

import pandas as pd
data = pd.read_csv("AddressFile.csv")
data['Address Difference'] = data['GOOD_ADR1'].replace(data['BAD_ADR1'],'') 
data['Address Difference']

but this does not work: the result is just "123 Fake Street Apt 101", i.e. the unchanged good address from the example above.
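
Independently of passing a whole Series as the pattern, Series.replace with its default regex=False only replaces cells whose entire value equals the pattern. That is consistent with the unchanged result above; removing a substring needs the .str accessor (or regex=True). A minimal sketch using a one-element Series built from the example address (the variable names are illustrative only):

```python
import pandas as pd

good = pd.Series(["123 Fake Street Apt 101"])

# Series.replace (regex=False by default) matches whole cell values only,
# so a partial match leaves the cell untouched.
whole_value = good.replace("123 Fake Street", "")

# Series.str.replace works on substrings; regex=False treats the pattern literally.
substring = good.str.replace("123 Fake Street", "", regex=False)

print(repr(whole_value[0]))  # '123 Fake Street Apt 101'
print(repr(substring[0]))    # ' Apt 101'
```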

I've also tried:

data['Address Difference'] = data['GOOD_ADR1'].str.replace(data['BAD_ADR1'],'')

but this yields an error saying 'Series' objects are mutable, thus they cannot be hashed.
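
The error comes from passing a Series where .str.replace expects a single pattern (a string or compiled regex) that is applied to every row; it has no notion of one pattern per row. A minimal per-row sketch, assuming the GOOD_ADR1 and BAD_ADR1 column names from the code above and a one-row DataFrame standing in for AddressFile.csv:

```python
import pandas as pd

# One-row stand-in for AddressFile.csv, using the example from the question.
df = pd.DataFrame({
    "GOOD_ADR1": ["123 Fake Street Apt 101"],
    "BAD_ADR1": ["123 Fake Street"],
})

# Pair each good address with its own bad address and remove the first
# occurrence using plain Python str.replace (no regex), then trim spaces.
df["Address Difference"] = [
    good.replace(bad, "", 1).strip()
    for good, bad in zip(df["GOOD_ADR1"], df["BAD_ADR1"])
]

print(df["Address Difference"].tolist())  # ['Apt 101']
```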

Any help would be appreciated.

Thanks

Answer 1:

Use replace with a regex built from the bad address (the (?i) prefix makes the match case-insensitive):

data['Address Difference'] = data['GOOD_ADR1'].replace(regex=r'(?i)' + data['BAD_ADR1'], value="")
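
One caveat: the bad addresses are used as regex patterns without escaping, so characters such as ".", "(" or "+" inside an address are read as regex syntax rather than literal text. A minimal row-by-row sketch that escapes each pattern with re.escape, assuming the same column names; the second row is a hypothetical address added only for illustration, and the helper name remove_pattern is likewise hypothetical:

```python
import re

import pandas as pd

df = pd.DataFrame({
    "GOOD_ADR1": ["123 Fake Street Apt 101", "12 Main St. Apt 4"],  # row 2 is hypothetical
    "BAD_ADR1": ["123 Fake Street", "12 main st."],
})

def remove_pattern(good: str, bad: str) -> str:
    # re.escape makes characters like "." match literally;
    # re.IGNORECASE mirrors the (?i) flag used in the answer above.
    return re.sub(re.escape(bad), "", good, count=1, flags=re.IGNORECASE).strip()

df["Address Difference"] = [
    remove_pattern(g, b) for g, b in zip(df["GOOD_ADR1"], df["BAD_ADR1"])
]
print(df["Address Difference"].tolist())  # ['Apt 101', 'Apt 4']
```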

Answer 2:

I'd use a plain Python function and map it across the two columns row by row. It avoids regex entirely, so it should be reasonably fast.

The function uses str.find to check whether the bad address is a substring of the good address. If str.find returns -1, the substring was not found and the good address is returned unchanged. Otherwise, the substring is cut out using the position where it was found and its length.

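import pandas as pd

df = pd.DataFrame({'BAD_ADR1': ['123 Fake Street'],
                   'GOOD_ADR1': ['123 Fake Street Apt 101']})

def rm(x, y):
    # Remove the first occurrence of y from x; return x unchanged if y is absent.
    i = x.find(y)
    if i > -1:
        j = len(y)
        return x[:i] + x[i+j:]
    else:
        return x

# Apply rm row by row, pairing each GOOD_ADR1 value with its BAD_ADR1 value.
df['Address Difference'] = [*map(rm, df.GOOD_ADR1, df.BAD_ADR1)]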

df

          BAD_ADR1                GOOD_ADR1 Address Difference
0  123 Fake Street  123 Fake Street Apt 101            Apt 101
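
One caveat: read_csv turns empty cells into float NaN, and calling .find on NaN raises AttributeError. A minimal sketch of a guarded variant (the name rm_safe and the second, incomplete row are hypothetical) that also strips the leading space rm leaves behind:

```python
import pandas as pd

def rm_safe(x, y):
    # Missing cells from read_csv arrive as float NaN, which has no .find;
    # return x unchanged unless both values are strings.
    if not isinstance(x, str) or not isinstance(y, str):
        return x
    i = x.find(y)
    if i > -1:
        # Cut out the first occurrence of y and trim leftover spaces.
        return (x[:i] + x[i + len(y):]).strip()
    return x

df = pd.DataFrame({
    "BAD_ADR1": ["123 Fake Street", float("nan")],         # row 2 is hypothetical
    "GOOD_ADR1": ["123 Fake Street Apt 101", "9 Hypothetical Road"],
})
df["Address Difference"] = [*map(rm_safe, df.GOOD_ADR1, df.BAD_ADR1)]
print(df["Address Difference"].tolist())  # ['Apt 101', '9 Hypothetical Road']
```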

Answer 3:

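You can replace the bad-address part of the good address with an empty string (regex=True makes replace match substrings rather than whole values), then strip the leftover whitespace: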

df['Address_Difference'] = df['Good_Address'].replace(df['Bad_Address'], '', regex=True).str.strip()


       Bad_Address             Good_Address  Address_Difference
0  123 Fake Street  123 Fake Street Apt 101             Apt 101
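
If the bad address is always a literal prefix of the good address, as it is in the example, a regex-free alternative is str.removeprefix (Python 3.9+). A minimal sketch under that assumption, using the column names from this answer; rows where the bad address does not appear at the start keep the full good address:

```python
import pandas as pd

df = pd.DataFrame({
    "Bad_Address": ["123 Fake Street"],
    "Good_Address": ["123 Fake Street Apt 101"],
})

# str.removeprefix drops the bad address only when it sits at the very start
# of the good address; otherwise the good address is returned unchanged.
df["Address_Difference"] = [
    good.removeprefix(bad).strip()
    for good, bad in zip(df["Good_Address"], df["Bad_Address"])
]
print(df["Address_Difference"].tolist())  # ['Apt 101']
```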

Source: https://stackoverflow.com/questions/53288887/how-do-i-create-a-new-column-in-pandas-from-the-difference-of-two-string-columns

Tags

python

regex

pandas