Python Pandas removing substring using another column

自闭症患者 2020-12-17 18:58

I've tried searching around and can't figure out an easy way to do this, so I'm hoping your expertise can help.

I have a pandas data frame with two columns

3 Answers
  •  春和景丽
    2020-12-17 19:12

    You can do it with the replace method and its regex argument, then use str.strip to remove the leftover whitespace:

    In [605]: testing.FULL_NAME.replace(testing.NAME[testing.NAME.notnull()], '', regex = True).str.strip()
    Out[605]: 
    0            LAST
    1             NaN
    2      FIRST LAST
    3           FIRST
    4     FIRST  LAST
    5    ANOTHER NAME
    6       LAST NAME
    Name: FULL_NAME, dtype: object
    

    Note: you need to filter testing.NAME with notnull, because otherwise the NaN values would also be replaced with an empty string.
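
The null-filtering idea in the note above can also be sketched row by row, without passing a Series to replace. This is a minimal illustration, not the answerer's code: the sample frame, the mask variable, and the NEW column contents are assumptions, and re.escape is added so that names containing regex metacharacters (such as ".") are matched literally.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data, not the original poster's frame.
testing = pd.DataFrame({
    "FULL_NAME": ["FIRST LAST", np.nan, "FIRST LAST", "A.B SMITH"],
    "NAME": ["FIRST", "LAST", np.nan, "A.B"],
})

# Only rows where both values are present can be processed.
mask = testing["NAME"].notnull() & testing["FULL_NAME"].notnull()

# Remove each row's NAME from the same row's FULL_NAME.
# re.escape makes characters such as "." match literally.
cleaned = [
    re.sub(re.escape(name), "", full).strip()
    for full, name in zip(testing.loc[mask, "FULL_NAME"], testing.loc[mask, "NAME"])
]

# Start from the original values, then overwrite only the masked rows.
testing["NEW"] = testing["FULL_NAME"]
testing.loc[mask, "NEW"] = cleaned

print(testing)
```

Rows with a missing NAME or FULL_NAME keep their original value, which is the same effect the notnull filter is meant to achieve.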

    In the benchmark below it is slower than the fastest solution, the one by @johnchase, but I think it is more readable and it uses only pandas DataFrame and Series methods:

    In [607]: %timeit testing['NEW'] = testing.FULL_NAME.replace(testing.NAME[testing.NAME.notnull()], '', regex = True).str.strip()
    100 loops, best of 3: 4.56 ms per loop
    
    In [661]: %timeit testing['NEW'] = [e.replace(k, '') for e, k in zip(testing.FULL_NAME.astype('str'), testing.NAME.astype('str'))]
    1000 loops, best of 3: 450 µs per loop
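
The list-comprehension approach timed above casts both columns with astype('str'), which in many pandas versions turns NaN into the text "nan". A possible NaN-aware variant is sketched below; the sample frame and the helper name strip_name are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the original poster's frame.
testing = pd.DataFrame({
    "FULL_NAME": ["FIRST LAST", np.nan, "FIRST LAST", "ANOTHER NAME"],
    "NAME": ["FIRST", "LAST", np.nan, "NAME"],
})


def strip_name(full_name, name):
    """Remove name from full_name, leaving missing values untouched."""
    if pd.isna(full_name) or pd.isna(name):
        return full_name
    # Plain string replacement (no regex), then trim leftover spaces.
    return full_name.replace(name, "").strip()


testing["NEW"] = [
    strip_name(full, name)
    for full, name in zip(testing["FULL_NAME"], testing["NAME"])
]

print(testing)
```

The extra missing-value check costs a little speed compared with the bare comprehension, but it avoids the string cast and keeps missing values missing.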
    
