Python Pandas removing substring using another column

自闭症患者 2020-12-17 18:58

I've tried searching around and can't figure out an easy way to do this, so I'm hoping your expertise can help.

I have a pandas data frame with two columns

3 answers
  •  北荒 (OP)
    2020-12-17 19:23

    Here is one solution that is quite a bit faster than your current solution, though I'm not convinced there isn't something faster still.

    In [13]: import numpy as np
             import pandas as pd
             n = 1000
             testing = pd.DataFrame({
                 'NAME': ['FIRST', np.nan, 'NAME2', 'NAME3',
                          'NAME4', 'NAME5', 'NAME6'] * n,
                 'FULL_NAME': ['FIRST LAST', np.nan, 'FIRST  LAST', 'FIRST NAME3',
                               'FIRST NAME4 LAST', 'ANOTHER NAME', 'LAST NAME'] * n,
             })
    

    This is kind of a long one-liner, but it should do what you need.

    The fastest solution I can come up with uses replace, as mentioned in another answer:

    In [37]: %timeit testing['NEW2'] = [e.replace(k, '') for e, k in zip(testing.FULL_NAME.astype('str'), testing.NAME.astype('str'))]
    100 loops, best of 3: 4.67 ms per loop
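
One thing to watch with the one-liner above is how missing values are treated: depending on the pandas version, astype('str') may turn NaN into the literal text 'nan' before the replacement runs. The sketch below is a variation, not the code that was timed. It uses the same seven rows without the *n repetition, and it leaves missing values untouched. The names small, remove_name and NEW3 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same seven rows as the test frame in this answer, without the *n repetition.
small = pd.DataFrame({
    'NAME': ['FIRST', np.nan, 'NAME2', 'NAME3', 'NAME4', 'NAME5', 'NAME6'],
    'FULL_NAME': ['FIRST LAST', np.nan, 'FIRST  LAST', 'FIRST NAME3',
                  'FIRST NAME4 LAST', 'ANOTHER NAME', 'LAST NAME'],
})


def remove_name(full, name):
    """Remove every occurrence of name from full; pass missing values through."""
    if isinstance(full, str) and isinstance(name, str):
        return full.replace(name, '')
    return full  # NaN (or any non-string) is returned unchanged


# Pair the two columns row by row, as the zip-based one-liner does.
result = [remove_name(full, name)
          for full, name in zip(small['FULL_NAME'], small['NAME'])]
small['NEW3'] = result

print(result)
```

Note that the replacement leaves the surrounding whitespace in place: 'FIRST LAST' with NAME 'FIRST' becomes ' LAST' with a leading space, so a later .str.strip() may be wanted if that matters.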
    

    Original answer:

    In [14]: %timeit testing['NEW'] = [''.join(str(e).split(k)) for e, k in zip(testing.FULL_NAME.astype('str'), testing.NAME.astype('str'))]
    100 loops, best of 3: 7.24 ms per loop
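
The split-and-join form removes every occurrence of the name, which is also what str.replace does, so the two one-liners should agree whenever the name is a non-empty string. A quick check on plain strings, as a sketch: the first three pairs come from the test frame, the last pair is invented here to show a repeated match.

```python
# (full_name, name) pairs; the last one is an invented example with two matches.
pairs = [
    ('FIRST LAST', 'FIRST'),
    ('FIRST NAME4 LAST', 'NAME4'),
    ('ANOTHER NAME', 'NAME5'),
    ('NAME3 and NAME3', 'NAME3'),
]

via_split = [''.join(full.split(name)) for full, name in pairs]
via_replace = [full.replace(name, '') for full, name in pairs]

# Both forms drop every occurrence of name and keep everything else.
print(via_split == via_replace)
```

The two forms differ only for an empty name: str.split('') raises ValueError, while replace('', '') returns the string unchanged.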
    

    Compared to your current solution:

    In [16]: %timeit testing['NEW1'] = testing.apply(address_remove, axis=1)
    10 loops, best of 3: 166 ms per loop
    

    Both of these give the same answer as your current solution.
