Converting a pandas DataFrame list column into a string

悲哀的现实 2020-11-30 10:03

I have a pandas DataFrame in which one of the columns contains lists. I want each value in that column to be a single string.

For example, my list ['one','two','three'] should sim

4 Answers
  • 2020-11-30 10:28

    Pandas offers a method for this, Series.str.join.
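
    A minimal sketch of that method, assuming a small two-row frame like the one used in the other answers. The column names `A` and `Joined` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})

# Series.str.join joins the elements of each list with the given separator
df['Joined'] = df['A'].str.join(', ')
print(df['Joined'].tolist())
```

    This only works when every element of each list is already a string.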

  • 2020-11-30 10:35

    When you cast the column to str with astype, you get the string representation of a Python list, brackets and all. There is no need for that; just apply join directly:

    import pandas as pd
    
    df = pd.DataFrame({
        'A': [['a', 'b', 'c'], ['A', 'B', 'C']]
        })
    
    # Out[8]: 
    #            A
    # 0  [a, b, c]
    # 1  [A, B, C]
    
    df['Joined'] = df.A.apply(', '.join)
    
    #            A   Joined
    # 0  [a, b, c]  a, b, c
    # 1  [A, B, C]  A, B, C
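
    For comparison, here is a short sketch of what the astype route produces on the same kind of frame. The variable names `as_text` and `joined` are made up for the illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})

# astype(str) calls str() on each list, so brackets and quotes end up in the text
as_text = df['A'].astype(str)

# join works on the list itself and only inserts the separator
joined = df['A'].apply(', '.join)

print(repr(as_text.iloc[0]))
print(repr(joined.iloc[0]))
```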
    
  • 2020-11-30 10:36

    You could convert your list to str with astype(str) and then remove the ', [ and ] characters. Using @Yakym's example:

    In [114]: df
    Out[114]:
               A
    0  [a, b, c]
    1  [A, B, C]
    
    In [115]: df.A.astype(str).str.replace(r"\[|\]|'", '', regex=True)
    Out[115]:
    0    a, b, c
    1    A, B, C
    Name: A, dtype: object
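
    One caveat with stripping characters after the fact, sketched on made-up data: if an element itself contains one of the removed characters, it gets altered too, whereas join leaves the elements untouched. The Series `s` below is hypothetical:

```python
import pandas as pd

# Hypothetical data: one element contains an apostrophe
s = pd.Series([["it's", 'ok']])

# regex=True so the pattern is treated as a regular expression
stripped = s.astype(str).str.replace(r"\[|\]|'", '', regex=True)
joined = s.apply(', '.join)

print(stripped.iloc[0])  # the apostrophe is gone and repr's double quotes remain
print(joined.iloc[0])    # the elements are unchanged
```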
    

    Timing

    import pandas as pd
    df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
    df = pd.concat([df]*1000)
    
    
    In [2]: timeit df['A'].apply(', '.join)
    292 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    In [3]: timeit df['A'].str.join(', ')
    368 µs ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    In [4]: timeit df['A'].apply(lambda x: ', '.join(x))
    505 µs ± 5.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    In [5]: timeit df['A'].str.replace('\[|\]|\'', '')
    2.43 ms ± 62.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
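
    Outside IPython, the join-based variants can be compared with the standard-library timeit module. This is only a sketch: the `candidates` dict and the repeat count of 100 are my own choices, and the absolute numbers will depend on the machine and the pandas version, so no output is shown.

```python
import timeit
import pandas as pd

df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
df = pd.concat([df] * 1000)

# Each candidate is a zero-argument callable so timeit can call it repeatedly
candidates = {
    "apply(', '.join)": lambda: df['A'].apply(', '.join),
    "str.join(', ')": lambda: df['A'].str.join(', '),
    "apply(lambda)": lambda: df['A'].apply(lambda x: ', '.join(x)),
}

# Check that all candidates give the same values before comparing speed
results = [fn().tolist() for fn in candidates.values()]
assert all(r == results[0] for r in results)

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=100)
    print(f"{name}: {seconds / 100 * 1e6:.0f} microseconds per call")
```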
    
  • 2020-11-30 10:42

    You should certainly not convert to string before you transform the list. Try:

    df['col'].apply(', '.join)
    

    Also note that apply applies the function to the elements of the series, so using df['col'] in the lambda function is probably not what you want.
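
    To make the element-wise behaviour visible, here is a small sketch. The helper `join_and_record` and the list `seen` are hypothetical names used only to record what apply passes in:

```python
import pandas as pd

df = pd.DataFrame({'col': [['a', 'b'], ['c']]})

seen = []

def join_and_record(element):
    # `element` is one cell of the column (here a list), not the whole Series
    seen.append(type(element).__name__)
    return ', '.join(element)

result = df['col'].apply(join_and_record)
print(seen)
print(result.tolist())
```

    Inside the function you only ever need the argument it receives; there is no reason to reach back to df['col'].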


    Edit: thanks Yakym for pointing out that there is no need for a lambda function.

    Edit: as noted by Anton Protopopov, there is a native .str.join method, but it is (surprisingly) a bit slower than apply.
