pandas: How do I split text in a column into multiple rows?

说谎 2020-11-22 09:47

I'm working with a large CSV file, and the next-to-last column holds a string of text that I want to split by a specific delimiter. I was wondering if there is a simple way to

7 answers
  •  暖寄归人
    2020-11-22 10:18

    You can also use groupby(), with no need for join() and stack().

    Using the example data from above:

    import pandas as pd
    import numpy as np
    
    
    df = pd.DataFrame({'ItemQty': {0: 3, 1: 25}, 
                       'Seatblocks': {0: '2:218:10:4,6', 1: '1:13:36:1,12 1:13:37:1,13'}, 
                       'ItemExt': {0: 60, 1: 300}, 
                       'CustomerName': {0: 'McCartney, Paul', 1: 'Lennon, John'}, 
                       'CustNum': {0: 32363, 1: 31316}, 
                       'Item': {0: 'F04', 1: 'F01'}}, 
                        columns=['CustNum','CustomerName','ItemQty','Item','Seatblocks','ItemExt']) 
    print(df)
    
       CustNum     CustomerName  ItemQty Item                 Seatblocks  ItemExt
    0    32363  McCartney, Paul        3  F04               2:218:10:4,6       60
    1    31316     Lennon, John       25  F01  1:13:36:1,12 1:13:37:1,13      300
    
    
    # First define a function: given a Series of strings, split every element on sep
    # and return all the pieces together as one new Series
    def split_series(ser, sep):
        return pd.Series(ser.str.cat(sep=sep).split(sep=sep))

    # Test the function:
    split_series(pd.Series(['a b', 'c']), sep=' ')
    0    a
    1    b
    2    c
    dtype: object
    
    df2 = (df.groupby(df.columns.drop('Seatblocks').tolist())  # group by every column except 'Seatblocks'
             ['Seatblocks']                                    # select the column to be split
             .apply(split_series, sep=' ')                     # split 'Seatblocks' within each group
             .reset_index(drop=True, level=-1)                 # drop the extra inner index level added by apply
             .reset_index())                                   # turn the group keys back into columns
    
    print(df2)
       CustNum     CustomerName  ItemQty Item  ItemExt    Seatblocks
    0    31316     Lennon, John       25  F01      300  1:13:36:1,12
    1    31316     Lennon, John       25  F01      300  1:13:37:1,13
    2    32363  McCartney, Paul        3  F04       60  2:218:10:4,6
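
For comparison, here is a minimal sketch of a different technique, not the groupby() approach above: on pandas 0.25 or newer, str.split() followed by DataFrame.explode() should give the same three rows. The name df3 is only illustrative, and the sketch assumes a single space is the only delimiter, as in the example data.

```python
import pandas as pd

# Same example data as above, rebuilt here so the snippet runs on its own.
df = pd.DataFrame({
    'CustNum': [32363, 31316],
    'CustomerName': ['McCartney, Paul', 'Lennon, John'],
    'ItemQty': [3, 25],
    'Item': ['F04', 'F01'],
    'Seatblocks': ['2:218:10:4,6', '1:13:36:1,12 1:13:37:1,13'],
    'ItemExt': [60, 300],
})

# Turn each string into a list of pieces, then give every piece its own row.
# explode() repeats the original index labels, so reset the index afterwards.
df3 = (df.assign(Seatblocks=df['Seatblocks'].str.split(' '))
         .explode('Seatblocks')
         .reset_index(drop=True))

print(df3)
```

Unlike the groupby() version, this keeps the original row order and column order, and it does not depend on the other columns being usable as group keys.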
    
