I\'m working with a large csv file and the next to last column has a string of text that I want to split by a specific delimiter. I was wondering if there is a simple way to
Can also use groupby() with no need to join and stack().
Use above example data:
import pandas as pd
import numpy as np
df = pd.DataFrame({'ItemQty': {0: 3, 1: 25},
'Seatblocks': {0: '2:218:10:4,6', 1: '1:13:36:1,12 1:13:37:1,13'},
'ItemExt': {0: 60, 1: 300},
'CustomerName': {0: 'McCartney, Paul', 1: 'Lennon, John'},
'CustNum': {0: 32363, 1: 31316},
'Item': {0: 'F04', 1: 'F01'}},
columns=['CustNum','CustomerName','ItemQty','Item','Seatblocks','ItemExt'])
print(df)
CustNum CustomerName ItemQty Item Seatblocks ItemExt
0 32363 McCartney, Paul 3 F04 2:218:10:4,6 60
1 31316 Lennon, John 25 F01 1:13:36:1,12 1:13:37:1,13 300
#first define a function: given a Series of string, split each element into a new series
def split_series(ser,sep):
return pd.Series(ser.str.cat(sep=sep).split(sep=sep))
#test the function,
split_series(pd.Series(['a b','c']),sep=' ')
0 a
1 b
2 c
dtype: object
df2=(df.groupby(df.columns.drop('Seatblocks').tolist()) #group by all but one column
['Seatblocks'] #select the column to be split
.apply(split_series,sep=' ') # split 'Seatblocks' in each group
.reset_index(drop=True,level=-1).reset_index()) #remove extra index created
print(df2)
CustNum CustomerName ItemQty Item ItemExt Seatblocks
0 31316 Lennon, John 25 F01 300 1:13:36:1,12
1 31316 Lennon, John 25 F01 300 1:13:37:1,13
2 32363 McCartney, Paul 3 F04 60 2:218:10:4,6