Splitting Columns' Values in Pandas by delimiter without losing delimiter

时光怂恿深爱的人放手 submitted on 2021-02-08 08:21:39

Question


Hi, I have a DataFrame that follows this format:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 'Apples 20pk ABC123', 4, 5], [6, 7, 'Oranges 40pk XYZ123', 9, 0],
                            [5, 6, 'Bananas 20pk ABC123', 8, 9]]),
                  columns=['Serial #', 'Branch ID', 'Info', 'Value1', 'Value2'])

     Serial #  Branch ID  Info                 Value1  Value2
  0  1         2          Apples 20pk ABC123   4       5
  1  6         7          Oranges 40pk XYZ123  9       0
  2  5         6          Bananas 20pk ABC123  8       9

I want to split the values in the "Info" column at the substring "pk" while keeping "pk" in the first part. Essentially, I want to replace "Info" with two new columns, as in the DataFrame below:

     Serial #  Branch ID  Package       Branch  Value1  Value2
  0  1         2          Apples 20pk   ABC123  4       5
  1  6         7          Oranges 40pk  XYZ123  9       0
  2  5         6          Bananas 20pk  ABC123  8       9

I tried using:

info = df["Info"].str.split("pk ", n=1, expand=True)
df['Package'] = info[0]
df['Branch'] = info[1]
del df['Info']

but then the 'Package' column only contains "Apples 20" instead of "Apples 20pk", because the split discards the delimiter.

I also tried splitting on a space (" "), but then I get three values ('Apples', '20pk', 'ABC123').

Because the real data has many rows (not just 3), what is the most efficient way to go about this? Thanks!


Answer 1:


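We can use a regular expression with a positive lookbehind here. We split on a whitespace character (\s) that is preceded by the string pk, which is written as the lookbehind (?<=pk). The lookbehind only checks that pk is there without consuming it, so pk stays in the first part: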

df['Info'].str.split(r'(?<=pk)\s', expand=True)
              0       1
0   Apples 20pk  ABC123
1  Oranges 40pk  XYZ123
2  Bananas 20pk  ABC123
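
To see why the lookbehind keeps the delimiter, it can help to try the same pattern on a single string with Python's built-in re module. This is only an illustrative sketch; the sample string is taken from the question and the variable names are made up for the example:

```python
import re

text = 'Apples 20pk ABC123'

# Splitting on the literal "pk " consumes the delimiter, so "pk" is lost.
plain = re.split(r'pk ', text, maxsplit=1)

# The lookbehind (?<=pk) only asserts that "pk" comes right before the
# whitespace; just the whitespace itself is consumed by the split.
kept = re.split(r'(?<=pk)\s', text, maxsplit=1)

print(plain)  # ['Apples 20', 'ABC123']
print(kept)   # ['Apples 20pk', 'ABC123']
```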

To get your expected output, we create the two columns in one go and drop Info afterwards:

df[['Package', 'Branch']] = df['Info'].str.split(r'(?<=pk)\s', expand=True)

df.drop('Info', axis=1, inplace=True)
  Serial # Branch ID Value1 Value2       Package  Branch
0        1         2      4      5   Apples 20pk  ABC123
1        6         7      9      0  Oranges 40pk  XYZ123
2        5         6      8      9  Bananas 20pk  ABC123
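
If the same kind of split is needed for other delimiters, the steps above could be wrapped in a small helper. The following is a minimal sketch, not part of the original answer: split_keep_delimiter is a hypothetical name, and it assumes each value contains the delimiter followed by a whitespace character. It limits the split to the first match with n=1, as in the question's own attempt.

```python
import re

import pandas as pd


def split_keep_delimiter(series, delim):
    """Split each string at the first whitespace that follows `delim`.

    Hypothetical helper for illustration: `delim` is escaped so it is
    treated literally inside the lookbehind, and it stays in the left part.
    """
    pattern = rf'(?<={re.escape(delim)})\s'
    return series.str.split(pattern, n=1, expand=True)


df = pd.DataFrame(
    [[1, 2, 'Apples 20pk ABC123', 4, 5],
     [6, 7, 'Oranges 40pk XYZ123', 9, 0],
     [5, 6, 'Bananas 20pk ABC123', 8, 9]],
    columns=['Serial #', 'Branch ID', 'Info', 'Value1', 'Value2'])

parts = split_keep_delimiter(df['Info'], 'pk')
df['Package'] = parts[0]          # text up to and including "pk"
df['Branch'] = parts[1]           # text after the whitespace
df = df.drop(columns='Info')      # returns a new frame without "Info"
```

The string methods are vectorized over the whole column, so the number of rows does not change how the code is written.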



Answer 2:


Could you append pk to the column afterward?
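
One way this suggestion might look in practice is sketched below. It reuses the split on "pk " from the question and then adds the dropped "pk" back with string concatenation. The Series and variable names are illustrative only, and the sketch assumes every value contains "pk " exactly where the split should happen:

```python
import pandas as pd

info = pd.Series(['Apples 20pk ABC123',
                  'Oranges 40pk XYZ123',
                  'Bananas 20pk ABC123'])

# Split on the delimiter as in the question; this drops "pk " from both parts.
parts = info.str.split('pk ', n=1, expand=True)

# Put the dropped "pk" back onto the left part.
package = parts[0] + 'pk'
branch = parts[1]
```

Unlike the lookbehind approach, this hard-codes the delimiter in two places, so both must be changed together if the delimiter changes.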



Source: https://stackoverflow.com/questions/56961327/splitting-columns-values-in-pandas-by-delimiter-without-losing-delimiter
