How to split a column into three columns in pandas

悲&欢浪女 2021-01-21 17:26

I have a data frame as shown below

ID  Name     Address
1   Kohli    Country: India; State: Delhi; Sector: SE25
2   Sachin   Country: India; State: Mumbai; Secto         


        
3 Answers
  •  情书的邮戳
    2021-01-21 17:58

    Original answer

    This can also do the job:

    import pandas as pd
    
    df = pd.DataFrame(
     [
         {'ID': 1, 'Name': 'Kohli', 'Address': 'Country: India; State: Delhi; Sector: SE25'},
         {'ID': 2, 'Name': 'Sachin','Address': 'Country: India; State: Mumbai; Sector: SE39'},
         {'ID': 3,'Name': 'Ponting','Address': 'Country: Australia; State: Tasmania'}
     ]
    )
    
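    cols_to_extract = ['ZONE', 'State', 'Sector']
    # Split each address into at most three "key: value" pieces.
    # Pass n by keyword: newer pandas versions reject it as a positional argument.
    list_of_rows = df['Address'].str.split(';', n=2).tolist()
    # Keep only the value of each piece; values land in the columns by position.
    df[cols_to_extract] = pd.DataFrame(
        [[item.split(': ')[1] for item in row] for row in list_of_rows],
        columns=cols_to_extract)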
    

    Output would be the following:

    >> df[['ID', 'Name', 'ZONE', 'State', 'Sector']]
    
    ID  Name    ZONE       State     Sector
    1   Kohli   India      Delhi     SE25
    2   Sachin  India      Mumbai    SE39
    3   Ponting Australia  Tasmania  None
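
One caveat with this positional approach: if an address lacks its leading key, every value shifts one column to the left. A minimal sketch, applying the same logic to a single address string that has no Country entry (the variable names `address` and `by_position` are only for illustration):

```python
cols_to_extract = ['ZONE', 'State', 'Sector']
address = 'State: Tasmania; Sector: SE27'  # no "Country" entry

# Same positional logic as above, applied to one plain string.
values = [item.split(': ')[1] for item in address.split(';', 2)]
by_position = dict(zip(cols_to_extract, values))

print(by_position)  # {'ZONE': 'Tasmania', 'State': 'SE27'}
```

Here the state ends up under ZONE and the sector under State, so the result is only reliable when every address lists its keys in the same order with none missing from the front or middle.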
    

    Edited answer

    As @jezrael rightly pointed out in a comment on the question, my original answer was wrong: it aligned values by position, which can produce wrong key-value pairs when some of the values are missing (NaN). The following code should work on the edited data set.

    import pandas as pd
    
    df = pd.DataFrame(
     [
         {'ID': 1, 'Name': 'Kohli', 'Address': 'Country: India; State: Delhi; Sector: SE25'},
         {'ID': 2, 'Name': 'Sachin','Address': 'Country: India; State: Mumbai; Sector: SE39'},
         {'ID': 3,'Name': 'Ponting','Address': 'Country: Australia; State: Tasmania'},
         {'ID': 4, 'Name': 'Ponting','Address': 'State: Tasmania; Sector: SE27'}
     ]
    )
    
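    cols_to_extract = ['Country', 'State', 'Sector']
    # Pass n by keyword: newer pandas versions reject it as a positional argument.
    list_of_rows = df['Address'].str.split(';', n=2).tolist()
    # Build one dict per row, keyed by the label before ': ', so each value
    # is matched to its column by name; keys absent from a row become NaN.
    df[cols_to_extract] = pd.DataFrame(
        [{item.split(': ')[0].strip(): item.split(': ')[1] for item in row} for row in list_of_rows],
        columns=cols_to_extract)
    df = df.rename(columns={'Country': 'ZONE'})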
    

    Output would be:

    >> df[['ID', 'Name', 'ZONE', 'State', 'Sector']]
    
    ID  Name    ZONE       State     Sector
    1   Kohli   India      Delhi     SE25
    2   Sachin  India      Mumbai    SE39
    3   Ponting Australia  Tasmania  NaN
    4   Ponting NaN        Tasmania  SE27
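
As an alternative to building the dicts by hand, the same key-based matching can be sketched with `Series.str.extract`, one regular expression per key. This is not part of the answer above, just a hedged variant; it assumes each key appears at most once per address and that values never contain a semicolon.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {'ID': 1, 'Name': 'Kohli', 'Address': 'Country: India; State: Delhi; Sector: SE25'},
        {'ID': 3, 'Name': 'Ponting', 'Address': 'Country: Australia; State: Tasmania'},
        {'ID': 4, 'Name': 'Ponting', 'Address': 'State: Tasmania; Sector: SE27'},
    ]
)

# Look each key up by name, so a missing key yields NaN instead of shifting columns.
for key in ['Country', 'State', 'Sector']:
    # Capture everything after "<key>:" up to the next ';' or the end of the string.
    df[key] = df['Address'].str.extract(rf'{key}:\s*([^;]+)', expand=False).str.strip()

df = df.rename(columns={'Country': 'ZONE'})
print(df[['ID', 'Name', 'ZONE', 'State', 'Sector']])
```

With `expand=False` and a single capture group, `str.extract` returns a Series, which is why it can be assigned directly to one column per loop iteration.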
    
