How to iterate over a pandas DataFrame and create a new column

执笔经年 2021-01-01 07:15

I have a pandas DataFrame that has 2 columns. I want to loop through its rows and, based on a string from column 2, add a string in a newly created third column.

3 answers
  • 2021-01-01 07:41

    First, there is no need to loop over every index; pandas' built-in boolean indexing does this directly. The first line below selects every row where Column2 equals variable1 and sets Column3 in those rows to variable2:

    # .loc replaces .ix, which was removed in pandas 1.0
    df.loc[df.Column2==variable1, 'Column3'] = variable2
    df.loc[df.Column2==variable3, 'Column3'] = variable4
    

    A simple example would be

    import pandas as pd
    
    df = pd.DataFrame({'Animal':['dog', 'fish', 'fish', 'dog']})
    print(df)
    
        Animal
    0   dog
    1   fish
    2   fish
    3   dog
    
    df.loc[df.Animal=='dog', 'Colour'] = 'brown'
    df.loc[df.Animal=='fish', 'Colour'] = 'silver'
    print(df)
    
        Animal  Colour
    0   dog     brown
    1   fish    silver
    2   fish    silver
    3   dog     brown
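
    One thing worth noting about this approach: rows that match none of the conditions are left as NaN in the new column. A minimal sketch of that behaviour, assuming an extra 'cat' row and a made-up default of 'unknown' (neither appears in the example above):

```python
import pandas as pd

# 'cat' is a hypothetical extra value that no condition below handles
df = pd.DataFrame({'Animal': ['dog', 'fish', 'cat']})

df.loc[df.Animal == 'dog', 'Colour'] = 'brown'
df.loc[df.Animal == 'fish', 'Colour'] = 'silver'

# The unmatched 'cat' row is NaN at this point
unmatched = df['Colour'].isna().tolist()

# Fill a default for anything left over ('unknown' is an assumed placeholder)
df['Colour'] = df['Colour'].fillna('unknown')
print(df)
```

    Whether to fill a default or leave the gaps as NaN depends on what the column is used for later.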
    

    The method above is easy to build on: combine multiple conditions with & (and) and | (or) inside the boolean index.

    df = pd.DataFrame({'Animal':['dog', 'fish', 'fish', 'dog'], 'Age': [1, 3, 2, 10]})
    print(df)
    
      Animal  Age
    0    dog    1
    1   fish    3
    2   fish    2
    3    dog   10
    
    df.loc[(df.Animal=='dog') & (df.Age > 8), 'Colour'] = 'grey' # old dogs go grey
    df.loc[(df.Animal=='dog') & (df.Age <= 8), 'Colour'] = 'brown'
    df.loc[df.Animal=='fish', 'Colour'] = 'silver'
    print(df)
    
      Animal  Age  Colour
    0    dog    1   brown
    1   fish    3  silver
    2   fish    2  silver
    3    dog   10    grey
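
    The example above only uses &. As a rough sketch of |, here is the same idea with an either/or condition. The 'cat' row and the 'Pet' column are invented for illustration. Each comparison needs its own parentheses, because & and | bind more tightly than == and >.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['dog', 'fish', 'cat', 'dog'],
                   'Age': [1, 3, 2, 10]})

# | means "either condition holds"; each comparison is parenthesised
is_furry = (df.Animal == 'dog') | (df.Animal == 'cat')
df.loc[is_furry, 'Pet'] = 'furry'

# ~ negates a boolean mask, so this covers every remaining row
df.loc[~is_furry, 'Pet'] = 'scaly'
print(df)
```

    Storing the mask in a variable, as done here, keeps the two assignments in sync if the condition changes.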
    
  • 2021-01-01 07:48

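    I think you can use nested numpy.where calls, which is faster than a loop. numpy.where needs either the condition alone or all three arguments, so the inner call passes None for rows that match neither condition:

    df['Column3'] = np.where(df['Column2']==variable1, variable2, 
                    np.where(df['Column2']==variable3, variable4, None))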
    

    To supply a specific value when both conditions are False:

    df['Column3'] = np.where(df['Column2']==variable1, variable2, 
                    np.where(df['Column2']==variable3, variable4, variable5))
    

    Sample:

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'Column2':[1,2,4,3]})
    print (df)
       Column2
    0        1
    1        2
    2        4
    3        3
    
    variable1 = 1
    variable2 = 2
    variable3 = 3
    variable4 = 4
    variable5 = 5
    
    df['Column3'] = np.where(df['Column2']==variable1, variable2, 
                    np.where(df['Column2']==variable3, variable4, variable5))
    
    print (df)
       Column2  Column3
    0        1        2
    1        2        5
    2        4        5
    3        3        4
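
    With more than two conditions the nesting gets hard to read. One alternative, which is not part of the answer above, is numpy.select: it takes a list of conditions, a matching list of values, and a default. A sketch using the same sample values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column2': [1, 2, 4, 3]})
variable1, variable2 = 1, 2
variable3, variable4 = 3, 4
variable5 = 5

conditions = [df['Column2'] == variable1, df['Column2'] == variable3]
choices = [variable2, variable4]

# The first condition that is True wins; variable5 is used when none match
df['Column3'] = np.select(conditions, choices, default=variable5)
print(df)
```

    This should give the same Column3 as the nested numpy.where version, with one list entry per condition instead of one nesting level.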
    

    Another solution, thanks to Jon Clements:

    df['Column4'] = df.Column2.map({variable1: variable2, variable3:variable4}).fillna(variable5)
    print (df)
       Column2  Column3  Column4
    0        1        2      2.0
    1        2        5      5.0
    2        4        5      5.0
    3        3        4      4.0
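
    Column4 prints as 2.0, 5.0 and so on because map returns NaN for keys missing from the dict, which makes the result a float column before fillna runs. If integers are wanted, a cast afterwards should restore them. A sketch with the same sample values:

```python
import pandas as pd

df = pd.DataFrame({'Column2': [1, 2, 4, 3]})
variable1, variable2, variable3, variable4, variable5 = 1, 2, 3, 4, 5

# Keys 2 and 4 are not in the dict, so map yields NaN for those rows
mapped = df['Column2'].map({variable1: variable2, variable3: variable4})

# Fill the gaps, then cast back to integers
df['Column4'] = mapped.fillna(variable5).astype(int)
print(df)
```

    The cast only makes sense once every NaN has been filled; astype(int) fails on a column that still contains NaN.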
    
  • 2021-01-01 07:52

    You can also try this if you want to keep your for loop:

    new_column = []
    
    for i in df.index:
        value = df.loc[i, 'Column2']  # .loc replaces the removed .ix
        if value == variable1:
            new_column.append(variable2)
        elif value == variable3:
            new_column.append(variable4)
        else:  # neither condition matched
            new_column.append(other_variable)
    
    df['Column3'] = new_column
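
    If a loop is preferred, iterating over the column's values directly avoids one label lookup per row. A sketch of that variant; the sample values for the variables are assumptions, chosen only to make it runnable:

```python
import pandas as pd

df = pd.DataFrame({'Column2': [1, 2, 4, 3]})
variable1, variable2 = 1, 2
variable3, variable4 = 3, 4
other_variable = 5  # fallback when neither condition matches

new_column = []
for value in df['Column2']:  # iterates over the values, not the index
    if value == variable1:
        new_column.append(variable2)
    elif value == variable3:
        new_column.append(variable4)
    else:
        new_column.append(other_variable)

df['Column3'] = new_column
print(df)
```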
    