pandas concat ignore_index doesn't work

误落风尘 2020-12-04 11:06

I am trying to column-bind dataframes and am having an issue with pandas concat, as ignore_index=True doesn't seem to work:

df1 = pd.Dat         


        
4 answers
  • 2020-12-04 11:40

    Agree with the comments, always best to post expected output.

    Is this what you are seeking?

    import pandas as pd

    df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3'],
                        'D': ['D0', 'D1', 'D2', 'D3']},
                       index=[0, 2, 3, 4])

    df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                        'C': ['C4', 'C5', 'C6', 'C7'],
                        'D2': ['D4', 'D5', 'D6', 'D7']},
                       index=[5, 6, 7, 3])

    # Replace the column labels with 0..n-1 so both frames share the same columns
    # (transpose, drop the old labels, transpose back).
    df1 = df1.transpose().reset_index(drop=True).transpose()
    df2 = df2.transpose().reset_index(drop=True).transpose()

    # Stack row-wise; ignore_index=True renumbers the rows 0..7.
    dfs = [df1, df2]
    df = pd.concat(dfs, axis=0, ignore_index=True)

    print(df)
    
    
    
        0   1   2
    0  A0  B0  D0
    1  A1  B1  D1
    2  A2  B2  D2
    3  A3  B3  D3
    4  A4  C4  D4
    5  A5  C5  D5
    6  A6  C6  D6
    7  A7  C7  D7
    
  • 2020-12-04 11:45

    The ignore_index option is working in your example; you just need to know that it discards the labels along the axis of concatenation, which in your case (axis=1) is the columns. (Perhaps a better name would be ignore_labels.) If you want the concatenation to ignore the row index labels, then axis has to be set to 0 (the default).
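
    A minimal sketch of that behaviour, using two small made-up frames (the data and the names wide and tall are only for illustration): with axis=1 the column labels are renumbered while rows are still matched by index label, and with axis=0 the row labels are renumbered instead.

```python
import pandas as pd

# Small illustrative frames with deliberately mismatched row labels.
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 2])
df2 = pd.DataFrame({'C': ['C0', 'C1']}, index=[5, 2])

# axis=1: the columns are the concatenation axis, so their labels become 0, 1, 2.
# Rows are still aligned on the index labels (outer join), so label 2 is shared
# and labels 0 and 5 each get missing values on one side.
wide = pd.concat([df1, df2], axis=1, ignore_index=True)
print(wide)

# axis=0: the rows are the concatenation axis, so the row labels become 0..3.
# Columns are aligned by name, leaving missing values where a frame lacks a column.
tall = pd.concat([df1, df2], axis=0, ignore_index=True)
print(tall)
```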

  • 2020-12-04 11:54

    If I understood you correctly, this is what you would like to do.

    import pandas as pd
    
    df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3'],
                        'D': ['D0', 'D1', 'D2', 'D3']},
                        index=[0, 2, 3,4])
    
    df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                        'C': ['C4', 'C5', 'C6', 'C7'],
                        'D2': ['D4', 'D5', 'D6', 'D7']},
                        index=[ 4, 5, 6 ,7])
    
    
    df1.reset_index(drop=True, inplace=True)
    df2.reset_index(drop=True, inplace=True)
    
    df = pd.concat( [df1, df2], axis=1) 
    

    Which gives:

        A   B   D   A1  C   D2
    0   A0  B0  D0  A4  C4  D4
    1   A1  B1  D1  A5  C5  D5
    2   A2  B2  D2  A6  C6  D6
    3   A3  B3  D3  A7  C7  D7
    

    Actually, I would have expected that df = pd.concat(dfs,axis=1,ignore_index=True) gives the same result.

    This is the excellent explanation from jreback:

    ignore_index=True ‘ignores’, meaning it doesn’t align on the joining axis. It simply pastes them together in the order that they are passed, then reassigns a range for the actual index (e.g. range(len(index))). So the difference when joining on non-overlapping indexes (assume axis=1 in the example) is that with ignore_index=False (the default) you get the concat of the indexes, and with ignore_index=True you get a range.
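
    Since ignore_index only relabels the joining axis, pasting frames side by side by row position needs the row labels dropped first. One possible way to wrap that up for any number of frames is sketched here; cbind_by_position is a hypothetical helper name, not a pandas function, and it assumes all frames have the same number of rows.

```python
import pandas as pd


def cbind_by_position(frames):
    """Hypothetical helper: paste frames side by side by row position.

    Each frame gets a fresh 0..n-1 row index (on a copy, the inputs are
    not modified), so the axis=1 alignment matches rows one to one.
    """
    return pd.concat([f.reset_index(drop=True) for f in frames], axis=1)


# Illustrative data with unrelated row labels.
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 2])
df2 = pd.DataFrame({'C': ['C4', 'C5']}, index=[7, 3])

out = cbind_by_position([df1, df2])
print(out)
```

    Unlike reset_index(drop=True, inplace=True), this leaves the original frames and their indexes untouched.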

  • 2020-12-04 11:57

    Thanks for asking. I had the same issue. For some reason ignore_index=True didn't help in my case. I wanted to keep the index from the first dataset and ignore the second one, and this worked for me:

    # reset_index returns the relabeled frame only when inplace is not set
    # (with inplace=True it returns None), so it is called without inplace here.
    X_train = pd.concat([train_sp, X_train.reset_index(drop=True)], axis=1)
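
    Resetting the second frame's index lines its rows up with the first frame only when the first frame already has a default 0..n-1 index. If it does not, one option is to give the second frame the first frame's index before concatenating. A rough sketch with made-up stand-in data for train_sp and X_train, assuming both have the same number of rows:

```python
import pandas as pd

# Made-up stand-ins for the two datasets; only the row labels matter here.
train_sp = pd.DataFrame({'x': [1, 2, 3]}, index=[10, 20, 30])
X_train = pd.DataFrame({'y': [4, 5, 6]}, index=[7, 8, 9])

# set_axis returns a copy of X_train whose rows carry train_sp's labels,
# so the axis=1 alignment pairs the rows by position and keeps the first index.
combined = pd.concat([train_sp, X_train.set_axis(train_sp.index, axis=0)], axis=1)
print(combined)
```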
    