iterrows pandas get next rows value

南方客 2020-12-23 09:50

I have a df in pandas

import pandas as pd
df = pd.DataFrame(['AA', 'BB', 'CC'], columns = ['value'])

I want to iterate over rows in

5 answers
  • 2020-12-23 10:08

    Combining the answers here gave me a very fast running time: I use the shift method to create a new column of shifted (adjacent-row) values, then the row-iterator approach from @alisdt, but with itertuples instead of iterrows, which ran about 100 times faster for me.

    My script iterates over a DataFrame containing runs of duplicates of varying length and adds one second per duplicate so that all values become unique.

    # shift(1) places the previous row's value beside each row, so despite its
    # name this column holds the preceding row's value for comparison
    df['next_column_value'] = df['column_value'].shift(1)
    # create a row iterator so the first row can be consumed before the loop
    row_iterator = df.itertuples()
    # take the first row from the iterator
    last = next(row_iterator)
    # tuples from itertuples are immutable, so keep the running time in a variable
    # (your_column_num is the positional index of the time column in the tuple)
    t = last[your_column_num]
    # add one more second (add_sec) for each consecutive duplicate
    for row in row_iterator:
        if row.column_value == row.next_column_value:
            t = t + add_sec
            df_result.at[row.Index, 'column_name'] = t
        else:
            # not a duplicate: reset 'last' and 't' to the current row
            last = row
            t = last[your_column_num]
    

    Hope it will help.
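
As a rough, self-contained sketch of the same idea: the sample data, the column names `departure`, `prev_departure` and `unique_departure`, and the one-second `add_sec` step are all assumptions made for illustration, not part of the original script. Results are collected in a list and assigned after the loop, which avoids writing into the frame while iterating over it.

```python
import pandas as pd

# Hypothetical sample data: three identical timestamps, then a distinct one.
df = pd.DataFrame({
    "departure": pd.to_datetime([
        "2020-12-23 10:00:00",
        "2020-12-23 10:00:00",
        "2020-12-23 10:00:00",
        "2020-12-23 10:05:00",
    ])
})
add_sec = pd.Timedelta(seconds=1)  # assumed step between duplicates

# shift(1) puts the previous row's value next to each row (NaT for the first row).
df["prev_departure"] = df["departure"].shift(1)

rows = df.itertuples()
first = next(rows)           # consume the first row; it has no predecessor
t = first.departure
unique = [t]
for row in rows:
    if row.departure == row.prev_departure:
        t = t + add_sec      # another duplicate: one more second than the last
    else:
        t = row.departure    # a new run of values: reset the running time
    unique.append(t)

df["unique_departure"] = unique
print(df[["departure", "unique_departure"]])
```

With this sample data the three identical timestamps become 10:00:00, 10:00:01 and 10:00:02, and the fourth row is left unchanged.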

  • 2020-12-23 10:10

    There is a pairwise() example in the itertools documentation:

    from itertools import tee

    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)  # Python 3: the built-in zip replaces izip

    import pandas as pd
    df = pd.DataFrame(['AA', 'BB', 'CC'], columns = ['value'])

    for (i1, row1), (i2, row2) in pairwise(df.iterrows()):
        print(i1, i2, row1["value"], row2["value"])
    

    Here is the output:

    0 1 AA BB
    1 2 BB CC
    

    But I think iterating over the rows of a DataFrame is slow. If you can explain the problem you want to solve, maybe I can suggest a better method.
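
One possible way to avoid iterrows for this particular case, sketched under the assumption that only the `value` column is needed: pair the column's values directly. `itertools.pairwise` is built in from Python 3.10; the fallback below mirrors the recipe above for older versions.

```python
import pandas as pd

try:
    from itertools import pairwise  # available on Python 3.10+
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        """Fallback for older Pythons: s -> (s0, s1), (s1, s2), ..."""
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

df = pd.DataFrame(["AA", "BB", "CC"], columns=["value"])

# Iterating a Series yields its values, so no row objects are built.
pairs = list(pairwise(df["value"]))
for current, following in pairs:
    print(current, following)
```

This skips building a Series object for every row, which is where much of the cost of iterrows comes from.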

  • 2020-12-23 10:10

    I would use the shift() function as follows:

    df['value_1'] = df.value.shift(-1)
    for x in df.T.unstack().dropna(how='any').values:
        print(x)
    

    which produces

    AA
    BB
    BB
    CC
    CC
    

    This is how the code above works:

    Step 1) Use shift function

    df['value_1'] = df.value.shift(-1)
    print(df)
    

    produces

      value value_1
    0    AA      BB
    1    BB      CC
    2    CC     NaN
    

    Step 2) Transpose:

    df = df.T
    print(df)
    

    produces:

              0   1    2
    value    AA  BB   CC
    value_1  BB  CC  NaN
    

    Step 3) Unstack:

    df = df.unstack()
    print(df)
    

    produces:

    0  value       AA
       value_1     BB
    1  value       BB
       value_1     CC
    2  value       CC
       value_1    NaN
    dtype: object
    

    Step 4) Drop NaN values

    df = df.dropna(how = 'any')
    print(df)
    

    produces:

    0  value      AA
       value_1    BB
    1  value      BB
       value_1    CC
    2  value      CC
    dtype: object
    

    Step 5) Get a NumPy representation of the Series, and print value by value:

    df = df.values
    for x in df:
        print(x)
    

    produces:

    AA
    BB
    BB
    CC
    CC
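
If the goal is to have each value side by side with the next one, rather than a flat list, the shifted column can be used without the transpose and unstack steps. This is only a sketch: `next_value` is a column name chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(["AA", "BB", "CC"], columns=["value"])

# shift(-1) moves every value up one row; the last row has no successor,
# so it gets NaN in the (hypothetically named) next_value column.
paired = df.assign(next_value=df["value"].shift(-1))

# Drop the final row, then collect plain (current, next) tuples.
pairs = list(
    paired.dropna(subset=["next_value"]).itertuples(index=False, name=None)
)
print(pairs)
```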
    
  • 2020-12-23 10:26

    This can also be solved by zipping the DataFrame's row iterator with an offset version of itself.

    Of course the indexing error cannot be reproduced this way.

    Check this out

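    import pandas as pd

    df = pd.DataFrame(['AA', 'BB', 'CC'], columns = ['value'])

    # pair each row with the one after it; zip stops when the shorter iterator ends
    # (.ix was removed from pandas, so the offset copy uses .iloc)
    for id1, id2 in zip(df.iterrows(), df.iloc[1:].iterrows()):
        print(id1[1]['value'])
        print(id2[1]['value'])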
    

    which gives

    AA
    BB
    BB
    CC
    
  • 2020-12-23 10:27

    Firstly, your "messy way" is OK: there's nothing wrong with using indices into the DataFrame, and it will not be too slow. iterrows() itself isn't terribly fast.

    A version of your first idea that would work would be:

    row_iterator = df.iterrows()
    _, last = next(row_iterator)  # take first item from row_iterator
    for i, row in row_iterator:
        print(row['value'])
        print(last['value'])
        last = row
    

    The second method could do something similar, to save one index into the dataframe:

    last = df.iloc[0]  # irow() was removed from pandas; iloc is the positional accessor
    for i in range(1, df.shape[0]):
        print(last)
        print(df.iloc[i])
        last = df.iloc[i]
    

    When speed is critical you can always try both and time the code.
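
A minimal sketch of such a timing comparison, using the standard-library `timeit` module. The function names, the 200-row sample frame and the repeat count are assumptions for illustration; actual timings depend on the data and the pandas version.

```python
import timeit

import pandas as pd

# Hypothetical sample data, just large enough to produce measurable timings.
df = pd.DataFrame({"value": [str(i) for i in range(200)]})

def with_iterrows():
    """Pair each row with its predecessor using a row iterator."""
    rows = df.iterrows()
    _, last = next(rows)
    out = []
    for _, row in rows:
        out.append((last["value"], row["value"]))
        last = row
    return out

def with_iloc():
    """Pair each row with its predecessor using positional indexing."""
    out = []
    last = df.iloc[0]
    for i in range(1, df.shape[0]):
        current = df.iloc[i]
        out.append((last["value"], current["value"]))
        last = current
    return out

# Run each variant a few times and report the total elapsed time.
t_iterrows = timeit.timeit(with_iterrows, number=5)
t_iloc = timeit.timeit(with_iloc, number=5)
print(f"iterrows: {t_iterrows:.4f}s  iloc: {t_iloc:.4f}s")
```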
