Pandas every nth row

别那么骄傲 2020-11-30 19:40

DataFrame.resample() works only with time-series data. I cannot find a way to get every nth row from non-time-series data. What is the best method?

5 Answers
  •  孤城傲影
    2020-11-30 20:19

    There is an even simpler alternative to the accepted answer: slice the DataFrame directly, which invokes df.__getitem__.

    import pandas as pd
    
    df = pd.DataFrame('x', index=range(5), columns=list('abc'))
    df
    
       a  b  c
    0  x  x  x
    1  x  x  x
    2  x  x  x
    3  x  x  x
    4  x  x  x
    

    For example, to get every 2nd row, you can do

    df[::2]
    
       a  b  c
    0  x  x  x
    2  x  x  x
    4  x  x  x
    
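    A minimal sketch of the same idea with a configurable stride and starting offset. On a default integer index, `df[start::n]` slices rows by position, so it should match the explicit positional indexer `df.iloc[start::n]`. The values of `n` and `start` are illustrative only.

```python
import pandas as pd

# Toy frame whose values make the selected rows easy to see
df = pd.DataFrame({"a": range(10)})

n, start = 3, 1  # illustrative stride and offset

# Slicing via __getitem__: rows at positions 1, 4, 7
sliced = df[start::n]

# The explicit positional indexer selects the same rows
via_iloc = df.iloc[start::n]

print(sliced["a"].tolist())      # [1, 4, 7]
print(sliced.equals(via_iloc))   # True
```

    Using `.iloc` makes the positional intent explicit, which is the safer habit when the index is not a plain 0..n-1 range.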

    You can also use GroupBy.first or GroupBy.head, grouping on the index:

    df.index // 2
    # Int64Index([0, 0, 1, 1, 2], dtype='int64')  (pandas >= 2.0 shows Index([0, 0, 1, 1, 2], dtype='int64'))
    
    df.groupby(df.index // 2).first()
    # Alternatively (this keeps the original index labels 0, 2, 4):
    # df.groupby(df.index // 2).head(1)
    
       a  b  c
    0  x  x  x
    1  x  x  x
    2  x  x  x
    
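    One caveat worth checking before choosing between the two: GroupBy.first returns the first non-null value of each column within a group, while GroupBy.head(1) returns the first row as-is and keeps its original index label. A small sketch, assuming a frame with a missing value in the first row (the data is made up for illustration):

```python
import numpy as np
import pandas as pd

# Row 0 has a missing value in column "b"
df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": [np.nan, 20, 30, 40, 50]})

keys = df.index // 2  # group keys: 0, 0, 1, 1, 2

# first() skips NaN per column, so group 0 takes b=20 from row 1
first = df.groupby(keys).first()

# head(1) returns the actual first row of each group, NaN included,
# and keeps the original labels 0, 2, 4
head = df.groupby(keys).head(1)

print(first)
print(head)
```

    If "every nth row" must mean the literal rows, head(1) (or plain slicing) is the safer choice; first() can mix values from different rows.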

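    The index is floor-divided by the stride (2, in this case), so each run of 2 consecutive rows shares one group key. This assumes a default integer index (0, 1, 2, ...). If the index is non-numeric, group on row positions instead: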

    # or, with NumPy (import numpy as np): df.groupby(np.arange(len(df)) // 2).first()
    df.groupby(pd.RangeIndex(len(df)) // 2).first()
    
       a  b  c
    0  x  x  x
    1  x  x  x
    2  x  x  x
    
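    A small sketch of the position-based variant on a frame with string labels (the labels and values are made up for illustration). Both np.arange(len(df)) // n and pd.RangeIndex(len(df)) // n build the group keys from row positions, so the labels never enter the calculation.

```python
import numpy as np
import pandas as pd

# String labels: df.index // 2 would fail on this index
df = pd.DataFrame({"a": [10, 11, 12, 13, 14]},
                  index=list("vwxyz"))

n = 2
positions = np.arange(len(df)) // n  # [0, 0, 1, 1, 2]

# One row per block of n consecutive rows, relabeled 0, 1, 2
picked = df.groupby(positions).first()

# Same keys built with pandas instead of NumPy
picked_rng = df.groupby(pd.RangeIndex(len(df)) // n).first()

print(picked["a"].tolist())      # [10, 12, 14]
print(picked_rng["a"].tolist())  # [10, 12, 14]
```

    Note that the original labels ("v", "x", "z") are replaced by the group keys; positional slicing such as df.iloc[::n] would keep them.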
