How to split my DataFrame into chunks while keeping groups together

-上瘾入骨i 2021-01-29 08:01

I currently have a massive collection of datasets, one for each year in the 2000s. I take a combination of three years and run cleaning code on it. The problem is that d

2 Answers
  •  梦如初夏
    2021-01-29 08:32

    One way to achieve this is as follows:

    import numpy as np
    import pandas as pd
    
    # generating random DF
    num_rows = 100
    
    locs = list('abcdefghijklmno')
    
    df = pd.DataFrame(
            {'id': np.random.randint(1, 100, num_rows),
             'location': np.random.choice(locs, num_rows),
             'year': np.random.randint(2005, 2007, num_rows)})
    
    df.sort_values('id', inplace=True)
    
    print('**** sorted DF (first 10 rows) ****')
    print(df.head(10))
    
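    # split the DF into chunks of chunk_size distinct ids each,
    # so that all rows sharing an id stay in the same chunk
    chunk_size = 5
    
    # every chunk_size-th unique id starts a new chunk (df is sorted by id)
    chunks = list(df.id.unique()[::chunk_size])
    
    # half-open [start, end) id ranges; the last one ends past the max id
    # so the trailing ids are not dropped
    chunk_margins = list(zip(chunks, chunks[1:] + [df.id.max() + 1]))
    
    # .loc replaces .ix, which was removed in pandas 1.0
    df_chunks = [df.loc[(df.id >= x[0]) & (df.id < x[1])] for x in chunk_margins]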
    
    print('**** first chunk ****')
    print(df_chunks[0])
    

    Output:

    **** sorted DF (first 10 rows) ****
        id location  year
    31   2        c  2005
    85   2        e  2006
    89   2        l  2006
    70   2        i  2005
    60   4        n  2005
    68   7        g  2005
    22   7        e  2006
    73  10        i  2005
    23  10        j  2006
    47  16        n  2005
    
    **** first chunk ****
        id location  year
    31   2        c  2005
    85   2        e  2006
    89   2        l  2006
    70   2        i  2005
    60   4        n  2005
    68   7        g  2005
    22   7        e  2006
    73  10        i  2005
    23  10        j  2006
    47  16        n  2005
    6   16        k  2006
    82  16        g  2005
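
The same idea (a fixed number of distinct ids per chunk, with no id split across chunks) can probably be written more compactly with `groupby`. This is only a sketch: the helper name `split_keep_groups` and its parameters are made up for illustration, and it assumes the goal is "N distinct ids per chunk" rather than a fixed number of rows per chunk.

```python
import numpy as np
import pandas as pd


def split_keep_groups(df, key, groups_per_chunk):
    """Hypothetical helper: split df into a list of DataFrames so that
    rows sharing the same `key` value always end up in the same chunk."""
    # ngroup() numbers the groups 0, 1, 2, ... in sorted key order;
    # integer division maps every run of groups_per_chunk groups to one chunk
    chunk_no = df.groupby(key).ngroup() // groups_per_chunk
    # group by the per-row chunk number (as a plain array, to avoid index alignment)
    return [chunk for _, chunk in df.groupby(chunk_no.to_numpy())]


# demo data shaped like the random DF above (seeded only for repeatability)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'id': rng.integers(1, 100, 100),
    'location': rng.choice(list('abcdefghijklmno'), 100),
    'year': rng.integers(2005, 2007, 100),
})

parts = split_keep_groups(demo, 'id', 5)
print(len(parts))
print(parts[0])
```

Every chunk except possibly the last one holds exactly `groups_per_chunk` distinct ids, so the row count per chunk varies with how many rows each id has.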
    
