How to cut up my dataframe in chunks, but keeping groups together

Asked by 上瘾入骨i, 2021-01-29 08:01

I currently have a massive collection of datasets, one for each year in the 2000s. I take a combination of three years and run code on it to clean it. The problem is that d

2 answers

Answer by 梦如初夏 (OP), 2021-01-29 08:32

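One way to achieve this is to sort by `id`, take every `chunk_size`-th distinct id as a chunk boundary, and slice out the rows between consecutive boundaries, so that all rows sharing an id end up in the same chunk: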

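```python
import numpy as np
import pandas as pd

# generating random DF
num_rows = 100

locs = list('abcdefghijklmno')

df = pd.DataFrame(
        {'id': np.random.randint(1, 100, num_rows),
         'location': np.random.choice(locs, num_rows),
         'year': np.random.randint(2005, 2007, num_rows)})

df.sort_values('id', inplace=True)

print('**** sorted DF (first 10 rows) ****')
print(df.head(10))

# chopping DF into chunks of `chunk_size` distinct ids each ...
chunk_size = 5

# every chunk_size-th distinct id (df is sorted, so these ascend) starts a new chunk
chunks = list(df.id.unique()[::chunk_size])

# (start, end) pairs; the final chunk runs from the last boundary to the end
chunk_margins = [(chunks[i-1], chunks[i]) for i in range(1, len(chunks))]
chunk_margins.append((chunks[-1], np.inf))

# boolean mask selects every row whose id falls in [lo, hi)
df_chunks = [df.loc[(df.id >= lo) & (df.id < hi)] for lo, hi in chunk_margins]

print('**** first chunk ****')
print(df_chunks[0])
```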

Output (from one run; the data is random, so yours will differ):

```
**** sorted DF (first 10 rows) ****
    id location  year
31   2        c  2005
85   2        e  2006
89   2        l  2006
70   2        i  2005
60   4        n  2005
68   7        g  2005
22   7        e  2006
73  10        i  2005
23  10        j  2006
47  16        n  2005

**** first chunk ****
    id location  year
31   2        c  2005
85   2        e  2006
89   2        l  2006
70   2        i  2005
60   4        n  2005
68   7        g  2005
22   7        e  2006
73  10        i  2005
23  10        j  2006
47  16        n  2005
6   16        k  2006
82  16        g  2005
```
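A more compact alternative (not part of the original answer) is to number each distinct group with `groupby(...).ngroup()` and integer-divide that number by the chunk size, so consecutive groups share a chunk label. A minimal sketch, assuming the grouping column has no missing values; the function name `chunk_by_group` and the small example frame are hypothetical:

```python
import pandas as pd

def chunk_by_group(df, key, groups_per_chunk):
    """Split df into a list of DataFrames so that all rows sharing the same
    `key` value land in the same chunk, with at most `groups_per_chunk`
    distinct keys per chunk (the last chunk may hold fewer)."""
    # ngroup() numbers each distinct key 0, 1, 2, ... in sorted key order
    group_no = df.groupby(key, sort=True).ngroup()
    # integer division maps runs of consecutive group numbers onto one chunk label
    chunk_no = group_no // groups_per_chunk
    # grouping by the label Series (aligned on df's index) yields one frame per chunk
    return [chunk for _, chunk in df.groupby(chunk_no, sort=True)]

# small deterministic example (hypothetical data)
df = pd.DataFrame({
    'id':       [7, 2, 4, 2, 10, 7, 16, 4, 21, 16],
    'location': list('abcdefghij'),
    'year':     [2005, 2006, 2005, 2005, 2006, 2006, 2005, 2006, 2005, 2006],
})

chunks = chunk_by_group(df, 'id', groups_per_chunk=2)
for i, chunk in enumerate(chunks):
    print(f"chunk {i}: ids {sorted(chunk['id'].unique().tolist())}")
# chunk 0: ids [2, 4]
# chunk 1: ids [7, 10]
# chunk 2: ids [16, 21]
```

This does not require sorting the DataFrame first, and the final partial chunk is always included. Each chunk can then be cleaned on its own and the results recombined with `pd.concat`.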
