I have a pandas DataFrame that is composed of different subgroups.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'group': ['a',
Working with a big DataFrame (13 million rows), the `rank` method combined with `groupby` maxed out my 8 GB of RAM and took a really long time. I found a workaround that is much lighter on memory, which I put here just in case:
df = df.sort_values(['group', 'value'])  # rows must be contiguous per group for the positional assignment below
tmp = df.groupby('group').size()         # size of each group, in sorted group order
rank = tmp.map(range)                    # a range(0, size) per group
rank = [item for sublist in rank for item in sublist]  # flatten into one list of ranks
df['rank'] = rank                        # plain lists assign by position, not by index
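To make the workaround concrete, here is a minimal runnable sketch on a small DataFrame (the `group` and `value` columns below are hypothetical, since the question's data is truncated). It also checks the result against the built-in `groupby(...).rank(method='first')`, shifted to be 0-based, which is what the workaround computes:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'group': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'value': [3.0, 1.0, 2.0, 5.0, 4.0, 6.0, 2.5, 0.5],
})

# Memory-light workaround: sort so rows are contiguous per group and
# ordered by value, then emit 0..n-1 within each group.
df = df.sort_values(['group', 'value'])
sizes = df.groupby('group').size()           # rows per group, in sorted group order
df['rank'] = [i for n in sizes for i in range(n)]

# Sanity check against the built-in ranking (0-based via method='first' - 1).
expected = df.groupby('group')['value'].rank(method='first').astype(int) - 1
assert (df['rank'] == expected).all()
```

The trade-off is that the rows end up sorted by group and value; if the original order matters, sort by the index afterwards with `df.sort_index()`.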