Pandas enumerate groups in descending order

Submitted by 瘦欲@ on 2020-01-02 07:25:23

Question


I have the following column:

   column
0      10
1      10
2       8
3       8
4       6
5       6

My goal is to find the total number of unique values (3 in this case) and create a new column that looks like this:

   new_column
0           3
1           3
2           2
3           2
4           1
5           1

The numbering starts at the number of unique values (3), and the same number is repeated while the current row has the same value as the previous row in the original column. The number decreases by one each time the value changes. Every unique value in the original column has the same number of rows (2 rows per unique value in this case).

My solution was to group by the original column and build a new list like this:

i=1
new_time=[]
for j, v in df.groupby('column'):
    new_time.append([i]*2)
    i=i+1

Then I'd flatten the list and sort it in decreasing order. Is there a simpler solution?

Thanks.


Answer 1:


Use GroupBy.ngroup with ascending=False:

df.groupby('column', sort=False).ngroup(ascending=False)+1

0    3
1    3
2    2
3    2
4    1
5    1
dtype: int64
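
As a self-contained sketch of the one-liner above, assuming the sample frame from the question (how `df` is constructed is my assumption; `new_column` is the name the question uses):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'column': [10, 10, 8, 8, 6, 6]})

# sort=False numbers the groups in order of first appearance (10, 8, 6).
# ascending=False reverses that numbering, and +1 makes it 1-based.
df['new_column'] = df.groupby('column', sort=False).ngroup(ascending=False) + 1

print(df['new_column'].tolist())  # [3, 3, 2, 2, 1, 1]
```

Without sort=False the groups would be numbered by sorted key instead, which happens to give the same result here only because the column is already in descending order.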

For a DataFrame that looks like this,

df = pd.DataFrame({'column': [10, 10, 8, 8, 10, 10]})

...where only consecutive equal values should be grouped together, you'll need to modify your grouper:

(df.groupby(df['column'].ne(df['column'].shift()).cumsum(), sort=False)
   .ngroup(ascending=False)
   .add(1))

0    3
1    3
2    2
3    2
4    1
5    1
dtype: int64
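
To see what the modified grouper is doing, here is a rough sketch that spells out the intermediate steps. The names `starts_new_run`, `run_id`, `by_value` and `by_run` are mine, purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({'column': [10, 10, 8, 8, 10, 10]})

# True wherever a value differs from the one before it.
# The first row compares against NaN, so it always starts a run.
starts_new_run = df['column'].ne(df['column'].shift())

# The cumulative sum turns those flags into a run id per row.
run_id = starts_new_run.cumsum()
print(run_id.tolist())    # [1, 1, 2, 2, 3, 3]

# Grouping by value merges the two separate runs of 10...
by_value = df.groupby('column', sort=False).ngroup(ascending=False) + 1
print(by_value.tolist())  # [2, 2, 1, 1, 2, 2]

# ...while grouping by run id keeps them apart.
by_run = df.groupby(run_id, sort=False).ngroup(ascending=False) + 1
print(by_run.tolist())    # [3, 3, 2, 2, 1, 1]
```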



Answer 2:


pd.factorize

i, u = pd.factorize(df.column)
df.assign(new=len(u) - i)

   column  new
0      10    3
1      10    3
2       8    2
3       8    2
4       6    1
5       6    1

dict.setdefault

d = {}
for k in df.column:
    d.setdefault(k, len(d))

df.assign(new=len(d) - df.column.map(d))
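
Both variants rest on the same idea: give each value an integer code in order of first appearance, then subtract the code from the number of unique values. A minimal side-by-side sketch, assuming the question's sample frame (the names `codes`, `uniques`, `via_factorize` and `via_dict` are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'column': [10, 10, 8, 8, 6, 6]})

# factorize returns codes in order of first appearance, plus the unique values.
codes, uniques = pd.factorize(df['column'])
print(list(codes))              # [0, 0, 1, 1, 2, 2]

# The same codes built by hand: setdefault only stores a key the first time
# it is seen, and len(d) at that moment is the next unused code.
d = {}
for k in df['column']:
    d.setdefault(k, len(d))
print(d)                        # {10: 0, 8: 1, 6: 2}

via_factorize = len(uniques) - codes
via_dict = len(d) - df['column'].map(d)
print(via_dict.tolist())        # [3, 3, 2, 2, 1, 1]
```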



Answer 3:


Try unique with map:

df.column.map(dict(zip(df.column.unique(),reversed(range(df.column.nunique())))))+1
Out[350]: 
0    3
1    3
2    2
3    2
4    1
5    1
Name: column, dtype: int64
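
The one-liner is easier to follow when split into steps. A sketch under the same sample data (the names `uniques`, `labels` and `mapping` are mine, and I count down with range directly rather than reversing and adding 1):

```python
import pandas as pd

df = pd.DataFrame({'column': [10, 10, 8, 8, 6, 6]})

# unique() keeps order of first appearance: 10, 8, 6.
uniques = df['column'].unique()

# Count down from the number of unique values to 1: 3, 2, 1.
labels = range(len(uniques), 0, -1)

# Pair each unique value with its label, then look every row up.
mapping = dict(zip(uniques, labels))
new_column = df['column'].map(mapping)

print(new_column.tolist())  # [3, 3, 2, 2, 1, 1]
```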



Answer 4:


Actually, we can use rank with method='dense'. From the documentation:

dense: like ‘min’, but rank always increases by 1 between groups

df['column'].rank(method='dense')

0    3.0
1    3.0
2    2.0
3    2.0
4    1.0
5    1.0

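Note that this relies on the column already being in descending order. The rank version of @cs95's solution, for consecutive groups, would be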

df['column'].ne(df['column'].shift()).cumsum().rank(method='dense',ascending=False)
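
No output is shown for that last expression, so here is a small sketch of what it produces on the consecutive-groups frame from the first answer. The astype(int) cast is my addition, since rank returns floats:

```python
import pandas as pd

df = pd.DataFrame({'column': [10, 10, 8, 8, 10, 10]})

# Ranking the raw values cannot tell the two runs of 10 apart.
by_value = df['column'].rank(method='dense')
print(by_value.tolist())    # [2.0, 2.0, 1.0, 1.0, 2.0, 2.0]

# Run id per row: increases by 1 each time the value changes.
run_id = df['column'].ne(df['column'].shift()).cumsum()

# Dense rank in descending order, so the last run gets 1.
new_column = run_id.rank(method='dense', ascending=False).astype(int)
print(new_column.tolist())  # [3, 3, 2, 2, 1, 1]
```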



Answer 5:


IIUC, you want the group ID of consecutive runs of equal values, in reverse order. If so, I think this should work too:

df.column.nunique() - df.column.ne(df.column.shift()).cumsum().sub(1)

Out[691]:
0    3
1    3
2    2
3    2
4    1
5    1
Name: column, dtype: int32
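
One caveat worth sketching: nunique() counts distinct values, not runs, so the expression above only matches the desired output when no value reappears later in the column. The `with_run_count` variant below is my own assumption about how one might handle that case, not part of the answer:

```python
import pandas as pd

# A value (10) reappears after a different one, as in the first answer.
df = pd.DataFrame({'column': [10, 10, 8, 8, 10, 10]})

run_id = df['column'].ne(df['column'].shift()).cumsum()

# Expression from the answer: 2 unique values, but 3 runs.
with_nunique = df['column'].nunique() - run_id.sub(1)
print(with_nunique.tolist())    # [2, 2, 1, 1, 0, 0]

# Variant (assumption): start from the number of runs instead.
with_run_count = run_id.max() - run_id + 1
print(with_run_count.tolist())  # [3, 3, 2, 2, 1, 1]
```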


Source: https://stackoverflow.com/questions/56720806/pandas-enumerate-groups-in-descending-order
