Find names of top-n highest-value (non-zero) columns in each pandas dataframe row

青春壹個敷衍的年華 submitted on 2019-12-04 17:36:47
jezrael

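You need to sort each row's values in descending order with np.argsort, use the resulting positions to look up the column names, and then use a mask to replace the names whose value is 0 with empty strings: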

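import numpy as np
import pandas as pd

# 'id' becomes the index so that only the value columns are ranked
df = df.set_index('id')

k = 3
vals = df.values
# per row, the column positions ordered from the largest to the smallest value
arr1 = np.argsort(-vals, axis=1)

print(vals[np.arange(len(df.index))[:, None], arr1][:, :k])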
[[ 9  4  0]
 [ 4  0  0]
 [10  7  3]
 [ 5  3  1]
 [10  7  3]]
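The expression vals[np.arange(len(df.index))[:, None], arr1] pairs every row number with that row's sorted column positions, so it gathers each row's values in descending order. A minimal sketch, assuming NumPy 1.15 or later, showing that np.take_along_axis performs the same row-wise gather. The input frame is hypothetical, chosen only to be consistent with the printed output above.

```python
import numpy as np
import pandas as pd

# Hypothetical input, chosen only to be consistent with the printed output above
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'p1': [0, 0, 1, 1, 2],
    'p2': [9, 0, 3, 5, 3],
    'p3': [0, 0, 10, 3, 7],
    'p4': [4, 4, 7, 1, 10],
}).set_index('id')

k = 3
vals = df.to_numpy()
arr1 = np.argsort(-vals, axis=1)

# Fancy indexing: a column of row numbers broadcasts against the sorted positions
rows = np.arange(len(df.index))[:, None]
fancy = vals[rows, arr1][:, :k]

# The same row-wise gather expressed with np.take_along_axis
gathered = np.take_along_axis(vals, arr1, axis=1)[:, :k]

print(np.array_equal(fancy, gathered))
print(gathered)
```

Both forms return the top-k values per row; take_along_axis simply avoids building the row-number column by hand.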

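# names of the k largest columns per row; the Index is converted to a NumPy
# array first because pandas 2.x raises on 2-D indexing of an Index
a = df.columns.to_numpy()[arr1[:, :k]]
# True where the selected value is 0, i.e. there is no real column to report
mask = vals[np.arange(len(df.index))[:, None], arr1][:, :k] == 0
print(mask)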
[[False False  True]
 [False  True  True]
 [False False False]
 [False False False]
 [False False False]]
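The mask is True exactly where a row has fewer than k non-zero values. As a sketch of an alternative to DataFrame.mask, the blanking can also be done on the NumPy name array with np.where. The input frame is hypothetical, and kind='stable' is an addition not present in the answer: it keeps tied values (p1 and p4 in row 4) in column order, because the default argsort kind is not guaranteed to be stable.

```python
import numpy as np
import pandas as pd

# Hypothetical input, chosen only to be consistent with the printed output above
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'p1': [0, 0, 1, 1, 2],
    'p2': [9, 0, 3, 5, 3],
    'p3': [0, 0, 10, 3, 7],
    'p4': [4, 4, 7, 1, 10],
}).set_index('id')

k = 3
vals = df.to_numpy()
# stable sort so that tied values keep their column order
arr1 = np.argsort(-vals, axis=1, kind='stable')
top_vals = np.take_along_axis(vals, arr1, axis=1)[:, :k]
top_names = df.columns.to_numpy()[arr1[:, :k]]

# True where the k-th largest slot holds a 0
mask = top_vals == 0
print(mask.sum(axis=1))  # number of empty slots per row

# replace masked names directly on the NumPy array
names = np.where(mask, '', top_names)
print(names)
```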

result = pd.DataFrame(a, columns=['top{}'.format(i) for i in range(1, k + 1)],
                      index=df.index)

# replace the names whose value was 0 with empty strings
result = result.mask(mask, '')
print(result)
   top1 top2 top3
id               
1    p2   p4     
2    p4          
3    p3   p4   p2
4    p2   p3   p1
5    p4   p3   p2
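Putting the steps together, here is a sketch of a reusable helper. The name top_n_columns is hypothetical, and the helper assumes non-negative values (as in the question), since a 0 among the top k is read as "no column". It also uses a stable sort and clips k to the number of columns; both are additions, not part of the answer.

```python
import numpy as np
import pandas as pd


def top_n_columns(df, k):
    """Names of the k largest non-zero columns in each row (hypothetical helper).

    Assumes non-negative values: a 0 in the top k means "no column",
    so it is replaced by an empty string.
    """
    k = min(k, df.shape[1])  # cannot return more columns than exist
    vals = df.to_numpy()
    # descending order per row; the stable sort keeps tied columns in column order
    order = np.argsort(-vals, axis=1, kind='stable')[:, :k]
    top_vals = np.take_along_axis(vals, order, axis=1)
    top_names = df.columns.to_numpy()[order]
    result = pd.DataFrame(top_names,
                          columns=['top{}'.format(i) for i in range(1, k + 1)],
                          index=df.index)
    # blank out slots whose value is 0
    return result.mask(top_vals == 0, '')


# Hypothetical input, chosen only to be consistent with the printed output above
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'p1': [0, 0, 1, 1, 2],
    'p2': [9, 0, 3, 5, 3],
    'p3': [0, 0, 10, 3, 7],
    'p4': [4, 4, 7, 1, 10],
}).set_index('id')

print(top_n_columns(df, 3))
```

The answer's default argsort (kind='quicksort') is not guaranteed to be stable, so the order of tied columns could differ between NumPy builds; kind='stable' makes it deterministic.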