multi-column factorize in pandas

长发绾君心 2020-12-28 09:35

The pandas factorize function assigns each unique value in a series to a sequential, 0-based index, and calculates which index each series entry belongs to.
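For reference, a minimal sketch of the single-column behaviour described above. The sample series `s` is made up for illustration and is not from the original question.

```python
import pandas as pd

# Hypothetical sample data, chosen only to illustrate the behaviour.
s = pd.Series(["b", "a", "b", "c", "a"])

# factorize returns the integer code of every entry plus the unique values,
# numbered in order of first appearance.
codes, uniques = pd.factorize(s)

print(codes)          # [0 1 0 2 1]
print(list(uniques))  # ['b', 'a', 'c']
```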

4 answers

不思量自难忘° (original poster)

2020-12-28 09:54
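You need to build an ndarray of tuples first, one tuple per row. In old pandas releases `pandas.lib.fast_zip` did this very quickly in a Cython loop, but the public `pandas.lib` module is no longer available in current releases, so the version below builds the tuples with plain `zip` instead.
```
import pandas as pd

df = pd.DataFrame({'x': [1, 1, 2, 2, 1, 1], 'y': [1, 2, 2, 2, 2, 1]})

# One (x, y) tuple per row; each distinct pair becomes one factor.
pairs = pd.Series(list(zip(df.x, df.y)))
print(pd.factorize(pairs)[0])
```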
The output is:
```
[0 1 2 2 1 0]
```
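If building tuples is too slow or awkward, a different technique (not the one used in this answer) is to let `groupby` number the column combinations. This is only a sketch: it assumes `GroupBy.ngroup` is available in your pandas version, and it relies on `sort=False` so that groups are numbered in order of first appearance, as `factorize` does.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 1, 2, 2, 1, 1], 'y': [1, 2, 2, 2, 2, 1]})

# Each distinct (x, y) combination is one group; ngroup() gives every row
# the number of the group it belongs to.
codes = df.groupby(['x', 'y'], sort=False).ngroup().to_numpy()
print(codes)
```

Rows that share the same `(x, y)` pair get the same code, so the result can be used as a drop-in label column.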