How do I factorize a list of tuples?


春和景丽 2020-12-06 10:43

Definition
factorize: map each unique object to a unique integer. Typically, the integers range from 0 to n - 1, where n is the number of unique objects.
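To make the definition concrete, here is a minimal pure-Python sketch. The function name `factorize` and the sample `tups` list are illustrative only and do not come from the question; codes are handed out in order of first appearance, which is one common convention rather than a requirement of the definition.

```python
def factorize(items):
    """Return (codes, uniques), with codes assigned in order of first appearance."""
    code_of = {}  # maps each distinct item to its integer code
    codes = []
    for item in items:
        if item not in code_of:
            code_of[item] = len(code_of)  # next unused integer
        codes.append(code_of[item])
    return codes, list(code_of)


# Hypothetical sample data for illustration.
tups = [(1, 2), (3, 4), (1, 2), (5, 6), (3, 4)]
codes, uniques = factorize(tups)
print(codes)    # [0, 1, 0, 2, 1]
print(uniques)  # [(1, 2), (3, 4), (5, 6)]
```

Here n is 3, so the codes fall in the range 0 to 2.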

6 answers

感情败类 (original poster)

2020-12-06 10:57

I was going to give this answer:

pd.factorize([str(x) for x in tups])
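For readers who have not used it, `pd.factorize` returns a pair: an integer code array and the unique values. The sketch below shows what the string-based call produces on a small, made-up `tups` list. Wrapping the strings in a `pd.Series` is a choice made for this sketch, not part of the original call.

```python
import pandas as pd

# Hypothetical sample data for illustration.
tups = [(1, 2), (3, 4), (1, 2), (5, 6)]

# Stringify each tuple so pandas sees plain strings instead of tuples.
as_strings = pd.Series([str(t) for t in tups])

str_codes, str_uniques = pd.factorize(as_strings)
print(str_codes.tolist())  # [0, 1, 0, 2]
print(list(str_uniques))   # ['(1, 2)', '(3, 4)', '(5, 6)']
```

Note that the unique values come back as strings, so recovering the original tuples would need an extra step.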

However, after running some tests, it turned out not to be the fastest. Since I had already done the work, here are the timings for comparison:

@AChampion

%timeit [d[tup] if tup in d else d.setdefault(tup, next(c)) for tup in tups] # 1000000 loops, best of 3: 1.66 µs per loop
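The one-liner above relies on two names, `d` and `c`, that are not defined in this post. A plausible setup, stated here as an assumption, is an empty dict and an `itertools.count()` counter:

```python
from itertools import count

# Hypothetical sample data for illustration.
tups = [(1, 2), (3, 4), (1, 2), (5, 6)]

d = {}       # assumed: maps each tuple to its code
c = count()  # assumed: yields 0, 1, 2, ... on successive next() calls

# next(c) is only evaluated for tuples not yet in d, so codes stay consecutive.
codes = [d[tup] if tup in d else d.setdefault(tup, next(c)) for tup in tups]
print(codes)  # [0, 1, 0, 2]
```

If `d` is not reset between repeated runs, every run after the first only performs dictionary lookups, which is worth keeping in mind when reading a timing for this expression.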

@Divakar

%timeit np.unique(get_row_view(np.array(tups)), return_inverse=1)[1] # 10000 loops, best of 3: 58.1 µs per loop
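The helper `get_row_view` is not shown in this post, so the sketch below does not try to reproduce it. Instead it swaps in a different technique, `np.unique` with `axis=0`, which also treats each row as a single item. The sample `tups` list is made up for illustration.

```python
import numpy as np

# Hypothetical sample data for illustration.
tups = [(1, 2), (3, 4), (1, 2), (0, 5)]

arr = np.array(tups)  # shape (4, 2), one row per tuple
uniques, inverse = np.unique(arr, axis=0, return_inverse=True)
codes = np.asarray(inverse).ravel()  # flatten in case inverse is not 1-D

print(codes.tolist())    # [1, 2, 1, 0]
print(uniques.tolist())  # [[0, 5], [1, 2], [3, 4]]
```

Because `np.unique` sorts the unique rows, the codes follow sorted order rather than order of first appearance, so they can differ from the dictionary-based and pandas results while still being a valid factorization.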

@self

%timeit pd.factorize([str(x) for x in tups]) # 10000 loops, best of 3: 65.6 µs per loop

@root

%timeit pd.Series(tups).factorize()[0] # 1000 loops, best of 3: 199 µs per loop

EDIT

For large data with 100K entries, we have:

tups = [(np.random.randint(0, 10), np.random.randint(0, 10)) for i in range(100000)]

@root

%timeit pd.Series(tups).factorize()[0] # 100 loops, best of 3: 10.9 ms per loop

@AChampion

%timeit [d[tup] if tup in d else d.setdefault(tup, next(c)) for tup in tups] # 10 loops, best of 3: 16.9 ms per loop

@Divakar

%timeit np.unique(get_row_view(np.array(tups)), return_inverse=1)[1] # 10 loops, best of 3: 81 ms per loop

@self

%timeit pd.factorize([str(x) for x in tups]) # 10 loops, best of 3: 87.5 ms per loop
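To rerun a comparison like this outside IPython, the standard-library `timeit` module can stand in for `%timeit`. The sketch below times only the dictionary-based approach on 100K random pairs. The function name `factorize_with_dict`, the seed, and the use of `random.randrange` instead of `np.random.randint` are choices made for this sketch, and the numbers it prints will differ from the ones quoted above.

```python
import random
import timeit
from itertools import count

random.seed(0)  # fixed seed so the data is reproducible
tups = [(random.randrange(10), random.randrange(10)) for _ in range(100_000)]


def factorize_with_dict(items):
    # Fresh dict and counter on every call, so each timed run does the full work.
    d = {}
    c = count()
    return [d[t] if t in d else d.setdefault(t, next(c)) for t in items]


codes = factorize_with_dict(tups)

# Three repeats of 10 calls each; report the best per-call time.
timings = timeit.repeat(lambda: factorize_with_dict(tups), number=10, repeat=3)
best_ms = min(timings) / 10 * 1000
print(f"{len(set(codes))} distinct codes, best of 3: {best_ms:.2f} ms per loop")
```

Creating the dict inside the function keeps state from leaking between repetitions, so the reported time covers building the mapping as well as looking it up.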
