As a challenge, I've given myself this problem:
Given two lists A and B, where B is a shuffled version of A, the task is to figure out the shuffled indices: for each element of B, an index at which it occurs in A.
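For instance, with the lists used in the test script below, the expected result is:

A = [10, 40, 30, 2]
B = [30, 2, 10, 40]
res = [2, 3, 0, 1]  # B[0] == A[2], B[1] == A[3], B[2] == A[0], B[3] == A[1]
assert [A[i] for i in res] == B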
Since several very nice solutions were posted, I've taken the liberty of assembling some crude timings to compare each method.
Script used for testing
from timeit import timeit
setup = """
from collections import defaultdict
import pandas as pd
import numpy as np
idx1 = defaultdict(list); idx2 = {}
A = [10, 40, 30, 2]
B = [30, 2, 10, 40]
"""
me = """
for i, l in enumerate(A):
    idx1[l].append(i)  # record every index at which each value occurs
res = [idx1[l].pop() for l in B]  # pop one recorded index per element of B
"""
coldspeed = """
for i, l in enumerate(A):
    idx2.setdefault(l, []).append(i)  # plain dict + setdefault instead of defaultdict
res = [idx2[l].pop() for l in B]
"""
divakar = """
sidx = np.argsort(B)  # indices that sort B
res = sidx[np.searchsorted(B, A, sorter=sidx)]  # locate each element of A in B
"""
dyz = """
res = pd.Series(A).reset_index().set_index(0).loc[B].T.values[0]  # .loc here; the original .ix indexer has been removed from pandas
"""
print('mine:', timeit(setup=setup, stmt=me, number=1000))
print('coldspeed:', timeit(setup=setup, stmt=coldspeed, number=1000))
print('divakar:', timeit(setup=setup, stmt=divakar, number=1000))
print('dyz:', timeit(setup=setup, stmt=dyz, number=1000))
Result/Output (run on a Jupyter notebook server; 1000 loops):
mine: 0.0026700650341808796
coldspeed: 0.0029303128831088543
divakar: 0.02583012101240456
dyz: 2.208147854078561
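A quick sanity check, separate from the timed runs: the dict-based results index into A per element of B, while (as far as I can tell from the code) the NumPy result goes the other way, giving an index into B per element of A. A minimal sketch:

from collections import defaultdict
import numpy as np

A = [10, 40, 30, 2]
B = [30, 2, 10, 40]

# defaultdict approach: res[j] is an index into A for each element of B
idx1 = defaultdict(list)
for i, l in enumerate(A):
    idx1[l].append(i)
res = [idx1[l].pop() for l in B]
assert [A[i] for i in res] == B

# NumPy approach: maps the other way round,
# giving an index into B for each element of A
sidx = np.argsort(B)
res_np = sidx[np.searchsorted(B, A, sorter=sidx)]
assert [B[j] for j in res_np] == A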
Here are some timings for a larger input, where A is 100,000 random numbers and B is its shuffled equivalent. The run was too time- and memory-consuming to keep at 1000 loops, so I reduced the number of loops to 100. Otherwise, everything is the same as above:
mine: 17.663535300991498
coldspeed: 17.11006522300886
divakar: 8.73397267702967
dyz: 44.61878849985078
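For reference, the larger inputs can be generated along these lines (a sketch; the value range is my assumption, since the original only specifies 100,000 random numbers):

import random

A = [random.randrange(10**6) for _ in range(100000)]  # assumed value range
B = A[:]           # copy A ...
random.shuffle(B)  # ... and shuffle the copy in place

At this scale the vectorized NumPy approach pulls clearly ahead, while the pandas-based lookup remains the slowest.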