As a challenge, I've given myself this problem:
Given two lists A and B, where B is a shuffled version of A, the task is to figure out the shuffled indices: for each element of B, an index at which it occurs in A.
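For instance, with the lists used in the test script below, the expected result is:

A = [10, 40, 30, 2]
B = [30, 2, 10, 40]
res = [2, 3, 0, 1]  # B[0] == A[2], B[1] == A[3], B[2] == A[0], B[3] == A[1]
assert [A[i] for i in res] == B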
Since several very nice solutions were posted, I've taken the liberty of assembling some crude timings to compare each method.
Script used for testing
from timeit import timeit
setup = """
from collections import defaultdict
import pandas as pd
import numpy as np
idx1 = defaultdict(list); idx2 = {}
A = [10, 40, 30, 2]
B = [30, 2, 10, 40]
"""
me = """
for i, l in enumerate(A):
    idx1[l].append(i)  # record every index at which each value occurs
res = [idx1[l].pop() for l in B]  # pop one recorded index per element of B
"""
coldspeed = """
for i, l in enumerate(A):
    idx2.setdefault(l, []).append(i)  # plain dict + setdefault instead of defaultdict
res = [idx2[l].pop() for l in B]
"""
divakar = """
sidx = np.argsort(B)  # indices that sort B
res = sidx[np.searchsorted(B, A, sorter=sidx)]  # locate each element of A in B
"""
dyz = """
res = pd.Series(A).reset_index().set_index(0).loc[B].T.values[0]  # .loc here; the original .ix indexer has been removed from pandas
"""
print('mine:', timeit(setup=setup, stmt=me, number=1000))
print('coldspeed:', timeit(setup=setup, stmt=coldspeed, number=1000))
print('divakar:', timeit(setup=setup, stmt=divakar, number=1000))
print('dyz:', timeit(setup=setup, stmt=dyz, number=1000))
Result/Output (run on a Jupyter notebook server; 1000 loops):
mine: 0.0026700650341808796
coldspeed: 0.0029303128831088543
divakar: 0.02583012101240456
dyz: 2.208147854078561
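A quick sanity check, separate from the timed runs: the dict-based results index into A per element of B, while (as far as I can tell from the code) the NumPy result goes the other way, giving an index into B per element of A. A minimal sketch:

from collections import defaultdict
import numpy as np

A = [10, 40, 30, 2]
B = [30, 2, 10, 40]

# defaultdict approach: res[j] is an index into A for each element of B
idx1 = defaultdict(list)
for i, l in enumerate(A):
    idx1[l].append(i)
res = [idx1[l].pop() for l in B]
assert [A[i] for i in res] == B

# NumPy approach: maps the other way round,
# giving an index into B for each element of A
sidx = np.argsort(B)
res_np = sidx[np.searchsorted(B, A, sorter=sidx)]
assert [B[j] for j in res_np] == A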
Here are some timings for a larger input, where A is 100,000 random numbers and B is its shuffled equivalent. The run was too time- and memory-consuming to keep at 1000 loops, so I reduced the number of loops to 100. Otherwise, everything is the same as above:
mine: 17.663535300991498
coldspeed: 17.11006522300886
divakar: 8.73397267702967
dyz: 44.61878849985078
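For reference, the larger inputs can be generated along these lines (a sketch; the value range is my assumption, since the original only specifies 100,000 random numbers):

import random

A = [random.randrange(10**6) for _ in range(100000)]  # assumed value range
B = A[:]           # copy A ...
random.shuffle(B)  # ... and shuffle the copy in place

At this scale the vectorized NumPy approach pulls clearly ahead, while the pandas-based lookup remains the slowest.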