I have two Python lists:
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
How can I remove from a every tuple whose first element appears in b?
As this is tagged with numpy, here is a NumPy solution using numpy.in1d, benchmarked against the list comprehension:
In [1]: a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
In [2]: b = ['the', 'when', 'send', 'we', 'us']
In [3]: a_ar = np.array(a, dtype=[('string','|S5'), ('number',float)])
In [4]: b_ar = np.array(b)
In [5]: %timeit filtered = [i for i in a if not i[0] in b]
1000000 loops, best of 3: 778 ns per loop
In [6]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
10000 loops, best of 3: 31.4 us per loop
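
To unpack the expression in In [6]: np.in1d returns a boolean mask marking which entries of a_ar['string'] occur in b_ar, and ~ negates it, so the indexing keeps only the rows whose string is not in b. A minimal standalone sketch of the same steps (using a unicode dtype 'U5' rather than the byte-string '|S5' above, since on Python 3 byte strings would not match the unicode entries of b_ar):

import numpy as np

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

# 'U5' = fixed-width unicode strings, so the comparison works on Python 3
a_ar = np.array(a, dtype=[('string', 'U5'), ('number', float)])
b_ar = np.array(b)

mask = np.in1d(a_ar['string'], b_ar)  # array([ True, False, False,  True, False])
filtered = a_ar[~mask]                # keeps ('why', 4.), ('throw', 9.), ('you', 1.)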
So for just 5 records the list comprehension is faster.
However, for larger data sets (5,000 records below) the NumPy solution is roughly twice as fast as the list comprehension:
In [7]: a = a * 1000
In [8]: a_ar = np.array(a, dtype=[('string','|S5'), ('number',float)])
In [9]: %timeit filtered = [i for i in a if not i[0] in b]
1000 loops, best of 3: 647 us per loop
In [10]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
1000 loops, best of 3: 302 us per loop
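
Side note: in current NumPy, np.isin is the recommended spelling of the same membership test (np.in1d is deprecated as of NumPy 2.0), and for 1-D inputs it is a drop-in replacement:

filtered = a_ar[~np.isin(a_ar['string'], b_ar)]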