I have two lists:
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
I want to count how many strings in main_list contain any of the strings in master_list.
What about this:
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
print(len([word for word in main_list if any(mw in word for mw in master_list)]))
This would do it:
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
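i = 0
for elem in main_list:
    # Exact match: count it and move on to the next element.
    if elem in master_list:
        i += 1
        continue
    # Otherwise look for any master string that occurs as a substring.
    for master_elem in master_list:
        if master_elem in elem:
            i += 1
            break
print(i)  # i = 4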
The code above counts 'Roger-Smith' once. If you want it to count once for each master string it contains, remove the break.
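As a rough sketch of that second behaviour, assuming every (element, master string) match should count separately, the loop can be collapsed into a single sum. The name total_hits is only illustrative and does not appear in the answer above.

```python
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

# Count every (element, master string) pair where the master string
# occurs as a substring, so 'Roger-Smith' contributes 2 instead of 1.
total_hits = sum(
    master_elem in elem
    for elem in main_list
    for master_elem in master_list
)
print(total_hits)  # 5
```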
You can do it the other way around: create a list that contains only the elements of main_list that have a substring from master_list.
temp_list = [string for string in main_list if any(substring in string for substring in master_list)]
Now temp_list looks like this:
['Smith', 'Smith', 'Roger', 'Roger-Smith']
So the length of temp_list is your answer.
A one-liner:
>>> sum(any(m in L for m in master_list) for L in main_list)
4
This iterates over main_list and checks whether any of the values from master_list occurs in each string, which yields one bool per string. any stops at the first match, so each string adds at most one to the count. Conveniently, sum treats each True as 1, which gives you the count.
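To make the intermediate step visible, here is a small sketch that materialises the bool values in a list before summing them. The name flags is hypothetical and is only used for illustration.

```python
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

# One bool per string in main_list: True if any master string occurs in it.
flags = [any(m in s for m in master_list) for s in main_list]
print(flags)       # [True, True, True, True, False]

# bool is a subclass of int, so each True adds 1 to the sum.
print(sum(flags))  # 4
```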
You can use pandas (which provides fast vectorized operations) with str.contains and sum(). Note that str.contains treats the pattern as a regular expression by default.
import pandas as pd
main_list = pd.Series(['Smith', 'Smith', 'Roger', 'Roger-Smith', '42'])
master_list = ['Smith', 'Roger']
count = main_list.str.contains('|'.join(master_list)).sum()
If your master_list is not expected to be huge, one way to do it is with regex:
import re
def string_detection(master_list, main_list):
    count = 0
    # Build one alternation pattern, e.g. "Smith|Roger".
    master = re.compile("|".join(master_list))
    for entry in main_list:
        if master.search(entry):
            count += 1
    return count
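One caveat with joining the strings directly: if a master string contains regex metacharacters such as . or +, they are interpreted as pattern syntax. A possible variant, assuming the master strings should always match literally, escapes each one with re.escape first. The function name count_matches_escaped is made up for this sketch.

```python
import re

def count_matches_escaped(master_list, main_list):
    # re.escape makes each master string match literally,
    # so 'a.c' no longer matches 'abc'.
    pattern = re.compile("|".join(re.escape(m) for m in master_list))
    return sum(1 for entry in main_list if pattern.search(entry))

main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
print(count_matches_escaped(master_list, main_list))    # 4
print(count_matches_escaped(['a.c'], ['abc', 'a.c']))   # 1
```

For plain names like the ones in the question, both versions give the same count; the escaping only matters once the master strings can contain punctuation.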