Question
For example:
q=["hi", "sky"]
p=["here","sky","sky","sky","sky"]
How should the function be defined so that this call returns the following?
count_words(["hi", "sky"], ["here","sky","sky","sky","sky"])
[0, 4]
# answer where hi appears 0 times and sky appears 4 times
I started the code off like this:
def count_words(q, p):
    count = 0
    for word in q:
        if q==p:
            (q.count("hi"))
            (q.count("sky"))
    return count
I keep getting one value of 0, which accounts for q, but I can't get a value for p.
Answer 1:
Here is a simpler answer (by simpler I mean a one-liner that uses no additional libraries):
q=["hi", "sky"]
p=["here","sky","sky","sky","sky"]
def count_words(q,p):
    return [ p.count(i) for i in q ]
print(count_words(q,p))
Output
[0, 4]
Explanation
[ p.count(i) for i in q ] is a list comprehension: it iterates over the q list and, for each element i, calls p.count(i) to count how many times that element occurs in p. The results are collected into a new list in the same order as q.
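For readers new to list comprehensions, the same logic can be written as an explicit loop. This is only an illustrative sketch; the name count_words_loop is made up here and is not part of the original answer.

```python
def count_words_loop(q, p):
    # Hypothetical name; behaves like the one-liner above.
    counts = []
    for word in q:
        # list.count scans all of p and returns how often word occurs in it.
        counts.append(p.count(word))
    return counts


print(count_words_loop(["hi", "sky"], ["here", "sky", "sky", "sky", "sky"]))
# [0, 4]
```

Note that the count is taken on p (the list being searched), not on q, which is where the attempt in the question went wrong.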
Timings (depends on the data)
# My solution
1.78 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# @Delirious Solution 1
7.55 µs ± 1.58 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# @Delirious Solution 2
3.86 µs ± 348 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Answer 2:
>>> from collections import Counter
...
...
... def count_words(a, b):
...     cnt = Counter(b)
...     return [cnt[word] for word in a]
...
>>> count_words(["hi", "sky"], ["here", "sky", "sky", "sky", "sky"])
[0, 4]
Or if you can't use collections.Counter for some reason:
>>> def count_words(a, b):
...     cnt = {}
...     for word in b:
...         try:
...             cnt[word] += 1
...         except KeyError:
...             cnt[word] = 1
...     return [cnt.get(word, 0) for word in a]
...
>>> count_words(["hi", "sky"], ["here", "sky", "sky", "sky", "sky"])
[0, 4]
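As a side note, the try/except bookkeeping can also be expressed with dict.get, which returns a default when the key is missing. The following is a minimal sketch of that variant; count_words_get is a hypothetical name, not something from the answer above.

```python
def count_words_get(a, b):
    # Hypothetical variant of the dict-based solution above.
    cnt = {}
    for word in b:
        # dict.get returns 0 for a word not seen yet, so no KeyError handling is needed.
        cnt[word] = cnt.get(word, 0) + 1
    # Words from a that never occur in b get a count of 0.
    return [cnt.get(word, 0) for word in a]


print(count_words_get(["hi", "sky"], ["here", "sky", "sky", "sky", "sky"]))
# [0, 4]
```

Either form builds the tally in a single pass over b, so the choice between them is mostly a matter of style.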
EDIT:
It looks like there should be some timings to "clear things up" about which solution is more efficient. Since @chrisz forgot to post his actual test code, I had to do it myself.
Unfortunately, with 100,000 random words, Vivek's code took 11min 10s per run while mine took only 46.5 ms.
In[18]: def vivek_count_words(q, p):
...:     return [p.count(i) for i in q]
...:
In[19]: def lettuce_count_words(a, b):
...:     cnt = {}
...:     for word in b:
...:         try:
...:             cnt[word] += 1
...:         except KeyError:
...:             cnt[word] = 1
...:     return [cnt.get(word, 0) for word in a]
...:
In[20]: # https://www.gutenberg.org/files/2701/2701-0.txt
...: with open('moby_dick.txt', 'r') as f:
...:     moby_dick_words = []
...:     for line in f:
...:         moby_dick_words.extend(line.rstrip().split())
...:
In[21]: len(moby_dick_words)
Out[21]: 215829
In[22]: from random import choice
In[23]: random_words = [choice(moby_dick_words) for _ in range(10)]
In[24]: %timeit vivek_count_words(moby_dick_words, random_words)
31.3 ms ± 99.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In[25]: %timeit lettuce_count_words(moby_dick_words, random_words)
20.7 ms ± 54.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In[26]: random_words = [choice(moby_dick_words) for _ in range(100)]
In[27]: %timeit vivek_count_words(moby_dick_words, random_words)
211 ms ± 642 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In[28]: %timeit lettuce_count_words(moby_dick_words, random_words)
20.6 ms ± 68.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In[29]: random_words = [choice(moby_dick_words) for _ in range(1000)]
In[30]: %timeit vivek_count_words(moby_dick_words, random_words)
2.18 s ± 2.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In[31]: %timeit lettuce_count_words(moby_dick_words, random_words)
22.2 ms ± 97.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In[32]: random_words = [choice(moby_dick_words) for _ in range(10000)]
In[33]: %timeit vivek_count_words(moby_dick_words, random_words)
29.2 s ± 865 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In[34]: %timeit lettuce_count_words(moby_dick_words, random_words)
25.7 ms ± 198 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In[35]: random_words = [choice(moby_dick_words) for _ in range(100000)]
In[36]: %timeit vivek_count_words(moby_dick_words, random_words)
11min 10s ± 7.51 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
In[37]: %timeit lettuce_count_words(moby_dick_words, random_words)
46.5 ms ± 68.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
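The widening gap is consistent with how the two approaches scale: list.count rescans the whole second list for every word in the first, so the work grows with len(q) * len(p), while the dictionary approach makes one pass over each list. The sketch below is a small, self-contained way to observe this without the Moby Dick file. The synthetic vocabulary, list sizes, and function names are assumptions chosen for illustration, and absolute timings will differ by machine.

```python
import random
import timeit
from collections import Counter


def count_with_list_count(q, p):
    # One full scan of p per word in q: roughly len(q) * len(p) comparisons.
    return [p.count(word) for word in q]


def count_with_counter(q, p):
    # One pass over p to build the tally, then one dictionary lookup per word in q.
    cnt = Counter(p)
    return [cnt[word] for word in q]


random.seed(0)
vocab = [f"w{i}" for i in range(500)]            # made-up vocabulary
q = [random.choice(vocab) for _ in range(2000)]  # words to look up
p = [random.choice(vocab) for _ in range(2000)]  # words to search in

# Both approaches must agree before their timings are worth comparing.
assert count_with_list_count(q, p) == count_with_counter(q, p)

for fn in (count_with_list_count, count_with_counter):
    seconds = timeit.timeit(lambda: fn(q, p), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

For tiny inputs like the two lists in the question, the one-liner can still be faster, as the timings in Answer 1 show, because building a dictionary has its own overhead.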
Source: https://stackoverflow.com/questions/49103873/python-how-would-a-function-be-made-for-comparing-recurring-words-within-two-li