Question
I am training on Checkio. The task, called Popular words, is to count how many times each word from a list (of strings) occurs in a given string.
For example:
textt="When I was One I had just begun When I was Two I was nearly new"
wwords=['i', 'was', 'three', 'near']
My code goes like:
def popular_words(text: str, words: list) -> dict:
    # your code here
    occurence={}
    text=text.lower()
    for i in words:
        occurence[i]=(text.count(i))
        # incorrectly takes "nearly" as "near"
    print(occurence)
    return(occurence)
popular_words(textt,wwords)
which works almost fine, returning
{'i': 4, 'was': 3, 'three': 0, 'near': 1}
thus counting "near" as a part of "nearly". This was obviously the author's intention. I, however, cannot find a way to get around this other than
"search for words that are not first (index 0) or last (last index), and for those that begin/end with whitespace"
May I ask for help, please? Building upon this rather childish code, if possible.
Answer 1:
You'd be better off splitting your sentence and then counting the words, not the substrings:
textt="When I was One I had just begun When I was Two I was nearly new"
wwords=['i', 'was', 'three', 'near']
text_words = textt.lower().split()
result = {w:text_words.count(w) for w in wwords}
print(result)
prints:
{'three': 0, 'i': 4, 'near': 0, 'was': 3}
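Since the question asks to build on the original function, here is a minimal sketch of the same split-and-count idea wrapped in the question's popular_words signature. It assumes the text contains no punctuation, so splitting on whitespace is enough to isolate whole words.

```python
def popular_words(text: str, words: list) -> dict:
    # Split on whitespace so that whole words are compared, not substrings.
    text_words = text.lower().split()
    # list.count() matches entire list items, so "nearly" no longer counts as "near".
    return {w: text_words.count(w) for w in words}


textt = "When I was One I had just begun When I was Two I was nearly new"
wwords = ['i', 'was', 'three', 'near']
print(popular_words(textt, wwords))
# {'i': 4, 'was': 3, 'three': 0, 'near': 0}
```

On Python 3.7 and later the keys come out in the order of the words list, because dicts preserve insertion order.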
If the text contains punctuation, you're better off using regular expressions to split the string on non-alphanumeric characters:
import re
textt="When I was One, I had just begun.I was Two when I was nearly new"
wwords=['i', 'was', 'three', 'near']
text_words = re.split(r"\W+",textt.lower())
result = {w:text_words.count(w) for w in wwords}
result:
{'was': 3, 'near': 0, 'three': 0, 'i': 4}
(Another alternative is to use findall on word characters: text_words = re.findall(r"\w+",textt.lower()))
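One practical difference between the two calls shows up when the text starts or ends with punctuation: re.split then yields an empty string at that edge, while re.findall returns only the words. The short sample sentence below is made up for illustration.

```python
import re

text = "When I was One, I had just begun."

# Splitting on runs of non-word characters leaves an empty string
# after the final full stop.
split_words = re.split(r"\W+", text.lower())

# Collecting runs of word characters returns only the words themselves.
found_words = re.findall(r"\w+", text.lower())

print(split_words)
print(found_words)
```

The stray empty string does no harm when counting specific words, but it does end up in the list, and in any counts computed over all items.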
Now, if your list of "important" words is big, it may be better to count all the words first and filter afterwards, using the classic collections.Counter:
import collections
text_words = collections.Counter(re.split(r"\W+",textt.lower()))
# indexing a Counter gives 0 for a missing word, whereas .get(w) would give None
result = {w:text_words[w] for w in wwords}
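Put together as a function with the signature from the question, the count-everything-then-filter approach might look like the sketch below. It uses re.findall rather than re.split, which is a choice made here and not something the answer prescribes.

```python
import re
from collections import Counter


def popular_words(text: str, words: list) -> dict:
    # Count every word in the text once, in a single pass.
    counts = Counter(re.findall(r"\w+", text.lower()))
    # A Counter returns 0 for keys it has never seen, so absent words map to 0.
    return {w: counts[w] for w in words}


textt = "When I was One, I had just begun.I was Two when I was nearly new"
wwords = ['i', 'was', 'three', 'near']
print(popular_words(textt, wwords))
# {'i': 4, 'was': 3, 'three': 0, 'near': 0}
```

Each lookup in the Counter is a dictionary access, so the text is scanned once no matter how many words are requested, whereas calling list.count for every word rescans the whole list each time.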
Answer 2:
A simple solution would be this one:
from collections import Counter
textt="When I was One I had just begun When I was Two I was nearly new".lower()
wwords=['i', 'was', 'three', 'near']
txt = textt.split()
keys = Counter(txt)
for i in wwords:
    print(i + ' : ' + str(keys[i]))
Source: https://stackoverflow.com/questions/52302509/check-for-whole-only-words-in-string