Determine if a list of words is in a sentence?

丶灬走出姿态 submitted on 2019-12-24 04:53:08

Question


Is there a way (Pattern, Python, NLTK, etc.) to detect whether a sentence has a list of words in it?

i.e.

The cat ran into the hat, box, and house. | The list would be hat, box, and house

This could be handled with string processing, but we may have more general lists:

i.e.

The cat likes to run outside, run inside, or jump up the stairs. |

List=run outside, run inside, or jump up the stairs.

This could be in the middle of a paragraph or at the end of a sentence, which further complicates things.

I've been working with Pattern for Python for a while and I'm not seeing a way to go about this. I was curious whether there is a way to do it with Pattern or NLTK (Natural Language Toolkit).


Answer 1:


What about using sent_tokenize from nltk.tokenize?

from nltk.tokenize import sent_tokenize

sent_tokenize("Hello SF Python. This is NLTK.")
# ["Hello SF Python.", "This is NLTK."]

Then you can use that list of sentences in this way:

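for sentence in my_list:
  # test if this sentence contains the words you want
  # using the built-in all() function (words is your list of target words)
  if all(word in sentence for word in words):
    print(sentence)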
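
As a minimal sketch of the loop above, the following uses a naive regular-expression splitter as a stand-in for sent_tokenize so that it runs without NLTK or its tokenizer data. The helper names split_sentences and sentences_with_all_words are hypothetical and not part of NLTK, and the splitter is far cruder than a real sentence tokenizer.

```python
import re

def split_sentences(text):
    # Naive stand-in for nltk.tokenize.sent_tokenize (assumption):
    # split on whitespace that follows '.', '!' or '?'
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def sentences_with_all_words(text, words):
    # Keep only the sentences that contain every word (plain substring test)
    return [s for s in split_sentences(text) if all(w in s for w in words)]

text = "The cat ran into the hat, box, and house. The dog stayed outside."
print(sentences_with_all_words(text, ["hat", "box", "house"]))
```

With real text, swapping split_sentences for sent_tokenize should give more reliable sentence boundaries (abbreviations, quotes, and so on).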

More info here




Answer 2:


From what I understood of your question, I think you want to check whether all the words in your list are present in a sentence.

In general, to search for the elements of a list in a sentence, you can use the built-in all function. It returns True if every element of the iterable passed to it is true.

listOfWords = ['word1', 'word2', 'word3', 'two words']
sentence = "word1 as word2 a fword3 af two words"

if all(word in sentence for word in listOfWords):
    print("All words in sentence")
else:
    print("Missing")

Output:

All words in sentence

I think this might serve your purpose. If not, then you can clarify.
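
One caveat: 'word in sentence' is a substring test, so 'word3' counts as present above only because it occurs inside 'fword3'. If whole-word matching is what you need, one possible sketch uses regular-expression word boundaries. The helper name contains_all_whole_words is hypothetical.

```python
import re

def contains_all_whole_words(sentence, words):
    # \b anchors each word or phrase at word boundaries,
    # so 'word3' does not match inside 'fword3'
    return all(re.search(r'\b' + re.escape(w) + r'\b', sentence) for w in words)

listOfWords = ['word1', 'word2', 'word3', 'two words']
sentence = "word1 as word2 a fword3 af two words"

print(all(word in sentence for word in listOfWords))    # substring test
print(contains_all_whole_words(sentence, listOfWords))  # whole-word test
```

The two checks disagree on this sentence, which is a quick way to see which behaviour you actually want.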




Answer 3:


all(word in sentence for word in listOfWords)



Answer 4:


Using a trie, you can do this in roughly O(m) time, where m is the number of words in the sentence, after building the trie from the list of words, which takes O(n) time, where n is the number of words in the list.

Algorithm

  • Split the sentence into a list of words separated by spaces.
  • For each word, check whether it has a key in the trie, i.e. whether that word exists in the list.
    • If it exists, add the word to the result to keep track of how many words from the list appear in the sentence.
    • Keep track of the words that have a subtrie, i.e. words that are a prefix of a longer entry in the list of words.
      • For each of these words, check whether extending it with the current word gives a key or a subtrie in the trie.
    • If it is a subtrie, add it to the extended_words list and check whether concatenating it with the following words gives an exact match.

Code

import pygtrie

listOfWords = ['word1', 'word2', 'word3', 'two words']

# Build a trie whose keys are split on spaces, so 'two words' becomes a two-step path
trie = pygtrie.StringTrie(separator=' ')
for word in listOfWords:
  trie[word] = True

sentence = "word1 as word2 a fword3 af two words"
sentence_words = sentence.split()
words_found = {}
extended_words = set()  # partial matches that may still grow into a longer entry

for possible_word in sentence_words:
  has_possible_word = trie.has_node(possible_word)

  if has_possible_word & trie.HAS_VALUE:
    words_found[possible_word] = True

  # Try to extend every pending partial match with the current word
  deep_clone = set(extended_words)
  for extended_word in deep_clone:
    extended_words.remove(extended_word)

    possible_extended_word = extended_word + ' ' + possible_word
    has_possible_extended_word = trie.has_node(possible_extended_word)

    if has_possible_extended_word & trie.HAS_VALUE:
      words_found[possible_extended_word] = True

    if has_possible_extended_word & trie.HAS_SUBTRIE:
      # add() keeps the phrase whole; update() would add its characters one by one
      extended_words.add(possible_extended_word)

  if has_possible_word & trie.HAS_SUBTRIE:
    extended_words.add(possible_word)

print(words_found)
print(len(words_found) == len(listOfWords))

This is useful if your list of words is huge and you do not want to iterate over it every time, or if you have a large number of queries over the same list of words.
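
If installing pygtrie is not an option, the same build-once, query-many idea can be sketched with nested dictionaries from the standard library. This is an illustrative assumption rather than the code from this answer; build_trie, find_phrases and the END marker are hypothetical names.

```python
END = object()  # marker key meaning "a listed word or phrase ends here"

def build_trie(phrases):
    # Each phrase becomes a path of whitespace-separated tokens in nested dicts
    root = {}
    for phrase in phrases:
        node = root
        for token in phrase.split():
            node = node.setdefault(token, {})
        node[END] = phrase
    return root

def find_phrases(trie, sentence):
    # From every start position, walk the trie for as long as the tokens match
    tokens = sentence.split()
    found = set()
    for start in range(len(tokens)):
        node = trie
        for token in tokens[start:]:
            if token not in node:
                break
            node = node[token]
            if END in node:
                found.add(node[END])
    return found

listOfWords = ['word1', 'word2', 'word3', 'two words']
trie = build_trie(listOfWords)  # built once, reused for every query

for sentence in ["word1 as word2 a fword3 af two words",
                 "word3 then word1 word2 and two words"]:
    found = find_phrases(trie, sentence)
    print(sorted(found), found == set(listOfWords))
```

Each query only walks the sentence tokens, so the cost of a lookup does not grow with the size of the word list once the trie has been built.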

The code is here



Source: https://stackoverflow.com/questions/13093339/determine-if-a-list-of-words-is-in-a-sentence
