Extract non-content English language words string - python [duplicate]

Submitted by 时光怂恿深爱的人放手 on 2019-12-11 02:59:19

Question


I am working on a Python script in which I want to remove common English words like "the", "an", "and", "for" and many more from a string. Currently I keep a local list of all such words and just call remove() to strip them from the string. But I would like a more Pythonic way to achieve this. I have read about nltk and wordnet, but I am totally clueless about whether that's what I should use and how to use it.

Edit

Well, I don't understand why this was marked as a duplicate. My question does not in any way assume that I already know about stop words and only want to learn how to use them. The question is about what I can use in my scenario, and the answer to that turned out to be stop words, but when I posted this question I didn't know anything about stop words.


Answer 1:


Do this.

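# english_dictionary: any iterable of the words to filter out (defined elsewhere)
vocabulary = set(english_dictionary)
unique_words = [word for word in source_text.split() if word not in vocabulary]

This is about as simple and efficient as it gets. If you don't need the positions of the unique words, make them a set too. The in operator is very fast on sets (a hash lookup, constant time on average) and slow on lists, which are scanned element by element.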
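One thing the snippet above leaves open is what happens with capitalised words or words with punctuation attached, since split() only cuts on whitespace. Below is a minimal sketch of the same set-based filter with that normalisation added. The names STOP_WORDS and content_words are made up for illustration, and the four-word set is only a stand-in for a real stop-word list.

```python
import string

# Hypothetical stand-in for a real stop-word list.
STOP_WORDS = {"the", "an", "and", "for"}


def content_words(text, stop_words=STOP_WORDS):
    """Return the whitespace-separated tokens of text that are not stop words.

    Each token is compared in lowercase with surrounding punctuation stripped,
    but the original token is what gets kept in the result.
    """
    kept = []
    for token in text.split():
        normalized = token.strip(string.punctuation).lower()
        # Skip tokens that are pure punctuation or that match a stop word.
        if normalized and normalized not in stop_words:
            kept.append(token)
    return kept
```

Calling content_words("The elevator is made for five people, and it's fast.") drops "The", "for" and "and" while leaving "people," and "fast." untouched.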




Answer 2:


This will also work:

yourString = "an elevator is made for five people and it's fast"
wordsToRemove = ["the ", "an ", "and ", "for "]

# Note: str.replace matches substrings, so "an " also matches the end of "than ".
for word in wordsToRemove:
    yourString = yourString.replace(word, "")
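Because replace() works on raw substrings, an entry such as "an " can also eat the tail of a longer word like "than ", and a stop word at the very end of the string (with no trailing space) is missed. A hedged alternative, assuming whole-word matching is what is wanted, is a regular expression with \b word boundaries. The function name remove_words is made up for this sketch.

```python
import re


def remove_words(text, words_to_remove):
    """Remove whole-word occurrences of the given words, ignoring case."""
    if not words_to_remove:
        return text
    # One alternation pattern; \b on both sides restricts matches to whole
    # words, and \s* swallows the whitespace that followed the removed word.
    alternatives = "|".join(re.escape(word) for word in words_to_remove)
    pattern = r"\b(?:" + alternatives + r")\b\s*"
    return re.sub(pattern, "", text, flags=re.IGNORECASE).strip()
```

With the list ["the", "an", "and", "for"] (no trailing spaces needed), the example sentence above becomes "elevator is made five people it's fast", and "faster than the fan" keeps both "than" and "fan".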



Answer 3:


I have found that what I was looking for is this:

from nltk.corpus import stopwords
my_stop_words = stopwords.words('english')

Now I can remove or replace the words in my list/string wherever I find a match in my_stop_words, which is a list.

For this to work I had to install NLTK for Python and then, using its downloader, download the stopwords package.

The downloader also offers many other packages that can be used in different NLP situations, such as words, brown, wordnet, etc.
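Putting the pieces of this answer together might look like the sketch below. The helper names load_english_stop_words and strip_stop_words are made up for illustration. The sketch assumes NLTK is installed, and it calls nltk.download("stopwords") only if the corpus is not found locally. The list is converted to a set so that membership tests stay fast.

```python
def load_english_stop_words():
    """Return NLTK's English stop words as a set, fetching the corpus if needed."""
    # Imported lazily so that strip_stop_words works without NLTK installed.
    import nltk
    from nltk.corpus import stopwords

    try:
        words = stopwords.words("english")
    except LookupError:
        # The corpus is not on disk yet: download it once, then retry.
        nltk.download("stopwords")
        words = stopwords.words("english")
    return set(words)


def strip_stop_words(text, stop_words):
    """Drop whitespace-separated words whose lowercase form is a stop word."""
    kept = [word for word in text.split() if word.lower() not in stop_words]
    return " ".join(kept)


# Example usage (requires NLTK and, on first run, network access):
#   my_stop_words = load_english_stop_words()
#   print(strip_stop_words("an elevator is made for five people", my_stop_words))
```

Splitting on whitespace keeps the example short. Text with punctuation attached to words would need a tokenizer or some extra normalisation before the lookup.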



Source: https://stackoverflow.com/questions/22904678/extract-non-content-english-language-words-string-python
