Python nltk.clean_html not implemented

后悔当初 2020-12-28 17:52

I have been trying to use:

    myNews = urlopen(url).read()
    myNews = nltk.clean_html(myNews)

I get the following error:

File "/usr/loca

3 Answers
  •  爱一瞬间的悲伤
    2020-12-28 18:28

    If your code is

    raw = nltk.clean_html(html) 
    tokens = nltk.word_tokenize(raw)
    

    You can use

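    from bs4 import BeautifulSoup  # pip install beautifulsoup4
    raw = BeautifulSoup(html, "html.parser").get_text()  # explicit parser avoids a bs4 warning
    tokens = nltk.word_tokenize(raw)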
    

    instead. See the other answers for the reason.
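
For context: in NLTK 3.x, `nltk.clean_html` is kept only as a stub that raises `NotImplementedError` and points to BeautifulSoup's `get_text()`. Below is a minimal sketch of the replacement described above, not a definitive implementation. The helper name `html_to_text`, the sample HTML, and the UTF-8 decoding of bytes are assumptions made for illustration. If `bs4` is not installed, the sketch falls back to the standard library's `html.parser`, which is a substitute technique and not what the answer itself uses.

```python
from html.parser import HTMLParser

try:
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
except ImportError:
    BeautifulSoup = None  # use the stdlib fallback below


class _TextExtractor(HTMLParser):
    """Stdlib fallback: collect text nodes, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML document (hypothetical helper)."""
    if isinstance(html, bytes):
        # urlopen(url).read() returns bytes; UTF-8 is an assumption here.
        html = html.decode("utf-8", errors="replace")
    if BeautifulSoup is not None:
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup(["script", "style"]):
            tag.decompose()  # drop code and CSS so they do not leak into the text
        return soup.get_text()
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up sample page standing in for urlopen(url).read()
sample = b"<html><body><h1>Hello</h1><p>NLTK <b>news</b> text.</p></body></html>"
raw = html_to_text(sample)

# str.split() is only a stand-in so the sketch runs without NLTK data;
# in the original code this line would be nltk.word_tokenize(raw).
tokens = raw.split()
```

Dropping `<script>` and `<style>` before calling `get_text()` is a design choice of this sketch: plain `get_text()` would otherwise include their contents in the output.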
