I have been trying to use
    myNews = urlopen(url).read()
    myNews = nltk.clean_html(myNews)
I get the following error:
File \"/usr/loca
If your code is:
    raw = nltk.clean_html(html)
    tokens = nltk.word_tokenize(raw)
You can use
    from bs4 import BeautifulSoup
    raw = BeautifulSoup(html, "html.parser").get_text()
    tokens = nltk.word_tokenize(raw)
instead. See the other answers for the reason.
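
For anyone who wants something closer to what clean_html used to return, here is a minimal sketch of the BeautifulSoup approach. It assumes the beautifulsoup4 package is installed; html_to_text and the sample string are hypothetical names made up for illustration, not part of NLTK or BeautifulSoup. Dropping script and style elements is my own addition, since get_text() would otherwise include their contents.

```python
from bs4 import BeautifulSoup  # third-party package: beautifulsoup4


def html_to_text(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove <script> and <style> elements so their contents
    # do not end up in the extracted text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join the text nodes with spaces, then collapse runs of whitespace.
    return " ".join(soup.get_text(separator=" ").split())


# Made-up sample page, only for demonstration.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Hello <b>world</b>!</p></body></html>"
)
text = html_to_text(sample)
```

The resulting string can then be passed to nltk.word_tokenize(text) as in the snippet above, provided the NLTK tokenizer data is installed.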