nltk

NLP 18.2: NLTK Named Entity Recognition

Submitted by 六月ゝ 毕业季﹏ on 2020-01-18 05:38:26
Reference: http://blog.csdn.net/u010718606/article/details/50148261

NLTK offers out-of-the-box APIs for many natural language processing tasks, but the results can leave you puzzled. The example below uses NLTK for named entity recognition. In the first sentence, "Apple" is successfully recognized; in the second it is not. What causes this difference? Let's find out.

In [1]: import nltk
In [2]: tokens = nltk.word_tokenize('I am very excited about the next generation of Apple products.')
In [3]: tokens = nltk.pos_tag(tokens)
In [4]: print(tokens)
[('I', 'PRP'), ('am', 'VBP'), ('very', 'RB'), (

NLP 22: Wordnet with NLTK

Submitted by 跟風遠走 on 2020-01-18 04:38:58
Wordnet with NLTK: synonym and antonym functions for English.

# -*- coding: utf-8 -*-
"""
Spyder Editor
Synonym and antonym functions for English
"""
import nltk
from nltk.corpus import wordnet

syns = wordnet.synsets('program')

syns
Out[11]:
[Synset('plan.n.01'),
 Synset('program.n.02'),
 Synset('broadcast.n.02'),
 Synset('platform.n.02'),
 Synset('program.n.05'),
 Synset('course_of_study.n.01'),
 Synset('program.n.07'),
 Synset('program.n.08'),
 Synset('program.v.01'),

How to convert token list into wordnet lemma list using nltk?

Submitted by 淺唱寂寞╮ on 2020-01-17 15:02:16
Question: I have a list of tokens extracted from a PDF source. I am able to preprocess the text and tokenize it, but I want to loop through the tokens and convert each token in the list to its lemma in the WordNet corpus. My token list looks like this:

['0000', 'Everyone', 'age', 'remembers', 'Þ', 'rst', 'heard', 'contest', 'I', 'sitting', 'hideout', 'watching', ...]

There are no lemmas for tokens like 'Everyone', '0000', 'Þ' and many more, which I need to eliminate. But for words like 'age',

NLTK Wordnet Download Out of Date

Submitted by 佐手、 on 2020-01-17 03:44:12
Question: New to Python, trying to get started with NLTK. After a rough time installing Python on my Windows 7 64-bit system, I am now having a rough time downloading WordNet and the other NLTK data packages located here: http://nltk.org/nltk_data/ Some packages download; some say "Out of Date".

import nltk
nltk.download()

When I use the above to download, the program doesn't let me cancel when I hit the cancel button, so I just shut it down and go directly to the link above to try to download the data manually.
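Rather than the Tk GUI that `nltk.download()` with no arguments opens (the window that hangs on cancel above), individual packages can be fetched non-interactively; a sketch, assuming network access to nltk_data:

```python
import nltk

# Fetch a single package without the GUI; returns True on success,
# False on failure (e.g. no network or unknown package id).
ok = nltk.download('wordnet', quiet=True)
print('wordnet downloaded:', ok)
```

The equivalent from a shell is `python -m nltk.downloader wordnet`, which avoids the GUI entirely.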

Troubleshooting: [nltk_data] Error loading brown: urlopen error [Errno 111] Connection refused

Submitted by 北城余情 on 2020-01-16 20:14:22
1. Error message
Error location:
if nltk.download('brown'):  # download the specified corpus from the NLTK data site
Error message: [nltk_data] Error loading brown: <urlopen error [Errno 111] Connection refused> -- in other words, the connection to the download page was refused.

2. Cause
Step one: searching online for [Errno 111], some answers blamed a proxy or VPN on the machine, but that did not apply to my computer.
Step two: I tried downloading the data manually (see http://www.nltk.org/data.html#installing-via-a-proxy-web-server), which also failed, but this revealed the problem. Clicking the lock icon with the red slash to the left of the address bar, then the arrow, then "More Information" showed the cause: insufficient page permissions.

3. Fix
The fix is to change the page permissions: set both "Install add-ons" and "Open pop-up windows" to Allow (see the Baidu guide on configuring trusted sites in Firefox). After that change, rerunning the program downloads normally.

Source: CSDN Author: lyumoon Link: https://blog.csdn.net/dengzhuo8077
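When the download site stays unreachable, an alternative to the browser-permission fix above is to unpack the data by hand and point NLTK's search path at it. A sketch, where `~/my_nltk_data` is a hypothetical directory holding manually extracted packages (e.g. corpora/brown):

```python
import os
import nltk

# Hypothetical directory containing manually unpacked nltk_data folders
# such as corpora/, taggers/, tokenizers/.
custom_dir = os.path.expanduser('~/my_nltk_data')
os.makedirs(custom_dir, exist_ok=True)

# NLTK searches every entry in nltk.data.path when loading a resource,
# so appending the directory makes the manual copy visible.
if custom_dir not in nltk.data.path:
    nltk.data.path.append(custom_dir)

print(nltk.data.path[-1])
```

This requires no network access at runtime; the data only needs to be copied into place once.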

What is the difference between TfidfVectorizer and TfidfTransformer?

Submitted by 跟風遠走 on 2020-01-16 19:12:24
Question: I know that the formula for the tf-idf vectorizer is

(count of word / total count) * log(number of documents / number of documents where the word is present)

I saw there's a TfidfTransformer in scikit-learn, and I just wanted to know the difference between them. I couldn't find anything helpful.

Answer 1: TfidfVectorizer is used on sentences, while TfidfTransformer is used on an existing count matrix, such as one returned by CountVectorizer.

Answer 2: Artem's answer pretty much sums up the difference. To make things
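The relationship described in Answer 1 can be verified directly: with default settings, TfidfVectorizer on raw text and TfidfTransformer applied to CountVectorizer's output produce the same matrix.

```python
from sklearn.feature_extraction.text import (
    CountVectorizer, TfidfTransformer, TfidfVectorizer)

docs = ['the cat sat', 'the cat sat on the mat']

# One step: raw text -> tf-idf matrix.
tfidf_direct = TfidfVectorizer().fit_transform(docs)

# Two steps: raw text -> count matrix -> tf-idf matrix.
counts = CountVectorizer().fit_transform(docs)
tfidf_two_step = TfidfTransformer().fit_transform(counts)

# With identical defaults (same tokenization, norm='l2', smooth_idf=True)
# the two routes agree element-wise.
diff = abs(tfidf_direct.toarray() - tfidf_two_step.toarray()).max()
print(diff)
```

So TfidfVectorizer is essentially CountVectorizer followed by TfidfTransformer; the transformer exists for when you already have (or want to reuse) a count matrix.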

How to run naive Bayes from NLTK with Python Pandas?

Submitted by 人盡茶涼 on 2020-01-15 12:16:07
Question: I have a CSV file with a feature (people's names) and a label (people's ethnicities). I am able to set up the data frame using Python Pandas, but when I try to link that with the NLTK module to run a naive Bayes classifier, I get the following error:

Traceback (most recent call last):
  File "C:\Users\Desktop\file.py", line 19, in <module>
    classifier = nbc.train(train_set)
  File "E:\Program Files Extra\Python27\lib\site-packages\nltk\classify\naivebayes.py", line 194, in train
    for fname, fval in featureset.items()
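The `.items()` failure in the traceback typically means the training examples are not (feature_dict, label) pairs. A minimal sketch with hypothetical data standing in for the CSV (the column names and features are illustrative, not from the original question):

```python
import pandas as pd
from nltk.classify import NaiveBayesClassifier

# Hypothetical stand-in for the CSV: name -> ethnicity label.
df = pd.DataFrame({
    'name': ['Anna', 'Boris', 'Chen', 'Dmitri'],
    'ethnicity': ['A', 'B', 'C', 'B'],
})

def features(name):
    # Each training example must be a (dict, label) pair; passing a bare
    # string where the dict belongs is what triggers the .items() error.
    return {'last_letter': name[-1].lower(), 'length': len(name)}

train_set = [(features(row['name']), row['ethnicity'])
             for _, row in df.iterrows()]

classifier = NaiveBayesClassifier.train(train_set)
print(classifier.classify(features('Elena')))
```

`NaiveBayesClassifier.train` needs no downloaded corpora, so the bridge from pandas is just the list comprehension that builds `train_set`.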

Word tokenizing from a list of words in Python?

Submitted by 天涯浪子 on 2020-01-15 09:36:30
Question: My program has a list of words, and within it I need a few specific phrases to be tokenized as single words. My program splits a string into words, e.g.

str = "hello my name is vishal, can you please help me with the red blood cells and platelet count. The white blood cell is a single word."

The output will be:

list = ['hello', 'my', 'name', 'is', 'vishal', 'can', 'you', 'please', 'help', 'me', 'with', 'the', 'red', 'blood', 'cells', 'and', 'platelet', 'count', 'the', 'white', 'blood', 'cell', 'is', 'a', 'single', 'word']

Now I
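NLTK's MWETokenizer handles exactly this: it re-merges listed multi-word expressions after ordinary tokenization, and it needs no downloaded data. A sketch using the phrases from the question:

```python
from nltk.tokenize import MWETokenizer

# Multi-word expressions to keep as single tokens; separator=' ' joins
# them with a space instead of the default underscore.
mwe = MWETokenizer([('red', 'blood', 'cells'),
                    ('white', 'blood', 'cell'),
                    ('platelet', 'count')], separator=' ')

words = ('hello my name is vishal can you please help me with the '
         'red blood cells and platelet count').split()
print(mwe.tokenize(words))
```

The tokenizer leaves every other word untouched, so it can be dropped in after whatever splitting step the program already uses.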

Definition of the CESS_ESP tags

Submitted by 狂风中的少年 on 2020-01-14 14:31:53
Question: I'm using the NLTK CESS_ESP data package, and I've been able to use an adaptation of the spaghetti tagger and a HiddenMarkovModelTagger to POS-tag sentences. However, the tags it produces are not at all like the ones used when tagging en_US sentences. Here's a link to the Categorizing and Tagging documentation for NLTK; you'll notice that the tags used there are uppercase and don't contain numbers or punctuation. Some CESS tags: vsip3s0, da0fs0. Does someone know a reference that
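The CESS corpora use EAGLES-style morphological tags, where the first character encodes the coarse part of speech and later positions encode features such as mood, tense, person, number and gender. A small illustrative decoder for the first position only (the category mapping is taken from the EAGLES tagset convention and should be checked against the official table):

```python
# First character of an EAGLES tag -> coarse part of speech.
# Later positions are category-specific morphological features.
EAGLES_CATEGORY = {
    'a': 'adjective', 'c': 'conjunction', 'd': 'determiner',
    'f': 'punctuation', 'i': 'interjection', 'n': 'noun',
    'p': 'pronoun', 'r': 'adverb', 's': 'adposition',
    'v': 'verb', 'z': 'number', 'w': 'date',
}

def coarse_pos(tag):
    """Decode only the coarse category from an EAGLES-style tag."""
    return EAGLES_CATEGORY.get(tag[0].lower(), 'unknown')

print(coarse_pos('vsip3s0'))  # verb
print(coarse_pos('da0fs0'))   # determiner
```

This is enough to map CESS output onto coarse classes comparable with the uppercase en_US tags; decoding the remaining positions requires the full EAGLES tables for each category.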