pos-tagger

Definition of the CESS_ESP tags

狂风中的少年 posted on 2020-01-14 14:31:53
Question: I'm using the NLTK CESS_ESP data package, and I've been able to use an adaptation of the spaghetti tagger and a HiddenMarkovModelTagger to POS-tag a sentence. However, the tags it produces are not at all like the ones used when tagging en_US sentences. Here's a link to the Categorizing and Tagging documentation for NLTK; you'll notice that the tags used there are uppercase and don't have any numbers or punctuation. Some CESS tags: vsip3s0, da0fs0. Does someone know a reference that
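The two tags quoted above look like position-based tags in the EAGLES style, where the first character names the word class and later characters encode features. As a rough sketch under that assumption (the mapping table and the name `coarse_category` are illustrative, not part of NLTK), the leading character can be reduced to a coarse category:

```python
# Assumption: the tags are EAGLES-style, so the first character gives the
# word class. Later positions (mood, tense, gender, ...) are not decoded here.
EAGLES_CATEGORY = {
    "a": "adjective",
    "c": "conjunction",
    "d": "determiner",
    "f": "punctuation",
    "i": "interjection",
    "n": "noun",
    "p": "pronoun",
    "r": "adverb",
    "s": "preposition",
    "v": "verb",
    "w": "date",
    "z": "numeral",
}


def coarse_category(tag):
    """Return a coarse word class for a tag, or 'unknown' if unrecognized."""
    return EAGLES_CATEGORY.get(tag[:1].lower(), "unknown")


for tag in ("vsip3s0", "da0fs0"):
    print(tag, coarse_category(tag))
```

This only collapses the tagset to word classes; a full reference for the remaining positions is still what the question asks for.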

Getting the closest noun from a stemmed word

限于喜欢 posted on 2020-01-05 10:09:46
Question: Short version: if I have a stemmed word, say 'comput' for 'computing' or 'sugari' for 'sugary', is there a way to construct its closest noun form, that is, 'computer' or 'sugar' respectively? Longer version: I'm using Python, NLTK, and WordNet to perform a few semantic similarity tasks on a bunch of words. I noticed that most sem-sim scores work well only for nouns, while adjectives and verbs don't give any results. Understanding the inaccuracies involved, I wanted to convert a word from its
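One simple way to approximate the "short version" is fuzzy string matching against a list of known nouns. The sketch below uses the standard library's `difflib` rather than WordNet; the noun list and the helper name `closest_noun` are hypothetical, and in practice the candidates might come from WordNet noun lemmas instead.

```python
import difflib

# Hypothetical candidate vocabulary; a real one might be built from
# WordNet noun lemmas rather than hard-coded.
NOUNS = ["computer", "computation", "sugar", "summary"]


def closest_noun(stem, nouns=NOUNS, cutoff=0.6):
    """Return the noun most similar to `stem` by string similarity, or None."""
    matches = difflib.get_close_matches(stem, nouns, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(closest_noun("comput"))
print(closest_noun("sugari"))
```

String similarity knows nothing about meaning, so this is only a heuristic: it picks the candidate that looks most like the stem, which may not be the semantically closest noun.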

Getting the closest noun from a stemmed word

牧云@^-^@ posted on 2020-01-05 10:09:10
Question: Short version: if I have a stemmed word, say 'comput' for 'computing' or 'sugari' for 'sugary', is there a way to construct its closest noun form, that is, 'computer' or 'sugar' respectively? Longer version: I'm using Python, NLTK, and WordNet to perform a few semantic similarity tasks on a bunch of words. I noticed that most sem-sim scores work well only for nouns, while adjectives and verbs don't give any results. Understanding the inaccuracies involved, I wanted to convert a word from its

Lemmatizing words after POS tagging produces unexpected results

老子叫甜甜 posted on 2020-01-05 03:54:05
Question: I am using Python 3.5 with the nltk pos_tag function and the WordNetLemmatizer. My goal is to flatten words in our database to classify text. I am trying to test the lemmatizer, and I encounter strange behavior when using the POS tagger on identical tokens. In the example below, I have a list of three strings, and when running them through the POS tagger every other element is returned as a noun (NN) and the rest are returned as verbs (VBG). This affects the lemmatization. The output looks like
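The reason the tag affects lemmatization is that `WordNetLemmatizer.lemmatize` takes a WordNet part-of-speech letter, so a Penn Treebank tag such as NN or VBG has to be converted first. A minimal sketch of that conversion follows; the helper name `penn_to_wordnet` and the noun fallback are choices made here for illustration, not something the question specifies.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter ('a', 'v', 'n', 'r')."""
    if tag.startswith("J"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("V"):
        return "v"  # verbs: VB, VBD, VBG, ...
    if tag.startswith("R"):
        return "r"  # adverbs: RB, RBR, RBS
    return "n"      # assumed fallback: treat everything else as a noun


# Intended use (requires NLTK and its WordNet data, so not executed here):
#   lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))
for tag in ("NN", "VBG"):
    print(tag, penn_to_wordnet(tag))
```

With this mapping, the same token yields different lemmas depending on whether the tagger labelled it NN or VBG, which matches the inconsistency described.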

NLTK POS tagger not working

你离开我真会死。 posted on 2019-12-29 07:47:07
Question: If I try this: import nltk text = nltk.word_tokenize("And now for something completely different") nltk.pos_tag(text) Output: Traceback (most recent call last): File "C:/Python27/pos.py", line 3, in <module> nltk.pos_tag(text) File "C:\Python27\lib\site-packages\nltk-2.0.4-py2.7.egg\nltk\tag\__init__.py" ipos_tag tagger = load(_POS_TAGGER) File "C:\Python27\lib\site-packages\nltk-2.0.4-py2.7.egg\nltk\data.py", line 605,in resource_val = pickle.load(_open(resource_url)) ImportError: No module

Error using Stanford POS Tagger in NLTK Python

北城余情 posted on 2019-12-29 06:45:08
Question: I am trying to use the Stanford POS Tagger in NLTK, but I am not able to run the example code given here: http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanford import nltk from nltk.tag.stanford import POSTagger st = POSTagger(r'english-bidirectional-distim.tagger',r'D:/stanford-postagger/stanford-postagger.jar') st.tag('What is the airspeed of an unladen swallow?'.split()) I have already added environment variables as CLASSPATH = D:/stanford-postagger/stanford-postagger.jar STANFORD

Using NLTK Tree

牧云@^-^@ posted on 2019-12-25 08:19:08
Question: ['', ['S', ['NP-SBJ', ['NP', ['NNP', 'Pierre'], ['NNP', 'Vinken']], [',', ','], ['ADJP', ['NP', ['CD', '61'], ['NNS', 'years']], ['JJ', 'old']], [',', ',']], ['VP', ['MD', 'will'], ['VP', ['VB', 'join'], ['NP', ['DT', 'the'], ['NN', 'board']], ['PP-CLR', ['IN', 'as'], ['NP', ['DT', 'a'], ['JJ', 'nonexecutive'], ['NN', 'director']]], ['NP-TMP', ['NNP', 'Nov.'], ['CD', '29']]]], ['.', '.']]] I want to generate production rules by traversing this grammar using NLTK Tree. Production rules would
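The nested list above stores each node as `[label, child, child, ...]`, with words as plain strings. As a sketch of the traversal, the plain-Python generator below (not the NLTK `Tree` class the question asks about; the name `productions` and the shortened example tree are illustrative) emits one rule per node, pairing its label with the labels or words of its direct children:

```python
def productions(node):
    """Yield (lhs, rhs) pairs for a nested-list tree, in pre-order."""
    label, children = node[0], node[1:]
    # A child is either a nested list (a subtree) or a plain string (a word).
    rhs = tuple(c[0] if isinstance(c, list) else c for c in children)
    yield label, rhs
    for child in children:
        if isinstance(child, list):
            yield from productions(child)


# Shortened, hypothetical version of the tree in the question.
tree = ["S",
        ["NP", ["NNP", "Pierre"], ["NNP", "Vinken"]],
        ["VP", ["MD", "will"], ["VB", "join"]]]

rules = list(productions(tree))
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs))
```

If the data is first converted to an `nltk.Tree`, its `productions()` method performs a comparable walk; the version here only illustrates the recursion on the raw list.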