pos-tagger | 易学教程

Stanford POS Tagger: How to preserve newlines in the output?

Question: My input.txt file contains the following sample text:

    you have to let's come and see me.

Now, if I invoke the Stanford POS tagger with the default command:

    java -classpath stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model models/wsj-0-18-bidirectional-distsim.tagger -textFile input.txt > output.txt

I get the following in my output.txt file:

    you_PRP have_VBP to_TO let_VB 's_POS come_VB and_CC see_VB me_PRP ._.

The problem with the above output is that I have lost my
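
A minimal Python sketch of one workaround, assuming you can call a Python tagger instead of the Stanford command line: split the input on line breaks and tag each line on its own, so every input line yields exactly one output line. `tag_preserving_lines` is a hypothetical helper, and `tagger` is any callable that maps a token list to (word, tag) pairs, such as NLTK's `nltk.pos_tag`. This swaps in a different tagger; it is not a Stanford tagger option.

```python
from typing import Callable, List, Tuple

# Any function from a token list to (word, tag) pairs, e.g. nltk.pos_tag.
Tagger = Callable[[List[str]], List[Tuple[str, str]]]


def tag_preserving_lines(text: str, tagger: Tagger,
                         tokenize: Callable[[str], List[str]] = str.split) -> str:
    """Hypothetical helper: tag each input line separately, keeping line breaks.

    Each output line holds word_TAG tokens for exactly one input line;
    empty input lines stay empty.
    """
    out_lines = []
    for line in text.splitlines():
        tokens = tokenize(line)
        if not tokens:
            out_lines.append("")  # keep blank lines as blank lines
            continue
        tagged = tagger(tokens)
        out_lines.append(" ".join(f"{word}_{tag}" for word, tag in tagged))
    return "\n".join(out_lines)
```

With NLTK installed, `tag_preserving_lines(text, nltk.pos_tag, tokenize=nltk.word_tokenize)` would produce output in the same word_TAG format as above, one output line per input line. The trade-off is that the tagger never sees context across a line break.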

Java Command Fails in NLTK Stanford POS Tagger

Question: I would appreciate help solving the "Java Command Fails" error, which is thrown whenever I try to tag an Arabic corpus of about 2 MB. I have searched the web and the Stanford POS tagger mailing list but did not find a solution. Some posts on similar problems suggested that the tagger runs out of memory, but I am not sure of that: I still have 19 GB of free memory. I have tried every solution offered, but the same error keeps appearing.
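
If the failure is memory-related, two things may help. NLTK's `StanfordPOSTagger` accepts a `java_options` argument, and in NLTK 3.x its default of `'-mx1000m'` caps the Java heap at about 1 GB no matter how much RAM is free, so raising it (for example to `'-mx4g'`) is worth trying first. Tagging the corpus in smaller batches also keeps each Java call bounded. The sketch below covers only the batching half; `batch_sentences` and its 5000-token default are hypothetical choices, not NLTK API.

```python
from typing import Iterable, Iterator, List


def batch_sentences(sentences: Iterable[List[str]],
                    max_tokens: int = 5000) -> Iterator[List[List[str]]]:
    """Hypothetical helper: group tokenized sentences into bounded batches.

    Each batch holds at most max_tokens tokens, except that a single
    sentence longer than max_tokens is yielded as its own batch.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    batch: List[List[str]] = []
    count = 0
    for sent in sentences:
        # Start a new batch if adding this sentence would exceed the limit.
        if batch and count + len(sent) > max_tokens:
            yield batch
            batch, count = [], 0
        batch.append(sent)
        count += len(sent)
    if batch:
        yield batch
```

Each batch can then be passed to the wrapper's `tag_sents` method and the results concatenated.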

How to obtain better results using NLTK pos tag

Question: I am just learning NLTK with Python. I tried running pos_tag on various sentences, but the results are not accurate. How can I improve them? For example:

    broke = NN
    flimsy = NN
    crap = NN

I am also getting a lot of extra words categorized as NN. How can I filter these out to get better results?

Answer 1: Give the context in which you obtained these results. For example, I get different results with pos_tag on the context phrase "They broke flimsy crap":

    import nltk
    text=nltk.word
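
As the answer suggests, tagging depends on context: the tagger uses neighbouring words when choosing a tag, so results on isolated words or fragments can differ from results on full sentences. Once whole sentences are tagged, a small filter can keep only the tag families you care about. A sketch, assuming Penn Treebank tags as returned by `nltk.pos_tag`; `filter_by_tag` is a hypothetical helper.

```python
from typing import Iterable, List, Sequence, Tuple


def filter_by_tag(tagged: Iterable[Tuple[str, str]],
                  keep_prefixes: Sequence[str] = ("JJ",)) -> List[str]:
    """Hypothetical helper: keep words whose tag starts with one of keep_prefixes.

    ("JJ",) keeps JJ, JJR and JJS (adjectives); ("NN",) would keep all noun tags.
    """
    prefixes = tuple(keep_prefixes)  # str.startswith accepts a tuple of prefixes
    return [word for word, tag in tagged if tag.startswith(prefixes)]
```

Filtering by prefix rather than exact tag keeps comparative and superlative forms together, which is usually what an opinion or keyword step wants.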

Spanish POS tagging with Stanford NLP - is it possible to get the person/number/gender?

Question: I'm using Stanford NLP to do POS tagging of Spanish texts. I get a POS tag for each word, but I notice that I am only given the first four sections of the AnCora tag; the last three sections, for person, number and gender, are missing. Why does Stanford NLP use only a reduced version of the AnCora tag? Is it possible to get the entire tag with Stanford NLP? Here is my code (please excuse the JRuby...):

    props = java.util.Properties.new()
    props.put("tokenize.language", "es")
    props.put(
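
To see what the missing sections carry, here is a sketch that decodes a 7-character EAGLES verb tag of the kind used in AnCora, assuming the usual verb layout (category, type, mood, tense, person, number, gender, with "0" meaning unspecified). A reduced tag whose trailing positions are zeroed, like the hypothetical `vmip000`, simply has no person, number or gender left to decode. `decode_verb_tag` is a hypothetical helper that handles verbs only; other categories use different position layouts.

```python
from typing import Dict

# Assumed EAGLES verb layout, e.g. "vmip3s0" =
# verb, main, indicative, present, 3rd person, singular, (no gender).
_VERB_FIELDS = ("category", "type", "mood", "tense", "person", "number", "gender")


def decode_verb_tag(tag: str) -> Dict[str, str]:
    """Hypothetical helper: split an EAGLES verb tag into named fields.

    Positions holding "0" (unspecified) are left out of the result.
    """
    tag = tag.lower()
    if len(tag) != len(_VERB_FIELDS) or not tag.startswith("v"):
        raise ValueError(f"not a 7-character EAGLES verb tag: {tag!r}")
    return {name: value for name, value in zip(_VERB_FIELDS, tag) if value != "0"}
```

For instance, `decode_verb_tag("vmip3s0")` yields person 3 and number s, while `decode_verb_tag("vmip000")` stops at category, type, mood and tense.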

Do we need to use Stopwords filtering before POS Tagging?

Question: I am new to text mining and NLP. I am working on a small project where I am trying to extract information from a few documents. I am basically doing POS tagging and then using a chunker to find patterns based on the tagged words. Do I need to filter out stopwords before doing this POS tagging? Will removing stopwords affect my POS tagger's accuracy?

Answer 1: Let's use this as an example to train/test a tagger. First, get the corpus and stoplist:

    >>> import nltk
    >>> nltk.download(
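
One common way to sidestep the question, sketched here under the assumption that the stopword list is only needed to clean up results: tag the full sentence first and drop stopwords afterwards. The tagger then sees every word in its real context; if your chunker patterns rely on tags such as DT, run the chunker before this filtering step as well. `tag_then_drop_stopwords` is a hypothetical helper, and `tagger` can be any callable like `nltk.pos_tag`.

```python
from typing import Callable, Collection, List, Tuple

Tagged = List[Tuple[str, str]]


def tag_then_drop_stopwords(tokens: List[str],
                            tagger: Callable[[List[str]], Tagged],
                            stopwords: Collection[str]) -> Tagged:
    """Hypothetical helper: tag the full sentence, then remove stopwords.

    The tagger receives every token, so each word keeps its original
    left/right context; stopwords are dropped only from the tagged output.
    """
    tagged = tagger(tokens)
    stop = {w.lower() for w in stopwords}  # case-insensitive matching
    return [(word, tag) for word, tag in tagged if word.lower() not in stop]
```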

Extract Noun phrase using stanford NLP

Question: I am trying to find the theme/noun phrase of a sentence using Stanford NLP. For example, for the sentence "the white tiger" I would like to get the theme/noun phrase: white tiger. For this I used the POS tagger; my sample code is below. The result I am getting is "tiger", which is not correct. The sample code I ran is:

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,parse");
        StanfordCoreNLP pipeline = new
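
With the `parse` annotator, CoreNLP produces a constituency tree, and the noun phrase is the NP subtree rather than anything the POS tags alone give you. A Python sketch, assuming you have the tree as a Penn-style bracketed string (an example such as `(ROOT (NP (DT the) (JJ white) (NN tiger)))` is assumed here, not actual CoreNLP output): parse it, collect every NP, and drop determiners so "the white tiger" becomes "white tiger". `parse_bracketed` and `noun_phrases` are hypothetical helpers; with NLTK installed, `nltk.Tree.fromstring` could replace the hand-written parser.

```python
from typing import List, Tuple, Union

Tree = Tuple[str, list]  # (label, children); a leaf child is a plain str


def parse_bracketed(s: str) -> Tree:
    """Hypothetical helper: parse '(NP (DT the) (NN tiger))' into nested tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Union[Tree, str]:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # a bare word
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(read())
        pos += 1  # consume ")"
        return (label, children)

    tree = read()
    if isinstance(tree, str):
        raise ValueError("expected a bracketed tree")
    return tree


def leaves_with_tags(tree: Tree) -> List[Tuple[str, str]]:
    """Return (word, tag) pairs for the preterminals under tree."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if not isinstance(child, str):
            pairs.extend(leaves_with_tags(child))
    return pairs


def noun_phrases(tree: Tree, drop_tags=("DT",)) -> List[str]:
    """Collect the text of every NP subtree, minus words tagged with drop_tags."""
    found = []
    label, children = tree
    if label == "NP":
        words = [w for w, t in leaves_with_tags(tree) if t not in drop_tags]
        found.append(" ".join(words))
    for child in children:
        if not isinstance(child, str):
            found.extend(noun_phrases(child, drop_tags))
    return found
```

Nested NPs are reported at every level (outer first), so a caller that wants only the largest or smallest phrase needs to pick from the list.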

to find the opinion of a sentence as positive or negative

Question: I need to find the opinion expressed in certain reviews posted on websites. I am using SentiWordNet for this. I first send the file containing all the reviews to the POS tagger:

    tokens=nltk.word_tokenize(line) #tokenization for line in file
    tagged=nltk.pos_tag(tokens) #for POSTagging
    print(tagged)

Is there a more accurate way of tokenizing that treats "not good" as one word rather than as two separate words? Next, I have to give positive and negative scores to the tokenized words and then
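
A common trick for this, shown as a sketch rather than SentiWordNet functionality: after tokenizing, glue each negation word to the token that follows it, so "not good" becomes the single token `not_good`. `merge_negations` and the `NEGATORS` set are hypothetical; the set includes "n't" because `nltk.word_tokenize` splits "isn't" into "is" and "n't".

```python
from typing import Iterable, List, Optional

# Hypothetical, deliberately small list of negation words.
NEGATORS = frozenset({"not", "no", "never", "n't"})


def merge_negations(tokens: Iterable[str], negators=NEGATORS,
                    joiner: str = "_") -> List[str]:
    """Hypothetical helper: join each negation word with the token after it.

    ["is", "not", "good"] -> ["is", "not_good"]. A negator at the very end
    of the sequence is kept unchanged.
    """
    out: List[str] = []
    pending: Optional[str] = None
    for tok in tokens:
        if pending is not None:
            out.append(pending + joiner + tok)  # negator + following word
            pending = None
        elif tok.lower() in negators:
            pending = tok  # wait for the next token
        else:
            out.append(tok)
    if pending is not None:
        out.append(pending)
    return out
```

SentiWordNet will not contain merged tokens such as `not_good`, so a scoring step would need to look up the word after the underscore and flip the sign of its score; that scoring rule is an assumption, not part of SentiWordNet.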

Python NLTK pos_tag not returning the correct part-of-speech tag

Question: Given this:

    text = word_tokenize("The quick brown fox jumps over the lazy dog")

and running:

    nltk.pos_tag(text)

I get:

    [('The', 'DT'), ('quick', 'NN'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'NNS'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'NN'), ('dog', 'NN')]

This is incorrect. The tags for quick, brown and lazy in this sentence should be:

    ('quick', 'JJ'), ('brown', 'JJ'), ('lazy', 'JJ')

Testing this through their online tool gives the same result: quick, brown and lazy should be adjectives
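
To check how far the tagger is off on a sentence you have annotated by hand, a short token-by-token comparison helps. A sketch; `tag_mismatches` is a hypothetical helper, and the gold tags are whatever you consider correct.

```python
from typing import List, Sequence, Tuple


def tag_mismatches(predicted: Sequence[Tuple[str, str]],
                   gold: Sequence[Tuple[str, str]]
                   ) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Hypothetical helper: compare predicted tags with hand-checked gold tags.

    Returns (accuracy, [(word, predicted_tag, gold_tag), ...] for each error).
    Both sequences must cover the same tokens in the same order.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    errors = []
    for (word, p_tag), (g_word, g_tag) in zip(predicted, gold):
        if word != g_word:
            raise ValueError(f"token mismatch: {word!r} vs {g_word!r}")
        if p_tag != g_tag:
            errors.append((word, p_tag, g_tag))
    accuracy = 1.0 - len(errors) / len(gold) if gold else 1.0
    return accuracy, errors
```

With the output above and gold tags that differ only in the three corrections the question lists, it reports three errors out of nine tokens (accuracy 6/9, about 0.67).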

Is it possible to append words to an existing OpenNLP POS corpus/model?

Question: Is there a way to further train the existing Apache OpenNLP POS tagger model? I need to add a few more proper nouns that are specific to my application. When I use the command below:

    opennlp POSTaggerTrainer -type maxent -model en-pos-maxent.bin \
        -lang en -data en-pos.train -encoding UTF-8

the entire model is retrained. I'd like only to append a few new sentences to en-pos-maxent.bin. This is what my training file looks like:

    Where_WRB is_VBZ the_DT Seven_DNNP Dwarfs_DNNP Mine_DNNP
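
The trainer command above builds a model from the training file, so one workaround (assuming you still have the original training data) is to append the new sentences to that data in the same word_TAG format and retrain. A sketch of the formatting step; `to_opennlp_pos_line` and `append_training_sentences` are hypothetical helpers, not part of OpenNLP.

```python
from typing import Iterable, List, Tuple


def to_opennlp_pos_line(tagged: Iterable[Tuple[str, str]]) -> str:
    """Hypothetical helper: format one tagged sentence as 'word_TAG word_TAG ...'."""
    parts = []
    for word, tag in tagged:
        # Words must not contain whitespace, since tokens are space-separated.
        if not word or any(ch.isspace() for ch in word) or not tag:
            raise ValueError(f"invalid token/tag pair: {(word, tag)!r}")
        parts.append(f"{word}_{tag}")
    return " ".join(parts)


def append_training_sentences(path: str,
                              sentences: Iterable[List[Tuple[str, str]]],
                              encoding: str = "utf-8") -> int:
    """Hypothetical helper: append one sentence per line; return the count written.

    Assumes the existing file already ends with a newline.
    """
    count = 0
    with open(path, "a", encoding=encoding) as fh:
        for sent in sentences:
            fh.write(to_opennlp_pos_line(sent) + "\n")
            count += 1
    return count
```

After appending, the same POSTaggerTrainer command can be rerun on the enlarged en-pos.train.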

How can I remove POS tags before slashes in nltk?

Question: This is part of my project, where I need to represent the output after phrase detection as (a,x,b), where a, x and b are phrases. I wrote the code and got output like this:

    (CLAUSE (NP Jack/NNP) (VP loved/VBD) (NP Peter/NNP))
    (CLAUSE (NP Jack/NNP) (VP stayed/VBD) (NP in/IN London/NNP))
    (CLAUSE (NP Tom/NNP) (VP is/VBZ) (NP in/IN Kolkata/NNP))

I want to turn it into the representation described above, which means I have to remove the 'CLAUSE', 'NP', 'VP', 'VBD', 'NNP', etc. tags. How to do
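
Assuming every chunk is flat (a label followed by word/TAG tokens, with no nested brackets), the printed clauses can be reduced to plain tuples with two regular expressions: one finds each CLAUSE, the other finds each chunk and strips the /TAG suffix from its words. `clause_to_tuple` and `clauses_to_tuples` are hypothetical helpers working on the printed string; if you still have the NLTK tree objects, iterating over a clause's subtrees and taking the words from `leaves()` avoids string parsing altogether.

```python
import re
from typing import List, Tuple

# One flat chunk such as "(NP in/IN London/NNP)": a label, then word/TAG tokens.
_CHUNK = re.compile(r"\((\w+)((?:\s+[^\s()]+/[^\s()]+)+)\)")
# One clause: "(CLAUSE" followed by one or more flat chunks and a closing ")".
_CLAUSE = re.compile(r"\(CLAUSE((?:\s*\([^()]*\))+)\s*\)")


def clause_to_tuple(clause: str) -> Tuple[str, ...]:
    """Hypothetical helper: '(CLAUSE (NP Jack/NNP) (VP loved/VBD) ...)' -> ('Jack', 'loved', ...)."""
    phrases = []
    for _label, body in _CHUNK.findall(clause):
        # rsplit keeps any slash inside the word itself, e.g. "1/2/CD" -> "1/2".
        words = [tok.rsplit("/", 1)[0] for tok in body.split()]
        phrases.append(" ".join(words))
    return tuple(phrases)


def clauses_to_tuples(text: str) -> List[Tuple[str, ...]]:
    """Hypothetical helper: convert every CLAUSE in text into a tuple of phrases."""
    return [clause_to_tuple(m.group(0)) for m in _CLAUSE.finditer(text)]
```

Applied to the three clauses above, this gives `[('Jack', 'loved', 'Peter'), ('Jack', 'stayed', 'in London'), ('Tom', 'is', 'in Kolkata')]`.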