pos-tagger

Stanford POS Tagger: How to preserve newlines in the output?

不问归期 submitted on 2019-12-10 12:03:51
Question: My input.txt file contains the following sample text: you have to let's come and see me. Now if I invoke the Stanford POS tagger with the default command: java -classpath stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model models/wsj-0-18-bidirectional-distsim.tagger -textFile input.txt > output.txt I get the following in my output.txt file: you_PRP have_VBP to_TO let_VB 's_POS come_VB and_CC see_VB me_PRP ._. The problem with the above output is that I have lost my
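The output shown in this question joins each token to its tag with an underscore. As a minimal sketch in plain Python (not part of the Stanford distribution; the helper names `parse_tagged_line` and `parse_tagged_text` are hypothetical), such output could be read back into (word, tag) pairs one line at a time, so that whatever line structure the tagged file has is kept:

```python
def parse_tagged_line(line):
    """Split one line of word_TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST underscore so a word containing "_" stays intact.
        word, sep, tag = token.rpartition("_")
        if not sep:
            # No underscore at all: keep the token as a word with no tag.
            word, tag = token, None
        pairs.append((word, tag))
    return pairs


def parse_tagged_text(text):
    """Parse tagged text line by line, returning one list per input line."""
    return [parse_tagged_line(line) for line in text.splitlines()]


sample = "you_PRP have_VBP to_TO let_VB 's_POS come_VB and_CC see_VB me_PRP ._."
print(parse_tagged_line(sample))
```

This only reads the tagger's output; it does not change how the tagger itself treats line breaks in the input.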

Java Command Fails in NLTK Stanford POS Tagger

a 夏天 submitted on 2019-12-10 11:38:54
Question: I would appreciate help with the "Java Command Fails" error, which is thrown whenever I try to tag an Arabic corpus of about 2 megabytes. I have searched the web and the Stanford POS Tagger mailing list, but I did not find a solution. I read some posts on similar problems, which suggested that memory is running out. I am not sure that is the cause, since I still have 19 GB of free memory. I tried every solution offered, but the same error keeps showing.

How to obtain better results using NLTK pos tag

烈酒焚心 submitted on 2019-12-08 17:47:06
Question: I am just learning NLTK with Python. I tried running pos_tag on various sentences, but the results are not accurate. How can I improve the results? broke = NN flimsy = NN crap = NN Also, a lot of extra words are being categorized as NN. How can I filter these out to get better results? Answer 1: Give the context in which you obtained these results. As an example, I get different results with pos_tag on the context phrase "They broke climsy crap": import nltk text=nltk.word

Spanish POS tagging with Stanford NLP - is it possible to get the person/number/gender?

无人久伴 submitted on 2019-12-08 06:20:13
Question: I'm using Stanford NLP to do POS tagging for Spanish texts. I can get a POS tag for each word, but I notice that I am only given the first four sections of the AnCora tag; the last three sections, for person, number and gender, are missing. Why does Stanford NLP only use a reduced version of the AnCora tag? Is it possible to get the entire tag using Stanford NLP? Here is my code (please excuse the jruby...): props = java.util.Properties.new() props.put("tokenize.language", "es") props.put(

Do we need to use Stopwords filtering before POS Tagging?

﹥>﹥吖頭↗ submitted on 2019-12-08 05:55:22
Question: I am new to text mining and NLP. I am working on a small project where I am trying to extract information from a few documents. I am basically doing POS tagging and then using a chunker to find patterns based on the tagged words. Do I need to filter out stopwords before doing the POS tagging? Will filtering stopwords affect my POS tagger's accuracy? Answer 1: Let's use this as an example to train/test a tagger: First get the corpus and stoplist >>> import nltk >>> nltk.download(

Extract Noun phrase using stanford NLP

一个人想着一个人 submitted on 2019-12-08 04:50:59
Question: I am trying to find the theme/noun phrase of a sentence using Stanford NLP. For example, for the sentence "the white tiger" I would like to get the theme/noun phrase: white tiger. For this I used the POS tagger. My sample code is below. The result I am getting is "tiger", which is not correct. The sample code I ran is: public static void main(String[] args) throws IOException { Properties props = new Properties(); props.setProperty("annotators", "tokenize,ssplit,parse"); StanfordCoreNLP pipeline = new

to find the opinion of a sentence as positive or negative

六月ゝ 毕业季﹏ submitted on 2019-12-08 04:00:59
Question: I need to find the opinion expressed in certain reviews given on websites. I am using SentiWordNet for this. I first send the file containing all the reviews to the POS tagger. tokens=nltk.word_tokenize(line) #tokenization for line in file tagged=nltk.pos_tag(tokens) #for POSTagging print tagged Is there a more accurate way of tokenizing that treats "not good" as one word instead of two separate words? Now I have to give positive and negative scores to the tokenized words and then
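One possible way to treat "not good" as a single unit is to post-process the token list and join a negation word to the token that follows it. The sketch below is plain Python and only an illustration: the `NEGATIONS` set and the `merge_negations` helper are assumptions made here, not NLTK or SentiWordNet features.

```python
# Assumed, deliberately small list of negation words; extend as needed.
NEGATIONS = {"not", "no", "never"}


def merge_negations(tokens):
    """Join each negation word to the next token, e.g. not + good -> not_good."""
    merged = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.lower() in NEGATIONS and i + 1 < len(tokens):
            # Combine the negation and the following token into one unit.
            merged.append(token + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(token)
            i += 1
    return merged


print(merge_negations(["the", "food", "is", "not", "good"]))
```

A merged token such as `not_good` would not appear in a sentiment lexicon as-is, so a scoring step would still have to split it and, for instance, invert the score of the second part.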

Python NLTK pos_tag not returning the correct part-of-speech tag

好久不见. submitted on 2019-12-08 02:34:19
Question: Having this: text = word_tokenize("The quick brown fox jumps over the lazy dog") And running: nltk.pos_tag(text) I get: [('The', 'DT'), ('quick', 'NN'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'NNS'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'NN'), ('dog', 'NN')] This is incorrect. The tags for quick, brown and lazy in the sentence should be: ('quick', 'JJ'), ('brown', 'JJ'), ('lazy', 'JJ') Testing this through their online tool gives the same result; quick, brown and fox should be adjectives

Is it possible to append words to an existing OpenNLP POS corpus/model?

拟墨画扇 submitted on 2019-12-07 15:48:19
Question: Is there a way to train the existing Apache OpenNLP POS Tagger model? I need to add a few more proper nouns to the model that are specific to my application. When I try to use the below command: opennlp POSTaggerTrainer -type maxent -model en-pos-maxent.bin \ -lang en -data en-pos.train -encoding UTF-8 the entire model is retrained. I'd only like to append a few new sentences to en-pos-maxent.bin. This is how my training file looks: Where_WRB is_VBZ the_DT Seven_DNNP Dwarfs_DNNP Mine_DNNP

How can I remove POS tags before slashes in nltk?

不想你离开。 submitted on 2019-12-07 08:14:23
Question: This is part of my project where I need to represent the output after phrase detection like this: (a,x,b), where a, x, b are phrases. I wrote the code and got output like this: (CLAUSE (NP Jack/NNP) (VP loved/VBD) (NP Peter/NNP)) (CLAUSE (NP Jack/NNP) (VP stayed/VBD) (NP in/IN London/NNP)) (CLAUSE (NP Tom/NNP) (VP is/VBZ) (NP in/IN Kolkata/NNP)) I want to make it look like the previous representation, which means I have to remove the 'CLAUSE', 'NP', 'VP', 'VBD', 'NNP' etc. tags. How to do
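If only the printed bracket strings are available, one way to get from the chunked form to the (a, x, b) form is a regular expression over the text. The following is a minimal sketch in plain Python, not an NLTK API; the `strip_tags` helper is hypothetical and assumes the clause contains one level of labelled phrases with word/TAG tokens, exactly as in the output quoted in the question.

```python
import re

# Matches an innermost labelled group such as "(NP in/IN London/NNP)"
# and captures its body ("in/IN London/NNP").
PHRASE = re.compile(r"\(\w+\s+([^()]+)\)")


def strip_tags(clause):
    """Turn '(CLAUSE (NP Jack/NNP) (VP loved/VBD) ...)' into '(Jack, loved, ...)'."""
    phrases = []
    for body in PHRASE.findall(clause):
        # Drop everything from the last "/" onward in each word/TAG token.
        words = [token.rsplit("/", 1)[0] for token in body.split()]
        phrases.append(" ".join(words))
    return "(" + ", ".join(phrases) + ")"


print(strip_tags("(CLAUSE (NP Jack/NNP) (VP loved/VBD) (NP Peter/NNP))"))
print(strip_tags("(CLAUSE (NP Jack/NNP) (VP stayed/VBD) (NP in/IN London/NNP))"))
```

When the chunker's tree objects are still in memory, walking the tree directly is likely to be more robust than parsing its printed form.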