pos-tagger | 易学教程

Spanish POS tagging with Stanford NLP - is it possible to get the person/number/gender?

I'm using Stanford NLP to do POS tagging for Spanish texts. I can get a POS tag for each word, but I notice that I am only given the first four sections of the AnCora tag; the last three sections, for person, number and gender, are missing. Why does Stanford NLP only use a reduced version of the AnCora tag? Is it possible to get the entire tag using Stanford NLP? Here is my code (please excuse the JRuby...):

    props = java.util.Properties.new()
    props.put("tokenize.language", "es")
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse")
    props.put("ner.model", "edu/stanford/nlp/models

to find the opinion of a sentence as positive or negative

I need to find the opinion expressed in certain reviews on websites, and I am using SentiWordNet for this. I first send the file containing all the reviews to a POS tagger:

    tokens = nltk.word_tokenize(line)  # tokenization for line in file
    tagged = nltk.pos_tag(tokens)      # for POS tagging
    print(tagged)

Is there a more accurate way of tokenizing that treats "not good" as one word rather than as two separate words? Next, I have to give positive and negative scores to the tokenized words and then calculate the total score. Is there a function in SentiWordNet for this? Please help. See First Extract
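One possible way to treat "not good" as a single unit is to merge a negation word with the token that follows it before scoring. The sketch below is plain Python and does not call SentiWordNet; `TOY_SCORES`, `NEGATORS`, `merge_negations` and `score_sentence` are hypothetical names, and the scores are made-up placeholders standing in for whatever a real lexicon would supply.

```python
# Hypothetical toy lexicon: placeholder scores, not taken from SentiWordNet.
TOY_SCORES = {"good": 1.0, "great": 1.0, "bad": -1.0, "poor": -1.0}
# Assumed list of negation words; extend as needed.
NEGATORS = {"not", "no", "never"}


def merge_negations(tokens):
    """Join a negator with the next token, e.g. ['not', 'good'] -> ['not_good']."""
    merged = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() in NEGATORS and i + 1 < len(tokens):
            merged.append(tokens[i].lower() + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def score_token(token):
    """Look up a token's score; a merged 'not_X' token gets the flipped score of X."""
    head, sep, rest = token.partition("_")
    if sep and head in NEGATORS:
        return -TOY_SCORES.get(rest.lower(), 0.0)
    return TOY_SCORES.get(token.lower(), 0.0)


def score_sentence(tokens):
    """Sum the scores of all tokens after merging negations."""
    return sum(score_token(t) for t in merge_negations(tokens))
```

With this sketch, `merge_negations(["the", "food", "is", "not", "good"])` yields `["the", "food", "is", "not_good"]`, and the sentence scores negative instead of positive. Simply flipping the sign is a crude assumption; it ignores negation scope beyond the next word.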

Score each sentence in a line based upon a tag and summarize the text. (Java)

I'm trying to create a summarizer in Java. I'm using the Stanford Log-linear Part-Of-Speech Tagger to tag the words. I then score each sentence based on certain tags and, in the summary, print the sentences with a high score. Here's the code:

    MaxentTagger tagger = new MaxentTagger("taggers/bidirectional-distsim-wsj-0-18.tagger");
    BufferedReader reader = new BufferedReader(
            new FileReader("C:\\Summarizer\\src\\summarizer\\testing\\testingtext.txt"));
    String line = null;
    int score = 0;
    StringBuilder stringBuilder = new StringBuilder();
    File tempFile = new File("C:\
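The scoring idea described in this question can be sketched independently of the tagger. The example below is plain Python rather than the question's Java, and it works on sentences that are already tagged as (word, tag) pairs. The tag set in `SCORING_TAGS`, the one-point-per-tag rule and the `top_n` parameter are all assumptions, since the excerpt does not say which tags are scored or how.

```python
# Assumed set of Penn Treebank tags that earn a point; the question does not name its tags.
SCORING_TAGS = {"NN", "NNS", "NNP", "NNPS", "JJ"}


def score_sentence(tagged_sentence):
    """Count the tokens whose tag is in SCORING_TAGS."""
    return sum(1 for _, tag in tagged_sentence if tag in SCORING_TAGS)


def summarize(tagged_sentences, top_n=1):
    """Return the top_n highest-scoring sentences, kept in their original order."""
    ranked = sorted(
        range(len(tagged_sentences)),
        key=lambda i: score_sentence(tagged_sentences[i]),
        reverse=True,
    )
    keep = sorted(ranked[:top_n])
    return [" ".join(word for word, _ in tagged_sentences[i]) for i in keep]
```

Keeping the selected sentences in document order, rather than in score order, is a design choice that tends to make the summary easier to read.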

Train spaCy's existing POS tagger with my own training examples

I am trying to train the existing POS tagger on my own lexicon, not starting off from scratch (I do not want to create an "empty model"). In spaCy's documentation, it says "Load the model you want to start with", and the next step is "Add the tag map to the tagger using add_label method". However, when I try to load the English small model and add the tag map, it throws this error: ValueError: [T003] Resizing pre-trained Tagger models is not currently supported. I was wondering how it can be fixed. I have also seen Implementing custom POS Tagger in Spacy over existing english model : NLP -

Python NLTK pos_tag not returning the correct part-of-speech tag

Having this:

    text = word_tokenize("The quick brown fox jumps over the lazy dog")

And running:

    nltk.pos_tag(text)

I get:

    [('The', 'DT'), ('quick', 'NN'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'NNS'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'NN'), ('dog', 'NN')]

This is incorrect. The tags for quick, brown and lazy in the sentence should be:

    ('quick', 'JJ'), ('brown', 'JJ'), ('lazy', 'JJ')

Testing this through their online tool gives the same result; quick, brown and lazy should be adjectives, not nouns.

alvas: In short: NLTK is not perfect. In fact, no model is perfect. Note: As of NLTK version

Output results in conll format (POS-tagging, stanford pos tagger)

Question: I am trying to use the Stanford POS tagger, and I want to ask whether it is possible to parse an English text (actually, POS tagging alone would be enough) and output the results in CoNLL format. Is there such an option? I am using the full 3.2.0 version of the Stanford POS tagger. Thanks a lot.

Answer 1: When it comes to the CoNLL format, I presume you mean the CoNLL-2000 chunking task format, as follows:

    He PRP B-NP
    reckons VBZ B-VP
    the DT B-NP
    current JJ I-NP
    account NN I-NP
    deficit NN I-NP
    will MD B-VP
    narrow VB I-VP
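The three-column layout quoted in the answer (word, POS tag, chunk tag, one token per line) can be produced with a few lines of plain Python once the tags are available. This is a minimal sketch, not an option of the Stanford tagger itself: `to_conll2000` is a hypothetical helper, and it assumes the input is already a list of sentences made of (word, pos, chunk) triples, with a blank line separating sentences.

```python
def to_conll2000(sentences):
    """Format sentences of (word, pos, chunk) triples as CoNLL-2000 style text.

    Each token goes on its own line with space-separated columns, and
    sentences are separated by a blank line.
    """
    blocks = []
    for sentence in sentences:
        lines = [f"{word} {pos} {chunk}" for word, pos, chunk in sentence]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    example = [[("He", "PRP", "B-NP"), ("reckons", "VBZ", "B-VP")]]
    print(to_conll2000(example), end="")
```

If only POS tags are available, the same idea applies with two columns instead of three.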

How to use OpenNLP to get POS tags in R?

Question: Here is the R code:

    library(NLP)
    library(openNLP)

    tagPOS <- function(x, ...) {
      s <- as.String(x)
      word_token_annotator <- Maxent_Word_Token_Annotator()
      a2 <- Annotation(1L, "sentence", 1L, nchar(s))
      a2 <- annotate(s, word_token_annotator, a2)
      a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
      a3w <- a3[a3$type == "word"]
      POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
      POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
      list(POStagged = POStagged, POStags = POStags)
    }

    str <

How can I remove POS tags before slashes in nltk?

This is part of my project, where I need to represent the output after phrase detection as (a, x, b), where a, x and b are phrases. I wrote the code and got this output:

    (CLAUSE (NP Jack/NNP) (VP loved/VBD) (NP Peter/NNP))
    (CLAUSE (NP Jack/NNP) (VP stayed/VBD) (NP in/IN London/NNP))
    (CLAUSE (NP Tom/NNP) (VP is/VBZ) (NP in/IN Kolkata/NNP))

I want it to look like the representation above, which means I have to remove tags such as 'CLAUSE', 'NP', 'VP', 'VBD' and 'NNP'. How can I do that?

What I tried: I first wrote this to a text file, tokenized it and used list.remove('word'). But that
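One way to get from the bracketed output to the (a, x, b) form is to work on the string itself: pick out each innermost phrase and drop everything from the last slash of each token onward. The sketch below uses only the standard `re` module, not NLTK's tree API; `clause_to_tuple` is a hypothetical helper, and it assumes the input has exactly the shape shown in the question, with `word/TAG` tokens inside single-level phrases.

```python
import re

# Matches an innermost "(LABEL token token ...)" group and captures the tokens.
PHRASE = re.compile(r"\(\w+\s+([^()]+)\)")


def clause_to_tuple(bracketed):
    """Turn '(CLAUSE (NP Jack/NNP) (VP loved/VBD) ...)' into ('Jack', 'loved', ...)."""
    phrases = []
    for tokens in PHRASE.findall(bracketed):
        # Strip the '/TAG' suffix from each token, splitting on the last slash only.
        words = [token.rsplit("/", 1)[0] for token in tokens.split()]
        phrases.append(" ".join(words))
    return tuple(phrases)


if __name__ == "__main__":
    print(clause_to_tuple("(CLAUSE (NP Jack/NNP) (VP loved/VBD) (NP Peter/NNP))"))
    # prints ('Jack', 'loved', 'Peter')
```

Splitting on the last slash keeps words that themselves contain a slash intact. If the chunks are still available as tree objects rather than text, walking the tree is likely to be more robust than this string-based approach.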

Getting additional information (Active/Passive, Tenses …) from a Tagger

Question: I'm using the Stanford Tagger to determine parts of speech. However, I want to get more information out of the text. Is it possible to get further information, such as the tense of a sentence or whether it is in the active or passive voice? So far, I'm using the very basic POS tagging approach:

    List<List<TaggedWord>> taggedUnits = new ArrayList<List<TaggedWord>>();
    String input = "This sentence is going to be future. The door was opened.";
    for (List<HasWord> sentence : MaxentTagger.tokenizeText

c/c++ NLP library [closed]

Question (closed 6 years ago): I am looking for an open source Natural Language Processing library for C/C++, and I am especially interested in part-of-speech tagging