pos-tagger | 易学教程

How to use OpenNLP to get POS tags in R?

Here is the R code:

    library(NLP)
    library(openNLP)

    tagPOS <- function(x, ...) {
      s <- as.String(x)
      word_token_annotator <- Maxent_Word_Token_Annotator()
      a2 <- Annotation(1L, "sentence", 1L, nchar(s))
      a2 <- annotate(s, word_token_annotator, a2)
      a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
      a3w <- a3[a3$type == "word"]
      POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
      POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
      list(POStagged = POStagged, POStags = POStags)
    }

    str <- "this is a the first sentence."
    tagged_str <- tagPOS(str)

The output is:

    tagged_str
    $POStagged
    [1]"this
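
tagPOS ends by pairing each token (s[a3w]) with its POS feature and pasting the pairs together as word/TAG. For readers consuming that string outside R, here is a minimal Python sketch of the same formatting step plus its inverse, assuming you already have parallel lists of tokens and tags. The function names and the example tags are hypothetical, written by hand rather than produced by openNLP:

```python
def to_slash_format(tokens, tags):
    """Join parallel token and tag lists as 'word/TAG word/TAG ...'."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return " ".join(f"{word}/{tag}" for word, tag in zip(tokens, tags))


def from_slash_format(tagged):
    """Split each 'word/TAG' item on its LAST slash, so tokens like 'and/or' survive."""
    pairs = []
    for item in tagged.split():
        word, _sep, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs


tokens = ["this", "is", "a", "test", "."]
tags = ["DT", "VBZ", "DT", "NN", "."]  # hand-written example tags
s = to_slash_format(tokens, tags)
print(s)  # this/DT is/VBZ a/DT test/NN ./.
print(from_slash_format(s))
```

Splitting on the last slash is the design choice that matters: tags never contain "/", but tokens sometimes do. Tokens containing spaces would still break this format.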

How do I use non-integer string labels with SVM from scikit-learn? Python

Scikit-learn has fairly user-friendly Python modules for machine learning. I am trying to train an SVM tagger for natural language processing (NLP) where my labels and input data are words and annotations. For example, for part-of-speech tagging, rather than using double/integer data as input tuples such as [[1,2], [2,0]], my tuples look like [['word','NOUN'], ['young', 'adjective']]. Can anyone give an example of how I can use the SVM with string tuples? The tutorial/documentation given here is for integer/double inputs: http://scikit-learn.org/stable/modules/svm.html

Most machine learning algorithm
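
In scikit-learn, classifiers generally accept string class labels as y; it is the feature matrix X that must be numeric. String features are therefore encoded first, typically by one-hot encoding each feature=value pair, which is what scikit-learn's DictVectorizer does. The stdlib-only sketch below does that encoding by hand so the idea is visible; the feature set (word, suffix, prev_word) and the helper names are hypothetical choices for illustration:

```python
def build_vocab(samples):
    """Assign a column index to every feature=value pair seen in training."""
    vocab = {}
    for sample in samples:
        for name, value in sample.items():
            key = f"{name}={value}"
            if key not in vocab:
                vocab[key] = len(vocab)
    return vocab


def one_hot(sample, vocab):
    """Turn a dict of string features into a 0/1 vector; unseen pairs are ignored."""
    vec = [0] * len(vocab)
    for name, value in sample.items():
        idx = vocab.get(f"{name}={value}")
        if idx is not None:
            vec[idx] = 1
    return vec


def word_features(words, i):
    """Hypothetical feature set describing the word at position i."""
    word = words[i]
    return {
        "word": word.lower(),
        "suffix": word[-2:].lower(),
        "prev_word": words[i - 1].lower() if i > 0 else "<s>",
    }


sentence = ["The", "young", "dog"]
labels = ["DET", "ADJ", "NOUN"]  # y can stay as strings
X_dicts = [word_features(sentence, i) for i in range(len(sentence))]
vocab = build_vocab(X_dicts)
X = [one_hot(d, vocab) for d in X_dicts]
```

With scikit-learn installed, X and the string labels could then be passed to an SVM such as sklearn.svm.LinearSVC via fit(X, labels). In practice DictVectorizer replaces build_vocab/one_hot and returns a sparse matrix by default, which matters once the vocabulary grows to thousands of words.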

Bad zip file error in POS tagging in NLTK in Python

Question: I am new to Python and NLTK. I want to do word tokenization and POS tagging with it. I installed NLTK 3.0 on Ubuntu 14.04, which has Python 2.7.6 by default. First I tried to tokenize a simple sentence, but I get an error saying "BadZipfile: File is not a zip file". How can I solve this? One more question: I gave the path "/usr/share/nltk_data" when I installed the NLTK data (using the command line). Some of the packages couldn't be installed due to errors. But it shows other
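
Python's zipfile module raises this error when an archive is truncated or is not a zip at all, which fits the partially failed data download described in the question. A minimal stdlib sketch for locating such archives, written for Python 3 (on Python 2.7 the exception is named zipfile.BadZipfile) and assuming the data lives under the /usr/share/nltk_data directory mentioned above:

```python
import os
import zipfile


def find_bad_zips(root):
    """Return sorted paths of .zip files under root that are not valid archives."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".zip"):
                continue
            path = os.path.join(dirpath, name)
            if not zipfile.is_zipfile(path):
                bad.append(path)  # truncated or not a zip at all
                continue
            try:
                with zipfile.ZipFile(path) as zf:
                    if zf.testzip() is not None:  # a member failed its CRC check
                        bad.append(path)
            except (zipfile.BadZipFile, OSError, EOFError):
                bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    # Path from the question; change it to your own nltk_data directory.
    for path in find_bad_zips("/usr/share/nltk_data"):
        print("corrupt:", path)
```

Any archive it reports can be deleted and fetched again with nltk.download() for that package.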

Stanford CoreNLP: how to get the probability & margin of error

When using the parser, or for that matter any of the annotators in CoreNLP, is there a way to access the probability or the margin of error? To put my question into context, I am trying to find out whether ambiguity can be detected programmatically. For instance, in the sentence below the verb "desire" is tagged as a noun. I would like to know what kind of measure I can access or calculate from the CoreNLP API to tell me there could be an ambiguity:

    (NP (NP (NNP Whereas)) (, ,) (NP (NNP users) (NN desire) (S (VP (TO to) (VP (VB sell))))))

Source: https://stackoverflow.com
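
Whatever scores a model exposes, one common way to turn them into an ambiguity signal is the margin between the best and second-best candidate: a small gap means the model nearly chose differently. The sketch below assumes you already have one log-score per candidate tag for a token; it does not use any CoreNLP API, and both the scores for "desire" and the 0.2 threshold are made-up values for illustration:

```python
import math


def to_probs(log_scores):
    """Normalize log-scores into probabilities (softmax), shifted by the max for stability."""
    m = max(log_scores.values())
    exps = {k: math.exp(v - m) for k, v in log_scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def ambiguity(log_scores, threshold=0.2):
    """Return (best, runner_up, margin, is_ambiguous) for one token's candidate tags."""
    if len(log_scores) < 2:
        raise ValueError("need at least two candidates to compute a margin")
    probs = to_probs(log_scores)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (best, p1), (second, p2) = ranked[0], ranked[1]
    margin = p1 - p2
    return best, second, margin, margin < threshold


# Hypothetical scores for the token "desire".
best, second, margin, flag = ambiguity({"NN": math.log(0.55), "VBP": math.log(0.45)})
print(best, second, round(margin, 2), flag)  # NN VBP 0.1 True
```

A margin on normalized probabilities is easier to threshold than raw scores, whose scale varies between models and sentence lengths.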

How do I tag text files with hunpos in NLTK?

Question: Can someone help me with the syntax for hunpos-tagging a corpus in NLTK? What do I import for the hunpos.HunPosTagger module? How do I HunPos-tag the corpus? See the code below:

    import nltk
    from nltk.corpus import PlaintextCorpusReader
    from nltk.corpus.util import LazyCorpusLoader

    corpus_root = './'
    reader = PlaintextCorpusReader (corpus_root, '.*')
    ntuen = LazyCorpusLoader ('ntumultien', PlaintextCorpusReader, reader)
    ntuen.fileids()
    isinstance (ntuen, PlaintextCorpusReader)
    # So how do I
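
NLTK taggers share a tag(tokens) method that takes one tokenized sentence and returns (token, tag) pairs. NLTK's HunposTagger (in nltk.tag.hunpos) follows that interface but needs a trained hunpos model and the hunpos-tag binary. The sketch below shows only the corpus loop; SuffixTagger is a hypothetical stand-in so the example runs without hunpos, and a plain dict stands in for the reader's per-file sentences:

```python
class SuffixTagger:
    """Hypothetical stand-in with the same tag(tokens) -> [(token, tag)] shape as NLTK taggers."""

    def tag(self, tokens):
        # Toy rule only: words ending in "s" are plural nouns, everything else a noun.
        return [(tok, "NNS" if tok.endswith("s") else "NN") for tok in tokens]


def tag_corpus(sents_by_file, tagger):
    """Tag every sentence of every file; returns {fileid: [[(token, tag), ...], ...]}."""
    return {
        fileid: [tagger.tag(sent) for sent in sents]
        for fileid, sents in sents_by_file.items()
    }


# With an NLTK corpus reader this dict could be {f: reader.sents(f) for f in reader.fileids()}.
corpus = {"a.txt": [["cats", "purr"]], "b.txt": [["dogs", "bark"], ["birds"]]}
tagged = tag_corpus(corpus, SuffixTagger())
print(tagged["a.txt"])  # [[('cats', 'NNS'), ('purr', 'NN')]]
```

Because tag_corpus only relies on the tag method, a HunposTagger instance can be passed in place of SuffixTagger once the model and binary are installed.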

NLTK was unable to find stanford-postagger.jar! Set the CLASSPATH environment variable

I am working on a project that requires me to tag tokens using NLTK and Python, so I wanted to use this, but I ran into a few problems. I went through many previously asked questions and other forums but was still unable to find a solution. The problem is that when I try to execute the following:

    from nltk.tag import StanfordPOSTagger
    st = StanfordPOSTagger('english-bidirectional-distsim.tagger')

I get the following:

    Traceback (most recent call last):
      File "<pyshell#13>", line 1, in <module>
        st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
      File "C:\Users
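
NLTK looks for the Stanford jar through the CLASSPATH environment variable and for the .tagger model through STANFORD_MODELS, and it reads both when the tagger is constructed. One fix is therefore to set them from Python first. The helper below is a hypothetical sketch that only edits an environment mapping, so it runs without Java or NLTK; the paths in the comments are placeholders for wherever you unpacked the Stanford POS tagger download:

```python
import os


def add_stanford_paths(jar_path, models_dir, env=None):
    """Append the tagger jar to CLASSPATH and set STANFORD_MODELS to the models directory.

    env defaults to os.environ; pass a plain dict to try it without side effects.
    """
    env = os.environ if env is None else env
    parts = [p for p in env.get("CLASSPATH", "").split(os.pathsep) if p]
    if jar_path not in parts:
        parts.append(jar_path)
    env["CLASSPATH"] = os.pathsep.join(parts)
    env["STANFORD_MODELS"] = models_dir  # replaces any previous value
    return env


# Placeholder locations; set these BEFORE constructing the tagger:
# add_stanford_paths(r"C:\stanford-postagger\stanford-postagger.jar",
#                    r"C:\stanford-postagger\models")
# from nltk.tag import StanfordPOSTagger
# st = StanfordPOSTagger("english-bidirectional-distsim.tagger")
```

Alternatively, StanfordPOSTagger accepts the full model path as its first argument and the jar location as the path_to_jar keyword argument, which avoids environment variables altogether.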

How to NER and POS tag a pre-tokenized text with Stanford CoreNLP?

I'm using Stanford CoreNLP's Named Entity Recognizer (NER) and Part-of-Speech (POS) tagger in my application. The problem is that my code tokenizes the text beforehand, and then I need to NER-tag and POS-tag each token. However, I could only find out how to do that with command-line options, not programmatically. Can someone tell me how I can programmatically NER-tag and POS-tag pre-tokenized text using Stanford CoreNLP? Edit: I'm actually using the individual NER and POS instructions, so my code was written as instructed in the tutorials given in Stanford's NER and POS
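
CoreNLP's tokenize.whitespace property makes the tokenizer split only on whitespace, and ssplit.eolonly makes the sentence splitter break only at newlines. Text written as one sentence per line, with tokens separated by single spaces, therefore keeps your own tokenization. The Python sketch below only prepares that text and the property set; the helper name is hypothetical, and it assumes no token itself contains whitespace (it rejects such tokens):

```python
def prepare_pretokenized(sentences):
    """Serialize pre-tokenized sentences for CoreNLP's whitespace tokenizer.

    Returns (text, properties). Raises ValueError for empty tokens or tokens
    containing whitespace, since a whitespace tokenizer would split them.
    """
    lines = []
    for sent in sentences:
        for tok in sent:
            if not tok or any(ch.isspace() for ch in tok):
                raise ValueError(f"token {tok!r} is empty or contains whitespace")
        lines.append(" ".join(sent))
    properties = {
        "annotators": "tokenize,ssplit,pos,lemma,ner",  # ner needs lemma
        "tokenize.whitespace": "true",  # split tokens on whitespace only
        "ssplit.eolonly": "true",       # one sentence per line
    }
    return "\n".join(lines), properties


text, props = prepare_pretokenized([["New", "York", "is", "big", "."], ["I", "agree", "."]])
print(text)
```

In Java the same keys go into the java.util.Properties object passed to the StanfordCoreNLP constructor, and the prepared text is annotated as usual.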

Getting additional information (Active/Passive, Tenses …) from a Tagger

I'm using the Stanford Tagger to determine parts of speech. However, I want to get more information out of the text. Is it possible to get further information, such as the tense of the sentence or whether it is active or passive? So far, I'm using a very basic POS-tagging approach:

    List<List<TaggedWord>> taggedUnits = new ArrayList<List<TaggedWord>>();
    String input = "This sentence is going to be future. The door was opened.";
    for (List<HasWord> sentence : MaxentTagger.tokenizeText(new StringReader(input))) {
        taggedUnits.add(tagger.tagSentence(sentence));
    }

You can get tense
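
A POS tagger does not output tense or voice, but both can be roughly approximated from the Penn Treebank tags used by the Stanford English models. The sketch below is a rule-based heuristic, not a Stanford API: passive means a form of "be" followed, after optional adverbs, by a past participle (VBN); tense is guessed from will/shall, "be going to", or a past-tense verb (VBD). The tagged input is written by hand to mirror the question's two sentences, not taken from actual tagger output:

```python
# Heuristic only: rough voice/tense detection from Penn Treebank tags.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}
ADVERB_TAGS = {"RB", "RBR", "RBS"}


def is_passive(tagged):
    """True if a form of 'be' is followed, after optional adverbs, by a VBN."""
    for i, (word, _tag) in enumerate(tagged):
        if word.lower() not in BE_FORMS:
            continue
        j = i + 1
        while j < len(tagged) and tagged[j][1] in ADVERB_TAGS:
            j += 1
        if j < len(tagged) and tagged[j][1] == "VBN":
            return True
    return False


def rough_tense(tagged):
    """Guess 'future', 'past', 'present' or 'unknown' for one tagged sentence."""
    words = [w.lower() for w, _ in tagged]
    tags = [t for _, t in tagged]
    for i, (word, tag) in enumerate(zip(words, tags)):
        if tag == "MD" and word in {"will", "shall", "'ll"}:
            return "future"
        # "be going to" + verb, e.g. "is going to be"
        if (word == "going" and 0 < i < len(words) - 1
                and words[i - 1] in BE_FORMS and words[i + 1] == "to"):
            return "future"
    if "VBD" in tags:
        return "past"
    if "VBZ" in tags or "VBP" in tags:
        return "present"
    return "unknown"


s1 = [("This", "DT"), ("sentence", "NN"), ("is", "VBZ"), ("going", "VBG"),
      ("to", "TO"), ("be", "VB"), ("future", "JJ"), (".", ".")]
s2 = [("The", "DT"), ("door", "NN"), ("was", "VBD"), ("opened", "VBN"), (".", ".")]
print(rough_tense(s1), is_passive(s1))  # future False
print(rough_tense(s2), is_passive(s2))  # past True
```

Rules like these misfire on cases such as "He is gone" (adjectival VBN); a parser's dependency labels give a more reliable passive signal if one is available.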

How to POS_TAG a French sentence?

Question: I'm looking for a way to POS-tag a French sentence, the way the following code does for English sentences:

    def pos_tagging(sentence):
        var = sentence
        exampleArray = [var]
        for item in exampleArray:
            tokenized = nltk.word_tokenize(item)
            tagged = nltk.pos_tag(tokenized)
            return tagged

Answer 1: NLTK doesn't come with pre-built resources for French. I recommend using the Stanford tagger, which comes with a trained French model. This code shows how you might set up NLTK for use with Stanford's
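
Since nltk.pos_tag uses an English model, the fix is to swap in a tokenizer and tagger for French, such as the Stanford French model the answer recommends. The question's function can be made language-neutral by passing both in; the one-element list and loop are also unnecessary. In the sketch below, toy_tokenize, toy_tag and the tiny lexicon are hypothetical stand-ins so it runs without NLTK or Java:

```python
def pos_tagging(sentence, tokenize, tag):
    """Tokenize and tag one sentence with whatever tokenizer and tagger are passed in."""
    return tag(tokenize(sentence))


def toy_tokenize(text):
    # Hypothetical: whitespace split, then detach a sentence-final period.
    tokens = text.split()
    if tokens and tokens[-1].endswith(".") and len(tokens[-1]) > 1:
        tokens[-1] = tokens[-1][:-1]
        tokens.append(".")
    return tokens


TOY_LEXICON = {"le": "DET", "chat": "NC", "dort": "V", ".": "PUNC"}


def toy_tag(tokens):
    # Hypothetical lexicon lookup; unknown words get "UNK".
    return [(tok, TOY_LEXICON.get(tok.lower(), "UNK")) for tok in tokens]


print(pos_tagging("Le chat dort.", toy_tokenize, toy_tag))
# [('Le', 'DET'), ('chat', 'NC'), ('dort', 'V'), ('.', 'PUNC')]
```

With NLTK and the Stanford download in place, tokenize could be nltk.word_tokenize called with language="french", and tag could be the tag method of a StanfordPOSTagger built from a French .tagger model; the exact model file name depends on the download.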

C/C++ NLP library [closed]

I am looking for an open-source natural language processing library for C/C++, and I am especially interested in part-of-speech tagging.

Take a look at this POS tagger list from Stanford. Some of them are language-independent, and others are targeted at C/C++ or have specific bindings. Not present on that list, but still important in my opinion, is Citar, a free-software C++ part-of-speech tagger using a trigram hidden Markov model.

Source: https://stackoverflow.com/questions/1805099/c-c-nlp-library
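
In an HMM tagger, each word's tag depends on the preceding tags, and the most likely tag sequence is found with the Viterbi algorithm; Citar conditions on the two previous tags (a trigram model). To keep the example short, the sketch below is a simplified bigram version in which each tag depends only on the previous one. It is written in Python rather than C++, is not Citar's code, and uses tiny made-up probabilities:

```python
import math


def viterbi(words, tags, start, trans, emit):
    """Most likely tag sequence under a bigram HMM.

    start[t] = P(t at position 0), trans[p][t] = P(t | previous tag p),
    emit[t][w] = P(w | t). Missing entries count as probability 0.
    Works in log space to avoid underflow on long sentences.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t] = (log-prob of the best path ending in tag t, that path)
    best = {t: (log(start.get(t, 0)) + log(emit[t].get(words[0], 0)), [t]) for t in tags}
    for word in words[1:]:
        new_best = {}
        for t in tags:
            e = log(emit[t].get(word, 0))
            score, path = max(
                ((best[p][0] + log(trans[p].get(t, 0)) + e, best[p][1]) for p in tags),
                key=lambda x: x[0],
            )
            new_best[t] = (score, path + [t])
        best = new_best
    return max(best.values(), key=lambda x: x[0])[1]


TAGS = ["DT", "NN", "VB"]
START = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
TRANS = {
    "DT": {"NN": 0.9, "VB": 0.1},
    "NN": {"VB": 0.6, "NN": 0.3, "DT": 0.1},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
EMIT = {
    "DT": {"the": 0.9, "a": 0.1},
    "NN": {"dog": 0.5, "walks": 0.2, "park": 0.3},
    "VB": {"walks": 0.7, "dog": 0.3},
}
print(viterbi(["the", "dog", "walks"], TAGS, START, TRANS, EMIT))  # ['DT', 'NN', 'VB']
```

A trigram model such as Citar's runs the same dynamic program over pairs of previous tags, which squares the number of states but captures more context.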