pos-tagger

How to use OpenNLP to get POS tags in R?

 ̄綄美尐妖づ submitted on 2019-12-04 08:40:37
Here is the R code:

library(NLP)
library(openNLP)

tagPOS <- function(x, ...) {
  s <- as.String(x)
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags)
}

str <- "this is a the first sentence."
tagged_str <- tagPOS(str)

The output is:

tagged_str
$POStagged
[1]"this

How do I use non-integer string labels with SVM from scikit-learn? Python

冷暖自知 submitted on 2019-12-04 06:54:44
Scikit-learn has fairly user-friendly Python modules for machine learning. I am trying to train an SVM tagger for Natural Language Processing (NLP) where my labels and input data are words and annotations. For example, in part-of-speech tagging, rather than using double/integer data as input tuples such as [[1,2], [2,0]], my tuples will look like this: [['word','NOUN'], ['young', 'adjective']]. Can anyone give an example of how I can use the SVM with string tuples? The tutorial/documentation given here is for integer/double inputs: http://scikit-learn.org/stable/modules/svm.html Most machine learning algorithm
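
One common way to feed string-valued features to an SVM is to one-hot encode them first. The following is a minimal sketch of that encoding in plain Python, not the asker's or scikit-learn's code; the helper names `build_vocabulary` and `one_hot` and the `suffix` feature are made up for illustration. scikit-learn's `DictVectorizer` performs this kind of encoding, and its classifiers accept string class labels such as "NOUN" directly.

```python
def build_vocabulary(samples):
    """Map every distinct "name=value" feature string to a column index."""
    vocab = {}
    for features in samples:
        # Sort so the column order does not depend on dict insertion order.
        for name, value in sorted(features.items()):
            key = f"{name}={value}"
            if key not in vocab:
                vocab[key] = len(vocab)
    return vocab


def one_hot(features, vocab):
    """Turn one feature dict into a 0/1 vector; unseen features are ignored."""
    row = [0] * len(vocab)
    for name, value in features.items():
        idx = vocab.get(f"{name}={value}")
        if idx is not None:
            row[idx] = 1
    return row


# Hypothetical training data: one feature dict per token, one string label each.
samples = [
    {"word": "word", "suffix": "rd"},
    {"word": "young", "suffix": "ng"},
]
labels = ["NOUN", "ADJ"]

vocab = build_vocabulary(samples)
X = [one_hot(features, vocab) for features in samples]
```

The resulting `X` and `labels` could then be passed to a classifier, for example `sklearn.svm.LinearSVC().fit(X, labels)`, assuming scikit-learn is installed.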

bad zip file error in POS tagging in NLTK in python

。_饼干妹妹 submitted on 2019-12-04 06:38:42
Question: I am new to Python and NLTK. I want to do word tokenization and POS tagging. I installed NLTK 3.0 on Ubuntu 14.04, which has Python 2.7.6 by default. First I tried to tokenize a simple sentence, but I get an error saying "BadZipfile: File is not a zip file". How can I solve this? One more question: I gave the path as "/usr/share/nltk_data" when I installed the NLTK data (using the command line). Some of the packages couldn't be installed due to some errors. But it shows other
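
One possible cause of this error is that a failed download left a truncated or non-archive file with a `.zip` name in the data directory. The sketch below, using only the standard library, lists such files so they can be deleted and downloaded again. The function name `find_corrupt_zips` is hypothetical, and whether this is the actual cause in the asker's setup is an assumption.

```python
import os
import zipfile


def find_corrupt_zips(root):
    """Return sorted paths of *.zip files under root that are not valid zip archives."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".zip"):
                path = os.path.join(dirpath, name)
                # is_zipfile() checks the archive signature without extracting anything.
                if not zipfile.is_zipfile(path):
                    bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    # Path taken from the question; adjust it to wherever the NLTK data was installed.
    for path in find_corrupt_zips("/usr/share/nltk_data"):
        print("not a valid zip:", path)
```

Any file reported here can be removed and fetched again with the NLTK downloader.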

Stanford Core NLP how to get the probability & margin of error

六月ゝ 毕业季﹏ submitted on 2019-12-04 03:43:59
When using the parser, or for that matter any of the annotators in CoreNLP, is there a way to access the probability or the margin of error? To put my question into context, I am trying to understand whether there is a way to detect a case of ambiguity programmatically. For instance, in the sentence below the verb "desire" is detected as a noun. I would like to know what kind of measure I can access or calculate from the CoreNLP API to tell me there could be an ambiguity. (NP (NP (NNP Whereas)) (, ,) (NP (NNP users) (NN desire) (S (VP (TO to) (VP (VB sell)))))) Source: https://stackoverflow.com
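
If a tagger can report a probability for each candidate tag of a token, one simple ambiguity measure is the margin between the two most probable tags. The sketch below illustrates only that calculation. The input dictionary, the function names, and the 0.2 threshold are all hypothetical, and it does not show how (or whether) CoreNLP exposes such a distribution.

```python
def ambiguity_margin(tag_probs):
    """Difference between the highest and second-highest tag probability."""
    ranked = sorted(tag_probs.values(), reverse=True)
    if not ranked:
        return 0.0
    if len(ranked) == 1:
        return ranked[0]
    return ranked[0] - ranked[1]


def is_ambiguous(tag_probs, threshold=0.2):
    """Flag a token whose two best tags are closer than the (arbitrary) threshold."""
    return ambiguity_margin(tag_probs) < threshold


# Made-up distribution for the token "desire"; not real tagger output.
desire = {"NN": 0.55, "VBP": 0.40, "JJ": 0.05}
print(is_ambiguous(desire))
```

A small margin suggests the tagger had two plausible readings for the token; the threshold would need tuning on real data.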

How do I tag textfiles with hunpos in nltk?

送分小仙女□ submitted on 2019-12-04 02:33:22
Question: Can someone help me with the syntax for hunpos tagging a corpus in NLTK? What do I import for the hunpos.HunPosTagger module? How do I HunPosTag the corpus? See the code below.

import nltk
from nltk.corpus import PlaintextCorpusReader
from nltk.corpus.util import LazyCorpusLoader

corpus_root = './'
reader = PlaintextCorpusReader(corpus_root, '.*')
ntuen = LazyCorpusLoader('ntumultien', PlaintextCorpusReader, reader)
ntuen.fileids()
isinstance(ntuen, PlaintextCorpusReader)
# So how do I

NLTK was unable to find stanford-postagger.jar! Set the CLASSPATH environment variable

谁都会走 submitted on 2019-12-04 01:25:14
I am working on a project that requires me to tag tokens using NLTK and Python, so I wanted to use this, but I ran into a few problems. I went through many previously asked questions and other forums, but I was still unable to find a solution. The problem occurs when I try to execute the following: from nltk.tag import StanfordPOSTagger st = StanfordPOSTagger('english-bidirectional-distsim.tagger') I get the following: Traceback (most recent call last): `File "<pyshell#13>", line 1, in <module> st = StanfordPOSTagger('english-bidirectional-distsim.tagger')` `File "C:\Users
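
The error message in the title points at the CLASSPATH environment variable. A minimal sketch of setting it from Python before creating the tagger follows. The helper name `configure_stanford_env` and both paths are hypothetical placeholders; the sketch assumes NLTK looks up the jar through `CLASSPATH` and the model file through `STANFORD_MODELS`.

```python
import os


def configure_stanford_env(jar_path, models_dir):
    """Expose the tagger jar and model directory through environment variables."""
    existing = os.environ.get("CLASSPATH", "")
    parts = [p for p in existing.split(os.pathsep) if p]
    if jar_path not in parts:
        # Put the jar first so it takes precedence over older entries.
        parts.insert(0, jar_path)
    os.environ["CLASSPATH"] = os.pathsep.join(parts)
    os.environ["STANFORD_MODELS"] = models_dir


# Placeholder locations; replace them with the real install directory.
configure_stanford_env(
    "/opt/stanford-postagger/stanford-postagger.jar",
    "/opt/stanford-postagger/models",
)

# With the variables set, the call from the question can be retried:
# from nltk.tag import StanfordPOSTagger
# st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
```

An alternative is to pass the locations explicitly, for example through the `path_to_jar` argument of `StanfordPOSTagger` together with a full path to the model file.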

How to NER and POS tag a pre-tokenized text with Stanford CoreNLP?

半城伤御伤魂 submitted on 2019-12-03 16:43:49
I'm using Stanford's CoreNLP Named Entity Recognizer (NER) and Part-of-Speech (POS) tagger in my application. The problem is that my code tokenizes the text beforehand, and then I need to NER and POS tag each token. However, I was only able to find out how to do that using the command-line options, not programmatically. Can someone please tell me how I can programmatically NER and POS tag pre-tokenized text using Stanford's CoreNLP? Edit: I'm actually using the individual NER and POS instructions. So my code was written as instructed in the tutorials given in the Stanford's NER and POS

Getting additional information (Active/Passive, Tenses …) from a Tagger

我是研究僧i submitted on 2019-12-03 16:36:58
I'm using the Stanford Tagger to determine the parts of speech. However, I want to get more information out of the text. Is it possible to get further information, such as the tense of the sentence or whether it is in the active or passive voice? So far, I'm using the very basic PoS-tagging approach:

List<List<TaggedWord>> taggedUnits = new ArrayList<List<TaggedWord>>();
String input = "This sentence is going to be future. The door was opened.";
for (List<HasWord> sentence : MaxentTagger.tokenizeText(new StringReader(input))) {
    taggedUnits.add(tagger.tagSentence(sentence));
}

You can get tense

How to POS_TAG a french sentence?

跟風遠走 submitted on 2019-12-03 14:53:18
Question: I'm looking for a way to pos_tag a French sentence, in the same way the following code is used for English sentences:

def pos_tagging(sentence):
    var = sentence
    exampleArray = [var]
    for item in exampleArray:
        tokenized = nltk.word_tokenize(item)
        tagged = nltk.pos_tag(tokenized)
    return tagged

Answer 1: NLTK doesn't come with pre-built resources for French. I recommend using the Stanford tagger, which comes with a trained French model. This code shows how you might set up the nltk for use with Stanford's

c/c++ NLP library [closed]

北战南征 submitted on 2019-12-03 08:12:00
I am looking for an open source Natural Language Processing library for C/C++, and I am especially interested in part-of-speech tagging. Take a look at this POS Tagger list from Stanford. Some of them are language independent and others are targeted at C/C++ or have specific bindings. Not present on that list, but still important in my opinion, is Citar, a free software C++ part-of-speech tagger using a trigram Hidden Markov Model. Source: https://stackoverflow.com/questions/1805099/c-c-nlp-library