pos-tagger

Error in Stanford POS Tagger

和自甴很熟 submitted on 2019-12-25 07:10:06
Question: Hello, I am trying to POS-tag a sentence using the Stanford POS Tagger. I am using Python 3.4 and nltk 3.1 on Windows 7. The following is the code I used:

import nltk
from nltk.tag.stanford import POSTagger
import os
java_path = r"C:\Program Files\Java\jre1.8.0_66\bin\java.exe"
os.environ['JAVAHOME'] = java_path
St = POSTagger(r"C:\Python34\Scripts\stanford-postagger-2015-12-09\models\english-bidirectional-distsim.tagger", r"C:\Python34\Scripts\stanford-postagger-2015-12-09\stanford-postagger
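A minimal sketch of the same setup, assuming nltk 3.1 (where the wrapper class is StanfordPOSTagger rather than the older POSTagger name) and assuming the tagger archive is unpacked at the paths shown; adjust both paths to your own layout:

import os
from nltk.tag.stanford import StanfordPOSTagger

# The original question sets JAVAHOME to the full path of java.exe, which nltk accepts on Windows.
os.environ['JAVAHOME'] = r"C:\Program Files\Java\jre1.8.0_66\bin\java.exe"

# Placeholder locations for the unpacked Stanford tagger distribution.
model = r"C:\stanford-postagger-2015-12-09\models\english-bidirectional-distsim.tagger"
jar = r"C:\stanford-postagger-2015-12-09\stanford-postagger.jar"

st = StanfordPOSTagger(model, jar)
print(st.tag("What is the airspeed of an unladen swallow ?".split()))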

List of part-of-speech tags per sentence with the POS Tagger of Stanford NLP in C#

坚强是说给别人听的谎言 submitted on 2019-12-24 15:08:21
Question: Using the POS Tagger of Stanford NLP .NET, I'm trying to extract a detailed list of part-of-speech tags per sentence, e.g.:

"Have a look over there. Look at the car!"
Have/VB a/DT look/NN over/IN there/RB ./. Look/VB at/IN the/DT car/NN !/.

I need:
POS text: "Have"
POS tag: "VB"
Position in the original text

I managed to achieve this by accessing the private fields of the result via reflection. I know it's ugly, inefficient, and very bad, but that's the only way I found until now. Hence my

What is the accuracy of nltk pos_tagger?

丶灬走出姿态 submitted on 2019-12-24 05:30:34
Question: I'm writing a dissertation and using nltk.pos_tag in my work. I can't find any information about the accuracy of this algorithm. Does anybody know where I can find such information?

Answer 1: NLTK's default POS tagger, pos_tag, is a MaxEnt tagger; see line 82 of https://github.com/nltk/nltk/blob/develop/nltk/tag/__init__.py

from nltk.corpus import brown
from nltk.data import load
sents = brown.tagged_sents()
# test on last 10% of brown corpus.
numtest = len(sents) / 10
testsents = sents[numtest
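A rough sketch of the approach the answer outlines: hold out the last 10% of the Brown corpus and score the default tagger on it. The pickle path below is what older NLTK releases shipped as the default (newer releases use an averaged-perceptron tagger instead), and Brown uses a different tagset than the Penn Treebank, so the resulting number is indicative only.

from nltk.corpus import brown
from nltk.data import load

sents = brown.tagged_sents()
numtest = len(sents) // 10          # hold out the last 10% of Brown as a test set
testsents = sents[-numtest:]

tagger = load('taggers/maxent_treebank_pos_tagger/english.pickle')
print(tagger.evaluate(testsents))   # accuracy against the Brown gold tags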

Not able to tag a Hindi sentence properly

浪尽此生 submitted on 2019-12-24 04:43:09
Question: I have recently started a project on Hindi data processing. I have tried executing the code below but have not got the expected output.

e = u"पूर्ण प्रतिबंध हटाओ : इराक"
tokens = nltk.word_tokenize(e)
from nltk import pos_tag
print tokens
tag = nltk.pos_tag(tokens)
print tag

The output I obtained is shown below:

[u'\u092a\u0942\u0930\u094d\u0923', u'\u092a\u094d\u0930\u0924\u093f\u092c\u0902\u0927', u'\u0939\u091f\u093e\u0913', u':', u'\u0907\u0930\u093e\u0915'] [(u'\u092a\u0942\u0930
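A hedged sketch of one common workaround (not the original poster's code): nltk.pos_tag uses a model trained on English, so Hindi tokens come back without meaningful tags. NLTK's 'indian' corpus (downloaded via nltk.download('indian')) ships tagged Hindi sentences in hindi.pos, which can be used to train a simple tagger; the 90/10 split below is an arbitrary choice.

from nltk.corpus import indian
from nltk.tag import tnt

tagged_sents = list(indian.tagged_sents('hindi.pos'))
split = int(len(tagged_sents) * 0.9)      # arbitrary 90/10 train/test split
train_sents, test_sents = tagged_sents[:split], tagged_sents[split:]

tagger = tnt.TnT()
tagger.train(train_sents)
print(tagger.evaluate(test_sents))        # rough accuracy on the held-out 10%
print(tagger.tag(u"पूर्ण प्रतिबंध हटाओ : इराक".split()))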

Exception in thread “main” java.lang.NullPointerException at opennlp.tools.postag.POSTaggerME.train()

笑着哭i submitted on 2019-12-23 05:31:20
Question: I have the same problem! I get InputStream = null. I am using IntelliJ IDEA and OpenNLP 1.9.1 on Ubuntu 18.04.

public void makeDataTrainingModel() {
    model = null;
    System.out.println("POS model started");
    //InputStream dataIn = null;
    InputStreamFactory dataIn = null;
    try {
        dataIn = new InputStreamFactory() {
            public InputStream createInputStream() throws IOException {
                return NLPClassifier.class.getResourceAsStream("/home/int/src /main/resources/en-pos.txt");
            }
        };
        //I get null pointer here in dataIn

Score each sentence in a line based upon a tag and summarize the text. (Java)

a 夏天 submitted on 2019-12-23 00:23:28
Question: I'm trying to create a summarizer in Java. I'm using the Stanford Log-linear Part-Of-Speech Tagger to tag the words; then, for certain tags, I score the sentence, and finally, in the summary, I print the sentences with a high score value. Here's the code:

MaxentTagger tagger = new MaxentTagger("taggers/bidirectional-distsim-wsj-0-18.tagger");
BufferedReader reader = new BufferedReader(new FileReader("C:\\Summarizer\\src\\summarizer\\testing\\testingtext.txt"));
String line = null;

NLTK was unable to find stanford-postagger.jar! Set the CLASSPATH environment variable

杀马特。学长 韩版系。学妹 submitted on 2019-12-21 07:49:06
Question: I am working on a project that requires me to tag tokens using nltk and Python, so I wanted to use this, but I ran into a few problems. I went through a lot of already-asked questions and other forums but was still unable to find a solution. The problem is that when I try to execute the following:

from nltk.tag import StanfordPOSTagger
st = StanfordPOSTagger('english-bidirectional-distsim.tagger')

I get the following:

Traceback (most recent call last):
File "<pyshell#13>"
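A minimal sketch of one way to satisfy the error in the title, assuming a Linux-style install directory (the path below is a placeholder; point it at wherever the Stanford tagger was extracted). NLTK looks up stanford-postagger.jar via the CLASSPATH environment variable and the .tagger model via STANFORD_MODELS; passing explicit paths to the constructor works as well.

import os
from nltk.tag import StanfordPOSTagger

stanford_dir = '/opt/stanford-postagger-full-2015-12-09'              # placeholder: where the archive was unpacked
os.environ['CLASSPATH'] = stanford_dir                                # directory containing stanford-postagger.jar
os.environ['STANFORD_MODELS'] = os.path.join(stanford_dir, 'models')  # directory containing the .tagger models

st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
print(st.tag('What is the airspeed of an unladen swallow ?'.split()))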

Does anyone know how to configure the hunpos wrapper class on nltk?

删除回忆录丶 submitted on 2019-12-20 01:08:29
Question: I've tried the following code. I installed english-wsj-1.0 and hunpos-1.0-linux.tgz from http://code.google.com/p/hunpos/downloads/list, extracted the files into the '~/' directory, and then tried the following Python code:

import nltk
from nltk.tag import hunpos
from nltk.tag.hunpos import HunposTagger
import os, sys, re, glob

cwd = os.getcwd()
for infile in glob.glob(os.path.join(cwd, '*.txt')):
    (PATH, FILENAME) = os.path.split(infile)
    read = open(infile)
    ht = HunposTagger('english.model')
    ht
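A hedged sketch of one way to configure the wrapper, assuming both archives were unpacked under the home directory (both paths below are assumptions; adjust them to your layout). Passing the path to the hunpos-tag binary explicitly avoids depending on environment-variable lookup.

import os
from nltk.tag.hunpos import HunposTagger

home = os.path.expanduser('~')
ht = HunposTagger(
    os.path.join(home, 'english.model'),                    # model from the english-wsj-1.0 download
    os.path.join(home, 'hunpos-1.0-linux', 'hunpos-tag'),   # the hunpos-tag executable
)
print(ht.tag('What is the airspeed of an unladen swallow ?'.split()))
ht.close()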

How to NER and POS tag a pre-tokenized text with Stanford CoreNLP?

陌路散爱 submitted on 2019-12-18 17:33:11
Question: I'm using Stanford's CoreNLP Named Entity Recognizer (NER) and Part-of-Speech (POS) tagger in my application. The problem is that my code tokenizes the text beforehand, and then I need to NER- and POS-tag each token. However, I was only able to find out how to do that using the command-line options, not programmatically. Can someone please tell me how I can programmatically NER- and POS-tag pre-tokenized text using Stanford's CoreNLP? Edit: I'm actually using the individual NER and POS
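One hedged, programmatic route (an illustration, not necessarily the poster's setup): if a CoreNLP server is running locally, sending it the properties tokenize.whitespace and ssplit.eolonly makes CoreNLP accept the caller's own tokenization (tokens separated by spaces, one sentence per line). The host and port below are assumptions.

import json
import urllib.parse
import urllib.request

props = {
    "annotators": "tokenize,ssplit,pos,ner",
    "tokenize.whitespace": "true",   # keep the pre-tokenized tokens, splitting on spaces only
    "ssplit.eolonly": "true",        # treat each input line as one sentence
    "outputFormat": "json",
}
url = "http://localhost:9000/?properties=" + urllib.parse.quote(json.dumps(props))
text = "John Donk works for POI .\nBrian Jones wants to meet with Xyz Corp ."

with urllib.request.urlopen(url, data=text.encode("utf-8")) as resp:
    doc = json.loads(resp.read().decode("utf-8"))

for sentence in doc["sentences"]:
    print([(tok["word"], tok["pos"], tok["ner"]) for tok in sentence["tokens"]])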

nltk StanfordNERTagger : How to get proper nouns without capitalization

邮差的信 submitted on 2019-12-18 13:35:08
Question: I am trying to use the StanfordNERTagger and nltk to extract keywords from a piece of text.

docText = "John Donk works for POI. Brian Jones wants to meet with Xyz Corp. for measuring POI's Short Term performance Metrics."
words = re.split("\W+", docText)
stops = set(stopwords.words("english"))
# remove stop words from the list
words = [w for w in words if w not in stops and len(w) > 2]
str = " ".join(words)
print str
stn = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
stp =
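A hedged sketch addressing the lowercase issue (not from the original post): Stanford distributes "caseless" NER models, e.g. english.all.3class.caseless.distsim.crf.ser.gz in a separate caseless-models download, which are trained not to rely on capitalization. Running the tagger over the original sentence, rather than a stopword-stripped string, also tends to help. The paths below are assumptions.

from nltk.tag import StanfordNERTagger

st = StanfordNERTagger(
    '/opt/stanford-ner/classifiers/english.all.3class.caseless.distsim.crf.ser.gz',  # caseless model (separate download)
    '/opt/stanford-ner/stanford-ner.jar',
)
docText = ("john donk works for POI. brian jones wants to meet with xyz corp. "
           "for measuring POI's short term performance metrics.")
print(st.tag(docText.split()))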