pos-tagger

Error in Stanford POS Tagger

和自甴很熟 submitted on 2019-12-25 07:10:06
Question: Hello, I am trying to POS-tag a sentence using the Stanford POS Tagger. I am using Python 3.4 and nltk 3.1 on Windows 7. The following is the code I used:

import nltk
from nltk.tag.stanford import POSTagger
import os
java_path = r"C:\Program Files\Java\jre1.8.0_66\bin\java.exe"
os.environ['JAVAHOME'] = java_path
St = POSTagger(r"C:\Python34\Scripts\stanford-postagger-2015-12-09\models\english-bidirectional-distsim.tagger", r"C:\Python34\Scripts\stanford-postagger-2015-12-09\stanford-postagger
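A minimal sketch of the same setup, assuming nltk 3.1 (where the wrapper class is StanfordPOSTagger rather than the older POSTagger name) and assuming the tagger archive is unpacked at the paths shown; adjust both paths to your own layout:

import os
from nltk.tag.stanford import StanfordPOSTagger

# The original question sets JAVAHOME to the full path of java.exe, which nltk accepts on Windows.
os.environ['JAVAHOME'] = r"C:\Program Files\Java\jre1.8.0_66\bin\java.exe"

# Placeholder locations for the unpacked Stanford tagger distribution.
model = r"C:\stanford-postagger-2015-12-09\models\english-bidirectional-distsim.tagger"
jar = r"C:\stanford-postagger-2015-12-09\stanford-postagger.jar"

st = StanfordPOSTagger(model, jar)
print(st.tag("What is the airspeed of an unladen swallow ?".split()))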

List of part-of-speech tags per sentence with the POS Tagger of Stanford NLP in C#

坚强是说给别人听的谎言 submitted on 2019-12-24 15:08:21
Question: Using the POS Tagger of Stanford NLP .NET, I'm trying to extract a detailed list of part-of-speech tags per sentence, e.g.:

"Have a look over there. Look at the car!"
Have/VB a/DT look/NN over/IN there/RB ./. Look/VB at/IN the/DT car/NN !/.

I need:
POS text: "Have"
POS tag: "VB"
Position in the original text

I managed to achieve this by accessing the private fields of the result via reflection. I know it's ugly, inefficient, and very bad, but that's the only way I found until now. Hence my

What is the accuracy of nltk pos_tagger?

丶灬走出姿态 submitted on 2019-12-24 05:30:34
Question: I'm writing a dissertation and using nltk.pos_tag in my work. I can't find any information about the accuracy of this algorithm. Does anybody know where I can find such information?

Answer 1: NLTK's default POS tagger, pos_tag, is a MaxEnt tagger; see line 82 of https://github.com/nltk/nltk/blob/develop/nltk/tag/__init__.py

from nltk.corpus import brown
from nltk.data import load
sents = brown.tagged_sents()
# test on last 10% of brown corpus.
numtest = len(sents) / 10
testsents = sents[numtest
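A rough sketch of the approach the answer outlines: hold out the last 10% of the Brown corpus and score the default tagger on it. The pickle path below is what older NLTK releases shipped as the default (newer releases use an averaged-perceptron tagger instead), and Brown uses a different tagset than the Penn Treebank, so the resulting number is indicative only.

from nltk.corpus import brown
from nltk.data import load

sents = brown.tagged_sents()
numtest = len(sents) // 10          # hold out the last 10% of Brown as a test set
testsents = sents[-numtest:]

tagger = load('taggers/maxent_treebank_pos_tagger/english.pickle')
print(tagger.evaluate(testsents))   # accuracy against the Brown gold tags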

Not able to tag a Hindi sentence properly

浪尽此生 submitted on 2019-12-24 04:43:09
Question: I have recently started a project on Hindi data processing. I have tried executing the code below but have not got the expected output.

e = u"पूर्ण प्रतिबंध हटाओ : इराक"
tokens = nltk.word_tokenize(e)
from nltk import pos_tag
print tokens
tag = nltk.pos_tag(tokens)
print tag

The output I obtained is shown below:

[u'\u092a\u0942\u0930\u094d\u0923', u'\u092a\u094d\u0930\u0924\u093f\u092c\u0902\u0927', u'\u0939\u091f\u093e\u0913', u':', u'\u0907\u0930\u093e\u0915'] [(u'\u092a\u0942\u0930
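A hedged sketch of one common workaround (not the original poster's code): nltk.pos_tag uses a model trained on English, so Hindi tokens come back without meaningful tags. NLTK's 'indian' corpus (downloaded via nltk.download('indian')) ships tagged Hindi sentences in hindi.pos, which can be used to train a simple tagger; the 90/10 split below is an arbitrary choice.

from nltk.corpus import indian
from nltk.tag import tnt

tagged_sents = list(indian.tagged_sents('hindi.pos'))
split = int(len(tagged_sents) * 0.9)      # arbitrary 90/10 train/test split
train_sents, test_sents = tagged_sents[:split], tagged_sents[split:]

tagger = tnt.TnT()
tagger.train(train_sents)
print(tagger.evaluate(test_sents))        # rough accuracy on the held-out 10%
print(tagger.tag(u"पूर्ण प्रतिबंध हटाओ : इराक".split()))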

Exception in thread “main” java.lang.NullPointerException at opennlp.tools.postag.POSTaggerME.train()

笑着哭i submitted on 2019-12-23 05:31:20
Question: I have the same problem! I get InputStream = null. I am using IntelliJ IDEA and OpenNLP 1.9.1 on Ubuntu 18.04.

public void makeDataTrainingModel() {
    model = null;
    System.out.println("POS model started");
    //InputStream dataIn = null;
    InputStreamFactory dataIn = null;
    try {
        dataIn = new InputStreamFactory() {
            public InputStream createInputStream() throws IOException {
                return NLPClassifier.class.getResourceAsStream("/home/int/src /main/resources/en-pos.txt");
            }
        };
        //I get null pointer here in dataIn

Score each sentence in a line based upon a tag and summarize the text. (Java)

a 夏天 submitted on 2019-12-23 00:23:28
Question: I'm trying to create a summarizer in Java. I'm using the Stanford Log-linear Part-Of-Speech Tagger to tag the words; then, for certain tags, I score the sentence, and finally, in the summary, I print the sentences with a high score value. Here's the code:

MaxentTagger tagger = new MaxentTagger("taggers/bidirectional-distsim-wsj-0-18.tagger");
BufferedReader reader = new BufferedReader(new FileReader("C:\\Summarizer\\src\\summarizer\\testing\\testingtext.txt"));
String line = null;

NLTK was unable to find stanford-postagger.jar! Set the CLASSPATH environment variable

杀马特。学长 韩版系。学妹 submitted on 2019-12-21 07:49:06
Question: I am working on a project that requires me to tag tokens using nltk and Python, so I wanted to use this, but I ran into a few problems. I went through a lot of already-asked questions and other forums but was still unable to find a solution. The problem is that when I try to execute the following:

from nltk.tag import StanfordPOSTagger
st = StanfordPOSTagger('english-bidirectional-distsim.tagger')

I get the following:

Traceback (most recent call last):
File "<pyshell#13>"
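A minimal sketch of one way to satisfy the error in the title, assuming a Linux-style install directory (the path below is a placeholder; point it at wherever the Stanford tagger was extracted). NLTK looks up stanford-postagger.jar via the CLASSPATH environment variable and the .tagger model via STANFORD_MODELS; passing explicit paths to the constructor works as well.

import os
from nltk.tag import StanfordPOSTagger

stanford_dir = '/opt/stanford-postagger-full-2015-12-09'              # placeholder: where the archive was unpacked
os.environ['CLASSPATH'] = stanford_dir                                # directory containing stanford-postagger.jar
os.environ['STANFORD_MODELS'] = os.path.join(stanford_dir, 'models')  # directory containing the .tagger models

st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
print(st.tag('What is the airspeed of an unladen swallow ?'.split()))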

Does anyone know how to configure the hunpos wrapper class on nltk?

删除回忆录丶 submitted on 2019-12-20 01:08:29
Question: I've tried the following code. I installed english-wsj-1.0 and hunpos-1.0-linux.tgz from http://code.google.com/p/hunpos/downloads/list, extracted the files into the '~/' directory, and then tried the following Python code:

import nltk
from nltk.tag import hunpos
from nltk.tag.hunpos import HunposTagger
import os, sys, re, glob

cwd = os.getcwd()
for infile in glob.glob(os.path.join(cwd, '*.txt')):
    (PATH, FILENAME) = os.path.split(infile)
    read = open(infile)
    ht = HunposTagger('english.model')
    ht
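A hedged sketch of one way to configure the wrapper, assuming both archives were unpacked under the home directory (both paths below are assumptions; adjust them to your layout). Passing the path to the hunpos-tag binary explicitly avoids depending on environment-variable lookup.

import os
from nltk.tag.hunpos import HunposTagger

home = os.path.expanduser('~')
ht = HunposTagger(
    os.path.join(home, 'english.model'),                    # model from the english-wsj-1.0 download
    os.path.join(home, 'hunpos-1.0-linux', 'hunpos-tag'),   # the hunpos-tag executable
)
print(ht.tag('What is the airspeed of an unladen swallow ?'.split()))
ht.close()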

How to NER and POS tag a pre-tokenized text with Stanford CoreNLP?

陌路散爱 submitted on 2019-12-18 17:33:11
Question: I'm using Stanford's CoreNLP Named Entity Recognizer (NER) and Part-of-Speech (POS) tagger in my application. The problem is that my code tokenizes the text beforehand, and then I need to NER- and POS-tag each token. However, I was only able to find out how to do that using the command-line options, not programmatically. Can someone please tell me how I can programmatically NER- and POS-tag pre-tokenized text using Stanford's CoreNLP? Edit: I'm actually using the individual NER and POS
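One hedged, programmatic route (an illustration, not necessarily the poster's setup): if a CoreNLP server is running locally, sending it the properties tokenize.whitespace and ssplit.eolonly makes CoreNLP accept the caller's own tokenization (tokens separated by spaces, one sentence per line). The host and port below are assumptions.

import json
import urllib.parse
import urllib.request

props = {
    "annotators": "tokenize,ssplit,pos,ner",
    "tokenize.whitespace": "true",   # keep the pre-tokenized tokens, splitting on spaces only
    "ssplit.eolonly": "true",        # treat each input line as one sentence
    "outputFormat": "json",
}
url = "http://localhost:9000/?properties=" + urllib.parse.quote(json.dumps(props))
text = "John Donk works for POI .\nBrian Jones wants to meet with Xyz Corp ."

with urllib.request.urlopen(url, data=text.encode("utf-8")) as resp:
    doc = json.loads(resp.read().decode("utf-8"))

for sentence in doc["sentences"]:
    print([(tok["word"], tok["pos"], tok["ner"]) for tok in sentence["tokens"]])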

nltk StanfordNERTagger : How to get proper nouns without capitalization

邮差的信 submitted on 2019-12-18 13:35:08
Question: I am trying to use the StanfordNERTagger and nltk to extract keywords from a piece of text.

docText = "John Donk works for POI. Brian Jones wants to meet with Xyz Corp. for measuring POI's Short Term performance Metrics."
words = re.split("\W+", docText)
stops = set(stopwords.words("english"))
# remove stop words from the list
words = [w for w in words if w not in stops and len(w) > 2]
str = " ".join(words)
print str
stn = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
stp =
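A hedged sketch addressing the lowercase issue (not from the original post): Stanford distributes "caseless" NER models, e.g. english.all.3class.caseless.distsim.crf.ser.gz in a separate caseless-models download, which are trained not to rely on capitalization. Running the tagger over the original sentence, rather than a stopword-stripped string, also tends to help. The paths below are assumptions.

from nltk.tag import StanfordNERTagger

st = StanfordNERTagger(
    '/opt/stanford-ner/classifiers/english.all.3class.caseless.distsim.crf.ser.gz',  # caseless model (separate download)
    '/opt/stanford-ner/stanford-ner.jar',
)
docText = ("john donk works for POI. brian jones wants to meet with xyz corp. "
           "for measuring POI's short term performance metrics.")
print(st.tag(docText.split()))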