stanford-nlp

Train a model using Named Entity Recognition

Submitted by ℡╲_俬逩灬. on 2019-12-01 00:22:34
I am looking at Stanford CoreNLP, using the Named Entity Recognizer. I have different kinds of input text and I need to tag them with my own entities, so I started training my own model, but it doesn't seem to be working. For example, my input text string is "Book of 49 Magazine Articles on Toyota Land Cruiser 1956-1987 Gold Portfolio http://t.co/EqxmY1VmLg http://t.co/F0Vefuoj9Q". I went through the examples to train my own models and look for only some words that I am interested in. My jane-austen-emma-ch1.tsv looks like this: Toyota PERS Land Cruiser PERS From the above input text I am only interested
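For orientation, a minimal sketch of how such a custom NER model is typically trained with CRFClassifier is shown below. The file names, feature flags, and column mapping are assumptions based on the excerpt, not the asker's actual configuration.

    import java.util.Properties;
    import edu.stanford.nlp.ie.crf.CRFClassifier;
    import edu.stanford.nlp.ling.CoreLabel;

    public class TrainNerModel {
        public static void main(String[] args) throws Exception {
            // Illustrative training properties; paths and feature flags are placeholders.
            Properties props = new Properties();
            props.setProperty("trainFile", "jane-austen-emma-ch1.tsv"); // one token<TAB>label per line
            props.setProperty("serializeTo", "my-ner-model.ser.gz");
            props.setProperty("map", "word=0,answer=1");                // column layout of the TSV
            props.setProperty("useClassFeature", "true");
            props.setProperty("useWord", "true");
            props.setProperty("useNGrams", "true");
            props.setProperty("maxNGramLeng", "6");

            CRFClassifier<CoreLabel> crf = new CRFClassifier<>(props);
            crf.train();                                                // learn weights from trainFile
            crf.serializeClassifier("my-ner-model.ser.gz");             // write the model to disk
        }
    }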

Error in creating the StanfordCoreNLP object

Submitted by 三世轮回 on 2019-11-30 21:37:40
I have downloaded and installed the required jar files from http://nlp.stanford.edu/software/corenlp.shtml#Download . I have included five jar files: stanford-postagger.jar, stanford-postagger-3.3.1.jar, stanford-postagger-3.3.1-javadoc.jar, stanford-postagger-3.3.1-src.jar, and stanford-corenlp-3.3.1.jar. The code is: public class lemmafirst { protected StanfordCoreNLP pipeline; public lemmafirst() { // Create StanfordCoreNLP object properties, with POS tagging // (required for lemmatization), and lemmatization Properties props; props = new Properties(); props.put("annotators", "tokenize,
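For reference, a self-contained version of such a lemmatization pipeline might look like the sketch below. The annotator list and sample sentence are assumptions filled in from the truncated excerpt.

    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;

    public class LemmaFirst {
        public static void main(String[] args) {
            // POS tagging is required before lemmatization.
            Properties props = new Properties();
            props.put("annotators", "tokenize, ssplit, pos, lemma");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            Annotation document = new Annotation("Cars were driven quickly.");
            pipeline.annotate(document);

            for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
                for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                    System.out.println(token.word() + " -> " + token.get(CoreAnnotations.LemmaAnnotation.class));
                }
            }
        }
    }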

Display Stanford NER confidence score

Submitted by ↘锁芯ラ on 2019-11-30 21:01:23
Question: I'm extracting named entities from news articles with the Stanford NER CRFClassifier, and in order to implement active learning I would like to know the confidence scores of the classes for each labelled entity. Example of display: LOCATION(0.20) PERSON(0.10) ORGANIZATION(0.60) MISC(0.10). Here is my code for extracting named entities from a text: AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifierNoExceptions(classifier_path); String annotatedText
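One way to obtain per-label marginal probabilities from a CRFClassifier is through its clique-tree API; the sketch below is a best-effort illustration of that approach, and the model path and sample sentence are assumptions.

    import java.util.List;
    import edu.stanford.nlp.ie.crf.CRFClassifier;
    import edu.stanford.nlp.ie.crf.CRFCliqueTree;
    import edu.stanford.nlp.ling.CoreLabel;

    public class NerConfidence {
        public static void main(String[] args) throws Exception {
            // Serialized CRF model path is a placeholder.
            CRFClassifier<CoreLabel> crf =
                    CRFClassifier.getClassifier("english.all.3class.distsim.crf.ser.gz");

            String text = "Barack Obama visited Paris with Google executives.";
            for (List<CoreLabel> sentence : crf.classify(text)) {
                CRFCliqueTree<String> cliqueTree = crf.getCliqueTree(sentence);
                for (int i = 0; i < cliqueTree.length(); i++) {
                    System.out.print(sentence.get(i).word() + ":");
                    // Marginal probability of each possible label at position i.
                    for (String label : crf.classIndex) {
                        double prob = cliqueTree.prob(i, crf.classIndex.indexOf(label));
                        System.out.printf(" %s(%.2f)", label, prob);
                    }
                    System.out.println();
                }
            }
        }
    }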

Concurrent processing using Stanford CoreNLP (3.5.2)

Submitted by 柔情痞子 on 2019-11-30 20:49:10
I'm facing a concurrency problem when annotating multiple sentences simultaneously. It's unclear to me whether I'm doing something wrong or whether there is a bug in CoreNLP. My goal is to annotate sentences with the pipeline "tokenize, ssplit, pos, lemma, ner, parse, dcoref" using several threads running in parallel. Each thread allocates its own instance of StanfordCoreNLP and then uses it for annotation. The problem is that at some point an exception is thrown: java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) at java.util
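An alternative pattern is to build a single pipeline and let CoreNLP parallelize internally via its "threads" property rather than sharing work across hand-made threads. The sketch below is illustrative only; whether it sidesteps the specific exception seen in 3.5.2 would need testing, and the thread count and sample texts are assumptions.

    import java.util.Arrays;
    import java.util.List;
    import java.util.Properties;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class ParallelAnnotation {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
            // One pipeline that parallelizes internally; the thread count is illustrative.
            props.setProperty("threads", "4");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            List<String> texts = Arrays.asList(
                    "John met Mary. He smiled.",
                    "The cat sat on the mat because it was warm.");
            for (String text : texts) {
                Annotation document = new Annotation(text);
                pipeline.annotate(document);   // annotators may use the configured worker threads
                System.out.println(document);
            }
        }
    }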

What do the abbreviations in POS tagging etc. mean?

Submitted by 醉酒当歌 on 2019-11-30 19:28:55
Say I have the following Penn tree: (S (NP-SBJ the steel strike) (VP lasted (ADVP-TMP (ADVP much longer) (SBAR than (S (NP-SBJ he) (VP anticipated (SBAR *?*)))))) .) What do abbreviations like VP and SBAR mean? Where can I find their definitions? What are these abbreviations called? Answer (bdk): Those are Penn Treebank tags; for example, VP means "Verb Phrase". The full list of Penn Treebank POS tags (the so-called tagset), including examples, can be found at https://www.sketchengine.eu/penn-treebank-tagset/ If you are interested in detailed information on a POS tag or POS

How do I include more than one classifier when using the Stanford Named Entity Recognizer?

Submitted by 本秂侑毒 on 2019-11-30 16:08:33
Question: I run the following command to start the NER server: java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier ner-model.ser.gz -port 8081 -outputFormat inlineXML Here I use a classifier (ner-model.ser.gz) that I created manually. I want to use the default classifier english.muc.7class.distsim.crf.ser.gz (supplied with the distribution) along with the one I created. I tried the following command: java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier classifiers
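If applying both models from Java code is an option, one possible route is NERClassifierCombiner, which chains several serialized classifiers. The sketch below is a best-effort illustration; the file paths are taken from the question, and the sample text is an assumption.

    import edu.stanford.nlp.ie.NERClassifierCombiner;

    public class CombinedNer {
        public static void main(String[] args) throws Exception {
            // Chain a custom model with a distributed 7-class model; classifiers are
            // applied in order, and later ones label tokens the earlier ones left untagged.
            NERClassifierCombiner combiner = new NERClassifierCombiner(
                    "ner-model.ser.gz",
                    "english.muc.7class.distsim.crf.ser.gz");

            String text = "Toyota shipped the Land Cruiser to London in 1987.";
            System.out.println(combiner.classifyToString(text, "inlineXML", true));
        }
    }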

Stanford CoreNLP OpenIE annotator

Submitted by 你离开我真会死。 on 2019-11-30 16:03:25
I have a question regarding the Stanford CoreNLP OpenIE annotator. I am using Stanford CoreNLP version stanford-corenlp-full-2015-12-09 in order to extract relations using OpenIE. I don't know much Java, which is why I am using the pycorenlp wrapper for Python 3.4. I want to extract relations between all words of a sentence; below is the code I used. I am also interested in showing the confidence of each triplet: import nltk from pycorenlp import * import collections nlp=StanfordCoreNLP("http://localhost:9000/") s="Twenty percent electric motors are pulled from an assembly line" output = nlp.annotate
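Since the underlying annotator is Java, here is a hedged Java sketch (rather than pycorenlp) of pulling OpenIE triples together with their confidences; the annotator chain shown is the usual set of OpenIE prerequisites, and the sentence is taken from the question.

    import java.util.Properties;
    import edu.stanford.nlp.ie.util.RelationTriple;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.naturalli.NaturalLogicAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;

    public class OpenIeTriples {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            Annotation doc = new Annotation("Twenty percent electric motors are pulled from an assembly line");
            pipeline.annotate(doc);

            for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
                for (RelationTriple triple : sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class)) {
                    // Each triple carries a confidence score alongside subject/relation/object.
                    System.out.println(triple.confidence + "\t"
                            + triple.subjectGloss() + "\t"
                            + triple.relationGloss() + "\t"
                            + triple.objectGloss());
                }
            }
        }
    }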

Can't get the Stanford POS tagger working in NLTK

Submitted by 谁说我不能喝 on 2019-11-30 15:48:45
Question: I'm trying to work with the Stanford POS tagger within NLTK. I'm using the example shown here: http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanford I'm able to load everything smoothly: >>> import os >>> from nltk.tag import StanfordPOSTagger >>> os.environ['STANFORD_MODELS'] = '/path/to/stanford/folder/models' >>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger', path_to_jar='/path/to/stanford/folder/stanford-postagger.jar') but on the first execution: >>> st.tag('What is
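For comparison, the same tagger can be exercised directly from Java via MaxentTagger, which helps confirm that the jar and model themselves work; the paths below are placeholders, not the asker's actual layout.

    import edu.stanford.nlp.tagger.maxent.MaxentTagger;

    public class TagSentence {
        public static void main(String[] args) {
            // Model path inside the stanford-postagger distribution is a placeholder.
            MaxentTagger tagger = new MaxentTagger("models/english-bidirectional-distsim.tagger");

            // tagString tokenizes and tags plain text, returning word_TAG pairs.
            System.out.println(tagger.tagString("What is the airspeed of an unladen swallow ?"));
        }
    }

Note also that NLTK's StanfordPOSTagger.tag() expects a list of tokens rather than a raw string, e.g. st.tag('What is the airspeed of an unladen swallow ?'.split()).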

Simplifying the French POS Tag Set with NLTK

Submitted by 梦想的初衷 on 2019-11-30 15:40:17
Question: How can one simplify the part-of-speech tags returned by Stanford's French POS tagger? It is fairly easy to read an English sentence into NLTK, find each word's part of speech, and then use map_tag() to simplify the tag set: #!/usr/bin/python # -*- coding: utf-8 -*- import os from nltk.tag.stanford import POSTagger from nltk.tokenize import word_tokenize from nltk.tag import map_tag # set java_home path from within the script. Run os.getenv("JAVA_HOME") to test java_home os.environ["JAVA_HOME"] = "C:\
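NLTK's map_tag() has no direct Java counterpart, but the same idea reduces to a plain lookup table from fine-grained tags to a coarse set. The sketch below only illustrates that idea; the French tag names used are illustrative examples, not an authoritative listing of the French tagger's tagset.

    import java.util.HashMap;
    import java.util.Map;

    public class SimplifyFrenchTags {
        // Illustrative mapping from fine-grained French tags to a coarse, universal-style set.
        private static final Map<String, String> COARSE = new HashMap<>();
        static {
            COARSE.put("NC", "NOUN");
            COARSE.put("NPP", "NOUN");
            COARSE.put("V", "VERB");
            COARSE.put("ADJ", "ADJ");
            COARSE.put("ADV", "ADV");
            COARSE.put("DET", "DET");
        }

        public static String simplify(String fineTag) {
            // Fall back to "X" for unmapped tags, mirroring the universal tagset's catch-all.
            return COARSE.getOrDefault(fineTag, "X");
        }

        public static void main(String[] args) {
            System.out.println(simplify("NC") + " " + simplify("CLS"));  // prints: NOUN X
        }
    }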