stanford-nlp

Stanford CoreNLP Annotators Thread safe?

核能气质少年, submitted on 2019-12-05 06:53:35
The Stanford CoreNLP website http://nlp.stanford.edu/software/corenlp.shtml lists dozens of Annotators which work like a charm. I would like to share instances of the Annotators for the common tasks (lemmatization, tagging, parsing) among multiple threads, for example to split the processing of a massively large input (GBs of text) across threads, or to provide web services. There has been some discussion in the past referring to LocalThreads which, by my understanding, use one instance of an Annotator per thread (thus avoiding problems regarding thread safety). This is an option but that way all
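The one-instance-per-thread approach mentioned in the question can be sketched with the JDK's `ThreadLocal`. This is only an illustration under stated assumptions: `FakeAnnotator` is a hypothetical stand-in for an expensive, non-thread-safe object and is not a CoreNLP class, and the names `ANNOTATOR`, `CREATED`, and `annotateAll` are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadAnnotatorSketch {

    // Hypothetical stand-in for an annotator with mutable internal state.
    // It is NOT a CoreNLP class; it only upper-cases its input.
    static final class FakeAnnotator {
        private final StringBuilder scratch = new StringBuilder(); // unsafe to share across threads

        String annotate(String text) {
            scratch.setLength(0);
            scratch.append(text.toUpperCase());
            return scratch.toString();
        }
    }

    // Records every annotator that gets built, so the cost of this approach is visible.
    static final Set<FakeAnnotator> CREATED = ConcurrentHashMap.newKeySet();

    // Each worker thread lazily builds its own annotator on first use.
    static final ThreadLocal<FakeAnnotator> ANNOTATOR = ThreadLocal.withInitial(() -> {
        FakeAnnotator annotator = new FakeAnnotator();
        CREATED.add(annotator);
        return annotator;
    });

    // Annotates the documents on a fixed-size pool; results keep the input order.
    static List<String> annotateAll(List<String> docs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                futures.add(pool.submit(() -> ANNOTATOR.get().annotate(doc)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> out = annotateAll(List.of("a b", "c d", "e f"), 2);
        System.out.println(out + " using " + CREATED.size() + " annotator instance(s)");
    }
}
```

The trade-off is that every worker thread pays the construction and memory cost of its own instance. With heavyweight models that cost can be significant.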

Converting NLTK phrase structure trees to BRAT .ann standoff

流过昼夜, submitted on 2019-12-05 05:55:23
Question: I'm trying to annotate a corpus of plain text. I'm working with systemic functional grammar, which is fairly standard in terms of part-of-speech annotation, but differs in terms of phrases/chunks. Accordingly, I've POS-tagged my data with the NLTK defaults, and made a regex chunker with nltk.RegexpParser. Basically, the output now is an NLTK-style phrase structure tree: Tree('S', [Tree('Clause', [Tree('Process-dependencies', [Tree('Participant', [('This', 'DT')]), Tree('Verbal-group', [('is',

Running Stanford corenlp server with custom models

烂漫一生, submitted on 2019-12-05 05:08:11
Question: I've trained a POS tagger and a neural dependency parser with Stanford CoreNLP. I can get them to work via the command line, and now I would like to access them via a server. However, the documentation for the server doesn't say anything about using custom models. I checked the code and didn't find any obvious way of supplying a configuration file. Any idea how to do this? I don't need all annotators, just the ones I trained.

Answer 1: Yes, the server should (in theory) support all the functionality of

gender identification in natural language processing

对着背影说爱祢, submitted on 2019-12-05 05:08:06
Question: I have written the code below using the Stanford NLP packages. GenderAnnotator myGenderAnnotation = new GenderAnnotator(); myGenderAnnotation.annotate(annotation); But for the sentence "Annie goes to school", it is not able to identify the gender of Annie. The output of the application is: [Text=Annie CharacterOffsetBegin=0 CharacterOffsetEnd=5 PartOfSpeech=NNP Lemma=Annie NamedEntityTag=PERSON] [Text=goes CharacterOffsetBegin=6 CharacterOffsetEnd=10 PartOfSpeech=VBZ Lemma=go NamedEntityTag=O] [Text=to

Create .conll file as output of Stanford Parser

懵懂的女人, submitted on 2019-12-05 04:22:56
Question: I want to use the Stanford Parser to create a .conll file for further processing. So far I have managed to parse the test sentence with the command: stanford-parser-full-2013-06-20/lexparser.sh stanford-parser-full-2013-06-20/data/testsent.txt > output.txt Instead of a txt file I would like to have a file in .conll format. I'm pretty sure it is possible, as it is mentioned in the documentation (see here). Can I somehow modify my command, or will I have to write Java code? Thanks for your help!

Answer 1: If you're

Get certain nodes out of a Parse Tree

纵饮孤独, submitted on 2019-12-05 02:58:13
Question: I am working on a project involving anaphora resolution via Hobbs' algorithm. I have parsed my text using the Stanford parser, and now I would like to manipulate the nodes in order to implement my algorithm. At the moment, I don't understand how to: (1) Access a node based on its POS tag (e.g. I need to start with a pronoun: how do I get all pronouns?). (2) Use visitors. I'm a bit of a noob at Java, but in C++ I needed to implement a Visitor functor and then work on its hooks. I could not find much
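For the first point in the question (collecting all nodes with a given POS tag), a plain depth-first traversal is usually enough. The sketch below assumes a hypothetical minimal `Node` class invented for this example; it is not the Stanford parser's own tree type, and `wordsWithTag` is likewise an illustrative name. It treats a node as a preterminal when it has exactly one child and that child is a leaf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PronounFinderSketch {

    // Hypothetical minimal parse-tree node (NOT the Stanford parser's Tree class).
    // Leaves carry words; inner nodes carry phrase labels or POS tags.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            this.children.addAll(Arrays.asList(kids));
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Depth-first, left-to-right: collects the word under every preterminal labelled `tag`.
    static void collect(Node node, String tag, List<String> out) {
        boolean isPreterminal = node.children.size() == 1 && node.children.get(0).isLeaf();
        if (isPreterminal && node.label.equals(tag)) {
            out.add(node.children.get(0).label);
            return;
        }
        for (Node child : node.children) {
            collect(child, tag, out);
        }
    }

    static List<String> wordsWithTag(Node root, String tag) {
        List<String> out = new ArrayList<>();
        collect(root, tag, out);
        return out;
    }

    public static void main(String[] args) {
        // (S (NP (PRP She)) (VP (VBD saw) (NP (PRP him))))
        Node tree = new Node("S",
                new Node("NP", new Node("PRP", new Node("She"))),
                new Node("VP",
                        new Node("VBD", new Node("saw")),
                        new Node("NP", new Node("PRP", new Node("him")))));
        System.out.println(wordsWithTag(tree, "PRP"));
    }
}
```

The recursive `collect` method plays the role a visitor would play in C++: the traversal is fixed, and only the test applied at each node changes.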

How to create Custom model using OpenNLP?

寵の児, submitted on 2019-12-05 02:46:23
Question: I am trying to extract entities such as names and skills from a document using the OpenNLP Java API, but it is not extracting proper names. I am using the model available from the OpenNLP SourceForge link. Here is a piece of Java code: public class tikaOpenIntro { public static void main(String[] args) throws IOException, SAXException, TikaException { tikaOpenIntro toi = new tikaOpenIntro(); toi.filest(""); String cnt = toi.contentEx(); toi.sentenceD(cnt); toi.tokenization(cnt); String names = toi.namefind(toi

Getting additional information (Active/Passive, Tenses …) from a Tagger

前提是你, submitted on 2019-12-05 01:58:02
Question: I'm using the Stanford Tagger to determine parts of speech. However, I want to get more information out of the text. Is it possible to get further information, such as the tense of the sentence or whether it is active or passive? So far, I'm using the very basic PoS-tagging approach: List<List<TaggedWord>> taggedUnits = new ArrayList<List<TaggedWord>>(); String input = "This sentence is going to be future. The door was opened."; for (List<HasWord> sentence : MaxentTagger.tokenizeText

Stanford OpenIE using customized NER model

你。, submitted on 2019-12-05 01:17:09
Question: I am trying to use Stanford's OpenIE (version 3.6.0) to extract relation triples based on a NER model I trained in the chemistry domain. However, I couldn't get OpenIE to extract relation triples based on my own NER model. It seems OpenIE extracts relation triples based only on the default NER models provided in the package. Below is what I've done to train and deploy my NER model: Train the NER model based on http://nlp.stanford.edu/software/crf-faq.html#a. Deploy the NER model in CoreNLP

Stanford CoreNLP wrong coreference resolution

一笑奈何, submitted on 2019-12-04 23:17:33
I am still playing with Stanford's CoreNLP and I am encountering strange results on a very trivial test of coreference resolution. Given the two sentences: The hotel had a big bathroom. It was very clean. I would expect "It" in sentence 2 to corefer with "bathroom" or at least "a big bathroom" in sentence 1. Unfortunately it points to "The hotel", which in my opinion is wrong. Is there a way to solve this problem? Do I need to train anything, or is it supposed to work out of the box? Annotation a = getPipeline().getAnnotation("The hotel had a big bathroom. It was very clean."); System