stanford-nlp | 易学教程

Stanford CoreNLP Annotators Thread safe?

The Stanford CoreNLP website (http://nlp.stanford.edu/software/corenlp.shtml) lists dozens of Annotators, which work like a charm. I would like to share Annotator instances for the common tasks (lemmatization, tagging, parsing) across multiple threads, for example to split the processing of a massively large input (GBs of text) across threads, or to provide web services. There has been some discussion in the past referring to LocalThreads which, by my understanding, use one instance of an Annotator per thread (thus avoiding thread-safety problems). This is an option but that way all
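The one-instance-per-thread idea mentioned in the question can be sketched with the JDK's `ThreadLocal`. This is only an illustration under stated assumptions: `FakeAnnotator` and `PerThreadAnnotators` are hypothetical stand-ins invented for this sketch, not CoreNLP classes, and nothing here says whether any particular CoreNLP annotator is or is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive, non-thread-safe annotator.
// It is NOT a CoreNLP class; a real annotator or pipeline would go here.
class FakeAnnotator {
    static final AtomicInteger INSTANCES = new AtomicInteger();

    FakeAnnotator() {
        INSTANCES.incrementAndGet(); // count how many instances get built
    }

    String annotate(String text) {
        return text.toLowerCase(); // placeholder for real annotation work
    }
}

class PerThreadAnnotators {
    // Each worker thread lazily builds its own instance on first use.
    private static final ThreadLocal<FakeAnnotator> LOCAL =
            ThreadLocal.withInitial(FakeAnnotator::new);

    static List<String> annotateAll(List<String> docs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                Callable<String> task = () -> LOCAL.get().annotate(doc);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // collected in input order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("annotation failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With a fixed pool of N threads, at most N stand-in instances are created, so the memory cost of this pattern grows with the thread count rather than with the number of documents.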

Converting NLTK phrase structure trees to BRAT .ann standoff

Question: I'm trying to annotate a corpus of plain text. I'm working with systemic functional grammar, which is fairly standard in terms of part-of-speech annotation, but differs in terms of phrases/chunks. Accordingly, I've POS-tagged my data with the NLTK defaults and made a regex chunker with nltk.RegexpParser. Basically, the output now is an NLTK-style phrase structure tree: Tree('S', [Tree('Clause', [Tree('Process-dependencies', [Tree('Participant', [('This', 'DT')]), Tree('Verbal-group', [('is',

Running Stanford corenlp server with custom models

Question: I've trained a POS tagger and a neural dependency parser with Stanford CoreNLP. I can get them to work via the command line, and now would like to access them via a server. However, the documentation for the server doesn't say anything about using custom models. I checked the code and didn't find any obvious way of supplying a configuration file. Any idea how to do this? I don't need all annotators, just the ones I trained. Answer 1: Yes, the server should (in theory) support all the functionality of

gender identification in natural language processing

Question: I have written the code below using the Stanford NLP packages. GenderAnnotator myGenderAnnotation = new GenderAnnotator(); myGenderAnnotation.annotate(annotation); But for the sentence "Annie goes to school", it is not able to identify the gender of Annie. The output of the application is: [Text=Annie CharacterOffsetBegin=0 CharacterOffsetEnd=5 PartOfSpeech=NNP Lemma=Annie NamedEntityTag=PERSON] [Text=goes CharacterOffsetBegin=6 CharacterOffsetEnd=10 PartOfSpeech=VBZ Lemma=go NamedEntityTag=O] [Text=to

Create .conll file as output of Stanford Parser

Question: I want to use the Stanford Parser to create a .conll file for further processing. So far I managed to parse the test sentence with the command: stanford-parser-full-2013-06-20/lexparser.sh stanford-parser-full-2013-06-20/data/testsent.txt > output.txt Instead of a txt file I would like to have a file in .conll format. I'm pretty sure it is possible, as it is mentioned in the documentation (see here). Can I somehow modify my command or will I have to write Java code? Thanks for help! Answer 1: If you're

Get certain nodes out of a Parse Tree

Question: I am working on a project involving anaphora resolution via Hobbs' algorithm. I have parsed my text using the Stanford parser, and now I would like to manipulate the nodes in order to implement my algorithm. At the moment, I don't understand how to: (1) Access a node based on its POS tag (e.g. I need to start with a pronoun; how do I get all pronouns?). (2) Use visitors. I'm a bit of a Java noob, but in C++ I needed to implement a Visitor functor and then work on its hooks. I could not find much
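As a rough illustration of the first point (collecting nodes by POS tag), a depth-first walk over the tree is usually enough. The sketch below is self-contained and assumes a hypothetical minimal `ParseNode` class invented for this example; it is not the Stanford parser's own tree type, and the `PRP` / `PRP$` labels assume Penn Treebank style tags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal tree node; NOT the Stanford parser's tree class.
class ParseNode {
    final String label;
    final List<ParseNode> children;

    ParseNode(String label, ParseNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    boolean isLeaf() {
        return children.isEmpty();
    }
}

class PronounFinder {
    // Returns the word under every PRP / PRP$ node, in left-to-right order.
    static List<String> pronouns(ParseNode root) {
        List<String> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    // Depth-first walk: check the current node, then recurse into children.
    private static void collect(ParseNode node, List<String> found) {
        boolean isPronounTag = node.label.equals("PRP") || node.label.equals("PRP$");
        if (isPronounTag && node.children.size() == 1 && node.children.get(0).isLeaf()) {
            found.add(node.children.get(0).label); // the leaf holds the word itself
        }
        for (ParseNode child : node.children) {
            collect(child, found);
        }
    }
}
```

For a tree shaped like (S (NP (PRP She)) (VP (VBD saw) (NP (PRP it)))), `PronounFinder.pronouns` would return the two pronoun words in sentence order. A visitor-style design would replace the hard-coded tag check with a callback passed into `collect`.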

How to create Custom model using OpenNLP?

Question: I am trying to extract entities such as names and skills from a document using the OpenNLP Java API, but it is not extracting proper names. I am using the model available at the OpenNLP SourceForge link. Here is a piece of Java code: public class tikaOpenIntro { public static void main(String[] args) throws IOException, SAXException, TikaException { tikaOpenIntro toi = new tikaOpenIntro(); toi.filest(""); String cnt = toi.contentEx(); toi.sentenceD(cnt); toi.tokenization(cnt); String names = toi.namefind(toi

Getting additional information (Active/Passive, Tenses …) from a Tagger

Question: I'm using the Stanford Tagger to determine parts of speech. However, I want to get more information out of the text. Is it possible to get further information, such as the tense of a sentence or whether it is active or passive? So far, I'm using the very basic PoS-tagging approach: List<List<TaggedWord>> taggedUnits = new ArrayList<List<TaggedWord>>(); String input = "This sentence is going to be future. The door was opened."; for (List<HasWord> sentence : MaxentTagger.tokenizeText

Stanford OpenIE using customized NER model

Question: I am trying to use Stanford's OpenIE (version 3.6.0) to extract relation triples based on an NER model I trained in the chemistry domain. However, I couldn't get OpenIE to extract relation triples based on my own NER model. It seems OpenIE extracts relation triples based only on the default NER models provided in the package. Below is what I've done to train and deploy my NER model: Train the NER model based on http://nlp.stanford.edu/software/crf-faq.html#a. Deploy the NER model in CoreNLP

Stanford CoreNLP wrong coreference resolution

I am still playing with Stanford's CoreNLP and I am encountering strange results on a very trivial test of coreference resolution. Given the two sentences: The hotel had a big bathroom. It was very clean. I would expect "It" in sentence 2 to corefer with "bathroom" or at least "a big bathroom" in sentence 1. Unfortunately it points to "The hotel", which in my opinion is wrong. Is there a way to solve this problem? Do I need to train anything or is it supposed to work out of the box? Annotation a = getPipeline().getAnnotation("The hotel had a big bathroom. It was very clean."); System