stanford-nlp

Stanford CoreNLP: how to get the probability & margin of error

When using the parser, or for that matter any of the annotators in CoreNLP, is there a way to access the probability or the margin of error? To put my question into context, I am trying to understand whether there is a way to programmatically detect a case of ambiguity. For instance, in the parse below, the verb "desire" is detected as a noun. I would like to know what kind of measure I can access or calculate from the CoreNLP API to tell me there could be an ambiguity.

    (NP (NP (NNP Whereas)) (, ,) (NP (NNP users) (NN desire) (S (VP (TO to) (VP (VB sell))))))
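
A minimal sketch of one way to measure this, assuming the lexparser API (LexicalizedParser and its k-best parse list): comparing the log-scores of the top parses is a rough, unofficial proxy for ambiguity. The model path is the stock English PCFG; the sentence tokens are taken from the question.

    import edu.stanford.nlp.ling.Sentence;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParserQuery;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.util.ScoredObject;
    import java.util.List;

    public class AmbiguityCheck {
      public static void main(String[] args) {
        LexicalizedParser lp = LexicalizedParser.loadModel(
            "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
        LexicalizedParserQuery q = (LexicalizedParserQuery) lp.parserQuery();
        q.parse(Sentence.toWordList("Whereas", ",", "users", "desire", "to", "sell", "."));
        // Each ScoredObject pairs a parse tree with its PCFG log-score.
        List<ScoredObject<Tree>> kBest = q.getKBestPCFGParses(5);
        double gap = kBest.get(0).score()
            - (kBest.size() > 1 ? kBest.get(1).score() : Double.NEGATIVE_INFINITY);
        // A small gap between the top two parses is one signal of ambiguity.
        System.out.println("log-score gap between top two parses: " + gap);
      }
    }

If the gap between the first and second parse scores is small, the parser considered a competing analysis (such as "desire" as a verb) nearly as likely as the one it returned.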

Formatting NER output from Stanford CoreNLP

I am working with Stanford CoreNLP and using it for NER. But when I extract organization names, I see that each word is tagged with the annotation separately. So, if the entity is "NEW YORK TIMES", it gets recorded as three different entities: "NEW", "YORK" and "TIMES". Is there a property we can set in Stanford CoreNLP so that we get the combined output as one entity, just like choosing the inlineXML output format in the Stanford NER command-line utility? Can …
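
One approach that may help, sketched under the assumption that your CoreNLP release ships the entitymentions annotator (recent ones do): it merges adjacent tokens carrying the same NER tag into a single mention.

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;
    import java.util.Properties;

    public class NerMentions {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation doc = new Annotation("The NEW YORK TIMES reported the story.");
        pipeline.annotate(doc);
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
          // Each mention spans all adjacent tokens with the same entity tag.
          for (CoreMap mention : sentence.get(CoreAnnotations.MentionsAnnotation.class)) {
            System.out.println(mention.get(CoreAnnotations.TextAnnotation.class)
                + " -> " + mention.get(CoreAnnotations.NamedEntityTagAnnotation.class));
          }
        }
      }
    }

This should print "NEW YORK TIMES -> ORGANIZATION" as a single line rather than three.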

How to work around the 100K character limit for the StanfordNLP server?

I am trying to parse book-length blocks of text with StanfordNLP. The HTTP requests work great, but there is a non-configurable 100K-character limit on the text length, MAX_CHAR_LENGTH in StanfordCoreNLPServer.java. For now, I am chopping up the text before I send it to the server, but even if I try to split between sentences and paragraphs, there is some useful coreference information that gets lost between these chunks. Presumably, I could parse chunks with large overlap and link them together, but …
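
Note that newer server releases did make this configurable: the launcher accepts a -maxCharLength flag (check the server documentation for your version). A sketch of a launch command, with heap size and limit values chosen purely for illustration:

    java -mx8g edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
        -port 9000 -timeout 60000 -maxCharLength 1000000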

NLTK was unable to find stanford-postagger.jar! Set the CLASSPATH environment variable

I am working on a project that requires me to tag tokens using NLTK and Python, so I wanted to use this, but I ran into a few problems. I went through a lot of already-asked questions and other forums but was still unable to find a solution. The problem occurs when I try to execute the following:

    from nltk.tag import StanfordPOSTagger
    st = StanfordPOSTagger('english-bidirectional-distsim.tagger')

I get the following:

    Traceback (most recent call last):
      File "<pyshell#13>", line 1, in <module>
        st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
      File "C:\Users…
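
As the error message says, NLTK cannot locate stanford-postagger.jar. A minimal sketch of the usual fix, passing the jar and model paths explicitly; the paths below are illustrative and should point at your own unpacked tagger distribution:

    from nltk.tag import StanfordPOSTagger

    # Illustrative paths; adjust to wherever you unpacked the Stanford tagger.
    jar = r"C:\stanford-postagger\stanford-postagger.jar"
    model = r"C:\stanford-postagger\models\english-bidirectional-distsim.tagger"

    st = StanfordPOSTagger(model, path_to_jar=jar)
    print(st.tag("Annie goes to school".split()))

Alternatively, setting the CLASSPATH environment variable to the jar's directory and STANFORD_MODELS to the models directory lets the one-argument form find them on its own.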

TreebankLanguagePack function in Neural Network Dependency Parser

If I want to train the Stanford Neural Network Dependency Parser for another language, there is a need for a "treebankLanguagePack" (TLP), but the information about this TLP is very limited: "particularities of your treebank and the language it contains". If I have my treebank in another language that follows the same format as the PTB, my data is in CoNLL format, and the dependency format follows Universal Dependencies (UD), do I need this TLP?

As of the current CoreNLP release, the TreebankLanguagePack is used within the dependency parser only to 1) determine the input text encoding and 2) …
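
Given that narrow role, a minimal sketch of what a custom TLP might look like; this assumes (per the answer above) that only a couple of methods matter for the dependency parser, and it subclasses the English PennTreebankLanguagePack since the treebank follows the PTB format. The class name is hypothetical:

    import edu.stanford.nlp.trees.PennTreebankLanguagePack;

    // Hypothetical TLP for a PTB-style treebank in another language.
    public class MyLanguagePack extends PennTreebankLanguagePack {
      @Override
      public String getEncoding() {
        return "UTF-8";  // the input text encoding the parser should assume
      }
      // Override the punctuation tag/word methods here if they differ from the PTB.
    }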

Stanford NER Features

I am currently trying to use the Stanford NER system, and I am trying to see which features can be extracted by setting flags in a properties file. It seems that the features documented at http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html are not comprehensive. For example, all the feature flags related to distributional similarity and clustering are not included (e.g. useDistSim, etc.). Is there a more complete list of all the features and corresponding flags available somewhere? Thanks for the help!

At present, no. You need to look through the …
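
For reference, a sketch of a training properties file exercising some of these flags; useDistSim comes from the question, the rest are common flags from the NER documentation, and the file paths are placeholders. Treat the exact flag set as version-dependent:

    trainFile = train.tsv
    serializeTo = my-ner-model.ser.gz
    map = word=0,answer=1

    useClassFeature = true
    useWord = true
    useNGrams = true
    maxNGramLeng = 6
    usePrev = true
    useNext = true

    # distributional-similarity features, not listed in the javadoc
    useDistSim = true
    distSimLexicon = /path/to/distsim.clusters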

Get certain nodes out of a Parse Tree

I am working on a project involving anaphora resolution via the Hobbs algorithm. I have parsed my text using the Stanford parser, and now I would like to manipulate the nodes in order to implement my algorithm. At the moment, I don't understand how to: 1) access a node based on its POS tag (e.g. I need to start with a pronoun; how do I get all pronouns?); and 2) use visitors. I'm a bit of a Java noob; in C++ I would implement a Visitor functor and then work on its hooks, but I could not find much for the Stanford Parser's Tree structure. Is that jgrapht? If it is, could you provide me with …
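
A minimal sketch of the first part, assuming the edu.stanford.nlp.trees.Tree API: Tree is Iterable over its subtrees, so a plain loop can stand in for a visitor.

    import edu.stanford.nlp.trees.Tree;
    import java.util.ArrayList;
    import java.util.List;

    public class PronounFinder {
      // Collect the preterminal nodes tagged as pronouns (PRP, PRP$).
      public static List<Tree> pronouns(Tree root) {
        List<Tree> result = new ArrayList<>();
        for (Tree node : root) {          // pre-order walk over every subtree
          if (node.isPreTerminal()) {     // the POS-tag node just above a word
            String tag = node.value();
            if (tag.equals("PRP") || tag.equals("PRP$")) {
              result.add(node);
            }
          }
        }
        return result;
      }
    }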

Gender identification in natural language processing

I have written the code below using the Stanford NLP packages:

    GenderAnnotator myGenderAnnotation = new GenderAnnotator();
    myGenderAnnotation.annotate(annotation);

But for the sentence "Annie goes to school", it is not able to identify the gender of Annie. The output of the application is:

    [Text=Annie CharacterOffsetBegin=0 CharacterOffsetEnd=5 PartOfSpeech=NNP Lemma=Annie NamedEntityTag=PERSON]
    [Text=goes CharacterOffsetBegin=6 CharacterOffsetEnd=10 PartOfSpeech=VBZ Lemma=go NamedEntityTag=O]
    [Text=to CharacterOffsetBegin=11 CharacterOffsetEnd=13 PartOfSpeech=TO Lemma=to NamedEntityTag=O]
    [Text=school…
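
A sketch of running the gender annotator inside a full pipeline instead of standalone, so it sees the POS and NER annotations it depends on. The annotation key has moved between releases; this assumes a recent release where it is CoreAnnotations.GenderAnnotation:

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import java.util.Properties;

    public class GenderDemo {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,gender");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation doc = new Annotation("Annie goes to school");
        pipeline.annotate(doc);
        for (CoreLabel token : doc.get(CoreAnnotations.TokensAnnotation.class)) {
          // Prints a gender value for recognized PERSON tokens, null otherwise.
          System.out.println(token.word() + " -> "
              + token.get(CoreAnnotations.GenderAnnotation.class));
        }
      }
    }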

Running the Stanford CoreNLP server with custom models

I've trained a POS tagger and a neural dependency parser with Stanford CoreNLP. I can get them to work via the command line, and now I would like to access them via a server. However, the documentation for the server doesn't say anything about using custom models. I checked the code and didn't find any obvious way of supplying a configuration file. Any idea how to do this? I don't need all annotators, just the ones I trained.

Yes, the server should (in theory) support all the functionality of the regular pipeline. The properties GET parameter is translated into the Properties object you would normally …
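
So any pipeline property, including model paths, can ride along in that parameter. A sketch of a request, with placeholder model paths; pos.model and depparse.model are the same property names you would put in a local pipeline configuration:

    wget --post-data 'The quick brown fox jumped over the lazy dog.' \
      'localhost:9000/?properties={"annotators":"tokenize,ssplit,pos,depparse","pos.model":"/path/to/my-pos.tagger","depparse.model":"/path/to/my-parser.model.txt.gz","outputFormat":"json"}' -O -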

NLP to classify/label the content of a sentence (Ruby binding necessary)

I am analysing a few million emails. My aim is to be able to classify them into groups such as:

- Delivery problems (slow delivery, slow handling before dispatch, incorrect availability information, etc.)
- Customer service problems (slow email response time, impolite responses, etc.)
- Return issues (slow handling of return requests, lack of helpfulness from customer service, etc.)
- Pricing complaints (hidden fees discovered, etc.)

In order to perform this classification, I need an NLP tool that can recognize combinations of word groups like: "[they|the company|the firm|the website|the…
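
For the pattern-matching part (setting the Ruby binding aside), CoreNLP's TokensRegex can express alternations like the one above. A sketch in Java, with the pattern and lemma list invented for illustration; a Ruby client would typically get the same effect by sending requests to a CoreNLP server rather than calling the Java API directly:

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher;
    import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import java.util.List;
    import java.util.Properties;

    public class ComplaintMatcher {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation doc = new Annotation("The company delivered the package very late.");
        pipeline.annotate(doc);
        List<CoreLabel> tokens = doc.get(CoreAnnotations.TokensAnnotation.class);
        // Subject alternation, then a complaint-related lemma within 5 tokens.
        TokenSequencePattern pattern = TokenSequencePattern.compile(
            "(?$subj [{word:/they|company|firm|website/}]) []{0,5} " +
            "(?$verb [{lemma:/deliver|ship|handle|respond/}])");
        TokenSequenceMatcher matcher = pattern.matcher(tokens);
        while (matcher.find()) {
          System.out.println("matched: " + matcher.group());
        }
      }
    }

Matching on lemmas rather than surface words keeps the pattern list short: "deliver" covers "delivered", "delivering", and so on.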