stanford-nlp

How to generate sentiment treebank in Stanford NLP

 ̄綄美尐妖づ submitted on 2019-12-12 04:23:02
Question: I'm using the Stanford NLP sentiment library for sentiment analysis. Now I want to generate a treebank from a sentence.
Input sentence: "Effective but too-tepid biopic"
Output treebank: (2 (3 (3 Effective) (2 but)) (1 (1 too-tepid) (2 biopic)))
Can anybody show me how to do it? Thanks, all.
Answer 1: I had to push a bug fix for the SentimentPipeline. If you get the latest code from GitHub (https://github.com/stanfordnlp/CoreNLP) and use that version, you can issue this command: java -Xmx8g edu

Remove tags of POS tagger

不羁岁月 submitted on 2019-12-12 04:08:57
Question: Is it possible to remove the tags from tagged sentences? One could scan through the file, find the tags, and remove them, but there are many tags (some models have 30+, some have around 48-50; they basically follow the Penn Treebank POS tags). Is there a faster, more efficient way to remove them? I checked the API, but found no method for removing tags.
Answer 1: There's nothing special built in for this, but since the output includes both
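The scan-and-remove approach mentioned in the question does not need a list of tags. A minimal sketch, assuming the tagged text uses the word_TAG form with an underscore separator (the class and method names here are hypothetical, not part of the Stanford API):

```java
import java.util.regex.Pattern;

public class TagStripper {
    // Assumption: each token looks like word_TAG and tokens are separated by whitespace.
    // The pattern matches the last "_TAG" of a token, i.e. an underscore followed by
    // non-underscore, non-whitespace characters that run to the end of the token.
    private static final Pattern TAG_SUFFIX = Pattern.compile("_[^\\s_]+(?=\\s|$)");

    /** Returns the input with every trailing "_TAG" removed from each token. */
    public static String stripTags(String tagged) {
        return TAG_SUFFIX.matcher(tagged).replaceAll("");
    }

    public static void main(String[] args) {
        // Prints: They are hunting dogs .
        System.out.println(stripTags("They_PRP are_VBP hunting_VBG dogs_NNS ._."));
    }
}
```

Because only the final underscore-delimited piece of each token is removed, a word that itself contains an underscore keeps it. If the tagger was run with a different separator, the pattern would need to change accordingly.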

How to modify TokensRegex rule in StanfordNLP?

风格不统一 submitted on 2019-12-12 03:58:14
Question: I have a rule file for TokensRegex as follows:
$EDU_FIRST_KEYWORD = (/Education/|/Course[s]?/|/Educational/|/Academic/|/Education/ /and/?|/Professional/|/Certification[s]?/ /and/?)
$EDU_LAST_KEYWORD = (/Background/|/Qualification[s]?/|/Training[s]?/|/Detail[s]?/|/Record[s]?/)
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
{ ruleType: "tokens", pattern: ( $EDU_FIRST_KEYWORD $EDU_LAST_KEYWORD ?), result: "EDUCATION" }
I want to match EDU_FIRST_KEYWORD

Why doesn't normalizedNER for dates display correctly on my local Stanford CoreNLP server?

风格不统一 submitted on 2019-12-12 03:37:42
Question: I have cloned the latest version of stanfordnlp from GitHub. I'm trying to get proper NER for dates like 1990s, 19th century, etc. The CoreNLP demo server displays the dates properly. For example, "19th century" returns NER as 18XX, but on my server "19th century" returns NER as 19****19. Am I using the wrong model, or is it something else? Any ideas? The models I'm using are stanford-english-corenlp-2016-01-10-models.jar and stanford-english-corenlp-models-current.jar.
Source: https://stackoverflow.com

Stanford-NLP: GC overhead limit exceeded when using parser on Tomcat

青春壹個敷衍的年華 submitted on 2019-12-12 02:38:00
Question: We are working on integrating Stanford NLP into our system and it is working fine, except that it causes "GC overhead limit exceeded". We have the memory dump and will analyze it, but if anyone has some idea about this issue, please let us know. The server is quite powerful: SSD, 32 GB RAM, Xeon E5 series. Code we have:
String text = Jsoup.parse(groupNotes.getMnotetext()).text();
String lang;
try {
    DetectorFactory.clear();
    DetectorFactory.loadProfile("/home/deploy/profiles/");
    Detector detector =

Forcing POS tags in Stanford CoreNLP

故事扮演 submitted on 2019-12-12 02:08:47
Question: Is there a way to process already POS-tagged text using Stanford CoreNLP? For example, I have a sentence in this format:
They_PRP are_VBP hunting_VBG dogs_NNS ._.
and I'd like to annotate it with lemma, ner, parse, etc. while forcing the given POS annotation.
Update: I tried this code, but it's not working.
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String sentText = "They_PRP are
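Whatever mechanism is used to hand the tags to the pipeline, the pre-tagged input first has to be separated into words and tags. A minimal sketch of that step alone, in plain Java, assuming whitespace-separated word_TAG tokens as in the example sentence; the class and method names are hypothetical, and how the pairs are then attached to CoreNLP tokens is not shown because the snippet above does not include it:

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedTextSplitter {
    /**
     * Splits whitespace-separated "word_TAG" tokens into {word, tag} pairs.
     * Assumption: the last underscore in each token separates word from tag.
     */
    public static List<String[]> split(String tagged) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            // Reject tokens with no separator, an empty word, or an empty tag.
            if (sep <= 0 || sep == token.length() - 1) {
                throw new IllegalArgumentException("Not in word_TAG form: " + token);
            }
            pairs.add(new String[] { token.substring(0, sep), token.substring(sep + 1) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints one tab-separated word/tag pair per line.
        for (String[] pair : split("They_PRP are_VBP hunting_VBG dogs_NNS ._.")) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

Splitting on the last underscore keeps words such as file_name intact and still handles the final token "._.", whose word and tag are both a period.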

Proposed NLP algorithm for text tagging

浪子不回头ぞ submitted on 2019-12-12 02:06:39
Question: I was looking for an open-source tool that can help identify the tags for any user post on social media and identify topic/off-topic or spam comments on that post. Even after looking for an entire day, I could not find any suitable tool or library. Here I propose my own algorithm for tagging user posts belonging to 7 categories (jobs, discussion, events, articles, services, buy/sell, talents). Initially, when a user makes a post, he tags it. Tags can be like marketing, suggestion,

MissingMethodException using IKVM

﹥>﹥吖頭↗ submitted on 2019-12-12 01:57:22
Question: I'm trying to use Stanford CoreNLP (which is a Java project) in C#. I found this NuGet package, which contains CoreNLP converted to .NET using IKVM, and it works fine; however, I also need to make some modifications to the Java project. I downloaded CoreNLP from GitHub, I can build the CoreNLP JAR with Ant, and it also runs fine in Eclipse, but I'm having problems converting the JAR to a DLL. Based on a build log that I found on Google, I'm doing this: ikvmc.exe -version:2.1 ..

French dependency parsing using CoreNLP

对着背影说爱祢 submitted on 2019-12-12 01:40:16
Question: I am following the example in this link. I have downloaded the French jar from here. When I call it as follows:
java -mx1g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-french.properties -annotators tokenize,ssplit,pos,depparse -file french.txt -outputFormat conllu
I always see it load an English dep-parser model instead of the French one:
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ... PreComputed 100000, Elapsed Time: 1.341 (s)
Is this

Stanford Parser questions

眉间皱痕 submitted on 2019-12-12 01:35:31
Question: I am writing a project that works with NLP (natural language processing). I am using the Stanford Parser. I create a thread pool that takes sentences and runs the parser on them. When I create one thread it all works fine, but when I create more, I get errors. The "test" procedure finds words that have some connections. If I make it synchronized, it is supposed to work like a single thread, but I still get errors. The errors occur in this code: public synchronized String test(String s