stanford-nlp | 易学教程

Using Stanford CoreNLP

I am trying to get started with Stanford CoreNLP. I used some code from the web to understand how the coreference tool works. I tried running the project in Eclipse but keep encountering an out-of-memory exception. I tried increasing the heap size, but it made no difference. Any ideas on why this keeps happening? Is this a code-specific problem? Any pointers on using CoreNLP would be awesome. EDIT - Code added: import edu.stanford.nlp.dcoref.CorefChain; import edu.stanford.nlp.dcoref.CorefCoreAnnotations; import edu.stanford.nlp.pipeline.Annotation; import edu.stanford.nlp

What does NER model to find person names inside a resume/CV?

Question: I have just started with Stanford CoreNLP, and I would like to build a custom NER model to find persons. Unfortunately, I did not find a good NER model for Italian. I need to find these entities inside a resume/CV document. The problem is that such documents can have different structures. For example, I can have: CASE 1 - Name: John - Surname: Travolta - Last name: Travolta - Full name: John Travolta (so many labels that can represent the person entity I need to extract) CASE 2 My

How to Train GloVe algorithm on my own corpus

Question: I tried to follow this, but somehow I wasted a lot of time and ended up with nothing useful. I just want to train a GloVe model on my own corpus (~900 MB corpus.txt file). I downloaded the files provided in the link above and compiled them using Cygwin (after editing the demo.sh file and changing it to VOCAB_FILE=corpus.txt. Should I leave CORPUS=text8 unchanged?). The output was: cooccurrence.bin cooccurrence.shuf.bin text8 corpus.txt vectors.txt How can I use those files to load it as a GloVe
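One way to use the vectors.txt file listed above is to read it directly. The following is a minimal sketch, assuming vectors.txt is in GloVe's plain-text format (one token per line followed by its space-separated vector components). The class name GloveTextLoader and its methods are hypothetical helpers written for illustration, not part of GloVe.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical helper: loads GloVe text-format vectors into memory. */
final class GloveTextLoader {

    /** Parses lines of the form "word v1 v2 ... vn" into a word -> vector map. */
    static Map<String, float[]> parse(Iterable<String> lines) {
        Map<String, float[]> vectors = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split("\\s+");
            float[] vec = new float[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                vec[i - 1] = Float.parseFloat(parts[i]);
            }
            vectors.put(parts[0], vec);
        }
        return vectors;
    }

    /** Cosine similarity between two vectors of equal length. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the path to vectors.txt.
        String path = args.length > 0 ? args[0] : "vectors.txt";
        try (Stream<String> lines = Files.lines(Paths.get(path))) {
            Map<String, float[]> vectors = parse(lines::iterator);
            System.out.println("Loaded " + vectors.size() + " vectors");
        }
    }
}
```

After loading, cosine(vectors.get("x"), vectors.get("y")) gives a quick sanity check that related words from the corpus end up close together.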

How to shutdown Stanford CoreNLP Redwood logging?

How can I shut down the Stanford CoreNLP messages (see end of post)? I first tried setting log4j.category.edu.stanford=OFF in log4j.properties, but that didn't help, and I then found out that it apparently uses a nonstandard logging framework called "Redwood". According to http://nlp.stanford.edu/nlp/javadoc/javanlp/ there is documentation, but it is password protected. I tried RedwoodConfiguration.empty().apply(); but that doesn't help either. The logging messages: Adding annotator tokenize Adding annotator ssplit Adding annotator pos Loading default properties from tagger edu/stanford/nlp/models

how do I create my own training corpus for stanford tagger?

Question: I have to analyze informal English text with lots of shorthand and local lingo, so I was thinking of creating a model for the Stanford tagger. How do I create my own labelled corpus for the Stanford tagger to train on? What is the syntax of the corpus, and how large should my corpus be in order to achieve acceptable performance? Answer 1: To train the PoS tagger, see this mailing list post, which is also included in the JavaDocs for the MaxentTagger class. The javadocs for the edu

Finding head of a noun phrase in NLTK and stanford parse according to the rules of finding head of a NP

Generally, the head of a noun phrase is the rightmost noun of the NP; in the tree shown below, "tree" is the head of the parent NP. So
(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))
Out[40]: Tree('S', [Tree('NP', [Tree('NP', [Tree('DT', ['The']), Tree('JJ', ['old']), Tree('NN', ['oak']), Tree('NN', ['tree'])]), Tree('PP', [Tree('IN', ['from']), Tree('NP', [Tree('NNP', ['India'])])])]), Tree('VP', [Tree('VBD', ['fell']), Tree('PRT', [Tree('RP', [
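The rightmost-noun rule described above can be sketched in a few lines. This is a simplified illustration, not the head-finding rules used by NLTK or the Stanford parser: it assumes Penn Treebank tags (noun tags start with "NN") and, when an NP has no direct noun child, it descends into the rightmost nested NP. The Node class and the NpHead helper are hypothetical names introduced for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the "rightmost noun" head rule for noun phrases. */
final class NpHead {

    /** Minimal constituency-tree node; a leaf has no children and its label is the word. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Returns the head word of an NP, or null if no noun is found. */
    static String headWord(Node np) {
        // Pass 1: rightmost direct child that is a noun preterminal (NN, NNS, NNP, NNPS).
        for (int i = np.children.size() - 1; i >= 0; i--) {
            Node child = np.children.get(i);
            boolean preterminal = child.children.size() == 1 && child.children.get(0).isLeaf();
            if (preterminal && child.label.startsWith("NN")) {
                return child.children.get(0).label;
            }
        }
        // Pass 2: no direct noun child, so descend into the rightmost nested NP.
        for (int i = np.children.size() - 1; i >= 0; i--) {
            Node child = np.children.get(i);
            if (child.label.equals("NP")) {
                return headWord(child);
            }
        }
        return null;
    }

    /** Builds the NP "The old oak tree from India" from the example tree. */
    static Node sampleNp() {
        Node inner = new Node("NP",
                new Node("DT", new Node("The")),
                new Node("JJ", new Node("old")),
                new Node("NN", new Node("oak")),
                new Node("NN", new Node("tree")));
        Node pp = new Node("PP",
                new Node("IN", new Node("from")),
                new Node("NP", new Node("NNP", new Node("India"))));
        return new Node("NP", inner, pp);
    }

    public static void main(String[] args) {
        System.out.println(headWord(sampleNp()));
    }
}
```

For the parent NP, the PP child is skipped because it is neither a noun nor an NP, so the search continues in the nested NP and stops at its rightmost NN.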

OpenNLP vs Stanford CoreNLP

I've been doing a little comparison of these two packages and am not sure which direction to go in. Briefly, what I am looking for is: Named Entity Recognition (people, places, organizations and such). Gender identification. A decent training API. From what I can tell, OpenNLP and Stanford CoreNLP expose pretty similar capabilities. However, Stanford CoreNLP looks like it has a lot more activity, whereas OpenNLP has only had a few commits in the last six months. Based on what I saw, OpenNLP appears to make it easier to train new models and might be more attractive for that reason alone. However, my

Anaphora resolution using Stanford Coref

Question: I have these sentences (Text I): Tom is a smart boy. He know a lot of thing. I want to change He in the second sentence to Tom, so the final sentences will become (Text II): Tom is a smart boy. Tom know a lot of thing. I've written some code, but my coref object is always null. Besides, I have no idea what to do next to get the correct result. String text = "Tom is a smart boy. He know a lot of thing."; Annotation document = new Annotation(text); Properties props = new Properties(); props.put("annotators",

How to split an NLP parse tree to clauses (independent and subordinate)?

Question: Given an NLP parse tree like (ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .))) The original sentence is "You could say that they regularly catch a shower, which adds to their exhilaration and joie de vivre." How could the clauses be
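As a starting point for a tree like the one above, one naive approach is to parse the bracketed string and list the text under every S and SBAR node. The sketch below does only that, under the assumption that clause boundaries coincide with those two labels; it does not separate a main clause from the subordinate clauses nested inside it. ClauseSplitter and its methods are hypothetical names, not part of CoreNLP.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: list the text spanned by every S and SBAR node. */
final class ClauseSplitter {

    /** The parse tree from the question, split across lines for readability. */
    static final String SAMPLE =
            "(ROOT (S (NP (PRP You))"
            + " (VP (MD could) (VP (VB say)"
            + " (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly))"
            + " (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,)"
            + " (SBAR (WHNP (WDT which)) (S (VP (VBZ adds)"
            + " (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and)"
            + " (NP (FW joie) (FW de) (FW vivre))"
            // 11 closing brackets: NP PP VP S SBAR NP VP S SBAR VP VP
            + "))))" + "))))" + ")))"
            + " (. .)))";

    /** Tree node; a node without children is a word. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    /** Parses a Penn-style bracketed tree such as "(S (NP (PRP You)) ...)". */
    static Node parse(String bracketed) {
        String[] tokens = bracketed.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return parseNode(tokens, new int[] {0});
    }

    private static Node parseNode(String[] tokens, int[] pos) {
        String token = tokens[pos[0]++];
        if (!token.equals("(")) {
            return new Node(token); // a bare token is a word
        }
        Node node = new Node(tokens[pos[0]++]); // the token after "(" is the label
        while (!tokens[pos[0]].equals(")")) {
            node.children.add(parseNode(tokens, pos));
        }
        pos[0]++; // consume ")"
        return node;
    }

    /** Joins the words under a node with single spaces. */
    static String text(Node node) {
        if (node.children.isEmpty()) {
            return node.label;
        }
        StringBuilder sb = new StringBuilder();
        for (Node child : node.children) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(text(child));
        }
        return sb.toString();
    }

    /** Collects the text of every S and SBAR node in pre-order. */
    static List<String> clauses(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        boolean clauseLabel = node.label.equals("S") || node.label.equals("SBAR");
        if (clauseLabel && !node.children.isEmpty()) {
            out.add(text(node));
        }
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        for (String clause : clauses(parse(SAMPLE))) {
            System.out.println(clause);
        }
    }
}
```

Because the spans are nested, each outer clause still contains the text of its inner clauses; subtracting the inner spans would be a further step that this sketch leaves out.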

Exception in thread “main” java.lang.OutOfMemoryError: Java heap space

I'm using Eclipse to run a Java program class. When I run it, I get this error: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space. Then I changed the VM options from Properties > Run > VM Options and ran the program again, and I got a new error: Error occurred during initialization of VM Incompatible initial and maximum heap sizes specified. I'm trying to use the Stanford libraries in my program. Any idea how to solve this error? To change the VM settings for Eclipse, you can change the amount of memory for the VM from Window > Preferences > Java > Installed JREs; from there, select the JRE and click Edit,
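The second error above is what the JVM reports when the initial heap size (-Xms) is larger than the maximum heap size (-Xmx). The following is a minimal sketch of that check, together with a way to print the maximum heap the running JVM actually received, which helps confirm whether a changed setting took effect. HeapCheck and its methods are hypothetical helpers, and the size parser only handles the k, m, and g suffixes.

```java
import java.util.Locale;

/** Hypothetical helper: sanity-checks -Xms/-Xmx values and reports the effective heap. */
final class HeapCheck {

    /** Parses a JVM size such as "512m", "2g", "65536k", or a plain byte count. */
    static long parseBytes(String size) {
        String s = size.trim().toLowerCase(Locale.ROOT);
        char unit = s.charAt(s.length() - 1);
        long factor;
        switch (unit) {
            case 'k': factor = 1024L; break;
            case 'm': factor = 1024L * 1024; break;
            case 'g': factor = 1024L * 1024 * 1024; break;
            default: return Long.parseLong(s); // no suffix: plain bytes
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * factor;
    }

    /** True when the initial heap (-Xms value) does not exceed the maximum heap (-Xmx value). */
    static boolean compatible(String xms, String xmx) {
        return parseBytes(xms) <= parseBytes(xmx);
    }

    public static void main(String[] args) {
        // -Xms2g with -Xmx1g is the kind of combination that fails at VM startup.
        System.out.println("Xms=2g, Xmx=1g compatible: " + compatible("2g", "1g"));

        // Maximum heap visible to this JVM, reflecting the -Xmx actually applied.
        long maxMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap for this JVM: " + maxMiB + " MiB");
    }
}
```

If the printed maximum does not change after editing the settings, the options are probably being applied to a different launch configuration or JRE than the one running the program.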