stanford-nlp

Why is Stanford CoreNLP gender identification nondeterministic?

我只是一个虾纸丫 submitted on 2019-12-04 14:45:54
I have the following results, and as you can see the name "edward" gets different results (null and MALE). This has happened with several names:

edward, Gender: null
james, Gender: MALE
karla, Gender: null
edward, Gender: MALE

Additionally, how can I customize the gender dictionaries? I want to add Spanish and Chinese names.

You have raised a lot of issues!

1.) Karla is not in the default gender mappings file, so that is why it's getting null.
2.) If you want to make your own custom file, it should be in this format: JOHN\tMALE. There should be one NAME\tGENDER entry per line. The GenderAnnotator
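If it helps, here is a minimal sketch of generating such a mappings file, with made-up Spanish and Chinese entries; the property used to point the pipeline at the custom file depends on your CoreNLP version, so check the GenderAnnotator documentation for your release:

# Build a custom gender mappings file in the NAME\tGENDER format
# described above: one tab-separated entry per line.
# The names below are illustrative, not a curated resource.
entries = [
    ("EDUARDO", "MALE"),
    ("KARLA", "FEMALE"),
    ("伟", "MALE"),
    ("丽", "FEMALE"),
]

with open("custom_gender.map", "w", encoding="utf-8") as f:
    for name, gender in entries:
        f.write(name + "\t" + gender + "\n")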

Stanford Universal Dependencies on Python NLTK

两盒软妹~` submitted on 2019-12-04 13:44:33
Is there any way I can get the Universal Dependencies using Python or NLTK? I can only produce the parse tree.

Example input sentence: My dog also likes eating sausage.

Output (Universal Dependencies):

nmod:poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)

Wordseer's stanford-corenlp-python fork is a good start, as it works with the recent CoreNLP release (3.5.2). However, it will give you raw output, which you need to transform manually. For example, given you have the wrapper running: >>> import json, jsonrpclib
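To expand the truncated example into a runnable sketch (assuming the fork's JSON-RPC server has been started locally on its default port 8080; the exact keys in the raw output can vary by wrapper version, so inspect the result first):

import json
import jsonrpclib

# Connect to the stanford-corenlp-python JSON-RPC server.
server = jsonrpclib.Server("http://localhost:8080")

result = json.loads(server.parse("My dog also likes eating sausage."))

# The raw output nests dependencies per sentence as
# [relation, governor, dependent] triples (version-dependent).
for sentence in result.get("sentences", []):
    for rel, governor, dependent in sentence.get("dependencies", []):
        print("%s(%s, %s)" % (rel, governor, dependent))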

How to extract noun phrases from the parsed text

前提是你 submitted on 2019-12-04 11:59:10
I have parsed a text with a constituency parser and copied the result into a text file, like below:

(ROOT (S (NP (NN Yesterday)) (, ,) (NP (PRP we)) (VP (VBD went) (PP (TO to)....
(ROOT (FRAG (SBAR (SBAR (IN While) (S (NP (PRP I)) (VP (VBD was) (NP (NP (EX...
(ROOT (S (NP (NN Yesterday)) (, ,) (NP (PRP I)) (VP (VBD went) (PP (TO to.....
(ROOT (FRAG (SBAR (SBAR (IN While) (S (NP (NNP Jim)) (VP (VBD was) (NP (NP (....
(ROOT (S (S (NP (PRP I)) (VP (VBD started) (S (VP (VBG talking) (PP.....

I need to extract all noun phrases (NP) from this text file. I wrote the following code, but it extracts only the first NP
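A common fix for "only the first NP" is to walk every subtree instead of stopping at the first match. A minimal sketch with NLTK, assuming each line of the file holds one complete, well-formed bracketed parse (the lines above are truncated for display):

from nltk.tree import Tree

with open("parses.txt", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        tree = Tree.fromstring(line)
        # subtrees() traverses the whole tree, so nested NPs are
        # found too, not just the first one.
        for np in tree.subtrees(filter=lambda t: t.label() == "NP"):
            print(" ".join(np.leaves()))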

Getting sentiment analysis results using Stanford CoreNLP Java code

别说谁变了你拦得住时间么 submitted on 2019-12-04 10:21:10
When we test it on the Stanford demo page (http://nlp.stanford.edu:8080/sentiment/rntnDemo.html), it gives the tree with the sentiment score of each node as below. I am trying to test it on my local system using the command:

H:\Drive E\Stanford\stanfor-corenlp-full-2013~>java -cp "*" edu.stanford.nlp.sentiment.Evaluate edu/stanford/nlp/models/sentiment/sentiment.ser.gz test.txt

test.txt has:

This movie doesn't care about cleverness, wit or any other kind of intelligent humor. Those who find ugly meanings in beautiful things are corrupt without being charming.

which yields the result: Can anyone please tell
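Note that Evaluate scores a model against gold-labeled trees, which is a different task from annotating raw text. If the goal is per-sentence sentiment from plain text and a recent CoreNLP release is an option, one route is the HTTP server (added in 3.6.0); a sketch assuming a server on localhost:9000:

import json
import requests

# Assumes a CoreNLP server is running, e.g.:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
text = ("This movie doesn't care about cleverness, wit "
        "or any other kind of intelligent humor.")

props = {"annotators": "tokenize,ssplit,parse,sentiment",
         "outputFormat": "json"}
resp = requests.post("http://localhost:9000",
                     params={"properties": json.dumps(props)},
                     data=text.encode("utf-8"))

for sentence in resp.json()["sentences"]:
    # "sentiment" is a label such as Negative; "sentimentValue" is 0-4.
    print(sentence["sentiment"], sentence["sentimentValue"])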

Setting NLTK with Stanford NLP (both StanfordNERTagger and StanfordPOSTagger) for Spanish

风流意气都作罢 submitted on 2019-12-04 10:08:38
The NLTK documentation is rather poor on this integration. The steps I followed were:

Download http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip to /home/me/stanford
Download http://nlp.stanford.edu/software/stanford-spanish-corenlp-2015-01-08-models.jar to /home/me/stanford

Then in an IPython console:

In [11]: import nltk
In [12]: nltk.__version__
Out[12]: '3.1'
In [13]: from nltk.tag import StanfordNERTagger

Then:

st = StanfordNERTagger('/home/me/stanford/stanford-postagger-full-2015-04-20.zip',
                       '/home/me/stanford/stanford-spanish-corenlp-2015-01-08-models.jar')

But when
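One problem visible already: the tagger's first argument must be a model file, not the distribution zip. As a sketch of the POS side, assuming the zip has been extracted and using the Spanish tagger model it ships (model file names vary by release, so check the models/ directory of your download):

from nltk.tag import StanfordPOSTagger

# StanfordPOSTagger wants (model_file, path_to_jar): a .tagger model,
# not the .zip archive itself.
base = '/home/me/stanford/stanford-postagger-full-2015-04-20'
st = StanfordPOSTagger(base + '/models/spanish-distsim.tagger',
                       base + '/stanford-postagger.jar',
                       encoding='utf8')

print(st.tag('Hola amigo , estoy en Barcelona .'.split()))

For NER the pattern is analogous, but the Spanish CRF model lives inside the stanford-spanish-corenlp models jar and has to be extracted first, and the jar argument should be the NER distribution's stanford-ner.jar.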

Python NLTK code snippet to train a classifier (naive bayes) using feature frequency

爷，独闯天下 submitted on 2019-12-04 09:46:13
Question: I was wondering if anyone could help me with a code snippet that demonstrates how to train a Naive Bayes classifier using a feature frequency method as opposed to feature presence. I presume the below, as shown in Chapter 6 (link text), creates a feature set using feature presence (FP):

def document_features(document):
    document_words = set(document)
    features = {}
    for word in word_features:
        features['contains(%s)' % word] = (word in document_words)
    return features

Please advise.

Answer 1: In
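A minimal sketch of the frequency variant: the only change is that each feature's value becomes a count rather than a boolean (word_features is assumed to be defined as in the NLTK book chapter):

from collections import Counter

def document_features_freq(document):
    # Count every token instead of recording mere presence.
    counts = Counter(document)
    features = {}
    for word in word_features:  # assumed defined as in the book
        features['count(%s)' % word] = counts[word]
    return features

Note that NLTK's NaiveBayesClassifier treats feature values as discrete symbols, so each distinct count is modeled as its own value; binning counts (e.g. 0, 1, "2+") is a common way to keep that manageable.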

Splitting Chinese document into sentences [closed]

雨燕双飞 submitted on 2019-12-04 09:18:27
I have to split Chinese text into multiple sentences. I tried the Stanford DocumentPreProcessor. It worked quite well for English but not for Chinese. Please can you let me know any good sentence splitters for Chinese, preferably in Java or Python.

Using some regex tricks in Python (cf. a modified regex from Section 2.3 of http://aclweb.org/anthology/Y/Y11/Y11-1038.pdf ):

import re
paragraph = u'\u70ed
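The snippet above is cut off before the regex itself; a self-contained sketch in the same spirit (a simplification that treats a run of characters ending in Chinese or Western sentence-final punctuation as one sentence):

# -*- coding: utf-8 -*-
import re

# 。！？ are the fullwidth Chinese stop, exclamation and question marks.
SENTENCE_RE = re.compile(u'[^。！？!?.]+[。！？!?.]?', re.UNICODE)

paragraph = u'我来到北京。天气很好！你吃饭了吗？'
sentences = [s.strip() for s in SENTENCE_RE.findall(paragraph) if s.strip()]
for s in sentences:
    print(s)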

Lexicon dictionary for synonym words

别说谁变了你拦得住时间么 submitted on 2019-12-04 09:10:12
There are a few dictionaries available for natural language processing, like positive and negative word dictionaries. Is there any dictionary available which contains a list of synonyms for all dictionary words? For example, synonyms for "nice": enjoyable, pleasant, pleasurable, agreeable, delightful, satisfying, gratifying, acceptable, to one's liking, entertaining, amusing, diverting, marvellous, good.

alvas: Although WordNet is a good resource to start with for finding synonyms, one must note its limitations. Here's an example with the Python API in the NLTK library. Firstly, words have multiple meanings (i.e. senses)
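A minimal sketch of the WordNet lookup being discussed, using NLTK's standard corpus API (requires a one-time nltk.download('wordnet')); it also makes the multiple-senses point visible, since synonyms are grouped per sense:

from nltk.corpus import wordnet as wn

# Each synset is one sense of the word; the lemma names inside a
# synset are its synonyms for that particular sense.
for synset in wn.synsets('nice'):
    print(synset.name(), '-', synset.definition())
    print('  synonyms:', [lemma.name() for lemma in synset.lemmas()])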

Using Stanford CoreNLP - Java heap space

穿精又带淫゛_ submitted on 2019-12-04 06:00:22
I am trying to use the coreference module of the Stanford CoreNLP pipeline, but I end up getting an OutOfMemoryError in Java. I already increased the heap size (via Run -> Run Configurations -> VM Arguments in Eclipse) and set it to -Xmx3g -Xms1g. I even tried -Xmx12g -Xms4g, but that didn't help either. I'm using Eclipse Juno on OS X 10.8.5 with Java 1.6 on a 64-bit machine. Does anyone have an idea what else I could try? I'm using the example code from the website ( http://nlp.stanford.edu/software/corenlp.shtml ):

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit

Getting output in the desired format using TokensRegex

十年热恋 submitted on 2019-12-04 05:20:20
Question: I am using TokensRegex for rule-based entity extraction. It works well, but I am having trouble getting my output in the desired format. The following snippet of code gives me the output shown below for the sentence: "Earlier this month Trump targeted Toyota, threatening to impose a hefty fee on the world's largest automaker if it builds its Corolla cars for the U.S. market at a plant in Mexico."

for (CoreMap sentence : sentences) {
    List<MatchedExpression> matched = extractor.extractExpressions
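As an aside: if structured output is the end goal and a recent CoreNLP release is available, the HTTP server's /tokensregex endpoint returns matches as JSON, which sidesteps hand-formatting MatchedExpression objects. A sketch assuming a server on localhost:9000 (the pattern here is a placeholder, not the asker's actual rule):

import requests

text = "Earlier this month Trump targeted Toyota."
resp = requests.post(
    "http://localhost:9000/tokensregex",
    params={"pattern": "([{tag:NNP}])"},  # placeholder TokensRegex pattern
    data=text.encode("utf-8"),
)
# Matches come back grouped per sentence with token offsets; print the
# raw JSON and inspect it rather than assuming a fixed shape.
print(resp.json())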