nlp

How can I determine the head word in a sentence using NLP?

夙愿已清 submitted on 2019-12-23 20:40:54

Question: For example, if I have been given a sentence: "A British soldier was killed in the fighting in Afghanistan." The head word of that sentence is "killed". How can I find it, given the nltk package in Python? I am not talking about stemming; I am referring to the head word.

Answer 1: You are looking for sentence parsing. It's available in Python's nltk package, as you can see in this link. It's also closely related to dependency parsing, as you can see from the Stanford NLP package in this link and
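NLTK itself does not ship a trained dependency parser, so the usual route is an external parser (spaCy, Stanford CoreNLP) whose output marks one token as the root. Whatever tool you use, the head word falls out of the dependency structure: it is the token attached to the artificial root. A minimal pure-Python sketch over CoNLL-style (word, head_index) pairs, where the head indices below are illustrative, not real parser output:

```python
def find_head(tokens):
    """Return the word whose head index is 0 (the ROOT) in a
    CoNLL-style list of (word, head_index) pairs, 1-indexed."""
    for word, head in tokens:
        if head == 0:
            return word
    return None

# Hypothetical dependency analysis of the example sentence
# (head indices are illustrative, not actual parser output):
parsed = [
    ("A", 3), ("British", 3), ("soldier", 5),
    ("was", 5), ("killed", 0),          # head word: attached to ROOT (0)
    ("in", 8), ("the", 8), ("fighting", 5),
    ("in", 10), ("Afghanistan", 8),
]
print(find_head(parsed))  # -> killed
```

With spaCy, the equivalent one-liner would look for the token that is its own head (`[t for t in doc if t.head is t]`); with CoreNLP, for the dependent of the ROOT relation.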

Keras Text Preprocessing - Saving Tokenizer object to file for scoring

风流意气都作罢 submitted on 2019-12-23 18:53:39

Question: I've trained a sentiment classifier model using the Keras library by following the steps below (broadly):

1. Convert the text corpus into sequences using the Tokenizer object/class
2. Build a model using the model.fit() method
3. Evaluate the model

Now, for scoring with this model, I was able to save the model to a file and load it from a file. However, I've not found a way to save the Tokenizer object to a file. Without this I'll have to process the corpus every time I need to score even a single sentence. Is there a
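One common workaround (plain Python serialization, not a Keras-specific API) is to pickle the fitted Tokenizer next to the saved model and unpickle it at scoring time. A sketch, demonstrated here with a stand-in dictionary since a real fitted Tokenizer works the same way:

```python
import pickle
import tempfile

def save_object(obj, path):
    """Serialize any picklable object (e.g. a fitted Tokenizer) to disk."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_object(path):
    """Restore a previously pickled object."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Round-trip demo with a stand-in object:
path = tempfile.NamedTemporaryFile(suffix=".pickle", delete=False).name
save_object({"word_index": {"the": 1, "cat": 2}}, path)
restored = load_object(path)
print(restored["word_index"]["cat"])  # -> 2
```

Newer Keras versions also provide `tokenizer.to_json()` together with `keras.preprocessing.text.tokenizer_from_json()`, which sidesteps pickle's version-compatibility caveats; check whether your installed version has them.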

BucketIterator throws 'Field' object has no attribute 'vocab'

▼魔方 西西 submitted on 2019-12-23 18:43:52

Question: This is not a new question; references I found without any working solution: first and second. I'm a newbie to PyTorch, facing AttributeError: 'Field' object has no attribute 'vocab' while creating batches of text data in PyTorch using torchtext. Following the book Deep Learning with PyTorch, I wrote the same example as explained in the book. Here's the snippet:

from torchtext import data
from torchtext import datasets
from torchtext.vocab import GloVe

TEXT = data.Field(lower=True,

psutil.AccessDenied Error while trying to load StanfordCoreNLP

喜夏-厌秋 submitted on 2019-12-23 18:12:57

Question: I'm trying to load the StanfordCoreNLP package to get the correct parsing for the movie reviews presented on their page (https://nlp.stanford.edu/sentiment/treebank.html). (I'm using a Mac.)

nlp = StanfordCoreNLP("/Users//NLP_models/stanford-corenlp-full-2018-01-31")

But I get the error:

Traceback (most recent call last):
  File "/Users/anaconda3/lib/python3.6/site-packages/psutil/_psosx.py", line 295, in wrapper
    return fun(self, *args, **kwargs)
  File "/Users/anaconda3/lib/python3.6/site-packages

JAPE rule: Sentence contains multiple cases

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-23 16:34:41

Question: How can I check whether a sentence contains a combination? For example, consider the sentence: "John appointed as new CEO for google." I need to write a rule to check whether the sentence contains < 'new' + 'Jobtitle' >. How can I achieve this? I tried the following; I need to check whether there is 'new' before the word.

Rule: CustomRules
(
  {
    Sentence contains {Lookup.majorType == "organization"},
    Sentence contains {Lookup.majorType == "jobtitle"},
    Sentence contains {Lookup.majorType == "person_first"}
  }
)

Answer 1: One way
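Outside of GATE/JAPE, the underlying check is just token adjacency: does the literal token "new" immediately precede a token recognized as a job title? A plain-Python sketch of that idea (the token list and the job-title set below are illustrative stand-ins, not GATE annotations):

```python
def new_jobtitle_follows(tokens, jobtitles):
    """True if the literal token 'new' is immediately followed by a
    token whose lowercase form is in the job-title gazetteer."""
    return any(tok.lower() == "new" and tokens[i + 1].lower() in jobtitles
               for i, tok in enumerate(tokens[:-1]))

sentence = "John appointed as new CEO for google".split()
print(new_jobtitle_follows(sentence, {"ceo", "cto", "director"}))  # -> True
```

In JAPE itself the analogous move is to match the annotations in sequence on the left-hand side, e.g. {Token.string == "new"} followed by {Lookup.majorType == "jobtitle"}, rather than testing each with a separate Sentence contains clause.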

Are high values for c or gamma problematic when using an RBF kernel SVM?

拟墨画扇 submitted on 2019-12-23 16:27:30

Question: I'm using WEKA/LibSVM to train a classifier for a term extraction system. My data is not linearly separable, so I used an RBF kernel instead of a linear one. I followed the guide by Hsu et al. and iterated over several values for both C and gamma. The parameters that worked best for classifying known terms (test and training material differ, of course) are rather high: C = 2^10 and gamma = 2^3. So far the high parameters seem to work OK, yet I wonder whether they may cause any problems further on,
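One way to see why a large gamma can be risky: the RBF kernel K(x, y) = exp(-gamma * ||x - y||^2) collapses toward 0 as gamma grows, so distinct training points become nearly orthogonal and the decision surface can wrap tightly around individual examples (overfitting), while a large C weakens the regularization that would counteract this. A small numeric sketch with toy points (not the actual WEKA data):

```python
import math

def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = (0.0, 0.0), (1.0, 1.0)  # squared distance = 2
for gamma in (2 ** -3, 2 ** 0, 2 ** 3):
    # As gamma grows, the similarity between distinct points collapses:
    print(f"gamma={gamma:>6}: K = {rbf_kernel(x, y, gamma):.2e}")
```

If cross-validated performance holds up at C = 2^10, gamma = 2^3, the values are defensible; the standard safeguard is exactly the grid search with cross-validation that Hsu et al. describe, rather than any absolute cap on the parameters.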

Extract NP-VP-NP from a Stanford dependency parse tree

别说谁变了你拦得住时间么 submitted on 2019-12-23 15:40:25

Question: I need to extract triplets of the form NP-VP-NP from the dependency parse tree produced as the output of lexicalized parsing in the Stanford Parser. What's the best way to do this? For example, if the parse tree is as follows:

(ROOT
  (S
    (S (NP (NNP Exercise)) (VP (VBZ reduces) (NP (NN stress))) (. .))
    (NP (JJ Regular) (NN exercise))
    (VP (VBZ maintains) (NP (JJ mental) (NN fitness)))
    (. .)))

I need to extract two triplets: Exercise-reduces-stress and Regular exercise-maintains-mental fitness. Any ideas?

Answer 1:
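For the bracketed (constituency) tree shown above, one workable approach needs no parser library at all: read the bracketed string into nested tuples, then, for every S node, pair its first NP child with the verb and NP inside its VP child. A self-contained sketch of that idea (deliberately simplistic; real trees need more care about coordination, nesting, and missing objects):

```python
def parse_tree(s):
    """Parse a Penn-Treebank bracketed string into (label, children)
    tuples; leaf words are plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def walk():
        nonlocal pos
        pos += 1                      # skip '('
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(walk())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ')'
        return (label, children)

    return walk()

def words(node):
    """Collect the leaf words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]

def find_triplets(node, out):
    """Append (subject NP, verb, object NP) for every S node that has an
    NP child and a VP child containing a verb and an NP."""
    if isinstance(node, str):
        return
    label, children = node
    subtrees = [c for c in children if not isinstance(c, str)]
    if label == "S":
        np = next((c for c in subtrees if c[0] == "NP"), None)
        vp = next((c for c in subtrees if c[0] == "VP"), None)
        if np and vp:
            vp_subtrees = [c for c in vp[1] if not isinstance(c, str)]
            verb = next((c for c in vp_subtrees if c[0].startswith("VB")), None)
            obj = next((c for c in vp_subtrees if c[0] == "NP"), None)
            if verb and obj:
                out.append((" ".join(words(np)),
                            " ".join(words(verb)),
                            " ".join(words(obj))))
    for child in children:
        find_triplets(child, out)

tree = ("(ROOT (S (S (NP (NNP Exercise)) (VP (VBZ reduces) (NP (NN stress)))"
        " (. .)) (NP (JJ Regular) (NN exercise)) (VP (VBZ maintains)"
        " (NP (JJ mental) (NN fitness))) (. .)))")
triplets = []
find_triplets(parse_tree(tree), triplets)
print(triplets)
```

With nltk installed, `nltk.Tree.fromstring` plus `tree.subtrees()` would replace the hand-rolled `parse_tree` and traversal here.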

Is there a BNF with arguments for non-terminal symbols?

会有一股神秘感。 submitted on 2019-12-23 13:05:36

Question: When working with Prolog DCGs to parse input, it is nice to have an accompanying BNF for the grammar. For example, as BNF:

<Sentence> ::= <Noun_phrase> <Verb_phrase>
<Noun_phrase> ::= <Determiner> <Noun>
<Verb_phrase> ::= <Verb> <Noun_phrase>
<Determiner> ::= a
<Determiner> ::= the
<Noun> ::= cat
<Noun> ::= mouse
<Verb> ::= scares
<Verb> ::= hates

and as a Prolog DCG:

sentence --> noun_phrase, verb_phrase.
verb_phrase --> verb, noun_phrase.
noun_phrase --> determiner, noun.
determiner --> [a].
determiner --> [the]
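The DCG above can be mirrored in any language by giving each nonterminal a function that consumes a prefix of the token list and returns the remainder (the analogue of DCG difference lists). A small Python sketch of exactly this grammar:

```python
def word(w):
    """Terminal: match one literal token."""
    def parse(tokens):
        if tokens is not None and tokens and tokens[0] == w:
            return tokens[1:]
        return None
    return parse

def alt(*parsers):
    """Ordered choice: the first alternative that succeeds wins."""
    def parse(tokens):
        for p in parsers:
            rest = p(tokens)
            if rest is not None:
                return rest
        return None
    return parse

def seq(tokens, *parsers):
    """Sequence: thread the remaining tokens through each parser."""
    for p in parsers:
        if tokens is None:
            return None
        tokens = p(tokens)
    return tokens

determiner = alt(word("a"), word("the"))
noun = alt(word("cat"), word("mouse"))
verb = alt(word("scares"), word("hates"))

def noun_phrase(tokens):
    return seq(tokens, determiner, noun)

def verb_phrase(tokens):
    return seq(tokens, verb, noun_phrase)

def sentence(tokens):
    return seq(tokens, noun_phrase, verb_phrase)

def accepts(text):
    """A sentence is accepted when it is consumed completely."""
    return sentence(text.split()) == []

print(accepts("the cat scares a mouse"))  # -> True
print(accepts("the cat hates"))           # -> False
```

DCG "arguments" on nonterminals then correspond to these functions taking and returning extra values (e.g. a parse tree or an agreement feature) alongside the remaining token list.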

Difference between IOB Accuracy and Precision

◇◆丶佛笑我妖孽 submitted on 2019-12-23 12:54:22

Question: I'm doing some work with NLTK on named-entity recognition and chunkers. I retrained a classifier using nltk/chunk/named_entity.py and got the following measures:

ChunkParse score:
  IOB Accuracy: 96.5%
  Precision:    78.0%
  Recall:       91.9%
  F-Measure:    84.4%

But I don't understand the exact difference between IOB accuracy and precision in this case. Actually, I found in the docs (here) the following for a specific example: The IOB tag accuracy indicates that more than a third of the
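The two numbers are computed over different units: IOB accuracy is the fraction of individual tokens whose B/I/O tag is correct, while precision is the fraction of predicted chunks (maximal B-/I- spans) that exactly match a gold chunk. A single wrong tag can therefore leave accuracy high while breaking a chunk boundary and hurting precision. A small illustration with made-up tag sequences:

```python
def chunk_spans(tags):
    """Extract (start, end, type) chunks from a well-formed IOB2 sequence."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(tags + ["O"]):      # sentinel closes the last chunk
        if tag == "O" or tag.startswith("B-"):
            if start is not None:
                spans.add((start, i, kind))
                start = None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans

gold = ["B-PER", "I-PER", "O", "O", "B-LOC"]
pred = ["B-PER", "O",     "O", "O", "B-LOC"]  # one token mistagged

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
gold_chunks, pred_chunks = chunk_spans(gold), chunk_spans(pred)
precision = len(gold_chunks & pred_chunks) / len(pred_chunks)

print(accuracy)   # 4 of 5 tags correct  -> 0.8
print(precision)  # 1 of 2 chunks exact  -> 0.5
```

This is why a 96.5% IOB accuracy can coexist with 78% precision: most tokens are O and easy to tag, but a chunk counts as correct only when every one of its tags and both boundaries are right.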

Generate ngrams with Julia

你说的曾经没有我的故事 submitted on 2019-12-23 12:42:57

Question: To generate word bigrams in Julia, I could simply zip the original list with a list that drops the first element, e.g.:

julia> s = split("the lazy fox jumps over the brown dog")
8-element Array{SubString{String},1}:
 "the"
 "lazy"
 "fox"
 "jumps"
 "over"
 "the"
 "brown"
 "dog"

julia> collect(zip(s, drop(s,1)))
7-element Array{Tuple{SubString{String},SubString{String}},1}:
 ("the","lazy")
 ("lazy","fox")
 ("fox","jumps")
 ("jumps","over")
 ("over","the")
 ("the","brown")
 ("brown","dog")

To generate a
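The zip-of-shifted-copies idea generalizes directly to n-grams: zip n views of the sequence, each shifted one position further (in Julia, zipping s with successive drops of s). The same pattern expressed as a Python sketch:

```python
def ngrams(seq, n):
    """All contiguous n-grams of seq, by zipping n shifted views of it."""
    return list(zip(*(seq[i:] for i in range(n))))

s = "the lazy fox jumps over the brown dog".split()
print(ngrams(s, 2)[:2])   # -> [('the', 'lazy'), ('lazy', 'fox')]
print(len(ngrams(s, 3)))  # 8 words give 6 trigrams
```

Because `zip` stops at the shortest view, a sequence of length L yields L - n + 1 n-grams with no explicit bounds handling.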