nlp

How can I determine the head word in a sentence using NLP?

夙愿已清 submitted on 2019-12-23 20:40:54

Question: For example, if I have been given a sentence: "A British soldier was killed in the fighting in Afghanistan." The head word of that sentence is "killed". How can I find it, given the nltk package in Python? I am not talking about stemming; I am referring to the head word.

Answer 1: You are looking for sentence parsing. It's available in Python's nltk package, as you can see in this link. It's also closely related to dependency parsing, as you can see from the Stanford NLP package in this link and
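NLTK itself does not ship a trained dependency parser, so the usual route is an external parser (spaCy, Stanford CoreNLP) whose output marks one token as the root. Whatever tool you use, the head word falls out of the dependency structure: it is the token attached to the artificial root. A minimal pure-Python sketch over CoNLL-style (word, head_index) pairs, where the head indices below are illustrative, not real parser output:

```python
def find_head(tokens):
    """Return the word whose head index is 0 (the ROOT) in a
    CoNLL-style list of (word, head_index) pairs, 1-indexed."""
    for word, head in tokens:
        if head == 0:
            return word
    return None

# Hypothetical dependency analysis of the example sentence
# (head indices are illustrative, not actual parser output):
parsed = [
    ("A", 3), ("British", 3), ("soldier", 5),
    ("was", 5), ("killed", 0),          # head word: attached to ROOT (0)
    ("in", 8), ("the", 8), ("fighting", 5),
    ("in", 10), ("Afghanistan", 8),
]
print(find_head(parsed))  # -> killed
```

With spaCy, the equivalent one-liner would look for the token that is its own head (`[t for t in doc if t.head is t]`); with CoreNLP, for the dependent of the ROOT relation.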

Keras Text Preprocessing - Saving Tokenizer object to file for scoring

风流意气都作罢 submitted on 2019-12-23 18:53:39

Question: I've trained a sentiment classifier model using the Keras library by following the steps below (broadly):

1. Convert the text corpus into sequences using the Tokenizer object/class
2. Build a model using the model.fit() method
3. Evaluate the model

Now, for scoring with this model, I was able to save the model to a file and load it from a file. However, I've not found a way to save the Tokenizer object to a file. Without this I'll have to process the corpus every time I need to score even a single sentence. Is there a
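One common workaround (plain Python serialization, not a Keras-specific API) is to pickle the fitted Tokenizer next to the saved model and unpickle it at scoring time. A sketch, demonstrated here with a stand-in dictionary since a real fitted Tokenizer works the same way:

```python
import pickle
import tempfile

def save_object(obj, path):
    """Serialize any picklable object (e.g. a fitted Tokenizer) to disk."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_object(path):
    """Restore a previously pickled object."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Round-trip demo with a stand-in object:
path = tempfile.NamedTemporaryFile(suffix=".pickle", delete=False).name
save_object({"word_index": {"the": 1, "cat": 2}}, path)
restored = load_object(path)
print(restored["word_index"]["cat"])  # -> 2
```

Newer Keras versions also provide `tokenizer.to_json()` together with `keras.preprocessing.text.tokenizer_from_json()`, which sidesteps pickle's version-compatibility caveats; check whether your installed version has them.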

BucketIterator throws 'Field' object has no attribute 'vocab'

▼魔方 西西 submitted on 2019-12-23 18:43:52

Question: This is not a new question; references I found without any working solution: first and second. I'm a newbie to PyTorch, facing AttributeError: 'Field' object has no attribute 'vocab' while creating batches of text data in PyTorch using torchtext. Following the book Deep Learning with PyTorch, I wrote the same example as explained in the book. Here's the snippet:

from torchtext import data
from torchtext import datasets
from torchtext.vocab import GloVe

TEXT = data.Field(lower=True,

psutil.AccessDenied Error while trying to load StanfordCoreNLP

喜夏-厌秋 submitted on 2019-12-23 18:12:57

Question: I'm trying to load the StanfordCoreNLP package to get the correct parsing for the movie reviews presented on their page (https://nlp.stanford.edu/sentiment/treebank.html). (I'm using a Mac.)

nlp = StanfordCoreNLP("/Users//NLP_models/stanford-corenlp-full-2018-01-31")

But I get the error:

Traceback (most recent call last):
  File "/Users/anaconda3/lib/python3.6/site-packages/psutil/_psosx.py", line 295, in wrapper
    return fun(self, *args, **kwargs)
  File "/Users/anaconda3/lib/python3.6/site-packages

JAPE rule: Sentence contains multiple cases

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-23 16:34:41

Question: How can I check whether a sentence contains a combination? For example, consider the sentence: "John appointed as new CEO for google." I need to write a rule to check whether the sentence contains < 'new' + 'Jobtitle' >. How can I achieve this? I tried the following; I need to check whether there is 'new' before the word.

Rule: CustomRules
(
  {
    Sentence contains {Lookup.majorType == "organization"},
    Sentence contains {Lookup.majorType == "jobtitle"},
    Sentence contains {Lookup.majorType == "person_first"}
  }
)

Answer 1: One way
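Outside of GATE/JAPE, the underlying check is just token adjacency: does the literal token "new" immediately precede a token recognized as a job title? A plain-Python sketch of that idea (the token list and the job-title set below are illustrative stand-ins, not GATE annotations):

```python
def new_jobtitle_follows(tokens, jobtitles):
    """True if the literal token 'new' is immediately followed by a
    token whose lowercase form is in the job-title gazetteer."""
    return any(tok.lower() == "new" and tokens[i + 1].lower() in jobtitles
               for i, tok in enumerate(tokens[:-1]))

sentence = "John appointed as new CEO for google".split()
print(new_jobtitle_follows(sentence, {"ceo", "cto", "director"}))  # -> True
```

In JAPE itself the analogous move is to match the annotations in sequence on the left-hand side, e.g. {Token.string == "new"} followed by {Lookup.majorType == "jobtitle"}, rather than testing each with a separate Sentence contains clause.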

Are high values for c or gamma problematic when using an RBF kernel SVM?

拟墨画扇 submitted on 2019-12-23 16:27:30

Question: I'm using WEKA/LibSVM to train a classifier for a term extraction system. My data is not linearly separable, so I used an RBF kernel instead of a linear one. I followed the guide by Hsu et al. and iterated over several values for both C and gamma. The parameters that worked best for classifying known terms (test and training material differ, of course) are rather high: C = 2^10 and gamma = 2^3. So far the high parameters seem to work OK, yet I wonder whether they may cause any problems further on,
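One way to see why a large gamma can be risky: the RBF kernel K(x, y) = exp(-gamma * ||x - y||^2) collapses toward 0 as gamma grows, so distinct training points become nearly orthogonal and the decision surface can wrap tightly around individual examples (overfitting), while a large C weakens the regularization that would counteract this. A small numeric sketch with toy points (not the actual WEKA data):

```python
import math

def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = (0.0, 0.0), (1.0, 1.0)  # squared distance = 2
for gamma in (2 ** -3, 2 ** 0, 2 ** 3):
    # As gamma grows, the similarity between distinct points collapses:
    print(f"gamma={gamma:>6}: K = {rbf_kernel(x, y, gamma):.2e}")
```

If cross-validated performance holds up at C = 2^10, gamma = 2^3, the values are defensible; the standard safeguard is exactly the grid search with cross-validation that Hsu et al. describe, rather than any absolute cap on the parameters.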

Extract NP-VP-NP from a Stanford dependency parse tree

别说谁变了你拦得住时间么 submitted on 2019-12-23 15:40:25

Question: I need to extract triplets of the form NP-VP-NP from the dependency parse tree produced as the output of lexicalized parsing in the Stanford Parser. What's the best way to do this? For example, if the parse tree is as follows:

(ROOT
  (S
    (S (NP (NNP Exercise)) (VP (VBZ reduces) (NP (NN stress))) (. .))
    (NP (JJ Regular) (NN exercise))
    (VP (VBZ maintains) (NP (JJ mental) (NN fitness)))
    (. .)))

I need to extract two triplets: Exercise-reduces-stress and Regular exercise-maintains-mental fitness. Any ideas?

Answer 1:
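For the bracketed (constituency) tree shown above, one workable approach needs no parser library at all: read the bracketed string into nested tuples, then, for every S node, pair its first NP child with the verb and NP inside its VP child. A self-contained sketch of that idea (deliberately simplistic; real trees need more care about coordination, nesting, and missing objects):

```python
def parse_tree(s):
    """Parse a Penn-Treebank bracketed string into (label, children)
    tuples; leaf words are plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def walk():
        nonlocal pos
        pos += 1                      # skip '('
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(walk())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ')'
        return (label, children)

    return walk()

def words(node):
    """Collect the leaf words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]

def find_triplets(node, out):
    """Append (subject NP, verb, object NP) for every S node that has an
    NP child and a VP child containing a verb and an NP."""
    if isinstance(node, str):
        return
    label, children = node
    subtrees = [c for c in children if not isinstance(c, str)]
    if label == "S":
        np = next((c for c in subtrees if c[0] == "NP"), None)
        vp = next((c for c in subtrees if c[0] == "VP"), None)
        if np and vp:
            vp_subtrees = [c for c in vp[1] if not isinstance(c, str)]
            verb = next((c for c in vp_subtrees if c[0].startswith("VB")), None)
            obj = next((c for c in vp_subtrees if c[0] == "NP"), None)
            if verb and obj:
                out.append((" ".join(words(np)),
                            " ".join(words(verb)),
                            " ".join(words(obj))))
    for child in children:
        find_triplets(child, out)

tree = ("(ROOT (S (S (NP (NNP Exercise)) (VP (VBZ reduces) (NP (NN stress)))"
        " (. .)) (NP (JJ Regular) (NN exercise)) (VP (VBZ maintains)"
        " (NP (JJ mental) (NN fitness))) (. .)))")
triplets = []
find_triplets(parse_tree(tree), triplets)
print(triplets)
```

With nltk installed, `nltk.Tree.fromstring` plus `tree.subtrees()` would replace the hand-rolled `parse_tree` and traversal here.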

Is there a BNF with arguments for non-terminal symbols?

会有一股神秘感。 submitted on 2019-12-23 13:05:36

Question: When working with Prolog DCGs to parse input, it is nice to have an accompanying BNF for the grammar. For example, as BNF:

<Sentence> ::= <Noun_phrase> <Verb_phrase>
<Noun_phrase> ::= <Determiner> <Noun>
<Verb_phrase> ::= <Verb> <Noun_phrase>
<Determiner> ::= a
<Determiner> ::= the
<Noun> ::= cat
<Noun> ::= mouse
<Verb> ::= scares
<Verb> ::= hates

and as a Prolog DCG:

sentence --> noun_phrase, verb_phrase.
verb_phrase --> verb, noun_phrase.
noun_phrase --> determiner, noun.
determiner --> [a].
determiner --> [the]
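The DCG above can be mirrored in any language by giving each nonterminal a function that consumes a prefix of the token list and returns the remainder (the analogue of DCG difference lists). A small Python sketch of exactly this grammar:

```python
def word(w):
    """Terminal: match one literal token."""
    def parse(tokens):
        if tokens is not None and tokens and tokens[0] == w:
            return tokens[1:]
        return None
    return parse

def alt(*parsers):
    """Ordered choice: the first alternative that succeeds wins."""
    def parse(tokens):
        for p in parsers:
            rest = p(tokens)
            if rest is not None:
                return rest
        return None
    return parse

def seq(tokens, *parsers):
    """Sequence: thread the remaining tokens through each parser."""
    for p in parsers:
        if tokens is None:
            return None
        tokens = p(tokens)
    return tokens

determiner = alt(word("a"), word("the"))
noun = alt(word("cat"), word("mouse"))
verb = alt(word("scares"), word("hates"))

def noun_phrase(tokens):
    return seq(tokens, determiner, noun)

def verb_phrase(tokens):
    return seq(tokens, verb, noun_phrase)

def sentence(tokens):
    return seq(tokens, noun_phrase, verb_phrase)

def accepts(text):
    """A sentence is accepted when it is consumed completely."""
    return sentence(text.split()) == []

print(accepts("the cat scares a mouse"))  # -> True
print(accepts("the cat hates"))           # -> False
```

DCG "arguments" on nonterminals then correspond to these functions taking and returning extra values (e.g. a parse tree or an agreement feature) alongside the remaining token list.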

Difference between IOB Accuracy and Precision

◇◆丶佛笑我妖孽 submitted on 2019-12-23 12:54:22

Question: I'm doing some work with NLTK on named-entity recognition and chunkers. I retrained a classifier using nltk/chunk/named_entity.py and got the following measures:

ChunkParse score:
  IOB Accuracy: 96.5%
  Precision:    78.0%
  Recall:       91.9%
  F-Measure:    84.4%

But I don't understand the exact difference between IOB accuracy and precision in this case. Actually, I found in the docs (here) the following for a specific example: The IOB tag accuracy indicates that more than a third of the
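The two numbers are computed over different units: IOB accuracy is the fraction of individual tokens whose B/I/O tag is correct, while precision is the fraction of predicted chunks (maximal B-/I- spans) that exactly match a gold chunk. A single wrong tag can therefore leave accuracy high while breaking a chunk boundary and hurting precision. A small illustration with made-up tag sequences:

```python
def chunk_spans(tags):
    """Extract (start, end, type) chunks from a well-formed IOB2 sequence."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(tags + ["O"]):      # sentinel closes the last chunk
        if tag == "O" or tag.startswith("B-"):
            if start is not None:
                spans.add((start, i, kind))
                start = None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans

gold = ["B-PER", "I-PER", "O", "O", "B-LOC"]
pred = ["B-PER", "O",     "O", "O", "B-LOC"]  # one token mistagged

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
gold_chunks, pred_chunks = chunk_spans(gold), chunk_spans(pred)
precision = len(gold_chunks & pred_chunks) / len(pred_chunks)

print(accuracy)   # 4 of 5 tags correct  -> 0.8
print(precision)  # 1 of 2 chunks exact  -> 0.5
```

This is why a 96.5% IOB accuracy can coexist with 78% precision: most tokens are O and easy to tag, but a chunk counts as correct only when every one of its tags and both boundaries are right.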

Generate ngrams with Julia

你说的曾经没有我的故事 submitted on 2019-12-23 12:42:57

Question: To generate word bigrams in Julia, I could simply zip the original list with a list that drops the first element, e.g.:

julia> s = split("the lazy fox jumps over the brown dog")
8-element Array{SubString{String},1}:
 "the"
 "lazy"
 "fox"
 "jumps"
 "over"
 "the"
 "brown"
 "dog"

julia> collect(zip(s, drop(s,1)))
7-element Array{Tuple{SubString{String},SubString{String}},1}:
 ("the","lazy")
 ("lazy","fox")
 ("fox","jumps")
 ("jumps","over")
 ("over","the")
 ("the","brown")
 ("brown","dog")

To generate a
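The zip-of-shifted-copies idea generalizes directly to n-grams: zip n views of the sequence, each shifted one position further (in Julia, zipping s with successive drops of s). The same pattern expressed as a Python sketch:

```python
def ngrams(seq, n):
    """All contiguous n-grams of seq, by zipping n shifted views of it."""
    return list(zip(*(seq[i:] for i in range(n))))

s = "the lazy fox jumps over the brown dog".split()
print(ngrams(s, 2)[:2])   # -> [('the', 'lazy'), ('lazy', 'fox')]
print(len(ngrams(s, 3)))  # 8 words give 6 trigrams
```

Because `zip` stops at the shortest view, a sequence of length L yields L - n + 1 n-grams with no explicit bounds handling.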