nlp

How to search for a word in an XML file and print it in Python

纵饮孤独 submitted on 2020-01-05 11:07:30
Question: I want to search for a specific word (entered by the user) in an .xml file. This is my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<words>
  <entry>
    <word>John</word>
    <pron>()</pron>
    <gram>[Noun]</gram>
    <poem></poem>
    <meanings>
      <meaning>name</meaning>
    </meanings>
  </entry>
</words>

Here is my code:

import nltk
from nltk.tokenize import word_tokenize
import os
import xml.etree.ElementTree as etree

sen = input("Enter Your sentence - ")
print(sen)
print("\n")
print(word_tokenize(sen)[0])
tree =
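The snippet above is cut off at `tree =`, so the matching logic is not shown. A minimal sketch of how it could continue with `xml.etree.ElementTree` (the function name and case-insensitive comparison are my assumptions, not the poster's code):

```python
import xml.etree.ElementTree as etree

def find_word(xml_path, target):
    # Parse the file and compare the user's word against every
    # <word> element, case-insensitively; return (word, meaning) pairs.
    tree = etree.parse(xml_path)
    root = tree.getroot()
    matches = []
    for entry in root.iter("entry"):
        word = entry.findtext("word", default="")
        if word.lower() == target.lower():
            meaning = entry.findtext("meanings/meaning", default="")
            matches.append((word, meaning))
    return matches
```

With the XML from the question, `find_word(path, "john")` would return `[("John", "name")]`; the first token of the tokenized sentence could be passed in as `target`.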

Arabic WordNet with unformatted words

梦想的初衷 submitted on 2020-01-05 08:47:42
Question: Is it necessary for the word input to WordNet to be formatted like "التُّفَّاحْ", or can it accept "التفاح"? Is there any library or service that takes an unformatted Arabic word and returns a list of all its possible synonyms?

Answer 1: To get from التُّفَّاحْ to التفاح you simply want to remove the diacritics, so you need a lexical normalization tool. Try Tashaphyne: download and install it, then use the normalize module (http://pythonhosted.org/Tashaphyne/Tashaphyne.normalize-module.html):

from Tashaphyne
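The answer's Tashaphyne snippet is cut off. Independently of that library, the diacritic-stripping step itself can be done with the standard library alone, since Arabic diacritics (fatha, damma, shadda, sukun, ...) are Unicode combining marks. A stdlib sketch (this is my alternative, not the answerer's code):

```python
import unicodedata

def strip_diacritics(text):
    # Decompose, then drop every combining mark (Unicode category "Mn").
    # For Arabic this removes the tashkeel and leaves the base letters.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

`strip_diacritics("التُّفَّاحْ")` yields "التفاح", which can then be looked up in Arabic WordNet.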

Python - Can a web server avoid importing for every request?

若如初见. submitted on 2020-01-05 08:17:10
Question: I'm working on a Python project, currently using Django, which does quite a bit of NLP work in a form post process. I'm using the NLTK package, and by profiling my code and experimenting I've realised that the majority of the time is spent importing NLTK and various other packages. My question is: is there a way I can have this server start up, do these imports, and then just wait for requests, passing them to a function that uses the already-imported packages?
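This is in fact Python's default behaviour in any long-lived server process: imports are cached in `sys.modules`, so under a persistent WSGI worker the NLTK import cost is paid once at startup, and repeat imports inside request handlers are dictionary lookups. A small sketch of the caching (using `json` as a stand-in for a heavy package like NLTK):

```python
import sys

# The first import pays the full module-load cost; any later import
# of the same name is served from the sys.modules cache and returns
# the very same module object.
import json          # stands in for a heavy package such as nltk
import json as again # second import: a cheap cache hit

assert json is again
assert sys.modules["json"] is json
```

So the practical fix is to move the NLTK imports (and any expensive model loading) to module level, and run Django under a persistent server (e.g. a WSGI worker) rather than spawning a fresh interpreter per request.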

quanteda kwic regex operation

六月ゝ 毕业季﹏ submitted on 2020-01-05 06:47:59
Question: Further edit to the original question. The question originated from the expectation that regexes would work identically, or nearly identically, to "grep" or to some programming language. The session below is what I expected, and the fact that it did not happen generated my question (using cygwin):

echo "regex unusual operation will deport into a different" > out.txt
grep "will * dep" out.txt
"regex unusual operation will deport into a different"

Original question: Trying to follow https://github.com/kbenoit/ITAUR/blob/master

Semantic Similarity across multiple languages

感情迁移 submitted on 2020-01-05 05:36:06
Question: I am using word embeddings to find the similarity between two sentences. Using word2vec, I also get a similarity measure if one sentence is in English and the other in Dutch (though not a very good one). So I started wondering whether it is possible to compute the similarity between two sentences in two different languages (without an explicit translation), especially if the languages have some similarities (English/Dutch)?

Answer 1: Let's assume that your sentence-similarity scheme uses only word-vectors
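The answer is truncated, but the usual word-vector sentence-similarity scheme it alludes to is: average the word vectors of each sentence, then take the cosine of the two averages. For this to be meaningful across languages, the two embedding spaces must be aligned into one (e.g. with cross-lingually aligned fastText/MUSE vectors); the toy 2-d embeddings below are a stand-in for such a shared space:

```python
import math

def sentence_vector(words, embeddings):
    # Average the vectors of the words that have an embedding;
    # out-of-vocabulary words are simply skipped.
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

With an aligned space, the English "cat" and Dutch "kat" land near each other, so sentences using them score as similar even without translation.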

Lemmatizing words after POS tagging produces unexpected results

老子叫甜甜 submitted on 2020-01-05 03:54:05
Question: I am using Python 3.5 with the NLTK pos_tag function and the WordNetLemmatizer. My goal is to flatten words in our database to classify text. While testing the lemmatizer I encounter strange behavior when using the POS tagger on identical tokens. In the example below, I have a list of three strings, and when running them through the POS tagger every other element is returned as a noun (NN) and the rest are returned as verbs (VBG). This affects the lemmatization. The output looks like
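A common cause of this behavior is tagging tokens one at a time (or as an unnatural token list), so the tagger has no sentence context. The other half of the fix is that the WordNet lemmatizer does not understand Penn Treebank tags like "VBG"; it needs one of "n", "v", "a", "r" and defaults to noun. A small mapping helper (the function name is mine; the mapping itself follows the standard Treebank-to-WordNet convention):

```python
def treebank_to_wordnet(tag):
    # WordNetLemmatizer expects "n", "v", "a", or "r". Map the Penn
    # Treebank tag's first letter onto those, defaulting to noun
    # (which is also NLTK's own default POS for lemmatization).
    prefix_map = {"J": "a", "V": "v", "N": "n", "R": "r"}
    return prefix_map.get(tag[:1], "n")
```

Usage would then be along the lines of `lemmatizer.lemmatize(token, treebank_to_wordnet(tag))` for each `(token, tag)` pair returned by `pos_tag` on the full sentence.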

Mapping Wordnet Senses to Verbnet

血红的双手。 submitted on 2020-01-04 15:32:27
Question: http://digital.library.unt.edu/ark:/67531/metadc30973/m2/1/high_res_d/Mihalcea-2005-Putting_Pieces_Together-Combining_FrameNet.pdf In the link above, on the sixth page, the paper mentions that a mapping was made: "The process of mapping VerbNet to WordNet is thus semi-automatic. We first manually link all semantic constraints defined in VerbNet (there are 36 such constraints) to one or more nodes in the WordNet semantic hierarchy." I am trying to use this mapping in NLTK Python with Verbnet
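As far as I recall, the hook NLTK exposes for this is that VerbNet member entries carry WordNet sense keys (e.g. `verbnet.classids()` accepts a lemma, and the VerbNet XML `MEMBER` elements have a `wn` attribute like "spray%2:35:00"); check your NLTK version's `nltk.corpus.verbnet` docs for the exact accessors. A small helper for splitting such sense strings into their parts (the field names follow the WordNet sense-key convention; the function itself is my sketch):

```python
def parse_wn_sense(sense):
    # VerbNet member entries reference WordNet senses of the form
    # "lemma%ss_type:lex_filenum:lex_id", e.g. "spray%2:35:00".
    lemma, _, rest = sense.partition("%")
    fields = rest.split(":")
    return {
        "lemma": lemma,
        "ss_type": fields[0],      # 1=noun, 2=verb, ...
        "lex_filenum": fields[1],  # lexicographer file number
        "lex_id": fields[2],       # sense id within that file
    }
```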

How to use entitymentions annotator in stanford CoreNLP?

巧了我就是萌 submitted on 2020-01-04 13:38:29
Question: I am trying the newest version of Stanford CoreNLP. When I extract location or organisation names, I see that every single word is tagged with the annotation. So, if the entity is "NEW YORK TIMES", it gets recorded as three different entities: "NEW", "YORK" and "TIMES". I see that the newest CoreNLP has an "entitymentions" annotator; I think this annotator may help me solve this problem. However, there is no usage instruction or example for this annotator. Could anyone give me
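The usual recipe is to add "entitymentions" after the NER stages in the annotator list; CoreNLP then merges adjacent same-type tokens into one mention per entity, listed under an "entitymentions" array in each sentence of the JSON output (in recent versions the "ner" annotator can run this step implicitly; verify against your version's docs). A sketch of the request properties and of pulling mention texts out of a response-shaped dict, as I recall the JSON layout:

```python
# Properties to send to a CoreNLP server's annotate endpoint:
# entitymentions needs the NER pipeline stages before it.
props = {
    "annotators": "tokenize,ssplit,pos,lemma,ner,entitymentions",
    "outputFormat": "json",
}

def mention_texts(response):
    # Each sentence in CoreNLP's JSON output carries its merged
    # mentions under "entitymentions"; collect their surface text.
    return [m["text"]
            for sentence in response.get("sentences", [])
            for m in sentence.get("entitymentions", [])]
```

With this, "NEW YORK TIMES" should come back as a single mention rather than three separate entities.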

Use string as input in Keras IMDB example

不羁岁月 submitted on 2020-01-04 05:51:30
Question: I was looking at the Keras IMDB movie-review sentiment classification example (and the corresponding model on GitHub), which learns to decide whether a review is positive or negative. The data has been preprocessed so that each review is encoded as a sequence of integers; e.g. the review "This movie is awesome!" would be [11, 17, 6, 1187], and for this input the model outputs 'positive'. The dataset also makes available the word index used to encode the sequences, i.e. I know
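To feed a raw string to such a model, the same encoding must be reproduced: by default Keras' `imdb.load_data` shifts every id from `imdb.get_word_index()` by `index_from=3` and reserves 0 (padding), 1 (start), and 2 (out-of-vocabulary). A sketch of the encoding step (the function name and naive whitespace tokenization are mine; the offsets are the Keras defaults):

```python
def encode_review(text, word_index, index_from=3, start_index=1, oov_index=2):
    # Mirror imdb.load_data's default encoding: prepend the start
    # marker, shift known word ids by index_from, and map unknown
    # words to the out-of-vocabulary index.
    encoded = [start_index]
    for token in text.lower().split():
        idx = word_index.get(token)
        encoded.append(idx + index_from if idx is not None else oov_index)
    return encoded
```

The resulting list can then be padded with `pad_sequences` and passed to `model.predict` exactly like the preprocessed training data.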
