nlp

How to search for a word in an XML file and print it in Python

纵饮孤独 submitted on 2020-01-05 11:07:30
Question: I want to search for a specific word (entered by the user) in an .xml file. This is my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<words>
  <entry>
    <word>John</word>
    <pron>()</pron>
    <gram>[Noun]</gram>
    <poem></poem>
    <meanings>
      <meaning>name</meaning>
    </meanings>
  </entry>
</words>

Here is my code:

import nltk
from nltk.tokenize import word_tokenize
import os
import xml.etree.ElementTree as etree

sen = input("Enter Your sentence - ")
print(sen)
print("\n")
print(word_tokenize(sen)[0])
tree =
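The snippet above is cut off at `tree =`, so the matching logic is not shown. A minimal sketch of how it could continue with `xml.etree.ElementTree` (the function name and case-insensitive comparison are my assumptions, not the poster's code):

```python
import xml.etree.ElementTree as etree

def find_word(xml_path, target):
    # Parse the file and compare the user's word against every
    # <word> element, case-insensitively; return (word, meaning) pairs.
    tree = etree.parse(xml_path)
    root = tree.getroot()
    matches = []
    for entry in root.iter("entry"):
        word = entry.findtext("word", default="")
        if word.lower() == target.lower():
            meaning = entry.findtext("meanings/meaning", default="")
            matches.append((word, meaning))
    return matches
```

With the XML from the question, `find_word(path, "john")` would return `[("John", "name")]`; the first token of the tokenized sentence could be passed in as `target`.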

Arabic WordNet with unformatted words

梦想的初衷 submitted on 2020-01-05 08:47:42
Question: Is it necessary for the word input to WordNet to be formatted like "التُّفَّاحْ", or can it accept "التفاح"? Is there any library or service that takes an unformatted Arabic word and returns a list of all its possible synonyms?

Answer 1: To get from التُّفَّاحْ to التفاح you simply want to remove the diacritics, so you need a lexical normalization tool. Try Tashaphyne: download and install it, then use the normalize module (http://pythonhosted.org/Tashaphyne/Tashaphyne.normalize-module.html):

from Tashaphyne
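The answer's Tashaphyne snippet is cut off. Independently of that library, the diacritic-stripping step itself can be done with the standard library alone, since Arabic diacritics (fatha, damma, shadda, sukun, ...) are Unicode combining marks. A stdlib sketch (this is my alternative, not the answerer's code):

```python
import unicodedata

def strip_diacritics(text):
    # Decompose, then drop every combining mark (Unicode category "Mn").
    # For Arabic this removes the tashkeel and leaves the base letters.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

`strip_diacritics("التُّفَّاحْ")` yields "التفاح", which can then be looked up in Arabic WordNet.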

Python - Can a web server avoid importing for every request?

若如初见. submitted on 2020-01-05 08:17:10
Question: I'm working on a Python project, currently using Django, which does quite a bit of NLP work in a form post process. I'm using the NLTK package, and by profiling my code and experimenting I've realised that the majority of the time is spent importing NLTK and various other packages. My question is: is there a way I can have this server start up, do these imports, and then just wait for requests, passing them to a function that uses the already-imported packages?
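This is in fact Python's default behaviour in any long-lived server process: imports are cached in `sys.modules`, so under a persistent WSGI worker the NLTK import cost is paid once at startup, and repeat imports inside request handlers are dictionary lookups. A small sketch of the caching (using `json` as a stand-in for a heavy package like NLTK):

```python
import sys

# The first import pays the full module-load cost; any later import
# of the same name is served from the sys.modules cache and returns
# the very same module object.
import json          # stands in for a heavy package such as nltk
import json as again # second import: a cheap cache hit

assert json is again
assert sys.modules["json"] is json
```

So the practical fix is to move the NLTK imports (and any expensive model loading) to module level, and run Django under a persistent server (e.g. a WSGI worker) rather than spawning a fresh interpreter per request.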

quanteda kwic regex operation

六月ゝ 毕业季﹏ submitted on 2020-01-05 06:47:59
Question: Further edit to the original question. The question originated from the expectation that regexes would work identically, or nearly identically, to "grep" or to some programming language. The session below is what I expected, and the fact that it did not happen generated my question (using cygwin):

echo "regex unusual operation will deport into a different" > out.txt
grep "will * dep" out.txt
"regex unusual operation will deport into a different"

Original question: Trying to follow https://github.com/kbenoit/ITAUR/blob/master

Semantic Similarity across multiple languages

感情迁移 submitted on 2020-01-05 05:36:06
Question: I am using word embeddings to find the similarity between two sentences. Using word2vec, I also get a similarity measure if one sentence is in English and the other in Dutch (though not a very good one). So I started wondering whether it is possible to compute the similarity between two sentences in two different languages (without an explicit translation), especially if the languages have some similarities (English/Dutch)?

Answer 1: Let's assume that your sentence-similarity scheme uses only word-vectors
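The answer is truncated, but the usual word-vector sentence-similarity scheme it alludes to is: average the word vectors of each sentence, then take the cosine of the two averages. For this to be meaningful across languages, the two embedding spaces must be aligned into one (e.g. with cross-lingually aligned fastText/MUSE vectors); the toy 2-d embeddings below are a stand-in for such a shared space:

```python
import math

def sentence_vector(words, embeddings):
    # Average the vectors of the words that have an embedding;
    # out-of-vocabulary words are simply skipped.
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

With an aligned space, the English "cat" and Dutch "kat" land near each other, so sentences using them score as similar even without translation.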

Lemmatizing words after POS tagging produces unexpected results

老子叫甜甜 submitted on 2020-01-05 03:54:05
Question: I am using Python 3.5 with the NLTK pos_tag function and the WordNetLemmatizer. My goal is to flatten words in our database to classify text. While testing the lemmatizer I encounter strange behavior when using the POS tagger on identical tokens. In the example below, I have a list of three strings, and when running them through the POS tagger every other element is returned as a noun (NN) and the rest are returned as verbs (VBG). This affects the lemmatization. The output looks like
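A common cause of this behavior is tagging tokens one at a time (or as an unnatural token list), so the tagger has no sentence context. The other half of the fix is that the WordNet lemmatizer does not understand Penn Treebank tags like "VBG"; it needs one of "n", "v", "a", "r" and defaults to noun. A small mapping helper (the function name is mine; the mapping itself follows the standard Treebank-to-WordNet convention):

```python
def treebank_to_wordnet(tag):
    # WordNetLemmatizer expects "n", "v", "a", or "r". Map the Penn
    # Treebank tag's first letter onto those, defaulting to noun
    # (which is also NLTK's own default POS for lemmatization).
    prefix_map = {"J": "a", "V": "v", "N": "n", "R": "r"}
    return prefix_map.get(tag[:1], "n")
```

Usage would then be along the lines of `lemmatizer.lemmatize(token, treebank_to_wordnet(tag))` for each `(token, tag)` pair returned by `pos_tag` on the full sentence.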

Mapping Wordnet Senses to Verbnet

血红的双手。 submitted on 2020-01-04 15:32:27
Question: http://digital.library.unt.edu/ark:/67531/metadc30973/m2/1/high_res_d/Mihalcea-2005-Putting_Pieces_Together-Combining_FrameNet.pdf In the link above, on the sixth page, the paper mentions that a mapping was made: "The process of mapping VerbNet to WordNet is thus semi-automatic. We first manually link all semantic constraints defined in VerbNet (there are 36 such constraints) to one or more nodes in the WordNet semantic hierarchy." I am trying to use this mapping in NLTK Python with Verbnet
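As far as I recall, the hook NLTK exposes for this is that VerbNet member entries carry WordNet sense keys (e.g. `verbnet.classids()` accepts a lemma, and the VerbNet XML `MEMBER` elements have a `wn` attribute like "spray%2:35:00"); check your NLTK version's `nltk.corpus.verbnet` docs for the exact accessors. A small helper for splitting such sense strings into their parts (the field names follow the WordNet sense-key convention; the function itself is my sketch):

```python
def parse_wn_sense(sense):
    # VerbNet member entries reference WordNet senses of the form
    # "lemma%ss_type:lex_filenum:lex_id", e.g. "spray%2:35:00".
    lemma, _, rest = sense.partition("%")
    fields = rest.split(":")
    return {
        "lemma": lemma,
        "ss_type": fields[0],      # 1=noun, 2=verb, ...
        "lex_filenum": fields[1],  # lexicographer file number
        "lex_id": fields[2],       # sense id within that file
    }
```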

How to use entitymentions annotator in stanford CoreNLP?

巧了我就是萌 submitted on 2020-01-04 13:38:29
Question: I am trying the newest version of Stanford CoreNLP. When I extract location or organisation names, I see that every single word is tagged with the annotation. So, if the entity is "NEW YORK TIMES", it gets recorded as three different entities: "NEW", "YORK" and "TIMES". I see that the newest CoreNLP has an "entitymentions" annotator; I think this annotator may help me solve this problem. However, there is no usage instruction or example for this annotator. Could anyone give me
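The usual recipe is to add "entitymentions" after the NER stages in the annotator list; CoreNLP then merges adjacent same-type tokens into one mention per entity, listed under an "entitymentions" array in each sentence of the JSON output (in recent versions the "ner" annotator can run this step implicitly; verify against your version's docs). A sketch of the request properties and of pulling mention texts out of a response-shaped dict, as I recall the JSON layout:

```python
# Properties to send to a CoreNLP server's annotate endpoint:
# entitymentions needs the NER pipeline stages before it.
props = {
    "annotators": "tokenize,ssplit,pos,lemma,ner,entitymentions",
    "outputFormat": "json",
}

def mention_texts(response):
    # Each sentence in CoreNLP's JSON output carries its merged
    # mentions under "entitymentions"; collect their surface text.
    return [m["text"]
            for sentence in response.get("sentences", [])
            for m in sentence.get("entitymentions", [])]
```

With this, "NEW YORK TIMES" should come back as a single mention rather than three separate entities.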

Use string as input in Keras IMDB example

不羁岁月 submitted on 2020-01-04 05:51:30
Question: I was looking at the Keras IMDB movie-review sentiment classification example (and the corresponding model on GitHub), which learns to decide whether a review is positive or negative. The data has been preprocessed so that each review is encoded as a sequence of integers; e.g. the review "This movie is awesome!" would be [11, 17, 6, 1187], and for this input the model outputs 'positive'. The dataset also makes available the word index used to encode the sequences, i.e. I know
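To feed a raw string to such a model, the same encoding must be reproduced: by default Keras' `imdb.load_data` shifts every id from `imdb.get_word_index()` by `index_from=3` and reserves 0 (padding), 1 (start), and 2 (out-of-vocabulary). A sketch of the encoding step (the function name and naive whitespace tokenization are mine; the offsets are the Keras defaults):

```python
def encode_review(text, word_index, index_from=3, start_index=1, oov_index=2):
    # Mirror imdb.load_data's default encoding: prepend the start
    # marker, shift known word ids by index_from, and map unknown
    # words to the out-of-vocabulary index.
    encoded = [start_index]
    for token in text.lower().split():
        idx = word_index.get(token)
        encoded.append(idx + index_from if idx is not None else oov_index)
    return encoded
```

The resulting list can then be padded with `pad_sequences` and passed to `model.predict` exactly like the preprocessed training data.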
