nltk

Semantic Similarity across multiple languages

Submitted by 感情迁移 on 2020-01-05 05:36:06
Question: I am using word embeddings for finding similarity between two sentences. Using word2vec, I also get a similarity measure if one sentence is in English and the other one in Dutch (though not a very good one). So I started wondering if it's possible to compute the similarity between two sentences in two different languages (without an explicit translation), especially if the languages have some similarities (English/Dutch)? Answer 1: Let's assume that your sentence-similarity scheme uses only word-vectors
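The word-vector approach the answer alludes to can be sketched without any trained model: average the word vectors of each sentence and compare the averages by cosine similarity. The embedding table below is purely hypothetical toy data placed in one shared space; in practice English and Dutch vectors live in separately trained spaces and must first be aligned (e.g. with a learned linear mapping).

```python
from math import sqrt

# Hypothetical toy vectors in one shared space (NOT real word2vec output).
EMB = {
    "cat":  [0.90, 0.10, 0.00],
    "kat":  [0.85, 0.15, 0.05],  # Dutch "cat"
    "sits": [0.10, 0.80, 0.10],
    "zit":  [0.12, 0.75, 0.10],  # Dutch "sits"
}

def sentence_vector(tokens):
    """Average the vectors of all tokens found in the lexicon."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

sim = cosine(sentence_vector(["cat", "sits"]), sentence_vector(["kat", "zit"]))
print(round(sim, 3))  # close to 1.0 for these toy near-synonym vectors
```

The same averaging-plus-cosine scheme works regardless of language, which is why a rough cross-lingual score emerges even without translation; the quality depends entirely on how well the two vector spaces are aligned.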

Lemmatizing words after POS tagging produces unexpected results

Submitted by 老子叫甜甜 on 2020-01-05 03:54:05
Question: I am using python3.5 with the nltk pos_tag function and the WordNetLemmatizer. My goal is to flatten words in our database to classify text. I am trying to test using the lemmatizer, and I encounter strange behavior when using the POS tagger on identical tokens. In the example below, I have a list of three strings; when running them through the POS tagger, every other element is returned as a noun (NN) and the rest are returned as verbs (VBG). This affects the lemmatization. The output looks like
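Why alternating NN/VBG tags change the result: WordNetLemmatizer looks words up under the POS it is given, so a token tagged as a noun and the same token tagged as a verb can yield different lemmas. The toy lemma table below is hypothetical (plain Python, no NLTK) and just illustrates that dependence; WordNetLemmatizer does the real lookup.

```python
# Hypothetical lemma table: (word, wordnet_pos) -> lemma.
# NLTK's WordNet POS constants are the single letters 'n', 'v', 'a', 'r'.
TOY_LEMMAS = {
    ("dancing", "v"): "dance",    # tagged VBG -> verb lemma
    ("dancing", "n"): "dancing",  # tagged NN  -> noun lemma (unchanged)
}

def toy_lemmatize(word, pos="n"):
    """Mimic the lookup shape of WordNetLemmatizer.lemmatize(word, pos)."""
    return TOY_LEMMAS.get((word, pos), word)

print(toy_lemmatize("dancing", "v"))  # -> dance
print(toy_lemmatize("dancing", "n"))  # -> dancing
```

So when the tagger alternates between NN and VBG on identical tokens, the lemmatizer alternates with it, which is the "unexpected result" the question describes.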

Mapping Wordnet Senses to Verbnet

Submitted by 血红的双手。 on 2020-01-04 15:32:27
Question: http://digital.library.unt.edu/ark:/67531/metadc30973/m2/1/high_res_d/Mihalcea-2005-Putting_Pieces_Together-Combining_FrameNet.pdf In the link above, on the sixth page, the paper mentions that a mapping was made: "The process of mapping VerbNet to WordNet is thus semi-automatic. We first manually link all semantic constraints defined in VerbNet (there are 36 such constraints) to one or more nodes in the WordNet semantic hierarchy." I am trying to use this mapping in NLTK Python with Verbnet
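The paper's mapping is, structurally, a table from each VerbNet selectional restriction to one or more WordNet nodes. NLTK ships the VerbNet corpus itself (`nltk.corpus.verbnet`, after `nltk.download('verbnet')`), but the restriction-to-synset table from the paper is not bundled with it, so you would build or obtain it separately. A minimal sketch of that table's shape, with entirely hypothetical synset names:

```python
# Hypothetical mapping: VerbNet selectional restriction -> WordNet synset names.
# Both the restriction labels and the synset names here are illustrative only,
# NOT the paper's actual 36-entry mapping.
RESTRICTION_TO_SYNSETS = {
    "+animate":  ["animate_thing.n.01"],
    "+concrete": ["physical_entity.n.01"],
}

def wordnet_nodes(restriction):
    """Look up the WordNet nodes linked to a VerbNet restriction."""
    return RESTRICTION_TO_SYNSETS.get(restriction, [])

print(wordnet_nodes("+animate"))
```

Once such a table exists, checking whether a verb argument satisfies a restriction reduces to a hypernym walk in WordNet from the argument's synset up to one of the mapped nodes.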

lemmatize plural nouns using nltk and wordnet

Submitted by 喜你入骨 on 2020-01-04 02:04:17
Question: I want to lemmatize using: from nltk import word_tokenize, sent_tokenize, pos_tag from nltk.stem.wordnet import WordNetLemmatizer from nltk.corpus import wordnet lmtzr = WordNetLemmatizer() POS = pos_tag(text) def get_wordnet_pos(treebank_tag): #maps pos tag so lemmatizer understands from nltk.corpus import wordnet if treebank_tag.startswith('J'): return wordnet.ADJ elif treebank_tag.startswith('V'): return wordnet.VERB elif treebank_tag.startswith('N'): return wordnet.NOUN elif treebank_tag
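The excerpt's helper is cut off mid-definition. A completed, self-contained version is sketched below, using the literal single-letter values that NLTK's `wordnet.ADJ`/`wordnet.VERB`/`wordnet.NOUN`/`wordnet.ADV` constants hold, so it runs without NLTK installed. The final fallback to noun matters: `lemmatize()` needs a valid POS for plurals such as "geese".

```python
def get_wordnet_pos(treebank_tag):
    """Map a Penn Treebank tag to a WordNet POS letter."""
    if treebank_tag.startswith("J"):
        return "a"   # wordnet.ADJ
    elif treebank_tag.startswith("V"):
        return "v"   # wordnet.VERB
    elif treebank_tag.startswith("N"):
        return "n"   # wordnet.NOUN
    elif treebank_tag.startswith("R"):
        return "r"   # wordnet.ADV
    # Fall back to noun so lemmatize() always receives a valid POS.
    return "n"

print(get_wordnet_pos("NNS"))  # -> n
print(get_wordnet_pos("VBD"))  # -> v
```

With this in place, `lmtzr.lemmatize(word, get_wordnet_pos(tag))` handles plural nouns (NNS) correctly, since they map to `'n'` rather than an unrecognized value.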

Extracting words from txt file using python

Submitted by こ雲淡風輕ζ on 2020-01-04 01:56:10
Question: I want to extract all the words that are between single quotation marks from a text file. The text file looks like this: u'MMA': 10, =u'acrylic'= : 19, == u'acting lessons': 2, =u'aerobic': 141, =u'alto': 2= 4, =u&#= 39;art therapy': 4, =u'ballet': 939, =u'ballroom'= ;: 234, = =u'banjo': 38, And ideally, my output would look like this: MMA, acrylic, acting lessons, ... From browsing posts, it seems like I should use some combination of NLTK / regex for Python to accomplish this. I've tried the
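NLTK is not really needed here; the standard `re` module suffices. One plausible approach, sketched on a shortened copy of the sample: strip the quoted-printable-style `=` padding first, then capture everything between `u'...'` pairs (the real file would also need `&#39;` HTML-entity residue replaced with `'` before matching).

```python
import re

# Shortened copy of the sample line from the question.
raw = "u'MMA': 10, =u'acrylic'= : 19, == u'acting lessons': 2, =u'aerobic': 141"

# Remove the "=" soft-wrap padding, then decode any HTML-escaped quotes.
cleaned = raw.replace("=", "").replace("&#39;", "'")

# Capture the text between each u'...' pair.
words = re.findall(r"u'([^']+)'", cleaned)
print(words)  # -> ['MMA', 'acrylic', 'acting lessons', 'aerobic']
```

The `[^']+` character class keeps the match from spilling past a closing quote, so multi-word entries like "acting lessons" come out intact.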

How can I get the stanford NLTK python module?

Submitted by ≯℡__Kan透↙ on 2020-01-03 18:57:46
Question: I have the python (2.7.5) and python-nltk packages installed on Ubuntu 13.10. Running apt-cache policy python-nltk returns: python-nltk: Installed: 2.0~b9-0ubuntu4 And according to the Stanford site, 2.0+ should have the stanford module. Yet when I try to import it, I get an error: >>> import nltk.tag.stanford Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: No module named stanford How can I get the stanford module? (Preferably through the usual
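The likely culprit is the version string itself: `2.0~b9` is a pre-release beta that sorts before the 2.0.x releases, so it can predate the stanford tagger module even though it reads as "2.0". A small probe, sketched below, confirms whether the installed NLTK has the module; if not, the usual fix is a newer NLTK from pip (`pip install --upgrade nltk`) rather than the apt package.

```python
# Probe for the stanford tagger submodule in whatever NLTK is installed.
try:
    import nltk.tag.stanford  # noqa: F401
    stanford_available = True
except ImportError:
    stanford_available = False

print("stanford module available:", stanford_available)
```

If the probe prints `False`, upgrading NLTK outside apt (e.g. with pip into a virtualenv) is the standard route, since Ubuntu 13.10's repository pins the older beta.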

NLTK words lemmatizing

Submitted by 强颜欢笑 on 2020-01-03 17:23:32
Question: I am trying to do lemmatization on words with NLTK. What I can find now is that I can use the stem package to get some results, like transforming "cars" to "car" and "women" to "woman"; however, I cannot do lemmatization on some words with affixes, like "acknowledgement". When using WordNetLemmatizer() on "acknowledgement", it returns "acknowledgement", and using PorterStemmer(), it returns "acknowledg" rather than "acknowledge". Can anyone tell me how to eliminate the affixes of words? Say, when
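The reason both tools "fail" here: "acknowledgement" → "acknowledge" is a derivational relation (noun to verb), while lemmatizers only undo inflection ("cars" → "car") and stemmers just chop suffixes. WordNet does expose derivational links via `Lemma.derivationally_related_forms()`. The toy table below (hypothetical entries, no NLTK needed) just sketches the shape of such a lookup:

```python
# Hypothetical derivational lookup: noun -> base verb. In NLTK one would walk
# wordnet Lemma.derivationally_related_forms() instead of a hand-made table.
DERIVATIONS = {
    "acknowledgement": "acknowledge",
    "government": "govern",
}

def strip_derivational_suffix(word):
    """Return the derivationally related base form if one is known."""
    return DERIVATIONS.get(word, word)

print(strip_derivational_suffix("acknowledgement"))  # -> acknowledge
```

Unlike a stemmer, a lookup of this kind always returns a real word, which avoids truncated outputs like "acknowledg".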

Get gender from noun using NLTK with German corpora

Submitted by £可爱£侵袭症+ on 2020-01-03 17:11:10
Question: I'm experimenting with NLTK. My question is whether the library can detect the gender of a noun in German. I want this information in order to determine whether a text is written gender-neutrally. See here for more information: https://en.wikipedia.org/wiki/Gender_neutrality_in_languages_with_grammatical_gender The underlying code categorizes my sentence, but I can't see any information about the gender of "Mitarbeiter". My code so far: sentence = """Der Mitarbeiter geht.""" tokens = nltk
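NLTK's default taggers annotate part of speech, not German grammatical gender, so this information has to come from a lexicon lookup. The tiny dictionary below is purely illustrative (a real project would use a full German morphology resource); it sketches how a gender check for a token like "Mitarbeiter" could work:

```python
# Hypothetical mini-lexicon: German noun -> grammatical gender.
GENDER = {
    "Mitarbeiter": "masculine",
    "Mitarbeiterin": "feminine",
    "Team": "neuter",
}

def noun_gender(noun):
    """Look up a noun's grammatical gender; 'unknown' if not in the lexicon."""
    return GENDER.get(noun, "unknown")

print(noun_gender("Mitarbeiter"))  # -> masculine
```

A gender-neutrality check over a whole text would then tag the text, run every noun token through such a lookup, and flag sentences where only one gendered form (e.g. "Mitarbeiter" but never "Mitarbeiterin") appears.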
