nltk

Unicode Tagging in Python NLTK

烈酒焚心 posted on 2020-01-06 13:54:21
Question: I am working on a Python NLTK tagging program. My input file is Hindi text spanning several lines. When I tokenize the text and run pos_tag, every token comes back with the NN tag, whereas an English sentence is tagged correctly. Version: Python 3.4.1, working from the NLTK 3.0 documentation. Kindly help! Here is what I tried:
    word_to_be_tagged = u"ताजो स्वास आनी चकचकीत दांत तुमचें व्यक्तीमत्व परजळायतात."
    from nltk.corpus import indian
    train_data = indian.tagged_sents('hindi.pos')[

How to calculate prediction probability in python and NLTK?

微笑、不失礼 posted on 2020-01-06 08:43:10
Question: I am trying to compute the probability of each prediction from an SVM model built with LinearSVC and OneVsRestClassifier, but I get the error AttributeError: 'LinearSVC' object has no attribute 'predict_proba'. The code I tried:
    model = Pipeline([('vectorizer', CountVectorizer(ngram_range=(1,2))), ('tfidf', TfidfTransformer(use_idf=True)), ('clf', OneVsRestClassifier(LinearSVC(class_weight="balanced")))])
    model.fit(X_train, y_train)
    y_train.shape
    pred = model.predict(X_test)
    probas = model.predict_proba(X_test

Sorting FreqDist in NLTK with get vs get()

被刻印的时光 ゝ posted on 2020-01-06 02:35:11
Question: I am playing around with NLTK and its FreqDist class:
    import nltk
    from nltk.corpus import gutenberg
    print(gutenberg.fileids())
    from nltk import FreqDist
    fd = FreqDist()
    for word in gutenberg.words('austen-persuasion.txt'):
        fd[word] += 1
    newfd = sorted(fd, key=fd.get, reverse=True)[:10]
My question is about the sorting step. When I run the code like this, it sorts the FreqDist object properly. However, when I run it with get() instead of get I encounter
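The difference between get and get() can be sketched without NLTK, since FreqDist subclasses collections.Counter. The word list below is a made-up stand-in for the Gutenberg text, so treat this as an illustration under that assumption rather than the asker's exact setup. Passing key=fd.get gives sorted a function to call once per word; writing fd.get() calls the method immediately with no argument, which fails before sorted even starts.

```python
from collections import Counter

# Hypothetical sample; stands in for gutenberg.words(...) from the question.
words = ["the", "sea", "the", "ship", "sea", "the"]
fd = Counter(words)

# key=fd.get passes the bound method itself; sorted calls it once per word
# to look up that word's count.
top = sorted(fd, key=fd.get, reverse=True)[:2]
print(top)

# fd.get() is evaluated first, with no key argument, so it raises TypeError
# before sorted ever runs.
try:
    sorted(fd, key=fd.get(), reverse=True)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
print(error_name)
```

With a real FreqDist, fd.most_common(10) returns the top words together with their counts and avoids the explicit sort.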

Force Stanford CoreNLP Parser to Prioritize 'S' Label at Root Level

旧时模样 posted on 2020-01-06 01:32:23
Question: Greetings NLP experts, I am using the Stanford CoreNLP software package to produce constituency parses, with the most recent version (3.9.2) of the English language models JAR, downloaded from the CoreNLP download page. I access the parser through the Python interface in the NLTK module nltk.parse.corenlp. Here is a snippet from the top of my main module:
    import nltk
    from nltk.tree import ParentedTree
    from nltk.parse.corenlp import CoreNLPParser
    parser = CoreNLPParser(url='http://localhost

How to save a nltk FreqDist plot?

£可爱£侵袭症+ posted on 2020-01-06 01:17:16
Question: I've tried different methods to save my plot, but everything I've tried has produced a blank image and I'm currently out of ideas. Any suggestions that could fix this? The code sample is below.
    word_frequency = nltk.FreqDist(merged_lemmatizedTokens)  # obtains frequency distribution for each token
    print("\nMost frequent top-10 words: ", word_frequency.most_common(10))
    word_frequency.plot(10, title='Top 10 Most Common Words in Corpus')
    plt.savefig('img_top10_common.png'

Real difficulty installing NLTK on MAC OS X 10.9

霸气de小男生 posted on 2020-01-05 12:29:30
Question: I'm new to Python and Mac OS, and I want to work through the NLTK textbook, but I'm having problems installing NLTK. I've been looking for solutions for a while, but none of them seem to work for me (or I'm misunderstanding how to apply them). The basic problem is that NLTK doesn't seem to be installed even though I followed the instructions. The following code gives me an error that no such module exists:
    import nltk
    nltk

how to search a word in xml file and print it in python

纵饮孤独 posted on 2020-01-05 11:07:30
Question: I want to search for a specific word (entered by the user) in an .xml file. This is my XML file:
    <?xml version="1.0" encoding="UTF-8"?>
    <words>
      <entry>
        <word>John</word>
        <pron>()</pron>
        <gram>[Noun]</gram>
        <poem></poem>
        <meanings>
          <meaning>name</meaning>
        </meanings>
      </entry>
    </words>
Here is my code:
    import nltk
    from nltk.tokenize import word_tokenize
    import os
    import xml.etree.ElementTree as etree
    sen = input("Enter Your sentence - ")
    print(sen)
    print("\n")
    print(word_tokenize(sen)[0])
    tree =
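One way to approach the lookup is sketched below with the standard-library xml.etree.ElementTree. The XML is copied from the question and embedded as a bytes literal so the example is self-contained; the helper name find_entries and the exact, case-sensitive match are assumptions made for illustration, not the asker's code.

```python
import xml.etree.ElementTree as ET

# XML copied from the question; a real program would call ET.parse(path) instead.
XML_DATA = b"""<?xml version="1.0" encoding="UTF-8"?>
<words>
  <entry>
    <word>John</word>
    <pron>()</pron>
    <gram>[Noun]</gram>
    <poem></poem>
    <meanings>
      <meaning>name</meaning>
    </meanings>
  </entry>
</words>"""


def find_entries(root, target):
    """Return every <entry> element whose <word> text equals target exactly."""
    return [entry for entry in root.iter("entry") if entry.findtext("word") == target]


root = ET.fromstring(XML_DATA)
matches = find_entries(root, "John")

# Print the word, its grammatical category, and its first meaning.
for entry in matches:
    print(entry.findtext("word"), entry.findtext("gram"), entry.findtext("meanings/meaning"))
```

To search with the first token of the user's sentence, the literal "John" could be replaced by word_tokenize(sen)[0] from the question's code.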

Getting the closest noun from a stemmed word

限于喜欢 posted on 2020-01-05 10:09:46
Question: Short version: If I have a stemmed word, say 'comput' for 'computing' or 'sugari' for 'sugary', is there a way to construct its closest noun form, that is, 'computer' or 'sugar' respectively? Longer version: I'm using Python with NLTK and WordNet to perform a few semantic similarity tasks on a bunch of words. I noticed that most semantic similarity scores work well only for nouns, while adjectives and verbs don't give any results. Understanding the inaccuracies involved, I wanted to convert a word from its

How to print out tags in python

社会主义新天地 posted on 2020-01-05 09:33:37
Question: If I have a string such as this:
    text = "They refuse to permit us."
    txt = nltk.word_tokenize(text)
and I print the POS tags with nltk.pos_tag(txt), I get [('They','PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP')]. How can I print out only this: ['PRP', 'VBP', 'TO', 'VB', 'PRP']?
Answer 1: You got a list of tuples; iterate through it and take only the second element of each tuple.
    >>> tagged = nltk.pos_tag(txt)
    >>> tags = [ e[1] for e in tagged]
    >>> tags
    ['PRP', 'VBP'
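The same idea can be tried without NLTK installed. The sketch below hard-codes the tagged list shown in the question (an assumption standing in for the nltk.pos_tag(txt) call) and unpacks each pair by name instead of indexing with e[1].

```python
# Tagged output copied from the question; in practice it would come from nltk.pos_tag(txt).
tagged = [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP')]

# Unpack each (word, tag) pair and keep only the tag.
tags = [tag for _word, tag in tagged]
print(tags)  # ['PRP', 'VBP', 'TO', 'VB', 'PRP']

# Equivalent alternative: transpose the pairs with zip to get words and tags separately.
words, tags_via_zip = zip(*tagged)
```

The zip form returns tuples, so wrap it in list(...) if a list is needed.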