nltk | 易学教程

Unicode Tagging in Python NLTK

Question: I am working on a Python NLTK tagging program. My input file is Hindi text containing several lines. When I tokenize the text and run pos_tag, every token gets the NN tag, but with an English sentence as input the tagging is correct. Version: Python 3.4.1; I am following the NLTK 3.0 documentation. Kindly help! Here is what I tried:

    word_to_be_tagged = u"ताजो स्वास आनी चकचकीत दांत तुमचें व्यक्तीमत्व परजळायतात."
    from nltk.corpus import indian
    train_data = indian.tagged_sents('hindi.pos')[
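NLTK's pos_tag uses a pretrained English model, which would explain why Devanagari tokens all come back as NN. A common approach is to train a tagger on tagged Hindi sentences instead, for example the indian corpus's hindi.pos file that the question already loads. The sketch below assumes that approach; it uses a tiny hypothetical training set with made-up tags in place of indian.tagged_sents('hindi.pos') so it runs without downloading corpora, and UnigramTagger with a DefaultTagger backoff is just one simple choice of tagger.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Hypothetical toy training data standing in for
# nltk.corpus.indian.tagged_sents('hindi.pos'); the tags are illustrative only.
train_sents = [
    [("राम", "NNP"), ("घर", "NN"), ("गया", "VM")],
    [("सीता", "NNP"), ("घर", "NN"), ("आई", "VM")],
]

# Words never seen in training fall back to the "NN" tag.
backoff = DefaultTagger("NN")
tagger = UnigramTagger(train_sents, backoff=backoff)

tokens = ["राम", "गया", "किताब"]  # "किताब" does not appear in train_sents
print(tagger.tag(tokens))  # [('राम', 'NNP'), ('गया', 'VM'), ('किताब', 'NN')]
```

With the corpus installed, train_sents would be replaced by the real tagged sentences; the input text then needs to be tokenized the same way as the training data for the word lookups to match.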

How to calculate prediction probability in python and NLTK?

Question: I am trying to get the probability of each prediction from an SVM model built with LinearSVC and OneVsRestClassifier, but I get the error:

    AttributeError: 'LinearSVC' object has no attribute 'predict_proba'

Code I tried:

    model = Pipeline([('vectorizer', CountVectorizer(ngram_range=(1,2))),
                      ('tfidf', TfidfTransformer(use_idf=True)),
                      ('clf', OneVsRestClassifier(LinearSVC(class_weight="balanced")))])
    model.fit(X_train, y_train)
    y_train.shape
    pred = model.predict(X_test)
    probas = model.predict_proba(X_test
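LinearSVC provides decision_function but not predict_proba, and OneVsRestClassifier can only offer predict_proba when its base estimator does. One option, sketched below assuming a recent scikit-learn, is to wrap LinearSVC in CalibratedClassifierCV, which fits a calibration model over cross-validation folds and adds predict_proba. The training texts and labels are hypothetical toy data; swapping in LogisticRegression, which has predict_proba natively, would be another option.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: six short documents per class.
X_train = [
    "goal scored in the match", "team wins the cup", "striker scores twice",
    "coach praises the team", "fans cheer the goal", "match ends in a draw",
    "stock prices fall", "market rallies on earnings", "investors buy shares",
    "bank raises interest rates", "shares drop after report", "market closes higher",
    "new phone released", "software update fixes bug", "laptop battery improves",
    "app adds new feature", "chip makers boost output", "phone camera upgraded",
]
y_train = ["sport"] * 6 + ["finance"] * 6 + ["tech"] * 6

model = Pipeline([
    ("vectorizer", CountVectorizer(ngram_range=(1, 2))),
    ("tfidf", TfidfTransformer(use_idf=True)),
    # CalibratedClassifierCV adds predict_proba on top of LinearSVC's decision_function.
    ("clf", OneVsRestClassifier(
        CalibratedClassifierCV(LinearSVC(class_weight="balanced"), cv=3)
    )),
])
model.fit(X_train, y_train)

X_test = ["team scores a late goal", "market shares rally"]
pred = model.predict(X_test)
probas = model.predict_proba(X_test)  # shape (n_samples, n_classes); each row sums to 1
print(model.named_steps["clf"].classes_)
print(probas.round(3))
```

Calibration needs enough examples of each class for the chosen number of folds, so cv may need lowering on very small datasets.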

Sorting FreqDist in NLTK with get vs get()

Question: I am playing around with NLTK and its FreqDist class:

    import nltk
    from nltk.corpus import gutenberg
    print(gutenberg.fileids())
    from nltk import FreqDist
    fd = FreqDist()
    for word in gutenberg.words('austen-persuasion.txt'):
        fd[word] += 1
    newfd = sorted(fd, key=fd.get, reverse=True)[:10]

My question is about the sorting step. When I run the code like this, it sorts the FreqDist object properly. However, when I run it with get() instead of get I encounter
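The key argument of sorted expects a function that it calls once per element. key=fd.get passes the bound method itself, so sorted calls fd.get(word) for each word; key=fd.get() instead calls get immediately with no key, which raises a TypeError before sorting starts. FreqDist also inherits most_common, which produces the same ranking directly. The token list below is hypothetical, so the example runs without the Gutenberg corpus.

```python
from nltk import FreqDist

# Hypothetical tokens standing in for gutenberg.words('austen-persuasion.txt').
tokens = "the cat sat on the mat and the cat slept".split()

fd = FreqDist()
for word in tokens:
    fd[word] += 1

# Pass the method itself: sorted() calls fd.get(word) for every key.
top_words = sorted(fd, key=fd.get, reverse=True)[:2]
print(top_words)  # ['the', 'cat']

# most_common() returns (word, count) pairs already ranked by count.
print(fd.most_common(2))  # [('the', 3), ('cat', 2)]

# Calling get() with no key fails before sorted() ever runs.
try:
    sorted(fd, key=fd.get(), reverse=True)
except TypeError as exc:
    print("TypeError:", exc)
```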

Force Stanford CoreNLP Parser to Prioritize 'S' Label at Root Level

Question: Greetings, NLP experts. I am using the Stanford CoreNLP software package to produce constituency parses, with the most recent version (3.9.2) of the English language models JAR, downloaded from the CoreNLP Download page. I access the parser through the Python interface in the NLTK module nltk.parse.corenlp. Here is a snippet from the top of my main module:

    import nltk
    from nltk.tree import ParentedTree
    from nltk.parse.corenlp import CoreNLPParser
    parser = CoreNLPParser(url='http://localhost
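CoreNLP wraps every parse in a ROOT node, and the label of the constituent directly under it (S, NP, FRAG and so on) is what the question wants to control. This sketch does not force the parser to choose S; it only shows how to inspect that top-level label with NLTK's tree classes, so that non-S parses can be detected and handled separately. The helper name and the bracketed parse strings are hypothetical stand-ins for parser output, so no CoreNLP server is needed.

```python
from nltk.tree import ParentedTree

def top_level_label(parse_str):
    """Return the label of the constituent directly under CoreNLP's ROOT node."""
    tree = ParentedTree.fromstring(parse_str)
    # CoreNLP parses look like (ROOT (X ...)); inspect X.
    first = tree[0] if tree.label() == "ROOT" else tree
    return first.label()

# Hypothetical parser outputs for illustration.
sentence_parse = "(ROOT (S (NP (PRP I)) (VP (VBP like) (NP (NNS parses))) (. .)))"
fragment_parse = "(ROOT (NP (NP (DT The) (NN parser)) (PP (IN at) (NP (NN work)))))"

for parse in (sentence_parse, fragment_parse):
    label = top_level_label(parse)
    print(label, "OK" if label == "S" else "-> not a sentence-level parse")
```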

How to save an NLTK FreqDist plot?

Question: I've tried different methods to save my plot, but everything I've tried has produced a blank image, and I'm out of ideas. Are there other suggestions that could fix this? The code sample is below.

    word_frequency = nltk.FreqDist(merged_lemmatizedTokens)  # obtains frequency distribution for each token
    print("\nMost frequent top-10 words: ", word_frequency.most_common(10))
    word_frequency.plot(10, title='Top 10 Most Common Words in Corpus')
    plt.savefig('img_top10_common.png'
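FreqDist.plot draws the chart and, by default, calls plt.show(); after the window is closed the current figure is empty, so a later plt.savefig writes a blank image. Saving the figure before it is shown avoids this. Recent NLTK releases accept show=False in FreqDist.plot, which is worth checking for your version; the version-independent sketch below instead draws the top-10 counts with matplotlib directly from most_common and saves that figure object. The tokens and the output path are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw to files, never open a window
import matplotlib.pyplot as plt
from nltk import FreqDist

# Hypothetical tokens standing in for merged_lemmatizedTokens.
tokens = "the cat sat on the mat and the cat slept".split()
word_frequency = FreqDist(tokens)

words, counts = zip(*word_frequency.most_common(10))
fig, ax = plt.subplots()
ax.plot(range(len(words)), counts)
ax.set_xticks(range(len(words)))
ax.set_xticklabels(words, rotation=90)
ax.set_title("Top 10 Most Common Words in Corpus")
ax.set_xlabel("Samples")
ax.set_ylabel("Counts")
fig.tight_layout()

# Save explicitly from this figure object, before any plt.show() call.
out_path = os.path.join(tempfile.gettempdir(), "img_top10_common.png")
fig.savefig(out_path)
plt.close(fig)
print("saved to", out_path)
```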

Real difficulty installing NLTK on Mac OS X 10.9

Question: I'm new to Python and Mac OS, and I'm looking to work through the NLTK textbook, but I'm having problems installing NLTK. I've been looking for solutions for a while now, but none of them seem to work for me (or I'm misunderstanding exactly how to use them). The basic problem is that NLTK just doesn't seem to be installed, despite my following the instructions. The following code gives me an error that no such module exists:

    import nltk
    nltk
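A frequent cause of this symptom is that NLTK was installed for a different Python interpreter than the one running the script (for example, the system Python versus a separately installed Python 3). The small diagnostic below uses only the standard library to report which interpreter is running and whether it can find a given module; installing with that same interpreter, e.g. `python3 -m pip install nltk`, keeps the two in sync. The helper name module_status is hypothetical.

```python
import importlib.util
import sys

def module_status(name="nltk"):
    """Return the running interpreter's path and whether `name` is importable from it."""
    found = importlib.util.find_spec(name) is not None
    return sys.executable, found

interpreter, has_nltk = module_status("nltk")
print("Interpreter:", interpreter)
print("nltk importable:", has_nltk)
if not has_nltk:
    # Install into *this* interpreter rather than whichever `pip` is first on PATH.
    print(f"Try: {interpreter} -m pip install nltk")
```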

How to search for a word in an XML file and print it in Python

Question: I want to search for a specific word (entered by the user) in an .xml file. This is my XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <words>
      <entry>
        <word>John</word>
        <pron>()</pron>
        <gram>[Noun]</gram>
        <poem></poem>
        <meanings>
          <meaning>name</meaning>
        </meanings>
      </entry>
    </words>

Here is my code:

    import nltk
    from nltk.tokenize import word_tokenize
    import os
    import xml.etree.ElementTree as etree
    sen = input("Enter Your sentence - ")
    print(sen)
    print("\n")
    print(word_tokenize(sen)[0])
    tree =
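With xml.etree.ElementTree, which the question already imports, one way to do the lookup is to walk every <entry>, compare its <word> text with the user's word, and collect the <meaning> texts of the matches. The sketch below inlines the question's XML as a string so it runs stand-alone; with a real file you would call etree.parse(path).getroot() instead, and the filename words.xml in the comment is hypothetical. The function name lookup and the case-insensitive comparison are assumptions.

```python
import xml.etree.ElementTree as etree

# The XML from the question, inlined for a self-contained example.
XML_DATA = """<words>
  <entry>
    <word>John</word>
    <pron>()</pron>
    <gram>[Noun]</gram>
    <poem></poem>
    <meanings>
      <meaning>name</meaning>
    </meanings>
  </entry>
</words>"""

def lookup(root, query):
    """Return (word, grammar, meanings) for each entry whose <word> matches query."""
    results = []
    for entry in root.iter("entry"):
        word = entry.findtext("word", default="")
        if word.strip().lower() == query.strip().lower():
            gram = entry.findtext("gram", default="")
            meanings = [m.text for m in entry.iter("meaning")]
            results.append((word, gram, meanings))
    return results

root = etree.fromstring(XML_DATA)  # for a file: etree.parse("words.xml").getroot()
user_word = "john"  # e.g. word_tokenize(sen)[0] in the question's code
for word, gram, meanings in lookup(root, user_word):
    print(word, gram, ", ".join(meanings))  # John [Noun] name
```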

Getting the closest noun from a stemmed word

Question: Short version: if I have a stemmed word, say 'comput' for 'computing' or 'sugari' for 'sugary', is there a way to construct its closest noun form, that is, 'computer' or 'sugar' respectively? Longer version: I'm using Python with NLTK and WordNet to perform a few semantic similarity tasks on a bunch of words. I noticed that most semantic-similarity scores work well only for nouns, while adjectives and verbs don't give any results. Understanding the inaccuracies involved, I wanted to convert a word from its
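A stem such as 'comput' is not a dictionary word, so one workable heuristic is to stem a list of candidate nouns with the same stemmer and keep the nouns whose stem matches the query, or, failing that, whose stem is the longest prefix of it. The sketch below assumes the stems came from NLTK's PorterStemmer and uses a small hypothetical noun list; in practice the candidates could come from WordNet's noun lemmas, and WordNet's Lemma.derivationally_related_forms() is another route that starts from the unstemmed word. The function name closest_nouns is hypothetical.

```python
from nltk.stem import PorterStemmer

def closest_nouns(stem, nouns):
    """Return nouns whose Porter stem equals `stem`, else those with the longest stem prefix."""
    stemmer = PorterStemmer()
    noun_stems = {noun: stemmer.stem(noun) for noun in nouns}

    # 1) Exact stem match, e.g. 'computer' -> 'comput'.
    exact = [noun for noun, s in noun_stems.items() if s == stem]
    if exact:
        return exact

    # 2) Fallback: nouns whose stem is a prefix of the query, e.g. 'sugar' for 'sugari'.
    prefixed = [(len(s), noun) for noun, s in noun_stems.items() if stem.startswith(s)]
    if not prefixed:
        return []
    best = max(length for length, _ in prefixed)
    return [noun for length, noun in prefixed if length == best]

VOCAB = ["computer", "sugar", "table"]  # hypothetical candidate nouns
print(closest_nouns("comput", VOCAB))  # ['computer']
print(closest_nouns("sugari", VOCAB))  # ['sugar']
```

The prefix fallback is only a heuristic: it can return unrelated nouns that happen to share a short stem, so the candidate list matters more than the matching rule.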

How to print out tags in python

Question: If I have a string such as this:

    text = "They refuse to permit us."
    txt = nltk.word_tokenize(text)

and I print the POS tags with

    nltk.pos_tag(txt)

I get

    [('They','PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP')]

How can I print out only this?

    ['PRP', 'VBP', 'TO', 'VB', 'PRP']

Answer 1: You got a list of tuples, so you should iterate through it and take only the second element of each tuple.

    >>> tagged = nltk.pos_tag(txt)
    >>> tags = [e[1] for e in tagged]
    >>> tags
    ['PRP', 'VBP'
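The answer's list comprehension can also be written by unpacking each (word, tag) pair, or by splitting the whole list with zip. The sketch below hard-codes the tagged output shown in the question, so it runs without NLTK's tagger models.

```python
# Tagged output copied from the question (no tagger model needed).
tagged = [("They", "PRP"), ("refuse", "VBP"), ("to", "TO"), ("permit", "VB"), ("us", "PRP")]

# Unpack each (word, tag) tuple and keep only the tag.
tags = [tag for _word, tag in tagged]
print(tags)  # ['PRP', 'VBP', 'TO', 'VB', 'PRP']

# zip(*...) splits the pairs into a tuple of words and a tuple of tags.
words, tag_tuple = zip(*tagged)
print(" ".join(tag_tuple))  # PRP VBP TO VB PRP
```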