nltk

How to grab streaming data from Twitter with pycurl and parse it using NLTK and regular expressions

旧时模样 submitted on 2020-01-01 19:05:12
Question: I am a newbie in Python and was given this task by my boss:

1. Grab streaming data from Twitter, connecting with pycurl, and output it as JSON
2. Parse it using NLTK and regular expressions
3. Save it to a database (MySQL) or to a flat file (txt)

Note: this is the URL that I want to grab ('http://search.twitter.com/search.json?geocode=-0.789275%2C113.921327%2C1.0km&q=+near%3Aindonesia+within%3A1km&result_type=recent&rpp=10')

Does anyone know how to grab streaming data from Twitter following the steps above?
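A minimal sketch of that pipeline, assuming Python 3 with pycurl and nltk installed; note that the search.twitter.com/search.json endpoint above was retired long ago, so the URL is kept purely for illustration:

import json
from io import BytesIO

import pycurl
import nltk  # requires: nltk.download('punkt')

URL = ('http://search.twitter.com/search.json?geocode=-0.789275%2C113.921327%2C1.0km'
       '&q=+near%3Aindonesia+within%3A1km&result_type=recent&rpp=10')

# Fetch the raw response body with pycurl
buf = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, URL)
c.setopt(c.WRITEDATA, buf)
c.perform()
c.close()

# Parse the JSON payload (the old search API returned tweets under 'results')
data = json.loads(buf.getvalue().decode('utf-8'))
for tweet in data.get('results', []):
    tokens = nltk.word_tokenize(tweet.get('text', ''))  # NLTK tokenization step
    print(tokens)  # from here, write rows to MySQL or append to a txt file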

How to clean text belonging to different languages in Python

南笙酒味 submitted on 2020-01-01 15:33:50
Question: I have a collection of text whose sentences are entirely in English, Hindi, or Marathi, with ids 0, 1, 2 respectively attached to each sentence to indicate its language. Regardless of the language, the text may contain HTML tags, punctuation, etc. I could clean the English sentences using my code below:

import HTMLParser
import re
from nltk.corpus import stopwords
from collections import Counter
import pickle
from string import punctuation

#creating html_parser
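A minimal language-agnostic cleaning sketch, assuming Python 3 (where HTMLParser lives in html.parser rather than the Python 2 module used above); the Devanagari Unicode block U+0900 to U+097F covers both Hindi and Marathi, so a single character-class regex can keep Latin and Devanagari letters while dropping tags and punctuation:

import re
from html.parser import HTMLParser  # Python 3 location of HTMLParser

class TagStripper(HTMLParser):
    """Collects only text content, discarding HTML tags."""
    def __init__(self):
        super().__init__()
        self.parts = []
    def handle_data(self, data):
        self.parts.append(data)
    def text(self):
        return ''.join(self.parts)

def clean(sentence):
    stripper = TagStripper()
    stripper.feed(sentence)
    no_html = stripper.text()
    # Keep Latin letters (English) and Devanagari (Hindi/Marathi); drop the rest
    kept = re.sub(r"[^A-Za-z\u0900-\u097F\s]", " ", no_html)
    return re.sub(r"\s+", " ", kept).strip()

print(clean("<b>यह एक वाक्य है!</b>"))  # -> 'यह एक वाक्य है'
print(clean("<p>Hello, world!</p>"))     # -> 'Hello world'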

Setting up NLTK with Stanford NLP (both StanfordNERTagger and StanfordPOSTagger) for Spanish

China☆狼群 submitted on 2020-01-01 12:11:32
Question: The NLTK documentation is rather poor on this integration. The steps I followed were:

Download http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip to /home/me/stanford
Download http://nlp.stanford.edu/software/stanford-spanish-corenlp-2015-01-08-models.jar to /home/me/stanford

Then, in an IPython console:

In [11]: import nltk
In [12]: nltk.__version__
Out[12]: '3.1'
In [13]: from nltk.tag import StanfordNERTagger

Then st = StanfordNERTagger('/home/me/stanford/stanford
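A sketch of how the two taggers are typically wired up for Spanish, assuming NLTK 3.1 and the downloads above. The model paths are assumptions based on the 2015 releases (the Spanish NER model normally has to be extracted from the models jar first, e.g. with jar xf, and the NER classifier jar comes from the separate stanford-ner download), so adjust them to what you actually find on disk:

from nltk.tag import StanfordNERTagger, StanfordPOSTagger

STANFORD = '/home/me/stanford'

# POS: the model and the jar both ship inside the stanford-postagger-full zip;
# the Spanish models live under its models/ directory
pos = StanfordPOSTagger(
    STANFORD + '/stanford-postagger-full-2015-04-20/models/spanish.tagger',
    STANFORD + '/stanford-postagger-full-2015-04-20/stanford-postagger.jar')

# NER: extract the CRF model from stanford-spanish-corenlp-2015-01-08-models.jar
# (e.g. `jar xf stanford-spanish-corenlp-2015-01-08-models.jar`) first
ner = StanfordNERTagger(
    STANFORD + '/edu/stanford/nlp/models/ner/spanish.ancora.distsim.s512.crf.ser.gz',
    STANFORD + '/stanford-ner/stanford-ner.jar')

print(pos.tag('Juan vive en Madrid'.split()))
print(ner.tag('Juan vive en Madrid'.split()))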

NLTK Data installation issues

流过昼夜 submitted on 2020-01-01 09:45:54
Question: I am trying to install NLTK Data on Mac OS X 10.9. The download directory to be set, as mentioned in the NLTK 3.0 documentation, is /usr/share/nltk_data for a central installation. But for this path, I get the error:

OSError: [Errno 13] Permission denied: '/usr/share/nltk_data'

Can I set the download directory to /Users/ananya/nltk_data for a central installation? I have Python 2.7 installed on my machine.

Thanks,
Ananya

Answer 1: Have you tried:

$ sudo python
>>> import nltk
>>> nltk.download()

To check if
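If sudo is not an option, NLTK also supports per-user data directories; a minimal sketch, valid for any recent NLTK, is below. ~/nltk_data is one of the paths NLTK searches by default, and nltk.data.path can be extended for any other location:

import nltk

# Download into the user's home directory instead of /usr/share/nltk_data
nltk.download('punkt', download_dir='/Users/ananya/nltk_data')

# ~/nltk_data is searched by default; other locations must be registered
nltk.data.path.append('/Users/ananya/nltk_data')

# The NLTK_DATA environment variable works as well:
#   export NLTK_DATA=/Users/ananya/nltk_data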

Kneser-Ney smoothing of trigrams using Python NLTK

百般思念 submitted on 2020-01-01 09:18:29
Question: I'm trying to smooth a set of n-gram probabilities with Kneser-Ney smoothing using the Python NLTK. Unfortunately, the documentation is rather sparse. What I'm trying to do is this: I parse a text into a list of trigram tuples. From this list I create a FreqDist and then use that FreqDist to calculate a KN-smoothed distribution. I'm pretty sure, though, that the result is totally wrong: when I sum up the individual probabilities I get something way beyond 1. Take this code example:
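The question's code example is cut off above, but the described setup maps directly onto NLTK's KneserNeyProbDist, which consumes a FreqDist of trigram tuples; a minimal sketch follows. One plausible explanation for sums above 1 (an assumption about the cause, not a verdict on the asker's exact code) is that the smoothed values behave like conditional probabilities P(w3 | w1, w2), which sum to roughly 1 per (w1, w2) context rather than across all trigrams:

from nltk import FreqDist, KneserNeyProbDist
from nltk.util import ngrams

text = "the cat sat on the mat and the cat ate the rat".split()
trigrams = list(ngrams(text, 3))

freq = FreqDist(trigrams)
kn = KneserNeyProbDist(freq)

# Probabilities of individual trigrams
for tg in kn.samples():
    print(tg, kn.prob(tg))

# Summing over *all* trigrams can exceed 1 ...
print(sum(kn.prob(tg) for tg in kn.samples()))

# ... but restricting to one (w1, w2) context stays near 1
context = ('the', 'cat')
print(sum(kn.prob(tg) for tg in kn.samples() if tg[:2] == context))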

NLTK classify interface using trained classifier

偶尔善良 submitted on 2020-01-01 06:54:22
Question: I have this little chunk of code I found here:

import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews
from nltk.corpus import stopwords

def word_feats(words):
    return dict([(word, True) for word in words])

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in
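Picking up where the snippet is cut off, a sketch of the usual continuation: train the NaiveBayesClassifier on those feature sets, then reuse word_feats to classify new text. This completion is the standard movie-reviews tutorial pattern, not necessarily the asker's exact code:

from nltk.tokenize import word_tokenize

posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

# Train on all labeled feature sets
classifier = NaiveBayesClassifier.train(negfeats + posfeats)

# Classify unseen text with the same feature extractor used for training
sentence = "This movie was a wonderful surprise"
print(classifier.classify(word_feats(word_tokenize(sentence))))  # e.g. 'pos'
classifier.show_most_informative_features(5)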

FreqDist in NLTK not sorting output

*爱你&永不变心* submitted on 2020-01-01 04:40:08
Question: I'm new to Python and I'm trying to teach myself language processing. NLTK in Python has a function called FreqDist that gives the frequency of words in a text, but for some reason it's not working properly. This is what the tutorial has me write:

fdist1 = FreqDist(text1)
vocabulary1 = fdist1.keys()
vocabulary1[:50]

So basically it's supposed to give me a list of the 50 most frequent words in the text. When I run the code, though, the result is the 50 least frequent words in order of least
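In NLTK 3, FreqDist became a subclass of collections.Counter, so keys() no longer returns words sorted by frequency (and in Python 3 it is a view that cannot be sliced at all); the usual fix is most_common, sketched here with toy data:

from nltk import FreqDist

text1 = "the quick brown fox jumps over the lazy dog the fox".split()
fdist1 = FreqDist(text1)

# (word, count) pairs sorted by descending frequency
print(fdist1.most_common(50))

# Just the words, if a plain list is needed
vocabulary1 = [word for word, count in fdist1.most_common(50)]
print(vocabulary1[:50])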

Get synonyms from synset returns error - Python

◇◆丶佛笑我妖孽 submitted on 2020-01-01 03:42:09
Question: I'm trying to get synonyms of a given word using WordNet. The problem is that even though I'm doing the same as is written here, it returns an error. Here is my code:

from nltk.corpus import wordnet as wn
import nltk

dog = wn.synset('dog.n.01')
print dog.lemma_names
>>> <bound method Synset.lemma_names of Synset('dog.n.01')>

for i,j in enumerate(wn.synsets('small')):
    print "Synonyms:", ", ".join(j.lemma_names)
>>> Synonyms: Traceback (most recent call last):
File "C:/Users/Python
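The printed '<bound method ...>' is the giveaway: in NLTK 3, Synset.lemma_names changed from an attribute to a method, so it must be called with parentheses. A sketch of the corrected loop, shown in Python 3 syntax:

from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')
print(dog.lemma_names())  # note the parentheses: a method in NLTK 3

for i, syn in enumerate(wn.synsets('small')):
    # join() needs strings; lemma_names() returns a list of them
    print("Synonyms:", ", ".join(syn.lemma_names()))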

Extracting relations from text

不问归期 submitted on 2020-01-01 03:31:17
Question: I want to extract relations from unstructured text in the form of (SUBJECT, OBJECT, ACTION) triples. For instance, "The boy is sitting on the table eating the chicken" would give me (boy, chicken, eat), (boy, table, LOCATION), etc. Although a Python program plus NLTK could process a sentence as simple as the one above, I'd like to know if any of you have used tools or libraries, preferably open source, to extract relations from a much wider domain such as a large collection of text documents or the web.

Answer 1:
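For a toy version of the simple NLTK approach the question alludes to, a sketch using POS tagging plus a chunk grammar is below. This is a naive pattern-matcher (serious systems use dependency parsing or tools such as Stanford OpenIE), and both the grammar and the triple-building heuristic are illustrative assumptions:

import nltk  # requires: punkt and averaged_perceptron_tagger data

sentence = "The boy is sitting on the table eating the chicken"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

# Chunk noun phrases with a small regex grammar
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
tree = nltk.RegexpParser(grammar).parse(tagged)

# Collect NP head words and verbs in surface order
nps = [subtree.leaves()[-1][0] for subtree in tree.subtrees()
       if subtree.label() == 'NP']                    # ['boy', 'table', 'chicken']
verbs = [word for word, tag in tagged if tag.startswith('VB')]

# Naive heuristic: first NP is the subject, later NPs are candidate objects
subject, objects = nps[0], nps[1:]
print((subject, objects[-1], verbs[-1]))  # ('boy', 'chicken', 'eating')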

Python NLTK keyword extraction from a sentence

Deadly submitted on 2020-01-01 03:21:10
问题 "First thing we do, let's kill all the lawyers." - William Shakespeare Given the quote above, I would like to pull out "kill" and "lawyers" as the two prominent keywords to describe the overall meaning of the sentence. I have extracted the following noun/verb POS tags: [["First", "NNP"], ["thing", "NN"], ["do", "VBP"], ["lets", "NNS"], ["kill", "VB"], ["lawyers", "NNS"]] The more general problem I am trying to solve is to distill a sentence to the "most important"* words/tags to summarise the