nltk

How to install NLTK modules in Heroku

有些话、适合烂在心里 submitted on 2020-12-29 09:33:08
Question: Hey, I'd like to install the NLTK pos_tag on my Heroku server. How can I do so? Please give me the steps, as I'm new to the Heroku server system.

Answer 1: I just added official NLTK support to the buildpack! Simply add an nltk.txt file with a list of the corpora you want installed, and everything should work as expected.

Answer 2: Update: as Kenneth Reitz pointed out, a much simpler solution has been added to the heroku-python-buildpack. Add an nltk.txt file to your root directory and list your corpora inside.
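Here is a minimal sketch of such a file, assuming the app calls word_tokenize and pos_tag. Place nltk.txt in the repository root next to requirements.txt, which still needs to list nltk itself. Put one NLTK data identifier per line. The identifiers are an assumption that depends on your NLTK version: older releases use punkt and averaged_perceptron_tagger, while newer ones look for punkt_tab and averaged_perceptron_tagger_eng instead.

```text
punkt
averaged_perceptron_tagger
```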

python nltk — stemming list of sentences/phrases

假如想象 submitted on 2020-12-26 09:02:51
Question: I have a bunch of sentences in a list and I want to use the NLTK library to stem them. I can stem one sentence at a time, but I'm having trouble stemming sentences from a list and joining them back together. Is there a step I'm missing? I'm quite new to the NLTK library. Thanks!

    import nltk
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize  # missing in the original; needs the punkt data

    ps = PorterStemmer()

    # Success: one sentence at a time
    data = 'the gamers playing games'
    words = word_tokenize(data)
    for w in words:
        print(ps.stem(w))

    # Fails: data_list
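Here is a minimal sketch of the list case, not taken from the thread. It stems each sentence's words and joins them back with spaces, giving one stemmed string per input sentence. To stay runnable without the punkt download, it splits on whitespace. You can swap in word_tokenize if punkt is installed, but joining its tokens with spaces will put a space before punctuation. The helper name stem_sentences is hypothetical.

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()


def stem_sentences(sentences):
    """Return one stemmed string per input sentence (hypothetical helper)."""
    stemmed = []
    for sentence in sentences:
        # str.split() avoids needing punkt; use word_tokenize(sentence) if it is installed
        words = sentence.split()
        # Stem every word, then glue the sentence back together with single spaces
        stemmed.append(" ".join(ps.stem(w) for w in words))
    return stemmed


data_list = ["the gamers playing games", "the player plays"]
print(stem_sentences(data_list))
```

The key difference from the single-sentence loop is that the stems are collected and joined instead of printed one per line.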

Any way to import Python's nltk.download('punkt') into Google Cloud Functions?

不打扰是莪最后的温柔 submitted on 2020-12-15 05:05:14
Question: Is there any way to import Python's nltk.download('punkt') into Google Cloud Functions? I've found that adding the statement manually to the code in main.py significantly slows down my function, since punkt has to be downloaded every time it runs. Is there a way to avoid this by loading punkt some other way? EDIT #1: I changed my code and program structure to match what Barak suggested, but I keep getting the same error: Error: function terminated. Recommended action:
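Here is one possible pattern, sketched under assumptions rather than taken from Barak's suggestion, which this page does not show. First, ship the punkt data with the function's source so nothing is downloaded at request time. Only if NLTK cannot find it, fall back to downloading into /tmp, which Google documents as the writable location in the Cloud Functions runtime. Both helper names are hypothetical. Newer NLTK releases look for tokenizers/punkt_tab rather than tokenizers/punkt, so adjust the resource name to your version.

```python
import os

import nltk


# Assumption: a folder named "nltk_data" containing tokenizers/punkt is deployed
# next to main.py, e.g. created locally before deploying with:
#   python -m nltk.downloader -d nltk_data punkt
def add_bundled_nltk_data(base_dir):
    """Put <base_dir>/nltk_data at the front of NLTK's search path (hypothetical helper)."""
    bundled = os.path.join(base_dir, "nltk_data")
    if bundled not in nltk.data.path:
        nltk.data.path.insert(0, bundled)
    return bundled


def ensure_punkt(tmp_dir="/tmp/nltk_data"):
    """Fallback: download punkt only if NLTK cannot already find it (hypothetical helper).

    /tmp is the writable directory in the Cloud Functions runtime. Files there
    live as long as the instance, so a download happens at most once per instance.
    """
    try:
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        nltk.download("punkt", download_dir=tmp_dir, quiet=True)
        if tmp_dir not in nltk.data.path:
            nltk.data.path.append(tmp_dir)
```

In main.py, call add_bundled_nltk_data(os.path.dirname(os.path.abspath(__file__))) and then ensure_punkt() at module level, outside the entry-point function. Module-level code runs once when an instance starts, so warm invocations reuse the data instead of downloading it on every request.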

Sentence structure analysis

限于喜欢 submitted on 2020-12-15 01:41:40
Question: I am trying to measure the structural similarity of sentences, specifically the positions of verbs, adjectives, and nouns. For instance, I have three (or more) sentences like these: I ate an apple pie, yesterday. I ate an orange, yesterday. I eat a lemon, today. All of them start with a pronoun (I), followed by a verb (ate/eat), a noun (apple pie, orange, lemon) and, finally, an adverb (yesterday/today). I would like to know if there is a way to identify the structure, i.e.
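One way to approach this, sketched here and not taken from the thread, is to reduce each sentence to its sequence of coarse part-of-speech tags. You can then compare those sequences with difflib.SequenceMatcher from the standard library. The tags below are hand-written in Penn Treebank style for illustration; real nltk.pos_tag output may differ. The helper names are hypothetical.

```python
from difflib import SequenceMatcher


def structure(tagged):
    """Reduce [(word, tag), ...] to coarse Penn Treebank tags, dropping punctuation.

    Keeping the first two letters merges e.g. VBD/VBP into VB and NN/NNS into NN.
    """
    return tuple(tag[:2] for _, tag in tagged if tag[0].isalpha())


def structure_similarity(a, b):
    """Similarity in [0, 1] between the coarse tag sequences of two sentences."""
    return SequenceMatcher(None, structure(a), structure(b)).ratio()


# Hand-written tags for illustration only; a real tagger may tag some words differently.
s1 = [("I", "PRP"), ("ate", "VBD"), ("an", "DT"), ("apple", "NN"), ("pie", "NN"),
      (",", ","), ("yesterday", "NN"), (".", ".")]
s2 = [("I", "PRP"), ("ate", "VBD"), ("an", "DT"), ("orange", "NN"),
      (",", ","), ("yesterday", "NN"), (".", ".")]
s3 = [("I", "PRP"), ("eat", "VBP"), ("a", "DT"), ("lemon", "NN"),
      (",", ","), ("today", "NN"), (".", ".")]

print(structure(s3))
print(structure_similarity(s1, s2), structure_similarity(s2, s3))
```

In practice you would build the tagged lists with nltk.pos_tag(nltk.word_tokenize(sentence)), which needs the tokenizer and tagger data installed. Collapsing tags to two letters is a design choice that makes "ate" and "eat" count as the same slot; keep the full tags if tense matters.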

How to clean a string to get value_counts for words of interest by date?

牧云@^-^@ submitted on 2020-12-08 05:08:43
Question: I have the following data, generated with groupby('Datetime') and value_counts():

    Datetime    0
    01/01/2020  Paul            8
                03              2
    01/02/2020  Paul            2
                10982360967     1
    01/03/2020  religion        3
    ..
    02/28/2020  l              18
    02/29/2020  Paul           78
                march          22
    03/01/2020  church         63
                l              21

I would like to remove a specific name (in this case 'Paul') and all the numbers (03 and 10982360967 in this example). I do not know why there is a character 'l', as I had tried to remove stopwords including alphabet (and
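Here is a minimal sketch of one way to filter the counts, assuming pandas is available and the words are strings. The approach moves the index into columns, builds a boolean mask for unwanted tokens, and keeps the rest. The data are only the rows visible in the question, with the word level renamed from 0 to "word" for readability. Dropping one-character tokens such as 'l' is an extra assumption (min_len=2); pass min_len=1 to keep them. The helper name drop_unwanted is hypothetical. Filtering tokens before calling value_counts() is usually cleaner, but the same mask logic applies.

```python
import pandas as pd

# Rows visible in the question's output (the ".." rows are omitted).
counts = pd.Series(
    [8, 2, 2, 1, 3, 18, 78, 22, 63, 21],
    index=pd.MultiIndex.from_tuples(
        [
            ("01/01/2020", "Paul"), ("01/01/2020", "03"),
            ("01/02/2020", "Paul"), ("01/02/2020", "10982360967"),
            ("01/03/2020", "religion"),
            ("02/28/2020", "l"),
            ("02/29/2020", "Paul"), ("02/29/2020", "march"),
            ("03/01/2020", "church"), ("03/01/2020", "l"),
        ],
        names=["Datetime", "word"],
    ),
)


def drop_unwanted(counts, names=("Paul",), min_len=2):
    """Drop listed names, all-digit tokens, and tokens shorter than min_len."""
    df = counts.rename("count").reset_index()  # columns: Datetime, word, count
    word = df["word"].astype(str)
    unwanted = word.isin(names) | word.str.isdigit() | (word.str.len() < min_len)
    return df[~unwanted].set_index(["Datetime", "word"])["count"]


print(drop_unwanted(counts))
```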

How to rate quality of a (scraped) sentence?

六月ゝ 毕业季﹏ submitted on 2020-12-06 15:09:07
Question: I am running a scrape-and-process routine in Python 3, but some of the sentences I get are garbage. I would like to reject these but can't figure out how. I am using POS tagging and chunking with NLTK, but that doesn't seem to help me identify invalid sentences: the numbers of NNs, VBs, etc. don't seem to differ between a garbage "sentence" and a good one. I guess I am just looking for a simple method to score the grammar of a sentence and reject ones with too many "errors". I
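Here is a minimal sketch of one possible filter. It is plainly a swapped-in heuristic, not a grammar checker and not from the thread. It scores each sentence on a few surface features that scraped garbage tends to fail, so you can reject those below a threshold. The features, the 0.8 word ratio, and the length bounds are assumptions to tune on your own samples. The function name is hypothetical.

```python
def sentence_score(sentence, min_words=3, max_words=60):
    """Crude plausibility score in [0, 1] for a scraped sentence.

    Heuristic only: it checks surface features, not grammar, so fluent-looking
    garbage will still pass.
    """
    s = sentence.strip()
    tokens = s.split()
    if not tokens:
        return 0.0
    # Count tokens that are purely alphabetic once common punctuation is stripped
    wordlike = sum(t.strip(".,;:!?\"'()").isalpha() for t in tokens)
    checks = [
        min_words <= len(tokens) <= max_words,  # not a fragment or a run-on blob
        s[0].isupper(),                         # starts like a sentence
        s[-1] in ".!?",                         # ends like a sentence
        wordlike / len(tokens) >= 0.8,          # mostly alphabetic words
    ]
    return sum(checks) / len(checks)


for candidate in ["The cat sat on the mat.", "click here >> 404 | menu"]:
    print(round(sentence_score(candidate), 2), candidate)
```

A cutoff such as keeping sentences that score at least 0.75 is a starting point, not a recommendation. Your POS features (for example, requiring at least one VB* tag) can be added as extra entries in the checks list.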

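Natural Language Processing (NLP)

不羁的心 submitted on 2020-11-18 23:57:34
As I mentioned in my article "Natural Language Processing (NLP) - Mathematical Foundations (1) - Overview", NLP draws on a great many topics from probability theory, and they have to be digested one bite at a time. So let's start with the most familiar and most basic one: permutations and combinations. Everyone knows this topic and it is very basic, but it is also extremely important. Daniel Jurafsky of Stanford University and James H. Martin of the University of Colorado, two leading figures in NLP, make this point on page 5 of the second edition of their landmark book Speech and Language Processing (《自然语言处理综论》 in Chinese). They write that almost every speech and language processing problem can be framed this way: given N possibilities for an ambiguous input, choose the most probable one.

Now let's look at the definitions. A permutation takes a specified number of elements from a given set of elements and arranges them in order. A combination only takes a specified number of elements from the set, without regard to order. See how closely this mirrors the masters' statement above!

Permutations and combinations rest on two basic principles:

Addition principle (counting by cases): suppose a task can be done by n classes of methods, with m1 different methods in the first class, m2 in the second, ..., and mn in the nth. Then there are N = m1 + m2 + m3 + ... + mn different ways to do the task. Each method achieves the goal on its own.

Multiplication principle (counting by steps): to do a task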
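The definitions above can be checked by brute force. A permutation of k out of n elements counts P(n, k) = n!/(n-k)! ordered selections. A combination counts C(n, k) = n!/(k!(n-k)!) unordered ones. Here is a small sketch using Python's itertools and math.perm/math.comb (Python 3.8+). The four items and the per-class method counts are arbitrary illustration values.

```python
import math
from itertools import combinations, permutations

items = ["a", "b", "c", "d"]  # n = 4 elements (illustrative)
k = 2

# Permutation: choose k elements AND order them -> n! / (n - k)!
ordered = list(permutations(items, k))
# Combination: choose k elements, order ignored -> n! / (k! (n - k)!)
unordered = list(combinations(items, k))

print(len(ordered), math.perm(len(items), k))
print(len(unordered), math.comb(len(items), k))

# Addition principle: n classes of methods with m1, ..., mn methods each,
# where any single method completes the task, give N = m1 + ... + mn ways.
methods_per_class = [3, 2, 4]  # hypothetical counts
print(sum(methods_per_class))
```

Each combination corresponds to k! permutations of the same elements. That is why C(n, k) = P(n, k) / k!, which here gives 12 / 2 = 6.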