nlp

Coreference resolution in Python NLTK using Stanford CoreNLP

末鹿安然 submitted on 2019-12-29 06:20:19
Question: Stanford CoreNLP provides coreference resolution as mentioned here; this thread and this one also provide some insights about its implementation in Java. However, I am using Python and NLTK, and I am not sure how I can use the coreference resolution functionality of CoreNLP in my Python code. I have been able to set up StanfordParser in NLTK; this is my code so far:

```python
from nltk.parse.stanford import StanfordDependencyParser

stanford_parser_dir = 'stanford-parser/'
eng_model_path = stanford_parser_dir +
```
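One way to reach the coref annotator from Python, sketched below on the assumption that a Stanford CoreNLP server is running locally on port 9000 (the example text and port are illustrative), is to call the server's HTTP API directly and read the coreference chains out of the JSON response:

```python
# A minimal sketch: query a locally running Stanford CoreNLP server for
# coreference chains over its HTTP API. Assumes the server was started with
# something like:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
import json
import requests

text = "Barack Obama was born in Hawaii. He was elected president in 2008."
props = {'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,coref',
         'outputFormat': 'json'}
resp = requests.post('http://localhost:9000/',
                     params={'properties': json.dumps(props)},
                     data=text.encode('utf-8'))
ann = resp.json()

# 'corefs' maps chain ids to the mentions that refer to the same entity.
for chain in ann['corefs'].values():
    print([mention['text'] for mention in chain])
```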

How does language detection work?

泄露秘密 submitted on 2019-12-29 04:42:31
Question: I have been wondering for some time how Google Translate (or a hypothetical translator) detects the language of the string entered in the "from" field. I have been thinking about this, and the only thing I can think of is looking for words that are unique to a language in the input string. Another way could be to check sentence formation or other semantics in addition to keywords. But this seems a very difficult task, considering the number of languages and their semantics. I did some
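A classic technique that many detectors build on is the character n-gram profile method (Cavnar and Trenkle): build a ranked frequency profile of character trigrams per language from sample text, then compare an input's trigram ranks against each profile. A toy sketch, with made-up training snippets (a real detector would train on large corpora per language):

```python
# A toy sketch of character-trigram language detection. The training snippets
# are illustrative and far too small for real use.
from collections import Counter

def trigram_profile(text, top=300):
    text = ' ' + text.lower() + ' '
    grams = Counter(text[i:i+3] for i in range(len(text) - 2))
    return [g for g, _ in grams.most_common(top)]

samples = {
    'english': "the quick brown fox jumps over the lazy dog and then some",
    'german':  "der schnelle braune fuchs springt ueber den faulen hund",
    'spanish': "el rapido zorro marron salta sobre el perro perezoso",
}
profiles = {lang: trigram_profile(txt) for lang, txt in samples.items()}

def detect(text):
    grams = trigram_profile(text)
    def distance(profile):
        # Out-of-place distance: how far each trigram's rank differs between
        # the input profile and the language profile.
        return sum(abs(i - profile.index(g)) if g in profile else len(profile)
                   for i, g in enumerate(grams))
    return min(profiles, key=lambda lang: distance(profiles[lang]))

print(detect("ein brauner hund"))  # 'german' (on this toy data)
```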

Simple Natural Language Processing Startup for Java [duplicate]

假如想象 submitted on 2019-12-29 03:33:32
Question: This question already has answers here: Is there a good natural language processing library [closed] (3 answers). Closed 5 years ago. I want to start developing an NLP project. I don't know much about the available tools. After googling for about a month, I realized that OpenNLP could be my solution. Unfortunately, I don't see any complete tutorial on using the API; all of them are missing some general steps. I need a tutorial from the ground up. I have seen a lot of downloads over the

NLTK language model (ngram): calculate the probability of a word from context

青春壹個敷衍的年華 submitted on 2019-12-29 03:21:40
Question: I am using Python and NLTK to build a language model as follows:

```python
from nltk.corpus import brown
from nltk.probability import LidstoneProbDist, WittenBellProbDist
# NgramModel lived in nltk.model in the old NLTK 2.x API (it was removed in NLTK 3):
from nltk.model import NgramModel

estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
lm = NgramModel(3, brown.words(categories='news'), estimator)

# Thanks to miku, I fixed this problem
print lm.prob("word", ["This is a context which generates a word"])
>> 0.00493261081006

# But I got another problem like this one...
print lm.prob("b", ["This is a context
```
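Note that an n-gram model conditions only on the preceding n-1 tokens, not on a whole sentence passed as context, which is likely the source of the confusion above. NgramModel is also gone from modern NLTK; a rough equivalent under the nltk.lm package (NLTK 3.4+), with the corpus category and the scored trigram chosen for illustration, looks like this:

```python
# A sketch using the modern nltk.lm API: a Lidstone-smoothed trigram model
# trained on the Brown news category.
from nltk.corpus import brown
from nltk.lm import Lidstone
from nltk.lm.preprocessing import padded_everygram_pipeline

sents = brown.sents(categories='news')
train_ngrams, vocab = padded_everygram_pipeline(3, sents)

lm = Lidstone(0.2, 3)          # gamma=0.2, trigram order
lm.fit(train_ngrams, vocab)

# Probability of a word given the two preceding words (not a whole sentence):
print(lm.score('jury', ['the', 'grand']))
```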

Find multi-word terms in a tokenized text in Python

雨燕双飞 submitted on 2019-12-29 01:49:07
Question: I have a text that I have tokenized, or in general a list of words. For example:

```python
>>> from nltk.tokenize import word_tokenize
>>> s = '''Good muffins cost $3.88\nin New York. Please buy me
... two of them.\n\nThanks.'''
>>> word_tokenize(s)
['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.',
 'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']
```

If I have a Python dict that contains single-word as well as multi-word keys, how can I efficiently and
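One way to make multi-word keys matchable, sketched below with an illustrative phrase list, is NLTK's MWETokenizer: it retokenizes a token list, merging known multi-word expressions into single tokens that can then be looked up in the dict directly:

```python
# A minimal sketch: merge known multi-word expressions in a token list.
# The phrase list stands in for the multi-word keys of a hypothetical dict.
from nltk.tokenize import MWETokenizer, word_tokenize

phrases = [('New', 'York'), ('Good', 'muffins')]
mwe = MWETokenizer(phrases, separator=' ')

tokens = word_tokenize("Good muffins cost $3.88 in New York.")
print(mwe.tokenize(tokens))
# ['Good muffins', 'cost', '$', '3.88', 'in', 'New York', '.']
```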

Split string into sentences using regex

有些话、适合烂在心里 submitted on 2019-12-28 12:14:11
Question: I have random text stored in $sentences. Using regex, I want to split the text into sentences:

```php
function splitSentences($text) {
    $re = '/              # Split sentences on whitespace between them.
        (?<=              # Begin positive lookbehind.
            [.!?]         # Either an end of sentence punct,
          | [.!?][\'"]    # or end of sentence punct and quote.
        )                 # End positive lookbehind.
        (?<!              # Begin negative lookbehind.
            Mr\.          # Skip either "Mr."
          | Mrs\.         # or "Mrs.",
          | T\.V\.A\.     # or "T.V.A.",
                          # or... (you get the idea).
        )                 # End negative
```
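The same lookbehind idea carries over to Python, with one wrinkle: Python's re module only allows fixed-width lookbehinds, so the punct-plus-quote alternative is dropped here and each abbreviation gets its own negative lookbehind. The abbreviation list and sample text are illustrative:

```python
# A minimal sketch of lookbehind-based sentence splitting in Python.
import re

def split_sentences(text):
    # Split on whitespace preceded by sentence-ending punctuation,
    # unless that punctuation belongs to a known abbreviation.
    pattern = r'(?<!Mr\.)(?<!Mrs\.)(?<=[.!?])\s+'
    return re.split(pattern, text)

print(split_sentences("Mr. Smith left. He said hi! Then he went home."))
# ['Mr. Smith left.', 'He said hi!', 'Then he went home.']
```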

What is the true difference between lemmatization and stemming?

≡放荡痞女 submitted on 2019-12-28 07:36:31
Question: When do I use each? Also, is NLTK's lemmatization dependent on part of speech? Wouldn't it be more accurate if it were? Answer 1: Short and dense: http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html The goal of both stemming and lemmatization is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form. However, the two words differ in their flavor. Stemming usually refers to a crude heuristic process that chops off
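A quick NLTK illustration of both points, assuming the WordNet data has been downloaded; the sample words are illustrative:

```python
# Contrast a rule-based stemmer with a dictionary-based lemmatizer, and show
# that WordNet lemmatization depends on the POS tag (it defaults to noun).
# Requires: nltk.download('wordnet')
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem('studies'))                   # 'studi'  (crude suffix chopping)
print(lemmatizer.lemmatize('studies'))           # 'study'  (dictionary lookup)

print(lemmatizer.lemmatize('running'))           # 'running' (treated as a noun)
print(lemmatizer.lemmatize('running', pos='v'))  # 'run'     (treated as a verb)
```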

How to train the Stanford NLP Sentiment Analysis tool

左心房为你撑大大i submitted on 2019-12-28 01:55:20
Question: Hello everyone! I'm using the Stanford CoreNLP package, and my goal is to perform sentiment analysis on a live stream of tweets. Using the sentiment analysis tool as-is returns a very poor analysis of the text's "attitude": many positives are labeled neutral, and many negatives are rated positive. I've gone ahead and acquired well over a million tweets in a text file, but I haven't a clue how to actually train the tool and create my own model. Link to the Stanford Sentiment Analysis page. "Models can be
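For orientation, the workflow the Stanford sentiment documentation describes: training data must be fully labeled binarized trees in PTB format (a 0-4 sentiment label on every node, one tree per line), so a raw file of tweets cannot be fed in directly; each tweet first has to be parsed and labeled. Training then runs from the command line. The file names and hidden-layer size below are illustrative:

```sh
# Each training example is a labeled binarized tree, e.g.:
#   (3 (2 It) (3 (2 was) (4 great)))
java -mx8g edu.stanford.nlp.sentiment.SentimentTraining \
     -numHid 25 -trainPath train.txt -devPath dev.txt \
     -train -model model.ser.gz
```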

NLP Series (7): The Transformer Explained

这一生的挚爱 submitted on 2019-12-27 16:31:12
Ref: https://jalammar.github.io/illustrated-transformer/ , https://blog.csdn.net/han_xiaoyang/article/details/86560459

Editor's note: A while ago, Google's BERT model achieved SOTA results on 11 NLP tasks, taking the NLP community by storm. A key factor behind BERT's success is the power of the Transformer. Google's Transformer model was first used for machine translation, where it achieved SOTA results at the time. The Transformer fixes the most criticized weakness of RNNs, slow training, by using a self-attention mechanism to achieve fast parallelism. It can also be scaled to very deep networks, fully exploiting the capacity of DNN models and improving accuracy. In this article, we will examine the Transformer model, take it apart piece by piece, and understand how it works.

Main text: The Transformer was proposed in the paper "Attention is All You Need" and is now the reference model recommended for Google Cloud TPUs. The TensorFlow code for the paper is available on GitHub as part of the Tensor2Tensor package. Harvard's NLP group has also implemented an annotated PyTorch version of the paper. In this article, we will try to simplify the model a bit and introduce its core concepts one by one, in the hope that ordinary readers can easily understand it. Attention is All
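To make the parallelism claim concrete, here is a minimal NumPy sketch of the scaled dot-product self-attention at the Transformer's core: every position attends to every other position in one matrix multiply, with no sequential recurrence. The shapes and random inputs are illustrative:

```python
# A minimal sketch of single-head scaled dot-product self-attention.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project inputs to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity of every pair of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                        # weighted sum of value vectors

seq_len, d_model, d_k = 4, 8, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))       # one token embedding per row
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 8)
```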