nlp

How to detect language of user entered text? [closed]

感情迁移 posted on 2019-12-27 11:39:13
Question: I am dealing with an application that accepts user input in different languages (currently fixed at 3 languages). The requirement is that users can enter text without having to select the language via a checkbox provided in the UI. Is there an existing Java library to detect the language of a text? I want

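机器不学习: A Brief Look at Deep Learning for Entity Recognition and Relation Extraction

我怕爱的太早我们不能终老 posted on 2019-12-27 11:26:56
机器不学习 (jqbxx.com), a good site for machine learning. Named Entity Recognition (NER) is the task of finding the relevant entities in a piece of natural-language text and labeling their positions and types, as shown in the figure below. NER underlies more complex NLP tasks such as question answering, relation extraction, and information retrieval, and its quality directly affects downstream processing, which makes it a fundamental problem in NLP research. NER has long been a research hotspot in NLP and is increasingly applied in specialized domains such as medicine and biology. These domains tend to have large numbers of technical terms, with many different kinds of relations among them. NER research has moved from dictionary- and rule-based methods, through statistical machine learning methods, to the deep learning methods of recent years; the trend is shown in the figure below. Statistical machine learning methods mainly include the Hidden Markov Model (HMM), Maximum Entropy (ME), Support Vector Machine (SVM), and Conditional Random Fields (CRF). The Hidden Markov Model (HMM) mainly uses the Viterbi algorithm to solve for the sequence of named-entity classes, and is efficient and fast in both training and recognition. HMMs suit applications with real-time requirements and applications such as information retrieval that must process large volumes of text, for example short-text named entity recognition. The Maximum Entropy model (ME) is compact in structure and generalizes well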
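
The Viterbi decoding step mentioned above for HMMs can be sketched as follows. This is a minimal illustration, not the article's implementation: the two states (O, PER), the three-word vocabulary, and every probability are made-up toy values, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Minimal Viterbi sketch for HMM-style sequence labeling.
// All states, words, and probabilities here are assumed toy values.
final class ViterbiSketch {

    // Returns the most probable state index for each observation.
    // start[s], trans[prev][s], and emit[s][o] are plain (non-zero) probabilities.
    static int[] decode(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = obs.length;
        int k = start.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] score = new double[n][k]; // best log-probability of a path ending in state s at step t
        int[][] back = new int[n][k];        // predecessor state on that best path

        for (int s = 0; s < k; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double cand = score[t - 1][p] + Math.log(trans[p][s]);
                    if (cand > best) {
                        best = cand;
                        arg = p;
                    }
                }
                score[t][s] = best + Math.log(emit[s][obs[t]]);
                back[t][s] = arg;
            }
        }

        // Pick the best final state, then follow the back-pointers.
        int last = 0;
        for (int s = 1; s < k; s++) {
            if (score[n - 1][s] > score[n - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        // States: 0 = O (outside any entity), 1 = PER (person).
        // Observations: 0 = "john", 1 = "smith", 2 = "runs".
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.8, 0.2}, {0.5, 0.5}};
        double[][] emit = {{0.1, 0.1, 0.8}, {0.5, 0.4, 0.1}};
        int[] path = decode(new int[]{0, 1, 2}, start, trans, emit);
        System.out.println(Arrays.toString(path)); // prints [1, 1, 0], i.e. PER PER O
    }
}
```

Working in log space avoids underflow on long sentences; the dynamic program costs O(n·k²) for n tokens and k states, which is consistent with the speed the article attributes to HMMs.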

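Cutting-edge NLP research open-sourced: Baidu releases PaddleNLP Research Edition

北战南征 posted on 2019-12-26 16:05:57
To better serve NLP researchers, Baidu PaddleNLP recently completed an upgrade of its research capabilities: the PaddleNLP Research Edition. Built on the PaddlePaddle (飞桨) deep learning platform and Baidu's accumulated NLP expertise, the Research Edition aims to give researchers results, code, and data from frontier NLP directions, so that they can quickly reproduce the experimental results of published papers and start new research from there. So far, PaddleNLP has released 5 recent papers from top NLP venues including ACL 2019, NAACL 2019, IJCAI 2019, and MRQA 2019, supports code reproduction for 3 competitions, and has released the datasets of 2 related papers. The models and data include DuConv, MMPMS, MPM, and ARNOR, and cover information extraction, intelligent dialogue, question answering, reading comprehension, and mining of review suggestions. PaddleNLP will continue to be upgraded and will open-source more of Baidu Brain's NLP research, for example the KTNET, SEEDS, and STACL models and frameworks published at ACL 2019, which cover machine reading comprehension, personalized dialogue, simultaneous interpretation, and machine translation. As a comprehensive set of tools and datasets spanning industrial applications and academic research, PaddleNLP will keep relying on the technical backing of PaddlePaddle and Baidu NLP to give developers access to more frontier NLP technology at an ever lower barrier to entry. Stay tuned. Overview of open-sourced and soon-to-be-open-sourced projects in the Baidu PaddleNLP Research Edition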

How to pick out the subject, predicate, and object and adjectives in a sentence

隐身守侯 posted on 2019-12-25 19:54:43
Question: I want to extract the subject, predicate, and object of a sentence and find out which adjectives go with the subject, predicate, or object, using Stanford CoreNLP in Java code. I have tried to solve this with the dependency parser by finding the dependency index, checking whether the dependency tag equals amod, and then adding it to an ArrayList, but with this method the adjective's dependency tag is sometimes nmod rather than amod, and other tags may come up. With determining the object and predicate
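
The approach described in the question can be sketched on hand-written dependency edges, so that the example runs without CoreNLP. Everything here is an assumption for illustration: the `Edge` class, the method names, the example sentence, and the relation labels (`nsubj`, `dobj`/`obj`, `amod`); real parser output would use token indices rather than bare words, and, as the question notes, adjectives do not always arrive as `amod`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: reads subject, predicate, object, and adjectives from a
// hard-coded list of typed dependencies. Names and labels are hypothetical.
final class SpoSketch {

    // One typed dependency edge: rel(gov, dep).
    static final class Edge {
        final String rel;
        final String gov;
        final String dep;

        Edge(String rel, String gov, String dep) {
            this.rel = rel;
            this.gov = gov;
            this.dep = dep;
        }
    }

    // The nsubj edge gives subject and predicate; dobj (or obj) gives the object.
    static Map<String, String> roles(List<Edge> edges) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Edge e : edges) {
            if (e.rel.equals("nsubj")) {
                out.put("subject", e.dep);
                out.put("predicate", e.gov);
            } else if (e.rel.equals("dobj") || e.rel.equals("obj")) {
                out.put("object", e.dep);
            }
        }
        return out;
    }

    // Adjectives attached to a given head word through amod edges only.
    static List<String> adjectivesOf(String head, List<Edge> edges) {
        List<String> adjectives = new ArrayList<>();
        for (Edge e : edges) {
            if (e.rel.equals("amod") && e.gov.equals(head)) {
                adjectives.add(e.dep);
            }
        }
        return adjectives;
    }

    public static void main(String[] args) {
        // Hand-written edges for "The quick fox eats a red apple".
        List<Edge> edges = Arrays.asList(
                new Edge("nsubj", "eats", "fox"),
                new Edge("dobj", "eats", "apple"),
                new Edge("amod", "fox", "quick"),
                new Edge("amod", "apple", "red"));
        Map<String, String> r = roles(edges);
        System.out.println(r);
        System.out.println(adjectivesOf(r.get("subject"), edges));
        System.out.println(adjectivesOf(r.get("object"), edges));
    }
}
```

Filtering on the governor of each edge, rather than on the relation label alone, is what ties an adjective to the subject or the object; other modifier relations would need their own branches.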

xml format in stanford pos tagger

﹥>﹥吖頭↗ posted on 2019-12-25 17:00:30
Question: I have tagged 20 sentences and this is my code:

public class myTag {
    public static void main(String[] args) {
        Properties props = new Properties();
        try {
            props.load(new FileReader("D:/tagger/english-bidirectional-distsim.tagger.props"));
        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        MaxentTagger tagger = new MaxentTagger("D:/tagger/english-bidirectional-distsim

NLP Sentiments: Giving wrong result when using negative word in positive way

给你一囗甜甜゛ posted on 2019-12-25 08:33:59
Question: I am using Node.js to create my application with the help of the sentiment lib. The problem is that it gives wrong results when a negative word is used in a positive manner.

var sentiment = require('sentiment');
var result = sentiment('I am dying to eat a kitkat!');
console.dir(result);

{ score: -3,
  comparative: -0.42857142857142855,
  tokens: [ 'i', 'am', 'dying', 'to', 'eat', 'a', 'kitkat' ],
  words: [ 'dying' ],
  positive: [],
  negative: [ 'dying' ] }

///or
result = sentiment('your internet is
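
The output above is what a context-free word-list scorer would produce. The following is a minimal sketch of that idea, not the `sentiment` library's code: the class name, the two-entry lexicon, and the tokenization rule are assumptions, with the valence of -3 for "dying" taken from the output in the question.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch of lexicon-based sentiment scoring: each token is looked up on its
// own, so "dying to eat" is scored exactly like literal "dying".
final class LexiconSentimentSketch {

    // Hypothetical toy lexicon; real word lists hold thousands of entries.
    static final Map<String, Integer> LEXICON = Map.of("dying", -3, "love", 3);

    // Lowercases, drops everything except letters and whitespace, then splits.
    static List<String> tokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z\\s]", " ").trim();
        if (cleaned.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(cleaned.split("\\s+"));
    }

    // Sum of per-word valences; words missing from the lexicon count as 0.
    static int score(List<String> tokens) {
        int total = 0;
        for (String token : tokens) {
            total += LEXICON.getOrDefault(token, 0);
        }
        return total;
    }

    // Score divided by the number of tokens.
    static double comparative(List<String> tokens) {
        return tokens.isEmpty() ? 0.0 : (double) score(tokens) / tokens.size();
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("I am dying to eat a kitkat!");
        System.out.println(tokens);
        System.out.println("score=" + score(tokens) + " comparative=" + comparative(tokens));
    }
}
```

With 7 tokens and a single hit of -3, the comparative value is -3/7, matching the -0.42857142857142855 in the question. Because no token sees its neighbours, an idiom such as "dying to" cannot be recognised without extra handling, for example a phrase list consulted before the word lookup.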

How to set delimiters for PTB tokenizer?

落爺英雄遲暮 posted on 2019-12-25 07:39:41
Question: I'm using the Stanford CoreNLP library for my project. It uses the PTB tokenizer for tokenization. For a statement that goes like this: go to room no. #2145 or go to room no. *2145, the tokenizer splits #2145 into two tokens: # and 2145. Is there any way to configure the tokenizer so that it doesn't treat # and * as delimiters?

Answer 1: A quick solution is to use this option: (command-line) -tokenize.whitespace (in Java code) props.setProperty("tokenize.whitespace", "true"); This will cause the tokenizer to
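
A small sketch of the option from the answer, using only the JDK so that it runs without CoreNLP on the classpath. The property name `tokenize.whitespace` comes from the answer; the class name, the helper methods, and the use of `String.split` to approximate the effect of splitting only on whitespace are assumptions for illustration, not CoreNLP's own tokenizer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Sketch: builds the Properties a CoreNLP pipeline would be given, and shows
// the token boundaries that whitespace-only splitting yields.
final class WhitespaceTokenizeSketch {

    // Properties as in the answer; "annotators" is set to tokenization only.
    static Properties whitespaceProps() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize");
        props.setProperty("tokenize.whitespace", "true");
        return props;
    }

    // Approximation of whitespace-only tokenization with plain Java.
    static List<String> splitOnWhitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(whitespaceProps().getProperty("tokenize.whitespace"));
        System.out.println(splitOnWhitespace("go to room no. #2145"));
        System.out.println(splitOnWhitespace("go to room no. *2145"));
    }
}
```

The trade-off is that punctuation glued to a word, such as the period in "no.", also stays attached, so input that relies on PTB-style punctuation splitting would need pre-processing.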

CoreNLP on Apache Spark

谁都会走 posted on 2019-12-25 06:14:45
Question: I'm not sure if this is related to Spark or NLP. Please help. I'm currently trying to run the Stanford CoreNLP library on Apache Spark, and when I try to run it on multiple cores, I get the following exception. I'm using the latest NLP library, which is thread safe. This happens during the map phase, on the line pipeline.annotate(document);

java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    at java.util.ArrayList$Itr.next(ArrayList.java