nlp

Do I need to rewrite my entire Java project if I want to use a single UIMA-dependent library?

核能气质少年 submitted on 2020-01-11 11:42:46
Question: I want to use https://code.google.com/p/heideltime/ in a Java project. That code "fits into the UIMA pipeline", which is something I don't understand at all. UIMA looks like it's designed to solve a ton of problems that I don't have, so I'd just like to get the minimal amount of UIMA needed to run that code. Is there a simple example out there of how I can run a simple UIMA program? I've added <dependency> <groupId>org.uimafit</groupId> <artifactId>uimafit</artifactId> <version>1.4.0</version

Coreference resolution using Stanford CoreNLP

筅森魡賤 submitted on 2020-01-11 11:26:07
Question: I am new to the Stanford CoreNLP toolkit and am trying to use it in a project to resolve coreferences in news texts. To use the Stanford CoreNLP coreference system, we would usually create a pipeline, which requires tokenization, sentence splitting, part-of-speech tagging, lemmatization, named entity recognition and parsing. For example: Properties props = new Properties(); props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref"); StanfordCoreNLP pipeline =

Date Extraction from Text

Deadly submitted on 2020-01-11 09:21:09
Question: I am trying to use the Stanford NLP tool to extract dates (8/11/2012) from text. Here's a link to the demo of this tool. Can you help me with how to train the classifier to identify a date (8/11/2012)? I tried using training data such as Woodhouse PERS 8/18/2012 Date , O handsome O but it does not work for the same test data. Answer 1: Using the NLP tool to extract dates from text seems like overkill if this is all you are trying to accomplish. You should consider other options like a simple Java regular

Entities on my gazette are not recognized

大兔子大兔子 submitted on 2020-01-11 09:05:10
Question: I would like to create a custom NER model. Here is what I did: TRAINING DATA (stanford-ner.tsv): Hello O ! O My O name O is O Damiano PERSON . O PROPERTIES (stanford-ner.prop): trainFile = stanford-ner.tsv serializeTo = ner-model.ser.gz map = word=0,answer=1 maxLeft=1 useClassFeature=true useWord=true useNGrams=true noMidNGrams=true maxNGramLeng=6 usePrev=true useNext=true useDisjunctive=true useSequences=true usePrevSequences=true useTypeSeqs=true useTypeSeqs2=true useTypeySequences=true

How to find dates in the sentence using NLP, RegEx in Python

时光怂恿深爱的人放手 submitted on 2020-01-11 06:17:11
Question: Can anyone suggest a way of finding and parsing dates (in any format: "Aug06", "Aug2006", "August 2 2008", "19th August 2006", "08-06", "01-08-06") in Python? I came across this question, but it is in Perl: Extract inconsistently formatted date from string (date parsing, NLP). Any suggestion would be helpful. Answer 1: This finds all the dates in your example sentence: for match in re.finditer( r"""(?ix) # case-insensitive, verbose regex \b # match a word boundary (?: # match the
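
The answer above is cut off, so the following is a separate minimal sketch rather than a reconstruction of it. It assumes that only the six formats listed in the question need to be matched; the names `MONTH`, `DATE_RE` and `find_dates` are made up for this illustration. It only finds the date strings and does not convert them to date objects.

```python
import re

# Month names and their common abbreviations (assumption: English only).
MONTH = (
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|"
    r"aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)"
)

# One alternative per format from the question; longer forms are tried first.
DATE_RE = re.compile(
    rf"""
    \b(?:
        \d{{1,2}}(?:st|nd|rd|th)?\s+{MONTH}\s+\d{{4}}       # 19th August 2006
      | {MONTH}\s+\d{{1,2}}(?:st|nd|rd|th)?,?\s+\d{{4}}      # August 2 2008
      | {MONTH}\s?(?:\d{{4}}|\d{{2}})                         # Aug2006, Aug06
      | \d{{1,2}}-\d{{1,2}}(?:-\d{{2,4}})?                    # 01-08-06, 08-06
    )\b
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_dates(text):
    """Return every substring of text that matches one of the formats above."""
    return [match.group(0) for match in DATE_RE.finditer(text)]


if __name__ == "__main__":
    sample = "Seen Aug06, Aug2006, August 2 2008, 19th August 2006, 08-06 and 01-08-06."
    print(find_dates(sample))
```

A pattern like this will also produce false positives (for example, "may 12" in ordinary prose, or a hyphenated number range such as "08-06"), so it is best treated as a first filter before a stricter parsing step.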

Stanford typed dependencies using coreNLP in python

[亡魂溺海] submitted on 2020-01-11 03:25:27
Question: The Stanford Dependencies manual mentions "Stanford typed dependencies", and in particular the type "neg" (negation modifier). It is also available when using the Stanford enhanced++ parser on the website. For example, for the sentence "Barack Obama was not born in Hawaii", the parser indeed finds neg(born,not). But when I use the stanfordnlp Python library, the only dependency parser I can get parses the sentence as follows: ('Barack', '5', 'nsubj:pass') ('Obama', '1', 'flat') ('was', '5',

NSLinguisticTagger enumerateTagsInRange doesn't work on device with NSLinguisticTagSchemeNameTypeOrLexicalClass

纵饮孤独 submitted on 2020-01-11 03:15:27
Question: Here's the code I'm using. On the device it prints nothing, no matter what sentence I use. On the simulator it works fine! - (NSMutableArray *)getTagEntries:(NSString *)sentence { NSArray<NSLinguisticTagScheme> *tagSchemes = [NSLinguisticTagger availableTagSchemesForLanguage:@"en"]; NSLinguisticTaggerOptions options = NSLinguisticTaggerJoinNames | NSLinguisticTaggerOmitWhitespace; NSLinguisticTagger *linguisticTagger = [[NSLinguisticTagger alloc] initWithTagSchemes:tagSchemes options:options];

What do the abbreviations in POS tagging etc mean?

时光毁灭记忆、已成空白 submitted on 2020-01-11 02:15:13
Question: Say I have the following Penn Treebank tree: (S (NP-SBJ the steel strike) (VP lasted (ADVP-TMP (ADVP much longer) (SBAR than (S (NP-SBJ he) (VP anticipated (SBAR *?*)))))) .) What do abbreviations like VP and SBAR mean? Where can I find these definitions? What are these abbreviations called? Answer 1: Those are the Penn Treebank tags; for example, VP means "Verb Phrase". The full list can be found here Answer 2: The full list of Penn Treebank POS tags (the so-called tagset), including examples, can be found on
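
To make the labels in the question's tree concrete, here is a minimal sketch that pulls every constituent label out of the bracketed string and looks it up in a small glossary. The glossary covers only the labels that occur in this one tree and is not the full Penn Treebank tagset; `GLOSSARY`, `FUNCTION_TAGS`, `constituent_labels` and `describe` are names invented for this illustration.

```python
import re

# Partial glossary: only the syntactic categories that appear in the example tree.
GLOSSARY = {
    "S": "simple declarative clause",
    "NP": "noun phrase",
    "VP": "verb phrase",
    "ADVP": "adverb phrase",
    "SBAR": "clause introduced by a subordinating conjunction",
}

# Function tags are attached to a category with a hyphen, e.g. NP-SBJ.
FUNCTION_TAGS = {"SBJ": "subject", "TMP": "temporal"}

TREE = (
    "(S (NP-SBJ the steel strike) (VP lasted (ADVP-TMP (ADVP much longer) "
    "(SBAR than (S (NP-SBJ he) (VP anticipated (SBAR *?*)))))) .)"
)


def constituent_labels(tree):
    """Return the labels that directly follow an opening bracket, in order."""
    return re.findall(r"\(\s*([^\s()]+)", tree)


def describe(label):
    """Split a label such as NP-SBJ into category and function tags and gloss it."""
    category, *functions = label.split("-")
    text = GLOSSARY.get(category, "unknown category")
    for function in functions:
        text += ", " + FUNCTION_TAGS.get(function, "unknown function tag")
    return text


if __name__ == "__main__":
    for label in constituent_labels(TREE):
        print(f"{label}: {describe(label)}")
```

Splitting on the hyphen is a simplification that works for this tree; labels such as -NONE- in full Treebank files would need special handling.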

Natural Language Processing: Semantic Analysis, Part 1

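我的梦境 submitted on 2020-01-11 01:25:10
Semantic analysis (also called meaning generation) is one of the tasks in NLP. It is defined as the process of determining the meaning of a sequence of characters or words, and it can be used to perform semantic disambiguation. This chapter covers the following topics: • NER. • An NER system using HMM. • Training NER with machine learning toolkits. • NER using part-of-speech tagging. • Generating synset ids from WordNet. • Word sense disambiguation using WordNet. NLP refers to performing computation on natural language. Semantic analysis is one of the steps performed when processing natural language. When analyzing a given sentence, semantic analysis can be carried out once the syntactic structure of the sentence has been built. Semantic interpretation means assigning a meaning to a sentence, and contextual interpretation means assigning the logical form to a knowledge representation. The primitives, or basic units, of semantic analysis are called meanings or senses. ELIZA, one of the tools that deals with semantics, was developed by Joseph Weizenbaum in the 1960s; it uses substitution and pattern-matching techniques to analyze a sentence and produce an output for a given input. MARGIE was developed by Roger Schank in the 1970s; it can represent all English verbs using 11 primitives. MARGIE can interpret the meaning of a sentence and represent it with the help of these primitives. MARGIE later gave way to the concept of scripts, and the Script Applier Mechanism (SAM) was developed on the basis of MARGIE; it can translate sentences from different languages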
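
The substitution and pattern-matching idea attributed to ELIZA above can be illustrated with a very small sketch. The rules and the pronoun table below are invented for this example and are not taken from Weizenbaum's original script; `REFLECTIONS`, `RULES`, `reflect` and `respond` are hypothetical names.

```python
import re

# Substitution table: swaps first and second person in the captured fragment.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my",
}

# Pattern-matching rules: (regex with one capture group, response template).
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]


def reflect(fragment):
    """Apply the substitution table word by word."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())


def respond(sentence):
    """Return the response of the first rule whose pattern matches the input."""
    cleaned = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."


if __name__ == "__main__":
    print(respond("I need my coffee."))
    print(respond("I am worried about my exam"))
    print(respond("Hello"))
```

The rules are tried in order and the first match wins, so more specific patterns should come before more general ones. Nothing here models meaning; the program only rearranges the surface text of the input.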