opennlp

Querying part-of-speech tags with Lucene 7 OpenNLP

故事扮演 posted on 2021-02-20 03:50:40
Question: For fun and learning I am trying to build a part-of-speech (POS) tagger with OpenNLP and Lucene 7.4. The goal is that, once the text is indexed, I can search for a sequence of POS tags and find all sentences that match that sequence. I already have the indexing part working, but I am stuck on the query part. I am aware that Solr might have some functionality for this, and I already checked the code (which was not so self-explanatory after all). But my goal is to understand and implement in Lucene 7, not

How to resolve IllegalStateException and ParserConfigurationException when finding locations in Android Studio using Apache OpenNLP

£可爱£侵袭症+ posted on 2021-02-08 05:39:15
Question:
package com.example.dell.apacheopennlp;
import android.os.StrictMode;
import android.support.v7.app.AppCompatActivity;
import android.os.Bundle;
import android.widget.TextView;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;
public class apacheOpenNLP extends

How to split Japanese text?

限于喜欢 posted on 2021-01-27 17:16:25
Question: What is the best way to split Japanese text using Java? For example, for the text below:
こんにちは。私の名前はオバマです。私はアメリカに行く。
I need the following output:
こんにちは
私の名前はオバマです
私はアメリカに行く
Is it possible using Kuromoji?
Answer 1: You can use java.text.BreakIterator.
String TEXT = "こんにちは。私の名前はオバマです。私はアメリカに行く。";
BreakIterator boundary = BreakIterator.getSentenceInstance(Locale.JAPAN);
boundary.setText(TEXT);
int start = boundary.first();
for (int end = boundary.next(); end != BreakIterator.DONE; start = end,
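The answer excerpt above is cut off mid-loop. As a rough, self-contained sketch of the same java.text.BreakIterator approach: the class name JapaneseSentenceSplitter and the method split are hypothetical, and stripping the trailing "。" is an added step to match the output requested in the question, not something the excerpt confirms.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not part of the original answer.
final class JapaneseSentenceSplitter {

    // Splits text at the sentence boundaries reported by BreakIterator
    // and drops a trailing ideographic full stop from each piece.
    static List<String> split(String text) {
        BreakIterator boundary = BreakIterator.getSentenceInstance(Locale.JAPAN);
        boundary.setText(text);

        List<String> sentences = new ArrayList<>();
        int start = boundary.first();
        for (int end = boundary.next();
             end != BreakIterator.DONE;
             start = end, end = boundary.next()) {
            String sentence = text.substring(start, end).trim();
            // Assumption: the asker wants the terminator removed.
            if (sentence.endsWith("。")) {
                sentence = sentence.substring(0, sentence.length() - 1);
            }
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "こんにちは。私の名前はオバマです。私はアメリカに行く。";
        for (String sentence : split(text)) {
            System.out.println(sentence);
        }
    }
}
```

BreakIterator ships with the JDK, so this needs no extra dependency. Whether Kuromoji is a better fit depends on whether word-level tokens are needed in addition to sentence boundaries.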

OpenNLP: Unable to locate the model file for Lemmatizer

杀马特。学长 韩版系。学妹 posted on 2020-12-12 18:18:46
Question: Summary: Unable to find the model file used for Lemmatizer (english-lemmatizer.bin). Details: OpenNLP Tools Models appears to be a comprehensive repository for the various models used by the different components of the Apache OpenNLP library. However, I am unable to find the model file en-lemmatizer.bin, which is used with the lemmatizer. The Apache OpenNLP Developer Manual provides the following code snippet for the Lemmatization step: InputStream dictLemmatizer = null; try (dictLemmatizer

Hot-word extraction and word-segmentation statistics in Java with NLPChina ansj_seg

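那年仲夏 posted on 2020-10-08 10:24:40
Preface: I recently ran into a requirement: given an article as input, output its high-frequency words. The requirement is short, but breaking it down reveals many details that matter. After refining it, I settled on these steps: 1. segment the article into words (including mixed Chinese and English text) -> 2. count the segmented words -> 3. recommend hot words.
Given this simple requirement, I wanted to implement it in plain Java with some data structures. My knowledge is limited and I am still a student; I managed word segmentation for English and for Chinese separately, but could not get the two to work together on mixed Chinese-English text. Two days of thinking and analysis ended in failure. While looking things up I discovered Apache OpenNLP and read through its source code. I did not follow all of it, but I grasped the basic idea. Although Apache OpenNLP is impressive, I chose instead a Chinese-developed Java word segmenter based on n-Gram + CRF + HMM. The development documentation and source code are available on GitHub.
Without further ado, here is the source code. Utility class: package com.sim; import org.ansj.splitWord.analysis.ToAnalysis; import java.io.*; import java.util.*; public class NLPTools { public static Map<String,String> wordFrequency(String article) { Map<String, Integer> map = new HashMap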
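Steps 2 and 3 above (counting the segmented words and picking hot words) can be sketched with the standard library alone. This is an illustrative sketch, not the author's truncated NLPTools class: it assumes the article has already been segmented into a list of tokens by some segmenter such as ansj_seg, and the names WordFrequencySketch, countWords, and topWords are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; assumes tokens come from an external segmenter.
final class WordFrequencySketch {

    // Step 2: count how often each token occurs (case-insensitive for Latin text).
    static Map<String, Integer> countWords(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            String word = token.trim().toLowerCase(Locale.ROOT);
            if (word.isEmpty()) {
                continue; // skip whitespace-only tokens
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Step 3: return the n most frequent words, ties broken alphabetically.
    static List<String> topWords(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

Keeping the counting separate from the segmentation means the same code can serve Chinese, English, or mixed input, as long as the segmenter produces one token per word.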

12 open source tools for natural language processing

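倾然丶 夕夏残阳落幕 posted on 2020-02-26 07:29:05
Let's look at a dozen tools you can use in your own NLP applications.
Over the past several years, natural language processing (NLP) has driven the development of chatbots, voice assistants, text prediction, and other speech or text applications that permeate our daily lives. There is a wide variety of open source NLP tools, so I decided to survey what is currently available to help you plan your next voice- or text-based application.
Although I am not familiar with all of the tools, I will introduce them starting from the programming languages I know (for languages I am not familiar with, I could not find a large selection of tools). That said, I excluded tools in three languages I am familiar with, for various reasons.
R is probably the most significant language left out, because most of the libraries I found had not been updated in over a year. That does not necessarily mean they are poorly maintained, but I think they should be updated more often to compete with other tools in the same space. I also chose the languages and tools most likely to be used in production scenarios (rather than in academia and research), and I mostly use R as a research and discovery tool.
I was also surprised to find that many Scala libraries are no longer updated. It has been two years since I last used Scala, when it was very popular. But most of the libraries have not been updated since then, or only a few have.
Finally, I excluded C++. This is mainly because it has been many years since I last wrote a program in C++, and the organizations I have worked in have not used C++ for NLP or any data science work.
Python tools
Natural Language Toolkit (NLTK)
Without a doubt,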

Counting sentences in a text file using Java

吃可爱长大的小学妹 posted on 2020-01-25 10:29:26
Question: The source code below detects the sentences in a text file using OpenNLP. However, I don't know how to count and print the number of sentences in the text file.
package com.mycompany.app;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.util

OpenNLP: Training a custom NER Model for multiple entities

妖精的绣舞 posted on 2020-01-23 10:59:34
Question: I am trying to train a custom NER model for multiple entities. Here is the sample training data:
count all <START:item_type> operating tables <END> on the <START:location_id> third <END> <START:location_type> floor <END>
count all <START:item_type> items <END> on the <START:location_id> third <END> <START:location_type> floor <END>
how many <START:item_type> beds <END> are in <START:location_type> room <END> <START:location_id> 2 <END>
The NameFinderME.train(.) method takes a string parameter
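To make the annotation format in the sample data concrete, here is a small standard-library sketch that pulls the entity type and text out of each <START:type> ... <END> span. It does not use or imitate the OpenNLP training API; the class AnnotatedLineSketch, the method extractEntities, and the regular expression are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting training lines; not an OpenNLP class.
final class AnnotatedLineSketch {

    // Matches "<START:type> some text <END>" and captures the type and the text.
    private static final Pattern SPAN =
            Pattern.compile("<START:([^>]+)>\\s+(.*?)\\s+<END>");

    // Returns one {type, text} pair per annotated span, in order of appearance.
    static List<String[]> extractEntities(String line) {
        List<String[]> entities = new ArrayList<>();
        Matcher matcher = SPAN.matcher(line);
        while (matcher.find()) {
            entities.add(new String[] {matcher.group(1), matcher.group(2)});
        }
        return entities;
    }

    public static void main(String[] args) {
        String line = "how many <START:item_type> beds <END> are in "
                + "<START:location_type> room <END> <START:location_id> 2 <END>";
        for (String[] entity : extractEntities(line)) {
            System.out.println(entity[0] + " = " + entity[1]);
        }
    }
}
```

A check like this can help confirm that every line carries the expected entity types before the data is handed to a trainer.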