lucene

Exception : java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene410' does not exist

喜你入骨 posted on 2020-01-25 06:46:06
Question: I work with a multi-module Gradle project (12 modules). I inherited the project and need to update the versions of some libraries it uses. I can't understand the cause of this error: ... 67 more Caused by: java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene410' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [Lucene54] at org.apache
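The message says that no JAR on the classpath registers a codec named 'Lucene410'. Lucene discovers codecs through provider files under META-INF/services, so one way to see which JARs register which codec classes is to list those files directly. The following is a minimal diagnostic sketch using only the JDK; the class name SpiProbe and its method are hypothetical, not part of Lucene, and it only reports what is registered rather than fixing anything.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper: lists every provider class registered for a service type
// in META-INF/services on the current classpath, with the file it came from.
class SpiProbe {
    static List<String> providers(String serviceType) {
        List<String> found = new ArrayList<>();
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            Enumeration<URL> files = loader.getResources("META-INF/services/" + serviceType);
            while (files.hasMoreElements()) {
                URL file = files.nextElement();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(file.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        int hash = line.indexOf('#');          // strip comments
                        if (hash >= 0) {
                            line = line.substring(0, hash);
                        }
                        line = line.trim();
                        if (!line.isEmpty()) {
                            found.add(line + "  <- " + file);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints nothing if no JAR on the classpath registers a Lucene codec.
        for (String provider : providers("org.apache.lucene.codecs.Codec")) {
            System.out.println(provider);
        }
    }
}
```

Run it with the same classpath as the failing module: if no listed class belongs to the 4.10 codec, the JAR that provides it is missing from that module's dependencies.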

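Java Lucene: updating the index on a schedule

坚强是说给别人听的谎言 posted on 2020-01-25 00:57:47
Requirement: build the index over all data at 2 a.m. every night, and refresh the index at a fixed interval during the rest of the day. In testing, indexing 5,000 records takes only 600 ms and 20,000 records about 1,000 ms; even several hundred thousand records take only a few seconds. Under the initial plan, daytime refreshes would index only newly added or modified records, which requires comparing the records fetched from the database against the data in the IndexReader and filtering out those already indexed, and that step is slow. Preliminary results: 5,000 records take 50 s; 20,000 records take 220 s. With 200,000 records, the filtering alone would take 4 h, which is clearly unworkable. It is therefore better to simply rebuild the whole index every time. Without further ado, here is the code for the initial plan: java package net.lucene.buildindex; import java.io.File; import java.util.ArrayList; import java.util.List; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field; import org.apache.lucene.document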
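The scheduling requirement itself (a full rebuild at 2 a.m. plus periodic rebuilds during the day) can be sketched with the JDK's ScheduledExecutorService. This is not the post's code, which is cut off above; the class IndexScheduler, the rebuildIndex task, and the interval parameter are assumptions made for illustration.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical scheduler: runs a caller-supplied rebuild task nightly and at an interval.
class IndexScheduler {

    // Time from 'now' until the next occurrence of 'timeOfDay' (today or tomorrow).
    static Duration delayUntilNext(LocalDateTime now, LocalTime timeOfDay) {
        LocalDateTime next = now.toLocalDate().atTime(timeOfDay);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    // 'rebuildIndex' stands in for the full rebuild; 'intervalMinutes' is an assumed setting.
    static ScheduledExecutorService start(Runnable rebuildIndex, long intervalMinutes) {
        // A single worker thread guarantees that two rebuilds never run concurrently.
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        long initialMs = delayUntilNext(LocalDateTime.now(), LocalTime.of(2, 0)).toMillis();
        // Nightly rebuild, first run at the next 2:00, then every 24 hours.
        pool.scheduleAtFixedRate(rebuildIndex, initialMs,
                TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        // Daytime rebuilds, each starting 'intervalMinutes' after the previous one finishes.
        pool.scheduleWithFixedDelay(rebuildIndex, intervalMinutes,
                intervalMinutes, TimeUnit.MINUTES);
        return pool;
    }
}
```

A caller would pass its own rebuild routine, for example `IndexScheduler.start(() -> rebuildAll(), 30)` where `rebuildAll` is whatever method rebuilds the index, and call `shutdown()` on the returned executor when the application stops. If the task throws an exception, the executor cancels later runs, so the rebuild routine should catch and log its own failures.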

Sitecore Lucene Exclude Item From Index

你说的曾经没有我的故事 posted on 2020-01-24 13:59:28
Question: I'm trying to give a content editor the option to exclude items from a search page. The template being searched has a checkbox that indicates whether the item should show up. I've seen a few answers that involve inheriting from Sitecore.Search.Crawlers.DatabaseCrawler and overriding the AddItem method (Excluding items selectively from Sitecore's Lucene search index - works when rebuilding with IndexViewer, but not when using Sitecore's built-in tools). This does not

splitting lucene index into two halves

我的未来我决定 posted on 2020-01-24 13:42:49
Question: What is the best way to split an existing Lucene index into two halves, i.e. each split should contain half of the total number of documents in the original index?
Answer 1: The easiest way to split an existing index (without reindexing all the documents) is to:
1. Make another copy of the existing index (e.g. cp -r myindex mycopy)
2. Open the first index, and delete half the documents (range 0 to maxDoc / 2)
3. Open the second index, and delete the other half (range maxDoc / 2 to maxDoc)
4. Optimize both
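The two deletion ranges in the answer are complementary, which is what guarantees that every document survives in exactly one copy. A small sketch of just that arithmetic follows; it makes no Lucene calls, and the class SplitPlan is hypothetical. Ranges are half-open, [from, to).

```java
// Hypothetical helper: computes which doc IDs to delete from each copy of the index.
class SplitPlan {

    // Doc IDs to delete from the first copy: [0, maxDoc / 2).
    static int[] deleteFromFirstCopy(int maxDoc) {
        return new int[] {0, maxDoc / 2};
    }

    // Doc IDs to delete from the second copy: [maxDoc / 2, maxDoc).
    static int[] deleteFromSecondCopy(int maxDoc) {
        return new int[] {maxDoc / 2, maxDoc};
    }

    public static void main(String[] args) {
        int maxDoc = 7; // example value only
        int[] first = deleteFromFirstCopy(maxDoc);
        int[] second = deleteFromSecondCopy(maxDoc);
        System.out.println("first copy deletes  [" + first[0] + ", " + first[1] + ")");
        System.out.println("second copy deletes [" + second[0] + ", " + second[1] + ")");
    }
}
```

Because maxDoc also counts documents that are already marked as deleted, the two halves hold equal numbers of live documents only if the original index has no pending deletions.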

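Lucene Principles and Code Analysis: Complete Edition

孤街浪徒 posted on 2020-01-24 11:06:29
The "Lucene Principles and Code Analysis" series is essentially complete, although the Q&A part may still receive updates. The complete PDF can be downloaded from the following link: Lucene Principles and Code Analysis, Complete Edition. The table of contents is as follows:
Contents
Part 1: Principles
Chapter 1: Basic principles of full-text search
  1. Overview
  2. What exactly is stored in an index
  3. How to create an index
    Step 1: Some source documents to be indexed (Document).
    Step 2: Pass the source documents to the tokenizing component (Tokenizer).
    Step 3: Pass the resulting tokens (Token) to the linguistic processing component (Linguistic Processor).
    Step 4: Pass the resulting terms (Term) to the indexing component (Indexer).
      1. Build a dictionary from the resulting terms (Term).
      2. Sort the dictionary alphabetically.
      3. Merge identical terms (Term) into document posting lists (Posting List).
  4. How to search the index?
    Step 1: The user enters a query.
    Step 2: Perform lexical analysis, syntactic analysis, and linguistic processing on the query.
      1. Lexical analysis identifies words and keywords.
      2. Syntactic analysis builds a syntax tree according to the grammar of the query.
      3. Linguistic processing is almost identical to that of the indexing process.
    Step 3: Search the index for documents that match the syntax tree.
    Step 4: Rank the results by the relevance between the retrieved documents and the query.
      1. The process of computing term weights (Term weight).
      2. Determining Term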
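The four index-creation steps in the outline (documents, tokenizer, linguistic processing, indexer with a sorted dictionary and merged posting lists) can be illustrated with a toy inverted index built on the JDK collections. This is a simplified sketch, not Lucene's implementation: the class TinyInvertedIndex is hypothetical, tokenizing is a plain split on non-alphanumeric characters, and linguistic processing is reduced to lowercasing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy index: maps each term to the sorted list of doc IDs containing it.
class TinyInvertedIndex {

    static Map<String, List<Integer>> build(List<String> docs) {
        // TreeMap keeps the dictionary sorted alphabetically (indexer step 2).
        Map<String, List<Integer>> postings = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            // Step 2: tokenizer, here a split on anything that is not a letter or digit.
            for (String token : docs.get(docId).split("[^\\p{L}\\p{N}]+")) {
                if (token.isEmpty()) {
                    continue;
                }
                // Step 3: linguistic processing, reduced here to lowercasing.
                String term = token.toLowerCase(Locale.ROOT);
                // Step 4: merge identical terms into one posting list per term.
                List<Integer> list = postings.computeIfAbsent(term, k -> new ArrayList<>());
                if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                    list.add(docId); // record each document once per term
                }
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Lucene is a search library", "Search with Lucene");
        build(docs).forEach((term, ids) -> System.out.println(term + " -> " + ids));
    }
}
```

Looking a term up in the returned map yields its posting list, which is the starting point for the search steps in the outline; ranking by relevance is not covered by this sketch.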

Lucene Query on a DateField indexed by Solr

北城余情 posted on 2020-01-24 11:01:49
Question: We are using a Solr index for various search applications. In most cases we use it just as you would with the admin interface, for example: +text:Mr +text:burns +publish_date:[2012-09-10T00:00:00Z TO 2012-10-10T00:00:00Z] This works fine. My problem is that in one app we run complex Lucene queries directly against the index (without using Solr), and in these queries I can't find how to search on a date field. In schema.xml: <field name="publish_date" type="date" indexed="true" stored="true"/> It
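One possible direction, assuming the schema's `date` type is a trie-based date field (the excerpt does not show the fieldType definition, so this is an assumption): such fields index each value as a long count of milliseconds since the epoch, so a query issued directly against Lucene would need numeric bounds rather than the ISO-8601 strings Solr accepts. The sketch below only performs that conversion with the JDK; the class DateBounds is hypothetical.

```java
import java.time.Instant;

// Hypothetical helper: converts Solr-style UTC timestamps to epoch milliseconds.
class DateBounds {

    // Parses an ISO-8601 instant such as "2012-09-10T00:00:00Z".
    static long toEpochMillis(String solrDate) {
        return Instant.parse(solrDate).toEpochMilli();
    }

    public static void main(String[] args) {
        long from = toEpochMillis("2012-09-10T00:00:00Z");
        long to = toEpochMillis("2012-10-10T00:00:00Z");
        // These two longs would be the lower and upper bounds of a numeric range query.
        System.out.println("publish_date range in ms: [" + from + ", " + to + "]");
    }
}
```

In the Lucene versions that have NumericRangeQuery, the bounds would be passed to NumericRangeQuery.newLongRange together with the field's precisionStep, which must match the value in the fieldType definition. If the `date` type is instead a legacy string-based date field, the values are indexed as text and this approach does not apply.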
