lucene | 易学教程

Exception while integrating openNLP with Solr

Question: I am trying to integrate OpenNLP with Solr 6.1.0. I configured the schema and solrconfig files as detailed in the wiki: https://wiki.apache.org/solr/OpenNLP
Changes made in the solrconfig.xml file:
<lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lucene-libs" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lib" regex="opennlp-.*\.jar" />
Changes made in the schema file:
<fieldType name="text_opennlp_nvf" class="solr.TextField"

Lucene web paging

Question: I am creating a web app with Lucene in which I need to implement paging. I have seen various examples here that use an offset on the collector. However, those seem to be old. Lucene 3.5 (or 3.6, I can't remember which) added this, I believe. I have seen the IndexSearcher method searchAfter. However, it requires you to pass it the last ScoreDoc. Because this is a web app, I have no way to pass the last result (as a ScoreDoc object) to the next request. So my question is: how is this typically
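One possible way to carry the last hit across requests, sketched here under assumptions: a ScoreDoc exposes a public int doc and float score, so those two values can be packed into a URL-safe token and sent back by the browser with the next-page request. PageCursor below is a hypothetical helper, not a Lucene class, and it uses only the JDK. Document ids are generally only meaningful against the same index snapshot, so a token like this is best treated as short-lived.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

// Hypothetical helper (not part of Lucene): packs the two values that
// identify the last hit of a page into a URL-safe string and back.
final class PageCursor {
    final int doc;      // would be copied from ScoreDoc.doc
    final float score;  // would be copied from ScoreDoc.score

    PageCursor(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }

    // Write doc id and score into 8 bytes, then Base64-encode them for use in a URL.
    String encode() {
        ByteBuffer buf = ByteBuffer.allocate(8).putInt(doc).putFloat(score);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    // Reverse of encode(): read the doc id and score back out of the token.
    static PageCursor decode(String token) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getUrlDecoder().decode(token));
        return new PageCursor(buf.getInt(), buf.getFloat());
    }

    public static void main(String[] args) {
        String token = new PageCursor(42, 1.5f).encode();
        PageCursor back = PageCursor.decode(token);
        System.out.println(token + " -> doc=" + back.doc + ", score=" + back.score);
        // Assumption: on the next request one would rebuild
        // new ScoreDoc(back.doc, back.score) and pass it to
        // IndexSearcher.searchAfter(...) together with the same query.
    }
}
```

The token is not signed, so a real application would probably want to validate or sign it before trusting values that come back from the client.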

Is the order of multi-valued fields in Lucene stable?

Question: Suppose I add several values to a Document under the same field name:
doc.Add( new Field( "tag", "one" ) );
doc.Add( new Field( "tag", "two" ) );
doc.Add( new Field( "tag", "three" ) );
doc.Add( new Field( "tag", "four" ) );
If I later retrieve these fields from a new instance of Document (from a search result), am I guaranteed that the order of the Fields in the array will remain the same?
Field[] fields = doc.GetFields( "tag" );
Debug.Assert( fields[0].StringValue() == "one" );
Debug

Has anyone used lucene.net with Linq-to-Entities?

Question: If anyone has done this, please let me know. I don't know anything about Lucene.NET. I have never used it, but I have heard about it. I was wondering how something like that would integrate with the Linq Entity Framework?
Answer 1: Check out the Linq to Lucene project.
Answer 2: Check this article in the Linq to Lucene discussion: "Linq to Lucene for Entity Framework" (working with Entity Framework, only one class to add).
Source: https://stackoverflow.com/questions/153290/has-anyone-used-lucene-net-with-linq-to-entities

Lucene Indexing

Question: I would like to use Lucene for indexing a table in an existing database. I have been thinking the process is like:
- Create a 'Field' for every column in the table
- Store all the Fields
- 'ANALYZE' all the Fields except for the Field with the primary key
- Store each row in the table as a Lucene Document.
While most of the columns in this table are small in size, one is huge. This column is also the one containing the bulk of the data on which searches will be performed. I know Lucene provides an

A Brief Look at Index-Building Performance Optimization Strategies for Solr and ElasticSearch

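Because Solr and ElasticSearch are both built on Lucene, they are similar to a large degree, so most of their optimization strategies apply to both. Faced with ever-growing volumes of data, how can the write performance of a full index build be optimized? Below is a brief summary of optimization strategies in several directions; questions and criticism are welcome.
I. Hardware optimization:
(1) More CPU, which helps concurrent writes
(2) More memory, to enlarge the write buffer
(3) Disk I/O: use SSDs or disks with faster read/write speeds
(4) Network I/O: make sure there is enough bandwidth between client and server
II. Server-side framework optimization:
(1) Increase the number of shards; in theory, the more shards, the faster the writes
(2) Set larger thresholds for triggering an index flush: ramBufferSizeMB or maxBufferedDocs
(3) Disable replicas while writing the index, because synchronizing the index greatly reduces write speed
(4) Monitor GC and tune the JVM parameters: if Full GC is frequent, increase the JVM heap; if Young GC is frequent, increase the share of the young generation; if the CMS garbage collector is used, the survivor spaces can be disabled when necessary to avoid copying back and forth between the survivor spaces and Eden
(5) Prefer stable, recent versions, for example of the JDK and of the framework itself
(6) With a large amount of memory, the G1 garbage collector is worth trying
III. Client-side optimization:
(1) If the company has a big-data department, a Hadoop or Spark distributed cluster can be used to build the index
(2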

ElasticSearch in Practice: The Basics

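1. What are full-text search and Lucene?
(1) Full-text search, inverted index
(2) Lucene is simply a jar that packages ready-made code for building inverted indexes and running searches, including the various algorithms involved. When developing in Java, we add the Lucene jar and build on the Lucene API. With Lucene we can build an index over existing data, and Lucene organizes the index data structures on the local disk for us. In addition, we can use the features and APIs that Lucene provides to work against the on-disk
2. What is Elasticsearch?
1) It automatically manages distributing data across multiple nodes for index building, and distributing search requests across multiple nodes for execution;
2) It automatically maintains redundant replicas of the data, so that no data is lost when some machines go down
3) It wraps more advanced functionality and gives us more high-level support, so that we can develop applications quickly and build more complex ones: complex search, aggregation analytics, geolocation-based search
3. What Elasticsearch does
1) A distributed search engine and data analytics engine
2) Full-text search, structured search, data analytics
3) Near-real-time processing of massive amounts of data
4. Characteristics of Elasticsearch
(1) It can run as a large distributed cluster (hundreds of servers), handling PB-scale data and serving large companies; it can also run on a single machine, serving small companies
(2) Elasticsearch is not a new technology as such; it mainly combines full-text search, data analytics, and distributed technology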
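The inverted index named in item 1 can be pictured with a toy sketch like the one below. It is an illustration only, written against the JDK rather than Lucene, and TinyInvertedIndex is a hypothetical name. The idea is that each term maps to the set of documents containing it, so a lookup does not have to scan every document.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: maps each lowercased term to the sorted set of ids
// of the documents that contain it. Not how Lucene stores its index on disk.
final class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Split the text on non-word characters and record docId under every term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Ids of the documents containing the term, or an empty set if there are none.
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Lucene builds an inverted index");
        index.add(2, "Elasticsearch is built on Lucene");
        System.out.println(index.lookup("lucene")); // ids of documents containing "lucene"
        System.out.println(index.lookup("index"));  // ids of documents containing "index"
    }
}
```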

Lucene.NET - do an AND search multiple words on multiple fields

Question: I define a Document object for my product entity, which has several fields: Title, Brand, Category, Size, Color, Material. Now I want to let users do an AND search across multiple fields. Any document in which one, two, or more fields together contain all the search words should be returned. For example, when the user enters "gucci shirt red", I want to return all documents whose fields match all 3 tokens: "gucci", "shirt" AND "red". So all of the documents below should be returned: 1. Documents with title
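The matching rule described in this question (every search word must appear in at least one field, no matter which) can be written down as a plain-Java sketch. AllTokensAnyField is a hypothetical name and the code does not use Lucene or Lucene.NET; it only pins down the intended semantics. In Lucene terms, one common way to express the same rule is a boolean query with one required clause per word, where each clause is an OR over the fields, but that mapping is stated here as an assumption rather than taken from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of "AND across fields": a document matches when
// every query token occurs as a whole word in at least one of its field values.
final class AllTokensAnyField {
    static boolean matches(String query, List<String> fieldValues) {
        for (String token : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            boolean found = false;
            for (String value : fieldValues) {
                // Compare against the lowercased words of this field value.
                String[] words = value.toLowerCase(Locale.ROOT).split("\\W+");
                if (Arrays.asList(words).contains(token)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false; // one token is missing from every field
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Title, Brand, Category values of one hypothetical product.
        List<String> product = Arrays.asList("Red cotton shirt", "Gucci", "Shirts");
        System.out.println(matches("gucci shirt red", product));
        System.out.println(matches("gucci shirt blue", product));
    }
}
```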

Hibernate HQL query does not update the Lucene Index

Question: I am using Hibernate 3.6.3 Final and Hibernate Search 3.4.1. I wrote an HQL delete query. The objects are deleted from the database, but they are not removed from the Lucene index after the transaction completes. Here is the query:
Session session = factory.getCurrentSession();
Query q = session.createQuery("delete from Charges cg where cg.id in (:charges)");
q.setParameterList("charges", chargeIds);
int result = q.executeUpdate();
What am I missing? What do I need to do to solve this issue? I

Lucene: Completely disable weighting, scoring, ranking,

Question: I'm using Lucene to build a big index of token co-occurrences (e.g. [elephant,animal], [melon,fruit], [bmw,car], ...). I query the index for those co-occurrences using a BooleanQuery to get an absolute count of how often the two tokens co-occurred in my index, like so:
// search for documents which contain word+category
BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(new Term("word", word)), Occur.MUST);
query.add(new TermQuery(new Term("category", category)), Occur.MUST);
//