lucene

ElasticSearch

Submitted by 走远了吗. on 2019-12-21 05:45:15
Outline: I. Concept  II. Features  III. Installing and using ES (3.1 Installing ES, 3.2 Installing the Kibana client, 3.3 Installing the head tool)

I. Concept
ElasticSearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine exposed through a RESTful web interface. Elasticsearch is written in Java, released as open source under the Apache License, and is a popular enterprise search engine. It is used in cloud computing settings, where it delivers real-time search and is stable, reliable, fast, and easy to install and use. Official clients are available for Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.

II. Features
- Addresses the drawbacks of using raw Lucene by optimizing how Lucene is invoked
- Highly available distributed clusters that can handle PB-scale data
- Its goal is to hide Lucene's complexity behind a simple RESTful API, making full-text search simple and usable out of the box

Differences between Solr and ES:
(1) Solr is heavyweight: it supports many kinds of operations and distributed deployment and ships with many features, but for real-time search it is not as good as ES.
(2) ES is lightweight and uses JSON as its operation format.
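To make the "hide Lucene behind a simple RESTful API" point concrete, here is a minimal search sketch using the Java high-level REST client. This is an illustration only: the index name "articles", the field "title", and the localhost address are assumptions, and the API shown is the 7.x RestHighLevelClient rather than anything prescribed by the text above.

    import org.apache.http.HttpHost;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.builder.SearchSourceBuilder;

    public class EsQuickSearch {
        public static void main(String[] args) throws Exception {
            // Assumes a single-node cluster on localhost:9200 and an index named "articles".
            try (RestHighLevelClient client = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
                SearchSourceBuilder source = new SearchSourceBuilder()
                        .query(QueryBuilders.matchQuery("title", "elasticsearch"));
                SearchRequest request = new SearchRequest("articles").source(source);
                SearchResponse response = client.search(request, RequestOptions.DEFAULT);
                System.out.println("total hits: " + response.getHits().getTotalHits());
            }
        }
    }

The same query is a single HTTP call against the _search endpoint with a JSON body, which is the out-of-the-box simplicity the section refers to.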

Building PyLucene on Ubuntu 14.04 (Trusty Tahr)

Submitted by 青春壹個敷衍的年華 on 2019-12-21 05:36:28
Question: As per the installation instructions, JCC was built successfully. The dependencies installed were: ant, openjdk-7-jdk, python-setuptools, python-dev. Then, proceeding to make PyLucene, in the Makefile I chose the specs corresponding to Ubuntu 11:

    # Linux (Ubuntu 11.10 64-bit, Python 2.7.2, OpenJDK 1.7, setuptools 0.6.16)
    # Be sure to also set JDK['linux2'] in jcc's setup.py to the JAVA_HOME value
    # used below for ANT (and rebuild jcc after changing it).
    PREFIX_PYTHON=/usr
    ANT=JAVA_HOME=/usr/lib/jvm/java

Solr proximity ordered vs unordered

Submitted by 不羁岁月 on 2019-12-21 05:27:04
Question: In Solr you can perform an ordered proximity search using the syntax "word1 word2"~10. By ordered, I mean word1 will always come before word2 in the document. I would like to know if there is an easy way to perform an unordered proximity search, i.e., word1 and word2 occur within 10 words of each other and it doesn't matter which comes first. One way to do this would be: "word1 word2"~10 OR "word2 word1"~10. The above will work, but I'm looking for something simpler, if possible.

Answer 1: Slop means how
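If you can drop down to Lucene's query API instead of the Solr query syntax, span queries expose the ordered/unordered distinction directly. A minimal sketch, assuming a field named "body" (the field name is made up for illustration); the third SpanNearQuery argument toggles whether the clauses must appear in order.

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class ProximityQueries {
        public static void main(String[] args) {
            SpanQuery[] clauses = new SpanQuery[] {
                new SpanTermQuery(new Term("body", "word1")),
                new SpanTermQuery(new Term("body", "word2"))
            };
            // Ordered: word1 must occur before word2, within 10 positions.
            SpanNearQuery ordered = new SpanNearQuery(clauses, 10, true);
            // Unordered: either order, still within 10 positions.
            SpanNearQuery unordered = new SpanNearQuery(clauses, 10, false);
            System.out.println(ordered + "\n" + unordered);
        }
    }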

Term frequency in Lucene 4.0

Submitted by 感情迁移 on 2019-12-21 05:21:49
Question: I am trying to calculate term frequency using Lucene 4.0. I got document frequency working just fine, but can't figure out how to get term frequency through the API. Here's the code I have:

    private static void addDoc(IndexWriter writer, String content) throws IOException {
        FieldType fieldType = new FieldType();
        fieldType.setStoreTermVectors(true);
        fieldType.setStoreTermVectorPositions(true);
        fieldType.setIndexed(true);
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
        fieldType.setStored(true);
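Since the FieldType above stores term vectors, one way to read per-document term frequencies is through the term vector API. The following is only a sketch against the Lucene 4.x signatures (the helper name and field argument are made up for illustration): for a single document's term vector, totalTermFreq() is that term's frequency within the document.

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Terms;
    import org.apache.lucene.index.TermsEnum;
    import org.apache.lucene.util.BytesRef;

    public class TermFrequencies {
        // Prints every term in one document's term vector together with its in-document frequency.
        static void printTermFrequencies(IndexReader reader, int docId, String field) throws IOException {
            Terms terms = reader.getTermVector(docId, field);
            if (terms == null) {
                return; // no term vector stored for this document/field
            }
            TermsEnum termsEnum = terms.iterator(null); // Lucene 4.x signature
            BytesRef term;
            while ((term = termsEnum.next()) != null) {
                // Within a single-document term vector, totalTermFreq() is the
                // frequency of the term inside that document.
                System.out.println(term.utf8ToString() + " -> " + termsEnum.totalTermFreq());
            }
        }
    }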

Lucene - retrieve all values for a multi-valued field in a document

Submitted by 丶灬走出姿态 on 2019-12-21 05:04:22
Question: I added a multi-valued field in Lucene as follows:

    String categoriesForItem = getCategories(); // returns "category1, category2, cat3" from a DB call
    String[] categoriesForItems = categoriesForItem.split(",");
    for (String cat : categoriesForItems) {
        doc.add(new StringField("categories", cat, Field.Store.YES)); // doc is a Document
    }

Later, when I am searching for items in a category, everything works as expected, but when I get a Document and do:

    String categories = doc.getField(
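The question text is cut off, but a common stumbling block here is that Document.getField(name) returns only the first value of a multi-valued field. A minimal sketch of reading all values back, assuming the "categories" field was added several times as above (the helper name is made up for illustration):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexableField;

    public class ReadCategories {
        static void printCategories(Document doc) {
            // getValues() returns every stored value added under this field name.
            String[] categories = doc.getValues("categories");
            for (String category : categories) {
                System.out.println(category);
            }
            // getFields(name) is the equivalent if you need the IndexableField objects.
            IndexableField[] fields = doc.getFields("categories");
            System.out.println(fields.length + " category fields stored");
        }
    }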

Solr Filter Cache (FastLRUCache) takes too much memory and results in out of memory?

Submitted by 一曲冷凌霜 on 2019-12-21 04:55:23
Question: I have a Solr setup: one master and two slaves for replication. We have about 70 million documents in the index. The slaves have 16 GB of RAM: 10 GB for the OS and HD, 6 GB for Solr. But from time to time, the slaves run out of memory. When we downloaded the heap dump taken just before one of these out-of-memory events, we could see that the class org.apache.solr.util.ConcurrentLRUCache$Stats @ 0x6eac8fb88 is using up to 5 GB of memory. We use filter caches extensively, with a 93% hit ratio. And here's
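A rough back-of-envelope check, under an assumption not stated in the post (each cached filter held as an uncompressed bitset, one bit per document): 70,000,000 docs / 8 ≈ 8.75 MB per filterCache entry, so a FastLRUCache holding on the order of 512 entries (the size used in Solr's example configs) would by itself occupy roughly 512 × 8.75 MB ≈ 4.5 GB, which is in the same range as the ~5 GB attributed to ConcurrentLRUCache in the heap dump.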

How to speed up Elasticsearch recovery?

Submitted by 佐手、 on 2019-12-21 04:35:14
Question: I'm working on an ES cluster of 6B small documents, organized into 6.5K indexes, for a total of 6 TB. The indexes are replicated and sharded across 7 servers. Index sizes vary from a few KB to hundreds of GB. Before using ES, I used Lucene with the same document organization. Recovery of the Lucene-based application was almost immediate: indexes were lazily loaded when a query arrived, and the IndexReaders were then cached to speed up future replies. Now, with

What would be the motivation to integrate mongodb with solr [closed]

Submitted by ∥☆過路亽.° on 2019-12-21 04:34:11
Question (closed as opinion-based): MongoDB is a NoSQL database and any query can be run on it except full-text search, since that reduces overall performance. Solr is a search engine built for searching. When we integrate the two, don't we then have the same data in both systems? So if we are already going to store the

Lucene and SQL Server - best practice

Submitted by ぃ、小莉子 on 2019-12-21 03:53:05
Question: I am pretty new to Lucene, so I would like to get some help from you guys :) BACKGROUND: Currently I have documents stored in SQL Server and want to use Lucene for full-text/tag searches on those documents in SQL Server. Q1) In this case, in order to do keyword search on the documents, should I insert all of those documents into the Lucene index? Does this mean there will be data duplication (one copy in SQL Server and the other in the Lucene index)? That could be a concern, since we have a
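One common pattern for Q1 (a sketch of the usual trade-off, not taken from the original answers) is to index the searchable text in Lucene but store only the SQL Server primary key, so the full documents are not duplicated; the field names below are made up for illustration.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;

    public class SqlRowIndexer {
        // Builds a Lucene document for one SQL Server row: only the primary key is
        // stored; the body is indexed (searchable) but not stored, so the full text
        // continues to live only in SQL Server.
        static Document toLuceneDoc(long rowId, String rowBody) {
            Document doc = new Document();
            doc.add(new StringField("id", String.valueOf(rowId), Field.Store.YES));
            doc.add(new TextField("body", rowBody, Field.Store.NO));
            return doc;
        }
    }

At query time you search Lucene for matching ids and then fetch the actual rows from SQL Server by primary key, so the duplication is limited to the inverted index rather than a second full copy of the data.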

Lucene not null query?

Submitted by 天涯浪子 on 2019-12-21 03:39:32
Question: How can we construct a query that searches for a particular field being not null? field_name:* is not working. I tried field_name:[a* TO z*], which works fine for English but does not cover all languages. Any alternative suggestions?

Answer 1: This is currently not supported by Lucene. See this for a discussion. An alternative option may be to store some pre-defined string (like nullnullnullnull) as the field value when it is null. Then you can use a negative filter to remove these records. (I don't like
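Following the sentinel-value idea from the answer, the negative filter can be expressed as a BooleanQuery that excludes documents whose field holds the placeholder. A minimal sketch against the older (pre-5.x) mutable BooleanQuery API; newer Lucene versions build the same query with BooleanQuery.Builder. The helper name is made up for illustration.

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class NotNullQuery {
        // Matches every document except those indexed with the "null" placeholder value.
        static Query fieldNotNull(String field, String nullPlaceholder) {
            BooleanQuery query = new BooleanQuery();
            query.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST);
            query.add(new TermQuery(new Term(field, nullPlaceholder)), BooleanClause.Occur.MUST_NOT);
            return query;
        }
    }

fieldNotNull("field_name", "nullnullnullnull") can then be combined with the rest of the search as one clause of a larger BooleanQuery.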