lucene

Using Solr and Zend's Lucene port together

Submitted by ﹥>﹥吖頭↗ on 2019-12-19 21:18:40
Question: Afternoon chaps, after my adventures with Zend-Lucene-Search, and discovering it isn't all it's cracked up to be when indexing large datasets, I've turned to Solr (thanks to Bill Karwin for that :) ). I've got Solr indexing the db far, far quicker now, taking just over 8 minutes to index a table of just over 1.7 million rows, which I'm very pleased with. However, when I come to try and search the index with the Zend port, I run into the following error: Fatal error: Uncaught exception 'Zend…
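The truncated exception is typically a format problem: Zend_Search_Lucene reads the Lucene index files directly and only understands older index formats, so it generally cannot open an index written by the much newer Lucene version bundled with Solr. The usual workaround is to leave the index to Solr and query it over Solr's HTTP interface instead. A minimal SolrJ sketch of that approach, assuming a single default core at http://localhost:8983/solr and an "id" field in the schema:

```java
// Minimal SolrJ (Solr 1.4-era) sketch: search the Solr index over HTTP
// instead of opening the index files with the Zend port.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrSearch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("title:lucene");   // hypothetical field and term
        query.setRows(10);
        QueryResponse response = solr.query(query);
        for (SolrDocument doc : response.getResults()) {
            System.out.println(doc.getFieldValue("id"));    // assumes an "id" field in the schema
        }
    }
}
```

From PHP, the same request can be sent with a plain HTTP client against /select?q=...&wt=json rather than going through the Zend port at all.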

Adding Boost to Score According to Payload of Multivalued Field in Solr

Submitted by 混江龙づ霸主 on 2019-12-19 19:53:17
Question: Here is my case: I have a field in my schema named elmo_field. I want elmo_field to hold payloaded values, e.g. dorothy|0.46 sesame|0.37 big bird|0.19 bird|0.22. When a user searches for a keyword, e.g. dorothy, I want to add 0.46 to the usual score. If the user searches for big bird, 0.19 should be added, and if the user searches for bird, 0.22 should be added (the payloads are added, or payloads * a normalization coefficient will be added). I mean I will make a search on my index at my other fields of…
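For context, attaching payloads such as dorothy|0.46 is usually done at analysis time with a delimited-payload filter; whether the payload then contributes to the score depends on using a payload-aware query. A hedged schema.xml sketch (the field type name is illustrative, not from the question):

```xml
<!-- Illustrative field type: each token carries the float after '|' as its payload -->
<fieldType name="payloaded_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float" delimiter="|"/>
  </analyzer>
</fieldType>
<field name="elmo_field" type="payloaded_text" indexed="true" stored="true" multiValued="true"/>
```

On the query side, newer Solr releases (6.x and later) ship a {!payload_score} query parser and a payload() function for exactly this kind of boosting; on older versions the common route is a small query-parser plugin built around Lucene's payload term query with an average payload function.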

Create new core directories in Solr on the fly

Submitted by ╄→гoц情女王★ on 2019-12-19 14:49:20
Question: I am using Solr 1.4.1 to build a distributed search engine, but I don't want to use only one index file. I want to create new core "index" directories on the fly from my Java code. I found the following REST API to create new cores using an EXISTING core directory (http://wiki.apache.org/solr/CoreAdmin): http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&config=config_file_name.xml&schema=schem_file_name.xml&dataDir=data Is there a way to…
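One hedged way to do this from Java: the CREATE action needs conf/solrconfig.xml and schema.xml to already exist under the new instance directory, so the sketch below first copies a template directory and then registers the core through SolrJ's CoreAdminRequest. The paths and the template layout are assumptions, not from the question.

```java
// Hypothetical sketch: create the new core's instance directory from a template,
// then register it with Solr's CoreAdmin API via SolrJ.
import java.io.File;
import org.apache.commons.io.FileUtils;                        // assumed available for the directory copy
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreateCoreOnTheFly {
    public static void main(String[] args) throws Exception {
        File template = new File("/opt/solr/template_core");    // contains conf/solrconfig.xml + schema.xml
        File instanceDir = new File("/opt/solr/coreX");
        FileUtils.copyDirectory(template, instanceDir);          // CREATE requires the conf to exist on disk

        SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");
        CoreAdminRequest.createCore("coreX", instanceDir.getAbsolutePath(), admin);
    }
}
```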

Example using WikipediaTokenizer in Lucene

Submitted by ε祈祈猫儿з on 2019-12-19 11:42:31
问题 I want to use WikipediaTokenizer in lucene project - http://lucene.apache.org/java/3_0_2/api/contrib-wikipedia/org/apache/lucene/wikipedia/analysis/WikipediaTokenizer.html But I never used lucene. I just want to convert a wikipedia string into a list of tokens. But, I see that there are only four methods available in this class, end, incrementToken, reset, reset(reader). Can someone point me to an example to use it. Thank you. 回答1: In Lucene 3.0, next() method is removed. Now you should use
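A hedged sketch for Lucene 3.0.x, where the old next() loop is replaced by incrementToken() plus attributes (in 3.1+ TermAttribute was superseded by CharTermAttribute):

```java
// Sketch for Lucene 3.0.x: tokenize wiki markup with WikipediaTokenizer
// using the incrementToken()/attribute API.
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.wikipedia.analysis.WikipediaTokenizer;

public class WikiTokens {
    public static void main(String[] args) throws Exception {
        String wikiText = "[[Apache Lucene]] is a '''search''' library.";
        WikipediaTokenizer tokenizer = new WikipediaTokenizer(new StringReader(wikiText));
        TermAttribute term = tokenizer.addAttribute(TermAttribute.class);

        List<String> tokens = new ArrayList<String>();
        while (tokenizer.incrementToken()) {   // advance to the next token
            tokens.add(term.term());           // the token text
        }
        tokenizer.end();
        tokenizer.close();
        System.out.println(tokens);
    }
}
```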

Solr suggester in SolrCloud mode

Submitted by 我的梦境 on 2019-12-19 11:42:19
Question: I am running Solr in SolrCloud mode with three shards. The data is already indexed into Solr. Now I have configured the Solr suggester in solrconfig.xml. This is the configuration from the solrconfig file. I am using Solr 4.10. <searchComponent name="suggest" class="solr.SuggestComponent"> <lst name="suggester"> <str name="name">mysuggest</str> <str name="lookupImpl">FuzzyLookupFactory</str> <str name="storeDir">suggester_fuzzy_dir</str> <str name="dictionaryImpl"…
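For reference, a suggester defined as a search component also needs a request handler that invokes it; a hedged solrconfig.xml sketch matching the names above:

```xml
<!-- Illustrative handler wiring for the "suggest" component defined above -->
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mysuggest</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

In SolrCloud the suggester index lives on each replica, so suggest.build=true has to reach every shard; a common pattern is to query /suggest with suggest.q=... and pass shards and shards.qt=/suggest so the request is distributed (exact behavior varies by Solr version).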

Ant Financial's ZSearch: explorations in vector retrieval

Submitted by 不羁的心 on 2019-12-19 11:26:19
[Photo: 十倍, head of ZSearch infrastructure, presenting at Elastic Dev Day 2019.] Introduction: Elasticsearch (ES) is a very popular distributed full-text search system, commonly used for data analysis, search, multi-dimensional filtering, and similar scenarios. Ant Financial has offered Elasticsearch as a service to internal business teams since 2017, and we have accumulated a good deal of experience in Ant Financial's financial-grade scenarios; here we mainly share our exploration of vector retrieval. Elasticsearch pain points: Elasticsearch is widely used within Ant Financial for log analysis, multi-dimensional analysis, search, and more. As our Elasticsearch clusters multiply and user scenarios become richer, we face a growing number of pain points: how to manage the clusters; how to make onboarding easy and manage users; how to support users' differing customization needs; ... To address these pain points we built ZSearch, a general-purpose search platform: built on a Kubernetes base, so ZSearch components can be created quickly, operated easily, and failed machines replaced automatically; cross-datacenter replication for high availability of critical business; a plugin platform with hot loading of user-defined plugins; SmartSearch to simplify user search, working out of the box; a Router that works with ES's internal multi-tenancy plugin to improve resource utilization. The need for vector retrieval: the Elasticsearch-based general search platform ZSearch keeps maturing, with ever more users and richer scenarios.

Lucene phrase query not working

Submitted by 前提是你 on 2019-12-19 10:28:15
Question: I am trying to write a simple program using Lucene 2.9.4 which searches for a phrase query, but I am getting 0 hits. public class HelloLucene { public static void main(String[] args) throws IOException, ParseException{ // TODO Auto-generated method stub StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29); Directory index = new RAMDirectory(); IndexWriter w = new IndexWriter(index,analyzer,true,IndexWriter.MaxFieldLength.UNLIMITED); addDoc(w, "Lucene in Action"); addDoc(w,…
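Two things commonly cause 0 hits in this kind of setup: the IndexWriter is not closed or committed before searching, and the phrase is built from terms that do not match what StandardAnalyzer actually indexed. A hedged sketch of the search side for Lucene 2.9.x, letting QueryParser build the phrase with the same analyzer ("title" is an assumed field name, not taken from the truncated code):

```java
// Sketch for Lucene 2.9.x: a quoted phrase parsed with the same analyzer
// used at index time, searched against an already-committed index.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

public class PhraseSearch {
    // 'index' is the Directory the documents were written to; close the writer before calling this
    static TopDocs searchPhrase(Directory index, String phrase) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
        QueryParser parser = new QueryParser(Version.LUCENE_29, "title", analyzer);
        Query q = parser.parse("\"" + phrase + "\"");   // quotes make it a phrase query
        IndexSearcher searcher = new IndexSearcher(index, true);
        try {
            return searcher.search(q, 10);
        } finally {
            searcher.close();
        }
    }
}
```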

How to match exact text in Lucene search?

Submitted by 眉间皱痕 on 2019-12-19 10:22:18
Question: I'm trying to match the text Config migration from ASA5505 8.2 to ASA5516 in the column TITLE. My program looks like this: Directory directory = FSDirectory.open(indexDir); MultiFieldQueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_35,new String[] {"TITLE"}, new StandardAnalyzer(Version.LUCENE_35)); IndexReader reader = IndexReader.open(directory); IndexSearcher searcher = new IndexSearcher(reader); queryParser.setPhraseSlop(0); queryParser.setLowercaseExpandedTerms(true); Query…
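A hedged sketch of the two usual options for Lucene 3.5: quote the text so the parser builds a phrase query over the analyzed TITLE field, or index an additional un-analyzed copy of the title and match it with a TermQuery (the TITLE_RAW field is an assumption introduced for illustration):

```java
// Sketch for Lucene 3.5: two common ways to require the exact text.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.Version;

public class ExactMatch {
    // 1) Phrase query on the analyzed TITLE field: the terms must appear in this exact order.
    static Query phraseOnTitle(String text) throws Exception {
        QueryParser parser = new QueryParser(Version.LUCENE_35, "TITLE",
                new StandardAnalyzer(Version.LUCENE_35));
        return parser.parse("\"" + text + "\"");
    }

    // 2) True exact match: index a second, un-analyzed copy of the title
    //    (e.g. field "TITLE_RAW" with Field.Index.NOT_ANALYZED) and query it directly.
    static Query exactOnRawTitle(String text) {
        return new TermQuery(new Term("TITLE_RAW", text));
    }
}
```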

Java threads slow down towards the end of processing

Submitted by 女生的网名这么多〃 on 2019-12-19 09:06:27
Question: I have a Java program that takes in a text file containing a list of text files and processes each line separately. To speed up the processing, I make use of threads via an ExecutorService with a FixedThreadPool of 24 threads. The machine has 24 cores and 48 GB of RAM. The text file that I'm processing has 2.5 million lines. I find that for the first 2.3 million lines or so things run very well with high CPU utilization. However, beyond some point (at around 2.3 million lines), the performance…
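For reference, a minimal sketch of the setup the question describes, with hypothetical names; the slowdown near the end of such a run is often a straggler effect, where the pool is drained down to a few long-running tasks, or memory pressure as results accumulate.

```java
// Hypothetical sketch of the described setup: one task per input line,
// run on a fixed pool of 24 worker threads.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LineProcessor {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(24);
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                final String path = line;              // each line names a text file to process
                pool.submit(() -> processFile(path));
            }
        }
        pool.shutdown();                                // accept no new tasks; let the queue drain
        pool.awaitTermination(1, TimeUnit.DAYS);
    }

    static void processFile(String path) {
        // ... per-file work goes here ...
    }
}
```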