lucene

Order by field with SQLite

只谈情不闲聊 submitted on 2019-12-19 07:41:22
Question: I'm currently working on a Symfony project at work, and we use Lucene for our search engine. I was trying to use an SQLite in-memory database for unit tests (production uses MySQL), but I ran into a problem. The search engine part of the project uses Lucene indexing. Basically, you query it and get back an ordered list of ids, which you can then use to query your database with a WHERE ... IN() clause. The problem is that the query contains an ORDER BY FIELD(id, ...) clause, which orders the result
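FIELD() is a MySQL function that SQLite does not provide. One portable way to keep the order returned by the search engine is a CASE expression, which both databases accept. The sketch below is a minimal illustration under assumptions: the `article` table, its `title` column, and the helper names are hypothetical and not taken from the question.

```python
import sqlite3


def order_by_ids_clause(ids, column="id"):
    """Build a CASE expression that ranks rows by their position in ids."""
    whens = " ".join(f"WHEN ? THEN {pos}" for pos in range(len(ids)))
    return f"CASE {column} {whens} END"


def fetch_in_order(conn, ids):
    """Fetch rows of the hypothetical 'article' table in the order of ids."""
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        f"SELECT id, title FROM article WHERE id IN ({placeholders}) "
        f"ORDER BY {order_by_ids_clause(ids)}"
    )
    # The ids are bound twice: once for IN (...), once for the CASE branches.
    return conn.execute(sql, list(ids) + list(ids)).fetchall()
```

Because the CASE expression is plain SQL, the same query text should run unchanged against MySQL, so the unit tests and production can share one code path.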

Solr Function Query : How to use “score” field for creating custom scoring

℡╲_俬逩灬. submitted on 2019-12-19 05:45:15
Question: After searching extensively and coming across answers such as "Solr: Sort by score & an int field value" and "Use function query for boosting score in Solr", I am still unable to solve the following problem: how do I use the "score" field of a document to create a new scoring function and rank the results accordingly? Something like this: new_score = score * my_other_field. Current query: http://localhost:8984/solr/suggest_new/select?q=tom&wt=json&indent=true&bq=_val_:"product(score,count
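One approach that may fit the formula new_score = score * my_other_field is Solr's boost query parser, which multiplies the score of the wrapped query by a function value. The sketch below only builds the request URL; it assumes that `my_other_field` (the placeholder name from the question) is a numeric field, and the helper name and the `qq` parameter name are hypothetical.

```python
from urllib.parse import urlencode


def boosted_select_url(base, user_query, boost_field):
    """Build a select URL whose score is (query score) * (field value)."""
    params = {
        # {!boost b=...} multiplies the score of the query given in v
        # by the value of the function b; $qq dereferences the qq parameter.
        "q": "{!boost b=%s v=$qq}" % boost_field,
        "qq": user_query,
        "wt": "json",
        "indent": "true",
    }
    return base + "?" + urlencode(params)


url = boosted_select_url(
    "http://localhost:8984/solr/suggest_new/select", "tom", "my_other_field"
)
```

Keeping the user's text in a separate parameter avoids having to escape it inside the local-params block.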

How can I escape a group of special characters in Java in one method?

梦想的初衷 submitted on 2019-12-19 05:35:07
Question: I use Lucene search, but Lucene has a lot of special characters that need escaping, such as: - && || ! ( ) { } [ ] ^ " ~ * ? : \ I am having trouble escaping these characters because there are so many of them; if I use String.replaceAll(), I end up with a really long line of code just for the escaping. What is the best way to do this? Thanks!
Answer 1: There is also a method called QueryParser#escape, which may be useful: Returns a String where those characters that QueryParser expects
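The idea behind a single escape method can be sketched in a few lines: walk the input once and put a backslash in front of every special character. The Python sketch below illustrates that technique and is not Lucene's implementation; the character set is the list from the question plus '+' and '/', which are added here as an assumption based on Lucene's query syntax.

```python
# Characters treated as special; '&' and '|' are escaped one by one,
# which also covers the two-character operators && and ||.
SPECIAL_CHARS = set('+-!(){}[]^"~*?:\\/|&')


def escape_query(text):
    """Return text with a backslash before every special character."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)
```

In Java code that already depends on Lucene, calling QueryParser.escape as the answer suggests is likely the simpler option than maintaining such a list by hand.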

Apache Lucene indexing of large XML file

旧城冷巷雨未停 submitted on 2019-12-19 04:36:19
Question: I am new to Lucene. I want to index large XML files (15 GB) that contain plain text as well as attributes and a great many XML tags. How can I parse and index such a huge XML file with Lucene? Any sample or links would help me understand the process. Also, if I use Lucene, do I still need a database? So far I have only seen and done indexing with databases.
Answer 1: Your
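For a file of this size, one common approach is a streaming parser that hands over one record at a time instead of loading the whole document into memory. The sketch below shows that idea with Python's standard library; the `record` element name is hypothetical, and the step that would add each record to the index is left out.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag="record"):
    """Yield (attributes, text) for each <tag> element, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # Collect attributes and all nested text before discarding the element.
            yield dict(elem.attrib), "".join(elem.itertext()).strip()
            elem.clear()  # drop the element's children and text to limit memory use


sample = io.BytesIO(
    b'<root><record id="1">hello <b>world</b></record>'
    b'<record id="2">second</record></root>'
)
records = list(iter_records(sample))
```

Each yielded pair could then be turned into one indexed document, with the attributes as fields and the text as the searchable body.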

What is the difference between TermQuery and QueryParser in Lucene 6.0?

ぐ巨炮叔叔 submitted on 2019-12-19 04:23:35
Question: There are two queries. One is created by QueryParser: QueryParser parser = new QueryParser(field, analyzer); Query query1 = parser.parse("Lucene"); The other is a term query: Query query2 = new TermQuery(new Term("title", "Lucene")); What is the difference between query1 and query2?
Answer 1: This is the definition of Term from the Lucene docs: A Term represents a word from text. This is the unit of search. It is composed of two elements, the text of the word, as a string, and the name of the field that

How to integrate RAMDirectory into FSDirectory in Lucene

送分小仙女□ submitted on 2019-12-19 03:59:17
Question: I have a question about Lucene. I am trying to write Lucene code that builds an index in memory first, using RAMDirectory, and then flushes that in-memory index to disk using FSDirectory. I have made some modifications to this code, but to no avail; maybe some of you can help me out. What is the best way to integrate RAMDirectory into this source code before writing the index to FSDirectory? Any help would be appreciated. Here is the

TermInfosReader: the core code of Lucene search

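独自空忆成欢 submitted on 2019-12-19 03:27:16
The TermInfosReader class is the core code of Lucene search: every search ultimately comes down to looking up terms, and TermInfosReader defines the basic term lookup operations that are supported.
Background:
Term dictionary file (tis): Description: the terms in the file are sorted in ascending order. The ordering rule is: sort by field name first and, if the field names are equal, by term text, using a plain character comparison. The term list stored in the tis file is divided into blocks of IndexInterval terms; how these blocks speed up searching is described later, in the lookup logic. File structure: TermInfos --> <TermInfo>^TermCount; TermInfo --> <Term, DocFreq, FreqDelta, ProxDelta, SkipDelta>
Term index file (tii): Description: the tii file is an index over the tis file; it stores one entry per IndexInterval, the interval value recorded in the tis file. The term content in the tii file is the same as in the tis file; in addition to the term itself, each entry in the tii file carries an IndexDelta value that stores the position of that term in the tis file. File structure: TermIndices --> <TermInfo, IndexDelta>^IndexTermCount; IndexDelta --> VInt // IndexDelta gives the exact position of this index term in the tis file, similar to a pointer
Core method 1: TermInfo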
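The two-level layout described above can be sketched as a sparse index over a sorted list: search the small index first to pick a block, then scan at most IndexInterval terms inside that block. The Python sketch below is a simplified in-memory model under assumptions, not Lucene's on-disk format or its actual code; terms are modeled as (field, text) tuples, list positions stand in for file offsets, and the interval of 4 is chosen only to keep the demo small.

```python
from bisect import bisect_right

INDEX_INTERVAL = 4  # assumed small value for the demo


def build_index(terms, interval=INDEX_INTERVAL):
    """Model of the tii file: every interval-th term plus its position in terms."""
    return [(terms[i], i) for i in range(0, len(terms), interval)]


def lookup(terms, index, target, interval=INDEX_INTERVAL):
    """Return the position of target in the sorted terms list, or -1 if absent."""
    keys = [term for term, _pos in index]
    block = bisect_right(keys, target) - 1  # last index entry <= target
    if block < 0:
        return -1  # target sorts before the first term
    start = index[block][1]
    # Scan only the one block that can contain the target.
    for pos in range(start, min(start + interval, len(terms))):
        if terms[pos] == target:
            return pos
        if terms[pos] > target:
            break
    return -1


# Terms sorted by field name first, then by term text.
terms = sorted([
    ("body", "apple"), ("body", "lucene"), ("body", "search"),
    ("body", "term"), ("body", "zebra"), ("title", "index"),
    ("title", "lucene"), ("title", "solr"), ("title", "term"),
])
index = build_index(terms)
```

Only the sparse index has to stay in memory; the full term list is touched for a single block per lookup, which is the optimization the block layout is meant to enable.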

Lucene performance

主宰稳场 submitted on 2019-12-19 03:14:26
Question: Could you please suggest the steps to follow for good Lucene performance, especially with large data (around 1 TB of PDF files to be indexed)?
Answer 1: Read Scaling Lucene and Solr. Define your needs from Lucene (for example: you are indexing PDFs; do you need to store the full text, just make it searchable, or neither?). Make a small-scale experiment: index a few documents and see whether retrieval is good enough. Try to index the whole thing (considering the paper's tips for quick indexing

A few introductory books on search engines

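岁酱吖の submitted on 2019-12-19 02:20:04
Here are a few books on search engines. I think that for anyone who wants to study search engines, the following three are the best books available so far; we look forward to even better ones for readers, and I will introduce those as well. Later I will also introduce some books on wireless search, so please stay tuned.
1. Title: 开发自己的搜索引擎 Lucene 2.0+Heritrix (Developing Your Own Search Engine: Lucene 2.0+Heritrix), CD included. Author: 邱哲.
[Summary] This book explains in detail how to use Lucene for search engine development; by studying it, readers can build an enterprise-grade search engine website. The book has 14 chapters, covering: search engine and information retrieval fundamentals, an introductory Lucene example, building Lucene indexes, building search with Lucene, sorting in Lucene, Lucene analyzers, parsing Word, Excel and PDF documents, the Compass search engine framework, distributed Lucene and the Google Search API, the Heritrix crawler, and a comprehensive example in four parts (preparation, HTMLParser, DWR, and Web). It is the first book in China to use Lucene and Heritrix to explain how to build a search engine. Through detailed analysis of the APIs and source code, it aims to let readers go beyond application-level use, understand the core, extend and develop their own components, and build more creative search engine products. The book is suitable for Java programmers and other software developers, and can also serve as an introduction for search engine enthusiasts. Since there are currently not many books on the market that introduce search engines at the technical level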

How to optimize a Solr index

强颜欢笑 submitted on 2019-12-18 19:07:04
Question: How do I optimize a Solr index? I want to optimize my Solr indexing. I tried changing solrconfig.xml and the data is being indexed, but I want to know how to verify that the index is optimized and which settings are involved in index optimization.
Answer 1: I find this to be the easiest way to optimize a Solr index. In my context "optimize" means to merge all index segments. curl http://localhost:8983/solr/<core_name>/update -F stream.body=' <optimize />'
Answer 2: Check the size of the respective core before you start. Open
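As an alternative to posting an `<optimize />` body, Solr's update handler can also be asked to optimize through request parameters. The sketch below only builds such a URL and does not send it; the core name `mycore` and the helper name are hypothetical, and `maxSegments=1` is an assumption that matches the "merge all index segments" meaning used in Answer 1.

```python
from urllib.parse import urlencode


def optimize_url(base_url, core, max_segments=1):
    """Build an update URL that asks Solr to merge the core's segments."""
    params = {"optimize": "true", "maxSegments": max_segments}
    return f"{base_url}/solr/{core}/update?{urlencode(params)}"


url = optimize_url("http://localhost:8983", "mycore")
```

Comparing the core's segment count before and after the request is one way to check whether the merge took place.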