lucene

20191026 ElasticSearch search service

跟風遠走 submitted on 2019-12-23 09:51:52
Contents: Lucene / data classification / Lucene overview / inverted index / Lucene architecture / ElasticSearch distributed installation / basic environment / installation / notes / Chinese analyzers / REST / REST overview / REST operations / ES built-in REST APIs / the curl command / using curl / ES operations / ES API operations

Lucene data classification

- Structured data (relational databases, full-text search): tables with a fixed set of fields and fixed field types.
- Unstructured data: text documents, images, video, music, ...
- Semi-structured data: JSON, HTML, XML.

Lucene overview

Lucene is an Apache Software Foundation project: an open-source full-text search engine toolkit, i.e. the skeleton of a full-text search engine. It provides a complete query engine and indexing engine, plus part of a text-analysis engine. Official description: "Lucene is a Java full-text search engine. Lucene is not a complete application, but rather a code library and API that can easily be used to add search capabilities to applications."

Inverted index

Informally: a forward lookup goes from a file's location and name to its contents, while an inverted index goes the other way, from the contents back to the files that contain them. An inverted index is an indexing method
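The "content back to files" idea above can be sketched as a toy inverted index: a map from term to the set of document ids containing it. This is a minimal illustration, not Lucene's actual data structures; the class and method names are invented for the example.

```java
import java.util.*;

// Toy inverted index: term -> sorted set of document ids.
// Illustrative only; Lucene's real postings lists are far more sophisticated.
public class InvertedIndexDemo {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Tokenize on non-word characters and record which doc each term came from.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Look up the documents that contain a term.
    public SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        InvertedIndexDemo index = new InvertedIndexDemo();
        index.add(1, "Lucene is a Java full-text search engine");
        index.add(2, "Elasticsearch builds on Lucene");
        System.out.println(index.search("lucene")); // [1, 2]
        System.out.println(index.search("java"));   // [1]
    }
}
```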

Random Sorting Results in Lucene.Net 2.4

心不动则不痛 submitted on 2019-12-23 06:08:32
Question: How do I sort my results in a random order? My code looks something like this at the moment:

Dim searcher As IndexSearcher = New IndexSearcher(dir, True)
Dim collector As TopScoreDocCollector = TopScoreDocCollector.create(100, True)
searcher.Search(query, collector)
Dim hits() As ScoreDoc = collector.TopDocs.scoreDocs
For Each sDoc As ScoreDoc In hits
    'get doc and return
Next

Answer 1: Since this is an IEnumerable, you can use standard LINQ to randomize it. You can find an example here: public
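The answer's idea (fetch the relevance-ordered hits, then randomize client-side) is language-agnostic. A hedged Java sketch of the same approach, with plain integers standing in for Lucene's ScoreDoc results:

```java
import java.util.*;

// Sketch: retrieve the top hits first, then shuffle them client-side.
// The integer ids are stand-ins for ScoreDoc results from a searcher.
public class RandomOrderDemo {
    // Seeded shuffle so the example is reproducible; drop the seed in real use.
    public static List<Integer> randomize(List<Integer> hits, long seed) {
        List<Integer> copy = new ArrayList<>(hits); // don't mutate the input
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> hits = Arrays.asList(10, 20, 30, 40, 50);
        System.out.println(randomize(hits, 42L)); // same elements, random order
    }
}
```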

SOLR query comma separated fields without order

ⅰ亾dé卋堺 submitted on 2019-12-23 05:24:05
Question: I have a field with comma-separated values: e.g. JSON,AngularJS in one record, AngularJS,JSON in another, and JSON,HTML in a third. I have been trying to query Solr using fq=field:(JSON AngularJS*), but it returns only the record with JSON before AngularJS. How can I query Solr so that it returns both records containing JSON and AngularJS, without considering the order? Attaching the Solr analysis for the field. Query formed: http://localhost:8983/solr/my_core/select?fq=field:(JSON%20AND
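The truncated URL already hints at the usual fix: AND the terms together so the filter is order-independent, rather than relying on term adjacency. A small sketch of building such a filter query string (the field name "field" is taken from the question; adjust to your schema):

```java
// Sketch: an order-independent Solr filter query. field:(JSON AND AngularJS)
// matches both "JSON,AngularJS" and "AngularJS,JSON", because each term is
// required independently rather than as an ordered phrase.
public class SolrQueryDemo {
    public static String fq(String field, String... terms) {
        return field + ":(" + String.join(" AND ", terms) + ")";
    }

    public static void main(String[] args) {
        System.out.println(fq("field", "JSON", "AngularJS"));
        // field:(JSON AND AngularJS)
    }
}
```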

Searching books in Apache Solr

大兔子大兔子 submitted on 2019-12-23 05:19:50
Question: I'm very new to Solr and I'm evaluating it. My task is to look for words within a corpus of books and return them within a small context. So far, I'm storing the books in a database split into paragraphs (slicing the books at line breaks); I do a full-text search and return the row. In Solr, would I have to do the same, or can I add the whole book (in .txt format) and, whenever a match is found, return something like the match plus 100 words before and 100 words after, or something like that?
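The "match plus N words of context" operation the question describes can be sketched by hand as below; in Solr itself, the highlighting component does this kind of excerpting server-side. This is a toy word-window illustration, not what Solr actually runs:

```java
import java.util.*;

// Sketch: return the first match of a keyword plus N words of context on
// each side. A hand-rolled stand-in for server-side excerpting/highlighting.
public class ExcerptDemo {
    public static String excerpt(String text, String keyword, int contextWords) {
        String[] words = text.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (words[i].equalsIgnoreCase(keyword)) {
                int from = Math.max(0, i - contextWords);
                int to = Math.min(words.length, i + contextWords + 1);
                return String.join(" ", Arrays.copyOfRange(words, from, to));
            }
        }
        return ""; // no match
    }

    public static void main(String[] args) {
        String text = "call me Ishmael some years ago never mind how long";
        System.out.println(excerpt(text, "Ishmael", 2)); // call me Ishmael some years
    }
}
```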

Apache Lucene - Optimizing Searching

我是研究僧i submitted on 2019-12-23 05:13:29
Question: I am developing a web application in Java (using Spring) that uses a SQL Server database, and I use Apache Lucene to implement its search feature. With Apache Lucene, before I perform a search I create an index of titles: I first obtain a list of all titles from the database, then loop through the list and add each one to the index. This happens every time a user searches for something. I would like to know if there is a better, more
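The usual improvement over re-indexing on every search is to build the index once, reuse it across requests, and invalidate it only when the underlying data changes. A minimal cache-the-index sketch (the class and a build counter are invented for illustration; the list of titles stands in for the database query):

```java
import java.util.*;

// Sketch: build the "index" once and reuse it across searches, instead of
// rebuilding on every query. The String list stands in for a Lucene index.
public class CachedIndexDemo {
    private List<String> index;   // cached index, null until first build
    private int builds = 0;       // how many times we actually rebuilt

    public List<String> getIndex(List<String> titles) {
        if (index == null) {      // build only when missing
            index = new ArrayList<>(titles);
            builds++;
        }
        return index;
    }

    // Call this when titles change in the database, forcing a rebuild.
    public void invalidate() { index = null; }

    public int builds() { return builds; }
}
```

With this pattern, two consecutive searches trigger a single build; only an explicit invalidation after a data change causes a second one.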

Lucene in java method not found [duplicate]

*爱你&永不变心* submitted on 2019-12-23 04:39:30
Question: (This question already has answers here: Lucene using FSDirectory (2 answers). Closed 3 years ago.) I am using the code below for searching, but some methods show errors:

FSDirectory.open(new File(indexDirectoryPath));
writer = new IndexWriter(indexDirectory, new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);

In this code, open and MaxFieldLength show errors. I am using Lucene 6.0.0. The open() method shows the error "The method open(Path) in the type FSDirectory is not
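The error message itself points at the cause: in Lucene 5+, FSDirectory.open takes a java.nio.file.Path rather than a File, and IndexWriter.MaxFieldLength was removed (IndexWriter now takes an IndexWriterConfig). The Lucene calls are shown only in comments below, since they need the lucene-core dependency; the File-to-Path conversion itself is plain JDK:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the migration: convert the String/File directory path to a
// java.nio.file.Path, then pass it to the newer Lucene APIs (in comments).
public class PathMigrationDemo {
    public static Path toPath(String indexDirectoryPath) {
        return Paths.get(indexDirectoryPath);
        // equivalently: new File(indexDirectoryPath).toPath()
    }

    public static void main(String[] args) {
        Path p = toPath("/tmp/index");
        // Directory dir = FSDirectory.open(p);
        // IndexWriter writer = new IndexWriter(dir,
        //         new IndexWriterConfig(new StandardAnalyzer()));
        System.out.println(p);
    }
}
```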

vm.max_map_count and mmapfs

﹥>﹥吖頭↗ submitted on 2019-12-23 03:26:19
Question: What are the pros and cons of increasing vm.max_map_count from 64k to 256k? Does vm.max_map_count = 65530 imply that 64k addresses * 64 KB page size = up to 4 GB of data can be referenced by the process? And if I exceed the 4 GB of addressable space implied by the vm.max_map_count limit, will the OS need to page out some of the older accessed index data? Maybe my understanding above is not correct, as the FS cache can be pretty huge. How does this limit result in OOM? I posted a similar question on elasticsearch
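A back-of-the-envelope check of the figure in the question: 65530 mappings at 64 KiB each does land near 4 GiB. Note, though, that mmap regions are variable-sized, so this is only an illustration of the question's own arithmetic, not a real addressing limit (Elasticsearch's documented requirement for mmapfs is vm.max_map_count >= 262144, i.e. the 256k value asked about):

```java
// Worked arithmetic from the question: max_map_count * assumed region size.
public class MapCountMath {
    public static long maxBytes(long maxMapCount, long regionBytes) {
        return maxMapCount * regionBytes;
    }

    public static void main(String[] args) {
        long bytes = maxBytes(65530L, 64L * 1024); // 65530 regions of 64 KiB
        System.out.printf("%.2f GiB%n", bytes / (1024.0 * 1024 * 1024));
        // ~4.00 GiB, matching the figure in the question
    }
}
```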

SOLR/LUCENE Experts, please help me design a simple keyword search from PDF index?

杀马特。学长 韩版系。学妹 submitted on 2019-12-23 02:51:07
Question: I dabbled with Solr but couldn't figure out a way to tailor it to my requirement.

What I have: a bunch of PDF files, and a set of keywords.

What I am trying to achieve:
- Index the PDF files (Solr Cell - done)
- Search for a keyword (works OK)
- Tailor the output to spit out the names of the PDF files plus an excerpt where the keyword occurred (no clue/idea how to)

Tried manipulating the ResponseHandler/schema.xml/solrconfig.xml to no avail. Lucene/Solr experts, do you think what I am trying to achieve is
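The third step (file name plus an excerpt) is what Solr's highlighting component provides, so no response-handler surgery should be needed. A hedged sketch of the request parameters, assuming the extracted text was indexed into a field named content and the file name into id; substitute your own field names:

```
http://localhost:8983/solr/my_core/select?q=content:keyword
    &fl=id             # return just the document name/id
    &hl=true           # enable highlighting
    &hl.fl=content     # field to build excerpts from
    &hl.snippets=2     # excerpts per matching document
    &hl.fragsize=200   # approximate excerpt length in characters
```

The response then carries a separate highlighting section keyed by document id, with the excerpts wrapped in <em> tags by default.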

Using MultiFieldQueryParser

自古美人都是妖i submitted on 2019-12-23 02:40:38
Question: I am using MultiFieldQueryParser for parsing strings like a.a., b.b., etc., but after parsing it removes the dots from the string. What am I missing here? Thanks.

Answer 1: I'm not sure MultiFieldQueryParser does what you think it does. Also, I'm not sure I know what you're trying to do. I do know that with any query parser, strings like 'a.a.' and 'b.b.' will have the periods stripped out because, at least with the default Analyzer, all punctuation is treated as white space. As far as the
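The behavior the answer describes can be illustrated with a toy tokenizer that, like many default analyzers, treats punctuation as a break. This is an imitation for illustration only, not Lucene's actual Analyzer; keeping the dots generally requires indexing the field with a keyword-style (non-tokenizing) analyzer instead:

```java
import java.util.*;

// Toy tokenizer mimicking punctuation-as-whitespace analysis:
// "a.a." becomes the tokens ["a", "a"], so the dots are gone by query time.
public class TokenizeDemo {
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on anything that is not a letter or digit.
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a.a.")); // [a, a]
        System.out.println(tokenize("b.b.")); // [b, b]
    }
}
```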

Elasticsearch get a selection of predefined types as result in one query

蓝咒 submitted on 2019-12-23 02:38:48
Question: I've got an ElasticSearch index with a large set of product properties. They all look like this:

{'_id':1,'type':'manufacturer','name':'Toyota'},
{'_id':2,'type':'color','name':'Green'},
{'_id':3,'type':'category','name':'SUV Cars'},
{'_id':4,'type':'material','name':'Leather'},
{'_id':5,'type':'manufacturer','name':'BMW'},
{'_id':6,'type':'color','name':'Red'},
{'_id':7,'type':'category','name':'Cabrios'},
{'_id':8,'type':'material','name':'Steel'},
{'_id':9,'type':'category','name':
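Given documents shaped like the above, one way to fetch several predefined types in a single query is a terms query on the type field. A hedged sketch of the request body, assuming type is indexed as an exact-match (keyword-style) field:

```
{
  "query": {
    "terms": { "type": ["manufacturer", "color", "category"] }
  }
}
```

This matches any document whose type is one of the listed values, so one request returns the manufacturers, colors, and categories together; aggregations on type could then group the results per type if needed.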