lucene | 易学教程

Finding total number of matches to a query with lucene

Question: I'm new to Lucene, so I don't know if this is possible, but I have an index and I would like to get the total number of phrase matches in a subset of the index (the subset is defined by a filter). I can use FilteredQuery with my Filter and a PhraseQuery to search for the phrase, and thus count the documents in which the phrase occurs, but I can't seem to find a way to count the number of matches per document as well. Answer 1: You can do this; see LUCENE-2590 for details. For example code you can look

ToTitleCase in solr to stop SCREAMING CAPS in Solr

Question: I'm using Solr's faceting and I've run into a problem that I was hoping I could get around using filters. Basically, sometimes a town name will come through to Solr as "CAMBRIDGE" and sometimes it will come through as "Cambridge". I wanted to use a filter in Solr to stop the SCREAMING CAPS version of the town name. It seems there is a filter to make all the text lower case.  <fieldType name="text_facet" class="solr.TextField"

Why does 'delete document' in lucene 2.4 not work?

Question: I want to delete a document in Lucene 2.4 with Java. My code is: Directory directory = FSDirectory.getDirectory("c:/index"); IndexReader indexReader = IndexReader.open(directory); System.out.println("num="+indexReader.maxDoc()); indexReader.deleteDocuments(new Term("name","1")); System.out.println("num="+indexReader.maxDoc()); Output: num=1 num=1 Answer 1: In my opinion it is best to use IndexWriter to delete the documents, since IndexReader buffers the deletions and does not write changes to the

Lucene Indexing/Query strategy for hyphenated words

Question: There are many words which are hyphenated or whitespace-separated but often used as one word. E.g., "Basket Ball" or "basket-ball" can be written as "basketball". Now, when I index a sentence, say "Hey dude, I played basket ball yesterday", and then query for basketball (without double quotes), I will not get any results. The same happens in the reverse case (index "basketball" and query "basket ball"). Is there any way to solve this problem directly or indirectly? Edit: I gave the example to

Neo4j Cypher: Handling whitespace and wildcards in a Lucene fulltext search

Question: I created a full-text index named myFullTextIndex. When I want to search for the pattern Hello World, the query looks like: START w=node:myFullTextIndex('title:"Hello World"') That works pretty well. However, I can't manage to search for the same string surrounded by wildcards. I expect a search on this pattern to return a result: *Hello World* I tried: START w=node:myFullTextIndex('title:"*Hello World*"') and START w=node:myFullTextIndex('title:*"Hello World"*') but neither works (syntax

Lucene 4.0 IndexWriter updateDocument for Numeric Term

Question: I just wanted to know how to update (delete/insert) a document based on a numeric field. So far I did this: LuceneManager.updateDocument(writer, new Term("id", NumericUtils.intToPrefixCoded(sentenceId)), newDoc); But now with Lucene 4.0 the NumericUtils class has changed to this, which I don't really understand. Any help? Answer 1: If possible, I would recommend storing an ID as a keyword string rather than a number. If it is simply a unique identifier,

How do I pass a list of 'allowed' IDs to filter a Lucene search?

Question: I need to return just the documents that a user has access to from a Lucene search. I can get a list of IDs from a database that make up the 'allowed' subset. How can I pass these to Lucene? The articles I've found on the web suggest I need to use a BitSet and FieldCache (am I right?), but I'm having trouble finding good examples. Does anyone have any? I'm using C#, but any language would be great. Thanks. Answer 1: A simple way would be to build a MultiPhraseQuery with an array of all the

Faceted search with geo-index using CouchDB

Question: CouchDB offers the ability to perform faceted search via Lucene. I would like to perform a faceted search where one of the facets is geospatial (e.g. within 30 km of a lat/long). Is this possible, and if so, how? Answer 1: Check out GeoCouch; it's a fork of CouchDB that supports geospatial queries/indexes. Source: https://stackoverflow.com/questions/9101367/faceted-search-with-geo-index-using-couchdb

Solr- multivalued date field, range queries to match “any”/“count”?

Question: I'm using Solr as part of a property booking engine. My entries have a multivalued date field which stores the dates on which the property is already booked and thus not available. I want to be able to query against this and return entries that have no dates within the specified window. I'm halfway there, but right now Solr appears to return the entry if it has even one free date. I want it to return only entries that are totally empty within the range. Example of my entity: <doc> <arr

Introduction to Search Engine Frameworks

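Original article: Introduction to Search Engine Frameworks

Contents: I. Search engine basics; II. Introduction to and comparison of common search engine frameworks; III. References

I. Search engine basics

1. What is a search engine

A search engine usually means a full-text search engine that has collected tens of millions to billions of pages from the World Wide Web, indexed every word (that is, every keyword) on those pages, and built an index database. When a user looks up a keyword, every page whose content contains that keyword is returned as a search result. After ranking by complex algorithms (which may include commercial paid ranking, promotion, or advertising), the results are listed in order of their relevance to the search keyword (or in an order that has nothing to do with relevance).

2. Traditional search compared with search engines

2.1 The traditional approach: (1) use the system's Find function inside a document; (2) use LIKE fuzzy queries in MySQL. Problems: (1) it cannot respond in time on massive data sets, although for small data sets a conventional MySQL index is enough; (2) useless words cannot be filtered out and text cannot be tokenized; (3) it is hard to scale when the data volume is large; (4) it is hard to rank matching records by similarity.

2.2 The search engine approach: (1) store unstructured data; (2) retrieve and return the information we need quickly and accurately; (3) perform relevance ranking, filtering, and so on; (4) remove stop words (words with no special meaning, such as "a" and "is" in English, or 这, 的, 是 in Chinese); frameworks generally support custom stop-word lists.

II. Introduction to and comparison of common search engine frameworks

1. The Java full-text search engine framework Lucene

1.1 Introduction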
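
The two ideas described above for the search engine approach, indexing every word and dropping stop words, can be illustrated with a toy inverted index. This is only a minimal sketch in plain Java, not Lucene's implementation or API: the class name TinyInvertedIndex, its methods, and the stop-word list are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: maps each non-stop-word term to the IDs of the documents containing it. */
final class TinyInvertedIndex {
    // Hypothetical stop-word list for illustration; real frameworks let you configure this.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "is", "the", "of");

    // term -> sorted set of document IDs (the "postings" for that term)
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    /** Lower-cases the text, splits it on anything that is not a letter or digit, and drops stop words. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                terms.add(raw);
            }
        }
        return terms;
    }

    /** Indexes one document under the given ID. */
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns the IDs of documents containing the term, ascending; empty for stop words and unknown terms. */
    List<Integer> search(String term) {
        TreeSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? List.of() : new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(0, "The engine is fast");
        index.add(1, "A search engine indexes every word");
        System.out.println(index.search("engine")); // prints [0, 1]
        System.out.println(index.search("the"));    // prints [] because stop words are never indexed
    }
}
```

A lookup here is a single map access rather than a scan of every document, which is the main reason this structure responds faster than a LIKE query on large data. Relevance ranking, which the text also mentions, is deliberately left out of this sketch.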