lucene

KeywordAnalyzer and LowerCaseFilter/LowerCaseTokenizer

谁都会走 posted on 2019-12-18 06:55:03
Question: I want to build my own analyzer that uses both filters/tokenizers. I mean, the same field should be treated as a keyword (the entire stream as a single token) and also be lowercased. If I use only KeywordAnalyzer, the field value keeps its original case. If I use LowerCaseTokenizer or LowerCaseFilter, I have to combine them with other analyzers that do not behave like KeywordAnalyzer (they split on non-letters or spaces, remove stop words, etc.). The question is: is there any way to make that field a keyword (entire stream
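The behaviour the question asks for (the whole field value as one token, lowercased) can be illustrated without Lucene. The sketch below is plain Java and only mimics what a keyword tokenizer followed by a lowercase filter would be expected to emit; `KeywordLowercase` and `analyze` are hypothetical names, not Lucene API.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration: emits the entire input as a single lowercased token. */
final class KeywordLowercase {
    private KeywordLowercase() {}

    /** Returns exactly one token: the whole input, lowercased, with no splitting. */
    static List<String> analyze(String fieldValue) {
        // Locale.ROOT avoids locale-specific surprises (e.g. the Turkish dotless i).
        return Collections.singletonList(fieldValue.toLowerCase(Locale.ROOT));
    }
}
```

Because the query text has to pass through the same transformation at search time, "Jon Skeet" and "jon skeet" would both map to the single token `jon skeet`.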

Lucene 4 Pagination

空扰寡人 posted on 2019-12-18 04:16:15
Question: I am using Lucene 4.2 and am implementing result pagination. IndexSearcher.searchAfter provides an efficient way of implementing "next page" functionality, but what is the best way to implement "previous page" or even "go to page" functionality? There is no IndexSearcher.searchBefore, for example. I was considering determining the total number of pages given the page size and keeping a ScoreDoc[] array to track the "after" ScoreDoc for each page (the array would be populated as
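The bookkeeping described in the question (a total page count plus one remembered "after" cursor per visited page) can be sketched in plain Java with no Lucene dependency. `PageCursors` and its methods are hypothetical names; the assumption is that in a real setup the cursor type `C` would be Lucene's ScoreDoc and the value returned by `cursorBefore` would be handed to IndexSearcher.searchAfter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: remembers the last hit of every page visited so far. */
final class PageCursors<C> {
    private final int pageSize;
    // cursors.get(p) holds the last hit of page p (pages are 0-based).
    private final List<C> cursors = new ArrayList<>();

    PageCursors(int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        this.pageSize = pageSize;
    }

    /** Number of pages needed for totalHits results (ceiling division). */
    int totalPages(int totalHits) {
        return totalHits / pageSize + (totalHits % pageSize == 0 ? 0 : 1);
    }

    /** Stores the last hit of a page; pages must be recorded in order. */
    void record(int page, C lastHit) {
        if (page < cursors.size()) {
            cursors.set(page, lastHit);
        } else if (page == cursors.size()) {
            cursors.add(lastHit);
        } else {
            throw new IllegalStateException("page " + (page - 1) + " has not been visited yet");
        }
    }

    /** Cursor to search "after" in order to fetch the given page; null for the first page. */
    C cursorBefore(int page) {
        if (page == 0) return null;
        if (page - 1 >= cursors.size()) {
            throw new IllegalStateException("page " + (page - 1) + " has not been visited yet");
        }
        return cursors.get(page - 1);
    }
}
```

With this layout, "previous page" is just `cursorBefore(current - 1)`. "Go to page" only works directly for pages whose predecessor has already been visited; jumping further ahead would still require walking the intermediate pages.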

Lucene: how to boost some specific field

你说的曾经没有我的故事 posted on 2019-12-18 04:08:32
Question: In my case, documents have two fields, for example "title" and "views". "views" represents the number of times people have visited the document, for example: "title":"iphone", "views":"10". I have to develop a strategy that assigns some weight to views, such that the relevance score is calculated as score(title)*0.8+score(views)*0.2. Can Lucene do this? I would also like to know whether there are any algorithms related to this question. Answer 1: Here is how you can do that: Query titleQuery,
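The weighting in the question, score(title)*0.8+score(views)*0.2, is plain arithmetic once the two per-field scores are available. The sketch below is plain Java, not Lucene API and not the truncated answer's code; `WeightedScore`, `combine`, and the sample values are hypothetical, and it assumes both input scores are already on comparable scales.

```java
/** Hypothetical illustration of a weighted linear combination of two field scores. */
final class WeightedScore {
    private WeightedScore() {}

    /** Returns titleScore * titleWeight + viewsScore * viewsWeight. */
    static double combine(double titleScore, double viewsScore,
                          double titleWeight, double viewsWeight) {
        return titleScore * titleWeight + viewsScore * viewsWeight;
    }

    public static void main(String[] args) {
        // Weights taken from the question: 0.8 for title, 0.2 for views.
        double score = combine(1.0, 0.5, 0.8, 0.2);
        System.out.println(score);
    }
}
```

A raw view count such as 10 is not on the same scale as a text relevance score, so in practice it would need to be normalized in some way before being combined like this.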

Searching phrases in Lucene

纵然是瞬间 posted on 2019-12-18 03:45:16
Question: Could somebody point me to an example of how to search for phrases with Lucene.net? Let's say my index contains a document with the field "name" and the value "Jon Skeet". Now I want to be able to find that document when searching for "jon skeet". Answer 1: You can use a proximity search to find terms within a certain distance of each other. The Lucene query syntax looks like this: "jon skeet"~3, meaning find "jon" and "skeet" within three words of each other. With this syntax, relative order doesn't matter;

How do I make the QueryParser in Lucene handle numeric ranges?

吃可爱长大的小学妹 posted on 2019-12-18 03:42:19
Question: new QueryParser(....).parse(somequery); works only for fields indexed as strings. Say I have a field called count, which is an integer field (I took the data type into account when indexing it). new QueryParser(....).parse("count:[1 TO 10]"); does not work. If I use NumericRangeQuery.newIntRange instead, it works. But I need the QueryParser form to work... Answer 1: I had the same issue and solved it, so here is my solution: To create a custom query parser that will parse the

lucene - give more weight the closer term is to beginning of title

丶灬走出姿态 posted on 2019-12-18 03:25:50
Question: I understand how to boost fields either at index time or query time. However, how could I increase the score when a term matches closer to the beginning of a title? Example: Query = "lucene" Doc1 title = "Lucene: Homepage" Doc2 title = "I have a question about lucene?" I would like the first document to score higher since "lucene" is closer to the beginning (ignoring term frequency for now). I see how to use the SpanQuery for specifying the proximity between terms, but I'm not sure how to use the

How to search an int field in Lucene 4?

纵然是瞬间 posted on 2019-12-18 03:10:33
Question: I am trying to implement an index of documents (roughly corresponding to DB rows), where one of the fields is an integer. I'm adding them to the index like this: Document doc = new Document(); doc.add(new StringField("ticket_number", rs.getString("ticket_number"), Field.Store.YES)); doc.add(new IntField("ticket_id", rs.getInt("ticket_id"), Field.Store.YES)); doc.add(new StringField("id_s", rs.getString("ticket_id"), Field.Store.YES)); w.addDocument(doc); It seems I can't query the ticket_id field at

Term Vector Frequency in Lucene 4.0

≡放荡痞女 posted on 2019-12-18 02:49:09
Question: I'm upgrading from Lucene 3.6 to Lucene 4.0-beta. In Lucene 3.x, the IndexReader contains a method IndexReader.getTermFreqVectors(), which I can use to extract the frequency of each term in a given document and field. This method is now replaced by IndexReader.getTermVectors(), which returns Terms. How can I make use of this (or possibly other methods) to extract the term frequency in a document and a field? Answer 1: Perhaps this will help you: // get terms vectors for one document and one

Scoring of solr multivalued field

血红的双手。 posted on 2019-12-17 23:26:11
Question: If I have a document with a multivalued field in Solr, are the multiple values scored independently or just concatenated and scored as one big field? I'm hoping they're scored independently. Here's an example of what I mean: I have a document with a field for a person's name, where there may be multiple names for the same person. The names are all different (very different in some cases), but they all refer to the same person/document. Person 1: David Bowie, David Robert Jones, Ziggy Stardust, Thin

lucene good practice and thread safety

梦想与她 posted on 2019-12-17 22:51:31
Question: I'm using Lucene to index documents and perform a search, after which I immediately delete them. All this can be considered a somewhat atomic action that includes the following steps: index (writer) --> search (searcher) --> get docs by score (reader) --> delete docs (reader). This action can be performed by multiple concurrent threads on the same index (using FSDirectory). IMPORTANT NOTE: each thread handles a separate set of documents, so one thread will not touch another thread's