lucene

Sorting and latest records in ElasticSearch

Submitted by て烟熏妆下的殇ゞ on 2019-12-23 18:10:39
Question: I have two questions related to Elasticsearch. 1) Is there any way to specify that I want results sorted by a specific field in descending order? An equivalent SQL query would be: select * from table1 where a="b" order by myprimarykey desc; 2) How do I get the first and last (latest) record?

Answer 1: 1) Elasticsearch has a quite sophisticated Sorting API that allows you to control the sort order. So, in Elasticsearch, an equivalent to your MySQL query would look like this: { "query" : { "term" : { "a" : "b" } },
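
The answer's JSON is cut off above. As a supplement, here is a minimal sketch of the same query through the pre-5.x Elasticsearch Java client API; the index name "table1" and the client wiring are assumptions, not part of the original answer. A descending sort with size 1 returns the latest record, and flipping to SortOrder.ASC returns the first.

```java
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.sort.SortOrder;

public class LatestRecordExample {

    // Matches a = "b", sorts by myprimarykey descending, keeps only one hit:
    // that hit is the latest record; use SortOrder.ASC for the first record.
    public static SearchResponse latest(Client client) {
        return client.prepareSearch("table1")            // index name is hypothetical
                .setQuery(QueryBuilders.termQuery("a", "b"))
                .addSort("myprimarykey", SortOrder.DESC)
                .setSize(1)
                .execute().actionGet();
    }
}
```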

Sorting the Solr search result gives error: can not sort on multivalued field: name

Submitted by 橙三吉。 on 2019-12-23 17:16:08
Question: I am new to Apache Solr search. I am trying to sort the result set of a Solr query. Query: name:abc* AND hidden:false & sort=name desc It shows the error: can not sort on multivalued field: name The Solr version is 7.2.1.

Answer 1: If you're using a recent version of Solr (>5.3) you should be able to use the min or max functions to sort on multivalued fields, like this: sort=field(field_to_sort_on,min) asc The only requirement to achieve this is to use DocValues on this field -
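
As a supplement to the truncated answer, below is a minimal SolrJ sketch of the suggested sort. The core URL and core name are placeholders, and it assumes the name field has docValues enabled, as the answer requires.

```java
import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MultivaluedSortExample {

    public static QueryResponse run() throws SolrServerException, IOException {
        // The base URL and core name are placeholders for this sketch.
        HttpSolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        SolrQuery query = new SolrQuery("name:abc* AND hidden:false");
        // Sort on the smallest value of the multivalued field instead of the field itself.
        query.addSort("field(name,min)", SolrQuery.ORDER.asc);

        return solr.query(query);
    }
}
```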

Lucene, indexing already/externally tokenized tokens and defining own analyzing process

Submitted by 我与影子孤独终老i on 2019-12-23 17:06:33
Question: In the process of using Lucene, I am a bit disappointed. I do not see or understand how I should proceed to feed a Lucene analyzer with something that is already directly indexable, or how I should proceed to create my own analyzer... For example, if I have a List<MyCustomToken>, which already contains many tokens (and actually much more information about capitalization, etc., that I would also like to index as features on each MyCustomToken), if I understand well what I have read, I
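
One way to read the question is: how do I hand Lucene tokens I have already produced? A minimal sketch under that reading, assuming Lucene 4.x, is a TokenStream that replays the existing tokens and is passed straight to a TextField, so no Analyzer ever runs on that field. The fields of MyCustomToken shown here are hypothetical.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.TextField;

public class PreTokenizedIndexing {

    /** Hypothetical stand-in for the MyCustomToken class from the question. */
    public static class MyCustomToken {
        public final String text;
        public MyCustomToken(String text) { this.text = text; }
    }

    /** Replays already-produced tokens, so no Analyzer runs on this field. */
    public static class PreTokenizedStream extends TokenStream {
        private final Iterator<MyCustomToken> it;
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

        public PreTokenizedStream(List<MyCustomToken> tokens) {
            this.it = tokens.iterator();
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!it.hasNext()) {
                return false;
            }
            clearAttributes();
            termAtt.setEmpty().append(it.next().text);
            return true;
        }
    }

    /** TextField(name, TokenStream) hands the stream straight to the IndexWriter. */
    public static void addPreTokenizedField(Document doc, List<MyCustomToken> tokens) {
        doc.add(new TextField("body", new PreTokenizedStream(tokens)));
    }
}
```

Extra per-token information (capitalization flags and so on) could be surfaced through additional token attributes, for example a payload attribute, though that is beyond this sketch.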

Lucene Analyzer for Indexing and Searching

Submitted by 主宰稳场 on 2019-12-23 17:05:17
Question: I have a field that I am indexing with Lucene like so: @Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES) public HungerState getHungerState() { The possible values of this field are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY. When these values are indexed using the StandardAnalyzer, the terms end up as hungry and slightly, since it tokenizes on punctuation and ignores the "not". If I change the index to index=Index.UN_TOKENIZED, the indexed terms are HUNGRY, SLIGHTLY_HUNGRY, and NOT
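
The usual fix for the situation described is to analyze this one field as a single keyword, consistently at index and query time. Below is a minimal plain-Lucene sketch of that idea, assuming Lucene 4.x; with Hibernate Search the analyzer would normally be wired through its annotations instead, which this sketch does not cover, and the Version constant is an assumption tied to Lucene 4.3.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class HungerStateAnalyzer {

    // Route hungerState through KeywordAnalyzer so values like NOT_HUNGRY stay
    // single, unmodified terms; all other fields keep StandardAnalyzer behaviour.
    public static Analyzer build() {
        Map<String, Analyzer> perField = new HashMap<String, Analyzer>();
        perField.put("hungerState", new KeywordAnalyzer());
        return new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_43), perField);
    }
}
```

The same analyzer should be passed to both the IndexWriterConfig and the QueryParser, so that NOT_HUNGRY survives as one term on both sides.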

Manipulate Lucene query before performing search

Submitted by 余生颓废 on 2019-12-23 15:52:29
Question: I'm working on a Java webapp (Spring 3.x) that uses Solr for its search engine. I want to be able to intercept the Lucene query and substitute a "virtual" search field with either of two indexed fields, based upon a lookup service (if the lookup succeeds, use a range search; otherwise, search a regular field). E.g., given a query like field0:foo (field1:bar OR field1:bash) AND field2:bing (field1 being a virtual field), manipulate the query to get field0:foo (field3:[42 TO 45] OR field4:bash) AND
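
The question is cut off above, but the transformation it describes can be sketched as a recursive walk over the parsed query that swaps clauses on the virtual field for clauses on concrete fields. This is only an illustration, assuming a Lucene version before 5.3 (where BooleanQuery is still mutable); the lookupRange helper and the choice of a numeric range query for field3 are assumptions, not the question's actual lookup service.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class VirtualFieldRewriter {

    // Recursively rewrites clauses on the virtual field "field1" into queries
    // on concrete fields, preserving the boolean structure of the query.
    public Query rewrite(Query q) {
        if (q instanceof BooleanQuery) {
            BooleanQuery in = (BooleanQuery) q;
            BooleanQuery out = new BooleanQuery();
            for (BooleanClause clause : in.clauses()) {
                out.add(rewrite(clause.getQuery()), clause.getOccur());
            }
            return out;
        }
        if (q instanceof TermQuery) {
            Term t = ((TermQuery) q).getTerm();
            if ("field1".equals(t.field())) {
                int[] range = lookupRange(t.text());
                if (range != null) {
                    // Lookup succeeded: search the range field instead.
                    return NumericRangeQuery.newIntRange("field3", range[0], range[1], true, true);
                }
                // Lookup failed: fall back to a plain term on the other field.
                return new TermQuery(new Term("field4", t.text()));
            }
        }
        return q;
    }

    // Hypothetical placeholder for the external lookup service from the question.
    private int[] lookupRange(String value) {
        return "bar".equals(value) ? new int[] {42, 45} : null;
    }
}
```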

Custom Solr tokenizer is only invoked for the first query

Submitted by ∥☆過路亽.° on 2019-12-23 12:52:41
Question: I created a custom tokenizer, and it seems to work fine when checking with admin/analysis.jsp and with System.out logging. However, when I query the field that uses this custom tokenizer, I see that the custom tokenizer is only invoked for the first query string (checked by System.out logging). Could you help me by pointing out what I am doing wrong? This is my code: package com.fosp.searchengine; import java.io.Reader; import org.apache.lucene.analysis.WhitespaceTokenizer; import org.apache.solr

Lucene indexing of HTML files

Submitted by a 夏天 on 2019-12-23 12:27:46
Question: I am working with Apache Lucene for indexing and searching. I have to index HTML files stored on the local disk of the computer, indexing both the file names and the contents of the HTML files. I am able to store the file names in the Lucene index, but not the HTML file contents, which should cover not only the text but the entire page, including image links and URLs. How can I access the contents of those indexed files? For indexing I am using the following code: File indexDir =
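
The question's code is truncated and the answer is not shown here. As an illustration of the usual approach, the sketch below extracts the visible text from each HTML file with the jsoup library and indexes it next to the file name. It assumes Lucene 4.x and jsoup on the classpath, and the directory paths are placeholders.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.jsoup.Jsoup;

public class HtmlIndexer {

    // Parses the HTML with jsoup and indexes its visible text plus the file name.
    public static void indexHtmlFile(IndexWriter writer, File htmlFile) throws IOException {
        org.jsoup.nodes.Document html = Jsoup.parse(htmlFile, "UTF-8");

        Document doc = new Document();
        doc.add(new StringField("filename", htmlFile.getName(), Field.Store.YES));
        doc.add(new TextField("contents", html.text(), Field.Store.YES));
        writer.addDocument(doc);
    }

    public static void main(String[] args) throws IOException {
        File indexDir = new File("/path/to/index");  // placeholder locations
        File htmlDir = new File("/path/to/html");

        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_43,
                new StandardAnalyzer(Version.LUCENE_43));
        try (IndexWriter writer = new IndexWriter(FSDirectory.open(indexDir), cfg)) {
            File[] htmlFiles = htmlDir.listFiles((dir, name) -> name.endsWith(".html"));
            if (htmlFiles != null) {
                for (File f : htmlFiles) {
                    indexHtmlFile(writer, f);
                }
            }
        }
    }
}
```

Storing the extracted text (Field.Store.YES) also answers the second half of the question: the page contents can be read back from the stored field of each indexed document.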

Luke says my Lucene index directory is Invalid

Submitted by 无人久伴 on 2019-12-23 11:24:13
Question: I'm trying to learn about Lucene, and hope to use Luke to investigate an index. I tried building an index with the IndexFiles demo in Lucene 4.3, then tried viewing the index with the latest version of Luke, and I'm getting the message: Invalid directory at the location, check console for more information. Last exception: org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource: ChecksumIndexInput(MMapIndexInput(path="/home/lavin/sep20.index/segments_2"))): 1

MG4J vs. Apache Lucene

Submitted by ε祈祈猫儿з on 2019-12-23 10:39:35
Question: Can anyone provide a simple comparative analysis of these search engines? What advantages does either framework have? BTW, I've seen the following basic reasons for choosing MG4J in several academic papers: combining indices over the same collection; multi-index queries. Update: These slides (from mir2ed.org) contain a fresher overview of open source search engines, including Lucene and MG4J, benchmarking various aspects: memory & CPU, index size, search performance, search quality

Neo4j indexing (with Lucene) - good way to organize node “types”?

Submitted by 此生再无相见时 on 2019-12-23 10:08:08
Question: This is actually more of a Lucene question, but it's in the context of a Neo4j database. I have a database that's divided into 50 or so node types (so "collections" or "tables" in other kinds of databases). Each has a subset of properties that need to be indexed; some share the same name, some don't. When searching, I always want to find nodes of a specific type, never across all nodes. I can see three ways of organizing this: One index per type, properties map naturally to index fields: index
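
As a sketch of the first option listed (one index per type), the snippet below uses Neo4j's legacy, Lucene-backed index API from an embedded 2.x database, so a lookup in the "person" index can never match nodes of another type. The store path and property values are placeholders, and newer Neo4j versions would use labels and schema indexes instead.

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.index.Index;

public class TypedIndexExample {

    public static void main(String[] args) {
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase("target/example-db"); // placeholder path

        // One legacy (Lucene-backed) index per node "type": writes go to the
        // index named after the type, so lookups never cross types.
        try (Transaction tx = db.beginTx()) {
            Index<Node> people = db.index().forNodes("person");
            Node node = db.createNode();
            node.setProperty("name", "Alice");
            people.add(node, "name", "Alice");
            tx.success();
        }

        try (Transaction tx = db.beginTx()) {
            Index<Node> people = db.index().forNodes("person");
            Node hit = people.get("name", "Alice").getSingle(); // only searches this type
            System.out.println("found: " + hit);
            tx.success();
        }

        db.shutdown();
    }
}
```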