lucene

Sorting and latest records in ElasticSearch

Submitted by て烟熏妆下的殇ゞ on 2019-12-23 18:10:39
Question: I have two questions related to Elasticsearch. 1) Is there any way to specify that I want results sorted by a specific field in descending order? An equivalent SQL query would be: select * from table1 where a="b" order by myprimarykey desc; 2) How do I get the first and last (latest) record?

Answer 1: 1) Elasticsearch has a quite sophisticated Sorting API that allows you to control the sort order. So, in Elasticsearch, an equivalent to your MySQL query would look like this: { "query" : { "term" : { "a" : "b" } },
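
The answer's JSON is cut off above. As a supplement, here is a minimal sketch of the same query through the pre-5.x Elasticsearch Java client API; the index name "table1" and the client wiring are assumptions, not part of the original answer. A descending sort with size 1 returns the latest record, and flipping to SortOrder.ASC returns the first.

```java
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.sort.SortOrder;

public class LatestRecordExample {

    // Matches a = "b", sorts by myprimarykey descending, keeps only one hit:
    // that hit is the latest record; use SortOrder.ASC for the first record.
    public static SearchResponse latest(Client client) {
        return client.prepareSearch("table1")            // index name is hypothetical
                .setQuery(QueryBuilders.termQuery("a", "b"))
                .addSort("myprimarykey", SortOrder.DESC)
                .setSize(1)
                .execute().actionGet();
    }
}
```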

Sorting the Solr search result gives error: can not sort on multivalued field: name

Submitted by 橙三吉。 on 2019-12-23 17:16:08
Question: I am new to Apache Solr search. I am trying to sort the result set of a Solr query. Query: name:abc* AND hidden:false & sort=name desc It shows the error: can not sort on multivalued field: name The Solr version is 7.2.1.

Answer 1: If you're using a recent version of Solr (>5.3) you should be able to use the min or max functions to sort on multivalued fields, like this: sort=field(field_to_sort_on,min) asc The only requirement to achieve this is to use DocValues on this field -
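
As a supplement to the truncated answer, below is a minimal SolrJ sketch of the suggested sort. The core URL and core name are placeholders, and it assumes the name field has docValues enabled, as the answer requires.

```java
import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MultivaluedSortExample {

    public static QueryResponse run() throws SolrServerException, IOException {
        // The base URL and core name are placeholders for this sketch.
        HttpSolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        SolrQuery query = new SolrQuery("name:abc* AND hidden:false");
        // Sort on the smallest value of the multivalued field instead of the field itself.
        query.addSort("field(name,min)", SolrQuery.ORDER.asc);

        return solr.query(query);
    }
}
```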

Lucene, indexing already/externally tokenized tokens and defining own analyzing process

Submitted by 我与影子孤独终老i on 2019-12-23 17:06:33
Question: In the process of using Lucene, I am a bit disappointed. I do not see or understand how I should proceed to feed a Lucene analyzer with something that is already directly indexable, or how I should proceed to create my own analyzer... For example, if I have a List<MyCustomToken>, which already contains many tokens (and actually much more information about capitalization, etc., that I would also like to index as features on each MyCustomToken), if I understand well what I have read, I
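
One way to read the question is: how do I hand Lucene tokens I have already produced? A minimal sketch under that reading, assuming Lucene 4.x, is a TokenStream that replays the existing tokens and is passed straight to a TextField, so no Analyzer ever runs on that field. The fields of MyCustomToken shown here are hypothetical.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.TextField;

public class PreTokenizedIndexing {

    /** Hypothetical stand-in for the MyCustomToken class from the question. */
    public static class MyCustomToken {
        public final String text;
        public MyCustomToken(String text) { this.text = text; }
    }

    /** Replays already-produced tokens, so no Analyzer runs on this field. */
    public static class PreTokenizedStream extends TokenStream {
        private final Iterator<MyCustomToken> it;
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

        public PreTokenizedStream(List<MyCustomToken> tokens) {
            this.it = tokens.iterator();
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!it.hasNext()) {
                return false;
            }
            clearAttributes();
            termAtt.setEmpty().append(it.next().text);
            return true;
        }
    }

    /** TextField(name, TokenStream) hands the stream straight to the IndexWriter. */
    public static void addPreTokenizedField(Document doc, List<MyCustomToken> tokens) {
        doc.add(new TextField("body", new PreTokenizedStream(tokens)));
    }
}
```

Extra per-token information (capitalization flags and so on) could be surfaced through additional token attributes, for example a payload attribute, though that is beyond this sketch.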

Lucene Analyzer for Indexing and Searching

Submitted by 主宰稳场 on 2019-12-23 17:05:17
Question: I have a field that I am indexing with Lucene like so: @Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES) public HungerState getHungerState() { The possible values of this field are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY. When these values are indexed using the StandardAnalyzer, the terms end up as hungry and slightly, since it tokenizes on punctuation and ignores the "not". If I change the index to index=Index.UN_TOKENIZED, the indexed terms are HUNGRY, SLIGHTLY_HUNGRY, and NOT
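
The usual fix for the situation described is to analyze this one field as a single keyword, consistently at index and query time. Below is a minimal plain-Lucene sketch of that idea, assuming Lucene 4.x; with Hibernate Search the analyzer would normally be wired through its annotations instead, which this sketch does not cover, and the Version constant is an assumption tied to Lucene 4.3.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class HungerStateAnalyzer {

    // Route hungerState through KeywordAnalyzer so values like NOT_HUNGRY stay
    // single, unmodified terms; all other fields keep StandardAnalyzer behaviour.
    public static Analyzer build() {
        Map<String, Analyzer> perField = new HashMap<String, Analyzer>();
        perField.put("hungerState", new KeywordAnalyzer());
        return new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_43), perField);
    }
}
```

The same analyzer should be passed to both the IndexWriterConfig and the QueryParser, so that NOT_HUNGRY survives as one term on both sides.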

Manipulate Lucene query before performing search

Submitted by 余生颓废 on 2019-12-23 15:52:29
Question: I'm working on a Java webapp (Spring 3.x) that uses Solr for its search engine. I want to be able to intercept the Lucene query and substitute a "virtual" search field with either of two indexed fields, based upon a lookup service (if the lookup succeeds, use a range search; otherwise, search a regular field). E.g., given a query like field0:foo (field1:bar OR field1:bash) AND field2:bing (field1 being a virtual field), manipulate the query to get field0:foo (field3:[42 TO 45] OR field4:bash) AND
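
The question is cut off above, but the transformation it describes can be sketched as a recursive walk over the parsed query that swaps clauses on the virtual field for clauses on concrete fields. This is only an illustration, assuming a Lucene version before 5.3 (where BooleanQuery is still mutable); the lookupRange helper and the choice of a numeric range query for field3 are assumptions, not the question's actual lookup service.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class VirtualFieldRewriter {

    // Recursively rewrites clauses on the virtual field "field1" into queries
    // on concrete fields, preserving the boolean structure of the query.
    public Query rewrite(Query q) {
        if (q instanceof BooleanQuery) {
            BooleanQuery in = (BooleanQuery) q;
            BooleanQuery out = new BooleanQuery();
            for (BooleanClause clause : in.clauses()) {
                out.add(rewrite(clause.getQuery()), clause.getOccur());
            }
            return out;
        }
        if (q instanceof TermQuery) {
            Term t = ((TermQuery) q).getTerm();
            if ("field1".equals(t.field())) {
                int[] range = lookupRange(t.text());
                if (range != null) {
                    // Lookup succeeded: search the range field instead.
                    return NumericRangeQuery.newIntRange("field3", range[0], range[1], true, true);
                }
                // Lookup failed: fall back to a plain term on the other field.
                return new TermQuery(new Term("field4", t.text()));
            }
        }
        return q;
    }

    // Hypothetical placeholder for the external lookup service from the question.
    private int[] lookupRange(String value) {
        return "bar".equals(value) ? new int[] {42, 45} : null;
    }
}
```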

Custom Solr tokenizer is only invoked for the first query

Submitted by ∥☆過路亽.° on 2019-12-23 12:52:41
Question: I created a custom tokenizer, and it seems to work fine when checking with admin/analysis.jsp and with System.out logging. However, when I query the field that uses this custom tokenizer, I see that the custom tokenizer is only invoked for the first query string (checked by System.out logging). Could you help me by pointing out what I am doing wrong? This is my code: package com.fosp.searchengine; import java.io.Reader; import org.apache.lucene.analysis.WhitespaceTokenizer; import org.apache.solr

Lucene indexing of HTML files

Submitted by a 夏天 on 2019-12-23 12:27:46
Question: I am working with Apache Lucene for indexing and searching. I have to index HTML files stored on the local disk of the computer, indexing both the file names and the contents of the HTML files. I am able to store the file names in the Lucene index, but not the HTML file contents, which should cover not only the text but the entire page, including image links and URLs. How can I access the contents of those indexed files? For indexing I am using the following code: File indexDir =
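
The question's code is truncated and the answer is not shown here. As an illustration of the usual approach, the sketch below extracts the visible text from each HTML file with the jsoup library and indexes it next to the file name. It assumes Lucene 4.x and jsoup on the classpath, and the directory paths are placeholders.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.jsoup.Jsoup;

public class HtmlIndexer {

    // Parses the HTML with jsoup and indexes its visible text plus the file name.
    public static void indexHtmlFile(IndexWriter writer, File htmlFile) throws IOException {
        org.jsoup.nodes.Document html = Jsoup.parse(htmlFile, "UTF-8");

        Document doc = new Document();
        doc.add(new StringField("filename", htmlFile.getName(), Field.Store.YES));
        doc.add(new TextField("contents", html.text(), Field.Store.YES));
        writer.addDocument(doc);
    }

    public static void main(String[] args) throws IOException {
        File indexDir = new File("/path/to/index");  // placeholder locations
        File htmlDir = new File("/path/to/html");

        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_43,
                new StandardAnalyzer(Version.LUCENE_43));
        try (IndexWriter writer = new IndexWriter(FSDirectory.open(indexDir), cfg)) {
            File[] htmlFiles = htmlDir.listFiles((dir, name) -> name.endsWith(".html"));
            if (htmlFiles != null) {
                for (File f : htmlFiles) {
                    indexHtmlFile(writer, f);
                }
            }
        }
    }
}
```

Storing the extracted text (Field.Store.YES) also answers the second half of the question: the page contents can be read back from the stored field of each indexed document.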

Luke says my Lucene index directory is Invalid

Submitted by 无人久伴 on 2019-12-23 11:24:13
Question: I'm trying to learn about Lucene, and hope to use Luke to investigate an index. I tried building an index with the IndexFiles demo in Lucene 4.3, then tried viewing the index with the latest version of Luke, and I'm getting the message: Invalid directory at the location, check console for more information. Last exception: org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource: ChecksumIndexInput(MMapIndexInput(path="/home/lavin/sep20.index/segments_2"))): 1

MG4J vs. Apache Lucene

Submitted by ε祈祈猫儿з on 2019-12-23 10:39:35
Question: Can anyone provide a simple comparative analysis of these search engines? What advantages does either framework have? BTW, I've seen the following basic reasons for choosing MG4J in several academic papers: combining indices over the same collection; multi-index queries. Update: These slides (from mir2ed.org) contain a fresher overview of open source search engines, including Lucene and MG4J, benchmarking various aspects: memory & CPU, index size, search performance, search quality

Neo4j indexing (with Lucene) - good way to organize node “types”?

Submitted by 此生再无相见时 on 2019-12-23 10:08:08
Question: This is actually more of a Lucene question, but it's in the context of a Neo4j database. I have a database that's divided into 50 or so node types (so "collections" or "tables" in other kinds of databases). Each has a subset of properties that need to be indexed; some share the same name, some don't. When searching, I always want to find nodes of a specific type, never across all nodes. I can see three ways of organizing this: One index per type, properties map naturally to index fields: index
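
As a sketch of the first option listed (one index per type), the snippet below uses Neo4j's legacy, Lucene-backed index API from an embedded 2.x database, so a lookup in the "person" index can never match nodes of another type. The store path and property values are placeholders, and newer Neo4j versions would use labels and schema indexes instead.

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.index.Index;

public class TypedIndexExample {

    public static void main(String[] args) {
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase("target/example-db"); // placeholder path

        // One legacy (Lucene-backed) index per node "type": writes go to the
        // index named after the type, so lookups never cross types.
        try (Transaction tx = db.beginTx()) {
            Index<Node> people = db.index().forNodes("person");
            Node node = db.createNode();
            node.setProperty("name", "Alice");
            people.add(node, "name", "Alice");
            tx.success();
        }

        try (Transaction tx = db.beginTx()) {
            Index<Node> people = db.index().forNodes("person");
            Node hit = people.get("name", "Alice").getSingle(); // only searches this type
            System.out.println("found: " + hit);
            tx.success();
        }

        db.shutdown();
    }
}
```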