lucene

ElasticSearch - return the complete value of a facet for a query

Submitted by 我怕爱的太早我们不能终老 on 2020-01-01 09:14:31
Question: I've recently started using ElasticSearch and am trying to work through some use cases. I have a problem with one of them. I have indexed some users with their full name (e.g. "Jean-Paul Gautier", "Jean De La Fontaine"). I am trying to get all the full names matching some query. For example, I want the 100 most frequent full names beginning with "J": { "query": { "query_string": { "query": "full_name:J*" } }, "facets": { "name": { "terms": { "field": "full_name", "size": 100 } } } } The result I get is all
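
The excerpt cuts off before the answer, but a common pitfall here is that a terms facet on an analyzed field returns individual tokens ("jean", "paul", ...) rather than whole names. Below is a minimal sketch, not the accepted answer: it assumes the mapping has an unanalyzed sub-field named full_name.raw, an index called users, and Elasticsearch reachable on localhost:9200, and it posts the request with plain java.net so no client library is needed (Java 9+ for readAllBytes).

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class FullNameFacet {
    public static void main(String[] args) throws Exception {
        // Faceting on the hypothetical unanalyzed copy of full_name,
        // so whole names come back instead of single tokens.
        String body = "{"
                + "\"query\": {\"query_string\": {\"query\": \"full_name:J*\"}},"
                + "\"facets\": {\"name\": {\"terms\": {\"field\": \"full_name.raw\", \"size\": 100}}}"
                + "}";

        URL url = new URL("http://localhost:9200/users/_search");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        conn.getOutputStream().write(body.getBytes(StandardCharsets.UTF_8));

        try (InputStream in = conn.getInputStream()) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```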

What is omitNorms and version field in solr schema?

Submitted by 坚强是说给别人听的谎言 on 2020-01-01 07:33:48
Question: I don't understand when to use omitNorms="true". I have read a few links but I am still not clear on its meaning. What does this mean: "Set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms." from the http://wiki.apache.org/solr/SchemaXml page? Answer 1: Norms are stored as a single byte per field per document in the index.
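
For readers coming from the Lucene side, the Solr attribute maps onto FieldType.setOmitNorms. A minimal sketch (assuming Lucene 5.x or later; the field name "title" is just an example) of what gets switched off when norms are omitted:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;

public class OmitNormsExample {
    public static Document build() {
        FieldType type = new FieldType();
        type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
        type.setTokenized(true);
        // Equivalent of omitNorms="true" in schema.xml: no length normalization
        // and no index-time boost for this field, saving one byte per document.
        type.setOmitNorms(true);
        type.freeze();

        Document doc = new Document();
        doc.add(new Field("title", "What is omitNorms", type));
        return doc;
    }
}
```

Fields used only for exact matching or filtering (IDs, flags, dates) gain nothing from norms, which is why the wiki text recommends keeping norms only for full-text fields or fields that need an index-time boost.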

Lucene indexing and searching at the same time

Submitted by 六月ゝ 毕业季﹏ on 2020-01-01 05:29:22
Question: I want to search a Lucene index. The index changes frequently, so I need a way to search and index at the same time. It's a web application running on Tomcat, and I want to use RAMDirectory to increase search speed. I don't know how to do it! Answer 1: NRTManager in the misc Lucene package provides the ability to search and index at the same time. TrackingIndexWriter writer; // your writer SearcherFactory factory = new SearcherFactory(); NRTManager mgr = new NRTManager(writer
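
NRTManager was later folded into SearcherManager and ControlledRealTimeReopenThread, so on a current Lucene the same near-real-time pattern looks roughly like the sketch below. This is an assumption-laden sketch, not the original answer: it assumes Lucene 8+ (ByteBuffersDirectory is the in-memory successor of RAMDirectory) and a single IndexWriter shared between the indexing and search sides.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class NearRealTimeSearch {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory(); // in-memory directory
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

        // One SearcherManager shared by all search threads; the writer keeps indexing.
        SearcherManager manager = new SearcherManager(writer, new SearcherFactory());

        // ... add documents with writer.addDocument(...) from the indexing thread ...

        manager.maybeRefresh();              // pick up recent, not-yet-committed changes
        IndexSearcher searcher = manager.acquire();
        try {
            // run queries with searcher
        } finally {
            manager.release(searcher);       // always release, never close directly
        }
    }
}
```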

How do we create a simple search engine using Lucene, Solr or Nutch?

Submitted by 自作多情 on 2020-01-01 05:07:06
Question: Our company has thousands of PDF documents. How do we create a simple search engine using Lucene, Solr or Nutch? We'll provide a basic Java/JSP web page where people can type in words, run basic AND/OR queries, and see links to all matching PDFs. Answer 1: None of the projects in the Lucene family can natively process PDFs, but there are utilities you can drop in and well-written examples on how to roll your own. Lucene will do pretty much whatever you need it to do, but
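
The excerpt is cut off, but a common "drop-in" for the PDF step is Apache Tika for text extraction, with plain Lucene for indexing. A rough sketch under that assumption (the directory paths, field names and analyzer choice are placeholders, not part of the original answer; it assumes Lucene 5+ and Tika's simple facade API):

```java
import java.io.File;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.tika.Tika;

public class PdfIndexer {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika(); // extracts plain text from PDFs (and many other formats)
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("/tmp/pdf-index")),
                new IndexWriterConfig(new StandardAnalyzer()))) {

            for (File pdf : new File("/data/pdfs").listFiles((d, n) -> n.endsWith(".pdf"))) {
                Document doc = new Document();
                doc.add(new StringField("path", pdf.getAbsolutePath(), Store.YES));
                doc.add(new TextField("content", tika.parseToString(pdf), Store.NO));
                writer.addDocument(doc);
            }
        }
    }
}
```

The JSP page then only needs to parse the user's query against the "content" field and render the stored "path" values of the hits as links.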

Boost factor in MultiFieldQueryParser

Submitted by 不问归期 on 2020-01-01 04:53:10
Question: Can I boost different fields in MultiFieldQueryParser with different factors? Also, what is the maximum boost factor value I can assign to a field? Thanks a ton! Ed Answer 1: MultiFieldQueryParser has a constructor that accepts a map of boosts. You use it with something like this: String[] fields = new String[] { "title", "keywords", "text" }; HashMap<String,Float> boosts = new HashMap<String,Float>(); boosts.put("title", 10f); boosts.put("keywords", 5f); MultiFieldQueryParser queryParser = new
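
A complete version of that snippet, with the float literals the boosts map requires. This is a sketch assuming Lucene 5+ (where the parser no longer takes a Version argument); the field names are the ones from the answer.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.Query;

public class BoostedMultiFieldQuery {
    public static Query build(String userInput) throws Exception {
        String[] fields = {"title", "keywords", "text"};

        Map<String, Float> boosts = new HashMap<>();
        boosts.put("title", 10f);    // matches in the title count the most
        boosts.put("keywords", 5f);
        boosts.put("text", 1f);

        MultiFieldQueryParser parser =
                new MultiFieldQueryParser(fields, new StandardAnalyzer(), boosts);
        return parser.parse(userInput);
    }
}
```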

Searching on date ranges with Lucene in Java?

Submitted by 时光怂恿深爱的人放手 on 2020-01-01 04:35:08
Question: Is it possible to search over date ranges using Lucene in Java? How do I build Lucene search queries based on date fields and date ranges? For example: between two specified dates, prior to a specified date, after a specified date, within the last 24 hours, within the past week, within the past month. [Edit] I'm using Lucene 2.4.1 and my system is quite legacy and poorly tested, so if possible I would prefer not to upgrade. Answer 1: Lucene (before version 2.9 anyway) only stores String values,
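
Since the answer's point is that old Lucene stores dates as strings, the usual trick is to index dates in a sortable text form and run a term range over that field. A sketch assuming a recent Lucene (6+) rather than the 2.4.1 mentioned in the question; the field name "created" is illustrative, and documents must have been indexed with the same DateTools format.

```java
import java.util.Date;

import org.apache.lucene.document.DateTools;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermRangeQuery;

public class DateRangeQueries {
    // e.g. "within the last 24 hours"
    public static Query lastDay() {
        Date now = new Date();
        Date dayAgo = new Date(now.getTime() - 24L * 60 * 60 * 1000);

        // DateTools renders dates as sortable strings like "20200101091431"
        String lower = DateTools.dateToString(dayAgo, DateTools.Resolution.SECOND);
        String upper = DateTools.dateToString(now, DateTools.Resolution.SECOND);

        // Inclusive on both ends; pass null as lower or upper to get the
        // open-ended "after"/"before" variants from the question.
        return TermRangeQuery.newStringRange("created", lower, upper, true, true);
    }
}
```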

Build a Kibana Histogram with buckets dynamically created by ElasticSearch terms aggregation

Submitted by 无人久伴 on 2020-01-01 04:01:08
Question: I want to combine the functionality of the Kibana Terms graph (creating buckets based on the unique values of a particular attribute) and the Histogram graph (separating data into buckets based on queries and then plotting the data over time). Overall, I want to create a Histogram, but I only want to build it from the results of a single query, not multiple queries like in the Kibana demo app. Instead, I want each bucket to be dynamically
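
Under the hood this maps to an Elasticsearch date_histogram with a terms sub-aggregation: one bucket per time interval, then one sub-bucket per distinct value, without listing each value as a separate query. The sketch below only builds the request body; the index pattern, the "service" field and the "1h" interval are assumptions (newer Elasticsearch versions use calendar_interval/fixed_interval instead of interval), and the body would be POSTed to _search as in the earlier HTTP example.

```java
public class HistogramByTermBody {
    public static void main(String[] args) {
        // date_histogram split further by a terms sub-aggregation.
        String body = "{"
                + "\"size\": 0,"
                + "\"aggs\": {"
                + "  \"over_time\": {"
                + "    \"date_histogram\": {\"field\": \"@timestamp\", \"interval\": \"1h\"},"
                + "    \"aggs\": {"
                + "      \"by_term\": {\"terms\": {\"field\": \"service\", \"size\": 10}}"
                + "    }"
                + "  }"
                + "}"
                + "}";
        System.out.println(body); // e.g. POST to http://localhost:9200/logstash-*/_search
    }
}
```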

Solr associations

Submitted by 五迷三道 on 2020-01-01 03:14:09
Question: For the last couple of days we have been thinking of using Solr as our search engine of choice. Most of the features we need are available out of the box or can be easily configured. There is, however, one feature that we absolutely need which seems to be well hidden (or missing) in Solr. I'll try to explain with an example. We have lots of documents that are actually businesses: <document> <name>Apache</name> <cat>1</cat> ... </document> <document> <name>McDonalds</name> <cat>2</cat> ... </document> In addition we

Using Solr for indexing multiple languages

Submitted by 谁都会走 on 2020-01-01 02:38:30
Question: We're setting up Solr to index documents whose title field can be in various languages. After googling, I found two options: (1) define different schema fields for every language, i.e. title_en, title_fr, ..., apply different filters to each language, and query the title field that corresponds to the query language; (2) create different Solr cores to handle each language and have our app query the correct core. Which one is better? What are the ups and downs? Thanks Answer 1: There's also a third
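
For option (1), one field per language, the indexing side with SolrJ would look roughly like this. A sketch only: it assumes SolrJ 7/8, a core named "docs" on localhost:8983, and that the schema already defines title_en and title_fr with the appropriate language analyzers.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class MultiLanguageIndexing {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "42");
            // Route each title into the field whose analyzer matches its language.
            doc.addField("title_en", "A quick introduction to Solr");
            doc.addField("title_fr", "Une introduction rapide à Solr");
            solr.add(doc);
            solr.commit();
        }
        // Query side: pick the field matching the user's language,
        // e.g. q=title_fr:introduction
    }
}
```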