lucene

ElasticSearch - return the complete value of a facet for a query

Submitted by 我怕爱的太早我们不能终老 on 2020-01-01 09:14:31
Question: I've recently started using ElasticSearch and am trying to work through some use cases. I have a problem with one of them. I have indexed some users with their full name (e.g. "Jean-Paul Gautier", "Jean De La Fontaine"). I am trying to get all the full names matching some query. For example, I want the 100 most frequent full names beginning with "J": { "query": { "query_string": { "query": "full_name:J*" } }, "facets": { "name": { "terms": { "field": "full_name", "size": 100 } } } } The result I get is all
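
The excerpt cuts off before the answer, but a common pitfall here is that a terms facet on an analyzed field returns individual tokens ("jean", "paul", ...) rather than whole names. Below is a minimal sketch, not the accepted answer: it assumes the mapping has an unanalyzed sub-field named full_name.raw, an index called users, and Elasticsearch reachable on localhost:9200, and it posts the request with plain java.net so no client library is needed (Java 9+ for readAllBytes).

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class FullNameFacet {
    public static void main(String[] args) throws Exception {
        // Faceting on the hypothetical unanalyzed copy of full_name,
        // so whole names come back instead of single tokens.
        String body = "{"
                + "\"query\": {\"query_string\": {\"query\": \"full_name:J*\"}},"
                + "\"facets\": {\"name\": {\"terms\": {\"field\": \"full_name.raw\", \"size\": 100}}}"
                + "}";

        URL url = new URL("http://localhost:9200/users/_search");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        conn.getOutputStream().write(body.getBytes(StandardCharsets.UTF_8));

        try (InputStream in = conn.getInputStream()) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```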

What is omitNorms and version field in solr schema?

Submitted by 坚强是说给别人听的谎言 on 2020-01-01 07:33:48
Question: I don't understand when to use omitNorms="true". I have read a few links but I am still not clear on its meaning. What does this mean: "Set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms." from the http://wiki.apache.org/solr/SchemaXml page? Answer 1: Norms are stored as a single byte per field per document in the index.
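
For readers coming from the Lucene side, the Solr attribute maps onto FieldType.setOmitNorms. A minimal sketch (assuming Lucene 5.x or later; the field name "title" is just an example) of what gets switched off when norms are omitted:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;

public class OmitNormsExample {
    public static Document build() {
        FieldType type = new FieldType();
        type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
        type.setTokenized(true);
        // Equivalent of omitNorms="true" in schema.xml: no length normalization
        // and no index-time boost for this field, saving one byte per document.
        type.setOmitNorms(true);
        type.freeze();

        Document doc = new Document();
        doc.add(new Field("title", "What is omitNorms", type));
        return doc;
    }
}
```

Fields used only for exact matching or filtering (IDs, flags, dates) gain nothing from norms, which is why the wiki text recommends keeping norms only for full-text fields or fields that need an index-time boost.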

Lucene indexing and searching at the same time

Submitted by 六月ゝ 毕业季﹏ on 2020-01-01 05:29:22
Question: I want to search a Lucene index. The index changes frequently, so I need a way to search and index at the same time. It's a web application running on Tomcat, and I want to use RAMDirectory to increase search speed. I don't know how to do it! Answer 1: NRTManager in the misc Lucene package provides the ability to search and index at the same time. TrackingIndexWriter writer; // your writer SearcherFactory factory = new SearcherFactory(); NRTManager mgr = new NRTManager(writer
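
NRTManager was later folded into SearcherManager and ControlledRealTimeReopenThread, so on a current Lucene the same near-real-time pattern looks roughly like the sketch below. This is an assumption-laden sketch, not the original answer: it assumes Lucene 8+ (ByteBuffersDirectory is the in-memory successor of RAMDirectory) and a single IndexWriter shared between the indexing and search sides.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class NearRealTimeSearch {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory(); // in-memory directory
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

        // One SearcherManager shared by all search threads; the writer keeps indexing.
        SearcherManager manager = new SearcherManager(writer, new SearcherFactory());

        // ... add documents with writer.addDocument(...) from the indexing thread ...

        manager.maybeRefresh();              // pick up recent, not-yet-committed changes
        IndexSearcher searcher = manager.acquire();
        try {
            // run queries with searcher
        } finally {
            manager.release(searcher);       // always release, never close directly
        }
    }
}
```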

How do we create a simple search engine using Lucene, Solr or Nutch?

Submitted by 自作多情 on 2020-01-01 05:07:06
Question: Our company has thousands of PDF documents. How do we create a simple search engine using Lucene, Solr or Nutch? We'll provide a basic Java/JSP web page where people can type in words, run basic AND/OR queries, and see links to all matching PDFs. Answer 1: None of the projects in the Lucene family can natively process PDFs, but there are utilities you can drop in and well-written examples on how to roll your own. Lucene will do pretty much whatever you need it to do, but
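
The excerpt is cut off, but a common "drop-in" for the PDF step is Apache Tika for text extraction, with plain Lucene for indexing. A rough sketch under that assumption (the directory paths, field names and analyzer choice are placeholders, not part of the original answer; it assumes Lucene 5+ and Tika's simple facade API):

```java
import java.io.File;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.tika.Tika;

public class PdfIndexer {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika(); // extracts plain text from PDFs (and many other formats)
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("/tmp/pdf-index")),
                new IndexWriterConfig(new StandardAnalyzer()))) {

            for (File pdf : new File("/data/pdfs").listFiles((d, n) -> n.endsWith(".pdf"))) {
                Document doc = new Document();
                doc.add(new StringField("path", pdf.getAbsolutePath(), Store.YES));
                doc.add(new TextField("content", tika.parseToString(pdf), Store.NO));
                writer.addDocument(doc);
            }
        }
    }
}
```

The JSP page then only needs to parse the user's query against the "content" field and render the stored "path" values of the hits as links.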

Boost factor in MultiFieldQueryParser

Submitted by 不问归期 on 2020-01-01 04:53:10
Question: Can I boost different fields in MultiFieldQueryParser with different factors? Also, what is the maximum boost factor value I can assign to a field? Thanks a ton! Ed Answer 1: MultiFieldQueryParser has a constructor that accepts a map of boosts. You use it with something like this: String[] fields = new String[] { "title", "keywords", "text" }; HashMap<String,Float> boosts = new HashMap<String,Float>(); boosts.put("title", 10f); boosts.put("keywords", 5f); MultiFieldQueryParser queryParser = new
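
A complete version of that snippet, with the float literals the boosts map requires. This is a sketch assuming Lucene 5+ (where the parser no longer takes a Version argument); the field names are the ones from the answer.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.Query;

public class BoostedMultiFieldQuery {
    public static Query build(String userInput) throws Exception {
        String[] fields = {"title", "keywords", "text"};

        Map<String, Float> boosts = new HashMap<>();
        boosts.put("title", 10f);    // matches in the title count the most
        boosts.put("keywords", 5f);
        boosts.put("text", 1f);

        MultiFieldQueryParser parser =
                new MultiFieldQueryParser(fields, new StandardAnalyzer(), boosts);
        return parser.parse(userInput);
    }
}
```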

Searching on date ranges with Lucene in Java?

Submitted by 时光怂恿深爱的人放手 on 2020-01-01 04:35:08
Question: Is it possible to search over date ranges using Lucene in Java? How do I build Lucene search queries based on date fields and date ranges? For example: between two specified dates, prior to a specified date, after a specified date, within the last 24 hours, within the past week, within the past month. [Edit] I'm using Lucene 2.4.1 and my system is quite legacy and poorly tested, so if possible I would prefer not to upgrade. Answer 1: Lucene (before version 2.9 anyway) only stores String values,
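
Since the answer's point is that old Lucene stores dates as strings, the usual trick is to index dates in a sortable text form and run a term range over that field. A sketch assuming a recent Lucene (6+) rather than the 2.4.1 mentioned in the question; the field name "created" is illustrative, and documents must have been indexed with the same DateTools format.

```java
import java.util.Date;

import org.apache.lucene.document.DateTools;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermRangeQuery;

public class DateRangeQueries {
    // e.g. "within the last 24 hours"
    public static Query lastDay() {
        Date now = new Date();
        Date dayAgo = new Date(now.getTime() - 24L * 60 * 60 * 1000);

        // DateTools renders dates as sortable strings like "20200101091431"
        String lower = DateTools.dateToString(dayAgo, DateTools.Resolution.SECOND);
        String upper = DateTools.dateToString(now, DateTools.Resolution.SECOND);

        // Inclusive on both ends; pass null as lower or upper to get the
        // open-ended "after"/"before" variants from the question.
        return TermRangeQuery.newStringRange("created", lower, upper, true, true);
    }
}
```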

Build a Kibana Histogram with buckets dynamically created by ElasticSearch terms aggregation

Submitted by 无人久伴 on 2020-01-01 04:01:08
Question: I want to combine the functionality of the Kibana Terms graph (creating buckets based on the unique values of a particular attribute) and the Histogram graph (separating data into buckets based on queries and then plotting the data over time). Overall, I want to create a Histogram, but I only want to build it from the results of a single query, not multiple queries like in the Kibana demo app. Instead, I want each bucket to be dynamically
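
Under the hood this maps to an Elasticsearch date_histogram with a terms sub-aggregation: one bucket per time interval, then one sub-bucket per distinct value, without listing each value as a separate query. The sketch below only builds the request body; the index pattern, the "service" field and the "1h" interval are assumptions (newer Elasticsearch versions use calendar_interval/fixed_interval instead of interval), and the body would be POSTed to _search as in the earlier HTTP example.

```java
public class HistogramByTermBody {
    public static void main(String[] args) {
        // date_histogram split further by a terms sub-aggregation.
        String body = "{"
                + "\"size\": 0,"
                + "\"aggs\": {"
                + "  \"over_time\": {"
                + "    \"date_histogram\": {\"field\": \"@timestamp\", \"interval\": \"1h\"},"
                + "    \"aggs\": {"
                + "      \"by_term\": {\"terms\": {\"field\": \"service\", \"size\": 10}}"
                + "    }"
                + "  }"
                + "}"
                + "}";
        System.out.println(body); // e.g. POST to http://localhost:9200/logstash-*/_search
    }
}
```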

Solr associations

Submitted by 五迷三道 on 2020-01-01 03:14:09
Question: For the last couple of days we have been thinking of using Solr as our search engine of choice. Most of the features we need are available out of the box or can be easily configured. There is, however, one feature that we absolutely need which seems to be well hidden (or missing) in Solr. I'll try to explain with an example. We have lots of documents that are actually businesses: <document> <name>Apache</name> <cat>1</cat> ... </document> <document> <name>McDonalds</name> <cat>2</cat> ... </document> In addition we

Using Solr for indexing multiple languages

Submitted by 谁都会走 on 2020-01-01 02:38:30
Question: We're setting up Solr to index documents whose title field can be in various languages. After googling, I found two options: (1) define different schema fields for every language, i.e. title_en, title_fr, ..., apply different filters to each language, and query the title field that corresponds to the query language; (2) create different Solr cores to handle each language and have our app query the correct core. Which one is better? What are the ups and downs? Thanks Answer 1: There's also a third
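
For option (1), one field per language, the indexing side with SolrJ would look roughly like this. A sketch only: it assumes SolrJ 7/8, a core named "docs" on localhost:8983, and that the schema already defines title_en and title_fr with the appropriate language analyzers.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class MultiLanguageIndexing {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "42");
            // Route each title into the field whose analyzer matches its language.
            doc.addField("title_en", "A quick introduction to Solr");
            doc.addField("title_fr", "Une introduction rapide à Solr");
            solr.add(doc);
            solr.commit();
        }
        // Query side: pick the field matching the user's language,
        // e.g. q=title_fr:introduction
    }
}
```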