lucene

Searching over documents stored in Hadoop - which tool to use?

前提是你 submitted on 2019-12-18 12:38:22
Question: I'm lost in: Hadoop, Hbase, Lucene, Carrot2, Cloudera, Tika, ZooKeeper, Solr, Katta, Cascading, POI... When you read about one of them, you can often be sure that each of the others is going to be mentioned. I don't expect you to explain every tool to me, of course. If you could help me narrow this set down for my particular scenario, that would be great. So far I'm not sure which of the above will fit, and it looks like (as always) there is more than one way of doing what needs to be done. The

Lucene Proximity Search for phrase with more than two words

痞子三分冷 submitted on 2019-12-18 12:01:23
Question: Lucene's manual clearly explains the meaning of a proximity search for a two-word phrase, such as the "jakarta apache"~10 example in http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Proximity Searches. However, I am wondering what exactly a search like "jakarta apache lucene"~10 does. Does it allow neighboring words to be at most 10 words apart, or does that limit apply to all pairs of words? Thanks!
Answer 1: The slop (proximity) works like an edit distance (see PhraseQuery.setSlop). So,
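To make the "edit distance" idea concrete, here is a simplified model, assumed (not verified line by line) to mirror Lucene's sloppy phrase matching for phrases without repeated terms: each term's position in the document is shifted back by its offset within the query phrase, and the document matches if the spread between the largest and smallest shifted position is at most the slop. Under this model the limit applies to the total displacement across the whole phrase, not to each neighboring pair separately. The class and method names are hypothetical; this is plain Java, not Lucene code.

```java
/** Simplified, hypothetical model of sloppy phrase matching (not Lucene's own classes). */
public class SlopDemo {

    /**
     * positions[i] is the document position where the i-th query term was found.
     * Shifting each position back by its offset i in the phrase makes an exact
     * phrase match produce identical values; the spread between the largest and
     * smallest shifted value is the number of "moves" needed to form the phrase.
     */
    static int requiredSlop(int[] positions) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < positions.length; i++) {
            int shifted = positions[i] - i;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    /** True if the terms at these positions satisfy a phrase query with the given slop. */
    static boolean matches(int[] positions, int slop) {
        return requiredSlop(positions) <= slop;
    }

    public static void main(String[] args) {
        // "jakarta apache lucene" found at positions 0, 5 and 12
        int[] spread = {0, 5, 12};
        System.out.println(requiredSlop(spread)); // 10
        System.out.println(matches(spread, 10));  // true

        // Each adjacent pair is only 6 or 7 positions apart, but the total is what counts
        int[] wide = {0, 6, 13};
        System.out.println(requiredSlop(wide));   // 11
        System.out.println(matches(wide, 10));    // false

        // Query "jakarta apache" against text "apache jakarta" (order swapped)
        System.out.println(requiredSlop(new int[]{1, 0})); // 2
    }
}
```

The last case matches the commonly cited behavior that swapping two words costs a slop of 2, which is why the model is described as edit-distance-like rather than as a per-pair gap limit.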

Find list of terms indexed by Lucene

若如初见. submitted on 2019-12-18 12:00:32
Question: Is it possible to extract the list of all the terms in a Lucene index as a list of strings? I couldn't find that functionality in the docs. Thanks!
Answer 1: Lucene 3:
C#: "C# Lucene get all the index"
Java:
IndexReader indexReader = IndexReader.open(path); // in Lucene 3, path must be a Directory (e.g. from FSDirectory.open)
TermEnum termEnum = indexReader.terms();
// terms() is positioned before the first term, so next() must be called before term()
while (termEnum.next()) {
    Term term = termEnum.term();
    System.out.println(term.text());
}
termEnum.close();
indexReader.close();
Java (all terms for a specific field): How can I get the list of unique terms
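For the "specific field" case, the idea (as we understand the Lucene 3 API) is that terms are ordered by field first and then by text, and IndexReader.terms(Term) positions the enumeration at the first term at or after the given one. So you can seek to a term with that field and empty text, then stop as soon as the field changes. The toy model below mimics only that ordering and stopping rule with a TreeSet; the Term record and TermDictionaryDemo class are hypothetical stand-ins, not Lucene classes (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy term dictionary ordered by field, then text (hypothetical, not the Lucene API). */
public class TermDictionaryDemo {

    record Term(String field, String text) implements Comparable<Term> {
        public int compareTo(Term o) {
            int c = field.compareTo(o.field);
            return c != 0 ? c : text.compareTo(o.text);
        }
    }

    private final TreeSet<Term> terms = new TreeSet<>();

    void add(String field, String text) {
        terms.add(new Term(field, text)); // duplicates collapse: each term is stored once
    }

    /** Seeks to (field, "") since "" sorts first, then stops when the field changes. */
    List<String> termsForField(String field) {
        List<String> out = new ArrayList<>();
        for (Term t : terms.tailSet(new Term(field, ""), true)) {
            if (!t.field().equals(field)) {
                break;
            }
            out.add(t.text());
        }
        return out;
    }

    public static void main(String[] args) {
        TermDictionaryDemo dict = new TermDictionaryDemo();
        dict.add("title", "lucene");
        dict.add("body", "search");
        dict.add("title", "hadoop");
        dict.add("body", "index");
        dict.add("title", "lucene");
        System.out.println(dict.termsForField("title")); // [hadoop, lucene]
    }
}
```

Because the terms are sorted, a field-scoped listing never has to scan terms of other fields beyond the first one after the boundary.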

Lucene.NET - sorting by int

十年热恋 submitted on 2019-12-18 11:56:29
Question: In the latest version of Lucene (or Lucene.NET), what is the proper way to get search results back in sorted order? I have a document like this:
var document = new Lucene.Document();
document.AddField("Text", "foobar");
document.AddField("CreationDate", DateTime.Now.Ticks.ToString()); // store the date as an int
indexWriter.AddDocument(document);
Now I want to do a search and get my results back with the most recent first. How can I do a search that orders results by CreationDate? All the
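One pitfall in the question's approach: CreationDate is indexed as a string, and strings sort character by character, so numbers only sort correctly as text while they all have the same number of digits. Sorting on a numeric field type (if your Lucene or Lucene.NET version provides one) avoids this; if the value stays a string, a fixed-width, zero-padded form keeps text order equal to numeric order. The sketch below is plain Java, independent of Lucene, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Shows why numbers indexed as text need fixed-width padding to sort correctly. */
public class TickSortDemo {

    /** Left-pads a non-negative long with zeros to 19 digits (the width of Long.MAX_VALUE). */
    static String pad(long value) {
        return String.format("%019d", value);
    }

    /** Sorts values the way a plain string sort would: character by character. */
    static List<String> sortedAsText(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("95", "1000", "900");
        System.out.println(sortedAsText(raw)); // [1000, 900, 95] (not numeric order)

        List<String> padded = new ArrayList<>();
        for (String s : raw) {
            padded.add(pad(Long.parseLong(s)));
        }
        List<Long> restored = new ArrayList<>();
        for (String s : sortedAsText(padded)) {
            restored.add(Long.parseLong(s));
        }
        System.out.println(restored); // [95, 900, 1000]
    }
}
```

For "most recent first", the same padded values would simply be sorted in reverse order.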

MongoDB full text search vs Lucene? [closed]

房东的猫 submitted on 2019-12-18 11:34:51
Question: Closed. This question is opinion-based. It is not currently accepting answers. Closed 5 years ago.
How does MongoDB's full-text search compare to Lucene at present? The reason for the question is my indecision about a) using Mongo's FTS implementation in production, since it was still in beta around 6 months ago, and b) the fact that Lucene uses Java, which will

How to normalize Lucene scores?

不羁的心 submitted on 2019-12-18 11:15:07
Question: I need to normalize Lucene scores to the range 0 to 1. For example, a random query returns the following scores: 8.864665, 2.792687, 2.792687, 2.792687, 2.792687, 0.49009037, 0.33730242, 0.33730242, 0.33730242, 0.33730242. What's the biggest possible score? 10.0? Thanks.
Answer 1: You can divide all scores by the maximum score to get scores between 0 and 1. However, please note that the normalised scores should be used to compare the results of a single query only. It is not correct to compare the scores
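Dividing by the maximum, as the answer suggests, can be sketched in plain Java as follows. It assumes the raw scores of one query have already been collected into an array; the class and method names are hypothetical, and the guard for a non-positive maximum is a design choice of this sketch.

```java
import java.util.Arrays;

/** Normalizes the scores of a single query's results into the range 0..1. */
public class ScoreNormalizer {

    /** Divides every score by the highest one, so the top hit becomes 1.0. */
    static float[] normalize(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] out = new float[scores.length];
        if (max <= 0f) {
            return out; // no positive scores: leave everything at 0 instead of dividing by 0
        }
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / max;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] scores = {8.864665f, 2.792687f, 0.49009037f, 0.33730242f};
        System.out.println(Arrays.toString(normalize(scores)));
    }
}
```

The top result always maps to exactly 1.0, which is also why these values say nothing about how good the best hit is compared with the best hit of a different query.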

How do I index and search text files in Lucene 3.0.2?

北城以北 submitted on 2019-12-18 10:56:15
Question: I am a newbie in Lucene, and I'm having some problems creating simple code to query a collection of text files. I tried this example, but it is incompatible with the new version of Lucene. UPDATE: This is my new code, but it still doesn't work.
Answer 1: Lucene is quite a big topic, with a lot of classes and methods to cover, and you normally cannot use it without understanding at least some basic concepts. If you need a quickly available service, use Solr instead. If you need full control of Lucene,
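The basic concepts the answer refers to include analysis (turning text into terms), an inverted index (mapping each term to the documents that contain it), and queries evaluated against that index. The toy below illustrates only those ideas in plain Java: it uses no Lucene classes, replaces files with in-memory strings, and its crude lowercase-and-split "analyzer" is an assumption standing in for a real Analyzer. All names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index over named text documents (not Lucene code). */
public class ToyTextIndex {

    /** term -> names of documents containing it, kept sorted */
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Crude stand-in for an Analyzer: lowercase, split on anything but letters and digits. */
    static String[] analyze(String text) {
        return text.toLowerCase().split("[^\\p{L}\\p{N}]+");
    }

    void addDocument(String name, String contents) {
        for (String term : analyze(contents)) {
            if (!term.isEmpty()) { // split() can yield a leading empty string
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(name);
            }
        }
    }

    /** Single-term query: returns the documents containing the (lowercased) word. */
    Set<String> search(String word) {
        Set<String> hits = postings.get(word.toLowerCase());
        return hits == null ? Collections.emptySet() : Collections.unmodifiableSet(hits);
    }

    public static void main(String[] args) {
        ToyTextIndex index = new ToyTextIndex();
        index.addDocument("a.txt", "Lucene in Action");
        index.addDocument("b.txt", "Hadoop and Lucene, together.");
        System.out.println(index.search("lucene")); // [a.txt, b.txt]
        System.out.println(index.search("Hadoop")); // [b.txt]
    }
}
```

In real Lucene the same roles are played by the Analyzer, the index built by IndexWriter, and queries run through a searcher, with scoring and on-disk storage on top.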

Paging Lucene's search results

一个人想着一个人 submitted on 2019-12-18 10:44:26
Question: I am using Lucene to show search results in a web application. I am also using custom paging to display them. A search can return anywhere from 5,000 to 10,000 results or more. Can someone please tell me the best strategy for paging and caching the search results?
Answer 1: I would recommend that you don't cache the results, at least not at the application level. Running Lucene on a box with lots of memory that the operating system can use for its file cache will help, though. Just repeat the search with a different
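Re-running the search per page means asking for enough top hits to cover the requested page and then slicing that page out. In Lucene 3 this would correspond, as we understand it, to calling searcher.search(query, n) and reading the returned TopDocs.scoreDocs from an offset. The sketch below shows only the arithmetic; fakeSearch is a hypothetical stand-in for the real search, pages are 0-based by assumption, and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Page-by-re-search arithmetic (hypothetical helper names, not Lucene API). */
public class PagingDemo {

    /** Number of top hits to request so that the given 0-based page is covered. */
    static int hitsToRequest(int page, int pageSize) {
        return (page + 1) * pageSize;
    }

    /** Slices one page out of the ranked hits returned by a fresh search. */
    static <T> List<T> pageOf(List<T> topHits, int page, int pageSize) {
        int from = page * pageSize;
        if (from >= topHits.size()) {
            return Collections.emptyList();
        }
        int to = Math.min(topHits.size(), from + pageSize);
        return topHits.subList(from, to);
    }

    /** Hypothetical search: returns at most n ranked ids out of 25 matches (0..24). */
    static List<Integer> fakeSearch(int n) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < Math.min(n, 25); id++) {
            hits.add(id);
        }
        return hits;
    }

    public static void main(String[] args) {
        int page = 2;
        int pageSize = 10;
        List<Integer> hits = fakeSearch(hitsToRequest(page, pageSize));
        System.out.println(pageOf(hits, page, pageSize)); // [20, 21, 22, 23, 24]
    }
}
```

The cost of each request grows with page depth, since the top (page + 1) * pageSize hits must be collected; for typical browsing, where users rarely go past the first few pages, that is usually cheaper than keeping thousands of results cached per user.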