Lucene

Getting the Doc ID in Lucene

Submitted by 房东的猫 on 2019-12-31 01:56:06
Question: In Lucene, I can do the following: doc.GetField("mycustomfield").StringValue(); This retrieves the value of a field in an index's document. My question: for the same doc, is there a way to get the doc ID? Luke displays it, so there must be a way to figure it out. I need it to delete documents on updates. I scoured the docs but have not found the term to use in GetField, or whether there is another method. Answer 1: Turns out you have to do this: var hits = searcher.Search(query); var …
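The answer is cut off above, but the pattern it starts is the Lucene 2.x Hits API, where the internal document ID comes from the search results rather than from the Document itself. A minimal sketch in Java (the question uses Lucene.Net, whose API mirrors this; the field name is the question's own):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class DocIdExample {
        // The internal doc ID is exposed by the Hits object, not by Document.
        static void showDocIds(IndexSearcher searcher, Query query) throws Exception {
            Hits hits = searcher.search(query);
            for (int i = 0; i < hits.length(); i++) {
                int docId = hits.id(i);       // internal document ID (what Luke displays)
                Document doc = hits.doc(i);   // the stored fields, as in the question
                System.out.println(docId + " -> " + doc.get("mycustomfield"));
            }
        }
    }

Note that internal IDs are not stable across segment merges, so for delete-on-update the usual approach is to index a unique key field and call IndexWriter.updateDocument(new Term("key", value), doc) rather than deleting by ID.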

Elasticsearch query time boosting produces result in inadequate order

Submitted by 混江龙づ霸主 on 2019-12-30 14:43:28
Question: The ES search results for the search keywords one two three seem to come back in the wrong order after applying a per-keyword boost. Please help me modify my "faulty" query so that it produces the "expected result" I describe below. I'm on ES 1.7.4 with Lucene 4.10.4. Boosting criteria (three is regarded as the most important keyword):

    word    boost
    -----   -----
    one         1
    two         2
    three       3

ES index content (just showing a MySQL dump to keep the post short): mysql> SELECT id, title FROM post; +----+--------- …
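One common way to express this kind of per-keyword weighting on ES 1.x is a bool query with one should clause per keyword, each carrying its own boost. A sketch using the ES 1.7 Java API (the title field is an assumption from the dump above; this illustrates the pattern, not necessarily the poster's exact fix):

    import org.elasticsearch.index.query.BoolQueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;

    public class BoostedKeywords {
        // Each keyword becomes an optional clause with its own boost,
        // so matches on "three" are favored over "two" over "one".
        static BoolQueryBuilder build() {
            return QueryBuilders.boolQuery()
                    .should(QueryBuilders.matchQuery("title", "one").boost(1f))
                    .should(QueryBuilders.matchQuery("title", "two").boost(2f))
                    .should(QueryBuilders.matchQuery("title", "three").boost(3f));
        }
    }

Keep in mind that under Lucene 4.x scoring, boosts interact with TF/IDF, field-length norms, and the coord factor, so a boost of 3 is a relative preference rather than a guaranteed 3x multiplier; that interaction is often why boosted results come back in an unexpected order.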

Best approach for doing full-text search with list-of-integers documents

Submitted by 时光怂恿深爱的人放手 on 2019-12-30 13:31:57
Question: I'm working on a C++/Qt image retrieval system based on similarity that works as follows (I'll try to avoid irrelevant or off-topic details): I take a collection of images and build an index from them using OpenCV functions. After that, for each image, I get a list of integer values representing the important "classes" that each image belongs to. The more integers two images have in common, the more similar they are believed to be. So, when I want to query the system, I just have to compute the …
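Although the question is cut off, the stated similarity measure (count of shared integers) maps naturally onto a full-text engine: encode each class ID as a term, and let optional clauses score by overlap. A sketch against the older Lucene Java API (the field name classes is illustrative):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    public class ClassOverlapQuery {
        // One optional clause per class ID: images sharing more class IDs
        // with the query image match more clauses and therefore score higher.
        static BooleanQuery build(int[] classIds) {
            BooleanQuery query = new BooleanQuery();
            for (int id : classIds) {
                query.add(new TermQuery(new Term("classes", Integer.toString(id))),
                          BooleanClause.Occur.SHOULD);
            }
            return query;
        }
    }

At indexing time the same encoding applies: store each image's class list as a whitespace-separated string of IDs in the classes field.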

Why does Lucene cause OOM when indexing large files?

Submitted by 江枫思渺然 on 2019-12-30 09:42:32
Question: I'm working with Lucene 2.4.0 and the JVM (JDK 1.6.0_07). I'm consistently receiving OutOfMemoryError: Java heap space when trying to index large text files. Example 1: indexing a 5 MB text file runs out of memory with a 64 MB max heap size, so I increased the max heap size to 512 MB. This worked for the 5 MB text file, but Lucene still used 84 MB of heap space to do it. Why so much? The class FreqProxTermsWriterPerField appears to be the biggest memory consumer by far according to …
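FreqProxTermsWriterPerField is part of Lucene's in-memory postings buffer, which holds inverted terms until the writer flushes a segment to disk. A sketch of the usual knobs on a Lucene 2.4 IndexWriter (the values are illustrative, not a recommendation for this exact workload):

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class BoundedIndexing {
        // Caps the RAM the indexing buffer may use before flushing a segment;
        // left uncapped, large inputs keep all their postings in memory at once.
        static IndexWriter open(File indexDir) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.getDirectory(indexDir),
                    new StandardAnalyzer(),
                    true,
                    IndexWriter.MaxFieldLength.UNLIMITED);
            writer.setRAMBufferSizeMB(32.0);  // flush after ~32 MB of buffered postings
            return writer;
        }
    }

Note that setRAMBufferSizeMB bounds the buffer across documents, but the postings of a single huge document still have to fit in memory while it is being inverted; if that remains a problem, IndexWriter.setMaxFieldLength can cap the number of terms indexed per document, at the cost of truncation.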

StandardAnalyzer with stemming

Submitted by 半腔热情 on 2019-12-30 07:25:17
Question: Is there a way to integrate PorterStemFilter into StandardAnalyzer in Lucene, or do I have to copy/paste StandardAnalyzer's source code and add the filter, since StandardAnalyzer is defined as a final class? Is there any smarter way? Also, if I would like not to consider numbers, how can I achieve that? Thanks. Answer 1: If you want to use this combination for English text analysis, then you should use Lucene's EnglishAnalyzer. Otherwise, you could create a new Analyzer that extends the …
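The truncated answer is pointing at the standard workaround: there is no need to touch StandardAnalyzer's source, because a custom Analyzer can rebuild the same tokenizer/filter chain and append PorterStemFilter. A sketch against the Lucene 5.x/6.x API (package names and the createComponents signature shift between major versions):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.en.PorterStemFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;

    public class StemmingAnalyzer extends Analyzer {
        // Rebuilds StandardAnalyzer's pipeline and appends the stemmer;
        // extending Analyzer sidesteps StandardAnalyzer being final.
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer source = new StandardTokenizer();
            TokenStream result = new LowerCaseFilter(source);
            result = new PorterStemFilter(result);
            return new TokenStreamComponents(source, result);
        }
    }

For dropping numbers, one option is to insert a TypeTokenFilter into the chain that rejects tokens StandardTokenizer tags with the <NUM> type, though the exact token types emitted depend on the tokenizer version.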

Search queries in neo4j: how to sort results in neo4j in START query with internal TFIDF / levenshtein or other algorithms?

Submitted by 北城余情 on 2019-12-30 05:32:05
Question: I am working on a model using Wikipedia topic names for my experiments in full-text indexing. I set up an index on 'topic' (legacy), and do a full-text search for 'united states': start n=node:topic('name:(united states)') return n The first results are not relevant at all: 'List of United States National Historic Landmarks in United States commonwealths and territories, associated states, and foreign states' [...] and the actual 'united states' entry is buried deep down the list. As such, it …
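Since a legacy index lookup hands that string to Lucene's query parser, plain Lucene syntax is available inside it. One sketch (not necessarily the fix the thread settled on) boosts an exact phrase match while keeping the loose term match as a fallback:

    START n = node:topic('name:"united states"^2 OR name:(united states)')
    RETURN n

The phrase and ^-boost syntax here is standard Lucene; the field name follows the question's own index. Whether this reorders results as desired still depends on how the legacy index scores matches.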

Timing out a query in Solr

Submitted by 帅比萌擦擦* on 2019-12-30 02:18:15
Question: I am hitting queries to Solr through a custom-developed layer, and some queries that I time out in my layer are still running in the Solr instance. Is there a parameter in Solr that can be used to time out a particular query? Answer 1: As stated in "Solr query continues after client disconnects?" and written in the Solr FAQ: internally, Solr does nothing to time out any requests -- it lets both updates and queries take however long they need to take to be processed fully. But at the same spot in the FAQ is …
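The FAQ passage the truncated answer is heading toward is presumably Solr's timeAllowed request parameter, which caps (in milliseconds) how long the main stages of a search may run and returns partial results rather than letting the query run on. A sketch (host, core, and query are illustrative):

    http://localhost:8983/solr/mycore/select?q=title:lucene&timeAllowed=1000

Responses that hit the limit are flagged with partialResults=true in the response header; timeAllowed bounds the query-processing phases, so other stages (e.g., writing out the response) can still run past it.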
