lucene | 易学教程

How can you boost documents by recency in RavenDB?

Question: Is it possible to boost recent documents in a RavenDB query? This question is exactly what I want to do, but it refers to native Lucene, not RavenDB. For example, if I have a document like this: public class Document { public string Title { get; set; } public DateTime DateCreated { get; set; } } how can I boost documents whose dates are closer to a given date, e.g. DateTime.UtcNow? I do not want to OrderByDescending(x => x.DateCreated), as there are other search parameters that need to affect the

Too many fields bad for elasticsearch index?

Question: Let's say I have a thousand keys and want to store the associated values. The intuitive approach seems to be something like { "key1":"someval", "key2":"someotherval", ... }. Is it a bad design pattern for an Elasticsearch index to have thousands of keys? Would each key introduced this way create overhead for every document under the index? Answer 1: If you know there is an upper limit to the number of keys you'll have, a few thousand fields is not a problem. The problem is when you have an

Indexing multilingual words in lucene

Question: I am trying to index in Lucene a field that could hold RDF literals in different languages. Most of the approaches I have seen so far are: use a single index, where each document has one field for each language it uses, or use M indexes, M being the number of languages in the corpus. Lucene 2.9+ has a feature called Payload that allows attaching attributes to terms. Has anyone used this mechanism to store language (or other attributes such as datatypes) information? How is performance compared to

How to find related items by tags in Lucene.NET

Question: My indexed documents have a field containing a pipe-delimited set of ids: a845497737704e8ab439dd410e7f1328| 0a2d7192f75148cca89b6df58fcf2e54| 204fce58c936434598f7bd7eccf11771 (ignore line breaks). This field represents a list of tags. The list may contain 0 to n tag ids. When users of my site view a particular document, I want to display a list of related documents. This list of related documents must be determined by tags: only documents with at least one matching tag should appear in the
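
The "at least one matching tag" rule described in the question can be illustrated outside of Lucene.NET. The following is only a minimal sketch in plain Java, not the Lucene.NET API and not an answer taken from the original thread; the class name `TagOverlap` and its methods are hypothetical. It splits the pipe-delimited field into ids and counts how many ids two documents share, so a count of 0 means "not related".

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene or Lucene.NET.
final class TagOverlap {
    /** Splits a pipe-delimited tag field into a set of trimmed, non-empty ids. */
    static Set<String> parseTags(String field) {
        Set<String> tags = new HashSet<>();
        if (field == null) {
            return tags;
        }
        for (String part : field.split("\\|")) {
            String id = part.trim();
            if (!id.isEmpty()) {
                tags.add(id);
            }
        }
        return tags;
    }

    /** Number of tag ids the two fields have in common; 0 means unrelated. */
    static int sharedTags(String fieldA, String fieldB) {
        Set<String> common = parseTags(fieldA);
        common.retainAll(parseTags(fieldB));
        return common.size();
    }
}
```

Sorting candidate documents by `sharedTags` in descending order and dropping those with a count of 0 would give one possible "related documents" list under these assumptions.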

Adding, deleting, and updating in a Lucene index

Adding to the index:
IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
IndexWriter writer = new IndexWriter(directory, config);
Document document = new Document();
Field field = new TextField("zzz", "444", Field.Store.YES);
document.add(field);
writer.addDocument(document);
writer.close();

Deleting from the index:
IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
IndexWriter writer = new IndexWriter(directory, config);
writer.deleteAll(); // delete all documents
// intended to delete everything in the zzz field (as written, new Term("zzz") has empty term text)
writer.deleteDocuments(new Term("zzz"));
// delete the documents whose zzz field contains the term 444
writer.deleteDocuments(new Term("zzz", "444"));
// an update is essentially a delete followed by an add
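
The closing remark, that an update is a delete followed by an add, can be modeled with a toy in-memory index. This is a sketch in plain Java under stated assumptions: `TinyIndex` and all of its methods are hypothetical stand-ins and not Lucene classes. In Lucene itself, `IndexWriter.updateDocument(Term, ...)` performs the delete-by-term and the add in a single call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy index; each document is a map of field name -> value.
final class TinyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Stores a copy of the document. */
    void addDocument(Map<String, String> doc) {
        docs.add(new LinkedHashMap<>(doc));
    }

    /** Removes every document whose field equals value; returns how many were removed. */
    int deleteDocuments(String field, String value) {
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field)));
        return before - docs.size();
    }

    /** Update modeled as delete-by-term followed by add. */
    void updateDocument(String field, String value, Map<String, String> replacement) {
        deleteDocuments(field, value);
        addDocument(replacement);
    }

    int size() {
        return docs.size();
    }

    /** True if any stored document has the given value in the given field. */
    boolean contains(String field, String value) {
        return docs.stream().anyMatch(d -> value.equals(d.get(field)));
    }
}
```

Because the replacement is added unconditionally, an update whose term matches nothing behaves like a plain add in this model.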

How to add analyzer settings in ElasticSearch?

Question: I am using ElasticSearch 1.5.2 and I wish to have the following settings: "settings": { "analysis": { "filter": { "filter_shingle": { "type": "shingle", "max_shingle_size": 2, "min_shingle_size": 2, "output_unigrams": false }, "filter_stemmer": { "type": "porter_stem", "language": "English" } }, "tokenizer": { "my_ngram_tokenizer": { "type": "nGram", "min_gram": 1, "max_gram": 1 } }, "analyzer": { "ShingleAnalyzer": { "tokenizer": "my_ngram_tokenizer", "filter": [ "standard", "lowercase",

Lucene.net Fuzzy Phrase Search

Question: I have tried this myself for a considerable period and looked everywhere around the net, but I have been unable to find ANY examples of fuzzy phrase searching via Lucene.NET 2.9.2 (C#). Is someone able to advise how to do this in detail and/or provide some example code? I would seriously appreciate any help, as I am totally stuck. Answer 1: I assume that you have Lucene running and have created a search index with some fields in it. So let's assume further that: var fields = ... // a

Best way to reuse a Runnable

Question: I have a class that implements Runnable and am currently using an Executor as my thread pool to run tasks (indexing documents into Lucene). executor.execute(new LuceneDocIndexer(doc, writer)); My issue is that my Runnable class creates many Lucene Field objects, and I would rather reuse them than create new ones on every call. What's the best way to reuse these objects (Field objects are not thread safe, so I cannot simply make them static)? Should I create my own ThreadFactory? I notice that
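
One common way to reuse objects that are not thread safe inside pooled tasks is to keep one instance per worker thread with `ThreadLocal`. The following is only a sketch of that general technique, not the answer from the original thread: `ReusableBuffers` is a hypothetical class, and a `StringBuilder` stands in for the Lucene Field objects mentioned in the question.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example; StringBuilder stands in for non-thread-safe Field objects.
final class ReusableBuffers {
    /** Counts how many buffers were ever created (for demonstration only). */
    static final AtomicInteger CREATED = new AtomicInteger();

    // One buffer per worker thread, created lazily on that thread's first use.
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> {
                CREATED.incrementAndGet();
                return new StringBuilder();
            });

    /** Task that reuses the calling thread's buffer instead of allocating a new one. */
    static Runnable indexTask(String text) {
        return () -> {
            StringBuilder sb = BUFFER.get();
            sb.setLength(0); // reset state left over from the previous task
            sb.append(text);
        };
    }

    /** Runs the tasks on a fixed pool and returns how many buffers were created. */
    static int runTasks(int tasks, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            executor.execute(indexTask("doc-" + i));
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return CREATED.get();
    }
}
```

With this layout the number of buffers is bounded by the number of pool threads rather than by the number of tasks, and no custom ThreadFactory is needed.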

Lucene / Hibernate Search Lock Exception

Question: I use Hibernate Search to index and full-text search items in a web application, without problems. From my pom.xml: <hibernate.search.version>3.4.2.Final</hibernate.search.version> <apache.lucene.version>3.6.2</apache.lucene.version> <apache.solr.version>3.6.2</apache.solr.version> <hibernate.version>3.6.9.Final</hibernate.version> Now, before going to production, I tried to stress test the search feature of my web application using Apache JMeter. When testing with more than one thread, I receive

How to disable default scoring/boosting in Hibernate Search/Lucene?

Question: I want to serve my users the most relevant and best results. For example, I'm rewarding records that have a big title, description, attached photos, etc. For context: the records are bicycle routes, having routepoints (coordinates) and metadata like photos, reviews, etc. Now, I have indexed these records using Hibernate, and I then search within the index using Lucene in Hibernate Search. To score my results, I build queries based on the document properties and boost them (using boostedTo())