lucene | 易学教程

Custom full-text index stored in Cassandra

Question: I've got a situation where I'm using Cassandra as the DB and I need full-text search capability. Now I'm aware of Apache Solr, Apache Cassandra, and DSE Search. However, I do not want to use costly, proprietary software (DSE Search). The reason I do not want to use Apache Solr is that I don't want to deal with HA, sharding, and redundancy for it. Cassandra is perfect for HA, sharding, and redundancy; I would like to store my full-text index in the existing Cassandra DB. So what I'm

Searching for UUID in lucene not working

Question: I've got a UUID field I'm adding to my document in the following format: 372d325c-e01b-432f-98bd-bc4c949f15b8. However, when I try to query for documents by the UUID, it will not return them no matter how I try to escape the expression. For example:

+uuid:372d325c-e01b-432f-98bd-bc4c949f15b8
+uuid:"372d325c-e01b-432f-98bd-bc4c949f15b8"
+uuid:372d325c\-e01b\-432f\-98bd\-bc4c949f15b8
+uuid:(372d325c-e01b-432f-98bd-bc4c949f15b8)
+uuid:("372d325c-e01b-432f-98bd-bc4c949f15b8")

And even skipping the

WordnetSynonymParser in Lucene

Question: I am new to Lucene and I'm trying to use WordnetSynonymParser to expand queries using the WordNet synonyms Prolog file. Here is what I have so far:

public class CustomAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // TODO Auto-generated method stub
        Tokenizer source = new ClassicTokenizer(Version.LUCENE_47, reader);
        TokenStream filter = new StandardFilter(Version.LUCENE_47, source);
        filter = new LowerCaseFilter(Version

Can I protect short words from an n-gram filter in Solr?

Question: I have seen this question about searching for short words in Solr. I am wondering if there is another possible solution to a similar problem. I am using the EdgeNGramFilter with a minGramSize of 3. I want to protect a specific set of shorter words (two-letter acronyms, mainly) from being ignored, but I'd like to keep that minGramSize of 3 for everything else. EdgeNGramFilter doesn't support a protected words list. Is there any filter or setting that makes this possible within a single field
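
The behaviour described in this question can be imitated in a few lines of plain Python. This is only a sketch of the idea, not Solr's implementation: the function names, the `max_gram` default, and the `protected` parameter are all hypothetical and do not correspond to any existing Solr option. It shows why a two-letter token yields no grams when the minimum gram size is 3, and what the "protected words" behaviour the asker wants would look like.

```python
def edge_ngrams(token, min_gram=3, max_gram=10):
    """Return the front-edge n-grams of a token, shortest first.

    A token shorter than min_gram produces an empty list, which is the
    "short words are ignored" effect described in the question.
    """
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, min_gram=3, max_gram=10, protected=frozenset()):
    """Lowercase, split on whitespace, and edge-n-gram every token.

    Tokens listed in `protected` (a hypothetical parameter, used here only
    to illustrate the desired behaviour) are emitted whole instead.
    """
    grams = []
    for token in text.lower().split():
        if token in protected:
            grams.append(token)
        else:
            grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


if __name__ == "__main__":
    print(analyze("UK search"))
    print(analyze("UK search", protected={"uk"}))
```

Without the protected set, the acronym disappears from the output entirely; with it, the acronym survives as a single term while longer words are still expanded from three characters upward.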

Lucene - searching for a numeric value field

Question: OK, I have searched for this for the past two hours, with results that only give tips and not even one complete code sample to the rescue (how would noobs learn if they can't see some samples?). I have created an index like so:

Directory directory = FSDirectory.Open(new System.IO.DirectoryInfo(Server.MapPath("/data/channels/")));
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
IndexWriter writer = new IndexWriter(directory, analyzer, true, Lucene.Net.Index.IndexWriter.MaxFieldLength

How do you configure Lucene in Sitecore to only index the latest version of an item on the master db?

Question: I recognise this is a moot point on the web database, so this question applies to the master db... I have a custom index set up in Sitecore 6.4.1 as follows:

<index id="search_content_US" type="Sitecore.Search.Index, Sitecore.Kernel">
  <param desc="name">$(id)</param>
  <param desc="folder">_search_content_US</param>
  <Analyzer ref="search/analyzer" />
  <locations hint="list:AddCrawler">
    <search_content_home type="Sitecore.Search.Crawlers.DatabaseCrawler, Sitecore.Kernel">
      <Database>master<

How does ElasticSearch and Lucene share the memory

Question: I have one question about the following quote from the official ES documentation: "But if you give all available memory to Elasticsearch’s heap, there won’t be any left over for Lucene. This can seriously impact the performance of full-text search." Suppose my server has 80G of memory and I issue the following command to start an ES node: bin/elasticsearch -xmx 30g. That means I give the ES process a maximum of 30G of memory. How can Lucene use the remaining 50G, since Lucene is running in the ES process, it's just part of the

Multilingual Search using lucene

Question: I am building a multilingual search and will use Lucene as the tool to do it. I already have the translated contents; there will be 3 or 4 languages for each document. For indexing and search, there are four possible strategies for each document's contents:

1. Each language is indexed in a different index/directory.
2. Each language is indexed in a different document but in the same index.
3. Each language is indexed in a different Field but in the same document.
4. All the languages are indexed in the same

Creating and updating Zend_Search_Lucene indexes

Question: I'm using Zend_Search_Lucene to create an index of articles so that they can be searched on my website. Whenever an administrator updates, creates, or deletes an article in the admin area, the index is rebuilt:

$config = Zend_Registry::get("config");
$cache = $config->lucene->cache;
$path = $cache . "/articles";
try {
    $index = Zend_Search_Lucene::open($path);
} catch (Zend_Search_Lucene_Exception $e) {
    $index = Zend_Search_Lucene::create($path);
}
$model = new Default_Model_Articles();
$select =

How to delete elastic search indices periodically?

Question: I create indices on a daily basis to store the search history, and I use those indices for the suggestions in my application, which lets me suggest based on history as well. Now I have to keep only the last 10 days of history. Is there any feature in Elasticsearch that allows me to create and delete indices periodically?

Answer 1: The only thing I can think of is using date math: https://www.elastic.co/guide/en/elasticsearch/reference/current/date-math-index-names.html In Sense you
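
The retention rule from the question (keep only the last 10 daily indices) can be sketched in plain Python, independently of how the deletion itself is carried out. This is a minimal sketch under stated assumptions, not the answerer's method: the `search-history-` prefix, the `YYYY.MM.DD` suffix, and both function names are hypothetical, and the code only computes which index names fall outside the window. It makes no call to Elasticsearch.

```python
from datetime import date, timedelta


def daily_index_name(day, prefix="search-history-"):
    """Build a daily index name such as 'search-history-2024.01.15'.

    The prefix and date layout are assumptions made for this example.
    """
    return f"{prefix}{day:%Y.%m.%d}"


def expired_indices(existing, today, keep_days=10, prefix="search-history-"):
    """Return the names in `existing` that fall outside the retention window.

    The window covers `today` and the keep_days - 1 days before it. Names
    that do not start with `prefix` are never reported.
    """
    keep = {
        daily_index_name(today - timedelta(days=offset), prefix)
        for offset in range(keep_days)
    }
    return sorted(
        name for name in existing
        if name.startswith(prefix) and name not in keep
    )


if __name__ == "__main__":
    today = date(2024, 1, 15)
    existing = [daily_index_name(today - timedelta(days=i)) for i in range(12)]
    for name in expired_indices(existing, today):
        print("would delete:", name)
```

A scheduled job could run this once a day and pass the resulting names to whatever delete mechanism is in use; filtering on the prefix keeps unrelated indices out of the candidate list.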