lucene | 易学教程

Word normalization using RDD

Question: Maybe this question is a little strange, but I'll try to ask it. Everyone who has written applications using the Lucene API has seen something like this:

    public static String removeStopWordsAndGetNorm(String text, String[] stopWords, Normalizer normalizer) throws IOException {
        TokenStream tokenStream = new ClassicTokenizer(Version.LUCENE_44, new StringReader(text));
        tokenStream = new StopFilter(Version.LUCENE_44, tokenStream, StopFilter.makeStopSet(Version.LUCENE_44, stopWords, true));

Profanity filtering in Solr

Question: I am serving some web data to my site using Apache Solr 4.10.3. I have to block profanity. How do I block profanity in search? I am also confused about where to deploy this filter: should I apply the profanity filter at document indexing time or at search time?

Answer 1: You have two possibilities:

1. Don't send the document to Solr in the first place (filter it in your code).
2. Implement a custom UpdateRequestProcessor: https://cwiki.apache.org/confluence/display/solr/Update+Request
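The first possibility in the answer (filtering in your own code before anything reaches Solr) could look roughly like the sketch below. It is plain Java with no Solr dependency. The class name `ProfanityFilter`, its method names, and the blocklist contents are hypothetical placeholders, and whole-word, case-insensitive matching is an assumption rather than something the answer specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Solr: rejects documents before they are indexed.
final class ProfanityFilter {
    private final Set<String> blockedWords = new HashSet<>();

    ProfanityFilter(Iterable<String> words) {
        // Store the blocklist in lower case so lookups are case-insensitive.
        for (String word : words) {
            blockedWords.add(word.toLowerCase(Locale.ROOT));
        }
    }

    // Returns true if any whole word of the text is on the blocklist.
    boolean containsProfanity(String text) {
        if (text == null) {
            return false;
        }
        // Split on anything that is not a letter or a digit (assumed tokenization).
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        for (String token : tokens) {
            if (blockedWords.contains(token)) {
                return true;
            }
        }
        return false;
    }

    // Keeps only the documents that may be sent on to the indexer.
    List<String> keepClean(List<String> documents) {
        List<String> clean = new ArrayList<>();
        for (String document : documents) {
            if (!containsProfanity(document)) {
                clean.add(document);
            }
        }
        return clean;
    }

    public static void main(String[] args) {
        // "darn" stands in for a real blocklist entry.
        ProfanityFilter filter = new ProfanityFilter(Arrays.asList("darn"));
        List<String> docs = Arrays.asList("A clean page", "Well, DARN it");
        System.out.println(filter.keepClean(docs));
    }
}
```

Documents that pass `keepClean` would then be sent to Solr with whatever client the application already uses; rejected ones never reach the index.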

Solr - Aggregate Term Frequency by Group

Question: Let's say I have the following set of grouped websites crawled and indexed in Solr (latest):

    { "id": "1", "domain": "http://www.category1website1.com", "domainGroup": "Group 1" },
    { "id": "2", "domain": "http://www.category1website2.com", "domainGroup": "Group 1" },
    { "id": "3", "domain": "http://www.category2website1.com", "domainGroup": "Group 2" }

I'm looking for a result set that will give me the term frequency in each individual domain but also the aggregated term frequency of that search

Neo4j: full-text Lucene legacy indexes (node_auto_index) do not work after migration

Question: After a successful migration from Neo4j 2.2.8 to 3.0.4 following the official FAQ, full-text search does not work as expected. Fuzzy matching is not as fuzzy as it was before. Example:

    START n=node:node_auto_index('name:(+Target~0.85)') MATCH (n) RETURN n;

This should return nodes whose name field contains a word roughly 85% similar to 'Target'. Before the migration it matched the following: Target, Target v2. After the migration it matches only: Target. Why, and how can I fix that?

Answer 1: Reason was that after migration lucene node_auto_index wasn

Solr result grouping error: Unexpected docvalues type SORTED_SET for field 'vendor' (expected=SORTED)

Question: I have a Solr schema like this:

    <fields>
      <field name="id" type="string" indexed="false" stored="true" required="true" />
      <field name="product" type="string" indexed="true" stored="true" required="true" />
      <field name="vendor" type="string" indexed="true" stored="true" required="true" />
      <field name="language" type="string" indexed="true" stored="true" required="true" />
      <field name="TotalInvoices" type="float" indexed="true" stored="true" required="true"/>
    </fields>

I am querying the schema

Tree search with Lucene

Question: I have a taxonomy index that describes a tree structure. When performing a query I want to get the number of hits for multiple categories (not necessarily at the same level of the tree). For example, given the following list of paths: [Root/Cat1, Root/Cat1/Cat12, Root/Cat3], I want to obtain the number of hits for each of these three categories. I've been looking for a solution and I know that it is possible to make a tree request and then get the results by calling .getSubResults() (as it is

What is better? One big field or many small?

Question: I'm about to write a search engine based on Zend Search Lucene. My objects have many different fields (10 text fields), and I would like to know which of these approaches is best. (All fields are unstored, just indexed; I don't need to retrieve them.)

One big field (a concatenation of many small fields):

    $content = $textfield1 . $textfield2 . $textfield3 . $textfield4 ...
    Zend_Search_Lucene_Field::unStored("content", $content);

OR many small fields:

    Zend_Search_Lucene_Field::unStored("content",

Hibernate Search Integration with Apache Solr unable to index data

Question: In my current application I use Hibernate Search to index and search data. It works fine. But when building a cluster of server instances I do not need to use master/slave clusters using JMS or JGroups, so I am trying to integrate Hibernate Search with Apache Solr. I followed this example and made some minor changes to be compatible with the new apache.lucene.core version.

    public class HibernateSearchSolrWorkerBackend implements BackendQueueProcessor {
        private static final String ID_FIELD
