lucene

Word normalization using RDD

谁都会走 submitted on 2020-01-05 07:00:26
Question: This question may be a little strange, but I'll try to ask it anyway. Anyone who has written applications with the Lucene API has seen something like this: public static String removeStopWordsAndGetNorm(String text, String[] stopWords, Normalizer normalizer) throws IOException { TokenStream tokenStream = new ClassicTokenizer(Version.LUCENE_44, new StringReader(text)); tokenStream = new StopFilter(Version.LUCENE_44, tokenStream, StopFilter.makeStopSet(Version.LUCENE_44, stopWords, true));
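The pipeline in the question (tokenize, drop stop words, normalize) can be written as a plain function that holds no Lucene state, which is what makes it easy to apply record by record. The following is a minimal plain-Java sketch, not the Lucene API: the class name `WordNormalizer` is hypothetical, the regex split only approximates `ClassicTokenizer`, and lowercasing stands in for the question's unspecified `Normalizer`.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical plain-Java stand-in for a Lucene analysis chain:
// tokenize -> lowercase -> drop stop words -> rejoin. This is NOT the Lucene API.
final class WordNormalizer implements Serializable {
    private final Set<String> stopWords = new HashSet<>();

    WordNormalizer(String[] stopWords) {
        // Lowercase the stop words so matching is case-insensitive,
        // mirroring makeStopSet(..., ignoreCase = true) in the question.
        for (String w : stopWords) {
            this.stopWords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    // Splits on runs of characters that are neither letters nor digits
    // (a rough approximation; ClassicTokenizer handles many more cases).
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Lowercases each token (standing in for the unspecified Normalizer),
    // drops stop words, and joins the remaining tokens with single spaces.
    String normalize(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                kept.add(lower);
            }
        }
        return String.join(" ", kept);
    }
}
```

Because the class holds only a `HashSet` and is `Serializable`, an instance could presumably be captured in a Spark transformation such as `lines.map(normalizer::normalize)` on a `JavaRDD<String>`; that Spark usage is an assumption and is not exercised by the sketch.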

Profanity filtering in Solr

隐身守侯 submitted on 2020-01-05 05:45:50
Question: I am serving some web data on my site using Apache Solr 4.10.3, and I have to block profanity. How do I block profanity in search? I am also unsure where this filter belongs: should I apply the profanity filter at document indexing time or at search time? Answer 1: You have two options: (1) don't send the document to Solr in the first place (filter it in your own code), or (2) implement a custom UpdateRequestProcessor: https://cwiki.apache.org/confluence/display/solr/Update+Request
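Option (1) from the answer, filtering in your own code before anything reaches Solr, might look like the following plain-Java sketch. The block list, the `ProfanityGate` name, and the representation of a document as a field-name-to-text map are all hypothetical; the call that actually sends the surviving documents to Solr (for example through SolrJ) is left out. Matching is whole-word only, so inflected or obfuscated variants would slip through.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side gate: decides, before indexing, whether a document is clean.
final class ProfanityGate {
    private final Set<String> blocked = new HashSet<>();

    ProfanityGate(Set<String> blockedWords) {
        // Store the block list lowercased so the check is case-insensitive.
        for (String w : blockedWords) {
            blocked.add(w.toLowerCase(Locale.ROOT));
        }
    }

    // True if any whole word in any field value is on the block list.
    boolean containsProfanity(Map<String, String> doc) {
        for (String value : doc.values()) {
            for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (blocked.contains(token)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Keeps only the documents that may be sent on to Solr.
    List<Map<String, String>> filter(List<Map<String, String>> docs) {
        List<Map<String, String>> clean = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (!containsProfanity(doc)) {
                clean.add(doc);
            }
        }
        return clean;
    }
}
```

Filtering at indexing time like this means blocked content never enters the index at all; filtering at search time would instead leave it indexed and hide it per query.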

Solr - Aggregate Term Frequency by Group

断了今生、忘了曾经 submitted on 2020-01-05 05:39:07
Question: Let's say I have the following set of grouped websites crawled and indexed in Solr (latest version): { "id":"1", "domain": "http://www.category1website1.com", "domainGroup": "Group 1" },{ "id":"2", "domain": "http://www.category1website2.com", "domainGroup": "Group 1" },{ "id":"3", "domain": "http://www.category2website1.com", "domainGroup": "Group 2" } I'm looking for a result set that gives me the term frequency in each individual domain, but also the aggregated term frequency of that search
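Once the per-document frequency of the searched term is available, the per-group total is a simple sum keyed by `domainGroup`. The sketch below assumes those frequencies have already been fetched (Solr's `termfreq(field,term)` function query is one possible source, but wiring it up is not shown). The `Page` and `GroupedTermFrequency` classes are hypothetical; the domains and groups in the usage come from the question, while the frequency values are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical record of one indexed page and the frequency of the searched term in it.
final class Page {
    private final String domain;
    private final String domainGroup;
    private final int termFreq;

    Page(String domain, String domainGroup, int termFreq) {
        this.domain = domain;
        this.domainGroup = domainGroup;
        this.termFreq = termFreq;
    }

    String domain() { return domain; }
    String domainGroup() { return domainGroup; }
    int termFreq() { return termFreq; }
}

final class GroupedTermFrequency {
    // Sums term frequency per key (e.g. per domain or per domainGroup).
    // A TreeMap keeps the output sorted by key for stable display.
    static Map<String, Integer> sumBy(List<Page> pages, Function<Page, String> key) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Page p : pages) {
            totals.merge(key.apply(p), p.termFreq(), Integer::sum);
        }
        return totals;
    }
}
```

Calling `sumBy(pages, Page::domain)` yields the per-domain figures and `sumBy(pages, Page::domainGroup)` the aggregated ones, so both views come from the same list of per-document frequencies.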

Neo4j: full-text Lucene legacy indexes (node_auto_index) do not work after migration

萝らか妹 submitted on 2020-01-05 04:18:12
Question: After successfully migrating from Neo4j 2.2.8 to 3.0.4 following the official FAQ, full-text search does not work as expected: fuzzy matching is not as fuzzy as it was before. Example: START n=node:node_auto_index('name:(+Target~0.85)') MATCH (n) RETURN n; This should return nodes whose name field contains a word roughly 85% similar to 'Target'. Before the migration it matched: Target, Target v2. After the migration it matches only: Target. Why does this happen, and how can it be fixed? Answer 1: The reason was that after migration the Lucene node_auto_index wasn
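When comparing fuzzy-matching behaviour across versions, it helps to know the edit distance between the query term and the indexed terms. Lucene 4 and later express fuzziness as a Levenshtein edit distance of at most 2, whereas Lucene 3.x accepted a similarity fraction such as `0.85`; which Lucene version each Neo4j release bundles is not stated in the question. The following is a standard plain-Java Levenshtein implementation (the class name `EditDistance` is hypothetical), not Lucene's own automaton-based matcher.

```java
// Classic dynamic-programming Levenshtein distance:
// the minimum number of single-character insertions, deletions, or substitutions.
final class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

The comparison is case-sensitive, so if the index lowercases terms, both sides should be lowercased before measuring the distance.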

Solr result grouping error: Unexpected docvalues type SORTED_SET for field 'vendor' (expected=SORTED)

只谈情不闲聊 submitted on 2020-01-05 03:08:23
Question: I have a Solr schema like this:
<fields>
<field name="id" type="string" indexed="false" stored="true" required="true" />
<field name="product" type="string" indexed="true" stored="true" required="true" />
<field name="vendor" type="string" indexed="true" stored="true" required="true" />
<field name="language" type="string" indexed="true" stored="true" required="true" />
<field name="TotalInvoices" type="float" indexed="true" stored="true" required="true"/>
</fields>
I am querying the schema

Tree search with Lucene

落爺英雄遲暮 submitted on 2020-01-04 12:47:21
Question: I have a taxonomy index that describes a tree structure. When performing a query, I want to get the number of hits for multiple categories (not necessarily at the same level of the tree). For example, given the following list of paths: [Root/Cat1, Root/Cat1/Cat12, Root/Cat3] I want to obtain the number of hits for each of these three categories. I've been looking for a solution, and I know that it is possible to make a tree request and then get the results by calling .getSubResults() (as it is
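The counting the question asks for amounts to this: for each requested path, count the hits whose category is that path or any descendant of it. The plain-Java sketch below illustrates only that logic. It assumes (hypothetically) that each hit carries a single category path string, whereas a real Lucene taxonomy index performs this counting itself; in later versions of the Lucene facet module, `Facets.getSpecificValue(dim, path...)` may be the relevant call, but that should be checked against the version in use. The class name `CategoryHitCounter` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of per-category hit counting over a tree of paths.
final class CategoryHitCounter {
    // True if docPath equals category or lies beneath it.
    // Requiring the "/" separator keeps Root/Cat10 from counting under Root/Cat1.
    static boolean isUnder(String docPath, String category) {
        return docPath.equals(category) || docPath.startsWith(category + "/");
    }

    // For each requested category (at any level), counts hits at or below it.
    // LinkedHashMap preserves the order in which categories were requested.
    static Map<String, Integer> count(List<String> hitPaths, List<String> categories) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String category : categories) {
            int n = 0;
            for (String path : hitPaths) {
                if (isUnder(path, category)) {
                    n++;
                }
            }
            counts.put(category, n);
        }
        return counts;
    }
}
```

With the question's request list `[Root/Cat1, Root/Cat1/Cat12, Root/Cat3]`, a hit under `Root/Cat1/Cat12` is counted for both `Root/Cat1` and `Root/Cat1/Cat12`, which is the roll-up behaviour a tree request provides.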

Which is better: one big field or many small ones?

旧时模样 submitted on 2020-01-04 11:02:06
Question: I'm about to write a search engine based on Zend Search Lucène. My objects have many different fields (10 of text type), and I would like to know which of these approaches is better. (All fields are unstored, only indexed; I don't need to retrieve them.) One big field (a concatenation of many small fields): $content = $textfield1 . ' ' . $textfield2 . ' ' . $textfield3 . ' ' . $textfield4 ... Zend_Search_Lucene_Field::unStored("content", $content); OR many small fields: Zend_Search_Lucene_Field::unStored("content",
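Whichever layout is chosen, a single combined field needs a separator between the source values: plain `.` concatenation would fuse the last word of one field with the first word of the next into one token. The plain-Java sketch below (not Zend code; the class name `FieldJoin` and the whitespace tokenizer are hypothetical simplifications) shows the difference.

```java
import java.util.Arrays;
import java.util.List;

// Illustrates why field values need a separator before being indexed as one field.
final class FieldJoin {
    // Naive concatenation, equivalent to $a . $b in PHP.
    static String concat(List<String> fields) {
        return String.join("", fields);
    }

    // Concatenation with a space between fields, so word boundaries survive.
    static String joinWithSpace(List<String> fields) {
        return String.join(" ", fields);
    }

    // Simple whitespace tokenizer used to compare the two results.
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }
}
```

For the fields `"red car"` and `"blue sky"`, naive concatenation produces the tokens `red`, `carblue`, `sky`, so neither `car` nor `blue` would be findable; joining with a space keeps all four words.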

Hibernate Search integration with Apache Solr: unable to index data

为君一笑 submitted on 2020-01-04 07:35:29
Question: In my current application I use Hibernate Search to index and search data, and it works fine. However, when building a cluster of server instances, I do not want to use master/slave clustering via JMS or JGroups, so I am trying to integrate Hibernate Search with Apache Solr. I followed this example and made some minor changes for compatibility with the newer apache.lucene.core version. public class HibernateSearchSolrWorkerBackend implements BackendQueueProcessor { private static final String ID_FIELD