solr | 易学教程 (Easy-to-Learn Tutorials)

Obtain metadata associated with matched content in Solr/Lucene

Question: I have a large set of text documents that I will index with Solr. Each line of text has associated metadata, for example:

#metadata1 A line of text.
#metadata2 Another long, broken line of
#metadata3 text that should be searchable.

I'd like to index this so that the content is searchable, including phrase matches that span multiple lines, but the metadata is not. However, I can't discard the metadata: I would like every match to still carry its associated metadata. E.g. A
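
One way to keep the metadata out of the searchable text while still recovering it for each match is to concatenate the line bodies into one text and record the character offset at which each line starts; any hit can then be mapped back to every line it overlaps. The sketch below does this in plain Python, with a case-insensitive substring search standing in for Solr's phrase matching. It uses no Solr API; the `#metadataN text` line format and the helper names `build_index` and `find_with_metadata` are assumptions for illustration.

```python
import re
from bisect import bisect_right

# Hypothetical line format: "#<metadata> <text>"
LINE_RE = re.compile(r"^(#\S+)\s+(.*)$")


def build_index(raw_lines):
    """Join line bodies with single spaces and record where each line starts."""
    bodies, starts, metas = [], [], []
    pos = 0
    for raw in raw_lines:
        m = LINE_RE.match(raw)
        if not m:
            continue  # skip lines that don't follow the assumed format
        meta, body = m.group(1), m.group(2)
        starts.append(pos)
        metas.append(meta)
        bodies.append(body)
        pos += len(body) + 1  # +1 for the joining space
    return " ".join(bodies), starts, metas


def find_with_metadata(text, starts, metas, phrase):
    """Return (match_start, [metadata of every line the match overlaps])."""
    results = []
    haystack, needle = text.lower(), phrase.lower()
    i = haystack.find(needle)
    while i != -1:
        end = i + len(needle) - 1  # offset of the match's last character
        first = bisect_right(starts, i) - 1
        last = bisect_right(starts, end) - 1
        results.append((i, metas[first:last + 1]))
        i = haystack.find(needle, i + 1)
    return results
```

For the three sample lines above, searching for "line of text" returns one hit tagged `#metadata1` and a second hit spanning `#metadata2` and `#metadata3`, while the metadata itself never appears in the searchable text.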

How to join two different cores from two different Solr servers?

Question: I have some cores on one Solr server and some cores on another Solr server, and I need to join them. The cores have different schemas: no attribute names match, but attribute values do. I tried both a join and shards, but neither worked. Can you help me out?

attribute1 is in abc:7892/solr/core1; attribute2 and attribute3 are in xyz:8983/solr/core2.

{!join from=attribute1 to=attribute2 fromIndex="xyz:8983/solr/core2"} attribute3:*

Error message: Cross-core join: no such core
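
As far as we know, the `{!join}` query parser resolves `fromIndex` to a core on the same Solr instance, which fits the "no such core" error when the other core lives on a different server. If both cores cannot be placed on one node, a fallback is to query each server separately and join the results on the client by value. The Python sketch below shows only that matching step, over documents assumed to be already fetched as plain dicts (the HTTP requests are omitted). `client_side_join` is a hypothetical helper, and it assumes the join fields are single-valued.

```python
from collections import defaultdict


def client_side_join(left_docs, right_docs, left_field, right_field):
    """Pair documents whose left_field value equals the other doc's right_field value.

    The field names differ between the two cores, so the join is on values only.
    Assumes single-valued (hashable) join fields.
    """
    # Hash index over the right-hand core's join values.
    by_value = defaultdict(list)
    for doc in right_docs:
        value = doc.get(right_field)
        if value is not None:
            by_value[value].append(doc)

    joined = []
    for left in left_docs:
        # Docs missing left_field produce no matches (None is never a key).
        for right in by_value.get(left.get(left_field), []):
            joined.append({**left, **right})  # merged view of both documents
    return joined
```

Applying a filter like `attribute3:*` would then happen when fetching from core2, before the join.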

Solr Facet and Tokenizer

Question: I have a multivalued (array) Solr field whose values can each contain several words, for example ["Super Ball", "BlaBla", "Info"]. I need all three values to appear as facet values, and I also need case-insensitive search on the field. With the following field type I see the three values in the facet, but case-insensitive search doesn't work:

<fieldType name="myLower" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
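
A common way to get both behaviors (an assumption here, not something stated in the question) is to keep two fields: the original values in a string field for faceting, and a copy analyzed with a keyword tokenizer plus lower-casing for searching, so each value remains a single token but matches regardless of case. The query has to go through the same lower-casing. The toy Python model below mimics that split; it is not Solr code, and the field name `tags` is hypothetical.

```python
from collections import Counter


def keyword_lower(value):
    """Mimic a keyword tokenizer plus lower-casing: the whole value is one token."""
    return value.lower()


def index_docs(docs, field):
    facet_counts = Counter()  # stands in for a string field used for faceting
    search_index = {}         # stands in for a lower-cased copy used for search
    for doc_id, doc in docs.items():
        for value in doc.get(field, []):
            facet_counts[value] += 1  # original spelling is kept for facets
            search_index.setdefault(keyword_lower(value), set()).add(doc_id)
    return facet_counts, search_index


def search(search_index, query):
    # Apply the same analysis to the query as to the indexed values.
    return search_index.get(keyword_lower(query), set())
```

With this split, the facet still shows "Super Ball" with its original capitalization, while a query for "SUPER BALL" or "super ball" finds the document.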

SOLR and accented characters

Question: I have an index for occupations (identifier + occupation):

<field name="occ_id" type="int" indexed="true" stored="true" required="true" />
<field name="occ_tx_name" type="text_es" indexed="true" stored="true" multiValued="false" />

<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words=
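
The question is cut off before the actual problem, but a frequent goal with a `text_es` field is to make accented and unaccented spellings match (e.g. "educación" and "educacion"). Solr ships filters such as `solr.ASCIIFoldingFilterFactory` for this, which would need to apply at both index and query time. The Python sketch below only illustrates the normalization idea (Unicode decomposition, then dropping combining marks); it is not what Solr runs internally, and the sample occupation names are made up.

```python
import unicodedata


def fold_accents(text):
    """Lower-case and strip combining accents, roughly like an ASCII-folding filter."""
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Keep only base characters; drop the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

If indexed values and queries are both folded this way, "Técnico en Educación" and "tecnico en educacion" produce the same terms.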

Compare strings of text between two tables in a database or locally

Question: Edit: SQL doesn't work for this. I just found out about Solr/Sphinx and it seems like the right tool for this problem, so if you know Solr or Sphinx I'm eager to hear from you.

Basically, I have a .tsv with patent info and a .csv with product names. I need to match each row of the patents column against the product names and extract the occurrences into a new .csv column. You can scroll down and see the example at the end.

Original question: SQL newbie here, so bear with me :). I can't figure
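
Before reaching for a search engine, it may be worth checking whether a single regular-expression pass is fast enough for the data size. The Python sketch below builds one case-insensitive alternation of all product names (longest first, so a longer name wins over a shorter prefix of it) and lists the names found in each patent text. Reading the .tsv/.csv files and writing the new column are omitted; `find_products` is a hypothetical helper, and it assumes product names begin and end with word characters so that `\b` boundaries behave.

```python
import re


def find_products(patent_texts, product_names):
    """For each patent text, list the product names it mentions (case-insensitive)."""
    names = sorted(set(product_names), key=len, reverse=True)  # longest first
    if not names:
        return [[] for _ in patent_texts]
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b", re.IGNORECASE
    )
    lower_to_name = {n.lower(): n for n in names}  # map matches back to canonical names
    results = []
    for text in patent_texts:
        found = []
        for m in pattern.finditer(text):
            name = lower_to_name[m.group(1).lower()]
            if name not in found:  # report each product once per patent
                found.append(name)
        results.append(found)
    return results
```

Each inner list could then be joined with a separator and written as the new .csv column.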

A word is not indexed properly in Solr

Question: I don't know what is going wrong with this query:

http://IP_ADDRESS/solr/CORE_NAME/select?indent=on&q=Bangalore&wt=json

There are more than 100 records in my database that contain the word Bangalore, but the results contain just 2 records. However, the query below works perfectly:

http://IP_ADDRESS/solr/CORE_NAME/select?indent=on&q=Bangalor&wt=json

Just by removing the letter e from Bangalore, I get many more results containing the word "Bangalore". I think the word "Bangalore" is not
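
One common cause of this symptom (not confirmed by the truncated question) is an analysis mismatch: the index-time analyzer stems "bangalore" to "bangalor", while the query is analyzed differently, for example because it hits a default field with another analyzer. Solr's Analysis screen in the Admin UI shows the terms each analyzer produces. The toy Python model below demonstrates the mechanism only; `toy_stem` is a hypothetical stand-in for a real stemmer, and the counts are not meant to reproduce the question's numbers.

```python
def toy_stem(token):
    """Hypothetical stand-in for a stemmer: drops a trailing 'e' from longer words."""
    return token[:-1] if token.endswith("e") and len(token) > 3 else token


def analyze(text, stem):
    tokens = text.lower().split()
    return [toy_stem(t) for t in tokens] if stem else tokens


def build_index(docs, stem):
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text, stem):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, stem):
    hits = set()
    for term in analyze(query, stem):
        hits |= index.get(term, set())
    return hits
```

When the index is stemmed but the query is not, "Bangalore" finds nothing while "Bangalor" matches; applying the same stemming on both sides makes "Bangalore" match again.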

Performance of multiple shards on a single machine

Question: Does it make sense to have multiple shards in Elasticsearch if I am going to use only a single machine? Will it improve performance in any way? Same question for Apache Solr: does it make sense to use SolrCloud with ZooKeeper for a single server instance, or should I just create one core without any sharding? Let's assume I am not going to use other machines in the future, so the main point is: how does sharding on a single machine influence search engine performance?

Answer 1: There are certain parts of Lucene that's
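
To reason about sharding on one machine, it helps to see the extra step a sharded query performs: every shard computes its own top results, and a coordinator merges them into a global top-k. The Python sketch below models that scatter-gather merge with precomputed scores. It is a conceptual model, not Elasticsearch or Solr code; real engines also run the per-shard searches concurrently and fetch stored fields for the winning documents afterwards.

```python
import heapq


def search_shard(shard_hits, k):
    """Return this shard's top-k (score, doc_id) pairs; scores are precomputed here."""
    return heapq.nlargest(k, shard_hits)


def distributed_top_k(shards, k):
    """Scatter: each shard returns its own top-k. Gather: merge into a global top-k."""
    partials = [search_shard(hits, k) for hits in shards]
    return heapq.nlargest(k, (hit for part in partials for hit in part))
```

The merge step is overhead that a single unsharded core avoids, which is one reason extra shards on the same machine are not automatically faster.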

What is the regular expression to remove spaces in SOLR

Question: Using a regex, how can I remove all spaces in Solr: leading, trailing, and wherever else they occur? To remove special characters, we can use PatternReplaceFilterFactory like this:

<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />

What pattern value would remove the spaces wherever they occur?

Answer 1: I don't know Solr, but based on your example I guess you could just do:

<filter class="solr.PatternReplaceFilterFactory" pattern="(\s+)" replacement=""
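
The pattern `\s+` matches any run of whitespace, so replacing every match with an empty string removes leading, trailing, and inner spaces alike. The Python snippet below checks that behavior with `re.sub`; Java's regex engine, which Solr uses, treats `\s` the same way for ASCII whitespace. One caveat about general Solr behavior, not taken from the answer: a token filter only sees individual tokens, so if the tokenizer already splits on whitespace there are no spaces left to remove. To strip spaces from the whole field value, the pattern would need to run in `solr.PatternReplaceCharFilterFactory` before tokenization, or after `solr.KeywordTokenizerFactory`.

```python
import re

# Same idea as pattern="(\s+)" replacement="" replace="all":
# every run of whitespace (leading, trailing, or inside) is deleted.
WHITESPACE = re.compile(r"\s+")


def remove_spaces(value):
    return WHITESPACE.sub("", value)
```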

Solr Delta Import Query is not working

Question: I am trying to import data from MongoDB into Solr 6.0. The full import runs correctly, but the delta import does not work. When I run a delta import, I get the following result:

Requests: 0 , Fetched: 0 , Skipped: 0 , Processed: 0

The queries in my data config file are as follows:

query=""
deltaQuery="db.getCollection('customer').find({'jDate':{$gt:'${dih.last_index_time}'}},{'_id' :1});"
deltaImportQuery="db.getCollection('customer').find({'_id':'${dataimporter.delta.id}'})"

The whole data-config.xml:

<?xml
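
A delta import runs in two phases: `deltaQuery` selects the keys of changed records, then `deltaImportQuery` runs once per key to fetch the full record, with the key exposed as `${dataimporter.delta.<column>}`. The Python sketch below models those two phases over hypothetical in-memory rows. It points at two things worth checking in the config above, offered as hypotheses rather than a diagnosis: `jDate` is compared against `${dih.last_index_time}` as a quoted string, which only selects records if both values share the same type and date format; and `deltaQuery` returns a column named `_id`, while `deltaImportQuery` reads `${dataimporter.delta.id}`.

```python
from datetime import datetime


def delta_query(rows, last_index_time):
    """Phase 1 (like deltaQuery): keys of rows changed after the last import."""
    # Comparing datetime objects; comparing mismatched string formats may select nothing.
    return [row["_id"] for row in rows if row["jDate"] > last_index_time]


def delta_import_query(rows, changed_id):
    """Phase 2 (like deltaImportQuery): the full row(s) for one changed key."""
    return [row for row in rows if row["_id"] == changed_id]


def run_delta_import(rows, last_index_time):
    docs = []
    for changed_id in delta_query(rows, last_index_time):
        docs.extend(delta_import_query(rows, changed_id))
    return docs
```

If phase 1 returns no keys, phase 2 never runs, which would be consistent with zero requests and zero fetched documents.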