solr4

Configure Tesseract with Solr 6.4.1

ぃ、小莉子 submitted on 2020-06-28 06:30:18
Question: How do I configure Tika OCR with Solr 6.4.1? I indexed documents including PDFs, images, and MS Office documents, but Tika did not extract text from the images, nor from images embedded in the PDF and MS Office documents. From my research, Tika OCR is used for this. For this purpose I am installing tika-app-1.7.jar and Tesseract, but I don't know how to configure them with my Solr core. Answer 1: You don't need to do anything special. Simply get the Tesseract OCR setup for your

Solr: Is PatternReplaceFilterFactory able to replace the field value for a copyField and then index it?

此生再无相见时 submitted on 2020-01-26 01:47:57
Question: I have indexed the data from solr.xml and monitor.xml that came with the Solr package, and I added the configuration below to the schema.xml file: <field name="my_field" type="my_field_type" indexed="true" stored="true" required="false"/> <copyField source="name" dest="my_field" /> <fieldType name="my_field_type" class="solr.TextField"> <analyzer type="index"> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.PatternReplaceFilterFactory" pattern=".*" replacement=

Remove diacritics at index time in Solr

核能气质少年 submitted on 2020-01-25 05:55:17
Question: I am fine-tuning a Solr search, using Solr 4.0. I have normally worked with language analyzers and tokenizers for English, but this time I am working with Portuguese, and it does not give the results I expect. For example, I search for the word 'proteses', but what is indexed is 'próteses', with a diacritic, so the search gives wrong results. What I need to do is remove all diacritics before indexing and searching, so it gives correct
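The folding the question asks for, reducing accented characters to their base letters on both the index side and the query side, can be illustrated outside Solr. The following is a minimal sketch in plain Python using the standard `unicodedata` module; it is not Solr's implementation, and the function name `fold_diacritics` is hypothetical. Inside Solr, a filter such as `solr.ASCIIFoldingFilterFactory` placed in both the index and query analyzers is one common place for this kind of folding.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks so 'próteses' and 'proteses' compare equal."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn") and keep the rest.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Apply the same folding to the indexed term and to the query term.
indexed_term = fold_diacritics("próteses")
query_term = fold_diacritics("proteses")
print(indexed_term, query_term, indexed_term == query_term)
```

The point of the sketch is that the same folding has to run at index time and at query time; folding only one side leaves the two forms unequal.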

Solr query issue with Faceting and Stats in DSE

独自空忆成欢 submitted on 2020-01-16 18:02:06
Question: Query: http://localhost:8983/solr/trackfleet_db.location/select?q=*:*&facet=true&facet.pivot={!stats=piv1}date,latitude,longitude&stats=true&stats.field={!tag=piv1}gpsdt When I execute this query on a separate Solr instance (one that is not part of DSE), it works fine. But in DSE (I am now using the built-in Solr of DSE) it does not return anything. When I execute the query with the curl command, it gives the following error: <?xml version="1.0" encoding="UTF

SolrCloud delete collection bug?

依然范特西╮ submitted on 2020-01-13 19:07:05
Question: First, I created a collection called usercollection: http://xxxxx/solr/admin/collections?action=CREATE&name=usercollection&numShards=3&replicationFactor=3&maxShardsPerNode=3 Then I found something wrong, so I deleted it: http://xxxx/solr/admin/collections?action=DELETE&name=usercollection Finally, I wanted to create the collection again, and I found something wrong. `May 16, 2013 8:32:23 PM org.apache.solr.cloud.OverseerCollectionProcessor run INFO: Overseer Collection Processor: Get the message
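The create, delete, create sequence described above can be written down as a small script that only builds the Collections API URLs. This is a sketch under stated assumptions: the question elides the real host, so `BASE_URL` below is a hypothetical placeholder, and the helper name `collections_url` is invented for illustration. The parameters are the ones shown in the question.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the question does not show the real host.
BASE_URL = "http://localhost:8983/solr/admin/collections"


def collections_url(action: str, **params) -> str:
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{BASE_URL}?{query}"


# Parameters copied from the question.
create_url = collections_url(
    "CREATE",
    name="usercollection",
    numShards=3,
    replicationFactor=3,
    maxShardsPerNode=3,
)
delete_url = collections_url("DELETE", name="usercollection")

# The sequence from the question: create, delete, then create again.
for url in (create_url, delete_url, create_url):
    print(url)
```

The script prints the URLs rather than sending them, so it can be checked without a running cluster.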

Best Solr model when searching inside multivalued fields

Deadly submitted on 2020-01-06 14:06:07
Question: I have the following model for a type of document in Solr 5: there is 1 document per entity; an entity has about 100 single-valued attributes; an entity has 1 multi-valued attribute, uuids_scores, which contains values like "123_456", where the first part (123) is the user id and the second part (456) is a stored score I keep for each user; an entity can have about 100 k uuids_scores values. The way I am trying to use this is: I search for entities where uuids_scores:123_* and I get the list of entities I want.
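The `uuids_scores` encoding described above can be made concrete with a short client-side sketch. It assumes the score is an integer, as in the "123_456" example; the function names `parse_uuid_score` and `scores_for_user` are hypothetical, and this is not a Solr API.

```python
from typing import List, Tuple


def parse_uuid_score(value: str) -> Tuple[str, int]:
    """Split a value such as '123_456' into (user id, score)."""
    user_id, score = value.split("_", 1)
    # Assumption: scores are integers, as in the question's example.
    return user_id, int(score)


def scores_for_user(values: List[str], user_id: str) -> List[int]:
    """Return the scores stored for one user in a multi-valued field."""
    # Including the underscore keeps user 123 from matching user 1234.
    prefix = f"{user_id}_"
    return [parse_uuid_score(v)[1] for v in values if v.startswith(prefix)]


uuids_scores = ["123_456", "124_10", "1234_7"]
print(parse_uuid_score("123_456"))
print(scores_for_user(uuids_scores, "123"))
```

The prefix check mirrors the `uuids_scores:123_*` query in the question: the trailing underscore is what separates one user id from another that merely starts with the same digits.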

solr - set fields as default search field - Using EdisMax

对着背影说爱祢 submitted on 2020-01-05 10:28:44
Question: The following query works well for me: http://...:8983/solr/vault/select?q=White&defType=edismax&qf=VersionComments+VersionName It returns all the documents whose version comments include White. I tried to omit the qf parameter containing the field names. In the Solr config I wrote: <requestHandler name="/select" class="solr.SearchHandler"> <!-- default values for query parameters can be specified, these will be overridden by parameters in the request --> <lst name="defaults"> <str name="echoParams">explicit<

Timestamp compatibility while performing delta import in Solr

旧时模样 submitted on 2020-01-04 06:35:26
Question: I'm new to Solr. I have successfully indexed an Oracle 10g XE database, and I'm trying to perform a delta import on it. The delta query requires comparing the table's last_modified column with ${dih.last_index_time}. However, my application has no such column, and I cannot add one. Therefore I used 'scn_to_timestamp(ora_rowscn)' to obtain the required timestamps. This query returns a value of type timestamp in the following format: 24-JUL-13 12.42.32

SolrCloud with SSL and Basic Authentication

喜夏-厌秋 submitted on 2020-01-03 17:42:12
Question: Is it possible to configure SolrCloud with SSL and Basic Authentication? I have configured 3 Solr nodes in SolrCloud with SSL using this: https://cwiki.apache.org/confluence/display/solr/Enabling+SSL and I have added authentication and authorization following these: https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin, https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin When only SSL is enabled, it works. When only authentication +

Solr 4.4: StopFilterFactory and enablePositionIncrements

半世苍凉 submitted on 2020-01-01 09:23:58
Question: While attempting to upgrade from Solr 4.3.0 to Solr 4.4.0 I ran into this exception: java.lang.IllegalArgumentException: enablePositionIncrements=false is not supported anymore as of Lucene 4.4 as it can create broken token streams which led me to this issue. I need to be able to match queries irrespective of intervening stopwords (which used to work with enablePositionIncrements="true"). For instance: "foo of the bar" would find documents matching "foo bar", "foo of bar", and "foo of the bar
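What position increments mean for stopword removal can be shown with a toy model. The sketch below is a simplified simulation in Python, not Lucene's StopFilter: the stopword set, the whitespace tokenization, and the function name `analyze` are assumptions made for illustration. With increments enabled, each removed stopword leaves a gap in the positions of the remaining tokens; with them disabled, the remaining tokens are numbered consecutively.

```python
STOPWORDS = {"of", "the"}  # assumed stopword list for this sketch


def analyze(text, enable_position_increments=True):
    """Return (token, position) pairs after removing stopwords."""
    tokens = []
    position = 0
    for word in text.lower().split():
        if word in STOPWORDS:
            if enable_position_increments:
                # Keep a gap where the stopword was.
                position += 1
            continue
        tokens.append((word, position))
        position += 1
    return tokens


for phrase in ("foo bar", "foo of bar", "foo of the bar"):
    with_gaps = analyze(phrase, enable_position_increments=True)
    without_gaps = analyze(phrase, enable_position_increments=False)
    print(phrase, with_gaps, without_gaps)
```

In this model the three phrases produce identical token positions only when gaps are not recorded, which is why the setting matters for phrase queries across stopwords.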