dih

Solr DIH showing data import successful but no docs retrieved via query

徘徊边缘 submitted on 2020-04-30 06:39:34
Question: I am using the SolrEntityProcessor in my DIH config to reindex data from one collection to another. Here is my DIH config:

    <dataConfig>
      <document>
        <entity name="sep" processor="SolrEntityProcessor" url="http://127.0.0.1:8983/solr/techPro2 " query="*:*"/>
      </document>
    </dataConfig>

I have another collection, techproducts (the destination collection), which has the same configset (sample_techproducts_configs) as techPro2 (my source collection). So after performing a full import of the
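One detail visible in the config as quoted: the `url` attribute value ends with a space before the closing quote. The excerpt does not establish whether that is related to the missing documents. The sketch below only shows one way to flag attribute values with stray leading or trailing whitespace, using Python's standard `xml.etree.ElementTree`. The function name `attributes_with_stray_whitespace` is hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The DIH config exactly as quoted in the question, including the space
# after "techPro2" inside the url attribute.
DATA_CONFIG = """
<dataConfig>
  <document>
    <entity name="sep" processor="SolrEntityProcessor"
            url="http://127.0.0.1:8983/solr/techPro2 " query="*:*"/>
  </document>
</dataConfig>
"""


def attributes_with_stray_whitespace(xml_text):
    """Return (tag, attribute, value) for values with leading/trailing whitespace."""
    root = ET.fromstring(xml_text)
    findings = []
    for element in root.iter():
        for key, value in element.attrib.items():
            if value != value.strip():
                findings.append((element.tag, key, value))
    return findings


findings = attributes_with_stray_whitespace(DATA_CONFIG)
for tag, key, value in findings:
    print(f"<{tag}> attribute {key!r} has stray whitespace: {value!r}")
```

A check like this is cheap to run against any data-config file before starting an import.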

What is the difference between a Join Query and Embedded Entities in Solr DIH?

蹲街弑〆低调 submitted on 2019-12-24 10:21:43
Question: I am trying to index data across multiple tables using Solr's Data Import Handler. The official wiki on the DIH suggests using embedded entities to link multiple tables like so:

    <document>
      <entity name="item" pk="id" query="SELECT * FROM item">
        <entity name="member" pk="memberid" query="SELECT * FROM member WHERE memberid='${item.memberid}'">
        </entity>
      </entity>
    </document>

Another way that works is:

    <document>
      <entity name="item" pk="id" query="SELECT * FROM item INNER JOIN member ON item
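One practical difference between the two shapes is the number of SQL statements issued: a nested entity runs its query once per row of the parent entity, while a join returns everything in one statement. The sketch below mimics both with an in-memory SQLite database. The sample rows, the `title` and `name` columns, and the join condition `item.memberid = member.memberid` are assumptions for illustration (the join in the question is cut off), and the code models the query pattern only, not DIH itself.

```python
import sqlite3

# Hypothetical sample data; only item.id, item.memberid and member.memberid
# are names taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (memberid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, memberid INTEGER);
    INSERT INTO member VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO item VALUES (10, 'lamp', 1), (11, 'desk', 2), (12, 'chair', 1);
""")


def embedded_entities(conn):
    """Mimic nested entities: one parent query plus one sub-query per parent row."""
    queries = 1
    items = conn.execute(
        "SELECT id, title, memberid FROM item ORDER BY id"
    ).fetchall()
    docs = []
    for item_id, title, memberid in items:
        queries += 1
        (name,) = conn.execute(
            "SELECT name FROM member WHERE memberid = ?", (memberid,)
        ).fetchone()
        docs.append((item_id, title, name))
    return docs, queries


def join_query(conn):
    """Mimic a single entity whose query already joins both tables."""
    rows = conn.execute(
        "SELECT item.id, item.title, member.name "
        "FROM item INNER JOIN member ON item.memberid = member.memberid "
        "ORDER BY item.id"
    ).fetchall()
    return rows, 1


nested_docs, nested_queries = embedded_entities(conn)
joined_docs, joined_queries = join_query(conn)
print("queries issued:", nested_queries, "vs", joined_queries)
```

Both functions produce the same rows here; the nested form simply pays one extra round trip per parent row, which tends to matter as the parent table grows.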

SOLR: Inconsistencies using splitBy to populate a multi valued field

烈酒焚心 submitted on 2019-12-23 05:58:40
Question: I'm having trouble using the splitBy functionality to populate a multi-valued field from a pipe-delimited data source. My implementation seems to partially work for one of the fields and does not work for the other field. An example of my implementation is below. I have a DB view with the following data:

    recordId  relist     dbaName
    1         PA21|MD29  The Hong Kong Dragon|The Peeled Apple

My config: <dataConfig> <dataSource name="jdbc" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:
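DIH's RegexTransformer treats the `splitBy` value as a regular expression, and `|` is the regex alternation operator, so a literal pipe has to be escaped as `\|`. The config in this excerpt is cut off, so whether that applies here is unknown. The sketch below only shows the escaped pattern applied to the two sample values from the view, using Python's `re` module as a stand-in for the Java regex engine; `split_by` is a hypothetical helper name.

```python
import re

# Sample values copied from the view shown in the question.
relist = "PA21|MD29"
dba_name = "The Hong Kong Dragon|The Peeled Apple"


def split_by(value, pattern):
    """Split value on a regular expression, as a splitBy-style transform would."""
    return re.split(pattern, value)


# The pipe is escaped so it matches a literal "|" instead of acting as
# the alternation operator.
relist_values = split_by(relist, r"\|")
dba_name_values = split_by(dba_name, r"\|")
print(relist_values)
print(dba_name_values)
```

An unescaped `|` is an alternation between two empty patterns, so it does not match the literal pipe character alone and the field ends up split in unexpected places.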

Splitting database column into multivalued Solr field

泪湿孤枕 submitted on 2019-12-13 05:59:07
Question: I'm going nuts trying to figure out how to get the Data Import Handler's splitBy construct to work. I was expecting it to split the input column into a multivalued field. Here's a test case to reproduce the problem:

    import java.io.File;
    import java.io.IOException;
    import java.sql.SQLException;
    import static org.junit.Assert.*;
    import javax.sql.DataSource;
    import org.apache.commons.dbutils.QueryRunner;
    import org.apache.commons.io.FileUtils;
    import org.apache.solr.client.solrj.SolrQuery;

Solr: FileListEntityProcessor is executing sub entities multiple times

烈酒焚心 submitted on 2019-12-13 05:22:09
Question: I have configured a dih-import.xml as shown below. The FileListEntityProcessor walks through some folders and then executes an XPath entity and a DB entity for each file. When I executed a full import for ~30,000 files, the import took almost 3 hours. The DIH debug console showed me that for the first file found, 2 DB calls were made, for the 2nd 4, then 6, 8, ... Google didn't show me anything on this subject, so I am hoping you can help :) Thanks in advance. <?xml version="1.0"

Solr DIH with multi value fields and faceting

浪子不回头ぞ submitted on 2019-12-12 04:31:49
Question: I'm using Solr to index a dataset stored in a DBMS using the SQL DIH. One of the tables uses an n-to-n relationship. For the sake of simplicity (my app is much more complex than this), here is an example of the application: a person has a name and has 0..n associated roles (a role is described by a role_name string).

Table Person:
- id: int
- Name: string

Table roles:
- id: int
- role_name: string

Table association:
- id_person: int
- id_role: int

Two persons could be described as: id=1, name=John
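To make the target shape concrete, here is a minimal sketch of how rows from the three tables could collapse into one document per person with a multi-valued roles field. The table and column names follow the question; the sample people and roles, the `roles` field name, and the `build_documents` helper are hypothetical. It uses an in-memory SQLite database and does not model DIH or faceting.

```python
import sqlite3

# Hypothetical sample data laid out in the three tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE roles (id INTEGER PRIMARY KEY, role_name TEXT);
    CREATE TABLE association (id_person INTEGER, id_role INTEGER);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Raj'), (3, 'Eve');
    INSERT INTO roles VALUES (1, 'admin'), (2, 'editor');
    INSERT INTO association VALUES (1, 1), (1, 2), (2, 2);
""")


def build_documents(conn):
    """Group joined rows into one dict per person with a list of role names."""
    rows = conn.execute(
        "SELECT p.id, p.name, r.role_name "
        "FROM person p "
        "LEFT JOIN association a ON a.id_person = p.id "
        "LEFT JOIN roles r ON r.id = a.id_role "
        "ORDER BY p.id, r.id"
    )
    docs = {}
    for person_id, name, role_name in rows:
        doc = docs.setdefault(
            person_id, {"id": person_id, "name": name, "roles": []}
        )
        # A person with no roles yields one row with role_name = NULL.
        if role_name is not None:
            doc["roles"].append(role_name)
    return list(docs.values())


documents = build_documents(conn)
for doc in documents:
    print(doc)
```

The LEFT JOINs keep people with zero roles, which matches the 0..n relationship described in the question.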

Efficiency aspect of delta import in solr

吃可爱长大的小学妹 submitted on 2019-12-12 04:11:37
Question: I have about 2,100,000 rows of data. A full import takes about 2 minutes. For any updates in the table I'm using delta import to index the changes, and the delta import takes 6 minutes. From an efficiency standpoint, a full import looks better than a delta import. So what is the need for delta import? Is there a better way to use delta import to increase its efficiency? I followed the steps in the documentation. data-config.xml <dataConfig> <dataSource type=
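The config in this excerpt is cut off, so the following is not a diagnosis. In the documented delta pattern, `deltaQuery` returns the primary keys of changed rows and `deltaImportQuery` is then run once per key, so the number of SQL statements grows with the number of changed rows. A commonly described alternative is a full-import run with `clean=false` whose query filters on `${dataimporter.last_index_time}`, which fetches the changed rows in one statement. The sketch below models only the two query patterns with an in-memory SQLite database; the table, columns, timestamps and function names are hypothetical.

```python
import sqlite3

# Hypothetical table: even ids were modified after the last index time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [
        (
            i,
            f"item-{i}",
            "2019-12-02 00:00:00" if i % 2 == 0 else "2019-11-30 00:00:00",
        )
        for i in range(1, 7)
    ],
)
LAST_INDEX_TIME = "2019-12-01 00:00:00"


def delta_style(conn, since):
    """Key query first, then one row query per changed key."""
    queries = 1
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM item WHERE last_modified > ? ORDER BY id", (since,)
        )
    ]
    rows = []
    for item_id in ids:
        queries += 1
        rows.append(
            conn.execute(
                "SELECT id, name FROM item WHERE id = ?", (item_id,)
            ).fetchone()
        )
    return rows, queries


def single_query_style(conn, since):
    """One filtered query, as a full-import with a WHERE clause would issue."""
    rows = conn.execute(
        "SELECT id, name FROM item WHERE last_modified > ? ORDER BY id", (since,)
    ).fetchall()
    return rows, 1


delta_rows, delta_queries = delta_style(conn, LAST_INDEX_TIME)
single_rows, single_queries = single_query_style(conn, LAST_INDEX_TIME)
print("queries issued:", delta_queries, "vs", single_queries)
```

Both approaches return the same changed rows; the per-key pattern is the part that can dominate run time when many rows change between imports.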

Error: “Missing required field” using embedded entities in Solr's DIH Configuration File

╄→гoц情女王★ submitted on 2019-12-12 00:56:17
Question: I am trying to import multiple tables from a MySQL database using Solr's Data Import Handler (DIH). The DIH does not import data from the second table, 'detail'. My database configuration file is:

    <document>
      <entity name="item" pk="ListingId" query="SELECT * FROM item as item where listingid=360245270">
        <entity name="detail" pk="ListingId" query="SELECT Body FROM detail where listingid='${item.listingid}'">
          <field column="Body" name="Body" />
        </entity>
      </entity>
    </document>

I monitored the

How to index blob field in Apache Solr indexing?

六眼飞鱼酱① submitted on 2019-12-11 11:52:04
Question: I am using Apache Solr to index my data. I have a blob field which I want to be indexed too, but I don't know which fieldType to declare in 'schema.xml'. I tried the following: " field name="abstract" type="text" indexed="true" stored="true" required="true" " but when I search, that field is shown as: id, abstract, title, price, publishedDate 1, [B@1e9b7b2, Spain Consumer, 3795.0, 2009-01-19T18:30:00Z 'abstract' is my blob field, which is nothing but a big string...and I