Solr DIH with multi value fields and faceting

Posted by 浪子不回头ぞ on 2019-12-12 04:31:49

Question


I'm using Solr to index a dataset stored in a DBMS, using the SQL DIH. One of the tables uses an n-to-n relationship. For the sake of simplicity (my app is much more complex than this), here is an example: a person has a name and 0..n associated roles (a role is described by a role_name string).

Table Person:
- id: int
- Name: string

Table roles
- id: int
- role_name: string

Table association
- id_person: int
- id_role: int

Two persons could be described as:

id=1, name=John Doe, roles=[programmer, father, soccer player]
id=2, name=Eric Smith, roles=[]

Here is what I would like to achieve with Solr:

  1. Import the data with DIH (maybe using a nested SQL query?)
  2. Query and present the data with all the person info plus the person's roles
  3. Be able to query by a given role, e.g. list all persons with role=programmer
  4. Set up faceting to create a list of all roles, each with its number of occurrences in the whole dataset

I expect this to be possible with Solr (I am using version 6.4, but I can easily upgrade to the latest 6.5). Can anybody explain how to do it or point me to relevant information or a tutorial?

Thanks

UMG


Answer 1:


Yes, it is possible in Solr.

I assume a single person doesn't have a huge number of roles.
You can create your Solr schema like the one below:

<field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
<field name="name" type="string" indexed="false" stored="true"/>
<field name="roles" type="strings" indexed="true" stored="true"/>
<field name="cfname" type="text_general" indexed="true" stored="false" multiValued="false"/>
<copyField source="name" dest="cfname"/>

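Here roles is a multivalued field (in the default Solr configsets, the strings field type is a string type declared with multiValued="true").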

Now you can query by person name with q=cfname:John:

http://solr_node:8983/solr/collection_name/select?q=cfname%3AJohn
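The question also asks for all persons with a given role. Since roles is an indexed string field, a filter query on the exact role value should cover that. Here is a minimal Python sketch that builds such a URL; the host and collection name are the placeholders from the URL above, and the helper name role_query_url is made up for illustration:

```python
from urllib.parse import urlencode

# Placeholder endpoint copied from the example URL above; adjust to your setup.
SOLR_SELECT = "http://solr_node:8983/solr/collection_name/select"


def role_query_url(role: str, rows: int = 10) -> str:
    """Build a select URL that returns persons having the given role."""
    # Escape backslashes and double quotes so the role can sit inside a quoted phrase.
    escaped = role.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": "*:*",
        # Quoting keeps multi-word roles such as "soccer player" as a single value.
        "fq": f'roles:"{escaped}"',
        "fl": "id,name,roles",
        "rows": rows,
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


print(role_query_url("programmer"))
```

Because roles is a string type (not tokenized), the match is exact and case-sensitive, so roles:"Programmer" would not match a stored value of programmer.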

To list all roles, each with its number of occurrences in the whole dataset, use q=*:*, facet=true, facet.field=roles and rows=0:

http://solr_node:8983/solr/collection_name/select?q=*%3A*&rows=0&facet=true&facet.field=roles
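When the response is requested as JSON (wt=json) with the default json.nl=flat layout, the counts for a facet field come back as one flat list alternating value and count. A minimal Python sketch for turning that list into a mapping follows; the sample response is hand-written for illustration from the John Doe example, not real Solr output, and facet_counts is a hypothetical helper name:

```python
def facet_counts(response: dict, field: str) -> dict:
    """Pair up the flat [value, count, value, count, ...] facet list."""
    flat = response["facet_counts"]["facet_fields"][field]
    # Even positions hold the facet values, odd positions hold their counts.
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample in the shape Solr uses with wt=json and json.nl=flat.
sample = {
    "response": {"numFound": 2, "start": 0, "docs": []},
    "facet_counts": {
        "facet_fields": {
            "roles": ["father", 1, "programmer", 1, "soccer player", 1]
        }
    },
}

for role, count in facet_counts(sample, "roles").items():
    print(role, count)
```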



Answer 2:


Some tricky things you need to take into account:

  1. Define roles as multivalued:

     <field name="roles" type="string" indexed="true" stored="true" multiValued="true"/>
    
  2. In the DIH setup, for optimal performance, do it like this (this is for MySQL; modify as needed for your DB): use LEFT JOINs so that you run a single query (much faster than running an inner query per person), aggregate the roles with GROUP_CONCAT under a SQL GROUP BY, and use a transformer to split the result into the multivalued field:

     <entity name="person" pk="id" transformer="RegexTransformer" query="
        SELECT p.id... GROUP_CONCAT(DISTINCT COALESCE(r.role_name,'') SEPARATOR '|') AS roles
        FROM person p
        LEFT JOIN association a ON p.id = a.id_person
        LEFT JOIN roles r ON a.id_role = r.id
        WHERE ...
        GROUP BY p.id, ...
            ">
        <!-- RegexTransformer splits the '|'-separated string into one value per role -->
        <field column="roles" name="roles" splitBy="\|"/>
    </entity>
    

This is mostly for optimal indexing performance. Once the data is indexed, the queries you want to run are pretty basic.

The config above is handwritten and untested, so there might be some typos, but I hope you get the gist of it.
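To make the concatenate-then-split idea concrete, here is a rough Python imitation of what the splitBy step does to the GROUP_CONCAT column. It is not DIH code, and the rows are hand-written from the two example persons in the question. One thing it highlights: for a person without roles, COALESCE turns the NULL into an empty string, and splitting an empty string yields a single empty value, which may end up indexed as an empty role. The sketch drops such values; whether you need an equivalent guard in your import is an assumption to verify against your own data.

```python
import re


def split_roles(concatenated, split_by=r"\|"):
    """Split the concatenated roles column and drop empty values."""
    if concatenated is None:
        return []
    return [role for role in re.split(split_by, concatenated) if role]


# Hand-written rows shaped like the output of the GROUP_CONCAT query.
rows = [
    {"id": 1, "name": "John Doe", "roles": "programmer|father|soccer player"},
    {"id": 2, "name": "Eric Smith", "roles": ""},
]

# Each row becomes one document with roles as a list of values.
docs = [{**row, "roles": split_roles(row["roles"])} for row in rows]
for doc in docs:
    print(doc)
```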



Source: https://stackoverflow.com/questions/43324726/solr-dih-with-multi-value-fields-and-faceting
