Defining nested entities in Solr Data Import Handler

傲寒 2020-12-11 07:53

Let me preface this by mentioning that I've been through everything I could find on this topic, including the Solr docs and all of the SO questions.

I have a Solr inst

2 Answers
  • 2020-12-11 08:09

    DIH does not produce nested documents. Solr supports them, but DIH can't yet generate them.

    Nested entities in DIH exist to merge sources and to create entities by iterating over a different source. For example, the outer entity reads a file that lists file names, and the inner entity loads the content of those files, with each file getting its own record.

    You may want to move your nested object code into the client with SolrJ for now.

  • 2020-12-11 08:14

    Indexing nested documents in DIH is finally supported from Solr 5.1 onwards.

    https://issues.apache.org/jira/browse/SOLR-5147

    Simply add child="true" to the child entity, and Solr DIH will automatically index its rows as child documents.

    Example taken from the JIRA issue linked above:

    <document>
      <entity name='PARENT' query='select * from PARENT'>
        <field column='id' />
        <field column='desc' />
        <field column='type_s' />
        <entity child='true' name='CHILD' query="select * from CHILD where parent_id='${PARENT.id}'">
          <field column='id' />
          <field column='desc' />
          <field column='type_s' />
        </entity>
      </entity>
    </document>
    

    I also decompiled DocBuilder.class from solr-dataimporthandler-5.3.0.jar and found this code snippet:

    if (doc != null) {
        if (epw.getEntity().isChild())
        {
            childDoc = new DocWrapper();
            handleSpecialCommands(arow, childDoc);
            addFields(epw.getEntity(), childDoc, arow, vr);
            doc.addChildDocument(childDoc);
        }
        else
        {
            handleSpecialCommands(arow, doc);
            addFields(epw.getEntity(), doc, arow, vr);
        }
    }
    

    Note that epw.getEntity().isChild() returns true when child="true" is set, so DIH creates a new DocWrapper and adds it as a child document instead of simply adding the entity's columns as more fields on the parent.
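
To make the difference between the two branches concrete, here is a minimal sketch in plain Java. It is only a model of the behaviour described above, not Solr's code: the `Doc` class, `handleRow`, `build`, and the sample rows are all hypothetical names invented for illustration, and the real DocBuilder also handles special commands, transformers, and variable resolution that are left out here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChildEntitySketch {

    /** Hypothetical stand-in for a Solr document: multi-valued fields plus nested children. */
    static final class Doc {
        final Map<String, List<Object>> fields = new LinkedHashMap<>();
        final List<Doc> children = new ArrayList<>();

        void addField(String name, Object value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    /** Copies every column of a row into the given document. */
    static void addFields(Doc target, Map<String, Object> row) {
        row.forEach(target::addField);
    }

    /**
     * Models the branch in the quoted snippet: when isChild is true the row
     * becomes a separate nested document; otherwise its columns are merged
     * into the parent as additional field values.
     */
    static void handleRow(Doc parent, Map<String, Object> row, boolean isChild) {
        if (isChild) {
            Doc child = new Doc();
            addFields(child, row);
            parent.children.add(child);
        } else {
            addFields(parent, row);
        }
    }

    /** Builds one parent document from one made-up PARENT row and one made-up CHILD row. */
    static Doc build(boolean isChild) {
        Map<String, Object> parentRow = new LinkedHashMap<>();
        parentRow.put("id", "p1");
        parentRow.put("type_s", "parent");

        Map<String, Object> childRow = new LinkedHashMap<>();
        childRow.put("id", "c1");
        childRow.put("type_s", "child");

        Doc parent = new Doc();
        addFields(parent, parentRow);
        handleRow(parent, childRow, isChild);
        return parent;
    }

    public static void main(String[] args) {
        Doc nested = build(true);
        Doc merged = build(false);
        System.out.println("child=true : fields=" + nested.fields
                + ", children=" + nested.children.size());
        System.out.println("child unset: fields=" + merged.fields
                + ", children=" + merged.children.size());
    }
}
```

In this model, the run with the flag set keeps the parent's own `id` and `type_s` untouched and attaches one nested document, while the run without it piles the child's values onto the parent's fields. That second shape is why a nested entity without child="true" tends to show up as extra (multi-valued) fields rather than as a separate document.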
