Neo4j: CALL apoc.load.html with headers

Submitted by 时光怂恿深爱的人放手 on 2020-02-05 01:11:49

Question


I have the following table

<table>
    <tr>
        <th> header 1</th>
        <th> header 2</th>
        <th> header 3</th>
    </tr>
    <tr>
        <td> keyword1 </td>
        <td> value1.2 </td>
        <td>
            <p> paragraph 1 </p>
        </td>
    </tr>
    <tr>
        <td> keyword2 </td>
        <td> value2.2 </td>
        <td>
            <p> paragraph 2 </p>
            <p> paragraph 3 </p>
        </td>
    </tr>
    <tr>
        <td> keyword3 </td>
        <td> value3.2</td>
        <td>
            <p> paragraph 1 </p>
            <p> paragraph 3 </p>
            <p> </p>
        </td>
    </tr>
</table>

What method would you suggest for loading it via apoc.load.html and apoc.create.node or apoc.merge.node, so that the headers are used dynamically as node property names?

It should be the dynamic equivalent of the static code below:

MERGE (:node {name: "keyword1", header2: "value1.2"})-[:R]->(:header3 {name: "paragraph 1"})

MERGE (:node {name: "keyword2", header2: "value2.2"})-[:R]->(:header3 {name: "paragraph 2"})
MERGE (:node {name: "keyword2", header2: "value2.2"})-[:R]->(:header3 {name: "paragraph 3"})

MERGE (:node {name: "keyword3", header2: "value3.2"})-[:R]->(:header3 {name: "paragraph 1"})
MERGE (:node {name: "keyword3", header2: "value3.2"})-[:R]->(:header3 {name: "paragraph 3"})

I wrote the code below:

// 999. SAMPLE CODE
CALL apoc.load.html("file:///C:/Users/sesa407003/Desktop/CURRENT%20PROJECTS/NEO4J/doc_start.html",{line: "table tr"}) yield value as lineList

CALL apoc.load.html("file:///doc_start.html",{header: "table tr th"}) yield value as headersList

UNWIND range(1, length(lineList.line) -1) as j
//with j,i,source
CALL apoc.load.html("file:///doc_start.html",{value: "table tr:eq("+j+") td"}) yield value as valueList
CALL apoc.merge.node(["node"], {name:valueList.value[2].text}) yield node as source
UNWIND range(0,length(headersList.header)-2) as i
CALL apoc.create.setProperties(source,[headersList.header[i].text],[valueList.value[i].text]) yield node
CALL apoc.load.html("file:///doc_start.html",{paragraphs: "table tr:eq("+j+") td:eq(2) p"}) yield value as paragraphsList
UNWIND paragraphsList.paragraphs as paragraph
MERGE(target:dashboard {name:paragraph.text})
MERGE(source)-[:R]->(target)
return *

It seems to work, but when I try to remove empty paragraphs, such as the last one under keyword3, I can't find the right syntax for WHERE, CASE WHEN, or apoc.case.when.


Answer 1:


I took a look at your Cypher and made a few changes. I expect they get you closer to your end state.

To remove the empty paragraph, I added this small WITH block:

WITH source, paragraph.text AS para
WHERE trim(para) <> ""
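
To see the filter in isolation, here is a minimal sketch that runs without the HTML file. The list literal is a made-up stand-in for the paragraph texts of the keyword3 row, not data returned by apoc.load.html.

```cypher
// Hypothetical stand-in for the paragraph texts of the keyword3 row
UNWIND ["paragraph 1", "paragraph 3", "", "   "] AS para
// WHERE belongs to WITH (or MATCH); it cannot directly follow UNWIND
WITH para
WHERE trim(para) <> ""
RETURN para
```

Only the two non-blank strings should come back. The intermediate WITH is what gives WHERE something to attach to, which is probably why a bare WHERE after UNWIND did not parse.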

I also changed a few of the array indexes to get the right data from the table.
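
As a rough illustration of the index changes, the sketch below uses two invented list literals in place of the cell and header texts of the first data row. Cypher lists are zero-based, so the keyword sits at index 0 and the paragraph cell at index 2.

```cypher
// Hypothetical stand-ins for one row's cell texts and the header texts
WITH ["keyword1", "value1.2", "paragraph 1"] AS cells,
     ["header 1", "header 2", "header 3"] AS headers
// range(0, size(headers) - 2) yields 0 and 1, so the paragraph column is skipped
UNWIND range(0, size(headers) - 2) AS i
RETURN i,
       headers[i] AS propertyName,
       cells[i]   AS propertyValue,
       cells[0]   AS nodeName
```

The full query below also calls size() where the question's code called length(). As far as I know, Neo4j 4.0 and later accept only size() for lists, so this change may matter depending on your version.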

// Count the rows and read the header cells
CALL apoc.load.html("file:///table.html", {line: "table tr"}) YIELD value AS lineList
CALL apoc.load.html("file:///table.html", {header: "table tr th"}) YIELD value AS headersList
// Row 0 holds the headers, so the data rows start at index 1
UNWIND range(1, size(lineList.line) - 1) AS j
CALL apoc.load.html("file:///table.html", {value: "table tr:eq(" + j + ") td"}) YIELD value AS valueList
// The first cell of each row is the node name
CALL apoc.merge.node(["node"], {name: valueList.value[0].text}) YIELD node AS source
// Copy every column except the last one as a property named after its header
UNWIND range(0, size(headersList.header) - 2) AS i
CALL apoc.create.setProperties(source, [headersList.header[i].text], [valueList.value[i].text]) YIELD node
// The last column holds the paragraphs
CALL apoc.load.html("file:///table.html", {paragraphs: "table tr:eq(" + j + ") td:eq(2) p"}) YIELD value AS paragraphsList
UNWIND paragraphsList.paragraphs AS paragraph
// Skip empty paragraphs
WITH source, paragraph.text AS para
WHERE trim(para) <> ""
MERGE (target:dashboard {name: para})
MERGE (source)-[:R]->(target)
RETURN *
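
One possible variation, which is not part of the answer above: instead of looping over the header index, the property map can be built in one step with apoc.map.fromLists and applied with SET +=. The sketch below assumes APOC is installed and again uses invented list literals in place of the loaded header and cell texts. Note that it writes one node to the database.

```cypher
// Hypothetical stand-ins for the header texts and one row's cell texts
WITH ["header 1", "header 2", "header 3"] AS headers,
     ["keyword1", "value1.2", "paragraph 1"] AS cells
// Slices exclude the end index, so [0..2] keeps the first two entries
WITH apoc.map.fromLists(headers[0..2], cells[0..2]) AS props
MERGE (source:node {name: props["header 1"]})
// Add every header/value pair as a property in one go
SET source += props
RETURN source
```

Building the map once per row would avoid repeating the paragraph lookup for every header index, although the MERGE clauses make the repeated work harmless either way.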


Source: https://stackoverflow.com/questions/59863758/neo4j-call-apoc-load-html-with-headers
