Avoid duplicate entries when inserting Excel or CSV-like entries into a Neo4j graph

Submitted by 只谈情不闲聊 on 2019-12-14 03:05:59

Question


I have the following .xlsx file:

My software, whatever language it is written in, should produce the following graph:

My software iterates over the file line by line and, on each iteration, executes the following query:

MERGE (A:POINT {x:{xa},y:{ya}})
MERGE (B:POINT {x:{xb},y:{yb}})
MERGE (C:POINT {x:{xc},y:{yc}})
MERGE (A)-[:LINKS]->(B)-[:LINKS]->(C)
MERGE (C)-[:LINKS]->(A)

Will this avoid inserting duplicate entries?


Answer 1:


According to this question, yes, it will avoid writing duplicate entries.

The query above matches any nodes that already exist and creates only the missing ones, so it does not write duplicates.

A good rule of thumb: write a separate MERGE clause for each node that may be a duplicate, and afterwards write one MERGE clause for each relationship between two nodes.
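
As an illustration of that rule of thumb, here is a minimal sketch that builds the query with one MERGE per node followed by one MERGE per relationship. MERGE matches or creates a whole pattern at once, so keeping each relationship in its own clause prevents a link that already exists from being created again just because another link in the same path is missing. The helper name buildMergeQuery and its arguments are hypothetical, and the sketch assumes the $name parameter form used by current Neo4j versions rather than the older {name} form shown in the question.

```javascript
// Hypothetical helper: one MERGE per node, then one MERGE per relationship.
// `aliases` are the node variables, `links` are [from, to] pairs of aliases.
function buildMergeQuery(aliases, links) {
  const nodeClauses = aliases.map((alias) => {
    const suffix = alias.toLowerCase(); // parameter names: $xa, $ya, $xb, ...
    return `MERGE (${alias}:POINT {x: $x${suffix}, y: $y${suffix}})`;
  });
  const linkClauses = links.map(
    ([from, to]) => `MERGE (${from})-[:LINKS]->(${to})`
  );
  return nodeClauses.concat(linkClauses).join('\n');
}

const query = buildMergeQuery(
  ['A', 'B', 'C'],
  [['A', 'B'], ['B', 'C'], ['C', 'A']]
);
console.log(query);
```

The resulting text has three node clauses followed by three relationship clauses and can be sent with the parameters xa, ya, xb, yb, xc and yc for each line of the file.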

Update

After some experience: when using asynchronous technologies such as Node.js, or even parallel threads, you must make sure that you read the next line only AFTER you have inserted the previous one. The reason is that performing multiple insertions asynchronously may leave multiple nodes in your graph that are actually the same one.

In a Node.js project of mine I read the Excel file like this:

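const _ = require('lodash'); // assumed; underscore provides the same _.range and _.each

// `config` is the project's configuration object, defined elsewhere.
const iterateWorksheet = function (worksheet, maxRows, row, callback) {
  process.nextTick(function () {
    // Skip the first row
    if (row == 1) {
      return iterateWorksheet(worksheet, maxRows, 2, callback);
    }

    // Stop once every row has been visited
    if (row > maxRows) {
      return;
    }

    // Character codes of the columns to read; note that _.range excludes its end value
    const alphas = _.range('A'.charCodeAt(0), config.excell.maxColumn.charCodeAt(0));

    let rowData = {};

    // Collect the mapped cells of this row, e.g. cell "A2" for column A, row 2
    _.each(alphas, (column) => {
      column = String.fromCharCode(column);
      const item = column + row;
      const key = config.excell.columnMap[column];
      if (worksheet[item] && key) {
        rowData[key] = worksheet[item].v;
      }
    });

    // The callback performs the insertion into the Neo4j database
    return callback(rowData, (error) => {
      // Move on to the next row only after this one was inserted without error
      if (!error) {
        return iterateWorksheet(worksheet, maxRows, row + 1, callback);
      }
    });
  });
};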

As you can see, I visit the next line only once the previous one has been inserted successfully. I have not yet found a way to serialize the inserts the way most conventional RDBMSs do.
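
The same one-row-at-a-time rule can also be sketched with async/await instead of recursive callbacks. This is only an illustration under assumptions: insertSequentially and fakeInsert are hypothetical names, and fakeInsert stands in for a real database write by recording when each insert starts and ends.

```javascript
// Runs insertRow for each row strictly one after another.
async function insertSequentially(rows, insertRow) {
  for (const row of rows) {
    await insertRow(row); // the next row starts only after this one resolves
  }
}

// Hypothetical stand-in for a database write: logs start and end of each insert.
const events = [];
const fakeInsert = (row) =>
  new Promise((resolve) => {
    events.push(`start ${row.id}`);
    setTimeout(() => {
      events.push(`end ${row.id}`);
      resolve();
    }, 5);
  });

const done = insertSequentially([{ id: 1 }, { id: 2 }, { id: 3 }], fakeInsert);
done.then(() => console.log(events.join(', ')));
```

Each "start" entry is recorded only after the previous row's "end", so two inserts never overlap.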

For web or server applications, another UNTESTED approach is to use a queue server such as RabbitMQ to queue the queries. The code responsible for the insertion then reads from the queue, so the isolation is provided entirely by the queue.
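
To make the queueing idea concrete, here is a small in-process sketch. It is not RabbitMQ and does not use any message broker API; it swaps in a plain promise chain to show the principle that many producers may enqueue work at the same time while a single consumer runs the writes one by one. The class SerialQueue and the helper writeAfter are hypothetical names.

```javascript
// In-memory stand-in for a queue with a single consumer (not a real broker).
class SerialQueue {
  constructor() {
    this.tail = Promise.resolve(); // end of the chain of queued tasks
  }

  // Enqueue a function returning a promise; it runs after all earlier tasks.
  push(task) {
    const result = this.tail.then(() => task());
    this.tail = result.catch(() => {}); // a failed task must not block later ones
    return result;
  }
}

// Hypothetical write that takes `ms` milliseconds and records when it finishes.
const finished = [];
const writeAfter = (label, ms) => () =>
  new Promise((resolve) =>
    setTimeout(() => {
      finished.push(label);
      resolve(label);
    }, ms)
  );

const queue = new SerialQueue();
const allDone = Promise.all([
  queue.push(writeAfter('first', 20)),
  queue.push(writeAfter('second', 1)),
  queue.push(writeAfter('third', 5)),
]);
```

Although the second and third writes are faster, they complete in the order they were queued, because each one starts only after the previous one has finished.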

Furthermore, ensure that all inserts run inside a transaction.
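
A rough sketch of what that could look like, assuming the session object behaves like a session of the official neo4j-driver package in version 5.x, where executeWrite runs a unit of work inside a managed write transaction. The function name insertRowInTransaction is hypothetical, and fakeSession is only a stand-in so the example runs without a database.

```javascript
// `session` is assumed to behave like a neo4j-driver 5.x session.
// The whole row is written inside one managed write transaction.
async function insertRowInTransaction(session, query, params) {
  return session.executeWrite((tx) => tx.run(query, params));
}

// Hypothetical stand-in session: records what a real session would be asked to do.
const calls = [];
const fakeSession = {
  executeWrite: async (work) => {
    calls.push('begin');
    const result = await work({
      run: async (query, params) => {
        calls.push(`run ${JSON.stringify(params)}`);
        return { records: [] };
      },
    });
    calls.push('commit');
    return result;
  },
};

const written = insertRowInTransaction(
  fakeSession,
  'MERGE (A:POINT {x: $xa, y: $ya})',
  { xa: 1, ya: 2 }
);
```

With a real driver session in place of fakeSession, the query for a line either commits as a whole or is rolled back as a whole.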



Source: https://stackoverflow.com/questions/48317975/avoid-duplicate-entires-when-inserting-excel-or-csv-like-entries-into-a-neo4j-gr
