Question
I have the following .xlsx file:
My software, regardless of its language, will produce the following graph:
My software iterates line by line and, on each iteration, executes the following query:
MERGE (A:POINT {x:{xa},y:{ya}})
MERGE (B:POINT {x:{xb},y:{yb}})
MERGE (C:POINT {x:{xc},y:{yc}})
MERGE (A)-[:LINKS]->(B)-[:LINKS]->(C)
MERGE (C)-[:LINKS]->(A)
Will this avoid inserting duplicate entries?
Answer 1:
According to this question, yes, it will avoid writing duplicate entries.
The query above matches any existing nodes, so it avoids writing duplicates.
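A good rule of thumb: put every node that may be a duplicate in its own MERGE clause, and afterwards write a separate MERGE for each relationship between two nodes. MERGE matches or creates the whole pattern, so if any part of a multi-hop pattern such as (A)-[:LINKS]->(B)-[:LINKS]->(C) is missing, the entire path is created again and a relationship that already existed (say A->B) gets duplicated.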
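Following that rule of thumb, the question's query can be split so that each node and each relationship has its own MERGE. This is a minimal sketch: `TRIANGLE_QUERY`, `PARAM_KEYS` and `buildTriangleParams` are hypothetical names, and the row object is assumed to already carry the keys `xa`, `ya`, `xb`, `yb`, `xc`, `yc`. It uses the `$param` parameter syntax of current Neo4j versions; the `{xa}` form in the question is the older syntax.

```javascript
// Hypothetical helper names; the Cypher mirrors the question's query but
// splits it so each node and each relationship gets its own MERGE.
const TRIANGLE_QUERY = [
  'MERGE (a:POINT {x: $xa, y: $ya})',
  'MERGE (b:POINT {x: $xb, y: $yb})',
  'MERGE (c:POINT {x: $xc, y: $yc})',
  // One MERGE per relationship: an existing A->B is matched, not duplicated,
  // even when B->C does not exist yet
  'MERGE (a)-[:LINKS]->(b)',
  'MERGE (b)-[:LINKS]->(c)',
  'MERGE (c)-[:LINKS]->(a)',
].join('\n');

const PARAM_KEYS = ['xa', 'ya', 'xb', 'yb', 'xc', 'yc'];

// Picks the six coordinates out of a row object (assumed shape) and rejects
// rows with empty cells, since MERGE cannot match on a null property value.
function buildTriangleParams(row) {
  const params = {};
  for (const key of PARAM_KEYS) {
    if (row[key] === undefined || row[key] === null) {
      throw new Error(`Row is missing a value for "${key}"`);
    }
    params[key] = row[key];
  }
  return params;
}
```

The query string and the parameter object are then passed together to whatever runs the query, e.g. `tx.run(TRIANGLE_QUERY, buildTriangleParams(rowData))`.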
Update
After some experience: when using asynchronous technologies such as Node.js, or even parallel threads, you must make sure that you read the next line only AFTER the previous one has been inserted. Otherwise two concurrent MERGE statements for the same point can both find no existing node and both create one, so running insertions asynchronously can leave multiple nodes in your graph that are actually the same one.
In a Node.js project of mine, I read the Excel file like this:
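In current Node.js, one way to get that ordering is `async`/`await` inside a plain `for...of` loop. This is a minimal sketch: `insertSequentially` is a hypothetical name, and `insertRow` stands for any function returning a promise (for instance one that runs the MERGE query for a single row).

```javascript
// Inserts rows strictly one after another: the next insert starts only
// after the previous promise has settled. `insertRow` is a hypothetical
// async function that writes one row to the database.
async function insertSequentially(rows, insertRow) {
  const results = [];
  for (const row of rows) {
    // Awaiting inside the loop is what serialises the inserts;
    // a rejection stops the loop and propagates to the caller
    results.push(await insertRow(row));
  }
  return results;
}
```

By contrast, `Promise.all(rows.map(insertRow))` starts every insert at once, which is exactly the concurrent pattern that can produce duplicate nodes.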
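// Assumes lodash for _.range/_.each; `config` is this project's own
// configuration object (config.excell.maxColumn, config.excell.columnMap).
const _ = require('lodash');

const iterateWorksheet = function (worksheet, maxRows, row, callback) {
  process.nextTick(function () {
    // Skip the first row
    if (row == 1) {
      return iterateWorksheet(worksheet, maxRows, 2, callback);
    }
    if (row > maxRows) {
      return;
    }
    // _.range excludes its end value, so +1 includes maxColumn itself
    const alphas = _.range('A'.charCodeAt(0), config.excell.maxColumn.charCodeAt(0) + 1);
    let rowData = {};
    _.each(alphas, (column) => {
      column = String.fromCharCode(column);
      const item = column + row; // cell address, e.g. "B7"
      const key = config.excell.columnMap[column];
      if (worksheet[item] && key) {
        rowData[key] = worksheet[item].v; // .v is the raw value of a SheetJS-style cell
      }
    });
    // The callback performs the insertion into the Neo4j database;
    // the next row is read only after it reports success
    return callback(rowData, (error) => {
      if (!error) {
        return iterateWorksheet(worksheet, maxRows, row + 1, callback);
      }
      // Stop on the first failed insert instead of failing silently
      console.error(`Inserting row ${row} failed:`, error);
    });
  });
};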
As you can see, I move on to the next line only after the previous one has been inserted successfully. I have not yet found a way to serialize the inserts on the database side, as most conventional RDBMSs do.
In the case of web or server applications, another UNTESTED approach is to use a queue server such as RabbitMQ to queue the queries. The code responsible for insertion then reads from the queue one query at a time, so the whole isolation happens in the queue.
Furthermore, make sure that all inserts run inside a transaction.
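The queue idea can be illustrated inside a single Node.js process with a promise-chain queue. Note that this is a swapped-in, in-process technique, not RabbitMQ: `SerialQueue` is a hypothetical class that runs pushed jobs one at a time, in submission order.

```javascript
// A minimal in-process job queue: every job is chained onto the previous
// one, so at most one job runs at a time, in submission order.
class SerialQueue {
  constructor() {
    this.tail = Promise.resolve();
  }

  // Enqueue an async job; returns a promise for that job's own result.
  push(job) {
    const result = this.tail.then(() => job());
    // Keep the chain alive even if this job fails
    this.tail = result.catch(() => {});
    return result;
  }
}
```

A request handler would call something like `queue.push(() => insertRow(row))`. Because the chain lives in one process's memory, it cannot serialize writes coming from several processes or servers; that is where an external broker such as RabbitMQ comes in.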
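With the official `neo4j-driver` package (assumed here, v5 API, where `session.executeWrite` replaced the older `writeTransaction`), a batch of rows can be written inside one managed transaction. `insertRowsInTransaction` is a hypothetical helper. The driver may retry the transaction function on transient errors, which is harmless here because MERGE-based queries are idempotent.

```javascript
// Writes all given rows inside ONE managed write transaction:
// either every row is committed, or none is. Queries run one at a time.
async function insertRowsInTransaction(session, rows, query) {
  return session.executeWrite(async (tx) => {
    for (const row of rows) {
      // `row` is used directly as the query's parameter map
      await tx.run(query, row);
    }
    return rows.length;
  });
}
```

With the real driver, `session` would come from `driver.session()` on a driver created with `neo4j.driver(uri, neo4j.auth.basic(user, password))`, and should be closed with `session.close()` when done.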
Source: https://stackoverflow.com/questions/48317975/avoid-duplicate-entires-when-inserting-excel-or-csv-like-entries-into-a-neo4j-gr