I have a relatively large set of nodes, and I want to find all pairs of nodes that have matching property values, but I don\'t know or care in advance what the property valu
You can also use an index on that property. Then for a given value retrieve all the nodes. The advantage is that you can also query for approximations of the value.
What about the following approach:
java.util.Map
containing all properties for a node. Calculate the map's hashCode()
Map
using the hashCode as key and a set of node.getId()
as valuesThis should give you the candidates for being duplicate. Be aware of the hashCode() semantics, there might be nodes with different properties mapping to the same hashCode.
Neo4j 3.1.1
HAS is no longer supported in Cypher, please use EXISTS instead.
If you want to find nodes with specific property, the Cyper is as follows:
MATCH (n:NodeLabel) where has(n.NodeProperty) return n
With Neo4j 3.3.4 you can simply do the following:
MATCH (n) where EXISTS(n.propertyName) return n
Simply change propertyName
to whatever property you are looking to find.
You can try this one who does which I think does whatever you want.
START n=node(*), m=node(*)
WHERE
HAS(n.name) AND HAS (m.name) AND
n.name=m.name AND
ID(n) <ID(m)
RETURN n, m
http://console.neo4j.org/?id=xe6wmt
Both nodes should have a name
property. name
should be equal for both nodes and we only want one pair of the two possibilites which we get via the id comparison. Not sure about performance - please test.
The best/easiest option is to do something like a local Map
. If you did something like this, you could create code like this:
GlobalGraphOperations ggo = GlobalGraphOperations.at(db);
Map<Object, Node> duplicateMap = new HashMap<Object, Node>();
for (Node node : ggo.getAllNodes()) {
Object propertyValue = node.getProperty("property");
Node existingNode = duplicateMap.get(propertyValue);
if (existingNode == null) {
duplicateMap.put(propertyValue, node);
} else {
System.out.println("Duplicate Node. First Node: " + existingNode + ", Second Node: " + node);
}
}
This would print out a list. If you needed to do more, like remove these nodes, you could do something in the else.
Do you know the property name? Will this be multiple properties, or just duplicates of a single name/value pair? If you are doing multiple properties, just create a map for each property you have.