Question
I am using Titan 0.4 with Gremlin for traversals. My requirement is to identify duplicate vertices in the graph and merge them. There are more than 15 million vertices in the graph.
gremlin> g.V.has('domain').groupBy{it.domain}{it.id}.cap
==>{google.com=[4], yahoo.com=[16, 24, 20]}
I am able to group the vertices, but I need only those domains (vertices) that exist more than once.
In the above example, I need to return only ==>{yahoo.com=[16, 24, 20]}
The key "domain" is indexed, if that makes any difference.
Please help me here
Answer 1:
Consider using groupCount rather than groupBy to save the step of counting up the ids in your collected lists:
g.V.has('domain').groupCount{it.domain}.cap.next().findAll{it.value>1}
I suppose this is cheaper as well on a larger traversal as you are just maintaining a counter rather than lists of identifiers.
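Note that groupCount returns only counts per domain, not the vertex ids needed for a merge. If the ids are required, the same findAll filter can probably be applied to the groupBy result instead. Below is a minimal sketch in plain Groovy; the `grouped` map is a hand-written stand-in for the map shown in the question's console output (an assumption for illustration, not data read from a real graph), so the snippet runs without Titan.

```groovy
// Stand-in for the map produced by the question's traversal:
//   g.V.has('domain').groupBy{it.domain}{it.id}.cap.next()
// The values are copied from the question's console output.
def grouped = ['google.com': [4], 'yahoo.com': [16, 24, 20]]

// Keep only the domains whose id list holds more than one vertex.
def duplicates = grouped.findAll { it.value.size() > 1 }

println duplicates
```

In the Gremlin console, the equivalent would presumably be to append `.next().findAll{it.value.size() > 1}` to the original `groupBy ... .cap` traversal. The whole map is still built in memory first, so on 15 million vertices this costs more than the groupCount variant.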
Answer 2:
Old question, but did you try the following to force use of the index?
g.V.hasNot('domain', null).groupBy{it.domain}{it.id}.cap
Source: https://stackoverflow.com/questions/30164261/gremlin-groupby-vertices-having-count-1