Gremlin: GroupBy vertices, having count > 1

Posted by 五迷三道 on 2019-12-11 03:17:44

Question


I am using Titan 0.4 with Gremlin for traversals. My requirement is to identify duplicate vertices in the graph and merge them. There are more than 15 million vertices in the graph.

gremlin> g.V.has('domain').groupBy{it.domain}{it.id}.cap

==>{google.com=[4], yahoo.com=[16, 24, 20]}

I am able to group the vertices, but I need only those domains (vertices) that exist more than once.

In the above example, I need to return only

==>{yahoo.com=[16, 24, 20]}

The key "domain" is indexed, if that makes any difference.

Please help me here


Answer 1:


Consider using groupCount rather than groupBy; it saves the step of counting the ids in each collected list:

g.V.has('domain').groupCount{it.domain}.cap.next().findAll{it.value > 1}

I suppose this is also cheaper on a large traversal, since you only maintain a counter per domain rather than a list of identifiers.
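
If the vertex ids are still needed for the merge step, the same findAll filter can be applied to the groupBy result instead, testing the size of each id list. A minimal sketch in plain Groovy, assuming the map literal below stands in for what g.V.has('domain').groupBy{it.domain}{it.id}.cap.next() returns (the values are copied from the question's sample output; the variable names are illustrative only):

```groovy
// Stand-in for the map returned by
// g.V.has('domain').groupBy{it.domain}{it.id}.cap.next()
// (sample values copied from the question's output).
def grouped = ['google.com': [4], 'yahoo.com': [16, 24, 20]]

// Keep only the domains that map to more than one vertex id.
def duplicates = grouped.findAll { it.value.size() > 1 }

println duplicates
```

This keeps the id lists in memory, so the trade-off against groupCount mentioned above still applies.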




Answer 2:


Old question, but did you try the following to force use of the index?

g.V.hasNot('domain', null).groupBy{it.domain}{it.id}.cap



Source: https://stackoverflow.com/questions/30164261/gremlin-groupby-vertices-having-count-1
