Finding Connected Components using Hadoop/MapReduce

误落风尘 2021-02-06 06:11

I need to find the connected components of a huge dataset (the graph is undirected).

One obvious choice is MapReduce, but I'm a newbie to MapReduce and am quite short of time…

4 Answers
  •  Happy的楠姐
    2021-02-06 06:54

    You may want to look at the PEGASUS project from Carnegie Mellon University. It provides an efficient and elegant implementation using MapReduce, along with binaries, samples, and very detailed documentation.

    The implementation itself is based on the Generalized Iterative Matrix-Vector multiplication (GIM-V) proposed by U Kang et al. in 2009.

    U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. "PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations." In IEEE International Conference on Data Mining (ICDM 2009).
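    As a rough illustration of how a GIM-V-style iteration finds connected components: every node starts with its own ID as its label, and each round replaces a node's label with the minimum of its current label and its neighbors' labels, stopping once no label changes. At that point every node carries the smallest node ID in its component. This is a single-machine Python sketch that simulates one MapReduce round per iteration; the function names are hypothetical (not PEGASUS's API), and the join of the edge list with the current label vector, which a real Hadoop job would run as its own stage, is simplified here to a dictionary lookup.

```python
from collections import defaultdict


def map_phase(edges, labels):
    """Map step (hypothetical): every node re-emits its own label, and each
    undirected edge (u, v) sends u's label to v and v's label to u."""
    for node, label in labels.items():
        yield node, label
    for u, v in edges:
        yield v, labels[u]
        yield u, labels[v]


def reduce_phase(pairs):
    """Shuffle + reduce step: group emitted labels by node, keep the minimum."""
    grouped = defaultdict(list)
    for node, label in pairs:
        grouped[node].append(label)
    return {node: min(values) for node, values in grouped.items()}


def connected_components(edges, nodes):
    """Run map/reduce rounds until no label changes.

    `nodes` must include every endpoint that appears in `edges`.
    Returns a dict mapping each node to the smallest node ID in its component.
    """
    labels = {node: node for node in nodes}
    while True:
        new_labels = reduce_phase(map_phase(edges, labels))
        if new_labels == labels:  # converged: no label got any smaller
            return labels
        labels = new_labels


if __name__ == "__main__":
    print(connected_components([(1, 2), (2, 3), (4, 5)], range(1, 7)))
    # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 6}
```

    The number of rounds is bounded by the component diameter plus one (the extra round detects convergence), so long, chain-like components need many passes, each of which is a full MapReduce job on a real cluster.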

    EDIT: The official implementation is actually limited to about 2.1 billion nodes, because node IDs are stored as 32-bit integers. I'm creating a fork on GitHub (https://github.com/placeiq/pegasus) to share my patch and other enhancements (e.g., Snappy compression).
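    The 2.1 billion figure is the largest value a signed 32-bit integer can hold, 2^31 - 1. A quick Python check (using `ctypes` only to mimic fixed-width integers; this is not PEGASUS code) shows that when the next ID is forced into a 32-bit slot, for example by a cast, it wraps around to a negative number, while a 64-bit type holds it without trouble.

```python
import ctypes

# Largest value of a signed 32-bit integer (the size of a Java int).
MAX_INT32 = 2**31 - 1
print(MAX_INT32)  # 2147483647, i.e. about 2.1 billion


def as_int32(node_id):
    """Truncate node_id to a signed 32-bit value. ctypes does no overflow
    checking, so out-of-range values wrap like a Java (int) cast."""
    return ctypes.c_int32(node_id).value


print(as_int32(MAX_INT32))                  # 2147483647: still fits
print(as_int32(MAX_INT32 + 1))              # -2147483648: next ID wraps negative
print(ctypes.c_int64(MAX_INT32 + 1).value)  # 2147483648: fits in 64 bits
```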
