Finding Connected Components using Hadoop/MapReduce

误落风尘 2021-02-06 06:11

I need to find the connected components of a huge dataset (the graph is undirected).

One obvious choice is MapReduce. But I'm a newbie to MapReduce and am quite short of time…

4 Answers
  •  故里飘歌
    2021-02-06 07:01

    I wrote a blog post about this:

    http://codingwiththomas.blogspot.de/2011/04/graph-exploration-with-hadoop-mapreduce.html

    But MapReduce isn't a good fit for this kind of graph analysis. BSP (bulk synchronous parallel) is a better choice, and Apache Hama provides a good graph API on top of Hadoop HDFS.

    I've written a connected components algorithm (a "mindist search") with MapReduce here:

    https://github.com/thomasjungblut/tjungblut-graph/tree/master/src/de/jungblut/graph/mapreduce
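
    As a rough illustration of the idea (not the code from the repository above), here is a minimal in-memory sketch. It assumes that "mindist search" means propagating the smallest vertex id through each component: every vertex starts with its own id as a label, each round sends labels to neighbours, and each vertex keeps the minimum it receives. The class and method names (MinLabelPropagation, round, undirected) are made up for this example, and the loop stands in for what would be one MapReduce job per round on a real cluster.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: min-label propagation written as simulated map/reduce rounds.
final class MinLabelPropagation {

    // Builds an undirected adjacency list, so every endpoint appears as a key.
    static Map<Integer, List<Integer>> undirected(int[][] edges) {
        Map<Integer, List<Integer>> adjacency = new TreeMap<>();
        for (int[] e : edges) {
            adjacency.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adjacency.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adjacency;
    }

    // One simulated round; returns true if any vertex label got smaller.
    static boolean round(Map<Integer, List<Integer>> adjacency, Map<Integer, Integer> labels) {
        // "Map" phase: each vertex emits its label to itself and to every neighbour.
        Map<Integer, List<Integer>> shuffled = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> e : adjacency.entrySet()) {
            int label = labels.get(e.getKey());
            shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(label);
            for (int neighbour : e.getValue()) {
                shuffled.computeIfAbsent(neighbour, k -> new ArrayList<>()).add(label);
            }
        }
        // "Reduce" phase: each vertex keeps the smallest label it received.
        boolean changed = false;
        for (Map.Entry<Integer, List<Integer>> e : shuffled.entrySet()) {
            int min = Collections.min(e.getValue());
            if (min < labels.get(e.getKey())) {
                labels.put(e.getKey(), min);
                changed = true;
            }
        }
        return changed;
    }

    // Repeats rounds until no label changes; vertices with equal labels share a component.
    static Map<Integer, Integer> connectedComponents(Map<Integer, List<Integer>> adjacency) {
        Map<Integer, Integer> labels = new TreeMap<>();
        for (Integer vertex : adjacency.keySet()) {
            labels.put(vertex, vertex);
        }
        while (round(adjacency, labels)) {
            // On Hadoop, each iteration here would be a separate job.
        }
        return labels;
    }

    public static void main(String[] args) {
        int[][] edges = {{1, 2}, {2, 3}, {4, 5}};
        System.out.println(connectedComponents(undirected(edges)));
    }
}
```

    The number of rounds grows with the diameter of the largest component, so a long chain of vertices needs many iterations. Vertices without any edge never appear in the edge list and would have to be added as their own components separately.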

    Also a BSP version for Apache Hama can be found here:

    https://github.com/thomasjungblut/tjungblut-graph/blob/master/src/de/jungblut/graph/bsp/MindistSearch.java

    The BSP implementation is less difficult than the MapReduce one, and it is at least 10 times faster. If you're interested, check out the latest version in trunk and visit our mailing list.
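
    To give a feel for why the BSP formulation tends to be simpler, here is a small single-process sketch of superstep-style label propagation. It is not Hama's API and not the code linked above; BspMinLabel, fromEdges and run are hypothetical names, and the loop only imitates supersteps. The assumed differences from the MapReduce form are that vertex state stays in memory between supersteps and that only vertices whose label changed send messages in the next one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: BSP-style supersteps simulated in a single process.
final class BspMinLabel {

    // Builds an undirected adjacency list, so every endpoint appears as a key.
    static Map<Integer, List<Integer>> fromEdges(int[][] edges) {
        Map<Integer, List<Integer>> adjacency = new TreeMap<>();
        for (int[] e : edges) {
            adjacency.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adjacency.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adjacency;
    }

    static Map<Integer, Integer> run(Map<Integer, List<Integer>> adjacency) {
        Map<Integer, Integer> labels = new TreeMap<>();
        for (Integer vertex : adjacency.keySet()) {
            labels.put(vertex, vertex);
        }
        // In the first superstep every vertex is active.
        Set<Integer> active = new HashSet<>(adjacency.keySet());
        while (!active.isEmpty()) {
            // Compute + communicate: active vertices send their label to neighbours.
            // The inbox keeps only the smallest message per receiver.
            Map<Integer, Integer> inbox = new HashMap<>();
            for (int vertex : active) {
                int label = labels.get(vertex);
                for (int neighbour : adjacency.get(vertex)) {
                    inbox.merge(neighbour, label, Math::min);
                }
            }
            // Barrier: after all messages are delivered, receivers update their state.
            // A vertex that did not improve its label stays silent next superstep.
            active = new HashSet<>();
            for (Map.Entry<Integer, Integer> message : inbox.entrySet()) {
                if (message.getValue() < labels.get(message.getKey())) {
                    labels.put(message.getKey(), message.getValue());
                    active.add(message.getKey());
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        int[][] edges = {{1, 2}, {2, 3}, {4, 5}};
        System.out.println(run(fromEdges(edges)));
    }
}
```

    The sketch says nothing about actual cluster performance; it only shows the shape of the computation, where the loop ends as soon as no vertex has anything new to send.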

    http://hama.apache.org/

    http://apache.org/hama/mail-lists.html
