Why isn't Hadoop implemented using MPI?

一个人的身影 2021-01-30 01:41

Correct me if I'm wrong, but my understanding is that Hadoop does not use MPI for communication between different nodes.

What are the technical reasons for this?

6 Answers
  •  你的背包
    2021-01-30 02:32

    There is no restriction that prevents MPI programs from using local disks. MPI programs, like all parallel applications, naturally try to work on data locally, in RAM or on local disk. And MPI 2.0 (which is not some future version; it has been around for a decade) allows processes to be added and removed dynamically, which makes it possible to build applications that can recover from, say, a process dying on some node.
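    As a rough sketch of that dynamic-process facility: a parent program can launch new workers at runtime with `MPI_Comm_spawn` (the executable name "worker" and the count of 4 are made-up values for illustration; this needs an MPI implementation, e.g. compile with `mpicc` and run under `mpirun`):

    ```c
    /* Sketch: MPI 2.0 dynamic process management via MPI_Comm_spawn.
     * "worker" is a hypothetical executable name used for illustration. */
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        MPI_Comm workers;      /* intercommunicator to the spawned group */
        int errcodes[4];

        MPI_Init(&argc, &argv);

        /* Launch 4 new worker processes at runtime; if one later dies,
         * the parent could detect the failure and spawn a replacement
         * the same way. */
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &workers, errcodes);

        /* ... exchange data with the workers over the intercommunicator ... */

        MPI_Comm_disconnect(&workers);
        MPI_Finalize();
        return 0;
    }
    ```

    Note that this only provides the mechanism; actually detecting a dead process and rebuilding application state around it is left entirely to the programmer, which is part of why fault tolerance in MPI takes discipline.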

    Perhaps Hadoop is not using MPI because MPI usually requires coding in C or Fortran and has a more scientific/academic developer culture, while Hadoop seems to be driven more by IT professionals with a strong Java bias. MPI is very low-level and error-prone, but it allows very efficient use of hardware, RAM and network. Hadoop tries to be high-level and robust, at an efficiency penalty. MPI programming requires discipline and great care to be portable, and still requires compilation from source code on each platform. Hadoop is highly portable, easy to install, and allows pretty quick-and-dirty application development. The two have different scopes.

    Still, perhaps the Hadoop hype will be followed by more resource-efficient alternatives, possibly based on MPI.
