Correct me if I'm wrong, but my understanding is that Hadoop does not use MPI for communication between different nodes.
What are the technical reasons for this?
There is no restriction that prevents MPI programs from using local disks. And of course MPI programs always attempt to work locally on data - in RAM or on local disk - just like all parallel applications. In MPI 2.0 (which is not a future version; it has been around for a decade) processes can be added and removed dynamically, which makes it possible to implement applications that can recover from, e.g., a process dying on some node.
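To make the dynamic-process point concrete, here is a minimal C sketch using `MPI_Comm_spawn` and `MPI_ERRORS_RETURN`, the MPI-2 building blocks for adding processes at runtime and reacting to communication failures instead of aborting. This assumes an MPI-2 implementation (e.g. Open MPI or MPICH); the `worker` executable name is hypothetical.

```c
/* Sketch: MPI-2 dynamic process management. Compile with mpicc,
 * run with mpirun; "worker" is a hypothetical executable name. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Spawn 4 additional worker processes at runtime -- something
       the static process model of MPI-1 could not do. */
    MPI_Comm workers;
    int errcodes[4];
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &workers, errcodes);

    /* Have MPI return error codes instead of aborting the job, so
       the application can notice a failed communication (one
       building block for surviving a dead process). */
    MPI_Comm_set_errhandler(workers, MPI_ERRORS_RETURN);

    int rc = MPI_Barrier(workers);
    if (rc != MPI_SUCCESS)
        fprintf(stderr, "worker communication failed; "
                        "the parent could respawn here\n");

    MPI_Comm_disconnect(&workers);
    MPI_Finalize();
    return 0;
}
```

This is only a skeleton: real fault tolerance on top of plain MPI-2 still requires the application itself to detect the failure and rebuild the communicator, which is exactly the kind of low-level care the answer below contrasts with Hadoop's built-in task re-execution.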
Perhaps Hadoop is not using MPI because MPI usually requires coding in C or Fortran and has a more scientific/academic developer culture, while Hadoop seems to be driven more by IT professionals with a strong Java bias. MPI is very low-level and error-prone, but it allows very efficient use of hardware, RAM and network. Hadoop tries to be high-level and robust, with an efficiency penalty. MPI programming requires discipline and great care to be portable, and still requires compilation from source code on each platform. Hadoop is highly portable, easy to install and allows pretty quick-and-dirty application development. It's a different scope.
Still, perhaps the Hadoop hype will be followed by more resource-efficient alternatives, perhaps based on MPI.