Apache Helix vs YARN

Submitted by 点点圈 on 2019-12-03 03:51:32

Question


What is the difference between Apache Helix and Hadoop YARN (MRv2)? Does anyone have experience with both technologies? Can someone explain to me the advantages and disadvantages of Helix compared with YARN, and why the LinkedIn engineers developed their own cluster management framework instead of using YARN?

Thanks in advance, Tobi


Answer 1:


While Helix and YARN both provide capabilities to manage distributed applications, there are important differences between the two.

YARN primarily provides resource management capabilities across a cluster of machines while requiring applications to write their custom logic to negotiate resources from the resource manager. On the other hand, Helix provides a way of declaratively managing the state of distributed applications, thus freeing the applications from having to do a custom implementation. At this time, Helix does not provide resource management capabilities in the same way as YARN. Thus the two systems are quite complementary.
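To make "declaratively managing the state" a bit more concrete, here is a minimal sketch of the general idea: the application only states which partition should be in which state on which container, and a controller works out the transitions. This is not the Helix API; the function `reconcile`, the state names `ONLINE`/`OFFLINE`, and the partition and container names are all made up for illustration.

```python
# Hypothetical sketch of declarative state management. Not the Helix API.

def reconcile(desired, current):
    """Return (partition, container, from_state, to_state) transitions
    that would move the `current` state to the `desired` state."""
    transitions = []
    # Look at every (partition, container) pair mentioned on either side.
    for key in sorted(set(desired) | set(current)):
        have = current.get(key, "OFFLINE")  # absent means OFFLINE
        want = desired.get(key, "OFFLINE")
        if have != want:
            partition, container = key
            transitions.append((partition, container, have, want))
    return transitions


# What the application declares it wants.
desired = {("p0", "c1"): "ONLINE", ("p1", "c2"): "ONLINE"}
# What is actually running right now.
current = {("p0", "c1"): "ONLINE", ("p1", "c3"): "ONLINE"}

plan = reconcile(desired, current)
for step in plan:
    print(step)
```

The application never writes the "bring p1 up on c2, take it down on c3" logic itself; it only changes `desired`.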

As an illustration, assume you have a set of nodes and you want to start some containers on them. You need to:

  1. Allocate containers among the nodes based on resource utilization.
  2. Start the containers.
  3. Monitor the containers and restart them if they die.
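The three steps in this list can be sketched roughly as follows. This is a toy model, not YARN code: `allocate`, `Container`, and `monitor` are hypothetical names, and "utilization" is simplified to the number of containers already running on each node.

```python
# Toy model of allocate / start / monitor. Not the YARN API.

def allocate(node_load, count):
    """Step 1: place `count` containers, each on the least-loaded node."""
    load = dict(node_load)  # copy so the caller's dict is not modified
    placement = []
    for _ in range(count):
        # Ties are broken by node name because the names are sorted first.
        node = min(sorted(load), key=load.get)
        placement.append(node)
        load[node] += 1
    return placement


class Container:
    def __init__(self, node):
        self.node = node
        self.alive = False
        self.restarts = 0

    def start(self):
        """Step 2: start (or restart) the container."""
        self.alive = True


def monitor(containers):
    """Step 3: restart dead containers; return how many were restarted."""
    restarted = 0
    for c in containers:
        if not c.alive:
            c.start()
            c.restarts += 1
            restarted += 1
    return restarted


placement = allocate({"node-a": 2, "node-b": 0, "node-c": 1}, count=3)
containers = [Container(node) for node in placement]
for c in containers:
    c.start()

containers[0].alive = False  # simulate a crash
restarted = monitor(containers)
print(placement, restarted)
```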

YARN provides the framework/machinery for these three steps. Once you have the containers, you still have to implement the following features yourself:

  1. Partitioning and replication: You need to distribute tasks to containers, possibly allocating multiple tasks to each container. For redundancy, you might choose to allocate a task to multiple containers.
  2. State management: You need to manage the state of each task.
  3. Fault tolerance: When a container fails, you might choose either to redistribute work among the remaining containers or to restart the container, depending on the SLA requirement.
  4. Cluster expansion: You might start new containers to handle the workload, and then you want the tasks to be redistributed.
  5. Throttling: During all these operations, you might want to limit some operations, such as data movement.
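As a rough idea of what items 1, 3, and 4 in this list involve, here is a minimal sketch that spreads task replicas over containers round-robin and recomputes the assignment when the set of containers changes. The function `assign` and the task and container names are hypothetical. A real system would also try to minimize and throttle the resulting data movement (item 5), which this sketch ignores.

```python
# Hypothetical sketch of task partitioning and replication.

def assign(tasks, containers, replicas=1):
    """Place each task on `replicas` distinct containers, round-robin."""
    if replicas > len(containers):
        raise ValueError("not enough containers for the requested replica count")
    assignment = {c: [] for c in containers}
    for i, task in enumerate(tasks):
        for r in range(replicas):
            # Consecutive offsets give distinct containers for one task.
            container = containers[(i + r) % len(containers)]
            assignment[container].append(task)
    return assignment


tasks = ["t0", "t1", "t2", "t3"]

# Initial layout: three containers, two replicas per task.
before = assign(tasks, ["c0", "c1", "c2"], replicas=2)

# Fault tolerance: c1 died, redistribute over the survivors.
after_failure = assign(tasks, ["c0", "c2"], replicas=2)

# Cluster expansion: a new container c3 joins.
after_expansion = assign(tasks, ["c0", "c1", "c2", "c3"], replicas=2)

print(before)
print(after_failure)
print(after_expansion)
```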

Helix makes it easy to achieve the five features listed above. In YARN, one needs to write an application master to achieve them (an example of such an implementation is the application master for Hadoop MapReduce jobs).

Helix was developed at LinkedIn to manage distributed data systems in the online/nearline space. In this space, once a container is launched, it runs forever until it crashes. When a container fails, its tasks might be redistributed among the remaining containers.

YARN comes with resource scheduling algorithms that allow flexible and efficient utilization of the available hardware for short-lived tasks such as MapReduce jobs.



Source: https://stackoverflow.com/questions/16401412/apache-helix-vs-yarn
