HBase and Hadoop

 ̄綄美尐妖づ posted on 2019-12-13 02:31:19

Question


From what I have read so far, HBase requires a Hadoop installation. It looks like HBase can be set up either on an existing Hadoop cluster (shared with other users) or on a dedicated Hadoop cluster. I guess the latter would be the safer configuration, but I am wondering whether anybody has experience with the former (though I am not sure my understanding of HBase setup is correct).


Answer 1:


I know that Facebook and other large organizations separate their HBase cluster (real-time access) from their Hadoop cluster (batch analytics) for performance reasons. Large MapReduce jobs on a shared cluster can degrade the performance of the real-time interface, which can be problematic.

In a smaller organization or in a situation in which your HBase response time doesn't necessarily need to be consistent, you can just use the same cluster.

Other than performance, there are few (if any) concerns with coexistence.




Answer 2:


We've set it up on an existing Hadoop cluster that's 1,000 cores strong. Short answer: it works just fine, at least with Cloudera CH2 +149.88. Depending on your Hadoop version, though, your mileage may vary.




Answer 3:


In distributed mode, HBase uses Hadoop for its HDFS storage. HBase stores its HFiles on HDFS, and so benefits from the replication and data locality that the datanodes provide.

RegionServers mostly handle data that is local to them, but may still have to fetch data from other datanodes.
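As a rough illustration of what "using an existing Hadoop cluster" means in practice, pointing HBase at an HDFS instance is mostly a matter of setting `hbase.rootdir` in `hbase-site.xml`. The fragment below is only a sketch: the NameNode hostname, the port, and the `/hbase` directory are placeholder assumptions, not values from this thread, and a real deployment needs further settings (such as the ZooKeeper quorum).

```xml
<configuration>
  <!-- Directory on HDFS where HBase keeps its data (HFiles, WALs).
       Hostname, port, and path are placeholders; use your own NameNode address. -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:8020/hbase</value>
  </property>
  <!-- Run in distributed mode rather than standalone mode. -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
```

The same setting applies whether the HDFS cluster is shared or dedicated; only the address it points to changes.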

Hope that helps you understand why and how Hadoop is used with HBase.



Source: https://stackoverflow.com/questions/9935863/hbase-and-hadoop
