Difference between HBase and Hadoop/HDFS

悲&欢浪女 2020-12-02 03:31

This is kind of a naive question, but I am new to the NoSQL paradigm and don't know much about it. So if somebody can help me clearly understand the difference between the HBase and H

6 Answers
  •  生来不讨喜
    2020-12-02 04:25

    The Apache Hadoop project includes four key modules:

    1. Hadoop Common: The common utilities that support the other Hadoop modules.
    2. Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
    3. Hadoop YARN: A framework for job scheduling and cluster resource management.
    4. Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

    HBase is a scalable, distributed database that supports structured data storage for large tables. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. In other words, HDFS is the file system that stores the data, and HBase is a database that runs on top of it.

    When to use HBase:

    1. If your application has a variable schema where each row is slightly different
    2. If you find that your data is stored in collections that are all keyed on the same value
    3. If you need random, real-time read/write access to your Big Data
    4. If you need key-based access to data when storing or retrieving
    5. If you have a huge amount of data and an existing Hadoop cluster
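
The first, third, and fourth points can be pictured with a toy in-memory model. This is only a sketch in plain Python, not the HBase client API: the `ToyTable` class, its `put`/`get` methods, and the row keys and column names are all hypothetical and chosen for illustration. It shows rows that each carry a different set of `family:qualifier` columns and are read back directly by row key.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for an HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self):
        # row key -> {"family:qualifier": value}; rows are sparse, so each
        # row stores only the columns it actually has.
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Write a single cell; the column does not need to be declared first.
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Random read by key; returns an empty dict for an unknown row.
        return dict(self._rows.get(row_key, {}))


table = ToyTable()
table.put("user#1001", "info:name", "Ada")
table.put("user#1001", "info:email", "ada@example.com")
table.put("user#1002", "info:name", "Lin")
table.put("user#1002", "stats:logins", 42)  # a column user#1001 does not have

print(table.get("user#1001"))
print(table.get("user#1002"))
```

A real HBase table adds column families declared up front, cell versions, and distribution across region servers, none of which this sketch attempts to model.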

    But HBase has some limitations:

    1. It can't be used for classic transactional applications or even relational analytics.
    2. It is also not a complete substitute for HDFS when doing large batch MapReduce.
    3. It doesn’t speak SQL, have a query optimizer, or support cross-record transactions or joins.
    4. It can't be used with complicated access patterns (such as joins).

    Summary:

    Consider HBase when you’re loading data by key, searching data by key (or range), serving data by key, querying data by key or when storing data by row that doesn’t conform well to a schema.
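
Searching by key or range works well because HBase keeps rows sorted by row key, and a scan takes an inclusive start key and an exclusive stop key. The following is a minimal sketch of that idea in plain Python, assuming a single in-memory sorted list; `SortedToyTable`, its methods, and the `sensor…` row keys are hypothetical names, not part of HBase.

```python
import bisect


class SortedToyTable:
    """Toy table that keeps row keys sorted (hypothetical, not the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys in sorted order
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            # Insert the new key at its sorted position.
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def scan(self, start_key, stop_key):
        # Return rows with start_key <= row key < stop_key, in key order.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


events = SortedToyTable()
events.put("sensor1#2020-12-01", "d:temp", 20.5)
events.put("sensor1#2020-12-02", "d:temp", 21.0)
events.put("sensor2#2020-12-01", "d:temp", 18.2)

# All rows for sensor1: "$" is the character right after "#" in ASCII,
# so it serves as an exclusive upper bound for the "sensor1#" prefix.
for key, columns in events.scan("sensor1#", "sensor1$"):
    print(key, columns)
```

Because related rows share a key prefix, they sit next to each other and one range scan retrieves them together. This is why row key design matters so much more in HBase than column design.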

    Have a look at the Do's and Don'ts of HBase on the Cloudera blog.
