When to use Hadoop, HBase, Hive and Pig?

时光说笑 2020-12-04 04:21

What are the benefits of using Hadoop, HBase, or Hive?

From my understanding, HBase avoi

16 answers
  •  再見小時候
    2020-12-04 05:04

    I recently implemented a Hive data platform at my firm and can speak to it first-hand, since I was a one-man team.

    Objective

    1. Make the daily web log files collected from 350+ servers queryable through a SQL-like language
    2. Replace the daily aggregation data generated through MySQL with Hive
    3. Build custom reports through queries in Hive

    Architecture Options

    I benchmarked the following options:

    1. Hive+HDFS
    2. Hive+HBase - queries were too slow, so I dropped this option

    Design

    1. Daily log files were transported to HDFS
    2. MR jobs parsed these log files and wrote their output files to HDFS
    3. Hive tables were created with partitions and locations pointing to those HDFS locations
    4. Hive query scripts (call it HQL if you like, as distinct from SQL) ran MR jobs in the background and generated the aggregation data
    5. All of these steps were put into an Oozie workflow, scheduled with a daily Oozie coordinator
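
    As a rough illustration of step 3, here is a minimal Python sketch that builds the HiveQL statement for registering one day's HDFS directory as a table partition. The table name `weblogs`, the base path `/data/weblogs`, and the partition column `dt` are assumptions made up for this example, not details from the original setup.

```python
from datetime import date


def add_partition_statement(table: str, base_path: str, day: date) -> str:
    """Build a HiveQL statement that maps one day's HDFS directory to a partition.

    The table name, base path, and `dt` partition column are hypothetical.
    """
    dt = day.isoformat()  # e.g. "2020-12-03"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{dt}') LOCATION '{base_path}/dt={dt}'"
    )


# Example: the statement a daily job might submit to Hive for one day of logs.
print(add_partition_statement("weblogs", "/data/weblogs", date(2020, 12, 3)))
```

    A daily scheduled workflow could run one such statement per day, so that the parsed files become queryable without moving them.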

    Summary

    HBase is like a map: if you know the key, you can get the value instantly. But if you want to know how many integer keys in HBase fall between 1000000 and 2000000, HBase alone is not a good fit.
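
    The map analogy can be sketched with a plain Python dict. This is only an analogy, not the HBase client API; the keys and values are invented for illustration.

```python
# A plain dict standing in for a key-value store (analogy only, not HBase).
store = {999: "b", 1_500_000: "a", 1_999_999: "c", 2_500_000: "d"}


def get(key):
    """Point lookup: direct access when the key is known."""
    return store.get(key)


def count_keys_between(low, high):
    """Counting keys in a range: every candidate key has to be visited."""
    return sum(1 for k in store if low <= k <= high)


print(get(999))
print(count_keys_between(1_000_000, 2_000_000))
```

    The lookup touches one entry, while the count has to walk the keys, which is the kind of work better handed to a query engine.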

    If you have data that needs to be aggregated, rolled up, or analyzed across rows, then consider Hive.
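
    To make "aggregated across rows" concrete, here is a small Python sketch of the kind of rollup a Hive GROUP BY query expresses. The column names (`dt`, `status`, `bytes_sent`) and the sample rows are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical parsed web log rows.
rows = [
    {"dt": "2020-12-01", "status": 200, "bytes_sent": 512},
    {"dt": "2020-12-01", "status": 200, "bytes_sent": 1024},
    {"dt": "2020-12-01", "status": 404, "bytes_sent": 128},
    {"dt": "2020-12-02", "status": 200, "bytes_sent": 256},
]


def rollup(rows):
    """Aggregate hits and bytes per (dt, status), roughly what this HiveQL does:

    SELECT dt, status, COUNT(*) AS hits, SUM(bytes_sent) AS total_bytes
    FROM weblogs GROUP BY dt, status;
    """
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0})
    for r in rows:
        key = (r["dt"], r["status"])
        totals[key]["hits"] += 1
        totals[key]["bytes"] += r["bytes_sent"]
    return dict(totals)


for key, agg in sorted(rollup(rows).items()):
    print(key, agg)
```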

    Hopefully this helps.

    Hive actually rocks... I know, I have lived with it for 12 months now. So does HBase.
