Pig vs Hive vs Native Map Reduce

无人及你 2020-12-14 01:55

I have a basic understanding of what the Pig and Hive abstractions are, but I don't have a clear idea of the scenarios that require Hive, Pig, or native MapReduce.

I went thr

7 answers
  •  误落风尘
    2020-12-14 02:43

    Hive

    Pros:

    - SQL-like, which database people love.
    - Good support for structured data.
    - Currently supports database schemas and view-like structures.
    - Supports concurrent multi-user, multi-session scenarios.
    - Bigger community support: Hive, HiveServer, HiveServer2, Impala, and Sentry are already available.

    Cons:

    - Performance degrades as the data grows, and there is not much you can do about it.
    - Memory overflow issues are similarly hard to work around.
    - Hierarchical data is a challenge.
    - Unstructured data requires a UDF-like component.
    - Combining multiple techniques can be a nightmare, e.g. dynamic partitions with a UDTF on big data.

    Pig

    Pros:

    - Great script-based data flow language.

    Cons:

    - Unstructured data requires a UDF-like component.
    - Not much community support.
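To make the difference between Hive's SQL-like style and Pig's data flow style a bit more concrete, here is a rough sketch in plain Python. Nothing in it is Hive or Pig: `sqlite3` only stands in for a SQL engine, the step-by-step function only mimics the shape of a filter / group / aggregate data flow script, and the `VISITS` data and both function names are made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (user, page, seconds_on_page).
VISITS = [
    ("alice", "home", 12),
    ("bob", "home", 7),
    ("alice", "cart", 30),
    ("carol", "cart", 3),
    ("bob", "cart", 21),
]


def declarative_style(rows):
    """SQL-like style: describe the result you want in a single query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (user TEXT, page TEXT, seconds INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT page, SUM(seconds) FROM visits "
        "WHERE seconds >= 5 GROUP BY page ORDER BY page"
    ).fetchall()
    conn.close()
    return result


def dataflow_style(rows):
    """Data flow style: one named intermediate result per transformation."""
    # Step 1: filter out very short visits.
    long_visits = [row for row in rows if row[2] >= 5]
    # Step 2: group the remaining visits by page.
    grouped = defaultdict(list)
    for _user, page, seconds in long_visits:
        grouped[page].append(seconds)
    # Step 3: aggregate each group, then sort by page name.
    totals = [(page, sum(secs)) for page, secs in grouped.items()]
    return sorted(totals)
```

Both functions compute the same totals per page. The first hands the whole question to the engine, while the second spells out each stage, which is roughly the trade-off between the two styles.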

    MapReduce

    Pros:

    - I don't agree with "hard to achieve join functionality": if you understand what kind of join you want, you can implement it with a few lines of code.
    - Most of the time MR yields better performance.
    - MR support for hierarchical data is great, especially for implementing tree-like structures.
    - Better control over partitioning / indexing the data.
    - Job chaining.

    Cons:

    - You need to know the API very well to get better performance.
    - You have to code, debug, and maintain it yourself.
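On the join point in the MapReduce pros above, here is a minimal sketch of one common kind of join, a reduce-side inner join, simulated in plain Python instead of the Hadoop API. The `USERS` and `ORDERS` data, the `"U"` / `"O"` tags, and all function names are assumptions made up for this example. In a real job the framework does the shuffle for you, and the mapper and reducer would be written against the Hadoop API.

```python
from collections import defaultdict

# Hypothetical inputs: (user_id, name) and (user_id, item).
USERS = [(1, "alice"), (2, "bob"), (3, "carol")]
ORDERS = [(1, "book"), (1, "pen"), (3, "lamp"), (4, "desk")]


def map_users(record):
    """Mapper for the users input: emit the join key with a source tag."""
    user_id, name = record
    yield user_id, ("U", name)


def map_orders(record):
    """Mapper for the orders input: same key, different source tag."""
    user_id, item = record
    yield user_id, ("O", item)


def shuffle(pairs):
    """Stand-in for the framework's shuffle: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    """Reducer: split values by tag and emit every user/order pairing."""
    names = [v for tag, v in values if tag == "U"]
    items = [v for tag, v in values if tag == "O"]
    for name in names:
        for item in items:
            yield key, name, item


def run_join(users, orders):
    """Drive the map, shuffle, and reduce phases in a single process."""
    pairs = []
    for record in users:
        pairs.extend(map_users(record))
    for record in orders:
        pairs.extend(map_orders(record))
    output = []
    for key, values in sorted(shuffle(pairs).items()):
        output.extend(reduce_join(key, values))
    return output
```

Calling `run_join(USERS, ORDERS)` keeps only keys present on both sides, so user 2 (no orders) and order key 4 (no user) drop out. Changing what `reduce_join` emits when one side is empty would turn this into a left or right outer join.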
