Amazon EC2 vs. Amazon EMR [closed]

ⅰ亾dé卋堺 提交于 2019-11-28 06:28:32

EMR is a collection of EC2 instances with Hadoop (and optionally Hive and/or Pig) installed and configured on them. If you are using your cluster for running Hadoop/Hive/Pig jobs, EMR is the way to go. An EMR instance costs a little bit extra as compared to an EC2 instance. A quick check on Amazon prices today reveals that a small EC2 instances costs $0.08/hour while a small EMR instance costs $0.015/hour extra. In my opinion, it's totally worth paying that extra money to save yourself the hassle of installing and setting up Hadoop (along with Hive and Pig), creating and maintaining and AMI and using it. Moreover, EMR's version of Hadoop and Hive has some patches that are not available (atleast, not yet) on Apache Hive. If you use EC2, you will probably be using Apache Hadoop and Hive (or may be, the cloudera distributions) and wouldn't have access to those patches (like native support for S3 or commands like ALTER TABLE my_table RECOVER PARTITIONS

References:

I would suggest that you do NOT try and deploy your own Hadoop cluster, unless you have 2-3 months to spare, and you have a hadoop expert handy.

Elastic MapReduce will allow you to get started very quickly by providing a pre-configured hadoop environment. Seeing as you only have a single job, it should be fine.

In general, historically, EMR was pretty far behind the latest versions of Hadoop components, and some were missing entirely. That's the major reason for using another distribution. For example, if you wanted HBase, it wasn't in EMR, but not it is. Today, Spark is absent from EMR. EMR will generally lag.

That said, if you're not using the latest and greatest features, go with EMR.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!