emr

Need strategy advice for migrating large tables from RDS to DynamoDB

让人想犯罪 __ submitted on 2019-12-01 06:44:41
Question: We have a couple of MySQL tables in RDS that are huge (over 700 GB) and that we'd like to migrate to a DynamoDB table. Can you suggest a strategy, or a direction, to do this in a clean, parallelized way? Perhaps using EMR or AWS Data Pipeline.
Answer 1: You can use AWS Data Pipeline. There are two basic templates: one for moving RDS tables to S3, and a second for importing data from S3 into DynamoDB. You can create your own pipeline that combines both templates. Regards
Answer 2: one thing to consider with such
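
Once the export has landed in S3, the S3-to-DynamoDB leg can also be run as a parallel job on EMR using Hive's DynamoDB storage handler. A minimal sketch; the table names, columns, and S3 path are placeholders, not from the question:

-- Hive table over the RDS export that was written to S3 (placeholder path and schema)
CREATE EXTERNAL TABLE rds_export (id BIGINT, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/rds-export/';

-- External table backed by the target DynamoDB table via EMR's storage handler
CREATE EXTERNAL TABLE ddb_target (id BIGINT, payload STRING)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name" = "MyDynamoTable",
  "dynamodb.column.mapping" = "id:id,payload:payload"
);

-- The INSERT runs as a parallel MapReduce job; the write rate is throttled by
-- dynamodb.throughput.write.percent (a fraction of the table's provisioned capacity).
SET dynamodb.throughput.write.percent = 1.0;
INSERT OVERWRITE TABLE ddb_target SELECT id, payload FROM rds_export;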

Apache Hive: How to convert string to timestamp?

有些话、适合烂在心里 submitted on 2019-11-30 21:45:25
Question: I'm trying to convert the string in the REC_TIME column to a timestamp format in Hive. Ex: Sun Jul 31 09:28:20 UTC 2016 => 2016-07-31 09:28:20
SELECT xxx, UNIX_TIMESTAMP(REC_TIME, "E M dd HH:mm:ss z yyyy") FROM wlogs LIMIT 10;
When I execute the above SQL it returns a NULL value.
Answer: Try this:
select from_unixtime(unix_timestamp("Sun Jul 31 09:28:20 UTC 2016","EEE MMM dd HH:mm:ss zzz yyyy"));
This works fine if your Hive cluster is in the UTC timezone. But suppose your server is in CST; then you need to do as below to get to UTC:
select to_utc_timestamp(from_unixtime(unix_timestamp("Sun Jul 31 09:28:20 UTC
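
Applied to the original query, the fix is the format pattern; a sketch assuming the same wlogs table and columns from the question:

-- Each field needs its full pattern: EEE (day name), MMM (month name), zzz (time zone)
SELECT xxx,
       from_unixtime(unix_timestamp(REC_TIME, 'EEE MMM dd HH:mm:ss zzz yyyy')) AS rec_ts
FROM wlogs
LIMIT 10;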

Optimizing GC on EMR cluster

*爱你&永不变心* submitted on 2019-11-30 18:45:40
Question: I am running a Spark job written in Scala on EMR, and the stdout of each executor is filled with GC allocation failures:
2016-12-07T23:42:20.614+0000: [GC (Allocation Failure) 2016-12-07T23:42:20.614+0000: [ParNew: 909549K->432K(1022400K), 0.0089234 secs] 2279433K->1370373K(3294336K), 0.0090530 secs] [Times: user=0.11 sys=0.00, real=0.00 secs]
2016-12-07T23:42:21.572+0000: [GC (Allocation Failure) 2016-12-07T23:42:21.572+0000: [ParNew: 909296K->435K(1022400K), 0.0089298 secs] 2279237K->1370376K(3294336K), 0.0091147 secs] [Times: user=0.11 sys=0.01, real=0.00 secs]
2016-12-07T23:42:22.525+0000:
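
If the goal is to change the collector or get more GC detail per executor, the usual lever on EMR is the executor JVM options passed through spark-submit. A sketch only; the memory size, class name, and jar are placeholders to be tuned to the instance type and job:

# Placeholder sizes and names; spark.executor.extraJavaOptions is applied to every executor JVM
spark-submit \
  --deploy-mode cluster \
  --executor-memory 8g \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps" \
  --class com.example.MyJob \
  my-job.jar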

Get a yarn configuration from commandline

天涯浪子 submitted on 2019-11-30 17:53:05
Question: In EMR, is there a way to get a specific value of the configuration, given the configuration key, using the yarn command? For example, I would like to do something like this:
yarn get-config yarn.scheduler.maximum-allocation-mb
Answer: It's a bit non-intuitive, but it turns out the hdfs getconf command is capable of checking configuration properties for YARN and MapReduce, not only HDFS.
> hdfs getconf -confKey fs.defaultFS
hdfs://localhost:19000
> hdfs getconf -confKey dfs.namenode.name.dir
file:///Users/chris/hadoop-deploy-trunk/data/dfs/name
> hdfs getconf -confKey yarn.resourcemanager.address
0.0.0.0
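
For the key the question asks about, the same trick applies; the fallback of reading yarn-site.xml directly is an assumption about the standard EMR config path, not something stated in the answer:

# Same approach, with the key from the question
hdfs getconf -confKey yarn.scheduler.maximum-allocation-mb

# Fallback: read the property straight from yarn-site.xml (usual EMR location; adjust if yours differs)
grep -A1 'yarn.scheduler.maximum-allocation-mb' /etc/hadoop/conf/yarn-site.xml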

How to install a GUI on Amazon AWS EC2 or EMR with the Amazon AMI

白昼怎懂夜的黑 submitted on 2019-11-30 16:13:30
Question: I need to run an application that requires a GUI to start and configure, and I need to be able to run it on Amazon's EC2 and EMR services. The EMR requirement means it has to run on Amazon's Linux AMI. After extensive searching I've been unable to find any ready-made solutions, in particular none that meet the requirement to run on Amazon's AMI. The closest match, and the most often referenced solution, is here. Unfortunately it was developed on a RHEL6 instance, which differs enough from Amazon's AMI that the solution does not work. I'm posting my solution below. Hopefully
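
The lightest-weight alternative to installing a full desktop, when popping up single GUI windows is enough, is X11 forwarding over SSH. This is a sketch, not the poster's solution, and it assumes the usual Amazon Linux / RHEL package names are available on the AMI in use:

# Install an X authority helper and a simple test client (package names may differ by AMI version)
sudo yum install -y xorg-x11-xauth xterm

# From a workstation running an X server, connect with X11 forwarding enabled:
#   ssh -X -i mykey.pem hadoop@<emr-master-public-dns>
# then launch the GUI application from that shell; xterm makes a quick test:
xterm &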

Exporting Hive Table to a S3 bucket

孤街浪徒 submitted on 2019-11-30 04:46:32
Question: I've created a Hive table through an Elastic MapReduce interactive session and populated it from a CSV file like this:
CREATE TABLE csvimport(id BIGINT, time STRING, log STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
LOAD DATA LOCAL INPATH '/home/hadoop/file.csv' OVERWRITE INTO TABLE csvimport;
I now want to store the Hive table in an S3 bucket so the table is preserved once I terminate the MapReduce instance. Does anyone know how to do this?
Answer (user495732 Why Me): Yes, you have to export and import your data at the start and end of your Hive session. To do this you need to create a table
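
A sketch of the usual pattern: an EXTERNAL table whose LOCATION is an S3 path, populated from the local table. The bucket and prefix below are placeholders:

-- External table whose data lives in S3 and therefore survives cluster termination
CREATE EXTERNAL TABLE csvexport (id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://my-bucket/csvexport/';

-- Write the local table's rows out to the S3-backed table
INSERT OVERWRITE TABLE csvexport SELECT * FROM csvimport;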

How do you make a HIVE table out of JSON data?

♀尐吖头ヾ submitted on 2019-11-29 19:44:39
Question: I want to create a Hive table out of some JSON data (nested) and run queries on it. Is this even possible? I've gotten as far as uploading the JSON file to S3 and launching an EMR instance, but I don't know what to type in the Hive console to get the JSON file to become a Hive table. Does anyone have an example command to get me started? I can't find anything useful with Google ...
Answer: You'll need to use a JSON SerDe in order for Hive to map your JSON to the columns in your table. A really good example showing you how is here: http://aws.amazon.com/articles/2855 Unfortunately the JSON SerDe supplied
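
A sketch using the open-source Hive-JSON-Serde (not necessarily the SerDe from the linked article); the jar path, schema, and S3 location are placeholders:

-- Register the SerDe jar (download it to the master node first; path is a placeholder)
ADD JAR /home/hadoop/json-serde-with-dependencies.jar;

-- Nested JSON maps onto STRUCT/ARRAY/MAP columns
CREATE EXTERNAL TABLE json_events (
  id STRING,
  payload STRUCT<kind:STRING, value:BIGINT>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-bucket/json-input/';

SELECT id, payload.kind FROM json_events LIMIT 10;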
