Hadoop: copying from HDFS to S3

自闭症患者 2020-12-20 05:40

I've successfully completed a Mahout vectorizing job on Amazon EMR (using Mahout on Elastic MapReduce as a reference). Now I want to copy the results from HDFS to S3 (to use it in…

1 Answer
  • 2020-12-20 06:24

    I've found the problem.

    The main issue is not

      java.net.UnknownHostException: unknown host: my.bucket

    but:

    2012-09-06 13:27:33,909 FATAL com.amazon.external.elasticmapreduce.s3distcp.S3DistCp (main): Failed to get source file system
    

    So after adding one more slash to the source path, the job started without problems. The correct command is:

    elastic-mapreduce --jobflow $JOBID \
        --jar s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar \
        --arg --s3Endpoint --arg 's3-eu-west-1.amazonaws.com' \
        --arg --src --arg 'hdfs:///my.bucket/prj1/seqfiles' \
        --arg --dest --arg 's3://my.bucket/prj1/seqfiles'
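
    Why the extra slash matters (a sketch of the URI semantics, reusing the paths from this job): in a Hadoop URI, the part right after hdfs:// is the authority, i.e. a NameNode host. With hdfs://my.bucket/... Hadoop tries to resolve a host named my.bucket, which is exactly the UnknownHostException above; with hdfs:///... the authority is empty, so Hadoop falls back to the cluster's default filesystem (fs.defaultFS). A quick way to see both behaviours on the cluster:

      # empty authority -> resolved against fs.defaultFS on the cluster's own HDFS
      hadoop fs -ls hdfs:///my.bucket/prj1/seqfiles

      # "my.bucket" parsed as a NameNode host -> java.net.UnknownHostException
      hadoop fs -ls hdfs://my.bucket/prj1/seqfiles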
    

    P.S. It works: the job finished correctly, and I successfully copied a directory with a 30 GB file.
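
    A side note beyond the original answer (assumption: a newer EMR release, 4.x or later, where the tool is preinstalled on the cluster): the same copy can be run directly from the master node with the s3-dist-cp command:

      # same source/destination as above, run over SSH on the master node
      s3-dist-cp --src hdfs:///my.bucket/prj1/seqfiles \
                 --dest s3://my.bucket/prj1/seqfiles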
