Writing data to Hadoop

悲&欢浪女 2020-12-13 10:58

I need to write data into Hadoop (HDFS) from external sources like a Windows box. Right now I have been copying the data onto the namenode and using HDFS's put command to load it into the cluster.
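For concreteness, a minimal sketch of that current workflow, driving the Hadoop CLI from Python (the source and destination paths are hypothetical, and it assumes the hadoop binary is on PATH on the machine running it):

    import subprocess

    # Copy a local file into HDFS with the put command.
    # Both paths below are placeholder examples, not from the question.
    subprocess.run(
        ["hadoop", "fs", "-put", "/tmp/incoming/data.csv", "/user/ingest/data.csv"],
        check=True,  # raise if the copy fails
    )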

8 Answers
  •  既然无缘
    2020-12-13 11:18

    About 2 years after my last answer, there are now two new alternatives: Hoop/HttpFS and WebHDFS.

    Regarding Hoop, it was first announced in Cloudera's blog and can be downloaded from a GitHub repository. I have managed to get this version to talk successfully to at least Hadoop 0.20.1; it can probably talk to slightly older versions as well.

    If you're running Hadoop 0.23.1 (which, at the time of writing, is still unreleased), Hoop is instead part of Hadoop as its own component, HttpFS. This work was done as part of HDFS-2178. Hoop/HttpFS can act as a proxy not only to HDFS, but also to other Hadoop-compatible filesystems such as Amazon S3.

    Hoop/HttpFS runs as its own standalone service.
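    Since HttpFS speaks the WebHDFS-style REST API over plain HTTP (by default on port 14000), an external client needs nothing but an HTTP library to reach HDFS through it. A minimal Python sketch; the host name, user, and path are placeholder assumptions, and it assumes simple (pseudo) authentication via the user.name parameter:

        import requests  # third-party: pip install requests

        # List an HDFS directory through the HttpFS proxy.
        # "httpfs-host", the user "ingest", and the path are assumptions.
        HTTPFS = "http://httpfs-host:14000/webhdfs/v1"

        resp = requests.get(
            f"{HTTPFS}/user/ingest",
            params={"op": "LISTSTATUS", "user.name": "ingest"},
        )
        resp.raise_for_status()
        for entry in resp.json()["FileStatuses"]["FileStatus"]:
            print(entry["pathSuffix"], entry["type"])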

    There's also WebHDFS, which runs as part of the NameNode and DataNode services. It also provides a REST API which, if I understand correctly, is compatible with the HttpFS API. WebHDFS is part of Hadoop 1.0, and one of its major features is that it provides data locality: when you make a read request, you are redirected to the WebHDFS component on the DataNode where the data resides.
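    To make the redirect behavior concrete: writing a file over WebHDFS is a documented two-step exchange, where the NameNode answers the initial request with a 307 redirect to a DataNode and the client then sends the bytes there (reads via op=OPEN follow the same redirect pattern). A hedged Python sketch; the host names, user, and paths are placeholder assumptions, and port 50070 was the NameNode HTTP port in Hadoop 1.x:

        import requests

        NN = "http://namenode-host:50070/webhdfs/v1"

        # Step 1: ask the NameNode to create the file. Don't follow the
        # redirect automatically, so we can capture the DataNode URL.
        r = requests.put(
            f"{NN}/user/ingest/data.csv",
            params={"op": "CREATE", "user.name": "ingest"},
            allow_redirects=False,
        )
        datanode_url = r.headers["Location"]

        # Step 2: send the actual file contents to the chosen DataNode.
        with open("data.csv", "rb") as f:
            r2 = requests.put(datanode_url, data=f)
        r2.raise_for_status()  # expect 201 Created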

    Which component to choose depends a bit on your current setup and what your needs are. If you need an HTTP REST interface to HDFS now and you're running a version that does not include WebHDFS, starting with Hoop from the GitHub repository seems like the easiest option. If you are running a version that includes WebHDFS, I would go for that, unless you need some of the features Hoop has that WebHDFS lacks (access to other filesystems, bandwidth limiting, etc.).
