Writing data to Hadoop

悲&欢浪女 2020-12-13 10:58

I need to write data into Hadoop (HDFS) from external sources such as a Windows box. Right now I have been copying the data onto the namenode and using HDFS's put command to

8 answers
  •  猫巷女王i
    2020-12-13 11:06

    To load the data I needed into HDFS, I chose to turn the problem around.

    Instead of uploading the files to HDFS from the server where they resided, I wrote a Java Map/Reduce job in which the mapper reads each file from the file server (in this case via HTTPS) and writes it directly to HDFS (via the Java API).
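
    As a rough illustration of the per-file step inside such a mapper, here is a minimal sketch using only the Java standard library. The class `UrlFetchStep`, the `SinkOpener` interface, and the rule of naming the target after the last URL path segment are all hypothetical, not taken from my actual job. In a real mapper the opener would presumably return the stream obtained from Hadoop's `FileSystem.create(Path)`, which is an ordinary `OutputStream`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

// Hypothetical helper: the work a mapper might do for one input record (one URL).
final class UrlFetchStep {

    /** Opens the destination stream for a file name (assumed stand-in for HDFS). */
    interface SinkOpener {
        OutputStream open(String fileName) throws IOException;
    }

    /** Derives the destination file name from the last path segment of the URL. */
    static String targetName(String url) {
        String path = URI.create(url).getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            throw new IllegalArgumentException("URL has no file name: " + url);
        }
        return name;
    }

    /** Streams all bytes from in to out in fixed-size chunks; returns the byte count. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Handles one record: reads the URL and writes its content to the sink. */
    static long fetch(String url, SinkOpener sink) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream();
             OutputStream out = sink.open(targetName(url))) {
            return copy(in, out);
        }
    }
}
```

    The point of the design is that the file is never staged on local disk: bytes go from the remote stream to the destination stream in one pass, so each mapper needs only a small buffer regardless of file size.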

    The list of files is read from the job's input. An external script populates a file with the list of files to fetch, uploads that file into HDFS (using hadoop dfs -put), then starts the map/reduce job with a decent number of mappers.
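
    The list-building step can be sketched in Java as well. This is only an illustration: `FetchListBuilder` and both of its methods are hypothetical names, and the way the mapper count is controlled is an assumption. One option, as far as I know, is Hadoop's `NLineInputFormat`, which hands each mapper a fixed number of input lines (configured through `mapreduce.input.lineinputformat.linespermap`), so the sketch computes that number from a desired mapper count.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for preparing the job input: a plain text file of URLs.
final class FetchListBuilder {

    /** Writes one URL per line, so each input record of the job is one file to fetch. */
    static Path writeList(List<String> urls, Path listFile) throws IOException {
        return Files.write(listFile, urls, StandardCharsets.UTF_8);
    }

    /**
     * Lines each mapper should receive so that roughly `mappers` tasks
     * share the list (ceiling division, never less than one line).
     */
    static int linesPerMapper(int urlCount, int mappers) {
        if (urlCount < 0 || mappers <= 0) {
            throw new IllegalArgumentException("urlCount must be >= 0 and mappers > 0");
        }
        return Math.max(1, (urlCount + mappers - 1) / mappers);
    }
}
```

    The resulting list file is what gets uploaded with hadoop dfs -put before the job starts.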

    This gives me excellent transfer performance, since multiple files are read/written at the same time.

    Maybe not the answer you were looking for, but hopefully helpful anyway :-).
