What is the HDFS Location on Hadoop?

前端 未结 1 1793
梦如初夏
梦如初夏 2021-01-12 05:37

I am trying to run the WordCount example in Hadoop after following some online tutorials. However what\'s not clear to me as where does the file get copied from our local fi

相关标签:
1条回答
  • 2021-01-12 05:57

    This is set in the dfs.datanode.data.dir property, which defaults to file://${hadoop.tmp.dir}/dfs/data (see details here).

    However, in your case, the problem is that you are not using the full path within HDFS. Instead, do:

    hadoop fs -ls /usr/local/myhadoop-tmp/
    

    Note that, you also seem to be confusing the path within HDFS to the path in your local file system. Within HDFS, your file is in /usr/local/myhadoop-tmp/. In your local system (and given your configuration setting), it is under /usr/local/myhadoop-tmp/dfs/data/; in there, there's a directory structure and naming convention defined by HDFS, that is independent to whatever path in HDFS you decide to use. Also, it won't have the same name, since it is divided into blocks and each block is assigned a unique ID; the name of a block is something like blk_1073741826.

    To conclude: the local path used by the datanode is NOT the same as the paths you use in HDFS. You can go into your local directory looking for files, but you should not do this, since you could mess up the HDFS metadata management. Just use the hadoop command-line tools to copy/move/read files within HDFS, using any logical path (in HDFS) that you wish to use. These paths within HDFS do not need to be tied to the paths you used in for your local datanode storage (there is no reason to or advantage of doing this).

    0 讨论(0)
提交回复
热议问题