HDFS space consumed: “hdfs dfs -du /” vs “hdfs dfsadmin -report”

Submitted by 本秂侑毒 on 2020-06-12 04:59:08

Question


Which tool is the right one to measure HDFS space consumed?

When I sum up the output of "hdfs dfs -du /", I always get less space consumed than "hdfs dfsadmin -report" shows (the "DFS Used" line). Is there data that du does not take into account?


Answer 1:


The Hadoop file system provides reliable storage by keeping copies of the data on several nodes. The number of copies is the replication factor, and it is usually greater than one.

The command hdfs dfs -du / shows the space your data consumes without replication.

The command hdfs dfsadmin -report (the "DFS Used" line) shows actual disk usage, taking data replication into account. So it should be several times larger than the number you get from the dfs -du command.
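To make the relationship concrete, here is a minimal Python sketch that compares the two numbers. The sample text in DU_OUTPUT and REPORT_OUTPUT is hypothetical and only imitates the general shape of the command output; the helper names logical_bytes and dfs_used_bytes are made up for this illustration. It assumes the first column of each du line is the size without replication, and that the first "DFS Used:" line in the report is the cluster-wide figure.

```python
import re

# Hypothetical sample output, for illustration only (not from a real cluster).
DU_OUTPUT = """\
1073741824  /data
536870912  /tmp
268435456  /user
"""

REPORT_OUTPUT = """\
Configured Capacity: 107374182400 (100 GB)
DFS Used: 5637144576 (5.25 GB)
"""


def logical_bytes(du_text):
    """Sum the first column (size without replication) of du-style output."""
    total = 0
    for line in du_text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not start with a byte count.
        if fields and fields[0].isdigit():
            total += int(fields[0])
    return total


def dfs_used_bytes(report_text):
    """Return the byte count from the first 'DFS Used:' line of the report."""
    match = re.search(r"^DFS Used:\s*(\d+)", report_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'DFS Used' line found")
    return int(match.group(1))


logical = logical_bytes(DU_OUTPUT)
raw = dfs_used_bytes(REPORT_OUTPUT)
print(f"logical: {logical} bytes, raw: {raw} bytes, ratio: {raw / logical:.2f}")
```

On a real cluster the ratio will rarely be an exact integer, since files can be stored with different replication factors.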




Answer 2:


How HDFS storage works, in brief:

Let's say replication factor = 3 (the default)
Data file size = 10GB (e.g. xyz.log)
HDFS will take 10 x 3 = 30GB to store that file

Depending on the command you use, you will get different values for the space occupied in HDFS (10GB vs 30GB).
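The arithmetic above can be written as a small Python sketch. The function name raw_usage is hypothetical, and the calculation assumes every block of the file is stored with the same replication factor.

```python
GB = 1024 ** 3


def raw_usage(logical_size, replication=3):
    """Bytes occupied on disk across all replicas of a file."""
    return logical_size * replication


size = 10 * GB  # the 10GB xyz.log from the example
print(raw_usage(size) // GB)  # 30
```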

If you are on a recent version of Hadoop, try the following command. In my case it works very well on Hortonworks Data Platform (HDP) 2.3.* and above. It should also work on Cloudera's latest platform.

hadoop fs -count -q -h -v /path/to/directory

(-q = quota, -h = human readable values, -v = verbose)

This command shows the following fields in its output: QUOTA REMAINING_QUOTA SPACE_QUOTA REMAINING_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME

Where

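CONTENT_SIZE = real file size without replication (10GB) and
SPACE_QUOTA = the space quota set on the directory. Every replica counts against it, so the file above uses 30GB of the quota. When a quota is set, the space occupied in HDFS is SPACE_QUOTA minus REMAINING_SPACE_QUOTA; when no quota is set, these two fields show none and inf.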
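As a rough sketch of how these fields could be read programmatically, the Python below parses one output line and derives the replicated space from the quota columns. The SAMPLE line is hypothetical and assumes the command was run without -h so that values are plain byte counts; the field names follow the list above, and parse_count_line and space_consumed are made-up helper names. It relies on the space quota being charged per replica, so consumption is the quota minus what remains.

```python
FIELDS = [
    "QUOTA", "REMAINING_QUOTA", "SPACE_QUOTA", "REMAINING_SPACE_QUOTA",
    "DIR_COUNT", "FILE_COUNT", "CONTENT_SIZE", "FILE_NAME",
]

# Hypothetical line: a 100GB space quota with one 10GB file at replication 3.
SAMPLE = "none inf 107374182400 75161927680 1 1 10737418240 /path/to/directory"


def parse_count_line(line):
    """Split one output line into a dict keyed by field name."""
    # Limiting the split keeps a path that contains spaces in one piece.
    values = line.strip().split(None, len(FIELDS) - 1)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def space_consumed(row):
    """Replicated bytes charged to the space quota, or None if no quota is set."""
    if row["SPACE_QUOTA"] == "none":
        return None
    return int(row["SPACE_QUOTA"]) - int(row["REMAINING_SPACE_QUOTA"])


row = parse_count_line(SAMPLE)
print(row["CONTENT_SIZE"], space_consumed(row))
```

If no space quota is set on the directory, this approach yields nothing useful, and multiplying CONTENT_SIZE by the replication factor is only an estimate.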

Note: the replication factor is controlled by the "dfs.replication" property in the hdfs-site.xml file (under the conf/ directory of a default Hadoop installation). If you have a multi-node cluster, changing it through Ambari or Cloudera Manager is recommended.

There are other commands to check storage space, e.g. hadoop fsck and hadoop dfs -dus (deprecated in newer releases in favor of hdfs dfs -du -s).



Source: https://stackoverflow.com/questions/33517658/hdfs-space-consumed-hdfs-dfs-du-vs-hdfs-dfsadmin-report
