Hadoop Namenode Metadata - fsimage and edit logs

落爺英雄遲暮 submitted on 2019-12-05 00:14:20


I understand that the fsimage is loaded into memory on startup and that any further transactions are added to the edit log rather than to the fsimage, for performance reasons.

The fsimage in memory gets refreshed when the namenode is restarted. For efficiency, the secondary namenode periodically performs a checkpoint to update the fsimage so that namenode recovery is faster. All of this is fine.

But there is one point I fail to understand. Let's say that a file already exists and the info about this file is in the fsimage in memory. Now I move this file to a different location, and the move is recorded in the edit log. When I then try to list the old file path, it complains that the path does not exist.

Does this mean that the namenode looks at the edit log as well, which would contradict the purpose of keeping the fsimage in memory? Or how else does it know that the file location has changed?


The namenode knows because the move is recorded as an edit log transaction and applied to the namespace it holds in memory at the same time; the fsimage file on disk is not consulted. The same reasoning applies when a new file is written to HDFS. If you remove the fsimage file while the namenode is running and then try to read an HDFS file, the read still succeeds.

Removing the fsimage file from a running namenode does not disrupt read or write operations. When the namenode is restarted, however, it reports errors stating that the image file was not found.
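
One way to picture this is the toy model below. It is only a sketch, assuming the namenode applies each change to its in-memory namespace while also appending it to the edit log; the class name ToyNameNode, the paths, and the block IDs are made up for illustration and are not Hadoop APIs.

```python
class ToyNameNode:
    """Toy model: the namespace lives in memory, every change is also logged."""

    def __init__(self, fsimage):
        # The fsimage is read once, at startup, to build the in-memory namespace.
        self.namespace = {path: list(blocks) for path, blocks in fsimage.items()}
        self.edit_log = []  # stands in for the on-disk edit log

    def rename(self, src, dst):
        if src not in self.namespace:
            raise FileNotFoundError(src)
        # Record the transaction for durability ...
        self.edit_log.append(("RENAME", src, dst))
        # ... and apply it to the in-memory namespace right away.
        self.namespace[dst] = self.namespace.pop(src)

    def exists(self, path):
        # Lookups are answered from memory; neither the fsimage file
        # nor the edit log is scanned here.
        return path in self.namespace


# Hypothetical starting state: one file known to the last checkpoint.
fsimage_on_disk = {"/data/a.txt": ["blk_1", "blk_2"]}

nn = ToyNameNode(fsimage_on_disk)
nn.rename("/data/a.txt", "/archive/a.txt")

print(nn.exists("/data/a.txt"))     # old path, looked up in memory
print(nn.exists("/archive/a.txt"))  # new path, looked up in memory
print(fsimage_on_disk)              # the stand-in for the on-disk image is untouched
```

In this model the old path disappears as soon as the rename is applied in memory, while the stand-in for the on-disk fsimage still shows the old location until the next checkpoint.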

Let me try to give some more explanation to help you out.

Hadoop reads the fsimage file only at startup. If the file is missing, the namenode does not come up and logs a message about formatting the namenode.

The hdfs namenode -format command creates a fresh fsimage file. After namenode startup, file metadata is served from the in-memory namespace, which was built from the fsimage plus the replayed edit logs; each new change is applied in memory and appended to the edit log. So the fsimage just works as a checkpoint of the state at the time it was last saved. This is also one of the reasons the secondary namenode keeps syncing from the edit logs (by default every 1 hour or every 1 million transactions), so that at startup not much needs to be replayed on top of the last checkpoint.
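
The "1 hour / 1 million transactions" rule can be sketched as a simple either-or check. This is an illustration, not Hadoop's code: the function name should_checkpoint is hypothetical, and the mapping to the dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns settings is my understanding of the configuration, so check it against your version's hdfs-default.xml.

```python
# Thresholds as quoted above; believed to correspond to
# dfs.namenode.checkpoint.period (seconds) and dfs.namenode.checkpoint.txns.
CHECKPOINT_PERIOD_SECONDS = 3600
CHECKPOINT_TXNS = 1_000_000


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period=CHECKPOINT_PERIOD_SECONDS,
                      txns=CHECKPOINT_TXNS):
    """Return True when either threshold has been reached (hypothetical helper)."""
    return seconds_since_last >= period or uncheckpointed_txns >= txns


# A quiet cluster checkpoints on time, a busy one on transaction count.
print(should_checkpoint(3600, 42))
print(should_checkpoint(120, 1_000_000))
print(should_checkpoint(120, 42))
```

Whichever threshold is hit first triggers the checkpoint, which bounds how many edit log transactions must be replayed at the next startup.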

If you turn safe mode on (command: hdfs dfsadmin -safemode enter) and then run saveNamespace (command: hdfs dfsadmin -saveNamespace), the namenode logs messages like the following.

2014-07-05 15:03:13,195 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /data/hadoop-namenode-data-temp/current/fsimage.ckpt_0000000000000000169 using no compression
2014-07-05 15:03:13,205 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file /data/hadoop-namenode-data-temp/current/fsimage.ckpt_0000000000000000169 of size 288 bytes saved in 0 seconds.
2014-07-05 15:03:13,213 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 0
2014-07-05 15:03:13,237 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 170
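
In the log above, the image file name ends in 169 and the new edit log segment starts at 170, which suggests that the image covers everything up to transaction 169 and later changes go into the new segment. The sketch below just pulls those two numbers out of the log lines; the helper names are made up for illustration.

```python
import re

# Two of the log lines shown above, copied verbatim.
log_lines = [
    "2014-07-05 15:03:13,195 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: "
    "Saving image file /data/hadoop-namenode-data-temp/current/"
    "fsimage.ckpt_0000000000000000169 using no compression",
    "2014-07-05 15:03:13,237 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: "
    "Starting log segment at 170",
]


def checkpoint_txid(line):
    """Extract the transaction id embedded in an fsimage.ckpt_<txid> file name."""
    match = re.search(r"fsimage\.ckpt_(\d+)", line)
    return int(match.group(1)) if match else None


def segment_start_txid(line):
    """Extract the first transaction id of a newly started edit log segment."""
    match = re.search(r"Starting log segment at (\d+)", line)
    return int(match.group(1)) if match else None


image_txid = checkpoint_txid(log_lines[0])
next_segment = segment_start_txid(log_lines[1])
print(image_txid, next_segment)
```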


The entire file system namespace, including the mapping of blocks to files and the file system properties, is stored in a file called the FsImage. Remember that the mapping of blocks to files is part of the FsImage, and it is kept both in memory and on disk. Alongside the FsImage, the namenode also keeps the block-to-datanode mapping, but only in memory; it builds this mapping from the block reports that datanodes send when the namenode is (re)started and periodically afterwards.

So when you move a file to a different location, the change is recorded in the edit log on disk and applied to the namespace held in memory, which is why you can no longer see the data under the old path. Block reports are a separate mechanism: they keep the namenode's view of where blocks are located on the cluster up to date, and a move does not change the blocks themselves. But remember that, apart from the edit log entry, the update has happened only in memory; the FsImage on disk is unchanged.

After a certain amount of time, either at a checkpoint or when the namenode is restarted, the edit logs on disk, which already contain the updates you have made (in your case the file move), are merged with the old FsImage on disk to create a new FsImage. On the next restart this updated FsImage is loaded into memory, and the same process repeats.
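
The merge step can be sketched as replaying the edit log over a copy of the old image. This is a simplified model, not Hadoop's implementation: the operation names, the tuple layout of the edits, and the function names are all assumptions made for illustration.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to a path -> block list mapping (toy format)."""
    op = edit[0]
    if op == "CREATE":
        _, path, blocks = edit
        namespace[path] = list(blocks)
    elif op == "RENAME":
        _, src, dst = edit
        namespace[dst] = namespace.pop(src)
    elif op == "DELETE":
        _, path = edit
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(old_fsimage, edits):
    """Merge the old image with the edit log and return a new image."""
    # Work on a copy so the old image stays intact until the new one is complete.
    new_fsimage = {path: list(blocks) for path, blocks in old_fsimage.items()}
    for edit in edits:
        apply_edit(new_fsimage, edit)
    return new_fsimage


# Hypothetical state: one file in the old image, then a move and a new file.
old_fsimage = {"/data/a.txt": ["blk_1"]}
edits = [
    ("RENAME", "/data/a.txt", "/archive/a.txt"),
    ("CREATE", "/data/b.txt", ["blk_2"]),
]

new_fsimage = checkpoint(old_fsimage, edits)
print(new_fsimage)
```

Replaying the same edits over the same old image always yields the same new image, which is what lets a restart or a checkpoint reconstruct the state that the running namenode already holds in memory.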