Question
What procedure do I need to follow to properly add a new NameNode data directory (dfs.name.dir / dfs.namenode.name.dir) to an existing production cluster? I have added the new path to the comma-delimited list in hdfs-site.xml, but when I try to start the NameNode I get the following error:
Directory /data/nfs/dfs/nn is in an inconsistent state: storage directory does not exist or is not accessible.
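For context, the dfs.namenode.name.dir property in my hdfs-site.xml now looks roughly like this (the third entry is the new NFS directory):

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/1/dfs/nn,/data/2/dfs/nn,/data/nfs/dfs/nn</value>
</property>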
In my case I have two directories already in place and working (/data/1/dfs/nn and /data/2/dfs/nn). When I add the new directory I can't start the NameNode; when the new path is removed, it starts just fine. My fstab entry for the new directory looks like this:
backup-server:/hadoop_nn /data/nfs/dfs nfs tcp,soft,intr,timeo=10,retrans=10 1 2
In the above mount point I have created a folder called nn. That folder has ownership and permissions identical to the nn folders in the two existing locations:
drwx------ 2 hdfs hadoop 64 Jan 22 16:30 nn
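(For reference, I set that up with roughly the following, hdfs:hadoop being the owner/group my cluster uses for the name directories:)

chown hdfs:hadoop /data/nfs/dfs/nn   # same owner/group as the existing nn directories
chmod 700 /data/nfs/dfs/nn           # drwx------, matching the listing above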
Do I need to manually replicate/copy all the files from one of the existing NameNode directories, or should the NameNode service do that automatically when it is started?
Answer 1:
I believe I may have just answered my own question. I ended up copying the entire contents of one of the existing NameNode directories over to the new NFS NameNode directory, and I was then able to start the NameNode. (Note that I stopped the NameNode before copying, to avoid problems.)
cp -rp /data/1/dfs/nn /data/nfs/dfs/nn
I guess my assumption that the namenode would automatically copy the existing metadata over to the new directory was incorrect.
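For anyone following the same route, the whole thing boiled down to roughly the sequence below. The service commands are just what a packaged CDH install typically provides; use whatever you normally use to stop and start the NameNode (Cloudera Manager, for example). Note the trailing /. on the copy, which you want if the destination nn folder already exists so the metadata doesn't end up nested inside a second nn directory:

sudo service hadoop-hdfs-namenode stop       # stop the NameNode before touching metadata
cp -rp /data/1/dfs/nn/. /data/nfs/dfs/nn/    # copy the contents of an existing name dir into the new one
chown -R hdfs:hadoop /data/nfs/dfs/nn        # keep ownership identical to the other name dirs
sudo service hadoop-hdfs-namenode start      # bring the NameNode back up with the new directory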
Answer 2:
In Cloudera CDH 4.5.0, that error can only occur when the following function (from Storage.java, around line 418) returns NON_EXISTENT. In each case a warning with more details is logged; look for log lines from org.apache.hadoop.hdfs.server.common.Storage in the NameNode log.
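For example, something along these lines will usually surface them (adjust the log directory and file name to your installation; this is just a typical packaged-CDH location):

grep 'org.apache.hadoop.hdfs.server.common.Storage' /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log | tail -n 20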
In short, the NameNode perceives the directory to not exist, to not be a directory, or to not be writable, or a SecurityException was thrown while checking it.
/**
 * Check consistency of the storage directory
 *
 * @param startOpt a startup option.
 *
 * @return state {@link StorageState} of the storage directory
 * @throws InconsistentFSStateException if directory state is not
 * consistent and cannot be recovered.
 * @throws IOException
 */
public StorageState analyzeStorage(StartupOption startOpt, Storage storage)
    throws IOException {
  assert root != null : "root is null";
  String rootPath = root.getCanonicalPath();
  try { // check that storage exists
    if (!root.exists()) {
      // storage directory does not exist
      if (startOpt != StartupOption.FORMAT) {
        LOG.warn("Storage directory " + rootPath + " does not exist");
        return StorageState.NON_EXISTENT;
      }
      LOG.info(rootPath + " does not exist. Creating ...");
      if (!root.mkdirs())
        throw new IOException("Cannot create directory " + rootPath);
    }
    // or is inaccessible
    if (!root.isDirectory()) {
      LOG.warn(rootPath + "is not a directory");
      return StorageState.NON_EXISTENT;
    }
    if (!root.canWrite()) {
      LOG.warn("Cannot access storage directory " + rootPath);
      return StorageState.NON_EXISTENT;
    }
  } catch(SecurityException ex) {
    LOG.warn("Cannot access storage directory " + rootPath, ex);
    return StorageState.NON_EXISTENT;
  }

  this.lock(); // lock storage if it exists
  if (startOpt == HdfsServerConstants.StartupOption.FORMAT)
    return StorageState.NOT_FORMATTED;
  if (startOpt != HdfsServerConstants.StartupOption.IMPORT) {
    storage.checkOldLayoutStorage(this);
  }

  // check whether current directory is valid
  File versionFile = getVersionFile();
  boolean hasCurrent = versionFile.exists();

  // check which directories exist
  boolean hasPrevious = getPreviousDir().exists();
  boolean hasPreviousTmp = getPreviousTmp().exists();
  boolean hasRemovedTmp = getRemovedTmp().exists();
  boolean hasFinalizedTmp = getFinalizedTmp().exists();
  boolean hasCheckpointTmp = getLastCheckpointTmp().exists();

  if (!(hasPreviousTmp || hasRemovedTmp
      || hasFinalizedTmp || hasCheckpointTmp)) {
    // no temp dirs - no recovery
    if (hasCurrent)
      return StorageState.NORMAL;
    if (hasPrevious)
      throw new InconsistentFSStateException(root,
          "version file in current directory is missing.");
    return StorageState.NOT_FORMATTED;
  }

  if ((hasPreviousTmp ? 1 : 0) + (hasRemovedTmp ? 1 : 0)
      + (hasFinalizedTmp ? 1 : 0) + (hasCheckpointTmp ? 1 : 0) > 1)
    // more than one temp dirs
    throw new InconsistentFSStateException(root,
        "too many temporary directories.");

  // # of temp dirs == 1 should either recover or complete a transition
  if (hasCheckpointTmp) {
    return hasCurrent ? StorageState.COMPLETE_CHECKPOINT
                      : StorageState.RECOVER_CHECKPOINT;
  }

  if (hasFinalizedTmp) {
    if (hasPrevious)
      throw new InconsistentFSStateException(root,
          STORAGE_DIR_PREVIOUS + " and " + STORAGE_TMP_FINALIZED
          + "cannot exist together.");
    return StorageState.COMPLETE_FINALIZE;
  }

  if (hasPreviousTmp) {
    if (hasPrevious)
      throw new InconsistentFSStateException(root,
          STORAGE_DIR_PREVIOUS + " and " + STORAGE_TMP_PREVIOUS
          + " cannot exist together.");
    if (hasCurrent)
      return StorageState.COMPLETE_UPGRADE;
    return StorageState.RECOVER_UPGRADE;
  }

  assert hasRemovedTmp : "hasRemovedTmp must be true";
  if (!(hasCurrent ^ hasPrevious))
    throw new InconsistentFSStateException(root,
        "one and only one directory " + STORAGE_DIR_CURRENT
        + " or " + STORAGE_DIR_PREVIOUS
        + " must be present when " + STORAGE_TMP_REMOVED
        + " exists.");
  if (hasCurrent)
    return StorageState.COMPLETE_ROLLBACK;
  return StorageState.RECOVER_ROLLBACK;
}
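So the conditions above translate directly into what has to be true of the new directory: it must exist, be a directory, and be writable by the user the NameNode runs as. A quick sanity check from the NameNode host, using the hdfs user and the path from the question, would be something like:

sudo -u hdfs ls -ld /data/nfs/dfs/nn                                                               # exists and is a directory?
sudo -u hdfs touch /data/nfs/dfs/nn/.write_test && sudo -u hdfs rm /data/nfs/dfs/nn/.write_test    # writable by hdfs?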
Source: https://stackoverflow.com/questions/21295848/adding-a-new-namenode-data-directory-to-an-existing-cluster