Question:
I need to build a utility class to test the connection to HDFS. The test should display the server-side version of HDFS and any other metadata. There are a lot of client demos available, but nothing on extracting the server metadata. Could anybody help?
Please note that my client is a remote Java client and doesn't have the Hadoop and HDFS config files to initialise the configuration. I need to do it by connecting to the HDFS NameNode service using its URL on the fly.
Answer 1:
Hadoop exposes some information over HTTP that you can use; see Cloudera's article.
Probably the easiest way would be to connect to the NN web UI and parse the content returned by the server:
URL url = new URL("http://myhost:50070/dfshealth.jsp");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String line;
while ((line = in.readLine()) != null) {
    // parse the returned HTML for the version and other metadata
}
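If you go the HTTP route, a small helper can pull fields out of the returned page. The markup of dfshealth.jsp differs between Hadoop releases, so the pattern below (keyed on a "Version:" table cell) is an assumption to adapt against the HTML your NameNode actually serves:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DfsHealthParser {
    // Pulls the value that follows a "Version:" table cell in the NN UI HTML.
    // The exact markup varies across releases, so adjust the pattern to match
    // your cluster's page; returns null when nothing matches.
    static String extractVersion(String html) {
        Matcher m = Pattern.compile("Version:\\s*</td>\\s*<td>([^<]+)").matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String sample = "<tr><td>Version:</td><td>0.20.2, r911707</td></tr>";
        System.out.println(extractVersion(sample));  // prints: 0.20.2, r911707
    }
}
```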
On the other hand, if you know the addresses of the NN and the JT, you can connect to them with a simple client like this (Hadoop 0.20.0-r1056497):
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.FSConstants.DatanodeReportType;
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.util.VersionInfo;

public class NNConnTest {

    private enum NNStats {

        STATS_CAPACITY_IDX(0,
                "Total storage capacity of the system, in bytes: ");
        //... see org.apache.hadoop.hdfs.protocol.ClientProtocol

        private int id;
        private String desc;

        private NNStats(int id, String desc) {
            this.id = id;
            this.desc = desc;
        }

        public String getDesc() {
            return desc;
        }

        public int getId() {
            return id;
        }
    }

    private enum ClusterStats {

        //see org.apache.hadoop.mapred.ClusterStatus API docs
        USED_MEM {
            @Override
            public String getDesc() {
                String desc = "Total heap memory used by the JobTracker: ";
                return desc + clusterStatus.getUsedMemory();
            }
        };

        private static ClusterStatus clusterStatus;

        public static void setClusterStatus(ClusterStatus stat) {
            clusterStatus = stat;
        }

        public abstract String getDesc();
    }

    public static void main(String[] args) throws Exception {

        InetSocketAddress namenodeAddr = new InetSocketAddress("myhost", 8020);
        InetSocketAddress jobtrackerAddr = new InetSocketAddress("myhost", 8021);

        Configuration conf = new Configuration();

        // query NameNode
        DFSClient client = new DFSClient(namenodeAddr, conf);
        ClientProtocol namenode = client.namenode;
        long[] stats = namenode.getStats();

        System.out.println("NameNode info: ");
        for (NNStats sf : NNStats.values()) {
            System.out.println(sf.getDesc() + stats[sf.getId()]);
        }

        // query JobTracker
        JobClient jobClient = new JobClient(jobtrackerAddr, conf);
        ClusterStatus clusterStatus = jobClient.getClusterStatus(true);

        System.out.println("\nJobTracker info: ");
        System.out.println("State: "
                + clusterStatus.getJobTrackerState().toString());

        ClusterStats.setClusterStatus(clusterStatus);
        for (ClusterStats cs : ClusterStats.values()) {
            System.out.println(cs.getDesc());
        }

        System.out.println("\nHadoop build version: "
                + VersionInfo.getBuildVersion());

        // query DataNodes
        System.out.println("\nDataNode info: ");
        DatanodeInfo[] datanodeReport = namenode.getDatanodeReport(
                DatanodeReportType.ALL);
        for (DatanodeInfo di : datanodeReport) {
            System.out.println("Host: " + di.getHostName());
            System.out.println(di.getDatanodeReport());
        }
    }
}
Make sure your client uses the same Hadoop version as your cluster does; otherwise an EOFException can occur.
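That version match can also be checked programmatically. The helper below is a hypothetical utility, not part of the Hadoop API: it compares two dotted version strings component-wise, on the assumption that you obtain the client's version from VersionInfo.getVersion() and the server's from one of the routes described in these answers:

```java
public class HadoopVersionGuard {
    // Hypothetical helper: returns true when two Hadoop version strings
    // (e.g. "0.20.2") agree on every dotted component they both have.
    // Hadoop RPC of this era required matching client/cluster versions,
    // so a mismatch here is a hint that an EOFException may follow.
    static boolean compatible(String clientVersion, String serverVersion) {
        String[] c = clientVersion.split("[.-]");
        String[] s = serverVersion.split("[.-]");
        int n = Math.min(c.length, s.length);
        for (int i = 0; i < n; i++) {
            if (!c[i].equals(s[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(compatible("0.20.2", "0.20.2"));  // true
        System.out.println(compatible("0.20.2", "0.21.0"));  // false
    }
}
```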
Answer 2:
All Hadoop nodes expose a JMX interface, and one of the things you can get via JMX is the version. A good way to start is to run Hadoop on your localhost, connect to a node with jconsole, explore the interface, and copy & paste the object names of the MBeans you need. Unfortunately, there is almost no documentation about Hadoop's JMX interface.
By the way, the NameNode provides the most useful information.
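The MBean lookup this answer describes can be sketched in plain Java. For brevity the snippet runs against the local JVM's platform MBean server and a standard java.lang MBean; against a remote NameNode you would instead obtain an MBeanServerConnection via JMXConnectorFactory (with JMX remoting enabled on the node) and use the Hadoop MBean name, which on many releases looks like Hadoop:service=NameNode,name=NameNodeInfo with a Version attribute — treat that name as an assumption to verify in jconsole first:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class JmxProbe {
    // Reads a single attribute from an MBean over any MBeanServerConnection.
    // The same call works against a remote connection obtained from
    // JMXConnectorFactory.connect(...).getMBeanServerConnection().
    static Object readAttribute(MBeanServerConnection conn,
                                String mbeanName, String attribute) throws Exception {
        return conn.getAttribute(new ObjectName(mbeanName), attribute);
    }

    public static void main(String[] args) throws Exception {
        // Demonstrated against the local JVM; for a NameNode, swap in the
        // remote connection and (after verifying in jconsole) an MBean such
        // as "Hadoop:service=NameNode,name=NameNodeInfo", attribute "Version".
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        System.out.println("JVM name: "
                + readAttribute(local, "java.lang:type=Runtime", "VmName"));
    }
}
```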
Source: https://stackoverflow.com/questions/12366482/how-to-get-the-hdfs-server-metadata-information-in-a-java-client