Question:
I need to build a utility class to test the connection to HDFS. The test should display the server-side version of HDFS and any other metadata. There are a lot of client demos available, but nothing on extracting the server metadata. Could anybody help?
Please note that my client is a remote Java client and doesn't have the Hadoop and HDFS config files to initialise the configuration. I need to do it by connecting to the HDFS NameNode service using its URL on the fly.
Answer 1:
Hadoop exposes some information over HTTP that you can use; see Cloudera's article.
Probably the easiest way would be to connect to the NN web UI and parse the content returned by the server:
URL url = new URL("http://myhost:50070/dfshealth.jsp");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String line;
while ((line = in.readLine()) != null) {
    // parse the returned HTML for the version and other metadata
}
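If you go the HTTP route, a small helper can pull fields out of the returned page. The markup of dfshealth.jsp differs between Hadoop releases, so the pattern below (keyed on a "Version:" table cell) is an assumption to adapt against the HTML your NameNode actually serves:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DfsHealthParser {
    // Pulls the value that follows a "Version:" table cell in the NN UI HTML.
    // The exact markup varies across releases, so adjust the pattern to match
    // your cluster's page; returns null when nothing matches.
    static String extractVersion(String html) {
        Matcher m = Pattern.compile("Version:\\s*</td>\\s*<td>([^<]+)").matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String sample = "<tr><td>Version:</td><td>0.20.2, r911707</td></tr>";
        System.out.println(extractVersion(sample));  // prints: 0.20.2, r911707
    }
}
```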
On the other hand, if you know the addresses of the NN and the JT, you can connect to them with a simple client like this (Hadoop 0.20.0-r1056497):
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.FSConstants.DatanodeReportType;
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.util.VersionInfo;

public class NNConnTest {

    private enum NNStats {

        STATS_CAPACITY_IDX(0,
                "Total storage capacity of the system, in bytes: ");
        //... see org.apache.hadoop.hdfs.protocol.ClientProtocol

        private int id;
        private String desc;

        private NNStats(int id, String desc) {
            this.id = id;
            this.desc = desc;
        }

        public String getDesc() {
            return desc;
        }

        public int getId() {
            return id;
        }
    }

    private enum ClusterStats {

        //see org.apache.hadoop.mapred.ClusterStatus API docs
        USED_MEM {
            @Override
            public String getDesc() {
                String desc = "Total heap memory used by the JobTracker: ";
                return desc + clusterStatus.getUsedMemory();
            }
        };

        private static ClusterStatus clusterStatus;

        public static void setClusterStatus(ClusterStatus stat) {
            clusterStatus = stat;
        }

        public abstract String getDesc();
    }

    public static void main(String[] args) throws Exception {

        InetSocketAddress namenodeAddr = new InetSocketAddress("myhost", 8020);
        InetSocketAddress jobtrackerAddr = new InetSocketAddress("myhost", 8021);

        Configuration conf = new Configuration();

        // query NameNode
        DFSClient client = new DFSClient(namenodeAddr, conf);
        ClientProtocol namenode = client.namenode;
        long[] stats = namenode.getStats();

        System.out.println("NameNode info: ");
        for (NNStats sf : NNStats.values()) {
            System.out.println(sf.getDesc() + stats[sf.getId()]);
        }

        // query JobTracker
        JobClient jobClient = new JobClient(jobtrackerAddr, conf);
        ClusterStatus clusterStatus = jobClient.getClusterStatus(true);

        System.out.println("\nJobTracker info: ");
        System.out.println("State: "
                + clusterStatus.getJobTrackerState().toString());

        ClusterStats.setClusterStatus(clusterStatus);
        for (ClusterStats cs : ClusterStats.values()) {
            System.out.println(cs.getDesc());
        }

        System.out.println("\nHadoop build version: "
                + VersionInfo.getBuildVersion());

        // query DataNodes
        System.out.println("\nDataNode info: ");
        DatanodeInfo[] datanodeReport = namenode.getDatanodeReport(
                DatanodeReportType.ALL);
        for (DatanodeInfo di : datanodeReport) {
            System.out.println("Host: " + di.getHostName());
            System.out.println(di.getDatanodeReport());
        }
    }
}
Make sure your client uses the same Hadoop version as your cluster does; otherwise an EOFException can occur.
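That version match can also be checked programmatically. The helper below is a hypothetical utility, not part of the Hadoop API: it compares two dotted version strings component-wise, on the assumption that you obtain the client's version from VersionInfo.getVersion() and the server's from one of the routes described in these answers:

```java
public class HadoopVersionGuard {
    // Hypothetical helper: returns true when two Hadoop version strings
    // (e.g. "0.20.2") agree on every dotted component they both have.
    // Hadoop RPC of this era required matching client/cluster versions,
    // so a mismatch here is a hint that an EOFException may follow.
    static boolean compatible(String clientVersion, String serverVersion) {
        String[] c = clientVersion.split("[.-]");
        String[] s = serverVersion.split("[.-]");
        int n = Math.min(c.length, s.length);
        for (int i = 0; i < n; i++) {
            if (!c[i].equals(s[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(compatible("0.20.2", "0.20.2"));  // true
        System.out.println(compatible("0.20.2", "0.21.0"));  // false
    }
}
```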
Answer 2:
All Hadoop nodes expose a JMX interface, and one of the things you can get via JMX is the version. A good way to start is to run Hadoop on your localhost, connect to a node with jconsole, explore the interface, and copy & paste the object names of the MBeans you need. Unfortunately, there is almost no documentation about Hadoop's JMX interface.
By the way, the NameNode provides the most useful information.
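The MBean lookup this answer describes can be sketched in plain Java. For brevity the snippet runs against the local JVM's platform MBean server and a standard java.lang MBean; against a remote NameNode you would instead obtain an MBeanServerConnection via JMXConnectorFactory (with JMX remoting enabled on the node) and use the Hadoop MBean name, which on many releases looks like Hadoop:service=NameNode,name=NameNodeInfo with a Version attribute — treat that name as an assumption to verify in jconsole first:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class JmxProbe {
    // Reads a single attribute from an MBean over any MBeanServerConnection.
    // The same call works against a remote connection obtained from
    // JMXConnectorFactory.connect(...).getMBeanServerConnection().
    static Object readAttribute(MBeanServerConnection conn,
                                String mbeanName, String attribute) throws Exception {
        return conn.getAttribute(new ObjectName(mbeanName), attribute);
    }

    public static void main(String[] args) throws Exception {
        // Demonstrated against the local JVM; for a NameNode, swap in the
        // remote connection and (after verifying in jconsole) an MBean such
        // as "Hadoop:service=NameNode,name=NameNodeInfo", attribute "Version".
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        System.out.println("JVM name: "
                + readAttribute(local, "java.lang:type=Runtime", "VmName"));
    }
}
```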
Source: https://stackoverflow.com/questions/12366482/how-to-get-the-hdfs-server-metadata-information-in-a-java-client