Pointing HiveServer2 to MiniMRCluster for Hive Testing

Question

I want to write Hive integration tests for some of the code I've been developing. The testing framework has two major requirements:

It needs to work with a Cloudera version of Hive and Hadoop (preferably 2.0.0-cdh4.7.0).
It needs to be fully local: the Hadoop cluster and the Hive server should start at the beginning of the test, the test should run a few queries, and everything should be torn down when the test is over.

So I broke the problem down into three parts:

1. Getting code for the HiveServer2 part (I decided to use a JDBC connector rather than a Thrift service client).
2. Getting code for building an in-memory MapReduce cluster (I decided to use MiniMRCluster for this).
3. Getting (1) and (2) to work with each other.

I got (1) out of the way with the help of several resources. The most useful were:

Cloudera Hadoop Google User Group
Hive JDBC Client Wiki

For (2), I followed this excellent post on Stack Overflow:

Integration Testing Hive Jobs

So far, so good. At this point, with both pieces in place, the pom.xml of my Maven project looks something like this:

<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
    </dependency>
    <!-- START: dependencies for getting MiniMRCluster to work -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-auth</artifactId>
        <version>2.0.0-cdh4.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-test</artifactId>
        <version>2.0.0-mr1-cdh4.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.0.0-cdh4.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.0.0-cdh4.7.0</version>
        <classifier>tests</classifier>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.0.0-cdh4.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.0.0-cdh4.7.0</version>
        <classifier>tests</classifier>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>2.0.0-mr1-cdh4.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>2.0.0-mr1-cdh4.7.0</version>
        <classifier>tests</classifier>
    </dependency>
    <!-- END: dependencies for getting MiniMRCluster to work -->

    <!-- START: dependencies for getting Hive JDBC to work -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-builtins</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-cli</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-metastore</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-serde</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-common</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.thrift</groupId>
        <artifactId>libfb303</artifactId>
        <version>0.9.1</version>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.15</version>
    </dependency>
    <dependency>
        <groupId>org.antlr</groupId>
        <artifactId>antlr-runtime</artifactId>
        <version>3.5.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.derby</groupId>
        <artifactId>derby</artifactId>
        <version>10.10.1.1</version>
    </dependency>
    <dependency>
        <groupId>javax.jdo</groupId>
        <artifactId>jdo2-api</artifactId>
        <version>2.3-ec</version>
    </dependency>
    <dependency>
        <groupId>jpox</groupId>
        <artifactId>jpox</artifactId>
        <version>1.1.9-1</version>
    </dependency>
    <dependency>
        <groupId>jpox</groupId>
        <artifactId>jpox-rdbms</artifactId>
        <version>1.2.0-beta-5</version>
    </dependency>
    <!-- END: dependencies for getting Hive JDBC to work -->
</dependencies>

Now I'm on part (3). I tried running the following test:

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MiniMRCluster;
import org.junit.Test;

// ... inside the test class:

@Test
public void testHiveMiniDFSClusterIntegration() throws IOException, SQLException {
    Configuration conf = new Configuration();

    /* Build MiniDFSCluster */
    MiniDFSCluster miniDFS = new MiniDFSCluster.Builder(conf).build();

    /* Build MiniMR Cluster */
    System.setProperty("hadoop.log.dir", "/Users/nishantkelkar/IdeaProjects/" +
            "nkelkar-incubator/hive-test/target/hive/logs");
    int numTaskTrackers = 1;
    int numTaskTrackerDirectories = 1;
    String[] racks = null;
    String[] hosts = null;
    MiniMRCluster miniMR = new MiniMRCluster(numTaskTrackers, miniDFS.getFileSystem().getUri().toString(),
            numTaskTrackerDirectories, racks, hosts, new JobConf(conf));

    // Intended to point Hive's jobs at the MiniMRCluster's JobTracker
    System.setProperty("mapred.job.tracker", miniMR.createJobConf(
            new JobConf(conf)).get("mapred.job.tracker"));

    try {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
    } catch (ClassNotFoundException e) {
        // Fail the test instead of calling System.exit(), which would kill the test JVM
        throw new IllegalStateException("Hive JDBC driver not found on the classpath", e);
    }

    // No host or port in the URL, so HiveServer2 runs embedded in this JVM
    Connection hiveConnection = DriverManager.getConnection(
            "jdbc:hive2:///", "", "");
    Statement stm = hiveConnection.createStatement();

    // now create test tables and query them
    stm.execute("set hive.support.concurrency = false");
    stm.execute("drop table if exists test");
    stm.execute("create table if not exists test(a int, b int) row format delimited fields terminated by ' '");
    stm.execute("create table dual as select 1 as one from test");
    stm.execute("insert into table test select stack(1,4,5) AS (a,b) from dual");
    stm.execute("select * from test");
}
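The second requirement calls for tearing everything down after the test, but the method above never shuts down the clusters or closes the JDBC connection. A minimal sketch of one way to handle this, assuming Java 7 or later: a hypothetical TeardownStack helper (not part of Hadoop or Hive) that closes registered resources in reverse order, so the connection is closed first, then MiniMRCluster, then MiniDFSCluster. It uses only the JDK; the Hadoop calls appear only in the usage comment.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical test helper (not a Hadoop or Hive API): closes everything
 * registered with it in reverse registration order (LIFO).
 *
 * Possible use inside the test above (the test must then declare "throws Exception"):
 *   try (TeardownStack teardown = new TeardownStack()) {
 *       MiniDFSCluster miniDFS = new MiniDFSCluster.Builder(conf).build();
 *       teardown.register(miniDFS::shutdown);
 *       MiniMRCluster miniMR = new MiniMRCluster(...);
 *       teardown.register(miniMR::shutdown);
 *       Connection hiveConnection = DriverManager.getConnection("jdbc:hive2:///", "", "");
 *       teardown.register(hiveConnection);
 *       // run queries here
 *   }
 */
final class TeardownStack implements AutoCloseable {
    private final Deque<AutoCloseable> resources = new ArrayDeque<>();

    /** Registers a resource; it is closed before anything registered earlier. */
    void register(AutoCloseable resource) {
        resources.push(resource);
    }

    /** Closes every resource even if some fail, then rethrows the first failure. */
    @Override
    public void close() throws Exception {
        Exception first = null;
        while (!resources.isEmpty()) {
            try {
                resources.pop().close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Registering each resource right after creating it means that a failure part-way through setup still releases whatever was already started. The method references adapt the clusters' shutdown() methods to AutoCloseable, while the JDBC Connection is already AutoCloseable.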

I hoped that this line from the method above would take care of (3):

    Connection hiveConnection = DriverManager.getConnection(
            "jdbc:hive2:///", "", "");

However, I'm getting the following error:

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:161)
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:150)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:207)
    at com.ask.nkelkar.hive.HiveUnitTest.testHiveMiniDFSClusterIntegration(HiveUnitTest.java:54)

What else do I need to do, or what am I doing wrong, to get this to work?

P.S. I considered the HiveRunner and hive_test projects, but I couldn't get either of them to work with Cloudera versions of Hadoop.

Answer 1:

Your test is failing at the first CREATE TABLE statement. Hive is unhelpfully suppressing the following error message:

file:/user/hive/warehouse/test is not a directory or unable to create one

Hive is attempting to use the default warehouse directory /user/hive/warehouse which doesn't exist on your filesystem. You could create the directory, but for testing you'll likely want to override the default value. For example:

import static org.apache.hadoop.hive.conf.HiveConf.ConfVars;
...
System.setProperty(ConfVars.METASTOREWAREHOUSE.toString(), "/Users/nishantkelkar/IdeaProjects/" +
            "nkelkar-incubator/hive-test/target/hive/warehouse");
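Following the suggestion to override the default, here is a minimal JDK-only sketch: a hypothetical TestWarehouse helper that creates a warehouse directory under the build directory and sets the property by its literal name, hive.metastore.warehouse.dir, which is the name ConfVars.METASTOREWAREHOUSE.toString() returns. As in the snippet above, this assumes the property is set before the JDBC connection is opened, so the embedded HiveServer2 sees it when it builds its configuration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical test helper: creates a throwaway Hive warehouse directory
 * and points Hive at it through a system property.
 * Call it before DriverManager.getConnection("jdbc:hive2:///", "", ""), e.g.
 *   TestWarehouse.configure(java.nio.file.Paths.get("target"));
 */
final class TestWarehouse {
    // Same key as HiveConf.ConfVars.METASTOREWAREHOUSE.toString()
    static final String WAREHOUSE_KEY = "hive.metastore.warehouse.dir";

    static Path configure(Path buildDir) throws IOException {
        // e.g. target/hive/warehouse, made absolute so it does not depend on Hive's working directory
        Path warehouse = buildDir.resolve("hive").resolve("warehouse").toAbsolutePath();
        Files.createDirectories(warehouse); // no-op if the directory already exists
        System.setProperty(WAREHOUSE_KEY, warehouse.toString());
        return warehouse;
    }
}
```

Creating the directory explicitly is optional once the property points somewhere writable, but doing it up front should make a permissions problem surface as a clear IOException instead of the suppressed DDLTask error. Passing a relative Paths.get("target") assumes the test runs with the project root as its working directory, which is Maven Surefire's default.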

Source: https://stackoverflow.com/questions/26665768/pointing-hiveserver2-to-minimrcluster-for-hive-testing
