Pointing HiveServer2 to MiniMRCluster for Hive Testing

Submitted by 拈花ヽ惹草 on 2020-01-15 11:26:08

Question


I've been wanting to do Hive integration testing for some of the code I've been developing. The testing framework I need has two major requirements:

  1. It needs to work with a Cloudera version of Hive and Hadoop (preferably, 2.0.0-cdh4.7.0)
  2. It needs to be entirely local. That is, the Hadoop cluster and Hive server should start at the beginning of the test, run a few queries, and tear down after the test is over.

So I broke this problem down into three parts:

  1. Getting code for the HiveServer2 part (I decided to use a JDBC connector over a Thrift service client)
  2. Getting code for building an in-memory MapReduce cluster (I decided to use MiniMRCluster for this)
  3. Setting up both (1) and (2) above to work with each other.

I was able to get (1) out of the way by looking at many resources. Some that were especially useful:

  • Cloudera Hadoop Google User Group
  • Hive JDBC Client Wiki

For (2), I followed this excellent post on StackOverflow:

  • Integration Testing Hive Jobs

So far, so good. At this point, the pom.xml in my Maven project, with both of the above included, looks something like this:

<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
    </dependency>
    <!-- START: dependencies for getting MiniMRCluster to work -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-auth</artifactId>
        <version>2.0.0-cdh4.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-test</artifactId>
        <version>2.0.0-mr1-cdh4.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.0.0-cdh4.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.0.0-cdh4.7.0</version>
        <classifier>tests</classifier>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.0.0-cdh4.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.0.0-cdh4.7.0</version>
        <classifier>tests</classifier>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>2.0.0-mr1-cdh4.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>2.0.0-mr1-cdh4.7.0</version>
        <classifier>tests</classifier>
    </dependency>
    <!-- END: dependencies for getting MiniMRCluster to work -->

    <!-- START: dependencies for getting Hive JDBC to work -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-builtins</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-cli</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-metastore</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-serde</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-common</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.thrift</groupId>
        <artifactId>libfb303</artifactId>
        <version>0.9.1</version>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.15</version>
    </dependency>
    <dependency>
        <groupId>org.antlr</groupId>
        <artifactId>antlr-runtime</artifactId>
        <version>3.5.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.derby</groupId>
        <artifactId>derby</artifactId>
        <version>10.10.1.1</version>
    </dependency>
    <dependency>
        <groupId>javax.jdo</groupId>
        <artifactId>jdo2-api</artifactId>
        <version>2.3-ec</version>
    </dependency>
    <dependency>
        <groupId>jpox</groupId>
        <artifactId>jpox</artifactId>
        <version>1.1.9-1</version>
    </dependency>
    <dependency>
        <groupId>jpox</groupId>
        <artifactId>jpox-rdbms</artifactId>
        <version>1.2.0-beta-5</version>
    </dependency>
    <!-- END: dependencies for getting Hive JDBC to work -->
</dependencies>
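One note on the pom above: it references ${hive.version} without defining it. Assuming you want the Hive that ships with CDH 4.7.0, a properties block along these lines should work (the exact version string is my assumption; verify it against the Cloudera repository):

```xml
<properties>
    <!-- Hive release bundled with CDH 4.7.0; check the Cloudera repo for the exact artifact -->
    <hive.version>0.10.0-cdh4.7.0</hive.version>
</properties>
```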

Now I'm on step (3). I tried running the following code:

@Test
public void testHiveMiniDFSClusterIntegration() throws IOException, SQLException {
    Configuration conf = new Configuration();

    /* Build MiniDFSCluster */
    MiniDFSCluster miniDFS = new MiniDFSCluster.Builder(conf).build();

    /* Build MiniMRCluster */
    System.setProperty("hadoop.log.dir", "/Users/nishantkelkar/IdeaProjects/" +
            "nkelkar-incubator/hive-test/target/hive/logs");
    int numTaskTrackers = 1;
    int numTaskTrackerDirectories = 1;
    String[] racks = null;
    String[] hosts = null;
    MiniMRCluster miniMR = new MiniMRCluster(numTaskTrackers,
            miniDFS.getFileSystem().getUri().toString(),
            numTaskTrackerDirectories, racks, hosts, new JobConf(conf));

    System.setProperty("mapred.job.tracker", miniMR.createJobConf(
            new JobConf(conf)).get("mapred.job.tracker"));

    try {
        String driverName = "org.apache.hive.jdbc.HiveDriver";
        Class.forName(driverName);
    } catch (ClassNotFoundException e) {
        e.printStackTrace();
        System.exit(1);
    }

    Connection hiveConnection = DriverManager.getConnection(
            "jdbc:hive2:///", "", "");
    Statement stm = hiveConnection.createStatement();

    // now create test tables and query them
    stm.execute("set hive.support.concurrency = false");
    stm.execute("drop table if exists test");
    stm.execute("create table if not exists test(a int, b int) row format delimited fields terminated by ' '");
    stm.execute("create table dual as select 1 as one from test");
    stm.execute("insert into table test select stack(1,4,5) AS (a,b) from dual");
    stm.execute("select * from test");
}
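Since requirement (2) also calls for teardown once the test finishes, I'd plan on something like the following once the clusters and connection are promoted to fields (a sketch, not verified; `shutdown()` is the method both mini-cluster classes expose):

```java
@After
public void tearDown() throws Exception {
    if (hiveConnection != null) hiveConnection.close();
    if (miniMR != null) miniMR.shutdown();    // stops the JobTracker and TaskTrackers
    if (miniDFS != null) miniDFS.shutdown();  // stops the NameNode and DataNodes
}
```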

My hope was that (3) would be solved by the following line of code from the above method:

    Connection hiveConnection = DriverManager.getConnection(
            "jdbc:hive2:///", "", "");

However, I'm getting the following error:

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:161)
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:150)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:207)
    at com.ask.nkelkar.hive.HiveUnitTest.testHiveMiniDFSClusterIntegration(HiveUnitTest.java:54)

Can anyone please let me know what I need to do in addition/what I'm doing wrong to get this to work?

P.S. I looked at HiveRunner and hive_test projects as options, but I wasn't able to get these to work with Cloudera versions of Hadoop.


Answer 1:


Your test is failing at the first create table statement. Hive is unhelpfully suppressing the following error message:

file:/user/hive/warehouse/test is not a directory or unable to create one

Hive is attempting to use the default warehouse directory /user/hive/warehouse which doesn't exist on your filesystem. You could create the directory, but for testing you'll likely want to override the default value. For example:

import static org.apache.hadoop.hive.conf.HiveConf.ConfVars;
...
System.setProperty(ConfVars.METASTOREWAREHOUSE.toString(), "/Users/nishantkelkar/IdeaProjects/" +
            "nkelkar-incubator/hive-test/target/hive/warehouse");
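If you'd rather go the create-the-directory route instead, plain JDK code is enough; a minimal sketch (the path under target/ is just an example location, and the property key is the one ConfVars.METASTOREWAREHOUSE resolves to):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WarehouseDirSetup {
    public static void main(String[] args) throws Exception {
        // Example location under the build directory; adjust to your project layout.
        Path warehouse = Paths.get("target", "hive", "warehouse");
        Files.createDirectories(warehouse);  // no-op if it already exists

        // Same key that ConfVars.METASTOREWAREHOUSE.toString() returns.
        System.setProperty("hive.metastore.warehouse.dir",
                warehouse.toAbsolutePath().toString());

        System.out.println(System.getProperty("hive.metastore.warehouse.dir"));
    }
}
```

Either way, the point is that the warehouse directory must exist (or be creatable) before the first `create table` runs.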


Source: https://stackoverflow.com/questions/26665768/pointing-hiveserver2-to-minimrcluster-for-hive-testing
