Spark connection to HBase in Kerberos environment failing

Submitted by 送分小仙女 on 2019-12-24 18:44:54

Question


I am using:

  1. Spark 1.6.0 (spark-1.2.0-cdh5.10.2)
  2. Cloudera VM (spark-1.2.0-cdh5.10.2)
  3. HBase (1.2.0 from Cloudera)
  4. Scala 2.10
  5. Kerberos enabled

The steps I am running are:

  1. kinit (so that my user credentials are in place)

  2. Start the Spark shell:

spark-shell --master yarn --executor-memory 256m --jars /opt/cloudera/parcels/CDH/lib/hbase/lib/hbase-spark-1.2.0-cdh5.10.2.jar

  3. Run the following code in the shell:

```

import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.SparkContext
import org.apache.hadoop.hbase.{ CellUtil, TableName, HBaseConfiguration }
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.Get
import org.apache.hadoop.hbase.client.Result
import org.apache.spark.SparkConf
import org.apache.hadoop.hbase.client.Scan

val tableName = "web-table"

val scan = new Scan()
scan.setCaching(100)

// sc.setLogLevel("DEBUG")
val conf = HBaseConfiguration.create()

conf.set("hbase.zookeeper.quorum", "quickstart.cloudera")
conf.set("hbase.client.retries.number", Integer.toString(1))
conf.set("zookeeper.session.timeout", Integer.toString(60000))
conf.set("zookeeper.recovery.retry", Integer.toString(20))
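
// HBaseContext wraps the SparkContext and the HBase configuration; hbaseRDD
// runs the Scan as a distributed read across the table's regions and returns
// an RDD of (row key, Result) pairs.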

val hbaseContext = new HBaseContext(sc, conf)
val getRdd = hbaseContext.hbaseRDD(TableName.valueOf(tableName), scan)


getRdd.take(1)

```

The above code fails with the following stack trace:

```

org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
Wed Feb 07 20:30:27 PST 2018, RpcRetryingCaller{globalStartTime=1518064227140, pause=100, retries=1}, java.io.IOException: Broken pipe

    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:147)
    at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at java.io.DataOutputStream.flush(DataOutputStream.java:123)
    at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:278)
    at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:266)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:921)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:874)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1243)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:34094)
    at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:400)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:65)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:381)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:355)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)

```

If anyone has seen this error and knows the solution, please let me know.

Also worth mentioning:

  1. I have tried providing --principal and --keytab to the Spark application (a hand-rolled equivalent is sketched after this list).
  2. I have provided additional configuration, such as a JAAS file, in the HBase configs.
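
For reference, here is roughly what the --principal/--keytab flags accomplish when done by hand inside the driver. This is only a sketch under assumptions; the principal and keytab path below are placeholders, not values from my setup.

```

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Placeholder principal and keytab path, for illustration only.
val principal = "myuser@EXAMPLE.COM"
val keytab = "/home/cloudera/myuser.keytab"

val hadoopConf = new Configuration()
// Tell the Hadoop security layer that Kerberos is in use.
hadoopConf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(hadoopConf)

// Log in from the keytab instead of relying on the interactive kinit ticket cache.
UserGroupInformation.loginUserFromKeytab(principal, keytab)

```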

In debug mode, however, this message looks suspicious, and it makes me wonder whether something is wrong in the ZooKeeper interaction between Spark and HBase:

18/02/07 20:51:05 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)

Here is another way to reproduce the same issue:

kinit...

```

 spark-submit --master yarn  --executor-memory 256m --class org.apache.hadoop.hbase.spark.example.hbasecontext.HBaseDistributedScanExample /opt/cloudera/parcels/CDH/lib/hbase/lib/hbase-spark-1.2.0-cdh5.10.2.jar web-table

```

The job gets stuck in the region-scan phase for some time before failing with the same broken-pipe error.

Some more ZooKeeper-related messages, which may be interesting to experts out there:

```

18/02/07 20:51:05 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/cloudera/Desktop
18/02/07 20:51:05 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x1c053d8a0x0, quorum=localhost:2181, baseZNode=/hbase
18/02/07 20:51:05 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
18/02/07 20:51:05 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:49815, server: localhost/127.0.0.1:2181
18/02/07 20:51:05 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x161735f4d4700d4, negotiated timeout = 60000
18/02/07 20:51:05 INFO util.RegionSizeCalculator: Calculating region sizes for table "web-table".
18/02/07 20:51:53 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x161735f4d4700d4
18/02/07 20:51:53 INFO zookeeper.ZooKeeper: Session: 0x161735f4d4700d4 closed
18/02/07 20:51:53 INFO zookeeper.ClientCnxn: EventThread shut down

```


Answer 1:


This problem is simply because the Cloudera VM's "Enable Kerberos" setup does not go all the way to making the whole stack ready.

So the solution (the easy way) is: in Cloudera Manager, go to Spark -> Configuration -> HBase Service and set it to HBase (the default is None).

This step ends up adding a bunch of configuration that lets Spark talk to HBase via Kerberos; the sketch below shows the kind of settings involved.
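
For reference, the kind of client-side settings that this dependency makes available (normally via the hbase-site.xml that Cloudera Manager deploys to Spark) looks roughly like the following. This is a hedged sketch: the realm and principal values are placeholders, and the real values come from the deployed hbase-site.xml, so with the dependency set you should not need to hard-code any of this.

```

import org.apache.hadoop.hbase.HBaseConfiguration

val conf = HBaseConfiguration.create()

// Kerberos has to be enabled for both the Hadoop RPC layer and HBase itself.
conf.set("hadoop.security.authentication", "kerberos")
conf.set("hbase.security.authentication", "kerberos")

// Server principals the client uses when talking to the HBase daemons.
// EXAMPLE.COM is a placeholder realm; _HOST expands to each server's hostname.
conf.set("hbase.master.kerberos.principal", "hbase/_HOST@EXAMPLE.COM")
conf.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@EXAMPLE.COM")

```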



Source: https://stackoverflow.com/questions/48677801/spark-connection-to-hbase-in-kerberos-environement-failing
