HBase connection pooling for very frequent scanning of row

Submitted by 不羁岁月 on 2020-06-13 05:39:21

Question


I have to scan a table for a single row very frequently (about a million times an hour). I already know the row ID, which is a byte array, and I use it to build both the start row and the end row of the scan, which are effectively the same in my case.

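     public String someMethod(byte[] rowId) throws IOException {
            // Create the table handle lazily and reuse it across calls
            if (aTable == null) {
                  aTable = new HTable(Config.getHadoopConfig(),
                  Config.getATable());
            }
            // The end row is a copy of the start row, so the scan targets one row
            byte[] endRow = new byte[rowId.length];
            System.arraycopy(rowId, 0, endRow, 0, rowId.length);
            Scan scan = new Scan(rowId, endRow);
            // scanner implementation and iteration over the result
            try (ResultScanner result = aTable.getScanner(scan)) {
                   for (Result item : result) {

                   }
            }
            return null; // placeholder: the original omits the return value
     }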

I am wondering whether I can implement some kind of connection pooling to improve performance. Is there any pooling mechanism available in the HBase Java API? I am using HBase 0.96.x. Also, is there any configuration setting that can improve performance? Thanks.


Answer 1:


Taken from http://hbase.apache.org/book.html

Connection Pooling

For applications which require high-end multithreaded access (e.g., web-servers or application servers that may serve many application threads in a single JVM), you can pre-create an HConnection, as shown in the following example:

Example 9.1. Pre-Creating a HConnection

// Create a connection to the cluster.
Configuration conf = HBaseConfiguration.create();
HConnection connection = HConnectionManager.createConnection(conf);
HTableInterface table = connection.getTable("myTable");
// use table as needed, the table returned is lightweight
table.close();
// use the connection for other access to the cluster
connection.close();



Answer 2:


The connection pooling API changed in version 1.0.

The new API, for readers' reference:

// Create a connection to the cluster.
Configuration conf = HBaseConfiguration.create();
try (Connection connection = 
  ConnectionFactory.createConnection(conf);
  Table table = connection.getTable(TableName.valueOf(tablename))) {
// use table as needed, the table returned is lightweight
}
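
The try-with-resources statement above closes its resources in the reverse order of declaration, so the table is closed before the connection, the same order as the explicit close() calls in the pre-1.0 example. Below is a minimal sketch of that ordering. FakeResource is a hypothetical stand-in, not an HBase class, so the example runs without a cluster:

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderSketch {
    // Records the order in which resources are closed.
    static final List<String> closed = new ArrayList<>();

    // Hypothetical stand-in for Connection or Table; not an HBase class.
    static class FakeResource implements AutoCloseable {
        private final String name;

        FakeResource(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closed.add(name);
        }
    }

    // Opens two resources in the same order as the answer and returns the close order.
    static List<String> run() {
        closed.clear();
        try (FakeResource connection = new FakeResource("connection");
             FakeResource table = new FakeResource("table")) {
            // use table as needed
        }
        return closed;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [table, connection]
    }
}
```

Note that this form also closes the connection at the end of the block, so it suits short-lived programs. A long-running service would probably keep the connection open and put only the table in the try header.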



Answer 3:


Connection is thread-safe and very heavyweight (it holds the ZooKeeper and socket connections, among other things), so it should be created only once per application and shared across threads. Table is lightweight but NOT thread-safe: only one thread can use a given Table instance at a time. It is therefore better to create Table instances from the same HBaseConfiguration instance, which ensures that the ZooKeeper and socket connections to the region servers are shared as well.

Sample Code:

Configuration config = HBaseConfiguration.create();
config.addResource("hbase-site.xml");
// Create the connection once and share it; close it when the application shuts down.
Connection connection = ConnectionFactory.createConnection(config);
// Table is lightweight: open one per use and let try-with-resources close it.
try (Table table = connection.getTable(TableName.valueOf("tableName"))) {
   Get getVal = new Get(Bytes.toBytes("rowkey"));
   Result result = table.get(getVal);
   byte[] value =
   result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("dataCol"));
}
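
The "create once, share across threads" rule for the heavyweight connection can be sketched as a lazily initialized holder. This is only a sketch under assumptions: SharedConnectionHolder and its Supplier-based factory are hypothetical names, not part of the HBase API, and a plain Object stands in for the connection so the example runs without a cluster.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SharedConnectionHolder<C> {
    private final Supplier<C> factory; // creates the heavyweight object
    private volatile C instance;       // volatile is required for double-checked locking

    public SharedConnectionHolder(Supplier<C> factory) {
        this.factory = factory;
    }

    // Returns the single shared instance, creating it on first use.
    public C get() {
        C local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }

    // Calls get() from several threads and reports how often the factory ran.
    static int countCreations(int threads) {
        AtomicInteger created = new AtomicInteger();
        SharedConnectionHolder<Object> holder = new SharedConnectionHolder<>(() -> {
            created.incrementAndGet();
            return new Object(); // stand-in for an expensive connection
        });
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                holder.get();
                done.countDown();
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println(countCreations(8)); // prints 1
    }
}
```

With the 1.0+ API used in this answer, the factory would presumably wrap ConnectionFactory.createConnection(config) and handle its IOException. Each thread would then get its own Table from the shared connection and close it after use.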



Answer 4:


I would strongly recommend reusing the Connection instance.

// Create a connection to the cluster.
Configuration conf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(conf);

try (Table table = connection.getTable(TableName.valueOf(tablename))) {
// use table as needed, the table returned is lightweight
}

The reason is that a fairly large (by default) thread pool executor for batch processing is initialized for each Connection instance:

HConnectionImplementation implements ClusterConnection, Closeable {
...
private ExecutorService getBatchPool() {
  if (batchPool == null) {
    synchronized (this) {
      if (batchPool == null) {
        this.batchPool = getThreadPool(conf.getInt("hbase.hconnection.threads.max", 256),
            conf.getInt("hbase.hconnection.threads.core", 256), "-shared-", null);
        this.cleanupPool = true;
      }
    }
  }
  return this.batchPool;
}
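
To see how one pool per connection adds up, the sketch below gives each connection its own executor, sized like the defaults in the excerpt (256 core and 256 maximum threads), and reports the combined thread ceiling. FakeConnection and threadCeiling are illustrative names, not HBase classes, and every executor setting other than the two sizes is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BatchPoolSketch {
    // Hypothetical stand-in for a connection that owns its own batch pool.
    static class FakeConnection implements AutoCloseable {
        final ThreadPoolExecutor batchPool;

        FakeConnection(int coreThreads, int maxThreads) {
            batchPool = new ThreadPoolExecutor(coreThreads, maxThreads,
                    60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
            batchPool.allowCoreThreadTimeOut(true); // assumed setting: idle threads may exit
        }

        @Override
        public void close() {
            batchPool.shutdown();
        }
    }

    // Sums the maximum pool sizes of the given number of independent connections.
    static int threadCeiling(int connections) {
        List<FakeConnection> open = new ArrayList<>();
        int ceiling = 0;
        try {
            for (int i = 0; i < connections; i++) {
                FakeConnection c = new FakeConnection(256, 256);
                open.add(c);
                ceiling += c.batchPool.getMaximumPoolSize();
            }
        } finally {
            for (FakeConnection c : open) {
                c.close();
            }
        }
        return ceiling;
    }

    public static void main(String[] args) {
        System.out.println(threadCeiling(1));  // prints 256
        System.out.println(threadCeiling(10)); // prints 2560
    }
}
```

The figure is an upper bound, since an executor starts threads only as tasks arrive. Even so, creating a connection per request multiplies that bound, while a single shared connection keeps it at one pool.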


Source: https://stackoverflow.com/questions/25835307/hbase-connection-pooling-for-very-frequent-scanning-of-row
