Question
I have to scan a table for a single row very frequently (about a million times an hour). I already have the rowid, which is a byte array. I build the start row and the end row from that rowid, so in my case the two are essentially the same.
public String someMethod(byte[] rowId) throws IOException {
    if (aTable == null) {
        aTable = new HTable(Config.getHadoopConfig(),
                Config.getATable());
    }
    // Copy rowId into a separate array; System.arraycopy returns void.
    byte[] endRow = new byte[rowId.length];
    System.arraycopy(rowId, 0, endRow, 0, rowId.length);
    Scan scan = new Scan(rowId, endRow);
    // scanner implementation and iteration over the result
    try (ResultScanner result = aTable.getScanner(scan)) {
        for (Result item : result) {
        }
    }
    return null; // placeholder: the original snippet omits the return value
}
I am wondering whether I can implement some kind of connection pooling to improve performance. Is there any pooling mechanism available in the HBase Java API? I am using HBase 0.96.x. Also, is there any configuration setting that can improve performance? Thanks.
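The row-copy step in the question can be sketched with the standard library alone. This is only an illustration, not HBase code: the class and method names (RowKeyBounds, copyOf, nextKeyAfter) are hypothetical. The second helper assumes the scan treats its stop row as exclusive, in which case the row key plus one trailing zero byte is the smallest key that sorts after it.

```java
import java.util.Arrays;

// Standard-library sketch only; no HBase classes are used here.
class RowKeyBounds {
    // Independent copy of the row key. System.arraycopy fills the
    // destination array in place and returns void.
    static byte[] copyOf(byte[] rowId) {
        byte[] copy = new byte[rowId.length];
        System.arraycopy(rowId, 0, copy, 0, rowId.length);
        return copy;
    }

    // Smallest key that sorts strictly after rowId in byte order:
    // rowId followed by a single 0x00 byte.
    // Assumption: it is meant to be used as an exclusive stop row.
    static byte[] nextKeyAfter(byte[] rowId) {
        return Arrays.copyOf(rowId, rowId.length + 1); // new slot is zero-filled
    }

    public static void main(String[] args) {
        byte[] rowId = {0x01, 0x02, 0x03};
        byte[] endRow = copyOf(rowId);
        System.out.println(Arrays.equals(rowId, endRow));          // same bytes
        System.out.println(rowId != endRow);                       // separate arrays
        System.out.println(Arrays.toString(nextKeyAfter(rowId)));  // one extra byte
    }
}
```

Since the lookup targets exactly one row, a Get on that row key (as one of the answers below uses) may be a simpler fit than a Scan whose start and end rows are identical.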
Answer 1:
Taken from http://hbase.apache.org/book.html
Connection Pooling
For applications which require high-end multithreaded access (e.g., web-servers or application servers that may serve many application threads in a single JVM), you can pre-create an HConnection, as shown in the following example:
Example 9.1. Pre-Creating a HConnection
// Create a connection to the cluster.
HConnection connection = HConnectionManager.createConnection(Configuration);
HTableInterface table = connection.getTable("myTable");
// use table as needed, the table returned is lightweight
table.close();
// use the connection for other access to the cluster
connection.close();
Answer 2:
The connection API changed in version 1.0.
Code using the new API, for readers' reference:
// Create a connection to the cluster.
Configuration conf = HBaseConfiguration.create();
try (Connection connection = ConnectionFactory.createConnection(conf);
     Table table = connection.getTable(TableName.valueOf(tablename))) {
    // use table as needed, the table returned is lightweight
}
Answer 3:
Connection is thread-safe and very heavyweight (it holds the ZooKeeper and socket connections, among other things), so it should be created only once per application and shared across threads.
Table is lightweight but NOT thread-safe. Only one thread should use a given Table instance, so it is better to reuse the same HBaseConfiguration instance for all Table instances.
Reusing the HBaseConfiguration also ensures that the ZooKeeper and socket connections to the region servers are shared.
Sample Code:
Configuration config = HBaseConfiguration.create();
config.addResource("hbase-site.xml");
// try-with-resources closes the table and then the connection.
// In a real application, create the Connection once and share it;
// it is opened here only to keep the sample short.
try (Connection connection = ConnectionFactory.createConnection(config);
     Table table = connection.getTable(TableName.valueOf("tableName"))) {
    Get getVal = new Get(Bytes.toBytes("rowkey"));
    Result result = table.get(getVal);
    byte[] value =
            result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("dataCol"));
}
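The sharing rule described in this answer (one heavyweight connection per application, one lightweight table handle per thread) can be sketched without HBase on the classpath. Everything below is a stand-in: FakeConnection and FakeTable are hypothetical classes that only count how often they are created, and the holder idiom is one possible way to create the shared object once, not something the HBase API prescribes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for the HBase types; only the sharing pattern matters.
class SharedConnectionSketch {
    static final AtomicInteger connectionsOpened = new AtomicInteger();
    static final AtomicInteger tablesOpened = new AtomicInteger();

    // Stand-in for a heavyweight, thread-safe connection.
    static final class FakeConnection {
        FakeConnection() { connectionsOpened.incrementAndGet(); }
        FakeTable getTable(String name) { return new FakeTable(name); }
    }

    // Stand-in for a lightweight table handle used by a single thread.
    static final class FakeTable implements AutoCloseable {
        final String name;
        FakeTable(String name) { this.name = name; tablesOpened.incrementAndGet(); }
        @Override public void close() { /* nothing to release in this sketch */ }
    }

    // Initialization-on-demand holder: the JVM builds the connection once,
    // on first use, and class initialization is thread-safe.
    private static final class Holder {
        static final FakeConnection CONNECTION = new FakeConnection();
    }

    static FakeConnection connection() { return Holder.CONNECTION; }

    // Each worker takes its own table handle from the shared connection
    // and closes it when done.
    static void runWorkers(int threads) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try (FakeTable table = connection().getTable("tableName")) {
                    // per-thread reads would go here
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        runWorkers(8);
        System.out.println("connections=" + connectionsOpened.get()
                + " tables=" + tablesOpened.get());
    }
}
```

With real HBase classes the same shape would presumably hold: the shared object would be the Connection, and each thread would obtain and close its own Table.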
Answer 4:
I would strongly recommend reusing the connection instance.
// Create a connection to the cluster.
Configuration conf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(conf);
try (Table table = connection.getTable(TableName.valueOf(tablename))) {
    // use table as needed, the table returned is lightweight
}
The reason is that a fairly large (by default) thread pool executor for batch processing is initialized for each connection instance:
class HConnectionImplementation implements ClusterConnection, Closeable {
    ...
    private ExecutorService getBatchPool() {
        if (batchPool == null) {
            synchronized (this) {
                if (batchPool == null) {
                    this.batchPool = getThreadPool(conf.getInt("hbase.hconnection.threads.max", 256),
                        conf.getInt("hbase.hconnection.threads.core", 256), "-shared-", null);
                    this.cleanupPool = true;
                }
            }
        }
        return this.batchPool;
    }
    ...
}
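To make the cost argument concrete, here is a rough standard-library sketch of what "one batch pool per connection" implies. It does not use HBase: newBatchPool is a hypothetical stand-in for the getThreadPool call quoted above, the queue type and the 60-second keep-alive are assumptions, and only the 256 default comes from the snippet. A java.util.concurrent.ThreadPoolExecutor starts its threads on demand, so the figures are ceilings rather than live thread counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Standard-library sketch; the pool below is a stand-in, not HBase's own code.
class PoolPerConnectionSketch {
    // One executor per simulated connection (hypothetical helper).
    static ThreadPoolExecutor newBatchPool(int core, int max) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true); // idle threads are allowed to exit
        return pool;
    }

    // Upper bound on worker threads when every simulated connection owns a pool.
    static int threadCeiling(int connections, int maxPerConnection) {
        List<ThreadPoolExecutor> pools = new ArrayList<>();
        int ceiling = 0;
        for (int i = 0; i < connections; i++) {
            ThreadPoolExecutor pool = newBatchPool(maxPerConnection, maxPerConnection);
            pools.add(pool);
            ceiling += pool.getMaximumPoolSize();
        }
        for (ThreadPoolExecutor pool : pools) {
            pool.shutdown(); // no tasks were submitted, so no threads were started
        }
        return ceiling;
    }

    public static void main(String[] args) {
        System.out.println("1 shared connection: " + threadCeiling(1, 256));
        System.out.println("10 connections:      " + threadCeiling(10, 256));
    }
}
```

The ceiling grows linearly with the number of connections, which is the practical reason to keep a single shared instance.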
Source: https://stackoverflow.com/questions/25835307/hbase-connection-pooling-for-very-frequent-scanning-of-row