Using Solr to Query HBase

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-11 04:38:52

Question


I have a data warehousing problem that requires querying a large dataset. For the sake of this example, let's say a typical state would have 30 million users, with activity stats for each. Ideally I could buy a data warehousing tool (Vertica, Infobright, etc.), but that's not in the cards or the budget.

Right now I'm considering using Solr to query HBase. While I believe HBase could scale to meet these needs, I worry about Solr. It's optimized as a search engine, i.e. the first pages of results are returned before the last, and there's no support for anything like a database cursor. Tests so far have shown that getting a large result set out of Solr has been slower than I would've liked. For instance, a query retrieving half of the available users (about 500 MB of data in total) finished in under a minute on the community version of Infobright; in Solr it took 12 minutes.
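To illustrate why the missing cursor hurts bulk export: with offset paging (Solr's `start`/`rows` query parameters), each page makes the engine collect every match up to `start + rows` and then discard the first `start`, so total work grows roughly quadratically with the size of the result set. A cursor instead remembers the last sort key returned and resumes just after it. The sketch below simulates both strategies over an in-memory sorted list of hypothetical user IDs; it does not talk to Solr, and the function names are made up for illustration. (Later Solr releases added a `cursorMark` parameter built on the same idea.)

```python
from bisect import bisect_right


def fetch_all_offset(sorted_ids, page_size):
    """Offset paging: every request re-collects all hits up to start + page_size."""
    results, work, start = [], 0, 0
    while True:
        window = sorted_ids[: start + page_size]  # engine gathers start+rows hits
        work += len(window)                       # count items the engine walked over
        page = window[start:]                     # ...then throws away the first `start`
        if not page:
            return results, work
        results.extend(page)
        start += page_size


def fetch_all_cursor(sorted_ids, page_size):
    """Cursor (keyset) paging: resume right after the last key already returned."""
    results, work, last = [], 0, None
    while True:
        # Binary search for the resume point (O(log n), not counted as work here).
        lo = 0 if last is None else bisect_right(sorted_ids, last)
        page = sorted_ids[lo: lo + page_size]
        work += len(page)
        if not page:
            return results, work
        results.extend(page)
        last = page[-1]


ids = list(range(10_000))  # hypothetical user IDs, already sorted
off_rows, off_work = fetch_all_offset(ids, 100)
cur_rows, cur_work = fetch_all_cursor(ids, 100)
print(off_work, cur_work)  # 515000 10000
```

Both strategies return the same rows, but with 100 pages the offset version walks over about 50 times as many items; the gap widens as the result set grows.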

Is there something other than Solr that's better suited to query this data? Are there any optimizations that would help with bulk data input and output?


Answer 1:


I know this is a bit late, but...

Depending on your search requirements, Solr could be a good option. Keep in mind that you most likely won't need to index everything in HBase. Are there certain fields you can pick out? Portions of text? You certainly do NOT need to store that data in Solr if you're already storing it in HBase.

Solr is an excellent secondary index system to put on top of HBase, and Solr also has some great text analytics capabilities if that is what you need.
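A minimal sketch of the secondary-index pattern described above, assuming an in-memory dict stands in for an HBase table (row key → columns) and a tiny inverted index stands in for Solr. Only selected fields are indexed, and the index holds row keys only, never the row data; a query resolves matching keys in the index, then fetches the full rows from the store. All names here (`INDEXED_FIELDS`, `build_index`, `query`, the sample rows) are hypothetical; this is not the SolrJ or HBase client API.

```python
from collections import defaultdict

# Hypothetical choice: only these fields are indexed; everything else stays in the row store.
INDEXED_FIELDS = ("state", "plan")


def build_index(table):
    """Map (field, value) -> set of row keys. Stores keys only, not row data."""
    index = defaultdict(set)
    for row_key, columns in table.items():
        for field in INDEXED_FIELDS:
            if field in columns:
                index[(field, columns[field])].add(row_key)
    return index


def query(index, table, **criteria):
    """AND together field=value criteria in the index, then fetch full rows by key."""
    key_sets = [index.get((f, v), set()) for f, v in criteria.items()]
    row_keys = set.intersection(*key_sets) if key_sets else set()
    # Second step: look up full rows by key (against HBase this would be a batched get).
    return {k: table[k] for k in sorted(row_keys)}


# Stand-in for an HBase table: row key -> column values (made-up data).
users = {
    "user:001": {"state": "OH", "plan": "free", "logins": 12},
    "user:002": {"state": "OH", "plan": "paid", "logins": 40},
    "user:003": {"state": "CA", "plan": "paid", "logins": 7},
}
index = build_index(users)
result = query(index, users, state="OH", plan="paid")
print(result)  # {'user:002': {'state': 'OH', 'plan': 'paid', 'logins': 40}}
```

Keeping the index narrow (keys plus a few filter fields) keeps it small and fast, while HBase remains the single source of truth for the full activity stats.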

You should also take a look at ElasticSearch, one of Solr's primary competitors.




Answer 2:


Take a look at SolBase and Lily, two implementations that combine Solr with an HBase backend.



Source: https://stackoverflow.com/questions/14759778/using-solr-to-query-hbase
