Realtime querying/aggregating millions of records - Hadoop? HBase? Cassandra?

不思量自难忘° 2021-01-31 06:30

I have a solution that can be parallelized, but I don't (yet) have experience with Hadoop/NoSQL, and I'm not sure which solution is best for my needs. In theory, if I had unl

5 answers
  •  萌比男神i
    2021-01-31 07:21

    If I understand you correctly and you only need to aggregate on a single column at a time, you can store your data differently in HBase for better results. The layout would look something like this:

      •  A table per data column in today's setup, plus a single additional table for the filtering fields (type_ids).
      •  A row for each key in today's setup. You may want to think about how to incorporate your filter fields into the key for efficient filtering; otherwise you'd have to do a two-phase read.
      •  A column for each table in today's setup (i.e. a few thousand columns).

    HBase doesn't mind if you add new columns, and it is sparse in the sense that it doesn't store data for columns that don't exist. When you read a row you get all the relevant values, over which you can compute averages and similar aggregates quite easily.
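
To make the layout above a bit more concrete, here is a rough in-memory sketch in Python. It is not the HBase client API: `SparseTable`, `make_row_key`, the `price` table, and the `src:table_a` style column names are all hypothetical stand-ins, and the zero-padded `type_id` prefix is just one assumed way of putting a filter field into the row key.

```python
from collections import defaultdict
from statistics import mean


class SparseTable:
    """In-memory stand-in for one HBase table (NOT the real HBase API).

    Rows are sparse: a column only exists in a row once a value is put.
    """

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get_row(self, row_key):
        # Return a copy of the stored columns; a missing row yields {}.
        return dict(self._rows.get(row_key, {}))

    def scan_prefix(self, prefix):
        # Mimics a row-key prefix scan: rows come back in sorted key order.
        return [(k, dict(v)) for k, v in sorted(self._rows.items())
                if k.startswith(prefix)]


def make_row_key(type_id, key):
    # Assumption: the filter field (type_id) is the row-key prefix,
    # so filtering becomes a prefix scan instead of a two-phase read.
    return f"{type_id:04d}|{key}"


def row_average(table, row_key):
    # One row read returns every source column, so the average is local.
    values = list(table.get_row(row_key).values())
    return mean(values) if values else None


# One table per data column; "price" is a hypothetical data column.
price = SparseTable()
price.put(make_row_key(7, "item-1"), "src:table_a", 10.0)
price.put(make_row_key(7, "item-1"), "src:table_b", 14.0)
price.put(make_row_key(7, "item-2"), "src:table_a", 3.0)  # table_b absent
price.put(make_row_key(9, "item-3"), "src:table_b", 8.0)

print(row_average(price, make_row_key(7, "item-1")))
print([k for k, _ in price.scan_prefix("0007|")])
```

With a real cluster the same idea would map onto puts, gets, and prefix scans against actual tables; the sketch only shows how the key and column choices keep an aggregate down to a single row read.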
