vldb

Performance for RBAR vs. set-based processing with varying transactional sizes

一曲冷凌霜 posted on 2019-12-11 03:35:18
Question: It is conventional wisdom that set-based processing of tables should always be preferred over RBAR, especially when the tables grow larger and/or you need to update many rows. But does that always hold? I have experienced quite a few situations, on different hardware, where set-based processing shows exponential growth in time consumption, while splitting the same workload into smaller chunks gives linear growth. I think it would be interesting either to be proven totally wrong - if I'm

20 Billion Rows/Month - Hbase / Hive / Greenplum / What?

折月煮酒 posted on 2019-12-03 00:04:50
Question: I'd like to use your wisdom for picking the right solution for a data-warehouse system. Here are some details to better understand the problem: Data is organized in a star schema with one BIG fact table and ~15 dimensions: 20B fact rows per month; 10 dimensions with hundreds of rows (somewhat hierarchical); 5 dimensions with thousands of rows; 2 dimensions with ~200K rows; 2 big dimensions with 50M-100M rows. Two typical queries run against this DB. Top members in dimq: select top X dimq, count(id)