Question
Does anyone know a method for getting a rough size estimate of an OLAP cube built on a star schema data warehouse? Something based on the number of dimensions, the number of records in the dimension tables, the number of fact records, and finally the number of aggregations or distinct records, etc.
The database I am looking at has a fact table of over 20 billion rows and a few dimension tables of 20 million, 70 million and 1.3 billion rows.
Thanks Nicholas
Answer 1:
I can see some roadblocks to creating this estimate. Knowing the row counts and cardinalities of the dimension tables in isolation isn't nearly as important as the relationships between them.
Imagine two low-cardinality dimensions with n and m unique values respectively. Caching OLAP aggregates over those dimensions produces anywhere from n + m values to n * m values, depending on how closely the relationship resembles a pure bijection. Given only the information you provided, all you can say is that you'll end up with fewer than 3.64 * 10^34 values, which is not very useful.
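The bound in the answer can be sketched with a small example. This is an illustration of the reasoning, not code from the answer; the function names are mine, and the point is that dimension member counts alone bracket the aggregate size without pinning it down:

```python
# Knowing only the member counts n and m of two dimensions, the number of
# distinct aggregate cells is only bounded, not determined -- the actual
# count depends on how the dimensions correlate in the fact data.

def aggregate_cell_bounds(n, m):
    """Bounds on the number of distinct (dim1, dim2) aggregate combinations."""
    lower = n + m   # dimensions move in lockstep (bijection-like relationship)
    upper = n * m   # every combination of members actually occurs
    return lower, upper

def observed_cells(fact_rows):
    """The true count: distinct (dim1, dim2) pairs present in the fact data."""
    return len({(d1, d2) for d1, d2 in fact_rows})

# Two dimensions of 1,000 members each: the bounds span three orders
# of magnitude, which is why the estimate in isolation is so loose.
print(aggregate_cell_bounds(1000, 1000))  # (2000, 1000000)
```

The gap between the bounds is exactly why the answer suggests you may as well build the cube and measure it.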
I'm pessimistic there's an algorithm fast enough that it wouldn't make more sense to generate the cube and weigh it when you're done.
Answer 2:
We wrote a research paper that seems relevant:
Kamel Aouiche and Daniel Lemire, A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP, DOLAP 2007, pp. 17-24. http://arxiv.org/abs/cs.DB/0703058
Answer 3:
Well, you can use the general rule that Analysis Services data is about 1/4 to 1/3 the size of the same data stored in a relational database.
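Applied to a concrete (hypothetical) warehouse size, the rule of thumb works out like this; the 2 TiB figure is an assumption for illustration, only the 1/4 and 1/3 factors come from the answer:

```python
# Apply the rule-of-thumb factors to an assumed relational warehouse size
# to get a rough range for the resulting Analysis Services cube size.

def cube_size_estimate(relational_bytes):
    """Rough cube size range (low, high) from relational data size."""
    return relational_bytes / 4, relational_bytes / 3

low, high = cube_size_estimate(2 * 1024**4)  # assume a 2 TiB warehouse
print(f"Estimated cube size: {low / 1024**4:.2f} to {high / 1024**4:.2f} TiB")
```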
Edward.
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/6b16d2b2-2913-4714-a21d-07ff91688d11/cube-size-estimation-formula
Source: https://stackoverflow.com/questions/6413773/how-to-calculate-the-likely-size-of-an-olap-cube