How NoSQL databases perform on aggregate functions (AVG, SUM, etc)

Submitted by 孤街浪徒 on 2019-12-13 16:48:52

Question


We periodically need to work with a fairly large dataset (30-40 GB). It contains many values ordered by time (plus additional information), but we basically need to perform some mathematical operations by month.

Our first approach was to use a MySQL database to store the data, as we have reasonable experience with the engine and with the relational approach. However, the process takes too long, and we were wondering whether a NoSQL approach could do better.

Basically, the data that we need to represent is:

Value: { NumericalValue, Year, Month }
Entity: List of 'Value'
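
For concreteness, the structure above could be sketched in Python roughly as follows. The field names, the `name` key on `Entity`, and the sample numbers are all assumptions made for illustration; the question does not say how entities are identified or what the values look like.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Value:
    # One observation: a number tagged with the year and month it belongs to.
    numerical_value: float
    year: int
    month: int


@dataclass
class Entity:
    # Hypothetical identifier; the question does not say how entities are keyed.
    name: str
    values: List[Value] = field(default_factory=list)


# Made-up sample data, only to show the shape of the structure.
entity = Entity(
    name="example-entity",
    values=[
        Value(10.0, 2013, 1),
        Value(20.0, 2013, 1),
        Value(5.0, 2013, 2),
    ],
)
```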

We process this list three times, doing simple mathematical operations; by 'process' I mean iterating through the dataset and performing the calculations. When everything is done, we have the same structure (but with different data):

Value: { NumericalValue, Year, Month }
Entity: List of 'Value'

This is where we found the biggest problems, as we need to calculate some averages and it takes a long time. As we repeat this process several times, I think the most time-consuming tasks are:

1) Exporting the dataset to MySQL, which means a lot of inserts from text files.

And when the data is transformed:

2) Computing some queries that contain aggregate functions (AVG, SUM) with LIMIT.

3) Computing some queries that contain aggregate functions over the whole dataset.
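
To make these query types concrete, here is a minimal sketch of what they might look like. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for MySQL so that it runs anywhere. The table name `entity_values`, its columns, and the sample rows are assumptions, and the LIMIT query is only one possible reading of "aggregate functions with LIMIT".

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entity_values ("
    "entity_id INTEGER, year INTEGER, month INTEGER, numerical_value REAL)"
)

# Made-up sample rows; the real data would come from the text files.
rows = [
    (1, 2013, 1, 10.0),
    (1, 2013, 1, 20.0),
    (1, 2013, 2, 5.0),
    (2, 2013, 1, 40.0),
]
# Batch the inserts instead of issuing one statement per row.
conn.executemany("INSERT INTO entity_values VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Aggregates over the whole dataset, grouped by month.
monthly = conn.execute(
    "SELECT year, month, AVG(numerical_value), SUM(numerical_value) "
    "FROM entity_values GROUP BY year, month ORDER BY year, month"
).fetchall()

# Aggregate combined with LIMIT: keep only the first monthly group.
first_month = conn.execute(
    "SELECT year, month, AVG(numerical_value) "
    "FROM entity_values GROUP BY year, month ORDER BY year, month LIMIT 1"
).fetchall()

for year, month, avg_value, sum_value in monthly:
    print(year, month, avg_value, sum_value)
```

Both queries have to read every `numerical_value` that falls into the selected groups, so on a row-oriented engine the cost grows with the full row width even though only one column is aggregated.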

Usually, even with some indexes added, we feel that things take too long (some queries take 20 minutes). Any tip or resolution strategy would be much appreciated. I feel that NoSQL databases aren't designed specifically for this, but maybe some experience could help :).

Thanks for your time,


Answer 1:


Your task fits columnar databases very well. Column-oriented NoSQL databases (e.g. Cassandra) store data tables as sections of columns rather than as rows of data. This speeds up aggregations drastically, because a query that aggregates one column only has to read that column's data instead of every full row. This advantage mostly concerns systems that rely on hard disks for storage. If that is not the case (in-memory databases, for example), there are many more options for squeezing out performance.
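
The difference between the two layouts can be illustrated with a small in-memory sketch. This is only a toy model of the idea, not how any particular database stores data on disk; the field names and numbers are assumptions carried over from the question's `Value` structure.

```python
from array import array

# Row-oriented layout: every record keeps all of its fields together.
row_store = [
    {"numerical_value": 10.0, "year": 2013, "month": 1},
    {"numerical_value": 20.0, "year": 2013, "month": 1},
    {"numerical_value": 5.0, "year": 2013, "month": 2},
]

# Column-oriented layout: each field is stored contiguously on its own.
column_store = {
    "numerical_value": array("d", [10.0, 20.0, 5.0]),
    "year": array("i", [2013, 2013, 2013]),
    "month": array("i", [1, 1, 2]),
}


def avg_from_rows(rows):
    # Has to visit every full record, even though it needs one field.
    return sum(r["numerical_value"] for r in rows) / len(rows)


def avg_from_columns(columns):
    # Touches only the one column being aggregated.
    col = columns["numerical_value"]
    return sum(col) / len(col)


print(avg_from_rows(row_store), avg_from_columns(column_store))
```

Both functions return the same average; the point is that the columnar version never reads `year` or `month`, which is where the savings come from when the data lives on disk.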



Source: https://stackoverflow.com/questions/18203214/how-nosql-databases-perform-on-aggregate-functions-avg-sum-etc
