Why is CouchDB's reduce_limit enabled by default? (Is it better to approximate SQL JOINS in MapReduce views or List views?)

Submitted by こ雲淡風輕ζ on 2019-12-08 08:34:48

Question


I'm using CouchDB, and I want to make better use of MapReduce when querying data.

My exact use case is the following:

I have many surveys. Each survey has a meterNumber, meterReading, and meterReadingDate, for example:

{
  meterNumber: 1,
  meterReading: 2050,
  meterReadingDate: 1480000000000
}

I then use a map function to produce readings keyed by meterNumber. Many keys are repeated (the same meter is read on different dates), e.g.

[
  [meterNumber, {reading: xxx, readingDate: xxx}],
  [meterNumber, {reading: xxx, readingDate: xxx}],
  [meterNumber, {reading: xxx, readingDate: xxx}],
  etc
]
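The map function itself is not shown here. A minimal sketch that would produce rows of this shape, assuming the field names from the sample document above, might look like the following. The name `mapReadings` and the guard on `meterNumber` are illustrative assumptions; in a design document the function is anonymous, and `emit` is provided by CouchDB's view server.

```javascript
// Hypothetical map function; field names come from the sample document.
// `emit` is supplied by CouchDB's view server while the view is indexed.
var mapReadings = function (doc) {
  // Assumption: every survey carries a meterNumber; skip anything else.
  if (doc.meterNumber === undefined) return;
  emit(doc.meterNumber, {
    reading: doc.meterReading,
    readingDate: doc.meterReadingDate
  });
};
```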

I then group these rows by key before they reach the reduce function, and the reduce function should then actually EXPAND the set of values. That is, I want this:

[
  [meterNumber, [{reading:xxx, readingDate: xxx}, {reading:xxx, readingDate: xxx}, {reading:xxx, readingDate: xxx}]],
  [meterNumber, [{reading:xxx, readingDate: xxx}, {reading:xxx, readingDate: xxx}, {reading:xxx, readingDate: xxx}]],
  [meterNumber, [{reading:xxx, readingDate: xxx}, {reading:xxx, readingDate: xxx}, {reading:xxx, readingDate: xxx}]],
  etc
]

To run this MapReduce view on CouchDB I had to disable reduce_limit to allow this type of result set (see "Couchdb - Is it possible to deactivate the reduce_overflow_error error").

This suggests to me that I may run into performance problems with large result sets. Is this the case? Why would you have to specifically enable this setting on CouchDB?

*** EDIT

The accepted answer below pointed out to me that what I was doing in MapReduce was also possible (and better) using lists. Here is another good Stack Overflow answer on the same topic: Best way to do one-to-many "JOIN" in CouchDB

*** EDIT

Here is a reference from the CouchDB documentation: http://guide.couchdb.org/draft/transforming.html


Answer 1:


A reduce function is intended to reduce values associated with given keys.

CouchDB's reduce_limit is there to detect badly designed reduce functions, which is what you wrote by concatenating values... But don't panic: any newcomer to CouchDB would make the same mistake.

The problem with concatenating values in a reduce function is that:

  1. it is totally unnecessary (if you need the whole list, just use a map function on its own, with no reduce),
  2. it is very inefficient: your index will get bigger and bigger on disk, and disk access will take longer and longer.
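To make the difference concrete, here is a small sketch that can be run in plain JavaScript outside CouchDB. It contrasts a concatenating reduce with one that genuinely reduces (keeping only the latest reading). The function names and the sample data are made up for illustration; only the `(keys, values, rereduce)` signature follows CouchDB's reduce convention.

```javascript
// Anti-pattern: the output contains every input value, so it grows with the data.
function concatReduce(keys, values, rereduce) {
  // On rereduce, `values` holds arrays returned by earlier reduce calls.
  return rereduce ? [].concat.apply([], values) : values;
}

// A genuine reduction: keep only the most recent reading.
function latestReduce(keys, values, rereduce) {
  // Works unchanged on rereduce, because input and output have the same shape.
  return values.reduce(function (a, b) {
    return b.readingDate > a.readingDate ? b : a;
  });
}

// Hypothetical sample data: 100 readings of one meter.
var sample = [];
for (var i = 0; i < 100; i++) {
  sample.push({ reading: 2000 + i, readingDate: 1480000000000 + i });
}

// Size of the concatenated result grows with the number of readings...
console.log(JSON.stringify(concatReduce(null, sample, false)).length);
// ...while the reduced result stays the size of a single value.
console.log(JSON.stringify(latestReduce(null, sample, false)).length);
```

The first number scales with the amount of input, which is the kind of growth reduce_limit is meant to flag; the second does not.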

So... Just write a minimal map function such as:

function (doc) {
  // Emit only the key; the full document is fetched with include_docs=true.
  emit(doc.meterNumber, null);
}

Don't write any reduce function, and call the view with include_docs=true.
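As a rough sketch of what such a call looks like, the helper below builds a view query URL. The helper name `viewUrl`, the host, and the database, design document and view names (`surveys`, `meters`, `by_meter`) are all hypothetical; the `/_design/.../_view/...` path and the JSON-encoded query parameters follow CouchDB's HTTP view API.

```javascript
// Hypothetical helper: builds the URL for querying a view.
// View query parameters such as `key` must be JSON-encoded.
function viewUrl(base, db, designDoc, view, params) {
  var query = Object.keys(params)
    .map(function (name) {
      return encodeURIComponent(name) + "=" +
        encodeURIComponent(JSON.stringify(params[name]));
    })
    .join("&");
  return base + "/" + db + "/_design/" + designDoc + "/_view/" + view + "?" + query;
}

// All readings for meter 1, with the full survey document attached to each row.
var url = viewUrl("http://localhost:5984", "surveys", "meters", "by_meter", {
  key: 1,
  include_docs: true
});
console.log(url);
```

Each row of the response then carries the whole survey under its `doc` field, so nothing needs to be stored in the view value.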

But maybe you are not happy with the data format? No problem: you have list functions for this. Just remember that map and reduce functions should be used for pure data processing, not for formatting purposes.
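A list function that regroups the rows into the `[meterNumber, [readings...]]` shape asked for in the question could be sketched as follows. This assumes the view is queried through the list with include_docs=true, so that each row has a `doc`. The names `groupByMeter` and `listByMeter` are illustrative; `start`, `getRow` and `send` are the functions CouchDB exposes to list functions. In a real design document the helper would have to be inlined into the list function. List functions are deprecated in CouchDB 3.x, and the grouping helper is plain JavaScript that could equally run on the client.

```javascript
// Pure helper: groups view rows (already sorted by key) into
// [meterNumber, [{reading, readingDate}, ...]] pairs.
function groupByMeter(rows) {
  var groups = [];
  rows.forEach(function (row) {
    var last = groups[groups.length - 1];
    var value = {
      reading: row.doc.meterReading,
      readingDate: row.doc.meterReadingDate
    };
    if (last && last[0] === row.key) {
      last[1].push(value); // same meter as the previous row
    } else {
      groups.push([row.key, [value]]); // first row for a new meter
    }
  });
  return groups;
}

// List function sketch. `start`, `getRow` and `send` exist only inside
// CouchDB's query server, not in plain Node.js.
var listByMeter = function (head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  var rows = [];
  var row;
  while ((row = getRow())) {
    rows.push(row);
  }
  send(JSON.stringify(groupByMeter(rows)));
};
```

Buffering all rows before sending keeps the sketch short; for large result sets, sending each group as soon as the key changes would avoid holding everything in memory.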



Source: https://stackoverflow.com/questions/38026118/why-is-couchdbs-reduce-limit-enabled-by-default-is-it-better-to-approximate-s
