Using a CouchDB view, can I count groups and filter by key range at the same time?

Posted by 坚强是说给别人听的谎言 on 2019-12-20 14:15:20

Question


I'm using CouchDB. I'd like to be able to count occurrences of values of specific fields within a date range that can be specified at query time. I seem to be able to do parts of this, but I'm having trouble understanding the best way to pull it all together.

Assuming documents that have a timestamp field and another field, e.g.:

{ date: '20120101-1853', author: 'bart' }
{ date: '20120102-1850', author: 'homer'}
{ date: '20120103-2359', author: 'homer'}
{ date: '20120104-1200', author: 'lisa'}
{ date: '20120815-1250', author: 'lisa'}

I can easily create a view that filters documents by a flexible date range. This can be done with a view like the one below, called with key range parameters (keys are JSON, so the strings are quoted), e.g. _view/all-docs?startkey="20120101-0000"&endkey="20120201-0000".

all-docs/map.js:

function(doc) {
    emit(doc.date, doc);
}

With the data above, this would return a CouchDB view containing just the first 4 docs (the only docs in the date range).
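Because view keys are JSON values, a string key has to keep its double quotes and be URL-encoded when it goes into the request. A minimal sketch of building such a query string follows; the helper name `viewQuery` is hypothetical and not part of CouchDB.

```javascript
// Hypothetical helper: builds the query string for a CouchDB view request.
// Parameters such as startkey/endkey are JSON values, so each one is
// JSON-encoded first (strings keep their quotes) and then URL-encoded.
function viewQuery(path, params) {
  var parts = Object.keys(params).map(function (name) {
    return name + "=" + encodeURIComponent(JSON.stringify(params[name]));
  });
  return path + "?" + parts.join("&");
}

var url = viewQuery("_view/all-docs", {
  startkey: "20120101-0000",
  endkey: "20120201-0000"
});
console.log(url);
```

The same helper also covers non-string parameters, since `JSON.stringify(true)` yields `true` for an option such as `group`.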

I can also create a query that counts occurrences of a given field, like this, called with grouping, i.e. _view/author-count?group=true:

author-count/map.js:

function(doc) {
  emit(doc.author, 1);
}

author-count/reduce.js:

function(keys, values, rereduce) {
  return sum(values);
}

This would yield something like:

{
    "rows": [
        {"key":"bart","value":1},
        {"key":"homer","value":2},
        {"key":"lisa","value":2}
     ]
}
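To make the grouping behaviour concrete, the sketch below imitates in plain JavaScript what group=true does with this map and reduce. It is only a local simulation under simplifying assumptions: `groupedCount` is a hypothetical stand-in for the view query, and `emit` is passed in as an argument because no CouchDB runtime is present.

```javascript
// Sample documents from the question.
var docs = [
  { date: "20120101-1853", author: "bart" },
  { date: "20120102-1850", author: "homer" },
  { date: "20120103-2359", author: "homer" },
  { date: "20120104-1200", author: "lisa" },
  { date: "20120815-1250", author: "lisa" }
];

// Same logic as author-count/map.js, with emit passed in explicitly.
function map(doc, emit) {
  emit(doc.author, 1);
}

// Hypothetical stand-in for querying the view with group=true.
function groupedCount(docs, mapFn) {
  var rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      rows.push({ key: key, value: value });
    });
  });
  // Views are ordered by key; plain string comparison is enough for this data.
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  // Sum the values of adjacent rows that share a key.
  var grouped = [];
  rows.forEach(function (row) {
    var last = grouped[grouped.length - 1];
    if (last && last.key === row.key) last.value += row.value;
    else grouped.push({ key: row.key, value: row.value });
  });
  return grouped;
}

var authorCounts = groupedCount(docs, map);
console.log(JSON.stringify({ rows: authorCounts }));
```

Since the key is the author alone, there is nothing in it for a date range to select on, which is the limitation the question runs into.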

However, I can't find the best way to both filter by date and count occurrences. For example, with the data above, I'd like to be able to specify range parameters like startkey="20120101-0000"&endkey="20120201-0000" and get a result like this, where the last doc is excluded from the count because it is outside the specified date range:

{
    "rows": [
        {"key":"bart","value":1},
        {"key":"homer","value":2},
        {"key":"lisa","value":1}
     ]
}

What's the most elegant way to do this? Is this achievable with a single query? Should I be using another CouchDB construct, or is a view sufficient for this?


Answer 1:


You can get pretty close to the desired result with a list:

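{
  _id: "_design/authors",
  views: {
    authors_by_date: {
      // Key by date so startkey/endkey can select a date range.
      // (In the stored design document, functions are JSON strings.)
      map: function(doc) {
        emit(doc.date, doc.author);
      }
    }
  },
  lists: {
    count_occurrences: function(head, req) {
      start({ headers: { "Content-Type": "application/json" }});

      // Count how often each author appears in the selected rows.
      var result = {};
      var row;
      while ((row = getRow())) {
        var val = row.value;
        if (result[val]) result[val]++;
        else result[val] = 1;
      }
      // A list function must return a string, so serialize the counts.
      return JSON.stringify(result);
    }
  }
}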

The list can be requested like this:

http://<couchurl>/<db>/_design/authors/_list/count_occurrences/authors_by_date?startkey=<startDate>&endkey=<endDate>

This will be slower than a normal map-reduce, and is a bit of a workaround. Unfortunately, this is the only way to do a multi-dimensional query, "which CouchDB isn’t suited for".

For the date range in the question, the result of requesting this list will be something like this:

{
  "bart": 1,
  "homer": 2,
  "lisa": 1
}

What we do is basically emit a lot of elements, then use a list to group them as we want. A list can render a result in any shape you want, but it will often be slower. Whereas a normal map-reduce result is stored in the view index and only updated according to the diffs, the list output has to be built anew every time it is requested.

It is pretty much as slow as fetching every row the map emits for the requested range (the overhead of orchestrating the data is mostly negligible), which is a lot slower than fetching the result of a reduce.
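Since the cost is dominated by reading the map rows, the same counting can also be done in the client over the plain view response. The sketch below assumes a response shaped like a normal CouchDB view result for authors_by_date restricted to January 2012; the document ids and the function name `countValues` are made up for illustration.

```javascript
// Assumed view response for the January 2012 range (ids are hypothetical).
var response = {
  total_rows: 5,
  offset: 0,
  rows: [
    { id: "doc1", key: "20120101-1853", value: "bart" },
    { id: "doc2", key: "20120102-1850", value: "homer" },
    { id: "doc3", key: "20120103-2359", value: "homer" },
    { id: "doc4", key: "20120104-1200", value: "lisa" }
  ]
};

// Counts how many rows carry each value, mirroring the list function's loop.
function countValues(rows) {
  var result = {};
  rows.forEach(function (row) {
    result[row.value] = (result[row.value] || 0) + 1;
  });
  return result;
}

var counts = countValues(response.rows);
console.log(JSON.stringify(counts));
```

Doing this server-side in a list mainly saves transferring the individual rows to the client; the work of walking them is the same.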

If you want to use the list for a different view, you can simply exchange it in the URL you request:

http://<couchurl>/<db>/_design/authors/_list/count_occurrences/<view>

Read more about lists on the CouchDB wiki.




Answer 2:


You need to create a combined view:

combined/map.js:

function(doc) {
    emit([doc.date, doc.author], 1);
}

combined/reduce.js:

_sum

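This way you will be able to filter documents by start/end date. The keys must be valid JSON, so the dates are quoted; a one-element array sorts before any longer array that starts with the same date:

startkey=["20120101-0000"]&endkey=["20120201-0000"]

Note that with group=true this returns one row per distinct [date, author] pair, so per-author totals still have to be added up from those rows.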
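A range query on this view yields rows keyed by [date, author], so the per-author totals the question asks for still need one more folding step. A minimal client-side sketch of that step, assuming rows shaped like a normal view response for the sample data in January 2012 (the function name `totalsByAuthor` is hypothetical):

```javascript
// Rows as the combined view would return them for the sample data when
// queried with group=true over January 2012.
var combinedRows = [
  { key: ["20120101-1853", "bart"], value: 1 },
  { key: ["20120102-1850", "homer"], value: 1 },
  { key: ["20120103-2359", "homer"], value: 1 },
  { key: ["20120104-1200", "lisa"], value: 1 }
];

// Adds up the row values per author, the second element of each key.
function totalsByAuthor(rows) {
  var totals = {};
  rows.forEach(function (row) {
    var author = row.key[1];
    totals[author] = (totals[author] || 0) + row.value;
  });
  return totals;
}

var totals = totalsByAuthor(combinedRows);
console.log(JSON.stringify(totals));
```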



Answer 3:


Although your problem is hard to solve in the general case, knowing some more restrictions on the possible queries can help a lot. E.g. if you know you will search on ranges that cover full days or months, you can use an array of [year, month, day, time] as the key instead of the string:

emit([doc.date_year, doc.date_month, doc.date_day, doc.date_time, doc.author], doc);

Even if you cannot predict that all possible queries will fit into grouping based on this key type, splitting the key may help you optimize your range queries and decrease the number of lookups needed (at the cost of some extra space).
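The documents in the question store the date as a single string rather than as separate fields, so the array key would have to be derived inside the map function. A rough sketch, assuming the `YYYYMMDD-HHMM` format shown in the question; `dateKey` is a hypothetical helper, and `emit` is passed in as an argument only so the snippet runs outside CouchDB.

```javascript
// Splits "YYYYMMDD-HHMM" into [year, month, day, time]. The parts stay
// zero-padded strings, so they still sort chronologically.
function dateKey(date) {
  return [date.slice(0, 4), date.slice(4, 6), date.slice(6, 8), date.slice(9)];
}

// Map function emitting the split date followed by the author. The value 1
// is an assumption made here so the rows could be counted with _sum.
function map(doc, emit) {
  emit(dateKey(doc.date).concat([doc.author]), 1);
}

// Run the map over one sample document and collect what it emits.
var emitted = [];
map({ date: "20120103-2359", author: "homer" }, function (key, value) {
  emitted.push({ key: key, value: value });
});
console.log(JSON.stringify(emitted[0].key));
```

With a `_sum` reduce on such a view, `group_level=2` groups rows by the first two key elements, i.e. by year and month.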



Source: https://stackoverflow.com/questions/12944294/using-a-couchdb-view-can-i-count-groups-and-filter-by-key-range-at-the-same-tim
