CouchDB Views: remove duplicates *and* order by time

Submitted by 北慕城南 on 2019-12-19 06:25:10

Question


Based on a great answer to my previous question, I've partially solved a problem I'm having with CouchDB.

This resulted in a new view.

Now, the next thing I need to do is remove duplicates from this view while ordering by date.

For example, here is how I might query that view:

GET http://scoates-test.couchone.com/follow/_design/asset/_view/by_userid_following?endkey=[%22c988a29740241c7d20fc7974be05ec54%22]&startkey=[%22c988a29740241c7d20fc7974be05ec54%22,{}]&descending=true&limit=3

Resulting in this:

HTTP 200 http://scoates-test.couchone.com/follow/_design/asset/_view/by_userid_following
http://scoates-test.couchone.com > $_.json.rows
[ { id: 'c988a29740241c7d20fc7974be067295'
  , key: 
     [ 'c988a29740241c7d20fc7974be05ec54'
     , '2010-11-26T17:00:00.000Z'
     , 'clementine'
     ]
  , value: 
     { _id: 'c988a29740241c7d20fc7974be062ee8'
     , owner: 'c988a29740241c7d20fc7974be05f67d'
     }
  }
, { id: 'c988a29740241c7d20fc7974be068278'
  , key: 
     [ 'c988a29740241c7d20fc7974be05ec54'
     , '2010-11-26T15:00:00.000Z'
     , 'durian'
     ]
  , value: 
     { _id: 'c988a29740241c7d20fc7974be065115'
     , owner: 'c988a29740241c7d20fc7974be060bb4'
     }
  }
, { id: 'c988a29740241c7d20fc7974be068026'
  , key: 
     [ 'c988a29740241c7d20fc7974be05ec54'
     , '2010-11-26T14:00:00.000Z'
     , 'clementine'
     ]
  , value: 
     { _id: 'c988a29740241c7d20fc7974be063b6d'
     , owner: 'c988a29740241c7d20fc7974be05ff71'
     }
  }
]

As you can see, "clementine" shows up twice.

If I change the view to emit the fruit/asset name as the second key (instead of the time), I can change the grouping depth to collapse these, but that doesn't solve my order-by-time requirement. Similarly, with the above setup, I can order by time, but I can't collapse duplicate asset names into single rows (to allow e.g. 10 assets per page).

Unfortunately, this is not a simple question to explain. Maybe this chat transcript will help a little.

Please help. I'm afraid that what I need to do is still not possible.

S


Answer 1:


You can do this with a list function. Here is an example that generates a very simple list of the owner fields, skipping rows whose asset name has already been seen. You can easily modify it to produce JSON, XML, or anything else you want.

Put it in your assets design doc under lists.nodupes and call it like this: http://admin:123@127.0.0.1:5984/follow/_design/assets/_list/nodupes/by_userid_following_reduce?group=true

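function(head, req) {
    // Send the response headers before any rows are written.
    start({
        "headers": {
            "Content-Type": "text/html"
        }
    });
    var row;
    var seen = []; // asset names already written out
    while (row = getRow()) {
        // key[2] is the asset name; skip rows whose name was already sent.
        if (seen.indexOf(row.key[2]) === -1) {
            seen.push(row.key[2]);
            send(row.value[0].owner + "<br>");
        }
    }
}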
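
The deduplication step in the list function above can be tried outside CouchDB. The following is a minimal sketch, assuming rows shaped like the query result in the question (asset name at key[2]). The helper name firstPerAsset and the shortened sample ids are made up for illustration and are not part of CouchDB. Since the rows arrive in descending time order, keeping the first occurrence of each name keeps the most recent one.

```javascript
// Hypothetical helper: keep only the first row seen for each asset name.
// Assumes each row's key is [userId, timestamp, assetName], as in the question.
function firstPerAsset(rows) {
  var seen = {};   // asset names already kept
  var unique = [];
  for (var i = 0; i < rows.length; i++) {
    var name = rows[i].key[2];
    if (!Object.prototype.hasOwnProperty.call(seen, name)) {
      seen[name] = true;
      unique.push(rows[i]);
    }
  }
  return unique;
}

// Sample rows mirroring the query result in the question (ids shortened).
var sampleRows = [
  { key: ["u1", "2010-11-26T17:00:00.000Z", "clementine"] },
  { key: ["u1", "2010-11-26T15:00:00.000Z", "durian"] },
  { key: ["u1", "2010-11-26T14:00:00.000Z", "clementine"] }
];
var result = firstPerAsset(sampleRows);
```

Note that a limit on the view query applies before this step, so the deduplicated result can hold fewer rows than the limit.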



Answer 2:


Ordering by one field while deduplicating on another isn't something basic map/reduce can do. All it can do is sort your data and apply reduce rollups to dynamic key ranges.

To find the latest entry for each type of fruit, you'd need to query once per fruit.

There are some ways to do this that are kinda sane.

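You'll want a view with keys like [fruit_type, date]. Because the query runs in descending order, the start key has to be the high end of each fruit's range ([fruit,{}]) and the end key the low end ([fruit]), the same pattern the query in the question uses:

for fruit in fruits
  GET /db/_design/foo/_view/bar?startkey=[fruit,{}]&endkey=[fruit]&descending=true&limit=1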

This will give you the latest entry for each fruit.
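
As a rough sketch of that loop, the snippet below builds one query path per fruit. It assumes the placeholder names from this answer (/db, foo, bar); the fruits array and the helper name latestEntryPath are hypothetical. Because the query is descending, it starts at [fruit, {}], which sorts after every date for that fruit, and ends at [fruit]. The keys are JSON-encoded and then URL-encoded.

```javascript
// Build the view query path for the latest entry of one fruit.
// The database, design doc and view names are the placeholders from the answer.
function latestEntryPath(fruit) {
  // High end of the fruit's key range: {} collates after any date string.
  var startkey = encodeURIComponent(JSON.stringify([fruit, {}]));
  // Low end of the fruit's key range.
  var endkey = encodeURIComponent(JSON.stringify([fruit]));
  return "/db/_design/foo/_view/bar" +
    "?startkey=" + startkey +
    "&endkey=" + endkey +
    "&descending=true&limit=1";
}

var fruits = ["apples", "clementine", "durian"]; // hypothetical list of fruit types
var paths = fruits.map(latestEntryPath);
```

Each path still has to be fetched with whatever HTTP client you use, and the per-fruit results then sorted by date on the client if a time ordering across fruits is needed.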

A list function could also do this: it would simply echo the first row from each fruit's block and skip the rest. That is efficient enough as long as each fruit has a small number of entries. Once there are many entries per fruit, you'll be discarding more data than you echo, so on a large data set the multi-query approach scales better than the list approach. Luckily, both can work on the same view index, so switching later won't be a big deal.
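
The "echo the first row of each block" idea can be sketched as a plain function, assuming rows keyed by [fruit, date] and read in descending order, so the newest row of each fruit comes first in its block. The helper name firstRowPerBlock and the sample data are hypothetical. Inside a real list function the same comparison would run in the getRow() loop.

```javascript
// Hypothetical helper: given rows sorted by [fruit, date] in descending order,
// return the first (newest) row of each fruit's block.
function firstRowPerBlock(rows) {
  var latest = [];
  var currentFruit = null;
  for (var i = 0; i < rows.length; i++) {
    var fruit = rows[i].key[0];
    if (i === 0 || fruit !== currentFruit) {
      latest.push(rows[i]);   // first row of a new block
      currentFruit = fruit;
    }
    // Remaining rows of the same block are read and discarded.
  }
  return latest;
}

// Sample rows in descending key order (fruit first, then date).
var sortedRows = [
  { key: ["durian", "2010-11-26T15:00:00.000Z"] },
  { key: ["clementine", "2010-11-26T17:00:00.000Z"] },
  { key: ["clementine", "2010-11-26T14:00:00.000Z"] }
];
var latestRows = firstRowPerBlock(sortedRows);
```

Unlike the indexOf-based version in the first answer, this only remembers the previous fruit, which works because the view keeps each fruit's rows together. The output is ordered by fruit rather than by time.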



Source: https://stackoverflow.com/questions/4298937/couchdb-views-remove-duplicates-and-order-by-time
