Random Sampling from Mongo

半世苍凉 submitted on 2019-11-28 00:24:02

Question


I have a mongo collection with documents. Every document has one field whose value is either 0 or 1. I need to randomly sample 1000 records from the database and count how many of them have that field set to 1. I need to repeat this sampling 1000 times. How do I do it?


Answer 1:


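Here's an example in the mongo shell, assuming a collection named collname and a value of interest in thefield. It draws one sample of numSamples documents (with replacement, so the same document can be picked twice); wrap it in an outer loop to repeat the sampling 1000 times: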

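var total = db.collname.count();   // documents in the collection
var numSamples = 1000;             // documents per sample
var count = 0;                     // sampled documents with thefield == 1

for (var i = 0; i < numSamples; i++) {
    // pick a random offset and fetch the one document at that position
    var random = Math.floor(Math.random() * total);
    var doc = db.collname.find().skip(random).limit(1).next();
    if (doc.thefield == 1) {
        count++;
    }
}
print(count);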
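The shell loop above needs a running server. As a hedged, self-contained sketch of the full experiment the question asks for (1000 samples of 1000 documents each), here is the same offset-based, with-replacement technique in Python. A plain list of dicts stands in for the collection, indexing `docs[offset]` stands in for `find().skip(offset).limit(1)`, and `one_sample` and `run_trials` are hypothetical names.

```python
import random

# Hypothetical helpers: a list of dicts stands in for the MongoDB collection.

def one_sample(docs, num_samples=1000, field="thefield"):
    """Draw num_samples random documents (with replacement, like the skip() loop)
    and return how many have field == 1."""
    total = len(docs)
    count = 0
    for _ in range(num_samples):
        offset = random.randrange(total)   # same role as Math.floor(Math.random() * total)
        doc = docs[offset]                 # stands in for find().skip(offset).limit(1)
        if doc.get(field) == 1:
            count += 1
    return count

def run_trials(docs, trials=1000, num_samples=1000):
    """Repeat the sampling `trials` times, returning one count of 1s per trial."""
    return [one_sample(docs, num_samples) for _ in range(trials)]

if __name__ == "__main__":
    # made-up data: 10,000 documents, roughly 30% of them with thefield == 1
    docs = [{"thefield": 1 if random.random() < 0.3 else 0} for _ in range(10_000)]
    counts = run_trials(docs, trials=1000, num_samples=1000)
    print(min(counts), sum(counts) / len(counts), max(counts))
```

In the shell version, 1000 samples of 1000 documents means a million skip() queries, and skip() has to walk past `offset` documents each time, so it gets slow on large collections; that cost is what the other answers try to avoid.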



Answer 2:


For people finding this question now: you should use the $sample aggregation stage, new in MongoDB 3.2.

https://docs.mongodb.org/manual/reference/operator/aggregation/sample/

db.collection_of_things.aggregate(
   [ { $sample: { size: 15 } } ]
)

Then add a $group stage after $sample to count the 0s and 1s; the MongoDB docs for $group include examples of counting with $sum.
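A hedged sketch of that pipeline from Python, assuming the pymongo driver (whose `Collection.aggregate` takes a list of stage dicts) and the field name `thefield` from the question; `sample_pipeline` and `ones_per_trial` are hypothetical helper names. Each trial sends one aggregation, and `$group` returns one document per distinct value of the field together with its count.

```python
# Hypothetical helpers; `collection` is assumed to be a pymongo Collection
# (or anything with an aggregate(pipeline) method returning an iterable of dicts).

def sample_pipeline(size, field="thefield"):
    """$sample `size` random documents, then $group them by the value of `field`."""
    return [
        {"$sample": {"size": size}},
        {"$group": {"_id": "$" + field, "count": {"$sum": 1}}},
    ]

def ones_per_trial(collection, trials=1000, size=1000, field="thefield"):
    """Run the pipeline `trials` times; return the number of sampled docs with field == 1 per trial."""
    results = []
    for _ in range(trials):
        groups = collection.aggregate(sample_pipeline(size, field))
        counts = {g["_id"]: g["count"] for g in groups}  # maps each field value to its count
        results.append(counts.get(1, 0))                   # 0 if no sampled doc had field == 1
    return results

# Usage with a live server (assumed database and collection names):
#   from pymongo import MongoClient
#   coll = MongoClient()["mydb"]["collname"]
#   counts = ones_per_trial(coll)
```

Per the MongoDB docs, `$sample` may return the same document more than once, and it only uses a fast pseudo-random cursor when it is the first stage, the collection has more than 100 documents, and the sample is under 5% of the collection; otherwise it scans the collection and sorts it randomly.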




Answer 3:


For MongoDB 3.0 and earlier, I use an old trick from my SQL days (which I believe Wikipedia uses for its random page feature). Store a random number between 0 and 1 in every document you need to randomize; let's call that field "r". Then add an index on "r":

db.coll.createIndex({r: 1});   // on servers older than 3.0, use db.coll.ensureIndex({r: 1})
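The answer doesn't show how "r" gets populated. A hedged sketch of a one-off backfill, assuming the pymongo driver (`backfill_random_field` is a hypothetical name): it gives every document that lacks "r" its own random value with one `update_one` per document, which works on old servers but is slow on large collections. Documents inserted later would also need an "r" value set at insert time.

```python
import random

# Hypothetical helper; `collection` is assumed to be a pymongo Collection.

def backfill_random_field(collection, field="r"):
    """Set `field` to a uniform random float in [0, 1) on every document that lacks it.
    Returns the number of documents updated."""
    updated = 0
    # project only _id; the filter skips documents that already have the field
    for doc in collection.find({field: {"$exists": False}}, {"_id": 1}):
        collection.update_one({"_id": doc["_id"]}, {"$set": {field: random.random()}})
        updated += 1
    return updated
```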

Now, to get x random documents, use:

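var startVal = Math.random();
var docs = db.coll.find({r: {$gte: startVal}}).sort({r: 1}).limit(x).toArray();
// if fewer than x documents have r >= startVal, wrap around to the lowest r values
if (docs.length < x) docs = docs.concat(db.coll.find({r: {$lt: startVal}}).sort({r: 1}).limit(x - docs.length).toArray());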

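This gives you x documents from an indexed range query, with no skip(). Note that they are neighbours in "r" order, so each call returns a contiguous run of documents rather than x independent draws. Depending on your needs this may be overkill, but if you are going to do a lot of sampling over time, it is a very efficient approach that puts little load on your backend.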
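To see what the range query returns without a server, here is a hedged in-memory sketch in Python: a sorted list of "r" values stands in for the index on "r", `bisect_left` finds the first value at or above the random start as the index range scan would, and the result wraps around when fewer than x values lie above the start. `random_run` is a hypothetical name, and this emulates the query's behaviour rather than MongoDB's implementation.

```python
import bisect
import random

# Hypothetical helper: a sorted list of r values stands in for the index on r.

def random_run(sorted_r, x, start=None):
    """Emulate find({r: {$gte: start}}).sort({r: 1}).limit(x) over an index on r,
    wrapping around to the smallest values if fewer than x lie at or above start."""
    if start is None:
        start = random.random()
    x = min(x, len(sorted_r))                # can't return more values than exist
    i = bisect.bisect_left(sorted_r, start)  # first position with r >= start
    run = sorted_r[i:i + x]
    if len(run) < x:                         # wrap around to the start of the index
        run += sorted_r[:x - len(run)]
    return run

if __name__ == "__main__":
    print(random_run([0.1, 0.2, 0.3, 0.4, 0.5], 3, start=0.45))  # → [0.5, 0.1, 0.2]
```

Calls with nearby start values return overlapping runs, which is why each call's x documents are neighbours in "r" order rather than independent draws.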




Answer 4:


I was going to add this to my comment on @Stennie's answer: as an alternative, you could also use a separate auto-incrementing ID field with an index, which pays off if you would otherwise skip over a huge number of records.

I wrote an answer to a similar question, where someone was trying to find the nth record of a collection:

php mongodb find nth entry in collection

The second half of that answer describes one way you could approach this problem. You would still need to loop 1000 times to get 1000 random rows, of course.
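A hedged pymongo sketch of the auto-increment idea, assuming each document carries an indexed integer field (the hypothetical name "seq") numbered from 1 to total. Deletions would leave gaps, so a number with no matching document is simply skipped here, which can make a sample slightly smaller. Each draw is an indexed equality lookup instead of a skip(), and one call corresponds to one of the question's samples. `sample_ones_by_seq` is a hypothetical name.

```python
import random

# Hypothetical helper; `collection` is assumed to be a pymongo Collection whose
# documents have an indexed integer field `seq_field` numbered 1..total.

def sample_ones_by_seq(collection, total, size=1000, field="thefield", seq_field="seq"):
    """One sample: look up `size` random sequence numbers in 1..total (with replacement)
    and count the fetched documents whose field == 1.
    Returns (ones, found); found < size only if some numbers had no document."""
    ones = found = 0
    for _ in range(size):
        n = random.randint(1, total)                 # inclusive on both ends
        doc = collection.find_one({seq_field: n})   # indexed equality lookup, no skip()
        if doc is None:                              # gap, e.g. left by a deleted document
            continue
        found += 1
        if doc.get(field) == 1:
            ones += 1
    return ones, found
```

Calling it 1000 times gives the question's 1000 counts. With pymongo 3.7 or later, `total` could come from `collection.count_documents({})`, provided the numbering really has no gaps.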




Answer 5:


If you are using mongoengine, you can use a SequenceField to generate an incremental counter.

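from flask_mongoengine import MongoEngine
db = MongoEngine()  # assumed Flask-MongoEngine setup; it exposes DynamicDocument and SequenceField
class User(db.DynamicDocument):
    counter = db.SequenceField(collection_name="user.counters")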

Then, to fetch a random list of, say, 100 users, do the following:

import random
def get_random_users(number_requested):
    total = User.objects.count()
    # assumes counter values run from 1 to total with no gaps (i.e. no deleted users)
    users_to_fetch = random.sample(range(1, total + 1), min(number_requested, total))
    return User.objects(counter__in=users_to_fetch)

and call it like this:

get_random_users(100)


Source: https://stackoverflow.com/questions/12664816/random-sampling-from-mongo
