What is a Cursor in MongoDB?


We are troubled by occasionally occurring "cursor not found" exceptions for some Morphia queries' asList() calls, and I've found a hint on SO that this might be quite memory-intensive.

4 Answers
  • 2020-12-08 07:28

    This error also occurs when you have a large data set and are doing batch processing on it: if each batch takes a long time, the total processing time can exceed the default cursor lifetime.

    In that case you need to change that default, i.e. tell MongoDB not to expire the cursor until processing is done.

    Do check the noCursorTimeout documentation; a rough sketch of setting it from a driver follows below.
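
    A minimal sketch of what that could look like with the promise-based Node.js driver (the database and collection names are placeholders, and the slow per-document work is assumed): the noCursorTimeout option asks the server not to reap the cursor after its default idle timeout, which also means you must close the cursor yourself when you are finished.

        const { MongoClient } = require('mongodb');

        async function processAll() {
            const client = await MongoClient.connect('mongodb://localhost:27017');
            const collection = client.db('mydb').collection('companies'); // placeholder names

            // Ask the server not to time this cursor out between batches.
            const cursor = collection.find({}, { noCursorTimeout: true });

            try {
                while (await cursor.hasNext()) {
                    const doc = await cursor.next();
                    // ...slow per-batch / per-document processing goes here...
                }
            } finally {
                // The server never reaps a no-timeout cursor, so close it explicitly.
                await cursor.close();
                await client.close();
            }
        }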

  • 2020-12-08 07:32

    I am by no means a MongoDB expert, but I just want to add some observations from working on a medium-sized Mongo system for the last year. Also thanks to @xameeramir for the excellent walkthrough of how cursors work in general.

    There can be several causes of a "cursor not found" exception. One that I have noticed is explained in this answer.

    The cursor lives server side. It is not distributed over a replica set, but exists on the instance that is primary at the time of creation. This means that if another instance takes over as primary, the cursor is lost to the client. If the old primary is still up and around, the cursor may still exist there, but to no use; I guess it is garbage-collected after a while. So if your replica set is unstable, or you have a shaky network in front of it, you are out of luck when doing any long-running queries.

    If the full result the cursor has to return does not fit in memory on the server, the query may be very slow. You want the RAM on your servers to be larger than the largest query you run.

    All of this can partly be avoided by better design. For a use case with large, long-running queries you may be better off with several smaller collections instead of one big one.

  • 2020-12-08 07:35

    A cursor is an object returned by calling db.collection.find() that lets you iterate through the documents (the NoSQL equivalent of SQL "rows") of a MongoDB collection (the NoSQL equivalent of a "table"), for example:
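
    As a small illustration in the mongo shell (the users collection and the filter here are hypothetical), the cursor can be walked one document at a time instead of materialising everything at once:

        var cursor = db.users.find({ active: true }); // hypothetical collection and filter

        // hasNext()/next() pull documents from the server in batches behind the scenes.
        while (cursor.hasNext()) {
            printjson(cursor.next());
        }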

  • 2020-12-08 07:36

    Here's a comparison between toArray() and cursors after a find() in the Node.js MongoDB driver. Common code:

    // Callback-style connect() from the 2.x Node.js driver: the connected
    // database handle is passed directly to the callback as `db`.
    var MongoClient = require('mongodb').MongoClient,
        assert = require('assert');

    MongoClient.connect('mongodb://localhost:27017/crunchbase', function (err, db) {
        assert.equal(err, null);
        console.log('Successfully connected to MongoDB.');

        const query = { category_code: "biotech" };

        // toArray() vs. cursor code goes here
    });
    

    Here's the toArray() code that goes in the placeholder of the common code above.

        // toArray() pulls the entire result set into a single in-memory array
        // before the callback runs.
        db.collection('companies').find(query).toArray(function (err, docs) {
            assert.equal(err, null);
            assert.notEqual(docs.length, 0);

            docs.forEach(doc => {
                console.log(`${doc.name} is a ${doc.category_code} company.`);
            });

            db.close();
        });
    

    Per the documentation,

    The caller is responsible for making sure that there is enough memory to store the results.

    Here's the cursor-based approach, using the cursor.forEach() method:

        // find() only builds the cursor; nothing is fetched from the server yet.
        const cursor = db.collection('companies').find(query);

        cursor.forEach(
            function (doc) {
                console.log(`${doc.name} is a ${doc.category_code} company.`);
            },
            function (err) {
                // The second callback runs once iteration completes or fails.
                assert.equal(err, null);
                return db.close();
            }
        );
    

    With the forEach() approach, instead of fetching all the data into memory, we're streaming it to our application. find() creates the cursor immediately, but it doesn't actually make a request to the database until we try to use some of the documents it will provide; the point of the cursor is to describe our query. The second parameter to cursor.forEach says what to do when iteration ends or an error occurs.

    In the toArray() version of the code above, it was the call to toArray() that forced the database call: it meant we needed ALL the documents and wanted them in a single array.

    Note that MongoDB returns data in batches: the cursor in the application issues a request to MongoDB for each successive batch of results.

    forEach scales better than toArray because we can process documents as they come in, all the way to the end. Contrast this with toArray, where we wait for ALL the documents to be retrieved and the entire array to be built before doing anything. That means we get no advantage from the fact that the driver and the database work together to batch results to the application. Batching is meant to reduce memory overhead and execution time, so take advantage of it in your application if you can; a sketch of tuning the batch size follows below.
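
    If you want to influence how large those batches are, the driver's cursor exposes a batchSize() method (the value 100 below is only an illustrative choice, and this snippet reuses db, query and assert from the common code above); batchSize() changes how many documents travel per round trip, not how many you ultimately process:

        // Ask the driver to fetch results from the server 100 documents at a time.
        const cursor = db.collection('companies').find(query).batchSize(100);

        cursor.forEach(
            function (doc) {
                // Documents are handed to us as each batch arrives.
                console.log(`${doc.name} is a ${doc.category_code} company.`);
            },
            function (err) {
                assert.equal(err, null);
                return db.close();
            }
        );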
