MongoDB: does document size affect query performance?

不思量自难忘° 2021-01-31 16:31

Assume a mobile game that is backed by a MongoDB database containing a User collection with several million documents.

Now assume several dozen properties t

4 answers
  •  暗喜 (original poster)
    2021-01-31 17:10

    Short answer: yes.

    Long answer: how much it affects your queries depends on many factors, such as the nature of the queries, the available memory and the size of the indexes.

    The best you can do is test it.

    The code below generates two collections named smallDocuments and bigDocuments, with 1024 documents each. They differ only in the _id and in a field 'c', which holds a big string and exists only in bigDocuments. The bigDocuments collection will hold about 2GB of data, so be careful when running it.

    const numberOfDocuments = 1024;
    
    // 2MB string x 1024 ~ 2GB collection
    const bigString = 'a'.repeat(2 * 1024 * 1024);
    
    // generate and insert documents in two collections: smallDocuments and
    // bigDocuments;
    for (let i = 0; i < numberOfDocuments; i++) {
      let doc = {};
      // field a: random integer from 0 to 9, equal in both collections;
      doc.a = ~~(Math.random() * 10);
    
      // field b: single random character from 'a' to 'j', equal in both collections;
      doc.b = String.fromCharCode(97 + ~~(Math.random() * 10));
    
      //insert in smallDocuments collection
      db.smallDocuments.insert(doc);
    
      // field c: big string, present only in bigDocuments collection;
      doc.c = bigString;
    
      //insert in bigDocuments collection
      db.bigDocuments.insert(doc);
    }
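
    One caveat about the generated data: a string made of a single repeated character compresses extremely well, so if block compression is enabled (the WiredTiger default, as far as I know), bigDocuments may take far less than 2GB on disk. If that matters for your test, a random string can stand in for bigString. The helper below is a minimal sketch; randomString and randomBigString are hypothetical names that are not part of the script above.

```javascript
// Hypothetical helper (not part of the original script): builds a string of
// `length` pseudo-random lowercase letters.
function randomString(length) {
  const chars = [];
  for (let i = 0; i < length; i++) {
    // 97 is the character code of 'a'; 26 letters in total.
    chars.push(String.fromCharCode(97 + Math.floor(Math.random() * 26)));
  }
  return chars.join('');
}

// Same length as the original 2MB string, but with varied content.
const randomBigString = randomString(2 * 1024 * 1024);
```

    Assigning randomBigString to doc.c instead of bigString keeps the rest of the script unchanged. The string is generated once and reused for every document, so the documents are still identical to each other apart from fields a and b.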
    

    You can put this code in a file (e.g. create-test-data.js) and run it directly in the mongo shell with this command:

    mongo testDb < create-test-data.js

    It will take a while. After that you can execute some test queries, like these ones:

    const numbersToQuery = [];
    
    // generate 100 random numbers to query documents using field 'a':
    for (let i = 0; i < 100; i++) {
      numbersToQuery.push(~~(Math.random() * 10));
    }
    
    const smallStart = Date.now();
    numbersToQuery.forEach(number => {
      // query using inequality conditions: slower than equality
      const docs = db.smallDocuments
        .find({ a: { $ne: number } }, { a: 1, b: 1 })
        .toArray();
    });
    print('Small: ' + (Date.now() - smallStart) + ' ms');
    
    const bigStart = Date.now();
    numbersToQuery.forEach(number => {
      // repeat the same queries in the bigDocuments collection; note that the big field 'c'
      // is omitted in the projection
      const docs = db.bigDocuments
        .find({ a: { $ne: number } }, { a: 1, b: 1 })
        .toArray();
    });
    print('Big: ' + (Date.now() - bigStart) + ' ms');
    

    Here I got the following results:

    Without index:

    Small: 1976 ms
    Big: 19835 ms
    

    After indexing field 'a' in both collections, with .createIndex({ a: 1 }):

    Small: 2258 ms
    Big: 4761 ms
    

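    This shows that queries on big documents are slower, even though the big field 'c' is never returned. Without an index, the bigDocuments queries took about ten times as long as the smallDocuments ones; with the index, they still took more than twice as long.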
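
    To make the comparison explicit, the ratios can be computed from the timings reported above. This is plain arithmetic on those four numbers; the slowdown function and the results object are names made up for this sketch.

```javascript
// Timings reported above, in milliseconds.
const results = {
  withoutIndex: { small: 1976, big: 19835 },
  withIndex: { small: 2258, big: 4761 },
};

// Ratio of the bigDocuments time to the smallDocuments time.
function slowdown({ small, big }) {
  return big / small;
}

for (const [scenario, timings] of Object.entries(results)) {
  console.log(scenario + ': ' + slowdown(timings).toFixed(2) + 'x');
}
```

    The ratio is roughly 10 without the index and roughly 2.1 with it. These numbers come from a single run on one machine, so treat them as an indication rather than a benchmark.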

    My suggestions are:

    1. Use equality conditions in queries (https://docs.mongodb.com/manual/core/query-optimization/index.html#query-selectivity);
    2. Use covered queries (https://docs.mongodb.com/manual/core/query-optimization/index.html#covered-query);
    3. Use indices that fit in memory (https://docs.mongodb.com/manual/tutorial/ensure-indexes-fit-ram/);
    4. Keep documents small;
    5. If you need phrase queries using text indices, make sure the entire collection fits in memory (https://docs.mongodb.com/manual/core/index-text/#storage-requirements-and-performance-costs, last bullet);
    6. Generate test data and run test queries that simulate your app's use case; use random string generators if needed.
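
    As an illustration of suggestions 1 and 2, here is a minimal sketch of an equality query that should be covered by an index on the test collections above. It assumes a compound index on fields a and b, which the tests above did not use (they indexed only a). The function name explainCoveredQuery is hypothetical, and db is passed in as a parameter only so the function does not depend on the shell's global.

```javascript
// `db` is the database handle that the mongo shell provides.
function explainCoveredQuery(db, value) {
  // Compound index containing every field the query filters on or returns.
  db.bigDocuments.createIndex({ a: 1, b: 1 });

  // Equality match on 'a'. _id is excluded from the projection because it is
  // not part of the index; including it would force a document fetch.
  return db.bigDocuments
    .find({ a: value }, { _id: 0, a: 1, b: 1 })
    .explain('executionStats');
}
```

    In the shell, explainCoveredQuery(db, 5).executionStats.totalDocsExamined should be 0 if the query is answered from the index alone, which means the big field 'c' is never read.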

    I ran into problems with text queries on big documents in MongoDB myself; see "Autocomplete and text search memory issues in apostrophe-cms: need ideas".

    Here is some code I wrote to generate sample data in ApostropheCMS, along with some test results: https://github.com/souzabrs/misc/tree/master/big-pieces.

    This is more a database design issue than a MongoDB internals one. I think MongoDB was designed to behave this way, but it would help a lot if the documentation explained it more clearly.
