In MongoDB mapreduce, how can I flatten the values object?

Asked by 挽巷 on 2020-12-04 22:22

I'm trying to use MongoDB to analyse Apache log files. I've created a receipts collection from the Apache access logs. Here's an abridged summary of what my

7 Answers
  •  生来不讨喜
    2020-12-04 22:48

    While experimenting with Vincent's answer, I found a couple of problems. Basically, if you perform updates within a forEach loop, an update can move the document to the end of the collection, so the cursor reaches that same document again (example). This can be circumvented by using the $snapshot query modifier. Hence, I am providing a Java example below.

    final List<WriteModel<Document>> bulkUpdate = new ArrayList<>();

    // Enable $snapshot when performing updates inside forEach, so documents
    // moved by an update are not visited twice by the cursor.
    collection.find(new Document().append("$query", new Document()).append("$snapshot", true)).forEach(new Block<Document>() {
        @Override
        public void apply(final Document document) {
            // Note that I used incrementing long values for '_id'. Change to String if
            // you used string '_id's.
            long docId = document.getLong("_id");
            Document subDoc = (Document) document.get("value");
            WriteModel<Document> m = new ReplaceOneModel<>(new Document().append("_id", docId), subDoc);
            bulkUpdate.add(m);

            // If you used non-incrementing '_id's, you need a final wrapper object
            // holding a counter instead of this modulo check.
            if (docId % 1000 == 0 && !bulkUpdate.isEmpty()) {
                collection.bulkWrite(bulkUpdate);
                bulkUpdate.clear();
            }
        }
    });
    // Fixing bug related to Vincent's answer: flush the final partial batch.
    if (!bulkUpdate.isEmpty()) {
        collection.bulkWrite(bulkUpdate);
        bulkUpdate.clear();
    }
    

    Note: This snippet takes an average of 7.4 seconds to execute on my machine with 100k records and 14 attributes (IMDB dataset). Without batching, it takes an average of 25.2 seconds.
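    The batching logic in the snippet above — accumulate up to a fixed number of operations, flush, and then flush the final partial batch — can be sketched in isolation, independent of any MongoDB driver. A minimal Java illustration (the `Batcher` class and `batches` method are hypothetical names for this sketch, not driver API):

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class Batcher {
        // Split a list into consecutive batches of at most 'size' elements.
        // The last batch may be smaller than 'size' -- forgetting to flush it
        // is exactly the bug the final isEmpty() check above guards against.
        static <T> List<List<T>> batches(List<T> items, int size) {
            List<List<T>> out = new ArrayList<>();
            for (int i = 0; i < items.size(); i += size) {
                out.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
            }
            return out;
        }

        public static void main(String[] args) {
            List<Integer> ids = new ArrayList<>();
            for (int i = 0; i < 5; i++) ids.add(i);
            System.out.println(batches(ids, 2)); // prints [[0, 1], [2, 3], [4]]
        }
    }
    ```

    The trailing `[4]` batch is the analogue of the leftover `bulkUpdate` entries that must be written after the cursor loop ends.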
