Extracting a list of substrings from MongoDB using a Regular Expression

感动是毒 2021-01-15 05:53

I need to extract a part of a string that matches a regex and return it.

I have a set of documents such as:

{"_id" : 12121, "fileName" : "apple.d


        
  •  轮回少年
    2021-01-15 06:30

    Starting with MongoDB 4.2, the $regexFind aggregation operator makes this much easier:

    // { _id : 12121, fileName: "apple.doc" }
    // { _id : 12125, fileName: "rap.txt" }
    // { _id : 12126, fileName: "tap.pdf" }
    // { _id : 12127, fileName: "cricket.txt" }
    // { _id : 12129, fileName: "oops" }
    db.collection.aggregate([
      { $set: { ext: { $regexFind: { input: "$fileName", regex: /\.\w+$/ } } } },
      { $group: { _id: null, extensions: { $addToSet: "$ext.match" } } }
    ])
    // { _id: null, extensions: [ ".doc", ".pdf", ".txt" ] }
    

    This makes use of:

    • The $set stage, which adds a new field to each of the documents.
    • This new field (ext) holds the result of the $regexFind operator, which applies the regex to the input string. If a match is found, it returns a document describing the first match; if no match is found, it returns null. For instance:
      • For { fileName: "tap.pdf" }, it produces { ext: { match: ".pdf", idx: 3, captures: [] } }.
      • For { fileName: "oops" }, it produces { ext: null }.
    • Finally, using a $group stage, coupled with $addToSet on the match subfield, we can generate the list of distinct extensions.
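
To make the shape of the $regexFind result and the role of $addToSet more concrete, here is a minimal plain-JavaScript sketch that mimics the same two stages on an in-memory array. It is only an emulation for illustration, not how MongoDB actually evaluates the pipeline. The regexFind helper and the docs / withExt / extensions names are hypothetical, and idx is computed as a UTF-16 offset, which matches MongoDB's code-point index only for simple input such as these ASCII file names.

```javascript
// Plain-JavaScript emulation (NOT MongoDB's implementation) of the
// $set + $regexFind + $group/$addToSet pipeline, run on an in-memory array.

// Hypothetical helper approximating $regexFind: returns
// { match, idx, captures } for the first match, or null if there is none.
// idx is a UTF-16 offset; it equals MongoDB's code-point index for ASCII input.
function regexFind(input, regex) {
  const m = regex.exec(input); // regex has no "g" flag, so exec is stateless
  if (m === null) return null;
  return {
    match: m[0],
    idx: m.index,
    // JS reports unmatched optional groups as undefined; map them to null.
    captures: m.slice(1).map((c) => (c === undefined ? null : c)),
  };
}

const docs = [
  { _id: 12121, fileName: "apple.doc" },
  { _id: 12125, fileName: "rap.txt" },
  { _id: 12126, fileName: "tap.pdf" },
  { _id: 12127, fileName: "cricket.txt" },
  { _id: 12129, fileName: "oops" },
];

// Stage 1 (like $set): attach the regexFind result as a new "ext" field.
const withExt = docs.map((d) => ({ ...d, ext: regexFind(d.fileName, /\.\w+$/) }));

// Stage 2 (like $group + $addToSet): collect distinct ext.match values,
// skipping documents whose ext is null (no match, e.g. "oops").
const extensions = [
  ...new Set(withExt.filter((d) => d.ext !== null).map((d) => d.ext.match)),
];

console.log(withExt.find((d) => d._id === 12126).ext); // { match: '.pdf', idx: 3, captures: [] }
console.log(extensions); // [ '.doc', '.txt', '.pdf' ]

// With a capture group, the extension without the dot lands in captures.
console.log(regexFind("tap.pdf", /\.(\w+)$/)); // { match: '.pdf', idx: 3, captures: [ 'pdf' ] }
```

One difference to keep in mind: a JavaScript Set preserves insertion order, whereas MongoDB does not guarantee any particular order for the array built by $addToSet, so the real aggregation may return the extensions in a different order.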
