I need to extract a part of a string that matches a regex and return it.
I have a set of documents such as:
{"_id" : 12121, "fileName" : "apple.d
Starting in MongoDB 4.2, the $regexFind aggregation operator makes this easier:
// { _id : 12121, fileName: "apple.doc" }
// { _id : 12125, fileName: "rap.txt" }
// { _id : 12126, fileName: "tap.pdf" }
// { _id : 12127, fileName: "cricket.txt" }
// { _id : 12129, fileName: "oops" }
db.collection.aggregate([
{ $set: { ext: { $regexFind: { input: "$fileName", regex: /\.\w+$/ } } } },
{ $group: { _id: null, extensions: { $addToSet: "$ext.match" } } }
])
// { _id: null, extensions: [ ".doc", ".pdf", ".txt" ] }
This makes use of:
- The $set stage, which adds a new field (ext) holding the result of the $regexFind operator. $regexFind applies the regex to the input string: if a match is found, it returns a document that contains information on the first match; if no match is found, it returns null. For instance:
  - For { fileName: "tap.pdf" }, it produces { ext: { match: ".pdf", idx: 3, captures: [] } }.
  - For { fileName: "oops" }, it produces { ext: null }.
- The $group stage, which, coupled with $addToSet on the match subfield, generates the list of distinct extensions.
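To see what each step contributes, here is a minimal plain-JavaScript sketch that imitates the pipeline outside the database. It is an illustration, not MongoDB's implementation: regexFind and distinctExtensions are hypothetical helper names, the result is sorted only to make the output deterministic ($addToSet does not guarantee an order), and idx is a JavaScript string index, which agrees with MongoDB's idx for ASCII file names like these.

```javascript
// Plain-JavaScript imitation of the pipeline, for illustration only.
// regexFind and distinctExtensions are hypothetical helpers, not MongoDB APIs.

// Mimics the shape of a $regexFind result: { match, idx, captures } or null.
function regexFind(input, regex) {
  const m = regex.exec(input);
  if (m === null) return null;
  return { match: m[0], idx: m.index, captures: m.slice(1) };
}

// Mimics $set followed by $group with $addToSet on "$ext.match":
// every distinct match is collected, documents without a match are skipped.
function distinctExtensions(docs) {
  const seen = new Set();
  for (const doc of docs) {
    const ext = regexFind(doc.fileName, /\.\w+$/);
    if (ext !== null) seen.add(ext.match);
  }
  // Sorted here only so the printed result is stable.
  return [...seen].sort();
}

const docs = [
  { _id: 12121, fileName: "apple.doc" },
  { _id: 12125, fileName: "rap.txt" },
  { _id: 12126, fileName: "tap.pdf" },
  { _id: 12127, fileName: "cricket.txt" },
  { _id: 12129, fileName: "oops" },
];

console.log(regexFind("tap.pdf", /\.\w+$/)); // { match: '.pdf', idx: 3, captures: [] }
console.log(regexFind("oops", /\.\w+$/));    // null
console.log(distinctExtensions(docs));       // [ '.doc', '.pdf', '.txt' ]
```

The captures array stays empty because the regex has no capture groups; a pattern such as /\.(\w+)$/ would put the extension without the leading dot in captures[0].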