MongoDB aggregate: find duplicate records within 7 days

Submitted by 假装没事ソ on 2020-03-04 21:36:20

Question


I have to create a check for this use case:

Duplicate payment check

• The same amount sent to the same account number within the last 7 days, for all transactions.

I haven't used MongoDB much; this would have been easier for me to write in SQL.

This is what I am trying, without the 7-day part:

db.transactiondetails.aggregate([{ $group: { "_id": { "account_number": "$account_number", "amount": "$amount" }, "count": { $sum: 1 } } }])

This gives me something like:

{ "_id" : { "account_number" : "xxxxxxxy", "amount" : 19760 }, "count" : 2 }
{ "_id" : { "account_number" : "xxxxzzzz", "amount" : 20140 }, "count" : 2 }
...

I have created_at and updated_at, which are date fields; I am using updated_at to detect duplicates.

For example:

"created_at" : ISODate("2019-01-07T15:40:53.683Z"),
"updated_at" : ISODate("2019-01-09T06:48:44.839Z"), 

In SQL we can create groups of 7 days: for each date there is a window running from that start date to 7 days later, and we need to find the duplicates inside it.

So these are running 7-day windows in which I need to find duplicates.

Any help on how to go about this would be appreciated.


Answer 1:


Check if this meets your requirements:

Explanation

  1. We sort the documents (I assume you have indexes). The order is needed to iterate over the array in the next steps.
  2. We group by account_number + amount and create two arrays (data, tmp) holding the documents.
  3. We $unwind (flatten) the tmp array and calculate how many days have passed from item i to each of the items i+1 through n.
  4. For each date, we count how many duplicates fall within 7 days.
  5. We skip all results where count = 0.

db.transactiondetails.aggregate([
  {
    // Step 1: order the documents so each group's array is sorted by updated_at.
    $sort: {
      account_number: 1,
      amount: 1,
      updated_at: 1
    }
  },
  {
    // Step 2: collect the documents per account_number + amount, twice (data, tmp).
    $group: {
      "_id": {
        "account_number": "$account_number",
        "amount": "$amount"
      },
      "data": {
        $push: "$$ROOT"
      },
      "tmp": {
        $push: "$$ROOT"
      }
    }
  },
  {
    // Step 3: emit one document per element of tmp; data stays a full array.
    $unwind: "$tmp"
  },
  {
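    // For each tmp item, keep only the later items of data and compute
    // how many days after tmp each of them was updated.
    $project: {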
      _id: {
        account_number: "$_id.account_number",
        amount: "$_id.amount",
        updated_at: "$tmp.updated_at"
      },
      data: {
        $map: {
          input: {
            $slice: [
              "$data",
              {
                $add: [
                  {
                    $indexOfArray: [
                      "$data",
                      "$tmp"
                    ]
                  },
                  1
                ]
              },
              {
                $size: "$data"
              }
            ]
          },
          in: {
            "_id": "$$this._id",
            "account_number": "$$this.account_number",
            "amount": "$$this.amount",
            "created_at": "$$this.created_at",
            "updated_at": "$$this.updated_at",
            "days": {
              $divide: [
                {
                  $subtract: [
                    "$$this.updated_at",
                    "$tmp.updated_at"
                  ]
                },
                {
                  $multiply: [
                    24,
                    60,
                    60,
                    1000
                  ]
                }
              ]
            }
          }
        }
      }
    }
  },
  {
    // Step 4: count the later items that fall within 7 days.
    $project: {
      count: {
        $size: {
          $filter: {
            input: "$data",
            cond: {
              $lte: [
                "$$this.days",
                7
              ]
            }
          }
        }
      }
    }
  },
  {
    // Step 5: keep only documents that have at least one duplicate.
    $match: {
      "count": {
        $gt: 0
      }
    }
  }
])
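
To make it easier to see what the pipeline above computes, here is a rough sketch of the same logic in plain JavaScript, working on an in-memory array instead of a collection. The function name `countDuplicatesWithinWindow` and the `windowDays` parameter are hypothetical names introduced for illustration; the field names come from the question. It is meant as a reading aid, not as a replacement for the aggregation.

```javascript
// Plain-JavaScript sketch of the pipeline's logic (not MongoDB code).
// Assumption: each doc has account_number, amount and a Date in updated_at.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// For each transaction, count the later transactions with the same
// account_number and amount that were updated at most `windowDays` later.
function countDuplicatesWithinWindow(docs, windowDays = 7) {
  // Steps 1-2: sort by updated_at, then group by account_number + amount.
  const sorted = [...docs].sort((a, b) => a.updated_at - b.updated_at);
  const groups = new Map();
  for (const doc of sorted) {
    const key = JSON.stringify([doc.account_number, doc.amount]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }

  const results = [];
  for (const group of groups.values()) {
    group.forEach((current, i) => {
      // Steps 3-4: look only at the items after the current one and
      // count those whose distance in days is within the window.
      const count = group.slice(i + 1).filter((later) => {
        const days = (later.updated_at - current.updated_at) / MS_PER_DAY;
        return days <= windowDays;
      }).length;
      // Step 5: drop entries that have no duplicate in the window.
      if (count > 0) {
        results.push({
          account_number: current.account_number,
          amount: current.amount,
          updated_at: current.updated_at,
          count,
        });
      }
    });
  }
  return results;
}
```

As in the pipeline, each result is keyed by account_number, amount and the updated_at of the earlier transaction, and count is the number of later matching transactions inside the window.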

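If the requirement is read more narrowly, as "flag duplicates only among transactions from the last 7 days counted back from now", a simpler pipeline may be enough. The sketch below is an assumption about that reading, not part of the original answer: `buildRecentDuplicatesPipeline`, its parameters and the `ids` field are hypothetical, and it reuses the `$group` stage from the question with a `$match` before and after it.

```javascript
// Hypothetical helper: builds a pipeline that looks only at transactions
// updated in the `windowDays` days before `now`.
function buildRecentDuplicatesPipeline(now = new Date(), windowDays = 7) {
  const since = new Date(now.getTime() - windowDays * 24 * 60 * 60 * 1000);
  return [
    // Keep only transactions inside the window.
    { $match: { updated_at: { $gte: since } } },
    // Group by account number and amount, as in the question's attempt.
    {
      $group: {
        _id: { account_number: "$account_number", amount: "$amount" },
        count: { $sum: 1 },
        ids: { $push: "$_id" }, // assumption: keep the ids for later review
      },
    },
    // More than one transaction in a group means a possible duplicate.
    { $match: { count: { $gt: 1 } } },
  ];
}

// In the mongo shell (collection name taken from the question):
// db.transactiondetails.aggregate(buildRecentDuplicatesPipeline());
```

Unlike the rolling-window pipeline, this uses a single fixed window, so it would not find pairs of older transactions that were within 7 days of each other.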



Source: https://stackoverflow.com/questions/60371210/mongodb-aggregate-find-duplicate-records-within-7-days
