I have a collection in MongoDB where there are around (~3 million records). My sample record would look like,
{ \"_id\" = ObjectId(\"50731xxxxxxxxxxxxxxxxxx
If you have enough memory, you can in scala do something like that:
cole.find().groupBy(_.customField).filter(_._2.size>1).map(_._2.tail).flatten.map(_.id) .foreach(x=>cole.remove({id $eq x})