How do I convert a string containing non-numeric characters to an int across an entire collection?

Submitted by 我是研究僧i on 2020-01-06 05:47:10

Question


I have a collection whose documents look like this:

_id:5d0fe0dcfd8ea94eb4633222
Category:"Stripveiling (Nederlands)"
Category url:"https://www.catawiki.nl/a/11-stripveiling-nederlands"
Lot title:"Erwin Sels (Ersel) - Originele pagina"
Seller name:"Stripwereld"
Seller country:"Nederland"
Bids count:21
Winning bid:"€ 135"
Bid amount:"Closed"
Lot image:"https://assets.catawiki.nl/assets/2011/11/17/7/4/c/74c53540-f390-012e-..."

I need to change the "Winning bid" field to an int. That is, remove the currency sign and convert the value from a string to an int for the entire collection.

I couldn't find anywhere in the documentation how to do this. Do I really have to read every value in Python, strip the currency symbol, and write it back with an update call? I have almost 8,000,000 records, so that will take a long time.

Can I do this with a collection method? Or what is the quickest way to do it with Python?


Answer 1:


If you want to convert the entire collection, you can do it with an aggregation pipeline.

In the $project stage, strip the currency symbol with a substring operator ($substr, or $substrCP, which counts characters instead of UTF-8 bytes and is safer with "€") and convert what is left with $toInt (or $toDouble, or $convert, whichever suits your case). Then add $out as the last stage of the aggregation: $out writes the result of the aggregation pipeline to the given collection name.
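Before running the conversion on 8,000,000 documents, it can help to reproduce the rule on a single value. The sketch below is plain JavaScript, not MongoDB's implementation; the helper names parseWinningBid and lengths are hypothetical, and it assumes every value has the shape "€ 135" (one symbol, one space, then digits). It also shows why the offset unit matters: $substr counts UTF-8 bytes and "€" alone takes three bytes, while Array.from splits a string by code point, the way $substrCP counts.

```javascript
// Hypothetical helper: mirrors { $toInt: { $substrCP: [value, 2, ...] } }
// for one value, assuming the format "€ 135".
function parseWinningBid(value) {
  // Array.from splits a string into code points, as $substrCP counts them.
  // Skip the first two: the currency symbol and the space.
  const digits = Array.from(value).slice(2).join("");
  // $toInt raises an error on text like "1.250"; this sketch returns null
  // instead, which is closer to $convert with onError: null.
  return /^\d+$/.test(digits) ? Number(digits) : null;
}

// Byte length (what $substr counts) vs. code-point length ($substrCP).
function lengths(value) {
  return {
    bytes: new TextEncoder().encode(value).length,
    codePoints: Array.from(value).length,
  };
}

console.log(parseWinningBid("€ 135")); // 135
console.log(parseWinningBid("€ 1.250")); // null
console.log(lengths("€ 135")); // { bytes: 7, codePoints: 5 }
```

Because "€" occupies bytes 0 to 2, a byte offset of 2 lands inside the symbol; the digits start at byte 4 but at code point 2.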

But be careful when using $out. According to the official MongoDB documentation:

Create New Collection

The $out operation creates a new collection in the current database if one does not already exist. The collection is not visible until the aggregation completes. If the aggregation fails, MongoDB does not create the collection.

Replace Existing Collection

If the collection specified by the $out operation already exists, then upon completion of the aggregation, the $out stage atomically replaces the existing collection with the new results collection. Specifically, the $out operation:

  1. Creates a temp collection.
  2. Copies the indexes from the existing collection to the temp collection.
  3. Inserts the documents into the temp collection.
  4. Calls db.collection.renameCollection with dropTarget: true to rename the temp collection to the destination collection.

The $out operation does not change any indexes that existed on the previous collection. If the aggregation fails, the $out operation makes no changes to the pre-existing collection.
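If you would rather inspect the converted data before anything is replaced, the same steps can be done by hand: $out into a scratch collection, check it, then rename it over the original. This is a sketch for the mongo shell, assuming its db.getCollection(), aggregate(pipeline, options) and renameCollection(target, dropTarget) methods; the function name convertViaScratch, the confirm callback and the collection names are hypothetical. Unlike the replacement step $out performs itself, a manual rename does not copy indexes, so recreate them on the scratch collection before swapping.

```javascript
// Hypothetical helper for the mongo shell. `db` is the shell's database
// handle, passed in explicitly so the function can also be run against a stub.
function convertViaScratch(db, source, scratch, projection, confirm) {
  // Step 1: write the converted documents to a separate collection.
  db.getCollection(source).aggregate(
    [{ $project: projection }, { $out: scratch }],
    { allowDiskUse: true }
  );
  // Step 2: let the caller inspect the scratch collection first.
  if (!confirm(db.getCollection(scratch))) {
    return false; // the original collection is left untouched
  }
  // Step 3: replace the original; dropTarget = true drops `source` first.
  // Indexes of `source` are NOT carried over by this rename.
  db.getCollection(scratch).renameCollection(source, true);
  return true;
}
```

In the shell, confirm could for example compare document counts, such as c => c.countDocuments({}) === db.collection_name.countDocuments({}).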

Try this:

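db.collection_name.aggregate([
    {
        $project: {
            // Field names are placeholders; use the names in your documents
            // (the sample in the question uses "Winning bid", "Bids count", etc.).
            category : "$category",
            category_name : "$category_name",
            lot_title : "$lot_title",
            seller_name : "$seller_name",
            seller_country : "$seller_country",
            bid_count : "$bid_count",
            // "€ 135": skip "€" and the space (2 code points), convert the rest.
            // $substrCP counts characters; $substr counts UTF-8 bytes and fails
            // at index 2 because "€" is 3 bytes long.
            winning_bid : { $toInt : { $substrCP : [
                "$winning_bid", 2, { $subtract : [ { $strLenCP : "$winning_bid" }, 2 ] }
            ] } },
            bid_amount : "$bid_amount",
            lot_image : "$lot_image"
        }
    },{
        $out : "collection_name"
    }
])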
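Listing every field in $project is easy to get wrong. A possible alternative, swapped in here and not part of the answer above, is $addFields (MongoDB 3.4+), which overwrites only the named field and keeps all the others, combined with $convert and its onError/onNull options (MongoDB 4.0+) so that an unparsable value becomes null instead of aborting the whole run. The builder name buildConversionPipeline is hypothetical; the field name "Winning bid" comes from the question's sample document.

```javascript
// Hypothetical builder: returns a pipeline that rewrites one string field
// such as "€ 135" into an int and leaves every other field unchanged.
// Assumes the field is present and a string in every document:
// $strLenCP raises an error otherwise, and onError does not catch that.
function buildConversionPipeline(field, outCollection) {
  const path = "$" + field;
  return [
    {
      $addFields: {
        [field]: {
          $convert: {
            // Drop the first two code points ("€" and the space).
            input: {
              $substrCP: [path, 2, { $subtract: [{ $strLenCP: path }, 2] }],
            },
            to: "int",
            onError: null, // text such as "1.250" becomes null
            onNull: null,
          },
        },
      },
    },
    { $out: outCollection },
  ];
}

// In the mongo shell:
// db.collection_name.aggregate(buildConversionPipeline("Winning bid", "collection_name"))
```

Afterwards, documents whose field ended up null are the ones that need a closer look.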

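You can pass allowDiskUse: true as an option to the aggregation (db.collection_name.aggregate(pipeline, { allowDiskUse: true })). It lets pipeline stages that exceed the 100 MB per-stage memory limit write temporary files to disk. The 16 MB limit applies to individual documents, not to the number of documents. A $project followed by $out processes documents as a stream, so even with 8,000,000 documents the option is probably not required, but adding it does no harm.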

Don't forget to replace collection_name with the actual collection name, and to list in the $project stage every field you want to keep: fields left out of $project are dropped from the new collection. Also double-check the values first, either by writing to a different, temporary collection or by removing the $out stage and inspecting the result of the aggregation pipeline.
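One way to do that check on a large collection is to count the values that do not have the expected shape before converting anything. The sketch below is hypothetical: it assumes values look like "€ 135" and that the field is named "Winning bid" as in the question, and it builds a $match/$count pipeline to run without $out.

```javascript
// Hypothetical check: count documents whose field is not "€ <digits>".
// { $not: /regex/ } also matches documents where the field is missing.
function buildFormatCheckPipeline(field) {
  return [
    { $match: { [field]: { $not: /^€ \d+$/ } } },
    { $count: "badValues" },
  ];
}

// In the mongo shell:
// db.collection_name.aggregate(buildFormatCheckPipeline("Winning bid"))
```

Note that $count emits no document at all when nothing matches, so an empty result means every value has the expected format.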

For details, read the official MongoDB documentation for $out, $toInt, $toDouble, $convert, $substr, $substrCP and allowDiskUse.



Source: https://stackoverflow.com/questions/56813094/how-to-convert-a-string-with-characters-in-the-int-for-the-entire-collection
