Mongoimport csv files with string _id and upsert

a 夏天 提交于 2019-12-01 04:05:08

问题


I'm trying to use mongoimport to upsert data with string values in _id. Since the ids look like integers (even though they're in quotes), mongoimport treats them as integers and creates new records instead of upserting the existing records.

Command I'm running:

mongoimport --host localhost --db database --collection my_collection --type csv --file mydata.csv --headerline --upsert

Example data in mydata.csv:

{ "_id" : "0364", someField: "value" }

The result would be for mongo to insert a record like this: { "_id" : 364, someField: "value" } instead of updating the record with _id "0364".

Does anyone know how to make it treat the _id as strings?

Things that don't work:

  • Surrounding the data with double double quotes ""0364"", double and single quotes "'0364'" or '"0364"'
  • Appending empty string to value: { "_id" : "0364" + "", someField: "value" }

回答1:


Unfortunately there is not now a way to force number-like strings to be interpreted as strings:

https://jira.mongodb.org/browse/SERVER-3731

You could write a script in Python or some other language with which you're comfortable, along the lines of:

import csv, pymongo

connection = pymongo.Connection()
collection = connection.mydatabase.mycollection
reader = csv.DictReader(open('myfile.csv'))
for line in reader:
    print '_id', line['_id']
    upsert_fields = {
        '_id': line['_id'],
        'my_other_upsert_field': line['my_other_upsert_field']}

    collection.update(upsert_fields, line, upsert=True, safe=True)



回答2:


Just encountered this same issue and discovered an alternative. You can force Mongo to use string types for non-string values by converting your CSV to JSON and quoting the field. For example, if your CSV looks like this:

key value
123 foo
abc bar

Then you'll get an integer field for key 123 and a string field for key abc. If you convert that to JSON, making sure that all the keys are quoted, and then use --type json when you import, you'll end up with the desired behavior:

{
    "123":"foo",
    "abc":"bar"
}



回答3:


I was able to prefix the numeric string and that worked for me. Example:

00012345 was imported as 12345 (Type Int) string00012345 was imported as string00012345 (Type String)

My source was a SQL database so I just did

select 'string'+column as name

Of course, you also need to do a bit of post-processing to parse the string, but far less effort than converting a rather large tsv file to json.

I also added +1 to the jira link above for the enhancement.




回答4:


As an alternative to @Jesse, you can do something similar in the mongo console, e.g.

db.my_collection.find().forEach(function (obj) {
  db.my_collection.remove({_id: obj._id); // remove the old one
  obj._id = '' + obj._id; // change to string
  db.my_collection.save(obj); // resave
});

For non _id fields you can simply do:

db.my_collection.find().forEach(function (obj) {
  obj.someField = '' + obj.someField; // change to string
  db.my_collection.save(obj); // resave
});



回答5:


I encountered the same issue.

I feel the simplest way is to convert the CSV file to a JSON file using an online tool and then import.

This is the tool I used:

http://www.convertcsv.com/csv-to-json.htm

It lets you wrap the integer values of your CSV file in double quotes for your JSON file.

If you have trouble importing this JSON file and encountering an error, just add --jsonArray to your import command. It will work for sure.

mongoimport --host localhost --db mydb -c mycollection --type json --jsonArray --file <file_path>


来源:https://stackoverflow.com/questions/10302149/mongoimport-csv-files-with-string-id-and-upsert

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!