Export text stored as Bindata in mongodb

风流意气都作罢 提交于 2021-02-11 12:12:31

问题


I have a quite big collection of data, made by third parties, it contains a Bindata "package" column where plain ASCII is actually stored. I mean, they could have stored the text as string.

I have to export this collection in csv format, which works quite well with mongoexport, but the output contains the base64 encoded values for the "package" column. I need the actual text from that column, not BinData(0,\"NDYuN.....==").

What I've tried so far is update the collection with a new column "rawData", like this:

db.segments.find({"_id" : ObjectId("4fc79525f65181293930070b")}).forEach(function(data) {
  db.segments.update(
     {_id:data._id},
     {$set:{ "rawData" : data.package.toString() }}
  );
});

I've limited the find to just one document until I get it right. Unfortunately toString does not do the magic I expect.

Also, I've tried this:

db.segments.find({"_id" : ObjectId("4fc79525f65181293930070b")}).forEach(function(data){
  data.package = new String(data.package);
  db.segments.save(data); 
});

The result was even worse.

If I read the document with php, $response = $db->execute('return db.segments.findOne()'); then print_r($response) , I can validate that the data is properly stored, as base64.

I couldn't find a solution anywhere, perhaps because nobody ever needed to do something as stupid as this.


回答1:


The JavaScript BinData object has base64() and hex() methods, which you can use to receive a string in the respective format. As for decoding those values into text strings within JS, you have some options:

  • How do I base64 encode and decode using JavaScript?
  • How to convert from hex to ASCII in JavaScript

According to the BSON spec, binary data fields are composed of a 32-bit integer length, subtype, and byte array. The base64 and hexadecimal representations returned in JS contain the byte array prepended with the integer length. That means after decoding the values, we'll want to strip off the first four bytes. Here's an example using both options:

// https://stackoverflow.com/a/3058974/162228
function decode_base64(s) {
    var e={},i,k,v=[],r='',w=String.fromCharCode,u=0;
    var n=[[65,91],[97,123],[48,58],[43,44],[47,48]];

    for(z in n){for(i=n[z][0];i<n[z][1];i++){v.push(w(i));}}
    for(i=0;i<64;i++){e[v[i]]=i;}
    function a(c){
      if(c<128)r+=w(c);else if(c>=192)u=c;else r+=w(((u&31)<<6)+(c&63));
    }

    for(i=0;i<s.length;i+=72){
        var b=0,c,x,l=0,o=s.substring(i,i+72);
        for(x=0;x<o.length;x++){
            c=e[o.charAt(x)];b=(b<<6)+c;l+=6;
            while(l>=8)a((b>>>(l-=8))%256);
        }
    }
    return r;
}

// https://stackoverflow.com/a/3745677/162228
function hex2a(hex) {
    var str = '';
    for (var i = 0; i < hex.length; i += 2)
        str += String.fromCharCode(parseInt(hex.substr(i, 2), 16));
    return str;
}

db.segments.find().forEach(function(doc){
    print(decode_base64(doc.package.base64()));
});

db.segments.find().forEach(function(doc){
    print(hex2a(doc.package.hex()));
});

And here's a small PHP script I used to insert some fixture data:

<?php

$mongo = new Mongo();
$c = $mongo->selectCollection('test', 'segments');
$c->drop();

$c->save(['package' => new MongoBinData('foo')]);
$c->save(['package' => new MongoBinData('bar')]);

foreach ($c->find() as $doc) {
    printf("%s\n", $doc['bindata']->bin);
}

Depending on the size of your data set, it's likely more reasonable to do the binary field conversion in PHP. If you do want to utilize JavaScript, I'd definitely suggest executing the script via the shell client instead of db.eval(), so you don't lock up the database with a long-running JS function.



来源:https://stackoverflow.com/questions/11410170/export-text-stored-as-bindata-in-mongodb

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!