Say I have a single-page application that uses a third party API for content. The app’s logic is in-browser only, and there is no backend I can write to.
To allow de
Why not use protocol buffers?
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.
ProtoBuf.js converts objects to protocol buffer messages and vice versa.
The following object converts to: CgFhCgFiCgFjEgFkEgFlEgFmGgFnGgFoGgFpIgNqZ2I=
{
repos : ['a', 'b', 'c'],
labels: ['d', 'e', 'f'],
milestones : ['g', 'h', 'i'],
username : 'jgb'
}
The following example is built using require.js. Give it a try on this jsfiddle.
require.config({
    paths : {
        'Math/Long' : '//rawgithub.com/dcodeIO/Long.js/master/Long.min',
        'ByteBuffer' : '//rawgithub.com/dcodeIO/ByteBuffer.js/master/ByteBuffer.min',
        'ProtoBuf' : '//rawgithub.com/dcodeIO/ProtoBuf.js/master/ProtoBuf.min'
    }
});
require(['message'], function(message) {
    var data = {
        repos : ['a', 'b', 'c'],
        labels: ['d', 'e', 'f'],
        milestones : ['g', 'h', 'i'],
        username : 'jgb'
    };
    var request = new message.arguments(data);
    // Convert request data to base64
    var base64String = request.toBase64();
    console.log(base64String);
    // Convert base64 back
    var decodedRequest = message.arguments.decode64(base64String);
    console.log(decodedRequest);
});
// Protobuf message definition
// Message definition could also be stored in a .proto definition file
// See: https://github.com/dcodeIO/ProtoBuf.js/wiki
define('message', ['ProtoBuf'], function(ProtoBuf) {
    var proto = {
        package : 'message',
        messages : [
            {
                name : 'arguments',
                fields : [
                    { rule : 'repeated', type : 'string', name : 'repos', id : 1 },
                    { rule : 'repeated', type : 'string', name : 'labels', id : 2 },
                    { rule : 'repeated', type : 'string', name : 'milestones', id : 3 },
                    { rule : 'required', type : 'string', name : 'username', id : 4 },
                    { rule : 'optional', type : 'bool', name : 'with_comments', id : 5 },
                    { rule : 'optional', type : 'bool', name : 'without_comments', id : 6 }
                ]
            }
        ]
    };
    return ProtoBuf.loadJson(proto).build('message');
});
Short
Use a URL packing scheme such as my own, starting only from the params section of your URL.
Longer
As others here have pointed out, typical compression systems don't work for short strings. But it's important to recognise that URLs and params are a serialization format of a data model: a human-readable text format with specific sections. We know that the scheme comes first, the host follows directly after, the port is implied but can be overridden, and so on.
With the underlying conceptual data model in hand, one can serialize with a more bit-efficient scheme. In fact, I have created such a serialization myself, which achieves around 50% compression: see http://blog.alivate.com.au/packed-url/
My scheme was written with the conceptual data model in mind, but it doesn't deserialize the URL into that model as a distinct step. That's possible, however, and the more formal approach might yield greater efficiencies, since the bits wouldn't need to be in the same order as in the string URL.
Perhaps you can find a URL shortener with a JSONP API; that way you could make all the URLs really short automatically.
http://yourls.org/ even has JSONP support.
Update: I released an NPM package with some more optimizations, see https://www.npmjs.com/package/@yaska-eu/jsurl2
Some more tips:
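Base64 encodes with a..zA..Z0..9+/= and un-encoded URI characters are a..zA..Z0..9-_.~. So Base64 results only need to swap +/= for -_. and it won't expand URIs.
With an array of key names, each key can be replaced by a single character giving its offset in the array: {foo:3,bar:{g:'hi'}} becomes a3,b{c'hi'} given key array ['foo','bar','g'].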
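The Base64 character swap from the first tip could be sketched like this. The function names `toUrlSafeBase64` and `fromUrlSafeBase64` are hypothetical, chosen only for illustration, and the sketch assumes the input is already a standard Base64 string.

```javascript
// Swap the three Base64 characters that would be percent-encoded in a URI
// (+ / =) for characters that are left un-encoded (- _ .).
function toUrlSafeBase64(base64) {
    return base64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=/g, '.');
}

// Reverse the swap before handing the string to a Base64 decoder.
function fromUrlSafeBase64(urlSafe) {
    return urlSafe.replace(/-/g, '+').replace(/_/g, '/').replace(/\./g, '=');
}

console.log(toUrlSafeBase64('+/=')); // -_.
```

Because the swap is one-to-one, the result has the same length as the Base64 input and needs no further escaping in a query string.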
Interesting libraries:
JSURL: {"name":"John Doe","age":42,"children":["Mary","Bill"]} becomes ~(name~'John*20Doe~age~42~children~(~'Mary~'Bill)), and with a key dictionary ['name','age','children'] that could be ~(0~'John*20Doe~1~42~2~(~'Mary~'Bill)), thus going from 101 bytes URI encoded to 38.
lz-string: has a compressToEncodedURIComponent() function to produce URI-safe output.
So basically I'd recommend picking one of these two libraries and considering the problem solved.
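If you want to try the key-dictionary idea before committing to a library, here is a minimal sketch. The helpers `shortenKeys` and `restoreKeys` are hypothetical and not part of any library mentioned here. The sketch assumes none of the original keys are purely numeric, since those would be confused with dictionary indices on the way back.

```javascript
// Replace every known key with its index in the shared key dictionary.
function shortenKeys(value, keys) {
    if (Array.isArray(value)) {
        return value.map(function (v) { return shortenKeys(v, keys); });
    }
    if (value !== null && typeof value === 'object') {
        var out = {};
        Object.keys(value).forEach(function (k) {
            var index = keys.indexOf(k);
            // Unknown keys are kept as-is.
            out[index === -1 ? k : index] = shortenKeys(value[k], keys);
        });
        return out;
    }
    return value;
}

// Map numeric keys back to their names using the same dictionary.
function restoreKeys(value, keys) {
    if (Array.isArray(value)) {
        return value.map(function (v) { return restoreKeys(v, keys); });
    }
    if (value !== null && typeof value === 'object') {
        var out = {};
        Object.keys(value).forEach(function (k) {
            var name = /^\d+$/.test(k) && keys[k] !== undefined ? keys[k] : k;
            out[name] = restoreKeys(value[k], keys);
        });
        return out;
    }
    return value;
}

var exampleKeys = ['name', 'age', 'children'];
var exampleData = { name: 'John Doe', age: 42, children: ['Mary', 'Bill'] };
console.log(JSON.stringify(shortenKeys(exampleData, exampleKeys)));
// {"0":"John Doe","1":42,"2":["Mary","Bill"]}
```

Both sides must ship the same key array in the same order, so new keys can only be appended if old URLs have to keep working.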
It looks like the GitHub APIs have numeric IDs for many things under the covers (repos and users appear to have them, but labels don't). It might be possible to use those numbers instead of names wherever advantageous. You then have to figure out how best to encode those in something that'll survive in a query string, e.g. something like base64(url).
For example, your hoodie.js repository has ID 4780572. Packing that into a big-endian unsigned int (as many bytes as we need) gets us \x00H\xf2\x1c. We'll just toss the leading zero, we can always restore that later, now we have H\xf2\x1c. Encode as URL-safe base64, and you have SPIc (toss any padding you might get). Going from hoodiehq/hoodie.js to SPIc seems like a good-sized win!
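The packing steps above can be sketched in browser JavaScript as follows. The names `packId` and `unpackId` are hypothetical, and the sketch assumes non-negative integer IDs small enough to be exact as JavaScript numbers, plus an environment that provides `btoa` and `atob` (browsers, or a recent Node.js).

```javascript
// Pack a non-negative integer ID into as few big-endian bytes as needed,
// then encode those bytes as URL-safe base64 without padding.
function packId(id) {
    var bytes = [];
    do {
        bytes.unshift(id % 256);
        id = Math.floor(id / 256);
    } while (id > 0);
    var base64 = btoa(String.fromCharCode.apply(null, bytes));
    return base64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// Reverse the steps: restore the standard alphabet and padding, decode,
// and rebuild the integer from its big-endian bytes.
function unpackId(packed) {
    var base64 = packed.replace(/-/g, '+').replace(/_/g, '/');
    while (base64.length % 4 !== 0) {
        base64 += '=';
    }
    var binary = atob(base64);
    var id = 0;
    for (var i = 0; i < binary.length; i++) {
        id = id * 256 + binary.charCodeAt(i);
    }
    return id;
}

console.log(packId(4780572));  // SPIc
console.log(unpackId('SPIc')); // 4780572
```

Dropping leading zero bytes works here because the decoder rebuilds the number from however many bytes it receives.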
More generally, if you're willing to invest the time, you can try to exploit a bunch of redundancies in your query strings. Other ideas are along the lines of packing the two boolean params into a single character, possibly along with other state (like what fields are included). If you use base64 encoding (which seems the best option here due to the URL-safe version; I looked at base85, but it has a bunch of characters that won't survive in a URL), that gets you 6 bits of entropy per character... there's a lot you can do with that.
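As a rough illustration of the boolean-packing idea, here is a sketch that stores up to six flags in one URL-safe base64 character. The names `FLAG_ALPHABET`, `packFlags` and `unpackFlags` are hypothetical, and the bit order (first flag in the lowest bit) is an arbitrary choice that both encoder and decoder have to share.

```javascript
// The 64 characters of the URL-safe base64 alphabet, one per 6-bit value.
var FLAG_ALPHABET =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';

// Pack up to six booleans into a single character (flag i goes into bit i).
function packFlags(flags) {
    if (flags.length > 6) {
        throw new Error('at most 6 flags fit into one character');
    }
    var bits = 0;
    flags.forEach(function (flag, i) {
        if (flag) {
            bits |= 1 << i;
        }
    });
    return FLAG_ALPHABET.charAt(bits);
}

// Recover the first `count` booleans from a packed character.
function unpackFlags(ch, count) {
    var bits = FLAG_ALPHABET.indexOf(ch);
    var flags = [];
    for (var i = 0; i < count; i++) {
        flags.push((bits & (1 << i)) !== 0);
    }
    return flags;
}

// e.g. with_comments = true, without_comments = false
console.log(packFlags([true, false])); // B
console.log(unpackFlags('B', 2));      // [ true, false ]
```

With only two flags in use, four of the six bits remain free for other state, such as which fields are included.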
To add to Thomas Fuchs' note: yes, if there's some kind of inherent, immutable ordering in some of the things you're encoding, then that would obviously also help. However, that seems hard for both the labels and the milestones.
Maybe a simple JS minifier will help you. You'd only need to integrate it at the serialization and deserialization points. I think it'd be the easiest solution.