Base64 length calculation?

允我心安 submitted on 2019-12-27 10:59:49

Question


After reading the base64 wiki ...

I'm trying to figure out how the formula works:

Given a string of length n, the Base64 length will be 4 * ceil(n / 3),

which in code is: 4 * Math.Ceiling((double)s.Length / 3)

I already know that the Base64 length must satisfy % 4 == 0 so that the decoder can tell what the original text length was.

The padding at the end of a sequence can be = or ==.

Wiki: The number of output bytes per input byte is approximately 4 / 3 (33% overhead)

Question:

How does the information above fit with the output length formula?

Answer 1:


Each character is used to represent 6 bits (log2(64) = 6).

Therefore 4 chars are used to represent 4 * 6 = 24 bits = 3 bytes.

So you need 4*(n/3) chars to represent n bytes, and this needs to be rounded up to a multiple of 4.

The number of padding chars resulting from rounding up to a multiple of 4 will be 0, 1 or 2: a final group of one byte yields two chars plus ==, and a final group of two bytes yields three chars plus =.
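A minimal JavaScript sketch of the reasoning above. The function names are made up for illustration; the code only turns the 6-bits-per-character argument into arithmetic.

```javascript
// Padded Base64 length for n input bytes:
// one 4-character block per 3 input bytes (or part thereof).
function paddedBase64Length(n) {
  return 4 * Math.ceil(n / 3);
}

// Number of '=' characters at the end of the encoded string:
// a final group of 1 byte needs 2, a final group of 2 bytes needs 1.
function paddingChars(n) {
  return (3 - (n % 3)) % 3;
}

for (const n of [0, 1, 2, 3, 4]) {
  console.log(n, paddedBase64Length(n), paddingChars(n));
}
```

The padding count never reaches 3, because a final group always contains at least one byte and one byte already needs two characters.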




Answer 2:


4 * n / 3 gives the unpadded length (rounded down when computed with integer division).

Round up to the nearest multiple of 4 for padding; since 4 is a power of 2, this can be done with bitwise operations:

((4 * n / 3) + 3) & ~3
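For illustration, the same expression in JavaScript, where the integer division has to be spelled out with Math.floor. The function name is hypothetical, and the sketch assumes the result fits in a signed 32-bit integer because JavaScript bitwise operators work on 32-bit values.

```javascript
// Round the truncated quotient 4n/3 up to the next multiple of 4 with a bit mask.
// ~3 clears the two lowest bits, which is the same as rounding down to a multiple of 4;
// adding 3 first turns that into rounding up.
function paddedLengthBitwise(n) {
  const truncated = Math.floor((4 * n) / 3);
  return (truncated + 3) & ~3;
}

for (let n = 0; n <= 7; n++) {
  console.log(n, paddedLengthBitwise(n), 4 * Math.ceil(n / 3));
}
```

Both columns agree, so truncating 4n/3 before rounding up does no harm here.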



Answer 3:


For reference, the Base64 encoder's length formula is 4 * Ceiling(n / 3).

As you said, a Base64 encoder given n bytes of data will produce a string of roughly 4n/3 Base64 characters. Put another way, every 3 bytes of data will result in 4 Base64 characters. EDIT: A comment correctly points out that my previous graphic did not account for padding; the correct formula with padding is 4 * Ceiling(n / 3).

The Wikipedia article shows exactly how the ASCII string Man is encoded into the Base64 string TWFu in its example. The input string is 3 bytes, or 24 bits, in size, so the formula correctly predicts the output will be 4 bytes (or 32 bits) long: TWFu. The process encodes every 6 bits of data into one of the 64 Base64 characters, so the 24-bit input divided by 6 results in 4 Base64 characters.

You ask in a comment what the size of encoding 123456 would be. Keeping in mind that every character of that string is 1 byte, or 8 bits, in size (assuming ASCII/UTF-8 encoding), we are encoding 6 bytes, or 48 bits, of data. According to the equation, we expect the output length to be (6 bytes / 3 bytes) * 4 characters = 8 characters.

Putting 123456 into a Base64 encoder creates MTIzNDU2, which is 8 characters long, just as we expected.
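The two examples can be checked against a real encoder. This is a small sketch assuming Node.js, whose global Buffer can produce Base64; predictedLength and encode are hypothetical helper names.

```javascript
// Assumes Node.js: Buffer is used here as a ready-made Base64 encoder.
function predictedLength(n) {
  return 4 * Math.ceil(n / 3);
}

function encode(text) {
  return Buffer.from(text, "ascii").toString("base64");
}

for (const s of ["Man", "123456"]) {
  const encoded = encode(s);
  console.log(s, "->", encoded, "actual:", encoded.length, "predicted:", predictedLength(s.length));
}
```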




Answer 4:


Integers

Generally we don't want to use doubles because we don't want to use the floating point ops, rounding errors etc. They are just not necessary.

For this it is a good idea to remember how to perform ceiling division: ceil(x / y) in doubles can be written as (x + y - 1) / y in integer arithmetic (for non-negative numbers only, and beware of overflow).

Readable

If you go for readability you can of course also program it like this (example in Java; for C you could use macros, of course):

public class Base64Length {
    // Integer ceiling division for non-negative x and positive y.
    public static int ceilDiv(int x, int y) {
        return (x + y - 1) / y;
    }

    public static int paddedBase64(int n) {
        int blocks = ceilDiv(n, 3);
        return blocks * 4;
    }

    public static int unpaddedBase64(int n) {
        int bits = 8 * n;
        return ceilDiv(bits, 6);
    }

    // test only
    public static void main(String[] args) {
        for (int n = 0; n < 21; n++) {
            System.out.println("Base 64 padded (" + n + " bytes): " + paddedBase64(n));
            System.out.println("Base 64 unpadded (" + n + " bytes): " + unpaddedBase64(n));
        }
    }
}

Inlined

Padded

We know that we need one 4-character block for each 3 bytes (or fewer). So the formula becomes (for x = n and y = 3):

blocks = (bytes + 3 - 1) / 3
chars = blocks * 4

or combined:

chars = ((bytes + 3 - 1) / 3) * 4

Your compiler will optimize out the 3 - 1, so just leave it like this to maintain readability.

Unpadded

Less common is the unpadded variant. For this, remember that we need one character for each 6 bits, rounded up:

bits = bytes * 8
chars = (bits + 6 - 1) / 6

or combined:

chars = (bytes * 8 + 6 - 1) / 6

We can, however, still divide by two (if we want to):

chars = (bytes * 4 + 3 - 1) / 3

Unreadable

In case you don't trust your compiler to do the final optimizations for you (or if you want to confuse your colleagues):

Padded

((n + 2) / 3) << 2

Unpadded

((n << 2) | 2) / 3

So there we are, two logical ways of calculation, and we don't need any branches, bit-ops or modulo ops - unless we really want to.
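As a sanity check, here is a rough JavaScript sketch that compares the shift-based forms with the readable ceiling-division forms over a range of sizes. The function names are illustrative, and Math.floor stands in for the integer division that Java and C perform implicitly.

```javascript
// Integer ceiling division for non-negative x and positive y.
function ceilDiv(x, y) {
  return Math.floor((x + y - 1) / y);
}

// "Unreadable" padded form: ((n + 2) / 3) << 2
function paddedShift(n) {
  return Math.floor((n + 2) / 3) << 2;
}

// "Unreadable" unpadded form: ((n << 2) | 2) / 3
// n << 2 has its two lowest bits clear, so "| 2" is the same as "+ 2".
function unpaddedShift(n) {
  return Math.floor(((n << 2) | 2) / 3);
}

let mismatches = 0;
for (let n = 0; n <= 1000; n++) {
  if (paddedShift(n) !== ceilDiv(n, 3) * 4) mismatches++;
  if (unpaddedShift(n) !== ceilDiv(8 * n, 6)) mismatches++;
}
console.log("mismatches:", mismatches); // prints "mismatches: 0"
```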

Notes:

  • Obviously you may need to add 1 to the calculations to include a null termination byte.
  • For MIME you may need to take care of possible line termination characters and such (look for other answers for that).
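To make the MIME note concrete, here is a hedged sketch of a length calculation that accounts for line breaks. It assumes lines of at most 76 characters (the MIME limit) terminated by CRLF; whether the last line also gets a line break depends on the encoder, so that is a parameter. All names and parameters are hypothetical.

```javascript
// Length of padded Base64 output for n input bytes, including line breaks.
//   lineLength    - maximum characters per output line (76 for MIME)
//   newlineBytes  - 2 for CRLF, 1 for LF
//   trailingBreak - whether the encoder also terminates the last line
function wrappedBase64Length(n, lineLength = 76, newlineBytes = 2, trailingBreak = true) {
  const chars = 4 * Math.ceil(n / 3);
  const lines = Math.ceil(chars / lineLength);
  const breaks = trailingBreak ? lines : Math.max(lines - 1, 0);
  return chars + breaks * newlineBytes;
}

console.log(wrappedBase64Length(57));                 // one full line plus CRLF
console.log(wrappedBase64Length(58, 76, 2, false));   // two lines, one CRLF between them
```

Add 1 on top of this if the buffer also has to hold a terminating null byte.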



Answer 5:


I think the given answers miss the point of the original question, which is how much space needs to be allocated to fit the base64 encoding for a given binary string of length n bytes.

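The answer is (floor(n / 3) + 1) * 4 + 1

This includes padding and a terminating null character. You may not need the floor call if you are doing integer arithmetic. Note that this over-allocates by four bytes when n is a multiple of 3; the exact size in integer arithmetic is ((n + 2) / 3) * 4 + 1.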

Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately.
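A short JavaScript sketch comparing this allocation estimate with the exact size, both including the byte for the terminating null. The function names are made up for the example.

```javascript
// Allocation size from the formula above: (floor(n / 3) + 1) * 4 + 1
function estimatedAllocation(n) {
  return (Math.floor(n / 3) + 1) * 4 + 1;
}

// Exact size: padded Base64 length plus one byte for the terminating null.
function exactAllocation(n) {
  return Math.ceil(n / 3) * 4 + 1;
}

for (let n = 0; n <= 6; n++) {
  console.log(n, estimatedAllocation(n), exactAllocation(n));
}
```

The estimate is never too small. It matches the exact size unless n is a multiple of 3, in which case it reserves one extra 4-byte block.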




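Answer 6:


Here is a function to calculate the original size, in KB, of a file encoded as a Base64 String:

// StringUtils is presumably org.apache.commons.lang3.StringUtils (an assumption; the import is not shown).
private Double calcBase64SizeInKBytes(String base64String) {
    Double result = -1.0;
    if (StringUtils.isNotEmpty(base64String)) {
        Integer padding = 0;
        if (base64String.endsWith("==")) {
            padding = 2;
        } else if (base64String.endsWith("=")) {
            padding = 1;
        }
        // Divide by 4.0: with integer division the quotient is truncated before Math.ceil sees it.
        result = (Math.ceil(base64String.length() / 4.0) * 3) - padding;
    }
    // Note: an empty or null input yields -0.001, since result stays at -1.0.
    return result / 1000;
}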



Answer 7:


Seems to me that the right formula (using integer division) should be:

n64 = 4 * (n / 3) + (n % 3 != 0 ? 4 : 0)



Answer 8:


While everyone else is debating algebraic formulas, I'd rather just use BASE64 itself to tell me:

$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately."| wc -c

525

$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately." | base64 | wc -c

710

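So the formula of 3 bytes being represented by 4 base64 characters seems correct: the 525 input bytes (including the newline that echo appends) give 4 * 525 / 3 = 700 characters, and the remaining 10 bytes are newlines, assuming base64 wraps its output at 76 columns as GNU coreutils does by default.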




Answer 9:


I believe this one is exact when n % 3 is not zero, no?

    (n + 3-n%3)
4 * ---------
       3

Mathematica version:

SizeB64[n_] := If[Mod[n, 3] == 0, 4 n/3, 4 (n + 3 - Mod[n, 3])/3]

Have fun

GI




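Answer 10:


Simple implementation in JavaScript, returning the decoded size in bytes of a padded Base64 string:

function sizeOfBase64String(base64String) {
    if (!base64String) return 0;
    // Count the trailing '=' padding characters.
    const padding = (base64String.match(/(=*)$/) || [])[1].length;
    // Every 4 Base64 characters decode to 3 bytes, minus one byte per padding character.
    return 3 * Math.ceil(base64String.length / 4) - padding;
}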



Answer 11:


On Windows, I wanted to estimate the size of a mime64 buffer, but none of the precise formulas worked for me. I finally ended up with an approximate formula like this:

Mime64 string allocation size (approximate) = (((4 * ((binary buffer size) + 1)) / 3) + 1)

The last +1 is for the ASCII zero: one extra character must be allocated to store the terminating zero. But why is "binary buffer size" + 1? I suspect there is some mime64 termination character, or maybe this is an alignment issue.




Answer 12:


If anyone is interested in the @Pedro Silva solution in JS, I ported it:

const getBase64Size = (base64) => {
  let padding = base64.length
    ? getBase64Padding(base64)
    : 0
  return ((Math.ceil(base64.length / 4) * 3 ) - padding) / 1000
}

const getBase64Padding = (base64) => {
  // Two, one, or no trailing '=' characters.
  if (endsWith(base64, '==')) return 2
  return endsWith(base64, '=') ? 1 : 0
}

const endsWith = (str, end) => {
  let charsFromEnd = end.length
  let extractedEnd = str.slice(-charsFromEnd)
  return extractedEnd === end
}



Answer 13:


For all people who speak C, take a look at these two macros:

// calculate the size of the 'output' buffer required for an 'input' buffer of length x during a Base64 encoding operation (includes 1 byte for the terminating null)
#define B64ENCODE_OUT_SAFESIZE(x) ((((x) + 3 - 1)/3) * 4 + 1)

// calculate the size of the 'output' buffer required for an 'input' buffer of length x during a Base64 decoding operation (an upper bound; padding is not subtracted)
#define B64DECODE_OUT_SAFESIZE(x) (((x)*3)/4)

Taken from here.
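For readers who do not speak C, here is a rough JavaScript rendering of the two macros, checked against a real encoder. It assumes Node.js (for Buffer); the function names are illustrative and not part of any library.

```javascript
// Equivalent of B64ENCODE_OUT_SAFESIZE: padded length plus 1 byte for a terminating null.
function encodeOutSafeSize(x) {
  return Math.floor((x + 3 - 1) / 3) * 4 + 1;
}

// Equivalent of B64DECODE_OUT_SAFESIZE: an upper bound on the decoded size.
function decodeOutSafeSize(x) {
  return Math.floor((x * 3) / 4);
}

// Assumes Node.js: encode n zero bytes and compare against the two bounds.
let allOk = true;
for (let n = 0; n <= 50; n++) {
  const encoded = Buffer.alloc(n).toString("base64");
  if (encodeOutSafeSize(n) !== encoded.length + 1) allOk = false;
  if (decodeOutSafeSize(encoded.length) < n) allOk = false;
}
console.log("bounds hold:", allOk);
```

The decode bound is exact only when the encoded string carries no padding; otherwise it is one or two bytes larger than the decoded data.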



Source: https://stackoverflow.com/questions/13378815/base64-length-calculation
