Python: Ignore 'Incorrect padding' error when base64 decoding

Anonymous (unverified), submitted 2019-12-03 01:31:01

Question:

I have some data that is base64 encoded that I want to convert back to binary even if there is a padding error in it. If I use

base64.decodestring(b64_string) 

it raises an 'Incorrect padding' error. Is there another way?

UPDATE: Thanks for all the feedback. To be honest, all the methods mentioned sounded a bit hit and miss so I decided to try openssl. The following command worked a treat:

openssl enc -d -base64 -in b64string -out binary_data 

Answer 1:

As said in other responses, there are various ways in which base64 data could be corrupted.

However, as Wikipedia says, removing the padding (the '=' characters at the end of base64 encoded data) is "lossless":

From a theoretical point of view, the padding character is not needed, since the number of missing bytes can be calculated from the number of Base64 digits.
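The quoted claim can be checked with a little arithmetic: each Base64 digit carries 6 bits, so n digits (padding excluded) decode to n * 6 // 8 bytes. Below is a minimal Python 3 sketch of that calculation; the helper name decoded_length is made up for this illustration and is not part of any library.

```python
import base64


def decoded_length(unpadded):
    """Return how many bytes an unpadded Base64 string decodes to."""
    n = len(unpadded)
    if n % 4 == 1:
        # A single leftover digit carries only 6 bits, less than one byte,
        # so no valid encoding has this length.
        raise ValueError("length %d is impossible for Base64 data" % n)
    return n * 6 // 8


# Strip the padding from a few encodings and recover the byte count.
for raw in (b"", b"a", b"ab", b"abc", b"abcd"):
    stripped = base64.b64encode(raw).decode("ascii").rstrip("=")
    assert decoded_length(stripped) == len(raw)
```

A length of 4k + 1 digits is the one case padding cannot repair, which is why some of the answers below treat it as corruption.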

So if this is really the only thing "wrong" with your base64 data, the padding can just be added back. I came up with this to be able to parse "data" URLs in WeasyPrint, some of which were base64 without padding:

import base64

def decode_base64(data):
    """Decode base64, padding being optional.

    :param data: Base64 data as an ASCII byte string
    :returns: The decoded byte string.

    """
    missing_padding = len(data) % 4
    if missing_padding != 0:
        data += b'=' * (4 - missing_padding)
    # base64.decodestring() was removed in Python 3.9; b64decode() works on 2 and 3.
    return base64.b64decode(data)

Tests for this function: weasyprint/tests/test_css.py#L68



Answer 2:

If there's a padding error it probably means your string is corrupted; base64-encoded strings should have a length that is a multiple of four. You can try adding the padding character (=) yourself to make the length a multiple of four, but it should already be one unless something is wrong.
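One way to tell stripped padding apart from genuinely corrupted data is to decode strictly. The following is a rough Python 3 sketch, assuming the input is a str in the standard alphabet; looks_like_base64 is a hypothetical helper name. Passing validate=True makes base64.b64decode reject characters outside the alphabet instead of silently discarding them.

```python
import base64
import binascii


def looks_like_base64(s):
    """Return True if s decodes cleanly once missing '=' padding is restored."""
    padded = s + "=" * (-len(s) % 4)
    try:
        base64.b64decode(padded, validate=True)
    except binascii.Error:
        return False
    return True
```

Note that strict decoding also rejects embedded whitespace and newlines, so line-wrapped data would need to be unwrapped first.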



Answer 3:

Just add padding as required. Heed Michael's warning, however.

b64_string += "=" * ((4 - len(b64_string) % 4) % 4) #ugh 


Answer 4:

"Incorrect padding" can mean not only "missing padding" but also (believe it or not) "incorrect padding".

If suggested "adding padding" methods don't work, try removing some trailing bytes:

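import base64
import binascii

lens = len(strg)
lenx = lens - (lens % 4 if lens % 4 else 4)  # drop the last 1 to 4 characters
try:
    result = base64.b64decode(strg[:lenx])
except (TypeError, binascii.Error):  # TypeError on Python 2, binascii.Error on 3
    result = None  # still not decodable; handle as appropriate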

Update: Any fiddling around adding padding or removing possibly bad bytes from the end should be done AFTER removing any whitespace, otherwise length calculations will be upset.
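To illustrate why whitespace has to go first, here is a small Python 3 sketch; decode_ignoring_whitespace is a hypothetical helper name and the input is assumed to be a str. It removes all whitespace, restores the padding, and only then decodes.

```python
import base64


def decode_ignoring_whitespace(s):
    """Drop all whitespace, restore '=' padding, then decode."""
    compact = "".join(s.split())
    compact += "=" * (-len(compact) % 4)
    return base64.b64decode(compact)


# "hello world", line-wrapped and with its padding stripped.
wrapped = "aGVsbG8g\nd29ybGQ\n"
print(len(wrapped) % 4)                     # 1, misleading: newlines are counted
print(decode_ignoring_whitespace(wrapped))  # b'hello world'
```

Computed on the raw string, the remainder of 1 would suggest hopeless data; after removing the two newlines the remainder is 3 and a single "=" is enough.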

It would be a good idea if you showed us a (short) sample of the data that you need to recover. Edit your question and copy/paste the result of print(repr(sample)).

Update 2: It is possible that the encoding has been done in an url-safe manner. If this is the case, you will be able to see minus and underscore characters in your data, and you should be able to decode it by using base64.b64decode(strg, '-_')

If you can't see minus and underscore characters in your data, but can see plus and slash characters, then you have some other problem, and may need the add-padding or remove-cruft tricks.

If you can see none of minus, underscore, plus and slash in your data, then you need to determine the two alternate characters; they'll be the ones that aren't in [A-Za-z0-9]. Then you'll need to experiment to see which order they need to be used in the 2nd arg of base64.b64decode()
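The experiment with the order of the two alternate characters can be sketched as follows in Python 3. The function name candidate_decodings and the sample characters "." and "~" are assumptions made up for this illustration; which result is right can only be judged from what the decoded data is supposed to look like.

```python
import base64


def candidate_decodings(data, extras):
    """Decode data with both orderings of the two alternate characters.

    extras is a 2-character string holding the characters found in place
    of '+' and '/'. Returns a dict mapping each ordering to its result.
    """
    a, b = extras
    return {
        order: base64.b64decode(data, altchars=order)
        for order in (a + b, b + a)
    }


# Hypothetical data whose alphabet uses '.' and '~' as the two extra characters.
for order, decoded in candidate_decodings(".~8=", ".~").items():
    print(repr(order), decoded)
```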

Update 3: If your data is "company confidential":
(a) you should say so up front
(b) we can explore other avenues in understanding the problem, which is highly likely to be related to what characters are used instead of + and / in the encoding alphabet, or by other formatting or extraneous characters.

One such avenue would be to examine what non-"standard" characters are in your data, e.g.

from collections import defaultdict
import string

d = defaultdict(int)
s = set(string.ascii_letters + string.digits)
# your_data is assumed to be text (str); count every non-alphanumeric character
for c in your_data:
    if c not in s:
        d[c] += 1
print(d)


Answer 5:

Use

string += '=' * (-len(string) % 4)  # restore stripped '='s 

Credit goes to a comment somewhere here.

>>> import base64
>>> enc = base64.b64encode('1')
>>> enc
'MQ=='
>>> base64.b64decode(enc)
'1'
>>> enc = enc.rstrip('=')
>>> enc
'MQ'
>>> base64.b64decode(enc)
...
TypeError: Incorrect padding
>>> base64.b64decode(enc + '=' * (-len(enc) % 4))
'1'


Answer 6:

Before attempting any other thing, try to use base64.urlsafe_b64decode(s).

Decode string s using a URL-safe alphabet, which substitutes - instead of + and _ instead of / in the standard Base64 alphabet.
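A short Python 3 sketch of the difference between the two alphabets, using two bytes chosen so that the encoding contains both special characters. The helper urlsafe_decode is a hypothetical name; it also restores stripped padding, since URL-safe data is often transmitted without it.

```python
import base64

raw = b"\xfb\xff"
print(base64.b64encode(raw))          # b'+/8='
print(base64.urlsafe_b64encode(raw))  # b'-_8='


def urlsafe_decode(s):
    """Decode URL-safe Base64 text, restoring stripped '=' padding first."""
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


print(urlsafe_decode("-_8"))          # b'\xfb\xff'
```

With default settings the standard decoder discards characters it does not recognise, so feeding it URL-safe data typically leaves too few digits, which is one way this turns into a padding error.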



Answer 7:

Simply append "=" characters until the length is a multiple of 4 before you try decoding the target string value. (The padding has to be "="; appending any other character would change the decoded data.) Something like:

import base64

# pad with "=" until the length is a multiple of 4
while len(value) % 4 != 0:
    value = value + "="
req_str = base64.b64decode(value)


Answer 8:

Adding the padding is rather... fiddly. Here's the function I wrote with the help of the comments in this thread as well as the wiki page for base64 (it's surprisingly helpful) https://en.wikipedia.org/wiki/Base64#Padding.

import base64
import binascii
import logging

def base64_decode(s):
    """Add missing padding to string and return the decoded base64 string."""
    log = logging.getLogger()
    s = str(s).strip()  # expects text (str), not bytes
    try:
        return base64.b64decode(s)
    except (TypeError, binascii.Error):  # TypeError on Python 2, binascii.Error on 3
        padding = len(s) % 4
        if padding == 1:
            # no amount of padding can repair this length
            log.error("Invalid base64 string: {}".format(s))
            return b''
        elif padding == 2:
            s += '=='
        elif padding == 3:
            s += '='
        return base64.b64decode(s)


Answer 9:

If this error came from a web server, try URL-encoding your POST value. I was POSTing via "curl" and discovered that I wasn't URL-encoding my base64 value, so characters like "+" were not escaped. The web server's URL-decode logic then ran automatically and converted each "+" to a space.
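The effect can be reproduced without a web server. This Python 3 sketch uses urllib.parse.unquote_plus to stand in for the server's form decoding, which is an assumption about what the server does; the sample value "+/8=" is made up for the illustration.

```python
from urllib.parse import quote, unquote_plus

b64_value = "+/8="

# Sent unescaped: form decoding turns '+' into a space.
print(repr(unquote_plus(b64_value)))       # ' /8='

# Percent-encoded before sending: the value survives the round trip.
escaped = quote(b64_value, safe="")
print(escaped)                             # %2B%2F8%3D
print(unquote_plus(escaped) == b64_value)  # True
```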

"+" is a valid base64 character and perhaps the only character which gets mangled by an unexpected url-decode.



Answer 10:

In my case I hit this error while parsing an email. I got the attachment as a base64 string and extracted it via re.search. It turned out there was a strange additional substring at the end.

dHJhaWxlcgo8PCAvU2l6ZSAxNSAvUm9vdCAxIDAgUiAvSW5mbyAyIDAgUgovSUQgWyhcMDAyXDMz MHtPcFwyNTZbezU/VzheXDM0MXFcMzExKShcMDAyXDMzMHtPcFwyNTZbezU/VzheXDM0MXFcMzEx KV0KPj4Kc3RhcnR4cmVmCjY3MDEKJSVFT0YK  --_=ic0008m4wtZ4TqBFd+sXC8-- 

When I deleted --_=ic0008m4wtZ4TqBFd+sXC8-- and stripped the string, decoding worked.

So my advice is to make sure that you are decoding a correct base64 string.
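A minimal Python 3 sketch of that cleanup, assuming the payload uses the standard alphabet and the trailing junk is a MIME boundary line beginning with "--". The function name decode_attachment_body and the sample text are invented for this illustration.

```python
import base64


def decode_attachment_body(text):
    """Decode Base64 text that may be followed by a MIME boundary line."""
    # '-' is not in the standard Base64 alphabet, so everything from the
    # first "--" onward cannot be part of the payload.
    body = text.split("--", 1)[0]
    compact = "".join(body.split())  # remove line breaks and spaces
    return base64.b64decode(compact)


sample = "aGVsbG8g\nd29ybGQ=\n\n--_=example-boundary--\n"
print(decode_attachment_body(sample))  # b'hello world'
```

This shortcut would not be safe for URL-safe Base64, where "-" is a legitimate digit.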



Answer 11:

You should use

base64.b64decode(b64_string, ' /') 

By default, the altchars are '+/'.
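As a quick check of what this call does, here is a Python 3 sketch with a made-up value in which "+" has been turned into a space. Passing ' /' as altchars tells the decoder to read a space as "+", which has the same effect as replacing the spaces before decoding.

```python
import base64

mangled = " /8="  # originally "+/8=", with '+' turned into a space

print(base64.b64decode(mangled, " /"))              # b'\xfb\xff'
print(base64.b64decode(mangled.replace(" ", "+")))  # b'\xfb\xff'
```

Both variants assume every space in the data is a mangled "+"; if the data also contains spaces from line wrapping, they would be misread as digits.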


