问题
I need to get an array buffer from an http request sending me a base64 answer.
For this request, I can't use XMLHttpRequest.responseType="arraybuffer".
The response I get from this request is read through xhr.responseText. Hence it's encoded as a DOMString. I'm trying to get it back as an array buffer.
I've tried to go back to the base64 from the DOMString using btoa(mysString) or window.btoa(unescape(encodeURIComponent(str))) but the first option just fails, whereas the second option doesn't give the same base64. Example of the first few characters from each base64:
Incoming : UEsDBBQACAgIACp750oAAAAAAAAAAAAAAAALAAAAX3JlbHMvLnJlbH
After the second processing: UEsDBBQACAgIAO+/ve+/ve+/vUoAAAAAAAAAAAAAAAALAAAAX3JlbHMvLnJlbH
As you can see a part of it is similar, but some parts are way off. What am I missing to get it right?
回答1:
I have got same issue too.
The solution (I ran at Chrome(68.0.3440.84))
let url = ''
let iso_8859_15_table = { 338: 188, 339: 189, 352: 166, 353: 168, 376: 190, 381: 180, 382: 184, 8364: 164 }
function iso_8859_15_to_uint8array(iso_8859_15_str) {
let buf = new ArrayBuffer(iso_8859_15_str.length);
let bufView = new Uint8Array(buf);
for (let i = 0, strLen = iso_8859_15_str.length; i < strLen; i++) {
let octet = iso_8859_15_str.charCodeAt(i);
if (iso_8859_15_table.hasOwnProperty(octet))
octet = iso_8859_15_table[octet]
bufView[i] = octet;
if(octet < 0 || 255 < octet)
console.error(`invalid data error`)
}
return bufView
}
req = new XMLHttpRequest();
req.overrideMimeType('text/plain; charset=ISO-8859-15');
req.onload = () => {
console.log(`Uint8Array : `)
var uint8array = iso_8859_15_to_uint8array(req.responseText)
console.log(uint8array)
}
req.open("get", url);
req.send();
Below is explanation what I learned to solve it.
Explanation
Why some parts are way off?
because TextDecoder cause data loss (Your case is utf-8).
For example, let's talk about UTF-8
variable width character encoding for Unicode.
It has
rules(This will become problem.) for reasons such as variable length characteristics and ASCII compatibility, etc.so, decoder may replace a non-conforming characters to replacement character such as U+003F(?, Question mark) or U+FFFD(�, Unicode replacement character).
in utf-8 case, 0~127 of values are stable, 128~255 of values are unstable. 128~255 will converted to U+FFFD
Are other Text Decoders safe except UTF-8?
No. In most cases, not safe from rules.
UTF-8 is also unrecoverable. (128~255 are set to U+FFFD)
If the binary data and the decoded result can be corresponded to one-to-one, they can be recovered.
How to solve it?
- Finds recoverable Text Decoders.
- Force MIME type to recoverable charset of the incoming data.
xhr_object.overrideMimeType('text/plain; charset=ISO-8859-15') - Recover binary data from string with
recover tablewhen received.
Finds recoverable Text Decoders.
To recover, avoid the situation when decoded results' are duplicated.
The following code is a simple example, so there may be missing recoverable text decoders because it only consider Uint8Array.
let bufferView = new Uint8Array(256);
for (let i = 0; i < 256; i++)
bufferView[i] = i;
let recoverable = []
let decoding = ['utf-8', 'ibm866', 'iso-8859-2', 'iso-8859-3', 'iso-8859-4', 'iso-8859-5', 'iso-8859-6', 'iso-8859-7', 'iso-8859-8', 'iso-8859-8i', 'iso-8859-10', 'iso-8859-13', 'iso-8859-14', 'iso-8859-15', 'iso-8859-16', 'koi8-r', 'koi8-u', 'macintosh', 'windows-874', 'windows-1250', 'windows-1251', 'windows-1252', 'windows-1253', 'windows-1254', 'windows-1255', 'windows-1256', 'windows-1257', 'windows-1258', 'x-mac-cyrillic', 'gbk', 'gb18030', 'hz-gb-2312', 'big5', 'euc-jp', 'iso-2022-jp', 'shift-jis', 'euc-kr', 'iso-2022-kr', 'utf-16be', 'utf-16le', 'x-user-defined', 'ISO-2022-CN', 'ISO-2022-CN-ext']
for (let dec of decoding) {
try {
let decodedText = new TextDecoder(dec).decode(bufferView);
let loss = 0
let recoverTable = {}
let unrecoverable = 0
for (let i = 0; i < decodedText.length; i++) {
let charCode = decodedText.charCodeAt(i)
if (charCode != i)
loss++
if (!recoverTable[charCode])
recoverTable[charCode] = i
else
unrecoverable++
}
let tableCnt = 0
for (let props in recoverTable) {
tableCnt++
}
if (tableCnt == 256 && unrecoverable == 0){
recoverable.push(dec)
setTimeout(()=>{
console.log(`[${dec}] : err(${loss}/${decodedText.length}, ${Math.round(loss / decodedText.length * 100)}%) alive(${tableCnt}) unrecoverable(${unrecoverable})`)
},10)
}
else {
console.log(`!! [${dec}] : err(${loss}/${decodedText.length}, ${Math.round(loss / decodedText.length * 100)}%) alive(${tableCnt}) unrecoverable(${unrecoverable})`)
}
} catch (e) {
console.log(`!! [${dec}] : not supported.`)
}
}
setTimeout(()=>{
console.log(`recoverable Charset : ${recoverable}`)
}, 10)
In my console, this return
recoverable Charset : ibm866,iso-8859-2,iso-8859-4,iso-8859-5,iso-8859-10,iso-8859-13,iso-8859-14,iso-8859-15,iso-8859-16,koi8-r,koi8-u,macintosh,windows-1250,windows-1251,windows-1252,windows-1254,windows-1256,windows-1258,x-mac-cyrillic,x-user-defined
And I used iso-8859-15 at beginning of this answer. (It has Smallest table size.)
Additional test) Comparison between UTF-8's and ISO-8859-15's result
Check U+FFFD is really disappeared when using ISO-8859-15.
function requestAjax(url, charset) {
let req = new XMLHttpRequest();
if (charset)
req.overrideMimeType(`text/plain; charset=${charset}`);
else
charset = 'utf-8';
req.open('get', url);
req.onload = () => {
console.log(`==========\n${charset}`)
console.log(`${req.responseText.split('', 50)}\n==========`);
console.log('\n')
}
req.send();
}
var url = '';
requestAjax(url, 'ISO-8859-15');
requestAjax(url);
Bottom line
- Recover binary data to, from string needs some additional job.
- Find recoverable text encoder/decoder.
- Make a recover table
- Recover with the table.
- (You can refer to the very top of code.)
- For use this trick, force MIME type of incoming data to desired charset.
来源:https://stackoverflow.com/questions/44974003/recover-arraybuffer-from-xhr-responsetext