encoding | 易学教程

Convert TXT File of Unknown Encoding to String

Question: How can I convert plain text (.txt) files to a string if the encoding type is unknown? I'm working on a feature that would allow users to import txt files into my app. This means the file could have been created in any number of apps, using any of a variety of encodings that would be considered valid for a plain text file. My understanding is that this could include ASCII, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, or EBCDIC?! Things had been going well using the
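
One common approach to the unknown-encoding problem described above is to look for a byte order mark (BOM) first, then try strict UTF-8, and only then fall back to a legacy single-byte encoding. The sketch below is written in Python rather than the asker's platform; the function name `decode_unknown` and the `cp1252` fallback are assumptions for illustration, not part of the original question.

```python
import codecs

# The 4-byte UTF-32 marks are tested before the UTF-16 marks,
# because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_unknown(data: bytes, fallback: str = "cp1252") -> str:
    """Decode bytes of unknown encoding (hypothetical helper)."""
    # 1. A BOM, when present, names the encoding unambiguously.
    for bom, name in BOMS:
        if data.startswith(bom):
            return data.decode(name)  # these codecs consume the BOM
    # 2. Strict UTF-8 rarely succeeds by accident on non-UTF-8 text.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # 3. Assumed fallback: a single-byte legacy encoding.
        return data.decode(fallback, errors="replace")
```

This does not cover BOM-less UTF-16 or UTF-32, or EBCDIC; those need either a statistical detector or a hint from the user.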

How to save pdf in proper encoding via nodejs

Question: So I'm trying to download a PDF file from a website with my script, but the file gets broken in the process, and I'm pretty sure it's because the wrong encoding is being used. I'm using the request lib to download the file and I've set the Content-type to application/pdf. My code is pretty simple: var fs = require('fs'); var request = require("request"); request({uri: 'xxxxxxxxxxxxxx.pdf', headers: { 'Content-type' : 'application/pdf' }} , function (error, response, body) { if (
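
A likely cause of this kind of corruption is that the binary body is decoded to a text string before it is written to disk. The Python sketch below only illustrates that effect; the sample bytes and the helper name `roundtrip_through_text` are made up for the demonstration. In the request library specifically, passing `encoding: null` makes the body arrive as a Buffer instead of a string.

```python
# A PDF header followed by the kind of high-bit bytes PDFs commonly contain.
pdf_bytes = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def roundtrip_through_text(data: bytes) -> bytes:
    """Simulate decoding a binary body as UTF-8 text and re-encoding it."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


corrupted = roundtrip_through_text(pdf_bytes)

# The invalid sequences were replaced by U+FFFD, so the bytes no longer match.
print(corrupted == pdf_bytes)       # False
print(b"\xef\xbf\xbd" in corrupted)  # True: UTF-8 form of U+FFFD
```

The request `Content-type` header describes what the client sends; it does not change how the response body is decoded.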

Encoding conversion of a fetch response

Question: Inside a React Native method I'm fetching an XML document encoded in ISO-8859-1. Once the fetch completes, I'm trying to convert it to UTF-8. Here is the code: const iconv = require('iconv-lite'); fetch('http://www.band.uol.com.br/rss/colunista_64.xml', { headers: { "Content-type": "text/xml; charset=ISO-8859-1" } }) .then(res => res.text()) .then(text => { const decodedText = iconv.decode(Buffer.from(text, 'latin1'), 'latin1') , output = iconv.encode(decodedText, 'utf8') console.log(output
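
The difficulty here is usually one of ordering: once the body has been turned into text under the wrong charset, the original bytes are gone. The Python sketch below illustrates this, assuming the text step decoded the bytes as UTF-8; the sample string is invented. The practical consequence is to obtain the raw bytes and decode them once with the real charset.

```python
# Bytes as an ISO-8859-1 server would send them (sample text is made up).
raw = "São Paulo".encode("iso-8859-1")  # b'S\xe3o Paulo'

# Decoding as UTF-8 first loses the accented character for good.
wrong = raw.decode("utf-8", errors="replace")

# Decoding the raw bytes with the actual charset preserves it.
right = raw.decode("iso-8859-1")

# Re-encoding to UTF-8 is then a separate, lossless step.
utf8_bytes = right.encode("utf-8")

print(repr(wrong))  # contains U+FFFD where 'ã' used to be
print(right)
```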

URL to URI encoding changes a “%3D” to “%253D”

Question: I'm having trouble encoding a URL to a URI: mUrl = "A string url that needs to be encoded for use in a new HttpGet()"; URL url = new URL(mUrl); URI uri = new URI(url.getProtocol(), url.getAuthority(), url.getPath(), url.getQuery(), null); This does not do what I expect for the following URL: Passing in the String: http://m.bloomingdales.com/img?url=http%3A%2F%2Fimages.bloomingdales.com%2Fis%2Fimage%2FBLM%2Fproducts%2F3%2Foptimized%2F1140443_fpx.tif%3Fwid%3D52%26qlt%3D90%2C0%26layer%3Dcomp
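
The "%3D" to "%253D" change is the signature of double encoding: an escaper applied to an already-escaped string treats "%" as a literal and escapes it as "%25". Java's multi-argument URI constructors always quote the percent character, which matches the behavior described. The Python sketch below reproduces the effect with `urllib.parse`; the sample query string is hypothetical.

```python
from urllib.parse import quote, unquote

already_encoded = "url=http%3A%2F%2Fexample.com"  # hypothetical query string

# Escaping an already-escaped string: each '%' becomes '%25'.
double_encoded = quote(already_encoded, safe="=&")

# Decoding once before escaping gives back the single-encoded form.
reencoded = quote(unquote(already_encoded), safe="=&")

print(double_encoded)  # url=http%253A%252F%252Fexample.com
print(reencoded)       # url=http%3A%2F%2Fexample.com
```

The general rule is to encode each component exactly once, starting from its decoded form.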

Nodejs: convert string to buffer

Question: I'm trying to write a string to a socket (the socket is called "response"). Here is the code I have so far (I'm trying to implement a byte caching proxy...): var http = require('http'); var sys = require('sys'); var localHash = {}; http.createServer(function(request, response) { var proxy = http.createClient(80, request.headers['host']); var proxy_request = proxy.request(request.method, request.url, request.headers); proxy_request.addListener('response', function (proxy_response) { proxy_response

c# Detect xml encoding from Byte Array?

Question: Well, I have a byte array, and I know it's an XML-serialized object. Is there any way to get the encoding from the byte array? I'm not going to deserialize it, but I'm saving it in an XML field on a SQL Server... so I need to convert it to a string? Answer 1: You could look at the first 40-ish bytes. They should contain the document declaration (assuming it has a document declaration), which should either contain the encoding or you can assume it's UTF-8 or UTF-16, which should be obvious
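
The idea in the answer above (inspect the first bytes for a BOM or an XML declaration, otherwise assume UTF-8 or UTF-16) can be sketched as follows. This is Python rather than C#, the function name `sniff_xml_encoding` is hypothetical, and the sketch deliberately ignores UTF-32 and BOM-less UTF-16 input.

```python
import codecs
import re

# Matches encoding="..." inside an ASCII-compatible XML declaration.
_DECL = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess the encoding of an XML byte array (hypothetical helper)."""
    # A BOM takes precedence over anything the declaration says.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Otherwise look for the declaration near the start of the document.
    match = _DECL.search(data[:200])
    if match:
        return match.group(1).decode("ascii").lower()
    # XML without a BOM or declared encoding is UTF-8 by default.
    return default
```

For example, `sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>')` returns `"iso-8859-1"`, which can then be passed to `bytes.decode`.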

How to remove non-printable/invisible characters in ruby?

Question: Sometimes I have evil non-printable characters in the middle of a string. These strings are user input, so I must make my program handle them well instead of trying to change the source of the problem. For example, they can have a zero width no-break space in the middle of the string. While parsing a .po file, one problematic part was the string "he is a man of god" in the middle of the file. While everything seems correct, inspecting it with irb shows: "he is a man of god"
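
One way to handle invisible characters like the zero width no-break space (U+FEFF) is to filter by Unicode general category instead of listing code points one by one. The sketch below is in Python rather than Ruby; the helper name `strip_invisible` and the position of the hidden character in the sample are assumptions.

```python
import unicodedata


def strip_invisible(text: str) -> str:
    """Drop format (Cf) and control (Cc) characters, keeping tab/newline/CR."""
    keep = {"\t", "\n", "\r"}
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) not in {"Cf", "Cc"}
    )


# Sample with a hidden U+FEFF; where it sits is an assumption for the demo.
sample = "he is a man\ufeff of god"
print(len(sample), len(strip_invisible(sample)))  # 19 18
```

Category Cf also contains the zero width joiner and non-joiner, which some scripts and emoji sequences rely on, so a narrower filter may be the safer choice for multilingual input.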

invalid byte 2 of 2-byte UTF-8 sequence

Question: I am trying to parse an XML file with <?version = 1.0, encoding = UTF-8> but ran into the error message "invalid byte 2 of 2-byte UTF-8 sequence". Does anybody know what caused this problem? Answer 1: Most commonly it's due to feeding ISO-8859-x (Latin-x, like Latin-1) to a parser that thinks it is getting UTF-8. Certain sequences of Latin-1 characters (two consecutive characters with accents or umlauts) form something that is invalid as UTF-8, and specifically such that based on first byte, second
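
The failure mode described in the answer can be reproduced in a few lines. In Latin-1, "É" is the single byte 0xC9, which UTF-8 reads as the first byte of a two-byte sequence; the byte after it must then fall in the range 0x80 to 0xBF, and the Latin-1 byte for "é" (0xE9) does not. The Python sketch below uses an invented two-character sample.

```python
latin1_bytes = "Éé".encode("latin-1")  # b'\xc9\xe9'

try:
    latin1_bytes.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # 0xC9 announces a 2-byte sequence, but 0xE9 is not a continuation byte.
    error = exc

print(error is not None)             # True
print(latin1_bytes.decode("latin-1"))  # Éé
```

Python reports this as an invalid continuation byte, which is the same condition the XML parser message describes.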

PowerShell out-file: prevent encoding changes

Question: I'm currently working on some search and replace operations that I'm trying to automate using PowerShell. Unfortunately, I realized yesterday that we have different file encodings in our codebase (UTF-8 and ASCII). Because we're doing these search and replace operations in a different branch, I can't change the file encodings at this stage. If I run the following lines, it changes all files to UCS-2 Little Endian even though my default PowerShell encoding is set to iso-8859-1 (Western