encoding

Hex characters in varchar() are actually ASCII. Need to decode them

Deadly submitted on 2020-01-03 11:46:50
Question: This is such an edge case of a question that I'd be surprised if there is an easy way to do this. I have an MS SQL DB with a field of type varchar(255). It contains a hex string which is actually a GUID when you decode it using an ASCII decoder. I know that sounds REALLY weird, but here's an example: The contents of the field: "38353334373838622D393030302D343732392D383436622D383161336634396339663931" What it actually represents: "8534788b-9000-4729-846b-81a3f49c9f91" I need a way to decode this,
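Outside the database, the decode is a straightforward hex-to-ASCII conversion. A minimal Python sketch, using the field value quoted in the question:

```python
# The varchar field holds the hex encoding of the GUID's ASCII characters,
# so every two hex digits decode to one character of the GUID string.
def hex_to_guid(s: str) -> str:
    return bytes.fromhex(s).decode("ascii")

value = "38353334373838622D393030302D343732392D383436622D383161336634396339663931"
print(hex_to_guid(value))  # 8534788b-9000-4729-846b-81a3f49c9f91
```

`bytes.fromhex` accepts both upper- and lowercase hex digits, so the uppercase `2D` separators in the stored value need no special handling.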

How to encode Java files in UTF-8 using Apache Ant?

不羁岁月 submitted on 2020-01-03 10:56:12
Question: In my build.xml file I fetch some Java files via CXF. Some of these Java files need to be encoded in UTF-8. How can I use Ant to change their encoding to UTF-8? PS: I found instructions for setting the encoding for javac to UTF-8, but prior to javac I need the Java files themselves to be in UTF-8; otherwise I get an error: warning: unmappable character for encoding utf-8 Here is my code: <macrodef name="lpwservice"> <attribute name="name"/> <attribute name="package"/> <sequential> <property name="wsdlfile"
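Ant's `<copy>` task has `encoding` and `outputencoding` attributes that can transcode files during the copy. As a language-agnostic sketch, the same re-encoding step in Python; the source encoding (`cp1252`) and the file used in the demo are assumptions, so adjust them to whatever CXF actually produced:

```python
import tempfile
from pathlib import Path

SOURCE_ENCODING = "cp1252"  # assumption: the encoding the generator wrote

def reencode(path: Path, source_encoding: str = SOURCE_ENCODING) -> None:
    # Read with the old encoding, rewrite the same file as UTF-8.
    text = path.read_text(encoding=source_encoding)
    path.write_text(text, encoding="utf-8")

# Demo on a throwaway file containing a non-ASCII comment.
with tempfile.TemporaryDirectory() as d:
    f = Path(d) / "Example.java"
    f.write_bytes("// grüße".encode("cp1252"))
    reencode(f)
    print(f.read_bytes().decode("utf-8"))  # // grüße
```

In a real build you would run `reencode` over every generated file, e.g. `for p in Path("generated-src").rglob("*.java"): reencode(p)` (directory name is made up).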

DBF - encoding cp1250

扶醉桌前 submitted on 2020-01-03 10:23:12
Question: I have a dbf database encoded in cp1250 and I am reading it using the following code: import csv from dbfpy import dbf import os import sys filename = sys.argv[1] if filename.endswith('.dbf'): print "Converting %s to csv" % filename csv_fn = filename[:-4]+ ".csv" with open(csv_fn,'wb') as csvfile: in_db = dbf.Dbf(filename) out_csv = csv.writer(csvfile) names = [] for field in in_db.header.fields: names.append(field.name) #out_csv.writerow(names) for rec in in_db: out_csv.writerow(rec
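The step the code above is missing is decoding the raw cp1250 bytes before writing them out. A small Python 3 sketch with a made-up record value:

```python
# dbfpy hands back raw byte strings; decode each cp1250 field to text
# before handing the row to the CSV writer.  The record below is made up.
def decode_record(fields, encoding="cp1250"):
    return [f.decode(encoding) if isinstance(f, bytes) else f for f in fields]

record = [b"Dvo\xf8\xe1k", 42]   # raw cp1250 bytes as stored in the DBF
row = decode_record(record)
print(row)                        # ['Dvořák', 42]
```

Opening the output file with `open(csv_fn, 'w', newline='', encoding='utf-8')` (Python 3) then keeps the resulting CSV in UTF-8.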

UTF-8 encode URLs

帅比萌擦擦* submitted on 2020-01-03 09:20:31
Question: Info: I have a program which generates XML sitemaps for Google Webmaster Tools (among other things). GWT is giving me errors for some sitemaps because the URLs contain character sequences like ã¾, ã‹, ã€, etc. GWT says: We require your Sitemap file to be UTF-8 encoded (you can generally do this when you save the file). As with all XML files, any data values (including URLs) must use entity escape codes for the characters: & , ' , " , < , > . The special characters are escaped in the XML
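One way to satisfy both requirements, sketched in Python: percent-encode the URL's non-ASCII characters as UTF-8 bytes, then entity-escape the XML-special characters before writing the `<loc>` value. The URL is a made-up example:

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

url = "http://example.com/tags/日本語?q=a&b=c"   # hypothetical sitemap URL

# Percent-encode non-ASCII as UTF-8, keeping URL syntax characters intact...
encoded = quote(url, safe=":/?&=")
# ...then entity-escape &, <, > for the XML document.
loc = escape(encoded)
print(loc)  # http://example.com/tags/%E6%97%A5%E6%9C%AC%E8%AA%9E?q=a&amp;b=c
```

`quote` defaults to UTF-8, which matches the sitemap requirement; `escape` handles the entity-escaping GWT asks for.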

Hash keys encoding: Why do I get here with Devel::Peek::Dump two different results?

别说谁变了你拦得住时间么 submitted on 2020-01-03 07:08:11
Question: Why do I get two different results here with Devel::Peek::Dump? #!/usr/bin/env perl use warnings; use 5.014; use utf8; binmode STDOUT, ':encoding(utf-8)'; use Devel::Peek; my %hash1 = ( 'müller' => 1 ); say Dump $_ for keys %hash1; my %hash2; $hash2{'müller'} = 1; say Dump $_ for keys %hash2; Output: SV = PV(0x753270) at 0x76d230 REFCNT = 2 FLAGS = (POK,pPOK,UTF8) PV = 0x759750 "m\303\274ller"\0 [UTF8 "m\x{fc}ller"] CUR = 7 LEN = 8 SV = PV(0x753270) at 0x7d75a8 REFCNT = 2 FLAGS = (POK,FAKE

Check if UTF-8 character requires maximum three bytes

拜拜、爱过 submitted on 2020-01-03 06:32:28
Question: I need to save user input to a database column with utf8_general_ci encoding, which allows at most three bytes per code point. But if the input contains characters that use four bytes (for example emoji), it is not saved into the column. What I need is to check that the input only contains characters that use at most three bytes. I know I could just change the column encoding to utf8mb4, but I don't want to. So how can I do something like this: if (maxThreeBytes("😄")) { //return
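A character needs four UTF-8 bytes exactly when its code point lies above U+FFFF, so the check reduces to a code-point comparison. A Python sketch of the hypothetical maxThreeBytes from the question:

```python
# MySQL's legacy utf8 charset stores at most 3 bytes per code point, i.e.
# only the Basic Multilingual Plane (U+0000..U+FFFF).  Anything above that,
# such as most emoji, needs 4 UTF-8 bytes and would be rejected.
def max_three_bytes(s: str) -> bool:
    return all(ord(ch) <= 0xFFFF for ch in s)

print(max_three_bytes("héllo"))  # True
print(max_three_bytes("😄"))     # False: U+1F604 encodes to 4 bytes
```

An equivalent formulation is `all(len(ch.encode("utf-8")) <= 3 for ch in s)`; the code-point comparison just avoids the intermediate encode.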

Efficient small byte-arrays in C#

只愿长相守 submitted on 2020-01-03 05:46:04
Question: I have a huge collection of very small objects. To ensure the data is stored compactly, I rewrote the class to store all information in a byte array with variable-byte encoding. Most instances of these millions of objects need only 3 to 7 bytes to store all their data. After memory profiling I found that these byte arrays always take at least 32 bytes. Is there a way to store the information more compactly than bit-fiddled into a byte[]? Would it be better to point to an unmanaged
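For reference, the variable-byte encoding mentioned above can be sketched as a little-endian base-128 varint (7 payload bits per byte, high bit as continuation flag); this illustrates the encoding itself, not the .NET array-overhead issue:

```python
# Little-endian base-128 varint: low 7 bits first, high bit set on every
# byte except the last.  Small values take 1-2 bytes instead of 4-8.
def encode_varint(n: int) -> bytes:
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def decode_varint(data: bytes) -> int:
    n, shift = 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return n

print(encode_varint(300).hex())    # ac02
print(decode_varint(b"\xac\x02"))  # 300
```

The 32-byte floor the question describes comes from per-object and per-array overhead in the runtime, not from the encoding, which is why packing many records into one large shared buffer is the usual workaround.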

How to correctly get Unicode text input from QPlainTextEdit? [duplicate]

半城伤御伤魂 submitted on 2020-01-03 04:54:10
Question: This question already has answers here: UnicodeEncodeError: 'charmap' codec can't encode characters (6 answers) Python 'ascii' codec can't encode character with request.get (1 answer) Closed last year. Just running the application, I get the correct results in the QPlainTextEdit area on the screen. But when clicking the Start Simulation button and retrieving the input from it with QPlainTextEdit.toPlainText(), the output becomes invalid: def handle_first_input_text(self): textEdit = self
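The linked duplicates point at the usual cause: the Unicode text retrieved from the widget is later pushed through a codec ('ascii' or 'charmap') that cannot represent it. A Python sketch of the failure and the fix, with a stand-in string instead of a real Qt widget:

```python
text = "Müller 日本語"   # stand-in for QPlainTextEdit.toPlainText() output

# What a narrow default codec does with non-ASCII text:
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii codec fails:", exc.reason)

# Encoding explicitly as UTF-8 round-trips without loss.
print(text.encode("utf-8").decode("utf-8"))  # Müller 日本語
```

The practical rule is to keep the text as a Unicode string inside the program and pass an explicit `encoding="utf-8"` wherever it crosses an I/O boundary (files, sockets, subprocesses).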

XML encoding issue

爱⌒轻易说出口 submitted on 2020-01-03 04:16:08
Question: I want to know whether there is a quick way to find whether an XML document is correctly encoded in UTF-8 and does not contain any characters that are not allowed in XML with UTF-8 encoding. <?xml version="1.0" encoding="utf-8"?> Thanks in advance, George EDIT1: Here is the content of my XML file, in both text form and binary form: http://tinypic.com/view.php?pic=2r2akvr&s=5 I have tried tools like xmlstarlet to check; the result is correct (invalid because of out-of-range UTF-8), but
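A quick self-contained check, sketched in Python: the bytes must decode as UTF-8, and every decoded code point must fall inside the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]):

```python
def xml_utf8_ok(data: bytes) -> bool:
    # Step 1: the bytes must be well-formed UTF-8.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False

    # Step 2: every code point must be in the XML 1.0 Char production.
    def allowed(cp: int) -> bool:
        return (cp in (0x9, 0xA, 0xD)
                or 0x20 <= cp <= 0xD7FF
                or 0xE000 <= cp <= 0xFFFD
                or 0x10000 <= cp <= 0x10FFFF)

    return all(allowed(ord(ch)) for ch in text)

print(xml_utf8_ok("<a>ok</a>".encode("utf-8")))  # True
print(xml_utf8_ok(b"<a>\x00</a>"))               # False: NUL not allowed
print(xml_utf8_ok(b"\xff\xfe<a/>"))              # False: not valid UTF-8
```

This only checks the encoding and character set, not well-formedness of the markup itself; a parser like xmlstarlet still covers the latter.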

What's the browser's behavior when encoding URLs?

和自甴很熟 submitted on 2020-01-03 03:30:33
Question: I'm running a test of how Firefox encodes characters, but the result confused me. HTML code: <html lang="zh_CN"> <head> <title>some Chinese character</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </head> <body> <img src="http://localhost/xxx" /> </body> The xxx is some Chinese characters. These characters must be encoded into a format like %xx to be transported over HTTP. First, I save the source file in UTF-8 and use Firefox to open the HTML file. The img tag will send a
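What the browser does for a UTF-8 page can be reproduced in Python: each non-ASCII character in the URL is percent-encoded as its UTF-8 byte sequence:

```python
from urllib.parse import quote, unquote

char = "中"
encoded = quote(char)     # percent-encodes the UTF-8 bytes E4 B8 AD
print(encoded)            # %E4%B8%AD
print(unquote(encoded))   # 中
```

If the page were saved in a legacy encoding such as GBK instead, the browser would typically percent-encode the GBK bytes for the path, which is why the page's declared charset changes what the server receives.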