character-encoding

server-side includes and character encoding

和自甴很熟 submitted on 2019-12-23 13:24:09
Question: I created a static website in which each page has the following structure:

- Common stuff like header, menu, etc.
- Page specific stuff in main content div
- Footer

In this website, all the common content is duplicated in each page. To improve maintainability I refactored the pages to use server-side includes (SSI) so that the common content is not duplicated. The structure of each page is now:

- SSI for common stuff like header, menu, etc.
- Page specific stuff in main content div
- SSI for
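For context, a hedged sketch of what such a refactored page might look like (file names are hypothetical, assuming Apache-style SSI), with the character set declared once in the head so the included fragments are interpreted consistently:

```html
<!-- page.shtml: hypothetical refactored page using SSI -->
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Some page</title>
  </head>
  <body>
    <!--#include virtual="/includes/header.html" -->
    <div id="main">
      Page specific stuff in main content div
    </div>
    <!--#include virtual="/includes/footer.html" -->
  </body>
</html>
```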

Using special characters in a string argument to the awk match function. Current locale settings

纵然是瞬间 submitted on 2019-12-23 12:45:21
Question: I have a problem using the match function in awk on a string containing special characters. Consider the file test.awk:

  { match($0, "(^.*)kon", a); print a[1]; }

and a corresponding test file test.txt with the contents "Testing Håkon" (note the Norwegian character "å"). The file is encoded in ISO-8859-1 and is 14 bytes long. The hex dump produced by xxd -p test.txt is

  54657374696e672048e56b6f6e0a

from which we can see that the Norwegian character "å" has been encoded with the
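As a hedged illustration of the locale angle mentioned in the title (assuming GNU awk, which provides the three-argument match(), and that the named locales are installed), the same script behaves differently depending on whether the 0xE5 byte is read as Latin-1 "å" or as an invalid byte in a UTF-8 locale:

```sh
# Recreate the ISO-8859-1 test file (octal \345 = 0xE5 = "å" in Latin-1)
printf 'Testing H\345kon\n' > test.txt

# In a UTF-8 locale the byte 0xE5 is not valid UTF-8, so "." may fail to
# match it and a[1] can come out empty:
LC_ALL=en_US.UTF-8 gawk -f test.awk test.txt

# In the byte-oriented C locale "." matches any byte, so a[1] is "Testing Hå":
LC_ALL=C gawk -f test.awk test.txt
```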

How to output a utf-8 string list as it is in python?

我与影子孤独终老i submitted on 2019-12-23 11:57:07
Question: Well, character encoding and decoding sometimes frustrates me a lot. We know u'\u4f60\u597d' is the escaped Unicode form of 你好:

  >>> print hellolist
  [u'\u4f60\u597d']
  >>> print hellolist[0]
  你好

Now what I really want to get in the output, or write to a file, is [u'你好'], but it's [u'\u4f60\u597d'] all the time. How do you do it?

Answer 1: When you print (or write to a file) a list, it internally calls str() on the list, but the list in turn calls repr() on its elements. repr() returns the
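A minimal Python 2 sketch of where that reasoning leads: the list's repr() escapes non-ASCII characters, so build the bracketed output from the elements yourself:

```python
# -*- coding: utf-8 -*-
hellolist = [u'\u4f60\u597d']

# Printing the list uses repr() on each element, which escapes non-ASCII:
print hellolist                                 # [u'\u4f60\u597d']

# Join the elements yourself so each one is written as text, not repr()'d:
print u"[u'%s']" % u"', u'".join(hellolist)     # [u'你好']

# When writing to a file, encode explicitly, e.g.
# f.write((u"[u'%s']" % u"', u'".join(hellolist)).encode('utf-8'))
```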

Default code page for each language version of Windows

删除回忆录丶 submitted on 2019-12-23 10:53:38
Question: Where can I find information about which code page is the default for each language version of Windows, i.e. the "ANSI" code page for each language version? I've found the Code Pages Supported by Windows list, but I cannot find the defaults for each language. I'm guessing that, for instance, Windows-1253 (Greek) is the default when installing the Greek language version. But what about the other code pages? And is Windows-1253 the default for any other language version?

Answer 1: You can enumerate all the
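A small Win32 sketch along those lines (an illustrative guess at the approach, not necessarily the original answer's code): enumerate the installed locales and ask GetLocaleInfoEx for each one's default ANSI code page.

```c
#include <windows.h>
#include <stdio.h>

/* Called once per locale name; prints the locale's default ANSI code page. */
static BOOL CALLBACK PrintAnsiCodePage(LPWSTR localeName, DWORD flags, LPARAM param)
{
    WCHAR codePage[16];
    if (GetLocaleInfoEx(localeName, LOCALE_IDEFAULTANSICODEPAGE,
                        codePage, sizeof(codePage) / sizeof(codePage[0])) > 0)
    {
        wprintf(L"%-20ls ANSI code page: %ls\n", localeName, codePage);
    }
    return TRUE; /* keep enumerating */
}

int wmain(void)
{
    EnumSystemLocalesEx(PrintAnsiCodePage, LOCALE_ALL, 0, NULL);
    return 0;
}
```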

Understanding character encoding in typical Java web app

冷暖自知 submitted on 2019-12-23 10:16:55
Question: Some pseudocode:

  String a = "A bunch of text"; // UTF-16
  saveTextInDb(a);              // Write to Oracle VARCHAR(15) column
  String b = readTextFromDb();  // UTF-16
  out.write(b);                 // Write to HTTP response

When you save the Java String (UTF-16) to an Oracle VARCHAR(15), does Oracle also store it as UTF-16? Does the length of an Oracle VARCHAR refer to the number of Unicode characters (and not the number of bytes)? When we write b to the ServletResponse, is it written as UTF-16 or are we by default converting to
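As a hedged illustration of the last point, a minimal servlet sketch (class and method names are hypothetical): the response encoding is whatever you set explicitly, otherwise the container falls back to ISO-8859-1 per the Servlet specification.

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class TextServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String b = "A bunch of text";   // Java Strings are UTF-16 in memory

        // Without this, the container converts the UTF-16 String to bytes
        // using its default (typically ISO-8859-1) when writing the response.
        resp.setContentType("text/html; charset=UTF-8");

        PrintWriter out = resp.getWriter(); // encodes using the charset set above
        out.write(b);
    }
}
```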

Java XMLReader not clearing multi-byte UTF-8 encoded attributes

老子叫甜甜 submitted on 2019-12-23 10:06:50
Question: I've got a really strange situation where my SAX ContentHandler is being handed bad Attributes by XMLReader. The document being parsed is UTF-8 with multi-byte characters inside XML attributes. What appears to happen is that these attribute values are being accumulated each time my handler is called. So rather than being passed in succession, they get concatenated onto the previous node's value. Here is an example which demonstrates this using public data (Wikipedia):

  public class MyContentHandler
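Independent of the specific bug, one thing worth ruling out (a hedged sketch, not the original handler; the attribute name is hypothetical) is keeping a reference to the Attributes object itself: SAX parsers may reuse that object between startElement calls, so copy out any values you need immediately.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        String title = atts.getValue("title");  // hypothetical attribute name
        if (title != null) {
            titles.add(title);                   // copy the String, not the Attributes
        }
    }

    public List<String> getTitles() {
        return titles;
    }
}
```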

Powershell curl double quotes

落花浮王杯 submitted on 2019-12-23 09:47:52
Question: I am trying to invoke a curl command in PowerShell and pass some JSON information. Here is my command:

  curl -X POST -u username:password -H "Content-Type: application/json" -d "{ "fields": { "project": { "key": "key" }, "summary": "summary", "description": "description - here", "type": { "name": "Task" }}}"

I was getting globbing errors, "unmatched braces", host could not be resolved, etc. Then I tried prefixing the double quotes in the string with the backtick character, but it could
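One way to sidestep the quoting fight entirely (a hedged sketch with a placeholder URL and prompted credentials, using PowerShell's own cmdlets instead of curl) is to let ConvertTo-Json build the body and Invoke-RestMethod send it:

```powershell
# Build the JSON from a hashtable so no manual quote escaping is needed.
$body = @{
    fields = @{
        project     = @{ key = "key" }
        summary     = "summary"
        description = "description - here"
        type        = @{ name = "Task" }
    }
} | ConvertTo-Json -Depth 5

$cred = Get-Credential   # prompts for username/password

Invoke-RestMethod -Uri "https://example.invalid/rest/api/2/issue" `
    -Method Post -Credential $cred `
    -ContentType "application/json" -Body $body
```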

Why does printf( "%c", 1) return smiley face instead of coded char for 1

霸气de小男生 submitted on 2019-12-23 09:34:03
Question: This is my code:

  #include <stdio.h>
  int x, y;
  int main( void )
  {
      for ( x = 0; x < 10; x++, printf( "\n" ) )
          for ( y = 0; y < 10; y++ )
              printf( "%c", 1 );
      return 0;
  }

It prints smiley faces. I searched everywhere for a code for a smiley face or a code for 1, but I didn't manage to find any links or any explanation of why the char value 1 produces a smiley face, when the ASCII code 1 is SOH. I looked through existing answers to this question but didn't find any that explain why this happens.
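A short sketch of the distinction involved: the value 1 is the control code SOH, which the Windows console's legacy code page 437 font happens to draw as a smiley glyph, whereas the printable digit '1' is character code 49 (0x31).

```c
#include <stdio.h>

int main(void)
{
    printf("%c", 1);     /* control code SOH - rendered as a smiley in CP437 consoles */
    printf("%c", '1');   /* the digit 1, i.e. character code 49 */
    printf("%c", 49);    /* same as '1' */
    printf("\n");
    return 0;
}
```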

Encoding issue when using Nokogiri replace

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-23 09:27:41
Question: I have this code:

  # encoding: utf-8
  require 'nokogiri'
  s = "<a href='/path/to/file'>Café Verona</a>".encode('UTF-8')
  puts "Original string: #{s}"
  @doc = Nokogiri::HTML::DocumentFragment.parse(s)
  links = @doc.css('a')
  only_text = 'Café Verona'.encode('UTF-8')
  puts "Replacement text: #{only_text}"
  links.first.replace(only_text)
  puts @doc.to_html

However, the output is this:

  Original string: <a href='/path/to/file'>Café Verona</a>
  Replacement text: Café Verona
  Café Verona

Why does the text in
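One commonly suggested workaround (a hedged sketch, not necessarily the accepted fix) is to replace the node with a text node built against the same document, so the replacement string is never re-parsed with a wrongly assumed encoding:

```ruby
# encoding: utf-8
require 'nokogiri'

s = "<a href='/path/to/file'>Café Verona</a>"
doc = Nokogiri::HTML::DocumentFragment.parse(s)

# Build a text node tied to the fragment's underlying document instead of
# handing #replace a raw String that Nokogiri would re-parse.
replacement = Nokogiri::XML::Text.new('Café Verona', doc.document)
doc.css('a').first.replace(replacement)

puts doc.to_html
```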