character-encoding

server-side includes and character encoding

和自甴很熟 submitted on 2019-12-23 13:24:09
Question: I created a static website in which each page has the following structure:

- Common stuff like header, menu, etc.
- Page specific stuff in main content div
- Footer

In this website, all the common content is duplicated in each page. To improve maintainability I refactored the pages to use server-side includes (SSI) so that the common content is not duplicated. The structure of each page is now:

- SSI for common stuff like header, menu, etc.
- Page specific stuff in main content div
- SSI for
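For context, a hedged sketch of what such a refactored page might look like (file names are hypothetical, assuming Apache-style SSI), with the character set declared once in the head so the included fragments are interpreted consistently:

```html
<!-- page.shtml: hypothetical refactored page using SSI -->
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Some page</title>
  </head>
  <body>
    <!--#include virtual="/includes/header.html" -->
    <div id="main">
      Page specific stuff in main content div
    </div>
    <!--#include virtual="/includes/footer.html" -->
  </body>
</html>
```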

Using special characters in a string argument to the awk match function. Current locale settings

纵然是瞬间 submitted on 2019-12-23 12:45:21
Question: I have a problem using the match function in awk on a string containing special characters. Consider the file test.awk:

  { match($0, "(^.*)kon", a); print a[1]; }

and a corresponding test file test.txt with the contents "Testing Håkon" (note the Norwegian character "å"). The file is encoded in ISO-8859-1 and is 14 bytes long. The hex dump produced by xxd -p test.txt is

  54657374696e672048e56b6f6e0a

from which we can see that the Norwegian character "å" has been encoded with the
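As a hedged illustration of the locale angle mentioned in the title (assuming GNU awk, which provides the three-argument match(), and that the named locales are installed), the same script behaves differently depending on whether the 0xE5 byte is read as Latin-1 "å" or as an invalid byte in a UTF-8 locale:

```sh
# Recreate the ISO-8859-1 test file (octal \345 = 0xE5 = "å" in Latin-1)
printf 'Testing H\345kon\n' > test.txt

# In a UTF-8 locale the byte 0xE5 is not valid UTF-8, so "." may fail to
# match it and a[1] can come out empty:
LC_ALL=en_US.UTF-8 gawk -f test.awk test.txt

# In the byte-oriented C locale "." matches any byte, so a[1] is "Testing Hå":
LC_ALL=C gawk -f test.awk test.txt
```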

How to output a utf-8 string list as it is in python?

我与影子孤独终老i submitted on 2019-12-23 11:57:07
Question: Well, character encoding and decoding sometimes frustrates me a lot. We know u'\u4f60\u597d' is the escaped Unicode form of 你好:

  >>> print hellolist
  [u'\u4f60\u597d']
  >>> print hellolist[0]
  你好

Now what I really want to get in the output, or write to a file, is [u'你好'], but it's [u'\u4f60\u597d'] all the time. How do you do it?

Answer 1: When you print (or write to a file) a list, it internally calls str() on the list, but the list in turn calls repr() on its elements. repr() returns the
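A minimal Python 2 sketch of where that reasoning leads: the list's repr() escapes non-ASCII characters, so build the bracketed output from the elements yourself:

```python
# -*- coding: utf-8 -*-
hellolist = [u'\u4f60\u597d']

# Printing the list uses repr() on each element, which escapes non-ASCII:
print hellolist                                 # [u'\u4f60\u597d']

# Join the elements yourself so each one is written as text, not repr()'d:
print u"[u'%s']" % u"', u'".join(hellolist)     # [u'你好']

# When writing to a file, encode explicitly, e.g.
# f.write((u"[u'%s']" % u"', u'".join(hellolist)).encode('utf-8'))
```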

Default code page for each language version of Windows

删除回忆录丶 submitted on 2019-12-23 10:53:38
Question: Where can I find information about which code page is the default for each language version of Windows, i.e. the "ANSI" code page for each language version? I've found the Code Pages Supported by Windows list, but I cannot find the defaults for each language. I'm guessing that, for instance, Windows-1253 (Greek) is the default when installing the Greek language version. But what about the other code pages? And is Windows-1253 the default for any other language version?

Answer 1: You can enumerate all the
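A small Win32 sketch along those lines (an illustrative guess at the approach, not necessarily the original answer's code): enumerate the installed locales and ask GetLocaleInfoEx for each one's default ANSI code page.

```c
#include <windows.h>
#include <stdio.h>

/* Called once per locale name; prints the locale's default ANSI code page. */
static BOOL CALLBACK PrintAnsiCodePage(LPWSTR localeName, DWORD flags, LPARAM param)
{
    WCHAR codePage[16];
    if (GetLocaleInfoEx(localeName, LOCALE_IDEFAULTANSICODEPAGE,
                        codePage, sizeof(codePage) / sizeof(codePage[0])) > 0)
    {
        wprintf(L"%-20ls ANSI code page: %ls\n", localeName, codePage);
    }
    return TRUE; /* keep enumerating */
}

int wmain(void)
{
    EnumSystemLocalesEx(PrintAnsiCodePage, LOCALE_ALL, 0, NULL);
    return 0;
}
```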

Understanding character encoding in typical Java web app

冷暖自知 submitted on 2019-12-23 10:16:55
Question: Some pseudocode:

  String a = "A bunch of text"; // UTF-16
  saveTextInDb(a);              // Write to Oracle VARCHAR(15) column
  String b = readTextFromDb();  // UTF-16
  out.write(b);                 // Write to HTTP response

When you save the Java String (UTF-16) to an Oracle VARCHAR(15), does Oracle also store it as UTF-16? Does the length of an Oracle VARCHAR refer to the number of Unicode characters (and not the number of bytes)? When we write b to the ServletResponse, is it written as UTF-16 or are we by default converting to
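As a hedged illustration of the last point, a minimal servlet sketch (class and method names are hypothetical): the response encoding is whatever you set explicitly, otherwise the container falls back to ISO-8859-1 per the Servlet specification.

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class TextServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String b = "A bunch of text";   // Java Strings are UTF-16 in memory

        // Without this, the container converts the UTF-16 String to bytes
        // using its default (typically ISO-8859-1) when writing the response.
        resp.setContentType("text/html; charset=UTF-8");

        PrintWriter out = resp.getWriter(); // encodes using the charset set above
        out.write(b);
    }
}
```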

Java XMLReader not clearing multi-byte UTF-8 encoded attributes

老子叫甜甜 submitted on 2019-12-23 10:06:50
Question: I've got a really strange situation where my SAX ContentHandler is being handed bad Attributes by XMLReader. The document being parsed is UTF-8 with multi-byte characters inside XML attributes. What appears to happen is that these attribute values are being accumulated each time my handler is called. So rather than being passed in succession, they get concatenated onto the previous node's value. Here is an example which demonstrates this using public data (Wikipedia):

  public class MyContentHandler
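Independent of the specific bug, one thing worth ruling out (a hedged sketch, not the original handler; the attribute name is hypothetical) is keeping a reference to the Attributes object itself: SAX parsers may reuse that object between startElement calls, so copy out any values you need immediately.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        String title = atts.getValue("title");  // hypothetical attribute name
        if (title != null) {
            titles.add(title);                   // copy the String, not the Attributes
        }
    }

    public List<String> getTitles() {
        return titles;
    }
}
```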

Powershell curl double quotes

落花浮王杯 submitted on 2019-12-23 09:47:52
Question: I am trying to invoke a curl command in PowerShell and pass some JSON information. Here is my command:

  curl -X POST -u username:password -H "Content-Type: application/json" -d "{ "fields": { "project": { "key": "key" }, "summary": "summary", "description": "description - here", "type": { "name": "Task" }}}"

I was getting globbing errors, "unmatched braces", host could not be resolved, etc. Then I tried prefixing the double quotes in the string with the backtick character, but it could
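One way to sidestep the quoting fight entirely (a hedged sketch with a placeholder URL and prompted credentials, using PowerShell's own cmdlets instead of curl) is to let ConvertTo-Json build the body and Invoke-RestMethod send it:

```powershell
# Build the JSON from a hashtable so no manual quote escaping is needed.
$body = @{
    fields = @{
        project     = @{ key = "key" }
        summary     = "summary"
        description = "description - here"
        type        = @{ name = "Task" }
    }
} | ConvertTo-Json -Depth 5

$cred = Get-Credential   # prompts for username/password

Invoke-RestMethod -Uri "https://example.invalid/rest/api/2/issue" `
    -Method Post -Credential $cred `
    -ContentType "application/json" -Body $body
```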

Why does printf( "%c", 1) return smiley face instead of coded char for 1

霸气de小男生 submitted on 2019-12-23 09:34:03
Question: This is my code:

  #include <stdio.h>
  int x, y;
  int main( void )
  {
      for ( x = 0; x < 10; x++, printf( "\n" ) )
          for ( y = 0; y < 10; y++ )
              printf( "%c", 1 );
      return 0;
  }

It prints smiley faces. I searched everywhere for a code for a smiley face or a code for 1, but I didn't manage to find any links or any explanation of why the char value 1 produces a smiley face, when the ASCII code 1 is SOH. I looked through existing answers to this question but didn't find any that explain why this happens.
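A short sketch of the distinction involved: the value 1 is the control code SOH, which the Windows console's legacy code page 437 font happens to draw as a smiley glyph, whereas the printable digit '1' is character code 49 (0x31).

```c
#include <stdio.h>

int main(void)
{
    printf("%c", 1);     /* control code SOH - rendered as a smiley in CP437 consoles */
    printf("%c", '1');   /* the digit 1, i.e. character code 49 */
    printf("%c", 49);    /* same as '1' */
    printf("\n");
    return 0;
}
```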

Encoding issue when using Nokogiri replace

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-23 09:27:41
Question: I have this code:

  # encoding: utf-8
  require 'nokogiri'
  s = "<a href='/path/to/file'>Café Verona</a>".encode('UTF-8')
  puts "Original string: #{s}"
  @doc = Nokogiri::HTML::DocumentFragment.parse(s)
  links = @doc.css('a')
  only_text = 'Café Verona'.encode('UTF-8')
  puts "Replacement text: #{only_text}"
  links.first.replace(only_text)
  puts @doc.to_html

However, the output is this:

  Original string: <a href='/path/to/file'>Café Verona</a>
  Replacement text: Café Verona
  Café Verona

Why does the text in
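One commonly suggested workaround (a hedged sketch, not necessarily the accepted fix) is to replace the node with a text node built against the same document, so the replacement string is never re-parsed with a wrongly assumed encoding:

```ruby
# encoding: utf-8
require 'nokogiri'

s = "<a href='/path/to/file'>Café Verona</a>"
doc = Nokogiri::HTML::DocumentFragment.parse(s)

# Build a text node tied to the fragment's underlying document instead of
# handing #replace a raw String that Nokogiri would re-parse.
replacement = Nokogiri::XML::Text.new('Café Verona', doc.document)
doc.css('a').first.replace(replacement)

puts doc.to_html
```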