character-encoding

How do I fix charset problems in .gs script?

五迷三道, submitted on 2021-02-06 09:30:56
Question: I have a problem with charsets. I parsed a CSV file in google-app-engine and I'm posting it to a UiApp table. Special characters like áéíóú are not displayed correctly (a ? or square symbol is shown instead). While setting up my code, I tried writing the imported string to a Google Docs document, and the result was the same. Any advice, please? I'm looking for: a global charset definition for the code, or a string variable transformation that makes the characters appear the way I want (avoiding html &number
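
The ? or square symptom described above typically appears when bytes are decoded with a charset other than the one they were written in. A minimal sketch of that effect, in Python rather than Apps Script; the ISO-8859-1 source encoding is an assumption made for illustration, since the question does not say how the CSV was encoded:

```python
# Assumption (not stated in the question): the CSV was saved as ISO-8859-1.
raw = "áéíóú".encode("iso-8859-1")

# Decoding with the wrong charset: none of these bytes form valid UTF-8,
# so they are replaced by U+FFFD, which many UIs draw as "?" or a square.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the charset the data was actually written in restores the text.
right = raw.decode("iso-8859-1")

print(repr(wrong))
print(right)
```

If this is the cause, the fix belongs at the point where the file's bytes are first turned into a string, not in a later string transformation.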

What can cause git to mess with character encoding?

北慕城南, submitted on 2021-02-06 07:59:56
Question: Edit: git does not mess with character encoding. This question remains here to share knowledge and to help others avoid the same mistake. The context: My company uses an svn repository. I'm using git-svn as a client to interact with this repository. All text files in the project are (and must be) encoded with the Windows default encoding (cp-....). I use git-extensions, and sometimes the command line, to drive git. What I did: During the last 3 days, I was working on a new feature, and I did a number

Remove all hex characters from string in Python

感情迁移, submitted on 2021-02-06 03:01:50
Question: Although there are similar questions, I can't seem to find a working solution for my case. I'm encountering some annoying hex chars in strings, e.g. '\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah' What I need is to remove these hex \xHH characters, and them alone, in order to get the following result: 'http://www.google.com blah blah#%#@$^blah' Decoding doesn't help: s.decode('utf8') # u'\u201chttp://www.google.com\u201d blah blah#%#@$^blah' How can I achieve that? Answer 1:
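
As the decode output in the question shows, the \xHH sequences are the UTF-8 bytes of the curly quotes U+201C and U+201D. One possible approach, sketched here for Python 3 (the question uses Python 2 syntax), is to decode first and then discard whatever cannot be represented in ASCII. This is an illustration only, not the truncated answer from the listing:

```python
# The byte string from the question, written as a Python 3 bytes literal.
raw = b"\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah"

# Step 1: interpret the bytes as UTF-8, giving real characters (curly quotes).
text = raw.decode("utf-8")

# Step 2: re-encode to ASCII, silently dropping anything outside ASCII.
ascii_only = text.encode("ascii", errors="ignore").decode("ascii")

print(ascii_only)
```

Note that this removes every non-ASCII character, not only the quotes, so it is unsuitable when accented letters must survive.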

Can the Encoding API decode a Stream/noncontinuous bytes?

China☆狼群, submitted on 2021-02-05 11:12:26
Question: Usually we can get a string from a byte[] using something like var result = Encoding.UTF8.GetString(bytes); However, I have this problem: my input is an IEnumerable<byte[]> bytes (the implementation can be any structure of my choice). A character is not guaranteed to lie within a single byte[] (for example, a 2-byte UTF8 char can have its 1st byte in bytes[1][length - 1] and its 2nd byte in bytes[2][0]). Is there any way to decode them without merging/copying all the arrays together? UTF8 is main
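
The usual technique for this is a stateful (incremental) decoder that remembers a partial character between chunks. In .NET, the object returned by Encoding.UTF8.GetDecoder() plays that role. The sketch below shows the same idea in Python with the standard codecs module; the function name decode_chunks and the sample chunks are hypothetical, chosen only for illustration:

```python
import codecs
from typing import Iterable, Iterator


def decode_chunks(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Decode byte chunks one at a time without concatenating them."""
    # The incremental decoder buffers an incomplete trailing character
    # and completes it when the next chunk arrives.
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # final=True flushes the decoder; in the default strict mode it raises
    # UnicodeDecodeError if the input ended in the middle of a character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# "é" (C3 A9) and "€" (E2 82 AC) are deliberately split across chunk boundaries.
chunks = [b"caf\xc3", b"\xa9 ", b"\xe2\x82", b"\xac"]
result = "".join(decode_chunks(chunks))
print(result)
```

Only the few bytes of an unfinished character are held back between calls, so memory use does not grow with the total input size.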

Beautiful Soup default decode charset?

旧时模样, submitted on 2021-02-05 08:44:07
Question: I have a huge set of web pages with different encodings, and I am trying to parse them using Beautiful Soup. As I have noticed, BS detects the encoding using meta-charset or xml-encoding tags. But there are documents with no such tags, or with typos in the charset name, and BS fails on all of them. I suppose its default guess is utf-8, which is wrong. Luckily, all such pages (or nearly all of them) have the same encoding. Is there any way to set it as the default? I've also tried to grep charset and use iconv to
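
One way to get a default encoding is to decode the bytes yourself and hand Beautiful Soup an already-decoded str, which it can parse without guessing. The sketch below uses only the standard library; the helper name decode_with_fallback and the windows-1251 default are assumptions for illustration, since the question does not name the actual encoding. The BeautifulSoup constructor also accepts a from_encoding argument, which may be enough when every page shares one encoding.

```python
import codecs
from typing import Optional


def decode_with_fallback(raw: bytes, declared: Optional[str], default: str) -> str:
    """Try the declared charset first, then the site-wide default."""
    candidates = [declared, default] if declared else [default]
    for name in candidates:
        try:
            codecs.lookup(name)      # raises LookupError for a misspelled charset
            return raw.decode(name)  # raises UnicodeDecodeError on a wrong charset
        except (LookupError, UnicodeDecodeError):
            continue
    # Last resort: keep going with replacement characters instead of failing.
    return raw.decode(default, errors="replace")


# Hypothetical page bytes, assumed here to be windows-1251.
page = "Привет".encode("windows-1251")

print(decode_with_fallback(page, None, "windows-1251"))            # no charset tag
print(decode_with_fallback(page, "windoes-1251", "windows-1251"))  # typo in charset
print(decode_with_fallback(page, "utf-8", "windows-1251"))         # wrong charset
```

A single-byte charset that was wrongly declared can still decode without error and produce garbage, so this fallback only catches unknown names and byte sequences that are invalid in the declared encoding.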

Strange results when converting from byte array to string

心已入冬, submitted on 2021-02-05 08:44:06
Question: I get strange results when converting a byte array to a string and then converting the string back to a byte array. Try this: byte[] b = new byte[1]; b[0] = 172; string s = Encoding.ASCII.GetString(b); byte[] b2 = Encoding.ASCII.GetBytes(s); MessageBox.Show(b2[0].ToString()); The result for me is not 172, as I'd expect, but 63. Why does it happen? Answer 1: Why does it happen? Because ASCII only contains values up to 127. When faced with binary data which is invalid for the given encoding, Encoding
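
The same lossy round trip can be reproduced outside .NET. The following is a Python sketch of the effect, not the C# code from the question; Python needs errors="replace" to mimic the substitution behaviour, and it substitutes in two steps (U+FFFD when decoding, then "?" when encoding) rather than one:

```python
original = bytes([172])

# ASCII round trip: 172 is outside 0..127, so decoding substitutes U+FFFD,
# and encoding that character back to ASCII substitutes "?" (code 63).
text = original.decode("ascii", errors="replace")
ascii_round_trip = text.encode("ascii", errors="replace")

# Latin-1 maps every byte value 0..255 to a code point, so nothing is lost.
latin1_round_trip = original.decode("latin-1").encode("latin-1")

print(ascii_round_trip[0])
print(latin1_round_trip[0])
```

A text encoding is the wrong tool for preserving arbitrary binary data; when a string form is needed, a byte-preserving representation such as Base64 or hex avoids the substitution entirely.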