utf-8 | 易学教程

When is padding required for encryption?

Question: I asked a question here ("why AES java decryption return extra characters?") about getting extra characters when I decrypt the encrypted data. Thanks to a comment by user "Ebbe M. Pedersen", I now understand that the problem is that the PHP code and the Android Java code do not use the same padding mechanism. So I changed the Java code to: public class encryption { private String iv = "fedcba9876543210";//Dummy iv (CHANGE IT!) private IvParameterSpec ivspec; private SecretKeySpec keyspec;
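
Padding is required when a block cipher such as AES runs in a mode that processes whole blocks, like ECB or CBC, and the plaintext length is not a multiple of the block size (16 bytes for AES). Modes that turn AES into a stream cipher, such as CTR or GCM, do not need it. Whatever scheme is used, both sides must agree on it: if the decrypting side does not strip exactly the padding the encrypting side added, the leftover padding bytes show up as extra characters. The standard-library Python sketch below does no encryption at all; it only contrasts PKCS#7 padding (what Java's "AES/CBC/PKCS5Padding" actually applies to 16-byte blocks) with NUL-byte zero padding, which is assumed here to be the other side's behaviour (PHP's old mcrypt extension padded that way). The function names are hypothetical.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of block_size.

    A full extra block is added when the data is already aligned, so the
    padding can always be removed unambiguously.
    """
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Strip PKCS#7 padding, rejecting malformed input."""
    if not data or len(data) % block_size:
        raise ValueError("data length is not a multiple of the block size")
    n = data[-1]
    if not 1 <= n <= block_size or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-n]


def zero_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Pad with NUL bytes (the assumed behaviour of the other side)."""
    return data + b"\x00" * (-len(data) % block_size)


if __name__ == "__main__":
    msg = b"hello world"               # 11 bytes, so 5 bytes of padding
    padded = pkcs7_pad(msg)
    print(len(padded), padded[-5:])    # 16 b'\x05\x05\x05\x05\x05'
    print(pkcs7_unpad(padded))         # b'hello world'

    # Mismatch: zero-padded on one side, PKCS#7 expected on the other.
    try:
        pkcs7_unpad(zero_pad(msg))
    except ValueError as exc:
        print("mismatch:", exc)        # mismatch: invalid PKCS#7 padding
```

Conversely, a side that decrypts PKCS#7 data but never calls an unpad step keeps the trailing `\x05` bytes, which is one way "extra characters" appear.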

UTF-8 encoding problems with R

Question: I'm trying to parse statements from the Mexican Senate, but I'm having trouble with the UTF-8 encoding of the web page. This HTML comes through clearly: library(rvest) Senate<-html("http://comunicacion.senado.gob.mx/index.php/informacion/versiones/19675-version-estenografica-de-la-reunion-ordinaria-de-las-comisiones-unidas-de-puntos-constitucionales-de-anticorrupcion-y-participacion-ciudadana-y-de-estudios-legislativos-segunda.html") Here is an example of a bit of the web page: "CONTINÚA EL
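
A frequent cause of this symptom is that the page's UTF-8 bytes get decoded with a single-byte charset such as Windows-1252 somewhere along the way. The question uses R and rvest; the Python sketch below is not an rvest fix, it only illustrates the byte-level mechanism on the "CONTINÚA" sample, assuming Windows-1252 was the wrong charset. The helper names are hypothetical.

```python
def show_mojibake(text: str) -> str:
    """Simulate decoding UTF-8 bytes with the wrong charset (Windows-1252)."""
    return text.encode("utf-8").decode("cp1252")


def repair_mojibake(garbled: str) -> str:
    """Undo the mistake: re-encode as cp1252 to recover the original bytes,
    then decode those bytes correctly as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    original = "CONTINÚA EL"
    garbled = show_mojibake(original)  # "Ú" is C3 9A in UTF-8
    print(garbled)                     # CONTINÃšA EL
    print(repair_mojibake(garbled))    # CONTINÚA EL
```

The repair round trip only works when every byte maps to a cp1252 character; Python's cp1252 codec rejects 0x81, 0x8D, 0x8F, 0x90 and 0x9D, so for example "Á" (C3 81) cannot be garbled or repaired this way. Declaring the correct encoding when the page is read avoids the problem altogether.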

Encoding problem (UTF-8) in PHP

Question: I want to output the following string in PHP: ä ö ü ß € So I encoded it to UTF-8 manually: Ã¤ Ã¶ Ã¼ ÃŸ Â€ So my script is: <?php header('content-type: text/html; charset=utf-8'); echo 'Ã¤ Ã¶ Ã¼ ÃŸ Â€'; ?> The first 4 characters are correct (ä ö ü ß), but unfortunately the € sign isn't: ä ö ü ß Here you can see it. Can you tell me what I've done wrong? My editor (Notepad++) has settings for Encoding (ANSI/UTF-8) and Format (Windows/Unix). Do I have to change them? I hope you
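
The "manual encoding" in the question is what UTF-8 bytes look like when they are displayed as Windows-1252. That happens to work for ä, ö, ü and ß, which are two bytes each, but the euro sign is three bytes in UTF-8 (E2 82 AC), so its Windows-1252 view is "â‚¬", not "Â€". The Python sketch below (Python 3.8+ for `bytes.hex` with a separator; the helper name is hypothetical) just computes these views and shows what "Â€" actually turns into.

```python
def as_cp1252_view(text: str) -> str:
    """Show how the UTF-8 bytes of `text` look when misread as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


if __name__ == "__main__":
    for ch in "äöüß€":
        print(ch, ch.encode("utf-8").hex(" "), as_cp1252_view(ch))
    # ä c3 a4 Ã¤
    # ö c3 b6 Ã¶
    # ü c3 bc Ã¼
    # ß c3 9f ÃŸ
    # € e2 82 ac â‚¬

    # "Â€" saved as Windows-1252 is the bytes C2 80, which UTF-8 decodes
    # to U+0080, an invisible control character rather than the euro sign.
    print(repr("Â€".encode("cp1252").decode("utf-8")))  # '\x80'
```

In practice it is usually simpler to save the PHP file itself as UTF-8 (without BOM) in the editor and type the characters directly instead of hand-encoding them.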

converting binary to utf-8 in python

Question: I have a binary string like this: 1101100110000110110110011000001011011000101001111101100010101000 and I want to convert it to UTF-8. How can I do this in Python? Answer 1: Cleaner version (Python 2): >>> test_string = '1101100110000110110110011000001011011000101001111101100010101000' >>> print ('%x' % int(test_string, 2)).decode('hex').decode('utf-8') نقاب Inverse (from @Robᵩ's comment): >>> '{:b}'.format(int(u'نقاب'.encode('utf-8').encode('hex'), 16)) 1:
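
The answer above relies on Python 2's `'hex'` codec. A Python 3 sketch of the same idea, with hypothetical helper names: group the bits into bytes with `int.to_bytes`, which also keeps leading zero bytes that a `'%x'` hex round trip would drop, then decode the bytes as UTF-8. The inverse writes each encoded byte as eight binary digits.

```python
def bits_to_text(bits: str, encoding: str = "utf-8") -> str:
    """Turn a string of '0'/'1' characters into text by grouping bits into bytes."""
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    data = int(bits, 2).to_bytes(len(bits) // 8, "big")
    return data.decode(encoding)


def text_to_bits(text: str, encoding: str = "utf-8") -> str:
    """Inverse: encode the text and write each byte as 8 binary digits."""
    return "".join(f"{byte:08b}" for byte in text.encode(encoding))


if __name__ == "__main__":
    bits = "1101100110000110110110011000001011011000101001111101100010101000"
    print(bits_to_text(bits))               # نقاب
    print(text_to_bits("نقاب") == bits)     # True
```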

Convert QString into QByteArray with either UTF-8 or Latin1 encoding

Question: I would like to convert a QString into either a UTF-8 or a Latin-1 QByteArray, but today I get everything as UTF-8. I am testing this with characters in the upper half of Latin-1, above 0x7F; the German ü is a good example. If I do this: QString name("\u00fc"); // U+00FC = ü QByteArray utf8; utf8.append(name); qDebug() << "utf8" << name << utf8.toHex(); QByteArray latin1; latin1.append(name.toLatin1()); qDebug() << "Latin1" << name << latin1.toHex(); QTextCodec *codec =
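
What distinguishes the two results is the bytes each encoding produces for U+00FC: two bytes (C3 BC) in UTF-8 and one byte (FC) in Latin-1. In Qt these correspond to `QString::toUtf8()` and `QString::toLatin1()`; the Qt 5 documentation describes `QByteArray::append(const QString &)` as converting with `toUtf8()`, which would explain why the first approach yields UTF-8. The Python sketch below (hypothetical helper, not Qt code) only shows the expected hex dumps to compare against `toHex()`.

```python
def encoded_hex(text: str) -> dict:
    """Return the hex dump of `text` in UTF-8 and in Latin-1 (ISO-8859-1)."""
    return {
        "utf8": text.encode("utf-8").hex(),
        # Latin-1 covers only U+0000..U+00FF; anything higher raises an error.
        "latin1": text.encode("latin-1").hex(),
    }


if __name__ == "__main__":
    print(encoded_hex("\u00fc"))  # {'utf8': 'c3bc', 'latin1': 'fc'}
```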

UnicodeDecodeError: 'utf-8' codec can't decode byte error

Question: I'm trying to get a response from urllib and decode it into a readable format. The text is in Hebrew and also contains characters like { and /. The coding declaration at the top of the file is: # -*- coding: utf-8 -*- The raw string is: b'\xff\xfe{\x00 \x00\r\x00\n\x00"\x00i\x00d\x00"\x00 \x00:\x00 \x00"\x001\x004\x000\x004\x008\x003\x000\x000\x006\x004\x006\x009\x006\x00"\x00,\x00\r\x00\n\x00"\x00t\x00i\x00t\x00l\x00e\x00"\x00 \x00:\x00 \x00"\x00\xe4\x05\xd9\x05\xe7\x05\xd5\x05\xd3\x05 \x00\xd4\x05\xe2\x05\xd5\x05\xe8\x05\xe3
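
The raw bytes begin with FF FE, the byte-order mark of little-endian UTF-16, and every ASCII character is followed by a 00 byte, so the response is UTF-16 rather than UTF-8; the `# -*- coding: utf-8 -*-` line only describes the source file, not the downloaded data. A Python 3 sketch, assuming the whole body carries such a BOM: the `utf-16` codec reads the BOM, picks the byte order and strips it. The helper name and the shortened sample (built from the Hebrew bytes in the question) are illustrative.

```python
def decode_response(raw: bytes) -> str:
    """Pick a codec from the byte-order mark, falling back to UTF-8.

    UTF-32 BOMs are not handled in this sketch.
    """
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return raw.decode("utf-16")      # the BOM tells the codec the byte order
    return raw.decode("utf-8-sig")       # also strips a UTF-8 BOM if present


if __name__ == "__main__":
    sample = b'\xff\xfe"\x00\xe4\x05\xd9\x05\xe7\x05\xd5\x05\xd3\x05"\x00'
    try:
        sample.decode("utf-8")           # 0xFF can never start a UTF-8 sequence
    except UnicodeDecodeError as exc:
        print("utf-8 fails:", exc.reason)
    print(decode_response(sample))       # "פיקוד"
```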

How to convert utf-8 fancy quotes to neutral quotes

Question: I'm writing a little Python script that parses Word docs and writes to a CSV file. However, some of the docs contain UTF-8 characters that my script can't process correctly. Fancy quotes show up quite often (u'\u201c'). Is there a quick, easy (and smart) way of replacing those with the neutral ASCII quotes, so that I can just write line.encode('ascii') to the CSV file? I have tried to find the left quote and replace it: val = line.find(u'\u201c') if val >= 0: line[val] = '"' But to
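
Python strings are immutable, so assigning to `line[val]` raises a `TypeError`; the replacement has to build a new string. A Python 3 sketch using `str.maketrans` and `str.translate`; the set of characters mapped here (curly single and double quotes) is an assumption and can be extended, and the names are hypothetical.

```python
# Map typographic ("smart") quotes to their ASCII equivalents.
QUOTE_TABLE = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def neutralize_quotes(line: str) -> str:
    """Return a copy of `line` with curly quotes replaced by ASCII quotes."""
    return line.translate(QUOTE_TABLE)


if __name__ == "__main__":
    text = "\u201cIt\u2019s fine,\u201d she said."
    plain = neutralize_quotes(text)
    print(plain)                  # "It's fine," she said.
    data = plain.encode("ascii")  # no longer raises UnicodeEncodeError
```

`translate` handles every occurrence in one pass, whereas `find` only locates the first one.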

Decoding if it's not unicode

Question: I want my function to take an argument that could be either a unicode object or a UTF-8 encoded string. Inside my function, I want to convert the argument to unicode. I have something like this: def myfunction(text): if not isinstance(text, unicode): text = unicode(text, 'utf-8') ... Is it possible to avoid using isinstance? I was looking for something more duck-typing friendly. During my experiments with decoding, I have run into several weird behaviours of Python. For instance: >>> u'hello'
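
The question concerns Python 2's `str`/`unicode` split, where `unicode` objects also have a `.decode()` method (part of the odd behaviour mentioned). In Python 3 the corresponding types are `bytes` and `str`, and only the byte types have `.decode()`, so a duck-typed version can try to decode and fall back on `AttributeError`. This is a Python 3 sketch, not a Python 2 solution, and `to_text` is a hypothetical name.

```python
def to_text(value, encoding: str = "utf-8"):
    """Return `value` as str, decoding it first if it is bytes-like.

    EAFP style: anything with a .decode() method (bytes, bytearray) is
    decoded; anything without one (e.g. an existing str) is returned unchanged.
    """
    try:
        decode = value.decode
    except AttributeError:
        return value
    return decode(encoding)


if __name__ == "__main__":
    print(to_text(b"caf\xc3\xa9"))  # café
    print(to_text("café"))          # café
```

Looking up `value.decode` before calling it keeps a genuine `UnicodeDecodeError` from malformed bytes visible instead of letting the `except` clause swallow it.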

Protocol buffers and UTF-8

Question: The history of encoding schemes, multiple operating systems and endianness has led to a mess when it comes to encoding all forms of string data (i.e., all alphabets); for this reason protocol buffers only deal with ASCII or UTF-8 in their string types, and I can't see any polymorphic overloads that accept the C++ wstring. The question, then, is how one is expected to get a UTF-16 string into a protocol buffer. Presumably I need to keep the data as a wstring in my application code and then
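
Protocol buffers' `string` fields are specified to hold UTF-8 or 7-bit ASCII text, so one common approach is to convert UTF-16 data to UTF-8 at the boundary, right before setting the field, and back after reading it (declaring the field as `bytes` and storing raw UTF-16 is the other option). The Python sketch below shows only that conversion with the standard codecs; it assumes little-endian UTF-16 without a BOM, uses no protobuf API, and its function names are hypothetical.

```python
def utf16le_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16LE bytes as UTF-8, e.g. before filling a string field."""
    return data.decode("utf-16-le").encode("utf-8")


def utf8_to_utf16le(data: bytes) -> bytes:
    """Inverse conversion, for reading the field back into UTF-16 form."""
    return data.decode("utf-8").encode("utf-16-le")


if __name__ == "__main__":
    # "ü" (U+00FC) plus an emoji outside the BMP, stored as a surrogate pair.
    wide = "\u00fc\U0001f600".encode("utf-16-le")
    print(wide.hex())            # fc003dd800de
    narrow = utf16le_to_utf8(wide)
    print(narrow.hex())          # c3bcf09f9880
    print(utf8_to_utf16le(narrow) == wide)  # True
```

Decoding through a real text type, rather than copying code units, is what turns the surrogate pair into a single 4-byte UTF-8 sequence.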