character-encoding | 易学教程

Parse XML in Python with encoding other than utf-8

Question: Any clue on how to parse XML in Python that declares encoding='Windows-1255'? The lxml.etree parser, at least, won't even look at the string when the XML header carries an "encoding" declaration other than "utf-8" or "ASCII". Running the following code fails with: ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

    from lxml import etree
    parser = etree.XMLParser(encoding='utf-8')
    def convert_xml_to_utf8(xml_str):
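The error message itself points at a workaround: hand the parser bytes instead of a Unicode string, so the parser can honor the declared encoding. A minimal sketch of that idea follows, using the standard-library xml.etree.ElementTree rather than lxml so that it runs without extra packages. The sample document and the names xml_text and xml_bytes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the declaration names a non-UTF-8 encoding.
xml_text = (
    '<?xml version="1.0" encoding="Windows-1255"?>'
    "<greeting>\u05e9\u05dc\u05d5\u05dd</greeting>"
)

# Encode the text with the encoding named in the declaration, so the bytes
# and the declaration agree, then let the parser decode the bytes itself.
xml_bytes = xml_text.encode("windows-1255")
root = ET.fromstring(xml_bytes)

print(root.tag, root.text)
```

If the XML arrives from a file or the network, it is usually simpler to read it as bytes in the first place and never decode it by hand.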

How to open Arabic text file with correct encoding in Visual Studio

Question: I have a C# file that contains some Arabic text. I got the file from another source, and the Arabic text is now scrambled, looking like this: "ÇáãæÇÞÚ ÇáÞÇÈáÉ ááÊØæíÑ ÇáÓíÇÍì". I tried saving the file in another encoding (UTF-8), but got the same result. I desperately need to read this Arabic text, as this is the only backup we have. Thanks.

Answer 1: Try right-clicking the file in the VS Solution Explorer, then choose Open With... -> CSharp Editor with Encoding. This should force VS to read the file with a
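A minimal Python sketch of what probably happened to the text, assuming the file holds Windows-1256 (Arabic) bytes that were decoded as Windows-1252. Both code pages are assumptions; the question does not confirm them. The helper name repair_mojibake is hypothetical.

```python
# The scrambled sample quoted in the question.
garbled = "ÇáãæÇÞÚ ÇáÞÇÈáÉ ááÊØæíÑ ÇáÓíÇÍì"


def repair_mojibake(text, wrong="cp1252", right="cp1256"):
    """Undo a wrong decode: recover the raw bytes, then decode them properly.

    `wrong` is the code page assumed to have been applied by mistake and
    `right` is the code page assumed to be the real one.
    """
    return text.encode(wrong).decode(right)


repaired = repair_mojibake(garbled)
print(repaired)
```

This only works while the original bytes are still recoverable. Re-saving the scrambled text as UTF-8, as tried in the question, keeps the wrong characters and simply stores them in a different encoding.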

How to identify character encoding from website?

Question: What I'm trying to do: I get a list of URIs from a database, download them, remove the stopwords, count how often each word appears in the web page, and then try to save the result in MongoDB. The problem: when I try to save the result in the database, I get the error bson.errors.invalidDocument: the document must be a valid utf-8. It appears to be related to byte sequences such as '\xc3someotherstrangewords' and '\xe2something'. When I'm processing the web pages I try to remove the punctuation,

Is it possible to “sniff” the Character encoding?

Question: I have a web page that accepts CSV files. These files may be created in a variety of places. As far as I know, there is no way to specify the encoding inside a CSV file, so I cannot reliably treat all of them as UTF-8 or any other encoding. Is there a way to intelligently guess the encoding of the CSV I am getting? I am working with Python, but I am willing to work with language-agnostic methods too.

Answer 1: There is no correct way to determine the encoding of a file by looking at only the file itself, but you
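One common heuristic, sketched here with the standard library only, is to try a short list of candidate encodings in order and keep the first one that decodes without error. The function name guess_decode and the candidate list are assumptions for illustration. The result is a guess, not a detection: many single-byte encodings will decode almost any input without complaint.

```python
def guess_decode(data, candidates=("utf-8-sig", "cp1252")):
    """Return (text, encoding_name) for the first candidate that decodes.

    Strict encodings such as UTF-8 should come first, because they reject
    most input that was not written in them. Latin-1 is the last resort
    since it maps every byte to a character and therefore never fails.
    """
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    return data.decode("latin-1"), "latin-1"


text, used = guess_decode("naïve,1\n".encode("cp1252"))
print(used, repr(text))
```

The candidate order encodes a policy choice: which legacy code page to fall back on depends on where the uploaded files are expected to come from.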

Store Gtk.Textbuffer in SQL database. Encoding troubles

Question: I'm working on a note-taking app using python2/Gtk3/Glade. The notes are stored in a MySQL database and displayed in a TextView widget. I can load, store, and display plain text fine. However, I want the ability to add images to the note page and store them in the database, so the data has to be serialised, and I'm having some trouble figuring out how to encode and decode the serialised data going in and out of the database. I'm getting unicode start byte errors. If I was working with files I could just

Cleaning SQL “Incorrect string value” Error from PHP

Question: I've seen this question a million times, but everyone seems to want to solve the problem in the database. I do not. I'm getting this error when parsing a large text file, picking out what I need, and inserting it into my database. Out of 24 thousand rows or so, about 30 have invalid characters in them. Here is an example of the error, followed by the query that caused it: [Query Error: Incorrect string value: '\xEF\xBC\x89' for column 'company' at row 1] [INSERT INTO mac_address_db_new (hex

Unable to change encoding of text files in Windows

Question: I have some text files with different encodings. Some of them are UTF-8 and others are windows-1251 encoded. I tried to run the following recursive script to convert them all to UTF-8:

    Get-ChildItem *.nfo -Recurse | ForEach-Object {
        $content = $_ | Get-Content
        Set-Content -PassThru $_.Fullname $content -Encoding UTF8 -Force
    }

After that I am unable to use the files in my Java program, because the files that were already UTF-8 now have the wrong encoding too, and I couldn't get the original text back. In case of windows-1251

UnicodeEncodeError: 'gbk' codec can't encode character '\ue13b' in position 25: illegal multibyte sequence

Question: Error: UnicodeEncodeError: 'gbk' codec can't encode character '\ue13b' in position 25: illegal multibyte sequence. The file is encoded as UTF-8, and it contains an unrecognized character that shows up when the file is read: ‘左足趾麻木’.

Code:

    for line in open(label_filepath, encoding='utf-8'):
        print(line)

Answer 1: The error happens when Python tries to print. When printing, that is, writing to sys.stdout, Python encodes the text with the encoding expected by the terminal. In this case the
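The failure can be reproduced without a terminal by encoding the text to GBK directly. The sketch below shows that, plus one possible way to make such text printable: escaping whatever GBK cannot represent. The helper name gbk_safe and the sample line are hypothetical, and the position of '\ue13b' in the sample is made up.

```python
def gbk_safe(text):
    """Round-trip text through GBK, escaping characters GBK cannot encode.

    The "backslashreplace" error handler turns an unencodable character
    into its \\uXXXX escape instead of raising UnicodeEncodeError.
    """
    return text.encode("gbk", errors="backslashreplace").decode("gbk")


# Hypothetical line: the sample text from the question followed by the
# private-use character named in the error message.
line = "左足趾麻木\ue13b"

try:
    line.encode("gbk")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(strict_failed)
print(gbk_safe(line))
```

An alternative on Python 3.7 or later is to change the stream rather than the text, for example with sys.stdout.reconfigure(encoding="utf-8") or the PYTHONIOENCODING environment variable, provided the terminal can actually display UTF-8.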

Determining text file encoding schema

Question: I am trying to create a method that can detect the encoding scheme of a text file. I know there are many out there, but I know for sure my text file will be either ASCII, UTF-8, or UTF-16. I only need to detect these three. Does anyone know a way to do this?

Answer 1: Use the StreamReader to identify the encoding. Example:

    using (var r = new StreamReader(filename, Encoding.Default))
    {
        richtextBox1.Text = r.ReadToEnd();
        var encoding = r.CurrentEncoding;
    }

Answer 2: First, open the file in binary mode and
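For comparison with the C# answer, here is a rough Python sketch of telling the three candidates apart from the raw bytes. It is not a reconstruction of either answer. It assumes that UTF-16 files start with a byte order mark; BOM-less UTF-16 is deliberately not handled. The function name detect_encoding is hypothetical.

```python
import codecs


def detect_encoding(data):
    """Classify bytes as 'utf-8-sig', 'utf-16', 'ascii', or 'utf-8'.

    Assumption: UTF-16 input always carries a BOM. Anything that matches
    none of the candidates raises ValueError instead of being guessed.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        raise ValueError("not ASCII, UTF-8, or BOM-marked UTF-16") from None


print(detect_encoding(b"hello"))
print(detect_encoding("h\u00e9llo".encode("utf-8")))
print(detect_encoding("hello".encode("utf-16")))
```

The returned names are valid Python codec names, so the result can be passed straight to open(path, encoding=...).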