encoding | 易学教程

Detect encoding in wrongly encoded UTF-8 text file

阅读更多关于 Detect encoding in wrongly encoded UTF-8 text file

问题 I have an encoding issue. I have millions of text files that I need to parse for a language data science project. Each text file is encoded as UTF-8, but I just found that some of these source files are not encoded properly. For example. I have a Chinese text file, that is encoded as UTF-8, but text in the file looks like this: Subject: »Ø¸´: ÎÒÉý¼¶µ½ When I use Python to detect the encoding of this Chinese text file: Chardet tells me the file is encoded as UTF-8: with open(path,'rb') as f:

Thai script seems to lose UTF-8 encoding in java for-each loop

阅读更多关于 Thai script seems to lose UTF-8 encoding in java for-each loop

问题 I'm trying to develop an application within Android Studio on Windows 10. PROBLEM: The following string array of Thai words: String[] myTHarr = {"มาก","เชี่ยว","แน่","ม่อน","บ้าน","พูด","เลื่อย","เมื่อ","ช่ำ","แร่"}; ...when processed by the following for-each loop: for (String s:myTHarr){ //s = à¸¡à¸²à¸� before executing any of the below code: byte[] utf8EncodedThaiArr = s.getBytes("UTF-8"); String utf8EncodedThai = new String(utf8EncodedThaiArr); //setting breakpoint here // s is still à¸¡à

Questions about Base64 encoding

阅读更多关于 Questions about Base64 encoding

问题 I have 3 questions about base64: 1) The base64 encoding purpose is binary-to-text. Isn't text going to be sent through web as binary? Then what is good for? 2) In past they used 7-bit communication systems, now it's 8-bit. Then why we still using it now? 3) How it increases the size? I just take 3-bytes with 28-bit and rearrange them to 4-bytes of 6-bit but in total they still 28-bit? 回答1: 1) The purpose is not only binary to text encoding, but also to encode text which uses specific

How to solve “a bytes-like object is required, not 'str'” in create_message() function?

阅读更多关于 How to solve “a bytes-like object is required, not 'str'” in create_message() function?

问题 I'm getting an error in creating a new message using create_message(). function listed over https://developers.google.com/gmail/api/guides/drafts. def create_message(sender, to, subject, message_text): message = MIMEText(message_text) message['to'] = to message['from'] = sender message['subject'] = subject return {'raw': base64.urlsafe_b64encode(message.as_string())} Error: TypeError: a bytes-like object is required, not 'str' 回答1: base64.urlsafe_b64encode expects bytes , but the type of

Decoding “=C3=A4” in a string

阅读更多关于 Decoding “=C3=A4” in a string

问题 I tried a lot of different things to get my string correctly displayed but I can't make it work. That's the string: f=C3=A4hrt (German word: fährt) My file is encoded in utf-8, the file is loaded within Joomla. I tried both $geschichte->inhalt = utf8_encode($geschichte->inhalt); and $geschichte->inhalt = mb_convert_encoding($geschichte->inhalt, "UTF-8"); but nothing works. I hope someone can help me... 回答1: This encoding has nothing to do with UTF-8 or such, it looks like quoted printable

Scrapy exporting weird symbols into csv file

阅读更多关于 Scrapy exporting weird symbols into csv file

问题 Ok, so here's the issue. I'm a beginner who has just started to delve into scrapy/python. I use the code below to scrape a website and save the results into a csv. When I look in the command prompt, it turns words like Officiële into Offici\xele. In the csv file, it changes it to officiÃ«le. I think this is because it's saving in unicode instead of UTF-8? I however have 0 clue how to change my code, and I've been trying all morning so far. Could anyone help me out here? I'm specifically

Unicode String in urllib.request [duplicate]

阅读更多关于 Unicode String in urllib.request [duplicate]

问题 This question already has answers here : UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' - -when using urlib.request python3 (2 answers) Closed 1 year ago . The short version: I have a variable s = 'bär' . I need to convert s to ASCII so that s = 'b%C3%A4r' . Long version: I'm using urllib.request.urlopen() to read an mp3 pronunciation file from URL. This has worked very well, except I ran into a problem because the URLs often contain unicode characters. For example, the

Open a word document and specify encoding with PowerShell

阅读更多关于 Open a word document and specify encoding with PowerShell

问题 I'm trying to tell PowerShell to open a text file and choose a certain encoding option. By default, when opening this text file in Word manually, it tries to open it with Japanese encoding and so doesn't show certain characters correctly. I've tried lots of different things but nothing works so I'm totally stuck. This text file, amongst others, needs to be converted to PDF on a daily basis. My current script is as follows: $wdFormatPDF = 17 $word = New-Object -ComObject Word.Application $word

Open a word document and specify encoding with PowerShell

阅读更多关于 Open a word document and specify encoding with PowerShell

Read file from multiple encoding in c# [duplicate]

阅读更多关于 Read file from multiple encoding in c# [duplicate]

问题 This question already has answers here : How can I detect the encoding/codepage of a text file (20 answers) How to use ReadAllText when file encoding unknown (2 answers) Closed 5 years ago . ENV: C#, VStudio 2013, 4.5 Framework, Winforms, nHapi 2.3 dll I really need help on this. I have tried soo many things and did alot of research with my best friend google ;-). But no luck. I'm building a HL7 sender tools and I'm reading files from a folder. My files come from multiple sources and I found