encoding

Python read from file and remove non-ascii characters

时间秒杀一切 posted on 2020-01-01 17:11:10
Question: I have the following program that reads a file word by word and writes each word to another file, but without the non-ASCII characters from the first file.

    import unicodedata
    import codecs

    infile = codecs.open('d.txt','r',encoding='utf-8',errors='ignore')
    outfile = codecs.open('d_parsed.txt','w',encoding='utf-8',errors='ignore')
    for line in infile.readlines():
        for word in line.split():
            outfile.write(word+" ")
        outfile.write("\n")
    infile.close()
    outfile.close()

The only problem that I am
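
One common way to drop non-ASCII characters is to encode each word to ASCII with `errors="ignore"` and decode it back. The following is a minimal sketch under that assumption, not necessarily the fix the asker ended up with; the helper names `strip_non_ascii` and `copy_ascii_only` are hypothetical.

```python
def strip_non_ascii(text):
    """Return text with every character outside the ASCII range dropped."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def copy_ascii_only(src_path, dst_path):
    """Copy src_path to dst_path word by word, keeping only ASCII characters."""
    with open(src_path, encoding="utf-8", errors="ignore") as infile, \
         open(dst_path, "w", encoding="ascii") as outfile:
        for line in infile:
            words = (strip_non_ascii(word) for word in line.split())
            # Words that consisted only of non-ASCII characters become empty; skip them.
            outfile.write(" ".join(w for w in words if w) + "\n")
```

Calling `copy_ascii_only('d.txt', 'd_parsed.txt')` would mirror the file names used in the question. Note that the original `codecs.open(..., errors='ignore')` only skips bytes that are invalid UTF-8; it does not remove valid non-ASCII characters, which is why the explicit ASCII step is needed.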

Redirect binary data from Process.StandardOutput causes corrupted data

谁说胖子不能爱 posted on 2020-01-01 15:09:10
Question: On top of this problem, I have another. I am trying to get binary data from an external process, but the data (an image) seems to be corrupted. The screenshot below shows the corruption: the left image was produced by running the program on the command line, the right one from code. My code so far:

    var process = new Process
    {
        StartInfo =
        {
            Arguments = string.Format(@"-display"),
            FileName = configuration.PathToExternalSift,
            RedirectStandardError = true,
            RedirectStandardInput = true,
            RedirectStandardOutput =
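
A likely cause of this kind of corruption is that the output passes through a text decoder: in .NET, `Process.StandardOutput` is a `StreamReader`, which turns bytes into characters. The sketch below reproduces the effect in Python rather than C#, as an analogy only. The child process is a hypothetical stand-in for the external program, and UTF-8 is an assumed decoder.

```python
import subprocess
import sys

# Hypothetical stand-in for the external program: a child Python process
# that writes all 256 byte values to its stdout.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(bytes(range(256)))"]

# Read stdout as raw bytes (no text=True, no encoding=...): nothing is altered.
raw = subprocess.run(child, stdout=subprocess.PIPE, check=True).stdout

# Simulate a text-mode reader: decode the bytes as UTF-8, then re-encode them.
# Byte sequences that are not valid UTF-8 are replaced, so the round trip is lossy.
via_text = raw.decode("utf-8", errors="replace").encode("utf-8")

print(raw == bytes(range(256)))  # True
print(via_text == raw)           # False
```

The design point is to keep binary output as bytes end to end. In the C# setting of the question, the equivalent would presumably be to read from the underlying byte stream instead of the text reader.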

Python 3.4 hex to Japanese Characters

不羁的心 posted on 2020-01-01 14:21:07
Question: I am currently writing a script to pull information off my site, which contains Japanese characters. So far the script pulls the data off the site. It is returned as a string: "\xe5\xb9\xb4\xe3\x81\xab\xe4\xb8\x80\xe5\xba\xa6\xe3\x81\xae\xe6\x99\xb4\xe3\x82\x8c\xe5\xa7\xbf" Using an online hex-to-text tool, I am given: 年に一度の晴れ姿 I know this phrase is correct, but my question is: how do I convert it in Python? When I run something like: name = "\xe5\xb9\xb4\xe3\x81\xab\xe4\xb8\x80\xe5
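
A minimal sketch of one way to do the conversion, assuming the value is a Python 3 `str` in which each `\xNN` escape became one code point rather than one byte. In that case the text is UTF-8 that was never decoded, and a Latin-1 round trip recovers the bytes:

```python
# The string from the question: each "\xNN" escape is one code point
# in the range U+0000..U+00FF instead of one byte.
name = ("\xe5\xb9\xb4\xe3\x81\xab\xe4\xb8\x80\xe5\xba\xa6"
        "\xe3\x81\xae\xe6\x99\xb4\xe3\x82\x8c\xe5\xa7\xbf")

# Latin-1 maps code points 0..255 to the same byte values, so encoding with it
# recovers the original bytes; decoding those bytes as UTF-8 yields the text.
decoded = name.encode("latin-1").decode("utf-8")
print(decoded)  # 年に一度の晴れ姿
```

If the data is available as `bytes` in the first place (for example, the raw HTTP response body), calling `.decode("utf-8")` on it directly avoids the detour.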

Saving a Binary tree to a file [closed]

我的梦境 posted on 2020-01-01 12:37:11
Question: Closed 6 years ago as off-topic; this question is not currently accepting answers. I have a non-balanced (not binary-search) binary tree. I need to encode (and later decode) it to a text file. How can I do it in an efficient way? I found this link, which talks about a similar (the same) problem, but it is obvious for me
Answer 1: Please look at this on LeetCode. I like this solution because it's relatively
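
One widely used approach, sketched here under assumptions and not necessarily the LeetCode solution the answer refers to, is a preorder traversal that writes an explicit marker for every missing child. The `Node` class, the `#` marker, and integer node values are all hypothetical choices made for illustration.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def serialize(root):
    """Encode a tree as space-separated preorder tokens; '#' marks an empty child."""
    tokens = []

    def visit(node):
        if node is None:
            tokens.append("#")
            return
        tokens.append(str(node.value))
        visit(node.left)
        visit(node.right)

    visit(root)
    return " ".join(tokens)


def deserialize(text):
    """Rebuild the tree from the output of serialize (values are read back as int)."""
    tokens = iter(text.split())

    def build():
        token = next(tokens)
        if token == "#":
            return None
        node = Node(int(token))
        node.left = build()
        node.right = build()
        return node

    return build()
```

For example, `serialize(Node(1, Node(2, None, Node(4)), Node(3)))` gives `"1 2 # 4 # # 3 # #"`, which can be written to a text file as is. Both directions take time and space linear in the number of nodes. Because the helpers are recursive, a very deep, unbalanced tree could hit Python's recursion limit, so an iterative version may be needed there.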

Encoding of AVMetadataItem

末鹿安然 posted on 2020-01-01 12:11:21
Question: I have an AVMetadataItem which has fields encoded in CP1251 (Cyrillic). After reading item.stringValue I get garbage: an incorrectly encoded string. I've tried converting that string to raw UTF-8 and then creating a new string using the CP1251 encoding, with no luck; the result is nil. I tried taking item.dataValue, but no dice: it contains raw plist data (starting with bplist...). Any ideas are much appreciated. Thanks in advance.
Answer 1: Swift 2.0 solution:

    let origTitleMeta: NSData = (<AVMetadataItem>

Foreign characters and LDAP. What encoding/charset does LDAP expect?

ε祈祈猫儿з posted on 2020-01-01 10:07:33
Question: I am parsing XML with simplexml_load_string() and using the data within it to update Active Directory (AD) objects via LDAP. Example XML (simplified):

    <?xml version="1.0" encoding="UTF-8"?>
    <users>
      <user>Bìlbö Bággįnš</user>
      <user>Gãńdåłf Thê Gręât</user>
      <user>Śām Wīšë</user>
    </users>

I first run an ldap_search() to find a single user and then proceed to change their attributes. Pumping the above values straight into AD, using LDAP, will result in some pretty mangled characters showing

PHP: Shorter/obscured encoding for a URL embedded in another URL?

我们两清 posted on 2020-01-01 10:04:06
Question: I'm writing myself a script which basically lets me send a URL and two integer dimensions in the query string of a single GET request. I'm using base64 to encode it, but it's pretty damn long and I'm concerned the URL may get too big. Does anyone know an alternative, shorter method of doing this? It needs to be decodable when received in a GET request, so MD5/SHA-1 are not possible. Thanks for your time. Edit: Sorry, I should have explained better. OK, on our site we display screenshots of
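
One option that stays reversible is to compress the payload before encoding it with the URL-safe base64 alphabet and dropping the padding. The sketch below is in Python rather than PHP and rests on assumptions: the helper names `pack` and `unpack`, the `WIDTHxHEIGHT|URL` layout, and the one-byte mode prefix are all hypothetical.

```python
import base64
import zlib


def pack(url, width, height):
    """Pack a URL and two dimensions into one URL-safe token."""
    payload = f"{width}x{height}|{url}".encode("utf-8")
    compressed = zlib.compress(payload, 9)
    # Compression only pays off on longer inputs; keep whichever form is
    # shorter and record the choice in a one-byte prefix.
    body = b"z" + compressed if len(compressed) < len(payload) else b"r" + payload
    return base64.urlsafe_b64encode(body).rstrip(b"=").decode("ascii")


def unpack(token):
    """Reverse pack(): return (url, width, height)."""
    padded = token + "=" * (-len(token) % 4)
    body = base64.urlsafe_b64decode(padded)
    payload = zlib.decompress(body[1:]) if body[:1] == b"z" else body[1:]
    dims, url = payload.decode("utf-8").split("|", 1)
    width, height = dims.split("x")
    return url, int(width), int(height)
```

Base64 always adds roughly a third to the size of its input, so the gain comes only from the compression step and is small or absent for short URLs. If the token must be much shorter than the URL itself, a server-side lookup table that maps a short ID to the stored URL is the usual alternative, at the cost of keeping state.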

Assigning Arabic text to R variables

↘锁芯ラ posted on 2020-01-01 10:02:46
Question: R doesn't display Arabic text correctly. I get very weird output when I use Arabic. Here's a screenshot: The problem is that I want to create a word cloud with Arabic text and I need to solve this problem first. R version: R 2.15.2 GUI 1.53 Leopard build 64-bit (6335). Here is more info:

    > options("encoding")
    $encoding
    [1] "native.enc"

    > Encoding("الله")
    [1] "unknown"

sessionInfo():

    > sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

    locale:
    [1] C/C

Incorrect encoding after redirecting `dir` output to a file

感情迁移 posted on 2020-01-01 09:46:14
Question: I run this code in Windows cmd.exe in Europe with the local settings for my language, so I use diacritics in directory names. When I list the directory names, they are displayed correctly. Then I save them to a file, but when I open it in Notepad, the diacritics are not readable: for example, instead of Střední Čechy I get Stýednˇ ¬echy. What did I do wrong and how can I correct it?

    @echo off
    del directories.conf
    FOR /F "delims=!" %%R IN ('dir * /b /a:d /o:n') DO (
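
The garbled name in the question is consistent with a code-page mismatch: text written in the console (OEM) code page and then read in the Windows (ANSI) code page. The Python sketch below reproduces exactly that output. It assumes CP852 as the OEM code page and Windows-1250 as the ANSI one, which is typical for Central European locales but is not stated in the question.

```python
name = "Střední Čechy"

# cmd.exe writes redirected output in the console (OEM) code page;
# CP852 is assumed here as the Central European OEM code page.
oem_bytes = name.encode("cp852")

# Notepad interpreting those bytes in the ANSI code page (assumed Windows-1250).
garbled = oem_bytes.decode("cp1250")
print(garbled)  # Stýednˇ ¬echy

# Decoding with the code page that actually produced the bytes recovers the text.
print(oem_bytes.decode("cp852"))  # Střední Čechy
```

Under these assumptions nothing is lost in the file; the bytes are only being interpreted with the wrong table. Making the writer and the reader agree on one code page (for instance, switching the console code page with `chcp` before running the batch file) would be the direction to investigate.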