byte-order-mark

Working with utf-8 files in Eclipse

可紊 submitted on 2019-11-27 01:51:13
Question: Quite straightforward question. Is there a way to configure Eclipse to work with text files encoded as UTF-8, with and without the BOM? So far I've used Eclipse with UTF-8 encoding and it works, but when I try to edit a file generated by another editor that includes the BOM, Eclipse doesn't handle it properly: it shows an invisible character at the beginning of the file (the BOM). Is there a way to make Eclipse understand UTF-8 encoded files with a BOM? Answer 1: Both bug 78455 ("Provide an

Encoding a string as UTF-8 with BOM in PHP

久未见 submitted on 2019-11-27 01:18:35
Question: How can I force PHP to add the BOM when using utf8_encode? Here's what I am trying to do: $zip->addFromString($filename, utf8_encode($xml)); Unfortunately (for me), the result will not have the BOM at the beginning. Answer 1: Have you tried adding one yourself? The UTF-8 BOM is the byte sequence 0xEF 0xBB 0xBF, so you can attach it to your string after conversion to UTF-8: $utf8_with_bom = chr(239) . chr(187) . chr(191) . $utf8_string; Watch out, though: utf8_encode wants an ISO-8859-1 string. If
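The same trick translates directly to other languages. Here is a hedged Python sketch (standard library only, not the questioner's PHP): the three BOM bytes can be prepended by hand, or produced automatically by the utf-8-sig codec.

```python
import codecs

text = "Grüße"

# The UTF-8 BOM is the byte sequence 0xEF 0xBB 0xBF, exposed in the
# standard library as codecs.BOM_UTF8.
utf8_with_bom = codecs.BOM_UTF8 + text.encode("utf-8")

# The 'utf-8-sig' codec prepends the same three bytes automatically.
assert utf8_with_bom == text.encode("utf-8-sig")
```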

How to GetBytes() in C# with UTF8 encoding with BOM?

◇◆丶佛笑我妖孽 submitted on 2019-11-27 00:55:43
I'm having a problem with UTF-8 encoding in my ASP.NET MVC 2 application in C#. I'm trying to let the user download a simple text file built from a string. I am trying to get the byte array with the following line: var x = Encoding.UTF8.GetBytes(csvString); but when I return it for download using: return File(x, ..., ...); I get a file without a BOM, so Croatian characters don't show up correctly. This is because my byte array does not include the BOM after encoding. I tried inserting those bytes manually and then it shows up correctly, but that's not the best way to do it. I also tried creating

Export UTF-8 BOM to .csv in R

好久不见. submitted on 2019-11-26 22:59:19
Question: I am reading a file through RJDBC from a MySQL database and it correctly displays all letters in R (e.g., נווה שאנן). However, even when exporting it using write.csv with fileEncoding="UTF-8", the output looks like <U+0436>.<U+043A>. <U+041B><U+043E><U+0437><U+0435><U+043D><U+0435><U+0446> (in this case this is not the string above but a Bulgarian one) for Bulgarian, Hebrew, Chinese and so on. Other special characters like ã, ç, etc. work fine. I suspect this is because of the UTF-8 BOM, but I did not
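For comparison, here is a minimal Python sketch (not the R code from the question; the filename out.csv is made up) of writing a CSV with a BOM, which is what spreadsheet tools often rely on to detect UTF-8:

```python
# Opening the output with encoding='utf-8-sig' writes the BOM for us,
# so non-ASCII text survives in tools that use the BOM to detect UTF-8.
with open("out.csv", "w", encoding="utf-8-sig", newline="") as f:
    f.write("city\nנווה שאנן\n")

# The file on disk now starts with the three BOM bytes.
with open("out.csv", "rb") as f:
    raw = f.read()
assert raw[:3] == b"\xef\xbb\xbf"
```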

Encoding.UTF8.GetString doesn't take into account the Preamble/BOM

a 夏天 submitted on 2019-11-26 22:50:20
In .NET, I'm trying to use the Encoding.UTF8.GetString method, which takes a byte array and converts it to a string. It looks like this method ignores the BOM (byte order mark), which might be part of a legitimate binary representation of a UTF-8 string, and treats it as an ordinary character. I know I can use a TextReader to digest the BOM as needed, but I thought the GetString method should be a kind of shortcut that makes our code shorter. Am I missing something? Is this intentional? Here's a reproduction: static void Main(string[] args) { string s1 = "abc"; byte[] abcWithBom;
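The behaviour is easy to reproduce outside .NET. A small Python sketch of the same distinction, a BOM-blind decode versus a BOM-aware one:

```python
import codecs

abc_with_bom = codecs.BOM_UTF8 + b"abc"

# Plain utf-8 decoding behaves like Encoding.UTF8.GetString: the BOM
# comes through as the character U+FEFF at the start of the string.
s1 = abc_with_bom.decode("utf-8")
assert s1 == "\ufeffabc"

# The 'utf-8-sig' codec consumes a leading BOM if one is present.
s2 = abc_with_bom.decode("utf-8-sig")
assert s2 == "abc"
```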

How to avoid tripping over UTF-8 BOM when reading files

情到浓时终转凉″ submitted on 2019-11-26 22:22:43
I'm consuming a data feed that has recently added a Unicode BOM header (U+FEFF), and my rake task is now messed up by it. I can skip the first 3 bytes with file.gets[3..-1], but is there a more elegant way to read files in Ruby that handles this correctly whether a BOM is present or not? With Ruby 1.9.2 you can use the mode r:bom|utf-8: text_without_bom = nil # define the variable outside the block to keep the data File.open('file.txt', "r:bom|utf-8"){|file| text_without_bom = file.read } or text_without_bom = File.read('file.txt', encoding: 'bom|utf-8') or text_without_bom = File.read('file
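Python's utf-8-sig codec gives the same "BOM optional" behaviour as Ruby's r:bom|utf-8 mode. A sketch (file.txt is a throwaway name):

```python
# Whether or not the file starts with a BOM, reading with
# encoding='utf-8-sig' yields the same text.
for payload in (b"\xef\xbb\xbfhello\n", b"hello\n"):
    with open("file.txt", "wb") as f:
        f.write(payload)
    with open("file.txt", encoding="utf-8-sig") as f:
        text_without_bom = f.read()
    assert text_without_bom == "hello\n"
```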

Running SQL script through psql gives syntax errors that don't occur in PgAdmin

前提是你 submitted on 2019-11-26 22:19:00
Question: I have the following script to create a table: -- Create State table. DROP TABLE IF EXISTS "State" CASCADE; CREATE TABLE "State" ( StateID SERIAL PRIMARY KEY NOT NULL, StateName VARCHAR(50) ); It runs fine in the query tool of PgAdmin. But when I try to run it from the command line using psql: psql -U postgres -d dbname -f 00101-CreateStateTable.sql I get a syntax error as shown below. 2: ERROR: syntax error at or near "ï»¿" LINE 1: ^ psql:00101-CreateStateTable.sql:6: NOTICE: CREATE TABLE will
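One common fix is to re-save the script without the BOM before handing it to psql. A hedged Python sketch (it first writes a BOM-prefixed file just to have something to repair; the filename echoes the question but is otherwise arbitrary):

```python
path = "00101-CreateStateTable.sql"  # stand-in for the real script

# Create a BOM-prefixed script to demonstrate on.
with open(path, "wb") as f:
    f.write(b'\xef\xbb\xbfDROP TABLE IF EXISTS "State" CASCADE;\n')

# Strip a leading UTF-8 BOM, if any, and write the file back.
with open(path, "rb") as f:
    data = f.read()
if data.startswith(b"\xef\xbb\xbf"):
    data = data[3:]
with open(path, "wb") as f:
    f.write(data)
```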

Read a UTF-8 text file with BOM

纵饮孤独 submitted on 2019-11-26 22:16:51
Question: I have a text file with a byte order mark (U+FEFF) at the beginning. I am trying to read the file in R. Is it possible to avoid the byte order mark? The fread function (from the data.table package) reads the file, but adds ï»¿ at the beginning of the first variable name: > names(frame_pers)[1] [1] "ï»¿reg_date" The same happens with the read.csv function. Currently I have made a function which removes the BOM from the first column name, but I believe there should be a way to automatically strip the
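The same pitfall is easy to see from Python: a sketch of how a BOM-blind read pollutes the first CSV header name, while a BOM-aware decode does not.

```python
import csv
import io

# A CSV payload as it would sit on disk, BOM included.
raw = b"\xef\xbb\xbfreg_date,value\n2019-01-01,1\n"

# Decoding as plain utf-8 leaves U+FEFF glued to the first header name.
header_bad = next(csv.reader(io.StringIO(raw.decode("utf-8"))))
assert header_bad[0] == "\ufeffreg_date"

# Decoding as utf-8-sig strips it, so the column name comes out clean.
header_ok = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))
assert header_ok[0] == "reg_date"
```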

Removing BOM characters using Java [duplicate]

你离开我真会死。 submitted on 2019-11-26 22:00:19
Question: This question already has an answer here: Byte order mark screws up file reading in Java (8 answers) What needs to happen to a string in Java to be the equivalent of vi's :set nobomb? Assume that the BOM comes from the file I am reading. Answer 1: Java does not handle the BOM specially; it treats a BOM like any other character. Found this: http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html public static final String UTF8_BOM = "\uFEFF"; private static String removeUTF8BOM(String s) { if (s.startsWith(UTF8_BOM)) { s = s.substring(1); } return s; }

Removing BOM from gzip'ed CSV in Python

放肆的年华 submitted on 2019-11-26 21:26:03
Question: I'm using the following code to unzip and save a CSV file: with gzip.open(filename_gz) as f: file = open(filename, "w"); output = csv.writer(file, delimiter=',') output.writerows(csv.reader(f, dialect='excel', delimiter=';')) Everything seems to work, except that the first characters in the file are unexpected. Googling around suggests this is due to a BOM in the file. I've read that decoding the content as utf-8-sig should fix the issue. However, adding: .read()
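Since this question is already in Python, one way to wire utf-8-sig in (a sketch with made-up filenames; it builds a tiny gzipped feed first so the snippet is self-contained):

```python
import csv
import gzip

# Build a small gzipped, BOM-prefixed, semicolon-delimited CSV to stand
# in for the real feed.
with gzip.open("feed.csv.gz", "wb") as f:
    f.write(b"\xef\xbb\xbfa;b\n1;2\n")

# Opening the member in text mode with encoding='utf-8-sig' digests the
# BOM before the csv reader ever sees it.
with gzip.open("feed.csv.gz", "rt", encoding="utf-8-sig", newline="") as src, \
        open("feed.csv", "w", newline="") as dst:
    csv.writer(dst, delimiter=",").writerows(
        csv.reader(src, delimiter=";"))

with open("feed.csv") as f:
    result = f.read()
assert result == "a,b\n1,2\n"  # no stray leading characters
```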