byte-order-mark

Is it possible to get GCC to compile UTF-8 with BOM source files?

眉间皱痕 submitted on 2019-11-28 11:00:29
I develop cross-platform C++ using Microsoft Visual Studio on Windows and GCC on Ubuntu Linux. In Visual Studio I can use Unicode symbols such as "π" and "²" in my code. Visual Studio always saves source files as UTF-8 with a BOM (byte order mark). For example:
// A = π.r²
double π = 3.14;
GCC happily compiles these files, but only if I remove the BOM first. If I do not remove the BOM, I get errors like these:
wwga_hydutils.cpp:28:9: error: stray ‘\317’ in program
wwga_hydutils.cpp:28:9: error: stray ‘\200’ in program
Which brings me to the question: Is there a way to get GCC to compile UTF-8
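For reference, the octal escapes in those GCC messages are raw UTF-8 bytes. This small Java sketch (Java is used for the added examples on this page because most of the questions here concern Java; the class and method names are illustrative) prints the escapes for a string: π (U+03C0) encodes as 0xCF 0x80, which is \317\200 in octal, while a UTF-8 BOM would appear as \357\273\277.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper: renders the UTF-8 bytes of a string as GCC-style octal escapes.
public class Utf8Octal {
    // Returns each UTF-8 byte of s as a backslash followed by three octal digits.
    static String octalEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\%03o", b & 0xFF)); // mask to 0..255 before formatting
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(octalEscapes("\u03C0")); // pi: \317\200
        System.out.println(octalEscapes("\uFEFF")); // the BOM: \357\273\277
    }
}
```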

How can I remove the BOM from a UTF-8 file?

强颜欢笑 submitted on 2019-11-28 09:51:51
I have a file in UTF-8 encoding with a BOM and want to remove the BOM. Are there any Linux command-line tools to remove the BOM from the file?
$ file test.xml
test.xml: XML 1.0 document, UTF-8 Unicode (with BOM) text, with very long lines
A BOM is the Unicode code point U+FEFF; its UTF-8 encoding consists of the three bytes 0xEF, 0xBB, 0xBF. With bash, you can create a UTF-8 BOM with the $'' special quoting form, which implements Unicode escapes: $'\uFEFF'. So with bash, a reliable way of removing a UTF-8 BOM from the beginning of a text file would be:
sed -i $'1s/^\uFEFF//' file.txt
This will
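A rough Java counterpart to the sed one-liner, as a minimal sketch: it checks whether the file starts with the three bytes EF BB BF and, only if so, rewrites the file without them. StripBom and its methods are hypothetical names, and reading the whole file into memory is an assumption that suits ordinary text files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical utility: removes a leading UTF-8 BOM from files given on the command line.
public class StripBom {
    // The UTF-8 encoding of U+FEFF.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns the bytes without a leading BOM; input without a BOM is returned as the same array.
    static byte[] stripUtf8Bom(byte[] data) {
        if (data.length >= 3
                && data[0] == UTF8_BOM[0] && data[1] == UTF8_BOM[1] && data[2] == UTF8_BOM[2]) {
            return Arrays.copyOfRange(data, 3, data.length);
        }
        return data;
    }

    // Rewrites the file in place only if it starts with a BOM; returns true if it changed.
    static boolean stripUtf8BomInPlace(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        byte[] stripped = stripUtf8Bom(data);
        if (stripped == data) { // same array means no BOM was found
            return false;
        }
        Files.write(file, stripped);
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean changed = stripUtf8BomInPlace(Path.of(name));
            System.out.println(name + (changed ? ": BOM removed" : ": no BOM"));
        }
    }
}
```

Like the sed command, this only touches the very start of the file, so a U+FEFF appearing later in the text is left alone.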

How to Remove BOM from an XML file in Java

只谈情不闲聊 submitted on 2019-11-28 09:40:45
I need suggestions on how to remove the BOM from a UTF-8 file and create a copy of the rest of the XML file.
TacticalCoder: Having a tool break because of a BOM in a UTF-8 file is a very common thing in my experience. I don't know why there were so many downvotes (but then it gives me the chance to try to get enough votes to win a special SO badge ;) ). More seriously: a UTF-8 BOM doesn't typically make much sense, but it is fully valid according to the specs (although discouraged). Now the problem is that a lot of people aren't aware that a BOM is valid in UTF-8 and hence wrote broken tools /
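One way to make the copy in Java without loading the whole file is sketched below, assuming a plain byte-for-byte copy is what is wanted: a BufferedInputStream marks the start, peeks at three bytes, and rewinds unless they form the UTF-8 BOM. The class and method names are illustrative, not from any library.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: copies a file, dropping a leading UTF-8 BOM if present.
public class CopyWithoutBom {
    // Streams in to out, skipping EF BB BF at the very start.
    static void copyWithoutBom(InputStream in, OutputStream out) throws IOException {
        BufferedInputStream bin = new BufferedInputStream(in);
        bin.mark(3);
        byte[] head = bin.readNBytes(3);
        boolean hasBom = head.length == 3
                && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF;
        if (!hasBom) {
            bin.reset(); // not a BOM: rewind so these bytes are copied too
        }
        bin.transferTo(out); // copy the remainder unchanged
    }

    static void copyFileWithoutBom(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            copyWithoutBom(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        copyFileWithoutBom(Path.of(args[0]), Path.of(args[1])); // source, target
    }
}
```

Because the check works on bytes, the rest of the XML is copied exactly as it was, whatever encoding declaration it carries.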

How do I encode/decode UTF-16LE byte arrays with a BOM?

允我心安 submitted on 2019-11-28 09:01:56
I need to encode/decode UTF-16 byte arrays to and from java.lang.String. The byte arrays are given to me with a byte order mark (BOM), and I need to produce encoded byte arrays with a BOM. Also, because I'm dealing with a Microsoft client/server, I'd like to emit the encoding in little endian (along with the LE BOM) to avoid any misunderstandings. I do realize that with the BOM, big endian should work too, but I don't want to swim upstream in the Windows world. As an example, here is a method which encodes a java.lang.String as UTF-16 in little endian with a BOM: public static byte[] encodeString
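A minimal sketch of both directions using only the JDK. Java's UTF-16LE charset never writes a BOM when encoding (and decodes a leading FF FE as an ordinary U+FEFF character), so the encoder below prepends U+FEFF itself. For decoding, the plain UTF-16 charset reads the BOM, picks the byte order from it, and drops it (defaulting to big endian when there is no BOM). The method names are hypothetical and are not the asker's truncated encodeString.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch: UTF-16 with a little-endian BOM on output, any BOM on input.
public class Utf16LeBom {
    // Encodes s as FF FE followed by UTF-16LE code units.
    static byte[] encodeUtf16LeWithBom(String s) {
        // UTF_16LE writes no BOM itself; U+FEFF encodes as the byte pair FF FE.
        return ("\uFEFF" + s).getBytes(StandardCharsets.UTF_16LE);
    }

    // Decodes BOM-prefixed UTF-16; the UTF_16 charset consumes the BOM and honours its byte order.
    static String decodeUtf16WithBom(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] encoded = encodeUtf16LeWithBom("hi");
        for (byte b : encoded) {
            System.out.printf("%02X ", b & 0xFF); // FF FE 68 00 69 00
        }
        System.out.println();
        System.out.println(decodeUtf16WithBom(encoded)); // hi
    }
}
```

Encoding with StandardCharsets.UTF_16 instead would also add a BOM, but a big-endian one (FE FF), which is why the sketch uses UTF_16LE plus an explicit U+FEFF.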

Java: UTF-8 and BOM

拜拜、爱过 submitted on 2019-11-28 08:23:37
Question: On a page of Java's bug database, http://bugs.sun.com/view_bug.do?bug_id=4508058, it says that Sun/Oracle will not fix the problem of Java not parsing the BOM of a UTF-8-encoded string. Since the most recent comment on that page dates back to 2010, I would like to know whether there is any more recent information. Is it still true that Java cannot handle the BOM of UTF-8?
Answer 1: Yes, it is still true that Java cannot handle the BOM in UTF-8 encoded files. I came across this issue when parsing several XML
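The behaviour described in that bug report is easy to reproduce. In this small Java sketch (class and method names are illustrative), decoding the bytes EF BB BF 61 62 63 as UTF-8, either directly or through an InputStreamReader, yields a four-character string whose first character is U+FEFF, so comparisons against the expected text fail.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Demonstrates that Java's UTF-8 decoder keeps a BOM as an ordinary character.
public class Utf8BomDemo {
    // Plain UTF-8 decoding: the BOM survives as U+FEFF.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Same effect through a Reader, as when reading a file line by line.
    static String firstLineUtf8(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b', 'c'};
        String s = decodeUtf8(withBom);
        System.out.println(s.length());              // 4, not 3
        System.out.println(s.charAt(0) == '\uFEFF'); // true
        System.out.println(firstLineUtf8(new ByteArrayInputStream(withBom)).equals("abc")); // false
    }
}
```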

Create a UTF-8 string with BOM

不想你离开。 submitted on 2019-11-28 04:26:57
Question: I'm using the MD5 function and Base64 encoding to generate a User Secret (used to log in to the data layer of the API I'm using). I wrote the code in JavaScript and it works, but in Objective-C I'm struggling with the BOM. My code is:
NSString *str = [[NSString alloc] initWithFormat:@"%@%@%@%d", [auth uppercaseString], [user uppercaseString], [pwd uppercaseString], totalDaysSince2000];
NSString *sourceString = [[NSString alloc] initWithFormat:@"%02x%02x%02x%@", 0xEF, 0xBB, 0xBF, str];
NSString *strMd5 =
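Note that formatting 0xEF, 0xBB, 0xBF with %02x produces the six ASCII characters "efbbbf", not the three BOM bytes. If the API expects an MD5 over the BOM bytes followed by the UTF-8 text, then Base64 of the 16-byte digest (an assumption, since the expected input is not shown here), a Java sketch could prepend the real bytes before hashing. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical sketch: MD5 over (UTF-8 BOM + UTF-8 text), Base64-encoded.
public class BomMd5 {
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns the three BOM bytes followed by the UTF-8 encoding of text.
    static byte[] utf8WithBom(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.writeBytes(UTF8_BOM); // the actual bytes, not their hex spelling
        buf.writeBytes(text.getBytes(StandardCharsets.UTF_8));
        return buf.toByteArray();
    }

    // Assumed scheme: Base64 of the raw 16-byte MD5 digest.
    static String md5Base64WithBom(String text) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(utf8WithBom(text));
        return Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Stand-in for the concatenated auth/user/pwd/day string in the question.
        String str = args.length > 0 ? args[0] : "EXAMPLE";
        System.out.println(md5Base64WithBom(str));
    }
}
```

Equivalently, ("\uFEFF" + text).getBytes(StandardCharsets.UTF_8) yields the same byte sequence as utf8WithBom(text), since U+FEFF encodes to EF BB BF.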

Is it possible to run a SQLPLUS script on a file encoded as UTF-8 with BOM

倖福魔咒の submitted on 2019-11-28 03:54:29
Question: I'm trying to run a collection of scripts that have been auto-generated from a large number of sources. Unfortunately, some of these have been generated as UTF-8 with BOM. I have a system in place for automatically removing the BOM, but it's a bit of a messy process. Failing to remove the BOM generates the error: SP2-0042: unknown command "" - rest of line ignored. Is it possible to run SQLPLUS on a script file that has a BOM?
Answer 1: It is possible to run SQLPLUS with such a script, but
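Whatever cleanup process is used, it can help to first find out which scripts actually carry a BOM. A minimal Java sketch that lists the files whose first three bytes are EF BB BF, assuming the scripts sit in one directory and end in .sql (both assumptions; the names are hypothetical):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which SQL scripts in a directory start with a UTF-8 BOM.
public class FindBomScripts {
    // True if the file's first three bytes are EF BB BF; only those bytes are read.
    static boolean hasUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = in.readNBytes(3);
            return head.length == 3
                    && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF;
        }
    }

    // Lists *.sql regular files in dir (not recursive) that begin with a BOM.
    static List<Path> scriptsWithBom(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.sql")) {
            for (Path f : files) {
                if (Files.isRegularFile(f) && hasUtf8Bom(f)) {
                    result.add(f);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        for (Path p : scriptsWithBom(dir)) {
            System.out.println(p);
        }
    }
}
```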

Running SQL script through psql gives syntax errors that don't occur in PgAdmin

…衆ロ難τιáo~ submitted on 2019-11-28 01:59:51
I have the following script to create a table:
-- Create State table.
DROP TABLE IF EXISTS "State" CASCADE;
CREATE TABLE "State" ( StateID SERIAL PRIMARY KEY NOT NULL, StateName VARCHAR(50) );
It runs fine in the query tool of PgAdmin. But when I try to run it from the command line using psql:
psql -U postgres -d dbname -f 00101-CreateStateTable.sql
I get a syntax error, as shown below.
2: ERROR: syntax error at or near "" LINE 1: ^
psql:00101-CreateStateTable.sql:6: NOTICE: CREATE TABLE will create implicit sequence "State_stateid_seq" for serial column "State.stateid" psql:00101
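If, as the empty-looking token in the error suggests, the script starts with a UTF-8 BOM, the first character a parser sees is U+FEFF rather than "--", so the comment line is not recognised. The Java sketch below (hypothetical names) shows that effect on a string and one way to skip a single leading U+FEFF at the Reader level with a PushbackReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

// Illustrative sketch: skip one leading U+FEFF before reading script text.
public class BomSkippingReader {
    static final char BOM = '\uFEFF';

    // Returns a Reader that skips one leading U+FEFF, if present.
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader pr = new PushbackReader(in, 1);
        int first = pr.read();
        if (first != -1 && first != BOM) {
            pr.unread(first); // not a BOM: put the character back
        }
        return pr;
    }

    // Reads the first line of a script after skipping any BOM.
    static String firstLine(Reader in) throws IOException {
        try (BufferedReader br = new BufferedReader(skipBom(in))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String script = BOM + "-- Create State table.\nDROP TABLE IF EXISTS \"State\" CASCADE;";
        System.out.println(script.startsWith("--"));                              // false
        System.out.println(firstLine(new StringReader(script)).startsWith("--")); // true
    }
}
```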

Why does org.apache.xerces.parsers.SAXParser not skip the BOM in UTF-8 encoded XML?

和自甴很熟 submitted on 2019-11-28 01:57:52
I have an XML file with UTF-8 encoding, and this file contains a BOM at the beginning. So during parsing I get org.xml.sax.SAXParseException: Content is not allowed in prolog. I cannot remove those 3 bytes from the files, and I cannot load a file into memory and remove them there (the files are big). So, for performance reasons, I'm using a SAX parser and just want to skip those 3 bytes if they are present before the "" tag. Should I extend InputStreamReader for this? I'm new to Java, so please show me the right way.
Adrian Cox: This has come up before, and I found the answer on Stack Overflow when it
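One approach that needs neither an InputStreamReader subclass nor loading the file is to wrap the byte stream in a PushbackInputStream, read at most three bytes, and push them back unless they are the BOM. This sketch swaps in the JDK's JAXP SAXParserFactory instead of instantiating org.apache.xerces.parsers.SAXParser directly, and the element-counting handler is only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch: skip a UTF-8 BOM in front of a SAX parser without buffering the file.
public class SaxSkipBom {
    // Consumes a leading EF BB BF if present; only three bytes are ever held back.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pin.readNBytes(head, 0, 3);
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: give the bytes back to the parser
        }
        return pin;
    }

    // Parses the stream with SAX after skipping any BOM; counts start tags as a demo.
    static int countElements(InputStream xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        int[] count = {0};
        parser.parse(skipUtf8Bom(xml), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }
}
```

Usage would look like countElements(Files.newInputStream(path)), with the handler replaced by whatever processing the real parser performs.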

Removing BOM characters using Java [duplicate]

╄→гoц情女王★ submitted on 2019-11-28 01:48:48
This question already has an answer here: Byte order mark screws up file reading in Java (8 answers)
What needs to happen to a string in Java for it to be the equivalent of vi's :set nobomb? Assume that the BOM comes from the file I am reading. Java does not handle the BOM properly; in fact, Java handles a BOM like every other char. Found this: http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
public static final String UTF8_BOM = "\uFEFF";
private static String removeUTF8BOM(String s) {
    if (s.startsWith(UTF8_BOM)) {
        s = s.substring(1);
    }
    return s;
}
Maybe I would use Apache IO instead: