byte-order-mark | 易学教程

XML - Data At Root Level is Invalid

Question: I have an XSD file that is encoded in UTF-8, and no text editor I open it in shows any character at the beginning of the file, but when I pull it up in Visual Studio's debugger, I clearly see an empty box in front of the file. I also get the error: Data at the root level is invalid. Line 1, position 1. Anyone know what this is? Update: Edited post to qualify type of file. It's an XSD file created by Microsoft's XSD creator.
Answer 1: It turns out, the answer is that what I'm seeing

UTF-8 BOM signature in PHP files

Question: I was writing some commented PHP classes and I stumbled upon a problem. My name (for the @author tag) ends with a ș (which is a UTF-8 character, ...and a strange name, I know). Even though I save the file as UTF-8, some friends reported that they see that character totally messed up ( È™ ). This problem goes away when I add the BOM signature. But that thing troubles me a bit, since I don't know that much about it, except from what I saw on Wikipedia and on some other similar questions here

Using awk to remove the Byte-order mark

Question: What would an awk script (presumably a one-liner) for removing a BOM look like? Specification: print every line after the first (NR > 1); for the first line, if it starts with #FE #FF or #FF #FE, remove those and print the rest.
Answer 1: Try this: awk 'NR==1{sub(/^\xef\xbb\xbf/,"")}{print}' INFILE > OUTFILE On the first record (line), remove the BOM characters. Print every record. Or slightly shorter, using the knowledge that the default action in awk is to print the record: awk 'NR==1{sub(/^\xef
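For comparison, the same idea can be sketched in Python rather than awk. This is a minimal illustration, not the answerer's method: the function name `strip_utf8_bom` and the sample bytes are made up for the example. Like the awk answer, it only handles the UTF-8 BOM (EF BB BF), not the UTF-16 byte pairs FE FF / FF FE that the question mentions.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample input: a two-line file that starts with a UTF-8 BOM.
sample = b"\xef\xbb\xbffirst line\nsecond line\n"
cleaned = strip_utf8_bom(sample)
print(cleaned)
```

Working on bytes instead of decoded text keeps the check independent of whatever encoding the reader would otherwise guess.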

How to remove multiple UTF-8 BOM sequences

Question: I'm using PHP5 (cgi) to output template files from the filesystem and having issues spitting out raw HTML. private function fetch($name) { $path = $this->j->config['template_path'] . $name . '.html'; if (!file_exists($path)) { dbgerror('Could not find the template "' . $name . '" in ' . $path); } $f = fopen($path, 'r'); $t = fread($f, filesize($path)); fclose($f); if (substr($t, 0, 3) == b'\xef\xbb\xbf') { $t = substr($t, 3); } return $t; } Even though I've added the BOM fix I
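The excerpt is cut off before it says where the extra BOMs come from. Assuming they appear because several BOM-prefixed files are joined (an assumption, not something the question states), one way to handle it is to strip every leading BOM from each piece before concatenating. The sketch below is in Python rather than PHP, and `strip_leading_boms` and `join_templates` are hypothetical names.

```python
import codecs


def strip_leading_boms(data: bytes) -> bytes:
    """Remove every UTF-8 BOM at the start of data, not just the first one."""
    bom = codecs.BOM_UTF8
    while data.startswith(bom):
        data = data[len(bom):]
    return data


def join_templates(parts):
    """Concatenate byte strings after stripping the leading BOMs of each part."""
    return b"".join(strip_leading_boms(part) for part in parts)


# Hypothetical fragments, each saved with its own BOM.
header = codecs.BOM_UTF8 + b"<header/>"
body = codecs.BOM_UTF8 + b"<main/>"
page = join_templates([header, body])
print(page)
```

Stripping per fragment matters because a BOM that ends up in the middle of the joined output is no longer at offset 0, so a single check on the final string would miss it.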

How do I remove ï»¿ from the beginning of a file?

Question: I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: ï»¿ PHP removes all whitespace, so a random ï»¿ in the middle of the code messes up the entire thing. As I mentioned, I can't actually see these characters when I open the file in gedit, so I can't remove them very easily. I googled the problem, and there is clearly something wrong with the file encoding, which

How can I output a UTF-8 CSV in PHP that Excel will read properly?

Question: I've got this very simple thing that just outputs some stuff in CSV format, but it's got to be UTF-8. I open this file in TextEdit or TextMate or Dreamweaver and it displays UTF-8 characters properly, but if I open it in Excel it's doing this silly íÄ kind of thing instead. Here's what I've got at the head of my document: header("content-type:application/csv;charset=UTF-8"); header("Content-Disposition:attachment;filename=\"CHS.csv\""); This all seems to have the desired effect
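A commonly suggested workaround is to prefix the CSV with a UTF-8 BOM so that Excel recognises the encoding. Whether this is enough depends on the Excel version, so treat the following as a sketch under that assumption. It is written in Python rather than PHP, and `csv_bytes_for_excel` and the sample rows are invented for the example.

```python
import csv
import io


def csv_bytes_for_excel(rows):
    """Serialise rows as CSV and encode as UTF-8 with a leading BOM."""
    buffer = io.StringIO(newline="")  # let the csv module control line endings
    csv.writer(buffer).writerows(rows)
    # The "utf-8-sig" codec prepends EF BB BF when encoding.
    return buffer.getvalue().encode("utf-8-sig")


# Hypothetical data containing non-ASCII characters.
payload = csv_bytes_for_excel([["name", "city"], ["Zoë", "Kraków"]])
print(payload[:3])
```

The resulting bytes would be sent as the response body after the two headers shown in the question.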

How to detect the character encoding of a text file?

Question: I'm trying to detect which character encoding is used in my file. I tried this code to get the standard encoding: public static Encoding GetFileEncoding(string srcFile) { // *** Use Default of Encoding.Default (Ansi CodePage) Encoding enc = Encoding.Default; // *** Detect byte order mark if any - otherwise assume default byte[] buffer = new byte[5]; FileStream file = new FileStream(srcFile, FileMode.Open); file.Read(buffer, 0, 5); file.Close(); if (buffer[0] == 0xef && buffer[1] == 0xbb &&
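The BOM check that the question's code starts to perform can be sketched in Python as a table lookup. This is an illustration rather than a port of the (truncated) C# method; `sniff_bom` and its `default` parameter are hypothetical. A missing BOM says nothing about the encoding, so the function only reports what the first bytes prove.

```python
import codecs

# Longer BOMs are checked first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(head: bytes, default=None):
    """Return the encoding implied by a leading BOM, or default if none is found."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


print(sniff_bom(b"\xef\xbb\xbfhello"))
```

Passing the first four bytes of a file is enough, since no BOM in the table is longer than that.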

UTF-8 without BOM

Question: I have JavaScript files that need to be saved in UTF-8 (without BOM). Every time I convert them to the correct format in Notepad++, they are reverted back to UTF-8 with BOM when I open them in Visual Studio. How can I stop VS2010 from doing that? Another question: is UTF-8 without signature in Visual Studio the same as UTF-8 without BOM?
Answer 1: BOM or Byte Order Mark is sometimes quite annoying. Visual Studio does not change the file unless you save it (as Hans said). And here is the

Using PowerShell to write a file in UTF-8 without the BOM

Question: Out-File seems to force the BOM when using UTF-8: $MyFile = Get-Content $MyPath $MyFile | Out-File -Encoding "UTF8" $MyPath How can I write a file in UTF-8 with no BOM using PowerShell?
Answer 1: Using .NET's UTF8Encoding class and passing $False to the constructor seems to work: $MyFile = Get-Content $MyPath $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False [System.IO.File]::WriteAllLines($MyPath, $MyFile, $Utf8NoBomEncoding)
Answer 2: The proper way as of now is to use a solution

Byte order mark screws up file reading in Java

Question: I'm trying to read CSV files using Java. Some of the files may have a byte order mark at the beginning, but not all. When present, the byte order mark gets read along with the rest of the first line, thus causing problems with string compares. Is there an easy way to skip the byte order mark when it is present? Thanks!
Answer 1: EDIT: I've made a proper release on GitHub: https://github.com/gpakosz/UnicodeBOMInputStream Here is a class I coded a while ago, I just edited the package name before
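The "skip the BOM only if it is there" behaviour can be illustrated in Python, where the `utf-8-sig` codec does this during decoding. This is a sketch for comparison and is unrelated to the linked Java class; `read_first_line` and the sample bytes are hypothetical.

```python
import io


def read_first_line(raw: bytes) -> str:
    """Decode raw as UTF-8, dropping a leading BOM if present, and return line one."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig", newline="")
    return text.readline().rstrip("\r\n")


# Hypothetical CSV content, once with and once without a BOM.
with_bom = b"\xef\xbb\xbfid,name\n1,a\n"
without_bom = b"id,name\n1,a\n"
print(read_first_line(with_bom) == read_first_line(without_bom))
```

Because the decoder tolerates both cases, the string comparison on the first header field behaves the same whether or not the file was saved with a BOM.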