Question
I have the following readfile() Java function for reading .htm files:
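import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

private String readfile(String inputDoc) throws IOException {
    StringBuilder buffer = new StringBuilder();
    // try-with-resources closes the reader (and the underlying file stream) even if read() throws
    try (InputStreamReader isr = new InputStreamReader(new FileInputStream(inputDoc), StandardCharsets.UTF_8)) {
        int c;
        // read one character at a time; no line-ending translation takes place here
        while ((c = isr.read()) != -1) {
            buffer.append((char) c);
        }
    }
    return buffer.toString();
}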
Here is an example snippet of the input document:
<?xml version="1.0" encoding="utf-8"?><html>
<head>
For some reason, the string returned from readfile() is <?xml version="1.0" encoding="utf-8"?><html>\r\r\n<head>
but I expect it to be <?xml version="1.0" encoding="utf-8"?><html>\r\n<head>
since the newline sequence on Windows is \r\n.
I ran the function above in IntelliJ IDEA on Windows 7 (the IDEA default encoding is set to UTF-8).
Does anyone know why readfile(String inputDoc) returns this odd newline sequence?
Answer 1:
You get this because that is what the input file actually contains. Open the input file in a hex editor to verify.
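If no hex editor is at hand, the same check can be done from Java. The following is only a minimal sketch: the class name HexDump and its helper methods are made up for illustration, and the file path is assumed to be passed as the first command-line argument. It prints the first bytes of the file in hex and counts how often the byte sequence 0D 0D 0A (\r\r\n) occurs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper class, not part of the original question.
class HexDump {
    // Formats each byte as two uppercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // Counts occurrences of the raw byte sequence 0D 0D 0A (\r\r\n).
    static int countDoubleCr(byte[] bytes) {
        int count = 0;
        for (int i = 0; i + 2 < bytes.length; i++) {
            if (bytes[i] == 0x0D && bytes[i + 1] == 0x0D && bytes[i + 2] == 0x0A) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the path of the .htm file to inspect.
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        // Show at most the first 64 bytes so the output stays readable.
        System.out.println(toHex(Arrays.copyOf(raw, Math.min(raw.length, 64))));
        System.out.println("\\r\\r\\n sequences: " + countDoubleCr(raw));
    }
}
```

A non-zero count would suggest that the extra \r is already on disk and is not introduced by readfile().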
Answer 2:
When you write \n, it is expanded to \r\n on Windows for portability. That way, no matter which operating system you run on, you get the correct result with no additional code: \r\n on Windows, or just \n on Unix. It looks like you are reading the input in binary mode, so you see the \r. (In text mode, the same expansion happens in reverse: any \r\n in the input becomes just \n, so again you do not have to worry about the OS.) Then, when you write the \n, it gets expanded to \r\n, leaving two \r characters.
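The doubling described above can be reproduced with plain string operations. This is a sketch under stated assumptions, not a description of what any particular tool does: Java's FileInputStream and InputStreamReader perform no line-ending translation themselves, so the text-mode write is simulated here by a hypothetical expandOnWrite() method, and normalize() is an equally hypothetical helper that shows one way to avoid the extra \r.

```java
// Hypothetical demo class; the method names are illustrative only.
class NewlineExpansion {
    // Mimics a text-mode writer that blindly turns every \n into \r\n.
    static String expandOnWrite(String text) {
        return text.replace("\n", "\r\n");
    }

    // Collapses \r\n to \n first, so a later expansion cannot double the \r.
    static String normalize(String text) {
        return text.replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        // Text as it arrives when read without any newline translation.
        String read = "<html>\r\n<head>";

        String doubled = expandOnWrite(read);
        String clean = expandOnWrite(normalize(read));

        System.out.println(doubled.equals("<html>\r\r\n<head>")); // prints true
        System.out.println(clean.equals("<html>\r\n<head>"));     // prints true
    }
}
```

Normalizing to \n right after reading keeps the in-memory text independent of the platform, and the platform-specific sequence is then applied only once, at write time.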
Source: https://stackoverflow.com/questions/14059212/why-i-get-r-r-n-as-newline-instead-of-r-n-as-newline-char-in-windows