Question
I have the following readfile() Java function to read .htm files:
private String readfile(String inputDoc) throws IOException {
    FileInputStream fis = null;
    InputStreamReader isr = null;
    String text = null;
    // open input stream to file
    fis = new FileInputStream(inputDoc);
    isr = new InputStreamReader(fis, "UTF-8");
    StringBuffer buffer = new StringBuffer();
    int c;
    while ((c = isr.read()) != -1) {
        buffer.append((char) c);
    }
    text = buffer.toString();
    isr.close();
    return text;
}
Here is an example snippet of the input document:
<?xml version="1.0" encoding="utf-8"?><html>
<head>
For some reason, the text string returned from readfile() is <?xml version="1.0" encoding="utf-8"?><html>\r\r\n<head>
but I expect it to be <?xml version="1.0" encoding="utf-8"?><html>\r\n<head>
since the newline sequence on Windows is \r\n.
I ran the above function in IntelliJ IDEA on Windows 7. (The IDEA default encoding is set to UTF-8.)
Does anyone know why readfile(String inputDoc) returns \r\r\n for a newline instead of \r\n?
Answer 1:
You get this because that is what the input file actually contains. Open the input file in a hex editor to verify.
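If no hex editor is at hand, the same check can be sketched in Java. This is only a minimal illustration: the class name NewlineDump and the helper toHex are hypothetical, and the file path is taken from the command line. It prints the raw bytes of the file, with no decoding step involved, so a doubled carriage return shows up as the byte sequence 0D 0D 0A.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class NewlineDump {
    // Renders each byte as two uppercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java NewlineDump <file>");
            return;
        }
        // Read the raw bytes; no charset decoding or newline handling is applied.
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(toHex(raw));
    }
}
```

If the output contains 0D 0D 0A where a line break is expected, the extra \r is already in the file and readfile() is returning it faithfully.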
Answer 2:
When you write \n in text mode, it is expanded to \r\n on Windows for portability. That way, no matter which operating system you run on, you get the correct line ending with no additional code: \r\n on Windows, or just \n on Unix. In text mode the same translation happens in reverse on input: any \r\n becomes just \n, so again you do not have to worry about the OS. It looks like the input was read in binary mode, so the \r was kept. Then, when the \n was written, it was expanded to \r\n, leaving two \rs.
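The doubling described here can be reproduced with a small sketch. It assumes the expansion is a naive "replace every \n with \r\n" step applied to text that already uses \r\n; the class name DoubleExpansion and the method expandNewlines are made up for illustration. Note that Java's InputStreamReader performs no newline translation of its own, so in the question's setup such an expansion would most likely have happened in whatever tool produced the file, not in readfile().

```java
public class DoubleExpansion {
    // Naively expands every "\n" to "\r\n" without checking for an existing "\r".
    static String expandNewlines(String s) {
        return s.replace("\n", "\r\n");
    }

    // Makes carriage returns and line feeds visible as escape sequences.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Text that already has Windows line endings.
        String alreadyWindows = "<html>\r\n<head>";
        // Expanding again leaves two carriage returns before the line feed.
        System.out.println(visible(expandNewlines(alreadyWindows)));
    }
}
```

Running main prints <html>\r\r\n<head>, which matches the string seen in the question.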
Source: https://stackoverflow.com/questions/14059212/why-i-get-r-r-n-as-newline-instead-of-r-n-as-newline-char-in-windows