I have a file which is encoded as ISO-8859-1, and contains characters such as ô.

I am reading this file with Java code, something like:

File in = new File("myfile.csv");
InputStream fr = new FileInputStream(in);
Parsing the file as fixed-size blocks of bytes is not good: what if a character's byte representation straddles two blocks? Use an InputStreamReader with the appropriate character encoding instead:
BufferedReader br = new BufferedReader(
        new InputStreamReader(
                new FileInputStream("myfile.csv"), "ISO-8859-1"));

char[] buffer = new char[4096]; // character (not byte) buffer
while (true)
{
    int charCount = br.read(buffer, 0, buffer.length);
    if (charCount == -1) break; // reached end-of-stream
    String s = String.valueOf(buffer, 0, charCount);
    // alternatively, we can append to a StringBuilder
    System.out.println(s);
}
br.close();
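If you are on Java 7 or later, the same idea can be written more concisely with NIO and try-with-resources; a minimal sketch, reusing the myfile.csv name from above:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadLatin1 {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the reader automatically
        try (BufferedReader br = Files.newBufferedReader(
                Paths.get("myfile.csv"), StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}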
By the way, remember to check that the Unicode character can indeed be displayed correctly; the file may be read fine while the console mangles the output. You could also redirect the program output to a file and then compare it with the original file.
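To take the console out of the equation entirely, one option is to write what was read back out in the same encoding and diff the result against the original; a rough sketch (copy.csv is just a hypothetical output name):

import java.io.*;

public class CopyCheck {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream("myfile.csv"), "ISO-8859-1"));
             Writer out = new OutputStreamWriter(
                     new FileOutputStream("copy.csv"), "ISO-8859-1")) {
            char[] buffer = new char[4096];
            int charCount;
            while ((charCount = in.read(buffer)) != -1) {
                out.write(buffer, 0, charCount); // re-encode with the same charset
            }
        }
    }
}

Note that because ISO-8859-1 maps every byte to a character, a byte-identical copy mainly confirms the I/O pipeline is intact, not that ISO-8859-1 was the right charset to begin with.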
As Jon Skeet suggests, the problem may also be console-related. Try System.console().printf(s) to see if there is a difference.
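Keep in mind that System.console() returns null when the program's output is redirected or not attached to a terminal, so a sketch like the following needs a null check:

import java.io.Console;

public class ConsoleCheck {
    public static void main(String[] args) {
        String s = "ô"; // the character from the question
        Console console = System.console();
        if (console != null) {
            console.printf("%s%n", s); // Console writes using the console's own encoding
        } else {
            System.out.println(s);     // fallback when output is redirected
        }
    }
}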