Question
I am reading a text file using Java's Scanner.
try {
    while (sc.hasNextLine()) {
        // Read input from file
        inputLine = sc.nextLine().toUpperCase();
        System.out.println(inputLine);
    }
} finally {
    // Release the underlying file once reading is done
    sc.close();
}
The code above prints the output below, even though my text file contains only "aabbcc". How can I stop the Scanner from reading the garbage? Thanks.
{\RTF1\ANSI\ANSICPG1252\COCOARTF1265\COCOASUBRTF210
{\FONTTBL\F0\FSWISS\FCHARSET0 HELVETICA;}
{\COLORTBL;\RED255\GREEN255\BLUE255;}
\PAPERW11900\PAPERH16840\MARGL1440\MARGR1440\VIEWW10800\VIEWH8400\VIEWKIND0
\PARD\TX566\TX1133\TX1700\TX2267\TX2834\TX3401\TX3968\TX4535\TX5102\TX5669\TX6236\TX6803\PARDIRNATURAL
\F0\FS24 \CF0 AABBCC}
Answer 1:
You are reading an RTF document. If you want only the text, you can read the file into a byte array and extract the text with Swing's RTFEditorKit.
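import java.io.ByteArrayInputStream;
import java.nio.file.*;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// read(...) can throw IOException; read(...) and getText(...) can throw BadLocationException
Path path = Paths.get("path/to/file");
byte[] data = Files.readAllBytes(path);
RTFEditorKit rtfParser = new RTFEditorKit();
Document document = rtfParser.createDefaultDocument();
rtfParser.read(new ByteArrayInputStream(data), document, 0);
String text = document.getText(0, document.getLength());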
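To connect this back to the Scanner loop in the question, here is a minimal sketch that strips the RTF markup first and then scans the remaining text. The class name RtfToText, the helper rtfToPlainText, and the sample RTF string are assumptions made for illustration; they do not come from the original posts. The extracted text may end with an extra newline, so trim it if that matters.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Scanner;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfToText {
    // Hypothetical helper: parses RTF markup and returns only the visible text.
    static String rtfToPlainText(String rtf) {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        try {
            kit.read(new StringReader(rtf), doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            // Wrapped so callers do not have to handle checked exceptions
            throw new IllegalArgumentException("Could not parse input as RTF", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input, much shorter than what TextEdit writes
        String rtf = "{\\rtf1\\ansi{\\fonttbl\\f0\\fswiss Helvetica;}\\f0\\fs24 aabbcc}";
        try (Scanner sc = new Scanner(rtfToPlainText(rtf))) {
            while (sc.hasNextLine()) {
                System.out.println(sc.nextLine().toUpperCase());
            }
        }
    }
}
```

For a real file, the string passed to rtfToPlainText would come from the file contents instead of the hard-coded sample.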
Answer 2:
I solved this by setting Format to "Plain text" in the TextEdit preferences and recreating the input file. The output now comes out without the garbage.
Source: File input in Java for Mac
Answer 3:
The problem isn't that the Scanner is reading in garbage. It is that your file isn't plain text. From the looks of it, your file is actually "rich text", and that garbage contains formatting info. I was able to produce similar output by saving a .rtf using MS WordPad.
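One way to catch this situation early is to check whether the file begins with the RTF signature before handing it to a Scanner. The sketch below assumes that an RTF file starts with the bytes `{\rtf` (the question's output shows `{\RTF1` only because the line was upper-cased). The class RtfCheck and the method looksLikeRtf are hypothetical names, and readNBytes requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RtfCheck {
    private static final byte[] MAGIC = "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical helper: true if the given bytes begin with the RTF signature.
    static boolean looksLikeRtf(byte[] head) {
        if (head.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads only the first few bytes of the file, not the whole content.
    static boolean looksLikeRtf(Path file) throws IOException {
        byte[] head = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int n = in.readNBytes(head, 0, head.length);
            return n == head.length && looksLikeRtf(head);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java RtfCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(looksLikeRtf(file) ? "rich text (RTF)" : "not RTF");
    }
}
```

If the check returns true, the file needs to be converted or re-saved as plain text before line-by-line scanning will give the expected result.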
Source: https://stackoverflow.com/questions/26565967/java-scanner-reading-garbage