Java scanner reading garbage

Submitted by て烟熏妆下的殇ゞ on 2019-12-25 19:01:51

Question


I am reading a text file using Java's Scanner.

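import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

try (Scanner sc = new Scanner(new File("input.txt"))) { // file name is a placeholder
    while (sc.hasNextLine()) {
        // Read input from file
        String inputLine = sc.nextLine().toUpperCase();
        System.out.println(inputLine);
    }
} catch (FileNotFoundException e) { e.printStackTrace(); }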

The code above produces the output below, even though my text file contains only "aabbcc". How can I stop the Scanner from reading this garbage? Thanks.

{\RTF1\ANSI\ANSICPG1252\COCOARTF1265\COCOASUBRTF210
{\FONTTBL\F0\FSWISS\FCHARSET0 HELVETICA;}
{\COLORTBL;\RED255\GREEN255\BLUE255;}
\PAPERW11900\PAPERH16840\MARGL1440\MARGR1440\VIEWW10800\VIEWH8400\VIEWKIND0
\PARD\TX566\TX1133\TX1700\TX2267\TX2834\TX3401\TX3968\TX4535\TX5102\TX5669\TX6236\TX6803\PARDIRNATURAL

\F0\FS24 \CF0 AABBCC}

Answer 1:


You are reading an RTF document. If you want only the text, you can read the file into a byte array and extract the text with Swing's RTFEditorKit.

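import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

Path path = Paths.get("path/to/file");
byte[] data = Files.readAllBytes(path);

RTFEditorKit rtfParser = new RTFEditorKit();
Document document = rtfParser.createDefaultDocument();
rtfParser.read(new ByteArrayInputStream(data), document, 0); // parse RTF; throws IOException, BadLocationException
String text = document.getText(0, document.getLength());     // formatting stripped, text only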
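The snippet in this answer is a fragment that still needs a surrounding method and exception handling. Below is a minimal self-contained sketch of the same RTFEditorKit approach. The class name `RtfToText`, the helper `extractText`, and the inline sample RTF (a trimmed-down, lowercase version of the header shown in the question) are illustrative assumptions, not part of the original answer. Swing's RTF support is fairly basic, so complex documents may not convert perfectly.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfToText {

    /** Parses RTF bytes with Swing's RTFEditorKit and returns only the document text. */
    public static String extractText(byte[] rtfBytes) {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        try {
            // The kit interprets control words such as \rtf1 and \fonttbl instead of emitting them
            kit.read(new ByteArrayInputStream(rtfBytes), doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (BadLocationException e) {
            // Should not happen: the range 0..getLength() is always valid
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample modeled on the question's file; pass a real path as args[0] instead
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "{\\rtf1\\ansi{\\fonttbl\\f0\\fswiss Helvetica;}\\f0\\fs24 aabbcc}"
                        .getBytes(StandardCharsets.US_ASCII);
        // trim() drops any trailing newline the document model may append
        System.out.println(extractText(data).trim().toUpperCase());
    }
}
```

Wrapping the checked exceptions inside `extractText` keeps call sites simple; if you prefer callers to handle parse failures explicitly, declare `throws IOException` instead.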



Answer 2:


I solved this by changing TextEdit's preferences (setting Format to "Plain text") and recreating the input file. After that, the output contained no garbage.

Source: File input in Java for Mac




Answer 3:


The problem isn't that the Scanner is reading in garbage; it's that your file isn't plain text. From the looks of it, your file is actually rich text (RTF), and the "garbage" is formatting information. I was able to produce similar output by saving a .rtf file in Microsoft WordPad.
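Building on this diagnosis, a program can check for the RTF header before handing the file to a Scanner, instead of silently printing formatting codes. RTF files begin with `{\rtf` (as in the `{\RTF1` line of the question's output), so the sketch below reads only the first few bytes and compares them. The class name `RtfSniffer`, the method `looksLikeRtf`, and the default file name `input.txt` are hypothetical; `InputStream.readNBytes(int)` requires Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Scanner;

public class RtfSniffer {

    // RTF files start with "{\rtf" followed by a version number (e.g. "{\rtf1")
    private static final byte[] RTF_SIGNATURE = "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given leading bytes begin with the RTF signature. */
    public static boolean looksLikeRtf(byte[] head) {
        return head.length >= RTF_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(head, RTF_SIGNATURE.length), RTF_SIGNATURE);
    }

    /** Reads only the first few bytes of the file and checks them for the RTF signature. */
    public static boolean looksLikeRtf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLikeRtf(in.readNBytes(RTF_SIGNATURE.length)); // Java 11+
        }
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a placeholder; pass the real file name as the first argument
        Path file = Paths.get(args.length > 0 ? args[0] : "input.txt");
        if (looksLikeRtf(file)) {
            System.err.println(file + " is RTF, not plain text: re-save it as plain text or parse it as RTF.");
            return;
        }
        try (Scanner sc = new Scanner(file)) {
            while (sc.hasNextLine()) {
                System.out.println(sc.nextLine().toUpperCase());
            }
        }
    }
}
```

This is a heuristic: a genuine plain-text file that happens to start with `{\rtf` would be flagged too, which is usually an acceptable trade-off for catching files accidentally saved as rich text.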



Source: https://stackoverflow.com/questions/26565967/java-scanner-reading-garbage
