Lazy CSV Filtering / Parsing - Increasing Performance

Submitted by 蹲街弑〆低调 on 2019-12-04 15:43:12

That's horribly slow for just 2.3 GB of data. May I suggest trying uniVocity-parsers for better performance? Try this:

import java.io.File;
import java.util.Arrays;

import com.univocity.parsers.common.ParsingContext;
import com.univocity.parsers.common.processor.AbstractRowProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

// ...

File file = new File("/path/to/your.csv"); // placeholder path to your CSV file

CsvParserSettings settings = new CsvParserSettings();
settings.setHeaderExtractionEnabled(true); // grabs headers from the input

// select the fields you are interested in. They come back in the order you list
// them, so putting the two used for filtering first keeps the indexes simple.
settings.selectFields("API_Call", "Remove"/*, ... and everything else you are interested in*/);

// defines a processor to filter the rows you want
settings.setProcessor(new AbstractRowProcessor() {
    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        // row[0] is "API_Call" and row[1] is "Remove" due to the selection order above
        if (row[0].equals("Updates") && row[1].isEmpty()) {
            System.out.println(Arrays.toString(row));
        }
    }
});

// create the parser
CsvParser parser = new CsvParser(settings);

// parses everything; every row is sent to the processor defined above
parser.parse(file, "UTF-8");
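
If you want the matches on disk instead of on stdout, the same processor can stream each one to an output CSV as it is found, so nothing accumulates in memory. This is only a sketch under my assumptions: filtered.csv is a hypothetical output path, and it would replace the setProcessor call above.

import java.io.File;

import com.univocity.parsers.common.ParsingContext;
import com.univocity.parsers.common.processor.AbstractRowProcessor;
import com.univocity.parsers.csv.CsvWriter;
import com.univocity.parsers.csv.CsvWriterSettings;

settings.setProcessor(new AbstractRowProcessor() {
    private CsvWriter writer;

    @Override
    public void processStarted(ParsingContext context) {
        // open the output file once, before the first row is parsed
        writer = new CsvWriter(new File("filtered.csv"), new CsvWriterSettings()); // hypothetical path
    }

    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        if (row[0].equals("Updates") && row[1].isEmpty()) {
            writer.writeRow(row); // stream each match straight to disk
        }
    }

    @Override
    public void processEnded(ParsingContext context) {
        writer.close(); // flush buffers and release the file handle
    }
});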

I know it's not functional-style code, but it took 20 seconds to process a 4 GB file I created to test this, while consuming less than 75 MB of memory the whole time. From your graph, your current approach takes 1 minute for a smaller file and needs 10 times as much memory.
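
If you'd prefer something closer to lazy, pull-based iteration, the parser can also hand rows back one at a time instead of pushing them through a processor, via beginParsing and parseNext. A minimal sketch, with input.csv as a placeholder path:

import java.io.File;
import java.util.Arrays;

import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

CsvParserSettings settings = new CsvParserSettings();
settings.setHeaderExtractionEnabled(true);
settings.selectFields("API_Call", "Remove");

CsvParser parser = new CsvParser(settings);
parser.beginParsing(new File("input.csv"), "UTF-8"); // placeholder input path

String[] row;
while ((row = parser.parseNext()) != null) { // pulls one row at a time, lazily
    if (row[0].equals("Updates") && row[1].isEmpty()) {
        System.out.println(Arrays.toString(row));
    }
}
// parseNext() returns null at the end of the input; call parser.stopParsing()
// if you want to bail out before reaching the end.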

Give this example a try; I believe it will help considerably.

Disclaimer: I'm the author of this library. It's open source and free (Apache 2.0 license).
