How can I process a large file via CSVParser?

Submitted by 筅森魡賤 on 2019-11-30 02:57:23

Question


I have a large .csv file (about 300 MB) that is read from a remote host and written to a target file, but I don't need to copy every line. While copying, I need to read each line from the source and, if it passes some predicate, add it to the target file.

I suppose that Apache Commons CSV (org.apache.commons.csv) can only parse a whole file:

CSVFormat csvFileFormat = CSVFormat.EXCEL.withHeader();
// The CSVParser constructor takes a Reader, not a path string
CSVParser csvFileParser = new CSVParser(new FileReader("filePath"), csvFileFormat);
List<CSVRecord> csvRecords = csvFileParser.getRecords();

so I can't use a BufferedReader. With this code, I would have to create a new CSVParser instance for each line, which looks inefficient.

How can I parse a single line (with a known table header) in the case above?


Answer 1:


No matter what you do, all of the data in the file has to reach your local machine, because your code must parse each line to decide whether it qualifies. Whether the parser reads the remote file directly or you copy the whole file first and parse the copy, every byte crosses the network. You have to get the data local and then trim the excess.

Calling csvFileParser.getRecords() is already a lost battle, because the documentation explains that this method loads every row of the file into memory. To conserve memory, iterate over the records instead; the documentation implies that the following code holds one record in memory at a time:

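// Released versions of Commons CSV take a Charset here; UTF-8 is an assumption, use the file's real encoding
CSVParser csvFileParser = CSVParser.parse(new File("filePath"), StandardCharsets.UTF_8, csvFileFormat);

for (CSVRecord csvRecord : csvFileParser) {
     ... // qualify the csvRecord; output qualified row to new file and flush as needed.
}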

Since you explained that "filePath" is not local, the above solution is prone to failure due to connectivity issues. To eliminate connectivity issues, I recommend you copy the entire remote file over to local, ensure the file copied accurately by comparing checksums, parse the local copy to create your target file, then delete the local copy after completion.
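
The checksum comparison mentioned above could look roughly like the sketch below. It is only an illustration: the answer does not name an algorithm, so SHA-256 is an assumption, as is the idea that the remote host can give you a hex digest of the original file. The class and method names (LocalCopyCheck, sha256Hex, matches) are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of Commons CSV.
final class LocalCopyCheck {

    /** Streams the file through SHA-256 in 8 KiB chunks, so memory use stays flat. */
    static String sha256Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** True when the local copy hashes to the digest reported for the remote file. */
    static boolean matches(Path localCopy, String expectedHex) throws IOException {
        return sha256Hex(localCopy).equalsIgnoreCase(expectedHex.trim());
    }
}
```

If matches(...) returns false, the copy is probably truncated or corrupted, and repeating the transfer before parsing would be the safer choice.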




Answer 2:


This is a late response, but you CAN use a BufferedReader with the CSVParser:

try (BufferedReader reader = new BufferedReader(new FileReader(fileName), 1048576 * 10)) { // 10 MiB buffer
    Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(reader);
    for (CSVRecord line : records) {
        // Process each line here
    }
} catch (IOException e) {
    // handle exceptions from your BufferedReader here
}
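
For the filter-and-copy case in the question, the "process each line" step might be sketched as below. Note that this deliberately swaps in plain java.io line reading instead of Commons CSV, so it only holds under the assumption that the first line is the header and that no quoted field contains a line break; for real CSV quoting rules, the CSVRecord loop above is the safer route. LineFilter and copyMatching are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.function.Predicate;

// Hypothetical helper; line-based, so it does not understand CSV quoting.
final class LineFilter {

    /**
     * Copies the header line plus every data line accepted by keep.
     * Only one line is held in memory at a time.
     * Returns the number of data lines written; the caller closes both streams.
     */
    static long copyMatching(Reader source, Writer target, Predicate<String> keep)
            throws IOException {
        BufferedReader in = new BufferedReader(source);
        BufferedWriter out = new BufferedWriter(target);
        long written = 0;

        String line = in.readLine();
        if (line != null) {
            // Assumed: the first line is the header and is always kept.
            out.write(line);
            out.write('\n');
        }
        while ((line = in.readLine()) != null) {
            if (keep.test(line)) {
                out.write(line);
                out.write('\n');
                written++;
            }
        }
        out.flush();
        return written;
    }
}
```

A call such as LineFilter.copyMatching(new FileReader(fileName), new FileWriter(targetName), l -> l.endsWith(",Oslo")) would keep only the rows ending in ",Oslo"; targetName and the predicate are placeholders.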


Source: https://stackoverflow.com/questions/32123969/how-can-i-process-a-large-file-via-csvparser
