Best strategy for processing large CSV files in Apache Camel

Anonymous (unverified), submitted 2019-12-03 08:59:04

Question:

I'd like to develop a route that polls a directory containing CSV files and, for every file, unmarshals each row using Bindy and queues it in ActiveMQ.

The problem is that the files can be pretty large (a million rows), so I'd prefer to queue one row at a time. Instead, Bindy gives me all the rows in a single java.util.ArrayList at the end, which causes memory problems.

So far I have a little test and unmarshaling is working, so the Bindy configuration using annotations is OK.

Here is the route:

from("file://data/inbox?noop=true&maxMessagesPerPoll=1&delay=5000")
    .unmarshal()
    .bindy(BindyType.Csv, "com.ess.myapp.core")
    .to("jms:rawTraffic");

The environment is: Eclipse Indigo, Maven 3.0.3, Camel 2.8.0.

Thank you

Answer 1:

If you use the Splitter EIP, you can enable streaming mode, which means Camel will process the file on a row-by-row basis:

from("file://data/inbox?noop=true&maxMessagesPerPoll=1&delay=5000")
    .split(body().tokenize("\n")).streaming()
        .unmarshal().bindy(BindyType.Csv, "com.ess.myapp.core")
        .to("jms:rawTraffic");
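The effect of `tokenize("\n")` with `.streaming()` can be illustrated in plain Java: rows are read and handed off one at a time, so memory use stays roughly constant no matter how large the file is. This is only a sketch of the idea, not Camel's actual implementation; the class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.function.Consumer;

public class StreamingSplitSketch {

    // Process a CSV source line by line, analogous to
    // .split(body().tokenize("\n")).streaming() in the Camel route:
    // only one row is held in memory at a time, never the whole file.
    static int streamRows(BufferedReader reader, Consumer<String> handler) throws IOException {
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue;                 // skip blank rows
            }
            handler.accept(line);         // e.g. unmarshal the row and send it to JMS
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String csv = "a,1\nb,2\nc,3\n";
        int rows = streamRows(new BufferedReader(new StringReader(csv)), row -> {});
        System.out.println(rows); // prints 3
    }
}
```

The key point is that nothing ever aggregates the rows back into a list; each row leaves scope as soon as the handler returns, so it can be garbage-collected.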


Answer 2:

For the record, and for other users who might have searched for this as long as I did: there now seems to be an easier method which also works well with useMaps:

CsvDataFormat csv = new CsvDataFormat()
    .setLazyLoad(true)
    .setUseMaps(true);

from("file://data/inbox?noop=true&maxMessagesPerPoll=1&delay=5000")
    .unmarshal(csv)
    .split(body()).streaming()
    .to("log:mappedRow?multiline=true");
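What `setUseMaps(true)` produces can be sketched in plain Java: the first CSV line is treated as a header, and each subsequent row becomes a Map keyed by the header's column names. The helper and the column names below are hypothetical, shown only to illustrate the shape of the data each split iteration receives.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvMapSketch {

    // Combine a header row and a data row into the kind of Map that
    // CsvDataFormat with setUseMaps(true) delivers for each CSV row.
    static Map<String, String> rowToMap(String[] header, String[] row) {
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < header.length && i < row.length; i++) {
            result.put(header[i], row[i]); // column name -> cell value
        }
        return result;
    }

    public static void main(String[] args) {
        String[] header = {"id", "name"};   // hypothetical column names
        Map<String, String> row = rowToMap(header, new String[]{"1", "Alice"});
        System.out.println(row); // prints {id=1, name=Alice}
    }
}
```

Combined with `setLazyLoad(true)` and `.split(body()).streaming()`, these maps are produced one at a time rather than all at once, which is what keeps memory usage flat for large files.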

