Here are a few more proposals, each weighed against the following concerns:
- scalability (file size, clustering, etc.)
- batch architecture (job recovery, error handling, monitoring, etc.)
- compliance with J2EE
With JCA
JCA connectors belong to the Java EE stack and permit inbound/outbound connectivity from/to the EJB world. JDBC and JMS are usually implemented as JCA connectors. An inbound JCA connector can use threads (through the WorkManager abstraction) and transactions. It can then forward any processing to a message-driven bean (MDB).
- Write a JCA connector that polls for new files, processes them, and delegates further processing to a message-driven bean synchronously.
- The MDB can then persist the information in the database with JPA.
- The JCA connector has control over the transaction, and several MDB invocations can take part in the same transaction.
- The file system is not transactional, so you will need to figure out how to deal with errors such as faulty input files.
- You can probably use streaming (InputStream) all along the pipeline.
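One possible way to cope with the non-transactional file system is to stream the file record by record and then move it to a "done" or "error" directory depending on the outcome. The sketch below uses only the JDK; `FileArchiver`, `RecordHandler` and the directory layout are hypothetical names chosen for illustration, and the handler merely stands in for whatever MDB or EJB call persists a record.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: streams one input file, then files it under done/ or error/. */
final class FileArchiver {

    /** Hypothetical callback standing in for the bean that persists one record. */
    interface RecordHandler {
        void handle(String line) throws Exception;
    }

    /**
     * Reads the file line by line (it is never loaded fully into memory),
     * then moves it to doneDir on success or errorDir on failure.
     * Returns true when every line was handled without an exception.
     */
    static boolean processAndArchive(Path file, Path doneDir, Path errorDir,
                                     RecordHandler handler) throws IOException {
        Files.createDirectories(doneDir);
        Files.createDirectories(errorDir);
        boolean ok = true;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.handle(line);
            }
        } catch (Exception e) {
            // In a container, the surrounding transaction would be rolled back here.
            ok = false;
        }
        // The reader is closed at this point, so the file can be moved safely.
        Path target = (ok ? doneDir : errorDir).resolve(file.getFileName());
        Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
        return ok;
    }
}
```

Faulty files end up in the error directory for later inspection instead of being picked up again on the next poll. Note that the move itself is not part of the database transaction, so a crash between commit and move can still leave a processed file in the inbox.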
With plain threads
We can achieve more or less the same result as with JCA by using threads launched from a servlet context listener (or possibly an EJB timer).
- The thread polls for new files; when a file is found, it processes it and delegates further processing to a regular stateless session bean (SLSB) synchronously.
- Threads in the web container have access to UserTransaction and can control the transaction.
- The EJB can be local, so that the InputStream is passed by reference.
- The web module and the EJB module can be deployed together in an EAR.
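The polling thread described above could look roughly like the following. This is a minimal JDK-only sketch, not container code: `DirectoryPoller` and `FileProcessor` are hypothetical names, `FileProcessor` stands in for the local SLSB interface, and the `*.csv` filter is an assumption. In a real deployment, `start` and `stop` would be called from the listener's `contextInitialized` and `contextDestroyed` methods, and the `UserTransaction` would be begun and committed around the `process` call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical poller: scans an inbox directory and hands each file to a processor. */
final class DirectoryPoller {

    /** Hypothetical stand-in for the local SLSB business interface. */
    interface FileProcessor {
        void process(String fileName, InputStream content) throws Exception;
    }

    private final Path inbox;
    private final FileProcessor processor;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    DirectoryPoller(Path inbox, FileProcessor processor) {
        this.inbox = inbox;
        this.processor = processor;
    }

    /** Starts periodic polling; a single thread avoids concurrent pick-up of a file. */
    void start(long periodSeconds) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollOnce();
            } catch (Exception e) {
                // Swallowed (log it in real code) so the schedule keeps running.
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    /** Performs one scan and returns the number of files processed successfully. */
    int pollOnce() throws IOException {
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inbox, "*.csv")) {
            for (Path p : stream) {
                candidates.add(p);
            }
        }
        int processed = 0;
        for (Path file : candidates) {
            try (InputStream in = Files.newInputStream(file)) {
                // The stream is passed by reference, as with a local EJB call.
                processor.process(file.getFileName().toString(), in);
            } catch (Exception e) {
                continue; // file stays in the inbox and is retried on the next poll
            }
            Files.delete(file);
            processed++;
        }
        return processed;
    }
}
```

As written, a file that always fails is retried on every poll; in practice it should be moved aside after some number of attempts.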
With JMS
To avoid the need for several concurrent polling threads and the problem of job acquisition/locking, the actual processing can be done asynchronously using JMS. JMS can also be useful for splitting the processing into smaller tasks.
- A periodic task polls for new files. If a file is found, a JMS message is queued.
- When the JMS message is delivered, the file is read and processed, and the information is persisted in the database with JPA.
- If JMS processing fails, the app server may retry automatically or put the message in the dead message queue.
- Monitoring and error handling are more complicated.
- You can probably use streaming.
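To make the retry and dead-message behaviour more concrete, here is a small in-memory simulation. It deliberately does not use the JMS API: `RedeliverySketch`, `Listener` and `maxAttempts` are hypothetical, the plain `Queue<String>` only stands in for a JMS queue whose messages carry a file path, and real redelivery policies are configured in the app server rather than coded like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** In-memory simulation of redelivery and dead-lettering; not real JMS. */
final class RedeliverySketch {

    /** Hypothetical stand-in for an MDB's onMessage, receiving a file path. */
    interface Listener {
        void onMessage(String filePath) throws Exception;
    }

    /**
     * Delivers every queued message, retrying each one up to maxAttempts times.
     * Returns the messages that failed on every attempt (the "dead" messages).
     */
    static List<String> deliverAll(Queue<String> queue, Listener listener, int maxAttempts) {
        List<String> deadLetters = new ArrayList<>();
        String message;
        while ((message = queue.poll()) != null) {
            boolean delivered = false;
            for (int attempt = 1; attempt <= maxAttempts && !delivered; attempt++) {
                try {
                    listener.onMessage(message);
                    delivered = true;
                } catch (Exception e) {
                    // A real server would roll back the transaction and redeliver.
                }
            }
            if (!delivered) {
                deadLetters.add(message);
            }
        }
        return deadLetters;
    }
}
```

Because the message carries only the file path, the file itself can still be streamed when the message is processed. The dead-letter list is what someone would have to monitor, which is part of why error handling gets more involved with this approach.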
With ESB
Many projects have emerged in recent years to deal with integration: JBI, ServiceMix, OpenESB, Mule, Spring Integration, Java CAPS, BPEL. Some are technologies, some are platforms, and there is some overlap between them. They all come with a wide range of connectors to route, transform and orchestrate message flows. IMHO, messages are supposed to be small pieces of information, and it may be hard to use these technologies to process your large data files. The Patterns of Enterprise Application Integration website is an excellent source of more information.
IMO, the approach that best fits the Java EE philosophy is JCA, but the effort to invest is relatively high. In your case, using plain threads that delegate further processing to an SLSB is maybe the easiest solution. The JMS approach (close to the proposal of P. Thivent) can be interesting if the processing pipeline gets more complicated. Using an ESB seems overkill to me.