Question
I am currently designing a correlation engine in Java that extracts data from PDF files and correlates it with structured data from a relational database, raising alerts where necessary.
Focusing on the processing of the PDF files, the system consists of:
A component that performs the custom extraction from the PDF.
A component that parses the sometimes unordered, unclean data into the required data structures.
A normalisation component that normalises the values for comparison.
A component that interfaces with the DB (where the extracted data will be inserted alongside the rest of the data).
The components should be reusable in other processing chains but they will all run on the same system initially.
I think it's wise to have some sort of buffering between components. Would JMS queueing be appropriate, or would it overcomplicate matters? I have been experimenting with a simple LinkedBlockingQueue, but the queue has to be passed between components, which requires a master component that drives everything, and I am not sure that is desirable. Is there a standard way of approaching this problem?
Answer 1:
I would use chained calls unless you have additional requirements.
loadPDF(new PDFExtractor(new PDFParser(new Normalizer(new DBEnricher(listener)))));
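To make the chaining idea concrete, here is a minimal sketch in which each stage holds a reference to the next one and pushes its output straight into it, so no queue or master component is needed. The Listener interface, the Parser and Normalizer bodies, the semicolon-separated input, and the class name ChainSketch are all assumptions made for illustration; they are not taken from the one-liner above, and a real extractor would hand over PDF content rather than a plain string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ChainSketch {
    /** Hypothetical callback implemented by every stage (an assumption, not from the answer). */
    interface Listener<T> {
        void onItem(T item);
    }

    /** Splits raw extracted text into trimmed, non-empty fields and forwards each one. */
    static class Parser implements Listener<String> {
        private final Listener<String> next;

        Parser(Listener<String> next) {
            this.next = next;
        }

        @Override
        public void onItem(String rawText) {
            for (String field : rawText.split(";")) {
                String trimmed = field.trim();
                if (!trimmed.isEmpty()) {
                    next.onItem(trimmed);
                }
            }
        }
    }

    /** Lower-cases each value so later comparisons are case-insensitive. */
    static class Normalizer implements Listener<String> {
        private final Listener<String> next;

        Normalizer(Listener<String> next) {
            this.next = next;
        }

        @Override
        public void onItem(String value) {
            next.onItem(value.toLowerCase(Locale.ROOT));
        }
    }

    /** Wires the chain once, feeds it one input, and returns what reached the final stage. */
    static List<String> run(String rawText) {
        List<String> sink = new ArrayList<>(); // stands in for the DB-facing stage
        Listener<String> chain = new Parser(new Normalizer(sink::add));
        chain.onItem(rawText);
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(" Alice ; BOB;; carol "));
    }
}
```

Because each stage only depends on the Listener interface, a stage can be reused in another chain by passing it a different downstream listener.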
If you want multiple threads, I would process each file in a different thread using an ExecutorService thread pool.
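The thread-per-file suggestion could look roughly like the sketch below, which submits one task per file to a fixed-size pool. The processFile method is a hypothetical placeholder for running the whole chain on one file, and the class name, file names, and pool size are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerFilePoolSketch {
    /** Hypothetical placeholder for running the full chain on a single file. */
    static String processFile(String fileName) {
        return "processed:" + fileName;
    }

    /** Processes every file on a fixed-size pool and returns results in input order. */
    static List<String> processAll(List<String> fileNames, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String name : fileNames) {
                tasks.add(() -> processFile(name));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks complete; futures come back in task order.
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
            throw new IllegalStateException("interrupted while processing files", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a file failed to process", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of("a.pdf", "b.pdf"), 2));
    }
}
```

With this arrangement the stages themselves stay single-threaded, so they need no internal synchronisation as long as each file gets its own chain instance.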
Source: https://stackoverflow.com/questions/6295677/queuing-jobs-in-a-processing-chain-in-java