Question
I have multiple log files, 1.csv, 2.csv and 3.csv, generated by a log report. I want to read and parse those files concurrently using Scriptella.
Answer 1:
Scriptella does not provide parallel job execution out of the box. Instead, use a scheduler provided by the operating system or the programming environment, e.g. run multiple ETL files by submitting jobs to a Java ExecutorService.
Here is a working example that imports a single file whose name is passed to the ETL file as a property:
ETL file (parallel.csv.etl.xml):
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
  <connection id="in" driver="csv" url="$input"/>
  <connection id="out" driver="text"/>
  <query connection-id="in">
    <script connection-id="out">
      Importing: $1, $2
    </script>
  </query>
</etl>
Java code to run files in parallel:
//Imports 3 CSV files in parallel using a fixed thread pool
import java.io.File;
import java.net.MalformedURLException;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.*;

import scriptella.execution.EtlExecutor;
import scriptella.execution.EtlExecutorException;
import scriptella.execution.ExecutionStatistics;

public class ParallelCsvTest {
    public static void main(String[] args) throws EtlExecutorException, MalformedURLException, InterruptedException {
        final ExecutorService service = Executors.newFixedThreadPool(3);
        for (int i = 1; i <= 3; i++) {
            //Pass the file name as a parameter to the ETL file, e.g. input1.csv, input2.csv, input3.csv
            final Map<String, ?> map = Collections.singletonMap("input", "input" + i + ".csv");
            EtlExecutor executor = EtlExecutor.newExecutor(new File("parallel.csv.etl.xml").toURI().toURL(), map);
            service.submit((Callable<ExecutionStatistics>) executor);
        }
        service.shutdown();
        service.awaitTermination(10, TimeUnit.SECONDS);
    }
}
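As a small extension (a sketch, not part of the original answer), the submitted jobs can also be kept as Futures so that any failure inside an ETL run is rethrown by Future.get() and each file's ExecutionStatistics summary can be printed. It assumes the same ETL file name and input naming convention as the example above.
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;

import scriptella.execution.EtlExecutor;
import scriptella.execution.ExecutionStatistics;

//Sketch: same setup as above, but the Futures are collected so results and errors can be inspected
public class ParallelCsvWithResults {
    public static void main(String[] args) throws Exception {
        ExecutorService service = Executors.newFixedThreadPool(3);
        List<Future<ExecutionStatistics>> results = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            Map<String, ?> map = Collections.singletonMap("input", "input" + i + ".csv");
            EtlExecutor executor = EtlExecutor.newExecutor(
                    new File("parallel.csv.etl.xml").toURI().toURL(), map);
            results.add(service.submit((Callable<ExecutionStatistics>) executor));
        }
        service.shutdown();
        for (Future<ExecutionStatistics> f : results) {
            //get() blocks until that ETL job finishes and rethrows its exception if the job failed
            System.out.println(f.get());
        }
    }
}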
To run this example, create three CSV files, input1.csv, input2.csv and input3.csv, and put them in the current working directory. Example CSV file:
Level, Message
INFO,Process 1 started
INFO,Process 1 stopped
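For reference (not shown in the original answer), assuming the csv driver's default behavior of treating the first row as column headers, each job's output through the text connection should look roughly like:
Importing: INFO, Process 1 started
Importing: INFO, Process 1 stopped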
Source: https://stackoverflow.com/questions/12383025/how-to-etl-multiple-files-using-scriptella