I am just getting started with Google Cloud Dataflow. I have written a simple flow that reads a CSV file from Cloud Storage. One of the steps involves calling a web service to
You can buffer elements in a local member variable of your DoFn and call your web service when the buffer is large enough, and again in finishBundle to flush whatever is left. For example:
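// Written against the Dataflow SDK 1.x DoFn API (overridden methods).
// Apache Beam 2.x uses @ProcessElement / @FinishBundle annotations instead.
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import java.util.ArrayList;
import java.util.List;

class CallServiceFn extends DoFn<String, String> {
    // Example value only; pick a batch size that suits your service.
    private static final int MAX_CALL_SIZE = 100;
    private List<String> elements = new ArrayList<>();

    @Override
    public void processElement(ProcessContext c) {
        elements.add(c.element());
        if (elements.size() >= MAX_CALL_SIZE) {
            // callServiceWithData is your own method that sends one batched request.
            for (String result : callServiceWithData(elements)) {
                c.output(result);
            }
            elements.clear();
        }
    }

    @Override
    public void finishBundle(Context c) {
        // Flush the leftovers, then clear so a reused instance does not resend them.
        for (String result : callServiceWithData(elements)) {
            c.output(result);
        }
        elements.clear();
    }
}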
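Note that a GroupIntoBatches transform has since been added (in Apache Beam) to make this even easier: it groups the values of a keyed PCollection into batches of up to a given size.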
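
If you want to check the buffer-and-flush logic of CallServiceFn without running a pipeline, here is a minimal sketch in plain Java. BatchBuffer and the fake upper-casing "service" are hypothetical names made up for illustration and are not part of the Dataflow SDK: add() plays the role of processElement, flush() plays the role of finishBundle, and the Consumer stands in for c.output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical helper: buffers items and calls a batch function once the buffer is full. */
class BatchBuffer<T, R> {
    private final int maxBatchSize;
    private final Function<List<T>, List<R>> batchCall; // stands in for the web service call
    private final Consumer<R> output;                    // stands in for c.output(...)
    private final List<T> buffer = new ArrayList<>();

    BatchBuffer(int maxBatchSize, Function<List<T>, List<R>> batchCall, Consumer<R> output) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.batchCall = batchCall;
        this.output = output;
    }

    /** Mirrors processElement: buffer the item and flush when the batch is full. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Mirrors finishBundle: send whatever is left, then reset the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return; // nothing to send, so skip the call entirely
        }
        // Pass a copy so the callee never holds a list that is cleared afterwards.
        for (R result : batchCall.apply(new ArrayList<>(buffer))) {
            output.accept(result);
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        List<String> results = new ArrayList<>();
        // Fake service: records each batch size and upper-cases every element.
        BatchBuffer<String, String> batcher = new BatchBuffer<String, String>(
            3,
            batch -> {
                batchSizes.add(batch.size());
                List<String> out = new ArrayList<>();
                for (String s : batch) {
                    out.add(s.toUpperCase());
                }
                return out;
            },
            results::add);
        for (String s : new String[] {"a", "b", "c", "d", "e", "f", "g"}) {
            batcher.add(s);
        }
        batcher.flush(); // end of the "bundle"
        System.out.println(batchSizes); // prints [3, 3, 1]
        System.out.println(results);    // prints [A, B, C, D, E, F, G]
    }
}
```

Seven elements with a batch size of 3 give two full calls plus one final call for the leftover element, which is exactly the case the finishBundle flush is there to handle.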