Someone know how to get Filename when using file pattern match in google-cloud-dataflow?
I\'m newbee to use dataflow. How to get filename when use file patten match,
This might be a very late post for the above question, but I wanted to add answer with Beam bundled classes.
This could also be seen as an extracted code from the solution provided by @Reza Rokni.
PCollection listOfFilenames =
pipe.apply(FileIO.match().filepattern("gs://apache-beam-samples/shakespeare/*"))
.apply(FileIO.readMatches())
.apply(
MapElements.into(TypeDescriptors.strings())
.via(
(FileIO.ReadableFile file) -> {
String f = file.getMetadata().resourceId().getFilename();
System.out.println(f);
return f;
}));
pipe.run().waitUntilFinish();
Above PCollection
will have a list of files available at any provided directory.