How to Get Filename when using file pattern match in google-cloud-dataflow

无人共我 2020-12-06 14:10

Does anyone know how to get the filename when using file pattern matching in google-cloud-dataflow?

I'm new to Dataflow. How do I get the filename of each file when reading with a file pattern match?

5 answers
  •  既然无缘
    2020-12-06 14:39

    This may be a very late reply to the question above, but I wanted to add an answer that uses only classes bundled with Beam.

    It can also be seen as code extracted from the solution provided by @Reza Rokni.

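    import org.apache.beam.sdk.io.FileIO;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.TypeDescriptors;

    // `pipe` is an existing org.apache.beam.sdk.Pipeline.
    PCollection<String> listOfFilenames =
        pipe.apply(FileIO.match().filepattern("gs://apache-beam-samples/shakespeare/*"))
            .apply(FileIO.readMatches())
            .apply(
                MapElements.into(TypeDescriptors.strings())
                    .via(
                        (FileIO.ReadableFile file) -> {
                          // getFilename() returns only the last path component;
                          // resourceId().toString() gives the full path.
                          String f = file.getMetadata().resourceId().getFilename();
                          System.out.println(f);
                          return f;
                        }));

    pipe.run().waitUntilFinish();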
    

    The resulting PCollection contains the filename of every file matched by the given file pattern.
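
If the filename is needed together with each file's contents, the same `FileIO.match()` / `FileIO.readMatches()` chain can feed a `DoFn` instead of `MapElements`. The following is a minimal sketch and not part of the original answer: the class name `FilenameWithContents`, the `DoFn` name `ToFilenameAndContents`, and the helper `filenamesWithContents` are hypothetical names chosen for illustration, and it assumes each matched file is small enough to be read fully into memory.

```java
import java.io.IOException;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class FilenameWithContents {

  /** Hypothetical DoFn: emits (filename, full file contents) for each matched file. */
  static class ToFilenameAndContents extends DoFn<FileIO.ReadableFile, KV<String, String>> {
    @ProcessElement
    public void processElement(
        @Element FileIO.ReadableFile file, OutputReceiver<KV<String, String>> out)
        throws IOException {
      // Last path component only, e.g. "kinglear.txt".
      String filename = file.getMetadata().resourceId().getFilename();
      // Assumption: the file fits in memory; this reads the whole file as UTF-8.
      out.output(KV.of(filename, file.readFullyAsUTF8String()));
    }
  }

  /** Hypothetical helper: builds the match -> read -> (filename, contents) chain. */
  static PCollection<KV<String, String>> filenamesWithContents(Pipeline pipe, String pattern) {
    return pipe.apply(FileIO.match().filepattern(pattern))
        .apply(FileIO.readMatches())
        .apply(ParDo.of(new ToFilenameAndContents()));
  }

  public static void main(String[] args) {
    Pipeline pipe = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // Same sample pattern as in the answer above.
    filenamesWithContents(pipe, "gs://apache-beam-samples/shakespeare/*");
    pipe.run().waitUntilFinish();
  }
}
```

Using a `DoFn` here keeps the checked `IOException` from `readFullyAsUTF8String()` explicit. For large files, reading line by line while carrying the filename along would probably be a better fit than loading everything at once.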
