Skipping header rows - is it possible with Cloud DataFlow?

天命终不由人 2020-12-20 18:36

I've created a pipeline that reads a file from GCS, transforms it, and finally writes to a BigQuery table. The file contains a header row (the field names).

Is there any way to skip the header row?

1 Answer
  •  伪装坚强ぢ
    2020-12-20 18:50

    This is not currently possible. It sounds like there are two potential requests here:

    • Specifying presence and skip behavior for header lines for a BigQuery import.
    • Specifying that a GCS text source should skip a header line.

    Future work on this is tracked in https://issues.apache.org/jira/browse/BEAM-123.

    In the meantime, you can add a simple filter step to your pipeline that drops header lines. Something like this:

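    import org.apache.beam.sdk.transforms.Filter;
    import org.apache.beam.sdk.values.PCollection;

    PCollection<String> rows = ...;  // one String per line, e.g. from TextIO.read()
    PCollection<String> nonHeaders =
        rows.apply(Filter.by(new MatchIfNonHeader()));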
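    `MatchIfNonHeader` is not a Beam class; it stands for a predicate you write yourself. A minimal sketch, assuming the header text is known ahead of time (the `HEADER_LINE` value below is a hypothetical placeholder) and that each element of `rows` is one line of the file. It implements Beam's `SerializableFunction<String, Boolean>`, which is what `Filter.by` accepts and which lets the function be serialized and shipped to workers:

    ```java
    import org.apache.beam.sdk.transforms.SerializableFunction;

    /**
     * Hypothetical predicate for Filter.by: returns true (keep the element)
     * for every line except the header line.
     */
    public class MatchIfNonHeader implements SerializableFunction<String, Boolean> {
      // Assumption: placeholder header text; replace with your file's actual field names.
      static final String HEADER_LINE = "id,name,email";

      @Override
      public Boolean apply(String line) {
        // Trim so trailing whitespace or a stray '\r' does not defeat the comparison.
        return !line.trim().equals(HEADER_LINE);
      }
    }
    ```

    Because this matches on content rather than position, it also drops the header from every file when the input is a glob of several files. The trade-off is that a data row identical to the header would be dropped as well, which is rarely a concern for real data.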
    
