Skipping header rows - is it possible with Cloud DataFlow?

天命终不由人

I've created a Pipeline, which reads from a file in GCS, transforms it, and finally writes to a BQ table. The file contains a header row (fields).

Is there any way

1 Answer

伪装坚强ぢ

2020-12-20 18:50
This is not currently possible. It sounds like there are two potential requests here:
- Specifying presence and skip behavior for header lines for a BigQuery import.
- Specifying that a GCS text source should skip a header line.
Future work on this is tracked in https://issues.apache.org/jira/browse/BEAM-123.

Also, in the meantime, you could add a simple filter to your ParDo code to skip headers. Something like this:
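```
PCollection<X> rows = ...;
// MatchIfNonHeader is a user-defined SerializableFunction<X, Boolean>
// that returns false for the header row and true for everything else.
PCollection<X> nonHeaders =
    rows.apply(Filter.by(new MatchIfNonHeader()));
```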
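The answer leaves `MatchIfNonHeader` undefined, so here is a minimal sketch of what such a predicate might look like. It rests on several assumptions: the rows are plain `String` lines, the exact header text is known in advance (the `"id,name"` value is made up for illustration), and no data row is identical to the header. To keep the example runnable without the SDK on the classpath, it implements `java.util.function.Predicate` and is exercised on an ordinary Java list. In a real pipeline the same comparison would go inside the `SerializableFunction` that `Filter.by` expects.

```java
import java.io.Serializable;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * Hypothetical predicate that rejects the header line of a text file.
 * Assumption: the header text is known up front and never occurs as a data row.
 */
class MatchIfNonHeader implements Predicate<String>, Serializable {
    private final String header;

    MatchIfNonHeader(String header) {
        this.header = header;
    }

    /** Returns true for every line except one that equals the header. */
    @Override
    public boolean test(String line) {
        return !header.equals(line);
    }

    public static void main(String[] args) {
        // A plain list stands in for the PCollection from the answer.
        List<String> rows = List.of("id,name", "1,alice", "2,bob");
        List<String> nonHeaders = rows.stream()
            .filter(new MatchIfNonHeader("id,name"))
            .collect(Collectors.toList());
        nonHeaders.forEach(System.out::println);
    }
}
```

The predicate matches on content rather than on position because the elements of a `PCollection` are unordered, so an element-wise filter has no notion of "the first line".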