Skip the header line while reading a CSV file in Apache Beam

花落未央 2021-01-06 05:52

I want to skip the header line of a CSV file. At the moment I'm removing the header manually before uploading the file to Google Cloud Storage.

Below is my code:

PCo         


        
3 answers
  •  情歌与酒
    2021-01-06 06:15

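    This code works for me. I used Filter.by() to filter the header row out of the CSV file: the predicate drops any row that starts with the first column name (dwid here), whether it is unquoted, double-quoted, or single-quoted.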

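    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.PipelineResult;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.io.jdbc.JdbcIO;
    import org.apache.beam.sdk.transforms.Filter;

    static void run(GcsToDbOptions options) {
      Pipeline p = Pipeline.create(options);

      // Read the CSV file line by line from the GCS input file path
      p.apply("Read Rows from " + options.getInputFile(),
              TextIO.read().from(options.getInputFile()))
          // Drop the header row: it starts with the first column name (dwid),
          // either bare, double-quoted, or single-quoted
          .apply("Remove header row",
              Filter.by((String row) -> !(row.startsWith("dwid")
                  || row.startsWith("\"dwid\"")
                  || row.startsWith("'dwid'"))))
          // Write the remaining rows to the database using a prepared statement.
          // The explicit <String> type witness lets the writer accept a PCollection<String>.
          .apply("Write to Auths Table in Postgres",
              JdbcIO.<String>write()
                  .withDataSourceConfiguration(
                      JdbcIO.DataSourceConfiguration.create(dataSource(options)))
                  .withStatement(INSERT_INTO_MYTABLE)
                  .withPreparedStatementSetter(new StatementSetter()));

      PipelineResult result = p.run();
      try {
        result.waitUntilFinish();
      } catch (UnsupportedOperationException e) {
        // Some runners do not support waiting for completion; nothing to do
      } catch (Exception e) {
        e.printStackTrace();
      }
    }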
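
    One caveat with the startsWith() check above: it also drops any data row whose first field merely begins with "dwid" (for example dwid_7). A minimal sketch of a stricter check, assuming the header can be recognised by its first field being exactly the column name, is below. CsvHeader, unquote and isHeader are hypothetical names made up for this illustration; they are not part of Apache Beam. The field split is naive (first comma only), so it is not a full CSV parser.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of Apache Beam): decides whether a CSV line is the header. */
final class CsvHeader {
    private CsvHeader() {}

    /** Strips one pair of matching single or double quotes around a field, if present. */
    static String unquote(String field) {
        String f = field.trim();
        if (f.length() >= 2) {
            char first = f.charAt(0);
            char last = f.charAt(f.length() - 1);
            if (first == last && (first == '"' || first == '\'')) {
                return f.substring(1, f.length() - 1);
            }
        }
        return f;
    }

    /** True when the first comma-separated field of the row equals the expected column name. */
    static boolean isHeader(String row, String firstColumn) {
        int comma = row.indexOf(',');
        String firstField = comma < 0 ? row : row.substring(0, comma);
        return unquote(firstField).equals(firstColumn);
    }

    public static void main(String[] args) {
        // Sample rows invented for the demo: a quoted header and two data rows
        List<String> rows = List.of("\"dwid\",name", "1,alice", "dwid_7,bob");
        List<String> data = rows.stream()
            .filter(row -> !isHeader(row, "dwid"))
            .collect(Collectors.toList());
        System.out.println(data); // prints [1,alice, dwid_7,bob]
    }
}
```

    In the pipeline, the same predicate could be plugged into the existing step as Filter.by((String row) -> !CsvHeader.isHeader(row, "dwid")), which keeps the rest of the code unchanged.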
    
