Flink - Why should I create my own RichSinkFunction instead of just opening and closing my PostgreSQL connection?

Posted by 不羁岁月 on 2020-05-14 11:25:48

Question


I would like to know why I really need to create my own RichSinkFunction or use JDBCOutputFormat to connect to the database, instead of just creating the connection, performing the query, and closing the connection with the plain PostgreSQL driver inside my SinkFunction.

I found many articles that say to do it this way, but none of them explain why. What is the difference?

Code example using JDBCOutputFormat,

JDBCOutputFormat jdbcOutput = JDBCOutputFormat.buildJDBCOutputFormat()
     .setDrivername("org.postgresql.Driver")
     .setDBUrl("jdbc:postgresql://localhost:1234/test?user=xxx&password=xxx")
     .setQuery(query)
     .setSqlTypes(new int[] { Types.VARCHAR, Types.VARCHAR, Types.VARCHAR }) //set the types
     .finish();

Code example implementing my own RichSinkFunction,

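import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class RichCaseSink extends RichSinkFunction<Case> {

  private static final String UPSERT_CASE = "INSERT INTO public.cases (caseid, tracehash) "
      + "VALUES (?, ?) "
      + "ON CONFLICT (caseid) DO UPDATE SET "
      + "  tracehash=?";

  // Created once in open(), reused for every record, released in close().
  private transient Connection connection;
  private transient PreparedStatement statement;

  @Override
  public void open(Configuration parameters) throws Exception {
    Class.forName("org.postgresql.Driver");
    connection =
        DriverManager.getConnection("jdbc:postgresql://localhost:5432/casedb?user=signavio&password=signavio");

    statement = connection.prepareStatement(UPSERT_CASE);
  }

  @Override
  public void invoke(Case aCase) throws Exception {
    statement.setString(1, aCase.getId());
    statement.setString(2, aCase.getTraceHash());
    statement.setString(3, aCase.getTraceHash());
    // A batch of one: each record is sent to the database immediately.
    statement.addBatch();
    statement.executeBatch();
  }

  @Override
  public void close() throws Exception {
    // Release the JDBC resources when the sink shuts down.
    if (statement != null) {
      statement.close();
    }
    if (connection != null) {
      connection.close();
    }
  }

}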

Why can't I just use the PostgreSQL driver directly?

public class Storable implements SinkFunction<Activity> {

    @Override
    public void invoke(Activity activity) throws Exception {
        Class.forName("org.postgresql.Driver");
        // A new connection and statement are created for every record.
        // UPSERT_CASE is the SQL string from the previous example.
        try (Connection connection =
                 DriverManager.getConnection("jdbc:postgresql://localhost:5432/casedb?user=signavio&password=signavio");
             PreparedStatement statement = connection.prepareStatement(UPSERT_CASE)) {

            // Perform the query

        } // statement and connection are closed here
    }

}

Does anyone know the technical reasoning behind this best practice in Flink? Does implementing a RichSinkFunction or using JDBCOutputFormat do something special?

Thank you in advance.


Answer 1:


You can write your own SinkFunction that opens the connection and writes the data inside invoke(), and in general it will work. In most cases, however, its performance will be very poor.

The real difference between the RichSinkFunction example and the plain SinkFunction example is that the RichSinkFunction uses the open() method to open the connection and prepare the statement. open() is invoked only once, when the function is initialized (once per parallel instance of the sink). In the plain SinkFunction example, the connection is opened and the statement is prepared inside invoke(), which is called for every element of the input DataStream, so you open a new connection for every element in the stream.

Creating a database connection is an expensive operation, so doing it once per element will badly hurt performance.
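
To make the difference concrete, here is a minimal sketch in plain Java, with no Flink or PostgreSQL dependency, that only counts how many connections each approach would open. The class and method names (ConnectionLifecycleDemo, PerRecordSink, LifecycleSink) are hypothetical stand-ins for illustration; the counter takes the place of a real DriverManager.getConnection(...) call.

```java
public class ConnectionLifecycleDemo {

    /** Mirrors the plain SinkFunction: a connection is opened inside invoke(). */
    static final class PerRecordSink {
        int connectionsOpened = 0;

        void invoke(String record) {
            connectionsOpened++; // stands in for DriverManager.getConnection(...)
            // write the record, then close the connection again
        }
    }

    /** Mirrors the RichSinkFunction: connect in open(), reuse in invoke(), release in close(). */
    static final class LifecycleSink {
        int connectionsOpened = 0;
        private boolean connected = false;

        void open() {
            connectionsOpened++; // the only place a connection is created
            connected = true;
        }

        void invoke(String record) {
            if (!connected) {
                throw new IllegalStateException("open() was not called");
            }
            // write the record over the already open connection
        }

        void close() {
            connected = false;
        }
    }

    /** Feeds recordCount records through the per-record sink and returns the connection count. */
    public static int connectionsOpenedPerRecord(int recordCount) {
        PerRecordSink sink = new PerRecordSink();
        for (int i = 0; i < recordCount; i++) {
            sink.invoke("record-" + i);
        }
        return sink.connectionsOpened;
    }

    /** Feeds recordCount records through the lifecycle sink and returns the connection count. */
    public static int connectionsOpenedWithLifecycle(int recordCount) {
        LifecycleSink sink = new LifecycleSink();
        sink.open();
        try {
            for (int i = 0; i < recordCount; i++) {
                sink.invoke("record-" + i);
            }
        } finally {
            sink.close();
        }
        return sink.connectionsOpened;
    }

    public static void main(String[] args) {
        System.out.println("per record: " + connectionsOpenedPerRecord(1000));     // per record: 1000
        System.out.println("lifecycle:  " + connectionsOpenedWithLifecycle(1000)); // lifecycle:  1
    }
}
```

With a real database, each of those 1000 connections would involve a network handshake and authentication, which is why the per-record variant scales so badly.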



Source: https://stackoverflow.com/questions/56245901/flink-why-should-i-create-my-own-richsinkfunction-instead-of-just-open-and-clo
