Using Redshift as Spring Batch Job Repository and alternatives to SEQUENCE in Redshift

Question

One of the requirements in my project is to place the Spring Batch schema on an Amazon Redshift database.
I am planning to start from schema-postgresql.sql as the baseline, since Redshift is based on Postgres.

Looking at the Spring Batch source code, it looks like you need to do a few things to make this work:

Extending JobRepositoryFactoryBean and DefaultDataFieldMaxValueIncrementerFactory.
Adding my own RedshiftMaxValueIncrementer that extends AbstractSequenceMaxValueIncrementer.

Looking at the Redshift data types, it does not look like I will have any issues converting the schema script, aside from the sequences, which are used to create the job, job execution, and step execution IDs.

What do you suggest as the best workaround for the missing sequences?

Specifying those columns as IDENTITY columns. This looks like the easiest way from the Redshift point of view. It can be problematic because DataFieldMaxValueIncrementer.nextLongValue() returns long, not Long, so it cannot return null and let IDENTITY do the job for us.
An implementation based on something like select max(STEP_EXECUTION_ID) from BATCH_STEP_EXECUTION, doing something similar to MySQLMaxValueIncrementer, which extends AbstractColumnMaxValueIncrementer.
Creating the sequences in Java code only, using tools similar to the ones Hibernate uses.
An approach not mentioned above
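
As a rough illustration of the third option (sequences kept in Java code), here is a minimal sketch of block reservation, loosely in the spirit of Hibernate's hi/lo style generators. BlockIdAllocator, BlockStore and InMemoryBlockStore are hypothetical names, not Spring Batch or Hibernate classes, and the in-memory store only stands in for a one-row counter table in Redshift.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: hands out ids from a block that was reserved
 * in a single round trip to a backing store.
 */
final class BlockIdAllocator {

    /** Reserves `size` consecutive ids and returns the first id of the block. */
    interface BlockStore {
        long reserve(int size);
    }

    private final BlockStore store;
    private final int blockSize;
    private long next;   // next id to hand out
    private long limit;  // exclusive upper bound of the current block

    BlockIdAllocator(BlockStore store, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.store = store;
        this.blockSize = blockSize;
        // next == limit means "block exhausted", so the first call reserves a block
        this.next = 0;
        this.limit = 0;
    }

    /** Returns a primitive long, like DataFieldMaxValueIncrementer.nextLongValue(). */
    synchronized long nextLongValue() {
        if (next >= limit) {
            next = store.reserve(blockSize);
            limit = next + blockSize;
        }
        return next++;
    }
}

/** In-memory stand-in for the counter table, for illustration only. */
final class InMemoryBlockStore implements BlockIdAllocator.BlockStore {
    private final AtomicLong counter = new AtomicLong(1);
    int reservations = 0; // counts simulated round trips

    @Override
    public long reserve(int size) {
        reservations++;
        return counter.getAndAdd(size);
    }
}
```

With a block size of 3, the first three calls to nextLongValue() cost one reservation and the fourth triggers a second one. Ids left unused in a block are lost on restart, and ids handed out by different processes are unique but not strictly ordered, which may matter if anything relies on id ordering.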

Answer 1:

Here's how I got at least that part to (apparently) work:

In my subclass of DefaultBatchConfigurer, I added this code:

@Override
protected JobRepository createJobRepository() throws Exception
{
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(dataSource);
    factory.setTransactionManager(getTransactionManager());
    // Use the Redshift-specific incrementer factory instead of the default, sequence-based one
    factory.setIncrementerFactory(new RedshiftIncrementerFactory(dataSource));
    factory.afterPropertiesSet();
    return factory.getObject();
}

The factory object looks like this:

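import static org.springframework.batch.support.DatabaseType.POSTGRES;

import javax.sql.DataSource;

import org.springframework.batch.item.database.support.DataFieldMaxValueIncrementerFactory;
import org.springframework.jdbc.support.incrementer.DataFieldMaxValueIncrementer;

public class RedshiftIncrementerFactory implements DataFieldMaxValueIncrementerFactory
{
    private final DataSource dataSource;

    public RedshiftIncrementerFactory(DataSource ds)
    {
        this.dataSource = ds;
    }

    // Always hand back the Redshift incrementer, whatever database type is reported
    @Override
    public DataFieldMaxValueIncrementer getIncrementer(String databaseType, String incrementerName)
    {
        return new RedshiftIncrementer(dataSource, incrementerName);
    }

    // Redshift is treated as the POSTGRES database type here
    @Override
    public boolean isSupportedIncrementerType(String databaseType)
    {
        return POSTGRES.toString().equals(databaseType);
    }

    @Override
    public String[] getSupportedIncrementerTypes()
    {
        return new String[]{POSTGRES.toString()};
    }
}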

And then, finally, the incrementer itself:

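import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import javax.sql.DataSource;

import org.springframework.dao.DataAccessException;
import org.springframework.dao.DataAccessResourceFailureException;
import org.springframework.jdbc.datasource.DataSourceUtils;
import org.springframework.jdbc.support.JdbcUtils;
import org.springframework.jdbc.support.incrementer.AbstractSequenceMaxValueIncrementer;

public class RedshiftIncrementer extends AbstractSequenceMaxValueIncrementer
{
    public RedshiftIncrementer(DataSource dataSource, String incrementerName)
    {
        super(dataSource, incrementerName);
    }

    // I need to run two queries here, since Redshift doesn't support sequences
    @Override
    protected long getNextKey() throws DataAccessException {
        Connection con = DataSourceUtils.getConnection(getDataSource());
        Statement stmt = null;
        ResultSet rs = null;
        try {
            stmt = con.createStatement();
            DataSourceUtils.applyTransactionTimeout(stmt, getDataSource());
            // The incrementer name is the name of a one-row counter table
            String table = getIncrementerName();
            // Bump the counter, then read it back on the same connection
            stmt.executeUpdate("UPDATE " + table + " SET ID = ID + 1");
            rs = stmt.executeQuery("SELECT ID FROM " + table + " WHERE UNIQUE_KEY='0'");
            if (rs.next()) {
                return rs.getLong(1);
            }
            else {
                throw new DataAccessResourceFailureException("Sequence query did not return a result");
            }
        }
        catch (SQLException ex) {
            throw new DataAccessResourceFailureException("Could not obtain sequence value", ex);
        }
        finally {
            JdbcUtils.closeResultSet(rs);
            JdbcUtils.closeStatement(stmt);
            DataSourceUtils.releaseConnection(con, getDataSource());
        }
    }

    @Override
    protected String getSequenceQuery()
    {
        // Not used, because getNextKey() is overridden above
        return null;
    }
}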

This at least allows the job to start. However, there are other problems with Redshift that I will detail elsewhere.
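
The incrementer queries assume that each incrementer name resolves to a one-row table with ID and UNIQUE_KEY columns, which is not shown here. A minimal sketch of what that setup might look like, assuming the default BATCH_ table prefix and a layout modeled on the sequence tables in Spring Batch's MySQL schema. SequenceTableDdl is a hypothetical helper, and the column types are assumptions rather than something taken from my actual schema:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds DDL for the one-row counter tables. */
final class SequenceTableDdl {

    // Sequence names Spring Batch asks for when the default BATCH_ prefix is used
    static final String[] SEQUENCE_TABLES = {
        "BATCH_JOB_SEQ", "BATCH_JOB_EXECUTION_SEQ", "BATCH_STEP_EXECUTION_SEQ"
    };

    /** Returns the CREATE TABLE and seed INSERT statements for one counter table. */
    static List<String> statementsFor(String table) {
        List<String> sql = new ArrayList<>();
        // Assumed column types: a BIGINT counter plus a fixed one-character key
        sql.add("CREATE TABLE " + table
                + " (ID BIGINT NOT NULL, UNIQUE_KEY CHAR(1) NOT NULL)");
        // Seed the single row that the UPDATE and SELECT in getNextKey() operate on
        sql.add("INSERT INTO " + table + " (ID, UNIQUE_KEY) VALUES (0, '0')");
        return sql;
    }

    /** Returns the statements for all three counter tables, in declaration order. */
    static List<String> allStatements() {
        List<String> sql = new ArrayList<>();
        for (String table : SEQUENCE_TABLES) {
            sql.addAll(statementsFor(table));
        }
        return sql;
    }
}
```

The statements would be run once, in place of the CREATE SEQUENCE lines of the Postgres script. The seed row matters: without it the UPDATE touches nothing and the SELECT returns no row, so getNextKey() throws.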

Source: https://stackoverflow.com/questions/20789731/using-redshfit-as-spring-batch-job-repository-and-alternatives-to-sequence-in-re

Tags

Spring

amazon-web-services

spring-batch

amazon-redshift