Spring Batch and S3 Integration - how to remove null characters from a file in S3 before reading it?

Question

In my case, we get the flat file from the source system and keep it on a server, and then an automated process pushes this file to an Amazon S3 bucket.

The source system is a mainframe, which somehow puts null characters into the flat file, and this is unavoidable for them. Before we start reading the flat file, we need to remove the null characters from the file in the Amazon S3 bucket (as we do with the Linux command tr '\000' ' ' < "%s" > "%s").
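For reference, the tr command above replaces every NUL byte with a space. A minimal sketch of the same transformation in plain Java is shown here, applied on the fly to any InputStream (for example, the stream of an S3 object). The class name NullToSpaceInputStream is hypothetical, and the sketch assumes a single-byte encoding in which the byte 0x00 only ever represents a null character.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Replaces every NUL byte (0x00) with a space while reading, like tr '\000' ' '. */
class NullToSpaceInputStream extends FilterInputStream {

    NullToSpaceInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        // -1 (end of stream) and all non-zero bytes pass through unchanged
        return b == 0 ? ' ' : b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // n is -1 at end of stream, in which case the loop body never runs
        for (int i = off; i < off + n; i++) {
            if (buf[i] == 0) {
                buf[i] = ' ';
            }
        }
        return n;
    }
}
```

Because the replacement happens as bytes are read, nothing needs to be written to disk. Whether such a wrapped stream can be handed to the reader in a given job (for instance through Spring's InputStreamResource) is an assumption that would need checking, in particular for restartable steps, since a plain stream can be read only once.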

So far I have not found a way to remove the null characters without downloading the file, so that reading starts only once they have been removed.

Any code help is much appreciated.

Note - Since we've deployed the batch app on PCF, we can't download the file on PCF, remove the null characters, and upload it again, because the PCF support team confirmed that the file system within PCF is transient and hence doing anything file-related there is not advisable.

Answer 1:

I don't know if you can change the file in place in S3 without downloading it. That said, having a transient file system does not mean you cannot do any file operations; rather, it means you should not rely on that FS for persistent storage. Any temporary file manipulation can be done on that FS without any issue.

So even if the file system on PCF is transient, I don't see any downside to downloading the file and transforming it in a tasklet step before starting the chunk-oriented processing (obviously as long as you have enough space to store the file). A SystemCommandTasklet is appropriate for your tr command. The file can be cleaned up in a final step or in a job listener.
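If running tr through a system command is not an option, the same download-and-transform step can be sketched in plain Java. This is only an illustration under stated assumptions: NullCharCleaner and cleanToTempFile are hypothetical names, the temp-file prefix and suffix are arbitrary, and the source stream is assumed to come from whatever S3 client the job already uses.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class NullCharCleaner {

    /** Copies the source into a new temp file, replacing each NUL byte with a space. */
    static Path cleanToTempFile(InputStream source) throws IOException {
        // Hypothetical prefix/suffix; the file lands in the default temp directory
        Path tmp = Files.createTempFile("flatfile-", ".cleaned");
        try (InputStream in = new BufferedInputStream(source);
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(tmp))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == 0) {
                        buf[i] = ' '; // same effect as tr '\000' ' '
                    }
                }
                out.write(buf, 0, n);
            }
        }
        return tmp;
    }
}
```

A tasklet could call cleanToTempFile and hand the returned path to the reader, and a final step or job listener could then remove it with Files.deleteIfExists(path), so nothing is expected to survive on the transient file system after the job ends.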

Source: https://stackoverflow.com/questions/65590769/spring-batch-and-s3-integration-how-to-remove-null-characters-first-from-s3-be

Tags

amazon-web-services

amazon-s3

spring-batch