Question
I see Kafka Connect can write to S3 in Avro or JSON formats. But there is no Parquet support. How hard would this be to add?
Answer 1:
The Qubole connector (streamx) supports writing Parquet output: https://github.com/qubole/streamx
Answer 2:
Try Secor: https://github.com/pinterest/secor
It can work with AWS S3, Google Cloud Storage, Azure Blob Storage, etc.
Note that whichever solution you choose should provide key features such as writing each message exactly once, load distribution, fault tolerance, monitoring, and data partitioning.
Secor has all of these and, as stated above, can easily work with other S3-style services.
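
To make the exactly-once requirement more concrete, here is a minimal Python sketch of one common way to get it when writing to an object store: derive the object key deterministically from the topic, partition, and starting offset of a batch, so a retried upload overwrites the same object instead of creating a duplicate. This is not code from Secor or streamx. The names `object_key` and `upload_batch` and the key layout are hypothetical, and a plain dict stands in for the bucket. The approach only holds if a retry regroups records into the same batches.

```python
from typing import Dict, List, Tuple


def object_key(topic: str, partition: int, start_offset: int, ext: str = "parquet") -> str:
    """Build a key that depends only on the batch's position in the log."""
    # Hypothetical layout; real tools choose their own naming scheme.
    return f"topics/{topic}/partition={partition}/{topic}+{partition}+{start_offset:010d}.{ext}"


def upload_batch(store: Dict[str, bytes], topic: str, partition: int,
                 records: List[Tuple[int, bytes]]) -> str:
    """Write one batch of (offset, payload) records to the store.

    The dict is a stand-in for an object store such as S3. Because the key
    is deterministic, repeating the call after a failure overwrites the
    same object rather than adding a second copy.
    """
    start_offset = records[0][0]
    key = object_key(topic, partition, start_offset)
    store[key] = b"".join(payload for _, payload in records)
    return key


store: Dict[str, bytes] = {}
batch = [(42, b"a"), (43, b"b")]
first = upload_batch(store, "events", 0, batch)
retry = upload_batch(store, "events", 0, batch)  # simulated retry after a crash
```

After the simulated retry, `store` still holds a single object, which is the behavior to look for when evaluating a tool's exactly-once claim.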
Source: https://stackoverflow.com/questions/43878719/parquet-output-from-kafka-connect-to-s3