Question
I currently have an architecture of Kinesis -> Kinesis Firehose -> S3.
I am creating records directly in kinesis using:
aws kinesis put-record --stream-name <some_kinesis_stream> --partition-key 123 --data testdata --profile sandbox
The data when I run:
aws kinesis get-records --shard-iterator --profile sandbox
looks like this:
{
"SequenceNumber": "49597697038430366340153578495294928515816248592826368002",
"ApproximateArrivalTimestamp": 1563835989.441,
"Data": "eyJrZXkiOnsiZW1wX25vIjo1Mjc2OCwiZGVwdF9ubyI6ImQwMDUifSwidmFsdWUiOnsiYmVmb3JlIjpudWxsLCJhZnRlciI6eyJlbXBfbm8iOjUyNzY4LCJkZXB0X25vIjoiZDAwNSIsImZyb21fZGF0ZSI6Nzk2NSwidG9fZGF0ZSI6MjkzMjUzMX0sInNvdXJjZSI6eyJ2ZXJzaW9uIjoiMC45LjUuRmluYWwiLCJjb25uZWN0b3IiOiJteXNxbCIsIm5hbWUiOiJraW5lc2lzIiwic2VydmVyX2lkIjowLCJ0c19zZWMiOjAsImd0aWQiOm51bGwsImZpbGUiOiJteXNxbC1iaW4tY2hhbmdlbG9nLjAwMDAwMiIsInBvcyI6MTU0LCJyb3ciOjAsInNuYXBzaG90Ijp0cnVlLCJ0aHJlYWQiOm51bGwsImRiIjoiZW1wbG95ZWVzIiwidGFibGUiOiJkZXB0X2VtcCIsInF1ZXJ5IjpudWxsfSwib3AiOiJjIiwidHNfbXMiOjE1NjM4MzEzMTI2Njh9fQ==",
"PartitionKey": "-591791328"
}
but in S3, it looks like:
`testdatatestdatatestdatatestdatatestdatatestdatatestdatatestdata`
because I ran put-record several times.
So what is going on? When I run get-records, what records am I obtaining? What is that data? How is that data then decrypted into my original string?
Answer 1:
15 days old now, so hopefully you found the answer already.
If not, the mismatch between get-records and what you see in S3 seems to come from how you ran the aws kinesis get-records --shard-iterator --profile sandbox call: you didn't explicitly provide a shard iterator value.
What you saw in S3 is correct and expected based on your put-record calls with --data testdata:
testdatatestdatatestdatatestdatatestdatatestdatatestdatatestdata
What you saw in Kinesis is base64 encoded:
"Data": "eyJrZXkiOnsiZW1wX25vIjo1Mjc2OCwiZGVwdF9ubyI6ImQwMDUifSwidmFsdWUiOnsiYmVmb3JlIjpudWxsLCJhZnRlciI6eyJlbXBfbm8iOjUyNzY4LCJkZXB0X25vIjoiZDAwNSIsImZyb21fZGF0ZSI6Nzk2NSwidG9fZGF0ZSI6MjkzMjUzMX0sInNvdXJjZSI6eyJ2ZXJzaW9uIjoiMC45LjUuRmluYWwiLCJjb25uZWN0b3IiOiJteXNxbCIsIm5hbWUiOiJraW5lc2lzIiwic2VydmVyX2lkIjowLCJ0c19zZWMiOjAsImd0aWQiOm51bGwsImZpbGUiOiJteXNxbC1iaW4tY2hhbmdlbG9nLjAwMDAwMiIsInBvcyI6MTU0LCJyb3ciOjAsInNuYXBzaG90Ijp0cnVlLCJ0aHJlYWQiOm51bGwsImRiIjoiZW1wbG95ZWVzIiwidGFibGUiOiJkZXB0X2VtcCIsInF1ZXJ5IjpudWxsfSwib3AiOiJjIiwidHNfbXMiOjE1NjM4MzEzMTI2Njh9fQ==",
So decoding gets you:
{
  "key": {
    "emp_no": 52768,
    "dept_no": "d005"
  },
  "value": {
    "before": null,
    "after": {
      "emp_no": 52768,
      "dept_no": "d005",
      "from_date": 7965,
      "to_date": 2932531
    },
    "source": {
      "version": "0.9.5.Final",
      "connector": "mysql",
      "name": "kinesis",
      "server_id": 0,
      "ts_sec": 0,
      "gtid": null,
      "file": "mysql-bin-changelog.000002",
      "pos": 154,
      "row": 0,
      "snapshot": true,
      "thread": null,
      "db": "employees",
      "table": "dept_emp",
      "query": null
    },
    "op": "c",
    "ts_ms": 1563831312668
  }
}
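The decoding step can be reproduced with any base64 decoder. Here is a minimal sketch in Python using only the standard library. The helper name decode_record_data is hypothetical, and dGVzdGRhdGE= is simply the base64 form of the string testdata, which is how a record put with --data testdata would be expected to appear in the Data field of get-records output:

```python
import base64
import json


def decode_record_data(data_field: str) -> str:
    """Decode the base64 "Data" field of a get-records result into text."""
    # Assumes the payload is UTF-8 text; binary payloads need other handling.
    return base64.b64decode(data_field).decode("utf-8")


# Base64 form of the string "testdata".
print(decode_record_data("dGVzdGRhdGE="))

# A JSON payload can be parsed once it has been decoded.
# This short value is an illustrative stand-in, not the record from the question.
encoded_json = base64.b64encode(b'{"op": "c"}').decode("ascii")
print(json.loads(decode_record_data(encoded_json))["op"])
```

Base64 is an encoding, not encryption, so no key is involved in this step.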
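It didn't match your "testdata" because you were reading from the wrong shard iterator, possibly on the wrong shard. Note that the record you got back has the partition key "-591791328", not the 123 you passed to put-record, so it was written by a different producer. I'm unsure what your Kinesis setup is exactly.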
Give this article a once-over: https://docs.aws.amazon.com/streams/latest/dev/fundamental-stream.html. It should give you the steps to test this workflow.
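To make the shard iterator point concrete, here is a rough sketch of reading every shard of a stream from its oldest record. It assumes a boto3 Kinesis client (boto3 installed, credentials configured); read_stream is a hypothetical helper name and the stream name in the usage comment is a placeholder. It makes a single get_records call per shard, which can return no records even when some exist further along, so a real consumer would keep following NextShardIterator:

```python
def read_stream(client, stream_name, limit=100):
    """Return the Data payloads from one get_records call per shard."""
    payloads = []
    shards = client.list_shards(StreamName=stream_name)["Shards"]
    for shard in shards:
        # An iterator is tied to one shard and one starting position.
        # TRIM_HORIZON starts at the oldest record still retained.
        iterator = client.get_shard_iterator(
            StreamName=stream_name,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",
        )["ShardIterator"]
        response = client.get_records(ShardIterator=iterator, Limit=limit)
        for record in response["Records"]:
            # boto3 returns Data as bytes, already base64-decoded.
            payloads.append(record["Data"])
    return payloads


# Example usage (requires AWS access, so it is left commented out):
# import boto3
# client = boto3.Session(profile_name="sandbox").client("kinesis")
# print(read_stream(client, "some_kinesis_stream"))
```

With the CLI, the equivalent is to call aws kinesis get-shard-iterator first and pass the returned value to --shard-iterator.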
Answer 2:
It seems that you've configured your Firehose delivery stream to enable server-side data encryption. If this is the case, then the following applies:
When you configure a Kinesis data stream as the data source of a Kinesis Data Firehose delivery stream, Kinesis Data Firehose no longer stores the data at rest. Instead, the data is stored in the data stream.
When you send data from your data producers to your data stream, Kinesis Data Streams encrypts your data using an AWS Key Management Service (AWS KMS) key before storing the data at rest. When your Kinesis Data Firehose delivery stream reads the data from your data stream, Kinesis Data Streams first decrypts the data and then sends it to Kinesis Data Firehose. Kinesis Data Firehose buffers the data in memory based on the buffering hints that you specify. It then delivers it to your destinations without storing the unencrypted data at rest.
Find out more at: https://docs.aws.amazon.com/firehose/latest/dev/encryption.html
Source: https://stackoverflow.com/questions/57173084/how-is-data-in-kinesis-decrypted-before-hitting-s3