amazon-kinesis

Athena can only see the first JSON record written to Firehose by Kinesis Analytics

Submitted by 为君一笑 on 2021-02-19 03:53:25
Question: I am using Kinesis Analytics to read in JSON from Kinesis Firehose. I am successfully filtering out some of the records and writing a subset of the JSON properties to another Firehose. I wanted to execute an Athena query on the data being written to S3 via the destination Firehose. However, the JSON records written to the files in S3 do not have any newlines. Consequently, when I query the data using Athena, it only returns the first record in each file. When I write records to the source…
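Athena's JSON SerDe expects one JSON object per line, so the usual fix is to append a newline to each record before it lands in S3. A minimal sketch of a Firehose data-transformation Lambda that does this (the handler follows the standard Firehose transformation event contract; this is an illustration, not the asker's code):

```python
import base64

def lambda_handler(event, context):
    """Firehose transformation Lambda: append a newline to each record
    so that Athena sees one JSON object per line in the S3 objects."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        if not payload.endswith("\n"):
            payload += "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
        })
    return {"records": output}
```

Alternatively, the SQL in the Kinesis Analytics application itself can be changed so the destination records carry a trailing newline, but a transformation Lambda keeps the fix independent of the query.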

Apache Spark Kinesis Integration: connected, but no records received

Submitted by 為{幸葍}努か on 2021-02-16 08:30:23
Question: tl;dr: I can't use the Kinesis Spark Streaming integration, because it receives no data. A test stream is set up, and a Node.js app sends one simple record per second. A standard Spark 1.5.2 cluster is set up with master and worker nodes (4 cores) via docker-compose, with AWS credentials in the environment. spark-streaming-kinesis-asl-assembly_2.10-1.5.2.jar is downloaded and added to the classpath, and job.py or job.jar (which just reads and prints) is submitted. Everything seems to be okay, but no records whatsoever are received.
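A frequent cause of this exact symptom is that the Kinesis receiver permanently occupies one executor core, so a job granted only a single core has no core left for processing and silently receives nothing; a mismatch between the Kinesis endpoint URL and region name is another common culprit. A sketch of a submit command that leaves room for both the receiver and the processing tasks (the master URL and paths are placeholders, not taken from the question):

```shell
# At least 2 cores: one is consumed by the Kinesis receiver,
# the rest are available for processing the received batches.
spark-submit \
  --master spark://master:7077 \
  --total-executor-cores 2 \
  --jars spark-streaming-kinesis-asl-assembly_2.10-1.5.2.jar \
  job.py
```

If the job runs locally, the same constraint applies: `local[1]` cannot work, `local[2]` or higher can.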

Same Kinesis Consumer running on multiple EC2 instances

Submitted by 青春壹個敷衍的年華 on 2021-02-10 19:11:11
Question: I have multiple EC2 instances running for the same microservice, each with a Kinesis consumer running (with the KCL). My question is: when the Kinesis stream gets a new event, since all consumers are polling, will the same event be consumed by the consumers on all instances? Answer 1: The event will be consumed by only one consumer. Answer 2: The KCL is designed so that each shard is processed by only one worker; the built-in lease mechanism is the key to providing this functionality. While under normal circumstances…
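The lease mechanism mentioned in Answer 2 can be pictured with a toy model (this illustrates the invariant only, not the real KCL lease-table algorithm, which uses DynamoDB and lease stealing): every shard is leased to exactly one worker at a time, so a record arriving on a shard is processed on exactly one EC2 instance.

```python
def assign_leases(shards, workers):
    """Toy KCL-style lease assignment: each shard is leased to exactly
    one worker, and shards are balanced round-robin across workers."""
    leases = {}
    for i, shard in enumerate(shards):
        leases[shard] = workers[i % len(workers)]
    return leases
```

With 3 shards and 2 instances, one instance ends up holding two leases; adding a third instance would rebalance to one shard each, and instances beyond the shard count sit idle.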

Python 3.6 keep log info in buffer and then send it

Submitted by 烈酒焚心 on 2021-01-29 05:21:11
Question: I have a homemade tool that runs a lot of steps; I capture everything that happens using the logging package and then write it to a file for logging purposes. I would like to make my process smarter, and I was wondering how to keep all the data collected for the log steps in a buffer and then send it to the cloud, a database table, or any other destination. Thanks so much for your advice and approaches. Source: https://stackoverflow.com/questions/52081408/python-3-6-keep-log-info-in
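The standard-library answer to "hold log records in a buffer, then send them" is logging.handlers.MemoryHandler: it keeps records in memory until its buffer fills (or a severe-enough record arrives) and then forwards them all to a target handler. A minimal sketch, using a file as a stand-in destination (in practice the target would be a handler that ships to a cloud service or database; the filename and logger name are illustrative):

```python
import logging
import logging.handlers

# The eventual destination; swap in an HTTPHandler, a custom
# database handler, etc. for a real cloud/database sink.
target = logging.FileHandler("pipeline.log", mode="w")

buffered = logging.handlers.MemoryHandler(
    capacity=100,                 # flush after 100 buffered records...
    flushLevel=logging.ERROR,     # ...or immediately on ERROR and above
    target=target,
)

logger = logging.getLogger("homemade-tool")
logger.setLevel(logging.INFO)
logger.addHandler(buffered)

logger.info("step 1 done")        # held in the buffer
logger.info("step 2 done")        # still buffered, nothing sent yet
buffered.flush()                  # push everything to the target at once
```

Calling `buffered.close()` at the end of the run also flushes, so no records are lost on shutdown.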

Using Python to parse and render Kinesis Video Streams and get an image representation of the input frame

Submitted by 允我心安 on 2021-01-29 02:54:40
Question: I have set up a pipeline in which I live-stream video to Kinesis Video Streams (KVS), which sends the frames to Amazon Rekognition for face recognition, which in turn sends the results to a Kinesis Data Stream (KDS). Finally, KDS sends the results to a Lambda. For a frame on which face recognition has been performed, I get JSON in the following format: https://docs.aws.amazon.com/rekognition/latest/dg/streaming-video-kinesis-output-reference.html My aim is: using this JSON, I somehow want to…
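One detail that is easy to get wrong here is that the BoundingBox values in the Rekognition stream-processor output are fractions of the frame dimensions, not pixels. Fetching and decoding the actual frame from KVS (e.g. via GetMedia and an MKV fragment parser) is a separate step; the sketch below covers only the coordinate conversion, and the function name and frame dimensions are illustrative:

```python
def face_box_to_pixels(bounding_box, frame_width, frame_height):
    """Convert Rekognition's fractional BoundingBox (Left/Top/Width/
    Height, each in [0, 1]) into integer pixel coordinates suitable
    for cropping a decoded frame."""
    left = int(bounding_box["Left"] * frame_width)
    top = int(bounding_box["Top"] * frame_height)
    width = int(bounding_box["Width"] * frame_width)
    height = int(bounding_box["Height"] * frame_height)
    return left, top, width, height
```

Given a decoded frame as a NumPy array, the crop would then be `frame[top:top+height, left:left+width]`.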

How does Apache Beam manage Kinesis checkpointing?

Submitted by 守給你的承諾、 on 2021-01-28 08:00:52
Question: I have a streaming pipeline developed in Apache Beam (using the Spark runner) which reads from a Kinesis stream. I am looking for options in Apache Beam to manage Kinesis checkpointing (i.e., periodically storing the current position in the Kinesis stream) so that the system can recover from failures and continue processing where the stream left off. Is there a provision in Apache Beam for Kinesis checkpointing similar to Spark Streaming's? (Reference link: https://spark

Checkpointing records with Amazon KCL throws ProvisionedThroughputExceededException

Submitted by ぃ、小莉子 on 2021-01-27 20:23:44
Question: We are experiencing a ProvisionedThroughputExceededException upon checkpointing many events together. The exception stack trace is the following: com.amazonaws.services.kinesis.model.ProvisionedThroughputExceededException: Rate exceeded for shard shardId-000000000000 in stream mystream under account accountid. (Service: AmazonKinesis; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: ea36760b-9db3-0acc-bbe9-87939e3270aa) at com.amazonaws.http.AmazonHttpClient…
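Besides checkpointing less often (e.g. at most once every few seconds rather than once per record), the usual mitigation for a throttled checkpoint is to retry with exponential backoff. A sketch of such a wrapper, in Python for illustration even though the question is about the Java KCL (`checkpoint_fn` stands in for whatever call performs the actual checkpoint):

```python
import time

def checkpoint_with_backoff(checkpoint_fn, max_attempts=5, base_delay=0.1):
    """Retry a throttled checkpoint call with exponential backoff:
    wait base_delay, then 2x, 4x, ... between attempts, and re-raise
    if every attempt fails."""
    for attempt in range(max_attempts):
        try:
            return checkpoint_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Adding a small random jitter to each delay further reduces the chance that many workers retry in lockstep against the same shard.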

How to parse a Kinesis data stream in AWS Lambda (Java)

Submitted by 不羁岁月 on 2021-01-05 09:04:27
Question: I am creating an AWS Lambda function in Java to process a Kinesis Data Stream. My current parsing setup involves: (1) stringifying using UTF-8, as suggested in the AWS documentation: for(KinesisEvent.KinesisEventRecord rec : event.getRecords()) { String stringRecords = new String(rec.getKinesis().getData().array(), "UTF-8"); pageEventList.add(pageEvent); } (2) cleaning up characters using regex patterns: a. non-ASCII: "[^\\x00-\\x7F]"; b. ASCII control characters: "[\\p{Cntrl}&&[^\r\n\t]]"; c. non-printable…
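For comparison, the same cleanup can be expressed in Python; note that Java's character-class intersection syntax `[\p{Cntrl}&&[^\r\n\t]]` (control characters except carriage return, newline, and tab) has no direct Python equivalent and must be rewritten as an explicit class. The patterns mirror the question; the helper name is illustrative:

```python
import re

# Equivalent of Java's "[^\\x00-\\x7F]": any non-ASCII character.
NON_ASCII = re.compile(r"[^\x00-\x7F]")
# Equivalent of "[\\p{Cntrl}&&[^\r\n\t]]": control characters,
# excluding \t (0x09), \n (0x0a), and \r (0x0d).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def clean_record(raw_bytes):
    """Decode a Kinesis record payload as UTF-8 and strip characters
    that commonly break downstream JSON parsing."""
    text = raw_bytes.decode("utf-8", errors="replace")
    text = NON_ASCII.sub("", text)
    return CONTROL_CHARS.sub("", text)
```

Stripping non-ASCII characters is lossy for any genuinely international payload, so whether step (a) is appropriate depends on the data; the control-character pass alone is usually enough to make records JSON-safe.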