amazon-kinesis

Can I invoke Lambda functions in parallel using a single Kinesis shard if record order doesn't matter?

限于喜欢 submitted on 2019-12-06 06:34:15
I've got an application for which I only need the bandwidth of one Kinesis shard, but I need many Lambda function invocations in parallel to keep up with the record processing. My record size is on the high end (some records approach the 1000 KB limit), but the incoming rate is only 1 MB/s, as I'm using a single EC2 instance to populate the stream. Since each record contains an internal timestamp, I don't care about processing them in order. Basically, I have several months' worth of data that I need to migrate, and I want to do it in parallel. The processed records provide records for a…
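
One common workaround, assuming ordering really is irrelevant, is to keep a single consumer on the shard and fan each record out to worker Lambdas with asynchronous invocations. A rough sketch with the AWS SDK for Java v1; the worker function name and the fire-and-forget dispatch are assumptions, not part of the original question:

    import com.amazonaws.services.kinesis.model.Record;
    import com.amazonaws.services.lambda.AWSLambdaAsync;
    import com.amazonaws.services.lambda.AWSLambdaAsyncClientBuilder;
    import com.amazonaws.services.lambda.model.InvocationType;
    import com.amazonaws.services.lambda.model.InvokeRequest;
    import java.nio.charset.StandardCharsets;
    import java.util.List;

    public class FanOutDispatcher {
        private final AWSLambdaAsync lambda = AWSLambdaAsyncClientBuilder.defaultClient();

        // Hypothetical worker function name -- replace with your own.
        private static final String WORKER_FUNCTION = "worker-fn";

        /** Invoke one worker Lambda per record, asynchronously (fire-and-forget). */
        public void dispatch(List<Record> records) {
            for (Record record : records) {
                String payload = StandardCharsets.UTF_8.decode(record.getData()).toString();
                InvokeRequest request = new InvokeRequest()
                        .withFunctionName(WORKER_FUNCTION)
                        .withInvocationType(InvocationType.Event) // async: returns immediately
                        .withPayload(payload);
                lambda.invokeAsync(request);
            }
        }
    }

Because the workers are invoked with InvocationType.Event, the single shard consumer never blocks on downstream processing, which is what makes the parallelism possible despite one shard.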

Shard [shardId-000000000000] is not closed. This can happen if we constructed the list of shards while a reshard operation was in progress

风格不统一 submitted on 2019-12-05 16:47:50
I am getting this error while fetching data from an Amazon Kinesis stream. I am doing the following steps: creating an Amazon Kinesis stream, putting data using the putRecord API of AmazonKinesisClient, then using the Worker of the KCL library to get the data from the stream. There are a few possibilities. After you requested the stream's creation, did you wait long enough for it to complete? It can sometimes take ten minutes for a shard to be created. Since you managed to use the putRecord method, the stream and shard should be active. Did you configure DynamoDB correctly? I assume you are using it for your Kinesis application…
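
As a sketch of the first suggestion, waiting for stream creation to finish, the AWS SDK for Java v1 can poll DescribeStream until the status becomes ACTIVE before any putRecord calls or the KCL Worker start. The stream name here is a placeholder:

    import com.amazonaws.services.kinesis.AmazonKinesis;
    import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
    import com.amazonaws.services.kinesis.model.DescribeStreamRequest;

    public class WaitForStream {
        public static void waitUntilActive(AmazonKinesis kinesis, String streamName)
                throws InterruptedException {
            while (true) {
                String status = kinesis
                        .describeStream(new DescribeStreamRequest().withStreamName(streamName))
                        .getStreamDescription()
                        .getStreamStatus();
                if ("ACTIVE".equals(status)) {
                    return; // safe to put records / start the KCL Worker
                }
                Thread.sleep(5000); // still CREATING or UPDATING; poll again
            }
        }

        public static void main(String[] args) throws InterruptedException {
            waitUntilActive(AmazonKinesisClientBuilder.defaultClient(), "my-stream"); // placeholder name
        }
    }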

Kinesis Firehose putting JSON objects in S3 without separator comma

杀马特。学长 韩版系。学妹 submitted on 2019-12-05 14:49:42
Before sending the data I am using JSON.stringify on it, and it looks like this: {"data": [{"key1": value1, "key2": value2}, {"key1": value1, "key2": value2}]} But once it passes through AWS API Gateway and Kinesis Firehose puts it into S3, it looks like this: { "key1": value1, "key2": value2 }{ "key1": value1, "key2": value2 } The separator comma between the JSON objects is gone, but I need it to process the data properly. Template in the API Gateway:

    #set($root = $input.path('$'))
    {
      "DeliveryStreamName": "some-delivery-stream",
      "Records": [
        #foreach($r in $root.data)
        #set($data = "{ ""key1"": ""…
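
Firehose concatenates records exactly as delivered, so the usual fix, an assumption here rather than the thread's answer, is to append a delimiter to each record before it reaches Firehose; newline-delimited JSON is what most S3 consumers expect rather than comma-joined objects. A sketch with the AWS SDK for Java v1, reusing the some-delivery-stream name from the question:

    import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehose;
    import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehoseClientBuilder;
    import com.amazonaws.services.kinesisfirehose.model.PutRecordRequest;
    import com.amazonaws.services.kinesisfirehose.model.Record;
    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    public class DelimitedFirehosePut {
        public static void main(String[] args) {
            AmazonKinesisFirehose firehose = AmazonKinesisFirehoseClientBuilder.defaultClient();
            String json = "{\"key1\": 1, \"key2\": 2}";
            // Append a newline so each JSON object lands on its own line in S3.
            ByteBuffer data = ByteBuffer.wrap((json + "\n").getBytes(StandardCharsets.UTF_8));
            firehose.putRecord(new PutRecordRequest()
                    .withDeliveryStreamName("some-delivery-stream")
                    .withRecord(new Record().withData(data)));
        }
    }

The same idea applies inside the API Gateway mapping template: whatever string is built per record should end in a delimiter before it is handed to Firehose.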

Kinesis: What is the best/safe way to shut down a worker?

时光怂恿深爱的人放手 submitted on 2019-12-05 10:36:47
I am using the AWS Kinesis Client Library. I need a way to shut down the Kinesis Worker thread during deployments so that I stop at a checkpoint and not in the middle of processRecords(). I see a shutdown boolean present in Worker.java, but it is private. The reason I need this is that checkpointing and idempotency are critical to me, and I don't want to kill the process in the middle of a batch. [EDIT] Thanks to @CaptainMurphy, I noticed that Worker.java exposes a shutdown() method which safely shuts down the worker and the LeaseCoordinator. What it doesn't do is call the shutdown() task in the…
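
Assuming KCL 1.x, where Worker exposes the public shutdown() mentioned in the edit, one way to stop at a checkpoint during deployments is a JVM shutdown hook. A minimal sketch:

    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker;

    public class GracefulKclShutdown {
        public static void runUntilSignalled(Worker worker) {
            // On SIGTERM (e.g. during a deployment), ask the worker to stop
            // gracefully so the current batch finishes and checkpoints are written.
            Runtime.getRuntime().addShutdownHook(new Thread(worker::shutdown));
            worker.run(); // blocks until shutdown() completes
        }
    }

The hook only requests a graceful stop; whether the in-flight batch checkpoints still depends on the record processor checkpointing in its own shutdown path.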

Can I delete data records or shards from Amazon Kinesis without deleting the stream?

若如初见. submitted on 2019-12-05 03:44:15
I know data records in a Kinesis stream are deleted automatically after 24 hours. But in my application, whenever I write some data into the stream and then want to write other data a second time, the data inserted first should be deleted. Can anyone please help? I am new to using AWS Kinesis streams... I didn't get any help from the Kinesis Service API... You cannot delete previously inserted data from a stream, but you can read data using the KCL. The KCL creates a checkpoint after each batch of records is read, so whenever you go for the next batch of new data, the KCL will read it from the last checkpoint created in…
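
To illustrate the checkpoint behaviour the answer describes, here is a sketch of a KCL 1.x record processor (v1 IRecordProcessor interface assumed) that checkpoints after every batch; on restart the KCL resumes after the last checkpoint rather than re-reading old records:

    import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessor;
    import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessorCheckpointer;
    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownReason;
    import com.amazonaws.services.kinesis.model.Record;
    import java.util.List;

    public class CheckpointingProcessor implements IRecordProcessor {
        @Override
        public void initialize(String shardId) { }

        @Override
        public void processRecords(List<Record> records, IRecordProcessorCheckpointer checkpointer) {
            for (Record record : records) {
                // ... process each record ...
            }
            try {
                // Record progress: on restart, the KCL resumes after this point,
                // so already-processed records are not read again.
                checkpointer.checkpoint();
            } catch (Exception e) {
                // In production, handle ShutdownException / ThrottlingException explicitly.
            }
        }

        @Override
        public void shutdown(IRecordProcessorCheckpointer checkpointer, ShutdownReason reason) { }
    }

So while nothing is ever deleted from the stream on demand, checkpointing gives the same practical effect: old data is simply never delivered to the application again.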

How to set JVM arguments in IntelliJ IDEA?

本小妞迷上赌 submitted on 2019-12-05 00:14:51
I am confused about this instruction when using Kinesis Video Streams: run DemoAppMain.java in ./src/main/demo with JVM arguments set to -Daws.accessKeyId={YourAwsAccessKey} -Daws.secretKey={YourAwsSecretKey} -Djava.library.path={NativeLibraryPath} for non-temporary AWS credentials. How do I set these arguments in IntelliJ IDEA? I followed the documentation and found the "Run/Debug Configurations" dialog, but I don't know what to do next. Any help? Thanks! You're correct about the Run/Debug Configurations section! All you need to do is add your arguments to VM options or Program arguments, depending on the…
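
Concretely, using only the flags quoted above with their placeholders left intact: open Run > Edit Configurations..., select the DemoAppMain configuration, and paste the whole string into the VM options field (not Program arguments, since -D flags are JVM arguments):

    -Daws.accessKeyId={YourAwsAccessKey} -Daws.secretKey={YourAwsSecretKey} -Djava.library.path={NativeLibraryPath}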

Kinesis stream pending message count

风格不统一 submitted on 2019-12-04 16:22:51
I am trying to use an AWS Kinesis stream for one of our data streams. I would like to monitor pending messages on my stream for ops purposes (scaling downstream consumers according to the backlog), but I am unable to find any API that gives the (approximate) count of pending messages in my stream. This looks strange: messages expire after 7 days, and if the producers and consumers are isolated and can't communicate, how do you know messages are expiring? How do you handle this problem? Thanks! There is no such concept as a "pending" message in Kinesis. All incoming data is placed on a shard. Your consumer application…
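
One common proxy for a backlog, an assumption since the answer is cut off, is the stream's GetRecords.IteratorAgeMilliseconds CloudWatch metric: a growing iterator age means consumers are falling behind the tip of the stream. A sketch with the AWS SDK for Java v1; the stream name is a placeholder:

    import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
    import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
    import com.amazonaws.services.cloudwatch.model.Datapoint;
    import com.amazonaws.services.cloudwatch.model.Dimension;
    import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest;
    import com.amazonaws.services.cloudwatch.model.Statistic;
    import java.util.Date;

    public class IteratorAgeCheck {
        public static void main(String[] args) {
            AmazonCloudWatch cw = AmazonCloudWatchClientBuilder.defaultClient();
            long now = System.currentTimeMillis();
            GetMetricStatisticsRequest request = new GetMetricStatisticsRequest()
                    .withNamespace("AWS/Kinesis")
                    .withMetricName("GetRecords.IteratorAgeMilliseconds")
                    .withDimensions(new Dimension().withName("StreamName").withValue("my-stream"))
                    .withStartTime(new Date(now - 15 * 60 * 1000)) // last 15 minutes
                    .withEndTime(new Date(now))
                    .withPeriod(60)
                    .withStatistics(Statistic.Maximum);
            // A rising maximum iterator age signals a growing backlog; it can
            // drive scaling decisions even though producers and consumers never talk.
            for (Datapoint dp : cw.getMetricStatistics(request).getDatapoints()) {
                System.out.println(dp.getTimestamp() + " max age ms: " + dp.getMaximum());
            }
        }
    }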

How to put data from a server to a Kinesis stream

痞子三分冷 submitted on 2019-12-04 07:50:40
I am new to Kinesis. Reading the documentation, I found I can create a Kinesis stream to get data from a producer, and then the KCL will read this data from the stream for further processing. I understand how to write the KCL application by implementing IRecordProcessor. However, the very first stage, how to put data onto the Kinesis stream, is still not clear to me. Is there an AWS API that I need to implement to achieve this? Scenario: I have a server which is continuously receiving data from various sources into folders. Each folder contains a text file whose rows contain…
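
The producing side does not use the KCL at all; it calls the Kinesis data-plane API directly. A sketch, with the file path, stream name, and partition-key choice as placeholder assumptions, that puts each line of a text file as one record using the AWS SDK for Java v1:

    import com.amazonaws.services.kinesis.AmazonKinesis;
    import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
    import com.amazonaws.services.kinesis.model.PutRecordRequest;
    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class FilePaths {
        public static void main(String[] args) throws Exception {
            AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
            for (String line : Files.readAllLines(Paths.get("/data/input/rows.txt"))) {
                kinesis.putRecord(new PutRecordRequest()
                        .withStreamName("my-stream")                       // placeholder
                        .withPartitionKey(String.valueOf(line.hashCode())) // spreads records across shards
                        .withData(ByteBuffer.wrap(line.getBytes(StandardCharsets.UTF_8))));
            }
        }
    }

For higher throughput, putRecords (the batch variant) or the Kinesis Producer Library would be the next step, but the single-record call above is the simplest way in.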

Storing Firehose transferred files in S3 under custom directory names

一曲冷凌霜 submitted on 2019-12-04 06:02:46
We primarily do bulk transfer of incoming click-stream data through the Kinesis Firehose service. Our system is a multi-tenant SaaS platform. The incoming click-stream data is stored in S3 through Firehose. By default, all the files are stored under directories named per a given date format. I would like to specify the directory path for the data files in the Firehose panel or through the API in order to segregate the customer data. For example, this is the directory structure I would like to have in S3 for customers A, B and C: /A/2017/10/12/, /B/2017/10/12/, /C/2017/10/12/. How can I do it? You can…
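
The answer is cut off at "You can…", so the following is only one plausible approach consistent with the question: create a delivery stream per customer, each with its own S3 prefix, via ExtendedS3DestinationConfiguration. Firehose appends its date path after the custom prefix, which yields exactly /A/2017/10/12/... style keys. A sketch with the AWS SDK for Java v1; the ARNs and naming convention are placeholders:

    import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehose;
    import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehoseClientBuilder;
    import com.amazonaws.services.kinesisfirehose.model.CreateDeliveryStreamRequest;
    import com.amazonaws.services.kinesisfirehose.model.ExtendedS3DestinationConfiguration;

    public class TenantStream {
        public static void createFor(String tenant) {
            AmazonKinesisFirehose firehose = AmazonKinesisFirehoseClientBuilder.defaultClient();
            firehose.createDeliveryStream(new CreateDeliveryStreamRequest()
                    .withDeliveryStreamName("clickstream-" + tenant) // hypothetical naming scheme
                    .withExtendedS3DestinationConfiguration(new ExtendedS3DestinationConfiguration()
                            .withRoleARN("arn:aws:iam::123456789012:role/firehose-role") // placeholder
                            .withBucketARN("arn:aws:s3:::my-clickstream-bucket")         // placeholder
                            // Objects land under /<tenant>/YYYY/MM/DD/HH/... in S3.
                            .withPrefix(tenant + "/")));
        }
    }

The trade-off is one delivery stream per tenant; a single shared stream cannot route individual records to per-customer prefixes with this configuration alone.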