amazon-kinesis

How to put 25k records to a Kinesis stream, and a test tool to acknowledge it

拟墨画扇 submitted on 2019-12-13 14:10:07
Question: I have developed a piece of software that writes records to the Amazon Kinesis Streams web service. I am trying to understand whether there is a software tool that will let me measure the maximum throughput my code generates against one shard of a Kinesis stream in one second. Yes, I agree it also depends on the hardware configuration, but to start I want to know the figure for a general-purpose machine; then I may be able to see horizontal scalability. With this I am trying to achieve 25k records per second.
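There is no dedicated AWS load-test tool for this, but since one shard ingests at most 1,000 records/sec (so 25k records/sec needs at least 25 shards), a producer can simply measure itself. A minimal sketch with boto3, assuming a hypothetical stream name and ~100-byte records:

```python
import time
import boto3

kinesis = boto3.client("kinesis")
STREAM = "throughput-test"  # hypothetical stream name
payload = b"x" * 100        # assumed record size

# PutRecords accepts at most 500 records per call.
batch = [{"Data": payload, "PartitionKey": str(i)} for i in range(500)]

start, sent = time.time(), 0
while time.time() - start < 10:  # measure over a 10-second window
    resp = kinesis.put_records(StreamName=STREAM, Records=batch)
    # Count only accepted records; throttled ones come back with an ErrorCode.
    sent += len(batch) - resp["FailedRecordCount"]

print("records/sec:", sent / (time.time() - start))
```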

What is the difference between AWS Transcribe's Streaming Transcription feature and Kinesis Video Streams (for audio input) for live streaming audio

大城市里の小女人 submitted on 2019-12-13 05:05:58
Question: Hi, my requirement is: I have a live audio stream as input, say a call between two people. I need to convert that audio to text live, pick certain keywords out of the extracted text, and insert them into a database. As per the architecture in https://github.com/aws-samples/amazon-connect-realtime-transcription, both the AWS Kinesis Video Streams service and AWS Transcribe are used for live streaming, but as per this link: https://aws.amazon.com/blogs/machine-learning/amazon-transcribe-now-supports-real-time-transcriptions/
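In short, in that sample architecture Kinesis Video Streams only transports the call audio; Transcribe's streaming API is what produces text. A rough sketch of the transcription-and-keyword step with the amazon-transcribe streaming SDK for Python (region, sample rate, and the keyword are assumptions; pulling audio chunks out of Kinesis Video Streams is omitted):

```python
import asyncio
from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler

class KeywordHandler(TranscriptResultStreamHandler):
    async def handle_transcript_event(self, event):
        # Scan each partial transcript for the keywords we care about.
        for result in event.transcript.results:
            for alt in result.alternatives:
                if "refund" in alt.transcript.lower():  # hypothetical keyword
                    print("keyword hit:", alt.transcript)  # insert into DB here

async def main():
    client = TranscribeStreamingClient(region="us-east-1")  # assumed region
    stream = await client.start_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=8000,  # typical telephony audio
        media_encoding="pcm",
    )
    # Audio pulled from the Kinesis Video Stream would be written in another
    # task via stream.input_stream.send_audio_event(audio_chunk=chunk).
    await KeywordHandler(stream.output_stream).handle_events()

asyncio.run(main())
```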

Consuming/producing data to a particular shard ID in Amazon Kinesis

给你一囗甜甜゛ submitted on 2019-12-13 02:08:37
Question: I need to put all the records into Kinesis from various servers and output the data into multiple S3 files. I have been trying with ShardID but have not been able to make it work. Could you please help? Python/Java would be fine. Answer 1: ShardID is not that important. If you have 20 MB/sec of input bandwidth at a rate of 20,000 requests/second, you should have at least 20 shards. Your data will be spread across the shards, so it is just about capacity. Those shards do not affect
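To make the answer concrete (each shard ingests at most 1 MB/sec and 1,000 records/sec, hence the 20-shard figure): you normally do not target a shard directly. The partition key is MD5-hashed and the record lands on whichever shard owns that hash range. A hedged boto3 sketch, with placeholder names:

```python
import boto3

kinesis = boto3.client("kinesis")

# The partition key, not a shard ID, decides placement: Kinesis MD5-hashes it
# and routes the record to the shard that owns that hash range.
resp = kinesis.put_record(
    StreamName="my-stream",        # hypothetical stream
    Data=b'{"server": "web-01"}',  # hypothetical payload
    PartitionKey="web-01",         # e.g. one key per source server
)
print("landed on shard:", resp["ShardId"])
```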

Firehose JSON -> S3 Parquet -> ETL Spark, error: Unable to infer schema for Parquet

我与影子孤独终老i submitted on 2019-12-12 17:11:51
Question: It seems like this should be easy, like it's a core use case of this set of features, but it's been problem after problem. The latest is in trying to run commands via a Glue dev endpoint (both the PySpark and Scala endpoints). Following the instructions here: https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-tutorial-repl.html

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import *
glueContext = GlueContext
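For context, a completed version of that setup per the linked tutorial, plus a read of the Firehose-written Parquet. The S3 path is a placeholder; "Unable to infer schema for Parquet" typically means Spark found zero Parquet files under the path:

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import *

# Standard Glue dev-endpoint bootstrapping from the tutorial.
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session

# "Unable to infer schema for Parquet" usually means the path matched no
# Parquet files: a wrong prefix, an empty partition, or non-Parquet objects.
df = spark.read.parquet("s3://my-bucket/firehose-output/")  # hypothetical path
df.printSchema()
```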

Amazon-Kinesis: Put record to every shard

做~自己de王妃 submitted on 2019-12-12 17:03:48
Question: I have an Amazon Kinesis stream consisting of multiple shards. The number of shards, and therefore the number of consumers, is not constant. There is an infrequent type of event that I want broadcast to every consumer on the stream. Is there a way for a producer to broadcast a record, i.e. to discover the shards and put the record on each one? Answer 1: You can do this! Kind of... The trick is to use the parameter "ExplicitHashKey". This lets you set the hash key used for the record, and
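Continuing that idea, a hedged boto3 sketch: discover the shards, then pin one copy of the record to each shard by passing its range's starting hash key as ExplicitHashKey (stream name and payload are placeholders):

```python
import boto3

kinesis = boto3.client("kinesis")
STREAM = "my-stream"  # hypothetical stream name

for shard in kinesis.list_shards(StreamName=STREAM)["Shards"]:
    # Skip closed shards (those with an EndingSequenceNumber).
    if "EndingSequenceNumber" in shard["SequenceNumberRange"]:
        continue
    kinesis.put_record(
        StreamName=STREAM,
        Data=b"broadcast-event",   # hypothetical payload
        PartitionKey="broadcast",  # still required, but ExplicitHashKey wins
        ExplicitHashKey=shard["HashKeyRange"]["StartingHashKey"],
    )
```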

Event retention in Microsoft Azure EventHub

痞子三分冷 submitted on 2019-12-12 01:59:55
Question: I was checking the details of message retention in Event Hubs. Suppose I have set the retentionPolicy to 1 day and have sent some messages. Then, if I change the message retentionPolicy to 3 days, will the existing event data also be retained for 3 days? Answer 1: Absolutely, yes. And one more important detail about the retention policy: Event Hubs does not apply the retention policy at the message level; it is applied at the file-system level. Event Hubs is a high-throughput event ingestion pipeline. In short, it's a
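For reference, a rough sketch of changing the retention with the azure-mgmt-eventhub Python SDK. All resource names are placeholders, and the exact model fields can vary by SDK version:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.eventhub import EventHubManagementClient
from azure.mgmt.eventhub.models import Eventhub

client = EventHubManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Bump retention from 1 to 3 days; per the answer above the policy applies at
# the file-system level, so already-ingested events get the new window too.
client.event_hubs.create_or_update(
    "my-resource-group", "my-namespace", "my-eventhub",  # hypothetical names
    Eventhub(message_retention_in_days=3),
)
```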

How is data in Kinesis decrypted before hitting S3

心不动则不痛 submitted on 2019-12-11 18:06:06
Question: I currently have an architecture where my data flows kinesis -> kinesis firehose -> s3. I am creating records directly in Kinesis using:

aws kinesis put-record --stream-name <some_kinesis_stream> --partition-key 123 --data testdata --profile sandbox

The data, when I run:

aws kinesis get-records --shard-iterator --profile sandbox

looks like this:

{ "SequenceNumber": "49597697038430366340153578495294928515816248592826368002", "ApproximateArrivalTimestamp": 1563835989.441, "Data":
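The payload is not encrypted here; the CLI just base64-encodes the Data field in its JSON output, while Firehose delivers the raw bytes to S3. Since the record above was put with --data testdata, the decoded value is deterministic:

```python
import base64

# get-records base64-encodes the record payload in its JSON output; the S3
# object written by Firehose contains the decoded bytes, i.e. plain text.
print(base64.b64decode("dGVzdGRhdGE=").decode())  # prints "testdata"
```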

How to implement a worker thread that will process Kinesis records and update the GUI in JavaFX?

一笑奈何 submitted on 2019-12-11 17:35:44
Question: I'm working on a microservices monitoring app. My app is supposed to update a GUI when it receives a newly consumed record, meaning, when I receive a new record: 1) I check whether the request it represents is part of a legal flow, and whether that flow already has a representation in the GUI. By representation, I mean a set of circles that represent the full flow. For example, if I get a transaction (MS1 received request) for legal flow num 1, that is MS1 to MS2 to MS3, so my GUI will add a table

Avoiding data loss when slow consumers force backpressure in stream processing (Spark, AWS)

为君一笑 submitted on 2019-12-11 17:13:13
Question: I'm new to distributed stream processing (Spark). I've read some tutorials/examples covering how backpressure causes the producer(s) to slow down in response to overloaded consumers. The classic example given is ingesting and analyzing tweets. When there is an unexpected spike in traffic such that the consumers cannot handle the load, they apply backpressure and the producer responds by lowering its rate. What I don't really see covered is what approaches are used in
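Worth noting: with a replayable source like Kinesis or Kafka, backpressure does not by itself drop data; records stay in the stream for the retention period and are read later once consumers catch up. A sketch of the relevant legacy Spark Streaming (DStream) settings, assuming a Kinesis receiver:

```python
from pyspark import SparkConf

conf = (
    SparkConf()
    # Let Spark adapt the ingestion rate to the observed batch processing time.
    .set("spark.streaming.backpressure.enabled", "true")
    # Hard safety cap of 1,000 records/sec per receiver.
    .set("spark.streaming.receiver.maxRate", "1000")
)
```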

Kinesis max shard reads/sec and multiple consumers

≡放荡痞女 submitted on 2019-12-11 06:46:31
Question: So I have an AWS Kinesis stream where I publish events for multiple consumers. It is important for most of them to receive hot data, which means that many of them will likely poll and read the latest data at the same time. According to the AWS documentation, increasing the number of shards increases the level of parallelism, while reads can be at most 5 transactions/sec per shard. My question is whether (and how?) adding more shards would help in a situation where all my consumers are up
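Adding shards multiplies total capacity, but each shard still allows only 5 GetRecords calls/sec shared by all consumers of that shard. For many hot consumers, enhanced fan-out gives each registered consumer its own 2 MB/sec per shard, pushed over HTTP/2 via SubscribeToShard. A hedged boto3 sketch with placeholder ARN and name:

```python
import boto3

kinesis = boto3.client("kinesis")

# Each registered consumer gets a dedicated 2 MB/sec per shard instead of
# sharing the 5 reads/sec GetRecords limit with everyone else.
resp = kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",  # hypothetical
    ConsumerName="hot-consumer-1",                                        # hypothetical
)
print(resp["Consumer"]["ConsumerStatus"])  # CREATING until it becomes ACTIVE
```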