amazon-kinesis

How to put 25k records to a Kinesis stream, and a test tool to acknowledge it

拟墨画扇 submitted on 2019-12-13 14:10:07
Question: I have developed a piece of software that writes records to the Amazon Kinesis Streams web service. I am trying to understand whether there is a software tool that will let me measure the maximum throughput my code generates against one shard of a Kinesis stream in one second. Yes, I agree it also depends on the hardware configuration, but to start I want to know the figure for a general-purpose machine; then I may be able to see horizontal scalability. With this I am trying to achieve 25k records per second.
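There is no dedicated AWS load-test tool for this, but since one shard ingests at most 1,000 records/sec (so 25k records/sec needs at least 25 shards), a producer can simply measure itself. A minimal sketch with boto3, assuming a hypothetical stream name and ~100-byte records:

```python
import time
import boto3

kinesis = boto3.client("kinesis")
STREAM = "throughput-test"  # hypothetical stream name
payload = b"x" * 100        # assumed record size

# PutRecords accepts at most 500 records per call.
batch = [{"Data": payload, "PartitionKey": str(i)} for i in range(500)]

start, sent = time.time(), 0
while time.time() - start < 10:  # measure over a 10-second window
    resp = kinesis.put_records(StreamName=STREAM, Records=batch)
    # Count only accepted records; throttled ones come back with an ErrorCode.
    sent += len(batch) - resp["FailedRecordCount"]

print("records/sec:", sent / (time.time() - start))
```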

What is the difference between AWS Transcribe's Streaming Transcription feature and Kinesis Video Streams (for audio input) for live streaming audio

大城市里の小女人 submitted on 2019-12-13 05:05:58
Question: Hi, my requirement is: I have a live audio stream as input, say a call between two people. I need to convert that audio to text live, pick certain keywords out of the extracted text, and insert them into a database. As per the architecture in https://github.com/aws-samples/amazon-connect-realtime-transcription, both the AWS Kinesis Video Streams service and AWS Transcribe are used for live streaming, but as per this link: https://aws.amazon.com/blogs/machine-learning/amazon-transcribe-now-supports-real-time-transcriptions/
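In short, in that sample architecture Kinesis Video Streams only transports the call audio; Transcribe's streaming API is what produces text. A rough sketch of the transcription-and-keyword step with the amazon-transcribe streaming SDK for Python (region, sample rate, and the keyword are assumptions; pulling audio chunks out of Kinesis Video Streams is omitted):

```python
import asyncio
from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler

class KeywordHandler(TranscriptResultStreamHandler):
    async def handle_transcript_event(self, event):
        # Scan each partial transcript for the keywords we care about.
        for result in event.transcript.results:
            for alt in result.alternatives:
                if "refund" in alt.transcript.lower():  # hypothetical keyword
                    print("keyword hit:", alt.transcript)  # insert into DB here

async def main():
    client = TranscribeStreamingClient(region="us-east-1")  # assumed region
    stream = await client.start_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=8000,  # typical telephony audio
        media_encoding="pcm",
    )
    # Audio pulled from the Kinesis Video Stream would be written in another
    # task via stream.input_stream.send_audio_event(audio_chunk=chunk).
    await KeywordHandler(stream.output_stream).handle_events()

asyncio.run(main())
```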

Consuming/producing data to a particular shard ID in Amazon Kinesis

给你一囗甜甜゛ submitted on 2019-12-13 02:08:37
Question: I need to put all the records into Kinesis from various servers and output the data into multiple S3 files. I have been trying with ShardID but have not been able to make it work. Could you please help? Python/Java would be fine. Answer 1: ShardID is not that important. If you have 20 MB/sec of input bandwidth at a rate of 20,000 requests/second, you should have at least 20 shards. Your data will be spread across the shards, so it is just about capacity. Those shards do not affect
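To make the answer concrete (each shard ingests at most 1 MB/sec and 1,000 records/sec, hence the 20-shard figure): you normally do not target a shard directly. The partition key is MD5-hashed and the record lands on whichever shard owns that hash range. A hedged boto3 sketch, with placeholder names:

```python
import boto3

kinesis = boto3.client("kinesis")

# The partition key, not a shard ID, decides placement: Kinesis MD5-hashes it
# and routes the record to the shard that owns that hash range.
resp = kinesis.put_record(
    StreamName="my-stream",        # hypothetical stream
    Data=b'{"server": "web-01"}',  # hypothetical payload
    PartitionKey="web-01",         # e.g. one key per source server
)
print("landed on shard:", resp["ShardId"])
```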

Firehose JSON -> S3 Parquet -> ETL Spark, error: Unable to infer schema for Parquet

我与影子孤独终老i submitted on 2019-12-12 17:11:51
Question: It seems like this should be easy, like it's a core use case of this set of features, but it's been problem after problem. The latest is in trying to run commands via a Glue dev endpoint (both the PySpark and Scala endpoints). Following the instructions here: https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-tutorial-repl.html

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import *
glueContext = GlueContext
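For context, a completed version of that setup per the linked tutorial, plus a read of the Firehose-written Parquet. The S3 path is a placeholder; "Unable to infer schema for Parquet" typically means Spark found zero Parquet files under the path:

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import *

# Standard Glue dev-endpoint bootstrapping from the tutorial.
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session

# "Unable to infer schema for Parquet" usually means the path matched no
# Parquet files: a wrong prefix, an empty partition, or non-Parquet objects.
df = spark.read.parquet("s3://my-bucket/firehose-output/")  # hypothetical path
df.printSchema()
```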

Amazon-Kinesis: Put record to every shard

做~自己de王妃 submitted on 2019-12-12 17:03:48
Question: I have an Amazon Kinesis stream consisting of multiple shards. The number of shards, and therefore the number of consumers, is not constant. There is an infrequent type of event that I want broadcast to every consumer on the stream. Is there a way for a producer to broadcast a record, i.e. to discover the shards and put the record on each one? Answer 1: You can do this! Kind of... The trick is to use the parameter "ExplicitHashKey". This lets you set the hash key used for the record, and
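Continuing that idea, a hedged boto3 sketch: discover the shards, then pin one copy of the record to each shard by passing its range's starting hash key as ExplicitHashKey (stream name and payload are placeholders):

```python
import boto3

kinesis = boto3.client("kinesis")
STREAM = "my-stream"  # hypothetical stream name

for shard in kinesis.list_shards(StreamName=STREAM)["Shards"]:
    # Skip closed shards (those with an EndingSequenceNumber).
    if "EndingSequenceNumber" in shard["SequenceNumberRange"]:
        continue
    kinesis.put_record(
        StreamName=STREAM,
        Data=b"broadcast-event",   # hypothetical payload
        PartitionKey="broadcast",  # still required, but ExplicitHashKey wins
        ExplicitHashKey=shard["HashKeyRange"]["StartingHashKey"],
    )
```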

Event retention in Microsoft Azure EventHub

痞子三分冷 submitted on 2019-12-12 01:59:55
Question: I was checking the details of message retention in Event Hubs. Suppose I have set the retentionPolicy to 1 day and have sent some messages. Then, if I change the message retentionPolicy to 3 days, will the existing event data also be retained for 3 days? Answer 1: Absolutely, yes. And one more important detail about the retention policy: Event Hubs does not apply the retention policy at the message level; it is applied at the file-system level. Event Hubs is a high-throughput event ingestion pipeline. In short, it's a
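For reference, a rough sketch of changing the retention with the azure-mgmt-eventhub Python SDK. All resource names are placeholders, and the exact model fields can vary by SDK version:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.eventhub import EventHubManagementClient
from azure.mgmt.eventhub.models import Eventhub

client = EventHubManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Bump retention from 1 to 3 days; per the answer above the policy applies at
# the file-system level, so already-ingested events get the new window too.
client.event_hubs.create_or_update(
    "my-resource-group", "my-namespace", "my-eventhub",  # hypothetical names
    Eventhub(message_retention_in_days=3),
)
```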

How is data in Kinesis decrypted before hitting S3

心不动则不痛 submitted on 2019-12-11 18:06:06
Question: I currently have an architecture where my data flows kinesis -> kinesis firehose -> s3. I am creating records directly in Kinesis using:

aws kinesis put-record --stream-name <some_kinesis_stream> --partition-key 123 --data testdata --profile sandbox

The data, when I run:

aws kinesis get-records --shard-iterator --profile sandbox

looks like this:

{ "SequenceNumber": "49597697038430366340153578495294928515816248592826368002", "ApproximateArrivalTimestamp": 1563835989.441, "Data":
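The payload is not encrypted here; the CLI just base64-encodes the Data field in its JSON output, while Firehose delivers the raw bytes to S3. Since the record above was put with --data testdata, the decoded value is deterministic:

```python
import base64

# get-records base64-encodes the record payload in its JSON output; the S3
# object written by Firehose contains the decoded bytes, i.e. plain text.
print(base64.b64decode("dGVzdGRhdGE=").decode())  # prints "testdata"
```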

How to implement a worker thread that will process Kinesis records and update the GUI in JavaFX?

一笑奈何 submitted on 2019-12-11 17:35:44
Question: I'm working on a microservices monitoring app. My app is supposed to update a GUI when it receives a newly consumed record, meaning, when I receive a new record: 1) I check whether the request it represents is part of a legal flow, and whether that flow already has a representation in the GUI. By representation, I mean a set of circles that represent the full flow. For example, if I get a transaction (MS1 received request) for legal flow num 1, that is MS1 to MS2 to MS3, so my GUI will add a table

Avoiding data loss when slow consumers force backpressure in stream processing (Spark, AWS)

为君一笑 submitted on 2019-12-11 17:13:13
Question: I'm new to distributed stream processing (Spark). I've read some tutorials/examples covering how backpressure causes the producer(s) to slow down in response to overloaded consumers. The classic example given is ingesting and analyzing tweets. When there is an unexpected spike in traffic such that the consumers cannot handle the load, they apply backpressure and the producer responds by lowering its rate. What I don't really see covered is what approaches are used in
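Worth noting: with a replayable source like Kinesis or Kafka, backpressure does not by itself drop data; records stay in the stream for the retention period and are read later once consumers catch up. A sketch of the relevant legacy Spark Streaming (DStream) settings, assuming a Kinesis receiver:

```python
from pyspark import SparkConf

conf = (
    SparkConf()
    # Let Spark adapt the ingestion rate to the observed batch processing time.
    .set("spark.streaming.backpressure.enabled", "true")
    # Hard safety cap of 1,000 records/sec per receiver.
    .set("spark.streaming.receiver.maxRate", "1000")
)
```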

Kinesis max shard reads/sec and multiple consumers

≡放荡痞女 submitted on 2019-12-11 06:46:31
Question: So I have an AWS Kinesis stream where I publish events for multiple consumers. It is important for most of them to receive hot data, which means that many of them will likely poll and read the latest data at the same time. According to the AWS documentation, increasing the number of shards increases the level of parallelism, while reads can be at most 5 transactions/sec per shard. My question is whether (and how?) adding more shards would help in a situation where all my consumers are up
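Adding shards multiplies total capacity, but each shard still allows only 5 GetRecords calls/sec shared by all consumers of that shard. For many hot consumers, enhanced fan-out gives each registered consumer its own 2 MB/sec per shard, pushed over HTTP/2 via SubscribeToShard. A hedged boto3 sketch with placeholder ARN and name:

```python
import boto3

kinesis = boto3.client("kinesis")

# Each registered consumer gets a dedicated 2 MB/sec per shard instead of
# sharing the 5 reads/sec GetRecords limit with everyone else.
resp = kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",  # hypothetical
    ConsumerName="hot-consumer-1",                                        # hypothetical
)
print(resp["Consumer"]["ConsumerStatus"])  # CREATING until it becomes ACTIVE
```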