stream-processing

An ArrowCircuit instance for stream processors which could block

Submitted by 半世苍凉 on 2021-01-21 07:30:30
Question: The Control.Arrow.Operations.ArrowCircuit class is described as: "An arrow type that can be used to interpret synchronous circuits." I want to know what synchronous means here. I looked it up on Wikipedia, where it is discussed in terms of digital electronics. My electronics is quite rusty, so here is the question: what is wrong (if anything is) with such an instance for the so-called asynchronous stream processors:

    data StreamProcessor a b = Get (a -> StreamProcessor a b)
                             | Put b (StreamProcessor a b)
                             | Halt
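The asynchrony the question mentions can be made concrete by interpreting the type. Below is a minimal Python transliteration of the Haskell declaration (an illustrative sketch; `run_sp` and `doubler` are my own names, not from the question). Nothing in the type ties one `Put` to one `Get`: a processor may emit several outputs per input, or none, which is one way to see why these processors are called asynchronous.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

# The three constructors of the Haskell StreamProcessor, modeled as classes.
@dataclass
class Get:
    cont: Callable[[object], "SP"]  # wait for an input, then continue

@dataclass
class Put:
    value: object                   # emit an output, then continue
    rest: "SP"

@dataclass
class Halt:
    pass

SP = Union[Get, Put, Halt]

def run_sp(sp: SP, inputs: List[object]) -> List[object]:
    """Interpret a StreamProcessor over a finite input list."""
    out: List[object] = []
    while True:
        if isinstance(sp, Put):
            out.append(sp.value)
            sp = sp.rest
        elif isinstance(sp, Get):
            if not inputs:
                return out  # input exhausted while waiting for a Get
            x, inputs = inputs[0], inputs[1:]
            sp = sp.cont(x)
        else:  # Halt
            return out

# One Put per Get looks synchronous, but the type does not enforce it.
def doubler() -> SP:
    return Get(lambda x: Put(x * 2, doubler()))
```

For example, `run_sp(doubler(), [1, 2, 3])` produces `[2, 4, 6]`, while `run_sp(Put(0, Halt()), [])` produces output with no input at all.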

jq streaming - filter nested list and retain global structure

Submitted by 非 Y 不嫁゛ on 2020-06-25 21:14:38
Question: In a large JSON file, I want to remove some elements from a nested list, but keep the overall structure of the document. My example input is this (but the real one is large enough to demand streaming):

    { "keep_untouched": { "keep_this": [ "this", "list" ] },
      "filter_this": [ {"keep": "true"},
                       {"keep": "true", "extra": "keeper"},
                       {"keep": "false", "extra": "non-keeper"} ] }

The required output just has one element of the 'filter_this' block removed:

    { "keep_untouched": { "keep_this": [
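Setting jq's streaming mode aside for a moment, the transformation itself is simple to state. Here is a plain-Python sketch of the intended result; it loads the whole document into memory, so it only demonstrates the semantics, not the streaming aspect the question actually needs (for that, jq's `--stream` or a library like ijson would be required).

```python
import json

# The question's example input.
doc = json.loads("""
{ "keep_untouched": { "keep_this": [ "this", "list" ] },
  "filter_this": [ {"keep": "true"},
                   {"keep": "true", "extra": "keeper"},
                   {"keep": "false", "extra": "non-keeper"} ] }
""")

# Drop entries of "filter_this" whose "keep" field is "false";
# everything else in the document is left untouched.
doc["filter_this"] = [e for e in doc["filter_this"] if e.get("keep") != "false"]
```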

Akka Stream Kafka vs Kafka Streams

Submitted by 牧云@^-^@ on 2020-05-09 17:57:05
Question: I am currently working with Akka Stream Kafka to interact with Kafka, and I was wondering what the differences with Kafka Streams are. I know that the Akka-based approach implements the Reactive Streams specification and handles back-pressure, functionality that Kafka Streams seems to be lacking. What would be the advantage of using Kafka Streams over Akka Streams Kafka? Answer 1: Your question is very general, so I'll give a general answer from my point of view. First, I've got two usage scenarios:

Why does Apache Flink need Watermarks for Event Time Processing?

Submitted by 空扰寡人 on 2020-02-28 06:53:26
Question: Can someone explain event timestamps and watermarks properly? I understood them from the docs, but it is still not clear; a real-life example or a layman's definition would help. Also, if possible, give an example (along with a code snippet that explains it). Thanks in advance. Answer 1: Here's an example that illustrates why we need watermarks, and how they work. In this example we have a stream of timestamped events that arrive somewhat out of order, as shown below. The numbers shown are event-time
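The mechanism behind the answer can be simulated in a few lines. This is an illustrative Python toy, not Flink code: the bounded-out-of-orderness strategy, the 2-unit delay, and the window bound are all assumptions chosen for the demo. Events carry their own event-time timestamps and may arrive out of order; the watermark trails the largest timestamp seen by a fixed delay, and a window's result is only emitted once the watermark passes the window's end.

```python
MAX_DELAY = 2     # assumed bound on how out-of-order events can be
WINDOW_END = 10   # we count events with event-time timestamp < 10

def run(events):
    """Process arriving timestamps; return (window count, firing position)."""
    watermark = float("-inf")
    window_count = 0
    fired_at = None          # arrival index at which the window closed
    for i, ts in enumerate(events):
        if ts < WINDOW_END:
            window_count += 1
        # The watermark asserts: no event older than this will arrive.
        watermark = max(watermark, ts - MAX_DELAY)
        if fired_at is None and watermark >= WINDOW_END:
            fired_at = i     # now it is safe to emit the window's result
    return window_count, fired_at

# Timestamps arrive somewhat out of order, as in the answer's example:
count, fired = run([4, 2, 7, 5, 9, 8, 12, 11])
```

Here the window holds six events, and it fires only when the element with timestamp 12 arrives, since that is what pushes the watermark past 10. A larger `MAX_DELAY` trades latency for tolerance of lateness; that trade-off is the whole point of watermarks.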

Kafka Stream to Spark Stream python

Submitted by 青春壹個敷衍的年華 on 2020-01-15 12:15:08
Question: We have a Kafka stream that uses Avro. I need to connect it to a Spark stream. I used the code below, as Lev G suggested:

    kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers},
                                        valueDecoder=MessageSerializer.decode_message)

I got the error below when I executed it through spark-submit:

    2018-10-09 10:49:27 WARN YarnSchedulerBackend$YarnSchedulerEndpoint:66 - Requesting driver to remove executor 12 for reason Container marked as failed: container_1537396420651_0008_01_000013 on

Apache flink (Stable version 1.6.2) does not work

Submitted by 纵饮孤独 on 2019-12-25 03:26:47
Question: Recently, the stable version (1.6.2) of Apache Flink was released. I read this instruction. But when I run the following command:

    ./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000

I get the following error:

    The program finished with the following exception:
    org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 264564a337d4c6705bde681b34010d28)
        at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268)
        at org
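A frequent cause of failures with this particular example (independent of the Flink version) is that nothing is listening on the port the job connects to: SocketWindowWordCount reads text from a socket, so a server such as `nc -l 9000` must be started before submitting the jar. As a stand-in illustration of what such a server does, here is a minimal line server in Python (`serve_lines` is my own helper, not part of Flink or of the question):

```python
import socket

def serve_lines(lines, host="127.0.0.1", port=9000):
    """Accept one client and send it newline-terminated text lines.

    Plays the role of `nc -l 9000` for the SocketWindowWordCount example:
    the Flink job connects here and counts words in the lines it receives.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    conn, _ = srv.accept()          # blocks until the job connects
    conn.sendall(("\n".join(lines) + "\n").encode())
    conn.close()
    srv.close()
```

If the job still fails with the server running, the JobManager log (under `log/`) usually carries the root-cause exception that the truncated client-side stack trace above omits.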

Troubleshooting onyx-kafka not writing to a topic. How to run Kafka in Docker swarm. Error setting runtime volume size (/dev/shm)?

Submitted by 泪湿孤枕 on 2019-12-24 07:33:10
Question: I'm trying to i) troubleshoot a simple onyx-kafka job that is not writing to a topic. More details are given here, and you can try it out in this sample project. I think the reason is that there's only one Kafka node. So I tried ii) launching Kafka confluentinc/cp-kafka:3.3.1 (with ZooKeeper confluentinc/cp-zookeeper:3.3.1) under Docker (17.09.0-ce, build afdb6d4) in swarm mode. But then I get this error:

    Warning: space is running low in /dev/shm (shm) threshold=167,772,160 usable=58,716,160

A

How to restart Apache Apex application?

Submitted by 六眼飞鱼酱① on 2019-12-24 01:36:06
Question: From the Apex documentation, it is clear that an app launched with Apache Apex can be killed or shut down using the commands kill-app and shutdown-app respectively. But when the application is turned off (shutdown/kill), how can it be restarted from its previous state? Answer 1: Apache Apex provides a command line interface, the "apex" (previously called "dtcli") script, to interact with applications. Once an application is shut down or killed, you can restart it using the following command: launch pi-demo-3

akka-stream Zipping Flows with SubFlows

Submitted by 拥有回忆 on 2019-12-23 12:28:46
Question: I have a short question about akka-streams. Basically, I am trying to split a stream into two streams; one of these two streams will be split again into multiple subflows using groupBy, and each of these subflows needs to be connected (zipped) with the other stream. I tried to illustrate this here. Here is what I have so far:

    val aggFlow = Flow.fromGraph(GraphDSL.create() { implicit builder =>
      val broadcast = builder.add(Broadcast[Event](2))
      val zip = builder.add(ZipWith[ChangedEvent, Long, (ChangedEvent, Long