stream-processing

Microservices Architecture for highly frequent data access; in memory solutions?

不羁的心 submitted on 2019-12-23 04:28:51

Question: Let us define the following use case: a simulation task has to be fulfilled, which involves an iteration/simulation over [day1, day2, ..., dayN]. Every step of the iteration depends on the prior step, so the order is predefined. The task has a state represented by Object1; this object is changed within every step of the iteration. Each iteration step involves two different tasks: Task1 and Task2. To fulfill Task1, data from Database1 is required. For Task2 to be
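The excerpt is cut off here, but the described structure (a strictly ordered loop over days, one mutable state object, and tasks pulling data from a database) lends itself to loading the hot data into memory once and iterating over it. A minimal sketch under those assumptions; the class layout, field names, and cache type below are illustrative, not from the original question:

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only: one state object (Object1) is carried across the
// ordered steps day1..dayN, and Database1 is loaded once into an in-memory map
// so Task1 does not query the database on every iteration.
public class SimulationRunner {

    /** The simulation state that every step mutates (stands in for Object1). */
    static class Object1 {
        double value;
    }

    public static Object1 run(List<String> days, Map<String, Double> database1Cache) {
        Object1 state = new Object1();
        for (String day : days) {                     // order is predefined: day1 ... dayN
            // Task1: needs data from Database1, served here from the in-memory cache
            double input = database1Cache.getOrDefault(day, 0.0);
            state.value += input;

            // Task2: depends on the state Task1 just produced (the question is
            // truncated before Task2's data source, so this is a placeholder)
            state.value *= 1.01;
        }
        return state;
    }
}
```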

What is the difference between mini-batch vs real time streaming in practice (not theory)?

不想你离开。 submitted on 2019-12-20 10:46:44

Question: What is the difference between mini-batch vs real-time streaming in practice (not theory)? In theory, I understand mini-batch is something that batches in a given time frame, whereas real-time streaming is more like doing something as the data arrives. But my biggest question is: why not have mini-batch with an epsilon time frame (say one millisecond)? Or, I would like to understand the reason why one would be a more effective solution than the other. I recently came across one example where mini-batch (Apache

Apache Flink - Event time windows

廉价感情. submitted on 2019-12-11 16:17:46

Question: I want to create keyed windows in Apache Flink such that the window for each key gets evaluated n minutes after the arrival of the first event for that key. Is it possible to do this using the event-time characteristic (as processing time depends on the system clock and it is uncertain when the first event will arrive)? If it is possible, please explain the assignment of event time and watermarks to the events, and also explain how to call the process window function after n minutes. Below is a part of
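The question's code is cut off above. One way to get "evaluate n minutes after the first event for a key" with event time is to skip the built-in window assigners and use a KeyedProcessFunction that buffers events and registers an event-time timer at firstTimestamp + n; the timer fires once the watermark passes that point. The Event POJO, the 10-minute constant, and the output format below are illustrative assumptions, and timestamps/watermarks still have to be assigned upstream:

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Sketch: per key, buffer events and fire an event-time timer n minutes after
// the first event's timestamp. Assumes timestamps and watermarks are assigned
// upstream; the Event POJO and n = 10 minutes are illustrative.
public class FireAfterFirstEvent
        extends KeyedProcessFunction<String, FireAfterFirstEvent.Event, String> {

    /** Minimal event type for the sketch. */
    public static class Event {
        public String key;
        public long value;
    }

    private static final long N_MINUTES_MS = 10 * 60 * 1000L;

    private transient ValueState<Long> timerState;   // event-time timestamp of the pending timer
    private transient ListState<Event> bufferState;  // events collected for this key so far

    @Override
    public void open(Configuration parameters) {
        timerState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("timer", Long.class));
        bufferState = getRuntimeContext().getListState(
                new ListStateDescriptor<>("buffer", Event.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<String> out) throws Exception {
        bufferState.add(event);
        if (timerState.value() == null) {
            // first event for this key: the "window" ends n minutes later in event time
            long fireAt = ctx.timestamp() + N_MINUTES_MS;
            ctx.timerService().registerEventTimeTimer(fireAt);
            timerState.update(fireAt);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        // the watermark has passed firstTimestamp + n: evaluate the buffered events
        long count = 0;
        for (Event ignored : bufferState.get()) {
            count++;
        }
        out.collect("key=" + ctx.getCurrentKey() + " events=" + count);
        bufferState.clear();
        timerState.clear();
    }
}
```

It would be applied after something like stream.assignTimestampsAndWatermarks(...).keyBy(e -> e.key).process(new FireAfterFirstEvent()).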

Calculate totals and emit periodically in flink

血红的双手。 submitted on 2019-12-08 09:52:08

Question: I have a stream of events about resources that looks like this:
id, type, count
1, view, 1
1, download, 3
2, view, 1
3, view, 1
1, download, 2
3, view, 1
I am trying to produce stats (totals) per resource, so if I get a stream like the above, the result should be:
id, views, downloads
1, 1, 5
2, 1, 0
3, 2, 0
Now I wrote a ProcessFunction that calculates the totals like this:
public class CountTotals extends ProcessFunction<Event, ResourceTotals> {
    private ValueState<ResourceTotals> totalsState;
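The ProcessFunction is cut off above; below is one possible completion of the same idea, written here as a KeyedProcessFunction keyed by resource id, which keeps the running totals in ValueState and uses a processing-time timer to emit them periodically. The field names on Event and ResourceTotals and the 10-second flush interval are assumptions, not taken from the post:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Sketch: totals per resource are held in keyed state; a processing-time timer
// re-registers itself so the current totals are emitted every 10 seconds.
public class CountTotals
        extends KeyedProcessFunction<Long, CountTotals.Event, CountTotals.ResourceTotals> {

    /** Assumed shape of the input events (id, type, count). */
    public static class Event {
        public long id;
        public String type;
        public long count;
    }

    /** Assumed shape of the emitted totals (id, views, downloads). */
    public static class ResourceTotals {
        public long id;
        public long views;
        public long downloads;
    }

    private static final long FLUSH_INTERVAL_MS = 10_000L;

    private transient ValueState<ResourceTotals> totalsState;

    @Override
    public void open(Configuration parameters) {
        totalsState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("totals", ResourceTotals.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<ResourceTotals> out) throws Exception {
        ResourceTotals totals = totalsState.value();
        if (totals == null) {
            totals = new ResourceTotals();
            totals.id = event.id;
            // first event for this resource: start the periodic emission
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + FLUSH_INTERVAL_MS);
        }
        if ("view".equals(event.type)) {
            totals.views += event.count;
        } else if ("download".equals(event.type)) {
            totals.downloads += event.count;
        }
        totalsState.update(totals);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<ResourceTotals> out) throws Exception {
        ResourceTotals totals = totalsState.value();
        if (totals != null) {
            out.collect(totals);   // emit the running totals for this resource
        }
        // schedule the next flush so emission keeps happening periodically
        ctx.timerService().registerProcessingTimeTimer(timestamp + FLUSH_INTERVAL_MS);
    }
}
```

Wired up with something like events.keyBy(e -> e.id).process(new CountTotals()), this emits a fresh snapshot of each resource's totals at every flush.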

Share state among operators in Flink

自闭症网瘾萝莉.ら submitted on 2019-12-03 21:20:07

Question: I wonder if it is possible in Flink to share state among operators. Say, for instance, that I have partitioning by key on an operator and I need a piece of the state of partition A inside partition C (for any reason) (fig 1.a), or I need the state of operator C in the downstream operator F (fig 1.b). I know it is possible to broadcast records to all partitions. So, if you include the internal state of an operator inside the records, you can share your internal state with downstream operators.

What is the difference between mini-batch vs real time streaming in practice (not theory)?

只谈情不闲聊 submitted on 2019-12-02 23:31:28

What is the difference between mini-batch vs real-time streaming in practice (not theory)? In theory, I understand mini-batch is something that batches in a given time frame, whereas real-time streaming is more like doing something as the data arrives. But my biggest question is: why not have mini-batch with an epsilon time frame (say one millisecond)? Or, I would like to understand the reason why one would be a more effective solution than the other. I recently came across one example where mini-batch (Apache Spark) is used for fraud detection and real-time streaming (Apache Flink) is used for fraud prevention.
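One practical way to see the difference is latency per decision: a per-event engine can act on a single transaction the moment it arrives (prevention), while a micro-batch engine only acts once the current batch closes, however small that batch is (detection after the fact). A minimal per-event sketch in Flink follows; the socket source and the scoreTransaction function are illustrative assumptions, not part of the question:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Per-event processing: each transaction is scored the moment it arrives.
// A micro-batch engine would instead collect events for, say, one second and
// score the whole batch, so every decision waits for the batch boundary and
// pays per-batch scheduling overhead even with an "epsilon" batch size.
public class PerEventScoring {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> transactions = env.socketTextStream("localhost", 9999);
        transactions
            .map(PerEventScoring::scoreTransaction)   // runs once per event, no batching delay
            .print();
        env.execute("per-event-fraud-scoring");
    }

    // Illustrative stand-in for a real fraud model.
    static String scoreTransaction(String tx) {
        return tx.length() > 100 ? tx + " -> SUSPICIOUS" : tx + " -> OK";
    }
}
```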

What are the differences between kappa-architecture and lambda-architecture

纵然是瞬间 submitted on 2019-12-01 08:30:33

If the Kappa architecture does analysis on the stream directly instead of splitting the data into two streams, where is the data stored then, in a messaging system like Kafka? Or can it be in a database for recomputing? And is a separate batch layer faster than recomputing with a stream-processing engine for batch analytics? "A very simple case to consider is when the algorithms applied to the real-time data and to the historical data are identical. Then it is clearly very beneficial to use the same code base to process historical and real-time data, and therefore to implement the use case using the
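In a Kappa setup the log itself (e.g. Kafka with long retention) usually serves as the system of record, and "recomputing" simply means running the same streaming job again from the earliest retained offset. A minimal sketch with Flink's Kafka consumer; the topic name, broker address, and command-line flag are illustrative assumptions:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

// Kappa-style reprocessing sketch: the same job serves both live and historical
// analysis; replaying history is just starting the consumer from the earliest
// retained offset instead of the latest one.
public class KappaReplayJob {
    public static void main(String[] args) throws Exception {
        boolean reprocessHistory = args.length > 0 && "--from-earliest".equals(args[0]);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "kappa-demo");

        FlinkKafkaConsumer<String> source =
                new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props);
        if (reprocessHistory) {
            source.setStartFromEarliest();   // recompute: replay everything Kafka still retains
        } else {
            source.setStartFromLatest();     // normal operation: only new events
        }

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> events = env.addSource(source);
        events
            .map(String::toUpperCase)        // stand-in for the real analysis logic
            .print();
        env.execute(reprocessHistory ? "kappa-replay" : "kappa-live");
    }
}
```

The recomputation is then bounded by whatever history the log still retains, and the "separate batch layer" question becomes one of engine speed rather than of keeping two code bases.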

What are the differences between kappa-architecture and lambda-architecture

随声附和 submitted on 2019-12-01 06:16:21

Question: If the Kappa architecture does analysis on the stream directly instead of splitting the data into two streams, where is the data stored then, in a messaging system like Kafka? Or can it be in a database for recomputing? And is a separate batch layer faster than recomputing with a stream-processing engine for batch analytics? Answer 1: "A very simple case to consider is when the algorithms applied to the real-time data and to the historical data are identical. Then it is clearly very beneficial to use

Share state among operators in Flink

核能气质少年 submitted on 2019-11-30 21:57:26

I wonder if it is possible in Flink to share state among operators. Say, for instance, that I have partitioning by key on an operator and I need a piece of the state of partition A inside partition C (for any reason) (fig 1.a), or I need the state of operator C in the downstream operator F (fig 1.b). I know it is possible to broadcast records to all partitions. So, if you include the internal state of an operator inside the records, you can share your internal state with downstream operators. However, this could be an expensive operation instead of simply letting op1 specifically ask for op2's state
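Rather than embedding a copy of the state in every record, Flink's broadcast state pattern covers the "make one operator's data visible to all parallel instances of another operator" case: the sharing operator emits updates on a side stream, that stream is broadcast, and the downstream operator stores the updates in broadcast state. A minimal sketch, where the sources, the key/value layout, and the "threshold" entry are illustrative assumptions:

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

// Sketch of the broadcast state pattern: state updates are published as a
// stream, broadcast to every parallel instance of the downstream operator,
// and kept there in broadcast state for lookup while processing main records.
public class SharedStateViaBroadcast {

    private static final MapStateDescriptor<String, String> SHARED_DESC =
            new MapStateDescriptor<>("shared-state", String.class, String.class);

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> mainStream = env.socketTextStream("localhost", 9999);
        // the "sharing" operator publishes (key, value) updates of the state it exposes
        DataStream<Tuple2<String, String>> stateUpdates =
                env.fromElements(Tuple2.of("threshold", "42"));

        // every parallel instance of the downstream operator receives all updates
        BroadcastStream<Tuple2<String, String>> broadcast = stateUpdates.broadcast(SHARED_DESC);

        mainStream
            .connect(broadcast)
            .process(new BroadcastProcessFunction<String, Tuple2<String, String>, String>() {
                @Override
                public void processElement(String value, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                    String threshold = ctx.getBroadcastState(SHARED_DESC).get("threshold");
                    out.collect(value + " (threshold=" + threshold + ")");
                }

                @Override
                public void processBroadcastElement(Tuple2<String, String> update, Context ctx, Collector<String> out) throws Exception {
                    ctx.getBroadcastState(SHARED_DESC).put(update.f0, update.f1);
                }
            })
            .print();

        env.execute("shared-state-via-broadcast");
    }
}
```

This still pushes data downstream rather than letting one operator pull another operator's state on demand, which Flink does not offer as a first-class feature.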

Server CPU and GPU With LAMP

做~自己de王妃 submitted on 2019-11-28 09:22:50

I am trying to figure out more about the hardware that can be utilized when running a PHP application, or even a C++-compiled PHP app using HipHop. I would like to set up a microserver and use the GPU to help the CPU process requests... Anyone? PHP alone does not have the ability to leverage the GPU. This was recently discussed on the php internals developer list. Keep in mind that GPUs excel at certain types of workloads, while they're not that great for others. PHP wouldn't be able to really take advantage of GPU acceleration because the work it performs isn't really the best kind for a GPU.