Microservices architecture for highly frequent data access; in-memory solutions?

Submitted by 不羁的心 on 2019-12-23 04:28:51

Question


Let us define the following use case:

  • There is a simulation task to be fulfilled, which involves an iteration/simulation over [day1, day2, ..., dayN]. Every step of the iteration depends on the prior step, so the order is predefined.
  • The task has a state represented by Object1; this object is changed in every step of the iteration.
  • Each iteration step involves 2 different tasks: Task1 and Task2.
  • To fulfill Task1, data from Database1 is required.
  • Task2 also requires external data, but from a different database, namely Database2.
  • After Task1 has finished, Task2 needs to be applied.
  • Both Task1 and Task2 need to access Object1.
  • After both tasks are done, the state of Object1 has changed and one iteration step has finished.

This iteration/simulation task involves on average 10,000 iteration steps, and on average 100 iteration/simulation tasks need to run concurrently, started by several end users.
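For illustration, a single simulation run of this shape could look like the following purely sequential sketch (all names and the arithmetic are hypothetical stand-ins; in reality Task1 and Task2 would read from Database1 and Database2):

    import java.time.LocalDate;
    import java.util.function.BiFunction;

    public class SimulationSketch {
        public static void main(String[] args) {
            LocalDate day1 = LocalDate.of(2020, 1, 1);
            LocalDate dayN = day1.plusDays(9_999);              // ~10,000 steps

            // Stand-ins for Task1/Task2; the real ones would also query Database1/Database2.
            BiFunction<LocalDate, Double, Double> task1 = (day, state) -> state + 1.0;
            BiFunction<LocalDate, Double, Double> task2 = (day, state) -> state * 1.001;

            double object1 = 0.0;                                // the simulation state (Object1)
            for (LocalDate day = day1; !day.isAfter(dayN); day = day.plusDays(1)) {
                double afterTask1 = task1.apply(day, object1);   // Task1 first (needs Database1)
                object1 = task2.apply(day, afterTask1);          // then Task2 (needs Database2)
            }                                                    // state changed -> step finished
            System.out.println("final Object1 state: " + object1);
        }
    }

The question is essentially how to distribute roughly 100 such loops without paying a network round trip for every database access and every Task1-to-Task2 hand-off.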

Now we are discussing a microservice architecture for this problem, because of the scalability the application needs in production. This is also crucial for development, because Task1 and Task2 are recently added features/parameters and scale differently during development.

So, to avoid the network bottleneck caused by the constant database access in every iteration and by the data sent between Task1 and Task2, what would be an appropriate system architecture for this problem?

Should there be at least two different services for Task1 and Task2, and maybe even one more for the actual iteration/simulation state control? Can someone tell us a bit more about using an in-memory data grid solution like Hazelcast, or just an in-memory database like Redis, for this problem?

The main question here is: what are the arguments for a microservice architecture, given the likely communication/network bottleneck? Is the only way to speed this up to load all the data needed for a simulation task into memory and keep it there the whole time, to avoid the network bottleneck?

Thanks for your answers and valuable input on this.

(This question is not about inter-service communication, such as messaging or REST over HTTP (pub/sub or req/resp); both could impose a high network load for this task.)


Answer 1:


Now we are discussing a microservice architecture for this problem, because of the scalability the application needs in production. This is also crucial for development, because Task1 and Task2 are recently added features/parameters and scale differently during development.

This is exactly what a stream-processing platform is good at. I recommend using a system like Apache Kafka or Apache Pulsar for this problem.

Should there be at least two different services for Task1 and Task2, and maybe even one more for the actual iteration/simulation state control?

Task1 and Task2 are what are called stream processors: they read from (subscribe to) one topic, perform some operations/transformations, and write (publish) to another topic.
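As a rough illustration, a Task1 processor built with Kafka Streams might look like the sketch below (topic names, serdes and the transformation are assumptions for the example, not something prescribed by the answer):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class Task1Processor {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "task1-processor");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Subscribe to the topic carrying the current step/state (Object1),
            // apply Task1, and publish the intermediate result for Task2 to consume.
            KStream<String, String> steps = builder.stream("simulation-steps");
            steps.mapValues(Task1Processor::applyTask1)
                 .to("task1-results");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
        }

        // Hypothetical Task1 logic; the real one would also read from Database1.
        private static String applyTask1(String state) {
            return state + "|task1-done";
        }
    }

A Task2 processor would be the mirror image, subscribing to task1-results and publishing the updated Object1 state.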

The main question here is: what are the arguments for a microservice architecture, given the likely communication/network bottleneck? Is the only way to speed this up to load all the data needed for a simulation task into memory and keep it there the whole time, to avoid the network bottleneck?

Again, this is exactly the kind of problem that a system like Apache Kafka or Apache Pulsar handles well. To scale writes and reads in a stream-processing system, you can partition your topics.
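One way to apply partitioning here (an assumption on top of the answer, not something it spells out) is to key every step message by a simulation id, so the steps of one simulation stay in order on a single partition while the roughly 100 concurrent simulations spread across partitions and brokers:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class StepPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                String simulationId = "sim-42";   // hypothetical id of one of the ~100 runs
                // Keying by simulationId pins all steps of this simulation to one partition,
                // which preserves step order; other simulations hash to other partitions.
                producer.send(new ProducerRecord<>("simulation-steps", simulationId, "day1-state"));
            }
        }
    }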




Answer 2:


With Hazelcast, you get the best of both worlds: data storage (a cache in the Hazelcast cluster) and compute/processing. Within the same Hazelcast cluster, you can create caches using Hazelcast data structures and load them with data from the databases (pre-load warmup or on-demand loading into the cache). Then you execute your tasks within the cluster using the Hazelcast Jet APIs. This way your tasks have access to the data previously loaded into the cluster, with the advantage that the data sits at the nearest possible location to your tasks, giving extremely low latency for task execution.
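A minimal sketch of the cache side of that, assuming Hazelcast 4.x+ and hypothetical map names (the warmup would normally pull from Database1/Database2 via JDBC or a MapStore rather than use hard-coded values):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.map.IMap;

    public class CacheWarmup {
        public static void main(String[] args) {
            // Start (or join) a Hazelcast cluster member.
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();

            // One distributed map per external database; the names are made up.
            IMap<Long, String> db1Cache = hz.getMap("database1-cache");
            IMap<Long, String> db2Cache = hz.getMap("database2-cache");

            // Pre-load warmup: stand-in values instead of real rows from the databases.
            db1Cache.put(1L, "row-from-database1");
            db2Cache.put(1L, "row-from-database2");
        }
    }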

Another benefit of Jet: since Jet is a DAG implementation, you can connect multiple tasks with each other in whatever direction you like. For example, Task1 can feed into Task2, Task2 into Task3, and Task3 into both Task1 and Task2, and so on. This gives you full control over a job execution that may entail multiple tasks at different stages. Jet provides both stream and batch processing of tasks, with the same flexibility in designing and executing your jobs.
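A minimal batch-pipeline sketch with the Jet Pipeline API (assuming standalone Jet 4.x; in Hazelcast 5.x Jet is embedded and reached via hz.getJet()), chaining a hypothetical Task1 stage into a Task2 stage over the cached data:

    import com.hazelcast.jet.Jet;
    import com.hazelcast.jet.JetInstance;
    import com.hazelcast.jet.Util;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.Sources;

    public class SimulationJob {
        public static void main(String[] args) {
            // Read cached Database1 rows, apply the two (hypothetical) task
            // transformations in sequence, write results back into the cluster.
            Pipeline p = Pipeline.create();
            p.readFrom(Sources.<Long, String>map("database1-cache"))
             .map(e -> Util.entry(e.getKey(), applyTask1(e.getValue())))   // Task1 stage
             .map(e -> Util.entry(e.getKey(), applyTask2(e.getValue())))   // Task2 stage
             .writeTo(Sinks.map("simulation-results"));

            JetInstance jet = Jet.newJetInstance();   // runs co-located with the cached data
            jet.newJob(p).join();
        }

        private static String applyTask1(String row) { return row + "|t1"; }
        private static String applyTask2(String row) { return row + "|t2"; }
    }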

You may find it problematic to use Kafka for task execution outside of the Kafka ecosystem. Jet is highly flexible and can be connected to any source/sink, including Kafka.



Source: https://stackoverflow.com/questions/58902946/microservices-architecture-for-highly-frequent-data-access-in-memory-solutions
