Is handling very large concurrency not possible?

Posted by 余生颓废 on 2019-12-14 01:55:01

Question


Recently I have been building a REST API as part of an assignment where I am supposed to increment a counter in a database table. Assuming the table has only one column, I am supposed to fire about 1000 requests per second at this REST API to increment the counter, and at the end the data should be consistent, i.e. if the counter value in the DB is initially 0, then after a successful run of 1000 concurrent requests it should be 1000.

No worries so far: I achieved this via database row-level locking. Another way would be to wrap the code that increments the counter in a transaction (with the highest isolation level). What I have observed, though, is that while this does maintain consistency, it comes at the cost of high latency. For example, I ran a JMeter test at 1000 req/sec for 5 seconds, and all requests completed in around 26 seconds, which is a really huge latency.
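For context, the row-locking version looks roughly like the sketch below (plain JDBC; the single-row table `counter` with one integer `value` column is just illustrative, not my exact schema):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CounterDao {

    // Increment the single counter row under a row-level lock.
    // SELECT ... FOR UPDATE blocks concurrent transactions on the same row,
    // so each increment sees the previously committed value.
    public int increment(Connection conn) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement select = conn.prepareStatement(
                 "SELECT value FROM counter FOR UPDATE");
             PreparedStatement update = conn.prepareStatement(
                 "UPDATE counter SET value = ?")) {

            int current;
            try (ResultSet rs = select.executeQuery()) {
                rs.next();
                current = rs.getInt(1);
            }
            update.setInt(1, current + 1);
            update.executeUpdate();
            conn.commit();      // the commit (disk flush) is where the latency goes
            return current + 1;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```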

This has raised a lot of questions in my mind:

  1. There must be real-time scenarios or apps where this level of high concurrency is handled with low latency, aren't there?

  2. Is this always a limitation of relational databases, and could it be solved in some way with a non-relational (NoSQL) database?

  3. I thought about queuing such concurrent requests with some message queue, but again that would be non-real-time behaviour if the user is waiting for a response.

Thanks in advance, any help appreciated.


Answer 1:


This is a limitation of relational databases, and of any database with strong concurrency guarantees in general. You can't really get around it except by scaling up the hardware.

It all comes down to I/O operations. To guarantee that your transaction is 100% written and cannot be lost, databases usually flush the data to disk. Depending on what disks you have, this takes a long time, in the range of milliseconds.

So to your questions:

  1. Applications with high concurrency usually avoid transactions, strong guarantees, or at least per-request I/O operations.
  2. Yes, there are plenty of non-relational databases that don't flush on every request, or that keep the data entirely in memory.
  3. Queuing or other tricks can't solve the fundamental bottleneck of I/O operations per second.

You may be able to achieve your goal by switching to SSDs: theoretically those can reach thousands of I/O operations per second, where a spinning disk manages at most a few hundred. Then you have to convince your database to perform as few I/O operations as possible per request.
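Purely as an illustration of the "keep it in memory" trade-off from point 2 (a sketch, not a recommendation: increments accumulated since the last flush are lost on a crash, and the `counter` table/column names are placeholders):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import javax.sql.DataSource;

// Counts in memory and flushes to the database once per interval,
// turning ~1000 disk writes/second into ~1 write/second.
public class BufferedCounter {
    private final AtomicLong pending = new AtomicLong();
    private final DataSource dataSource;
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor();

    public BufferedCounter(DataSource dataSource) {
        this.dataSource = dataSource;
        flusher.scheduleAtFixedRate(this::flush, 1, 1, TimeUnit.SECONDS);
    }

    // Called by each request; no I/O on this path.
    public void increment() {
        pending.incrementAndGet();
    }

    private void flush() {
        long delta = pending.getAndSet(0);
        if (delta == 0) return;
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "UPDATE counter SET value = value + ?")) {
            ps.setLong(1, delta);
            ps.executeUpdate();
        } catch (SQLException e) {
            pending.addAndGet(delta);   // keep the delta for the next attempt
        }
    }
}
```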




Answer 2:


The "limitation" has nothing to do with databases being "relational" (or not).

The essence of your scenario is that you can't begin adding 1 (say, to obtain 3) before the previous actor has finished adding 1 to obtain 2 and has committed that change. If 2 + 1 = 3, you can't start the computation unless and until both values on the LHS are available and reliable. So if the 2 is the result of some other action, you won't be able to start until that other action has finished completely.

That is sometimes called the "convoy syndrome", and it can occur in just about any technology.

There are lots of shops that do apparently similar things with "high concurrency", but they either achieve it by avoiding any form of shared central resource that causes the convoy syndrome (such as your counter), or they achieve it by sacrificing your "must end up with 1000" guarantee.
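To make the arithmetic concrete, here is a toy sketch (assuming roughly 25 ms per commit, a figure taken from your own numbers): when every increment must wait for the previous one to commit, the work is serial no matter how many threads you throw at it, so 1000 increments at ~25 ms each take ~25 seconds end to end.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Simulates the convoy: 1000 concurrent "requests" all funnel through one lock,
// and each holds it for ~25 ms (standing in for a transaction commit).
// Total wall-clock time is ~1000 * 25 ms, regardless of the thread count.
public class ConvoyDemo {
    public static void main(String[] args) throws InterruptedException {
        final ReentrantLock rowLock = new ReentrantLock();
        final long[] counter = {0};

        ExecutorService pool = Executors.newFixedThreadPool(200);
        long start = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> {
                rowLock.lock();
                try {
                    counter[0]++;              // the shared central resource
                    Thread.sleep(25);          // simulated commit latency
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    rowLock.unlock();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(2, TimeUnit.MINUTES);
        System.out.printf("counter=%d, elapsed=%.1fs%n",
                counter[0], (System.nanoTime() - start) / 1e9);
    }
}
```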




Answer 3:


There must be real-time scenarios or apps where this level of high concurrency is handled with low latency, aren't there?

You want to review the literature on LMAX's use of the ring buffer data structure (the Disruptor) for messaging.

The short answer is that, in some scenarios, batching your writes can save you, assuming you can arrange the problem so that the other constraints are still satisfied.
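As a very rough sketch of the batching idea (this is not the actual LMAX Disruptor; a plain blocking queue stands in for the ring buffer, and the `counter` table name is a placeholder): request handlers enqueue their increment and wait, a single writer drains whatever has accumulated, applies the whole batch in one write, and then completes every waiting request.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import javax.sql.DataSource;

// One writer thread drains pending increments and commits them as a single
// UPDATE, so 1000 requests can share a handful of disk flushes instead of
// paying for 1000 of them. Each caller still only returns after its
// increment has been committed.
public class BatchingCounter {
    private final BlockingQueue<CompletableFuture<Void>> queue =
            new ArrayBlockingQueue<>(10_000);
    private final DataSource dataSource;

    public BatchingCounter(DataSource dataSource) {
        this.dataSource = dataSource;
        Thread writer = new Thread(this::writeLoop, "counter-writer");
        writer.setDaemon(true);
        writer.start();
    }

    // Request handler: enqueue and block until the batch containing
    // this increment has been committed.
    public void increment() throws Exception {
        CompletableFuture<Void> done = new CompletableFuture<>();
        queue.put(done);
        done.get();
    }

    private void writeLoop() {
        List<CompletableFuture<Void>> batch = new ArrayList<>();
        while (true) {
            try {
                batch.add(queue.take());     // wait for at least one request
                queue.drainTo(batch);        // grab whatever else has piled up
                commitBatch(batch.size());   // one write for the whole batch
                batch.forEach(f -> f.complete(null));
            } catch (Exception e) {
                batch.forEach(f -> f.completeExceptionally(e));
            } finally {
                batch.clear();
            }
        }
    }

    private void commitBatch(int delta) throws SQLException {
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "UPDATE counter SET value = value + ?")) {
            ps.setInt(1, delta);
            ps.executeUpdate();              // autocommit: one flush per batch
        }
    }
}
```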



Source: https://stackoverflow.com/questions/54304646/is-handling-very-large-concurrency-not-possible
