Can you delete in a replication-based distributed database?

Question

I have thus far been living under the impression that you cannot truly delete a row in a replication-based distributed database. It all works well in a copy-based one. But in Replication you mark them as "consider this deleted" and filter them out in every last query. But you do not ever actually delete something from the DB. I think it is time to verify whether that assumption is true.

My understanding is that you would run into a Race Condition with the Replication if there was ever a key collision. It goes something like this:

Database A: Adds an entry under key 11 (11A)

Database B: Adds an entry under key 11 (11B)

Database A: Deletes the entry under key 11

Now it depends on the order in which these 3 operations "meet" in the wild. The expected order would be:

11A Create
11 Delete (which means 11A)
11B Create

But what if this happens instead?

11A Create
11B Create (fails, already a key 11)
11 Delete

Or even worse, this?

11B Create
11A Create (fails, already a key 11)
11 Delete (which will hit 11B)
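A minimal Python sketch of the three orderings above, assuming (hypothetically) that each replica applies operations in the order they arrive, that a create on an already-used key silently fails, and that a plain delete removes whatever is currently stored under the key. The names `OPS`, `apply_ops`, and `possible_outcomes` are made up for illustration:

```python
from itertools import permutations

# Hypothetical operations from the scenario: (kind, key, value).
OPS = {
    "11A Create": ("create", 11, "11A"),
    "11B Create": ("create", 11, "11B"),
    "11 Delete": ("delete", 11, None),
}


def apply_ops(labels):
    """Apply the named operations, in the given order, to an empty replica."""
    store = {}
    for label in labels:
        kind, key, value = OPS[label]
        if kind == "create":
            # A create on an already-used key fails (here: silently ignored).
            store.setdefault(key, value)
        else:
            # A plain delete removes whatever currently sits under the key.
            store.pop(key, None)
    return store


def possible_outcomes():
    """Every arrival order in which Database A's create still precedes its delete."""
    outcomes = {}
    for order in permutations(OPS):
        if order.index("11A Create") < order.index("11 Delete"):
            outcomes[order] = apply_ops(order)
    return outcomes


for order, result in possible_outcomes().items():
    print(" -> ".join(order), "=>", result)
```

Under these assumptions only the expected order leaves 11B stored; in the other two orders key 11 ends up empty, so replicas that received the operations in different orders disagree.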

Answer 1:

I'll assume that we are talking about a leaderless distributed database, that is, one where all nodes play the same role (there is no master), so both reads and writes can be served by any node. Otherwise, if there's a single master, it can impose a specific ordering on all writes and deletes and thus resolve the concurrency problem you are describing.
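As a rough illustration of the single-master case, here is a hypothetical Python sketch in which the leader stamps every operation with a sequence number and each follower buffers out-of-order messages until it can apply them strictly in that order. `Leader` and `Follower` are made-up names, and failures, retries, and leader election are ignored:

```python
import random


class Leader:
    """Assigns one global order to all writes and deletes."""

    def __init__(self):
        self.next_seq = 0

    def stamp(self, op):
        msg = (self.next_seq, op)
        self.next_seq += 1
        return msg


class Follower:
    """Applies the leader's operations strictly in sequence-number order."""

    def __init__(self):
        self.store = {}
        self.next_to_apply = 0
        self.pending = {}  # seq -> op, for messages that arrived early

    def receive(self, msg):
        seq, op = msg
        self.pending[seq] = op
        # Drain the buffer only while the next expected sequence number is present.
        while self.next_to_apply in self.pending:
            kind, key, value = self.pending.pop(self.next_to_apply)
            if kind == "create":
                self.store.setdefault(key, value)  # create fails if key is taken
            else:
                self.store.pop(key, None)
            self.next_to_apply += 1


leader = Leader()
log = [leader.stamp(op) for op in [("create", 11, "11A"),
                                   ("delete", 11, None),
                                   ("create", 11, "11B")]]

# However the network reorders delivery, every follower ends in the same state.
followers = []
for seed in range(5):
    shuffled = log[:]
    random.Random(seed).shuffle(shuffled)
    follower = Follower()
    for msg in shuffled:
        follower.receive(msg)
    followers.append(follower)

print([f.store for f in followers])
```

If the leader had received the operations in a different order, the final state would differ, but all followers would still agree on it.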

But in Replication you mark them as "consider this deleted" and filter them out in every last query.

That's right, and it's done for two main reasons:

correctness: if items were deleted instead of tombstoned, there could be an ambiguous situation in which two nodes are consulted and node A has the item but node B does not. The system as a whole then cannot distinguish whether the item was deleted (but the delete failed on A) or was recently created (but the create failed on B). With tombstones, this distinction is clear: in the first case B holds a tombstone for the item, in the second it holds nothing.
performance: most of these systems do not perform in-place updates (as RDBMSs usually do) but instead use append-only operations. This is done to improve performance, since random-access operations on disk are much slower than sequential ones. As a result, performing deletes via tombstones aligns well with this approach.
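Both points can be illustrated with a toy model, written here as a hypothetical Python sketch rather than how any particular database works: each replica keeps an append-only log in which a delete is just another record whose value is a tombstone, and a read consults several replicas and keeps the newest record by timestamp (last-write-wins, a deliberate simplification). `Record`, `Replica`, and `read` are illustrative names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    key: int
    timestamp: int        # assumption: every write carries a comparable timestamp
    value: Optional[str]  # None marks a tombstone ("this key was deleted")


class Replica:
    """One node's append-only log: puts and deletes are both appended, never applied in place."""

    def __init__(self):
        self.log = []

    def put(self, key, value, ts):
        self.log.append(Record(key, ts, value))

    def delete(self, key, ts):
        # Deleting appends a tombstone instead of erasing earlier records.
        self.log.append(Record(key, ts, None))

    def latest(self, key):
        """Newest record for key, or None if this replica has never seen the key."""
        records = [r for r in self.log if r.key == key]
        return max(records, key=lambda r: r.timestamp, default=None)


def read(key, replicas):
    """Ask several replicas; return the newest record's value (None if deleted or unknown)."""
    seen = [rec for rec in (rep.latest(key) for rep in replicas) if rec is not None]
    if not seen:
        return None
    return max(seen, key=lambda r: r.timestamp).value


# Node A missed the delete; node B has the tombstone, so the read can tell
# "deleted at ts=2" apart from "never created".
a, b = Replica(), Replica()
for node in (a, b):
    node.put(11, "11A", ts=1)
b.delete(11, ts=2)
print(read(11, [a, b]))
```

Had B physically removed the row instead, this read would find "11A" on A and nothing at all on B, and would have no way to know the row had been deleted.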

But you do not ever actually delete something from the DB.

That is not necessarily true. Usually, the tombstones are eventually removed from the database (in a garbage-collection fashion). Eventually here means that they are deleted when the system can be sure that the example described above cannot happen anymore for these items (because the deletes have propagated to all the nodes).
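A minimal sketch of such a garbage-collection pass, assuming (as a simplification) that a delete is guaranteed to have reached every replica once it is older than some grace period; Cassandra, for instance, ties tombstone removal during compaction to a per-table `gc_grace_seconds` setting. The `compact` function and its parameters are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    key: int
    timestamp: float
    value: Optional[str]  # None marks a tombstone


def compact(log, now, grace_period):
    """Keep only the newest record per key, then drop tombstones older than grace_period.

    Assumption: any delete older than grace_period has already reached every
    replica, so forgetting its tombstone cannot make the old value reappear.
    """
    newest = {}
    for rec in log:
        if rec.key not in newest or rec.timestamp > newest[rec.key].timestamp:
            newest[rec.key] = rec
    compacted = []
    for rec in newest.values():
        expired_tombstone = rec.value is None and now - rec.timestamp > grace_period
        if not expired_tombstone:
            compacted.append(rec)
    return sorted(compacted, key=lambda r: r.key)


log = [Record(1, 0, "a"), Record(1, 5, None), Record(2, 0, "b"), Record(3, 90, None)]
print(compact(log, now=100, grace_period=10))
```

If a replica that missed the delete rejoins after the tombstone has been collected, the stale value can come back, which is why the grace period has to be longer than the time it takes deletes to propagate.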

My understanding is that you would run into a Race Condition with the Replication if there was ever a key collision

That's right for most distributed systems of this kind: the result will depend on the order in which the operations reach the database. However, some of these databases provide alternative mechanisms, such as conditional writes and deletes. With these, you can delete only a specific version of an item, or update an item only if its version is a specific one (thus aborting the update if someone else updated it in the meantime). Examples of such operations in Cassandra are conditional deletes and the so-called lightweight transactions.
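The following single-node Python sketch shows only the semantics of such conditional operations, not how they are made atomic across replicas (Cassandra's lightweight transactions rely on a Paxos-based protocol for that). `VersionedStore` and its methods are hypothetical names, loosely mirroring CQL's `INSERT ... IF NOT EXISTS` and `DELETE ... IF` conditions:

```python
class VersionedStore:
    """Toy store with compare-and-set style operations (a hypothetical API)."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        return self._data.get(key)

    def create_if_absent(self, key, value):
        """Insert only if the key is unused; report whether it succeeded."""
        if key in self._data:
            return False
        self._data[key] = (1, value)
        return True

    def update_if_version(self, key, expected_version, value):
        """Overwrite only if nobody changed the entry since expected_version was read."""
        current = self._data.get(key)
        if current is None or current[0] != expected_version:
            return False  # concurrent change: abort instead of clobbering it
        self._data[key] = (expected_version + 1, value)
        return True

    def delete_if_value(self, key, expected_value):
        """Delete only the specific entry the caller meant to delete."""
        current = self._data.get(key)
        if current is None or current[1] != expected_value:
            return False
        del self._data[key]
        return True


# The question's worst case: 11B arrives first, so 11A's create fails,
# and A's delete (meant for 11A) is rejected instead of hitting 11B.
store = VersionedStore()
store.create_if_absent(11, "11B")
store.create_if_absent(11, "11A")
store.delete_if_value(11, "11A")
print(store.get(11))
```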

Below are some references that describe how Riak and Cassandra perform deletes; they also contain a lot of information about tombstones:

Riak: Object deletion
About deletes and tombstones in Cassandra

Source: https://stackoverflow.com/questions/49899012/can-you-delete-in-a-replication-based-distributed-database

Tags

database

database-design