Cassandra and Tombstones: Creating a Row , Deleting the Row, Recreating the Row = Performance?

陌路散爱 提交于 2019-12-05 04:20:58

I would like to belatedly clarify some things here.

First, with respect to Theodore's answer:

1) All rows have a tombstone field internally for simplicity, so when the new row is merged with the tombstone, it just becomes "row with new data, that also remembers that it was once deleted at time X." So there is no real penalty in that respect.

2) It is incorrect to say that "If you create and delete a column value rapidly enough that no flush takes place in the middle... the tombstone [is] simply discarded"; tombstones are always persisted, for correctness. Perhaps the situation Theodore was thinking was the other way around: if you delete, then insert a new column value, then the new column replaces the tombstone (just as it would any obsolete value). This is different from the row case since the Column is the "atom" of storage.

3) Given (2), the delete-row-and-insert-new-one is likely to be more performant if there are many columns to be deleted over time. But for a single column the difference is negligible.

Finally, regarding Tyler's answer, in my opinion it is more idiomatic to simply delete the column in question than to change its value to an empty [byte]string.

1). If you delete the whole row, then the tombstone is still kept and not reanimated by the subsequent insertion in step 3. This is because there may have been an insertion for the row a long time ago (e.g. step 0: key "1", field "name"). Row "1" key "name" needs to stay deleted, while row "1" key "user" is reanimated.

2). If you create and delete a column value rapidly enough that no flush takes place in the middle, there is no performance impact. The column will be updated in-place in the Memtable, and the tombstone simply discarded. Only a single value will end up being written persistently to an SSTable.

However, if the Memtable is flushed to disk between steps 2 and 3, then the tombstone will be written to the resulting SSTable. A subsequent flush will write the new value to the next SSTable. This will make subsequent reads slower, since the column now needs to be read from both SSTables and reconciled. (Similarly if a flush occurs between steps 1 and 2.)

Just set the "date" column to hold an empty string. That's what's typically used instead of null.

If you want to delete the column, just delete the column explicitly instead of deleting the entire row. The performance effect of this is similar to writing an empty string for the column value.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!