Tombstoned cells without DELETE

China☆狼群 submitted on 2020-06-29 13:10:16

Question


I'm running a Cassandra cluster:

Software version: 2.0.9
Nodes: 3
Replication factor: 2

I have a very simple table into which I insert and update data.

CREATE TABLE link_list (
      url text,
      visited boolean,
      PRIMARY KEY ((url))
    );

No TTL is set on the rows and I'm not issuing any DELETEs. As soon as I run my application it quickly slows down because of the growing number of tombstoned cells:

Read 3 live and 535 tombstoned cells

The count climbs into the thousands within a few minutes.

My question is what is responsible for generating those cells if I'm not doing any deletions?

// Update

This is the implementation I'm using to talk to Cassandra with com.datastax.driver.

public class LinkListDAOCassandra implements DAO {


    public void save(Link link) {
        save(new VisitedLink(link.getUrl(), false));
    }

    @Override
    public void save(Model model) {
        save((Link) model);
    }

    public void update(VisitedLink link) {
        String cql = "UPDATE link_list SET visited = ? WHERE url = ?";
        Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getVisited(), link.getUrl());
    }

    public void save(VisitedLink link) {
        String cql = "SELECT url FROM link_list_inserted WHERE url = ?";

        if(Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getUrl()).all().size() == 0) {
            cql = "INSERT INTO link_list_inserted (url) VALUES (?)";
            Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getUrl());

            cql = "INSERT INTO link_list (url, visited) VALUES (?,?)";
            Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getUrl(), link.getVisited());
        }
    }

    public VisitedLink getByUrl(String url) {
        String cql = "SELECT * FROM link_list WHERE url = ?";

        for(Row row : Cassandra.DB.execute(cql, url)) {
            return new VisitedLink(row.getString("url"), row.getBool("visited"));
        }

        return null;
    }

    public List<Link> getLinks(int limit) {
        List<Link> links = new ArrayList<>();

        // Filtering on visited relies on the secondary index on that column.
        String cql = "SELECT * FROM link_list WHERE visited = False LIMIT ?";

        for(Row row : Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, limit)) {
            try {
                links.add(new Link(new URL(row.getString("url"))));
            }
            catch(MalformedURLException e) {
                // Skip rows whose url value is not a valid URL.
            }
        }

        return links;
    }
}

This is the execute implementation

public ResultSet execute(String cql, ConsistencyLevel cl, Object... values) {
    PreparedStatement statement = getSession().prepare( cql ).setConsistencyLevel(cl);
    BoundStatement boundStatement = new BoundStatement( statement );
    boundStatement.bind(values);

    return session.execute(boundStatement);
}

// Update 2

An interesting finding: cfstats shows that only one table has tombstones, the index table link_list_visited. Does this mean that updating a column that has a secondary index creates tombstones?

Table (index): link_list.link_list_visited
                SSTable count: 2
                Space used (live), bytes: 5055920
                Space used (total), bytes: 5055991
                SSTable Compression Ratio: 0.3491883995187955
                Number of keys (estimate): 256
                Memtable cell count: 15799
                Memtable data size, bytes: 1771427
                Memtable switch count: 1
                Local read count: 85703
                Local read latency: 2.805 ms
                Local write count: 484690
                Local write latency: 0.028 ms
                Pending tasks: 0
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.00000
                Bloom filter space used, bytes: 32
                Compacted partition minimum bytes: 8240
                Compacted partition maximum bytes: 7007506
                Compacted partition mean bytes: 3703162
                Average live cells per slice (last five minutes): 3.0
                Average tombstones per slice (last five minutes): 674.0

Answer 1:


A secondary index differs from an extra column family that you maintain by hand as an index in only two major ways: the secondary index is local, so each node indexes only its own data and holds nothing about the data on other nodes, and the index updates triggered by a write to the primary table are applied atomically with that write. Otherwise you can treat it as a regular column family with the same weak spots.

A high number of updates on the primary column family leads to a high number of deletes on the index table, because each update to the indexed column is translated into a delete of the old index entry plus an insert of the new one. Those deletions in the index table are the source of the tombstones. Cassandra deletes are logical: a tombstone stays on disk, and is still scanned by reads, until gc_grace_seconds has elapsed and a compaction purges it.
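To make the delete/insert translation concrete, here is a minimal in-memory sketch. It is a toy model and not Cassandra code: the class IndexTombstoneModel and its methods are hypothetical names invented for illustration, and it assumes the index is laid out as one partition per indexed value with one cell per url, where a delete leaves a tombstone marker behind. Compaction and gc_grace_seconds are deliberately left out.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model (hypothetical, not Cassandra code) of a table with a
 * secondary index on a boolean column.
 */
final class IndexTombstoneModel {
    // Base table: url -> visited.
    private final Map<String, Boolean> baseTable = new HashMap<>();

    // Index table: indexed value -> (url -> cell state).
    // A cell state of true is a live cell, false is a tombstone.
    private final Map<Boolean, Map<String, Boolean>> index = new HashMap<>();

    /** Upsert a row, as both INSERT and UPDATE do in CQL. */
    void upsert(String url, boolean visited) {
        Boolean previous = baseTable.put(url, visited);
        if (previous != null && previous != visited) {
            // The indexed value changed: the old index entry is deleted,
            // which in this model leaves a tombstone in the old partition.
            partition(previous).put(url, false);
        }
        // The new index entry is inserted as a live cell.
        partition(visited).put(url, true);
    }

    private Map<String, Boolean> partition(boolean value) {
        return index.computeIfAbsent(value, k -> new LinkedHashMap<>());
    }

    /** Returns {live, tombstoned} cell counts scanned in one index partition. */
    int[] scan(boolean value) {
        int live = 0;
        int tombstoned = 0;
        for (boolean isLive : partition(value).values()) {
            if (isLive) {
                live++;
            } else {
                tombstoned++;
            }
        }
        return new int[] {live, tombstoned};
    }

    public static void main(String[] args) {
        IndexTombstoneModel model = new IndexTombstoneModel();
        // Save 538 links as unvisited, then mark 535 of them as visited.
        for (int i = 0; i < 538; i++) {
            model.upsert("http://example.com/" + i, false);
        }
        for (int i = 0; i < 535; i++) {
            model.upsert("http://example.com/" + i, true);
        }
        int[] counts = model.scan(false);
        System.out.println("Read " + counts[0] + " live and "
                + counts[1] + " tombstoned cells");
    }
}
```

In this model, a query for visited = False has to walk the whole "false" partition, so every link that was flipped to visited adds one more tombstone to skip even though no DELETE was ever issued by the application. That is the same pattern as the question's crawler, which keeps updating visited and then reads the unvisited links back.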

Hope it helps!



Source: https://stackoverflow.com/questions/25443979/tombstoned-cells-without-delete