Kafka Streams limiting off-heap memory

Submitted by 99封情书 on 2021-02-04 08:27:32

Question


We are running Kafka Streams applications and frequently run into off-heap memory issues. Our applications are deployed as Kubernetes pods and they keep restarting.

I did some investigation and found that we can limit the off-heap memory by implementing RocksDBConfigSetter, as shown in the following example.

import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

  // Example sizes (assumed values, consistent with the 50 MB block cache and
  // 3 x 16 MB memtables used in the calculation further down).
  private static final long TOTAL_OFF_HEAP_MEMORY = 50 * 1024 * 1024L;  // block cache capacity
  private static final long MEMTABLE_SIZE = 16 * 1024 * 1024L;          // size of a single memtable
  private static final int N_MEMTABLES = 3;                             // max memtables per store
  private static final long TOTAL_MEMTABLE_MEMORY = N_MEMTABLES * MEMTABLE_SIZE;
  private static final long BLOCK_SIZE = 4 * 1024L;                     // 4 KB, the RocksDB default
  private static final double INDEX_FILTER_BLOCK_RATIO = 0.1;           // fraction of the cache reserved for index/filter blocks

  // A single cache and write-buffer manager shared by every store instance,
  // so the configured limits apply to the application as a whole.
  private static org.rocksdb.Cache cache =
      new org.rocksdb.LRUCache(TOTAL_OFF_HEAP_MEMORY, -1, false, INDEX_FILTER_BLOCK_RATIO);
  private static org.rocksdb.WriteBufferManager writeBufferManager =
      new org.rocksdb.WriteBufferManager(TOTAL_MEMTABLE_MEMORY, cache);

  @Override
  public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {

    BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();

    // These three options in combination limit the memory used by RocksDB to the
    // size passed to the block cache (TOTAL_OFF_HEAP_MEMORY).
    tableConfig.setBlockCache(cache);
    tableConfig.setCacheIndexAndFilterBlocks(true);
    options.setWriteBufferManager(writeBufferManager);

    // Recommended when bounding the total memory: keep index and filter blocks in
    // the high-priority region of the cache and pin the top-level ones so they are
    // not evicted by data blocks.
    tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true);
    tableConfig.setPinTopLevelIndexAndFilter(true);
    // A larger block size reduces the size of the index.
    tableConfig.setBlockSize(BLOCK_SIZE);
    options.setMaxWriteBufferNumber(N_MEMTABLES);
    options.setWriteBufferSize(MEMTABLE_SIZE);

    options.setTableFormatConfig(tableConfig);
  }

  @Override
  public void close(final String storeName, final Options options) {
    // Do not close the Cache or WriteBufferManager here; the same objects are shared by every store instance.
  }
}
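For completeness, a minimal sketch of how such a config setter is registered with the Streams configuration (the application id and bootstrap servers below are placeholder values):

import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

// Register the bounded-memory RocksDB config setter with Kafka Streams.
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");        // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // placeholder
props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryRocksDBConfig.class);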

In our application, we have input topics with 6 partitions each, and there are about 40 topics from which we consume data. Our application has just 1 topology which consumes from these topics and stores the data in state stores (for dedup, lookups, and some verification). So, as per my understanding, the Kafka Streams application will create the following number of RocksDB instances and will need the following maximum off-heap memory. Please correct me if I am wrong.

Total RocksDB instances (assuming that each task creates its own RocksDB instance):

6 (partitions) * 40 (topics) -> 240 RocksDB instances

Maximum off-heap memory consumed:

 240 * (50 MB (block cache) + 3 * 16 MB (memtables) + filters (unknown))
 ≈ 240 * ~110 MB
 = 26,400 MB
 ≈ 25 GB
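The same estimate as a quick arithmetic sketch (using the assumed 50 MB block cache and 3 x 16 MB memtables per instance):

int instances = 6 * 40;                   // partitions * topics = 240
int perInstanceMb = 110;                  // ~50 (block cache) + 3 * 16 (memtables) + filters (unknown)
int totalMb = instances * perInstanceMb;  // 26,400 MB, roughly 25 GB
System.out.println(totalMb + " MB");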

That seems like a large number. I know that in practice we should not hit this maximum, but is the calculation correct?

Also, if we implement RocksDBConfigSetter and set the max off-heap memory to 4 GB, will the application complain (crash with OOM) if RocksDB asks for more memory (since it is expecting about 25 GB)?


Answer 1:


Not sure how many RocksDB instances you get. It depends on the structure of your program. You should check out TopologyDescription (via Topology#describe()). Sub-topologies are instantiated as tasks (based on the number of partitions), and each task will have its own RocksDB instance to maintain a shard of the overall state per store.
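A minimal sketch of inspecting that (assuming a StreamsBuilder-based application; builder stands in for however you construct your topology):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyDescription;

// Print the sub-topologies: each sub-topology is instantiated as one task per
// input partition, and each task maintains its own RocksDB instance per state store.
StreamsBuilder builder = new StreamsBuilder();
// ... define your topology on the builder ...
Topology topology = builder.build();
TopologyDescription description = topology.describe();
System.out.println(description);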

I would recommend checking out the Kafka Summit talk "Performance Tuning RocksDB for Kafka Streams' State Store": https://videos.confluent.io/watch/Ud6dtEC3DMYEtmK3dMK5ci

Also, if we implement RocksDBConfigSetter and set the max off-heap memory to 4 GB, will the application complain (crash with OOM) if RocksDB asks for more memory (since it is expecting about 25 GB)?

It won't crash. RocksDB will spill to disk. Being able to spill to disk is the reason why we use a persistent state store (and not an in-memory state store) by default. It allows you to hold state that is larger than main memory. As you use Kubernetes, you should attach corresponding volumes to your containers and size them accordingly (cf. https://docs.confluent.io/platform/current/streams/sizing.html). You might also want to watch the Kafka Summit talk "Deploying Kafka Streams Applications with Docker and Kubernetes": https://www.confluent.io/kafka-summit-sf18/deploying-kafka-streams-applications/

If state is larger than main memory, you might also want to monitor the RocksDB metrics if you run into performance issues, so you can tune the different "buffers" accordingly: https://docs.confluent.io/platform/current/streams/monitoring.html#rocksdb-metrics
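A minimal sketch of enabling and reading those metrics (assuming a standard Properties-based setup; the statistics-based RocksDB metrics are only recorded at the DEBUG metrics recording level, and the application id and bootstrap servers are placeholder values):

import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;

// Record RocksDB metrics (they require the DEBUG metrics recording level).
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");        // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // placeholder
props.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "DEBUG");

// The metrics are exposed via JMX and via KafkaStreams#metrics(),
// under the "stream-state-metrics" group.
KafkaStreams streams = new KafkaStreams(topology, props);  // topology as defined elsewhere
streams.metrics().forEach((name, metric) -> {
  if ("stream-state-metrics".equals(name.group())) {
    System.out.println(name.name() + " = " + metric.metricValue());
  }
});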



Source: https://stackoverflow.com/questions/65814205/kafka-streams-limiting-off-heap-memory
