Is there a way to delete all the data from a topic or delete the topic before every run?

陌清茗 asked on 2020-12-12 10:33

Can I modify the KafkaConfig.scala file to change the logRetentionHours property?

13 Answers
  • 2020-12-12 11:26

    When manually deleting a topic from a Kafka cluster, you might check this out: https://github.com/darrenfu/bigdata/issues/6. A vital step missed in most solutions is deleting /config/topics/<topic_name> in ZooKeeper.
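    If you go the manual route, the leftover znodes can be removed with the ZooKeeper CLI that ships with Kafka. A hedged sketch — topic name and ensemble address are placeholders, and `deleteall` requires ZooKeeper 3.5+ (older shells use `rmr`):

    ```shell
    # Placeholder topic "my_topic" and ensemble "localhost:2181".
    # `deleteall` needs ZooKeeper 3.5+; on older versions use `rmr` instead.
    bin/zookeeper-shell.sh localhost:2181 deleteall /brokers/topics/my_topic
    bin/zookeeper-shell.sh localhost:2181 deleteall /admin/delete_topics/my_topic
    bin/zookeeper-shell.sh localhost:2181 deleteall /config/topics/my_topic
    ```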

  • 2020-12-12 11:27

    I use the utility below to clean up after my integration-test runs.

    It uses the newer AdminZkClient API; the older API has been deprecated.

    import java.util.Properties
    import javax.inject.Inject
    import kafka.zk.{AdminZkClient, KafkaZkClient}
    import org.apache.kafka.common.utils.Time
    
    class ZookeeperUtils @Inject() (config: AppConfig) {
    
      val testTopic = "users_1"
    
      val zkHost = config.KafkaConfig.zkHost
      val sessionTimeoutMs = 10 * 1000
      val connectionTimeoutMs = 60 * 1000
      val isSecure = false
      val maxInFlightRequests = 10
      val time: Time = Time.SYSTEM
    
      def cleanupTopic(config: AppConfig): Unit = {
    
        val zkClient = KafkaZkClient.apply(zkHost, isSecure, sessionTimeoutMs, connectionTimeoutMs, maxInFlightRequests, time)
        val adminZkClient = new AdminZkClient(zkClient)
    
        // Temporarily shrink the retention windows so the log cleaner purges
        // the topic's segments almost immediately.
        val purgeProps = new Properties()
        purgeProps.setProperty("delete.retention.ms", "10")
        purgeProps.setProperty("file.delete.delay.ms", "1000")
        adminZkClient.changeTopicConfig(testTopic, purgeProps)
        // adminZkClient.deleteTopic(testTopic) // alternative: mark for deletion
    
        println("Waiting for topic to be purged, then resetting retention for the run")
        Thread.sleep(60000L)
    
        // Restore generous retention so records produced during the test survive.
        val resetProps = new Properties()
        resetProps.setProperty("delete.retention.ms", "3000000")
        resetProps.setProperty("file.delete.delay.ms", "4000000")
        adminZkClient.changeTopicConfig(testTopic, resetProps)
    
      }
    
    }
    
    

    There is a delete-topic option, but it only marks the topic for deletion; ZooKeeper deletes the topic later. Since this can take unpredictably long, I prefer the retention.ms approach.
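    On newer clusters you can skip ZooKeeper entirely and purge through the broker-side Admin API. A sketch under the assumption of a Kafka 1.1+ broker at a placeholder address; as far as I know the -1L sentinel means "truncate up to the high watermark", but verify against your client version:

    ```scala
    import java.util.Properties
    import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, RecordsToDelete}
    import org.apache.kafka.common.TopicPartition

    object TopicPurge {
      // Broker address, topic name, and partition count are placeholders.
      def purge(bootstrap: String, topic: String, partitions: Int): Unit = {
        val props = new Properties()
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap)
        val admin = AdminClient.create(props)
        try {
          val target = new java.util.HashMap[TopicPartition, RecordsToDelete]()
          (0 until partitions).foreach { p =>
            // -1L: protocol sentinel for "delete everything below the high watermark"
            target.put(new TopicPartition(topic, p), RecordsToDelete.beforeOffset(-1L))
          }
          admin.deleteRecords(target).all().get()
        } finally admin.close()
      }
    }
    ```

    Unlike the retention trick, this takes effect immediately and needs no sleep-and-restore cycle.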

  • 2020-12-12 11:29

    I don't think this is supported yet. Take a look at this JIRA issue, "Add delete topic support".

    To delete manually:

    1. Shutdown the cluster
    2. Clean the Kafka log dir (specified by the log.dir attribute in the Kafka config file) as well as the ZooKeeper data
    3. Restart the cluster

    For any given topic, what you can do is:

    1. Stop Kafka
    2. Clean the Kafka log specific to the partition. Kafka stores its log files in the format "logDir/topic-partition", so for a topic named "MyTopic" the log for partition 0 will be stored in /tmp/kafka-logs/MyTopic-0, where /tmp/kafka-logs is the directory specified by the log.dir attribute
    3. Restart Kafka
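    The "logDir/topic-partition" convention in step 2 can be captured in a tiny helper; this is just illustrative path arithmetic, not a Kafka API:

    ```scala
    // Mirrors Kafka's on-disk layout: each partition of a topic lives in a
    // directory named <topic>-<partition> under log.dir.
    def partitionDir(logDir: String, topic: String, partition: Int): String =
      s"$logDir/$topic-$partition"

    // For topic "MyTopic", partition 0, under the default /tmp/kafka-logs:
    println(partitionDir("/tmp/kafka-logs", "MyTopic", 0)) // /tmp/kafka-logs/MyTopic-0
    ```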

    This is NOT a good or recommended approach, but it should work. In the Kafka broker config file, the log.retention.hours.per.topic attribute defines the number of hours to keep a log file before deleting it for a specific topic.

    Also, is there a way for messages to get deleted as soon as the consumer reads them?

    From the Kafka documentation:

    The Kafka cluster retains all published messages—whether or not they have been consumed—for a configurable period of time. For example, if the log retention is set to two days, then for the two days after a message is published it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size, so retaining lots of data is not a problem.

    In fact, the only metadata retained on a per-consumer basis is the position of the consumer in the log, called the "offset". This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads messages, but in fact the position is controlled by the consumer and it can consume messages in any order it likes. For example, a consumer can reset to an older offset to reprocess.

    On finding the start offset to read from, the Kafka 0.8 Simple Consumer example says:

    Kafka includes two constants to help: kafka.api.OffsetRequest.EarliestTime() finds the beginning of the data in the logs and starts streaming from there, while kafka.api.OffsetRequest.LatestTime() will only stream new messages.

    You can also find the example code there for managing the offset at your consumer end.

    public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
                                     long whichTime, String clientName) {
        TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
        Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo = new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
        requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
        kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
        OffsetResponse response = consumer.getOffsetsBefore(request);
    
        if (response.hasError()) {
            System.out.println("Error fetching offset data from the broker. Reason: " + response.errorCode(topic, partition));
            return 0;
        }
        long[] offsets = response.offsets(topic, partition);
        return offsets[0];
    }
    
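    The whichTime argument above takes one of two sentinel timestamps. To my knowledge they are encoded as -2 and -1 on the wire; the values below are an assumption worth verifying against your client version:

    ```scala
    // Assumed sentinel "timestamps" behind the 0.8 offset API:
    // kafka.api.OffsetRequest.EarliestTime() -> start of the log
    // kafka.api.OffsetRequest.LatestTime()   -> only new messages
    val EarliestTime: Long = -2L
    val LatestTime: Long = -1L

    // Pass EarliestTime to getLastOffset to replay a topic from the beginning,
    // or LatestTime to start from the tail.
    println(s"earliest=$EarliestTime latest=$LatestTime")
    ```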
    
  • 2020-12-12 11:30

    All data about topics and their partitions is stored under /tmp/kafka-logs/. Moreover, it is stored in the format topic-partitionNumber, so if you want to delete a topic newTopic, you can:

    • stop Kafka
    • delete the files: rm -rf /tmp/kafka-logs/newTopic-*
  • 2020-12-12 11:33

    Tested with Kafka 0.10:

    1. Stop ZooKeeper and the Kafka server.
    2. Go to the 'kafka-logs' folder; there you will see a list of Kafka topic folders. Delete the folder with the topic name.
    3. Go to the 'zookeeper-data' folder and delete the data inside it.
    4. Start ZooKeeper and the Kafka server again.
    

    Note: if you delete the topic folder(s) inside kafka-logs but not the data in the zookeeper-data folder, the topics will still appear to exist.
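    The four steps above, sketched as shell commands under the assumption of a default standalone install (the paths come from server.properties / zookeeper.properties and will differ per setup):

    ```shell
    # Assumed default paths; check log.dirs in server.properties and
    # dataDir in zookeeper.properties before deleting anything.
    bin/kafka-server-stop.sh
    bin/zookeeper-server-stop.sh
    rm -rf /tmp/kafka-logs/MyTopic-*   # topic folders under the Kafka log dir
    rm -rf /tmp/zookeeper/*            # ZooKeeper data dir
    bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
    bin/kafka-server-start.sh -daemon config/server.properties
    ```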

  • 2020-12-12 11:36

    As a dirty workaround, you can adjust per-topic retention settings at runtime, e.g. bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my_topic --config retention.bytes=1 (retention.bytes=0 might also work).

    After a short while Kafka should free the space. I'm not sure if this has any implications compared to re-creating the topic.

    P.S. It's better to bring the retention settings back once Kafka is done with the cleaning.

    You can also use retention.ms to control how long historical data is kept.
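    The full cycle above might look like the following sketch; topic name, ZooKeeper address, and the 60-second wait are placeholder assumptions, and kafka-configs.sh is the dedicated tool for altering topic configs:

    ```shell
    # Shrink retention so the log cleaner drops old segments quickly.
    kafka-configs.sh --zookeeper localhost:2181 --alter \
      --entity-type topics --entity-name my_topic --add-config retention.ms=1000

    sleep 60   # give the log cleaner time to run (tune for your cluster)

    # Restore the topic to its default (broker-level) retention.
    kafka-configs.sh --zookeeper localhost:2181 --alter \
      --entity-type topics --entity-name my_topic --delete-config retention.ms
    ```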
