Kafka: Introduction and Usage


What is Kafka?

A distributed streaming platform, similar to a message queue.

A streaming platform can:
1. Publish and subscribe to streams of data
2. Store streams of data (useful for staging important data)
3. Process streams of data


Kafka is commonly used as a buffer for high-speed output, owing to its outrageous performance. It can also stage important data, because data is not deleted immediately after it is consumed (read) from Kafka, and replication can be configured for it.

Kafka's APIs

Producer: for producing data into Kafka; a single producer can write to multiple topics

Consumer: for consuming data from Kafka; a single consumer can subscribe to multiple topics

Streams: for simple stream processing; a streams app both consumes and produces, across multiple topics (a sketch follows this list)

Connector: for building and running reusable producers and consumers that connect topics to existing systems

Multiple languages: mainstream languages such as Java all have corresponding APIs
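
A minimal sketch of the Streams API (assuming a broker on localhost:9092; the application id and topic names are made up): it consumes one topic, upper-cases each value, and writes the result to another topic.

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "demo-stream");        // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Consume from one topic, transform each value, produce to another topic
        KStream<String, String> source = builder.stream("input-topic");       // hypothetical topic names
        source.mapValues(v -> v.toUpperCase()).to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}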

Topic: a category for records; every record is published to some topic

Record: {key, value, timestamp}

You can think of Kafka as an immutable queue (of records) that retains a window of historical data, e.g. two days.
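
The retention window is broker configuration; for example, in config/server.properties:

# keep roughly two days of history
log.retention.hours=48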

The only metadata retained per consumer is its offset, which the consumer controls freely; in other words, the offset need not simply increase. If the need arises, a consumer can go back and read data from two days ago, or jump straight to the newest data.
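
A minimal sketch of that offset control with the Java consumer (the topic name and offset are illustrative; consumer is a configured KafkaConsumer like the one in the demo further below):

import java.util.Collections;
import org.apache.kafka.common.TopicPartition;

TopicPartition tp = new TopicPartition("test", 0);
consumer.assign(Collections.singletonList(tp));           // take manual control of one partition
consumer.seekToBeginning(Collections.singletonList(tp));  // rewind to the oldest retained record
consumer.seek(tp, 42L);                                   // or jump to an arbitrary offset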

Partition: a topic is divided into multiple partitions, which can live on different machines; records are assigned to different partitions (you can write a partition function over the key, as sketched below), spreading load off any single node.
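
As a sketch of that pluggable partition function: the Java producer accepts a custom org.apache.kafka.clients.producer.Partitioner through the partitioner.class config (the class name and hash scheme here are made up):

import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Illustrative partitioner: route each record by hashing its key
public class KeyHashPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // Mask the sign bit so the index is never negative
        return key == null ? 0 : (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}

Register it on the producer with props.put("partitioner.class", KeyHashPartitioner.class.getName()).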

Replica: each partition has multiple replicas, which can live on different machines, made up of one leader and zero or more followers. Only the leader serves reads and writes; followers only replicate writes. When the leader dies, a new leader is elected from among the followers.

Guarantee: within each partition, records are stored in the order they arrived.

Other concepts

Consumer group: the partitions of a topic are dynamically and evenly distributed across all consumers in a consumer group. A topic is broadcast to every consumer group (subscriber), while within a group the topic is split up, each consumer receiving a subset of the partitions. If a group contains more consumers than the topic has partitions, the surplus consumers receive no messages. For example, a 4-partition topic read by a 2-consumer group gives each consumer 2 partitions; with 5 consumers in the group, one sits idle.

Broker: a machine/server in a Kafka cluster is called a broker; it is a physical concept.

ISR (in-sync replicas): the replicas that are alive and in sync. If a topic's partition has three replicas, on machines 0, 1, and 2, then its ISR is {0, 1, 2}.

Shrink: if a broker goes down, the ISR shrinks; once the broker comes back up, the ISR expands.

HW (high watermark): the smallest value among the LEOs (log end offsets) of the logs of the partition's ISR. For example, if the ISR's three replicas have LEOs of 7, 5, and 6, the HW is 5.

Commit: after reading, the consumer updates its offset for the current partition; there are several ways to handle commits (one is sketched below).
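
A minimal sketch of one approach, manual synchronous commits (auto-commit off; props and topic are set up as in the demo below, and process() stands in for hypothetical application logic):

props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
KafkaConsumer<Integer, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList(topic));
while (true) {
    ConsumerRecords<Integer, String> records = consumer.poll(1000);
    for (ConsumerRecord<Integer, String> record : records) {
        process(record);  // hypothetical application logic
    }
    consumer.commitSync();  // commit only after the whole batch is processed
}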

Kafka performance:

Producer throughput (single producer thread, no replication): 821,557 records/sec (78.3 MB/sec)

Consumer throughput (single consumer): 940,521 records/sec (89.7 MB/sec)

And that is just a single producer and a single consumer, each on a single thread; with multiple threads, or multiple producers or consumers, the numbers get even more staggering.

For details, see:

https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines


Installation, plus a simple command-line producer and consumer

(from http://kafka.apache.org/quickstart)

Download and extract:

$tar -xzf kafka_2.11-1.1.0.tgz
$cd kafka_2.11-1.1.0

Start the single-node ZooKeeper that ships with Kafka

$bin/zookeeper-server-start.sh config/zookeeper.properties

Start Kafka

$bin/kafka-server-start.sh config/server.properties

Create a topic

$bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

Describe the topic

$bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
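
The output lists each partition's leader, replica set, and ISR, along these lines:

Topic:test	PartitionCount:1	ReplicationFactor:1	Configs:
	Topic: test	Partition: 0	Leader: 0	Replicas: 0	Isr: 0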

Create a producer and send data (command line)

$bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
>This is a message
>This is another message

Create a consumer and receive data (command line)

$bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test


Read historical data

Add --from-beginning to the previous command, like so:
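
$bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning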

Check a partition's latest offset (the producer-side offset)

$bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test

List the group IDs of existing consumer groups
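
$bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list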

List the topics that currently exist

$bin/kafka-topics.sh --list --zookeeper localhost:2181



Kafka demo implementation (Java)

producer

package com.cloudwiz.kafkatest.example;

import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class Producer extends Thread {
    private final int producerNo;
    private final String topic;
    private final KafkaProducer<Integer, String> producer;

    public Producer(String topic, int producerNo) {
        this.topic = topic;
        this.producerNo = producerNo;

        Properties props = new Properties();
        props.put("bootstrap.servers", KafkaProperties.KAFKA_SERVER_URL + ":" + KafkaProperties.KAFKA_SERVER_PORT);
        props.put("client.id", "Producer_" + producerNo);
        props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
    }

    public void run() {
        int messageNo = 0;
        while (true) {
            String messageStr = "Message_" + messageNo + "_from_producer_" + producerNo;
            try {
                // Synchronous send: get() blocks until the broker acknowledges the record
                RecordMetadata recordMetadata = producer.send(new ProducerRecord<>(topic, messageNo, messageStr)).get();
                System.out.println("Sent message: " + messageStr + ", partition_" + recordMetadata.partition()
                        + ", offset " + recordMetadata.offset());
                // Asynchronous send: the callback fires when the send completes
//                producer.send(new ProducerRecord<>(topic, messageNo, messageStr), new ProducerCallback());
//                System.out.println("Sent message: " + messageStr);
                // Fire and forget
//                producer.send(new ProducerRecord<>(topic, messageNo, messageStr));
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            }
            ++messageNo;
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    /**
     * @param args [0]: topic  [1]: number of producers to start
     */
    public static void main(String[] args) {
        int numProducer = Integer.parseInt(args[1]);
        for (int i = 0; i < numProducer; i++) {
            Producer producer = new Producer(args[0], i);
            producer.start();
        }
    }
}

class ProducerCallback implements Callback {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception e) {
        // A non-null exception means the asynchronous send failed
        if (e != null) e.printStackTrace();
    }
}

consumer

package com.cloudwiz.kafkatest.example;

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class Consumer extends Thread {
    private final String topic;
    private final KafkaConsumer<Integer, String> consumer;
    private final int consumerNo;
    private final String groupId;

    Consumer(String topic, String groupId, int consumerNo) {
        this.topic = topic;
        this.groupId = groupId;
        this.consumerNo = consumerNo;

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaProperties.KAFKA_SERVER_URL + ":" + KafkaProperties.KAFKA_SERVER_PORT);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, this.groupId);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.IntegerDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<>(props);
    }

    @Override
    public void run() {
        // Subscribe once, before the poll loop; the group coordinator assigns partitions
        consumer.subscribe(Collections.singletonList(topic));
        while (true) {
            ConsumerRecords<Integer, String> records = consumer.poll(1000);
            for (ConsumerRecord<Integer, String> record : records) {
                System.out.println("Consumer No." + this.consumerNo + " in group " + groupId + " received a record, "
                        + "key = " + record.key() + ", value = " + record.value()
                        + ", partition = " + record.partition() + ", offset = " + record.offset());
            }
        }
    }

    /**
     * @param args [0]: topic  [1]: number of consumers to start  [2]: consumer group id
     */
    public static void main(String[] args) {
        int numConsumer = Integer.parseInt(args[1]);
        for (int i = 0; i < numConsumer; i++) {
            Thread consumer = new Consumer(args[0], args[2], i);
            consumer.start();
        }
    }
}

KafkaProperties

package com.cloudwiz.kafkatest.example;

public class KafkaProperties {
    public static final String KAFKA_SERVER_URL = "192.168.235.138";
    public static final String KAFKA_SERVER_PORT = "9092";
}

Creating new topics via the AdminClient API

You can specify the number of partitions and the number of replicas.

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.CreateTopicsResult;
import org.apache.kafka.clients.admin.NewTopic;

// If the topic already exists, it will not be overridden
private void createNewTopicInKafka(String token) {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, ServerProperties.KAFKA_SERVER_IP + ":" + ServerProperties.KAFKA_SERVER_PORT);
    props.put(AdminClientConfig.CLIENT_ID_CONFIG, "admin");
    AdminClient admin = AdminClient.create(props);
    try {
        // Skip creation if the topic is already present
        if (admin.listTopics().names().get().contains(token)) return;
        CreateTopicsResult res = admin.createTopics(Collections.singletonList(
                new NewTopic(token, ServerProperties.KAFKA_NUM_PARTITIONS, ServerProperties.KAFKA_NUM_RELICAS)));
        res.all().get();  // block until the topic has been created
    } catch (InterruptedException | ExecutionException e) {
        e.printStackTrace();
    } finally {
        admin.close();
    }
}
