Hands-on: syncing PostgreSQL data to Kafka with Debezium (plugins installed via Docker)

Submitted by 女生的网名这么多〃 on 2020-08-13 06:38:56

There is currently not much material on Debezium in Chinese, and most of it is fairly rough; I personally hit quite a few dead ends while working through this, so I am recording the process here in detail for reference.

The project requirement is to sync data from PostgreSQL to Elasticsearch. I did some research first ( https://my.oschina.net/u/3734816/blog/3210243 ) and settled on Debezium. This post covers PostgreSQL → Kafka; Kafka → Elasticsearch will be covered in the next post.

Installing zookeeper, kafka, postgresql, and connect via Docker

Install zookeeper
docker run -it --rm --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 debezium/zookeeper:latest

Install kafka
docker run -it --rm --name kafka -p 9092:9092 --link zookeeper:zookeeper debezium/kafka:latest

Install postgresql
docker run -itd --rm --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=xxx debezium/example-postgres:latest
(The container is named postgres so that the connect container's --link postgres:postgres below can resolve it.)

Install connect
docker volume create connect
docker run -itd --rm --name connect --volume connect:/kafka/connect -p 8093:8083 -e GROUP_ID=1 -e CONFIG_STORAGE_TOPIC=my_connect_configs -e OFFSET_STORAGE_TOPIC=my_connect_offsets -e STATUS_STORAGE_TOPIC=my_connect_statuses --link zookeeper:zookeeper --link kafka:kafka --link postgres:postgres debezium/connect:latest
Explanation: docker volume create connect creates a volume that is mounted over the connect container's /kafka/connect directory (the connector plugin directory), which makes it easy to drop in the ES connector plugin later.
     -p 8093:8083: connect's default port is 8083; since 8083 was already taken on my server, I remapped it to 8093. The Connect official documentation provides a REST API on this port for managing connectors, plugins, and tasks.
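The four docker run commands above can equivalently be kept in a single compose file. This is only a sketch assuming the same images, links, environment variables, and the 8093→8083 port mapping used in this setup:

```yaml
version: '2'
services:
  zookeeper:
    image: debezium/zookeeper:latest
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
  kafka:
    image: debezium/kafka:latest
    ports:
      - "9092:9092"
    links:
      - zookeeper
  postgres:
    image: debezium/example-postgres:latest
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_PASSWORD=xxx
  connect:
    image: debezium/connect:latest
    ports:
      - "8093:8083"
    links:
      - zookeeper
      - kafka
      - postgres
    environment:
      - GROUP_ID=1
      - CONFIG_STORAGE_TOPIC=my_connect_configs
      - OFFSET_STORAGE_TOPIC=my_connect_offsets
      - STATUS_STORAGE_TOPIC=my_connect_statuses
    volumes:
      - connect:/kafka/connect
volumes:
  connect:
```

With this in place, `docker-compose up -d` brings the whole stack up in one step.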

Once everything is up, test with Postman: GET ip:8093. A successful install returns something like:

{
    "version": "2.3.1",
    "commit": "18a913733fb71c01",
    "kafka_cluster_id": "sgc2CHb1TcKugWd5R6zcXA"
}

Create the PostgreSQL connector

POST ip:8093/connectors
{
    "name": "test-connector1",      
    "config": {
        "name": "test-connector1",    
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "tasks.max": "1",
        "database.hostname": "39.106.xxx.xx",   
        "database.port": "5432",
        "database.dbname": "xdeasdb",
        "database.user": "postgres",
        "database.password": "postgres",
        "database.server.name": "know",   // custom logical server name
        "table.whitelist": "knowledge.formal_new",   // table whitelist: schema + table to sync; together with "know" above this produces the topic "know.knowledge.formal_new"
        "plugin.name": "pgoutput"  // logical decoding output plugin built into PostgreSQL 10+
    }
}
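The configuration above can also be registered with curl instead of Postman. Note that the inline // comments must be stripped first, since strict JSON does not allow comments. A sketch, assuming the host/port mapping from this setup (replace localhost with your server's IP as needed):

```shell
# Write the connector config (comment-free JSON) to a temp file.
cat > /tmp/test-connector1.json <<'EOF'
{
    "name": "test-connector1",
    "config": {
        "name": "test-connector1",
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "tasks.max": "1",
        "database.hostname": "39.106.xxx.xx",
        "database.port": "5432",
        "database.dbname": "xdeasdb",
        "database.user": "postgres",
        "database.password": "postgres",
        "database.server.name": "know",
        "table.whitelist": "knowledge.formal_new",
        "plugin.name": "pgoutput"
    }
}
EOF
# Sanity-check that the file is valid JSON before posting.
python3 -m json.tool < /tmp/test-connector1.json > /dev/null && echo "JSON OK"
# Register the connector (assumes the connect container is up on port 8093).
curl -s -X POST -H "Content-Type: application/json" \
  --data @/tmp/test-connector1.json http://localhost:8093/connectors \
  || echo "Connect not reachable"
```

The same REST API also exposes per-connector state at GET /connectors/test-connector1/status, which is handy when the connector registers but no data shows up.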
Verify by listing all connectors: GET ip:8093/connectors/
[
    "test-connector1"
]

Subscribe to the topic to verify the sync

Enter the kafka container: docker exec -it <kafka container id> /bin/bash
Change into the bin directory: cd bin
Consume the topic to the console: sh ./kafka-console-consumer.sh --bootstrap-server ip:9092 --topic know.knowledge.formal_new --from-beginning
Then update or insert a row in the table.
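The "update a row" step can be scripted as well. A hypothetical UPDATE against the whitelisted table (container name, database, and credentials are taken from the setup above; adjust to your environment):

```shell
# A sample UPDATE against the whitelisted table to produce a CDC event.
cat > /tmp/trigger_change.sql <<'EOF'
UPDATE knowledge.formal_new SET title = 'test' WHERE id = '1';
EOF
# Execute it inside the postgres container (assumes the stack above is running).
docker exec -i postgres psql -U postgres -d xdeasdb < /tmp/trigger_change.sql \
  || echo "postgres container not running"
```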

The consumer prints something like:
{"payload":{"before":null,
"after":{"id":"1","collect_id":"1","title":"test","content":"1","publish_date":1591025759000000,"collect_date":1591025761000000,"status":1,"create_date":1591025764000000,"creater":"1","update_date":1591025769000000,"updater":"1","link":"1","label":["1"],"origin":"4"},
"source":{"version":"1.1.1.Final","connector":"postgresql","name":"know","ts_ms":1591006642405,"snapshot":"false","db":"xdeasdb","schema":"knowledge","table":"knowledge_formal_new","txId":1604,"lsn":29368760,"xmin":null},
"op":"u","ts_ms":1591006642869,"transaction":null}}

The "after" field contains the latest row data; when we sync to ES later, this is the only field we will need.
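Downstream (for example, just before writing to ES), the "after" object can be pulled out of the change event. A minimal sketch using python3 on a trimmed sample payload (fields abbreviated from the output above):

```shell
# Trimmed sample change event.
EVENT='{"payload":{"before":null,"after":{"id":"1","title":"test"},"op":"u"}}'
# Print just the "after" object.
echo "$EVENT" | python3 -c 'import json,sys; print(json.dumps(json.load(sys.stdin)["payload"]["after"]))'
# → {"id": "1", "title": "test"}
```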
