Failover for PostgreSQL with etcd and Patroni

os: centos 7.4
etcd:3.2

Master/slave IP addresses
192.168.56.101 node1 master
192.168.56.102 node2 slave
192.168.56.103 node3 slave

Download and install with yum
# yum install etcd

# yum list installed |grep -i etcd
etcd.x86_64 3.2.18-1.el7 @extras

Installing with yum is recommended: it is simple and direct.

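etcd configuration
Some file paths differ between a wget-based install and a yum-based install.
With the wget method, all files are under /usr/etcd-v3.2.18.
The configuration below uses the yum install as the example.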

# cp /etc/etcd/etcd.conf /etc/etcd/etcd.conf.bak
# vi /etc/etcd/etcd.conf

# cat etcd.conf
#[Member]
#ETCD_CORS=""
ETCD_DATA_DIR="/var/lib/etcd/node1.etcd"
#ETCD_WAL_DIR=""
ETCD_LISTEN_PEER_URLS="http://192.168.56.101:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.56.101:2379,http://127.0.0.1:2379"
#ETCD_MAX_SNAPSHOTS="5"
#ETCD_MAX_WALS="5"
ETCD_NAME="node1"
#ETCD_SNAPSHOT_COUNT="100000"
#ETCD_HEARTBEAT_INTERVAL="100"
#ETCD_ELECTION_TIMEOUT="1000"
#ETCD_QUOTA_BACKEND_BYTES="0"
#ETCD_MAX_REQUEST_BYTES="1572864"
#ETCD_GRPC_KEEPALIVE_MIN_TIME="5s"
#ETCD_GRPC_KEEPALIVE_INTERVAL="2h0m0s"
#ETCD_GRPC_KEEPALIVE_TIMEOUT="20s"
#
#[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.56.101:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.56.101:2379"
#ETCD_DISCOVERY=""
#ETCD_DISCOVERY_FALLBACK="proxy"
#ETCD_DISCOVERY_PROXY=""
#ETCD_DISCOVERY_SRV=""
ETCD_INITIAL_CLUSTER="node1=http://192.168.56.101:2380,node2=http://192.168.56.102:2380,node3=http://192.168.56.103:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
#ETCD_STRICT_RECONFIG_CHECK="true"
#ETCD_ENABLE_V2="true"
#
#[Proxy]
#ETCD_PROXY="off"
#ETCD_PROXY_FAILURE_WAIT="5000"
#ETCD_PROXY_REFRESH_INTERVAL="30000"
#ETCD_PROXY_DIAL_TIMEOUT="1000"
#ETCD_PROXY_WRITE_TIMEOUT="5000"
#ETCD_PROXY_READ_TIMEOUT="0"
#
#[Security]
#ETCD_CERT_FILE=""
#ETCD_KEY_FILE=""
#ETCD_CLIENT_CERT_AUTH="false"
#ETCD_TRUSTED_CA_FILE=""
#ETCD_AUTO_TLS="false"
#ETCD_PEER_CERT_FILE=""
#ETCD_PEER_KEY_FILE=""
#ETCD_PEER_CLIENT_CERT_AUTH="false"
#ETCD_PEER_TRUSTED_CA_FILE=""
#ETCD_PEER_AUTO_TLS="false"
#
#[Logging]
#ETCD_DEBUG="false"
#ETCD_LOG_PACKAGE_LEVELS=""
#ETCD_LOG_OUTPUT="default"
#
#[Unsafe]
#ETCD_FORCE_NEW_CLUSTER="false"
#
#[Version]
#ETCD_VERSION="false"
#ETCD_AUTO_COMPACTION_RETENTION="0"
#
#[Profiling]
#ETCD_ENABLE_PPROF="false"
#ETCD_METRICS="basic"
#
#[Auth]
#ETCD_AUTH_TOKEN="simple"

# egrep ^[A-Z] ./etcd.conf
ETCD_DATA_DIR="/var/lib/etcd/node1.etcd"
ETCD_LISTEN_PEER_URLS="http://192.168.56.101:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.56.101:2379,http://127.0.0.1:2379"
ETCD_NAME="node1"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.56.101:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.56.101:2379"
ETCD_INITIAL_CLUSTER="node1=http://192.168.56.101:2380,node2=http://192.168.56.102:2380,node3=http://192.168.56.103:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"

Modify etcd.service
# systemctl status etcd.service
● etcd.service - Etcd Server
Loaded: loaded (/usr/lib/systemd/system/etcd.service; disabled; vendor preset: disabled)
Active: inactive (dead)

# vi /usr/lib/systemd/system/etcd.service
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/etcd/etcd.conf
User=etcd
# set GOMAXPROCS to number of processors
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd \
--name=\"${ETCD_NAME}\" \
--data-dir=\"${ETCD_DATA_DIR}\" \
--listen-peer-urls=\"${ETCD_LISTEN_PEER_URLS}\" \
--listen-client-urls=\"${ETCD_LISTEN_CLIENT_URLS}\" \
--initial-advertise-peer-urls=\"${ETCD_INITIAL_ADVERTISE_PEER_URLS}\" \
--advertise-client-urls=\"${ETCD_ADVERTISE_CLIENT_URLS}\" \
--initial-cluster=\"${ETCD_INITIAL_CLUSTER}\" \
--initial-cluster-token=\"${ETCD_INITIAL_CLUSTER_TOKEN}\" \
--initial-cluster-state=\"${ETCD_INITIAL_CLUSTER_STATE}\""
Restart=on-failure
LimitNOFILE=65536
Perform the steps above on node1, node2, and node3, changing the IP addresses (and the node name) to match each node.
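
Since only the node name and IP address change between the three nodes, the per-node settings can be generated rather than edited by hand. The following is a minimal Python sketch, not part of the original setup: the helper names `etcd_settings` and `render` are made up for illustration, and the values simply mirror the uncommented lines of the etcd.conf shown above.

```python
# Illustrative helper (hypothetical names): build the uncommented
# etcd.conf settings from this post for any of the three nodes.
NODES = {
    "node1": "192.168.56.101",
    "node2": "192.168.56.102",
    "node3": "192.168.56.103",
}


def etcd_settings(name, nodes=NODES, peer_port=2380, client_port=2379):
    """Return the etcd.conf settings for one node as an ordered dict."""
    ip = nodes[name]
    # Every node lists all peers, itself included.
    initial_cluster = ",".join(
        f"{n}=http://{addr}:{peer_port}" for n, addr in nodes.items()
    )
    return {
        "ETCD_DATA_DIR": f"/var/lib/etcd/{name}.etcd",
        "ETCD_LISTEN_PEER_URLS": f"http://{ip}:{peer_port}",
        "ETCD_LISTEN_CLIENT_URLS": (
            f"http://{ip}:{client_port},http://127.0.0.1:{client_port}"
        ),
        "ETCD_NAME": name,
        "ETCD_INITIAL_ADVERTISE_PEER_URLS": f"http://{ip}:{peer_port}",
        "ETCD_ADVERTISE_CLIENT_URLS": f"http://{ip}:{client_port}",
        "ETCD_INITIAL_CLUSTER": initial_cluster,
        "ETCD_INITIAL_CLUSTER_TOKEN": "etcd-cluster",
        "ETCD_INITIAL_CLUSTER_STATE": "new",
    }


def render(settings):
    """Format the settings as KEY="value" lines, as in etcd.conf."""
    return "\n".join(f'{key}="{value}"' for key, value in settings.items())


if __name__ == "__main__":
    print(render(etcd_settings("node2")))
```

The output for node2 can be compared against the node1 file above; only the name, data directory, and the four node-specific URLs should differ.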

Start etcd
Start etcd on node1, node2, and node3 in turn

# systemctl start etcd.service
# systemctl status etcd.service
# systemctl enable etcd.service

Verify etcd
# etcdctl ls /
# etcdctl cluster-health
member ca933ab8cfffe553 is healthy: got healthy result from http://192.168.56.101:2379
member f63afbe816fb463d is healthy: got healthy result from http://192.168.56.102:2379
member d44832212a08c43f is healthy: got healthy result from http://192.168.56.103:2379
cluster is healthy

# etcdctl member list
ca933ab8cfffe553: name=node1 peerURLs=http://192.168.56.101:2380 clientURLs=http://192.168.56.101:2379 isLeader=false
d44832212a08c43f: name=node3 peerURLs=http://192.168.56.103:2380 clientURLs=http://192.168.56.103:2379 isLeader=true
f63afbe816fb463d: name=node2 peerURLs=http://192.168.56.102:2380 clientURLs=http://192.168.56.102:2379 isLeader=false

OK, etcd is running normally.

Stop the etcd service on node1

# systemctl stop etcd
# etcdctl cluster-health
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:2379: getsockopt: connection refused
; error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused

Check the etcd cluster state from node2

# etcdctl cluster-health
failed to check the health of member ca933ab8cfffe553 on http://192.168.56.101:2379: Get http://192.168.56.101:2379/health: dial tcp 192.168.56.101:2379: getsockopt: connection refused
member ca933ab8cfffe553 is unreachable: [http://192.168.56.101:2379] are all unreachable
member d44832212a08c43f is healthy: got healthy result from http://192.168.56.103:2379
member f63afbe816fb463d is healthy: got healthy result from http://192.168.56.102:2379
cluster is healthy
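
The cluster still reports healthy with one of three members down because etcd only needs a majority (quorum) of members to keep working. As a small worked example, the Python sketch below computes the quorum size and the number of member failures a cluster of a given size can tolerate; the function names are illustrative.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for size in (1, 2, 3, 4, 5):
        print(size, quorum(size), tolerated_failures(size))
```

For the three-node cluster in this post the quorum is 2, so losing node1 alone leaves the cluster usable. A four-node cluster tolerates no more failures than a three-node one, which is why odd sizes are the usual choice.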

Warnings like the following appear because the clocks on the nodes differ.

Jul 11 03:03:40 node1 etcd[1657]: the clock difference against peer f63afbe816fb463d is too high [3.132011837s > 1s]
Jul 11 03:04:05 node1 etcd[1657]: the clock difference against peer d44832212a08c43f is too high [4.491729602s > 1s]
Jul 11 03:04:10 node1 etcd[1657]: the clock difference against peer f63afbe816fb463d is too high [3.132795377s > 1s]
Jul 11 03:04:35 node1 etcd[1657]: the clock difference against peer d44832212a08c43f is too high [4.492031513s > 1s]
Jul 11 03:04:40 node1 etcd[1657]: the clock difference against peer f63afbe816fb463d is too high [3.132249351s > 1s]
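
To see at a glance which peer has drifted the most, the warnings can be summarized with a few lines of Python. This is only a sketch: it assumes the message format shown above (a drift value in seconds followed by the limit), and the function name is made up for illustration.

```python
import re

# Matches the etcd warning format shown in the log excerpt above.
PATTERN = re.compile(
    r"clock difference against peer (?P<peer>[0-9a-f]+) is too high "
    r"\[(?P<drift>[0-9.]+)s > (?P<limit>[0-9.]+)s\]"
)


def max_drift_per_peer(lines):
    """Return {peer_id: largest reported clock difference in seconds}."""
    worst = {}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            peer = match.group("peer")
            drift = float(match.group("drift"))
            worst[peer] = max(worst.get(peer, 0.0), drift)
    return worst


SAMPLE = [
    "Jul 11 03:03:40 node1 etcd[1657]: the clock difference against peer "
    "f63afbe816fb463d is too high [3.132011837s > 1s]",
    "Jul 11 03:04:05 node1 etcd[1657]: the clock difference against peer "
    "d44832212a08c43f is too high [4.491729602s > 1s]",
    "Jul 11 03:04:10 node1 etcd[1657]: the clock difference against peer "
    "f63afbe816fb463d is too high [3.132795377s > 1s]",
]

if __name__ == "__main__":
    for peer, drift in max_drift_per_peer(SAMPLE).items():
        print(peer, drift)
```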

Synchronize the time periodically with crontab

# crontab -e
0 1 * * * /usr/sbin/ntpdate ntp.sjtu.edu.cn >> /var/log/ntpdate.log 2>&1 &
Configure etcd to start with the OS

# systemctl enable etcd

Install PostgreSQL and configure streaming replication
On node1, node2, and node3, pay attention to the following parameters

synchronous_commit = on
full_page_writes = on
wal_log_hints = on
synchronous_standby_names = '*'
max_replication_slots = 10

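These settings are mainly there so that pg_rewind can be used. Avoid synchronous replication where possible, because its performance impact is too large. (Note that synchronous_standby_names = '*' as listed above does enable synchronous replication; leave it empty if asynchronous replication is wanted.)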

Create the replication slots on node1. This is essential, because Patroni uses them.

postgres=# create user replicator replication login encrypted password '1qaz2wsx';

postgres=# select * from pg_create_physical_replication_slot('pgsql96_node1');
postgres=# select * from pg_create_physical_replication_slot('pgsql96_node2');
postgres=# select * from pg_create_physical_replication_slot('pgsql96_node3');

Configure streaming replication on node2 and node3

$ /usr/pgsql-9.6/bin/pg_ctl stop -m fast -D /var/lib/pgsql/9.6/main

$ cd /var/lib/pgsql/9.6/main
$ rm -rf ./*
$ /usr/pgsql-9.6/bin/pg_basebackup -h 192.168.56.101 -D /var/lib/pgsql/9.6/main -U replicator -v -P -R

$ vi recovery.conf
recovery_target_timeline = 'latest'
standby_mode = 'on'
primary_conninfo = 'host=192.168.56.101 port=5432 user=replicator password=1qaz2wsx'
primary_slot_name = 'pgsql96_node1'
trigger_file = '/tmp/postgresql.trigger.5432'

$ /usr/pgsql-9.6/bin/pg_ctl start -D /var/lib/pgsql/9.6/main -o "-c config_file=/etc/postgresql/9.6/main/postgresql.conf"

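Note that primary_slot_name in recovery.conf has a different value on each node: every standby should use its own slot (for example pgsql96_node2 on node2 and pgsql96_node3 on node3).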

Add replication entries to pg_hba.conf

$ vi pg_hba.conf

# Database administrative login by Unix domain socket
local all postgres peer

# TYPE DATABASE USER ADDRESS METHOD

# "local" is for Unix domain socket connections only
local all all peer

# IPv4 local connections:
host all postgres 127.0.0.1/32 trust
host all all 127.0.0.1/32 md5
host all all 192.168.56.0/24 md5

# IPv6 local connections:
host all all ::1/128 md5

# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication replicator peer
host replication replicator 127.0.0.1/32 md5
host replication replicator ::1/128 md5

host replication replicator 192.168.56.101/32 md5
host replication replicator 192.168.56.102/32 md5
host replication replicator 192.168.56.103/32 md5

$ psql -c "select pg_reload_conf();"

Check the replication status

postgres=# select client_addr,
pg_xlog_location_diff(sent_location, write_location) as write_delay,
pg_xlog_location_diff(sent_location, flush_location) as flush_delay,
pg_xlog_location_diff(sent_location, replay_location) as replay_delay
from pg_stat_replication;

client_addr | write_delay | flush_delay | replay_delay
----------------+-------------+-------------+--------------
192.168.56.102 | 0 | 0 | 0
192.168.56.103 | 0 | 0 | 0
(2 rows)
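
The delay columns come from pg_xlog_location_diff, which returns the distance in bytes between two WAL locations. A WAL location such as 0/300C058 is two hexadecimal numbers: the high and the low 32 bits of a 64-bit byte position. The Python sketch below reproduces that arithmetic for illustration; the function names are made up, and the sample values are WAL locations that appear in output elsewhere in this post.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a WAL location like '0/300C058' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def lsn_diff(a: str, b: str) -> int:
    """Bytes between two WAL locations (a minus b)."""
    return lsn_to_bytes(a) - lsn_to_bytes(b)


if __name__ == "__main__":
    print(lsn_to_bytes("0/300C058"))
    print(lsn_diff("0/300C138", "0/300C058"))
```

A difference of 0, as in the query result above, means the standby has written, flushed, and replayed everything the master has sent.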

Install etcd
Install etcd as described in the previous blog post

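Download and install Patroni
Readers can also refer to https://www.linode.com/docs/databases/postgresql/create-a-highly-available-postgresql-cluster-using-patroni-and-haproxy/

In my opinion, running etcd as a single point of failure, as that article does, is not ideal, although it is perfectly fine as a reference.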

# cd /tmp
# curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
# python get-pip.py
# pip install patroni[etcd,consul]

Some of Patroni's dependencies

urllib3>=1.19.1,!=1.21
boto
psycopg2>=2.5.4
PyYAML
requests
six>=1.7
kazoo>=1.3.1
python-etcd>=0.4.3,<0.5
python-consul>=0.7.0
click>=4.1
prettytable>=0.7
tzlocal
python-dateutil
psutil
cdiff
kubernetes>=2.0.0,<=6.0.0,!=4.0.*,!=5.0.*
Patroni configuration
# which patroni
/usr/bin/patroni

# patroni --help
/usr/lib64/python2.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
""")
Usage: /usr/bin/patroni config.yml
Patroni may also read the configuration from the PATRONI_CONFIGURATION environment variable

The warning says: please use "pip install psycopg2-binary" instead

# pip install psycopg2-binary

Patroni configuration file

# mkdir -p /usr/patroni/conf
# cd /usr/patroni/conf/

# vi patroni_postgresql.yml

scope: pgsql96
namespace: /pgsql/
name: pgsql96_node1

restapi:
  listen: 192.168.56.101:8008
  connect_address: 192.168.56.101:8008

etcd:
  host: 192.168.56.101:2379

bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        listen_addresses: "0.0.0.0"
        port: 5432
        wal_level: logical
        hot_standby: "on"
        wal_keep_segments: 1000
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
#        archive_mode: "on"
#        archive_timeout: 1800s
#        archive_command: gzip < %p > /data/backup/pgwalarchive/%f.gz
#      recovery_conf:
#        restore_command: gunzip < /data/backup/pgwalarchive/%f.gz > %p

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.56.101:5432
  data_dir: /var/lib/pgsql/9.6/data
  bin_dir: /usr/pgsql-9.6/bin
#  config_dir: /etc/postgresql/9.6/main
  authentication:
    replication:
      username: replicator
      password: 1qaz2wsx
    superuser:
      username: postgres
      password: 1qaz2wsx

#watchdog:
#  mode: automatic # Allowed values: off, automatic, required
#  device: /dev/watchdog
#  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

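The configuration above differs slightly from the output shown later, because the file was revised several times after the experiment so that readers can copy and use it directly.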

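Start Patroni manually
Parameters are applied in the following order (run-time parameters have the highest priority):

1. Load parameters from the file postgresql.base.conf (or from the custom conf file, if one is set)
2. Load parameters from the file postgresql.conf
3. Load parameters from the file postgresql.auto.conf
4. Run-time parameters passed with -o --name=value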
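
The ordering above amounts to a layered merge in which each later source overrides the earlier ones. As a minimal Python sketch of that idea (the function name and all parameter values are invented for illustration, and this is not how PostgreSQL or Patroni implement it internally):

```python
def effective_parameters(*layers):
    """Merge parameter layers in order; later layers win on conflicts."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical contents of each source, lowest priority first.
base_conf = {"max_connections": "100", "wal_level": "replica"}  # postgresql.base.conf
main_conf = {"wal_level": "logical"}                            # postgresql.conf
auto_conf = {"max_connections": "200"}                          # postgresql.auto.conf
run_time = {"max_connections": "300"}                           # -o --name=value

if __name__ == "__main__":
    print(effective_parameters(base_conf, main_conf, auto_conf, run_time))
```

In this made-up example max_connections ends up with the run-time value, while wal_level keeps the value from postgresql.conf because no later layer sets it.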

Start Patroni on the three nodes node1, node2, and node3 in turn

$ patroni /usr/patroni/conf/patroni_postgresql.yml

The log on node1:

2018-07-11 18:17:22,402 INFO: Lock owner: pg96_101; I am pg96_101
2018-07-11 18:17:22,430 INFO: no action. i am the leader with the lock
2018-07-11 18:17:32,403 INFO: Lock owner: pg96_101; I am pg96_101
2018-07-11 18:17:32,432 INFO: no action. i am the leader with the lock

The log on node2:

2018-07-11 18:17:22,421 INFO: Lock owner: pg96_101; I am pg96_102
2018-07-11 18:17:22,421 INFO: does not have lock
2018-07-11 18:17:22,435 INFO: no action. i am a secondary and i am following a leader
2018-07-11 18:17:32,426 INFO: Lock owner: pg96_101; I am pg96_102
2018-07-11 18:17:32,426 INFO: does not have lock
2018-07-11 18:17:32,436 INFO: no action. i am a secondary and i am following a leader

The log on node3:

2018-07-11 18:17:22,409 INFO: Lock owner: pg96_101; I am pg96_103
2018-07-11 18:17:22,410 INFO: does not have lock
2018-07-11 18:17:22,423 INFO: no action. i am a secondary and i am following a leader
2018-07-11 18:17:32,415 INFO: Lock owner: pg96_101; I am pg96_103
2018-07-11 18:17:32,415 INFO: does not have lock
2018-07-11 18:17:32,425 INFO: no action. i am a secondary and i am following a leader

Check the cluster state
Check the Patroni cluster state

$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml list pg96
+---------+----------+----------------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
| pg96 | pg96_101 | 192.168.56.101 | Leader | running | 0.0 |
| pg96 | pg96_102 | 192.168.56.102 | | running | 0.0 |
| pg96 | pg96_103 | 192.168.56.103 | | running | 0.0 |
+---------+----------+----------------+--------+---------+-----------+

$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml show-config pg96
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    listen_addresses: '*'
    port: 5432
  use_pg_rewind: true
retry_timeout: 10
ttl: 30

Check the information stored in etcd

$ etcdctl ls /pg96/pg96/
/pg96/pg96/members
/pg96/pg96/initialize
/pg96/pg96/leader
/pg96/pg96/config
/pg96/pg96/optime

$ etcdctl get /pg96/pg96/members/pg96_101
{"conn_url":"postgres://192.168.56.101:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","timeline":1,"state":"running","role":"master","xlog_location":50378640}
$ etcdctl get /pg96/pg96/members/pg96_102
{"conn_url":"postgres://192.168.56.102:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","timeline":1,"state":"running","role":"replica","xlog_location":50378640}
$ etcdctl get /pg96/pg96/members/pg96_103
{"conn_url":"postgres://192.168.56.103:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","timeline":1,"state":"running","role":"replica","xlog_location":50378640}

$ etcdctl get /pg96/pg96/initialize
6576484813966394513

$ etcdctl get /pg96/pg96/leader
pg96_101

$ etcdctl get /pg96/pg96/config
{"ttl":30,"maximum_lag_on_failover":1048576,"retry_timeout":10,"postgresql":{"use_pg_rewind":true,"parameters":{"listen_addresses":"*","port":5432}},"loop_wait":10}

$ etcdctl get /pg96/pg96/optime/leader
50378640
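
The keys listed above follow the layout /<namespace>/<scope>/..., and each member value is a small JSON document. The Python sketch below shows how those pieces fit together; the helper names are made up for illustration, and the member values are copied from the etcdctl output above.

```python
import json


def member_key(namespace: str, scope: str, name: str) -> str:
    """Build the etcd key under which a Patroni member is registered."""
    parts = (namespace, scope, "members", name)
    return "/" + "/".join(part.strip("/") for part in parts)


def roles(raw_members):
    """Map member name to the role stored in its JSON value."""
    return {name: json.loads(value)["role"] for name, value in raw_members.items()}


# Values as returned by `etcdctl get` above.
RAW_MEMBERS = {
    "pg96_101": '{"conn_url":"postgres://192.168.56.101:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","timeline":1,"state":"running","role":"master","xlog_location":50378640}',
    "pg96_102": '{"conn_url":"postgres://192.168.56.102:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","timeline":1,"state":"running","role":"replica","xlog_location":50378640}',
    "pg96_103": '{"conn_url":"postgres://192.168.56.103:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","timeline":1,"state":"running","role":"replica","xlog_location":50378640}',
}

if __name__ == "__main__":
    print(member_key("pg96", "pg96", "pg96_101"))
    print(roles(RAW_MEMBERS))
```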

Connection
Using JDBC:

jdbc:postgresql://node1,node2,node3/postgres?targetServerType=master

Using libpq, starting from PostgreSQL 10:

postgresql://node1:port,node2:port,node3:port/?target_session_attrs=read-write
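
Both forms list every node and let the driver pick the one that accepts writes, so clients do not need to know which node is currently the master. As a small illustration, the Python sketch below assembles the two strings from a host list. The function names are made up, port 5432 and the database name postgres are assumptions, and the libpq variant includes the database name, which the form above leaves empty.

```python
def jdbc_url(hosts, dbname="postgres"):
    """Multi-host JDBC URL that targets the current master."""
    return f"jdbc:postgresql://{','.join(hosts)}/{dbname}?targetServerType=master"


def libpq_uri(hosts, port=5432, dbname="postgres"):
    """Multi-host libpq URI (PostgreSQL 10+) that requires a read-write server."""
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    return f"postgresql://{host_list}/{dbname}?target_session_attrs=read-write"


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    print(jdbc_url(nodes))
    print(libpq_uri(nodes))
```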

Configure Patroni to start with the OS
# vi /etc/rc.local
su - postgres -c "/usr/bin/patroni /usr/patroni/conf/patroni_postgresql.yml >> /var/log/postgresql/patroni.log 2>&1 &"

Alternatively, set it up as patroni.service
# vi /etc/systemd/system/patroni.service

[Unit]
Description=patroni - a high-availability PostgreSQL
Documentation=https://patroni.readthedocs.io/en/latest/index.html
After=syslog.target network.target etcd.service
Wants=network-online.target

[Service]
Type=simple
User=postgres
Group=postgres
PermissionsStartOnly=true
ExecStart=/usr/bin/patroni /usr/patroni/conf/patroni_postgresql.yml
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65536
KillMode=process
KillSignal=SIGINT
Restart=on-abnormal
RestartSec=30s
TimeoutSec=0

[Install]
WantedBy=multi-user.target

# systemctl status patroni
# systemctl start patroni
# systemctl enable patroni

# systemctl status postgresql
# systemctl disable postgresql

# systemctl status etcd
# systemctl enable etcd

Disable automatic start of the postgresql service and let Patroni manage PostgreSQL.

Summary:
My impression is that etcd + Patroni works quite well. I will keep studying Patroni.

Manual switchover
State before the switchover

$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml list pg96
+---------+----------+----------------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
| pg96 | pg96_101 | 192.168.56.101 | | running | 0.0 |
| pg96 | pg96_102 | 192.168.56.102 | | running | 0.0 |
| pg96 | pg96_103 | 192.168.56.103 | Leader | running | 0.0 |
+---------+----------+----------------+--------+---------+-----------+

Run the manual switchover

$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml switchover
Master [pg96_103]: pg96_103
Candidate ['pg96_101', 'pg96_102'] []: pg96_101
When should the switchover take place (e.g. 2015-10-01T14:30) [now]: now
Current cluster topology
+---------+----------+----------------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
| pg96 | pg96_101 | 192.168.56.101 | | running | 0.0 |
| pg96 | pg96_102 | 192.168.56.102 | | running | 0.0 |
| pg96 | pg96_103 | 192.168.56.103 | Leader | running | 0.0 |
+---------+----------+----------------+--------+---------+-----------+
Are you sure you want to switchover cluster pg96, demoting current master pg96_103? [y/N]: y
Switchover failed, details: 503, Switchover failed

The log on node1:

2018-07-11 23:26:34,635 INFO: received switchover request with leader=pg96_103 candidate=pg96_101 scheduled_at=None
2018-07-11 23:26:34,645 INFO: Got response from pg96_101 http://127.0.0.1:8008/patroni: {"database_system_identifier": "6576484813966394513", "postmaster_start_time": "2018-07-11 22:03:55.130 CST", "timeline": 3, "xlog": {"received_location": 50385856, "replayed_timestamp": "2018-07-11 22:34:29.725 CST", "paused": false, "replayed_location": 50385856}, "patroni": {"scope": "pg96", "version": "1.4.4"}, "state": "running", "role": "replica", "server_version": 90609}
2018-07-11 23:26:39,126 INFO: Lock owner: pg96_103; I am pg96_101
2018-07-11 23:26:39,126 INFO: does not have lock
2018-07-11 23:26:39,142 INFO: no action. i am a secondary and i am following a leader

The log on node3:

2018-07-11 23:27:06,254 INFO: Lock owner: pg96_103; I am pg96_103
2018-07-11 23:27:06,274 INFO: Got response from pg96_101 http://127.0.0.1:8008/patroni: {"database_system_identifier": "6576484813966394513", "postmaster_start_time": "2018-07-11 17:38:41.768 CST", "timeline": 3, "xlog": {"location": 50385856}, "patroni": {"scope": "pg96", "version": "1.4.4"}, "replication": [{"sync_state": "potential", "sync_priority": 2, "client_addr": "192.168.56.102", "state": "streaming", "application_name": "pg96_102", "usename": "replicator"}, {"sync_state": "sync", "sync_priority": 1, "client_addr": "192.168.56.101", "state": "streaming", "application_name": "pg96_101", "usename": "replicator"}], "state": "running", "role": "master", "server_version": 90609}
2018-07-11 23:27:06,364 INFO: Member pg96_101 exceeds maximum replication lag
2018-07-11 23:27:06,365 WARNING: manual failover: no healthy members found, failover is not possible
2018-07-11 23:27:06,365 INFO: Cleaning up failover key
2018-07-11 23:27:06,389 INFO: no action. i am the leader with the lock

The node3 log contains: WARNING: manual failover: no healthy members found, failover is not possible
I am recording this here for now and will add more once I understand it.
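
One detail in the data above may be related. It is offered here only as a hypothesis, not a confirmed diagnosis: every member's api_url stored in etcd during the experiment was http://127.0.0.1:8008/patroni, and the node3 log line "Got response from pg96_101 http://127.0.0.1:8008/patroni" reports role master, which looks like node3 answering its own request. The Python sketch below flags members whose advertised REST API address is a loopback address. The function name is made up for illustration and is not part of Patroni; the sample values are copied from the etcdctl output earlier in this post.

```python
import ipaddress
import json
from urllib.parse import urlparse


def loopback_api_members(raw_members):
    """Return the names of members whose api_url points at a loopback address."""
    flagged = []
    for name, value in raw_members.items():
        host = urlparse(json.loads(value)["api_url"]).hostname
        try:
            if ipaddress.ip_address(host).is_loopback:
                flagged.append(name)
        except ValueError:
            # Not an IP literal; treat the conventional name as loopback too.
            if host == "localhost":
                flagged.append(name)
    return sorted(flagged)


MEMBERS_IN_ETCD = {
    "pg96_101": '{"conn_url":"postgres://192.168.56.101:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","role":"master"}',
    "pg96_102": '{"conn_url":"postgres://192.168.56.102:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","role":"replica"}',
}

if __name__ == "__main__":
    print(loopback_api_members(MEMBERS_IN_ETCD))
```

If this is indeed the cause, setting restapi connect_address to each node's own routable IP, as the revised configuration file in this post does, would be the thing to check first.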

Verify failover
Shut down the master on node1
# systemctl stop postgresql-9.6.service
Patroni on node1 prints output immediately:

2018-07-11 21:43:52,402 INFO: Lock owner: pg96_101; I am pg96_101
2018-07-11 21:43:52,441 INFO: no action. i am the leader with the lock
2018-07-11 21:44:02,405 WARNING: Postgresql is not running.
2018-07-11 21:44:02,406 INFO: Lock owner: pg96_101; I am pg96_101
2018-07-11 21:44:02,444 INFO: Lock owner: pg96_101; I am pg96_101
2018-07-11 21:44:02,455 INFO: starting as readonly because i had the session lock
2018-07-11 21:44:02,456 INFO: closed patroni connection to the postgresql cluster
2018-07-11 21:44:02,491 INFO: postmaster pid=11705
192.168.56.101:5432 - no response
< 2018-07-11 21:44:02.525 CST > LOG: redirecting log output to logging collector process
< 2018-07-11 21:44:02.525 CST > HINT: Future log output will appear in directory "pg_log".
192.168.56.101:5432 - accepting connections
192.168.56.101:5432 - accepting connections
2018-07-11 21:44:03,555 INFO: Lock owner: pg96_101; I am pg96_101
2018-07-11 21:44:03,555 INFO: establishing a new patroni connection to the postgres cluster
2018-07-11 21:44:03,597 INFO: promoted self to leader because i had the session lock
server promoting
2018-07-11 21:44:03,603 INFO: cleared rewind state after becoming the leader

The log shows that Patroni brought the master back up right away.

Power loss on node1
A node losing power is an extreme case, and it is simulated in every kind of HA architecture.
Patroni on one of the remaining nodes prints output shortly afterwards:

2018-07-11 21:49:44,632 INFO: Lock owner: pg96_101; I am pg96_103
2018-07-11 21:49:44,632 INFO: does not have lock
2018-07-11 21:49:44,642 INFO: no action. i am a secondary and i am following a leader
2018-07-11 21:49:55,140 INFO: Selected new etcd server http://192.168.56.101:2379
2018-07-11 21:49:57,643 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2072ca2b50>, u'Connection to 192.168.56.101 timed out. (connect timeout=2.5)')': /v2/keys/pg96/pg96/?recursive=true
2018-07-11 21:50:00,148 ERROR: Request to server http://192.168.56.101:2379 failed: MaxRetryError(u"HTTPConnectionPool(host=u'192.168.56.101', port=2379): Max retries exceeded with url: /v2/keys/pg96/pg96/?recursive=true (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2072ca2c10>, u'Connection to 192.168.56.101 timed out. (connect timeout=2.5)'))",)
2018-07-11 21:50:00,149 INFO: Reconnection allowed, looking for another server.
2018-07-11 21:50:00,149 INFO: Selected new etcd server http://192.168.56.102:2379
2018-07-11 21:50:00,172 INFO: Lock owner: pg96_101; I am pg96_103
2018-07-11 21:50:00,172 INFO: does not have lock
2018-07-11 21:50:00,191 INFO: no action. i am a secondary and i am following a leader
2018-07-11 21:50:05,137 INFO: Selected new etcd server http://192.168.56.103:2379
2018-07-11 21:50:05,141 INFO: Lock owner: pg96_101; I am pg96_103
2018-07-11 21:50:05,141 INFO: does not have lock
2018-07-11 21:50:05,146 INFO: no action. i am a secondary and i am following a leader
2018-07-11 21:50:15,060 INFO: Got response from pg96_102 http://127.0.0.1:8008/patroni: {"database_system_identifier": "6576484813966394513", "postmaster_start_time": "2018-07-11 17:38:41.768 CST", "timeline": 2, "xlog": {"received_location": 50379696, "replayed_timestamp": "2018-07-11 18:03:34.386 CST", "paused": false, "replayed_location": 50379696}, "patroni": {"scope": "pg96", "version": "1.4.4"}, "state": "running", "role": "replica", "server_version": 90609}
2018-07-11 21:50:15,066 INFO: Got response from pg96_101 http://127.0.0.1:8008/patroni: {"database_system_identifier": "6576484813966394513", "postmaster_start_time": "2018-07-11 17:38:41.768 CST", "timeline": 2, "xlog": {"received_location": 50379696, "replayed_timestamp": "2018-07-11 18:03:34.386 CST", "paused": false, "replayed_location": 50379696}, "patroni": {"scope": "pg96", "version": "1.4.4"}, "state": "running", "role": "replica", "server_version": 90609}
2018-07-11 21:50:15,113 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2018-07-11 21:50:15,119 INFO: promoted self to leader by acquiring session lock
server promoting
2018-07-11 21:50:15,190 INFO: cleared rewind state after becoming the leader
2018-07-11 21:50:16,257 INFO: Lock owner: pg96_103; I am pg96_103
2018-07-11 21:50:16,318 INFO: no action. i am the leader with the lock
2018-07-11 21:50:26,254 INFO: Lock owner: pg96_103; I am pg96_103
2018-07-11 21:50:26,279 INFO: no action. i am the leader with the lock

The log output contains "server promoting", which means the slave on this node was promoted to be the new master.
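
How long the failover took can be read off the Patroni timestamps. The Python sketch below measures the time between the first etcd connection warning and the promotion in the log above. The function names are illustrative, and the first warning is only a rough proxy for the moment node1 lost power, which the log does not record.

```python
from datetime import datetime

# Patroni log lines start with a timestamp such as "2018-07-11 21:49:57,643".
TIMESTAMP_LENGTH = 23
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_timestamp(line: str) -> datetime:
    """Parse the leading timestamp of a Patroni log line."""
    return datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)


def seconds_between(first: str, second: str) -> float:
    """Elapsed seconds from the first log line to the second."""
    return (parse_timestamp(second) - parse_timestamp(first)).total_seconds()


FIRST_WARNING = "2018-07-11 21:49:57,643 WARNING: Retrying after connection broken"
PROMOTED = "2018-07-11 21:50:15,119 INFO: promoted self to leader by acquiring session lock"

if __name__ == "__main__":
    print(seconds_between(FIRST_WARNING, PROMOTED))
```

By this measure the promotion came roughly 17.5 seconds after the first warning, which is within the ttl of 30 seconds configured for the leader key.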
Check the Patroni cluster state again

$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml list pg96
+---------+----------+----------------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
| pg96 | pg96_102 | 192.168.56.102 | | running | 0.0 |
| pg96 | pg96_103 | 192.168.56.103 | Leader | running | 0.0 |
+---------+----------+----------------+--------+---------+-----------+

Just as expected. Now check replication on node3.

select client_addr,
pg_xlog_location_diff(sent_location, write_location) as write_delay,
pg_xlog_location_diff(sent_location, flush_location) as flush_delay,
pg_xlog_location_diff(sent_location, replay_location) as replay_delay
from pg_stat_replication;

client_addr | write_delay | flush_delay | replay_delay
----------------+-------------+-------------+--------------
192.168.56.102 | 0 | 0 | 0
(1 row)

Haha.
After node1 is started again, check its state

# ps -ef|grep -i etcd
etcd 996 1 2 21:57 ? 00:00:00 /usr/bin/etcd --name=node1 --data-dir=/var/lib/etcd/node1.etcd --listen-peer-urls=http://192.168.56.101:2380,http://127.0.0.1:2380 --listen-client-urls=http://192.168.56.101:2379,http://127.0.0.1:2379 --initial-advertise-peer-urls=http://192.168.56.101:2380 --advertise-client-urls=http://192.168.56.101:2379 --initial-cluster=node1=http://192.168.56.101:2380,node2=http://192.168.56.102:2380,node3=http://192.168.56.103:2380 --initial-cluster-token=etcd-cluster --initial-cluster-state=new
root 1486 1332 0 21:57 pts/0 00:00:00 grep --color=auto -i etcd

Patroni did not come up, so it needs to be configured to start with the OS later. Start PostgreSQL and Patroni manually:

$ mv recovery.done recovery.conf
$ cat recovery.conf
primary_slot_name = 'pg96_101'
standby_mode = 'on'
recovery_target_timeline = 'latest'
primary_conninfo = 'user=replicator password=1qaz2wsx host=192.168.56.103 port=5432 sslmode=prefer sslcompression=1 application_name=pg96_101'
# systemctl start postgresql-9.6.service
$ patroni /usr/patroni/conf/patroni_postgresql.yml

Checking the Patroni cluster state shows that, surprisingly, PostgreSQL on node1 has not joined.

$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml list pg96
+---------+----------+----------------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
| pg96 | pg96_102 | 192.168.56.102 | | running | 0.0 |
| pg96 | pg96_103 | 192.168.56.103 | Leader | running | 0.0 |
+---------+----------+----------------+--------+---------+-----------+

The log shows the message: replication slot "pg96_101" does not exist
Strange. A slot named pg96_101 had clearly been created earlier, yet the node3 log says it does not exist.

postgres=# select * from pg_replication_slots;
-[ RECORD 1 ]-------+----------
slot_name | pg96_102
plugin |
slot_type | physical
datoid |
database |
active | t
active_pid | 19347
xmin |
catalog_xmin |
restart_lsn | 0/300C058
confirmed_flush_lsn |

It really is missing. So let's try creating it again.
select * from pg_create_physical_replication_slot('pg96_101');
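
A likely explanation for the missing slot is that physical replication slots exist only on the server where they were created and are not copied to standbys, so a slot created on the old master is not present on node3 after it is promoted. The Python sketch below shows the check performed by hand above: compare the slots a new master would need (one per other member) with the slots it actually has. The function name is made up for illustration, and the one-slot-per-member naming is an assumption based on the slot names seen in this post.

```python
def missing_slots(members, leader, existing_slots):
    """Slots the leader needs (one per other member) but does not have."""
    expected = {member for member in members if member != leader}
    return sorted(expected - set(existing_slots))


if __name__ == "__main__":
    members = ["pg96_101", "pg96_102", "pg96_103"]
    # State seen on node3 after the failover: only pg96_102 has a slot.
    print(missing_slots(members, "pg96_103", ["pg96_102"]))
```

With the state shown above, the only missing slot is pg96_101, which matches the error in the log.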
The node3 log shows:

$ tail -n 1000 postgresql-2018-07-11.csv
2018-07-11 22:14:27.247 CST,"replicator","",19982,"192.168.56.101:53204",5b4610c3.4e0e,3,"idle",2018-07-11 22:14:27 CST,5/0,0,ERROR,42704,"replication slot ""pg96_101"" does not exist",,,,,,,,,"pg96_101"
2018-07-11 22:14:27.248 CST,"replicator","",19982,"192.168.56.101:53204",5b4610c3.4e0e,4,"idle",2018-07-11 22:14:27 CST,,0,LOG,00000,"disconnection: session time: 0:00:00.005 user=replicator database= host=192.168.56.101 port=53204",,,,,,,,,"pg96_101"
2018-07-11 22:14:28.367 CST,"postgres","postgres",19929,"[local]",5b461068.4dd9,6,"SELECT",2018-07-11 22:12:56 CST,4/0,0,LOG,00000,"duration: 29.076 ms",,,,,,,,,"psql"
2018-07-11 22:14:30.777 CST,"postgres","postgres",19929,"[local]",5b461068.4dd9,7,"SELECT",2018-07-11 22:12:56 CST,4/0,0,LOG,00000,"duration: 1.060 ms",,,,,,,,,"psql"
2018-07-11 22:14:32.249 CST,,,19984,"192.168.56.101:53206",5b4610c8.4e10,1,"",2018-07-11 22:14:32 CST,,0,LOG,00000,"connection received: host=192.168.56.101 port=53206",,,,,,,,,""
2018-07-11 22:14:32.252 CST,"replicator","",19984,"192.168.56.101:53206",5b4610c8.4e10,2,"authentication",2018-07-11 22:14:32 CST,5/121,0,LOG,00000,"replication connection authorized: user=replicator",,,,,,,,,""
2018-07-11 22:14:32.315 CST,"replicator","",19984,"192.168.56.101:53206",5b4610c8.4e10,3,"streaming 0/300C138",2018-07-11 22:14:32 CST,5/0,0,LOG,00000,"standby ""pg96_101"" is now a synchronous standby with priority 1",,,,,,,,,"pg96_101"
2018-07-11 22:14:36.255 CST,"postgres","postgres",16066,"192.168.56.103:54424",5b45d88a.3ec2,1453,"SELECT",2018-07-11 18:14:34 CST,2/0,0,LOG,00000,"duration: 0.448 ms",,,,,,,,,"Patroni"
2018-07-11 22:14:46.258 CST,"postgres","postgres",16066,"192.168.56.103:54424",5b45d88a.3ec2,1454,"SELECT",2018-07-11 18:14:34 CST,2/0,0,LOG,00000,"duration: 0.568 ms",,,,,,,,,"Patroni"

After a short wait, node1 shows up among the slaves:

postgres=# select client_addr,
pg_xlog_location_diff(sent_location, write_location) as write_delay,
pg_xlog_location_diff(sent_location, flush_location) as flush_delay,
pg_xlog_location_diff(sent_location, replay_location) as replay_delay
from pg_stat_replication;
client_addr | write_delay | flush_delay | replay_delay
----------------+-------------+-------------+--------------
192.168.56.102 | 0 | 0 | 0
192.168.56.101 | 0 | 0 | 0
(2 rows)

However, patronictl still does not show node1.

$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml list pg96
+---------+----------+----------------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
| pg96 | pg96_102 | 192.168.56.102 | | running | 0.0 |
| pg96 | pg96_103 | 192.168.56.103 | Leader | running | 0.0 |
+---------+----------+----------------+--------+---------+-----------+

After waiting a while longer, it was OK again.

$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml list pg96
+---------+----------+----------------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
| pg96 | pg96_101 | 192.168.56.101 | | running | 0.0 |
| pg96 | pg96_102 | 192.168.56.102 | | running | 0.0 |
| pg96 | pg96_103 | 192.168.56.103 | Leader | running | 0.0 |
+---------+----------+----------------+--------+---------+-----------+

Some etcd commands
# systemctl status etcd.service
# systemctl start etcd.service
# systemctl enable etcd.service
$ etcdctl cluster-health
$ etcdctl ls
$ etcdctl ls /pgsql/pgsql96/
$ etcdctl ls --recursive --sort -p
$ etcdctl get /pgsql/pgsql96/members/pgsql96_node1
$ etcdctl get /pgsql/pgsql96/initialize
$ etcdctl get /pgsql/pgsql96/leader
$ etcdctl get /pgsql/pgsql96/config
$ etcdctl rm --recursive /pgsql
$ etcdctl --help
NAME:
   etcdctl - A simple command line client for etcd.

USAGE:
   etcdctl [global options] command [command options] [arguments...]

VERSION:
   2.2.5

AUTHOR:
   Author - <unknown@email>

COMMANDS:
   backup          backup an etcd directory
   cluster-health  check the health of the etcd cluster
   mk              make a new key with a given value
   mkdir           make a new directory
   rm              remove a key or a directory
   rmdir           removes the key if it is an empty directory or a key-value pair
   get             retrieve the value of a key
   ls              retrieve a directory
   set             set the value of a key
   setdir          create a new or existing directory
   update          update an existing key with a given value
   updatedir       update an existing directory
   watch           watch a key for changes
   exec-watch      watch a key for changes and exec an executable
   member          member add, remove and list subcommands
   import          import a snapshot to a cluster
   user            user add, grant and revoke subcommands
   role            role add, grant and revoke subcommands
   auth            overall auth controls
   help, h         Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --debug                  output cURL commands which can be used to reproduce the request
   --no-sync                don't synchronize cluster information before sending request
   --output, -o "simple"    output response in the given format (`simple`, `extended` or `json`)
   --discovery-srv, -D      domain name to query for SRV records describing cluster endpoints
   --peers, -C              a comma-delimited list of machine addresses in the cluster (default: "http://127.0.0.1:4001,http://127.0.0.1:2379")
   --endpoint               a comma-delimited list of machine addresses in the cluster (default: "http://127.0.0.1:4001,http://127.0.0.1:2379")
   --cert-file              identify HTTPS client using this SSL certificate file
   --key-file               identify HTTPS client using this SSL key file
   --ca-file                verify certificates of HTTPS-enabled servers using this CA bundle
   --username, -u           provide username[:password] and prompt if password is not supplied.
   --timeout "1s"           connection timeout per request
   --total-timeout "5s"     timeout for the command execution (except watch)
   --help, -h               show help
   --version, -v            print the version
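
For scripting, the result of `etcdctl cluster-health` can be reduced to an exit status. This is a minimal sketch, assuming the v2-style `etcdctl` shown here, which prints the summary line `cluster is healthy` when all members respond; `etcd_cluster_ok` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read `etcdctl cluster-health` output on stdin and
# succeed only if it contains the exact summary line "cluster is healthy".
etcd_cluster_ok() {
  grep -qx 'cluster is healthy'
}

# Example (needs a running etcd cluster):
#   etcdctl cluster-health | etcd_cluster_ok && echo "etcd OK"
```

Matching the whole line with `-x` keeps a line such as "cluster is unhealthy" from being accepted by accident.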

Some patroni commands
$ patroni /usr/patroni/conf/patroni_postgresql.yml

$ curl http://127.0.0.1:8008

$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml list
$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml list pgsql96
$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml show-config pgsql96

$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml edit-config pgsql96

$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml restart pgsql96 pgsql96_node1

$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml switchover pgsql96

$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml reinit pgsql96 pgsql96_node1

$ patronictl --help
Usage: patronictl [OPTIONS] COMMAND [ARGS]...

Options:
  -c, --config-file TEXT  Configuration file
  -d, --dcs TEXT          Use this DCS
  -k, --insecure          Allow connections to SSL sites without certs
  --help                  Show this message and exit.

Commands:
  configure    Create configuration file
  dsn          Generate a dsn for the provided member, defaults to a dsn of...
  edit-config  Edit cluster configuration
  failover     Failover to a replica
  flush        Flush scheduled events
  list         List the Patroni members for a given Patroni
  pause        Disable auto failover
  query        Query a Patroni PostgreSQL member
  reinit       Reinitialize cluster member
  reload       Reload cluster member configuration
  remove       Remove cluster from DCS
  restart      Restart cluster member
  resume       Resume auto failover
  scaffold     Create a structure for the cluster in DCS
  show-config  Show cluster configuration
  switchover   Switchover to a replica
  version      Output version of patronictl command or a running Patroni...
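
The REST endpoint queried earlier with `curl http://127.0.0.1:8008` returns a JSON document describing the local member. The following is a minimal sketch for pulling the role out of that response without extra tools, assuming the JSON carries a `"role"` string field; `patroni_role` is a hypothetical helper, and a real JSON parser is the safer choice for anything beyond a quick check.

```shell
#!/bin/bash
# Hypothetical helper: read Patroni's REST API JSON on stdin and print
# the value of its "role" field (for example master or replica).
patroni_role() {
  sed -n 's/.*"role"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example (needs a running Patroni on this node):
#   curl -s http://127.0.0.1:8008 | patroni_role
```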
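
Note: the following PostgreSQL parameters cannot be redefined locally in a member's own patroni yml file. Patroni only accepts them from the cluster-wide configuration (for example via patronictl edit-config):
max_connections
max_locks_per_transaction
wal_level
max_wal_senders
max_prepared_transactions
max_replication_slots
max_worker_processes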
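
The parameter list above can be turned into a quick check before editing a member's local configuration. This is a minimal sketch; `is_cluster_wide_param` is a hypothetical helper, and its list is copied from the text above rather than queried from Patroni.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given PostgreSQL parameter is one of
# those listed above that cannot be redefined in a member's local config.
is_cluster_wide_param() {
  local p
  for p in max_connections max_locks_per_transaction wal_level \
           max_wal_senders max_prepared_transactions \
           max_replication_slots max_worker_processes; do
    [ "$p" = "$1" ] && return 0
  done
  return 1
}

# Example:
#   is_cluster_wide_param max_connections && echo "change it with patronictl edit-config"
```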

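This post describes how to use a callback to reassign the master's VIP when a switchover happens in an etcd + patroni setup.
This is mainly useful for self-owned or colocated data centers; cloud environments apparently do not allow binding a fixed VIP.
Some patroni parameters
The official documentation describes the following callback events: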
on_reload: run this script when configuration reload is triggered.
on_restart: run this script when the cluster restarts.
on_role_change: run this script when the cluster is being promoted or demoted.
on_start: run this script when the cluster starts.
on_stop: run this script when the cluster stops.

# su - postgres
$ vi /usr/patroni/conf/patroni_postgresql.yml

postgresql:
  callbacks:
    on_start: /usr/patroni/conf/patroni_callback.sh
    on_stop: /usr/patroni/conf/patroni_callback.sh
    on_role_change: /usr/patroni/conf/patroni_callback.sh

patroni_callback.sh
This script binds the VIP when the local postgresql becomes the master and removes the VIP when it becomes a slave.
# cd /usr/patroni/conf/
# vi patroni_callback.sh
#!/bin/bash
# Patroni calls this script with: <callback name> <role> <scope>
readonly cb_name=$1
readonly role=$2
readonly scope=$3

function usage() {
    echo "Usage: $0 <on_start|on_stop|on_role_change> <role> <scope>";
    exit 1;
}

echo "this is patroni callback $cb_name $role $scope"

case $cb_name in
    on_stop)
        # postgresql stopped on this node: release the VIP
        sudo ip addr del 192.168.56.100/24 dev enp0s8 label enp0s8:1
        #sudo arping -q -A -c 1 -I enp0s8 192.168.56.100
        # note: this flushes all iptables rules on the node
        sudo iptables -F
        ;;
    on_start)
        ;;
    on_role_change)
        if [[ $role == 'master' ]]; then
            # promoted: bind the VIP and announce it with a gratuitous ARP
            sudo ip addr add 192.168.56.100/24 brd 192.168.56.255 dev enp0s8 label enp0s8:1
            sudo arping -q -A -c 1 -I enp0s8 192.168.56.100
            sudo iptables -F
        elif [[ $role == 'slave' ]]||[[ $role == 'replica' ]]||[[ $role == 'logical' ]]; then
            # demoted: release the VIP
            sudo ip addr del 192.168.56.100/24 dev enp0s8 label enp0s8:1
            #sudo arping -q -A -c 1 -I enp0s8 192.168.56.100
            sudo iptables -F
        fi
        ;;
    *)
        usage
        ;;
esac
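
The decision logic of the callback can be exercised without touching the network by separating it from the `ip` calls. The following is a minimal sketch and not the author's script: `vip_action` is a hypothetical function that only prints which action the callback would take for a given event and role.

```shell
#!/bin/bash
# Hypothetical dry-run helper mirroring the callback's case statement:
# prints "add", "del" or "none" instead of running ip/arping/iptables.
vip_action() {
  local cb_name=$1 role=$2
  case $cb_name in
    on_stop)
      echo del ;;
    on_start)
      echo none ;;
    on_role_change)
      case $role in
        master)                echo add ;;
        slave|replica|logical) echo del ;;
        *)                     echo none ;;
      esac ;;
    *)
      echo "unknown callback: $cb_name" >&2
      return 1 ;;
  esac
}

# Example:
#   vip_action on_role_change master
```

Keeping the decision in one function makes it easy to check every event and role combination before the script is wired into patroni.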

After changing the IP, be sure to run arping so that other hosts on the network refresh their ARP caches.
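
Before adding or deleting the VIP it can help to check whether the address is already bound, so that a repeated callback does not fail on a duplicate add or a missing address. This is a minimal sketch, assuming the one-line-per-address format printed by `ip -o -4 addr show`; `has_vip` is a hypothetical helper that reads that output on stdin.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given IPv4 address appears as an
# "inet <addr>/<prefix>" entry in `ip -o -4 addr show` output read from stdin.
has_vip() {
  local vip=$1
  grep -qF " inet ${vip}/"
}

# Example (on a cluster node):
#   ip -o -4 addr show dev enp0s8 | has_vip 192.168.56.100 && echo "VIP is bound"
```

The fixed-string match with the leading space and trailing slash avoids treating 192.168.56.10 as a match for 192.168.56.100.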
Configure sudo
# visudo
postgres ALL=(ALL) NOPASSWD:ALL

Change permissions
# chown -R postgres:postgres /usr/patroni/conf/*
# ls -l
total 8
-rwxr--r-x 1 postgres postgres 768 Aug 8 18:59 patroni_callback.sh
-rw-r--r-- 1 postgres postgres 1616 Aug 8 18:44 patroni_postgresql.yml

Original source: https://www.cnblogs.com/kelvin19840813/p/11918401.html

Source: oschina

Link: https://my.oschina.net/u/4257943/blog/3246437
