PostgreSQL高可用repmgr

发表于： 2023-08-24 09:09:29
分类：网络资源
浏览：浏览(118)

repmgr简介

repmgr 是一个开源工具套件，用于管理 PostgreSQL 服务器集群中的复制和故障转移。它通过设置备用服务器、监控复制以及执行故障转移或手动切换操作等管理任务的工具增强了 PostgreSQL 的内置热备用功能。官方主页

repmgr构架

repmgr流复制管理工具对集群每个节点都有自己的repmgr.conf配置文件，用来记录本节点的ID、节点名称、连接信息、数据库PGDATA目录等配置参数。在配置好这些参数后就可以通过repmgr命令实现对集群节点的“一键式”部署。部署完成后，每个节点都有自己的repmgrd守护进程来监控节点数据库状态，且每个节点维护自己的元数据表，用于记录所有集群节点的信息。其中主节点守护进程主要用来监控本节点数据库服务状态，备节点守护进程主要用来监控主节点和本节点数据库服务状态。可以手动提升standby为primary和自动切换primary和standby角色。

下面我们就从零开始基于repmgr工具部署一套1主2从的高可用环境，并测试手动切换的场景。

环境准备

主机名	IP地址	操作系统版本	数据库版本	repmgr版本	角色
pg1	192.168.17.71	CentOS7.9	14.9	5.3.3	primary
pg2	192.168.17.72	CentOS7.9	14.9	5.3.3	standby
pg3	192.168.17.73	CentOS7.9	14.9	5.3.3	standby

下载软件地址

https://ftp.postgresql.org/pub/source/v14.9/postgresql-14.9.tar.gz

https://www.repmgr.org/download/repmgr-5.3.3.tar.gz

操作系统准备

安装CentOS7.9这里省略

关闭防火墙

systemctl stop firewalld
systemctl disable firewalld

3台主机都要操作

[root@pg1 ~]# systemctl stop firewalld
[root@pg1 ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

[root@pg2 ~]# systemctl stop firewalld
[root@pg2 ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

[root@pg3 ~]# systemctl stop firewalld
[root@pg3 ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service

关闭SELINUX

setenforce 0
sed -ri 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

3台主机都要操作

[root@pg1 ~]# setenforce 0
[root@pg1 ~]# sed -ri 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

[root@pg2 ~]# setenforce 0
[root@pg2 ~]# sed -ri 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

[root@pg3 ~]# setenforce 0
[root@pg3 ~]# sed -ri 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

配置yum源

sudo sed -e 's|^mirrorlist=|#mirrorlist=|g' \
         -e 's|^#baseurl=http://mirror.centos.org|baseurl=https://mirrors.tuna.tsinghua.edu.cn|g' \
         -i.bak \
         /etc/yum.repos.d/CentOS-*.repo

3台主机都要操作

[root@pg1 ~]# sudo sed -e 's|^mirrorlist=|#mirrorlist=|g' \
>          -e 's|^#baseurl=http://mirror.centos.org|baseurl=https://mirrors.tuna.tsinghua.edu.cn|g' \
>          -i.bak \
>          /etc/yum.repos.d/CentOS-*.repo
[root@pg1 ~]#

[root@pg2 ~]# sudo sed -e 's|^mirrorlist=|#mirrorlist=|g' \
>          -e 's|^#baseurl=http://mirror.centos.org|baseurl=https://mirrors.tuna.tsinghua.edu.cn|g' \
>          -i.bak \
>          /etc/yum.repos.d/CentOS-*.repo
[root@pg2 ~]#

[root@pg3 ~]# sudo sed -e 's|^mirrorlist=|#mirrorlist=|g' \
>          -e 's|^#baseurl=http://mirror.centos.org|baseurl=https://mirrors.tuna.tsinghua.edu.cn|g' \
>          -i.bak \
>          /etc/yum.repos.d/CentOS-*.repo
[root@pg3 ~]#

配置时间同步

yum install -y ntpdate
ntpdate time.windows.com && hwclock -w

这里我们3台主机简单配置一下时间同步操作略

安装所需依赖的包

yum install -y readline readline-devel zlib zlib-devel gettext gettext-devel openssl openssl-devel pam pam-devel libxml2 libxml2-devel libxslt libxslt-devel perl perl-devel tcl-devel uuid-devel gcc gcc-c++ make flex bison perl-ExtUtils*

3台主机都要操作，操作略

设置hosts文件

cat >> /etc/hosts << EOF

192.168.17.71 pg1
192.168.17.72 pg2 
192.168.17.73 pg3

EOF

3台主机都要操作

[root@pg1 ~]# cat >> /etc/hosts << EOF
>
> 192.168.17.71 pg1
> 192.168.17.72 pg2
> 192.168.17.73 pg3
>
> EOF
[root@pg1 ~]# 

[root@pg2 ~]# cat >> /etc/hosts << EOF
>
> 192.168.17.71 pg1
> 192.168.17.72 pg2
> 192.168.17.73 pg3
>
> EOF
[root@pg2 ~]# 

[root@pg3 ~]# cat >> /etc/hosts << EOF
>
> 192.168.17.71 pg1
> 192.168.17.72 pg2
> 192.168.17.73 pg3
>
> EOF
[root@pg3 ~]#

添加用户

groupadd postgres
useradd -g postgres postgres
echo "postgres"|passwd --stdin postgres

3台主机都要操作

[root@pg1 ~]# groupadd postgres
[root@pg1 ~]# useradd -g postgres postgres
[root@pg1 ~]# echo "postgres"|passwd --stdin postgres
Changing password for user postgres.
passwd: all authentication tokens updated successfully.
[root@pg1 ~]#

[root@pg2 ~]# groupadd postgres
[root@pg2 ~]# useradd -g postgres postgres
[root@pg2 ~]# echo "postgres"|passwd --stdin postgres
Changing password for user postgres.
passwd: all authentication tokens updated successfully.
[root@pg2 ~]#

[root@pg3 ~]# groupadd postgres
[root@pg3 ~]# useradd -g postgres postgres
[root@pg3 ~]# echo "postgres"|passwd --stdin postgres
Changing password for user postgres.
passwd: all authentication tokens updated successfully.
[root@pg3 ~]#

创建目录

mkdir -p /opt/pg14 /opt/pgdata /opt/pgarchive
chown -R postgres:postgres /opt
chmod 0755 /opt/pg14 
chmod 0700 /opt/pgdata /opt/pgarchive

3台主机都要操作

[root@pg1 ~]# mkdir -p /opt/pg14 /opt/pgdata /opt/pgarchive
[root@pg1 ~]# chown -R postgres:postgres /opt
[root@pg1 ~]# chmod 0755 /opt/pg14
[root@pg1 ~]# chmod 0700 /opt/pgdata /opt/pgarchive
[root@pg1 ~]#

[root@pg2 ~]# mkdir -p /opt/pg14 /opt/pgdata /opt/pgarchive
[root@pg2 ~]# chown -R postgres:postgres /opt
[root@pg2 ~]# chmod 0755 /opt/pg14
[root@pg2 ~]# chmod 0700 /opt/pgdata /opt/pgarchive
[root@pg2 ~]#

[root@pg3 ~]# mkdir -p /opt/pg14 /opt/pgdata /opt/pgarchive
[root@pg3 ~]# chown -R postgres:postgres /opt
[root@pg3 ~]# chmod 0755 /opt/pg14
[root@pg3 ~]# chmod 0700 /opt/pgdata /opt/pgarchive
[root@pg3 ~]#

源码编译软件

安装PostgreSQL和repmgr

把软件包postgresql-14.9.tar.gz和repmgr-5.3.3.tar.gz上传到/home/postgres目录下

su - postgres
tar zxf postgresql-14.9.tar.gz
cd postgresql-14.9/
./configure --prefix=/opt/pg14
gmake world && gmake install-world
cd ..
tar zxf repmgr-5.3.3.tar.gz
cd repmgr-5.3.3/
./configure PG_CONFIG=/opt/pg14/bin/pg_config
make && make install

3台主机都要操作，操作略

配置环境变量

cd ~
cat >> /home/postgres/.bash_profile <<EOF

#PostgreSQL settings
export PGPORT=5432
export PGUSER=postgres
export PGHOME=/opt/pg14
export PGDATA=/opt/pgdata
export LD_LIBRARY_PATH=\$PGHOME/lib:\$LD_LIBRARY_PATH
export PATH=\$PGHOME/bin:\$PATH

EOF
source /home/postgres/.bash_profile
postgres --version
repmgr --version

3台主机都要操作

[postgres@pg1 repmgr-5.3.3]$ cd ~
[postgres@pg1 ~]$ cat >> /home/postgres/.bash_profile <<EOF
>
> #PostgreSQL settings
> export PGPORT=5432
> export PGUSER=postgres
> export PGHOME=/opt/pg14
> export PGDATA=/opt/pgdata
> export LD_LIBRARY_PATH=\$PGHOME/lib:\$LD_LIBRARY_PATH
> export PATH=\$PGHOME/bin:\$PATH
>
> EOF
[postgres@pg1 ~]$ source /home/postgres/.bash_profile
[postgres@pg1 ~]$ postgres --version
postgres (PostgreSQL) 14.9
[postgres@pg1 ~]$ repmgr --version
repmgr 5.3.3
[postgres@pg1 ~]$

[postgres@pg2 repmgr-5.3.3]$ cd ~
[postgres@pg2 ~]$ cat >> /home/postgres/.bash_profile <<EOF
>
> #PostgreSQL settings
> export PGPORT=5432
> export PGUSER=postgres
> export PGHOME=/opt/pg14
> export PGDATA=/opt/pgdata
> export LD_LIBRARY_PATH=\$PGHOME/lib:\$LD_LIBRARY_PATH
> export PATH=\$PGHOME/bin:\$PATH
>
> EOF
[postgres@pg2 ~]$ source /home/postgres/.bash_profile
[postgres@pg2 ~]$ postgres --version
postgres (PostgreSQL) 14.9
[postgres@pg2 ~]$ repmgr --version
repmgr 5.3.3
[postgres@pg2 ~]$

[postgres@pg3 repmgr-5.3.3]$ cd ~
[postgres@pg3 ~]$ cat >> /home/postgres/.bash_profile <<EOF
>
> #PostgreSQL settings
> export PGPORT=5432
> export PGUSER=postgres
> export PGHOME=/opt/pg14
> export PGDATA=/opt/pgdata
> export LD_LIBRARY_PATH=\$PGHOME/lib:\$LD_LIBRARY_PATH
> export PATH=\$PGHOME/bin:\$PATH
>
> EOF
[postgres@pg3 ~]$ source /home/postgres/.bash_profile
[postgres@pg3 ~]$ postgres --version
postgres (PostgreSQL) 14.9
[postgres@pg3 ~]$ repmgr --version
repmgr 5.3.3
[postgres@pg3 ~]$

配置ssh互信

ssh-keygen -t rsa
for i in pg{1..3}; do echo ">>> $i";ssh-copy-id $i;done

3台主机都要操作，输入密码postgres

[postgres@pg1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/postgres/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/postgres/.ssh/id_rsa.
Your public key has been saved in /home/postgres/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:eCoDf0PPOzm8VWaagAsxx3nfGUtSWmV7JBoJAg+PQ50 postgres@pg1
The key's randomart image is:
+---[RSA 2048]----+
|      +o....+o+ .|
|     o *E. +.+ + |
|    o * + o + . .|
|     + = . + + . |
|  . . + S . B    |
|   o o * . *     |
|    + =.o.+      |
|     + .=o       |
|        o+       |
+----[SHA256]-----+
[postgres@pg1 ~]$ for i in pg{1..3}; do echo ">>> $i";ssh-copy-id $i;done
>>> pg1
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/postgres/.ssh/id_rsa.pub"
The authenticity of host 'pg1 (192.168.17.71)' can't be established.
ECDSA key fingerprint is SHA256:TU9s7eIMfhdN7CEZ8rcEHCCcQsNGJOHQVmwGzMrPBbE.
ECDSA key fingerprint is MD5:71:5d:3d:a6:d4:31:d1:e1:69:6b:2a:14:18:9f:02:be.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
postgres@pg1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'pg1'"
and check to make sure that only the key(s) you wanted were added.

>>> pg2
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/postgres/.ssh/id_rsa.pub"
The authenticity of host 'pg2 (192.168.17.72)' can't be established.
ECDSA key fingerprint is SHA256:TU9s7eIMfhdN7CEZ8rcEHCCcQsNGJOHQVmwGzMrPBbE.
ECDSA key fingerprint is MD5:71:5d:3d:a6:d4:31:d1:e1:69:6b:2a:14:18:9f:02:be.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
postgres@pg2's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'pg2'"
and check to make sure that only the key(s) you wanted were added.

>>> pg3
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/postgres/.ssh/id_rsa.pub"
The authenticity of host 'pg3 (192.168.17.73)' can't be established.
ECDSA key fingerprint is SHA256:TU9s7eIMfhdN7CEZ8rcEHCCcQsNGJOHQVmwGzMrPBbE.
ECDSA key fingerprint is MD5:71:5d:3d:a6:d4:31:d1:e1:69:6b:2a:14:18:9f:02:be.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
postgres@pg3's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'pg3'"
and check to make sure that only the key(s) you wanted were added.

[postgres@pg1 ~]$

[postgres@pg2 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/postgres/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/postgres/.ssh/id_rsa.
Your public key has been saved in /home/postgres/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:eJZh5ISEk4NxN3YLKhoU9n9kDaTVz8JF+tS5lzFLIAE postgres@pg2
The key's randomart image is:
+---[RSA 2048]----+
| +oo.+B==Eo+..   |
|o o.=++Boo..o o  |
|. ...+ o*o+. o + |
| o .. oo =oo  o =|
|.    ...S .. . + |
|      .o      .  |
|                 |
|                 |
|                 |
+----[SHA256]-----+
[postgres@pg2 ~]$ for i in pg{1..3}; do echo ">>> $i";ssh-copy-id $i;done
>>> pg1
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/postgres/.ssh/id_rsa.pub"
The authenticity of host 'pg1 (192.168.17.71)' can't be established.
ECDSA key fingerprint is SHA256:TU9s7eIMfhdN7CEZ8rcEHCCcQsNGJOHQVmwGzMrPBbE.
ECDSA key fingerprint is MD5:71:5d:3d:a6:d4:31:d1:e1:69:6b:2a:14:18:9f:02:be.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
postgres@pg1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'pg1'"
and check to make sure that only the key(s) you wanted were added.

>>> pg2
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/postgres/.ssh/id_rsa.pub"
The authenticity of host 'pg2 (192.168.17.72)' can't be established.
ECDSA key fingerprint is SHA256:TU9s7eIMfhdN7CEZ8rcEHCCcQsNGJOHQVmwGzMrPBbE.
ECDSA key fingerprint is MD5:71:5d:3d:a6:d4:31:d1:e1:69:6b:2a:14:18:9f:02:be.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
postgres@pg2's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'pg2'"
and check to make sure that only the key(s) you wanted were added.

>>> pg3
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/postgres/.ssh/id_rsa.pub"
The authenticity of host 'pg3 (192.168.17.73)' can't be established.
ECDSA key fingerprint is SHA256:TU9s7eIMfhdN7CEZ8rcEHCCcQsNGJOHQVmwGzMrPBbE.
ECDSA key fingerprint is MD5:71:5d:3d:a6:d4:31:d1:e1:69:6b:2a:14:18:9f:02:be.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
postgres@pg3's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'pg3'"
and check to make sure that only the key(s) you wanted were added.

[postgres@pg2 ~]$

[postgres@pg3 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/postgres/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/postgres/.ssh/id_rsa.
Your public key has been saved in /home/postgres/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:PDQrCDLUUtVv04enNE5PhhYTlVJ3gFds42EQ6r0IeVk postgres@pg3
The key's randomart image is:
+---[RSA 2048]----+
| .o....    .+*==o|
|.. .   .   +o.o++|
|o o     + ..*Eo.o|
| o . . o *oO+= . |
|    . . So*+O.   |
|       . .oo...  |
|           . .   |
|                 |
|                 |
+----[SHA256]-----+
[postgres@pg3 ~]$ for i in pg{1..3}; do echo ">>> $i";ssh-copy-id $i;done
>>> pg1
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/postgres/.ssh/id_rsa.pub"
The authenticity of host 'pg1 (192.168.17.71)' can't be established.
ECDSA key fingerprint is SHA256:TU9s7eIMfhdN7CEZ8rcEHCCcQsNGJOHQVmwGzMrPBbE.
ECDSA key fingerprint is MD5:71:5d:3d:a6:d4:31:d1:e1:69:6b:2a:14:18:9f:02:be.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
postgres@pg1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'pg1'"
and check to make sure that only the key(s) you wanted were added.

>>> pg2
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/postgres/.ssh/id_rsa.pub"
The authenticity of host 'pg2 (192.168.17.72)' can't be established.
ECDSA key fingerprint is SHA256:TU9s7eIMfhdN7CEZ8rcEHCCcQsNGJOHQVmwGzMrPBbE.
ECDSA key fingerprint is MD5:71:5d:3d:a6:d4:31:d1:e1:69:6b:2a:14:18:9f:02:be.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
postgres@pg2's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'pg2'"
and check to make sure that only the key(s) you wanted were added.

>>> pg3
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/postgres/.ssh/id_rsa.pub"
The authenticity of host 'pg3 (192.168.17.73)' can't be established.
ECDSA key fingerprint is SHA256:TU9s7eIMfhdN7CEZ8rcEHCCcQsNGJOHQVmwGzMrPBbE.
ECDSA key fingerprint is MD5:71:5d:3d:a6:d4:31:d1:e1:69:6b:2a:14:18:9f:02:be.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
postgres@pg3's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'pg3'"
and check to make sure that only the key(s) you wanted were added.

[postgres@pg3 ~]$

初始化数据库(只需主库)

initdb -D/opt/pgdata -EUTF8 -Upostgres -W

只需主库操作

[postgres@pg1 ~]$ initdb -D/opt/pgdata -EUTF8 -Upostgres -W
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

Enter new superuser password:
Enter it again:

fixing permissions on existing directory /opt/pgdata ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Asia/Shanghai
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /opt/pgdata -l logfile start

[postgres@pg1 ~]$

主库修改参数文件

cat >> /opt/pgdata/postgresql.conf <<EOF

#add by custom
listen_addresses = '*'
port = 5432
max_connections = 2000
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /opt/pgarchive/%f && cp %p /opt/pgarchive/%f'
max_wal_senders = 10
wal_keep_size = 512
hot_standby = on
wal_log_hints = on
shared_preload_libraries = 'repmgr'

EOF

仅主库操作

[postgres@pg1 ~]$ cat >> /opt/pgdata/postgresql.conf <<EOF
>
> #add by custom
> listen_addresses = '*'
> port = 5432
> max_connections = 2000
> wal_level = replica
> archive_mode = on
> archive_command = 'test ! -f /opt/pgarchive/%f && cp %p /opt/pgarchive/%f'
> max_wal_senders = 10
> wal_keep_size = 512
> hot_standby = on
> wal_log_hints = on
> shared_preload_libraries = 'repmgr'
>
> EOF
[postgres@pg1 ~]$

主库修改认证文件

cat >> /opt/pgdata/pg_hba.conf <<EOF

#add by repmgr
local   all             all                                     trust
host    repmgr          repmgr          192.168.17.71/32        trust
host    repmgr          repmgr          192.168.17.72/32        trust
host    repmgr          repmgr          192.168.17.73/32        trust
host    all             all             0/0                     md5
host    all             all             ::/0                    md5

#forbid self-replication its own IP
local   replication     all                                     reject
host    replication     all             127.0.0.0/8             reject
host    replication     all             ::1/128                 reject

#allow any standby connection
host    replication     repuser         0.0.0.0/0               trust

EOF

仅在主库操作

[postgres@pg1 ~]$ cat >> /opt/pgdata/pg_hba.conf <<EOF
>
> #add by repmgr
> local   all             all                                     trust
> host    repmgr          repmgr          192.168.17.71/32        trust
> host    repmgr          repmgr          192.168.17.72/32        trust
> host    repmgr          repmgr          192.168.17.73/32        trust
> host    all             all             0/0                     md5
> host    all             all             ::/0                    md5
>
> #forbid self-replication its own IP
> local   replication     all                                     reject
> host    replication     all             127.0.0.0/8             reject
> host    replication     all             ::1/128                 reject
>
> #allow any standby connection
> host    replication     repuser         0.0.0.0/0               trust
>
> EOF
[postgres@pg1 ~]$

启动主库

pg_ctl -D /opt/pgdata -l logfile start

仅在主库操作

[postgres@pg1 ~]$ pg_ctl -D /opt/pgdata -l logfile start
waiting for server to start.... done
server started
[postgres@pg1 ~]$

配置repmgr集群环境

创建repmgr库和用户、创建replication用户

psql
create user repmgr superuser password 'repmgr';
alter user repmgr set search_path to repmgr, "$user", public;
create database repmgr owner repmgr;
create user repuser replication;
\l

仅在主库操作

[postgres@pg1 ~]$ psql
psql (14.9)
Type "help" for help.

postgres=# create user repmgr superuser password 'repmgr';
CREATE ROLE
postgres=# alter user repmgr set search_path to repmgr, "$user", public;
ALTER ROLE
postgres=# create database repmgr owner repmgr;
CREATE DATABASE
postgres=# create user repuser replication;
CREATE ROLE
postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 repmgr    | repmgr   | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(4 rows)

postgres=#

创建repmgr配置文件

cat >> /opt/pg14/repmgr.conf <<EOF

#add by repmgr
node_id = 1
node_name = 'pg1'
conninfo = 'host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3'
data_directory = '/opt/pgdata'
replication_user = 'repuser'
replication_type = 'physical'
repmgr_bindir = '/opt/pg14/bin'
pg_bindir = '/opt/pg14/bin'


EOF

3台主机都要操作，配置文件根据主机写不同的node_id和IP地址

[postgres@pg1 ~]$ cat >> /opt/pg14/repmgr.conf <<EOF
>
> #add by repmgr
> node_id = 1
> node_name = 'pg1'
> conninfo = 'host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3'
> data_directory = '/opt/pgdata'
> replication_user = 'repuser'
> replication_type = 'physical'
> repmgr_bindir = '/opt/pg14/bin'
> pg_bindir = '/opt/pg14/bin'
>
>
> EOF
[postgres@pg1 ~]$

[postgres@pg2 ~]$ cat >> /opt/pg14/repmgr.conf <<EOF
>
> #add by repmgr
> node_id = 2
> node_name = 'pg2'
> conninfo = 'host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3'
> data_directory = '/opt/pgdata'
> replication_user = 'repuser'
> replication_type = 'physical'
> repmgr_bindir = '/opt/pg14/bin'
> pg_bindir = '/opt/pg14/bin'
>
>
> EOF
[postgres@pg2 ~]$

[postgres@pg3 ~]$ cat >> /opt/pg14/repmgr.conf <<EOF
>
> #add by repmgr
> node_id = 3
> node_name = 'pg3'
> conninfo = 'host=192.168.17.73 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3'
> data_directory = '/opt/pgdata'
> replication_user = 'repuser'
> replication_type = 'physical'
> repmgr_bindir = '/opt/pg14/bin'
> pg_bindir = '/opt/pg14/bin'
>
>
> EOF
[postgres@pg3 ~]$

注册主节点、查看元数据

repmgr -f /opt/pg14/repmgr.conf primary register
psql -d repmgr
SELECT * FROM repmgr.nodes;

仅在主库操作

[postgres@pg1 ~]$ repmgr -f /opt/pg14/repmgr.conf primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
[postgres@pg1 ~]$ psql -d repmgr
psql (14.9)
Type "help" for help.

repmgr=# SELECT * FROM repmgr.nodes;
 node_id | upstream_node_id | active | node_name |  type   | location | priority |                                         conninfo                                         | repluser | slot_name |      config_file
---------+------------------+--------+-----------+---------+----------+----------+------------------------------------------------------------------------------------------+----------+-----------+-----------------------
       1 |                  | t      | pg1       | primary | default  |      100 | host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3 | repuser  |           | /opt/pg14/repmgr.conf
(1 row)

repmgr=#

克隆standby、启动数据库并注册

repmgr -h 192.168.17.71 -U repmgr -p 5432 -d repmgr -f /opt/pg14/repmgr.conf standby clone
pg_ctl -D /opt/pgdata start
repmgr -f /opt/pg14/repmgr.conf standby register

在2台standby从库上分别操作

[postgres@pg2 ~]$ repmgr -h 192.168.17.71 -U repmgr -p 5432 -d repmgr -f /opt/pg14/repmgr.conf standby clone
NOTICE: destination directory "/opt/pgdata" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.17.71 user=repmgr port=5432 dbname=repmgr
DETAIL: current installation size is 33 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/opt/pgdata"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /opt/pg14/bin/pg_basebackup -l "repmgr base backup"  -D /opt/pgdata -h 192.168.17.71 -p 5432 -U repuser -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /opt/pgdata start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
[postgres@pg2 ~]$ pg_ctl -D /opt/pgdata start
waiting for server to start....2023-08-22 22:06:06.735 CST [20664] LOG:  starting PostgreSQL 14.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2023-08-22 22:06:06.736 CST [20664] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-08-22 22:06:06.737 CST [20664] LOG:  listening on IPv6 address "::", port 5432
2023-08-22 22:06:06.737 CST [20664] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-08-22 22:06:06.739 CST [20665] LOG:  database system was interrupted; last known up at 2023-08-22 22:05:54 CST
2023-08-22 22:06:06.743 CST [20665] LOG:  entering standby mode
2023-08-22 22:06:06.744 CST [20665] LOG:  redo starts at 0/2000028
2023-08-22 22:06:06.744 CST [20665] LOG:  consistent recovery state reached at 0/2000100
2023-08-22 22:06:06.744 CST [20664] LOG:  database system is ready to accept read-only connections
2023-08-22 22:06:06.750 CST [20669] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
 done
server started
[postgres@pg2 ~]$ repmgr -f /opt/pg14/repmgr.conf standby register
INFO: connecting to local node "pg2" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "pg2" (ID: 2) successfully registered
[postgres@pg2 ~]$

[postgres@pg3 ~]$ repmgr -h 192.168.17.71 -U repmgr -p 5432 -d repmgr -f /opt/pg14/repmgr.conf standby clone
NOTICE: destination directory "/opt/pgdata" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.17.71 user=repmgr port=5432 dbname=repmgr
DETAIL: current installation size is 33 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/opt/pgdata"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /opt/pg14/bin/pg_basebackup -l "repmgr base backup"  -D /opt/pgdata -h 192.168.17.71 -p 5432 -U repuser -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /opt/pgdata start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
[postgres@pg3 ~]$ pg_ctl -D /opt/pgdata start
waiting for server to start....2023-08-22 22:08:09.997 CST [20715] LOG:  starting PostgreSQL 14.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2023-08-22 22:08:09.998 CST [20715] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-08-22 22:08:09.998 CST [20715] LOG:  listening on IPv6 address "::", port 5432
2023-08-22 22:08:09.999 CST [20715] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-08-22 22:08:10.001 CST [20716] LOG:  database system was interrupted; last known up at 2023-08-22 22:08:02 CST
2023-08-22 22:08:10.004 CST [20716] LOG:  entering standby mode
2023-08-22 22:08:10.005 CST [20716] LOG:  redo starts at 0/4000028
2023-08-22 22:08:10.006 CST [20716] LOG:  consistent recovery state reached at 0/4000100
2023-08-22 22:08:10.007 CST [20715] LOG:  database system is ready to accept read-only connections
2023-08-22 22:08:10.013 CST [20720] LOG:  started streaming WAL from primary at 0/5000000 on timeline 1
 done
server started
[postgres@pg3 ~]$ repmgr -f /opt/pg14/repmgr.conf standby register
INFO: connecting to local node "pg3" (ID: 3)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "pg3" (ID: 3) successfully registered
[postgres@pg3 ~]$

查看集群状态及数据库元数据

repmgr -f /opt/pg14/repmgr.conf cluster show
psql -d repmgr
SELECT * FROM repmgr.nodes;

在任意节点查询都可以，至此集群已经搭建完毕

[postgres@pg1 ~]$ repmgr -f /opt/pg14/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------
 1  | pg1  | primary | * running |          | default  | 100      | 1        | host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 2  | pg2  | standby |   running | pg1      | default  | 100      | 1        | host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 3  | pg3  | standby |   running | pg1      | default  | 100      | 1        | host=192.168.17.73 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
[postgres@pg1 ~]$ psql -d repmgr
psql (14.9)
Type "help" for help.

repmgr=# SELECT * FROM repmgr.nodes;
 node_id | upstream_node_id | active | node_name |  type   | location | priority |                                         conninfo                                         | repluser | slot_name |      config_file
---------+------------------+--------+-----------+---------+----------+----------+------------------------------------------------------------------------------------------+----------+-----------+-----------------------
       1 |                  | t      | pg1       | primary | default  |      100 | host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3 | repuser  |           | /opt/pg14/repmgr.conf
       2 |                1 | t      | pg2       | standby | default  |      100 | host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3 | repuser  |           | /opt/pg14/repmgr.conf
       3 |                1 | t      | pg3       | standby | default  |      100 | host=192.168.17.73 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3 | repuser  |           | /opt/pg14/repmgr.conf
(3 rows)

repmgr=#

切换测试

测试手动提升standby为primary

关闭主库模拟宕机，手动提升 standby(pg2) 节点为 primary 节点，集群其它standby重新指向新primary(pg2)，将宕机原primary(pg1)降级为standby节点。

#主库操作pg1停库模拟宕机
su - postgres
repmgr -f /opt/pg14/repmgr.conf cluster show
psql
create table t (id int);
insert into t values(1);
select * from t;
\q
pg_ctl stop
#从库操作pg2手动提升为 primary节点
repmgr -f /opt/pg14/repmgr.conf cluster show
repmgr -f /opt/pg14/repmgr.conf standby promote
repmgr -f /opt/pg14/repmgr.conf cluster show
psql
insert into t values(2);
\q
#其他从库操作pg3重新指向新primary(pg2)
repmgr -f /opt/pg14/repmgr.conf standby follow
repmgr -f /opt/pg14/repmgr.conf cluster show
#原主库操作pg1降级为standby节点
repmgr -h 192.168.17.72 -U repmgr -p 5432 -d repmgr -f /opt/pg14/repmgr.conf node rejoin --force-rewind
repmgr -f /opt/pg14/repmgr.conf cluster show

具体切换操作如下

[root@pg1 ~]# su - postgres
Last login: Wed Aug 23 09:29:11 CST 2023 on pts/0
[postgres@pg1 ~]$ repmgr -f /opt/pg14/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------
 1  | pg1  | primary | * running |          | default  | 100      | 1        | host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 2  | pg2  | standby |   running | pg1      | default  | 100      | 1        | host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 3  | pg3  | standby |   running | pg1      | default  | 100      | 1        | host=192.168.17.73 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
[postgres@pg1 ~]$ psql
psql (14.9)
Type "help" for help.

postgres=# create table t (id int);
CREATE TABLE
postgres=# insert into t values(1);
INSERT 0 1
postgres=# select * from t;
 id
----
  1
(1 row)

postgres=# \q
[postgres@pg1 ~]$ pg_ctl stop
waiting for server to shut down....2023-08-23 09:55:22.466 CST [1257] LOG:  received fast shutdown request
2023-08-23 09:55:22.466 CST [1257] LOG:  aborting any active transactions
2023-08-23 09:55:22.466 CST [1257] LOG:  background worker "logical replication launcher" (PID 1265) exited with exit code 1
2023-08-23 09:55:22.466 CST [1259] LOG:  shutting down
2023-08-23 09:55:23.292 CST [1355] FATAL:  the database system is shutting down
2023-08-23 09:55:23.452 CST [1257] LOG:  database system is shut down
 done
server stopped
[postgres@pg1 ~]$ 


[root@pg2 ~]# su - postgres
Last login: Wed Aug 23 09:29:55 CST 2023 on pts/0
[postgres@pg2 ~]$ repmgr -f /opt/pg14/repmgr.conf cluster show
 ID | Name | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+---------------+----------+----------+----------+----------+------------------------------------------------------------------------------------------
 1  | pg1  | primary | ? unreachable | ?        | default  | 100      |          | host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 2  | pg2  | standby |   running     | ? pg1    | default  | 100      | 1        | host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 3  | pg3  | standby |   running     | ? pg1    | default  | 100      | 1        | host=192.168.17.73 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3

WARNING: following issues were detected
  - unable to connect to node "pg1" (ID: 1)
  - node "pg1" (ID: 1) is registered as an active primary but is unreachable
  - unable to connect to node "pg2" (ID: 2)'s upstream node "pg1" (ID: 1)
  - unable to determine if node "pg2" (ID: 2) is attached to its upstream node "pg1" (ID: 1)
  - unable to connect to node "pg3" (ID: 3)'s upstream node "pg1" (ID: 1)
  - unable to determine if node "pg3" (ID: 3) is attached to its upstream node "pg1" (ID: 1)

HINT: execute with --verbose option to see connection error messages
[postgres@pg2 ~]$ repmgr -f /opt/pg14/repmgr.conf standby promote
WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
  pg3 (node ID: 3)
NOTICE: promoting standby to primary
DETAIL: promoting server "pg2" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "pg2" (ID: 2) was successfully promoted to primary
[postgres@pg2 ~]$ repmgr -f /opt/pg14/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------
 1  | pg1  | primary | - failed  | ?        | default  | 100      |          | host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 2  | pg2  | primary | * running |          | default  | 100      | 2        | host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 3  | pg3  | standby |   running | ? pg1    | default  | 100      | 1        | host=192.168.17.73 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3

WARNING: following issues were detected
  - unable to connect to node "pg1" (ID: 1)
  - unable to connect to node "pg3" (ID: 3)'s upstream node "pg1" (ID: 1)
  - unable to determine if node "pg3" (ID: 3) is attached to its upstream node "pg1" (ID: 1)

HINT: execute with --verbose option to see connection error messages
[postgres@pg2 ~]$ psql
psql (14.9)
Type "help" for help.

postgres=# insert into t values(2);
INSERT 0 1
postgres=# \q
[postgres@pg2 ~]$ 

[root@pg3 ~]# su - postgres
Last login: Wed Aug 23 09:30:04 CST 2023 on pts/0
[postgres@pg3 ~]$ repmgr -f /opt/pg14/repmgr.conf standby follow
NOTICE: attempting to find and follow current primary
INFO: local node 3 can attach to follow target node 2
DETAIL: local node's recovery point: 0/70000A0; follow target node's fork point: 0/70000A0
NOTICE: setting node 3's upstream to node 2
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "pg2" (ID: 2)
[postgres@pg3 ~]$ repmgr -f /opt/pg14/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------
 1  | pg1  | primary | - failed  | ?        | default  | 100      |          | host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 2  | pg2  | primary | * running |          | default  | 100      | 2        | host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 3  | pg3  | standby |   running | pg2      | default  | 100      | 2        | host=192.168.17.73 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3

WARNING: following issues were detected
  - unable to connect to node "pg1" (ID: 1)

HINT: execute with --verbose option to see connection error messages
[postgres@pg3 ~]$


[postgres@pg1 ~]$ repmgr -h 192.168.17.72 -U repmgr -p 5432 -d repmgr -f /opt/pg14/repmgr.conf node rejoin --force-rewind
NOTICE: rejoin target is node "pg2" (ID: 2)
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/opt/pg14/bin/pg_rewind -D '/opt/pgdata' --source-server='host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3'"
NOTICE: 0 files copied to /opt/pgdata
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/opt/pg14/bin/pg_ctl  -w -D '/opt/pgdata' start"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
[postgres@pg1 ~]$ psql
psql (14.9)
Type "help" for help.

postgres=# select * from t;
 id
----
  1
  2
(2 rows)

postgres=# \q
[postgres@pg1 ~]$ repmgr -f /opt/pg14/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------
 1  | pg1  | standby |   running | pg2      | default  | 100      | 2        | host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 2  | pg2  | primary | * running |          | default  | 100      | 2        | host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 3  | pg3  | standby |   running | pg2      | default  | 100      | 2        | host=192.168.17.73 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
[postgres@pg1 ~]$

测试手动将备用服务器提升为主服务器

在standby从库(pg1)上执行switchover命令提升为主库，切换后手动将其他 standby 节点(pg2,pg3)指向新的 primary 节点(pg1)

#pg1上操作
repmgr -f /opt/pg14/repmgr.conf cluster show
repmgr -f /opt/pg14/repmgr.conf standby switchover
#pg2上操作
repmgr -f /opt/pg14/repmgr.conf standby follow
#pg3上操作
repmgr -f /opt/pg14/repmgr.conf standby follow
repmgr -f /opt/pg14/repmgr.conf cluster show

具体操作如下

[postgres@pg1 ~]$ repmgr -f /opt/pg14/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------
 1  | pg1  | standby |   running | pg2      | default  | 100      | 2        | host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 2  | pg2  | primary | * running |          | default  | 100      | 2        | host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 3  | pg3  | standby |   running | pg2      | default  | 100      | 2        | host=192.168.17.73 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
[postgres@pg1 ~]$ repmgr -f /opt/pg14/repmgr.conf standby switchover
NOTICE: executing switchover on node "pg1" (ID: 1)
WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
  pg3 (node ID: 3)
NOTICE: attempting to pause repmgrd on 3 nodes
NOTICE: local node "pg1" (ID: 1) will be promoted to primary; current primary "pg2" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "pg2" (ID: 2)
NOTICE: issuing CHECKPOINT on node "pg2" (ID: 2)
DETAIL: executing server command "/opt/pg14/bin/pg_ctl  -D '/opt/pgdata' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/8000028
NOTICE: promoting standby to primary
DETAIL: promoting server "pg1" (ID: 1) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "pg1" (ID: 1) was successfully promoted to primary
NOTICE: node "pg1" (ID: 1) promoted to primary, node "pg2" (ID: 2) demoted to standby
NOTICE: switchover was successful
DETAIL: node "pg1" is now primary and node "pg2" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
[postgres@pg1 ~]$ 

[postgres@pg2 ~]$ repmgr -f /opt/pg14/repmgr.conf standby follow
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 0/8000EB8, follow target lsn is 0/8000EB8
NOTICE: setting node 2's upstream to node 1
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "pg1" (ID: 1)
[postgres@pg2 ~]$

[postgres@pg3 ~]$ repmgr -f /opt/pg14/repmgr.conf standby follow
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 0/8001018, follow target lsn is 0/8001018
NOTICE: setting node 3's upstream to node 1
WARNING: node "pg3" attached in state "startup"
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "pg1" (ID: 1)
[postgres@pg3 ~]$ repmgr -f /opt/pg14/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------
 1  | pg1  | primary | * running |          | default  | 100      | 3        | host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 2  | pg2  | standby |   running | pg1      | default  | 100      | 3        | host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 3  | pg3  | standby |   running | pg1      | default  | 100      | 3        | host=192.168.17.73 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
[postgres@pg3 ~]$

自动切换设置

修改3台repmgr.conf，启动后台守护进程repmgrd，查看repmgr和repmgrd进程

cat >> /opt/pg14/repmgr.conf <<EOF


#add by autofailover

failover='automatic'
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10 
promote_command='/opt/pg14/bin/repmgr standby promote -f /opt/pg14/repmgr.conf --log-to-file'
follow_command='/opt/pg14/bin/repmgr standby follow -f /opt/pg14/repmgr.conf --log-to-file --upstream-node-id=%n'


EOF
repmgrd -f /opt/pg14/repmgr.conf
ps f -u postgres

repmgr -f /opt/pg14/repmgr.conf cluster event --event=repmgrd_start

具体操作如下

[postgres@pg1 ~]$ cat >> /opt/pg14/repmgr.conf <<EOF
>
>
> #add by autofailover
>
> failover='automatic'
> connection_check_type=ping
> reconnect_attempts=6
> reconnect_interval=10
> promote_command='/opt/pg14/bin/repmgr standby promote -f /opt/pg14/repmgr.conf --log-to-file'
> follow_command='/opt/pg14/bin/repmgr standby follow -f /opt/pg14/repmgr.conf --log-to-file --upstream-node-id=%n'
>
>
> EOF
[postgres@pg1 ~]$ repmgrd -f /opt/pg14/repmgr.conf
[2023-08-23 11:18:31] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
[2023-08-23 11:18:31] [INFO] connecting to database "host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3"
[postgres@pg1 ~]$ INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2023-08-23 11:18:31] [NOTICE] starting monitoring of node "pg1" (ID: 1)
[2023-08-23 11:18:31] [INFO] "connection_check_type" set to "ping"
[2023-08-23 11:18:31] [NOTICE] monitoring cluster primary "pg1" (ID: 1)
[2023-08-23 11:18:31] [INFO] child node "pg2" (ID: 2) is attached
[2023-08-23 11:18:31] [INFO] child node "pg3" (ID: 3) is attached
[postgres@pg1 ~]$ ps f -u postgres
   PID TTY      STAT   TIME COMMAND
  9108 pts/1    S      0:00 -bash
  9152 pts/1    R+     0:00  \_ ps f -u postgres
  1278 pts/0    S+     0:00 -bash
  9130 ?        S      0:00 repmgrd -f /opt/pg14/repmgr.conf
  1378 ?        Ss     0:00 /opt/pg14/bin/postgres -D /opt/pgdata
  1380 ?        Ss     0:00  \_ postgres: checkpointer
  1381 ?        Ss     0:00  \_ postgres: background writer
  1382 ?        Ss     0:00  \_ postgres: stats collector
  1446 ?        Ss     0:00  \_ postgres: walwriter
  1447 ?        Ss     0:00  \_ postgres: autovacuum launcher
  1448 ?        Ss     0:00  \_ postgres: archiver last was 000000020000000000000008.partial
  1449 ?        Ss     0:00  \_ postgres: logical replication launcher
  1459 ?        Ss     0:00  \_ postgres: walsender repuser 192.168.17.72(38920) streaming 0/800B8D8
  1472 ?        Ss     0:00  \_ postgres: walsender repuser 192.168.17.73(34826) streaming 0/800B8D8
  9128 ?        Ss     0:00  \_ postgres: repmgr repmgr 192.168.17.71(49700) idle
[postgres@pg1 ~]$ 

[postgres@pg2 ~]$ cat >> /opt/pg14/repmgr.conf <<EOF
>
>
> #add by autofailover
>
> failover='automatic'
> connection_check_type=ping
> reconnect_attempts=6
> reconnect_interval=10
> promote_command='/opt/pg14/bin/repmgr standby promote -f /opt/pg14/repmgr.conf --log-to-file'
> follow_command='/opt/pg14/bin/repmgr standby follow -f /opt/pg14/repmgr.conf --log-to-file --upstream-node-id=%n'
>
>
> EOF
[postgres@pg2 ~]$ repmgrd -f /opt/pg14/repmgr.conf
[2023-08-23 11:21:00] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
[2023-08-23 11:21:00] [INFO] connecting to database "host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3"
[postgres@pg2 ~]$ INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2023-08-23 11:21:00] [NOTICE] starting monitoring of node "pg2" (ID: 2)
[2023-08-23 11:21:00] [INFO] "connection_check_type" set to "ping"
[2023-08-23 11:21:00] [INFO] monitoring connection to upstream node "pg1" (ID: 1)

[postgres@pg2 ~]$ ps f -u postgres
   PID TTY      STAT   TIME COMMAND
  1303 pts/1    S      0:00 -bash
  9135 pts/1    R+     0:00  \_ ps f -u postgres
  1233 pts/0    S+     0:00 -bash
  9121 ?        S      0:00 repmgrd -f /opt/pg14/repmgr.conf
  1588 ?        Ss     0:00 /opt/pg14/bin/postgres -D /opt/pgdata
  1589 ?        Ss     0:00  \_ postgres: startup recovering 000000030000000000000008
  1590 ?        Ss     0:00  \_ postgres: checkpointer
  1591 ?        Ss     0:00  \_ postgres: background writer
  1592 ?        Ss     0:00  \_ postgres: stats collector
  1593 ?        Ss     0:01  \_ postgres: walreceiver streaming 0/800BAB0
  9119 ?        Ss     0:00  \_ postgres: repmgr repmgr 192.168.17.72(44720) idle
[postgres@pg2 ~]$ 

[postgres@pg3 ~]$ cat >> /opt/pg14/repmgr.conf <<EOF
>
>
> #add by autofailover
>
> failover='automatic'
> connection_check_type=ping
> reconnect_attempts=6
> reconnect_interval=10
> promote_command='/opt/pg14/bin/repmgr standby promote -f /opt/pg14/repmgr.conf --log-to-file'
> follow_command='/opt/pg14/bin/repmgr standby follow -f /opt/pg14/repmgr.conf --log-to-file --upstream-node-id=%n'
>
>
> EOF
[postgres@pg3 ~]$ repmgrd -f /opt/pg14/repmgr.conf
[2023-08-23 11:21:37] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
[2023-08-23 11:21:37] [INFO] connecting to database "host=192.168.17.73 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3"
[postgres@pg3 ~]$ INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2023-08-23 11:21:37] [NOTICE] starting monitoring of node "pg3" (ID: 3)
[2023-08-23 11:21:37] [INFO] "connection_check_type" set to "ping"
[2023-08-23 11:21:37] [INFO] monitoring connection to upstream node "pg1" (ID: 1)
[postgres@pg3 ~]$ ps f -u postgres
   PID TTY      STAT   TIME COMMAND
  1348 pts/1    S      0:00 -bash
 12456 pts/1    R+     0:00  \_ ps f -u postgres
  1233 pts/0    S+     0:00 -bash
  8951 ?        S      0:01 repmgrd -f /opt/pg14/repmgr.conf
  1254 ?        Ss     0:01 /opt/pg14/bin/postgres
  1255 ?        Ss     0:00  \_ postgres: startup recovering 000000030000000000000008
  1256 ?        Ss     0:00  \_ postgres: checkpointer
  1257 ?        Ss     0:00  \_ postgres: background writer
  1258 ?        Ss     0:00  \_ postgres: stats collector
  8911 ?        Ss     0:03  \_ postgres: walreceiver streaming 0/800C370
  8949 ?        Ss     0:01  \_ postgres: repmgr repmgr 192.168.17.73(39552) idle
[postgres@pg3 ~]$




[postgres@pg1 ~]$ repmgr -f /opt/pg14/repmgr.conf cluster event --event=repmgrd_start
 Node ID | Name | Event         | OK | Timestamp           | Details
---------+------+---------------+----+---------------------+------------------------------------------------------
 3       | pg3  | repmgrd_start | t  | 2023-08-23 11:21:37 | monitoring connection to upstream node "pg1" (ID: 1)
 2       | pg2  | repmgrd_start | t  | 2023-08-23 11:21:00 | monitoring connection to upstream node "pg1" (ID: 1)
 1       | pg1  | repmgrd_start | t  | 2023-08-23 11:18:31 | monitoring cluster primary "pg1" (ID: 1)

[postgres@pg1 ~]$

自动failover

停止pg1数据库，过1分钟后主库自动切换到pg2

#pg1操作停止数据库
pg_ctl stop
#pg2上观察日志
#pg3查看集群状态
repmgr -f /opt/pg14/repmgr.conf cluster show
#pg1从新加入集群变成standby
repmgr -h 192.168.17.72 -U repmgr -p 5432 -d repmgr -f /opt/pg14/repmgr.conf node rejoin --force-rewind

具体操作如下

[postgres@pg1 ~]$ pg_ctl stop
waiting for server to shut down....2023-08-23 13:49:06.894 CST [1239] LOG:  received fast shutdown request
2023-08-23 13:49:06.895 CST [1239] LOG:  aborting any active transactions
2023-08-23 13:49:06.895 CST [1264] FATAL:  terminating connection due to administrator command
2023-08-23 13:49:06.895 CST [1285] FATAL:  terminating connection due to administrator command
2023-08-23 13:49:06.895 CST [1278] FATAL:  terminating connection due to administrator command
2023-08-23 13:49:06.895 CST [1239] LOG:  background worker "logical replication launcher" (PID 1247) exited with exit code 1
2023-08-23 13:49:06.896 CST [1241] LOG:  shutting down
2023-08-23 13:49:07.021 CST [1354] FATAL:  the database system is shutting down
2023-08-23 13:49:07.027 CST [1239] LOG:  database system is shut down
 done
server stopped
[postgres@pg1 ~]$ 

[postgres@pg2 ~]$ 2023-08-23 13:49:06.041 CST [1219] LOG:  replication terminated by primary server
2023-08-23 13:49:06.041 CST [1219] DETAIL:  End of WAL reached on timeline 3 at 0/D0000A0.
2023-08-23 13:49:06.041 CST [1219] FATAL:  could not send end-of-streaming message to primary: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.
        no COPY in progress
2023-08-23 13:49:06.041 CST [1215] LOG:  invalid record length at 0/D0000A0: wanted 24, got 0
2023-08-23 13:49:06.044 CST [1252] FATAL:  could not connect to the primary server: connection to server at "192.168.17.71", port 5432 failed: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.
[2023-08-23 13:49:07] [WARNING] unable to ping "host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3"
[2023-08-23 13:49:07] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2023-08-23 13:49:07] [WARNING] unable to connect to upstream node "pg1" (ID: 1)
[2023-08-23 13:49:07] [INFO] checking state of node "pg1" (ID: 1), 1 of 6 attempts
[2023-08-23 13:49:07] [WARNING] unable to ping "user=repmgr password=repmgr connect_timeout=3 dbname=repmgr host=192.168.17.71 port=5432 fallback_application_name=repmgr"
[2023-08-23 13:49:07] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2023-08-23 13:49:07] [INFO] sleeping up to 10 seconds until next reconnection attempt
2023-08-23 13:49:11.047 CST [1253] FATAL:  could not connect to the primary server: connection to server at "192.168.17.71", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-08-23 13:49:16.051 CST [1254] FATAL:  could not connect to the primary server: connection to server at "192.168.17.71", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
[2023-08-23 13:49:17] [INFO] checking state of node "pg1" (ID: 1), 2 of 6 attempts
[2023-08-23 13:49:17] [WARNING] unable to ping "user=repmgr password=repmgr connect_timeout=3 dbname=repmgr host=192.168.17.71 port=5432 fallback_application_name=repmgr"
[2023-08-23 13:49:17] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2023-08-23 13:49:17] [INFO] sleeping up to 10 seconds until next reconnection attempt
2023-08-23 13:49:21.054 CST [1255] FATAL:  could not connect to the primary server: connection to server at "192.168.17.71", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-08-23 13:49:26.058 CST [1256] FATAL:  could not connect to the primary server: connection to server at "192.168.17.71", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
[2023-08-23 13:49:27] [INFO] checking state of node "pg1" (ID: 1), 3 of 6 attempts
[2023-08-23 13:49:27] [WARNING] unable to ping "user=repmgr password=repmgr connect_timeout=3 dbname=repmgr host=192.168.17.71 port=5432 fallback_application_name=repmgr"
[2023-08-23 13:49:27] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2023-08-23 13:49:27] [INFO] sleeping up to 10 seconds until next reconnection attempt
2023-08-23 13:49:31.062 CST [1257] FATAL:  could not connect to the primary server: connection to server at "192.168.17.71", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-08-23 13:49:36.067 CST [1258] FATAL:  could not connect to the primary server: connection to server at "192.168.17.71", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
[2023-08-23 13:49:37] [INFO] checking state of node "pg1" (ID: 1), 4 of 6 attempts
[2023-08-23 13:49:37] [WARNING] unable to ping "user=repmgr password=repmgr connect_timeout=3 dbname=repmgr host=192.168.17.71 port=5432 fallback_application_name=repmgr"
[2023-08-23 13:49:37] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2023-08-23 13:49:37] [INFO] sleeping up to 10 seconds until next reconnection attempt
2023-08-23 13:49:41.071 CST [1259] FATAL:  could not connect to the primary server: connection to server at "192.168.17.71", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-08-23 13:49:46.076 CST [1260] FATAL:  could not connect to the primary server: connection to server at "192.168.17.71", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
[2023-08-23 13:49:47] [INFO] checking state of node "pg1" (ID: 1), 5 of 6 attempts
[2023-08-23 13:49:47] [WARNING] unable to ping "user=repmgr password=repmgr connect_timeout=3 dbname=repmgr host=192.168.17.71 port=5432 fallback_application_name=repmgr"
[2023-08-23 13:49:47] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2023-08-23 13:49:47] [INFO] sleeping up to 10 seconds until next reconnection attempt
2023-08-23 13:49:51.080 CST [1261] FATAL:  could not connect to the primary server: connection to server at "192.168.17.71", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-08-23 13:49:56.084 CST [1262] FATAL:  could not connect to the primary server: connection to server at "192.168.17.71", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
[2023-08-23 13:49:57] [INFO] checking state of node "pg1" (ID: 1), 6 of 6 attempts
[2023-08-23 13:49:57] [WARNING] unable to ping "user=repmgr password=repmgr connect_timeout=3 dbname=repmgr host=192.168.17.71 port=5432 fallback_application_name=repmgr"
[2023-08-23 13:49:57] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2023-08-23 13:49:57] [WARNING] unable to reconnect to node "pg1" (ID: 1) after 6 attempts
[2023-08-23 13:49:57] [INFO] 1 active sibling nodes registered
[2023-08-23 13:49:57] [INFO] 3 total nodes registered
[2023-08-23 13:49:57] [INFO] primary node  "pg1" (ID: 1) and this node have the same location ("default")
[2023-08-23 13:49:57] [INFO] local node's last receive lsn: 0/D0000A0
[2023-08-23 13:49:57] [INFO] checking state of sibling node "pg3" (ID: 3)
[2023-08-23 13:49:57] [INFO] node "pg3" (ID: 3) reports its upstream is node 1, last seen 52 second(s) ago
[2023-08-23 13:49:57] [INFO] standby node "pg3" (ID: 3) last saw primary node 52 second(s) ago
[2023-08-23 13:49:57] [INFO] last receive LSN for sibling node "pg3" (ID: 3) is: 0/D0000A0
[2023-08-23 13:49:57] [INFO] node "pg3" (ID: 3) has same LSN as current candidate "pg2" (ID: 2)
[2023-08-23 13:49:57] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2023-08-23 13:49:57] [NOTICE] promotion candidate is "pg2" (ID: 2)
[2023-08-23 13:49:57] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2023-08-23 13:49:57] [INFO] promote_command is:
  "/opt/pg14/bin/repmgr standby promote -f /opt/pg14/repmgr.conf --log-to-file"
[2023-08-23 13:49:57] [WARNING] 1 sibling nodes found, but option "--siblings-follow" not specified
[2023-08-23 13:49:57] [DETAIL] these nodes will remain attached to the current primary:
  pg3 (node ID: 3)
[2023-08-23 13:49:57] [NOTICE] promoting standby to primary
[2023-08-23 13:49:57] [DETAIL] promoting server "pg2" (ID: 2) using pg_promote()
2023-08-23 13:49:57.499 CST [1215] LOG:  received promote request
2023-08-23 13:49:57.499 CST [1215] LOG:  redo done at 0/D000028 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 315.46 s
2023-08-23 13:49:57.499 CST [1215] LOG:  last completed transaction was at log time 2023-08-23 13:48:25.608586+08
[2023-08-23 13:49:57] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
2023-08-23 13:49:57.500 CST [1215] LOG:  selected new timeline ID: 4
2023-08-23 13:49:57.517 CST [1215] LOG:  archive recovery complete
2023-08-23 13:49:57.518 CST [1214] LOG:  database system is ready to accept connections
[2023-08-23 13:49:58] [NOTICE] STANDBY PROMOTE successful
[2023-08-23 13:49:58] [DETAIL] server "pg2" (ID: 2) was successfully promoted to primary
[2023-08-23 13:49:58] [INFO] checking state of node 2, 1 of 6 attempts
[2023-08-23 13:49:58] [NOTICE] node 2 has recovered, reconnecting
[2023-08-23 13:49:58] [INFO] connection to node 2 succeeded
[2023-08-23 13:49:58] [INFO] original connection is still available
[2023-08-23 13:49:58] [INFO] 1 followers to notify
[2023-08-23 13:49:58] [NOTICE] notifying node "pg3" (ID: 3) to follow node 2
INFO:  node 3 received notification to follow node 2
[2023-08-23 13:49:58] [INFO] switching to primary monitoring mode
[2023-08-23 13:49:58] [NOTICE] monitoring cluster primary "pg2" (ID: 2)
[2023-08-23 13:50:04] [NOTICE] new standby "pg3" (ID: 3) has connected
[2023-08-23 13:52:22] [NOTICE] new standby "pg1" (ID: 1) has connected
[2023-08-23 13:54:58] [INFO] monitoring primary node "pg2" (ID: 2) in normal state
[2023-08-23 13:59:59] [INFO] monitoring primary node "pg2" (ID: 2) in normal state

[postgres@pg2 ~]$


[postgres@pg3 ~]$ repmgr -f /opt/pg14/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------
 1  | pg1  | primary | - failed  | ?        | default  | 100      |          | host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 2  | pg2  | primary | * running |          | default  | 100      | 4        | host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 3  | pg3  | standby |   running | pg2      | default  | 100      | 3        | host=192.168.17.73 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3

WARNING: following issues were detected
  - unable to connect to node "pg1" (ID: 1)

HINT: execute with --verbose option to see connection error messages
[postgres@pg3 ~]$


[postgres@pg1 ~]$ repmgr -h 192.168.17.72 -U repmgr -p 5432 -d repmgr -f /opt/pg14/repmgr.conf node rejoin --force-rewind
NOTICE: rejoin target is node "pg2" (ID: 2)
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 4 forked off current database system timeline 3 before current recovery point 0/E000028
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/opt/pg14/bin/pg_rewind -D '/opt/pgdata' --source-server='host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3'"
NOTICE: 0 files copied to /opt/pgdata
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/opt/pg14/bin/pg_ctl  -w -D '/opt/pgdata' start"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
[postgres@pg1 ~]$
[postgres@pg1 ~]$ repmgr -f /opt/pg14/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------
 1  | pg1  | standby |   running | pg2      | default  | 100      | 4        | host=192.168.17.71 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 2  | pg2  | primary | * running |          | default  | 100      | 4        | host=192.168.17.72 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
 3  | pg3  | standby |   running | pg2      | default  | 100      | 4        | host=192.168.17.73 port=5432 user=repmgr password=repmgr dbname=repmgr connect_timeout=3
[postgres@pg1 ~]$

赞(0)
踩(0)

免责声明：

1、本站资源由自动抓取工具收集整理于网络。

2、本站不承担由于内容的合法性及真实性所引起的一切争议和法律责任。

3、电子书、小说等仅供网友预览使用，书籍版权归作者或出版社所有。

4、如作者、出版社认为资源涉及侵权，请联系本站，本站将在收到通知书后尽快删除您认为侵权的作品。

5、如果您喜欢本资源，请您支持作者，购买正版内容。

6、资源失效，请下方留言，欢迎分享资源链接

文章评论

共0条评论