PostgreSQL 12如何配置流复制主备环境以及执行主备切换注意事项
Contents
一 不再需要配置recovery.conf文件
从PostgreSQL 12开始,在执行通过流复制来配置主备数据库的时候,不再需要配置额外配置recovery.conf文件了。取而代之的是在备库环境的$PGDATA路径下配置一个standby.signal文件,注意该文件是1个普通的文本文件,内容为空。理解起来就是,该文件是一个标识文件。如果备库通过执行pg_ctl promote提升为主库的话,那么该文件将自动消失。
二 备库执行基础备份时新的命令行选项-R
PostgreSQL 9.4.10版本:
[postgres@ppasdev ~]$ pg_basebackup -V
pg_basebackup (PostgreSQL) 9.4.10
[postgres@ppasdev ~]$ pg_basebackup --help
pg_basebackup takes a base backup of a running PostgreSQL server.
Usage:
pg_basebackup [OPTION]...
Options controlling the output:
-D, --pgdata=DIRECTORY receive base backup into directory
-F, --format=p|t output format (plain (default), tar)
-r, --max-rate=RATE maximum transfer rate to transfer data directory
(in kB/s, or use suffix "k" or "M")
-R, --write-recovery-conf
write recovery.conf after backup
-T, --tablespace-mapping=OLDDIR=NEWDIR
relocate tablespace in OLDDIR to NEWDIR
-x, --xlog include required WAL files in backup (fetch mode)
-X, --xlog-method=fetch|stream
include required WAL files with specified method
PostgreSQL 12.8版本:
[postgres@ecs-db2 ~]$ pg_basebackup -V
pg_basebackup (PostgreSQL) 12.8
[postgres@ecs-db2 ~]$ pg_basebackup --help
pg_basebackup takes a base backup of a running PostgreSQL server.
Usage:
pg_basebackup [OPTION]...
Options controlling the output:
-D, --pgdata=DIRECTORY receive base backup into directory
-F, --format=p|t output format (plain (default), tar)
-r, --max-rate=RATE maximum transfer rate to transfer data directory
(in kB/s, or use suffix "k" or "M")
-R, --write-recovery-conf
write configuration for replication
-T, --tablespace-mapping=OLDDIR=NEWDIR
relocate tablespace in OLDDIR to NEWDIR
--waldir=WALDIR location for the write-ahead log directory
-X, --wal-method=none|fetch|stream
include required WAL files with specified method
-z, --gzip compress tar output
-Z, --compress=0-9 compress tar output with given compression level
从对比中,可以看到在9.4.10版本中,-R选项用于创建recovery.conf文件,而在12.8版本中,-R则是用于创建用于replication的配置文件,其实就是生成$PGDATA/standby.signal文件。
三 如何生成standby.signal文件
两种方式,一种是在备库执行基础备份的时候,加上-R选项用于自动创建$PGDATA/standby.signal文件:
[postgres@ecs-db2 pg12.8]$ pwd
/postgres/pg12.8
[postgres@ecs-db2 pg12.8]$ ll
total 20
drwxrwxr-x 2 postgres postgres 4096 Oct 19 15:01 bin
drwxrwxr-x 6 postgres postgres 4096 Oct 19 15:01 include
drwxrwxr-x 4 postgres postgres 4096 Oct 19 15:01 lib
drwxrwxr-x 8 postgres postgres 4096 Oct 19 15:01 share
[postgres@ecs-db2 pg12.8]$ pg_basebackup -h 172.22.3.4 -p 5432 -U replica --password -X stream -Fp --progress -D $PGDATA -R
Password:
24597/24597 kB (100%), 1/1 tablespace
[postgres@ecs-db2 pg12.8]$ ll data
total 120
-rw------- 1 postgres postgres 224 Oct 20 15:11 backup_label
drwx------ 5 postgres postgres 4096 Oct 20 15:11 base
drwx------ 2 postgres postgres 4096 Oct 20 15:11 global
drwx------ 2 postgres postgres 4096 Oct 20 15:11 pg_commit_ts
drwx------ 2 postgres postgres 4096 Oct 20 15:11 pg_dynshmem
-rw------- 1 postgres postgres 4867 Oct 20 15:11 pg_hba.conf
-rw------- 1 postgres postgres 1636 Oct 20 15:11 pg_ident.conf
drwx------ 4 postgres postgres 4096 Oct 20 15:11 pg_logical
drwx------ 4 postgres postgres 4096 Oct 20 15:11 pg_multixact
drwx------ 2 postgres postgres 4096 Oct 20 15:11 pg_notify
drwx------ 2 postgres postgres 4096 Oct 20 15:11 pg_replslot
drwx------ 2 postgres postgres 4096 Oct 20 15:11 pg_serial
drwx------ 2 postgres postgres 4096 Oct 20 15:11 pg_snapshots
drwx------ 2 postgres postgres 4096 Oct 20 15:11 pg_stat
drwx------ 2 postgres postgres 4096 Oct 20 15:11 pg_stat_tmp
drwx------ 2 postgres postgres 4096 Oct 20 15:11 pg_subtrans
drwx------ 2 postgres postgres 4096 Oct 20 15:11 pg_tblspc
drwx------ 2 postgres postgres 4096 Oct 20 15:11 pg_twophase
-rw------- 1 postgres postgres 3 Oct 20 15:11 PG_VERSION
drwx------ 3 postgres postgres 4096 Oct 20 15:11 pg_wal
drwx------ 2 postgres postgres 4096 Oct 20 15:11 pg_xact
-rw------- 1 postgres postgres 337 Oct 20 15:11 postgresql.auto.conf
-rw------- 1 postgres postgres 26718 Oct 20 15:11 postgresql.conf
-rw------- 1 postgres postgres 0 Oct 20 15:11 standby.signal
[postgres@ecs-db2 pg12.8]$
方式2:如果在备库上执行pg_basebackup对主库进行备份的时候,没有使用-R选项的话,我们可以在备库的$PGDATA路径下,touch standby.signal就好了。
记住:该文件只是一个标识文件,它的存在就是告诉数据库,当我们执行pg_ctl start启动的时候,当前库的角色是standby,不是primary角色。
四执行stream主备配置流程
主体思路跟PostgreSQL 11及以前版本的配置流程大同小异,甚至是更简单一些了。
1 主库创建流复制的用户
postgres=# CREATE ROLE replica login replication encrypted password 'replica';
CREATE ROLE
postgres=#
2 主库修改pg_hba.conf文件,允许备库IP通过复制用户访问数据库
[postgres@ecs-db1 data]$ tail pg_hba.conf
host all all 127.0.0.1/32 trust
host all all 0.0.0.0/0 md5
# IPv6 local connections:
host all all ::1/128 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all trust
host replication all 127.0.0.1/32 trust
host replication all ::1/128 trust
host replication replica 172.22.3.5/32 md5
[postgres@ecs-db1 data]$
最后一行,添加了replica用户可以从备库IP 172.22.3.5访问主库。
3 备库上执行对于主库的基础备份
[postgres@ecs-db2 pg12.8]$ pg_basebackup -h 172.22.3.4 -p 5432 -U replica --password -X stream -Fp --progress -D $PGDATA -R
Password:
24598/24598 kB (100%), 1/1 tablespace
[postgres@ecs-db2 pg12.8]$
注意,备份选项上带有-R选项。
4 备库就可以执行pg_ctl start启动了
这时,就可以看到备库服务器上自动生成了standby.signal文件。同时,也看到在$PGDATA路径下,数据库自动帮我们配置了关于流复制的主库的信息:
[postgres@ecs-db2 pg12.8]$ cat data/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=replica password=replica host=172.22.3.4 port=5432 sslmode=disable sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any'
[postgres@ecs-db2 pg12.8]$
当然了,如果我们没有使用-R来备份主库的话。我们完全可以在备库上手工创建standby.signal文件,然后手工编辑postgresql.auto.conf,并在其内容中配置主库的信息。
5 备库数据库进程信息
[postgres@ecs-db2 ~]$ ps -ef|grep postgres
postgres 8911 1 0 Oct20 ? 00:00:00 /postgres/pg12.8/bin/postgres
postgres 8912 8911 0 Oct20 ? 00:00:00 postgres: startup recovering 000000010000000000000003
postgres 8913 8911 0 Oct20 ? 00:00:00 postgres: checkpointer
postgres 8914 8911 0 Oct20 ? 00:00:00 postgres: background writer
postgres 8915 8911 0 Oct20 ? 00:00:00 postgres: stats collector
postgres 8916 8911 0 Oct20 ? 00:00:32 postgres: walreceiver streaming 0/3000148
root 18015 18014 0 16:13 pts/0 00:00:00 su - postgres
postgres 18016 18015 0 16:13 pts/0 00:00:00 -bash
postgres 18055 18016 0 16:13 pts/0 00:00:00 ps -ef
postgres 18056 18016 0 16:13 pts/0 00:00:00 grep --color=auto postgres
[postgres@ecs-db2 ~]$
备库上,可以看到walreceiver进程,正在读取日志streaming 0/3000148,执行恢复recovering 000000010000000000000003。
6 主库数据库进程信息
[postgres@ecs-db1 ~]$ ps -ef|grep postgres
postgres 3708 1 0 Oct20 ? 00:00:00 /postgres/pg12.8/bin/postgres
postgres 3710 3708 0 Oct20 ? 00:00:00 postgres: checkpointer
postgres 3711 3708 0 Oct20 ? 00:00:00 postgres: background writer
postgres 3712 3708 0 Oct20 ? 00:00:00 postgres: walwriter
postgres 3713 3708 0 Oct20 ? 00:00:00 postgres: autovacuum launcher
postgres 3714 3708 0 Oct20 ? 00:00:00 postgres: stats collector
postgres 3715 3708 0 Oct20 ? 00:00:00 postgres: logical replication launcher
postgres 4174 3708 0 Oct20 ? 00:00:00 postgres: walsender replica 172.22.3.5(57926) streaming 0/3000148
root 30410 30409 0 16:11 pts/1 00:00:00 su - postgres
postgres 30411 30410 0 16:11 pts/1 00:00:00 -bash
postgres 30868 30411 0 16:13 pts/1 00:00:00 ps -ef
postgres 30869 30411 0 16:13 pts/1 00:00:00 grep --color=auto postgres
[postgres@ecs-db1 ~]$
主库上看到,后台进程walsender,正在向replica 172.22.3.5(57926) streaming 0/3000148推送日志信息。
7 主库查看数据库复制信息
[postgres@ecs-db1 ~]$ psql -xc "select * from pg_stat_replication"
-[ RECORD 1 ]----+------------------------------
pid | 4174
usesysid | 16384
usename | replica
application_name | walreceiver
client_addr | 172.22.3.5
client_hostname |
client_port | 57926
backend_start | 2021-10-20 15:45:37.817863+08
backend_xmin |
state | streaming
sent_lsn | 0/3000148
write_lsn | 0/3000148
flush_lsn | 0/3000148
replay_lsn | 0/3000148
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2021-10-21 16:11:58.438541+08
[postgres@ecs-db1 ~]$
五 主备切换及注意事项
如果因为意外或故障导致主库不可用的情况下,可以直接将备库提升为主库对外提供服务。然后视具体情况看原来的主库是否需要重建,或者是否待故障恢复之后,可以直接作为新的备库,然后从新的主库(原备库)同步数据。
下面是模拟切换步骤:
1 主库停止,模拟故障
[postgres@ecs-db1 ~]$ ps -ef|grep postgres
postgres 3708 1 0 Oct20 ? 00:00:00 /postgres/pg12.8/bin/postgres
postgres 3710 3708 0 Oct20 ? 00:00:00 postgres: checkpointer
postgres 3711 3708 0 Oct20 ? 00:00:00 postgres: background writer
postgres 3712 3708 0 Oct20 ? 00:00:00 postgres: walwriter
postgres 3713 3708 0 Oct20 ? 00:00:00 postgres: autovacuum launcher
postgres 3714 3708 0 Oct20 ? 00:00:00 postgres: stats collector
postgres 3715 3708 0 Oct20 ? 00:00:00 postgres: logical replication launcher
postgres 4174 3708 0 Oct20 ? 00:00:00 postgres: walsender replica 172.22.3.5(57926) streaming 0/3000148
postgres 4747 30411 0 16:42 pts/1 00:00:00 ps -ef
postgres 4748 30411 0 16:42 pts/1 00:00:00 grep --color=auto postgres
root 30410 30409 0 16:11 pts/1 00:00:00 su - postgres
postgres 30411 30410 0 16:11 pts/1 00:00:00 -bash
[postgres@ecs-db1 ~]$ pg_ctl status
pg_ctl: server is running (PID: 3708)
/postgres/pg12.8/bin/postgres
[postgres@ecs-db1 ~]$ pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
[postgres@ecs-db1 ~]$ ps -ef|grep postgres
postgres 4843 30411 0 16:43 pts/1 00:00:00 ps -ef
postgres 4844 30411 0 16:43 pts/1 00:00:00 grep --color=auto postgres
root 30410 30409 0 16:11 pts/1 00:00:00 su - postgres
postgres 30411 30410 0 16:11 pts/1 00:00:00 -bash
[postgres@ecs-db1 ~]$ pg_ctl status
pg_ctl: no server running
[postgres@ecs-db1 ~]$
通过pg_ctl stop -m fast停止原来的主库之后,数据库后台进程都没有了。
2 备库提升为新主库,对外提供服务
[postgres@ecs-db2 ~]$ ps -ef|grep postgres
postgres 8911 1 0 Oct20 ? 00:00:00 /postgres/pg12.8/bin/postgres
postgres 8912 8911 0 Oct20 ? 00:00:00 postgres: startup recovering 000000010000000000000003
postgres 8913 8911 0 Oct20 ? 00:00:00 postgres: checkpointer
postgres 8914 8911 0 Oct20 ? 00:00:00 postgres: background writer
postgres 8915 8911 0 Oct20 ? 00:00:00 postgres: stats collector
root 18015 18014 0 16:13 pts/0 00:00:00 su - postgres
postgres 18016 18015 0 16:13 pts/0 00:00:00 -bash
postgres 25404 18016 0 16:43 pts/0 00:00:00 ps -ef
postgres 25405 18016 0 16:43 pts/0 00:00:00 grep --color=auto postgres
[postgres@ecs-db2 ~]$ pg_ctl status
pg_ctl: server is running (PID: 8911)
/postgres/pg12.8/bin/postgres
[postgres@ecs-db2 ~]$ pg_ctl promote
waiting for server to promote.... done
server promoted
[postgres@ecs-db2 ~]$ ps -ef|grep postgres
postgres 8911 1 0 Oct20 ? 00:00:00 /postgres/pg12.8/bin/postgres
postgres 8913 8911 0 Oct20 ? 00:00:00 postgres: checkpointer
postgres 8914 8911 0 Oct20 ? 00:00:00 postgres: background writer
postgres 8915 8911 0 Oct20 ? 00:00:00 postgres: stats collector
root 18015 18014 0 16:13 pts/0 00:00:00 su - postgres
postgres 18016 18015 0 16:13 pts/0 00:00:00 -bash
postgres 25622 8911 0 16:44 ? 00:00:00 postgres: walwriter
postgres 25623 8911 0 16:44 ? 00:00:00 postgres: autovacuum launcher
postgres 25624 8911 0 16:44 ? 00:00:00 postgres: logical replication launcher
postgres 25628 18016 0 16:44 pts/0 00:00:00 ps -ef
postgres 25629 18016 0 16:44 pts/0 00:00:00 grep --color=auto postgres
[postgres@ecs-db2 ~]$
重要1:启动备库为新主库的命令是pg_ctl promote。
提升备库为主库之后,可以看到,后台进程中不再有startup recovering,以及walreceiver streaming进程了。同时,多了postgres: walwriter 写进程。
重要2:$PGDATA/standby.signal文件自动消失了。这是告诉PostgreSQL,我现在不再是备库了,我的身份是主库了。
3 新主库修改pg_hba.conf文件
修改新主库(原备库172.22.3.5)的$PGDATA/pg_hba.conf文件,在其中添加允许新备库(原主库172.22.3.4)可以通过replica用户访问数据库的条目信息。
host replication all 172.22.3.101/32 md5
如果不做这一步配置的话,将来启动原主库为新备库的时候,可能会遇到下述错误。
2021-10-21 17:13:20.464 CST [11394] FATAL: could not connect to the primary server: FATAL: no pg_hba.conf entry for replication connection from host "172.22.3.101", user "replica", SSL off
2021-10-21 17:13:20.466 CST [11395] FATAL: could not connect to the primary server: FATAL: no pg_hba.conf entry for replication connection from host "172.22.3.101", user "replica", SSL off
注意:这里的172.22.3.101是原主库上配置的1个浮动IP地址,绑定在eth0:1设备上。如果主从环境的数据库没有配置浮动IP的话,则这里的IP地址,应该直接填原主库的实际IP地址。
4 原主库新建$PGDATA/standby.signal文件
[postgres@ecs-db1 data]$ touch standby.signal
[postgres@ecs-db1 data]$ pwd
/postgres/pg12.8/data
[postgres@ecs-db1 data]$ ll standby.signal
-rw-rw-r-- 1 postgres postgres 0 Oct 21 16:54 standby.signal
[postgres@ecs-db1 data]$
注意:这一步骤非常非常重要,如果不配置该文件的话,那么原来的主库一旦重新启动话,就将成为了1个新的独立主库,脱离了主从数据库环境。
5 原主库修改$PGDATA/postgresql.auto.conf文件
注意,应该用单引号,而不是双引号。否则遇到下述错误。
[postgres@ecs-db1 data]$ cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo="user=replica password=replica host=172.22.3.5 port=5432"
[postgres@ecs-db1 data]$
[postgres@ecs-db1 data]$ pg_ctl start -l ~/pg.log
waiting for server to start.... stopped waiting
pg_ctl: could not start server
Examine the log output.
[postgres@ecs-db1 data]$ tailf ~/pg.log
数据库启动日志
2021-10-21 09:07:06.512 GMT [10045] LOG: syntax error in file "/postgres/pg12.8/data/postgresql.auto.conf" line 3, near token """
2021-10-21 09:07:06.512 GMT [10045] FATAL: configuration file "postgresql.auto.conf" contains errors
修改$PGDATA/postgresql.auto.conf配置文件为下述正确的格式:
[postgres@ecs-db1 data]$ cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo='user=replica password=replica host=172.22.3.5 port=5432'
[postgres@ecs-db1 data]$
6 启动原主库,变为新备库
[postgres@ecs-db1 data]$ pg_ctl start -l ~/pg.log
waiting for server to start.... done
server started
[postgres@ecs-db1 data]$ ps -ef|grep postgres
root 8116 8115 0 16:58 pts/0 00:00:00 su - postgres
postgres 8118 8116 0 16:58 pts/0 00:00:00 -bash
root 8598 8597 0 17:00 pts/2 00:00:00 su - postgres
postgres 8600 8598 0 17:00 pts/2 00:00:00 -bash
postgres 11368 8118 0 17:13 pts/0 00:00:00 tailf pg.log
postgres 11389 1 0 17:13 ? 00:00:00 /postgres/pg12.8/bin/postgres
postgres 11390 11389 0 17:13 ? 00:00:00 postgres: startup recovering 000000020000000000000003
postgres 11391 11389 0 17:13 ? 00:00:00 postgres: checkpointer
postgres 11392 11389 0 17:13 ? 00:00:00 postgres: background writer
postgres 11393 11389 0 17:13 ? 00:00:00 postgres: stats collector
postgres 11440 11389 0 17:13 ? 00:00:00 postgres: walreceiver streaming 0/3013AC8
postgres 12545 30411 0 17:18 pts/1 00:00:00 ps -ef
postgres 12546 30411 0 17:18 pts/1 00:00:00 grep --color=auto postgres
root 30410 30409 0 16:11 pts/1 00:00:00 su - postgres
postgres 30411 30410 0 16:11 pts/1 00:00:00 -bash
[postgres@ecs-db1 ~]$ tailf pg.log
2021-10-21 17:13:45.488 CST [11440] LOG: fetching timeline history file for timeline 2 from primary server
2021-10-21 17:13:45.493 CST [11440] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
2021-10-21 17:13:45.493 CST [11440] LOG: replication terminated by primary server
2021-10-21 17:13:45.493 CST [11440] DETAIL: End of WAL reached on timeline 1 at 0/30001C0.
2021-10-21 17:13:45.494 CST [11390] LOG: new target timeline is 2
2021-10-21 17:13:45.494 CST [11440] LOG: restarted WAL streaming at 0/3000000 on timeline 2
2021-10-21 17:13:45.539 CST [11390] LOG: redo starts at 0/30001C0
这样,就完成了一次主从数据库环境的切换操作了。
六小结和参考链接
- 随着新版本的发行,数据库的配置和使用也越来越简单顺手了。
- 备库提升为主库的命令:pg_ctl promote;
- 新主库(原备库)的pg_hba.conf文件,要开放允许流复制访问数据库的信息给原主库的IP地址;
- 原主库配置为新备库的时候,务必要创建$PGDATA/standby.signal文件;
- 原主库配置为新备库的时候,务必要修改$PGDATA/postgresql.auto.conf文件,添加主库primary_conninfo的信息;
参考:https://www.percona.com/blog/2019/10/11/how-to-set-up-streaming-replication-in-postgresql-12/
https://www.2ndquadrant.com/en/blog/replication-configuration-changes-in-postgresql-12
延伸阅读之前写的一篇:PostgreSQL主从同步配置,切换步骤