Streaming Replication

From Software Heritage Wiki
Revision as of 09:20, 15 June 2018 by Ardumont (talk | contribs) (First iteration)
Jump to navigation Jump to search

Following steps are described how to setup the streaming replication between a master and slave.

Master

Master information

Example db:

  • host: somerset.internal.softwareheritage.org
  • port: 5434
  • db: softwareheritage-indexer

Replication user needed

Make sure a replication user exists on the master server

   su - postgres
   createuser -p 5434 replicator --replication
   psql -p 5434 template1
   alter user replicator with password '<precious-secret>';

Note: To retrieve the password.

   swhpass ls infra/somerset/postgres/5434/replicator

Postgres configuration adaptation

pghba.conf

File: /etc/postgresql/10/indexer/pghba.conf

   # TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
   host    replication     replicator      192.168.200.0/21        md5	# Azure

postgresql.conf

File: /etc/postgresql/10/indexer/postgresql.conf

   wal_level = replica      # or higher, logical is fine
   wal_keep_segments = 500  # in logfile segments, 16MB each; 0 disables
   wal_compression = on     # enable compression of full-page writes

Note:

  • walcompression is an optimization which reduces I/O and network
 traffic at the expense of CPU time. It is not required.
  • walkeepsegments needs to be set to a reasonable value in order to
 keep enough wal data between the initial database backup and its
 subsequent regular startup on the slave. Slave startup can fail at
 first, so make sure to have at least a couple hours of slack in case
 things go wrong.

Restart db

Make the db aware of those changes:

   # to determine the main pid of the concerned db
   sudo systemctl status postgres@10-indexer.service
   # Actually make the db aware
   # kill -HUP <pid-of-db>
   # or
   sudo systemctl restart postgres@10-indexer.service

Slave

Package Dependencies

Install client for connection part, and server for replication one.

   # client part and server
   sudo apt install postgresql-client-10 postgresl-10

Postgres setup

Make sure the connection is ok.

As postgres user:

   su - postgres
   touch .pgpass
   cat > .pgpass <<EOF
   somerset.internal.softwareheritage.org:5434:replication:replicator:<precious-secret>
   EOF
   chmod 0600 .pgpass
   # should permit connection

Create replica cluster

This is a necessary and temporary step to permit to initialize configuration files that would need editing down the line.

   # drop installed main cluster as this won't be used
   sudo pg_dropcluster --stop 10 main
   # prepare directories in data volume
   sudo mkdir /srv/softwareheritage/postgres
   sudo chown postgres:postgres /srv/softwareheritage/postgres
     # create the cluster configuration
   sudo -u postgres env LC_ALL=C.UTF-8 \
       pg_createcluster 10 replica \
           -D /srv/softwareheritage/postgres/10/replica \
           -p 5434

Make sure the slave can connect to the master

Edit the local configuration according to the master. The slave must have the at least the same boundaries for some configuration options:

file: /etc/postgresql/10/replica/postgresql.conf

   max_connections = 128      # same as master somerset.internal.softwareheritage.org:5434:softwareheritage-indexer
   max_worker_processes = 10  # same as master somerset.internal.softwareheritage.org:5434:softwareheritage-indexer

Note: You will be notified in logs when trying to start the db that some setup are not correctly set. Some form of:

   ardumont@dbreplica1 $ sudo journalctl -xef -u postgresql@10-replica.service
   ...
   Jun 15 08:13:24 dbreplica1 postgresql@10-replica[654]: 2018-06-15 08:13:24.425 UTC [664] FATAL:  hot standby is not possible because max_worker_processes = 8 is a lower setting than on the ma
   ster server (its value was 10)
   ...

Retrieve backup

From the replica node, as postgres user:

   postgres@dbreplica1 $ pg_basebackup --progress --write-recovery-conf \
           -h somerset.internal.softwareheritage.org -p 5434 -U replicator \
           -D /srv/softwareheritage/postgres/10/somerset-swh-indexer.backup

Note that this step could fail if your .pgpass is not setup correctly.

Install backup as db

Move the backup files to the regular pgdata location and start the slave server:

   postgres@dbreplica1 $ cd /srv/softwareheritage/postgres/10
   # Remove 'temporary' cluster
   postgres@dbreplica1 $ rm -rf replica
   # Rename the backup as replica
   postgres@dbreplica1 $ mv somerset-swh-indexer.backup replica
   ardumont@dbreplica1 $ sudo systemctl status postgresql@10-replica.service

Start cluser

Now that everything is ready, start the cluster. This will start by redoing the necessary steps to synchronize the dbs.

   sudo systemctl start postgresql@10-replica.service