Difference between revisions of "Streaming Replication"

From Software Heritage Wiki
Jump to navigation Jump to search
m (Fix _, sub issues)
(Move page to the intranet (was wrongly created here, no credentials leaked or anything anyway))
Line 1: Line 1:
Following steps are described how to setup the streaming replication
+
Move away to [https://intranet.softwareheritage.org/index.php?title=Streaming_Replication intranet].
between a master and slave.
 
 
 
= Master =
 
== Master information ==
 
Example db:
 
 
 
* host: somerset.internal.softwareheritage.org
 
 
 
* port: 5434
 
 
 
* db: softwareheritage-indexer
 
 
 
== Replication user needed ==
 
Make sure a replication user exists on the master server
 
 
 
    su - postgres
 
    createuser -p 5434 replicator --replication
 
    psql -p 5434 template1
 
    alter user replicator with password '<precious-secret>';
 
 
 
Note: To retrieve the password.
 
 
 
    swhpass ls infra/somerset/postgres/5434/replicator
 
 
 
== Postgres configuration adaptation ==
 
=== pg_hba.conf ===
 
File: /etc/postgresql/10/indexer/pg_hba.conf
 
 
 
    # TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
 
    host    replication    replicator      192.168.200.0/21        md5 # Azure
 
 
 
=== postgresql.conf ===
 
File: /etc/postgresql/10/indexer/postgresql.conf
 
 
 
    wal_level = replica      # or higher, logical is fine
 
    wal_keep_segments = 500  # in logfile segments, 16MB each; 0 disables
 
    wal_compression = on    # enable compression of full-page writes
 
 
 
Note:
 
 
 
* wal_compression is an optimization which reduces I/O and network
 
  traffic at the expense of CPU time. It is not required.
 
 
 
* wal_keep_segments needs to be set to a reasonable value in order to
 
  keep enough wal data between the initial database backup and its
 
  subsequent regular startup on the slave. Slave startup can fail at
 
  first, so make sure to have at least a couple hours of slack in case
 
  things go wrong.
 
 
 
== Restart db ==
 
Make the db aware of those changes:
 
 
 
    # to determine the main pid of the concerned db
 
    sudo systemctl status postgres@10-indexer.service
 
    # Actually make the db aware
 
    # kill -HUP <pid-of-db>
 
    # or
 
    sudo systemctl restart postgres@10-indexer.service
 
 
 
= Slave =
 
== Package Dependencies ==
 
Install client for connection part, and server for replication one.
 
 
 
    # client part and server
 
    sudo apt install postgresql-client-10 postgresl-10
 
 
 
== Postgres setup ==
 
Make sure the connection is ok.
 
 
 
As postgres user:
 
 
 
    su - postgres
 
    touch .pgpass
 
    cat > .pgpass <<EOF
 
    somerset.internal.softwareheritage.org:5434:replication:replicator:<precious-secret>
 
    EOF
 
    chmod 0600 .pgpass
 
    # should permit connection
 
 
 
== Create replica cluster ==
 
This is a necessary and temporary step to permit to initialize
 
configuration files that would need editing down the line.
 
 
 
    # drop installed main cluster as this won't be used
 
    sudo pg_dropcluster --stop 10 main
 
    # prepare directories in data volume
 
    sudo mkdir /srv/softwareheritage/postgres
 
    sudo chown postgres:postgres /srv/softwareheritage/postgres
 
      # create the cluster configuration
 
    sudo -u postgres env LC_ALL=C.UTF-8 \
 
        pg_createcluster 10 replica \
 
            -D /srv/softwareheritage/postgres/10/replica \
 
            -p 5434
 
 
 
== Make sure the slave can connect to the master ==
 
Edit the local configuration according to the master.  The slave must
 
have the at least the same boundaries for some configuration options:
 
 
 
file: /etc/postgresql/10/replica/postgresql.conf
 
 
 
    max_connections = 128      # same as master somerset.internal.softwareheritage.org:5434:softwareheritage-indexer
 
    max_worker_processes = 10  # same as master somerset.internal.softwareheritage.org:5434:softwareheritage-indexer
 
 
 
Note: You will be notified in logs when trying to start the db that
 
some setup are not correctly set.  Some form of:
 
 
 
    ardumont@dbreplica1 $ sudo journalctl -xef -u postgresql@10-replica.service
 
    ...
 
    Jun 15 08:13:24 dbreplica1 postgresql@10-replica[654]: 2018-06-15 08:13:24.425 UTC [664] FATAL:  hot standby is not possible because max_worker_processes = 8 is a lower setting than on the ma
 
    ster server (its value was 10)
 
    ...
 
 
 
== Retrieve backup ==
 
From the replica node, as postgres user:
 
 
 
    postgres@dbreplica1 $ pg_basebackup --progress --write-recovery-conf \
 
            -h somerset.internal.softwareheritage.org -p 5434 -U replicator \
 
            -D /srv/softwareheritage/postgres/10/somerset-swh-indexer.backup
 
 
 
Note that this step could fail if your .pgpass is not setup correctly.
 
 
 
== Install backup as db ==
 
Move the backup files to the regular pgdata location and start the
 
slave server:
 
 
 
    postgres@dbreplica1 $ cd /srv/softwareheritage/postgres/10
 
    # Remove 'temporary' cluster
 
    postgres@dbreplica1 $ rm -rf replica
 
    # Rename the backup as replica
 
    postgres@dbreplica1 $ mv somerset-swh-indexer.backup replica
 
    ardumont@dbreplica1 $ sudo systemctl status postgresql@10-replica.service
 
 
 
== Start cluser ==
 
Now that everything is ready, start the cluster.  This will start by
 
redoing the necessary steps to synchronize the dbs.
 
 
 
    sudo systemctl start postgresql@10-replica.service
 

Revision as of 09:38, 15 June 2018

Move away to intranet.