HA setup

This page describes how to deploy an Inmanta server in a HA setup and how to perform a failover when required.

Set up a HA PostgreSQL cluster

The Inmanta server stores its state in a PostgreSQL database. As such, the PostgreSQL database should be deployed in a highly available setup to ensure the availability of the Inmanta server. This page describes how to set up a two-node PostgreSQL cluster, consisting of a master node and a warm standby. The master node performs synchronous replication to the standby node. When the master node fails, the standby can be promoted to the new master node by performing a manual action.

Prerequisites

  • Master node: The master node has been set up according to step 2 and step 3 of the Inmanta installation documentation.

  • Standby node: The standby node should only have a PostgreSQL installation, so only step 2 of the Inmanta installation documentation should be executed.
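
For reference, the PostgreSQL installation on the standby node might look like the sketch below. The package names are assumptions based on the software collections paths (rh-postgresql10) used elsewhere on this page; follow the Inmanta installation documentation for the authoritative steps.

# Assumption: CentOS software collections packaging, matching the rh-postgresql10 paths used on this page
$ sudo yum install -y centos-release-scl
$ sudo yum install -y rh-postgresql10-postgresql-server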

Configure the master node

Log in to the master node and make the following changes in the /var/opt/rh/rh-postgresql10/lib/pgsql/data/postgresql.conf file:

# Adjust the listen address as such that the standby node
# can connect to the master node.
listen_addresses = '*'

# Increase the wal_level to the required level for data replication
wal_level = replica

# Only report success to the client when the transaction has been
# flushed to permanent storage
synchronous_commit = on

# Force synchronous replication to the standby node. The application_name
# uniquely identifies the standby instance and can be freely chosen as long
# as it only consists of printable ASCII characters.
synchronous_standby_names = 'inmanta'

# Make sure that no queries can be executed on the standby
# node while it is in recovery mode.
hot_standby = off

Execute the commands mentioned below on the master node. These commands do two things:

  • They create a replication user with replication and login privileges. The standby node will use this user to connect to the master node.

  • They create a new replication slot, named replication. This replication slot will make sure that sufficient data is retained on the master node to synchronize the standby node with the master node.

$ sudo su - postgres -c 'scl enable rh-postgresql10 -- psql'
$ CREATE USER replication WITH REPLICATION LOGIN PASSWORD '<password-replication-user>';
$ SELECT * FROM pg_create_physical_replication_slot('replication');
$ \q

Add the lines mentioned below to the /var/opt/rh/rh-postgresql10/lib/pgsql/data/pg_hba.conf file. This will make sure that the replication user can be used to set up a replication connection from the standby node to the master node. Since the standby node can become the master node, both hosts should be added to the file.

host    replication     replication      <ip-master-node>/32        md5
host    replication     replication      <ip-standby-node>/32       md5

Restart the rh-postgresql10-postgresql service to activate the configuration changes.

$ sudo systemctl restart rh-postgresql10-postgresql
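
To verify that the configuration changes were picked up, the settings can be inspected via psql. The queries below are optional sanity checks against standard PostgreSQL views.

# Should report 'inmanta' and list the 'replication' slot respectively
$ sudo su - postgres -c 'scl enable rh-postgresql10 -- psql -c "SHOW synchronous_standby_names;"'
$ sudo su - postgres -c 'scl enable rh-postgresql10 -- psql -c "SELECT slot_name, slot_type, active FROM pg_replication_slots;"'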

Configure the standby node

The standby node is configured by creating a backup of the master node and restoring it on the standby node. The command mentioned below creates a backup in the /tmp/backup directory. This command will prompt for the password of the replication user. By setting the -R option, a recovery.conf file will be added to the backup. This will make sure that the PostgreSQL server starts as a standby node.

$ sudo su - postgres -c 'scl enable rh-postgresql10 -- pg_basebackup -h <ip-master-node> -U replication -X stream -R -D /tmp/backup -S replication -W'

On the standby node, clear the content of the /var/opt/rh/rh-postgresql10/lib/pgsql/data directory and replace it with the content of the backup created on the master node. The recovery.conf file needs to be adjusted so that the primary_conninfo setting contains an application_name parameter. This application_name should match the name configured in the synchronous_standby_names setting of the postgresql.conf file on the master node.

standby_mode = 'on'
primary_conninfo = 'user=replication password=<password> host=<ip-master-node> port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any application_name=inmanta'
primary_slot_name = 'replication'
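
The restore step described above might look like the sketch below. It assumes the /tmp/backup directory produced by pg_basebackup is available on the standby node and that the PostgreSQL service is not yet running there; adjust paths and ownership to your environment.

# Assumption: /tmp/backup is present on the standby node and readable by the postgres user
$ sudo su - postgres -c 'rm -rf /var/opt/rh/rh-postgresql10/lib/pgsql/data/*'
$ sudo su - postgres -c 'cp -a /tmp/backup/. /var/opt/rh/rh-postgresql10/lib/pgsql/data/'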

Comment out the synchronous_standby_names setting in the postgresql.conf file of the standby node. This will ensure that the standby node acts fully independently when it is promoted to a master node. Finally, start and enable the PostgreSQL service on the standby node.

$ sudo systemctl start rh-postgresql10-postgresql
$ sudo systemctl enable rh-postgresql10-postgresql
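
Once the standby node is running, replication can be verified with the standard monitoring views: on the master node the standby should appear in pg_stat_replication with sync_state set to sync, and on the standby node pg_is_in_recovery() should return true.

# On the master node: the standby should be listed with sync_state 'sync'
$ sudo su - postgres -c 'scl enable rh-postgresql10 -- psql -c "SELECT application_name, state, sync_state FROM pg_stat_replication;"'
# On the standby node: should return 't' while it acts as a standby
$ sudo su - postgres -c 'scl enable rh-postgresql10 -- psql -c "SELECT pg_is_in_recovery();"'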

Failover PostgreSQL

This section describes the actions required to recover from a failed PostgreSQL master node.

Promote a standby node to the new master node

When the master node fails, the standby node can be promoted to become the new master node. After this failover, the new master node will act as a fully independent instance, i.e. no replication will happen to a standby instance.

Execute the following command on the standby instance to promote it to a new master node:

$ sudo su - postgres -c 'scl enable rh-postgresql10 -- pg_ctl promote -D /var/opt/rh/rh-postgresql10/lib/pgsql/data/'

This command will rename the recovery.conf file to recovery.done. The old master node can be reconfigured to become the new standby node, by executing the step described in the next section.
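
To confirm that the promotion succeeded, pg_is_in_recovery() can be checked on the promoted node; it should now return false.

$ sudo su - postgres -c 'scl enable rh-postgresql10 -- psql -c "SELECT pg_is_in_recovery();"'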

Add a standby node to a newly promoted master node

This section explains how a standby node can be added to a master node that was created by promoting a standby node.

First, add a replication slot on the new master node by executing the following commands:

$ sudo su - postgres -c 'scl enable rh-postgresql10 -- psql'
$ SELECT * FROM pg_create_physical_replication_slot('replication');
$ \q

Then, configure the new standby instance by following the steps mentioned in Configure the standby node. When the standby node is up, the master node performs asynchronous replication to the standby node. The master node needs to be reconfigured to perform synchronous replication. This is done by adding the line mentioned below to the postgresql.conf file of the master node. The application_name has to match the application_name set in the recovery.conf file of the standby node.

synchronous_standby_names = 'inmanta'

Finally, reload the configuration of the master node using the following command:

$ sudo systemctl reload rh-postgresql10-postgresql
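
After the reload, the standby node should again be reported as a synchronous standby on the master node.

# sync_state should read 'sync' for the 'inmanta' application_name
$ sudo su - postgres -c 'scl enable rh-postgresql10 -- psql -c "SELECT application_name, sync_state FROM pg_stat_replication;"'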

Failover an Inmanta server

This section describes different ways to failover an Inmanta server.

Failover an Inmanta server to the warm standby PostgreSQL instance

This section describes how to failover an Inmanta server to a new PostgreSQL master node when the previous master node has failed.

First, stop the orchestrator by stopping the inmanta-server service.

$ sudo systemctl stop inmanta-server

Promote the standby node to a master node by following the procedure mentioned in Section Promote a standby node to the new master node. When the promotion is finished, the Inmanta server can be reconfigured to use the new master node. Do this by adjusting the database.host setting in the /etc/inmanta/inmanta.d/database.cfg file:

[database]
host=<ip-address-new-master-node>
name=inmanta
username=inmanta
password=<password>
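
Optionally, connectivity to the new master node can be verified before starting the orchestrator. This check assumes a psql client is available on the host running the Inmanta server and that the pg_hba.conf file on the new master node allows the inmanta user to connect from that host.

# Optional sanity check; prompts for the password of the inmanta database user
$ psql -h <ip-address-new-master-node> -U inmanta -d inmanta -c "SELECT 1;"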

Now, start the Inmanta orchestrator again:

$ sudo systemctl start inmanta-server

Start a new orchestrator on the warm standby PostgreSQL instance

This section describes what should be done to recover when the Inmanta server and the PostgreSQL master node fail simultaneously. It is also possible to failover the Inmanta server when the PostgreSQL master node has not failed.

Before starting the failover process, it’s important to ensure that the original Inmanta server is fully disabled. This is required to prevent the situation where two orchestrators are performing configuration changes on the same infrastructure simultaneously. Disabling the Inmanta orchestrator can be done by stopping the machine running the Inmanta server or disabling the inmanta-server service using the following commands:

$ sudo systemctl stop inmanta-server
$ sudo systemctl disable inmanta-server

The following step should only be executed when the PostgreSQL master node has failed.

Next, promote the standby PostgreSQL node to the new master node using the procedure in Section Promote a standby node to the new master node. When the (new) master node is up, a new Inmanta server can be installed according to the procedure mentioned in the Install Inmanta section. In the /etc/inmanta/inmanta.d/database.cfg configuration file, the database.host setting should contain the IP address of the new PostgreSQL master node.

When the Inmanta server is up and running, a recompile should be done for each existing configuration model.