Introduction
Streaming replication is the cornerstone of modern PostgreSQL high‑availability solutions. Unlike file‑based WAL shipping, streaming sends WAL records over a TCP connection in real time, allowing a standby to stay close to the primary's state. This post walks through every step you need to set up streaming replication, from basic configuration to advanced fail‑over strategies, and ties the discussion into broader DBA practices such as performance tuning, backup automation, and cross‑platform insights for teams that also run Oracle or SQL Server (RMAN, Data Guard, ASM).
Architecture Overview
In a typical streaming replication topology, one primary node writes WAL records and streams them to one or more standby nodes. Standbys apply WAL to stay in sync, and can be promoted to primary on failure. The key components are:
- Primary – postgresql.conf (streaming settings)
- Standby – recovery.conf (now a set of parameters in postgresql.conf plus standby.signal)
- Connection authentication via pg_hba.conf
- Optional pg_basebackup or third‑party tools for the initial data copy
- Monitoring tools (e.g., pg_stat_replication, pg_rewind)
Prerequisites
- Hardware & OS – Ensure each node meets the RAM, CPU, and disk I/O requirements for your workload. SSDs reduce WAL flush latency.
- Networking – A dedicated, low‑latency network link between primary and standbys. Port 5432 (or custom) must be open.
- PostgreSQL version – 9.6+ supports streaming replication out of the box. For advanced features (logical replication, automated fail‑over with pg_auto_failover), use 10+.
- Credentials – Create a replication role:
CREATE ROLE replicator WITH REPLICATION LOGIN ENCRYPTED PASSWORD 'strongpass';
- Backup strategy – Have a recent base backup before enabling replication. Use pg_basebackup or a tool like pgBackRest for database backup automation.
Setting Up the Primary
On the primary server, adjust postgresql.conf to enable WAL streaming and replication:
listen_addresses = '*'
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB
synchronous_commit = on
Explanation:
- wal_level must be replica (or logical for logical replication).
- max_wal_senders defines the number of concurrent replication connections.
- wal_keep_size keeps enough WAL on disk for standbys that lag (PostgreSQL 13+; older versions use wal_keep_segments instead).
- synchronous_commit = on makes transactions wait for the standby's acknowledgement only when synchronous_standby_names is also set; otherwise commits wait only for the local WAL flush.
Authentication
Update pg_hba.conf to allow the replication role from standby IPs:
# TYPE DATABASE USER ADDRESS METHOD
host replication replicator 192.168.1.20/32 md5
Reload the configuration:
SELECT pg_reload_conf();
Preparing the Standby
Standby nodes need a fresh copy of the primary’s data directory. The simplest method is pg_basebackup:
pg_basebackup -h primary_host -D /var/lib/postgresql/12/main -U replicator -Fp -Xs -P
After the copy completes, create an empty standby.signal file in the data directory and set primary_conninfo in postgresql.conf:
primary_conninfo = 'host=primary_host port=5432 user=replicator password=strongpass sslmode=prefer'
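Alternatively, pg_basebackup's -R flag writes standby.signal and a matching primary_conninfo for you during the copy; a minimal sketch, reusing the paths above:
pg_basebackup -h primary_host -D /var/lib/postgresql/12/main -U replicator -Fp -Xs -P -R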
Optionally, enable synchronous_standby_names on the primary to control which standbys are considered synchronous.
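For example, to wait on one specific standby (standby1 here is a hypothetical application_name set in that standby's primary_conninfo):
synchronous_standby_names = 'FIRST 1 (standby1)'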
Starting the Standby
Restart PostgreSQL on the standby:
sudo systemctl restart postgresql
Verify the replication status from the primary:
SELECT pid, usename, application_name, state, sync_state
FROM pg_stat_replication;
You should see a row for each standby with state = streaming and sync_state = async (or sync).
Monitoring & Metrics
- WAL lag – pg_stat_replication shows write_lag, flush_lag, and replay_lag (see the sample query after this list).
- Disk usage – Monitor wal_keep_size and pg_wal directory growth.
- Health checks – Use pg_isready or external tools (Prometheus exporter, pgWatch).
- Alerting – Configure alerts when lag exceeds your threshold or pg_stat_activity shows long‑running transactions.
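A quick lag check from the primary; replay_bytes is an illustrative alias computed from the current WAL position:
SELECT application_name, state, sync_state,
       write_lag, flush_lag, replay_lag,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_bytes
FROM pg_stat_replication;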
Fail‑over Strategies
Manual Promotion
On a standby, run:
SELECT pg_promote();
This ends recovery and makes the standby the new primary (pg_ctl promote does the same from the shell). You must also update clients to point to the new primary and ensure the former primary is shut down or reconfigured as a standby.
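A minimal promotion check, run on the standby:
SELECT pg_is_in_recovery();  -- true while the node is still a standby
SELECT pg_promote();         -- returns true once promotion completes
SELECT pg_is_in_recovery();  -- false after promotion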
Automatic Fail‑over with pg_auto_failover
For automated fail‑over, install pg_auto_failover. Note that pg_autoctl is a shell command, not a SQL function; a sketch of the documented setup, with paths and hostnames adjusted for your environment:
pg_autoctl create monitor --pgdata /var/lib/postgresql/monitor
pg_autoctl create postgres --pgdata /var/lib/postgresql/12/main --monitor postgres://autoctl_node@monitor_host/pg_auto_failover
Follow the pg_autoctl documentation to set up node registration, election, and monitoring.
High Availability Patterns
- Primary‑Standby Pair – Single primary + one or more asynchronous standbys.
- Multi‑Master (Logical Replication) – Use wal_level = logical and pglogical to sync across nodes (a built‑in publication/subscription sketch follows this list).
- Distributed Clusters (Citus) – Scale writes across shards, while replication ensures data locality.
- Cloud‑Native (RDS, Cloud SQL) – Leverage managed fail‑over but still query pg_stat_replication for custom monitoring.
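As an illustration of the logical pattern using only built‑in features (pglogical has its own API), a hypothetical two‑node setup with illustrative names and connection string:
-- On the publishing node (requires wal_level = logical):
CREATE PUBLICATION mypub FOR ALL TABLES;
-- On the subscribing node:
CREATE SUBSCRIPTION mysub
  CONNECTION 'host=primary_host dbname=appdb user=replicator password=strongpass'
  PUBLICATION mypub;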
Performance Tuning for Replication
- WAL Buffers – Increase wal_buffers on the primary to reduce WAL flush frequency.
- Checkpoint Settings – Tweak checkpoint_timeout, max_wal_size, and min_wal_size to balance durability vs. performance.
- I/O Scheduling – Use deadline or noop on SSDs; cfq on spinning disks.
- WAL Compression – Set wal_compression = on to compress full‑page writes, reducing WAL volume and replication bandwidth at the cost of CPU.
- Connection Pooling – Deploy PgBouncer or Pgpool-II on standbys to reduce connection overhead for read‑only queries. A starting‑point config sketch follows this list.
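Illustrative postgresql.conf starting points; the right values depend on your workload and hardware:
wal_buffers = 16MB
checkpoint_timeout = 15min
max_wal_size = 4GB
min_wal_size = 1GB
wal_compression = on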
Backup and Point‑in‑Time Recovery (PITR)
Replication is often the first step toward a robust backup strategy. Combine streaming replication with:
- Base Backups – Periodic pg_basebackup or pgBackRest snapshots.
- WAL Archiving – archive_mode = on and archive_command to store WAL segments externally.
- Point‑in‑Time Recovery – Restore a base backup and replay archived WAL up to a recovery_target_time; pg_rewind resynchronizes a former primary, and pg_waldump inspects WAL contents. A configuration sketch follows this list.
- Automation – Integrate with CI/CD pipelines or configuration management tools.
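A minimal archiving and PITR sketch; the archive path and target time are illustrative:
# postgresql.conf on the primary
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
# Recovery settings on a restored base backup (plus a recovery.signal file)
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2024-05-01 12:00:00'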
Security Considerations
- Encryption – Enable TLS for replication: set ssl = on with certificate files on the primary, and an appropriate sslmode in the standby's primary_conninfo (a sketch follows this list).
- Least Privilege – Use a dedicated replication role with the REPLICATION privilege only.
- Firewall Rules – Restrict the replication port to known IPs.
- Monitoring Credentials – Avoid using superuser credentials for monitoring tools.
- Audit Logging – Enable log_connections and log_disconnections to track replication sessions.
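A TLS sketch with illustrative file names; generate and distribute certificates to match your own PKI:
# postgresql.conf on the primary
ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'
ssl_ca_file = 'root.crt'
# pg_hba.conf: require TLS for replication connections
hostssl replication replicator 192.168.1.20/32 md5
# Standby's primary_conninfo: verify the primary's certificate
primary_conninfo = 'host=primary_host user=replicator password=strongpass sslmode=verify-full'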
Common Pitfalls & Troubleshooting
- WAL retention problems – If standbys fall behind and needed WAL has been recycled, increase wal_keep_size or use a physical replication slot so the primary retains WAL until it is consumed; watch pg_replication_slots so inactive slots don't fill the disk (a slot sketch follows this list).
- Lag spikes – Check network latency, disk I/O bottlenecks, or long transactions on the primary.
- Authentication failures – Verify pg_hba.conf and the replication role password.
- Promotion errors – If promotion fails, the node may still be applying WAL; run pg_is_in_recovery() to confirm its state.
- Missing standby.signal – PostgreSQL 12+ uses this file; older versions use recovery.conf.
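Creating and wiring up a dedicated physical slot; the slot name is illustrative:
-- On the primary:
SELECT pg_create_physical_replication_slot('standby1_slot');
-- On the standby, in postgresql.conf:
primary_slot_name = 'standby1_slot'
-- Check for inactive slots that retain WAL indefinitely:
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;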
Best Practices Checklist
- Use dedicated replication slots to avoid WAL backlog.
- Keep primary and standby on the same PostgreSQL major version (and comparable OS/architecture); physical replication requires binary compatibility.
- Implement regular fail‑over drills to validate promotion scripts.
- Maintain consistent timezone settings across nodes.
- Document the replication topology and recovery procedures.
- Integrate replication metrics into central monitoring dashboards.
Conclusion
Streaming replication in PostgreSQL is a mature, flexible mechanism that can be combined with backup automation, performance tuning, and robust fail‑over solutions to build highly available, resilient databases. By following the steps outlined above and adhering to best‑practice guidelines, you can ensure that your PostgreSQL workloads remain online, consistent, and ready to scale. The same principles—configuration hygiene, monitoring, and automation—apply whether you manage on‑prem databases, cloud‑hosted instances, or hybrid environments alongside Oracle, SQL Server, and other enterprise systems.
If you found this guide helpful, subscribe to our newsletter and follow us on LinkedIn for more DBA insights.