AWS Primary Kubernetes Platform with Azure Disaster Recovery

Designed and deployed a multi-cloud resilience pattern for a Kubernetes-based multi-tier application, with the production runtime hosted on AWS and a warm-standby disaster recovery stack maintained on Azure.

AWS, EKS, ALB, ExternalDNS, RDS PostgreSQL, ElastiCache, S3, Azure, AKS, Azure Front Door, Azure Database for PostgreSQL, Azure Cache for Redis, Blob Storage, Key Vault, Helm, Route 53


Technical Implementation

Ran the primary application stack on Amazon EKS with ALB Ingress Controller, ExternalDNS, Amazon RDS for PostgreSQL, ElastiCache for Redis, and S3 for shared object storage so the transactional path stayed close to the client's main application estate on AWS.
Built the Azure DR environment on AKS with Azure Container Registry, Azure Database for PostgreSQL Flexible Server, Azure Cache for Redis, Blob Storage, and Key Vault, keeping the Kubernetes manifests common through Helm values and environment overlays instead of maintaining a separate application definition per cloud.
Replicated container images from ECR to ACR, configured PostgreSQL logical replication from RDS PostgreSQL to Azure Database for PostgreSQL, and synchronized object assets from S3 to Blob Storage on a scheduled basis so the Azure environment remained warm and recoverable without being used as an active runtime.
Implemented failover using Route 53 health checks and DNS failover records, with the AWS ALB as the primary target and Azure Front Door as the secondary. Validated the DR design through cutover rehearsals that promoted the Azure PostgreSQL instance, redeployed the AKS release with production values, and confirmed application health, ingress routing, and queue-drain behavior before switching traffic.
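
The DNS failover arrangement described above can be illustrated with a minimal sketch. It builds the kind of change batch that Route 53 accepts for a PRIMARY/SECONDARY failover pair. The record name, target hostnames, health check ID, TTL, and the function name failover_change_batch are all hypothetical placeholders, not values from the project. The sketch also assumes both records are CNAMEs on a non-apex name, because Route 53 requires the two records of a failover pair to share the same name and type and Azure Front Door cannot be an alias target; the actual record layout used in the delivery may have differed.

```python
from typing import Any, Optional


def failover_change_batch(
    record_name: str,
    primary_target: str,
    secondary_target: str,
    primary_health_check_id: str,
    ttl: int = 60,
) -> dict[str, Any]:
    """Build a Route 53 ChangeBatch holding a PRIMARY/SECONDARY CNAME pair.

    All argument values are illustrative assumptions supplied by the caller.
    """

    def change(role: str, set_id: str, target: str,
               health_check_id: Optional[str] = None) -> dict[str, Any]:
        record_set: dict[str, Any] = {
            "Name": record_name,
            "Type": "CNAME",            # same name and type on both records
            "SetIdentifier": set_id,    # must be unique within the pair
            "Failover": role,           # "PRIMARY" or "SECONDARY"
            "TTL": ttl,                 # a short TTL speeds up client cutover
            "ResourceRecords": [{"Value": target}],
        }
        if health_check_id is not None:
            # Route 53 answers with the secondary record when this check fails.
            record_set["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": record_set}

    return {
        "Comment": "Failover pair: AWS ALB primary, Azure Front Door secondary",
        "Changes": [
            change("PRIMARY", "aws-primary", primary_target,
                   primary_health_check_id),
            change("SECONDARY", "azure-secondary", secondary_target),
        ],
    }


if __name__ == "__main__":
    # Placeholder hostnames and health check ID, for illustration only.
    batch = failover_change_batch(
        record_name="app.example.com.",
        primary_target="primary-alb.example.net.",
        secondary_target="dr-endpoint.example.org.",
        primary_health_check_id="00000000-0000-0000-0000-000000000000",
    )
    for item in batch["Changes"]:
        rrset = item["ResourceRecordSet"]
        print(rrset["Failover"], rrset["ResourceRecords"][0]["Value"])
```

A dictionary of this shape could be passed as the ChangeBatch argument to the boto3 Route 53 client's change_resource_record_sets call, together with the hosted zone ID. Attaching the health check only to the primary record means Route 53 serves the Azure endpoint when the AWS check fails, which matches the warm-standby intent: Azure receives traffic only after a failure is detected.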

Client Delivery & Handover

The delivery was run jointly with the client's application, platform, and operations teams because the work crossed cloud networking, Kubernetes operations, database replication, and release engineering. The client team took part in design reviews, pipeline implementation, and DR rehearsals rather than only reviewing the end state. Handover included per-cloud architecture diagrams, DR runbooks, DNS failover procedures, replication operating notes, AKS and EKS support guidance, and rehearsal sessions for both platform operators and support leads so the failover process could be repeated without external help.

Outcome

The client retained AWS as the primary operating environment while gaining a documented and tested cross-cloud recovery path that reduced dependence on a single cloud provider during high-impact incidents.

Project Snapshot