AWS Primary Kubernetes Platform with Azure Disaster Recovery

Designed and deployed a multi-cloud resilience pattern for a customer-facing multi-tier web application composed of a static frontend, Kubernetes-hosted APIs and workers, PostgreSQL, Redis, and object storage. The client needed provider-level disaster recovery rather than only regional resilience, with the production runtime hosted on AWS and a warm-standby recovery stack maintained on Azure.

AWS, EKS, AWS Load Balancer Controller, Cloudflare DNS, Cloudflare CDN, Cloudflare Load Balancing, Cloudflare WAF, RDS PostgreSQL, ElastiCache, S3, Azure, AKS, NGINX Ingress, Azure Database for PostgreSQL, Azure Cache for Redis, Blob Storage, Key Vault, Helm


Architecture Diagram

Technical Implementation

Defined the application as a static web frontend served from S3 and cached at the edge by Cloudflare, with API and background worker services running on Amazon EKS behind an ALB managed by the AWS Load Balancer Controller. PostgreSQL, Redis, and object storage remained separate stateful tiers so the runtime model reflected a real multi-tier application rather than a single cluster-hosted service.
Built the Azure DR environment with Blob Storage for the static frontend and synchronized object assets, NGINX Ingress on AKS for API entry, Azure Database for PostgreSQL Flexible Server, Azure Cache for Redis, and Key Vault, keeping the Kubernetes manifests common through Helm values and environment overlays instead of maintaining a separate application definition per cloud.
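The shared-chart approach can be illustrated with a minimal Python sketch of how a per-cloud overlay is layered on top of common values. It approximates the way Helm combines values files (nested maps merge, later values win, lists are replaced whole) and is not the project's actual chart. Every key, image repository, and replica count below is a hypothetical placeholder.

```python
from copy import deepcopy


def merge_values(base: dict, overlay: dict) -> dict:
    """Return a copy of base with overlay applied.

    Nested dicts are merged recursively; any other value (including a list)
    in the overlay replaces the base value outright.
    """
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical common values shared by both clouds.
COMMON = {
    "api": {"replicas": 3, "image": {"repository": "example/api", "tag": "1.0.0"}},
    "ingress": {"className": "", "annotations": {}},
}

# Hypothetical per-cloud overlays: only the differences are stated.
AWS_OVERLAY = {
    "api": {"image": {"repository": "example.ecr.invalid/api"}},
    "ingress": {"className": "alb"},
}
AZURE_OVERLAY = {
    "api": {"replicas": 1, "image": {"repository": "example.azurecr.invalid/api"}},
    "ingress": {"className": "nginx"},
}

aws_values = merge_values(COMMON, AWS_OVERLAY)
azure_values = merge_values(COMMON, AZURE_OVERLAY)
```

Because each overlay states only what differs (registry, ingress class, standby replica count), the image tag and everything else in the common values stay identical across clouds, which is the property a single application definition is meant to protect.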
Replicated container images from ECR to ACR, configured PostgreSQL logical replication from RDS PostgreSQL to Azure Database for PostgreSQL, and synchronized object assets from S3 to Blob Storage on a scheduled basis so the Azure environment remained warm and recoverable without being used as an active runtime.
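A scheduled object sync of this kind usually reduces to comparing two listings and copying only what changed. The sketch below shows that comparison in isolation. It assumes each side can be listed as a mapping of object key to a content digest computed the same way on both sides. The function name, the `prune` flag, and the sample keys are hypothetical, and the actual tooling used for the S3 to Blob Storage sync is not described in the source.

```python
from typing import NamedTuple


class SyncPlan(NamedTuple):
    to_copy: list[str]    # keys missing or different on the target
    to_delete: list[str]  # keys present only on the target (when pruning)


def plan_sync(source: dict[str, str], target: dict[str, str], prune: bool = False) -> SyncPlan:
    """Compare key -> digest listings and decide what the next run must do."""
    to_copy = sorted(key for key, digest in source.items() if target.get(key) != digest)
    to_delete = sorted(key for key in target if key not in source) if prune else []
    return SyncPlan(to_copy, to_delete)


# Hypothetical listings for the primary bucket and the standby container.
s3_listing = {"img/logo.png": "a1", "docs/terms.pdf": "b2", "img/hero.jpg": "c3"}
blob_listing = {"img/logo.png": "a1", "docs/terms.pdf": "old", "tmp/stale.bin": "zz"}

plan = plan_sync(s3_listing, blob_listing, prune=True)
```

Leaving `prune` off by default is a deliberately conservative choice for a DR copy: an accidental deletion on the primary side is then not propagated to the standby on the next scheduled run.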
Placed Cloudflare in front of both clouds for authoritative DNS, CDN caching, health-based traffic steering, WAF policy enforcement, and DDoS protection. Cloudflare routed traffic to the S3 frontend and the AWS ALB as the primary web and API origins, then failed over to Blob Storage and the AKS ingress endpoint when the AWS side was intentionally withdrawn during DR rehearsals. This moved public traffic steering out of AWS, while each cloud still needed its own per-cluster ingress and origin load balancing.
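The health-based steering decision can be modeled as a priority-ordered list of origin pools, with traffic going to the first pool that still passes its health checks. The following Python sketch is a simplified model of that decision only. It does not call the Cloudflare API, and the pool names, origin names, and `min_healthy` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Origin:
    name: str
    healthy: bool  # result of the most recent health check


@dataclass(frozen=True)
class Pool:
    name: str
    origins: tuple[Origin, ...]
    min_healthy: int = 1  # healthy origins required for the pool to take traffic

    def is_healthy(self) -> bool:
        return sum(origin.healthy for origin in self.origins) >= self.min_healthy


def select_pool(pools_in_priority_order: list[Pool], fallback: Pool) -> Pool:
    """Return the first healthy pool; use the fallback if none is healthy."""
    for pool in pools_in_priority_order:
        if pool.is_healthy():
            return pool
    return fallback


# Hypothetical rehearsal state: the AWS origin has been withdrawn.
aws_api = Pool("aws-api", (Origin("alb", healthy=False),))
azure_api = Pool("azure-api", (Origin("aks-ingress", healthy=True),))

active = select_pool([aws_api, azure_api], fallback=azure_api)
```

During a rehearsal, withdrawing the AWS origin makes the primary pool fail its checks, so the selection falls through to the Azure pool without any change to DNS records or application configuration.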

Client Delivery & Handover

The delivery was run jointly with the client application, platform, and operations teams because the work crossed cloud networking, edge security, Kubernetes operations, database replication, and release engineering. The client team participated in design reviews, pipeline implementation, and DR rehearsals rather than only reviewing the end state. Handover included cloud-by-cloud architecture diagrams, Cloudflare failover and security policy procedures, replication operating notes, AKS and EKS support guidance, and rehearsal sessions for both platform operators and support leads so the failover process could be repeated without external help.

Outcome

The client retained AWS as the primary operating environment while gaining a documented and tested cross-cloud recovery path that reduced dependence on AWS for both application hosting and public traffic steering during high-impact incidents.

Project Snapshot