Zero-Downtime Database Migrations at Scale
How we migrated 2TB of PostgreSQL data across 50+ microservices without a single second of downtime. The tooling, the playbook, and the lessons learned.
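Zero-downtime schema changes often follow an expand/backfill/contract pattern. You add a nullable column, which is a metadata-only change in PostgreSQL when it has no default. You then backfill it in small batches, switch reads over, and only afterwards drop the old column. The sketch below covers only the batched backfill. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere. The `users` table and the `full_name`/`display_name` columns are hypothetical, and this is not necessarily the tooling the post describes.

```python
import sqlite3


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Copy legacy `full_name` into the new `display_name` column in id-ranged batches.

    Each batch commits as its own short transaction, so no single statement
    locks the whole table. Rows already backfilled are skipped, which makes
    the job safe to re-run after an interruption.
    """
    total = 0
    last_id = 0
    while True:
        # Upper primary-key bound of the next batch (NULL once no rows remain).
        upper = conn.execute(
            "SELECT MAX(id) FROM "
            "(SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?) AS batch",
            (last_id, batch_size),
        ).fetchone()[0]
        if upper is None:
            return total
        with conn:  # commits this batch's transaction on success
            cur = conn.execute(
                "UPDATE users SET display_name = full_name "
                "WHERE id > ? AND id <= ? AND display_name IS NULL",
                (last_id, upper),
            )
        total += cur.rowcount
        last_id = upper
```

Keeping each batch small limits how long row locks are held. The idempotent `IS NULL` guard lets the backfill be resumed from the start without rewriting rows.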
I build infrastructure and AI-powered systems that run at scale. I'm passionate about developer experience, cloud-native architectures, and open source.
3.5+ years building cloud-native infrastructure, Kubernetes platforms, and Java microservices in production at scale.
Working at Incedo Inc., on-site for Feedzai, a market leader in AI-driven financial fraud prevention that protects the world's largest banks through real-time risk management and anti-money laundering (AML) solutions. Domain expertise in AML and Transaction Fraud for Banking (TFB), spanning transaction processing, event enrichment, and multi-tenancy for key global clients.
Production systems and engineering initiatives from 3.5+ years on Feedzai's fraud detection platform.
End-to-end Blue-Green and rolling deployment pipeline for a production AI fraud detection platform on AWS EKS. Achieved 97%+ success rate across 40+ releases with zero downtime and full rollback ownership.
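A blue-green release keeps two identical stacks and moves traffic between them in one step, which is what makes rollback cheap. Below is a minimal sketch of the cutover decision. It assumes the new version is already deployed to the idle color and that a Kubernetes Service routes traffic using a selector that includes a hypothetical `color` label. The check functions and the `app`/`color` labels are illustrative, not the platform's actual pipeline.

```python
from typing import Callable, Dict, Iterable

# A check receives the color under test and reports whether it is healthy.
Check = Callable[[str], bool]


def service_selector_patch(app: str, color: str) -> Dict:
    """Patch body that repoints a Service's selector at one color's pods (hypothetical labels)."""
    return {"spec": {"selector": {"app": app, "color": color}}}


def blue_green_cutover(active: str, checks: Iterable[Check]) -> str:
    """Return the color that should receive traffic after a release.

    Traffic moves to the idle color only if every check passes against it;
    otherwise the active color keeps serving, which is the rollback path.
    """
    if active not in ("blue", "green"):
        raise ValueError(f"unknown color: {active}")
    idle = "green" if active == "blue" else "blue"
    if all(check(idle) for check in checks):
        return idle
    return active
```

The returned patch body could be applied with `kubectl patch service` or with the Kubernetes Python client's `CoreV1Api.patch_namespaced_service`. Because the switch is a single selector change, rolling back is the same operation pointed at the previous color.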
Led the full production migration from standalone Docker to Kubernetes using Helm Charts and Kubernetes Operators. Shifted config management from Ansible playbooks to ConfigMaps/Secrets with GitLab CI automation — zero production disruption.
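Once configuration moves from Ansible-templated files into ConfigMaps and Secrets, a service can consume it as a mounted volume. In that setup, Kubernetes exposes each key as a file and also creates internal entries whose names start with `..`. Below is a small sketch of the reading side, assuming such a volume mount. The directory path and key names used in the test are hypothetical.

```python
from pathlib import Path
from typing import Dict


def load_mounted_config(mount_dir: str) -> Dict[str, str]:
    """Read a ConfigMap or Secret mounted as a volume into a dict.

    Each regular file is one key, and its contents (with surrounding
    whitespace stripped) are the value. Entries starting with ".." are
    Kubernetes' own bookkeeping for atomic updates and are skipped.
    """
    config: Dict[str, str] = {}
    for path in sorted(Path(mount_dir).iterdir()):
        if path.name.startswith("..") or not path.is_file():
            continue
        config[path.name] = path.read_text(encoding="utf-8").strip()
    return config
```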
Implemented Java microservice features for a multi-tenant financial fraud detection system: database operations, webhook integrations, REST API configurations, and JWT-based authentication. Reduced the effort of deleting reference data entities by 80%.
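The bullet does not say how the JWTs are signed or validated. As an illustration only, here is a standard-library sketch of HS256 verification (shared-secret HMAC-SHA256, as defined by RFC 7519 and RFC 7515). It includes an expiry check and a hypothetical per-tenant `tenant` claim. In production, a maintained JWT library and the platform's real key management would be the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    # JWT segments are unpadded base64url.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWT encoding strips before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a compact HS256 JWT (used here only to produce test tokens)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes, tenant: str, now: Optional[float] = None) -> dict:
    """Return the claims if signature, expiry and (hypothetical) tenant claim all check out."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    # Pin the expected algorithm instead of trusting whatever the header says.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    # Hypothetical multi-tenancy rule: a token may only act for its own tenant.
    if claims.get("tenant") != tenant:
        raise ValueError("tenant mismatch")
    return claims
```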
Set up and maintained the full observability stack for production Kubernetes workloads — metrics with Prometheus, dashboards with Grafana, and log aggregation with Loki for proactive incident detection and RCA.
Monitored and maintained ETL pipelines for reliable data transfer to AWS S3. Resolved failures and data inconsistencies through log analysis, cron-job debugging, and root cause analysis with CloudWatch.
Systematic approach to analysing and resolving production-critical incidents (OOM kills, disk exhaustion, ETL failures, log rotation issues, and PostgreSQL indexing problems) using Linux tools such as grep, awk, ps, and curl.
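One grep/awk step of OOM triage is counting which processes the kernel OOM killer chose. Here is a Python sketch of that step. The log format is an assumption based on typical kernel messages: `Out of memory: Killed process <pid> (<name>)`, the older `Kill process` wording, and the `Memory cgroup out of memory:` variant seen for containers. Real node logs may differ.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed kernel OOM-killer wording, matched case-insensitively so the
# cgroup variant ("Memory cgroup out of memory: ...") is covered too.
OOM_LINE = re.compile(r"out of memory: kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE)


def count_oom_kills(lines: Iterable[str]) -> Counter:
    """Count OOM-killer victims by process name, like `grep | awk | sort | uniq -c`."""
    victims: Counter = Counter()
    for line in lines:
        match = OOM_LINE.search(line)
        if match:
            victims[match.group(2)] += 1
    return victims
```

Input could come from `dmesg` or `journalctl -k` output split into lines. `most_common()` on the result then gives the same ranking as `sort | uniq -c | sort -rn`.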
Tools and technologies I use to ship production systems — from infrastructure to intelligent applications.
Industry-recognised credentials across cloud, Kubernetes, and AI — with two more in progress.
CNCF / Linux Foundation
Amazon Web Services
Anthropic
Anthropic
Anthropic
Online Course
Recognition, measurable outcomes, and milestones from 3.5+ years in production engineering.
Received twice at Incedo Inc. for outstanding performance, plus direct recognition from the client, Feedzai, for exceptional contributions across deployment ownership, incident management, and feature delivery.
Jan 2024
Owned 40+ Blue-Green and rolling Kubernetes deployments over 3 years with a 97%+ success rate. Ensured zero-downtime releases with proactive client communication and full rollback ownership.
Jan 2024
Implemented Java microservice functionality enabling client-owned execution flows, reducing the effort of deleting reference data entities by 80% through optimised REST API design.
Jan 2023
Drove the full migration from standalone Docker to Kubernetes using Helm Charts and Operators. Shifted configuration management from Ansible to ConfigMaps/Secrets with GitLab CI automation — zero production disruption.
Jan 2023
Analysed and resolved 10+ production-critical incidents including OOM kills, disk exhaustion, ETL pipeline failures, log rotation issues, and PostgreSQL indexing degradation. Communicated findings to stakeholders.
Jan 2024
Led and mentored a team of 3 junior engineers, owning the full engineering lifecycle from sprint planning and development through to deployment, production release, and client communication.
Jan 2023
Graduated with a CGPA of 9.1 in Computer Science Engineering from SRM Institute of Science and Technology, Chennai (2018–2022).
Jun 2022
Deep-dives, war stories, and lessons from building production systems.
How we migrated 2TB of PostgreSQL data across 50+ microservices without a single second of downtime. The tooling, the playbook, and the lessons learned.
A complete guide to designing, deploying, and operating a Kubernetes platform for 50+ engineering teams. Covers GitOps, observability, cost management, and developer experience.
After running LLM-powered features for 200K daily users for 6 months, here's what I wish I knew about latency, cost, caching, and failure modes.
We ran all three in production for 18 months. Here's the honest comparison: throughput, operational overhead, ecosystem maturity, and when to use each.
Open to new opportunities, interesting projects, and good conversations.