🚩 The Breaking Point

FinVault had crossed 8,000 registered users, but its monolithic PHP portal, originally built in 2019, was not designed for that scale. During peak market hours (9:15 AM–11:30 AM IST), the server hit 100% CPU utilization, causing session timeouts and incomplete portfolio loads. Its biggest enterprise client threatened to terminate the contract after losing ₹3.2 lakh to a delayed sell signal caused by stale data.

The core architectural flaw: all data — live prices, NAV, portfolio P&L — was fetched via a single cron job every 45 minutes. There was no event-driven mechanism, no caching strategy, and no horizontal scalability plan.

🔍 Discovery & Audit

| System Area | Issue Found | Impact |
| --- | --- | --- |
| Database Layer | N+1 queries, no indexing | 14 s average page load |
| Data Freshness | 45-minute cron polling | Stale investment data |
| Frontend Architecture | jQuery + inline CSS spaghetti | Zero mobile support |
| Scalability | Single VPS, no load balancing | Crashes at 500+ users |
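To make the N+1 finding concrete, here is a minimal sketch of the pattern and its batched fix. The data shapes (`Holding`, the in-memory `quotes` map) and the query counter are hypothetical stand-ins for the portal's real SQL; the point is the round-trip count, not the storage.

```typescript
// Hypothetical portfolio data standing in for real database tables.
type Holding = { userId: number; symbol: string; qty: number };

const holdings: Holding[] = [
  { userId: 1, symbol: "TCS", qty: 10 },
  { userId: 1, symbol: "INFY", qty: 5 },
  { userId: 1, symbol: "HDFC", qty: 8 },
];
const quotes = new Map<string, number>([
  ["TCS", 3500],
  ["INFY", 1500],
  ["HDFC", 1600],
]);

let queryCount = 0;

// N+1 pattern: one price lookup per holding — 1 + N round trips.
function portfolioValueNPlusOne(): number {
  return holdings.reduce((sum, h) => {
    queryCount++; // stands in for: SELECT price FROM quotes WHERE symbol = ?
    return sum + (quotes.get(h.symbol) ?? 0) * h.qty;
  }, 0);
}

// Batched fix: fetch every needed symbol in one round trip.
function portfolioValueBatched(): number {
  queryCount++; // stands in for: SELECT symbol, price WHERE symbol IN (...)
  const batch = new Map(
    holdings.map((h) => [h.symbol, quotes.get(h.symbol) ?? 0]),
  );
  return holdings.reduce(
    (sum, h) => sum + (batch.get(h.symbol) ?? 0) * h.qty,
    0,
  );
}
```

With three holdings the naive version issues three lookups and the batched version one, while both return the same total; on real tables an index on `symbol` makes the `IN (...)` query cheap.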

🛠️ The Architecture Rebuild

Phase 1 — Decouple the Monolith

We split the application into four independent microservices: Auth Service (JWT), Portfolio Engine, Market Data Streamer, and Notification Service. The services communicated over Redis Pub/Sub channels, eliminating direct database coupling and allowing each service to scale independently.
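The contract between services can be sketched with a minimal in-memory bus. The channel name and `Tick` message shape here are illustrative, not the production schema; in production the same publish/subscribe calls went through Redis across separate processes.

```typescript
// Illustrative tick message; field names are assumptions.
type Tick = { symbol: string; price: number; ts: number };
type Handler = (msg: Tick) => void;

// In-memory stand-in for Redis Pub/Sub: fire-and-forget fan-out per channel.
class Bus {
  private subs = new Map<string, Handler[]>();

  subscribe(channel: string, fn: Handler): void {
    const list = this.subs.get(channel) ?? [];
    list.push(fn);
    this.subs.set(channel, list);
  }

  publish(channel: string, msg: Tick): void {
    for (const fn of this.subs.get(channel) ?? []) fn(msg);
  }
}

// Market Data Streamer publishes; Portfolio Engine reacts — no shared
// database call between them.
const bus = new Bus();
const received: Tick[] = [];
bus.subscribe("ticks:NSE", (t) => received.push(t));
bus.publish("ticks:NSE", { symbol: "RELIANCE", price: 2450.5, ts: Date.now() });
```

Because Pub/Sub is fire-and-forget, each subscriber processes messages at its own pace and can be deployed, restarted, or scaled without the publisher knowing.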

Phase 2 — Real-Time Data Pipeline

Replaced the 45-minute cron polling with a WebSocket-based live stream from NSE/BSE market data APIs. A Node.js event-emitter layer processed and broadcast price ticks to connected clients in under 800 ms. Redis cached the last 100 ticks per symbol, serving 95% of requests without touching PostgreSQL.
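The "last 100 ticks per symbol" cache behaves like a capped list (in Redis, typically `LPUSH` + `LTRIM`); a minimal in-memory sketch of the same behaviour, with illustrative names:

```typescript
// Keeps only the newest `capacity` prices per symbol, dropping the oldest.
class TickCache {
  private buf = new Map<string, number[]>();

  constructor(private capacity = 100) {}

  push(symbol: string, price: number): void {
    const arr = this.buf.get(symbol) ?? [];
    arr.push(price);
    if (arr.length > this.capacity) arr.shift(); // evict oldest tick
    this.buf.set(symbol, arr);
  }

  // Recent-history reads are served from here, never from PostgreSQL.
  recent(symbol: string): number[] {
    return this.buf.get(symbol) ?? [];
  }
}
```

A read-through path would fall back to PostgreSQL only when a symbol has no cached ticks, which is what keeps 95% of requests off the database.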

Phase 3 — Next.js Frontend

The entire UI was rebuilt in Next.js 14 with App Router. Portfolio charts used Chart.js with real-time dataset updates via WebSocket listeners. Server Components handled initial data hydration; Client Components handled live tick updates — achieving an LCP of 1.1 seconds on mobile.
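The Client Component's job on each WebSocket message is small: append a point to the chart's dataset and trim the visible window. A framework-free sketch of that handler, with the React wiring (`useEffect`, `chart.update()`) omitted and the window size chosen arbitrarily:

```typescript
// Mirrors the { labels, data } shape a Chart.js line dataset consumes.
type Series = { labels: number[]; data: number[] };

// Append one tick and keep only the most recent `window` points,
// as the WebSocket onmessage handler would do before chart.update().
function applyTick(
  series: Series,
  ts: number,
  price: number,
  window = 50,
): void {
  series.labels.push(ts);
  series.data.push(price);
  while (series.data.length > window) {
    series.labels.shift();
    series.data.shift();
  }
}
```

Keeping the window bounded matters at tick rates like these: an unbounded dataset would grow without limit during a trading session and degrade render times.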

Phase 4 — Infrastructure & DevOps

Deployed as Docker containers behind an Nginx reverse proxy on a 3-node cluster. A GitHub Actions CI/CD pipeline ran automated tests on every push, with zero-downtime blue-green deployments. Cloudflare CDN handled static asset serving with edge caching across 12 global PoPs.
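A reverse-proxy setup like this can be sketched in a few directives. The upstream names, ports, and paths below are assumptions for illustration, not the production config; the WebSocket `Upgrade` handling is the part that matters for the live tick stream.

```nginx
# Illustrative Nginx config for the 3-node cluster (names/ports assumed).
upstream finvault_app {
    least_conn;              # route to the least-busy node
    server app-node-1:3000;
    server app-node-2:3000;
    server app-node-3:3000;
}

server {
    listen 80;

    location / {
        proxy_pass http://finvault_app;
        proxy_set_header Host $host;
    }

    # WebSocket upgrade for the live market-data stream
    location /ws {
        proxy_pass http://finvault_app;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

Blue-green deployments then amount to pointing the upstream at the standby node set and reloading Nginx, which drops no in-flight connections.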

📈 Outcome: 60 Days Post-Launch

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Data latency | 45 minutes | 800 ms | -99.97% |
| Peak concurrent users | 500 (crashes) | 12,000 (stable) | +2,300% |
| Average page load time | 14.2 s | 1.1 s | -92% |
| 30-day client retention | 38% | 82% | +116% |
| System uptime | 94.2% | 99.97% | +5.77 pts |