Ankit Maurya
Distributed systems, AI and architecture learnings
- Backend engineer focused on solving complex problems by building scalable, reliable distributed systems. Experience across e-commerce, payments, and banking with high-throughput, low-latency production systems.
- At Walmart, leading backend development for large-scale pricing and fulfillment systems. Built a distributed rule engine and re-architected the platform into scalable microservices handling millions of requests per minute.
- Previously at Global Payments and Newgen, worked on real-time IVR automation and enterprise banking platforms, focusing on event-driven architectures, data systems, and reliability at scale.
- Currently building production-grade GenAI applications and AI agents, integrating intelligent systems into backend architectures in a scalable and practical way.
- Core focus on AI and distributed system design, scalability, and backend architecture, approaching problems from first principles with an emphasis on trade-offs, system behavior, and long-term maintainability.
- Open to conversations around building and scaling meaningful software systems.
Engineering learnings from production
These insights come from my production experience, shaped by working alongside some of the best engineering minds in the industry.
Partitioning is a product decision.
Your key choice defines ordering guarantees, consumer parallelism, backfills, and what “correctness” means under load.
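A minimal sketch of why the key choice matters, assuming a simple hash partitioner. The `partition_for` and `route` helpers and the sample events are hypothetical and do not reflect any specific broker's hashing scheme. Events that share a key land in the same partition, so ordering is preserved per key and nowhere else.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash partitioner (illustrative; real brokers use their own hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def route(events, key_fn, num_partitions):
    """Append each event to the partition chosen by its key, in arrival order."""
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_for(key_fn(event), num_partitions)].append(event)
    return partitions


# Hypothetical events: two orders that belong to the same customer.
events = [
    {"order_id": "o-1", "customer": "c-9", "seq": 1},
    {"order_id": "o-2", "customer": "c-9", "seq": 2},
    {"order_id": "o-1", "customer": "c-9", "seq": 3},
]

# Keyed by order: each order's events stay ordered, but different orders
# may be consumed in parallel with no ordering between them.
by_order = route(events, lambda e: e["order_id"], num_partitions=8)

# Keyed by customer: everything for the customer is totally ordered,
# but a single busy customer becomes a single hot partition.
by_customer = route(events, lambda e: e["customer"], num_partitions=8)
```

Keying by order buys consumer parallelism at the cost of cross-order ordering. Keying by customer gives the opposite trade.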
Model for reads; pay for mistakes later.
Cassandra rewards predictable access paths and punishes “flexible queries” with hotspots, tombstones, and slow repairs.
Cache correctness is a spectrum.
Choose bounded staleness and predictable failure modes over “perfect invalidation” that becomes operational debt.
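One way to read "bounded staleness" is a read-through cache with a fixed TTL. This is a sketch under stated assumptions: `TTLCache`, its `loader` callback, and the injectable `clock` are hypothetical names, not a reference to any particular caching library.

```python
import time


class TTLCache:
    """Read-through cache: a value is served for at most ttl_seconds after load."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # fetches the source-of-truth value for a key
        self._ttl = ttl_seconds    # upper bound on how stale a served value can be
        self._clock = clock        # injectable so tests can control time
        self._entries = {}         # key -> (value, loaded_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # still within the staleness bound
        # Expired or missing: reload. A loader exception propagates to the
        # caller, so the failure mode is explicit rather than silently stale.
        value = self._loader(key)
        self._entries[key] = (value, now)
        return value
```

There is no invalidation path at all. The only promise is that a reader never sees a value older than the TTL, which is easier to reason about and operate than trying to invalidate on every write.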
Async systems need “replay safety.”
Retries happen. Make handlers idempotent, encode dedupe, and expose backpressure before latency becomes an outage.
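A minimal sketch of a replay-safe handler, assuming every message carries a unique `id`. The `IdempotentHandler` class and the message shape are hypothetical, and the in-memory set stands in for a durable dedupe store. It covers idempotency and dedupe only, not backpressure.

```python
class IdempotentHandler:
    """Applies each message at most once, keyed by its message id."""

    def __init__(self):
        self.seen = set()   # stand-in for a durable dedupe store
        self.balance = 0

    def handle(self, message) -> bool:
        """Return True if the message was applied, False if it was a replay."""
        msg_id = message["id"]
        if msg_id in self.seen:
            return False    # redelivery or retry: safe to acknowledge and skip
        self.balance += message["amount"]
        self.seen.add(msg_id)
        return True


handler = IdempotentHandler()
payment = {"id": "msg-1", "amount": 50}
first = handler.handle(payment)    # applied
second = handler.handle(payment)   # replayed delivery, ignored
```

In a real system the state change and the dedupe record would need to be committed atomically. Otherwise a crash between the two steps reintroduces the double apply.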
SLOs prevent “alert-driven” engineering.
Define what matters to users, then pick signals that explain failures. Everything else becomes noise and burnout.
Consistency is a workflow, not a toggle.
Use invariants, audits, and repair tools. The safest distributed system assumes partial failure and drift.
Research papers I read
Papers that have shaped my understanding of distributed systems and AI.
Learning Projects
POCs to understand distributed systems and AI.
Writing
Notes on engineering trade-offs, reliability, and system design.
Reachable Here
Say hi!
I’m most effective on systems that need strong fundamentals, architectural thinking, and complex problem solving. I love building systems that are scalable, reliable, and efficient, and that solve real use cases at scale.