8+ years · Senior Backend Engineer

Ankit Maurya
Distributed systems, AI and architecture learnings

Backend engineer focused on solving complex problems by building scalable, reliable distributed systems. Experience across e-commerce, payments, and banking with high-throughput, low-latency production systems.
At Walmart, leading backend development for large-scale pricing and fulfillment systems. Built a distributed rule engine and re-architected the platform into scalable microservices handling millions of requests per minute.
Previously at Global Payments and Newgen, worked on real-time IVR automation and enterprise banking platforms, focusing on event-driven architectures, data systems, and reliability at scale.
Currently building production-grade GenAI applications and AI agents, integrating intelligent systems into backend architectures in a scalable and practical way.
Core focus on AI and distributed system design, scalability, and backend architecture, approaching problems from first principles with an emphasis on trade-offs, system behavior, and long-term maintainability.
Open to conversations around building and scaling meaningful software systems.

Contact

Distributed systems · Event-driven architecture · Reactive Systems · Kafka / Streaming · Cassandra / NoSQL · Caching strategies · Low latency systems · Fault tolerance · Observability · AI agents · RAG systems · LLMOps · Vector search · Prompt Engineering · Finetuning · Cloud (AWS, GCP) · Problem Solving · System Design · Data Structures

Engineering learnings from production

These insights are drawn from my production experience, shaped by working alongside some of the best engineering minds in the industry.

View all learnings

Kafka

Ordering · Throughput · Hot keys

Partitioning is a product decision.

Your key choice defines ordering guarantees, consumer parallelism, backfills, and what “correctness” means under load.
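
A minimal sketch of why the key matters, assuming a simple hash-mod partitioner. The names `partition_for` and `partition_load` are hypothetical, and CRC32 is only a stand-in: Kafka's default partitioner uses a different hash (murmur2) for keyed records, but the consequence is the same. Every record with a given key lands on one partition, which gives per-key ordering and also means a hot key becomes a hot partition.

```python
import zlib
from collections import Counter
from typing import Iterable


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition with a stand-in hash (CRC32).

    Illustrative only: this is not Kafka's actual hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_load(keys: Iterable[str], num_partitions: int) -> Counter:
    """Count how many records each partition would receive."""
    return Counter(partition_for(k, num_partitions) for k in keys)


if __name__ == "__main__":
    # One hot key plus a handful of ordinary keys.
    keys = ["store-42"] * 90 + [f"store-{i}" for i in range(10)]
    for partition, count in sorted(partition_load(keys, 6).items()):
        print(f"partition {partition}: {count} records")
```

With 90 of the 100 records sharing one key, at least 90 records land on a single partition no matter how many partitions or consumers exist. Adding consumers does not help; only changing the key does.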

Cassandra

Query-first · Compaction · Tombstones

Model for reads; pay for mistakes later.

Cassandra rewards predictable access paths and punishes “flexible queries” with hotspots, tombstones, and slow repairs.

Caching

Staleness · Stampedes · Invalidation

Cache correctness is a spectrum.

Choose bounded staleness and predictable failure modes over “perfect invalidation” that becomes operational debt.
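
One way to sketch bounded staleness with stampede protection, assuming a single process and an in-memory store. The class `BoundedStalenessCache` and its parameters are hypothetical names for illustration. Entries are served for at most `ttl_seconds`, and a per-key lock lets only one caller reload an expired key while the others wait for its result.

```python
import threading
import time
from typing import Any, Callable, Dict, Optional, Tuple


class BoundedStalenessCache:
    """TTL cache: values are at most ttl_seconds old; one loader per key."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Any]] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def _fresh(self, key: str) -> Optional[Tuple[float, Any]]:
        entry = self._entries.get(key)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry
        return None

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        entry = self._fresh(key)
        if entry is not None:
            return entry[1]
        # Only one caller per key runs the loader; the rest block here.
        with self._lock_for(key):
            entry = self._fresh(key)  # another caller may have refreshed it
            if entry is not None:
                return entry[1]
            value = loader()
            self._entries[key] = (self._clock(), value)
            return value
```

The failure mode is predictable: if the loader is slow, callers for that key wait, and nobody ever sees data older than the TTL. There is no invalidation path to get wrong.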

Pipelines

Retries · Idempotency · Backpressure

Async systems need “replay safety.”

Retries happen. Make handlers idempotent, encode dedupe, and expose backpressure before latency becomes an outage.
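
A minimal sketch of replay safety, assuming every message carries a stable unique ID. `PaymentLedger` and `try_enqueue` are hypothetical names, and the in-memory set stands in for a durable dedupe store. In a real system the dedupe record and the state change would need to commit atomically.

```python
import queue
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class PaymentLedger:
    """Applies each message at most once, keyed by message ID."""

    balances: Dict[str, int] = field(default_factory=dict)
    processed_ids: Set[str] = field(default_factory=set)

    def apply(self, message_id: str, account: str, amount_cents: int) -> bool:
        """Return True if applied, False if it was a duplicate delivery."""
        if message_id in self.processed_ids:
            return False  # a retry or replay: safe to ignore
        self.balances[account] = self.balances.get(account, 0) + amount_cents
        self.processed_ids.add(message_id)
        return True


def try_enqueue(work: "queue.Queue[Any]", item: Any) -> bool:
    """Expose backpressure: refuse new work instead of buffering without bound."""
    try:
        work.put_nowait(item)
        return True
    except queue.Full:
        return False  # caller can shed load, slow down, or retry later
```

Delivering the same message twice leaves the balance unchanged, and a bounded `queue.Queue(maxsize=...)` turns overload into an explicit signal at the producer rather than a growing latency tail.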

Reliability

SLOs · Error budgets · Triage

SLOs prevent “alert-driven” engineering.

Define what matters to users, then pick signals that explain failures. Everything else becomes noise and burnout.
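
The arithmetic behind an error budget can be sketched as follows, assuming a request-based availability SLO. The function names are hypothetical and the numbers in the usage note are illustrative, not taken from any real system.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of requests allowed to fail in the window under the SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo_target, total_requests)
    return (budget - failed_requests) / budget
```

For a 99.9% target over 1,000,000 requests, the budget is roughly 1,000 failed requests. After 250 failures about three quarters of it remains, which is a reason to keep shipping rather than to page someone.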

Data

Consistency · Reconciliation · Audits

Consistency is a workflow, not a toggle.

Use invariants, audits, and repair tools. The safest distributed system assumes partial failure and drift.
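
A small sketch of a reconciliation pass, assuming both sides can be read as key-to-value snapshots. The function `reconcile` and the record shapes are hypothetical; a production audit would page through the data and compare checksums rather than load everything into memory.

```python
from typing import Any, Dict, List


def reconcile(source: Dict[str, Any],
              replica: Dict[str, Any]) -> Dict[str, List[str]]:
    """Report drift between a source of truth and a derived copy."""
    return {
        # Present in the source but never arrived downstream.
        "missing": sorted(k for k in source if k not in replica),
        # Present downstream but no longer (or never) in the source.
        "unexpected": sorted(k for k in replica if k not in source),
        # Present in both, but the values disagree.
        "mismatched": sorted(
            k for k in source if k in replica and source[k] != replica[k]
        ),
    }
```

Each bucket maps to a different repair action (replay, delete, or overwrite), which is why the report keeps them separate instead of returning a single "out of sync" flag.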

Research papers I read

Papers that have shaped my understanding of distributed systems and AI.

View all papers

Learning Projects

POCs to understand distributed systems and AI.

View all projects

Writing

Notes on engineering trade-offs, reliability, and system design.

View all writing

Reachable Here

Say hi!

ankit.maurya01@zohomail.in

GitHub

github.com/AnkitKiet

LinkedIn

www.linkedin.com/in/maurya-ankit94

Quick note

I’m most effective on systems that need strong fundamentals, architecture thinking, and complex problem solving. I love building systems that are scalable, reliable, and efficient, and that solve real use cases at scale.
