<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Suman Prasad]]></title><description><![CDATA[This publication focuses on backend engineering, databases, system design, and concurrency, explaining complex computer science topics using real-world examples]]></description><link>https://blogs.sumanprasad.in</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1768984080964/9c059650-102f-4b4e-9bba-ad3aa539c7c7.png</url><title>Suman Prasad</title><link>https://blogs.sumanprasad.in</link></image><generator>RSS for Node</generator><lastBuildDate>Sat, 11 Apr 2026 01:44:07 GMT</lastBuildDate><atom:link href="https://blogs.sumanprasad.in/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Hypothetical Document Embeddings (HyDE): Smarter Retrieval in RAG]]></title><description><![CDATA[Most RAG systems work like this: Take user query → convert to embedding → search → generate answer
But here’s the issue: User queries are often too short, too vague, and missing context.
And because o]]></description><link>https://blogs.sumanprasad.in/hypothetical-document-embeddings-hyde-smarter-retrieval-in-rag</link><guid isPermaLink="true">https://blogs.sumanprasad.in/hypothetical-document-embeddings-hyde-smarter-retrieval-in-rag</guid><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[llm]]></category><category><![CDATA[RAG ]]></category><category><![CDATA[Machine Learning]]></category><dc:creator><![CDATA[Suman Prasad]]></dc:creator><pubDate>Tue, 31 Mar 2026 14:54:29 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/643e1a46c689b269c0df875c/6241fdd7-2c38-4816-abee-8ab31232f99e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most RAG systems work like this: Take user query → convert to embedding → search → generate answer</p>
<p>But here’s the issue: User queries are often too short, too vague, and missing context.</p>
<p>And because of that, retrieval is not always accurate. So what if, instead of searching with a weak query, we first <strong>expand it into a rich document</strong>?</p>
<p>That’s exactly what <strong>HyDE (Hypothetical Document Embeddings)</strong> does.</p>
<h2>What is HyDE?</h2>
<p>HyDE is a retrieval technique where we:</p>
<ul>
<li><p>Generate a <strong>hypothetical document</strong> from the user query</p>
</li>
<li><p>Convert that document into embeddings</p>
</li>
<li><p>Use it to search for better context</p>
</li>
</ul>
<h2>Where does HyDE fit in RAG?</h2>
<p>A typical RAG pipeline:</p>
<ol>
<li><p><strong>Indexing</strong> → Store documents as embeddings</p>
</li>
<li><p><strong>Retrieval</strong> → Find relevant data</p>
</li>
<li><p><strong>Generation</strong> → Produce answer</p>
</li>
</ol>
<p>Here, HyDE improves the <strong>retrieval step</strong>.</p>
<p>Instead of Query → Search, we do Query → Generate document → Search.</p>
<h2>How HyDE Works (Step-by-Step)</h2>
<p>Step 1: Generate a Hypothetical Document</p>
<p>We use the LLM’s internal knowledge to expand the query into a plausible answer document.</p>
<p>Step 2: Convert to Embeddings</p>
<p>Step 3: Perform Semantic Search</p>
<p>Since the input is rich, retrieval becomes more aligned and more meaningful.</p>
<p>Step 4: Generate Final Response</p>
<p>Now the model answers using the original query plus the high-quality retrieved context.</p>
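<p>The four steps can be sketched as a minimal function. The <code>llm</code>, <code>embed</code>, and <code>search</code> callables here are hypothetical stand-ins for your chat model, embedding API, and vector store:</p>
<pre><code class="language-python">def hyde_retrieve(query, llm, embed, search, top_k=3):
    """HyDE: search with a generated document instead of the raw query."""
    # Step 1: expand the query into a hypothetical answer document
    hypothetical_doc = llm(f"Write a short passage answering: {query}")
    # Step 2: embed the rich document, not the weak query
    doc_embedding = embed(hypothetical_doc)
    # Step 3: semantic search; the caller feeds the results to generation
    return search(doc_embedding, top_k)
</code></pre>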
<h2>Why HyDE Works So Well?</h2>
<p>In normal RAG, the query is short, which means weak embeddings and average retrieval.</p>
<p>In HyDE, the generated document is rich, which means strong embeddings and better retrieval.</p>
<p>Essentially, we are expanding the query, adding the hidden context, and improving semantic matching.</p>
<h2>When Should You Use HyDE?</h2>
<p>Use HyDE when queries are too short or vague, when the domain is complex, or when retrieval quality is inconsistent.</p>
<h2>Final Thought</h2>
<p>RAG is not just about storing embeddings.</p>
<p>It’s about <strong>how you search</strong>.</p>
<p>HyDE shifts the thinking from:</p>
<p>“Search what user said” to “Search what user <em>meant.</em>”</p>
<hr />
<p>If you found this useful, I write simple blogs on:</p>
<p>GenAI Systems, backend engineering, system design</p>
<p>Follow along to catch more.</p>
]]></content:encoded></item><item><title><![CDATA[Reciprocal Rank Fusion: Making RAG Retrieval Smarter]]></title><description><![CDATA[Most RAG systems follow a simple idea:
Take the user query → search similar data → generate response
But here’s the problem: what if the user query is incomplete or ambiguous?
You might retrieve:

par]]></description><link>https://blogs.sumanprasad.in/reciprocal-rank-fusion-making-rag-retrieval-smarter</link><guid isPermaLink="true">https://blogs.sumanprasad.in/reciprocal-rank-fusion-making-rag-retrieval-smarter</guid><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[llm]]></category><category><![CDATA[RAG ]]></category><category><![CDATA[Machine Learning]]></category><dc:creator><![CDATA[Suman Prasad]]></dc:creator><pubDate>Mon, 30 Mar 2026 17:05:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/643e1a46c689b269c0df875c/66dd1875-5b82-48c2-960b-d045e48d5264.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most RAG systems follow a simple idea:</p>
<p>Take the user query → search similar data → generate response</p>
<p>But here’s the problem: what if the <strong>user query is incomplete or ambiguous?</strong></p>
<p>You might retrieve:</p>
<ul>
<li><p>partially relevant data</p>
</li>
<li><p>or completely miss important context</p>
</li>
</ul>
<p>This is where <strong>Reciprocal Rank Fusion (RRF)</strong> comes in.</p>
<h2>What is Reciprocal Rank Fusion?</h2>
<p>Reciprocal Rank Fusion is a retrieval technique that <strong>combines results from multiple queries and ranks them intelligently</strong>. Instead of relying on just one query, we:</p>
<ul>
<li><p>Generate multiple variations of the same query</p>
</li>
<li><p>Retrieve documents for each variation</p>
</li>
<li><p>Rank documents based on their importance across all queries</p>
</li>
</ul>
<p>If a document appears frequently across different queries and ranks higher, it is probably more relevant.</p>
<h2>Where does RRF fit in RAG?</h2>
<p>A typical RAG pipeline has three steps:</p>
<ol>
<li><p><strong>Indexing</strong> → Store data as embeddings</p>
</li>
<li><p><strong>Retrieval</strong> → Find relevant data</p>
</li>
<li><p><strong>Generation</strong> → Produce final answer</p>
</li>
</ol>
<p>RRF is applied in the <strong>retrieval phase.</strong> Instead of one query → one retrieval, we do multiple queries → multiple retrievals → ranked fusion</p>
<h2>How RRF Works?</h2>
<p>Step 1: Generate Query Variations</p>
<p>We take the original user query and create similar versions.</p>
<p>Step 2: Parallel Retrieval</p>
<p>Each query runs independently.</p>
<p>Step 3: Rank Documents (Core of RRF)</p>
<p>Instead of merging blindly, we <strong>score documents based on rank positions</strong>.</p>
<p>RRF Formula: Score = ∑ (1 / (k + rank))</p>
<p>rank = the document’s position in each result list</p>
<p>k = a constant (usually 60) that dampens the influence of any single top rank</p>
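<p>The scoring step can be sketched in a few lines of Python. This is a toy sketch; <code>ranked_lists</code> is assumed to be the output of the parallel retrievals:</p>
<pre><code class="language-python">def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: Score = sum of 1 / (k + rank) across lists."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            # documents ranked high across many lists accumulate the most score
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three query variations produced slightly different rankings;
# "B" appears near the top everywhere, so it wins after fusion.
fused = rrf_fuse([["A", "B", "C"], ["B", "A", "D"], ["B", "C", "A"]])
</code></pre>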
<p>Step 4: Select Top Documents</p>
<p>Step 5: Generate Final Answer</p>
<img src="https://cdn.hashnode.com/uploads/covers/643e1a46c689b269c0df875c/e7a45ac7-de38-43fe-9b0b-b31ae4181c4e.png" alt="" style="display:block;margin:0 auto" />

<h2>Why RRF Improves Results?</h2>
<p>In normal RAG: One query → limited view → limited context</p>
<p>In RRF: Multiple perspectives → richer context → better answer</p>
<p>You are essentially:</p>
<ul>
<li><p>exploring different angles of the same question</p>
</li>
<li><p>merging the best information</p>
</li>
<li><p>prioritizing what matters most</p>
</li>
</ul>
<h2>Final Thought</h2>
<p>RAG is not just about embeddings.</p>
<p>It’s about <strong>how smart your retrieval is</strong>.</p>
<p>Techniques like these make retrieval smarter:</p>
<ul>
<li><p>Query decomposition (Chain of Thought)</p>
</li>
<li><p>Query expansion (RRF)</p>
</li>
</ul>
<hr />
<p>If you found this useful, I write simple blogs on:</p>
<p>GenAI Systems, backend engineering, system design</p>
<p>Follow along to catch more.</p>
]]></content:encoded></item><item><title><![CDATA[Chain of Thought in RAG: Making Queries Smarter, Not Harder]]></title><description><![CDATA[When building RAG systems, one common problem shows up quickly:
The user asks one big question... but the system struggles to retrieve the right context. Because most user queries are too abstract.
So]]></description><link>https://blogs.sumanprasad.in/chain-of-thought-in-rag-making-queries-smarter-not-harder</link><guid isPermaLink="true">https://blogs.sumanprasad.in/chain-of-thought-in-rag-making-queries-smarter-not-harder</guid><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[backend]]></category><category><![CDATA[llm]]></category><category><![CDATA[RAG ]]></category><category><![CDATA[generative ai]]></category><dc:creator><![CDATA[Suman Prasad]]></dc:creator><pubDate>Thu, 26 Mar 2026 17:08:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/643e1a46c689b269c0df875c/06d63c0b-3482-4a22-9f31-efc2272c3c74.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When building RAG systems, one common problem shows up quickly:</p>
<p>The user asks one big question... but the system struggles to retrieve the right context. Because most user queries are <strong>too abstract</strong>.</p>
<p>For example: “What are the steps involved in building a scalable and reliable distributed system?”</p>
<p>The system would perform much better if that query were broken into smaller, focused questions.</p>
<p>That's exactly where <strong>Chain of Thought (CoT)</strong> comes in.</p>
<h2>What is Chain of Thought (CoT)?</h2>
<p>Chain of Thought is a technique where a complex query is broken into smaller, logical steps, and each step is processed one after another.</p>
<p>Instead of solving everything in one go, the system:</p>
<ul>
<li><p>breaks the query</p>
</li>
<li><p>solves each part</p>
</li>
<li><p>uses previous results as context</p>
</li>
<li><p>gradually builds a better answer</p>
</li>
</ul>
<p>Instead of jumping to the final answer instantly, reason step by step.</p>
<h2>Why CoT Matters in RAG Systems</h2>
<p>A typical RAG pipeline has three main steps:</p>
<ol>
<li><p>Indexing → storing data as embeddings</p>
</li>
<li><p>Retrieval → fetching relevant information</p>
</li>
<li><p>Generation → producing the final answer</p>
</li>
</ol>
<p>Usually, generation is not the problem; retrieval is.</p>
<p>If the query is vague or broad, the retrieval step returns:</p>
<ul>
<li><p>weak context</p>
</li>
<li><p>irrelevant chunks</p>
</li>
<li><p>incomplete information</p>
</li>
</ul>
<p>And the final answer suffers.</p>
<p>This is the place where Chain of Thought helps.</p>
<h2>Where Does CoT Fit in RAG?</h2>
<p>CoT is applied in the retrieval stage.</p>
<p>Instead of sending one large query to the vector database, we:</p>
<ol>
<li><p>Break the query into sub-queries</p>
</li>
<li><p>Process them sequentially</p>
</li>
<li><p>Use previous outputs to improve the next retrieval</p>
</li>
</ol>
<p>So instead of:</p>
<img src="https://cdn.hashnode.com/uploads/covers/643e1a46c689b269c0df875c/8c10e802-5823-44a8-bd67-11886605d239.png" alt="" style="display:block;margin:0 auto" />

<p>We do:</p>
<img src="https://cdn.hashnode.com/uploads/covers/643e1a46c689b269c0df875c/b9ae9695-ad83-488e-8ced-6ac8e5b16f19.png" alt="" style="display:block;margin:0 auto" />

<h2>How Chain of Thought Works (Step by Step)</h2>
<h3>Step 1: Break the query</h3>
<p>Example: User Query → How does a scalable RAG system handle large traffic and ensure accurate responses?</p>
<p>The LLM breaks it into:</p>
<ol>
<li><p>What is scalability in RAG systems?</p>
</li>
<li><p>How does retrieval work in RAG?</p>
</li>
<li><p>How do we improve retrieval accuracy?</p>
</li>
</ol>
<h3>Step 2: Process First Sub-query</h3>
<p>Take the first sub-query</p>
<ul>
<li><p>generate embeddings</p>
</li>
<li><p>perform a semantic search</p>
</li>
<li><p>retrieve relevant chunks</p>
</li>
<li><p>generate a response</p>
</li>
</ul>
<h3>Step 3: Pass Context Forward</h3>
<p>The output of <strong>step 1</strong> <strong>becomes the context for step 2.</strong> So instead of starting fresh every time, the system builds knowledge step by step.</p>
<h3>Step 4: Repeat Sequentially</h3>
<p>Continue this process:</p>
<ul>
<li><p>Each sub-query uses previous responses</p>
</li>
<li><p>Context becomes richer</p>
</li>
<li><p>Retrieval becomes more precise</p>
</li>
</ul>
<h3>Step 5: Final Output</h3>
<p>The response generated from the last sub-query becomes the final answer.</p>
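<p>The sequential flow above can be sketched like this. The <code>llm</code>, <code>embed</code>, and <code>search</code> helpers are hypothetical stand-ins for your chat model, embedding API, and vector store:</p>
<pre><code class="language-python">def chained_retrieval(sub_queries, llm, embed, search):
    """Process sub-queries one by one, passing each answer forward."""
    context = ""
    answer = ""
    for sub_query in sub_queries:
        # each sub-query is searched together with what we learned so far
        chunks = search(embed(context + " " + sub_query))
        answer = llm(f"Context: {context}\nChunks: {chunks}\nQuestion: {sub_query}")
        context += " " + answer  # the answer becomes context for the next step
    return answer  # the response to the last sub-query is the final answer
</code></pre>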
<h2>Example With Code</h2>
<pre><code class="language-python">import os
import json
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Groq exposes an OpenAI-compatible endpoint, so the OpenAI client works here
client = OpenAI(
    api_key=os.getenv("GROQ_API_KEY"),
    base_url="https://api.groq.com/openai/v1"
)

system_prompt = """
You are an AI assistant who is an expert in breaking down complex problems and then resolving the user query.
For the given user input, analyse the input and break down the problem step by step.
Think of at least 5-6 steps for how to solve the problem before solving it.
The steps are: you get a user input, you analyse, you think, you think again several times, you return an output with an explanation, and finally you validate the output before giving the final result.
Follow the steps in sequence: "analyse", "think", "output", "validate" and finally "result".

Rules:
1. Follow the strict JSON output as per the Output Format.
2. Always perform one step at a time and wait for the next input.
3. Carefully analyse the user query.

Output Format:
{ "step": "string", "content": "string" }

Example:
Input: What is 2 + 2?
Output: { "step": "analyse", "content": "Alright! The user is asking a basic arithmetic question." }
Output: { "step": "think", "content": "To perform the addition I must go from left to right and add all the operands." }
Output: { "step": "output", "content": "4" }
Output: { "step": "validate", "content": "Seems like 4 is the correct answer for 2 + 2." }
Output: { "step": "result", "content": "2 + 2 = 4, and that is calculated by adding all the numbers." }
"""

messages = [
    {"role": "system", "content": system_prompt},
]
query = input("&gt; ")
messages.append({"role": "user", "content": query})

# keep asking the model for the next reasoning step until it produces "output"
while True:
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        response_format={"type": "json_object"},
        messages=messages
    )
    parsed_response = json.loads(response.choices[0].message.content)
    messages.append({"role": "assistant", "content": json.dumps(parsed_response)})
    if parsed_response.get("step") != "output":
        print(f"{parsed_response.get('step')}: {parsed_response.get('content')}")
        continue
    print(f"output: {parsed_response.get('content')}")
    break
</code></pre>
<pre><code class="language-python">&gt; what is 2 + 10 * 5
analyse: Alright! The user is interested in maths query and has a query with a mix of addition and multiplication operation with a BODMAS(Brackets,Order, Division, Multiplication, Addition, Subtraction) application required, where the first non bracketed operator is multiplication.
think: To solve this, I need to follow the BODMAS rule and perform operations from left to right. First, I will perform the multiplication and then add the result to 2. I need to consider 10 being multiplied by 5 to get the correct intermediate result before moving to next operation which is addition with 2.
think: First, I must consider multiplication. 10 * 5 = 50, which is the result of the first operation in the given expression. Now, I proceed to the addition of result 50 and 2.
output: 52
</code></pre>
<h2>Why This Improves Accuracy</h2>
<p>It reduces abstraction. Instead of asking one vague question, it focuses on smaller parts, retrieves focused information, and builds a layered context, which leads to better semantic search, more relevant chunks, and higher-quality responses.</p>
<h2>When Should You Use CoT in RAG?</h2>
<p>Chain of Thought is useful when queries are complex, questions involve multiple steps, context needs to be built gradually, and retrieval quality is poor.</p>
<h2>Final Thoughts</h2>
<p>RAG systems don't fail because generation is weak. They fail because retrieval is shallow.</p>
<p>Chain of Thought fixes this by:</p>
<ul>
<li><p>breaking queries into meaningful steps</p>
</li>
<li><p>increasing context depth</p>
</li>
<li><p>improving semantic retrieval</p>
</li>
</ul>
<hr />
<p>If you found this useful, I write simple blogs on:</p>
<p>GenAI Systems, backend engineering, system design</p>
<p>Follow along to catch more.</p>
]]></content:encoded></item><item><title><![CDATA[Understanding Maintainability in System Design]]></title><description><![CDATA[When people talk about system design, the conversation usually focuses on scalability or reliability.
But in reality, many systems don’t fail because they cannot scale.They fail because they become to]]></description><link>https://blogs.sumanprasad.in/understanding-maintainability-in-system-design</link><guid isPermaLink="true">https://blogs.sumanprasad.in/understanding-maintainability-in-system-design</guid><category><![CDATA[System Design]]></category><category><![CDATA[software development]]></category><category><![CDATA[backend]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[software]]></category><dc:creator><![CDATA[Suman Prasad]]></dc:creator><pubDate>Sun, 01 Mar 2026 04:39:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/643e1a46c689b269c0df875c/0f034284-4bc5-4df9-80f5-1c86a5d91b1e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When people talk about system design, the conversation usually focuses on <strong>scalability</strong> or <strong>reliability</strong>.</p>
<p>But in reality, many systems don’t fail because they cannot scale.<br />They fail because they become <strong>too painful to maintain</strong>.</p>
<p>A system that is hard to understand, modify, or operate will eventually slow down development and create constant operational problems.</p>
<p>This is why <strong>maintainability</strong> is a critical property of well-designed systems.</p>
<h2>Why Maintainability Matters</h2>
<p>Most of the cost of software does not come from writing the first version.</p>
<p>It comes from <strong>maintaining it over time</strong>.</p>
<p>Maintenance includes many things:</p>
<ul>
<li><p>fixing bugs</p>
</li>
<li><p>adding new features</p>
</li>
<li><p>upgrading dependencies</p>
</li>
<li><p>improving performance</p>
</li>
<li><p>adapting to new requirements</p>
</li>
<li><p>keeping the system running smoothly</p>
</li>
</ul>
<p>In many organizations, engineers spend <strong>far more time maintaining systems than building them from scratch</strong>.</p>
<p>This is why a system that is easy to maintain can significantly improve long-term productivity.</p>
<h2>The Legacy System Problem</h2>
<p>Many developers have experienced working on what people often call a <strong>legacy system</strong>.</p>
<p>These systems usually have characteristics like:</p>
<ul>
<li><p>complicated code structure</p>
</li>
<li><p>poor documentation</p>
</li>
<li><p>unclear design decisions</p>
</li>
<li><p>tightly coupled components</p>
</li>
</ul>
<p>Because of this, even small changes become risky.</p>
<p>Developers often hesitate to modify such systems because a simple update might break something unexpected.</p>
<p>The goal of good system design is to <strong>avoid creating tomorrow’s legacy system today</strong>.</p>
<h2>Three Principles of Maintainable Systems</h2>
<p>Maintainable systems usually share three key qualities:</p>
<ol>
<li><p><strong>Operability</strong></p>
</li>
<li><p><strong>Simplicity</strong></p>
</li>
<li><p><strong>Evolvability</strong></p>
</li>
</ol>
<p>Together, these principles make systems easier to run, understand, and modify.</p>
<h2>Operability: Making Systems Easy to Run</h2>
<p>Software does not run itself.</p>
<p>Operations teams (or DevOps engineers) are responsible for keeping systems healthy and running smoothly.</p>
<p>Their tasks include:</p>
<ul>
<li><p>monitoring system health</p>
</li>
<li><p>diagnosing failures</p>
</li>
<li><p>managing deployments</p>
</li>
<li><p>applying security patches</p>
</li>
<li><p>planning infrastructure capacity</p>
</li>
<li><p>handling configuration changes</p>
</li>
</ul>
<p>If a system is difficult to operate, even small issues can take hours to diagnose.</p>
<p>Good systems, therefore, aim to make the <strong>daily life of operators easier</strong>.</p>
<h2>How Systems Improve Operability</h2>
<p>A well-designed system usually provides several operational capabilities.</p>
<h3>Monitoring and Metrics</h3>
<p>Operators need visibility into system behavior.</p>
<p>Typical metrics include:</p>
<ul>
<li><p>error rates</p>
</li>
<li><p>request latency</p>
</li>
<li><p>CPU and memory usage</p>
</li>
<li><p>traffic patterns</p>
</li>
</ul>
<p>Monitoring allows teams to detect problems early before users are affected.</p>
<h3>Automation Support</h3>
<p>Routine operational tasks should be automated.</p>
<p>Examples include:</p>
<ul>
<li><p>deployments</p>
</li>
<li><p>scaling infrastructure</p>
</li>
<li><p>backups</p>
</li>
<li><p>system recovery</p>
</li>
</ul>
<p>Automation reduces human error and improves consistency.</p>
<h3>Safe Maintenance</h3>
<p>Sometimes operators must take machines offline for maintenance.</p>
<p>A well-designed system should allow this without affecting users.</p>
<p>This often requires <strong>load balancing and redundancy</strong> so traffic can be redirected.</p>
<h3>Good Documentation</h3>
<p>Clear documentation helps teams understand:</p>
<ul>
<li><p>How the system works</p>
</li>
<li><p>How to troubleshoot problems</p>
</li>
<li><p>How to deploy new versions</p>
</li>
</ul>
<p>Without documentation, even simple operational tasks become difficult.</p>
<h2>Simplicity: Managing Complexity</h2>
<p>As systems grow, complexity naturally increases.</p>
<p>More services are added.<br />More dependencies appear.<br />More interactions occur between components.</p>
<p>If complexity is not controlled, the system eventually becomes very difficult to understand.</p>
<p>Engineers sometimes describe such systems as a <strong>"big ball of mud"</strong>.</p>
<h2>Signs of a Complex System</h2>
<p>A system suffering from excessive complexity often shows patterns like:</p>
<ul>
<li><p>tightly coupled modules</p>
</li>
<li><p>tangled dependencies</p>
</li>
<li><p>inconsistent naming</p>
</li>
<li><p>excessive special cases</p>
</li>
<li><p>hidden assumptions in code</p>
</li>
</ul>
<p>These issues make systems fragile and slow down development.</p>
<h2>Essential vs Accidental Complexity</h2>
<p>Not all complexity is bad.</p>
<p>There are two types of complexity in software systems.</p>
<h3>Essential Complexity</h3>
<p>This comes from the actual problem the system is trying to solve.</p>
<p>For example:</p>
<ul>
<li><p>handling financial transactions</p>
</li>
<li><p>managing user authentication</p>
</li>
<li><p>processing large datasets</p>
</li>
</ul>
<p>This complexity cannot be removed.</p>
<h3>Accidental Complexity</h3>
<p>This comes from poor design decisions.</p>
<p>Examples include:</p>
<ul>
<li><p>unnecessary abstractions</p>
</li>
<li><p>confusing APIs</p>
</li>
<li><p>poorly structured code</p>
</li>
<li><p>duplicated logic</p>
</li>
</ul>
<p>The goal of good system design is to <strong>reduce accidental complexity as much as possible</strong>.</p>
<h2>Abstraction: The Key Tool for Simplicity</h2>
<p>One of the most powerful tools for managing complexity is <strong>abstraction</strong>.</p>
<p>Abstraction hides internal implementation details and exposes a simpler interface.</p>
<p>We see this concept everywhere in software.</p>
<p>Examples include:</p>
<ul>
<li><p>Programming languages hiding machine instructions</p>
</li>
<li><p>SQL hiding low-level storage details</p>
</li>
<li><p>APIs hiding internal service logic</p>
</li>
</ul>
<p>By hiding complexity, abstraction makes systems easier to understand and maintain.</p>
<p>However, designing good abstractions requires careful thinking.</p>
<p>Poor abstractions can sometimes create even more complexity.</p>
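<p>As a tiny illustration (a hypothetical class, not tied to any real library), callers of this store see only <code>save</code> and <code>load</code>; the serialization and storage choices stay hidden and can change later without breaking them:</p>
<pre><code class="language-python">import json

class KeyValueStore:
    """Callers see only save/load; the storage details are hidden behind them."""
    def __init__(self):
        self._data = {}                      # could become a file, Redis, SQL...

    def save(self, key, value):
        self._data[key] = json.dumps(value)  # serialization is an internal detail

    def load(self, key):
        return json.loads(self._data[key])
</code></pre>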
<h2>Evolvability: Designing Systems That Can Change</h2>
<p>One thing is certain in software development.</p>
<p><strong>Requirements will change.</strong></p>
<p>Over time, systems must adapt to:</p>
<ul>
<li><p>new user needs</p>
</li>
<li><p>evolving business goals</p>
</li>
<li><p>growing datasets</p>
</li>
<li><p>new technologies</p>
</li>
<li><p>regulatory requirements</p>
</li>
</ul>
<p>A system that cannot adapt easily becomes obsolete.</p>
<p>This ability to adapt is called <strong>evolvability</strong>.</p>
<h2>Code-Level vs System-Level Change</h2>
<p>Many development practices improve changeability at the code level.</p>
<p>Examples include:</p>
<ul>
<li><p>refactoring</p>
</li>
<li><p>automated testing</p>
</li>
<li><p>test-driven development</p>
</li>
</ul>
<p>These techniques make it easier to modify small pieces of code safely.</p>
<p>However, large systems also need to evolve at the <strong>architectural level</strong>.</p>
<p>For example, a company might redesign how data is stored or how services communicate.</p>
<p>Such changes require systems that are <strong>simple and modular enough to evolve</strong>.</p>
<hr />
<h2>Why Simplicity Enables Evolvability</h2>
<p>Simplicity and evolvability are closely connected.</p>
<p>Simple systems are easier to:</p>
<ul>
<li><p>understand</p>
</li>
<li><p>modify</p>
</li>
<li><p>extend</p>
</li>
<li><p>debug</p>
</li>
</ul>
<p>Complex systems make changes risky because engineers cannot easily predict the consequences.</p>
<p>This is why simplicity is one of the strongest foundations for long-term maintainability.</p>
<h2>Final Thoughts</h2>
<p>Building software is only the beginning of a system’s lifecycle.</p>
<p>The real challenge lies in <strong>running and evolving that system over time</strong>.</p>
<p>Maintainable systems are built with three goals in mind:</p>
<ul>
<li><p>making operations manageable</p>
</li>
<li><p>keeping system design simple</p>
</li>
<li><p>enabling future changes</p>
</li>
</ul>
<p>When systems achieve these qualities, they remain productive and adaptable even as requirements grow and technology evolves.</p>
<h2>Series Summary</h2>
<p>This concludes the three-part series on important properties of well-designed systems:</p>
<p>1️⃣ <a href="https://blogs.sumanprasad.in/why-systems-fail-and-how-reliable-systems-survive"><strong>Reliability</strong></a> — systems that continue working despite faults<br />2️⃣ <a href="https://blogs.sumanprasad.in/understanding-scalability-in-system-design"><strong>Scalability</strong></a> — systems that handle growing demand<br />3️⃣ <strong>Maintainability</strong> — systems that remain easy to operate and evolve</p>
<p>Together, these principles form the foundation of strong system design.</p>
]]></content:encoded></item><item><title><![CDATA[Understanding Scalability in System Design]]></title><description><![CDATA[Modern systems rarely fail because they are badly written.
More often, they fail because they cannot handle growth.
A system that works perfectly with 1,000 users may completely collapse when the user]]></description><link>https://blogs.sumanprasad.in/understanding-scalability-in-system-design</link><guid isPermaLink="true">https://blogs.sumanprasad.in/understanding-scalability-in-system-design</guid><category><![CDATA[System Design]]></category><category><![CDATA[Backend Engineering]]></category><category><![CDATA[distributed systems]]></category><category><![CDATA[scalability]]></category><dc:creator><![CDATA[Suman Prasad]]></dc:creator><pubDate>Sat, 21 Feb 2026 16:13:17 GMT</pubDate><enclosure url="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/643e1a46c689b269c0df875c/58c573f6-3e3a-42c5-a030-895f2e45ebc2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Modern systems rarely fail because they are badly written.</p>
<p>More often, they fail because they <strong>cannot handle growth.</strong></p>
<p>A system that works perfectly with 1,000 users may completely collapse when the user base reaches 1 million. This is why <strong>scalability</strong> becomes one of the most important topics in system design.</p>
<p>In this article, we'll understand what scalability means, how engineers measure system load, and the different ways systems grow to handle increasing demand.</p>
<h2>What is Scalability?</h2>
<p>In simple terms, <strong>scalability is the ability of a system to handle growth.</strong></p>
<p>Growth can happen in different ways:</p>
<ul>
<li><p>More users are joining the system.</p>
</li>
<li><p>More requests are being sent to the system.</p>
</li>
<li><p>More data is being stored.</p>
</li>
<li><p>Higher traffic during peak hours.</p>
</li>
</ul>
<p>A scalable system should be able to <strong>continue performing well even when demand increases.</strong></p>
<p>For example:</p>
<p>Imagine an e-commerce platform during a festive sale.</p>
<p>If the number of users suddenly increases from <strong>10,000 to 1 million</strong>, the system should still:</p>
<ul>
<li><p>process orders</p>
</li>
<li><p>update inventory</p>
</li>
<li><p>show product pages quickly</p>
</li>
</ul>
<p>If it cannot handle this increase, the system is <strong>not scalable.</strong></p>
<h2>Understanding System Load</h2>
<p>Before we talk about scaling, we must first understand <strong>what kind of load the system is handling.</strong></p>
<p>Different systems measure load in different ways.</p>
<p>Some common load parameters include:</p>
<ul>
<li><p>Requests per second</p>
</li>
<li><p>Number of active users</p>
</li>
<li><p>Read vs write operations</p>
</li>
<li><p>Cache hit rate</p>
</li>
<li><p>Amount of stored data</p>
</li>
</ul>
<p>These metrics help engineers <strong>understand where the system is under pressure.</strong></p>
<p>For example:</p>
<ul>
<li><p>A video streaming platform may focus on <strong>bandwidth and concurrent users</strong></p>
</li>
<li><p>A messaging app may care about <strong>messages per second</strong></p>
</li>
<li><p>A social network may track <strong>timeline requests per second</strong></p>
</li>
</ul>
<p>Understanding the correct metric is the <strong>first step toward designing scalable systems.</strong></p>
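<p>As a small illustration, “requests per second” can be measured with a sliding window. This is a toy sketch, not a production monitoring tool:</p>
<pre><code class="language-python">import time
from collections import deque

class RequestRateTracker:
    """Count how many requests arrived within the last `window` seconds."""
    def __init__(self, window=1.0):
        self.window = window
        self.timestamps = deque()

    def record(self, now=None):
        now = time.monotonic() if now is None else now
        self.timestamps.append(now)
        self._evict(now)

    def rate(self, now=None):
        now = time.monotonic() if now is None else now
        self._evict(now)
        return len(self.timestamps)

    def _evict(self, now):
        # drop timestamps that have fallen out of the window
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
</code></pre>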
<h2>Real-World Example: Twitter (Now X) Timeline Problem</h2>
<p>One of the most famous scalability challenges comes from social media platforms like Twitter.</p>
<p>Two common operations happen on such systems:</p>
<ul>
<li><p>Posting a tweet</p>
</li>
<li><p>Viewing a user's timeline</p>
</li>
</ul>
<p>At first glance, posting tweets seems simple. But the real challenge lies in <strong>distributing that tweet to millions of followers</strong>.</p>
<p>Let’s look at two different ways to design the timeline system.</p>
<h2>Approach 1: Compute Timeline When User Reads</h2>
<p>In this design, the system calculates the timeline <strong>only when the user opens it</strong>.</p>
<p>Steps:</p>
<ol>
<li><p>Find all users the current user follows</p>
</li>
<li><p>Fetch their recent tweets</p>
</li>
<li><p>Merge and sort them</p>
</li>
</ol>
<p>Example query:</p>
<pre><code class="language-sql">SELECT tweets.*, users.*
FROM tweets
JOIN users ON tweets.sender_id = users.id
JOIN follows ON follows.followee_id = users.id
WHERE follows.follower_id = current_user
</code></pre>
<h3>Advantage</h3>
<p>Writes are cheap because the system only stores tweets once.</p>
<h3>Problem</h3>
<p>Reads become very expensive.</p>
<p>If millions of users open their timelines at the same time, the system must run <strong>millions of complex queries</strong>.</p>
<p>This approach struggles when the <strong>read traffic is extremely high</strong>.</p>
<h2>Approach 2: Precompute Timeline When Tweet is Posted</h2>
<p>Another design flips the logic.</p>
<p>Instead of computing timelines when users read them, the system prepares the timeline <strong>when a tweet is created</strong>.</p>
<p>Steps:</p>
<ol>
<li><p>User posts a tweet</p>
</li>
<li><p>The system copies that tweet into the timeline of each follower</p>
</li>
</ol>
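<p>The fan-out-on-write steps above can be sketched in a few lines of Python. This is only an illustration: the in-memory <code>followers</code> and <code>timelines</code> stores stand in for real database tables.</p>
<pre><code class="language-python">from collections import defaultdict

# Hypothetical in-memory stores standing in for real database tables
followers = defaultdict(set)   # author_id maps to the set of follower ids
timelines = defaultdict(list)  # user_id maps to their materialized timeline

def follow(follower_id, followee_id):
    followers[followee_id].add(follower_id)

def post_tweet(author_id, text):
    # Fan-out on write: copy the tweet into every follower's timeline
    tweet = {"author": author_id, "text": text}
    for follower_id in followers[author_id]:
        timelines[follower_id].insert(0, tweet)

def read_timeline(user_id):
    # Reads are cheap: the timeline is already materialized
    return timelines[user_id]
</code></pre>
<p>Notice where the cost moved: <code>post_tweet</code> now does one write per follower, while <code>read_timeline</code> is a simple lookup.</p>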
<p>Now when a user opens their timeline, the data is <strong>already prepared</strong>.</p>
<h3>Advantage</h3>
<p>Reading timelines becomes extremely fast.</p>
<h3>Problem</h3>
<p>Writes become expensive.</p>
<p>Imagine:</p>
<ul>
<li><p>Average user has <strong>75 followers</strong></p>
</li>
<li><p>If <strong>4,000 tweets are posted per second</strong></p>
</li>
</ul>
<p>The system now performs:</p>
<pre><code class="language-bash">4,000 × 75 = 300,000 writes per second
</code></pre>
<p>For celebrities with <strong>millions of followers</strong>, a single tweet could generate <strong>millions of database writes</strong>.</p>
<h2>Hybrid Design Used in Practice</h2>
<p>Real systems rarely use just one approach.</p>
<p>Instead, they combine both.</p>
<p>Typical strategy:</p>
<ul>
<li><p><strong>Normal users → Fan-out on write</strong></p>
</li>
<li><p><strong>Celebrities → Compute on read</strong></p>
</li>
</ul>
<p>This reduces the load caused by huge follower counts.</p>
<p>The key takeaway here is:</p>
<p><strong>Scalability solutions depend heavily on usage patterns.</strong></p>
<p>There is no universal design that works for every system.</p>
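<p>The hybrid read path can be sketched roughly like this. The helpers and data shapes here are hypothetical, but the idea is real: the precomputed timeline (fan-out on write for normal users) is merged with celebrity tweets fetched at read time.</p>
<pre><code class="language-python">def build_timeline(user, precomputed, celebrity_feeds):
    """Merge the user's precomputed timeline with tweets from
    followed celebrities, which are fetched at read time."""
    merged = list(precomputed)
    for celeb in user["followed_celebrities"]:
        merged.extend(celebrity_feeds.get(celeb, []))
    # Sort newest first by timestamp
    return sorted(merged, key=lambda t: t["ts"], reverse=True)
</code></pre>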
<h2>Measuring System Performance</h2>
<blockquote>
<p>When traffic increases, engineers usually ask two questions.</p>
<h3>Question 1</h3>
<p>If load increases but resources stay the same,<br /><strong>how does system performance change?</strong></p>
<h3>Question 2</h3>
<p>If load increases,<br /><strong>how many additional resources are needed?</strong></p>
<p>To answer these questions, we measure system performance using two main metrics.</p>
</blockquote>
<h2>Throughput</h2>
<p>Throughput measures <strong>how much work the system can process</strong>.</p>
<p>Example:</p>
<ul>
<li><p>records processed per second</p>
</li>
<li><p>tasks completed per minute</p>
</li>
</ul>
<p>Throughput is commonly used in <strong>batch processing systems</strong> like data pipelines.</p>
<h2>Response Time</h2>
<p>Response time measures <strong>how long a user waits for a response</strong>.</p>
<p>This includes:</p>
<ul>
<li><p>processing time</p>
</li>
<li><p>network delay</p>
</li>
<li><p>waiting in queues</p>
</li>
</ul>
<p>In most web systems, response time is the <strong>most important user-facing metric</strong>.</p>
<h2>Latency vs Response Time</h2>
<p>People often mix these terms, but they are slightly different.</p>
<p><strong>Latency</strong></p>
<p>The time a request waits before processing starts.</p>
<p><strong>Response Time</strong></p>
<p>Total time from request to response.</p>
<pre><code class="language-bash">Response Time = Latency + Processing Time + Network Delay
</code></pre>
<p>Users care about <strong>response time</strong>, because that represents how long they actually wait.</p>
<h2>Why Average Response Time is Misleading</h2>
<p>Many engineers make the mistake of measuring <strong>average response time</strong>.</p>
<p>But averages hide slow requests.</p>
<p>Example:</p>
<p>If most requests take <strong>100 ms</strong> but a few take <strong>5 seconds</strong>, the average may still look fine.</p>
<p>However, those slow requests create a <strong>bad user experience</strong>.</p>
<p>This is why engineers rely on <strong>percentiles</strong>.</p>
<h2>Understanding Percentiles</h2>
<p>Percentiles show how slow the worst requests are.</p>
<p>Common metrics include:</p>
<table style="min-width:50px"><colgroup><col style="min-width:25px"></col><col style="min-width:25px"></col></colgroup><tbody><tr><td><p>Percentile</p></td><td><p>Meaning</p></td></tr><tr><td><p>50th</p></td><td><p>Median response time</p></td></tr><tr><td><p>95th</p></td><td><p>Slow requests</p></td></tr><tr><td><p>99th</p></td><td><p>Very slow edge cases</p></td></tr></tbody></table>

<p>Large tech companies often monitor the <strong>99th percentile latency</strong> to ensure even rare slow requests are under control.</p>
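<p>A tiny Python sketch makes the contrast concrete. With 98 fast requests and 2 very slow ones, the average looks acceptable while the 99th percentile exposes the slow tail (the nearest-rank method used here is one of several ways to compute percentiles):</p>
<pre><code class="language-python">def percentile(latencies_ms, p):
    """Return the p-th percentile using the nearest-rank method."""
    ordered = sorted(latencies_ms)
    rank = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[rank]

# 98 fast requests and 2 very slow ones
samples = [100] * 98 + [5000, 5000]
mean = sum(samples) / len(samples)  # 198 ms: looks fine
p50 = percentile(samples, 50)       # 100 ms
p99 = percentile(samples, 99)       # 5000 ms: the tail the average hides
</code></pre>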
<h2>The Tail Latency Problem</h2>
<p>Modern systems often depend on multiple services.</p>
<p>Example:</p>
<p>A single request may involve:</p>
<ul>
<li><p>authentication service</p>
</li>
<li><p>recommendation engine</p>
</li>
<li><p>database</p>
</li>
<li><p>payment service</p>
</li>
</ul>
<p>The overall response must wait for <strong>the slowest service</strong>.</p>
<p>This problem is known as <strong>tail latency amplification</strong>.</p>
<p>Even if most services are fast, one slow component can delay the entire request.</p>
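<p>A little arithmetic shows why this amplification happens. If each backend call is fast 99% of the time, a request that fans out to several backends is much more likely to hit at least one slow call:</p>
<pre><code class="language-python">def prob_hits_slow_call(num_services, p_fast=0.99):
    """Probability that a request touching num_services backends
    experiences at least one call slower than each service's p99."""
    return 1 - p_fast ** num_services

# With 5 backends, roughly 5% of requests hit a p99-slow call;
# with 100 backend calls, roughly 63% of requests do.
</code></pre>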
<h2>Methods to Handle Growing Load</h2>
<p>Once engineers understand the load, they decide <strong>how to scale the system</strong>.</p>
<p>There are two common approaches.</p>
<h3>Vertical Scaling (Scale Up)</h3>
<p>This means upgrading a machine with more resources.</p>
<p>Example:</p>
<ul>
<li><p>more CPU</p>
</li>
<li><p>more RAM</p>
</li>
<li><p>faster disks</p>
</li>
</ul>
<p><strong>Advantages</strong></p>
<p>Simple to implement.</p>
<p><strong>Limitations</strong></p>
<p>Machines cannot grow infinitely.<br />Eventually, hardware upgrades become extremely expensive.</p>
<h3>Horizontal Scaling (Scale Out)</h3>
<p>Instead of upgrading one machine, the system <strong>adds more machines</strong>.</p>
<p>The workload is distributed across multiple servers.</p>
<p>This architecture is often called <strong>shared-nothing architecture</strong>, because each machine works independently.</p>
<p><strong>Advantages</strong></p>
<p>Can support very large systems.</p>
<p><strong>Challenges</strong></p>
<p>More operational complexity.</p>
<h3>Hybrid Scaling in Real Systems</h3>
<p>Most real systems combine both strategies.</p>
<p>For example:</p>
<ul>
<li><p>a few powerful machines</p>
</li>
<li><p>combined with distributed clusters</p>
</li>
</ul>
<p>This allows systems to handle <strong>both heavy workloads and large data volumes</strong>.</p>
<h3>Elastic Scaling vs Manual Scaling</h3>
<p>Scaling can happen automatically or manually.</p>
<p><strong>Elastic Scaling</strong></p>
<p>Infrastructure automatically adds or removes servers depending on traffic.</p>
<p>Common in cloud platforms.</p>
<p><strong>Manual Scaling</strong></p>
<p>Engineers decide when to add servers.</p>
<p>Simpler but slower to respond to sudden traffic spikes.</p>
<h3>Stateless vs Stateful Systems</h3>
<p>Scaling also depends on whether a service is <strong>stateless or stateful</strong>.</p>
<p><strong>Stateless Services</strong></p>
<p>These services do not store user data locally.</p>
<p>Examples:</p>
<ul>
<li><p>API servers</p>
</li>
<li><p>web servers</p>
</li>
</ul>
<p>They are easy to scale — just add more instances.</p>
<p><strong>Stateful Systems</strong></p>
<p>These store persistent data.</p>
<p>Examples:</p>
<ul>
<li><p>databases</p>
</li>
<li><p>storage systems</p>
</li>
</ul>
<p>Scaling them requires <strong>data partitioning or replication</strong>, which adds complexity.</p>
<h2>Final Thoughts</h2>
<p>Scalability is not about predicting the future perfectly.<br />It is about <strong>designing systems that can grow when needed</strong>.</p>
<p>Good scalable systems start by understanding:</p>
<ul>
<li><p>system load</p>
</li>
<li><p>usage patterns</p>
</li>
<li><p>performance metrics</p>
</li>
</ul>
<p>There is no universal architecture that works everywhere.</p>
<p>Each system must be designed based on <strong>how users interact with it and how the workload behaves</strong>.</p>
<p>In the next part of this series, we’ll explore another critical property of good systems — <strong>Maintainability</strong>.</p>
]]></content:encoded></item><item><title><![CDATA[Why Systems Fail - And How Reliable Systems Survive]]></title><description><![CDATA[When a system goes down, users don’t care whether it was a server crash, a bug, or a configuration mistake.
All they see is one thing:
“The app is not working.”
That moment when users can’t use the system is what truly matters. And preventing that mo...]]></description><link>https://blogs.sumanprasad.in/why-systems-fail-and-how-reliable-systems-survive</link><guid isPermaLink="true">https://blogs.sumanprasad.in/why-systems-fail-and-how-reliable-systems-survive</guid><category><![CDATA[System Design]]></category><category><![CDATA[backend]]></category><category><![CDATA[software development]]></category><category><![CDATA[Databases]]></category><category><![CDATA[distributed system]]></category><dc:creator><![CDATA[Suman Prasad]]></dc:creator><pubDate>Wed, 11 Feb 2026 17:03:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770829109741/4f3a12d5-aa62-41fb-b609-7e0f576b2fa3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When a system goes down, users don’t care whether it was a server crash, a bug, or a configuration mistake.</p>
<p>All they see is one thing:</p>
<p>“The app is not working.”</p>
<p>That moment when users can’t use the system is what truly matters. And preventing that moment is what reliability is all about.</p>
<p>Reliability is not about building systems that never break.</p>
<p>That’s impossible.</p>
<p>Reliability is about building systems that <strong>continue to work even when parts of them fail.</strong></p>
<h2 id="heading-reliability-is-about-trust">Reliability Is About Trust</h2>
<p>Every system, big or small, makes a promise to its users.</p>
<ul>
<li><p>A payment app promises that money will move safely.</p>
</li>
<li><p>A photo app promises that memories won’t disappear.</p>
</li>
<li><p>A business tool promises that work won’t be lost.</p>
</li>
</ul>
<p>When that promise breaks, users don’t just get frustrated - they lose trust.</p>
<p>And trust, once lost, is very hard to win back.</p>
<p>That’s why reliability matters far beyond critical systems like airplanes or hospitals.<br />It matters just as much for everyday products.</p>
<h2 id="heading-faults-vs-failures-a-small-difference-that-matters">Faults vs Failures - A Small Difference That Matters</h2>
<p>Inside any system, problems are constantly happening.</p>
<p>A disk might stop working.<br />A server might crash.<br />A network might slow down.<br />A bug might get triggered.</p>
<p>These are faults.</p>
<p>But a fault is not the same as a failure.</p>
<p>A failure is when the user feels the impact.<br />When the app stops responding.<br />When data becomes unavailable.<br />When something breaks from the user’s point of view.</p>
<p>Good systems accept that faults will happen.<br />Their goal is simple: Don’t let internal problems become user-visible failures.</p>
<h2 id="heading-you-cant-avoid-problems-you-can-only-prepare-for-them">You Can’t Avoid Problems — You Can Only Prepare for Them</h2>
<p>No matter how carefully you design a system, things will go wrong.</p>
<p>Hardware wears out.<br />Software has bugs.<br />Humans make mistakes.</p>
<p>So the real question is not:</p>
<p>“How do we stop failures from ever happening?”</p>
<p>The real question is:</p>
<p>“How do we make sure the system survives when they do?”</p>
<p>This mindset leads to something called fault tolerance.</p>
<p>A fault-tolerant system expects trouble. It detects issues early. It recovers quickly. And it keeps serving users.</p>
<p>Some companies even go a step further. They intentionally break their own systems to test them.</p>
<p>Why?<br />Because if your system only works in perfect conditions, it’s not reliable.</p>
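<p>One small example of this mindset in code is retrying a flaky operation instead of failing on the first fault. This is a minimal sketch, with hypothetical retry limits; real systems add jitter, timeouts, and circuit breakers.</p>
<pre><code class="language-python">import time

def call_with_retries(operation, max_attempts=3, base_delay=0.01):
    """Retry a flaky operation with exponential backoff so that a
    transient fault does not become a user-visible failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of retries: the fault surfaces as a failure
            time.sleep(base_delay * 2 ** (attempt - 1))
</code></pre>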
<h2 id="heading-hardware-problems-are-normal">Hardware Problems Are Normal</h2>
<p>In large systems, hardware failure is not rare — it’s routine.</p>
<p>Disks fail.<br />Machines lose power.<br />Network connections drop.</p>
<p>In environments with thousands of machines, something breaks almost every day.</p>
<p>Modern systems don’t try to make each machine perfect.<br />Instead, they assume: Some machines will fail — design around that reality.</p>
<p>That’s why systems use:</p>
<ul>
<li><p>Multiple servers</p>
</li>
<li><p>Redundant storage</p>
</li>
<li><p>Backup power</p>
</li>
<li><p>Replicated data</p>
</li>
</ul>
<p>The goal is simple: if one part stops working, another part takes over.</p>
<h2 id="heading-software-errors-are-harder-to-predict">Software Errors Are Harder to Predict</h2>
<p>Hardware problems are random.</p>
<p>Software problems are different.</p>
<p>A single hidden bug can affect every server at the same time.</p>
<p>Sometimes these bugs stay invisible for years.<br />Then one day, under a rare condition, they trigger and cause widespread issues.</p>
<p>Even worse, software failures can create chain reactions.</p>
<p>One service slows down → Another waits → Queues build up → Timeouts increase → More services fail.</p>
<p>And suddenly, a small issue becomes a major outage.</p>
<p>This is why strong system design, testing, and monitoring are essential.<br />Not to remove all bugs, but to catch them early and contain the damage.</p>
<h2 id="heading-human-mistakes-cause-the-most-outages">Human Mistakes Cause the Most Outages</h2>
<p>Surprisingly, the biggest cause of system failures isn’t hardware or software.</p>
<p>It’s people.</p>
<p>A wrong configuration, a mistaken deployment, or a command executed in the wrong environment.</p>
<p>These small errors can bring down large systems.</p>
<p>Good teams don’t try to eliminate human mistakes completely.<br />Instead, they build systems that are safer to operate.</p>
<p>That means:</p>
<ul>
<li><p>Safe testing environments</p>
</li>
<li><p>Easy rollback options</p>
</li>
<li><p>Gradual deployments</p>
</li>
<li><p>Clear monitoring</p>
</li>
</ul>
<p>So when something goes wrong, recovery is fast.</p>
<h2 id="heading-reliability-is-a-responsibility">Reliability Is a Responsibility</h2>
<p>It’s easy to think reliability only matters for “critical” systems.</p>
<p>But even simple applications carry responsibility.</p>
<p>If a system loses financial data, it causes stress.<br />If it loses business records, it causes damage.<br />If it loses personal memories, it causes emotional loss.</p>
<p>Even if an app isn’t life-critical, reliability still matters deeply to the people using it.</p>
<h2 id="heading-the-reality-of-trade-offs">The Reality of Trade-Offs</h2>
<p>Not every system can be extremely reliable from day one.</p>
<p>Startups, prototypes, and early-stage products often focus more on speed than perfection.<br />And that’s okay, but it should always be a conscious decision. Because improving reliability later becomes harder if it wasn’t considered early.</p>
<h2 id="heading-a-simple-way-to-remember-reliability">A Simple Way to Remember Reliability</h2>
<p>At its core, reliability is about one thing:</p>
<p>When something breaks, does the system still work?</p>
<p>Reliable systems:</p>
<ul>
<li><p>Expect faults</p>
</li>
<li><p>Absorb failures</p>
</li>
<li><p>Recover quickly</p>
</li>
<li><p>Protect user trust</p>
</li>
</ul>
<p>And that’s what separates fragile systems from strong ones.</p>
<hr />
<p>If you found this useful, I write simple, practical blogs on backend systems, databases, and system design.<br />Follow along to catch the next post in this series - we’ll explore <strong>Scalability</strong> next.</p>
]]></content:encoded></item><item><title><![CDATA[Exploring REST: Beyond Basic HTTP APIs Explained]]></title><description><![CDATA[When people hear REST, they often think it simply means “an HTTP endpoint that returns JSON.”
But REST is much more than that. It’s a way of designing interactions between clients and servers, not a library or a framework.
Let’s break REST down in a ...]]></description><link>https://blogs.sumanprasad.in/exploring-rest-beyond-basic-http-apis-explained</link><guid isPermaLink="true">https://blogs.sumanprasad.in/exploring-rest-beyond-basic-http-apis-explained</guid><category><![CDATA[REST API]]></category><category><![CDATA[backend]]></category><category><![CDATA[Web Development]]></category><category><![CDATA[System Design]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Suman Prasad]]></dc:creator><pubDate>Wed, 04 Feb 2026 05:46:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770183241162/39c96b82-6601-4815-a8c0-0cc84088a0f2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When people hear REST, they often think it simply means “an HTTP endpoint that returns JSON.”</p>
<p>But REST is <strong>much more than that</strong>. It’s a way of <strong>designing interactions between clients and servers</strong>, not a library or a framework.</p>
<p>Let’s break REST down in a simple and practical way.</p>
<h2 id="heading-what-rest-actually-is">What REST Actually Is</h2>
<p>REST stands for <strong>Representational State Transfer</strong>.</p>
<p>It is a set of <strong>architectural principles</strong> that describes how a client and a server should communicate.</p>
<p>REST:</p>
<ul>
<li><p>Does not force you to use a specific language</p>
</li>
<li><p>Does not enforce a framework</p>
</li>
<li><p>Does not dictate how data is stored internally</p>
</li>
</ul>
<p>It only defines <strong>how resources are identified, accessed, and represented</strong>.</p>
<h2 id="heading-everything-is-a-resource">Everything Is a Resource</h2>
<p>At the heart of REST is the idea of a resource.</p>
<p>A resource is any meaningful object in your system, such as:</p>
<ul>
<li><p>A user</p>
</li>
<li><p>An order</p>
</li>
<li><p>A product</p>
</li>
<li><p>A comment</p>
</li>
</ul>
<p>Each resource is identified using a <strong>unique identifier</strong>, commonly a URL when REST is implemented over HTTP.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770182691628/1cd78f0a-176e-44d5-928a-1a821fbc9b01.png" alt class="image--center mx-auto" /></p>
<p>For example:</p>
<pre><code class="lang-bash">/users/42
/orders/105
/products/9
</code></pre>
<p>These URLs represent <strong>things</strong>, not actions.</p>
<h2 id="heading-actions-are-separate-from-resources">Actions Are Separate From Resources</h2>
<p>In REST, actions are not part of the URL.</p>
<p>Instead, the action is <strong>expressed using the operation applied to the resource</strong>.</p>
<p>Think in terms of:</p>
<ul>
<li><p>Fetching a resource</p>
</li>
<li><p>Creating a resource</p>
</li>
<li><p>Updating a resource</p>
</li>
<li><p>Removing a resource</p>
</li>
</ul>
<p>This separation makes APIs predictable and easy to reason about.</p>
<h2 id="heading-what-is-representation">What is Representation?</h2>
<p>A resource itself is abstract.</p>
<p>What the client receives is a representation of that resource.</p>
<p>The same resource can be represented in different formats:</p>
<ul>
<li><p>JSON</p>
</li>
<li><p>XML</p>
</li>
<li><p>CSV</p>
</li>
</ul>
<p>The client can request the format it understands, and the server responds if it supports it.</p>
<p>This allows REST APIs to serve:</p>
<ul>
<li><p>Web apps</p>
</li>
<li><p>Mobile apps</p>
</li>
<li><p>Other backend services</p>
</li>
</ul>
<p>without changing the core resource model.</p>
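<p>This separation of resource and representation can be sketched as a tiny content-negotiation function. The formats chosen here are illustrative; the point is that the resource stays the same while its representation changes.</p>
<pre><code class="language-python">import csv
import io
import json

def represent(resource: dict, accept: str) -> str:
    """Render the same resource in the format the client asked for."""
    if accept == "application/json":
        return json.dumps(resource)
    if accept == "text/csv":
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(resource.keys())
        writer.writerow(resource.values())
        return buf.getvalue().strip()
    raise ValueError("unsupported representation: " + accept)

user = {"id": 42, "name": "Ada"}
# represent(user, "application/json") and represent(user, "text/csv")
# describe the same resource, just in different formats.
</code></pre>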
<h2 id="heading-rest-is-not-tied-to-http">REST is Not Tied to HTTP</h2>
<p>One important thing many people miss:</p>
<p><strong>REST is not bound to HTTP</strong>.</p>
<p>REST only cares that:</p>
<ul>
<li><p>Resources are clearly identified</p>
</li>
<li><p>Actions are well-defined</p>
</li>
<li><p>Representations are transferred between the client and server</p>
</li>
</ul>
<p>In theory, REST can work over:</p>
<ul>
<li><p>HTTP</p>
</li>
<li><p>Messaging Systems</p>
</li>
<li><p>Even non-network interfaces</p>
</li>
</ul>
<p>However, in practice, REST fits extremely well with HTTP.</p>
<h2 id="heading-why-rest-works-so-well-with-http">Why REST Works So Well With HTTP</h2>
<p>HTTP already provides everything REST needs:</p>
<ul>
<li><p>Clear operations</p>
</li>
<li><p>Resource addressing</p>
</li>
<li><p>Status reporting</p>
</li>
</ul>
<p>This natural alignment is why REST over HTTP became so popular.</p>
<p>Example</p>
<pre><code class="lang-bash">GET /students/1
</code></pre>
<p>This means:</p>
<ul>
<li><p>/students/1 → identifies the resource</p>
</li>
<li><p>GET → specifies the action</p>
</li>
</ul>
<p>The client asks for the <strong>current state</strong> of the resource, and the server responds with a representation.</p>
<h2 id="heading-why-rest-over-http-is-widely-used">Why REST Over HTTP Is Widely Used</h2>
<p>One major reason REST over HTTP dominates is <strong>tooling</strong>.</p>
<p>You get a lot for free:</p>
<ul>
<li><p>Easy testing with tools like curl or Postman</p>
</li>
<li><p>Built-in caching via proxies and CDNs</p>
</li>
<li><p>Load balancing at the network layer</p>
</li>
<li><p>Monitoring and tracing support</p>
</li>
<li><p>Transport-level security using HTTPS</p>
</li>
</ul>
<p>These existing tools reduce the effort required to build and operate APIs at scale.</p>
<h2 id="heading-common-downsides-of-rest-over-http">Common Downsides of REST Over HTTP</h2>
<p>REST over HTTP is powerful, but it’s not perfect.</p>
<p>Some real-world limitations include:</p>
<ul>
<li><p>Extra overhead from text-based payloads</p>
</li>
<li><p>Repeated serialization and deserialization</p>
</li>
<li><p>Verb limitations in certain environments</p>
</li>
<li><p>Inefficiency for chatty or streaming workloads</p>
</li>
<li><p>Tight coupling to HTTP semantics</p>
</li>
</ul>
<p>Because of these trade-offs, REST is not always the best choice for every use case.</p>
<h2 id="heading-when-rest-is-a-good-fit">When REST Is a Good Fit</h2>
<p>REST works very well when:</p>
<ul>
<li><p>You are exposing public APIs</p>
</li>
<li><p>Clients are diverse (web, mobile, services)</p>
</li>
<li><p>Caching is important</p>
</li>
<li><p>Simplicity and readability matter</p>
</li>
<li><p>Requests are stateless</p>
</li>
</ul>
<p>This is why REST remains dominant for most web-facing systems.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>REST is not about exposing endpoints — It’s about <strong>modeling systems around resources and representations</strong>.</p>
<p>When used correctly, REST leads to APIs that are:</p>
<ul>
<li><p>Easy to understand</p>
</li>
<li><p>Easy to consume</p>
</li>
<li><p>Easy to scale</p>
</li>
</ul>
<p>But like any architectural style, REST is a tool — not a rule.</p>
<p>Choosing it should be based on system needs, not trends.</p>
<hr />
<p>If you enjoyed this, I write simple blogs on backend systems, databases, and system design.<br />You can follow me here to catch the next one.</p>
]]></content:encoded></item><item><title><![CDATA[Difference between Sharding and Partitioning]]></title><description><![CDATA[Sharding vs Partitioning: What’s the Real Difference?
As applications grow, databases often become the first bottleneck. Queries slow down, writes queue up, and suddenly the system that worked fine yesterday starts struggling today.
Two common techni...]]></description><link>https://blogs.sumanprasad.in/difference-between-sharding-and-partitioning</link><guid isPermaLink="true">https://blogs.sumanprasad.in/difference-between-sharding-and-partitioning</guid><category><![CDATA[backend]]></category><category><![CDATA[System Design]]></category><category><![CDATA[Databases]]></category><category><![CDATA[architecture]]></category><category><![CDATA[scalability]]></category><dc:creator><![CDATA[Suman Prasad]]></dc:creator><pubDate>Sat, 31 Jan 2026 06:01:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769838673431/ca5097f7-5e9d-4b59-b959-4a009c0d4818.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-sharding-vs-partitioning-whats-the-real-difference">Sharding vs Partitioning: What’s the Real Difference?</h2>
<p>As applications grow, databases often become the first bottleneck. Queries slow down, writes queue up, and suddenly the system that worked fine yesterday starts struggling today.</p>
<p>Two common techniques used to scale databases are <strong>Partitioning</strong> and <strong>Sharding</strong>.</p>
<p>They sound similar, are often used together, and are frequently confused — but they solve slightly different problems.</p>
<p>Let’s break them down in a simple, practical way.</p>
<h2 id="heading-why-do-databases-need-to-scale">Why Do Databases Need to Scale?</h2>
<p>A database usually starts its life on a single machine. That machine has limited CPU, memory, disk, and network capacity.</p>
<p>As usage increases, the database experiences:</p>
<ul>
<li><p>More write traffic</p>
</li>
<li><p>More read traffic</p>
</li>
<li><p>More stored data</p>
</li>
</ul>
<p>At first, we try <strong>vertical scaling</strong> — upgrading the machine. But hardware has limits. When one machine can no longer handle the load, we need a different approach.</p>
<p>That’s where <strong>horizontal scaling</strong> enters the picture.</p>
<h2 id="heading-horizontal-scaling-in-databases">Horizontal Scaling in Databases</h2>
<p>Horizontal scaling means <strong>distributing data across multiple database servers</strong> so that no single machine becomes a bottleneck.</p>
<p>Instead of one database handling everything, multiple databases share the load.</p>
<p>This is the foundation on which both <strong>partitioning</strong> and <strong>sharding</strong> are built.</p>
<h2 id="heading-what-is-partitioning">What Is Partitioning?</h2>
<p><strong>Partitioning</strong> is about <strong>splitting data into smaller logical pieces</strong>.</p>
<p>All partitions may still live on:</p>
<ul>
<li><p>The same database server, or</p>
</li>
<li><p>Different servers</p>
</li>
</ul>
<p>But conceptually, the data is divided.</p>
<p>Example: Table Partitioning</p>
<p>Imagine a <code>users</code> table with millions of rows. Instead of storing everything together, the database can split it like:</p>
<ul>
<li><p>Users with IDs 1 to 1M</p>
</li>
<li><p>Users with IDs 1M+1 to 2M</p>
</li>
<li><p>Users with IDs 2M+1 to 3M</p>
</li>
</ul>
<p>Each chunk is a <strong>partition</strong>.</p>
<p>The database knows where each partition lives and routes queries accordingly.</p>
<h3 id="heading-key-points-about-partitioning">Key Points About Partitioning</h3>
<ul>
<li><p>It is mainly a <strong>data organization technique</strong></p>
</li>
<li><p>Often managed by the <strong>database engine</strong></p>
</li>
<li><p>Improves query performance and manageability</p>
</li>
<li><p>Does not always imply multiple machines</p>
</li>
</ul>
<h2 id="heading-what-is-sharding">What Is Sharding?</h2>
<p><strong>Sharding</strong> is about <strong>distributing data across multiple database servers</strong>.</p>
<p>Each server stores <strong>only a subset of the total data</strong> and handles queries for that subset.</p>
<p>That server is called a <strong>shard</strong>.</p>
<p>Example: User-Based Sharding</p>
<p>Suppose you have:</p>
<ul>
<li><p>Shard A → users with IDs ending in 0–4</p>
</li>
<li><p>Shard B → users with IDs ending in 5–9</p>
</li>
</ul>
<p>Each shard:</p>
<ul>
<li><p>Stores different data</p>
</li>
<li><p>Handles its own reads and writes</p>
</li>
<li><p>Scales independently</p>
</li>
</ul>
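<p>The routing logic for this example fits in a few lines. The last-digit rule and server names are just the illustration used here; real systems often hash the key instead.</p>
<pre><code class="language-python">SHARDS = {"A": "db-server-1", "B": "db-server-2"}  # hypothetical hosts

def shard_for_user(user_id: int) -> str:
    """Route by the last digit of the ID: 0-4 to shard A, 5-9 to shard B."""
    return "A" if user_id % 10 in range(5) else "B"

def connection_for_user(user_id: int) -> str:
    # The application (or a middleware layer) picks the right server
    return SHARDS[shard_for_user(user_id)]
</code></pre>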
<h3 id="heading-key-points-about-sharding">Key Points About Sharding</h3>
<ul>
<li><p>Sharding is an <strong>architectural decision</strong></p>
</li>
<li><p>Each shard is usually a <strong>separate database instance</strong></p>
</li>
<li><p>Enables true horizontal scaling</p>
</li>
<li><p>Requires routing logic in the application or middleware</p>
</li>
</ul>
<h2 id="heading-how-sharding-and-partitioning-work-together">How Sharding and Partitioning Work Together</h2>
<p>A common real-world setup:</p>
<ul>
<li><p>The database is <strong>sharded across machines</strong></p>
</li>
<li><p>Each shard internally <strong>uses partitions</strong> to manage its data</p>
</li>
</ul>
<p>For example:</p>
<ul>
<li><p>3 shards (3 database servers)</p>
</li>
<li><p>Each shard has 4 partitions</p>
</li>
</ul>
<p>So the system has:</p>
<ul>
<li><p><strong>3 shards</strong></p>
</li>
<li><p><strong>12 partitions total</strong></p>
</li>
</ul>
<h2 id="heading-advantages-of-sharding">Advantages of Sharding</h2>
<p>Sharding unlocks capabilities that a single database cannot provide:</p>
<ul>
<li><p>Handles very high read and write traffic</p>
</li>
<li><p>Increases total storage capacity</p>
</li>
<li><p>Improves fault isolation</p>
</li>
<li><p>Enables independent scaling per shard</p>
</li>
</ul>
<h2 id="heading-challenges-of-sharding">Challenges of Sharding</h2>
<p>Sharding comes with trade-offs:</p>
<ul>
<li><p>Operational complexity increases</p>
</li>
<li><p>Cross-shard queries are expensive</p>
</li>
<li><p>Transactions across shards are harder</p>
</li>
<li><p>Rebalancing shards is non-trivial</p>
</li>
</ul>
<p>This is why sharding is usually adopted <strong>only when necessary</strong>.</p>
<h2 id="heading-when-should-you-use-what">When Should You Use What?</h2>
<p><strong>When to use Partitioning</strong></p>
<ul>
<li><p>Tables are large</p>
</li>
<li><p>Queries need optimization</p>
</li>
<li><p>You want better data organization</p>
</li>
</ul>
<p><strong>When to use Sharding</strong></p>
<ul>
<li><p>One database cannot handle the load</p>
</li>
<li><p>You need horizontal scalability</p>
</li>
<li><p>The system has reached hardware limits</p>
</li>
</ul>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Partitioning helps databases stay efficient.<br />Sharding helps systems grow beyond a single machine.</p>
<p>Most scalable systems use <strong>both</strong>, but only after carefully understanding the trade-offs.</p>
]]></content:encoded></item><item><title><![CDATA[Microservices]]></title><description><![CDATA[Microservices are everywhere today. Almost every modern system design discussion eventually reaches the question:
“Should we move to microservices?”
Before answering that, it’s important to understand what microservices really are, how they differ fr...]]></description><link>https://blogs.sumanprasad.in/microservices</link><guid isPermaLink="true">https://blogs.sumanprasad.in/microservices</guid><category><![CDATA[Microservices]]></category><category><![CDATA[System Design]]></category><category><![CDATA[backend]]></category><category><![CDATA[software development]]></category><category><![CDATA[distributed system]]></category><dc:creator><![CDATA[Suman Prasad]]></dc:creator><pubDate>Fri, 30 Jan 2026 05:07:02 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769749236372/b83ecad3-7a17-4d6d-8792-8471289afcf9.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Microservices are everywhere today. Almost every modern system design discussion eventually reaches the question:</p>
<p>“Should we move to microservices?”</p>
<p>Before answering that, it’s important to understand what microservices really are, how they differ from monoliths, and when they actually make sense.</p>
<h2 id="heading-what-are-microservices">What Are Microservices?</h2>
<p>In simple terms, <strong>microservices are small, independent services that focus on one business capability and communicate over a network</strong>.</p>
<p>Each Service:</p>
<ul>
<li><p>Has a clear responsibility</p>
</li>
<li><p>Can be developed, deployed, and scaled independently</p>
</li>
<li><p>Exposes functionality via APIs</p>
</li>
</ul>
<p>For example, in an e-commerce platform:</p>
<ul>
<li><p>One service handles Orders</p>
</li>
<li><p>One handles Payments</p>
</li>
<li><p>One handles Notifications</p>
</li>
<li><p>One handles Analytics</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769747510449/5f3cb83d-9a0d-4f4d-83fb-a240b88a9c6f.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-what-is-a-monolith">What Is a Monolith?</h2>
<p>A monolith is a single application where all features live in one <strong>codebase</strong> and are deployed together.</p>
<p>In a monolithic system:</p>
<ul>
<li><p>Payment logic</p>
</li>
<li><p>Notification logic</p>
</li>
<li><p>User management</p>
</li>
<li><p>Analytics</p>
</li>
</ul>
<p>are all part of the same application and run as one unit.</p>
<p>This is how <strong>most products start</strong>, and that’s not a bad thing.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769747943266/17af95ae-ec6b-4a7f-ae6f-d6574694388f.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-why-monoliths-are-a-good-starting-point">Why Monoliths Are a Good Starting Point</h2>
<p>Monoliths are often underestimated. They are actually great for early-stage systems.</p>
<p>Advantages of a monolith:</p>
<ul>
<li><p>Easy to build and understand</p>
</li>
<li><p>Simple testing and debugging</p>
</li>
<li><p>One deployment pipeline</p>
</li>
<li><p>Faster initial development</p>
</li>
<li><p>Easier local setup for developers</p>
</li>
</ul>
<p>For a small team or a new product, a monolith helps move fast.</p>
<h2 id="heading-problems-with-large-monoliths">Problems With Large Monoliths</h2>
<p>As the system grows, monoliths start showing cracks.</p>
<p>Common issues:</p>
<ul>
<li><p>Code becomes tightly coupled</p>
</li>
<li><p>A small change requires redeploying the whole system</p>
</li>
<li><p>A bug in one module can affect everything</p>
</li>
<li><p>Scaling one feature means scaling the entire application</p>
</li>
<li><p>Large codebases slow down development</p>
</li>
</ul>
<p>At this stage, teams start thinking about microservices.</p>
<h2 id="heading-moving-from-monolith-to-microservices">Moving From Monolith to Microservices</h2>
<p>Migrating to microservices is <strong>not a one-shot rewrite</strong>.</p>
<p>It is a gradual process.</p>
<p>A common approach:</p>
<ul>
<li><p>Identify a well-defined business area (e.g., Payments)</p>
</li>
<li><p>Extract it into a separate service</p>
</li>
<li><p>Expose it via an API</p>
</li>
<li><p>Repeat for other parts over time</p>
</li>
</ul>
<p>This way, the monolith slowly shrinks while services grow.</p>
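<p>This incremental extraction is often fronted by a routing layer (sometimes called the strangler fig pattern). Here is a minimal Python sketch of the idea, with hypothetical handler names: requests for extracted capabilities go to the new service, while everything else still reaches the monolith.</p>
<pre><code class="lang-python">def make_router(monolith_handler, extracted_services):
    # extracted_services maps a capability name to its new service handler.
    def route(capability, request):
        handler = extracted_services.get(capability, monolith_handler)
        return handler(capability, request)
    return route

# Hypothetical handlers: Payments has been extracted, Orders has not.
monolith = lambda capability, request: ("monolith", capability)
payments_service = lambda capability, request: ("payments-service", capability)

route = make_router(monolith, {"payments": payments_service})
</code></pre>
<p>As more capabilities are extracted, new entries are added to the map, and the monolith handles less and less traffic.</p>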
<h2 id="heading-key-characteristics-of-microservices">Key Characteristics of Microservices</h2>
<p>Well-designed microservices share some common traits:</p>
<ul>
<li><p><strong>Autonomous:</strong> Each service can be developed and deployed independently.</p>
</li>
<li><p><strong>Business-focused:</strong> Services are designed around business needs, not technical layers.</p>
</li>
<li><p><strong>Loosely coupled:</strong> Services communicate through APIs, not shared databases.</p>
</li>
<li><p><strong>Independently scalable:</strong> Heavy-load services can be scaled without touching others.</p>
</li>
</ul>
<h2 id="heading-advantages-of-microservices">Advantages of Microservices</h2>
<ul>
<li><p><strong>Faster development:</strong> Small teams can work independently.</p>
</li>
<li><p><strong>Better scalability:</strong> Only the required service is scaled.</p>
</li>
<li><p><strong>Technology flexibility:</strong> Each service can use the most suitable tech stack.</p>
</li>
<li><p><strong>Fault isolation:</strong> A failing service can be isolated using patterns like circuit breakers.</p>
</li>
<li><p><strong>Reusability:</strong> Services can be reused across different applications.</p>
</li>
</ul>
<h2 id="heading-when-do-microservices-make-sense">When Do Microservices Make Sense?</h2>
<p>Microservices are a good fit when:</p>
<ul>
<li><p>The system is large and growing</p>
</li>
<li><p>Teams are becoming bottlenecks</p>
</li>
<li><p>Different parts scale very differently</p>
</li>
<li><p>Independent deployments are required</p>
</li>
<li><p>System reliability is critical</p>
</li>
</ul>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Microservices are about structuring systems around business capabilities, not just splitting code into smaller pieces.</p>
<p>Starting with a monolith and evolving into microservices is often the most practical path. The goal is not to follow trends, but to build systems that are maintainable, scalable, and reliable.</p>
<p>Microservices are not about adding complexity; they are about managing complexity correctly.</p>
]]></content:encoded></item><item><title><![CDATA[Decoding ACID Properties]]></title><description><![CDATA[Databases are used in systems where correctness really matters, like payment, bookings, inventory, user data, and more.
To make sure data stays reliable even under failures and heavy concurrency, databases follow a set of guarantees known as ACID:

A...]]></description><link>https://blogs.sumanprasad.in/decoding-acid-properties</link><guid isPermaLink="true">https://blogs.sumanprasad.in/decoding-acid-properties</guid><category><![CDATA[Databases]]></category><category><![CDATA[SQL]]></category><category><![CDATA[System Design]]></category><category><![CDATA[backend]]></category><category><![CDATA[Computer Science]]></category><dc:creator><![CDATA[Suman Prasad]]></dc:creator><pubDate>Mon, 26 Jan 2026 17:25:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769448156901/04626c0a-2cb4-4a49-801d-4410b1fe1d05.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Databases are used in systems where correctness really matters, like payment, bookings, inventory, user data, and more.</p>
<p>To make sure data stays reliable even under failures and heavy concurrency, databases follow a set of guarantees known as ACID:</p>
<ul>
<li><p>Atomicity</p>
</li>
<li><p>Consistency</p>
</li>
<li><p>Isolation</p>
</li>
<li><p>Durability</p>
</li>
</ul>
<p>Let’s understand each of them using <strong>realistic but simple examples</strong>, starting with what goes wrong when they are missing.</p>
<h2 id="heading-atomicity">Atomicity</h2>
<h3 id="heading-what-does-atomicity-mean">What does atomicity mean?</h3>
<p>Atomicity ensures that a transaction is treated as a single, indivisible unit.</p>
<p>Either all its operations succeed, or none of them are applied.</p>
<h3 id="heading-what-is-the-problem-without-atomicity">What is the problem without Atomicity?</h3>
<p>Imagine an online wallet system.</p>
<p>A transaction does two things:</p>
<ol>
<li><p>Deduct Rs 500 from the user’s wallet</p>
</li>
<li><p>Add Rs 500 to the merchant’s wallet</p>
</li>
</ol>
<p>Now imagine that the user’s wallet is <strong>debited</strong> and the <strong>system crashes</strong> before the merchant is credited. The user loses money, and the merchant <strong>never receives</strong> it. This partial update leaves the data in an incorrect state.</p>
<h3 id="heading-correct-behavior-with-atomicity">Correct Behavior (With Atomicity)</h3>
<p>With atomicity, if both updates succeed, then the transaction commits, and if any step fails, then <strong>everything is rolled back.</strong> So either both wallets are updated, or no wallet is changed at all. This guarantees correctness even during <strong>crashes or errors.</strong></p>
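<p>The all-or-nothing behavior can be illustrated with a small Python sketch. This is only an in-memory simulation, not how a real database implements rollback: the transfer either applies both updates or restores the snapshot taken at the start.</p>
<pre><code class="lang-python">class InsufficientFunds(Exception):
    pass

def transfer(wallets, sender, receiver, amount):
    snapshot = dict(wallets)  # state to restore if anything fails
    try:
        if amount > wallets[sender]:
            raise InsufficientFunds()
        wallets[sender] -= amount
        # A crash here, without the rollback below, would lose the money.
        wallets[receiver] += amount
    except Exception:
        wallets.clear()
        wallets.update(snapshot)  # roll back: no partial update survives
        raise
</code></pre>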
<h2 id="heading-consistency">Consistency</h2>
<p>In simple terms: data must always follow the rules.</p>
<h3 id="heading-what-consistency-means">What Does Consistency Mean?</h3>
<p>Consistency ensures that <strong>database rules are never violated</strong>. Every successful transaction moves the database from one valid state to another valid state.</p>
<h3 id="heading-what-is-the-problem-without-consistency">What is the problem without Consistency?</h3>
<p>Consider a library system with the rule: “A book cannot be issued if available copies are zero.”</p>
<p>Now suppose that:</p>
<ul>
<li><p>Available copies = 0</p>
</li>
<li><p>A transaction still issues the book</p>
</li>
</ul>
<p>The result:</p>
<ul>
<li><p>Copies become -1</p>
</li>
<li><p>Data no longer makes sense</p>
</li>
</ul>
<h3 id="heading-correct-behavior-with-consistency">Correct Behavior (With Consistency)</h3>
<p>The database checks rules (constraints) before committing. If the rule is violated, the transaction is rejected. So either the book is issued correctly, or the transaction fails, and the data remains unchanged. The database <strong>never allows invalid data</strong>.</p>
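<p>A minimal Python sketch of the same idea (in a real database this would be a declared constraint, e.g. a CHECK on the copies column): the rule is validated before the change is applied, so an invalid state is never produced.</p>
<pre><code class="lang-python">class ConstraintViolation(Exception):
    pass

def issue_book(copies, book_id):
    # Rule: a book cannot be issued when no copies are available.
    if copies[book_id] == 0:
        raise ConstraintViolation("no copies available")
    copies[book_id] -= 1  # the count can never go below zero
</code></pre>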
<h2 id="heading-isolation">Isolation</h2>
<p>In simple terms: transactions run safely in parallel.</p>
<h3 id="heading-what-does-isolation-mean">What Does Isolation Mean?</h3>
<p>Isolation ensures that <strong>multiple transactions running at the same time do not interfere with each other</strong>.</p>
<p>Each transaction behaves as if it were running alone.</p>
<h3 id="heading-what-is-the-problem-without-isolation">What is the problem without Isolation?</h3>
<p>Consider a concert ticket system with 100 total seats, where two users try to book the <strong>last seat</strong> at the same time.</p>
<p>Without isolation, both transactions read “1 seat available” and both book successfully.</p>
<p>The result:</p>
<ul>
<li><p>101 tickets sold</p>
</li>
<li><p>System oversells</p>
</li>
</ul>
<h3 id="heading-correct-behavior-with-isolation">Correct Behavior (With Isolation)</h3>
<p>With isolation, the first transaction locks the seat and the second transaction waits; only one booking succeeds. So the other transaction either fails or sees updated data and stops. This prevents race conditions and data corruption.</p>
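<p>The race can be reproduced and fixed with Python threads. This is only an analogy for database locking: the shared lock plays the role of the row lock, so the check-and-book step runs as if each transaction were alone.</p>
<pre><code class="lang-python">import threading

seats = {"available": 1}
booked = []
lock = threading.Lock()

def book(user):
    with lock:  # only one "transaction" checks and books at a time
        if seats["available"] > 0:
            seats["available"] -= 1
            booked.append(user)

threads = [threading.Thread(target=book, args=(u,)) for u in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Exactly one booking succeeds; the system never oversells.
</code></pre>
<p>Without the lock, both threads could read <code>available == 1</code> before either decrements it, which is exactly the overselling scenario above.</p>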
<h2 id="heading-durability">Durability</h2>
<p>In simple terms: committed data survives crashes.</p>
<h3 id="heading-what-does-durability-mean">What Does Durability Mean?</h3>
<p>Durability guarantees that <strong>once a transaction is committed, its changes will not be lost</strong>, even if the system crashes immediately after.</p>
<h3 id="heading-what-is-the-problem-without-durability">What is the problem without Durability?</h3>
<p>Imagine placing an order on an e-commerce website.</p>
<ul>
<li><p>Payment succeeds</p>
</li>
<li><p>Order confirmation is shown</p>
</li>
<li><p>Server crashes before data is saved to disk</p>
</li>
</ul>
<p>After restart:</p>
<ul>
<li><p>Order is missing</p>
</li>
<li><p>Payment exists but order does not.</p>
</li>
</ul>
<p>This is not acceptable.</p>
<h3 id="heading-correct-behavior-with-durability">Correct Behavior (With Durability)</h3>
<p>With durability, changes are written to non-volatile storage (disk), and transaction logs are flushed before the commit is acknowledged. On restart, the database replays its logs and restores state. So even after a power failure, crash, or restart, the committed order still exists.</p>
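<p>The core mechanism, flushing a log record to disk before acknowledging the commit, can be sketched in Python with <code>os.fsync</code> (real databases use far more elaborate write-ahead logging, but the flush-before-acknowledge rule is the same):</p>
<pre><code class="lang-python">import json
import os

def commit(log_path, record):
    # Append the record and force it to stable storage
    # before the commit is acknowledged.
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())

def recover(log_path):
    # After a restart, replay the log to rebuild committed state.
    with open(log_path) as f:
        return [json.loads(line) for line in f]
</code></pre>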
<p>ACID properties are not independent; they support each other. Atomicity prevents partial updates, Consistency ensures rules are respected, Isolation protects concurrent execution, and Durability preserves committed data. Removing even one of them can lead to serious data issues.</p>
<h3 id="heading-final-thoughts">Final Thoughts</h3>
<p>ACID properties are not theoretical concepts.</p>
<p>They solve real problems that appear in everyday systems under load, failures, and concurrency.</p>
<p>Modern databases handle most of this automatically, but as engineers, understanding ACID helps us:</p>
<ul>
<li><p>Design better systems</p>
</li>
<li><p>Write safer transactions</p>
</li>
<li><p>Debug data issues confidently</p>
</li>
</ul>
<h2 id="heading-references">References</h2>
<ul>
<li><p><a target="_blank" href="https://www.bmc.com/blogs/acid-atomic-consistent-isolated-durable/">ACID Explained - BMC</a></p>
</li>
<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/Consistency_(database_systems)">Consistency - Wikipedia</a></p>
</li>
<li><p><a target="_blank" href="https://www.ibm.com/docs/en/cics-ts/5.4.0?topic=processing-acid-properties-transactions">ACID properties of transactions</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Understanding Database Deadlocks and Their Resolution Methods]]></title><description><![CDATA[Database deadlocks are among the most challenging concurrency issues encountered in real-world production systems. While modern databases are designed to handle parallel workloads efficiently, deadlocks remain an unavoidable side effect of correct lo...]]></description><link>https://blogs.sumanprasad.in/understanding-database-deadlocks-and-their-resolution-methods</link><guid isPermaLink="true">https://blogs.sumanprasad.in/understanding-database-deadlocks-and-their-resolution-methods</guid><category><![CDATA[Databases]]></category><category><![CDATA[System Design]]></category><category><![CDATA[backend]]></category><category><![CDATA[software development]]></category><category><![CDATA[software]]></category><dc:creator><![CDATA[Suman Prasad]]></dc:creator><pubDate>Wed, 21 Jan 2026 05:15:56 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768665343191/7e310a06-f563-4159-8bf3-2569a5a07c4e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Database deadlocks are among the most challenging <strong>concurrency</strong> issues encountered in real-world production systems. While modern databases are designed to handle parallel workloads efficiently, deadlocks remain an unavoidable side effect of <strong>correct</strong> <strong>locking and isolation</strong>.</p>
<p>To build a resilient application, it’s crucial to understand how deadlocks form, how databases detect them, and how they are resolved.</p>
<p>This article breaks down database deadlocks from the ground up, covering causes, detection techniques, resolution strategies, and real-world database behaviors.</p>
<h2 id="heading-what-is-database-deadlock">What is Database Deadlock?</h2>
<p>A <strong>database deadlock</strong> occurs when <strong>two or more transactions block each other indefinitely</strong>, each waiting for locks held by the other, creating a circular dependency that prevents any of the transactions from proceeding.</p>
<h3 id="heading-how-deadlock-pattern-look-like">What Does a Deadlock Pattern Look Like?</h3>
<ul>
<li><p>Transaction <strong>A</strong> holds a lock on <strong>Resource X</strong> and waits for <strong>Resource Y</strong></p>
</li>
<li><p>Transaction <strong>B</strong> holds a lock on <strong>Resource Y</strong> and waits for <strong>Resource X</strong></p>
</li>
<li><p>A circular dependency forms, and progress stops completely</p>
</li>
</ul>
<p>Without intervention from the database engine, these transactions would wait forever.</p>
<p><strong>Example</strong></p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Transaction 1 (T1)</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">UPDATE</span> orders <span class="hljs-keyword">SET</span> <span class="hljs-keyword">status</span> = <span class="hljs-string">'CONFIRMED'</span> <span class="hljs-keyword">WHERE</span> order_id = <span class="hljs-number">101</span>;
<span class="hljs-keyword">UPDATE</span> inventory <span class="hljs-keyword">SET</span> quantity = quantity - <span class="hljs-number">1</span> <span class="hljs-keyword">WHERE</span> product_id = <span class="hljs-number">50</span>;
<span class="hljs-keyword">COMMIT</span>;

<span class="hljs-comment">-- Transaction 2 (T2)</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">UPDATE</span> inventory <span class="hljs-keyword">SET</span> quantity = quantity - <span class="hljs-number">1</span> <span class="hljs-keyword">WHERE</span> product_id = <span class="hljs-number">50</span>;
<span class="hljs-keyword">UPDATE</span> orders <span class="hljs-keyword">SET</span> <span class="hljs-keyword">status</span> = <span class="hljs-string">'CONFIRMED'</span> <span class="hljs-keyword">WHERE</span> order_id = <span class="hljs-number">101</span>;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>Transaction 1 (T1) locks orders (order_id = 101) and waits for inventory (product_id = 50), while Transaction 2 (T2) locks inventory (product_id = 50) and waits for orders (order_id = 101). If the two transactions interleave, each waits on a lock the other holds, and neither can proceed.</p>
<h2 id="heading-common-causes-of-deadlocks-in-practice">Common Causes of Deadlocks in Practice</h2>
<h3 id="heading-inconsistent-lock-ordering"><strong>Inconsistent Lock Ordering</strong></h3>
<p>When different transactions acquire locks on the same resources in different orders. Enforcing a consistent order (e.g., always lock Table A before Table B) is a primary prevention strategy.</p>
<p>Real-world Example: In a banking system, one service updates customer details and then logs the change, while another service logs the action first and then updates the customer.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Transaction A</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">UPDATE</span> customers <span class="hljs-keyword">SET</span> address = <span class="hljs-string">'New Address'</span> <span class="hljs-keyword">WHERE</span> customer_id = <span class="hljs-number">101</span>;
<span class="hljs-keyword">UPDATE</span> audit_logs <span class="hljs-keyword">SET</span> <span class="hljs-keyword">action</span> = <span class="hljs-string">'ADDRESS_UPDATE'</span> <span class="hljs-keyword">WHERE</span> customer_id = <span class="hljs-number">101</span>;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<pre><code class="lang-sql"><span class="hljs-comment">-- Transaction B</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">UPDATE</span> audit_logs <span class="hljs-keyword">SET</span> reviewed = <span class="hljs-literal">true</span> <span class="hljs-keyword">WHERE</span> customer_id = <span class="hljs-number">101</span>;
<span class="hljs-keyword">UPDATE</span> customers <span class="hljs-keyword">SET</span> last_updated = <span class="hljs-keyword">NOW</span>() <span class="hljs-keyword">WHERE</span> customer_id = <span class="hljs-number">101</span>;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>Transaction A locks customers, and Transaction B locks audit_logs. Each waits for the other, causing circular dependency.</p>
<h3 id="heading-long-running-transactions"><strong>Long-running Transactions</strong></h3>
<p>Transactions that hold locks for extended periods increase the probability of conflict with other transactions.</p>
<p>Real-world Example: In a reporting system, a transaction reads large datasets, performs heavy computation, and then updates a summary table (all within transactions).</p>
<pre><code class="lang-sql"><span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> sales <span class="hljs-keyword">WHERE</span> sale_date <span class="hljs-keyword">BETWEEN</span> <span class="hljs-string">'2024-01-01'</span> <span class="hljs-keyword">AND</span> <span class="hljs-string">'2024-12-31'</span>;
<span class="hljs-comment">-- Application processes data for several seconds</span>
<span class="hljs-keyword">UPDATE</span> yearly_summary <span class="hljs-keyword">SET</span> total_sales = <span class="hljs-number">500000</span> <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">year</span> = <span class="hljs-number">2024</span>;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>Locks remain held during long processing, and other transactions block and form waiting chains. Deadlock probability increases under concurrency.</p>
<h3 id="heading-lock-escalation"><strong>Lock Escalation</strong></h3>
<p>Databases may automatically convert many fine-grained locks (like row-level) into fewer coarse-grained locks (like table-level) for performance efficiency, which can unexpectedly block other transactions and create deadlocks.</p>
<p>Real-world Example: In a warehouse management system, bulk updates on inventory rows cause the database to escalate row locks into a table lock.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">UPDATE</span> inventory
<span class="hljs-keyword">SET</span> last_checked = <span class="hljs-keyword">NOW</span>()
<span class="hljs-keyword">WHERE</span> warehouse_id = <span class="hljs-number">5</span>;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>Internally, many row-level locks are acquired, the database escalates them to a single table-level lock, and other transactions attempting row updates are blocked.</p>
<p>Concurrent Transaction</p>
<pre><code class="lang-sql"><span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">UPDATE</span> inventory <span class="hljs-keyword">SET</span> quantity = quantity - <span class="hljs-number">1</span> <span class="hljs-keyword">WHERE</span> product_id = <span class="hljs-number">900</span>;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>Unexpected blocking and circular waits may form.</p>
<h3 id="heading-poorly-optimized-queries"><strong>Poorly Optimized Queries</strong></h3>
<p>Inefficient queries that perform large table or index scans can acquire locks, holding them for longer than necessary and increasing contention.</p>
<p>Real-world Example: In a customer support system, missing indexes cause full table scans during updates, locking more rows than necessary.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Problematic code</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">UPDATE</span> tickets
<span class="hljs-keyword">SET</span> <span class="hljs-keyword">status</span> = <span class="hljs-string">'CLOSED'</span>
<span class="hljs-keyword">WHERE</span> created_at &lt; <span class="hljs-string">'2023-01-01'</span>;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>This statement performs a full table or index scan and acquires a large number of locks. Locks held longer than needed increase contention and lead to deadlocks.</p>
<p>Optimized Version</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> idx_tickets_created_at <span class="hljs-keyword">ON</span> tickets(created_at);

<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">UPDATE</span> tickets
<span class="hljs-keyword">SET</span> <span class="hljs-keyword">status</span> = <span class="hljs-string">'CLOSED'</span>
<span class="hljs-keyword">WHERE</span> created_at &lt; <span class="hljs-string">'2023-01-01'</span>;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<h3 id="heading-foreign-key-constraints"><strong>Foreign Key Constraints</strong></h3>
<p>Actions on a parent table might require the database to internally check or lock related child table records, creating hidden dependencies and potential lock chains.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Transaction A</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">FROM</span> documents <span class="hljs-keyword">WHERE</span> doc_id = <span class="hljs-number">200</span>;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<pre><code class="lang-sql"><span class="hljs-comment">-- Transaction B</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> permissions (doc_id, user_id, <span class="hljs-keyword">role</span>)
<span class="hljs-keyword">VALUES</span> (<span class="hljs-number">200</span>, <span class="hljs-number">10</span>, <span class="hljs-string">'EDITOR'</span>);
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>Deleting a document requires checking related permissions through foreign key constraints, which introduce implicit locks and create lock dependencies that remain invisible in application code.</p>
<h2 id="heading-deadlock-detection">Deadlock Detection</h2>
<p>The database detects deadlocks automatically, most commonly using the <strong>wait-for graph</strong> algorithm.</p>
<h3 id="heading-wait-for-graph-algorithm">Wait-for Graph Algorithm</h3>
<ul>
<li><p>A wait-for graph is a directed graph used by databases to model lock dependencies between transactions.</p>
</li>
<li><p>Each node represents an active transaction.</p>
</li>
<li><p>Each <strong>directed edge (T₁ → T₂)</strong> means <em>Transaction T₁ is waiting for a resource held by Transaction T₂</em>.</p>
</li>
</ul>
<p><strong>Q. Why do databases use Wait-for Graphs?</strong></p>
<p>Lock-based systems naturally create waiting relationships. Tracking these relationships visually makes <strong>deadlock detection efficient</strong>. A deadlock is present if and only if a <strong>cycle exists</strong> in the graph.</p>
<p><strong>Q. How is the Graph Built?</strong></p>
<ul>
<li><p>When a transaction requests a lock that cannot be granted:</p>
<ul>
<li>The database adds an edge from the waiting transaction to the holding transaction.</li>
</ul>
</li>
<li><p>The graph is <strong>dynamic</strong> and updates as locks are acquired or released.</p>
</li>
<li><p>Only <strong>blocked transactions</strong> participate in the graph.</p>
</li>
</ul>
<h3 id="heading-deadlock-detection-rule">Deadlock Detection Rule</h3>
<ul>
<li><p>The database regularly checks (usually every few seconds) all transactions currently waiting on locks.</p>
</li>
<li><p>It builds a wait-for graph showing which transaction blocks which other based on active resource requests and holdings.</p>
</li>
<li><p>Graph traversal algorithms then scan for cycles, declaring a deadlock when one is detected.</p>
</li>
</ul>
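<p>Cycle detection on a wait-for graph is ordinary graph traversal. A small Python sketch (a simplification of what database engines do internally) represents the graph as a mapping from each blocked transaction to the transactions it waits on, and reports a deadlock when a depth-first search finds a back edge:</p>
<pre><code class="lang-python">def has_deadlock(wait_for):
    # wait_for: {txn: set of txns it is waiting on}
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / finished
    color = {}

    def visit(t):
        color[t] = GRAY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:
                return True  # back edge: a circular wait exists
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in wait_for)
</code></pre>
<p>For the two-transaction pattern described earlier, <code>has_deadlock({"T1": {"T2"}, "T2": {"T1"}})</code> reports a deadlock, while a simple waiting chain with no cycle does not.</p>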
<h3 id="heading-detection-frequency-by-database-system">Detection Frequency by Database System</h3>
<p>Different database systems use varying detection intervals to balance overhead with responsiveness.</p>
<ul>
<li><p><strong>PostgreSQL</strong>: Checks for deadlocks every 1 second by default after the <code>deadlock_timeout</code> period.</p>
</li>
<li><p><strong>MySQL (InnoDB)</strong>: Uses immediate detection for simple two-transaction deadlocks, but falls back to periodic checking every ~5 seconds for complex scenarios.</p>
</li>
<li><p><strong>SQL Server</strong>: Runs deadlock detection every 5 seconds by default, but can drop to as low as 100 milliseconds under high contention</p>
</li>
</ul>
<h2 id="heading-deadlock-resolution">Deadlock Resolution</h2>
<p>Once a deadlock is detected (for example, using the wait-for graph algorithm), the database must break the <strong>circular dependency.</strong></p>
<p>To do this, the database first selects a <strong>victim transaction</strong>. The choice is made carefully to minimize system impact. Typically, the database prefers terminating the transaction that has performed the least amount of work, as rolling it back requires fewer resources. In many systems, newer transactions are also favored as victims under the assumption that older transactions are closer to completion. Some databases additionally support transaction priorities, allowing lower-priority or background tasks to be aborted before critical operations.</p>
<p>Once the victim is chosen, the database <strong>rolls back the transaction</strong>, releasing all locks held by it. This immediately allows the remaining blocked transactions to continue execution. The rollback preserves atomicity and ensures the database remains in a consistent state.</p>
<p>From an application perspective, deadlocks are not exceptional failures but expected concurrency events. Applications should be designed to <strong>catch deadlock errors and retry the transaction</strong>, often with a small randomized backoff to avoid repeated collisions. In complex workflows, partial rollbacks using savepoints may also be used to limit lost work, although full rollbacks are more common during deadlock resolution.</p>
<p>While careful transaction design can reduce the likelihood of deadlock, most modern databases rely on <strong>detection and resolution</strong> rather than strict prevention, as eliminating deadlocks entirely is impractical in high-concurrency environments. The key guarantee provided by the database is that, after resolution, progress resumes safely without violating consistency or isolation.</p>
<h2 id="heading-deadlock-prevention-strategies">Deadlock Prevention Strategies</h2>
<p>Although deadlocks cannot be completely eliminated in concurrent systems, their frequency and impact can be significantly reduced through careful transaction design and system-level practices. One of the most effective techniques is enforcing a consistent lock ordering across the application. When all transactions acquire locks on shared resources in the same sequence, circular wait conditions are avoided entirely, making deadlocks structurally impossible for those code paths. For example, if an application always updates the users table before the orders table, deadlocks caused by reversed lock ordering are avoided:</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Consistent ordering (users → orders)</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">UPDATE</span> <span class="hljs-keyword">users</span> <span class="hljs-keyword">SET</span> last_login = <span class="hljs-keyword">NOW</span>() <span class="hljs-keyword">WHERE</span> user_id = <span class="hljs-number">10</span>;
<span class="hljs-keyword">UPDATE</span> orders <span class="hljs-keyword">SET</span> <span class="hljs-keyword">status</span> = <span class="hljs-string">'PROCESSED'</span> <span class="hljs-keyword">WHERE</span> order_id = <span class="hljs-number">500</span>;
<span class="hljs-keyword">COMMIT</span>;

<span class="hljs-comment">-- Problems arise only when different transactions reverse this order.</span>
</code></pre>
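<p>The same discipline can be enforced in application code by always acquiring locks in one fixed global order, for example sorted by resource name. A minimal Python sketch (an in-process analogy with hypothetical lock names, not database code):</p>
<pre><code class="lang-python">import threading

locks = {"orders": threading.Lock(), "users": threading.Lock()}

def run_transaction(needed, work):
    # Sorting gives every transaction the same acquisition order,
    # which makes a circular wait structurally impossible.
    ordered = sorted(needed)
    for name in ordered:
        locks[name].acquire()
    try:
        return work()
    finally:
        # Release in reverse order of acquisition.
        for name in reversed(ordered):
            locks[name].release()
</code></pre>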
<p>Another critical strategy is minimizing the scope and duration of transactions. Transactions that hold locks for long periods - especially while performing heavy computation, waiting for user input, or calling external services—dramatically increase contention. By keeping transactions short and limiting them strictly to database operations, locks are released quickly, reducing the chance of conflicts with other concurrent transactions.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Read and process outside the transaction</span>
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> reports <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">year</span> = <span class="hljs-number">2024</span>;

<span class="hljs-comment">-- Short write transaction</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">UPDATE</span> report_summary <span class="hljs-keyword">SET</span> <span class="hljs-keyword">status</span> = <span class="hljs-string">'READY'</span> <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">year</span> = <span class="hljs-number">2024</span>;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>Query performance also plays a major role in deadlock prevention. Poorly optimized queries that scan large portions of tables or indexes tend to acquire more locks and hold them longer than necessary. Proper indexing, selective queries, and efficient execution plans help reduce lock footprints and improve overall concurrency. Faster queries mean shorter lock lifetimes, which directly lowers deadlock probability.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Index to avoid full table scan</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> idx_tickets_status <span class="hljs-keyword">ON</span> tickets(<span class="hljs-keyword">status</span>);

<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">UPDATE</span> tickets <span class="hljs-keyword">SET</span> <span class="hljs-keyword">status</span> = <span class="hljs-string">'CLOSED'</span> <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">status</span> = <span class="hljs-string">'RESOLVED'</span>;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>Understanding <strong>implicit locking behavior</strong>, particularly with foreign key constraints, is equally important. Operations on parent tables often require the database to internally lock related child records to maintain referential integrity. When these hidden dependencies are not accounted for, transactions may unintentionally acquire locks in conflicting orders. Designing transactions with awareness of these relationships helps prevent unexpected lock chains.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Parent table update</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">UPDATE</span> documents <span class="hljs-keyword">SET</span> title = <span class="hljs-string">'Final Draft'</span> <span class="hljs-keyword">WHERE</span> doc_id = <span class="hljs-number">200</span>;
<span class="hljs-comment">-- Implicitly checks/locks permissions via FK</span>
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>Finally, applications should be built with the assumption that deadlocks can still occur under extreme concurrency. Implementing safe retry mechanisms with backoff ensures that when a deadlock does happen, it is handled transparently without user-facing errors. In practice, the most robust systems combine thoughtful transaction design with resilient retry logic, treating deadlocks as a manageable aspect of concurrency rather than a critical failure.</p>
<pre><code class="lang-python">import random
import time

# DeadlockError stands in for the driver-specific deadlock exception
# (the exact exception type and error code depend on the driver)
def execute_with_retry(txn_func, retries=3):
    for attempt in range(retries):
        try:
            return txn_func()
        except DeadlockError:
            if attempt == retries - 1:
                raise  # give up after the final attempt
            time.sleep(random.uniform(0.1, 0.5))  # randomized backoff
</code></pre>
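<p>The retry path can be exercised without a real database by simulating a transaction that is chosen as a deadlock victim once and then succeeds. Here <code>DeadlockError</code> is a hypothetical stand-in for the driver-specific exception, and the wrapper is repeated so the snippet runs standalone:</p>
<pre><code class="lang-python">import random
import time

class DeadlockError(Exception):
    """Stand-in for the driver-specific deadlock exception."""

def execute_with_retry(txn_func, retries=3):
    for attempt in range(retries):
        try:
            return txn_func()
        except DeadlockError:
            if attempt == retries - 1:
                raise
            time.sleep(random.uniform(0.01, 0.05))

attempts = []

def flaky_txn():
    # Fails with a simulated deadlock on the first call, then commits
    attempts.append(1)
    if len(attempts) == 1:
        raise DeadlockError("simulated deadlock victim")
    return "COMMITTED"

result = execute_with_retry(flaky_txn)
</code></pre>
<p>The first attempt raises, the wrapper backs off and retries, and the second attempt returns normally; the caller never sees the deadlock.</p>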
<h2 id="heading-how-specific-database-handle-deadlock">How Do Specific Databases Handle Deadlocks?</h2>
<p>Different database engines handle detection and resolution differently, based on their design goals and performance trade-offs. Understanding these differences is important when tuning systems or debugging production issues.</p>
<h3 id="heading-sql-server">SQL Server</h3>
<p>SQL Server uses a <strong>lock monitor thread</strong> that periodically scans for deadlocks by analyzing wait relationships between sessions. When a deadlock is found, SQL Server chooses a victim based on <strong>deadlock priority and estimated rollback cost</strong>.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Set deadlock priority</span>
<span class="hljs-keyword">SET</span> DEADLOCK_PRIORITY <span class="hljs-keyword">LOW</span>;
</code></pre>
<p>Example deadlock scenario:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">BEGIN</span> TRAN;
<span class="hljs-keyword">UPDATE</span> employees <span class="hljs-keyword">SET</span> <span class="hljs-keyword">role</span> = <span class="hljs-string">'Senior'</span> <span class="hljs-keyword">WHERE</span> emp_id = <span class="hljs-number">77</span>;
<span class="hljs-comment">-- waits for payroll</span>
<span class="hljs-keyword">UPDATE</span> payroll <span class="hljs-keyword">SET</span> salary = salary + <span class="hljs-number">10000</span> <span class="hljs-keyword">WHERE</span> emp_id = <span class="hljs-number">77</span>;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>When SQL Server resolves a deadlock, the victim transaction is rolled back with an error:</p>
<pre><code class="lang-sql">Transaction (Process ID xx) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
</code></pre>
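<p>Error 1205 can also be caught and retried inside T-SQL itself. A minimal sketch of this pattern, reusing the tables from the example above:</p>
<pre><code class="lang-sql">-- Sketch: retry the work up to 3 times if chosen as deadlock victim
DECLARE @retries INT = 3;
WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRAN;
        UPDATE employees SET role = 'Senior' WHERE emp_id = 77;
        UPDATE payroll SET salary = salary + 10000 WHERE emp_id = 77;
        COMMIT;
        SET @retries = 0;  -- success: exit the loop
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK;
        IF ERROR_NUMBER() = 1205 AND @retries > 1
            SET @retries = @retries - 1;  -- deadlock victim: retry
        ELSE
            THROW;  -- other error, or retries exhausted: re-raise
    END CATCH
END
</code></pre>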
<h3 id="heading-mysql">MySQL</h3>
<p>MySQL’s InnoDB engine takes a more <strong>aggressive approach</strong> to deadlock detection. For simple deadlock patterns, detection happens <strong>immediately</strong> when a lock request is made. For more complex cases, InnoDB falls back to timeout-based detection.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Enable deadlock detection and logging</span>
<span class="hljs-keyword">SET</span> <span class="hljs-keyword">GLOBAL</span> innodb_deadlock_detect = <span class="hljs-keyword">ON</span>; <span class="hljs-comment">-- Usually default, use only if disabled</span>
<span class="hljs-keyword">SET</span> <span class="hljs-keyword">GLOBAL</span> innodb_lock_wait_timeout = <span class="hljs-number">50</span>; <span class="hljs-comment">-- The default value, can be adjusted if needed</span>
<span class="hljs-keyword">SET</span> <span class="hljs-keyword">GLOBAL</span> innodb_print_all_deadlocks = <span class="hljs-keyword">ON</span>; <span class="hljs-comment">-- Recommended: log all deadlocks to the error log</span>
</code></pre>
<p>Example deadlock in MySQL:</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Transaction 1</span>
<span class="hljs-keyword">START</span> <span class="hljs-keyword">TRANSACTION</span>;
<span class="hljs-keyword">UPDATE</span> wallets <span class="hljs-keyword">SET</span> balance = balance - <span class="hljs-number">500</span> <span class="hljs-keyword">WHERE</span> user_id = <span class="hljs-number">42</span>;
<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> ledger (user_id, amount, <span class="hljs-keyword">type</span>) <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">42</span>, <span class="hljs-number">-500</span>, <span class="hljs-string">'DEBIT'</span>);

<span class="hljs-comment">-- Transaction 2</span>
<span class="hljs-keyword">START</span> <span class="hljs-keyword">TRANSACTION</span>;
<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> ledger (user_id, amount, <span class="hljs-keyword">type</span>) <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">42</span>, <span class="hljs-number">500</span>, <span class="hljs-string">'CREDIT'</span>);
<span class="hljs-keyword">UPDATE</span> wallets <span class="hljs-keyword">SET</span> balance = balance + <span class="hljs-number">500</span> <span class="hljs-keyword">WHERE</span> user_id = <span class="hljs-number">42</span>;
</code></pre>
<p>InnoDB detects the deadlock, rolls back the transaction that is cheaper to undo (by default, the one that has modified the fewest rows), and aborts it with:</p>
<pre><code class="lang-sql">ERROR 1213 (40001): Deadlock found when trying to get <span class="hljs-keyword">lock</span>; try restarting transaction
</code></pre>
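<p>After an ERROR 1213, the statements and locks involved in the most recent deadlock can be inspected in the <code>LATEST DETECTED DEADLOCK</code> section of the InnoDB status output:</p>
<pre><code class="lang-sql">-- Show InnoDB status, including the most recent deadlock
SHOW ENGINE INNODB STATUS;

-- With innodb_print_all_deadlocks = ON, every deadlock is also
-- written to the MySQL error log, not just the latest one
</code></pre>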
<h3 id="heading-postgresql">PostgreSQL</h3>
<p>PostgreSQL uses a <strong>lazy deadlock detection approach</strong>. Instead of checking for deadlocks immediately, it waits for a configurable timeout period before initiating detection. The assumption is that most lock waits are short-lived and will resolve naturally without requiring expensive graph analysis.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Configure deadlock detection behavior</span>
<span class="hljs-keyword">SET</span> deadlock_timeout = <span class="hljs-string">'1s'</span>;
<span class="hljs-keyword">SET</span> log_lock_waits = <span class="hljs-keyword">on</span>;
</code></pre>
<p>When a transaction waits longer than deadlock_timeout, PostgreSQL builds a <strong>wait-for graph</strong> and checks for cycles. If a deadlock is detected, one transaction is aborted and the others continue.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Example deadlock scenario</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">UPDATE</span> seats <span class="hljs-keyword">SET</span> <span class="hljs-keyword">status</span> = <span class="hljs-string">'HELD'</span> <span class="hljs-keyword">WHERE</span> seat_id = <span class="hljs-number">12</span>;
<span class="hljs-comment">-- waits for another transaction</span>
<span class="hljs-keyword">UPDATE</span> payments <span class="hljs-keyword">SET</span> amount = amount + <span class="hljs-number">250</span> <span class="hljs-keyword">WHERE</span> booking_id = <span class="hljs-number">9001</span>;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>If PostgreSQL detects a deadlock, it terminates one transaction with an error like:</p>
<pre><code class="lang-sql">ERROR: deadlock detected
</code></pre>
<p>This approach minimizes CPU overhead under normal workloads but may result in slightly longer waits before resolution.</p>
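<p>When diagnosing lock waits in PostgreSQL, the built-in statistics views show which sessions are blocked and by whom (<code>wait_event_type</code> and <code>pg_blocking_pids</code> are available in PostgreSQL 9.6 and later; the pid below is a placeholder):</p>
<pre><code class="lang-sql">-- Sessions currently waiting on a lock, with the query they are running
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';

-- Which sessions are blocking a given waiting backend
SELECT pg_blocking_pids(12345);  -- 12345: pid of the blocked backend
</code></pre>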
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Database deadlocks are a natural result of concurrent access in multi-user systems, not a database flaw. Modern databases detect and resolve them automatically, but well-designed applications reduce their frequency through consistent locking, short transactions, and proper indexing. Ultimately, treating deadlocks as expected events and handling them with retry logic is key to building reliable and scalable systems.</p>
]]></content:encoded></item></channel></rss>