Chapter 7 — Interrupts and DMA: How Hardware Gets CPU Attention

The Problem of Waiting

Imagine you’re writing data to a disk. The disk is slow: a write might take 10 milliseconds. The CPU could sit in a loop checking “is it done yet?” every microsecond (polling). That is 10,000 wasted checks per write, and on a multi-GHz CPU, millions of cycles doing nothing useful.
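The waste is visible from userspace. A minimal sketch comparing a busy-poll loop with a blocking sleep (the sleep is what interrupt-driven waiting feels like: the kernel parks the process and wakes it via a timer interrupt), measured in CPU time rather than wall time:

```python
import time

def busy_poll(duration_s):
    """Spin asking "is it done yet?", burning CPU the whole time."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        pass

def blocking_wait(duration_s):
    """Sleep; the kernel parks us and wakes us via a timer interrupt."""
    time.sleep(duration_s)

for wait in (busy_poll, blocking_wait):
    start = time.process_time()       # CPU time consumed, not wall time
    wait(0.05)                        # stand-in for a slow device operation
    cpu_ms = (time.process_time() - start) * 1000
    print(f"{wait.__name__}: {cpu_ms:.1f} ms of CPU burned")
```

Both calls take 50 ms of wall time, but busy_poll consumes roughly all of it as CPU time while blocking_wait consumes almost none.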

Linux uses two mechanisms to avoid this waste: interrupts and DMA.

Interrupts: Hardware Tapping the CPU’s Shoulder

An interrupt is a hardware signal from a device to the CPU that says “I need attention.” When the interrupt fires, the CPU:

  1. Pauses whatever it was doing
  2. Saves its current state (registers, program counter)
  3. Jumps to the interrupt handler (a function in the kernel driver)
  4. Runs the handler, which reads data from or writes data to the hardware
  5. Restores the saved state and resumes the interrupted program

This means your Python code runs uninterrupted most of the time. When a network packet arrives or a key is pressed, the CPU briefly handles it and returns. From your program’s perspective, it never happened.

CPU running your Python code
    │
    ├─────── Network packet arrives ──→ CPU pauses
    │                                   CPU runs network interrupt handler
    │                                   Packet stored in kernel buffer
    │                                   CPU resumes Python code
    │
    ├─────── Key press ───────────────→ CPU pauses
    │                                   CPU runs keyboard handler
    │                                   Keycode stored in input queue
    │                                   CPU resumes Python code
    │
    └─ (Python code never noticed)

Types of Interrupts

Hardware IRQs — physical interrupt lines from devices (NICs, disk controllers, keyboards, timers) to the interrupt controller, each identified by an IRQ number.

Software interrupts (softirqs and tasklets) — deferred work scheduled by hardware interrupt handlers. Used for networking (packet processing), block I/O completion, etc.

The Interrupt Controller

Devices don’t connect directly to the CPU. They connect to an interrupt controller (APIC on x86), which aggregates all interrupt lines and delivers them to CPU cores in a managed way.

On SMP systems (multi-core), the kernel distributes interrupts across CPUs using the I/O APIC and LAPIC (local APIC per core). You can see and control this via /proc/irq/.

Viewing Interrupts

# Real-time interrupt counts per CPU
watch -n 1 cat /proc/interrupts

# Interrupts for a specific device (e.g., eth0)
grep eth0 /proc/interrupts

# Which CPU handles which interrupt
cat /proc/irq/24/smp_affinity     # bitmask of allowed CPUs
cat /proc/irq/24/smp_affinity_list  # human-readable: "0-3"
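Because these files are plain text, they are easy to post-process. A minimal sketch of a /proc/interrupts parser, run here against a hardcoded sample rather than the live file (the IRQ numbers and counts are illustrative):

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq: [per-CPU counts]}."""
    lines = text.strip().splitlines()
    n_cpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        fields = line.split()
        irq = fields[0].rstrip(":")         # "24:" -> "24", "LOC:" -> "LOC"
        counts = [int(f) for f in fields[1:1 + n_cpus] if f.isdigit()]
        table[irq] = counts
    return table

sample = """\
           CPU0       CPU1
  24:     153021          0   IO-APIC   24-fasteoi   eth0
 LOC:     998877     776655   Local timer interrupts
"""
print(parse_interrupts(sample))
# {'24': [153021, 0], 'LOC': [998877, 776655]}
```

On a real system you would pass `open("/proc/interrupts").read()` instead of the sample string.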

To move an IRQ to a specific CPU (useful for latency tuning):

echo 2 | sudo tee /proc/irq/24/smp_affinity   # pin to CPU1

Interrupt Latency

The time from an interrupt firing to its handler running is interrupt latency. Standard Linux has latencies of tens to hundreds of microseconds. For real-time applications (audio, industrial control), you need PREEMPT_RT (long maintained as an out-of-tree patch set, merged into mainline in kernel 6.12), which reduces this dramatically.
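You can get a rough userspace feel for wakeup timing from Python by measuring how late time.sleep() returns. Note that this measures the whole chain (timer interrupt, scheduler, Python runtime), not interrupt latency in isolation; a sketch:

```python
import time

def worst_sleep_overshoot(interval_s=0.001, samples=200):
    """Worst observed lateness of time.sleep() wakeups, in seconds."""
    worst = 0.0
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(interval_s)
        # How much later than requested did we actually wake up?
        worst = max(worst, time.monotonic() - start - interval_s)
    return worst

print(f"worst wakeup overshoot: {worst_sleep_overshoot() * 1e6:.0f} us")
```

On a stock desktop kernel this typically reports hundreds of microseconds to a few milliseconds; a PREEMPT_RT system under load would be far tighter.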

DMA: Hardware Accessing Memory Directly

DMA (Direct Memory Access) lets hardware devices read from and write to RAM without CPU involvement. The CPU sets up a DMA transfer and does other work while the hardware moves data.

Without DMA:

CPU reads byte from disk controller register
CPU writes byte to RAM
CPU reads next byte from disk controller...
(repeat 4096 times for one page)

With DMA:

CPU tells DMA controller: "move 4096 bytes from disk controller to RAM at 0x12345678"
CPU does other work
DMA controller moves all 4096 bytes autonomously
DMA controller fires interrupt: "done"
CPU processes the result

DMA is why modern systems can saturate 10 Gbit/s network cards, NVMe SSDs at 7 GB/s, and GPUs at hundreds of GB/s — the CPU would be the bottleneck otherwise.
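The two transcripts above have a loose Python analogy: touching every byte from the interpreter versus handing the whole move to optimized machinery in one call. A toy sketch, not a hardware benchmark:

```python
import time

data = bytes(range(256)) * 16          # one 4096-byte page
buf = bytearray(len(data))

start = time.perf_counter()
for i, byte in enumerate(data):        # CPU touches every byte, like PIO
    buf[i] = byte
per_byte = time.perf_counter() - start

start = time.perf_counter()
buf[:] = data                          # one bulk move, like a DMA engine
bulk = time.perf_counter() - start

print(f"per-byte: {per_byte * 1e6:.0f} us, bulk: {bulk * 1e6:.0f} us")
```

The per-byte loop is typically two to three orders of magnitude slower, for the same reason programmed I/O would bottleneck a modern NIC or SSD.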

DMA in the Linux Kernel

The kernel has a DMA API that drivers use to:

  1. Allocate DMA-capable memory (physically contiguous, and within the device's addressable range: below 4 GB for 32-bit devices, for example)
  2. Program the device with the resulting bus (DMA) address
  3. Handle completion interrupts

As a Python developer you never program DMA directly. But understanding it explains why bulk I/O (copying large files, saturating a fast NIC) consumes so little CPU time.

Viewing DMA Channels

Legacy ISA DMA channels (rarely used today):

cat /proc/dma

Modern DMA usage is per-driver and not exposed in a single place. You can see DMA-related memory regions in:

cat /proc/iomem | grep -i dma

The Complete Picture: An Interrupt-Driven I/O Example

Let’s trace a recv() call — receiving data from a TCP socket:

1. Ethernet frame arrives at NIC
2. NIC stores frame in its receive ring buffer (DMA into RAM, no CPU involved)
3. NIC fires an interrupt: "new data in ring"
4. CPU runs NIC interrupt handler (top half, fast)
5. Handler schedules a softirq for packet processing (bottom half, deferred)
6. CPU returns from interrupt, resumes other work
7. Softirq runs: kernel processes Ethernet → IP → TCP layers
8. Data placed in socket receive buffer
9. If your process is blocked on recv(), kernel wakes it up
10. Your Python code gets the bytes

Steps 2–8 happen completely outside your Python code, driven by interrupts and DMA.
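The shape of this sequence can be felt from Python with a socketpair standing in for the network path. The hypothetical device thread below plays steps 1–8 and recv() is step 9; the process sleeps, CPU-free, until data arrives:

```python
import socket
import threading
import time

a, b = socket.socketpair()

def device():
    time.sleep(0.05)                  # "DMA + interrupt + softirq" delay
    b.send(b"packet payload")         # data lands in a's receive buffer

threading.Thread(target=device).start()

start = time.monotonic()
data = a.recv(4096)                   # we sleep here; the kernel wakes us
waited = time.monotonic() - start
print(data, f"(blocked ~{waited * 1000:.0f} ms, burning no CPU)")
```

The blocked recv() consumes no CPU time while waiting, exactly because the wakeup is event-driven rather than polled.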

Interrupt Coalescing

Modern high-speed NICs use interrupt coalescing: instead of firing an interrupt for every packet (which would overwhelm the CPU at 10 Gbit/s), they batch multiple packets and fire one interrupt. This trades latency for throughput.

You can tune this with ethtool:

ethtool -c eth0               # show current coalescing settings
ethtool -C eth0 rx-usecs 50   # delay the rx interrupt by up to 50 µs

Lower coalescing = lower latency, higher CPU usage. Higher coalescing = higher latency, lower CPU usage.
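The trade-off can be put into rough numbers with a toy model (the figures are illustrative, not measured NIC behavior): packets arriving gap_us apart, with the NIC allowed to hold an interrupt back for up to window_us:

```python
def coalesce(n_packets, gap_us, window_us):
    """Toy model: (interrupt count, worst-case added latency in us)."""
    total_us = n_packets * gap_us
    interrupts = -(-total_us // window_us)        # ceiling division
    return interrupts, min(window_us, total_us)

for window in (1, 50, 200):
    ints, latency = coalesce(10_000, 1, window)   # 10k packets, 1 us apart
    print(f"rx-usecs={window}: {ints} interrupts, "
          f"up to {latency} us added latency")
```

A 50 µs window cuts 10,000 interrupts down to 200 at the cost of up to 50 µs of added delay per packet, which is the throughput-for-latency trade described above.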

This is why high-frequency trading systems tune their NICs and why low-latency audio systems need careful interrupt configuration.

What This Means for Python Performance

When your Python code seems to be waiting on I/O, it’s usually waiting for a device to finish its work and for the completion interrupt that announces it.

asyncio, select, and epoll all work by telling the kernel “wake me up when this file descriptor has data” and then sleeping. The kernel wakes them up in response to hardware interrupts. This is why async Python is more efficient than threads for I/O — one thread can manage thousands of hardware events through a single epoll wait.
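A minimal sketch of that pattern with the standard selectors module (epoll-backed on Linux): one selector watches several sockets and sleeps until the kernel, prompted by an interrupt, marks one readable:

```python
import selectors
import socket
import threading
import time

sel = selectors.DefaultSelector()              # epoll on Linux
pairs = [socket.socketpair() for _ in range(3)]
for i, (rd, _wr) in enumerate(pairs):
    sel.register(rd, selectors.EVENT_READ, data=i)

def producer():
    time.sleep(0.02)
    pairs[1][1].send(b"ping")                  # only socket 1 gets data

threading.Thread(target=producer).start()

events = sel.select(timeout=1.0)               # one wait covers all sockets
for key, _mask in events:
    print("readable:", key.data, key.fileobj.recv(16))
```

The same single select() call scales to thousands of registered descriptors, which is the foundation asyncio's event loop is built on.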

