Chapter 8 — Memory-Mapped I/O: Talking to Hardware Through Memory

Two Ways to Talk to Hardware

On x86 systems, hardware can be addressed in two ways:

Port-mapped I/O (PMIO): The CPU has a separate 64K address space for hardware I/O ports, accessed via the in/out instructions. Legacy hardware (keyboard, timer, parallel port) uses this. You can see assigned port ranges in /proc/ioports. The in/out instructions normally require Ring 0 (kernel) privilege, though a root process can be granted access to specific ports via ioperm() or iopl().

Memory-mapped I/O (MMIO): Hardware registers are mapped into the normal physical memory address space. The CPU reads/writes hardware registers exactly like RAM — with normal load/store instructions. This is the dominant method for modern hardware.

MMIO is what makes sysfs hardware interaction fast, what makes GPU programming possible, and what mmap() is designed for.

Physical Memory Layout

Nearly every byte of physical address space is assigned to something — RAM, firmware ROM, or device registers. An illustrative x86 layout (exact addresses vary by machine):

0x00000000 - 0x0009FFFF : System RAM (first 640KB)
0x000A0000 - 0x000BFFFF : VGA frame buffer (legacy)
0x000C0000 - 0x000FFFFF : ROM/BIOS area
0x00100000 - 0x7EFFFFFF : System RAM (most of it)
0x80000000 - 0x8FFFFFFF : PCI device registers (GPU, etc.)
0xFD000000 - 0xFDFFFFFF : Intel GPU MMIO registers
0xFED00000 - 0xFED00FFF : HPET (high precision timer)
0xFEE00000 - 0xFEE00FFF : Local APIC registers

This is what /proc/iomem shows. The regions not marked “System RAM” are device memory, ROM, or hardware registers — writing to address 0xFD000000 doesn’t go to RAM, it goes directly to the Intel GPU’s register file.

sudo cat /proc/iomem | grep -v "System RAM" | head -20   # without root, addresses read back as zeros
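You can also process this map programmatically. A small sketch of a /proc/iomem parser — it runs here on a sample string (the addresses and names are illustrative), since reading the real file needs root; in practice you would feed it `open("/proc/iomem").read()`:

```python
import re

# Sample /proc/iomem-style text (illustrative addresses and names)
SAMPLE = """\
00000000-0009ffff : System RAM
000a0000-000bffff : PCI Bus 0000:00
fd000000-fdffffff : 0000:00:02.0
fee00000-fee00fff : Local APIC
"""

def parse_iomem(text):
    """Parse 'start-end : name' lines into (start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        m = re.match(r"\s*([0-9a-f]+)-([0-9a-f]+) : (.+)", line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions

# Print only the non-RAM regions, like the grep above
for start, end, name in parse_iomem(SAMPLE):
    if name != "System RAM":
        print(f"{start:#010x}-{end:#010x}  {name}")
```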

How Drivers Use MMIO

A PCIe device declares its MMIO regions in its BAR (Base Address Registers) in PCI configuration space. The kernel reads these at boot, allocates physical address space, and makes the mappings. The driver then calls ioremap() to get a virtual address it can use:

// Simplified driver code (C, for illustration)
void __iomem *base = ioremap(pci_resource_start(dev, 0), pci_resource_len(dev, 0));

// Write to hardware register at offset 0x100
writel(0xDEADBEEF, base + 0x100);

// Read from hardware register at offset 0x200
u32 status = readl(base + 0x200);

The writel/readl macros ensure proper memory barriers — hardware registers often have ordering requirements that normal CPU memory does not.

mmap(): Mapping Memory into Your Process

mmap() is the syscall that brings MMIO into Python’s reach. It maps a range of a file (or device file) into your process’s virtual address space. Once mapped, you read/write memory to read/write the underlying file or device.

Mapping a Regular File

import mmap, os

# mmap a file — changes to the map write through to the file
with open("data.bin", "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)   # map entire file

    data = mm[0:4]                   # read first 4 bytes
    mm[0:4] = b"\x00\x00\x00\x00"  # write (modifies file!)
    mm.close()

This is faster than read()/write() for large files because the kernel maps pages directly — no copying between kernel and userspace buffers.
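A small sketch of the payoff: an mmap object supports string-like searching, so you can scan a large file without first reading it into a Python bytes object. The throwaway temp file below just stands in for a big data file:

```python
import mmap, os, tempfile

# Build a throwaway ~20 KB file to search (stand-in for a large data file)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 10_000 + b"NEEDLE" + b"\x00" * 10_000)
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    pos = mm.find(b"NEEDLE")   # scans the mapped pages directly
    print(pos)                 # 10000
    mm.close()

os.remove(path)
```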

Mapping /dev/mem for Hardware Access

/dev/mem is a character device that represents the entire physical address space. You can use mmap() on it to access physical memory regions, including hardware registers.

import mmap, os

# Map the legacy BIOS ROM region (physical 0xF0000-0xFFFFF).
# Requires root, AND /dev/mem access must not be restricted
# (see CONFIG_STRICT_DEVMEM below). The offset must be page-aligned.
fd = os.open("/dev/mem", os.O_RDONLY)
mem = mmap.mmap(fd, 0x10000, mmap.MAP_SHARED, mmap.PROT_READ,
                offset=0xF0000)

# The BIOS release date ("MM/DD/YY") traditionally sits at physical
# 0xFFFF5, i.e. offset 0xFFF5 within this mapping
print(mem[0xFFF5:0xFFFD])

mem.close()
os.close(fd)

Security note: Most modern kernels restrict /dev/mem access via CONFIG_STRICT_DEVMEM. You can check:

grep STRICT_DEVMEM /boot/config-$(uname -r)

Mapping a PCI Device’s MMIO Region

A driver can expose its MMIO region via a resource file in sysfs. You can then mmap() it directly, bypassing the driver.

# Find the resource files for a PCI device
ls /sys/bus/pci/devices/0000:00:02.0/resource*
# resource   resource0   resource2   resource0_wc

resource0, resource1, etc. correspond to the device’s BARs.
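The plain `resource` file (no number) is text: one `start end flags` line per BAR, in hex, with all-zero lines for unused BARs. A small parser, run here on a sample string since the actual file contents are device-specific:

```python
# Sample contents of a /sys/bus/pci/devices/<dev>/resource file
# (illustrative values; all-zero lines mean the BAR is unused)
SAMPLE = """\
0x00000000fd000000 0x00000000fdffffff 0x0000000000040200
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000d0000000 0x00000000dfffffff 0x000000000014220c
"""

def parse_bars(text):
    """Return (BAR index, base, size) for each in-use BAR."""
    bars = []
    for i, line in enumerate(text.splitlines()):
        start, end, flags = (int(x, 16) for x in line.split())
        if start or end:
            bars.append((i, start, end - start + 1))
    return bars

for idx, base, size in parse_bars(SAMPLE):
    print(f"BAR{idx}: base={base:#x} size={size // (1 << 20)} MiB")
```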

import mmap, os, struct

# Map PCI BAR0 of a device
# resource0 is the MMIO region for BAR0
dev_path = "/sys/bus/pci/devices/0000:00:02.0/resource0"

try:
    fd = os.open(dev_path, os.O_RDWR | os.O_SYNC)
    size = os.fstat(fd).st_size   # sysfs reports the BAR size here

    # Map the BAR
    mmapped = mmap.mmap(fd, size, mmap.MAP_SHARED,
                        mmap.PROT_READ | mmap.PROT_WRITE)

    # Read the 32-bit register at offset 0
    reg = struct.unpack_from("<I", mmapped, 0)[0]
    print(f"Register 0: 0x{reg:08x}")

    mmapped.close()
    os.close(fd)
except PermissionError:
    print("Need root or correct permissions")

Memory Barriers and Ordering

Hardware registers are not like RAM. Writes to hardware registers must complete in order — you can’t let the CPU or compiler reorder them. This is why drivers use writel/readl instead of plain pointer dereferences in C.

In Python there is no writel/readl equivalent. When you mmap() a PCI resource file, the kernel maps the region uncached, so each access does reach the device — but you have no control over access width or ordering beyond what the interpreter happens to generate, which is fine for experimentation and wrong for a production driver. For file-backed mappings, you can at least force dirty pages out:

# Flush writes to a mmap'd region
mm.flush()   # calls msync() internally
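A minimal file-backed demonstration of flush() — the temp file here is just a stand-in, nothing device-specific:

```python
import mmap, os, tempfile

# Create a 4 KiB file and map it writable
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 4096)
    path = f.name

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 4096)
    mm[0:4] = b"DATA"
    mm.flush()              # msync(): push the dirty page to the file now
    mm.close()

# The write is visible through the ordinary file API
with open(path, "rb") as f:
    result = f.read(4)
print(result)               # b'DATA'

os.remove(path)
```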

The GPU Example: Why mmap Matters for Performance

GPUs have gigabytes of their own memory (VRAM). The GPU driver maps VRAM — or a window of it — into the system’s physical address space via MMIO BARs. When a framework such as PyTorch or TensorFlow stages data in GPU-accessible memory (pinned host buffers, or VRAM itself with resizable BAR), the CPU side of the transfer is mmap-style memory access: the CPU writes through a mapping rather than issuing per-byte device I/O.

This is also why GPU memory transfers are measured in GB/s — DMA and MMIO together make it a bulk memory operation, not a device I/O operation.

/dev/shm and Shared Memory

While not MMIO, mmap with MAP_SHARED and MAP_ANONYMOUS is also how shared memory works between Python processes:

import mmap

# Create anonymous shared memory (no file backing)
shm = mmap.mmap(-1, 4096)   # -1 = no file, MAP_ANONYMOUS

# Write from one process, read from another (with multiprocessing)
shm[0:4] = b"ping"

This is the mechanism behind Python’s multiprocessing.shared_memory: it creates a file under /dev/shm (via shm_open()) and mmaps it into each attaching process.
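A quick sketch with that higher-level API — the second handle attaches by name, exactly as a separate process would:

```python
from multiprocessing import shared_memory

# Create a named 4 KiB segment -- a file under /dev/shm on Linux
shm = shared_memory.SharedMemory(create=True, size=4096)
shm.buf[0:4] = b"ping"

# A second handle attaches by name, as another process would
peer = shared_memory.SharedMemory(name=shm.name)
data = bytes(peer.buf[0:4])
print(data)                 # b'ping'

peer.close()
shm.close()
shm.unlink()                # remove the backing /dev/shm file
```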


Previous: Chapter 7 — Interrupts and DMA

Next: Chapter 9 — udev and Hotplug

Back to Table of Contents