On x86 systems, hardware can be addressed in two ways:
Port-mapped I/O (PMIO): The CPU has a separate 64 KB I/O port address space, accessed with the in/out instructions. Legacy hardware (keyboard controller, timer, parallel port) uses it. The assigned port ranges are listed in /proc/ioports. By default in/out are privileged (Ring 0); a userspace process needs root plus ioperm() or iopl() to execute them.
Memory-mapped I/O (MMIO): Hardware registers are mapped into the normal physical memory address space. The CPU reads/writes hardware registers exactly like RAM — with normal load/store instructions. This is the dominant method for modern hardware.
MMIO is what lets a driver, or a userspace program that uses mmap(), talk to a device with ordinary loads and stores. It underlies GPU programming and the sysfs resource files covered later in this chapter.
The physical address space is carved into regions, each assigned to RAM, firmware, or a device. A typical layout (exact addresses vary by machine):
0x00000000 - 0x0009FFFF : System RAM (first 640KB)
0x000A0000 - 0x000BFFFF : VGA frame buffer (legacy)
0x000C0000 - 0x000FFFFF : ROM/BIOS area
0x00100000 - 0x7EFFFFFF : System RAM (most of it)
0x80000000 - 0x8FFFFFFF : PCI device registers (GPU, etc.)
0xFD000000 - 0xFDFFFFFF : Intel GPU MMIO registers
0xFED00000 - 0xFED00FFF : HPET (high precision timer)
0xFEE00000 - 0xFEE00FFF : Local APIC registers
This is what /proc/iomem shows (read it as root; unprivileged users see every address as zero). The regions that are not “System RAM” are device registers, firmware ROM, or reserved ranges. In the layout above, a write to address 0xFD000000 doesn’t go to RAM; it goes directly to the Intel GPU’s register file.
sudo grep -v "System RAM" /proc/iomem | head -20
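The same filtering can be done in Python. The following is a minimal sketch that parses text in the /proc/iomem format; the sample text, the function names parse_iomem and non_ram, and the addresses are illustrative assumptions, not output from a real machine.

```python
import re

# Hypothetical sample in /proc/iomem format (addresses are illustrative).
SAMPLE_IOMEM = """\
00000000-00000fff : Reserved
00001000-0009ffff : System RAM
000a0000-000bffff : PCI Bus 0000:00
fed00000-fed003ff : HPET 0
  fed00000-fed003ff : PNP0103:00
fee00000-fee00fff : Local APIC
"""

# Each line is "<indent><start>-<end> : <name>", nested by two spaces per level.
LINE_RE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.*)$")


def parse_iomem(text):
    """Return a list of (depth, start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        indent, start, end, name = m.groups()
        regions.append((len(indent) // 2, int(start, 16), int(end, 16), name))
    return regions


def non_ram(regions):
    """Top-level regions that are not System RAM."""
    return [r for r in regions if r[0] == 0 and r[3] != "System RAM"]


regions = parse_iomem(SAMPLE_IOMEM)
for depth, start, end, name in non_ram(regions):
    size_kib = (end - start + 1) / 1024
    print(f"0x{start:08x}-0x{end:08x} {size_kib:10.1f} KiB  {name}")
```

To run it against a live system, pass the contents of /proc/iomem (read as root) to parse_iomem() instead of the sample string.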
A PCIe device declares its MMIO regions in its BARs (Base Address Registers) in PCI configuration space. During enumeration at boot, firmware or the kernel reads the size each BAR requests, assigns it a range of physical address space, and records the mapping. The driver then calls ioremap() to get a virtual address it can use:
// Simplified driver code (C, for illustration)
void __iomem *base = ioremap(pci_resource_start(dev, 0), pci_resource_len(dev, 0));
// Write to hardware register at offset 0x100
writel(0xDEADBEEF, base + 0x100);
// Read from hardware register at offset 0x200
u32 status = readl(base + 0x200);
The writel/readl macros ensure proper memory barriers — hardware registers often have ordering requirements that normal CPU memory does not.
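To make the BAR mechanism mentioned above more concrete, here is a minimal sketch that decodes the flag bits in the low 32 bits of a BAR. It assumes the raw value has already been read from configuration space; decode_bar and the example values are hypothetical. For a 64-bit memory BAR, the upper half of the address lives in the next BAR slot, which this sketch does not handle.

```python
def decode_bar(raw):
    """Decode the low 32 bits of a PCI Base Address Register."""
    if raw & 0x1:
        # Bit 0 set: I/O space BAR. Bits 31:2 hold the port base.
        return {"space": "io", "base": raw & 0xFFFFFFFC}
    bar_type = (raw >> 1) & 0x3  # 0b00 = 32-bit, 0b10 = 64-bit
    return {
        "space": "memory",
        "width": {0: 32, 2: 64}.get(bar_type),  # None for other encodings
        "prefetchable": bool(raw & 0x8),
        "base": raw & 0xFFFFFFF0,  # low 4 bits are flags, not address bits
    }


# Illustrative values only, not taken from a real device.
print(decode_bar(0xFD00000C))
print(decode_bar(0x0000F001))
```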
mmap() is the syscall that brings MMIO into Python’s reach. It maps a range of a file (or device file) into your process’s virtual address space. Once mapped, you read/write memory to read/write the underlying file or device.
import mmap
# mmap a file: changes to the map write through to the file
with open("data.bin", "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)  # length 0 maps the entire file (it must not be empty)
    data = mm[0:4]                 # read first 4 bytes
    mm[0:4] = b"\x00\x00\x00\x00"  # write (modifies file!)
    mm.close()
This can be faster than read()/write() for large files, especially with random access, because the process works directly on the kernel's page-cache pages and skips the copy between kernel and userspace buffers.
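The write-through behaviour is easy to verify without touching real data. This sketch creates a scratch file with tempfile (an assumption made here so the example does not depend on data.bin), modifies it through the mapping, and reads it back with an ordinary read().

```python
import mmap
import os
import tempfile

# Create a small scratch file so the example is self-contained.
fd, path = tempfile.mkstemp()
os.write(fd, b"ABCDEFGH")
os.close(fd)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:  # length 0 maps the whole file
        before = mm[0:4]                  # slicing copies bytes out of the mapping
        mm[0:4] = b"\x00\x00\x00\x00"     # same-length slice assignment writes in place
        mm.flush()                        # push dirty pages out to the file

# Read the file back through the normal file API.
with open(path, "rb") as f:
    after = f.read()

os.remove(path)
print(before, after)
```

Slice assignment must keep the length unchanged; a mapping cannot grow or shrink the file this way.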
/dev/mem is a character device that represents the entire physical address space. You can use mmap() on it to access physical memory regions, including hardware registers.
import mmap, os
# Map the 64 KB legacy BIOS segment (physical 0xF0000-0xFFFFF)
# Requires root AND /dev/mem access must not be restricted
fd = os.open("/dev/mem", os.O_RDONLY)
mem = mmap.mmap(fd, 0x10000, mmap.MAP_SHARED, mmap.PROT_READ, offset=0xF0000)
# Read the x86 BIOS date string (if present at 0xFFFF5)
# Physical 0xFFFF5 is offset 0xFFF5 inside this mapping
print(mem[0xFFF5:0xFFFD])
mem.close()
os.close(fd)
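Security note: Most distribution kernels are built with CONFIG_STRICT_DEVMEM, which largely limits /dev/mem to non-RAM regions such as device memory. CONFIG_IO_STRICT_DEVMEM goes further and blocks regions already claimed by a driver. You can check:
grep STRICT_DEVMEM /boot/config-$(uname -r)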
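The same check can be scripted. This is a small sketch that scans kernel config text for the two options; devmem_restrictions and the sample text are hypothetical, and the location of the real config file (/boot/config-<release> on many distributions) is an assumption that does not hold everywhere.

```python
def devmem_restrictions(config_text):
    """Report whether the /dev/mem hardening options are enabled."""
    wanted = ("CONFIG_STRICT_DEVMEM", "CONFIG_IO_STRICT_DEVMEM")
    found = {name: False for name in wanted}
    for line in config_text.splitlines():
        # Enabled options look like "CONFIG_FOO=y"; disabled ones appear
        # as "# CONFIG_FOO is not set" and contain no "=".
        key, sep, value = line.strip().partition("=")
        if sep and key in found:
            found[key] = (value == "y")
    return found


# Hypothetical excerpt of a kernel config file.
SAMPLE_CONFIG = """\
CONFIG_DEVMEM=y
CONFIG_STRICT_DEVMEM=y
# CONFIG_IO_STRICT_DEVMEM is not set
"""

result = devmem_restrictions(SAMPLE_CONFIG)
print(result)
```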
The kernel's PCI core exposes each BAR of a device as a resourceN file in sysfs, whether or not a driver is bound to the device. You can mmap() such a file directly from userspace, bypassing the driver.
# Find the resource files for a PCI device
ls /sys/bus/pci/devices/0000:00:02.0/resource*
# resource resource0 resource2 resource0_wc
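resource0, resource1, etc. correspond to the device’s BARs; a number is missing when that BAR is unused or holds the upper half of a 64-bit BAR. The plain resource file is a text table with the start address, end address, and flags of each region, and resource0_wc is a write-combining mapping of a prefetchable BAR0.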
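The plain resource file is handy for finding out how large each BAR is before mapping it. Below is a minimal sketch of a parser for its three-column hex format; the sample lines are made up for illustration, parse_resource is a hypothetical helper, and the flag constants are copied from the kernel's include/linux/ioport.h.

```python
# Flag bits from the kernel's include/linux/ioport.h
IORESOURCE_IO = 0x00000100
IORESOURCE_MEM = 0x00000200
IORESOURCE_PREFETCH = 0x00002000
IORESOURCE_MEM_64 = 0x00100000

# Hypothetical contents of /sys/bus/pci/devices/<addr>/resource
SAMPLE_RESOURCE = """\
0x00000000fd000000 0x00000000fdffffff 0x0000000000140204
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000c0000000 0x00000000cfffffff 0x000000000014220c
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000000f000 0x000000000000f03f 0x0000000000040101
"""


def parse_resource(text):
    """Return one dict per populated region, keeping the line index."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        if not line.strip():
            continue
        start, end, flags = (int(field, 16) for field in line.split())
        if start == 0 and end == 0:
            continue  # unused slot
        if flags & IORESOURCE_IO:
            kind = "io"
        elif flags & IORESOURCE_MEM:
            kind = "mem"
        else:
            kind = "other"
        regions.append({
            "index": index,
            "start": start,
            "size": end - start + 1,
            "kind": kind,
            "prefetchable": bool(flags & IORESOURCE_PREFETCH),
            "is_64bit": bool(flags & IORESOURCE_MEM_64),
        })
    return regions


bars = parse_resource(SAMPLE_RESOURCE)
for bar in bars:
    print(f"region {bar['index']}: {bar['kind']} 0x{bar['start']:x} size 0x{bar['size']:x}")
```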
import mmap, os, struct
# Map PCI BAR0 of a device
# resource0 is the MMIO region for BAR0
dev_path = "/sys/bus/pci/devices/0000:00:02.0/resource0"
try:
    fd = os.open(dev_path, os.O_RDWR | os.O_SYNC)
    size = os.fstat(fd).st_size  # sysfs reports the BAR length as the file size
    # Map the BAR
    mmapped = mmap.mmap(fd, size, mmap.MAP_SHARED,
                        mmap.PROT_READ | mmap.PROT_WRITE)
    # Read register at offset 0 as a little-endian 32-bit value
    reg = struct.unpack_from("<I", mmapped, 0)[0]
    print(f"Register 0: 0x{reg:08x}")
    mmapped.close()
    os.close(fd)
except PermissionError:
    print("Need root or correct permissions")
Hardware registers are not like RAM. Writes to hardware registers must complete in order — you can’t let the CPU or compiler reorder them. This is why drivers use writel/readl instead of plain pointer dereferences in C.
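Python has no equivalent of writel/readl. What you rely on instead is how the kernel sets up the mapping: sysfs resourceN files are mapped uncached, so on x86 each load and store reaches the device in program order. MAP_SHARED only makes your writes visible outside the process, and flush() matters for file-backed mappings, where it forces dirty pages out to the file. It is not a hardware barrier:
# Flush writes of a file-backed mmap'd region to the file
mm.flush() # calls msync() internally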
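A thin wrapper keeps register offsets and widths explicit, in the spirit of readl/writel. This is only a sketch: RegisterWindow is a hypothetical class, it adds no barriers, and CPython does not promise that struct performs a single 32-bit bus access. A bytearray stands in for the mapped BAR so the example runs without hardware; an mmap object can be passed in the same way.

```python
import struct


class RegisterWindow:
    """32-bit little-endian register access over any writable buffer."""

    def __init__(self, buf):
        self._buf = buf

    def _check(self, offset):
        # Registers are assumed to be 4-byte aligned and inside the buffer.
        if offset % 4 or not 0 <= offset <= len(self._buf) - 4:
            raise ValueError(f"bad register offset 0x{offset:x}")

    def read32(self, offset):
        self._check(offset)
        return struct.unpack_from("<I", self._buf, offset)[0]

    def write32(self, offset, value):
        self._check(offset)
        struct.pack_into("<I", self._buf, offset, value & 0xFFFFFFFF)


# A bytearray stands in for a mapped BAR in this example.
fake_bar = bytearray(0x1000)
regs = RegisterWindow(fake_bar)
regs.write32(0x100, 0xDEADBEEF)
status = regs.read32(0x100)
print(f"0x{status:08x}")
```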
GPUs have gigabytes of their own memory (VRAM). The GPU exposes its control registers, and a window onto VRAM, through PCI BARs, and the driver maps those BARs with the MMIO mechanism described above. Frameworks such as PyTorch or TensorFlow normally do not have the CPU write tensors into VRAM through that window. They stage the data in page-locked (pinned) host memory and ask the GPU to copy it with DMA, using MMIO mainly to program the device and submit commands.
This is also why GPU memory transfers are measured in GB/s: a few MMIO writes start the transfer, and the DMA engine then moves the data in bulk without the CPU touching each word.
While not MMIO, mmap with MAP_SHARED and MAP_ANONYMOUS is also one way to share memory between related Python processes:
import mmap
# Create anonymous shared memory (no file backing)
shm = mmap.mmap(-1, 4096) # -1 = no file, MAP_ANONYMOUS
# Write from one process, read from another (the mapping must exist before the child is forked)
shm[0:4] = b"ping"
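multiprocessing.shared_memory builds on the same idea, but it maps a named shared-memory object (created with shm_open() on POSIX systems) rather than an anonymous region, so unrelated processes can attach to it by name.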
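An anonymous shared mapping is only visible to processes forked after it was created. The sketch below shows that with a bare os.fork(), so it assumes a Unix system; the variable names are illustrative.

```python
import mmap
import os

# Anonymous mapping; on Unix the default flag is MAP_SHARED.
shm = mmap.mmap(-1, 4096)

pid = os.fork()
if pid == 0:
    # Child: write into the shared page, then exit immediately
    # without running the parent's cleanup handlers.
    shm[0:4] = b"ping"
    os._exit(0)

# Parent: wait for the child, then read what it wrote.
os.waitpid(pid, 0)
received = shm[0:4]
shm.close()
print(received)
```

The parent sees the child's write because both processes map the same physical page. With a private mapping (MAP_PRIVATE) the child would get its own copy on write.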