An autonomous vehicle’s ability to drive safely depends entirely on its ability to perceive the world. This chapter explores the sensor technologies that serve as the vehicle’s eyes, ears, and spatial awareness — and the trade-offs between them.
Modern autonomous vehicles use a combination of complementary sensors, each with distinct strengths and weaknesses. No single sensor is sufficient for safe autonomous driving; redundancy and diversity are essential.
| Sensor | Range | Precision | Works in Dark | Works in Rain/Fog | Cost | Data Type |
|---|---|---|---|---|---|---|
| Camera | 50–250 m | High (2D) | Poor | Moderate | Low | Color images |
| LiDAR | 10–300 m | Very High (3D) | Yes | Moderate | Medium–High | 3D point clouds |
| Radar | 10–300 m | Moderate | Yes | Yes | Low | Velocity + range |
| Ultrasonic | 0.1–5 m | Moderate | Yes | Yes | Very Low | Range |
| IMU | N/A | High | Yes | Yes | Low–Medium | Acceleration, rotation |
| GPS/GNSS | Global | 1–10 m | Yes | Yes | Low | Global position |
Cameras are the most information-rich sensor. A single camera frame contains millions of pixels with color, texture, and contextual information that no other sensor can match.
A camera sensor converts photons into electrical signals using a CMOS (Complementary Metal-Oxide-Semiconductor) or CCD (Charge-Coupled Device) imaging chip. Each pixel measures the intensity of light at its location, typically across three color channels (RGB). Modern automotive cameras operate at resolutions from 1 to 8 megapixels, capturing frames at 30–60 Hz.
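As a rough sense of scale, the raw (uncompressed) data rate of one camera follows directly from resolution, color depth, and frame rate. The figures below are illustrative assumptions, not a specific product's spec:

```python
# Back-of-envelope raw data rate for a single automotive camera.
# Assumed figures: 8 MP, 3 bytes per pixel (RGB), 30 frames per second.
def camera_data_rate_mb_s(megapixels: float, bytes_per_pixel: int, fps: int) -> float:
    """Raw, uncompressed data rate in MB/s."""
    return megapixels * 1e6 * bytes_per_pixel * fps / 1e6

print(camera_data_rate_mb_s(8, 3, 30))  # 720.0 (MB/s before compression)
```

At rates like this, on-sensor compression and careful bandwidth budgeting become essential parts of camera system design.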
Autonomous vehicles typically use multiple cameras with overlapping fields of view to achieve 360° coverage.
Tesla uses 8 cameras for its vision-only system. Waymo’s 6th-generation sensor suite includes 13 cameras providing overlapping 360° coverage with both near-field and far-field resolution.
A stereo camera pair (two cameras separated by a known baseline distance) can estimate depth through triangulation, similar to human binocular vision. Given a point visible in both cameras, the disparity (pixel offset between left and right images) is inversely proportional to depth:
\[d = \frac{f \cdot B}{\text{disparity}}\]

where $f$ is the focal length and $B$ is the baseline distance.
Stereo vision is computationally expensive and struggles with textureless surfaces, but provides dense depth maps without active illumination.
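The triangulation formula above can be sketched directly. The focal length and baseline here are hypothetical values, chosen only to illustrate the inverse relationship between disparity and depth:

```python
def stereo_depth(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from stereo disparity: d = f * B / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical rig: f = 1000 px, baseline = 0.3 m.
# A 10 px disparity puts the point at ~30 m; halving the disparity doubles the depth.
near = stereo_depth(1000.0, 0.3, 10.0)  # ~30 m
far = stereo_depth(1000.0, 0.3, 5.0)    # ~60 m
```

Note how depth resolution degrades with distance: at long range, a sub-pixel disparity error translates into meters of depth error.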
Strengths:

- Richest semantic information: color, texture, and readable text (signs, signals, lane markings)
- Low cost and compact packaging
- Passive sensing, with no emissions and no interference between vehicles

Limitations:

- No direct depth measurement; distance must be inferred or triangulated from stereo
- Poor performance in darkness and strong glare
- Degraded by rain, fog, and lens contamination
LiDAR is often considered the “gold standard” sensor for autonomous driving due to its precise 3D measurement capabilities. It works by emitting laser pulses and measuring the time for each pulse to return after bouncing off an object.
A LiDAR sensor emits rapid pulses of near-infrared laser light (typically at wavelengths of 905 nm or 1550 nm). For each pulse, the sensor measures the round-trip time and computes the distance:

\[d = \frac{c \cdot t}{2}\]

where $c$ is the speed of light and $t$ is the round-trip time.
By scanning thousands of pulses per second across different angles, LiDAR builds a 3D point cloud — a collection of (x, y, z, intensity) points representing the surfaces in the environment.
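Under the time-of-flight principle above, turning a single return into a 3D point takes one range equation plus a spherical-to-Cartesian transform. A minimal sketch:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def tof_range_m(round_trip_s: float) -> float:
    """Range from pulse time of flight: d = c * t / 2."""
    return C * round_trip_s / 2.0

def to_cartesian(range_m: float, azimuth_rad: float, elevation_rad: float):
    """Convert one LiDAR return (range, azimuth, elevation) to (x, y, z)."""
    horizontal = range_m * math.cos(elevation_rad)
    return (horizontal * math.cos(azimuth_rad),
            horizontal * math.sin(azimuth_rad),
            range_m * math.sin(elevation_rad))

# A pulse returning after ~667 ns corresponds to a target roughly 100 m away.
d = tof_range_m(667e-9)
point = to_cartesian(d, azimuth_rad=0.1, elevation_rad=0.02)
```

A real sensor repeats this conversion for every return, at hundreds of thousands to millions of returns per second, to build the point cloud.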
Mechanical spinning LiDAR: The original design, pioneered by Velodyne. A rotating assembly spins the laser emitters 360°. Provides excellent 360° coverage but has moving parts that can wear out. The Velodyne VLP-16 (16 channels) and the iconic HDL-64E (64 channels) are classic examples.
Solid-state LiDAR: No moving parts. Uses techniques like MEMS mirrors, optical phased arrays (OPA), or flash illumination. More compact, cheaper, and more reliable. Companies: Innoviz (MEMS-based), Luminar, Hesai.
FMCW LiDAR: Frequency-Modulated Continuous Wave LiDAR measures the frequency shift of reflected light, providing instantaneous velocity in addition to range. This is a significant advantage for detecting and tracking moving objects. Aeva and Aurora are developing FMCW LiDAR systems.
Modern automotive LiDAR sensors typically offer ranges of 100–300 m, angular resolution on the order of 0.1°, and frame rates of 10–20 Hz, producing hundreds of thousands to millions of points per second.
Strengths:

- Direct, centimeter-level 3D measurement independent of ambient light
- Works in complete darkness (active illumination)
- Point clouds map naturally onto geometric tasks: obstacle detection, free-space estimation, localization

Limitations:

- Degraded by rain, fog, and snow, which scatter and absorb the laser pulses
- Higher cost than cameras or radar
- No color or texture information; returns become sparse at long range
One of the most contentious debates in autonomous driving: is LiDAR necessary, or can cameras alone suffice?
Tesla’s position: Cameras only. CEO Elon Musk has called LiDAR a “crutch” and argued that since humans drive with vision alone, machines should be able to as well. Tesla removed radar from its vehicles in 2021 and ultrasonic sensors in 2022, relying entirely on 8 cameras.
Waymo’s position: Multi-sensor (cameras + LiDAR + radar). CEO Dmitri Dolgov has argued that LiDAR’s direct 3D measurement provides a critical safety layer that cameras cannot match. Waymo develops its own custom LiDAR sensors optimized for cost and performance.
The technical truth is nuanced. Cameras provide richer semantic information; LiDAR provides more reliable geometric information. The best-performing systems as of 2026 use both. However, the rapid improvement of vision-only systems (especially with end-to-end learning) is narrowing the gap.
Radar uses radio waves to detect objects, measure their distance, and — crucially — their radial velocity via the Doppler effect.
A radar transmitter emits radio waves (typically at 77 GHz for automotive long-range radar, 24 GHz for short-range). When these waves hit an object, they are reflected back. The system measures the object's range from the echo delay and its radial velocity from the Doppler shift:

\[v = \frac{c \cdot \Delta f}{2 f_0}\]

where $\Delta f$ is the frequency shift, $c$ is the speed of light, and $f_0$ is the transmit frequency.
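The Doppler relationship can be evaluated directly to recover radial velocity from a measured shift. A sketch, using an illustrative 77 GHz carrier:

```python
C = 299_792_458.0  # speed of light, m/s

def radial_velocity_m_s(doppler_shift_hz: float, carrier_hz: float) -> float:
    """Radial velocity from Doppler shift: v = c * delta_f / (2 * f0)."""
    return C * doppler_shift_hz / (2.0 * carrier_hz)

# A ~5.14 kHz shift at a 77 GHz carrier corresponds to roughly 10 m/s (36 km/h)
# of closing speed: even road-relevant velocities produce only kHz-scale shifts.
v = radial_velocity_m_s(5.14e3, 77e9)
```

The tiny shift relative to the carrier is why automotive radar uses FMCW-style modulation rather than measuring the shift on the raw carrier directly.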
The latest generation of automotive radar — 4D imaging radar — represents a significant advancement. By using large antenna arrays (MIMO configurations with hundreds of virtual channels), 4D imaging radar can resolve objects in range, velocity, azimuth, and elevation. This produces radar “point clouds” that begin to approach LiDAR-like spatial resolution, while retaining radar’s all-weather capability and direct velocity measurement.
Companies developing 4D imaging radar: Arbe, Continental, ZF, Vayyar.
Strengths:

- Direct radial velocity measurement via the Doppler effect
- Robust in rain, fog, snow, and darkness
- Low cost, with decades of automotive deployment (adaptive cruise control, automatic emergency braking)

Limitations:

- Much lower angular resolution than LiDAR or cameras (for traditional radar)
- Sparse, noisy returns; clutter from metallic objects such as guardrails and manhole covers
- Difficulty distinguishing stationary obstacles from the stationary background
Ultrasonic sensors emit high-frequency sound waves (typically 40–48 kHz) and measure the echo return time. They are the simplest and cheapest ranging sensor.
Primarily used for:

- Parking assistance and automated parking
- Low-speed maneuvering near curbs, bollards, and walls
- Detecting obstacles in the blind zone immediately around the vehicle body
Note: Tesla removed ultrasonic sensors from its vehicles in late 2022, replacing their functionality with camera-based depth estimation. Most other manufacturers continue to use them.
An IMU combines:

- Accelerometers (three axes), measuring linear acceleration
- Gyroscopes (three axes), measuring angular velocity
High-end IMUs also include magnetometers (measuring Earth’s magnetic field for heading).
MEMS (Micro-Electro-Mechanical Systems) accelerometers use a tiny proof mass suspended by springs. When the device accelerates, the proof mass moves relative to the housing, and the displacement is measured capacitively. MEMS gyroscopes use the Coriolis effect on a vibrating structure to measure rotation.
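The spring-mass principle reduces to one line of algebra: at equilibrium, the inertial force on the proof mass balances the spring force ($ma = kx$), so acceleration follows from the measured displacement. The mass and stiffness values below are illustrative only, not real device parameters:

```python
def accel_from_displacement(spring_k_n_per_m: float, proof_mass_kg: float,
                            displacement_m: float) -> float:
    """MEMS accelerometer model: m * a = k * x at equilibrium, so a = k * x / m."""
    return spring_k_n_per_m * displacement_m / proof_mass_kg

# Illustrative numbers: a 1 mg proof mass on a 1 N/m spring deflects about
# 9.8 micrometers under 1 g of acceleration.
a = accel_from_displacement(1.0, 1e-6, 9.81e-6)  # ~9.81 m/s^2
```

The micrometer-scale deflections are why the readout is capacitive: tiny gap changes between interdigitated plates are far easier to measure electrically than optically or mechanically.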
The IMU is critical for:

- Dead reckoning when GNSS is unavailable (tunnels, urban canyons, parking garages)
- High-rate ego-motion estimation between slower sensor updates
- Compensating for the vehicle's own motion when interpreting other sensors' data
The main limitation is drift: small measurement errors accumulate over time, causing position estimates to diverge. This is why the IMU is always fused with other sensors (GPS, LiDAR, cameras).
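The quadratic character of that drift is easy to see: a constant accelerometer bias, integrated twice, produces position error that grows with $t^2$. The bias value below is an assumed order of magnitude, not a measured spec:

```python
def position_drift_m(bias_m_s2: float, t_s: float) -> float:
    """Position error from double-integrating a constant accelerometer bias:
    0.5 * b * t^2."""
    return 0.5 * bias_m_s2 * t_s ** 2

# With an assumed 0.01 m/s^2 bias:
for t in (1, 10, 60):
    print(t, "s ->", position_drift_m(0.01, t), "m")
# ~0.005 m after 1 s, ~0.5 m after 10 s, ~18 m after 60 s
```

An error that is negligible over one second becomes many meters within a minute, which is exactly why absolute references must correct the IMU continuously.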
Global Navigation Satellite Systems (GNSS), including GPS (US), GLONASS (Russia), Galileo (EU), and BeiDou (China), provide global position by trilateration: measuring signal travel times from at least four satellites to solve for the receiver's position and clock offset.
Consumer GPS provides accuracy of ~2–5 meters, which is insufficient for lane-level positioning.
Real-Time Kinematic (RTK) GPS uses a fixed base station with a known position to provide correction signals, achieving accuracy of 1–2 centimeters. RTK is used in some autonomous vehicle systems for precise localization, though it requires infrastructure (base stations) and clear sky visibility.
No single sensor provides a complete picture. The real power of an AV’s perception comes from sensor fusion — combining data from all sensors to create a unified, robust understanding of the environment.
Consider a scenario: a car is approaching an intersection in light rain at dusk.

- Cameras read the traffic light's color and the lane markings, but glare off the wet road and fading light reduce their confidence.
- LiDAR measures the precise 3D positions of nearby vehicles and pedestrians, though raindrops add scattered noise to the point cloud.
- Radar tracks the speed of cross traffic, unaffected by the rain or the low light.
- The IMU and GNSS maintain the vehicle's own motion and position estimates throughout.
By fusing all these inputs, the system achieves a reliable understanding that no single sensor could provide alone. We will explore sensor fusion algorithms in detail in Chapter 4.
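The intuition behind fusion can be captured in a few lines: when several sensors independently estimate the same quantity, weighting each by its inverse variance yields a combined estimate more certain than any single input (this is the static, one-measurement case of the Kalman update). The variances below are illustrative assumptions, not real sensor specifications:

```python
def fuse(estimates):
    """Inverse-variance fusion. estimates: list of (value, variance) pairs.
    Returns (fused_value, fused_variance)."""
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    fused_value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return fused_value, 1.0 / total

# Camera depth estimate (noisy), LiDAR (precise), radar (moderate),
# all measuring range to the same vehicle, in meters:
value, variance = fuse([(41.0, 4.0), (39.8, 0.04), (40.5, 1.0)])
# The fused estimate lands near the LiDAR value, with variance below
# that of any single sensor.
```

The most trusted sensor dominates the result but never fully overrides the others, which is what makes the combination robust when one sensor degrades.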
As a concrete example of a production sensor suite, Waymo's 6th-generation system (deployed commercially starting February 2026) includes 13 cameras, 4 LiDAR units, and 6 radar units (23 sensors in total).
The entire suite is designed for redundancy: if any single sensor fails, the remaining sensors can still provide sufficient information for safe operation. The 6th-generation system features significantly improved resolution, range, and field of view compared to previous generations, while reducing the number of individual sensors (from 29 to 23) through better integration.
Processing sensor data in real time requires significant computational power. Modern autonomous vehicles use high-performance GPUs or dedicated AI accelerators, automotive-grade systems-on-chip, and in some cases custom silicon, with redundant compute paths for safety.
The compute requirements are enormous: a typical AV generates 1–2 TB of raw sensor data per hour. Processing this data — running neural networks for detection, tracking, prediction, and planning — requires 50–500+ TOPS of AI compute, depending on the approach.
The sensor suite is the foundation of an autonomous vehicle. Each sensor brings unique strengths: cameras contribute rich semantic detail, LiDAR precise 3D geometry, radar velocity measurement and all-weather robustness, ultrasonic sensors cheap close-range coverage, and the IMU and GNSS the vehicle's own motion and global position.
The next chapter explores how computer vision algorithms extract meaning from this raw sensor data — turning pixels and point clouds into objects, lanes, and traffic lights.
| ← Previous: Introduction to Autonomous Vehicles | Next: Computer Vision and Deep Learning → |