Chapter 8: Camera-Based Perception

A camera is the richest sensor available on the platform: it captures a dense 2-D projection of the 3-D scene, providing texture, colour, and shape information that no range or reflectance sensor can match. The challenge is converting that rich but noisy image into actionable information — obstacle location, target detection, or navigational cues — within the computation budget of a Raspberry Pi.

8.1 Camera Geometry

The geometry of the OV5647 camera module on the platform is well approximated by the standard pinhole model.

Pinhole model. A 3-D point $(X, Y, Z)$ in the camera frame projects to pixel coordinates $(u, v)$:

$u = f_x \cdot X/Z + c_x$, $\quad v = f_y \cdot Y/Z + c_y$

where $(f_x, f_y)$ are focal lengths in pixels and $(c_x, c_y)$ is the principal point (typically near image centre). For 640×480 at 62° HFOV, $f_x \approx 640 / (2\tan(31°)) \approx 533$ pixels.

Estimating real-world distance from pixel height. If an object of known height $H$ metres appears $h$ pixels tall in the image: $Z = f_y \cdot H / h$

This is useful for detecting known-size targets such as coloured balls or fiducial markers.
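As a worked check of these two formulas (the 0.065 m ball height and 40-pixel measurement below are hypothetical, not from the text):

```python
import math

# Focal length in pixels from the horizontal field of view (62 deg, 640 px wide)
width_px = 640
hfov_deg = 62.0
fx = width_px / (2.0 * math.tan(math.radians(hfov_deg / 2.0)))
print(round(fx))   # 533, matching the estimate in the text

# Distance to a known-size target: Z = f * H / h (square pixels, so f_y ~= f_x)
ball_height_m = 0.065    # hypothetical target height
pixel_height  = 40.0     # measured height of the ball in the image
Z = fx * ball_height_m / pixel_height
print(round(Z, 2))       # distance to the ball in metres
```

Note that the accuracy of $Z$ degrades as $h$ shrinks: a one-pixel measurement error matters far more for a distant (small) target than a close one.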

8.2 Capturing Frames with OpenCV

import cv2

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH,  640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
cap.set(cv2.CAP_PROP_FPS, 30)

ret, frame = cap.read()   # frame is a 480×640×3 NumPy array (BGR)
if not ret:
    raise RuntimeError("frame capture failed")

OpenCV reads frames in BGR order (not RGB). Convert for display with matplotlib using frame[:, :, ::-1].

On a Raspberry Pi 4, capturing and processing a 640×480 frame takes approximately 5–30 ms depending on the operations performed, so a simple pipeline comfortably sustains the camera's 30 FPS capture rate with headroom to spare.

8.3 Colour-Based Detection

Colour detection in HSV (Hue-Saturation-Value) space is more robust than in BGR because hue is relatively invariant to illumination changes.

import numpy as np

hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
# Orange target: H in [10, 25], S > 150, V > 80
lower = np.array([10, 150, 80])
upper = np.array([25, 255, 255])
mask  = cv2.inRange(hsv, lower, upper)

The mask is a binary image (255 where the colour condition is met, 0 elsewhere). Find the largest connected component (the target blob):

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)
if contours:
    largest = max(contours, key=cv2.contourArea)
    M = cv2.moments(largest)
    if M["m00"] > 0:                      # guard against a degenerate contour
        cx = int(M["m10"] / M["m00"])     # centroid x in pixels
        cy = int(M["m01"] / M["m00"])     # centroid y in pixels

The centroid $(u_c, v_c)$ in pixel coordinates can be converted to a bearing angle: $\alpha = \arctan((u_c - c_x) / f_x)$

where $c_x = 320$ is the principal point from Section 8.1 for a 640-pixel-wide image.
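A minimal sketch of the bearing computation, assuming the $f_x \approx 533$ px estimate from Section 8.1:

```python
import math

FX = 533.0            # focal length in pixels (Section 8.1 estimate)
CX_PRINCIPAL = 320.0  # principal point x for a 640-px-wide image

def bearing_deg(u_centroid: float) -> float:
    """Horizontal bearing (degrees) of a pixel column; positive = right of centre."""
    return math.degrees(math.atan((u_centroid - CX_PRINCIPAL) / FX))

print(bearing_deg(320.0))  # 0.0 — target dead ahead
print(bearing_deg(480.0))  # positive — target to the right of centre
```

This bearing, combined with the distance estimate from Section 8.1, gives a polar target position directly usable for steering.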

8.4 Edge Detection and Lane / Line Detection

For lane detection on a marked floor, Canny edge detection followed by Hough line transform is effective:

gray  = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
blur  = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blur, threshold1=50, threshold2=150)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi/180,
                         threshold=50, minLineLength=40,
                         maxLineGap=10)

HoughLinesP returns line segments as $(x_1, y_1, x_2, y_2)$ arrays. Under perspective, floor markings running away from the robot project to near-vertical segments that converge toward the vanishing point, while near-horizontal segments typically come from markings or edges crossing the robot's path.

A useful heuristic: for each near-vertical segment compute the lateral slope $\Delta x / \Delta y$ (0 when the marking runs straight ahead of a centred robot), average it over all segments, and feed the average into a heading PID controller as the error. This serves as a camera-based substitute for line following on a floor with visible markings.
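A sketch of this heuristic, under the assumption that the relevant segments are near-vertical floor markings and that the lateral slope $\Delta x / \Delta y$ is 0 when the marking runs straight ahead:

```python
import numpy as np

def heading_error(lines) -> float:
    """Average lateral slope (dx/dy) of detected segments; 0 means on course.

    `lines` has the HoughLinesP output shape (N, 1, 4) of (x1, y1, x2, y2);
    near-horizontal segments (tiny dy) are skipped to avoid huge slopes.
    """
    if lines is None:
        return 0.0
    slopes = []
    for x1, y1, x2, y2 in lines[:, 0, :].astype(float):
        dx, dy = x2 - x1, y2 - y1
        if abs(dy) < 1e-6:
            continue
        slopes.append(dx / dy)
    return float(np.mean(slopes)) if slopes else 0.0

# Synthetic segments: two near-vertical lines leaning slightly to one side
lines = np.array([[[320, 480, 330, 280]], [[310, 470, 322, 270]]])
err = heading_error(lines)   # feed into the heading PID as the error term
```

Returning 0.0 when no lines are detected makes the controller coast straight rather than jerk, which is a reasonable failure mode for brief dropouts.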

8.5 ArUco Fiducial Markers

ArUco markers are black-and-white square patterns that OpenCV can detect and estimate a 6-DOF pose from. They are useful as landmarks for localisation correction:

aruco_dict   = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector_params = cv2.aruco.DetectorParameters()
detector = cv2.aruco.ArucoDetector(aruco_dict, detector_params)
corners, ids, _ = detector.detectMarkers(gray)

If the camera is calibrated (intrinsic matrix $K$ and distortion coefficients known), cv2.aruco.estimatePoseSingleMarkers (deprecated in recent OpenCV versions in favour of cv2.solvePnP on the marker corners) returns the 3-D position and orientation of each detected marker in the camera frame. Combined with the sensor-head pan angle and the robot’s odometry pose, this gives an absolute robot position in the world frame — an odometry correction that Chapter 14 integrates using a Kalman filter.
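Once a marker's translation vector is available, converting it to a range and bearing is pure geometry. A sketch, assuming a hypothetical per-marker `tvec` as returned by the pose estimator (x right, y down, z forward, in metres):

```python
import numpy as np

# Hypothetical translation of one marker in the camera frame (metres)
tvec = np.array([0.15, -0.02, 1.20])

rng = float(np.linalg.norm(tvec))                           # straight-line range
bearing = float(np.degrees(np.arctan2(tvec[0], tvec[2])))   # + = to the right

# Adding the sensor-head pan angle and the odometry heading to `bearing`
# expresses the marker observation in the world frame for the Chapter 14
# Kalman-filter update.
```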

8.6 Obstacle Detection from Camera

For detecting obstacles that do not have a known colour or pattern, background subtraction or depth from motion can be used. A simpler approach suitable for the platform:

Bottom-of-frame occupancy. Obstacles close to the robot appear near the bottom of the image. Segment the bottom third of the image, compute the fraction of pixels that differ from the expected floor colour (a calibrated floor-colour mask), and treat a high fraction as an obstacle signal.
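A sketch of the bottom-of-frame check, assuming a binary floor mask (255 = matches the calibrated floor colour, e.g. from cv2.inRange) and a hypothetical 10% threshold:

```python
import numpy as np

def occupancy_fraction(floor_mask: np.ndarray) -> float:
    """Fraction of non-floor pixels in the bottom third of a binary floor mask."""
    h = floor_mask.shape[0]
    bottom = floor_mask[2 * h // 3:, :]    # bottom third of the image
    return float(np.mean(bottom == 0))

# Synthetic 480x640 mask: all floor except a 100x200 "obstacle" at bottom centre
mask = np.full((480, 640), 255, dtype=np.uint8)
mask[380:480, 220:420] = 0
frac = occupancy_fraction(mask)
obstacle = frac > 0.10   # threshold is an assumption; tune on the real floor
```

The threshold trades false positives (shadows, scuffs) against missed small obstacles, so it should be calibrated against the actual floor surface.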

Optical flow. Compute the dense optical flow between consecutive frames using Farneback’s method:

flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray,
    None, 0.5, 3, 15, 3, 5, 1.2, 0)

When the robot translates, everything in view acquires flow; nearby stationary obstacles produce larger flow than the distant floor and background (motion parallax), and independently moving objects produce flow inconsistent with the robot's ego-motion. Comparing the measured flow magnitude against the flow expected from odometry can therefore flag nearby or moving obstacles.
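A sketch of flow-magnitude thresholding on the Farneback output, assuming the expected floor flow is predicted from odometry and the ratio is a tuning assumption:

```python
import numpy as np

def flow_obstacle_mask(flow: np.ndarray, expected_floor_flow: float,
                       ratio: float = 2.0) -> np.ndarray:
    """Flag pixels whose flow magnitude is much larger than the floor's.

    `flow` has the Farneback output shape (H, W, 2) in px/frame;
    `expected_floor_flow` comes from odometry; `ratio` is a tuning assumption.
    """
    mag = np.linalg.norm(flow, axis=2)
    return mag > ratio * expected_floor_flow

# Synthetic flow: floor moving at ~1 px/frame, a nearby patch at ~4 px/frame
flow = np.zeros((480, 640, 2), dtype=np.float32)
flow[..., 1] = 1.0               # uniform floor flow from forward motion
flow[200:300, 300:400, 1] = 4.0  # close obstacle: larger parallax
mask = flow_obstacle_mask(flow, expected_floor_flow=1.0)
```

In practice the expected floor flow varies across the image (larger near the bottom), so a per-row expectation works better than a single scalar.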

8.7 Computing Budget

On a Raspberry Pi 4 at 640×480, per-frame processing time depends strongly on the pipeline: simple operations such as colour masking fit within a few milliseconds, while heavier steps such as ArUco detection or dense optical flow take substantially longer.

For real-time control at 20 Hz (50 ms budget per loop), a pipeline of frame capture + colour masking + simple contour detection fits comfortably. ArUco detection can run at 10 Hz without hurting the control loop if run in a separate thread.
