A camera is a projective sensor, so each pixel measures light intensity and colour arriving from a specific direction through the camera centre. It does not measure depth, so a single image cannot distinguish between “small and close” and “large and far” objects without additional information. Hence, geometric reconstruction requires either prior knowledge about the scene or multiple viewpoints.
1.1 Ground Plane Observation
If we know the camera is observing a ground plane, each image pixel corresponds to a specific 3D point on that plane. This extra structure allows us to recover geometric information from a single image.
The mapping depends on:
Extrinsic parameters (camera position and orientation relative to the robot)
Intrinsic parameters (focal length and principal point)
The vector from the camera centre to a ground point, expressed in the camera frame, is:

$$\mathbf{p}_C = R\,(\mathbf{p}_R - \mathbf{t})$$

where:
$R$ is the rotation from robot frame to camera frame
$\mathbf{t}$ is the translation from robot centre to camera centre
$\mathbf{p}_R$ is the ground point in robot coordinates
A perspective camera projects a 3D point $\mathbf{p}_C$ into image coordinates using the calibration matrix $K$:

$$\tilde{\mathbf{u}} = K\,\mathbf{p}_C, \qquad K = \begin{pmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{pmatrix}$$

where $f$ is the focal length and $(c_x, c_y)$ is the principal point.
Homogeneous Coordinates
The vector $\tilde{\mathbf{u}} = (\tilde{u}, \tilde{v}, \tilde{w})^T$ is written in homogeneous coordinates, where the actual image coordinates are obtained by dividing by the third component: $u = \tilde{u}/\tilde{w}$, $v = \tilde{v}/\tilde{w}$. This representation simplifies projective transformations, allowing them to be expressed as matrix multiplications.
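As a small sketch of the projection and the division by the third component (the intrinsic values and the test point are made up for illustration):

```python
import numpy as np

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240)
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

p_C = np.array([0.2, 0.1, 2.0])   # example point in the camera frame (metres)

u_tilde = K @ p_C                 # homogeneous image coordinates
u, v = u_tilde[:2] / u_tilde[2]   # divide by the third component
print(u, v)                       # pixel coordinates, approximately (370, 265)
```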
1.2 Ground Plane Homography
If the point lies on the ground plane, then its robot-frame coordinates are:

$$\mathbf{p}_R = (x, y, 0)^T$$

Then:

$$\mathbf{p}_C = R\,(\mathbf{p}_R - \mathbf{t}) = \begin{pmatrix} \mathbf{r}_1 & \mathbf{r}_2 & -R\mathbf{t} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

where:
$\mathbf{r}_1, \mathbf{r}_2$ are the first two columns of $R$

Combining everything:

$$\tilde{\mathbf{u}} = K \begin{pmatrix} \mathbf{r}_1 & \mathbf{r}_2 & -R\mathbf{t} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

We define a single matrix:

$$H = K \begin{pmatrix} \mathbf{r}_1 & \mathbf{r}_2 & -R\mathbf{t} \end{pmatrix}$$

where $H$ is the ground plane homography, a $3 \times 3$ matrix that maps ground coordinates to image coordinates.

Thus:

$$\tilde{\mathbf{u}} = H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$
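A minimal sketch of this construction, using hypothetical calibration values (intrinsics, a camera 1 m above the robot centre, and an example rotation), and a check that mapping through $H$ agrees with projecting $R(\mathbf{p}_R - \mathbf{t})$ directly:

```python
import numpy as np

# Hypothetical calibration, for illustration only
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
t = np.array([0.0, 0.0, 1.0])          # camera centre in robot coordinates
a = np.deg2rad(60)
R = np.array([[1.0, 0.0, 0.0],         # example rotation, robot -> camera
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a),  np.cos(a)]])

# H = K [r1  r2  -R t]
H = K @ np.column_stack([R[:, 0], R[:, 1], -R @ t])

# Map a ground point (x, y, 0) into the image via H ...
u_tilde = H @ np.array([0.5, 1.0, 1.0])        # (x, y, 1) in homogeneous form

# ... and check it matches projecting the camera-frame point directly
p_C = R @ (np.array([0.5, 1.0, 0.0]) - t)
assert np.allclose(u_tilde, K @ p_C)
```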
1.3 Direct Calibration
Instead of separately estimating intrinsic and extrinsic parameters, we can estimate the homography directly from point correspondences. This requires at least four known points on the ground plane and their corresponding image coordinates.
Using more than four points improves accuracy through least-squares estimation.
1.4 Homography Parameterization
Since homographies are defined only up to scale, we fix $h_{33} = 1$ and write:

$$H = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{pmatrix}$$

For one correspondence $(x_i, y_i) \leftrightarrow (u_i, v_i)$:

$$u_i = \frac{h_{11} x_i + h_{12} y_i + h_{13}}{h_{31} x_i + h_{32} y_i + 1}, \qquad v_i = \frac{h_{21} x_i + h_{22} y_i + h_{23}}{h_{31} x_i + h_{32} y_i + 1}$$

From rearranging:

$$h_{11} x_i + h_{12} y_i + h_{13} - u_i h_{31} x_i - u_i h_{32} y_i = u_i$$
$$h_{21} x_i + h_{22} y_i + h_{23} - v_i h_{31} x_i - v_i h_{32} y_i = v_i$$

Each point provides two linear equations. With four correspondences, we obtain eight equations for the eight unknowns; additional points make the system overdetermined and are handled by least squares.

The system can be written in matrix form:

$$A\,\mathbf{h} = \mathbf{b}$$

where $\mathbf{h} = (h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32})^T$. This can be solved using least squares (e.g., np.linalg.lstsq()).
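A sketch of this estimation in NumPy. The correspondences here are synthetic, generated from a known homography so the recovery can be checked; in practice the image points would come from detected calibration markers:

```python
import numpy as np

def estimate_homography(ground_pts, image_pts):
    """Estimate H (with h33 fixed to 1) from >= 4 point correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(ground_pts, image_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])   # equation for u
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])   # equation for v
        b.append(v)
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

# Hypothetical ground truth, used only to synthesize test correspondences
H_true = np.array([[400.0,   30.0, 320.0],
                   [ 10.0, -250.0, 600.0],
                   [  0.0,    0.1,   1.0]])
ground = [(0.5, -0.3), (0.5, 0.3), (1.5, -0.5), (1.5, 0.5)]
image = []
for x, y in ground:
    w = H_true @ np.array([x, y, 1.0])
    image.append((w[0] / w[2], w[1] / w[2]))

H_est = estimate_homography(ground, image)
assert np.allclose(H_est, H_true)
```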
1.5 Using the Homography
Once $H$ is estimated, we can invert it to recover ground coordinates from image coordinates:

$$\begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix} = H^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}, \qquad x = \tilde{x}/\tilde{w}, \quad y = \tilde{y}/\tilde{w}$$
This is only valid for points known to lie on the ground plane.
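A sketch of the back-projection, checked with a round trip through a hypothetical homography:

```python
import numpy as np

def image_to_ground(H, u, v):
    """Map an image point to ground coordinates, assuming it lies on the plane."""
    g = np.linalg.inv(H) @ np.array([u, v, 1.0])
    return g[0] / g[2], g[1] / g[2]   # dehomogenize

# Round trip with a hypothetical H: ground -> image -> ground
H = np.array([[400.0,   30.0, 320.0],
              [ 10.0, -250.0, 600.0],
              [  0.0,    0.1,   1.0]])
w = H @ np.array([1.0, 0.2, 1.0])              # project ground point (1.0, 0.2)
x, y = image_to_ground(H, w[0] / w[2], w[1] / w[2])
assert np.allclose((x, y), (1.0, 0.2))
```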
2. Practical Applications
One application is detecting point-like ground obstacles using segmentation and blob detection. The inverse homography converts detected image points into metric ground coordinates.
We can also analyse uncertainty by placing markers at known locations and measuring reconstruction error.
2.1 Boundary-Based Distance Measurement
If we segment an image into ground and obstacles, the boundary between them lies on the ground plane. These boundary points can be transformed into ground coordinates using the homography.
This effectively turns the camera into a simple laser-like range sensor, measuring distances to walls or obstacles across multiple points simultaneously.
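An illustrative sketch of this idea, with a hypothetical homography and hypothetical boundary pixels (e.g. the lowest obstacle pixel found in each image column by the segmentation):

```python
import numpy as np

H = np.array([[400.0,   30.0, 320.0],   # hypothetical ground plane homography
              [ 10.0, -250.0, 600.0],
              [  0.0,    0.1,   1.0]])
H_inv = np.linalg.inv(H)

# Hypothetical ground/obstacle boundary pixels, one per scanned column
boundary_pixels = [(300, 400), (320, 395), (340, 402)]

ranges = []
for u, v in boundary_pixels:
    g = H_inv @ np.array([u, v, 1.0])
    x, y = g[0] / g[2], g[1] / g[2]     # metric ground coordinates
    distance = np.hypot(x, y)           # range from the robot centre
    ranges.append(distance)
    print(f"pixel ({u}, {v}) -> ground ({x:.2f}, {y:.2f}) m, range {distance:.2f} m")
```

Each boundary pixel yields one range reading, analogous to a single beam of a laser scanner.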