Marker Based Visual Localization
Goal
The goal of this exercise is to estimate the position and orientation (pose) of a robot in a 2D space by detecting and analyzing visual markers, specifically AprilTags. This process involves using computer vision to identify the tags in the robot’s environment and mathematical methods to derive its relative pose with respect to the detected tags.
The green robot represents the real position. The blue robot represents the position from the odometry (with noise). The red robot represents the user's estimated position.

Robot API
- from HAL import HAL - to import the HAL library class. This class contains the functions that receive information from the webcam.
- from GUI import GUI - to import the GUI (Graphical User Interface) library class. This class contains the functions used to view debugging information, such as image widgets.
- HAL.getImage() - to get the image from the webcam.
- GUI.showImage() - to view a debug image or one with relevant information.
- GUI.showEstimatedPose((x, y, yaw)) - to view your estimated pose on the map.
- HAL.setV() - to set the linear speed.
- HAL.setW() - to set the angular velocity.
- HAL.getOdom().x - to get the approximate X coordinate of the robot (with noise).
- HAL.getOdom().y - to get the approximate Y coordinate of the robot (with noise).
- HAL.getOdom().yaw - to get the approximate orientation of the robot (with noise).
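As a minimal sketch of how this API fits together (it simply echoes the noisy odometry as the estimated pose, which is only a starting point, and the speed values are placeholders, not a navigation strategy):

import GUI
import HAL

while True:
    # Read the camera image and the noisy odometry provided by the simulator
    image = HAL.getImage()
    odom_x = HAL.getOdom().x
    odom_y = HAL.getOdom().y
    odom_yaw = HAL.getOdom().yaw

    # Drive slowly forward while turning gently (placeholder motion)
    HAL.setV(0.2)
    HAL.setW(0.1)

    # Show the camera image and publish a pose estimate (here just the raw odometry)
    GUI.showImage(image)
    GUI.showEstimatedPose((odom_x, odom_y, odom_yaw))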
Load tags position
import yaml
from pathlib import Path
conf = yaml.safe_load(
    Path("/resources/exercises/marker_visual_loc/apriltags_poses.yaml").read_text()
)
tags = conf["tags"]
The content of the YAML file has this structure:
tags:
  tag_0:
    position: [1.75, -3.535, -1.57079]
where position is [X, Y, Yaw].
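As a small illustration, once a tag has been detected (see the detection code in the Theory section below), its world pose can be looked up in this dictionary. This assumes the YAML keys follow the tag_<id> pattern shown above and that the detector reports the numeric ID as r.tag_id, as pyapriltags does:

# r is one detection returned by detector.detect(gray)
key = "tag_{}".format(r.tag_id)
if key in tags:
    tag_x, tag_y, tag_yaw = tags[key]["position"]
    print("Detected {} at world pose x={}, y={}, yaw={}".format(key, tag_x, tag_y, tag_yaw))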
Theory
Visual Markers: AprilTags
AprilTags are fiducial markers similar to QR codes but designed for robust detection and pose estimation. They consist of a binary grid with a unique pattern that encodes an identifier. They are ideal for robotics applications due to:
- Their high detection reliability.
- Robustness to noise, partial occlusion, and perspective distortions.
- Easy decoding to obtain a unique ID and retrieve the tag’s known position in the environment.
Example code for AprilTag detection:
import GUI
import HAL
import pyapriltags
import cv2
detector = pyapriltags.Detector(searchpath=["apriltags"], families="tag36h11")
while True:
    print("[INFO] loading image...")
    image = HAL.getImage()
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    print("[INFO] detecting AprilTags...")
    results = detector.detect(gray)
    print("[INFO] {} total AprilTags detected".format(len(results)))

    # loop over the AprilTag detection results
    for r in results:
        # extract the bounding box (x, y)-coordinates for the AprilTag
        # and convert each of the (x, y)-coordinate pairs to integers
        (ptA, ptB, ptC, ptD) = r.corners
        ptB = (int(ptB[0]), int(ptB[1]))
        ptC = (int(ptC[0]), int(ptC[1]))
        ptD = (int(ptD[0]), int(ptD[1]))
        ptA = (int(ptA[0]), int(ptA[1]))

        # draw the bounding box of the AprilTag detection
        cv2.line(image, ptA, ptB, (0, 255, 0), 2)
        cv2.line(image, ptB, ptC, (0, 255, 0), 2)
        cv2.line(image, ptC, ptD, (0, 255, 0), 2)
        cv2.line(image, ptD, ptA, (0, 255, 0), 2)

        # draw the center (x, y)-coordinates of the AprilTag
        (cX, cY) = (int(r.center[0]), int(r.center[1]))
        cv2.circle(image, (cX, cY), 5, (0, 0, 255), -1)

        # draw the tag family on the image
        tagFamily = r.tag_family.decode("utf-8")
        cv2.putText(
            image,
            tagFamily,
            (ptA[0], ptA[1] - 15),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.5,
            (0, 255, 0),
            2,
        )
        print("[INFO] tag family: {}".format(tagFamily))

    GUI.showImage(image)
Camera Model and Calibration
Before detecting AprilTags, the camera used to observe the environment must be calibrated. Camera calibration provides intrinsic parameters like focal length and optical center, which are critical for accurate pose estimation. A camera model is typically represented using the pinhole camera model with distortion corrections.
The intrinsic parameters are used in the projection matrix (P), which converts 3D world coordinates to 2D image coordinates.
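In the standard pinhole formulation (a general relation, not something specific to this exercise), a 3D point (X, Y, Z) maps to pixel coordinates (u, v) as:

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \, [R \mid t] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
$$

where K is the intrinsic camera matrix (built in the code below) and [R | t] are the extrinsic rotation and translation.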

You can use the following code to define the default parameters:
import numpy as np

size = image.shape
focal_length = size[1]
center = (size[1] / 2, size[0] / 2)
matrix_camera = np.array(
    [[focal_length, 0, center[0]], [0, focal_length, center[1]], [0, 0, 1]],
    dtype="double",
)
dist_coeffs = np.zeros((4, 1))
AprilTag Detection
The detection process includes:
- Preprocessing: Filtering the image and identifying regions likely to contain tags.
- Decoding: Analyzing the binary pattern to extract the tag’s unique ID.
- Pose Estimation: Computing the 6-DoF pose of the tag with respect to the camera.

The AprilTags are mounted at a height of 0.8 meters and measure 0.3 x 0.3 meters (the black part is 0.24 x 0.24 meters).
PnP (Perspective-n-Point)
Given the 3D coordinates of the tag corners and their corresponding 2D image points, the camera's pose relative to the tag (and from it, the robot's pose) can be estimated using algorithms such as Direct Linear Transformation (DLT) or iterative methods (e.g., Levenberg-Marquardt).
Here is the documentation of PnP in OpenCV.
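As a rough sketch of that step (assuming the matrix_camera and dist_coeffs defined above, the 0.24 m black square as the tag size, and a corner ordering that matches r.corners, which should be checked against the detector's convention), OpenCV's solvePnP could be used like this:

import numpy as np
import cv2

TAG_SIZE = 0.24  # side of the black part of the tag, in meters

# 3D corners of the tag in its own frame (tag plane at z = 0),
# ordered to correspond with the detected image corners in r.corners
object_points = np.array([
    [-TAG_SIZE / 2,  TAG_SIZE / 2, 0.0],
    [ TAG_SIZE / 2,  TAG_SIZE / 2, 0.0],
    [ TAG_SIZE / 2, -TAG_SIZE / 2, 0.0],
    [-TAG_SIZE / 2, -TAG_SIZE / 2, 0.0],
], dtype="double")

# 2D pixel coordinates of the same corners (r is one detection from detector.detect(gray))
image_points = np.array(r.corners, dtype="double")

# Solve the Perspective-n-Point problem: pose of the tag in the camera frame
success, rvec, tvec = cv2.solvePnP(
    object_points, image_points, matrix_camera, dist_coeffs,
    flags=cv2.SOLVEPNP_ITERATIVE,
)
if success:
    R, _ = cv2.Rodrigues(rvec)  # 3x3 rotation matrix (camera <- tag)
    # Inverting the transform gives the camera position expressed in the tag frame
    camera_position_in_tag = -R.T @ tvec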
Rotation and Translation Matrices
The rotation and translation matrices relating the camera to the robot can be extracted from the SDF files that describe the TurtleBot3 Waffle robot.
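As an illustration of how these transforms chain together (the camera offset below is a placeholder, not the real value from the TurtleBot3 Waffle SDF, and the axis conventions of the tag and camera frames still need to be aligned), the robot pose in the world frame could be composed roughly like this:

import numpy as np
import cv2

def to_homogeneous(R, t):
    # Build a 4x4 homogeneous transform from a 3x3 rotation and a translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t).ravel()
    return T

def pose_2d_to_transform(x, y, yaw, z=0.0):
    # World pose given as [X, Y, Yaw] (as in the YAML file), rotation about the z axis
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return to_homogeneous(R, [x, y, z])

# Tag pose in the world, from apriltags_poses.yaml (tags are mounted 0.8 m high)
T_world_tag = pose_2d_to_transform(*tags["tag_0"]["position"], z=0.8)

# Tag pose in the camera frame, from solvePnP (rvec, tvec)
T_camera_tag = to_homogeneous(cv2.Rodrigues(rvec)[0], tvec)

# Camera pose in the robot frame, to be read from the robot SDF (placeholder values here)
T_robot_camera = to_homogeneous(np.eye(3), [0.1, 0.0, 0.2])

# Chain the transforms: world <- tag <- camera <- robot
T_world_robot = T_world_tag @ np.linalg.inv(T_camera_tag) @ np.linalg.inv(T_robot_camera)
x, y = T_world_robot[0, 3], T_world_robot[1, 3]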
More Resources
You may find the following resources interesting:
Videos
Contributors
- Contributors: Jose María Cañas, David Duro, Javier Izquierdo.
References
- https://pyimagesearch.com/2020/11/02/apriltag-with-python/
- https://en.wikipedia.org/wiki/Perspective-n-Point
- https://april.eecs.umich.edu/software/apriltag
- https://ftc-docs.firstinspires.org/en/latest/apriltag/vision_portal/apriltag_pose/apriltag-pose.html