
# Tracking

Multi-object tracking using [BYTETracker](https://github.com/yakhyo/bytetrack-tracker) with Kalman filtering and IoU-based association. The tracker assigns persistent IDs to detected objects across video frames using a two-stage association strategy — first matching high-confidence detections, then low-confidence ones.

---

## How It Works

BYTETracker takes detection bounding boxes as input and returns tracked bounding boxes with persistent IDs. It does not depend on any specific detector — any source of `[x1, y1, x2, y2, score]` arrays will work.

On each frame, the tracker:

1. Splits detections into high-confidence and low-confidence groups
2. Matches high-confidence detections to existing tracks using IoU
3. Matches remaining tracks to low-confidence detections (second chance)
4. Starts new tracks for unmatched high-confidence detections
5. Removes tracks that have been lost for too long
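
Steps 1 and 2 can be sketched with plain NumPy. This is a minimal illustration, not the library's internal code: `split_by_score` and `iou_matrix` are hypothetical helper names, and whether the thresholds are inclusive or exclusive is an assumption.

```python
import numpy as np


def split_by_score(dets, track_thresh=0.5, low_thresh=0.1):
    """Split (N, 5) detections into high- and low-confidence groups.

    Assumption: inclusive lower bounds; the real tracker may differ.
    """
    scores = dets[:, 4]
    high = dets[scores >= track_thresh]
    low = dets[(scores >= low_thresh) & (scores < track_thresh)]
    return high, low


def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in xyxy format."""
    # Corners of the intersection rectangle, broadcast to shape (N, M)
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-9)


dets = np.array([
    [100, 50, 200, 160, 0.95],  # high confidence
    [300, 80, 380, 200, 0.30],  # low confidence, kept for the second pass
    [10, 10, 20, 20, 0.05],     # below low_thresh, dropped
], dtype=np.float32)

high, low = split_by_score(dets)

# A made-up predicted track box close to the first detection
predicted = np.array([[105, 52, 205, 162]], dtype=np.float32)
ious = iou_matrix(predicted, high[:, :4])
```

The resulting IoU matrix is what an assignment step would consume to pair tracks with detections.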

The Kalman filter predicts where each track will be in the next frame, which helps maintain associations even when detections are noisy.
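
To illustrate the prediction step, here is a minimal constant-velocity sketch for a box center. It assumes a simplified four-element state `[cx, cy, vx, vy]` and a made-up process noise; the state vector and noise model used inside the tracker are likely richer than this.

```python
import numpy as np

dt = 1.0  # one frame

# Constant-velocity transition: position advances by velocity * dt
F = np.array([
    [1, 0, dt, 0],
    [0, 1, 0, dt],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)


def predict(state, cov, process_noise=1e-2):
    """Kalman predict step: propagate the mean and grow the uncertainty."""
    Q = process_noise * np.eye(4)  # illustrative process noise
    return F @ state, F @ cov @ F.T + Q


state = np.array([150.0, 105.0, 4.0, -2.0])  # center (150, 105), velocity (4, -2) px/frame
cov = np.eye(4)

next_state, next_cov = predict(state, cov)
```

The predicted center moves along the velocity while the covariance grows, which is why a track can still be matched after a frame with a noisy or missing detection.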

---

## Basic Usage

```python
import cv2
import numpy as np
from uniface.common import xyxy_to_cxcywh
from uniface.detection import SCRFD
from uniface.tracking import BYTETracker
from uniface.draw import draw_tracks

detector = SCRFD()
tracker = BYTETracker(track_thresh=0.5, track_buffer=30)

cap = cv2.VideoCapture("video.mp4")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # 1. Detect faces
    faces = detector.detect(frame)

    # 2. Build detections array: [x1, y1, x2, y2, score]
    dets = np.array([[*f.bbox, f.confidence] for f in faces])
    dets = dets if len(dets) > 0 else np.empty((0, 5))

    # 3. Update tracker
    tracks = tracker.update(dets)

    # 4. Map track IDs back to face objects
    if len(tracks) > 0 and len(faces) > 0:
        face_bboxes = np.array([f.bbox for f in faces], dtype=np.float32)
        track_ids = tracks[:, 4].astype(int)

        face_centers = xyxy_to_cxcywh(face_bboxes)[:, :2]
        track_centers = xyxy_to_cxcywh(tracks[:, :4])[:, :2]

        for ti in range(len(tracks)):
            dists = (track_centers[ti, 0] - face_centers[:, 0]) ** 2 + (track_centers[ti, 1] - face_centers[:, 1]) ** 2
            faces[int(np.argmin(dists))].track_id = track_ids[ti]

    # 5. Draw
    tracked_faces = [f for f in faces if f.track_id is not None]
    draw_tracks(image=frame, faces=tracked_faces)
    cv2.imshow("Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Each track ID gets a deterministic color via golden-ratio hue stepping, so a given ID always maps to the same color and a person keeps their color for as long as their track ID persists.
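
Golden-ratio hue stepping can be sketched as follows. `track_color` and its saturation and value defaults are hypothetical; the colors produced by `draw_tracks` may be computed differently.

```python
import colorsys

GOLDEN_RATIO_CONJUGATE = 0.618033988749895


def track_color(track_id, saturation=0.8, value=0.95):
    """Map a track ID to a BGR color by stepping the hue by the golden ratio."""
    hue = (track_id * GOLDEN_RATIO_CONJUGATE) % 1.0
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
    # OpenCV expects BGR channel order with 0-255 integers
    return int(b * 255), int(g * 255), int(r * 255)


colors = [track_color(i) for i in range(5)]
```

Because the hue depends only on the ID, consecutive IDs land far apart on the color wheel and no per-track state needs to be stored.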

---

## Webcam Tracking

```python
import cv2
import numpy as np
from uniface.common import xyxy_to_cxcywh
from uniface.detection import SCRFD
from uniface.tracking import BYTETracker
from uniface.draw import draw_tracks

detector = SCRFD()
tracker = BYTETracker(track_thresh=0.5, track_buffer=30)
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    faces = detector.detect(frame)
    dets = np.array([[*f.bbox, f.confidence] for f in faces])
    dets = dets if len(dets) > 0 else np.empty((0, 5))

    tracks = tracker.update(dets)

    if len(tracks) > 0 and len(faces) > 0:
        face_bboxes = np.array([f.bbox for f in faces], dtype=np.float32)
        track_ids = tracks[:, 4].astype(int)

        face_centers = xyxy_to_cxcywh(face_bboxes)[:, :2]
        track_centers = xyxy_to_cxcywh(tracks[:, :4])[:, :2]

        for ti in range(len(tracks)):
            dists = (track_centers[ti, 0] - face_centers[:, 0]) ** 2 + (track_centers[ti, 1] - face_centers[:, 1]) ** 2
            faces[int(np.argmin(dists))].track_id = track_ids[ti]

    draw_tracks(image=frame, faces=[f for f in faces if f.track_id is not None])
    cv2.imshow("Face Tracking - Press 'q' to quit", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
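
The ID-mapping step is identical in both examples, so it can be factored into a helper. The sketch below uses only NumPy; `assign_track_ids` is a hypothetical name, and `SimpleNamespace` stands in for the face objects returned by the detector (assumed to expose `bbox` and `track_id`).

```python
from types import SimpleNamespace

import numpy as np


def assign_track_ids(faces, tracks):
    """Attach each track's ID to the face whose box center is nearest."""
    if len(tracks) == 0 or len(faces) == 0:
        return
    face_boxes = np.array([f.bbox for f in faces], dtype=np.float32)
    face_centers = (face_boxes[:, :2] + face_boxes[:, 2:4]) / 2
    track_centers = (tracks[:, :2] + tracks[:, 2:4]) / 2
    # Squared center distances, shape (num_tracks, num_faces)
    d2 = ((track_centers[:, None, :] - face_centers[None, :, :]) ** 2).sum(axis=2)
    for ti, fi in enumerate(d2.argmin(axis=1)):
        faces[fi].track_id = int(tracks[ti, 4])


# Stand-in face objects, listed in a different order than the tracks
faces = [
    SimpleNamespace(bbox=[300, 80, 380, 200], track_id=None),
    SimpleNamespace(bbox=[100, 50, 200, 160], track_id=None),
]
tracks = np.array([
    [101.2, 51.3, 199.8, 159.8, 1.0],
    [300.5, 80.2, 379.7, 200.1, 2.0],
])

assign_track_ids(faces, tracks)
```

As in the loops above, if two tracks are nearest to the same face, the later track's ID wins; a one-to-one assignment would need an extra matching step.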

---

## Parameters

```python
from uniface.tracking import BYTETracker

tracker = BYTETracker(
    track_thresh=0.5,
    track_buffer=30,
    match_thresh=0.8,
    low_thresh=0.1,
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `track_thresh` | 0.5 | Detections with a score above this value go through first-pass association |
| `track_buffer` | 30 | Number of frames to keep a lost track before removing it |
| `match_thresh` | 0.8 | IoU threshold for matching tracks to detections |
| `low_thresh` | 0.1 | Detections with a score below this value are discarded; scores between `low_thresh` and `track_thresh` are used in the second-pass association |
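
Since `track_buffer` is expressed in frames, the time a lost track survives depends on the frame rate. The helper below is a hypothetical convenience, assuming the buffer is counted in raw frames as the table describes; the frame rate could be read with `cap.get(cv2.CAP_PROP_FPS)`.

```python
def buffer_frames(seconds, fps):
    """Convert a desired lost-track lifetime in seconds to a frame count."""
    return max(1, round(seconds * fps))


one_second_at_30fps = buffer_frames(1.0, 30)   # matches the default of 30
two_seconds_at_25fps = buffer_frames(2.0, 25)
```

The result can then be passed as `track_buffer` when constructing the tracker.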

---

## Input / Output

**Input** — `(N, 5)` numpy array with `[x1, y1, x2, y2, confidence]` per detection:

```python
import numpy as np

detections = np.array([
    [100, 50, 200, 160, 0.95],
    [300, 80, 380, 200, 0.87],
])
```

**Output** — `(M, 5)` numpy array with `[x1, y1, x2, y2, track_id]` per active track:

```python
tracks = tracker.update(detections)
# array([[101.2, 51.3, 199.8, 159.8, 1.],
#        [300.5, 80.2, 379.7, 200.1, 2.]])
```

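The output bounding boxes come from the Kalman filter rather than directly from the detector, so they may differ slightly from the input. Track IDs persist across frames for the same object. They are stored as floats in the output array (note the `1.` and `2.` above), so cast the last column to `int` before using the IDs as labels or dictionary keys.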
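
A short sketch of unpacking the output array, using the sample values shown above. The variable names are illustrative only.

```python
import numpy as np

tracks = np.array([
    [101.2, 51.3, 199.8, 159.8, 1.0],
    [300.5, 80.2, 379.7, 200.1, 2.0],
])

boxes = tracks[:, :4]              # float xyxy boxes
ids = tracks[:, 4].astype(int)     # integer track IDs

# Integer pixel coordinates, e.g. for drawing with OpenCV
pixel_boxes = boxes.round().astype(int)

# Look up a box by its track ID
box_by_id = {int(tid): box for tid, box in zip(ids, pixel_boxes)}
```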

---

## Resetting the Tracker

When switching to a different video or scene, reset the tracker to clear all internal state:

```python
tracker.reset()
```

This clears all active, lost, and removed tracks, resets the frame counter, and resets the ID counter back to zero.

---

## Visualization

`draw_tracks` draws bounding boxes color-coded by track ID:

```python
from uniface.draw import draw_tracks

draw_tracks(
    image=frame,
    faces=tracked_faces,
    draw_landmarks=True,
    draw_id=True,
    corner_bbox=True,
)
```

---

## Small Face Performance

!!! warning "Tracking performance with small faces"
    The tracker relies on IoU (Intersection over Union) to match detections across
    frames. When faces occupy a small portion of the image — for example in
    surveillance footage or wide-angle cameras — even slight movement between frames
    can cause a large drop in IoU. This makes it harder for the tracker to maintain
    consistent IDs, and you may see IDs switching or resetting more often than expected.

    This is not specific to BYTETracker; it applies to any IoU-based tracker. A few
    things that can help:

    - **Lower `match_thresh`** (e.g. `0.5` or `0.6`) so the tracker accepts lower
      overlap as a valid match.
    - **Increase `track_buffer`** (e.g. `60` or higher) to hold onto lost tracks
      longer before discarding them.
    - **Use a higher-resolution input** if possible, so face bounding boxes are
      larger in pixel terms.

    ```python
    tracker = BYTETracker(
        track_thresh=0.4,
        track_buffer=60,
        match_thresh=0.6,
    )
    ```
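
The effect described in the warning can be checked with a little arithmetic. The sketch below compares the IoU of a square box with a copy of itself shifted 8 pixels horizontally, for a small and a large face; the sizes and the shift are made-up illustrative values.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def shifted_iou(size, shift):
    """IoU between a size x size box and the same box moved `shift` px right."""
    return iou((0, 0, size, size), (shift, 0, size + shift, size))


small = shifted_iou(size=20, shift=8)    # about 0.43
large = shifted_iou(size=200, shift=8)   # about 0.92
```

The same 8-pixel motion costs the 20-pixel face more than half of its overlap, while the 200-pixel face barely changes.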

---

## CLI Tool

```bash
# Track faces in a video
python tools/track.py --source video.mp4

# Webcam
python tools/track.py --source 0

# Save output
python tools/track.py --source video.mp4 --output tracked.mp4

# Use RetinaFace instead of SCRFD
python tools/track.py --source video.mp4 --detector retinaface

# Keep lost tracks longer
python tools/track.py --source video.mp4 --track-buffer 60
```

---

## References

- [yakhyo/bytetrack-tracker](https://github.com/yakhyo/bytetrack-tracker) — standalone BYTETracker implementation used in UniFace
- [ByteTrack paper](https://arxiv.org/abs/2110.06864) — Zhang et al., "ByteTrack: Multi-Object Tracking by Associating Every Detection Box"

---

## See Also

- [Detection](detection.md) — face detection models
- [Video & Webcam](../recipes/video-webcam.md) — video processing patterns
- [Inputs & Outputs](../concepts/inputs-outputs.md) — data types and formats