# Tracking

Multi-object tracking using [BYTETracker](https://github.com/yakhyo/bytetrack-tracker) with Kalman filtering and IoU-based association. The tracker assigns persistent IDs to detected objects across video frames using a two-stage association strategy: high-confidence detections are matched first, then low-confidence ones.

---

## How It Works

BYTETracker takes detection bounding boxes as input and returns tracked bounding boxes with persistent IDs. It does not depend on any specific detector; any source of `[x1, y1, x2, y2, score]` arrays will work.

Each frame, the tracker:

1. Splits detections into high-confidence and low-confidence groups
2. Matches high-confidence detections to existing tracks using IoU
3. Matches remaining tracks to low-confidence detections (second chance)
4. Starts new tracks for unmatched high-confidence detections
5. Removes tracks that have been lost for too long
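
The first two steps can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: `split_by_confidence` and `iou_matrix` are hypothetical helper names, the thresholds simply mirror the `track_thresh` and `low_thresh` defaults listed under Parameters, and the handling of scores exactly on a threshold is an assumption.

```python
import numpy as np


def split_by_confidence(dets, track_thresh=0.5, low_thresh=0.1):
    """Split (N, 5) detections into high- and low-confidence groups.

    Hypothetical helper: rows are [x1, y1, x2, y2, score]. Scores at or
    below low_thresh are dropped (boundary handling is an assumption).
    """
    scores = dets[:, 4]
    high = dets[scores >= track_thresh]
    low = dets[(scores > low_thresh) & (scores < track_thresh)]
    return high, low


def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between (A, 4) and (B, 4) boxes in xyxy format."""
    a = boxes_a[:, None, :]  # (A, 1, 4)
    b = boxes_b[None, :, :]  # (1, B, 4)
    inter_w = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area_a + area_b - inter
    return inter / np.maximum(union, 1e-9)


dets = np.array([
    [100, 50, 200, 160, 0.95],  # high confidence
    [300, 80, 380, 200, 0.30],  # low confidence, kept for the second pass
    [10, 10, 20, 20, 0.05],     # below low_thresh, discarded
], dtype=np.float32)

high, low = split_by_confidence(dets)

# Predicted track boxes (made-up values) compared against the high group.
track_boxes = np.array([[102, 52, 198, 158]], dtype=np.float32)
ious = iou_matrix(track_boxes, high[:, :4])
```

In this sketch, the same `iou_matrix` call would be repeated against `low` for the tracks left unmatched after the first pass.
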

The Kalman filter predicts where each track will be in the next frame, which helps maintain associations even when detections are noisy.

---

## Basic Usage

```python
import cv2
import numpy as np
from uniface.common import xyxy_to_cxcywh
from uniface.detection import SCRFD
from uniface.tracking import BYTETracker
from uniface.draw import draw_tracks

detector = SCRFD()
tracker = BYTETracker(track_thresh=0.5, track_buffer=30)

cap = cv2.VideoCapture("video.mp4")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # 1. Detect faces
    faces = detector.detect(frame)

    # 2. Build detections array: [x1, y1, x2, y2, score]
    dets = np.array([[*f.bbox, f.confidence] for f in faces])
    dets = dets if len(dets) > 0 else np.empty((0, 5))

    # 3. Update tracker
    tracks = tracker.update(dets)

    # 4. Map track IDs back to face objects (nearest box center)
    if len(tracks) > 0 and len(faces) > 0:
        face_bboxes = np.array([f.bbox for f in faces], dtype=np.float32)
        track_ids = tracks[:, 4].astype(int)

        face_centers = xyxy_to_cxcywh(face_bboxes)[:, :2]
        track_centers = xyxy_to_cxcywh(tracks[:, :4])[:, :2]

        for ti in range(len(tracks)):
            dists = (track_centers[ti, 0] - face_centers[:, 0]) ** 2 + (track_centers[ti, 1] - face_centers[:, 1]) ** 2
            faces[int(np.argmin(dists))].track_id = track_ids[ti]

    # 5. Draw
    tracked_faces = [f for f in faces if f.track_id is not None]
    draw_tracks(image=frame, faces=tracked_faces)
    cv2.imshow("Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Each track ID gets a deterministic color via golden-ratio hue stepping, so the same track keeps the same color throughout the video.

---

## Webcam Tracking

```python
import cv2
import numpy as np
from uniface.common import xyxy_to_cxcywh
from uniface.detection import SCRFD
from uniface.tracking import BYTETracker
from uniface.draw import draw_tracks

detector = SCRFD()
tracker = BYTETracker(track_thresh=0.5, track_buffer=30)
cap = cv2.VideoCapture(0)  # 0 selects the default camera

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Detect faces and build the (N, 5) detections array
    faces = detector.detect(frame)
    dets = np.array([[*f.bbox, f.confidence] for f in faces])
    dets = dets if len(dets) > 0 else np.empty((0, 5))

    tracks = tracker.update(dets)

    # Assign each track ID to the face with the nearest box center
    if len(tracks) > 0 and len(faces) > 0:
        face_bboxes = np.array([f.bbox for f in faces], dtype=np.float32)
        track_ids = tracks[:, 4].astype(int)

        face_centers = xyxy_to_cxcywh(face_bboxes)[:, :2]
        track_centers = xyxy_to_cxcywh(tracks[:, :4])[:, :2]

        for ti in range(len(tracks)):
            dists = (track_centers[ti, 0] - face_centers[:, 0]) ** 2 + (track_centers[ti, 1] - face_centers[:, 1]) ** 2
            faces[int(np.argmin(dists))].track_id = track_ids[ti]

    draw_tracks(image=frame, faces=[f for f in faces if f.track_id is not None])
    cv2.imshow("Face Tracking - Press 'q' to quit", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

---

## Parameters

```python
from uniface.tracking import BYTETracker

tracker = BYTETracker(
    track_thresh=0.5,
    track_buffer=30,
    match_thresh=0.8,
    low_thresh=0.1,
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `track_thresh` | 0.5 | Detections above this score go through first-pass association |
| `track_buffer` | 30 | How many frames to keep a lost track before removing it |
| `match_thresh` | 0.8 | IoU threshold for matching tracks to detections |
| `low_thresh` | 0.1 | Detections below this score are discarded entirely |

---

## Input / Output

**Input**: an `(N, 5)` NumPy array with one `[x1, y1, x2, y2, confidence]` row per detection.

```python
import numpy as np

detections = np.array([
    [100, 50, 200, 160, 0.95],
    [300, 80, 380, 200, 0.87],
])
```
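
Because the input is expected to be two-dimensional even when a frame has no detections, it can help to normalize the shape in one place. The sketch below uses only NumPy; `to_detection_array` is a hypothetical helper, not part of UniFace, and it assumes each detection is available as a `(bbox, score)` pair.

```python
import numpy as np


def to_detection_array(detections):
    """Convert an iterable of (bbox, score) pairs into an (N, 5) float array.

    Hypothetical helper: bbox is [x1, y1, x2, y2]. An empty input yields a
    (0, 5) array instead of NumPy's default shape of (0,).
    """
    rows = [[*bbox, score] for bbox, score in detections]
    return np.asarray(rows, dtype=np.float32).reshape(-1, 5)


frame_with_faces = to_detection_array([
    ([100, 50, 200, 160], 0.95),
    ([300, 80, 380, 200], 0.87),
])
empty_frame = to_detection_array([])

print(frame_with_faces.shape)  # (2, 5)
print(empty_frame.shape)       # (0, 5)
```
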

**Output**: an `(M, 5)` NumPy array with one `[x1, y1, x2, y2, track_id]` row per active track.

```python
tracks = tracker.update(detections)
# array([[101.2, 51.3, 199.8, 159.8, 1.],
#        [300.5, 80.2, 379.7, 200.1, 2.]])
```

The output bounding boxes come from the Kalman filter's estimate rather than the raw detections, so they may differ slightly from the input. Track IDs are integers that persist across frames for the same object.

---

## Resetting the Tracker

When switching to a different video or scene, reset the tracker to clear all internal state:

```python
tracker.reset()
```

This clears all active, lost, and removed tracks, resets the frame counter, and resets the ID counter back to zero.

---

## Visualization

`draw_tracks` draws bounding boxes color-coded by track ID:

```python
from uniface.draw import draw_tracks

draw_tracks(
    image=frame,
    faces=tracked_faces,
    draw_landmarks=True,
    draw_id=True,
    corner_bbox=True,
)
```
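
The golden-ratio hue stepping mentioned under Basic Usage can be sketched with the standard library alone. This is only an illustration of the general idea: `color_for_track` is a hypothetical function, the saturation and value constants are arbitrary, and the colors `draw_tracks` actually produces may differ.

```python
import colorsys

GOLDEN_RATIO_CONJUGATE = 0.618033988749895


def color_for_track(track_id, saturation=0.85, value=0.95):
    """Map a track ID to a BGR color tuple (0-255) for OpenCV drawing.

    Stepping the hue by the golden-ratio conjugate spreads consecutive IDs
    around the color wheel, so neighboring IDs get clearly different hues.
    """
    hue = (track_id * GOLDEN_RATIO_CONJUGATE) % 1.0
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
    return int(b * 255), int(g * 255), int(r * 255)


# The mapping is a pure function of the ID, so a track keeps its color
# for as long as it keeps its ID.
colors = {track_id: color_for_track(track_id) for track_id in range(1, 6)}
```
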

---

## Small Face Performance

!!! warning "Tracking performance with small faces"
    The tracker relies on IoU (Intersection over Union) to match detections across
    frames. When faces occupy a small portion of the image (for example, in
    surveillance footage or wide-angle cameras), even slight movement between frames
    can cause a large drop in IoU. This makes it harder for the tracker to maintain
    consistent IDs, and you may see IDs switching or resetting more often than expected.

    This is not specific to BYTETracker; it applies to any IoU-based tracker. A few
    things that can help:

    - **Lower `match_thresh`** (e.g. `0.5` or `0.6`) so the tracker accepts lower
      overlap as a valid match.
    - **Increase `track_buffer`** (e.g. `60` or higher) to hold onto lost tracks
      longer before discarding them.
    - **Use a higher-resolution input** if possible, so face bounding boxes are
      larger in pixel terms.

    ```python
    tracker = BYTETracker(
        track_thresh=0.4,
        track_buffer=60,
        match_thresh=0.6,
    )
    ```
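
The effect described in the warning can be checked with a little arithmetic. The sketch below is a standalone illustration with made-up box sizes, not code from the library: it applies the same 10-pixel horizontal shift to a 20-pixel box and a 200-pixel box and compares the resulting IoU.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def shifted(box, dx):
    """Return the box moved dx pixels to the right."""
    return (box[0] + dx, box[1], box[2] + dx, box[3])


small = (0, 0, 20, 20)    # 20 px face
large = (0, 0, 200, 200)  # 200 px face

small_iou = iou(small, shifted(small, 10))
large_iou = iou(large, shifted(large, 10))

print(f"20 px face:  IoU = {small_iou:.2f}")  # 0.33
print(f"200 px face: IoU = {large_iou:.2f}")  # 0.90
```
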

---

## CLI Tool

```bash
# Track faces in a video
python tools/track.py --source video.mp4

# Webcam
python tools/track.py --source 0

# Save output
python tools/track.py --source video.mp4 --output tracked.mp4

# Use RetinaFace instead of SCRFD
python tools/track.py --source video.mp4 --detector retinaface

# Keep lost tracks longer
python tools/track.py --source video.mp4 --track-buffer 60
```

---

## References

- [yakhyo/bytetrack-tracker](https://github.com/yakhyo/bytetrack-tracker): standalone BYTETracker implementation used in UniFace
- [ByteTrack paper](https://arxiv.org/abs/2110.06864): Zhang et al., "ByteTrack: Multi-Object Tracking by Associating Every Detection Box"

---

## See Also

- [Detection](detection.md): face detection models
- [Video & Webcam](../recipes/video-webcam.md): video processing patterns
- [Inputs & Outputs](../concepts/inputs-outputs.md): data types and formats