3 Commits

| Author | SHA1 | Message | Date |
| ------ | ---- | ------- | ---- |
| yakhyo | cb81d2fcf8 | fix: Cleaning up excessive comments | 2026-03-11 12:22:42 +09:00 |
| yakhyo | f0bae6dd80 | feat: Add landmark and recognition models to uniface-cpp | 2026-03-11 12:22:42 +09:00 |
| yakhyo | eec8f99850 | feat: Add uniface cpp support for detection only | 2026-03-11 12:22:42 +09:00 |
99 changed files with 2786 additions and 2726 deletions

View File

@@ -20,7 +20,7 @@ jobs:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
python-version: "3.10"
- uses: pre-commit/action@v3.0.1
test:
@@ -34,11 +34,9 @@ jobs:
include:
# Full Python range on Linux (fastest runner)
- os: ubuntu-latest
python-version: "3.11"
python-version: "3.10"
- os: ubuntu-latest
python-version: "3.13"
- os: ubuntu-latest
python-version: "3.14"
- os: macos-latest
python-version: "3.13"
- os: windows-latest

View File

@@ -2,8 +2,7 @@ name: Deploy docs
on:
push:
tags:
- "v*.*.*"
branches: [main]
workflow_dispatch:
permissions:

View File

@@ -54,7 +54,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.11", "3.13"]
python-version: ["3.10", "3.13"]
steps:
- name: Checkout code
@@ -92,7 +92,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
python-version: "3.10"
cache: 'pip'
- name: Install build tools

.gitignore vendored
View File

@@ -1,5 +1,6 @@
tmp_*
.vscode/
*.onnx
# Byte-compiled / optimized / DLL files
__pycache__/

View File

@@ -59,12 +59,12 @@ This project uses [Ruff](https://docs.astral.sh/ruff/) for linting and formattin
#### General Rules
- **Line length:** 120 characters maximum
- **Python version:** 3.11+ (use modern syntax)
- **Python version:** 3.10+ (use modern syntax)
- **Quote style:** Single quotes for strings, double quotes for docstrings
#### Type Hints
Use modern Python 3.11+ type hints (PEP 585 and PEP 604):
Use modern Python 3.10+ type hints (PEP 585 and PEP 604):
```python
# Preferred (modern)
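# A hypothetical illustration of the preferred modern style (PEP 585 built-in
# generics and PEP 604 unions); the function below is made up for the example.
def load_images(paths: list[str], limit: int | None = None) -> dict[str, bytes]:
    ...
# Avoid the legacy typing equivalents such as List[str], Optional[int], Dict[str, bytes].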
@@ -184,9 +184,6 @@ Example notebooks demonstrating library usage:
| Face Parsing | [06_face_parsing.ipynb](examples/06_face_parsing.ipynb) |
| Face Anonymization | [07_face_anonymization.ipynb](examples/07_face_anonymization.ipynb) |
| Gaze Estimation | [08_gaze_estimation.ipynb](examples/08_gaze_estimation.ipynb) |
| Face Segmentation | [09_face_segmentation.ipynb](examples/09_face_segmentation.ipynb) |
| Face Vector Store | [10_face_vector_store.ipynb](examples/10_face_vector_store.ipynb) |
| Head Pose Estimation | [11_head_pose_estimation.ipynb](examples/11_head_pose_estimation.ipynb) |
## Questions?

View File

@@ -3,7 +3,7 @@
<div align="center">
[![PyPI Version](https://img.shields.io/pypi/v/uniface.svg?label=Version)](https://pypi.org/project/uniface/)
[![Python Version](https://img.shields.io/badge/Python-3.11%2B-blue)](https://www.python.org/)
[![Python Version](https://img.shields.io/badge/Python-3.10%2B-blue)](https://www.python.org/)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Github Build Status](https://github.com/yakhyo/uniface/actions/workflows/ci.yml/badge.svg)](https://github.com/yakhyo/uniface/actions)
[![PyPI Downloads](https://static.pepy.tech/personalized-badge/uniface?period=total&units=INTERNATIONAL_SYSTEM&left_color=GRAY&right_color=BLUE&left_text=Downloads)](https://pepy.tech/projects/uniface)
@@ -31,7 +31,6 @@
- **Facial Landmarks** — 106-point landmark localization module (separate from 5-point detector landmarks)
- **Face Parsing** — BiSeNet semantic segmentation (19 classes), XSeg face masking
- **Gaze Estimation** — Real-time gaze direction with MobileGaze
- **Head Pose Estimation** — 3D head orientation (pitch, yaw, roll) with 6D rotation representation
- **Attribute Analysis** — Age, gender, race (FairFace), and emotion
- **Vector Indexing** — FAISS-backed embedding store for fast multi-identity search
- **Anti-Spoofing** — Face liveness detection with MiniFASNet
@@ -186,7 +185,6 @@ Full documentation: https://yakhyo.github.io/uniface/
| Recognition | WebFace600K | ArcFace |
| Recognition | WebFace4M / 12M | AdaFace |
| Gaze | Gaze360 | MobileGaze |
| Head Pose | 300W-LP | HeadPose (ResNet, MobileNet) |
| Parsing | CelebAMask-HQ | BiSeNet |
| Attributes | CelebA, FairFace, AffectNet | AgeGender, FairFace, Emotion |
@@ -208,7 +206,6 @@ Full documentation: https://yakhyo.github.io/uniface/
| [08_gaze_estimation.ipynb](examples/08_gaze_estimation.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/08_gaze_estimation.ipynb) | Gaze direction estimation |
| [09_face_segmentation.ipynb](examples/09_face_segmentation.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/09_face_segmentation.ipynb) | Face segmentation with XSeg |
| [10_face_vector_store.ipynb](examples/10_face_vector_store.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/10_face_vector_store.ipynb) | FAISS-backed face database |
| [11_head_pose_estimation.ipynb](examples/11_head_pose_estimation.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/11_head_pose_estimation.ipynb) | Head pose estimation (pitch, yaw, roll) |
---
@@ -237,7 +234,6 @@ If you plan commercial use, verify model license compatibility.
| Parsing | [face-parsing](https://github.com/yakhyo/face-parsing) | ✓ | BiSeNet Face Parsing |
| Parsing | [face-segmentation](https://github.com/yakhyo/face-segmentation) | - | XSeg Face Segmentation |
| Gaze | [gaze-estimation](https://github.com/yakhyo/gaze-estimation) | ✓ | MobileGaze Training |
| Head Pose | [head-pose-estimation](https://github.com/yakhyo/head-pose-estimation) | ✓ | Head Pose Training (6DRepNet-style) |
| Anti-Spoofing | [face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) | - | MiniFASNet Inference |
| Attributes | [fairface-onnx](https://github.com/yakhyo/fairface-onnx) | - | FairFace ONNX Inference |

View File

@@ -39,20 +39,16 @@ recognizer = ArcFace(providers=['CPUExecutionProvider'])
detector = RetinaFace(providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
```
All **ONNX-based** model classes accept the `providers` parameter:
All model classes accept the `providers` parameter:
- Detection: `RetinaFace`, `SCRFD`, `YOLOv5Face`, `YOLOv8Face`
- Recognition: `ArcFace`, `AdaFace`, `MobileFace`, `SphereFace`
- Landmarks: `Landmark106`
- Gaze: `MobileGaze`
- Parsing: `BiSeNet`, `XSeg`
- Parsing: `BiSeNet`
- Attributes: `AgeGender`, `FairFace`
- Anti-Spoofing: `MiniFASNet`
!!! note "Non-ONNX components"
- **Emotion** uses TorchScript and selects its device automatically (`mps` / `cuda` / `cpu`). It does **not** accept the `providers` parameter.
- **BlurFace** is a pure OpenCV utility and does not load any model.
---
## Check Available Providers
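A minimal sketch of one way to check which execution providers are available, using the standard `onnxruntime` API (not taken from the file above):

```python
import onnxruntime as ort

# Lists the execution providers compiled into the installed onnxruntime build,
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'] on a CUDA-enabled machine.
print(ort.get_available_providers())
```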

View File

@@ -106,27 +106,6 @@ print(f"Yaw: {np.degrees(result.yaw):.1f}°")
---
### HeadPoseResult
```python
@dataclass(frozen=True)
class HeadPoseResult:
pitch: float # Rotation around X-axis (degrees), + = looking down
yaw: float # Rotation around Y-axis (degrees), + = looking right
roll: float # Rotation around Z-axis (degrees), + = tilting clockwise
```
**Usage:**
```python
result = head_pose.estimate(face_crop)
print(f"Pitch: {result.pitch:.1f}°")
print(f"Yaw: {result.yaw:.1f}°")
print(f"Roll: {result.roll:.1f}°")
```
---
### SpoofingResult
```python
@@ -165,11 +144,11 @@ class AttributeResult:
```python
# AgeGender model
result = age_gender.predict(image, face)
result = age_gender.predict(image, face.bbox)
print(f"{result.sex}, {result.age} years old")
# FairFace model
result = fairface.predict(image, face)
result = fairface.predict(image, face.bbox)
print(f"{result.sex}, {result.age_group}, {result.race}")
```
@@ -192,7 +171,7 @@ Face recognition models return normalized 512-dimensional embeddings:
```python
embedding = recognizer.get_normalized_embedding(image, landmarks)
print(f"Shape: {embedding.shape}") # (512,)
print(f"Shape: {embedding.shape}") # (1, 512)
print(f"Norm: {np.linalg.norm(embedding):.4f}") # ~1.0
```
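Since the updated text above states that `get_normalized_embedding` returns a unit-norm `(1, 512)` array, cosine similarity between two faces reduces to a dot product. A self-contained sketch with stand-in arrays (not the library call itself):

```python
import numpy as np

# Stand-ins for two normalized (1, 512) embeddings as described above.
emb1 = np.random.rand(1, 512).astype(np.float32)
emb1 /= np.linalg.norm(emb1)
emb2 = np.random.rand(1, 512).astype(np.float32)
emb2 /= np.linalg.norm(emb2)

# For unit-norm vectors, cosine similarity is just the dot product.
similarity = float(np.dot(emb1, emb2.T)[0][0])
print(f"Cosine similarity: {similarity:.4f}")
```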

View File

@@ -23,7 +23,6 @@ graph TB
LMK[Landmarks]
ATTR[Attributes]
GAZE[Gaze]
HPOSE[Head Pose]
PARSE[Parsing]
SPOOF[Anti-Spoofing]
PRIV[Privacy]
@@ -46,7 +45,6 @@ graph TB
DET --> LMK
DET --> ATTR
DET --> GAZE
DET --> HPOSE
DET --> PARSE
DET --> SPOOF
DET --> PRIV
@@ -121,11 +119,10 @@ uniface/
├── attribute/ # Age, gender, emotion, race
├── parsing/ # Face semantic segmentation
├── gaze/ # Gaze estimation
├── headpose/ # Head pose estimation
├── spoofing/ # Anti-spoofing
├── privacy/ # Face anonymization
├── indexing/ # Vector indexing (FAISS)
├── types.py # Dataclasses (Face, GazeResult, HeadPoseResult, etc.)
├── types.py # Dataclasses (Face, GazeResult, etc.)
├── constants.py # Model weights and URLs
├── model_store.py # Model download and caching
├── onnx_utils.py # ONNX Runtime utilities
@@ -161,7 +158,7 @@ for face in faces:
embedding = recognizer.get_normalized_embedding(image, face.landmarks)
# Attributes
attrs = age_gender.predict(image, face)
attrs = age_gender.predict(image, face.bbox)
print(f"Face: {attrs.sex}, {attrs.age} years")
```
@@ -186,7 +183,8 @@ fairface = FairFace()
analyzer = FaceAnalyzer(
detector,
recognizer=recognizer,
attributes=[age_gender, fairface],
age_gender=age_gender,
fairface=fairface,
)
faces = analyzer.analyze(image)
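A hedged follow-up sketch of consuming the analyzer output, assuming each returned `Face` exposes the `bbox` and `confidence` fields used elsewhere in this diff:

```python
# Field names (face.bbox, face.confidence) are assumed from Face usage shown
# elsewhere in this diff; treat this as an illustration, not the exact API.
for face in faces:
    x1, y1, x2, y2 = map(int, face.bbox[:4])
    print(f"Face at ({x1}, {y1})-({x2}, {y2}), confidence={face.confidence:.2f}")
```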

View File

@@ -32,7 +32,7 @@ ruff check . --fix
**Guidelines:**
- Line length: 120
- Python 3.11+ type hints
- Python 3.10+ type hints
- Google-style docstrings
---

View File

@@ -183,30 +183,6 @@ data/
---
### Head Pose Estimation
#### 300W-LP
Large-scale synthesized face dataset with large pose variations, generated from 300W by face profiling. Used for training head pose estimation models.
| Property | Value |
| ----------- | ----------------------------- |
| Images | ~122,000 (synthesized) |
| Source | 300W (profiled) |
| Pose range | ±90° yaw |
| Evaluation | AFLW2000 |
| Used by | All HeadPose models |
!!! info "Download & Reference"
**Paper**: [Face Alignment Across Large Poses: A 3D Solution](https://arxiv.org/abs/1511.07212)
**Training code**: [yakhyo/head-pose-estimation](https://github.com/yakhyo/head-pose-estimation)
!!! note "UniFace Models"
All HeadPose models shipped with UniFace are trained on 300W-LP and evaluated on AFLW2000.
---
### Face Parsing
#### CelebAMask-HQ

View File

@@ -13,7 +13,7 @@ template: home.html
<p class="hero-subtitle">All-in-One Open-Source Face Analysis Library</p>
[![PyPI Version](https://img.shields.io/pypi/v/uniface.svg?label=Version)](https://pypi.org/project/uniface/)
[![Python Version](https://img.shields.io/badge/Python-3.11%2B-blue)](https://www.python.org/)
[![Python Version](https://img.shields.io/badge/Python-3.10%2B-blue)](https://www.python.org/)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Github Build Status](https://github.com/yakhyo/uniface/actions/workflows/ci.yml/badge.svg)](https://github.com/yakhyo/uniface/actions)
[![PyPI Downloads](https://static.pepy.tech/personalized-badge/uniface?period=total&units=INTERNATIONAL_SYSTEM&left_color=GRAY&right_color=BLUE&left_text=Downloads)](https://pepy.tech/projects/uniface)
@@ -59,11 +59,6 @@ BiSeNet semantic segmentation with 19 facial component classes.
Real-time gaze direction prediction with MobileGaze models.
</div>
<div class="feature-card" markdown>
### :material-axis-arrow: Head Pose
3D head orientation (pitch, yaw, roll) estimation with 6D rotation models.
</div>
<div class="feature-card" markdown>
### :material-motion-play: Tracking
Multi-object tracking with BYTETracker for persistent face IDs across video frames.

View File

@@ -6,7 +6,7 @@ This guide covers all installation options for UniFace.
## Requirements
- **Python**: 3.11 or higher
- **Python**: 3.10 or higher
- **Operating Systems**: macOS, Linux, Windows
---
@@ -159,11 +159,11 @@ print("Installation successful!")
### Import Errors
If you encounter import errors, ensure you're using Python 3.11+:
If you encounter import errors, ensure you're using Python 3.10+:
```bash
python --version
# Should show: Python 3.11.x or higher
# Should show: Python 3.10.x or higher
```
### Model Download Issues

View File

@@ -257,33 +257,6 @@ Gaze direction prediction models trained on [Gaze360](datasets.md#gaze360) datas
---
## Head Pose Estimation Models
### HeadPose Family
Head pose estimation models using 6D rotation representation. Trained on [300W-LP](datasets.md#300w-lp) dataset, evaluated on AFLW2000. Returns pitch, yaw, and roll angles in degrees.
| Model Name | Backbone | Size | MAE* |
| -------------- | -------- | ------- | ----- |
| `RESNET18` :material-check-circle: | ResNet18 | 43 MB | 5.22° |
| `RESNET34` | ResNet34 | 82 MB | 5.07° |
| `RESNET50` | ResNet50 | 91 MB | 4.83° |
| `MOBILENET_V2` | MobileNetV2 | 9.6 MB | 5.72° |
| `MOBILENET_V3_SMALL` | MobileNetV3-Small | 4.8 MB | 6.31° |
| `MOBILENET_V3_LARGE` | MobileNetV3-Large | 16 MB | 5.58° |
*MAE (Mean Absolute Error) in degrees on AFLW2000 test set — lower is better
!!! info "Training Data"
**Dataset**: Trained on [300W-LP](datasets.md#300w-lp) (synthesized large-pose faces from 300W)
**Method**: 6D rotation representation (rotation matrix → Euler angles)
!!! note "Input Requirements"
Requires face crop as input. Use face detection first to obtain bounding boxes.
---
## Face Parsing Models
### BiSeNet Family
@@ -399,7 +372,6 @@ See [Model Cache & Offline Use](concepts/model-cache-offline.md) for full detail
- **AdaFace ONNX**: [yakhyo/adaface-onnx](https://github.com/yakhyo/adaface-onnx) - ONNX export and inference
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
- **Head Pose Estimation**: [yakhyo/head-pose-estimation](https://github.com/yakhyo/head-pose-estimation) - 6D rotation head pose estimation training and ONNX models
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet training code and pretrained weights
- **Face Segmentation**: [yakhyo/face-segmentation](https://github.com/yakhyo/face-segmentation) - XSeg ONNX Inference
- **Face Anti-Spoofing**: [yakhyo/face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) - MiniFASNet ONNX inference (weights from [minivision-ai/Silent-Face-Anti-Spoofing](https://github.com/minivision-ai/Silent-Face-Anti-Spoofing))

View File

@@ -30,10 +30,9 @@ age_gender = AgeGender()
faces = detector.detect(image)
for face in faces:
result = age_gender.predict(image, face)
result = age_gender.predict(image, face.bbox)
print(f"Gender: {result.sex}") # "Female" or "Male"
print(f"Age: {result.age} years")
# face.gender and face.age are also set automatically
```
### Output
@@ -65,11 +64,10 @@ fairface = FairFace()
faces = detector.detect(image)
for face in faces:
result = fairface.predict(image, face)
result = fairface.predict(image, face.bbox)
print(f"Gender: {result.sex}")
print(f"Age Group: {result.age_group}")
print(f"Race: {result.race}")
# face.gender, face.age_group, face.race are also set automatically
```
### Output
@@ -134,7 +132,7 @@ emotion = Emotion(model_name=DDAMFNWeights.AFFECNET7)
faces = detector.detect(image)
for face in faces:
result = emotion.predict(image, face)
result = emotion.predict(image, face.landmarks)
print(f"Emotion: {result.emotion}")
print(f"Confidence: {result.confidence:.2%}")
```
@@ -181,22 +179,6 @@ emotion = Emotion(model_name=DDAMFNWeights.AFFECNET8)
---
## Factory Function
Use `create_attribute_predictor()` for dynamic model selection:
```python
from uniface import create_attribute_predictor
age_gender = create_attribute_predictor('age_gender')
fairface = create_attribute_predictor('fairface')
emotion = create_attribute_predictor('emotion')
```
Available model names: `'age_gender'`, `'fairface'`, `'emotion'`.
---
## Combining Models
### Full Attribute Analysis
@@ -213,10 +195,10 @@ faces = detector.detect(image)
for face in faces:
# Get exact age from AgeGender
ag_result = age_gender.predict(image, face)
ag_result = age_gender.predict(image, face.bbox)
# Get race from FairFace
ff_result = fairface.predict(image, face)
ff_result = fairface.predict(image, face.bbox)
print(f"Gender: {ag_result.sex}")
print(f"Exact Age: {ag_result.age}")
@@ -233,7 +215,7 @@ from uniface.detection import RetinaFace
analyzer = FaceAnalyzer(
RetinaFace(),
attributes=[AgeGender()],
age_gender=AgeGender(),
)
faces = analyzer.analyze(image)
@@ -275,7 +257,7 @@ def draw_attributes(image, face, result):
# Usage
for face in faces:
result = age_gender.predict(image, face)
result = age_gender.predict(image, face.bbox)
image = draw_attributes(image, face, result)
cv2.imwrite("attributes.jpg", image)

View File

@@ -267,7 +267,6 @@ gaze = create_gaze_estimator() # Returns MobileGaze
## Next Steps
- [Head Pose Estimation](headpose.md) - 3D head orientation
- [Anti-Spoofing](spoofing.md) - Face liveness detection
- [Privacy](privacy.md) - Face anonymization
- [Video Recipe](../recipes/video-webcam.md) - Real-time processing

View File

@@ -1,232 +0,0 @@
# Head Pose Estimation
Head pose estimation predicts the 3D orientation of a person's head (pitch, yaw, and roll angles).
---
## Available Models
| Model | Backbone | Size | MAE* |
|-------|----------|------|------|
| **ResNet18** :material-check-circle: | ResNet18 | 43 MB | 5.22° |
| ResNet34 | ResNet34 | 82 MB | 5.07° |
| ResNet50 | ResNet50 | 91 MB | 4.83° |
| MobileNetV2 | MobileNetV2 | 9.6 MB | 5.72° |
| MobileNetV3-Small | MobileNetV3 | 4.8 MB | 6.31° |
| MobileNetV3-Large | MobileNetV3 | 16 MB | 5.58° |
*MAE = Mean Absolute Error on AFLW2000 test set (lower is better)
---
## Basic Usage
```python
import cv2
from uniface.detection import RetinaFace
from uniface.headpose import HeadPose
detector = RetinaFace()
head_pose = HeadPose()
image = cv2.imread("photo.jpg")
faces = detector.detect(image)
for face in faces:
# Crop face
x1, y1, x2, y2 = map(int, face.bbox)
face_crop = image[y1:y2, x1:x2]
if face_crop.size > 0:
# Estimate head pose
result = head_pose.estimate(face_crop)
print(f"Pitch: {result.pitch:.1f}°, Yaw: {result.yaw:.1f}°, Roll: {result.roll:.1f}°")
```
---
## Model Variants
```python
from uniface.headpose import HeadPose
from uniface.constants import HeadPoseWeights
# Default (ResNet18, recommended balance of speed and accuracy)
hp = HeadPose()
# Lightweight for mobile/edge
hp = HeadPose(model_name=HeadPoseWeights.MOBILENET_V3_SMALL)
# Higher accuracy
hp = HeadPose(model_name=HeadPoseWeights.RESNET50)
```
---
## Output Format
```python
result = head_pose.estimate(face_crop)
# HeadPoseResult dataclass
result.pitch # Rotation around X-axis in degrees
result.yaw # Rotation around Y-axis in degrees
result.roll # Rotation around Z-axis in degrees
```
### Angle Convention
```
pitch > 0 (looking down)
yaw < 0 ─────┼───── yaw > 0
(looking left) │ (looking right)
pitch < 0 (looking up)
roll > 0 = clockwise tilt
roll < 0 = counter-clockwise tilt
```
- **Pitch**: Rotation around X-axis (positive = looking down)
- **Yaw**: Rotation around Y-axis (positive = looking right)
- **Roll**: Rotation around Z-axis (positive = tilting clockwise)
---
## Visualization
### 3D Cube (default)
The default visualization draws a wireframe cube oriented to match the head pose.
```python
from uniface.draw import draw_head_pose
faces = detector.detect(image)
for face in faces:
x1, y1, x2, y2 = map(int, face.bbox)
face_crop = image[y1:y2, x1:x2]
if face_crop.size > 0:
result = head_pose.estimate(face_crop)
# Draw cube on image (default)
draw_head_pose(image, face.bbox, result.pitch, result.yaw, result.roll)
cv2.imwrite("headpose_output.jpg", image)
```
### Axis Visualization
```python
from uniface.draw import draw_head_pose
# X/Y/Z coordinate axes
draw_head_pose(image, face.bbox, result.pitch, result.yaw, result.roll, draw_type='axis')
```
### Low-Level Drawing Functions
```python
from uniface.draw import draw_head_pose_cube, draw_head_pose_axis
# Draw cube directly
draw_head_pose_cube(image, yaw=10.0, pitch=-5.0, roll=2.0, bbox=[100, 100, 250, 280])
# Draw axes directly
draw_head_pose_axis(image, yaw=10.0, pitch=-5.0, roll=2.0, bbox=[100, 100, 250, 280])
```
---
## Real-Time Head Pose Tracking
```python
import cv2
from uniface.detection import RetinaFace
from uniface.headpose import HeadPose
from uniface.draw import draw_head_pose
detector = RetinaFace()
head_pose = HeadPose()
cap = cv2.VideoCapture(0)
while True:
ret, frame = cap.read()
if not ret:
break
faces = detector.detect(frame)
for face in faces:
x1, y1, x2, y2 = map(int, face.bbox)
face_crop = frame[y1:y2, x1:x2]
if face_crop.size > 0:
result = head_pose.estimate(face_crop)
draw_head_pose(frame, face.bbox, result.pitch, result.yaw, result.roll)
cv2.imshow("Head Pose Estimation", frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
```
---
## Use Cases
### Driver Drowsiness Detection
```python
def is_head_drooping(result, pitch_threshold=-15):
"""Check if the head is drooping (looking down significantly)."""
return result.pitch < pitch_threshold
result = head_pose.estimate(face_crop)
if is_head_drooping(result):
print("Warning: Head drooping detected")
```
### Attention Monitoring
```python
def is_facing_forward(result, threshold=20):
"""Check if the person is facing roughly forward."""
return (
abs(result.pitch) < threshold
and abs(result.yaw) < threshold
and abs(result.roll) < threshold
)
result = head_pose.estimate(face_crop)
if is_facing_forward(result):
print("Facing forward")
else:
print("Looking away")
```
---
## Factory Function
```python
from uniface.headpose import create_head_pose_estimator
hp = create_head_pose_estimator() # Returns HeadPose
```
---
## Next Steps
- [Gaze Estimation](gaze.md) - Eye gaze direction
- [Anti-Spoofing](spoofing.md) - Face liveness detection
- [Video Recipe](../recipes/video-webcam.md) - Real-time processing

View File

@@ -18,7 +18,6 @@ Run UniFace examples directly in your browser with Google Colab, or download and
| [Gaze Estimation](https://github.com/yakhyo/uniface/blob/main/examples/08_gaze_estimation.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/08_gaze_estimation.ipynb) | Gaze direction estimation |
| [Face Segmentation](https://github.com/yakhyo/uniface/blob/main/examples/09_face_segmentation.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/09_face_segmentation.ipynb) | Face segmentation with XSeg |
| [Face Vector Store](https://github.com/yakhyo/uniface/blob/main/examples/10_face_vector_store.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/10_face_vector_store.ipynb) | FAISS-backed face database |
| [Head Pose Estimation](https://github.com/yakhyo/uniface/blob/main/examples/11_head_pose_estimation.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/11_head_pose_estimation.ipynb) | 3D head orientation estimation |
---

View File

@@ -80,6 +80,7 @@ Compare two faces:
```python
import cv2
import numpy as np
from uniface.detection import RetinaFace
from uniface.recognition import ArcFace
@@ -96,13 +97,12 @@ faces1 = detector.detect(image1)
faces2 = detector.detect(image2)
if faces1 and faces2:
# Extract embeddings (normalized 1-D vectors)
# Extract embeddings
emb1 = recognizer.get_normalized_embedding(image1, faces1[0].landmarks)
emb2 = recognizer.get_normalized_embedding(image2, faces2[0].landmarks)
# Compute cosine similarity
from uniface import compute_similarity
similarity = compute_similarity(emb1, emb2, normalized=True)
# Compute similarity (cosine similarity)
similarity = np.dot(emb1, emb2.T)[0][0]
# Interpret result
if similarity > 0.6:
@@ -135,7 +135,7 @@ faces = detector.detect(image)
# Predict attributes
for i, face in enumerate(faces):
result = age_gender.predict(image, face)
result = age_gender.predict(image, face.bbox)
print(f"Face {i+1}: {result.sex}, {result.age} years old")
```
@@ -164,7 +164,7 @@ image = cv2.imread("photo.jpg")
faces = detector.detect(image)
for i, face in enumerate(faces):
result = fairface.predict(image, face)
result = fairface.predict(image, face.bbox)
print(f"Face {i+1}: {result.sex}, {result.age_group}, {result.race}")
```
@@ -234,36 +234,6 @@ cv2.imwrite("gaze_output.jpg", image)
---
## Head Pose Estimation
```python
import cv2
from uniface.detection import RetinaFace
from uniface.headpose import HeadPose
from uniface.draw import draw_head_pose
detector = RetinaFace()
head_pose = HeadPose()
image = cv2.imread("photo.jpg")
faces = detector.detect(image)
for i, face in enumerate(faces):
x1, y1, x2, y2 = map(int, face.bbox[:4])
face_crop = image[y1:y2, x1:x2]
if face_crop.size > 0:
result = head_pose.estimate(face_crop)
print(f"Face {i+1}: pitch={result.pitch:.1f}°, yaw={result.yaw:.1f}°, roll={result.roll:.1f}°")
# Draw 3D cube visualization
draw_head_pose(image, face.bbox, result.pitch, result.yaw, result.roll)
cv2.imwrite("headpose_output.jpg", image)
```
---
## Face Parsing
Segment face into semantic components:
@@ -454,7 +424,6 @@ For detailed model comparisons and benchmarks, see the [Model Zoo](models.md).
| Recognition | `ArcFace`, `AdaFace`, `MobileFace`, `SphereFace` |
| Tracking | `BYTETracker` |
| Gaze | `MobileGaze` (ResNet18/34/50, MobileNetV2, MobileOneS0) |
| Head Pose | `HeadPose` (ResNet18/34/50, MobileNetV2/V3) |
| Parsing | `BiSeNet` (ResNet18/34) |
| Attributes | `AgeGender`, `FairFace`, `Emotion` |
| Anti-Spoofing | `MiniFASNet` (V1SE, V2) |
@@ -501,7 +470,6 @@ from uniface.recognition import ArcFace, AdaFace
from uniface.attribute import AgeGender, FairFace
from uniface.landmark import Landmark106
from uniface.gaze import MobileGaze
from uniface.headpose import HeadPose
from uniface.parsing import BiSeNet, XSeg
from uniface.privacy import BlurFace
from uniface.spoofing import MiniFASNet

View File

@@ -34,7 +34,7 @@ def process_image(image_path):
embedding = recognizer.get_normalized_embedding(image, face.landmarks)
# Step 3: Predict attributes
attrs = age_gender.predict(image, face)
attrs = age_gender.predict(image, face.bbox)
results.append({
'face_id': i,
@@ -83,7 +83,7 @@ age_gender = AgeGender()
analyzer = FaceAnalyzer(
detector,
recognizer=recognizer,
attributes=[age_gender],
age_gender=age_gender,
)
# Process image
@@ -109,12 +109,11 @@ import numpy as np
from uniface.attribute import AgeGender, FairFace
from uniface.detection import RetinaFace
from uniface.gaze import MobileGaze
from uniface.headpose import HeadPose
from uniface.landmark import Landmark106
from uniface.recognition import ArcFace
from uniface.parsing import BiSeNet
from uniface.spoofing import MiniFASNet
from uniface.draw import draw_detections, draw_gaze, draw_head_pose
from uniface.draw import draw_detections, draw_gaze
class FaceAnalysisPipeline:
def __init__(self):
@@ -125,7 +124,6 @@ class FaceAnalysisPipeline:
self.fairface = FairFace()
self.landmarker = Landmark106()
self.gaze = MobileGaze()
self.head_pose = HeadPose()
self.parser = BiSeNet()
self.spoofer = MiniFASNet()
@@ -147,12 +145,12 @@ class FaceAnalysisPipeline:
)
# Attributes
ag_result = self.age_gender.predict(image, face)
ag_result = self.age_gender.predict(image, face.bbox)
result['age'] = ag_result.age
result['gender'] = ag_result.sex
# FairFace attributes
ff_result = self.fairface.predict(image, face)
ff_result = self.fairface.predict(image, face.bbox)
result['age_group'] = ff_result.age_group
result['race'] = ff_result.race
@@ -169,13 +167,6 @@ class FaceAnalysisPipeline:
result['gaze_pitch'] = gaze_result.pitch
result['gaze_yaw'] = gaze_result.yaw
# Head pose estimation
if face_crop.size > 0:
hp_result = self.head_pose.estimate(face_crop)
result['head_pitch'] = hp_result.pitch
result['head_yaw'] = hp_result.yaw
result['head_roll'] = hp_result.roll
# Face parsing
if face_crop.size > 0:
result['parsing_mask'] = self.parser.parse(face_crop)
@@ -198,7 +189,6 @@ for i, r in enumerate(results):
print(f" Gender: {r['gender']}, Age: {r['age']}")
print(f" Race: {r['race']}, Age Group: {r['age_group']}")
print(f" Gaze: pitch={np.degrees(r['gaze_pitch']):.1f}°")
print(f" Head Pose: P={r['head_pitch']:.1f}° Y={r['head_yaw']:.1f}° R={r['head_roll']:.1f}°")
print(f" Real: {r['is_real']} ({r['spoof_confidence']:.1%})")
```
@@ -230,7 +220,7 @@ def visualize_analysis(image_path, output_path):
cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
# Age and gender
attrs = age_gender.predict(image, face)
attrs = age_gender.predict(image, face.bbox)
label = f"{attrs.sex}, {attrs.age}y"
cv2.putText(image, label, (x1, y1 - 10),
cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
@@ -278,11 +268,6 @@ def results_to_json(results):
'gaze': {
'pitch_deg': float(np.degrees(r['gaze_pitch'])) if 'gaze_pitch' in r else None,
'yaw_deg': float(np.degrees(r['gaze_yaw'])) if 'gaze_yaw' in r else None
},
'head_pose': {
'pitch': float(r['head_pitch']) if 'head_pitch' in r else None,
'yaw': float(r['head_yaw']) if 'head_yaw' in r else None,
'roll': float(r['head_roll']) if 'head_roll' in r else None
}
}
output.append(item)
@@ -306,4 +291,3 @@ with open('results.json', 'w') as f:
- [Face Search](face-search.md) - Build a search system
- [Detection Module](../modules/detection.md) - Detection options
- [Recognition Module](../modules/recognition.md) - Recognition details
- [Head Pose Module](../modules/headpose.md) - Head orientation estimation

View File

@@ -175,4 +175,3 @@ while True:
- [Batch Processing](batch-processing.md) - Process multiple files
- [Detection Module](../modules/detection.md) - Detection options
- [Gaze Module](../modules/gaze.md) - Gaze estimation
- [Head Pose Module](../modules/headpose.md) - Head orientation estimation

View File

@@ -51,7 +51,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"3.2.0\n"
"3.0.0\n"
]
}
],

View File

@@ -55,7 +55,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"3.2.0\n"
"3.0.0\n"
]
}
],

View File

@@ -44,7 +44,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"3.2.0\n"
"3.0.0\n"
]
}
],

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -53,7 +53,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"UniFace version: 3.2.0\n"
"UniFace version: 3.0.0\n"
]
}
],

View File

@@ -51,7 +51,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"UniFace version: 3.2.0\n"
"UniFace version: 3.0.0\n"
]
}
],

View File

@@ -51,7 +51,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"UniFace version: 3.2.0\n"
"UniFace version: 3.0.0\n"
]
}
],

View File

@@ -53,7 +53,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"UniFace version: 3.2.0\n"
"UniFace version: 3.0.0\n"
]
}
],

View File

@@ -54,7 +54,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"UniFace version: 3.2.0\n"
"UniFace version: 3.0.0\n"
]
}
],

File diff suppressed because one or more lines are too long

View File

@@ -151,7 +151,6 @@ nav:
- Attributes: modules/attributes.md
- Parsing: modules/parsing.md
- Gaze: modules/gaze.md
- Head Pose: modules/headpose.md
- Anti-Spoofing: modules/spoofing.md
- Privacy: modules/privacy.md
- Indexing: modules/indexing.md

View File

@@ -1,6 +1,6 @@
[project]
name = "uniface"
version = "3.2.0"
version = "3.1.0"
description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Tracking, Landmark Analysis, Face Parsing, Gaze Estimation, Age, and Gender Detection"
readme = "README.md"
license = "MIT"
@@ -9,7 +9,7 @@ maintainers = [
{ name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" },
]
requires-python = ">=3.11,<3.15"
requires-python = ">=3.10,<3.14"
keywords = [
"face-detection",
"face-recognition",
@@ -34,17 +34,17 @@ classifiers = [
"Intended Audience :: Science/Research",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Programming Language :: Python :: 3.14",
]
dependencies = [
"numpy>=1.21.0",
"opencv-python>=4.5.0",
"onnxruntime>=1.16.0",
"scikit-image>=0.26.0",
"scikit-image>=0.19.0",
"scipy>=1.7.0",
"requests>=2.28.0",
"tqdm>=4.64.0",
@@ -73,7 +73,7 @@ uniface = ["py.typed"]
[tool.ruff]
line-length = 120
target-version = "py311"
target-version = "py310"
exclude = [
".git",
".ruff_cache",

View File

@@ -1,7 +1,7 @@
numpy>=1.21.0
opencv-python>=4.5.0
onnxruntime>=1.16.0
scikit-image>=0.26.0
scikit-image>=0.19.0
scipy>=1.7.0
requests>=2.28.0
tqdm>=4.64.0

View File

@@ -9,14 +9,6 @@ import numpy as np
import pytest
from uniface.attribute import AgeGender, AttributeResult
from uniface.types import Face
def _make_face(bbox: list[int] | np.ndarray) -> Face:
"""Helper: build a minimal Face from a bounding box."""
bbox = np.asarray(bbox)
landmarks = np.zeros((5, 2), dtype=np.float32)
return Face(bbox=bbox, confidence=0.99, landmarks=landmarks)
@pytest.fixture
@@ -30,30 +22,30 @@ def mock_image():
@pytest.fixture
def mock_face():
return _make_face([100, 100, 300, 300])
def mock_bbox():
return [100, 100, 300, 300]
def test_model_initialization(age_gender_model):
assert age_gender_model is not None, 'AgeGender model initialization failed.'
def test_prediction_output_format(age_gender_model, mock_image, mock_face):
result = age_gender_model.predict(mock_image, mock_face)
def test_prediction_output_format(age_gender_model, mock_image, mock_bbox):
result = age_gender_model.predict(mock_image, mock_bbox)
assert isinstance(result, AttributeResult), f'Result should be AttributeResult, got {type(result)}'
assert isinstance(result.gender, int), f'Gender should be int, got {type(result.gender)}'
assert isinstance(result.age, int), f'Age should be int, got {type(result.age)}'
assert isinstance(result.sex, str), f'Sex should be str, got {type(result.sex)}'
def test_gender_values(age_gender_model, mock_image, mock_face):
result = age_gender_model.predict(mock_image, mock_face)
def test_gender_values(age_gender_model, mock_image, mock_bbox):
result = age_gender_model.predict(mock_image, mock_bbox)
assert result.gender in [0, 1], f'Gender should be 0 (Female) or 1 (Male), got {result.gender}'
assert result.sex in ['Female', 'Male'], f'Sex should be Female or Male, got {result.sex}'
def test_age_range(age_gender_model, mock_image, mock_face):
result = age_gender_model.predict(mock_image, mock_face)
def test_age_range(age_gender_model, mock_image, mock_bbox):
result = age_gender_model.predict(mock_image, mock_bbox)
assert 0 <= result.age <= 120, f'Age should be between 0 and 120, got {result.age}'
@@ -65,52 +57,39 @@ def test_different_bbox_sizes(age_gender_model, mock_image):
]
for bbox in test_bboxes:
face = _make_face(bbox)
result = age_gender_model.predict(mock_image, face)
result = age_gender_model.predict(mock_image, bbox)
assert result.gender in [0, 1], f'Failed for bbox {bbox}'
assert 0 <= result.age <= 120, f'Age out of range for bbox {bbox}'
def test_different_image_sizes(age_gender_model):
def test_different_image_sizes(age_gender_model, mock_bbox):
test_sizes = [(480, 640, 3), (720, 1280, 3), (1080, 1920, 3)]
face = _make_face([100, 100, 300, 300])
for size in test_sizes:
mock_image = np.random.randint(0, 255, size, dtype=np.uint8)
result = age_gender_model.predict(mock_image, face)
result = age_gender_model.predict(mock_image, mock_bbox)
assert result.gender in [0, 1], f'Failed for image size {size}'
assert 0 <= result.age <= 120, f'Age out of range for image size {size}'
def test_consistency(age_gender_model, mock_image, mock_face):
result1 = age_gender_model.predict(mock_image, mock_face)
result2 = age_gender_model.predict(mock_image, mock_face)
def test_consistency(age_gender_model, mock_image, mock_bbox):
result1 = age_gender_model.predict(mock_image, mock_bbox)
result2 = age_gender_model.predict(mock_image, mock_bbox)
assert result1.gender == result2.gender, 'Same input should produce same gender prediction'
assert result1.age == result2.age, 'Same input should produce same age prediction'
def test_face_enrichment(age_gender_model, mock_image, mock_face):
"""predict() must write gender & age back to the Face object."""
assert mock_face.gender is None
assert mock_face.age is None
result = age_gender_model.predict(mock_image, mock_face)
assert mock_face.gender == result.gender
assert mock_face.age == result.age
def test_bbox_list_format(age_gender_model, mock_image):
face = _make_face([100, 100, 300, 300])
result = age_gender_model.predict(mock_image, face)
bbox_list = [100, 100, 300, 300]
result = age_gender_model.predict(mock_image, bbox_list)
assert result.gender in [0, 1], 'Should work with bbox as list'
assert 0 <= result.age <= 120, 'Age should be in valid range'
def test_bbox_array_format(age_gender_model, mock_image):
face = _make_face(np.array([100, 100, 300, 300]))
result = age_gender_model.predict(mock_image, face)
bbox_array = np.array([100, 100, 300, 300])
result = age_gender_model.predict(mock_image, bbox_array)
assert result.gender in [0, 1], 'Should work with bbox as numpy array'
assert 0 <= result.age <= 120, 'Age should be in valid range'
@@ -124,8 +103,7 @@ def test_multiple_predictions(age_gender_model, mock_image):
results = []
for bbox in bboxes:
face = _make_face(bbox)
result = age_gender_model.predict(mock_image, face)
result = age_gender_model.predict(mock_image, bbox)
results.append(result)
assert len(results) == 3, 'Should have 3 predictions'
@@ -134,26 +112,28 @@ def test_multiple_predictions(age_gender_model, mock_image):
assert 0 <= result.age <= 120
def test_age_is_positive(age_gender_model, mock_image, mock_face):
def test_age_is_positive(age_gender_model, mock_image, mock_bbox):
for _ in range(5):
result = age_gender_model.predict(mock_image, mock_face)
result = age_gender_model.predict(mock_image, mock_bbox)
assert result.age >= 0, f'Age should be non-negative, got {result.age}'
def test_output_format_for_visualization(age_gender_model, mock_image, mock_face):
result = age_gender_model.predict(mock_image, mock_face)
def test_output_format_for_visualization(age_gender_model, mock_image, mock_bbox):
result = age_gender_model.predict(mock_image, mock_bbox)
text = f'{result.sex}, {result.age}y'
assert isinstance(text, str), 'Should be able to format as string'
assert 'Male' in text or 'Female' in text, 'Text should contain gender'
assert 'y' in text, "Text should contain 'y' for years"
def test_attribute_result_fields(age_gender_model, mock_image, mock_face):
def test_attribute_result_fields(age_gender_model, mock_image, mock_bbox):
"""Test that AttributeResult has correct fields for AgeGender model."""
result = age_gender_model.predict(mock_image, mock_face)
result = age_gender_model.predict(mock_image, mock_bbox)
# AgeGender should set gender and age
assert result.gender is not None
assert result.age is not None
# AgeGender should NOT set race and age_group (FairFace only)
assert result.race is None
assert result.age_group is None

View File

@@ -9,14 +9,12 @@ import numpy as np
import pytest
from uniface import (
create_attribute_predictor,
create_detector,
create_landmarker,
create_recognizer,
list_available_detectors,
)
from uniface.attribute import AgeGender, FairFace
from uniface.constants import AgeGenderWeights, FairFaceWeights, RetinaFaceWeights, SCRFDWeights
from uniface.constants import RetinaFaceWeights, SCRFDWeights
from uniface.spoofing import MiniFASNet, create_spoofer
@@ -167,7 +165,7 @@ def test_recognizer_inference_from_factory():
embedding = recognizer.get_embedding(mock_image)
assert embedding is not None, 'Recognizer should return embedding'
assert embedding.shape == (1, 512), 'get_embedding should return (1, 512) with batch dimension'
assert embedding.shape[1] == 512, 'Should return 512-dimensional embedding'
def test_landmarker_inference_from_factory():
@@ -238,19 +236,3 @@ def test_create_spoofer_with_providers():
"""Test that create_spoofer forwards providers kwarg without TypeError."""
spoofer = create_spoofer(providers=['CPUExecutionProvider'])
assert isinstance(spoofer, MiniFASNet), 'Should return MiniFASNet instance'
# create_attribute_predictor tests
def test_create_attribute_predictor_age_gender():
predictor = create_attribute_predictor(AgeGenderWeights.DEFAULT)
assert isinstance(predictor, AgeGender), 'Should return AgeGender instance'
def test_create_attribute_predictor_fairface():
predictor = create_attribute_predictor(FairFaceWeights.DEFAULT)
assert isinstance(predictor, FairFace), 'Should return FairFace instance'
def test_create_attribute_predictor_invalid():
with pytest.raises(ValueError, match='Unsupported attribute model'):
create_attribute_predictor('invalid_model')

View File

@@ -1,115 +0,0 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from __future__ import annotations
import numpy as np
import pytest
from uniface import HeadPose, HeadPoseResult, create_head_pose_estimator
from uniface.headpose import BaseHeadPoseEstimator
from uniface.headpose.models import HeadPose as HeadPoseModel
def test_create_head_pose_estimator_default():
"""Test creating a head pose estimator with default parameters."""
estimator = create_head_pose_estimator()
assert isinstance(estimator, HeadPose), 'Should return HeadPose instance'
def test_create_head_pose_estimator_aliases():
"""Test that factory accepts all documented aliases."""
for alias in ('headpose', 'head_pose', '6drepnet'):
estimator = create_head_pose_estimator(alias)
assert isinstance(estimator, HeadPose), f"Alias '{alias}' should return HeadPose"
def test_create_head_pose_estimator_invalid():
"""Test that invalid method raises ValueError."""
with pytest.raises(ValueError, match='Unsupported head pose estimation method'):
create_head_pose_estimator('invalid_method')
def test_head_pose_inference():
"""Test that HeadPose can run inference on a mock image."""
estimator = HeadPose()
mock_image = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
result = estimator.estimate(mock_image)
assert isinstance(result, HeadPoseResult), 'Should return HeadPoseResult'
assert isinstance(result.pitch, float), 'pitch should be float'
assert isinstance(result.yaw, float), 'yaw should be float'
assert isinstance(result.roll, float), 'roll should be float'
def test_head_pose_callable():
"""Test that HeadPose is callable via __call__."""
estimator = HeadPose()
mock_image = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
result = estimator(mock_image)
assert isinstance(result, HeadPoseResult), '__call__ should return HeadPoseResult'
def test_head_pose_result_repr():
"""Test HeadPoseResult repr formatting."""
result = HeadPoseResult(pitch=10.5, yaw=-20.3, roll=5.1)
repr_str = repr(result)
assert 'HeadPoseResult' in repr_str
assert '10.5' in repr_str
assert '-20.3' in repr_str
assert '5.1' in repr_str
def test_head_pose_result_frozen():
"""Test that HeadPoseResult is immutable."""
result = HeadPoseResult(pitch=1.0, yaw=2.0, roll=3.0)
with pytest.raises(AttributeError):
result.pitch = 99.0 # type: ignore[misc]
def test_rotation_matrix_to_euler_identity():
"""Test that identity rotation matrix gives zero angles."""
identity = np.eye(3).reshape(1, 3, 3)
euler = HeadPoseModel.rotation_matrix_to_euler(identity)
assert euler.shape == (1, 3), 'Should return (1, 3) shaped array'
np.testing.assert_allclose(euler[0], [0.0, 0.0, 0.0], atol=1e-5)
def test_rotation_matrix_to_euler_90deg_yaw():
"""Test 90-degree yaw rotation."""
angle = np.radians(90)
R = np.array(
[
[np.cos(angle), 0, np.sin(angle)],
[0, 1, 0],
[-np.sin(angle), 0, np.cos(angle)],
]
).reshape(1, 3, 3)
euler = HeadPoseModel.rotation_matrix_to_euler(R)
np.testing.assert_allclose(euler[0, 1], 90.0, atol=1e-3)
def test_rotation_matrix_to_euler_batch():
"""Test batch processing of rotation matrices."""
batch = np.stack([np.eye(3), np.eye(3), np.eye(3)], axis=0)
euler = HeadPoseModel.rotation_matrix_to_euler(batch)
assert euler.shape == (3, 3), 'Batch of 3 should return (3, 3)'
np.testing.assert_allclose(euler, 0.0, atol=1e-5)
def test_factory_returns_correct_type():
"""Test that factory function returns BaseHeadPoseEstimator subclass."""
estimator = create_head_pose_estimator()
assert isinstance(estimator, BaseHeadPoseEstimator), 'Should be BaseHeadPoseEstimator subclass'
def test_head_pose_with_providers():
"""Test that HeadPose accepts providers kwarg."""
estimator = HeadPose(providers=['CPUExecutionProvider'])
assert isinstance(estimator, HeadPose), 'Should create with explicit providers'

View File

@@ -74,7 +74,7 @@ def test_arcface_embedding_shape(arcface_model, mock_aligned_face):
"""
embedding = arcface_model.get_embedding(mock_aligned_face)
# ArcFace get_embedding returns raw ONNX output with batch dimension
# ArcFace typically produces 512-dimensional embeddings
assert embedding.shape[1] == 512, f'Expected 512-dim embedding, got {embedding.shape[1]}'
assert embedding.shape[0] == 1, 'Embedding should have batch dimension of 1'
@@ -88,8 +88,7 @@ def test_arcface_normalized_embedding(arcface_model, mock_landmarks):
embedding = arcface_model.get_normalized_embedding(mock_image, mock_landmarks)
# Check shape and normalization
assert embedding.shape == (512,), f'Expected shape (512,), got {embedding.shape}'
# Check that embedding is normalized (L2 norm ≈ 1.0)
norm = np.linalg.norm(embedding)
assert np.isclose(norm, 1.0, atol=1e-5), f'Normalized embedding should have norm 1.0, got {norm}'
@@ -126,7 +125,7 @@ def test_mobileface_embedding_shape(mobileface_model, mock_aligned_face):
"""
embedding = mobileface_model.get_embedding(mock_aligned_face)
# MobileFace get_embedding returns raw ONNX output with batch dimension
# MobileFace typically produces 512-dimensional embeddings
assert embedding.shape[1] == 512, f'Expected 512-dim embedding, got {embedding.shape[1]}'
assert embedding.shape[0] == 1, 'Embedding should have batch dimension of 1'
@@ -139,7 +138,6 @@ def test_mobileface_normalized_embedding(mobileface_model, mock_landmarks):
embedding = mobileface_model.get_normalized_embedding(mock_image, mock_landmarks)
assert embedding.shape == (512,), f'Expected shape (512,), got {embedding.shape}'
norm = np.linalg.norm(embedding)
assert np.isclose(norm, 1.0, atol=1e-5), f'Normalized embedding should have norm 1.0, got {norm}'
@@ -158,7 +156,7 @@ def test_sphereface_embedding_shape(sphereface_model, mock_aligned_face):
"""
embedding = sphereface_model.get_embedding(mock_aligned_face)
# SphereFace get_embedding returns raw ONNX output with batch dimension
# SphereFace typically produces 512-dimensional embeddings
assert embedding.shape[1] == 512, f'Expected 512-dim embedding, got {embedding.shape[1]}'
assert embedding.shape[0] == 1, 'Embedding should have batch dimension of 1'
@@ -171,7 +169,6 @@ def test_sphereface_normalized_embedding(sphereface_model, mock_landmarks):
embedding = sphereface_model.get_normalized_embedding(mock_image, mock_landmarks)
assert embedding.shape == (512,), f'Expected shape (512,), got {embedding.shape}'
norm = np.linalg.norm(embedding)
assert np.isclose(norm, 1.0, atol=1e-5), f'Normalized embedding should have norm 1.0, got {norm}'

View File

@@ -12,11 +12,9 @@ CLI utilities for testing and running UniFace features.
| `anonymize.py` | Face anonymization/blurring for privacy |
| `emotion.py` | Emotion detection (7 or 8 emotions) |
| `gaze.py` | Gaze direction estimation |
| `headpose.py` | Head pose estimation (pitch, yaw, roll) |
| `landmarks.py` | 106-point facial landmark detection |
| `recognize.py` | Face embedding extraction and comparison |
| `search.py` | Real-time face matching against reference |
| `faiss_search.py` | FAISS index build and multi-identity face search |
| `fairface.py` | FairFace attribute prediction (race, gender, age) |
| `attribute.py` | Age and gender prediction |
| `spoofing.py` | Face anti-spoofing detection |
@@ -63,11 +61,6 @@ python tools/emotion.py --source 0
python tools/gaze.py --source assets/test.jpg
python tools/gaze.py --source 0
# Head pose estimation
python tools/headpose.py --source assets/test.jpg
python tools/headpose.py --source 0
python tools/headpose.py --source 0 --draw-type axis
# Landmarks
python tools/landmarks.py --source assets/test.jpg
python tools/landmarks.py --source 0
@@ -115,7 +108,7 @@ python tools/download_model.py # downloads all
| Option | Description |
|--------|-------------|
| `--source` | Input source: image/video path or camera ID (0, 1, ...) |
| `--detector` | Choose detector: `retinaface`, `scrfd`, `yolov5face`, `yolov8face` |
| `--detector` | Choose detector: `retinaface`, `scrfd`, `yolov5face` |
| `--threshold` | Visualization confidence threshold (default: varies) |
| `--save-dir` | Output directory (default: `outputs`) |

View File

@@ -167,9 +167,9 @@ def run_camera(analyzer, camera_id: int = 0):
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1)
if not ret:
break
frame = cv2.flip(frame, 1)
faces = analyzer.analyze(frame)
@@ -201,7 +201,7 @@ def main():
detector = RetinaFace()
recognizer = ArcFace()
age_gender = AgeGender()
analyzer = FaceAnalyzer(detector, recognizer=recognizer, attributes=[age_gender])
analyzer = FaceAnalyzer(detector, recognizer, age_gender)
source_type = get_source_type(args.source)

View File

@@ -121,9 +121,9 @@ def run_camera(detector, blurrer: BlurFace, camera_id: int = 0):
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1)
if not ret:
break
frame = cv2.flip(frame, 1)
faces = detector.detect(frame)
if faces:

View File

@@ -60,7 +60,7 @@ def process_image(
)
for i, face in enumerate(faces):
result = age_gender.predict(image, face)
result = age_gender.predict(image, face.bbox)
print(f' Face {i + 1}: {result.sex}, {result.age} years old')
draw_age_gender_label(image, face.bbox, result.sex, result.age)
@@ -112,7 +112,7 @@ def process_video(
)
for face in faces:
result = age_gender.predict(frame, face)
result = age_gender.predict(frame, face.bbox)
draw_age_gender_label(frame, face.bbox, result.sex, result.age)
cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
@@ -137,9 +137,9 @@ def run_camera(detector, age_gender, camera_id: int = 0, threshold: float = 0.6)
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1)
if not ret:
break
frame = cv2.flip(frame, 1)
faces = detector.detect(frame)
@@ -151,7 +151,7 @@ def run_camera(detector, age_gender, camera_id: int = 0, threshold: float = 0.6)
)
for face in faces:
result = age_gender.predict(frame, face)
result = age_gender.predict(frame, face.bbox)
draw_age_gender_label(frame, face.bbox, result.sex, result.age)
cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

View File

@@ -135,9 +135,9 @@ def run_camera(detector, camera_id: int = 0, threshold: float = 0.6):
prev_time = time.perf_counter()
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1)
if not ret:
break
frame = cv2.flip(frame, 1)
faces = detector.detect(frame)

View File

@@ -4,7 +4,6 @@ from uniface.constants import (
AgeGenderWeights,
ArcFaceWeights,
DDAMFNWeights,
HeadPoseWeights,
LandmarkWeights,
MobileFaceWeights,
RetinaFaceWeights,
@@ -22,7 +21,6 @@ MODEL_TYPES = {
'ddamfn': DDAMFNWeights,
'agegender': AgeGenderWeights,
'landmark': LandmarkWeights,
'headpose': HeadPoseWeights,
}

View File

@@ -60,7 +60,7 @@ def process_image(
)
for i, face in enumerate(faces):
result = emotion_predictor.predict(image, face)
result = emotion_predictor.predict(image, face.landmarks)
print(f' Face {i + 1}: {result.emotion} (confidence: {result.confidence:.3f})')
draw_emotion_label(image, face.bbox, result.emotion, result.confidence)
@@ -112,7 +112,7 @@ def process_video(
)
for face in faces:
result = emotion_predictor.predict(frame, face)
result = emotion_predictor.predict(frame, face.landmarks)
draw_emotion_label(frame, face.bbox, result.emotion, result.confidence)
cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
@@ -137,9 +137,9 @@ def run_camera(detector, emotion_predictor, camera_id: int = 0, threshold: float
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1)
if not ret:
break
frame = cv2.flip(frame, 1)
faces = detector.detect(frame)
@@ -151,7 +151,7 @@ def run_camera(detector, emotion_predictor, camera_id: int = 0, threshold: float
)
for face in faces:
result = emotion_predictor.predict(frame, face)
result = emotion_predictor.predict(frame, face.landmarks)
draw_emotion_label(frame, face.bbox, result.emotion, result.confidence)
cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

View File

@@ -60,7 +60,7 @@ def process_image(
)
for i, face in enumerate(faces):
result = fairface.predict(image, face)
result = fairface.predict(image, face.bbox)
print(f' Face {i + 1}: {result.sex}, {result.age_group}, {result.race}')
draw_fairface_label(image, face.bbox, result.sex, result.age_group, result.race)
@@ -112,7 +112,7 @@ def process_video(
)
for face in faces:
result = fairface.predict(frame, face)
result = fairface.predict(frame, face.bbox)
draw_fairface_label(frame, face.bbox, result.sex, result.age_group, result.race)
cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
@@ -137,9 +137,9 @@ def run_camera(detector, fairface, camera_id: int = 0, threshold: float = 0.6):
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1)
if not ret:
break
frame = cv2.flip(frame, 1)
faces = detector.detect(frame)
@@ -151,7 +151,7 @@ def run_camera(detector, fairface, camera_id: int = 0, threshold: float = 0.6):
)
for face in faces:
result = fairface.predict(frame, face)
result = fairface.predict(frame, face.bbox)
draw_fairface_label(frame, face.bbox, result.sex, result.age_group, result.race)
cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

View File

@@ -97,9 +97,9 @@ def run_camera(detector, recognizer, store: FAISS, camera_id: int = 0, threshold
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1)
if not ret:
break
frame = cv2.flip(frame, 1)
frame = process_frame(frame, detector, recognizer, store, threshold)

View File

@@ -1,181 +0,0 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
"""Head pose estimation on detected faces.
Usage:
python tools/headpose.py --source path/to/image.jpg
python tools/headpose.py --source path/to/video.mp4
python tools/headpose.py --source 0 # webcam
python tools/headpose.py --source path/to/image.jpg --draw-type axis
"""
from __future__ import annotations
import argparse
import os
from pathlib import Path
from _common import get_source_type
import cv2
from uniface.detection import RetinaFace
from uniface.draw import draw_head_pose
from uniface.headpose import HeadPose
def process_image(detector, head_pose_estimator, image_path: str, save_dir: str = 'outputs', draw_type: str = 'cube'):
"""Process a single image."""
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
return
faces = detector.detect(image)
print(f'Detected {len(faces)} face(s)')
for i, face in enumerate(faces):
bbox = face.bbox
x1, y1, x2, y2 = map(int, bbox[:4])
face_crop = image[y1:y2, x1:x2]
if face_crop.size == 0:
continue
result = head_pose_estimator.estimate(face_crop)
print(f' Face {i + 1}: pitch={result.pitch:.1f}°, yaw={result.yaw:.1f}°, roll={result.roll:.1f}°')
draw_head_pose(image, bbox, result.pitch, result.yaw, result.roll, draw_type=draw_type)
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, f'{Path(image_path).stem}_headpose.jpg')
cv2.imwrite(output_path, image)
print(f'Output saved: {output_path}')
def process_video(detector, head_pose_estimator, video_path: str, save_dir: str = 'outputs', draw_type: str = 'cube'):
"""Process a video file."""
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
print(f"Error: Cannot open video file '{video_path}'")
return
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, f'{Path(video_path).stem}_headpose.mp4')
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
print(f'Processing video: {video_path} ({total_frames} frames)')
frame_count = 0
while True:
ret, frame = cap.read()
if not ret:
break
frame_count += 1
faces = detector.detect(frame)
for face in faces:
bbox = face.bbox
x1, y1, x2, y2 = map(int, bbox[:4])
face_crop = frame[y1:y2, x1:x2]
if face_crop.size == 0:
continue
result = head_pose_estimator.estimate(face_crop)
draw_head_pose(frame, bbox, result.pitch, result.yaw, result.roll, draw_type=draw_type)
cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
out.write(frame)
if frame_count % 100 == 0:
print(f' Processed {frame_count}/{total_frames} frames...')
cap.release()
out.release()
print(f'Done! Output saved: {output_path}')
def run_camera(detector, head_pose_estimator, camera_id: int = 0, draw_type: str = 'cube'):
"""Run real-time detection on webcam."""
cap = cv2.VideoCapture(camera_id)
if not cap.isOpened():
print(f'Cannot open camera {camera_id}')
return
print("Press 'q' to quit")
while True:
ret, frame = cap.read()
if not ret:
break
frame = cv2.flip(frame, 1)
faces = detector.detect(frame)
for face in faces:
bbox = face.bbox
x1, y1, x2, y2 = map(int, bbox[:4])
face_crop = frame[y1:y2, x1:x2]
if face_crop.size == 0:
continue
result = head_pose_estimator.estimate(face_crop)
draw_head_pose(frame, bbox, result.pitch, result.yaw, result.roll, draw_type=draw_type)
cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
cv2.imshow('Head Pose Estimation', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
def main():
parser = argparse.ArgumentParser(description='Run head pose estimation')
parser.add_argument('--source', type=str, required=True, help='Image/video path or camera ID (0, 1, ...)')
parser.add_argument('--save-dir', type=str, default='outputs', help='Output directory')
parser.add_argument(
'--draw-type',
type=str,
default='cube',
choices=['cube', 'axis'],
help='Visualization type: cube (default) or axis',
)
args = parser.parse_args()
detector = RetinaFace()
head_pose_estimator = HeadPose()
source_type = get_source_type(args.source)
if source_type == 'camera':
run_camera(detector, head_pose_estimator, int(args.source), args.draw_type)
elif source_type == 'image':
if not os.path.exists(args.source):
print(f'Error: Image not found: {args.source}')
return
process_image(detector, head_pose_estimator, args.source, args.save_dir, args.draw_type)
elif source_type == 'video':
if not os.path.exists(args.source):
print(f'Error: Video not found: {args.source}')
return
process_video(detector, head_pose_estimator, args.source, args.save_dir, args.draw_type)
else:
print(f"Error: Unknown source type for '{args.source}'")
print('Supported formats: images (.jpg, .png, ...), videos (.mp4, .avi, ...), or camera ID (0, 1, ...)')
if __name__ == '__main__':
main()

View File

@@ -114,9 +114,9 @@ def run_camera(detector, landmarker, camera_id: int = 0):
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1)
if not ret:
break
frame = cv2.flip(frame, 1)
faces = detector.detect(frame)

View File

@@ -41,13 +41,12 @@ def run_inference(detector, recognizer, image_path: str):
print(f'Detected {len(faces)} face(s). Extracting embedding for the first face...')
landmarks = faces[0].landmarks
landmarks = faces[0].landmarks # 5-point landmarks for alignment (already np.ndarray)
embedding = recognizer.get_embedding(image, landmarks)
raw_norm = np.linalg.norm(embedding)
norm_embedding = embedding.ravel() / raw_norm if raw_norm > 0 else embedding.ravel()
norm_embedding = recognizer.get_normalized_embedding(image, landmarks) # L2 normalized
print(f' Embedding shape: {embedding.shape}')
print(f' L2 norm (raw): {raw_norm:.4f}')
print(f' L2 norm (raw): {np.linalg.norm(embedding):.4f}')
print(f' L2 norm (normalized): {np.linalg.norm(norm_embedding):.4f}')

View File

@@ -109,9 +109,9 @@ def run_camera(detector, recognizer, ref_embedding: np.ndarray, camera_id: int =
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1)
if not ret:
break
frame = cv2.flip(frame, 1)
frame = process_frame(frame, detector, recognizer, ref_embedding, threshold)

View File

@@ -134,9 +134,9 @@ def run_camera(
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1)
if not ret:
break
frame = cv2.flip(frame, 1)
# Detect faces
faces = detector.detect(frame)

uniface-cpp/.clang-format Normal file
View File

@@ -0,0 +1,180 @@
---
# Modern C++ style based on Google with enhancements
Language: Cpp
Standard: c++17
BasedOnStyle: Google
ColumnLimit: 100
IndentWidth: 4
TabWidth: 4
UseTab: Never
# Access modifiers
AccessModifierOffset: -4
IndentAccessModifiers: false
# Alignment
AlignAfterOpenBracket: BlockIndent
AlignArrayOfStructures: Right
AlignConsecutiveAssignments:
Enabled: false
AlignConsecutiveBitFields:
Enabled: true
AlignConsecutiveDeclarations:
Enabled: false
AlignConsecutiveMacros:
Enabled: true
AlignEscapedNewlines: Left
AlignOperands: AlignAfterOperator
AlignTrailingComments:
Kind: Always
OverEmptyLines: 1
# Arguments and parameters
AllowAllArgumentsOnNextLine: true
AllowAllParametersOfDeclarationOnNextLine: true
BinPackArguments: false
BinPackParameters: false
# Short forms
AllowShortBlocksOnASingleLine: Empty
AllowShortCaseLabelsOnASingleLine: false
AllowShortEnumsOnASingleLine: false
AllowShortFunctionsOnASingleLine: Inline
AllowShortIfStatementsOnASingleLine: Never
AllowShortLambdasOnASingleLine: Inline
AllowShortLoopsOnASingleLine: false
# Break behavior
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: true
AlwaysBreakTemplateDeclarations: Yes
BreakAfterAttributes: Leave
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Attach
BreakBeforeConceptDeclarations: Always
BreakBeforeTernaryOperators: true
BreakConstructorInitializers: BeforeComma
BreakInheritanceList: BeforeComma
BreakStringLiterals: true
# Braces
InsertBraces: false
RemoveBracesLLVM: false
# Constructors
PackConstructorInitializers: CurrentLine
ConstructorInitializerIndentWidth: 4
# Empty lines
EmptyLineAfterAccessModifier: Never
EmptyLineBeforeAccessModifier: LogicalBlock
KeepEmptyLinesAtTheStartOfBlocks: false
MaxEmptyLinesToKeep: 1
SeparateDefinitionBlocks: Always
# Includes
IncludeBlocks: Regroup
IncludeCategories:
# Main header (same name as source file)
- Regex: '^"([a-zA-Z0-9_]+)\.(h|hpp)"$'
Priority: 1
SortPriority: 1
CaseSensitive: true
# Project headers
- Regex: '^".*"$'
Priority: 2
SortPriority: 2
# C system headers
- Regex: '^<(assert|complex|ctype|errno|fenv|float|inttypes|iso646|limits|locale|math|setjmp|signal|stdalign|stdarg|stdatomic|stdbool|stddef|stdint|stdio|stdlib|stdnoreturn|string|tgmath|threads|time|uchar|wchar|wctype)\.h>$'
Priority: 3
SortPriority: 3
# C++ standard library
- Regex: '^<[a-z_]+>$'
Priority: 4
SortPriority: 4
# External libraries
- Regex: '^<.*>$'
Priority: 5
SortPriority: 5
SortIncludes: CaseSensitive
# Indentation
IndentCaseBlocks: false
IndentCaseLabels: true
IndentExternBlock: NoIndent
IndentGotoLabels: false
IndentPPDirectives: AfterHash
IndentRequiresClause: true
IndentWrappedFunctionNames: false
# Lambdas
LambdaBodyIndentation: Signature
# Namespaces
CompactNamespaces: false
FixNamespaceComments: true
NamespaceIndentation: None
ShortNamespaceLines: 0
# Penalties (guide formatting decisions)
PenaltyBreakAssignment: 25
PenaltyBreakBeforeFirstCallParameter: 19
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakOpenParenthesis: 0
PenaltyBreakString: 1000
PenaltyBreakTemplateDeclaration: 10
PenaltyExcessCharacter: 1000000
PenaltyIndentedWhitespace: 0
PenaltyReturnTypeOnItsOwnLine: 200
# Pointers and references
DerivePointerAlignment: false
PointerAlignment: Left
ReferenceAlignment: Pointer
QualifierAlignment: Leave
# Requires clause (C++20 concepts)
RequiresClausePosition: OwnLine
RequiresExpressionIndentation: OuterScope
# Spacing
BitFieldColonSpacing: Both
SpaceAfterCStyleCast: false
SpaceAfterLogicalNot: false
SpaceAfterTemplateKeyword: true
SpaceAroundPointerQualifiers: Default
SpaceBeforeAssignmentOperators: true
SpaceBeforeCaseColon: false
SpaceBeforeCpp11BracedList: false
SpaceBeforeCtorInitializerColon: true
SpaceBeforeInheritanceColon: true
SpaceBeforeParens: ControlStatements
SpaceBeforeRangeBasedForLoopColon: true
SpaceBeforeSquareBrackets: false
SpaceInEmptyBlock: false
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 2
SpacesInAngles: Never
SpacesInCStyleCastParentheses: false
SpacesInConditionalStatement: false
SpacesInContainerLiterals: false
SpacesInLineCommentPrefix:
Minimum: 1
Maximum: -1
SpacesInParentheses: false
SpacesInSquareBrackets: false
# Other
Cpp11BracedListStyle: true
InsertNewlineAtEOF: true
InsertTrailingCommas: None
IntegerLiteralSeparator:
Binary: 4
Decimal: 3
Hex: 4
ReflowComments: true
RemoveSemicolon: false
SortUsingDeclarations: LexicographicNumeric
...

View File

@@ -0,0 +1,51 @@
cmake_minimum_required(VERSION 3.14)
project(uniface
VERSION 1.0.0
DESCRIPTION "Uniface C++ face analysis library"
LANGUAGES CXX
)
# Options
option(UNIFACE_BUILD_EXAMPLES "Build example programs" ON)
# C++ standard
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
# Compiler warnings
if(CMAKE_CXX_COMPILER_ID MATCHES "GNU|Clang")
add_compile_options(-Wall -Wextra -Wpedantic)
elseif(MSVC)
add_compile_options(/W4)
endif()
# Find dependencies
find_package(OpenCV REQUIRED COMPONENTS core imgproc dnn calib3d)
# Library
add_library(uniface
src/utils.cpp
src/detector.cpp
src/recognizer.cpp
src/landmarker.cpp
src/analyzer.cpp
)
target_include_directories(uniface
PUBLIC
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
$<INSTALL_INTERFACE:include>
)
target_link_libraries(uniface
PUBLIC
${OpenCV_LIBS}
)
# Examples
if(UNIFACE_BUILD_EXAMPLES)
add_subdirectory(examples)
endif()

uniface-cpp/README.md Normal file
View File

@@ -0,0 +1,69 @@
# Uniface C++
C++ implementation of the Uniface face analysis library.
## Features
- **Face Detection** - RetinaFace detector with 5-point landmarks
- **Face Recognition** - ArcFace embeddings with cosine-similarity comparison
- **Facial Landmarks** - 106-point landmark localization
- **Face Analyzer** - unified pipeline combining detection, recognition, and landmarks
## Requirements
- C++17 compiler
- CMake 3.14+
- OpenCV 4.x
## Build
```bash
mkdir build && cd build
cmake ..
make -j$(nproc)
```
## Usage
### Image Detection
```bash
./examples/detect <model_path> <image_path>
```
### Webcam Demo
```bash
./examples/webcam <model_path> [camera_id]
```
### Code Example
```cpp
#include <uniface/uniface.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
int main() {
uniface::RetinaFace detector("retinaface.onnx");
cv::Mat image = cv::imread("photo.jpg");
auto faces = detector.detect(image);
for (const auto& face : faces) {
cv::rectangle(image, face.bbox, cv::Scalar(0, 255, 0), 2);
}
cv::imwrite("result.jpg", image);
return 0;
}
```
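### Recognition Example

The recognition API follows the same pattern. The sketch below is illustrative only — model and image paths are placeholders — and mirrors the full program in `examples/recognize.cpp`: it extracts L2-normalized ArcFace embeddings aligned with the detector's 5-point landmarks and scores them with `uniface::cosineSimilarity`.
```cpp
#include <uniface/uniface.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>

int main() {
    uniface::RetinaFace detector("retinaface.onnx");  // placeholder model paths
    uniface::ArcFace recognizer("arcface.onnx");

    cv::Mat a = cv::imread("person_a.jpg");           // placeholder image paths
    cv::Mat b = cv::imread("person_b.jpg");

    auto faces_a = detector.detect(a);
    auto faces_b = detector.detect(b);
    if (faces_a.empty() || faces_b.empty()) return 1;

    // Embeddings are aligned via the detector's 5-point landmarks and L2-normalized
    auto emb_a = recognizer.getNormalizedEmbedding(a, faces_a[0].landmarks);
    auto emb_b = recognizer.getNormalizedEmbedding(b, faces_b[0].landmarks);

    std::cout << "Cosine similarity: " << uniface::cosineSimilarity(emb_a, emb_b) << std::endl;
    return 0;
}
```
The bundled `recognize` example treats a similarity above 0.4 as the same person; treat that threshold as a starting point and tune it for your own data.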
## Models
Download models from the main uniface repository or use:
```bash
# RetinaFace MobileNet V2
wget https://github.com/your-repo/uniface/releases/download/v1.0/retinaface_mv2.onnx -P models/
```
## License
Same license as the main uniface project.

View File

@@ -0,0 +1,23 @@
# Examples
find_package(OpenCV REQUIRED COMPONENTS highgui imgcodecs videoio)
# Image detection example
add_executable(detect detect.cpp)
target_link_libraries(detect PRIVATE uniface ${OpenCV_LIBS})
# Face recognition example
add_executable(recognize recognize.cpp)
target_link_libraries(recognize PRIVATE uniface ${OpenCV_LIBS})
# Facial landmarks example
add_executable(landmarks landmarks.cpp)
target_link_libraries(landmarks PRIVATE uniface ${OpenCV_LIBS})
# Face analyzer example
add_executable(analyzer analyzer.cpp)
target_link_libraries(analyzer PRIVATE uniface ${OpenCV_LIBS})
# Webcam example
add_executable(webcam webcam.cpp)
target_link_libraries(webcam PRIVATE uniface ${OpenCV_LIBS})

View File

@@ -0,0 +1,113 @@
#include <iomanip>
#include <iostream>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <uniface/uniface.hpp>
int main(int argc, char** argv) {
if (argc < 3) {
std::cout << "Usage: " << argv[0]
<< " <detector_model> <image_path> [recognizer_model] [landmark_model]"
<< std::endl;
std::cout << "\nAnalyzes faces in an image using available models." << std::endl;
std::cout << " - detector_model: Required. Path to face detector ONNX model." << std::endl;
std::cout << " - recognizer_model: Optional. Path to face recognizer ONNX model."
<< std::endl;
std::cout << " - landmark_model: Optional. Path to 106-point landmark ONNX model."
<< std::endl;
return 1;
}
const std::string detector_path = argv[1];
const std::string image_path = argv[2];
const std::string recognizer_path = (argc > 3) ? argv[3] : "";
const std::string landmark_path = (argc > 4) ? argv[4] : "";
try {
// Create analyzer and load components
uniface::FaceAnalyzer analyzer;
std::cout << "Loading detector: " << detector_path << std::endl;
analyzer.loadDetector(detector_path);
if (!recognizer_path.empty()) {
std::cout << "Loading recognizer: " << recognizer_path << std::endl;
analyzer.loadRecognizer(recognizer_path);
}
if (!landmark_path.empty()) {
std::cout << "Loading landmarker: " << landmark_path << std::endl;
analyzer.loadLandmarker(landmark_path);
}
// Load image
cv::Mat image = cv::imread(image_path);
if (image.empty()) {
std::cerr << "Failed to load image: " << image_path << std::endl;
return 1;
}
std::cout << "\nAnalyzing image..." << std::endl;
// Analyze faces
auto results = analyzer.analyze(image);
std::cout << "Found " << results.size() << " face(s)\n" << std::endl;
// Process each face
for (size_t i = 0; i < results.size(); ++i) {
const auto& result = results[i];
std::cout << "Face " << (i + 1) << ":" << std::endl;
std::cout << " BBox: [" << result.face.bbox.x << ", " << result.face.bbox.y << ", "
<< result.face.bbox.width << ", " << result.face.bbox.height << "]"
<< std::endl;
std::cout << std::fixed << std::setprecision(3);
std::cout << " Confidence: " << result.face.confidence << std::endl;
// Draw bounding box
cv::rectangle(image, result.face.bbox, cv::Scalar(0, 255, 0), 2);
// Draw 5-point landmarks from detector
for (const auto& pt : result.face.landmarks) {
cv::circle(image, pt, 3, cv::Scalar(0, 0, 255), -1);
}
// If 106-point landmarks available
if (result.landmarks) {
std::cout << " Landmarks: 106 points detected" << std::endl;
for (const auto& pt : result.landmarks->points) {
cv::circle(image, pt, 1, cv::Scalar(0, 255, 255), -1);
}
}
// If embedding available
if (result.embedding) {
// Show first few values of embedding
std::cout << " Embedding: [";
for (size_t j = 0; j < 5; ++j) {
std::cout << (*result.embedding)[j];
if (j < 4)
std::cout << ", ";
}
std::cout << ", ... ] (512-dim)" << std::endl;
}
std::cout << std::endl;
}
// Save result
cv::imwrite("analyzer_result.jpg", image);
std::cout << "Saved result to analyzer_result.jpg" << std::endl;
} catch (const cv::Exception& e) {
std::cerr << "OpenCV Error: " << e.what() << std::endl;
return 1;
} catch (const std::exception& e) {
std::cerr << "Error: " << e.what() << std::endl;
return 1;
}
return 0;
}

View File

@@ -0,0 +1,48 @@
#include <iostream>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <uniface/uniface.hpp>
int main(int argc, char** argv) {
if (argc < 3) {
std::cout << "Usage: " << argv[0] << " <model_path> <image_path>" << std::endl;
return 1;
}
const std::string model_path = argv[1];
const std::string image_path = argv[2];
try {
uniface::RetinaFace detector(model_path);
cv::Mat image = cv::imread(image_path);
if (image.empty()) {
std::cerr << "Failed to load image: " << image_path << std::endl;
return 1;
}
const auto faces = detector.detect(image);
std::cout << "Detected " << faces.size() << " faces." << std::endl;
// Draw results
for (const auto& face : faces) {
cv::rectangle(image, face.bbox, cv::Scalar(0, 255, 0), 2);
for (const auto& pt : face.landmarks) {
cv::circle(image, pt, 2, cv::Scalar(0, 0, 255), -1);
}
}
cv::imwrite("result.jpg", image);
std::cout << "Saved result to result.jpg" << std::endl;
} catch (const cv::Exception& e) {
std::cerr << "OpenCV Error: " << e.what() << std::endl;
return 1;
} catch (const std::exception& e) {
std::cerr << "Error: " << e.what() << std::endl;
return 1;
}
return 0;
}

View File

@@ -0,0 +1,66 @@
#include <iostream>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <uniface/uniface.hpp>
int main(int argc, char** argv) {
if (argc < 4) {
std::cout << "Usage: " << argv[0] << " <detector_model> <landmark_model> <image_path>"
<< std::endl;
std::cout << "\nDetects 106-point facial landmarks and saves visualization." << std::endl;
return 1;
}
const std::string detector_path = argv[1];
const std::string landmark_path = argv[2];
const std::string image_path = argv[3];
try {
// Load models
uniface::RetinaFace detector(detector_path);
uniface::Landmark106 landmarker(landmark_path);
// Load image
cv::Mat image = cv::imread(image_path);
if (image.empty()) {
std::cerr << "Failed to load image: " << image_path << std::endl;
return 1;
}
// Detect faces
auto faces = detector.detect(image);
std::cout << "Detected " << faces.size() << " face(s)" << std::endl;
// Process each face
for (size_t i = 0; i < faces.size(); ++i) {
const auto& face = faces[i];
// Draw bounding box
cv::rectangle(image, face.bbox, cv::Scalar(0, 255, 0), 2);
// Get 106-point landmarks
auto landmarks = landmarker.getLandmarks(image, face.bbox);
// Draw all 106 points
for (const auto& pt : landmarks.points) {
cv::circle(image, pt, 1, cv::Scalar(0, 255, 255), -1);
}
std::cout << "Face " << (i + 1) << ": 106 landmarks detected" << std::endl;
}
// Save result
cv::imwrite("landmarks_result.jpg", image);
std::cout << "Saved result to landmarks_result.jpg" << std::endl;
} catch (const cv::Exception& e) {
std::cerr << "OpenCV Error: " << e.what() << std::endl;
return 1;
} catch (const std::exception& e) {
std::cerr << "Error: " << e.what() << std::endl;
return 1;
}
return 0;
}

View File

@@ -0,0 +1,81 @@
#include <iomanip>
#include <iostream>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <uniface/uniface.hpp>
int main(int argc, char** argv) {
if (argc < 5) {
std::cout << "Usage: " << argv[0]
<< " <detector_model> <recognizer_model> <image1> <image2>" << std::endl;
std::cout << "\nCompares faces from two images and outputs similarity score." << std::endl;
return 1;
}
const std::string detector_path = argv[1];
const std::string recognizer_path = argv[2];
const std::string image1_path = argv[3];
const std::string image2_path = argv[4];
try {
// Load models
uniface::RetinaFace detector(detector_path);
uniface::ArcFace recognizer(recognizer_path);
// Load images
cv::Mat image1 = cv::imread(image1_path);
cv::Mat image2 = cv::imread(image2_path);
if (image1.empty()) {
std::cerr << "Failed to load image: " << image1_path << std::endl;
return 1;
}
if (image2.empty()) {
std::cerr << "Failed to load image: " << image2_path << std::endl;
return 1;
}
// Detect faces
auto faces1 = detector.detect(image1);
auto faces2 = detector.detect(image2);
if (faces1.empty()) {
std::cerr << "No face detected in image1" << std::endl;
return 1;
}
if (faces2.empty()) {
std::cerr << "No face detected in image2" << std::endl;
return 1;
}
std::cout << "Detected " << faces1.size() << " face(s) in image1" << std::endl;
std::cout << "Detected " << faces2.size() << " face(s) in image2" << std::endl;
// Get embeddings for first face in each image
auto embedding1 = recognizer.getNormalizedEmbedding(image1, faces1[0].landmarks);
auto embedding2 = recognizer.getNormalizedEmbedding(image2, faces2[0].landmarks);
// Compute similarity
float similarity = uniface::cosineSimilarity(embedding1, embedding2);
std::cout << std::fixed << std::setprecision(4);
std::cout << "\nCosine Similarity: " << similarity << std::endl;
// Interpretation
if (similarity > 0.4f) {
std::cout << "Result: Same person (similarity > 0.4)" << std::endl;
} else {
std::cout << "Result: Different persons (similarity <= 0.4)" << std::endl;
}
} catch (const cv::Exception& e) {
std::cerr << "OpenCV Error: " << e.what() << std::endl;
return 1;
} catch (const std::exception& e) {
std::cerr << "Error: " << e.what() << std::endl;
return 1;
}
return 0;
}

View File

@@ -0,0 +1,209 @@
#include <cctype>
#include <chrono>
#include <cstring>
#include <iostream>
#include <memory>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/videoio.hpp>
#include <uniface/uniface.hpp>
int main(int argc, char** argv) {
if (argc < 2) {
std::cout << "Usage: " << argv[0] << " <detector_model> [landmark_model] [camera_id]"
<< std::endl;
std::cout << "\nArguments:" << std::endl;
std::cout << " detector_model : Path to face detector ONNX model (required)" << std::endl;
std::cout << " landmark_model : Path to 106-point landmark ONNX model (optional)"
<< std::endl;
std::cout << " camera_id : Camera device ID, default 0 (optional)" << std::endl;
std::cout << "\nExamples:" << std::endl;
std::cout << " " << argv[0] << " detector.onnx" << std::endl;
std::cout << " " << argv[0] << " detector.onnx landmark.onnx" << std::endl;
std::cout << " " << argv[0] << " detector.onnx landmark.onnx 1" << std::endl;
return 1;
}
const std::string detector_path = argv[1];
std::string landmark_path;
int camera_id = 0;
// Parse arguments - landmark_model is optional
if (argc >= 3) {
// Check if argv[2] is a number (camera_id) or a path (landmark_model)
if (std::isdigit(static_cast<unsigned char>(argv[2][0])) && std::strlen(argv[2]) <= 2) {
camera_id = std::atoi(argv[2]);
} else {
landmark_path = argv[2];
if (argc >= 4) {
camera_id = std::atoi(argv[3]);
}
}
}
try {
// Load detector
std::cout << "Loading detector: " << detector_path << std::endl;
uniface::RetinaFace detector(detector_path);
std::cout << "Detector loaded!" << std::endl;
// Load landmark model if provided
std::unique_ptr<uniface::Landmark106> landmarker;
if (!landmark_path.empty()) {
std::cout << "Loading landmarker: " << landmark_path << std::endl;
landmarker = std::make_unique<uniface::Landmark106>(landmark_path);
std::cout << "Landmarker loaded!" << std::endl;
}
// Open camera
cv::VideoCapture cap(camera_id);
if (!cap.isOpened()) {
std::cerr << "Error: Cannot open camera " << camera_id << std::endl;
return 1;
}
const int frame_width = static_cast<int>(cap.get(cv::CAP_PROP_FRAME_WIDTH));
const int frame_height = static_cast<int>(cap.get(cv::CAP_PROP_FRAME_HEIGHT));
std::cout << "\nCamera opened: " << frame_width << "x" << frame_height << std::endl;
std::cout << "Press 'q' to quit, 's' to save screenshot, 'l' to toggle landmarks"
<< std::endl;
cv::Mat frame;
int frame_count = 0;
double total_time = 0.0;
bool show_landmarks = true; // Toggle for 106-point landmarks
while (true) {
cap >> frame;
if (frame.empty()) {
std::cerr << "Error: Empty frame captured" << std::endl;
break;
}
const auto start = std::chrono::high_resolution_clock::now();
// Detect faces
const auto faces = detector.detect(frame);
// Get 106-point landmarks if available
std::vector<uniface::Landmarks> all_landmarks;
if (landmarker && show_landmarks) {
all_landmarks.reserve(faces.size());
for (const auto& face : faces) {
all_landmarks.push_back(landmarker->getLandmarks(frame, face.bbox));
}
}
const auto end = std::chrono::high_resolution_clock::now();
const std::chrono::duration<double, std::milli> elapsed = end - start;
const double inference_time = elapsed.count();
++frame_count;
total_time += inference_time;
const double avg_time = total_time / static_cast<double>(frame_count);
const double fps = 1000.0 / avg_time;
// Draw results
for (size_t i = 0; i < faces.size(); ++i) {
const auto& face = faces[i];
// Draw bounding box
cv::rectangle(frame, face.bbox, cv::Scalar(0, 255, 0), 2);
// Draw 5-point landmarks from detector
for (size_t j = 0; j < face.landmarks.size(); ++j) {
cv::Scalar color;
if (j < 2) {
color = cv::Scalar(255, 0, 0); // Eyes - Blue
} else if (j == 2) {
color = cv::Scalar(0, 255, 0); // Nose - Green
} else {
color = cv::Scalar(0, 0, 255); // Mouth - Red
}
cv::circle(frame, face.landmarks[j], 3, color, -1);
}
// Draw 106-point landmarks if available
if (i < all_landmarks.size()) {
const auto& lm = all_landmarks[i];
// Draw all 106 points
for (const auto& pt : lm.points) {
cv::circle(frame, pt, 1, cv::Scalar(0, 255, 255), -1);
}
}
// Draw confidence
const std::string conf_text = cv::format("%.2f", face.confidence);
const cv::Point text_org(
static_cast<int>(face.bbox.x), static_cast<int>(face.bbox.y) - 5
);
cv::putText(
frame,
conf_text,
text_org,
cv::FONT_HERSHEY_SIMPLEX,
0.5,
cv::Scalar(0, 255, 0),
1
);
}
// Draw info overlay
std::string mode = landmarker
? (show_landmarks ? "Detection + 106 Landmarks" : "Detection Only")
: "Detection Only";
const std::string info_text = cv::format(
"FPS: %.1f | Faces: %zu | Time: %.1fms", fps, faces.size(), inference_time
);
cv::putText(
frame,
info_text,
cv::Point(10, 30),
cv::FONT_HERSHEY_SIMPLEX,
0.7,
cv::Scalar(0, 255, 0),
2
);
cv::putText(
frame,
mode,
cv::Point(10, 60),
cv::FONT_HERSHEY_SIMPLEX,
0.6,
cv::Scalar(255, 255, 0),
2
);
cv::imshow("Uniface - Face Detection & Landmarks", frame);
const char key = static_cast<char>(cv::waitKey(1));
if (key == 'q' || key == 27) {
break;
} else if (key == 's') {
const std::string filename = cv::format("screenshot_%d.jpg", frame_count);
cv::imwrite(filename, frame);
std::cout << "Screenshot saved: " << filename << std::endl;
} else if (key == 'l' && landmarker) {
show_landmarks = !show_landmarks;
std::cout << "106-point landmarks: " << (show_landmarks ? "ON" : "OFF")
<< std::endl;
}
}
cap.release();
cv::destroyAllWindows();
std::cout << "\n=== Statistics ===" << std::endl;
std::cout << "Total frames: " << frame_count << std::endl;
std::cout << "Average inference time: " << (total_time / frame_count) << " ms" << std::endl;
} catch (const cv::Exception& e) {
std::cerr << "OpenCV Error: " << e.what() << std::endl;
return 1;
} catch (const std::exception& e) {
std::cerr << "Error: " << e.what() << std::endl;
return 1;
}
return 0;
}

View File

@@ -0,0 +1,63 @@
#ifndef UNIFACE_ANALYZER_HPP_
#define UNIFACE_ANALYZER_HPP_
#include "uniface/detector.hpp"
#include "uniface/landmarker.hpp"
#include "uniface/recognizer.hpp"
#include "uniface/types.hpp"
#include <memory>
#include <optional>
#include <string>
#include <vector>
namespace uniface {
// Result of face analysis
struct AnalyzedFace {
Face face; // detection result (bbox, confidence, 5-point landmarks)
std::optional<Landmarks> landmarks; // 106-point landmarks (if landmarker loaded)
std::optional<Embedding> embedding; // face embedding (if recognizer loaded)
};
// Unified face analysis combining detection, recognition, and landmarks
class FaceAnalyzer {
public:
FaceAnalyzer() = default;
~FaceAnalyzer() = default;
FaceAnalyzer(const FaceAnalyzer&) = delete;
FaceAnalyzer& operator=(const FaceAnalyzer&) = delete;
FaceAnalyzer(FaceAnalyzer&&) = default;
FaceAnalyzer& operator=(FaceAnalyzer&&) = default;
// Load components (returns *this for chaining)
FaceAnalyzer& loadDetector(const std::string& path, const DetectorConfig& config = DetectorConfig{});
FaceAnalyzer& loadRecognizer(const std::string& path, const RecognizerConfig& config = RecognizerConfig{});
FaceAnalyzer& loadLandmarker(const std::string& path, const LandmarkerConfig& config = LandmarkerConfig{});
// Analyze faces in BGR image (throws if detector not loaded)
[[nodiscard]] std::vector<AnalyzedFace> analyze(const cv::Mat& image);
// Component checks
[[nodiscard]] bool hasDetector() const noexcept { return detector_ != nullptr; }
[[nodiscard]] bool hasRecognizer() const noexcept { return recognizer_ != nullptr; }
[[nodiscard]] bool hasLandmarker() const noexcept { return landmarker_ != nullptr; }
// Direct component access
[[nodiscard]] RetinaFace* detector() noexcept { return detector_.get(); }
[[nodiscard]] ArcFace* recognizer() noexcept { return recognizer_.get(); }
[[nodiscard]] Landmark106* landmarker() noexcept { return landmarker_.get(); }
[[nodiscard]] const RetinaFace* detector() const noexcept { return detector_.get(); }
[[nodiscard]] const ArcFace* recognizer() const noexcept { return recognizer_.get(); }
[[nodiscard]] const Landmark106* landmarker() const noexcept { return landmarker_.get(); }
private:
std::unique_ptr<RetinaFace> detector_;
std::unique_ptr<ArcFace> recognizer_;
std::unique_ptr<Landmark106> landmarker_;
};
} // namespace uniface
#endif // UNIFACE_ANALYZER_HPP_
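Because each `load*` method returns `*this`, a full pipeline can be assembled by chaining the loaders declared above. A minimal usage sketch (model paths are placeholders, and all three components are assumed to be available):
```cpp
#include <uniface/uniface.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
    uniface::FaceAnalyzer analyzer;
    analyzer.loadDetector("retinaface.onnx", uniface::DetectorConfig{0.6f, 0.4f, cv::Size(640, 640)})
        .loadRecognizer("arcface.onnx")
        .loadLandmarker("landmark_106.onnx");

    cv::Mat image = cv::imread("photo.jpg");
    auto results = analyzer.analyze(image);  // throws std::runtime_error if no detector is loaded
    for (const auto& r : results) {
        if (r.embedding) { /* 512-dim, L2-normalized embedding */ }
        if (r.landmarks) { /* 106-point landmarks */ }
    }
    return 0;
}
```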

View File

@@ -0,0 +1,47 @@
#ifndef UNIFACE_DETECTOR_HPP_
#define UNIFACE_DETECTOR_HPP_
#include "uniface/types.hpp"
#include <array>
#include <string>
#include <vector>
#include <opencv2/dnn.hpp>
namespace uniface {
// RetinaFace detector using OpenCV DNN backend
class RetinaFace {
public:
explicit RetinaFace(
const std::string& model_path,
float conf_thresh = 0.5f,
float nms_thresh = 0.4f,
cv::Size input_size = cv::Size(640, 640)
);
// Detect faces in BGR image, returns bboxes + 5-point landmarks
[[nodiscard]] std::vector<Face> detect(const cv::Mat& image);
// Accessors
[[nodiscard]] float getConfidenceThreshold() const noexcept { return confidence_threshold_; }
[[nodiscard]] float getNmsThreshold() const noexcept { return nms_threshold_; }
[[nodiscard]] cv::Size getInputSize() const noexcept { return input_size_; }
void setConfidenceThreshold(float threshold) noexcept { confidence_threshold_ = threshold; }
void setNmsThreshold(float threshold) noexcept { nms_threshold_ = threshold; }
private:
cv::dnn::Net net_;
float confidence_threshold_;
float nms_threshold_;
cv::Size input_size_;
std::vector<std::array<float, 4>> anchors_;
void generateAnchors();
};
} // namespace uniface
#endif // UNIFACE_DETECTOR_HPP_

View File

@@ -0,0 +1,32 @@
#ifndef UNIFACE_LANDMARKER_HPP_
#define UNIFACE_LANDMARKER_HPP_
#include "uniface/types.hpp"
#include <string>
#include <opencv2/dnn.hpp>
namespace uniface {
// 106-point facial landmark detector
class Landmark106 {
public:
explicit Landmark106(const std::string& model_path, const LandmarkerConfig& config = LandmarkerConfig{});
// Detect 106 landmarks for a face, returns points in original image coordinates
[[nodiscard]] Landmarks getLandmarks(const cv::Mat& image, const cv::Rect2f& bbox);
[[nodiscard]] cv::Size getInputSize() const noexcept { return config_.input_size; }
private:
cv::dnn::Net net_;
LandmarkerConfig config_;
[[nodiscard]] cv::Mat preprocess(const cv::Mat& image, const cv::Rect2f& bbox, cv::Mat& transform);
[[nodiscard]] Landmarks postprocess(const cv::Mat& predictions, const cv::Mat& transform);
};
} // namespace uniface
#endif // UNIFACE_LANDMARKER_HPP_

View File

@@ -0,0 +1,37 @@
#ifndef UNIFACE_RECOGNIZER_HPP_
#define UNIFACE_RECOGNIZER_HPP_
#include "uniface/types.hpp"
#include <string>
#include <opencv2/dnn.hpp>
namespace uniface {
// ArcFace face recognition (MobileNet/ResNet backbones)
class ArcFace {
public:
explicit ArcFace(const std::string& model_path, const RecognizerConfig& config = RecognizerConfig{});
// Get 512-dim embedding from pre-aligned 112x112 face
[[nodiscard]] Embedding getEmbedding(const cv::Mat& aligned_face);
// Get 512-dim embedding with automatic alignment
[[nodiscard]] Embedding getEmbedding(const cv::Mat& image, const std::array<cv::Point2f, 5>& landmarks);
// Get L2-normalized embedding with automatic alignment
[[nodiscard]] Embedding getNormalizedEmbedding(const cv::Mat& image, const std::array<cv::Point2f, 5>& landmarks);
[[nodiscard]] cv::Size getInputSize() const noexcept { return config_.input_size; }
private:
cv::dnn::Net net_;
RecognizerConfig config_;
[[nodiscard]] cv::Mat preprocess(const cv::Mat& face_image);
};
} // namespace uniface
#endif // UNIFACE_RECOGNIZER_HPP_

View File

@@ -0,0 +1,45 @@
#ifndef UNIFACE_TYPES_HPP_
#define UNIFACE_TYPES_HPP_
#include <array>
#include <vector>
#include <opencv2/core.hpp>
namespace uniface {
// Detected face with bbox, confidence, and 5-point landmarks
struct Face {
cv::Rect2f bbox;
float confidence;
std::array<cv::Point2f, 5> landmarks; // left_eye, right_eye, nose, left_mouth, right_mouth
};
// 512-dimensional face embedding
using Embedding = std::array<float, 512>;
// 106-point facial landmarks
struct Landmarks {
std::array<cv::Point2f, 106> points;
};
// Configuration structs
struct DetectorConfig {
float conf_thresh = 0.5f;
float nms_thresh = 0.4f;
cv::Size input_size = cv::Size(640, 640);
};
struct RecognizerConfig {
float input_mean = 127.5f;
float input_std = 127.5f;
cv::Size input_size = cv::Size(112, 112);
};
struct LandmarkerConfig {
cv::Size input_size = cv::Size(192, 192);
};
} // namespace uniface
#endif // UNIFACE_TYPES_HPP_

View File

@@ -0,0 +1,11 @@
#ifndef UNIFACE_HPP_
#define UNIFACE_HPP_
#include "uniface/analyzer.hpp"
#include "uniface/detector.hpp"
#include "uniface/landmarker.hpp"
#include "uniface/recognizer.hpp"
#include "uniface/types.hpp"
#include "uniface/utils.hpp"
#endif // UNIFACE_HPP_

View File

@@ -0,0 +1,58 @@
#ifndef UNIFACE_UTILS_HPP_
#define UNIFACE_UTILS_HPP_
#include "uniface/types.hpp"
#include <array>
#include <cmath>
#include <opencv2/core.hpp>
namespace uniface {
// Reference 5-point landmarks for ArcFace alignment (112x112)
inline constexpr std::array<float, 10> kReferenceAlignment = {
38.2946f, 51.6963f, // left eye
73.5318f, 51.5014f, // right eye
56.0252f, 71.7366f, // nose
41.5493f, 92.3655f, // left mouth
70.7299f, 92.2041f // right mouth
};
// Align face using 5-point landmarks (default 112x112 for ArcFace)
[[nodiscard]] cv::Mat alignFace(
const cv::Mat& image,
const std::array<cv::Point2f, 5>& landmarks,
cv::Size output_size = cv::Size(112, 112)
);
// Cosine similarity between embeddings, returns [-1, 1]
[[nodiscard]] float cosineSimilarity(const Embedding& a, const Embedding& b) noexcept;
// Apply 2x3 affine transform to points
template <size_t N>
[[nodiscard]] std::array<cv::Point2f, N> transformPoints2D(
const std::array<cv::Point2f, N>& points, const cv::Mat& transform
) {
std::array<cv::Point2f, N> result{};
for (size_t i = 0; i < N; ++i) {
const float x = points[i].x;
const float y = points[i].y;
result[i].x = static_cast<float>(
transform.at<double>(0, 0) * x + transform.at<double>(0, 1) * y +
transform.at<double>(0, 2)
);
result[i].y = static_cast<float>(
transform.at<double>(1, 0) * x + transform.at<double>(1, 1) * y +
transform.at<double>(1, 2)
);
}
return result;
}
// Letterbox resize preserving aspect ratio, returns scale factor
[[nodiscard]] float letterboxResize(const cv::Mat& src, cv::Mat& dst, cv::Size target_size);
} // namespace uniface
#endif // UNIFACE_UTILS_HPP_
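As a usage sketch of the helpers above (the `rotateFace` function and its parameters are hypothetical, not part of the library), `transformPoints2D` pairs naturally with `cv::warpAffine`: whenever an image is warped with a 2x3 matrix, the same matrix keeps landmark coordinates in sync.
```cpp
#include <uniface/uniface.hpp>
#include <opencv2/imgproc.hpp>
#include <utility>

// Hypothetical helper: rotate an image around its center and move the
// detector's 5-point landmarks along with it.
std::pair<cv::Mat, std::array<cv::Point2f, 5>> rotateFace(
    const cv::Mat& image, const uniface::Face& face, double angle_deg
) {
    const cv::Point2f center(image.cols / 2.0f, image.rows / 2.0f);
    const cv::Mat rot = cv::getRotationMatrix2D(center, angle_deg, 1.0);  // 2x3, CV_64F

    cv::Mat rotated;
    cv::warpAffine(image, rotated, rot, image.size());

    // Apply the same 2x3 affine transform to the landmark array
    const auto moved = uniface::transformPoints2D(face.landmarks, rot);
    return {rotated, moved};
}
```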

View File

View File

@@ -0,0 +1,55 @@
#include "uniface/analyzer.hpp"
#include <stdexcept>
namespace uniface {
FaceAnalyzer& FaceAnalyzer::loadDetector(const std::string& path, const DetectorConfig& config) {
detector_ = std::make_unique<RetinaFace>(
path, config.conf_thresh, config.nms_thresh, config.input_size
);
return *this;
}
FaceAnalyzer& FaceAnalyzer::loadRecognizer(
const std::string& path, const RecognizerConfig& config
) {
recognizer_ = std::make_unique<ArcFace>(path, config);
return *this;
}
FaceAnalyzer& FaceAnalyzer::loadLandmarker(
const std::string& path, const LandmarkerConfig& config
) {
landmarker_ = std::make_unique<Landmark106>(path, config);
return *this;
}
std::vector<AnalyzedFace> FaceAnalyzer::analyze(const cv::Mat& image) {
if (!detector_) {
throw std::runtime_error("FaceAnalyzer: detector not loaded. Call loadDetector() first.");
}
auto faces = detector_->detect(image);
std::vector<AnalyzedFace> results;
results.reserve(faces.size());
for (const auto& face : faces) {
AnalyzedFace result;
result.face = face;
if (landmarker_) {
result.landmarks = landmarker_->getLandmarks(image, face.bbox);
}
if (recognizer_) {
result.embedding = recognizer_->getNormalizedEmbedding(image, face.landmarks);
}
results.push_back(std::move(result));
}
return results;
}
} // namespace uniface

View File

@@ -0,0 +1,204 @@
#include "uniface/detector.hpp"
#include "uniface/utils.hpp"
#include <cmath>
#include <iostream>
#include <opencv2/imgproc.hpp>
namespace uniface {
namespace {
// Model configuration constants
constexpr std::array<int, 3> kFeatureStrides = {8, 16, 32};
constexpr std::array<float, 2> kVariance = {0.1f, 0.2f};
constexpr int kNumLandmarks = 5;
// BGR mean values for image normalization
constexpr float kMeanB = 104.0f;
constexpr float kMeanG = 117.0f;
constexpr float kMeanR = 123.0f;
// Anchor min sizes for each feature map level
const std::vector<std::vector<int>> kMinSizes = {
{ 16, 32},
{ 64, 128},
{256, 512}
};
} // namespace
RetinaFace::RetinaFace(
const std::string& model_path, float conf_thresh, float nms_thresh, cv::Size input_size
)
: net_(cv::dnn::readNetFromONNX(model_path))
, confidence_threshold_(conf_thresh)
, nms_threshold_(nms_thresh)
, input_size_(input_size) {
generateAnchors();
}
void RetinaFace::generateAnchors() {
anchors_.clear();
size_t estimated_anchors = 0;
for (size_t k = 0; k < kFeatureStrides.size(); ++k) {
const int step = kFeatureStrides[k];
const auto feature_h = static_cast<size_t>(
std::ceil(static_cast<float>(input_size_.height) / static_cast<float>(step))
);
const auto feature_w = static_cast<size_t>(
std::ceil(static_cast<float>(input_size_.width) / static_cast<float>(step))
);
estimated_anchors += feature_h * feature_w * kMinSizes[k].size();
}
anchors_.reserve(estimated_anchors);
for (size_t k = 0; k < kFeatureStrides.size(); ++k) {
const int step = kFeatureStrides[k];
const int feature_h = static_cast<int>(
std::ceil(static_cast<float>(input_size_.height) / static_cast<float>(step))
);
const int feature_w = static_cast<int>(
std::ceil(static_cast<float>(input_size_.width) / static_cast<float>(step))
);
for (int i = 0; i < feature_h; ++i) {
for (int j = 0; j < feature_w; ++j) {
for (const int min_size : kMinSizes[k]) {
const float s_kx = static_cast<float>(min_size) /
static_cast<float>(input_size_.width);
const float s_ky = static_cast<float>(min_size) /
static_cast<float>(input_size_.height);
const float cx = (static_cast<float>(j) + 0.5f) * static_cast<float>(step) /
static_cast<float>(input_size_.width);
const float cy = (static_cast<float>(i) + 0.5f) * static_cast<float>(step) /
static_cast<float>(input_size_.height);
anchors_.push_back({cx, cy, s_kx, s_ky});
}
}
}
}
}
std::vector<Face> RetinaFace::detect(const cv::Mat& image) {
cv::Mat input_blob;
const float resize_factor = letterboxResize(image, input_blob, input_size_);
const cv::Mat blob = cv::dnn::blobFromImage(
input_blob, 1.0, cv::Size(), cv::Scalar(kMeanB, kMeanG, kMeanR), false, false
);
net_.setInput(blob);
const auto output_names = net_.getUnconnectedOutLayersNames();
std::vector<cv::Mat> outputs;
net_.forward(outputs, output_names);
if (outputs.size() < 3) {
std::cerr << "Error: Model output count mismatch. Expected at least 3, got "
<< outputs.size() << std::endl;
return {};
}
// Identify outputs by shape: loc(N,4), conf(N,2), landmarks(N,10)
cv::Mat loc_output, conf_output, land_output;
for (const auto& output : outputs) {
switch (output.size[2]) {
case 4: loc_output = output; break;
case 2: conf_output = output; break;
case 10: land_output = output; break;
default: break;
}
}
// Fallback to positional outputs
if (loc_output.empty()) loc_output = outputs[0];
if (conf_output.empty()) conf_output = outputs[1];
if (land_output.empty()) land_output = outputs[2];
const auto* loc_data = reinterpret_cast<const float*>(loc_output.data);
const auto* conf_data = reinterpret_cast<const float*>(conf_output.data);
const auto* land_data = reinterpret_cast<const float*>(land_output.data);
const auto num_priors = static_cast<size_t>(loc_output.size[1]);
if (num_priors != anchors_.size()) {
std::cerr << "Error: Anchor count mismatch! Expected " << anchors_.size()
<< " anchors but model output has " << num_priors << " priors.\n"
<< "This usually means the input size doesn't match the model's "
<< "expected size." << std::endl;
return {};
}
std::vector<cv::Rect2f> decoded_boxes;
std::vector<float> scores;
std::vector<std::array<cv::Point2f, 5>> decoded_landmarks;
decoded_boxes.reserve(num_priors);
scores.reserve(num_priors);
decoded_landmarks.reserve(num_priors);
const auto scale_w = static_cast<float>(input_size_.width);
const auto scale_h = static_cast<float>(input_size_.height);
for (size_t i = 0; i < num_priors; ++i) {
const float score = conf_data[i * 2 + 1];
if (score < confidence_threshold_) continue;
const float px = anchors_[i][0];
const float py = anchors_[i][1];
const float pw = anchors_[i][2];
const float ph = anchors_[i][3];
const float dx = loc_data[i * 4 + 0];
const float dy = loc_data[i * 4 + 1];
const float dw = loc_data[i * 4 + 2];
const float dh = loc_data[i * 4 + 3];
const float cx = px + dx * kVariance[0] * pw;
const float cy = py + dy * kVariance[0] * ph;
const float w = pw * std::exp(dw * kVariance[1]);
const float h = ph * std::exp(dh * kVariance[1]);
const float x1 = (cx - w / 2.0f) * scale_w / resize_factor;
const float y1 = (cy - h / 2.0f) * scale_h / resize_factor;
const float x2 = (cx + w / 2.0f) * scale_w / resize_factor;
const float y2 = (cy + h / 2.0f) * scale_h / resize_factor;
decoded_boxes.emplace_back(x1, y1, x2 - x1, y2 - y1);
scores.push_back(score);
std::array<cv::Point2f, 5> landmarks{};
for (int k = 0; k < kNumLandmarks; ++k) {
const float ldx = land_data[i * 10 + static_cast<size_t>(k) * 2 + 0];
const float ldy = land_data[i * 10 + static_cast<size_t>(k) * 2 + 1];
const float lx = (px + ldx * kVariance[0] * pw) * scale_w / resize_factor;
const float ly = (py + ldy * kVariance[0] * ph) * scale_h / resize_factor;
landmarks[static_cast<size_t>(k)] = cv::Point2f(lx, ly);
}
decoded_landmarks.push_back(landmarks);
}
// NMS
std::vector<cv::Rect2d> boxes_for_nms;
boxes_for_nms.reserve(decoded_boxes.size());
for (const auto& box : decoded_boxes) {
boxes_for_nms.emplace_back(box.x, box.y, box.width, box.height);
}
std::vector<int> nms_indices;
cv::dnn::NMSBoxes(boxes_for_nms, scores, confidence_threshold_, nms_threshold_, nms_indices);
std::vector<Face> results;
results.reserve(nms_indices.size());
for (const int idx : nms_indices) {
const auto uidx = static_cast<size_t>(idx);
results.push_back({decoded_boxes[uidx], scores[uidx], decoded_landmarks[uidx]});
}
return results;
}
} // namespace uniface

View File

@@ -0,0 +1,90 @@
#include "uniface/landmarker.hpp"
#include <cmath>
#include <opencv2/imgproc.hpp>
namespace uniface {
namespace {
constexpr int kNumLandmarks = 106;
cv::Mat computeCenterTransform(const cv::Point2f& center, float scale, int output_size) {
cv::Mat transform = cv::Mat::zeros(2, 3, CV_64F);
transform.at<double>(0, 0) = scale;
transform.at<double>(1, 1) = scale;
transform.at<double>(0, 2) = -center.x * scale + output_size / 2.0;
transform.at<double>(1, 2) = -center.y * scale + output_size / 2.0;
return transform;
}
} // namespace
Landmark106::Landmark106(const std::string& model_path, const LandmarkerConfig& config)
: net_(cv::dnn::readNetFromONNX(model_path))
, config_(config) {}
cv::Mat Landmark106::preprocess(const cv::Mat& image, const cv::Rect2f& bbox, cv::Mat& transform) {
const float width = bbox.width;
const float height = bbox.height;
const float center_x = bbox.x + width / 2.0f;
const float center_y = bbox.y + height / 2.0f;
const float max_dim = std::max(width, height);
const float scale = static_cast<float>(config_.input_size.width) / (max_dim * 1.5f);
transform = computeCenterTransform(cv::Point2f(center_x, center_y), scale, config_.input_size.width);
cv::Mat aligned;
cv::warpAffine(image, aligned, transform, config_.input_size, cv::INTER_LINEAR, cv::BORDER_CONSTANT);
cv::Mat blob = cv::dnn::blobFromImage(aligned, 1.0, config_.input_size, cv::Scalar(0, 0, 0), true, false);
return blob;
}
Landmarks Landmark106::postprocess(const cv::Mat& predictions, const cv::Mat& transform) {
Landmarks result{};
const auto* pred_data = reinterpret_cast<const float*>(predictions.data);
cv::Mat inverse_transform;
cv::invertAffineTransform(transform, inverse_transform);
const int input_size = config_.input_size.width;
const float half_size = static_cast<float>(input_size) / 2.0f;
for (int i = 0; i < kNumLandmarks; ++i) {
// Denormalize from [-1, 1] to pixel coordinates
float x = (pred_data[i * 2 + 0] + 1.0f) * half_size;
float y = (pred_data[i * 2 + 1] + 1.0f) * half_size;
// Transform back to original image coordinates
const float orig_x = static_cast<float>(
inverse_transform.at<double>(0, 0) * x + inverse_transform.at<double>(0, 1) * y +
inverse_transform.at<double>(0, 2)
);
const float orig_y = static_cast<float>(
inverse_transform.at<double>(1, 0) * x + inverse_transform.at<double>(1, 1) * y +
inverse_transform.at<double>(1, 2)
);
result.points[static_cast<size_t>(i)] = cv::Point2f(orig_x, orig_y);
}
return result;
}
Landmarks Landmark106::getLandmarks(const cv::Mat& image, const cv::Rect2f& bbox) {
cv::Mat transform;
cv::Mat blob = preprocess(image, bbox, transform);
net_.setInput(blob);
cv::Mat output = net_.forward();
return postprocess(output, transform);
}
} // namespace uniface

View File

@@ -0,0 +1,73 @@
#include "uniface/recognizer.hpp"
#include "uniface/utils.hpp"
#include <cmath>
#include <opencv2/imgproc.hpp>
namespace uniface {
ArcFace::ArcFace(const std::string& model_path, const RecognizerConfig& config)
: net_(cv::dnn::readNetFromONNX(model_path))
, config_(config) {}
cv::Mat ArcFace::preprocess(const cv::Mat& face_image) {
cv::Mat resized;
if (face_image.size() != config_.input_size) {
cv::resize(face_image, resized, config_.input_size);
} else {
resized = face_image;
}
// Normalize: (pixel - mean) / std, BGR -> RGB
cv::Mat blob = cv::dnn::blobFromImage(
resized, 1.0 / config_.input_std, config_.input_size,
cv::Scalar(config_.input_mean, config_.input_mean, config_.input_mean), true, false
);
return blob;
}
Embedding ArcFace::getEmbedding(const cv::Mat& aligned_face) {
cv::Mat blob = preprocess(aligned_face);
net_.setInput(blob);
cv::Mat output = net_.forward();
Embedding embedding{};
const auto* output_data = reinterpret_cast<const float*>(output.data);
const size_t embedding_size = std::min(static_cast<size_t>(output.total()), embedding.size());
for (size_t i = 0; i < embedding_size; ++i) {
embedding[i] = output_data[i];
}
return embedding;
}
Embedding ArcFace::getEmbedding(const cv::Mat& image, const std::array<cv::Point2f, 5>& landmarks) {
cv::Mat aligned = alignFace(image, landmarks, config_.input_size);
return getEmbedding(aligned);
}
Embedding ArcFace::getNormalizedEmbedding(const cv::Mat& image, const std::array<cv::Point2f, 5>& landmarks) {
Embedding embedding = getEmbedding(image, landmarks);
// L2 normalize
float norm = 0.0f;
for (const float val : embedding) {
norm += val * val;
}
norm = std::sqrt(norm);
if (norm > 1e-8f) {
for (float& val : embedding) {
val /= norm;
}
}
return embedding;
}
} // namespace uniface

uniface-cpp/src/utils.cpp Normal file
View File

@@ -0,0 +1,82 @@
#include "uniface/utils.hpp"
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>
namespace uniface {
cv::Mat alignFace(const cv::Mat& image, const std::array<cv::Point2f, 5>& landmarks, cv::Size output_size) {
const float ratio = static_cast<float>(output_size.width) / 112.0f;
std::vector<cv::Point2f> dst_points(5);
for (int i = 0; i < 5; ++i) {
dst_points[i].x = kReferenceAlignment[static_cast<size_t>(i) * 2] * ratio;
dst_points[i].y = kReferenceAlignment[static_cast<size_t>(i) * 2 + 1] * ratio;
}
std::vector<cv::Point2f> src_points(landmarks.begin(), landmarks.end());
cv::Mat transform = cv::estimateAffinePartial2D(src_points, dst_points);
if (transform.empty()) {
cv::Mat resized;
cv::resize(image, resized, output_size);
return resized;
}
cv::Mat aligned;
cv::warpAffine(image, aligned, transform, output_size, cv::INTER_LINEAR, cv::BORDER_CONSTANT);
return aligned;
}
float cosineSimilarity(const Embedding& a, const Embedding& b) noexcept {
float dot = 0.0f;
float norm_a = 0.0f;
float norm_b = 0.0f;
for (size_t i = 0; i < a.size(); ++i) {
dot += a[i] * b[i];
norm_a += a[i] * a[i];
norm_b += b[i] * b[i];
}
const float denom = std::sqrt(norm_a) * std::sqrt(norm_b);
if (denom < 1e-8f) {
return 0.0f;
}
return dot / denom;
}
float letterboxResize(const cv::Mat& src, cv::Mat& dst, cv::Size target_size) {
const auto src_height = static_cast<float>(src.rows);
const auto src_width = static_cast<float>(src.cols);
const auto target_height = static_cast<float>(target_size.height);
const auto target_width = static_cast<float>(target_size.width);
const float im_ratio = src_height / src_width;
const float model_ratio = target_height / target_width;
int new_width = 0;
int new_height = 0;
if (im_ratio > model_ratio) {
new_height = static_cast<int>(target_height);
new_width = static_cast<int>(static_cast<float>(new_height) / im_ratio);
} else {
new_width = static_cast<int>(target_width);
new_height = static_cast<int>(static_cast<float>(new_width) * im_ratio);
}
const float resize_factor = static_cast<float>(new_height) / src_height;
cv::Mat resized;
cv::resize(src, resized, cv::Size(new_width, new_height));
dst = cv::Mat::zeros(target_size, src.type());
resized.copyTo(dst(cv::Rect(0, 0, new_width, new_height)));
return resize_factor;
}
} // namespace uniface

View File

@@ -20,7 +20,6 @@ This library provides unified APIs for:
- Facial landmarks (106-point detection)
- Face parsing (semantic segmentation)
- Gaze estimation
- Head pose estimation
- Age, gender, and emotion prediction
- Face anti-spoofing
- Privacy/anonymization
@@ -30,7 +29,7 @@ from __future__ import annotations
__license__ = 'MIT'
__author__ = 'Yakhyokhuja Valikhujaev'
__version__ = '3.2.0'
__version__ = '3.1.0'
import contextlib
@@ -39,7 +38,7 @@ from uniface.log import Logger, enable_logging
from uniface.model_store import download_models, get_cache_dir, set_cache_dir, verify_model_weights
from .analyzer import FaceAnalyzer
from .attribute import AgeGender, Emotion, FairFace, create_attribute_predictor
from .attribute import AgeGender, Emotion, FairFace
from .detection import (
SCRFD,
RetinaFace,
@@ -49,14 +48,13 @@ from .detection import (
list_available_detectors,
)
from .gaze import MobileGaze, create_gaze_estimator
from .headpose import HeadPose, create_head_pose_estimator
from .landmark import Landmark106, create_landmarker
from .parsing import BiSeNet, XSeg, create_face_parser
from .privacy import BlurFace
from .recognition import AdaFace, ArcFace, MobileFace, SphereFace, create_recognizer
from .spoofing import MiniFASNet, create_spoofer
from .tracking import BYTETracker
from .types import AttributeResult, EmotionResult, Face, GazeResult, HeadPoseResult, SpoofingResult
from .types import AttributeResult, EmotionResult, Face, GazeResult, SpoofingResult
# Optional: FAISS vector store (requires `pip install faiss-cpu`)
with contextlib.suppress(ImportError):
@@ -74,7 +72,6 @@ __all__ = [
'create_detector',
'create_face_parser',
'create_gaze_estimator',
'create_head_pose_estimator',
'create_landmarker',
'create_recognizer',
'create_spoofer',
@@ -94,16 +91,12 @@ __all__ = [
# Gaze models
'GazeResult',
'MobileGaze',
# Head pose models
'HeadPose',
'HeadPoseResult',
# Parsing models
'BiSeNet',
'XSeg',
# Attribute models
'AgeGender',
'AttributeResult',
'create_attribute_predictor',
'Emotion',
'EmotionResult',
'FairFace',

View File

@@ -6,7 +6,8 @@ from __future__ import annotations
import numpy as np
from uniface.attribute.base import Attribute
from uniface.attribute.age_gender import AgeGender
from uniface.attribute.fairface import FairFace
from uniface.detection.base import BaseDetector
from uniface.log import Logger
from uniface.recognition.base import BaseRecognizer
@@ -20,24 +21,19 @@ class FaceAnalyzer:
This class provides a high-level interface for face analysis by combining
multiple components: face detection, recognition (embedding extraction),
and an extensible list of attribute predictors (age, gender, race,
emotion, etc.).
Any :class:`~uniface.attribute.base.Attribute` subclass can be passed
via the ``attributes`` list. Each predictor's ``predict(image, face)``
is called once per detected face, enriching the :class:`Face` in-place.
and attribute prediction (age, gender, race).
Args:
detector: Face detector instance for detecting faces in images.
recognizer: Optional face recognizer for extracting embeddings.
attributes: Optional list of ``Attribute`` predictors to run on
each detected face (e.g. ``[AgeGender(), FairFace(), Emotion()]``).
age_gender: Optional age/gender predictor.
fairface: Optional FairFace predictor for demographics.
Example:
>>> from uniface import RetinaFace, ArcFace, AgeGender, FaceAnalyzer
>>> from uniface import RetinaFace, ArcFace, FaceAnalyzer
>>> detector = RetinaFace()
>>> recognizer = ArcFace()
>>> analyzer = FaceAnalyzer(detector, recognizer=recognizer, attributes=[AgeGender()])
>>> analyzer = FaceAnalyzer(detector, recognizer=recognizer)
>>> faces = analyzer.analyze(image)
"""
@@ -45,23 +41,27 @@ class FaceAnalyzer:
self,
detector: BaseDetector,
recognizer: BaseRecognizer | None = None,
attributes: list[Attribute] | None = None,
age_gender: AgeGender | None = None,
fairface: FairFace | None = None,
) -> None:
self.detector = detector
self.recognizer = recognizer
self.attributes: list[Attribute] = attributes or []
self.age_gender = age_gender
self.fairface = fairface
Logger.info(f'Initialized FaceAnalyzer with detector={detector.__class__.__name__}')
if recognizer:
Logger.info(f' - Recognition enabled: {recognizer.__class__.__name__}')
for attr in self.attributes:
Logger.info(f' - Attribute enabled: {attr.__class__.__name__}')
if age_gender:
Logger.info(f' - Age/Gender enabled: {age_gender.__class__.__name__}')
if fairface:
Logger.info(f' - FairFace enabled: {fairface.__class__.__name__}')
def analyze(self, image: np.ndarray) -> list[Face]:
"""Analyze faces in an image.
Performs face detection, optionally extracts embeddings, and runs
every registered attribute predictor on each detected face.
Performs face detection and optionally extracts embeddings and
predicts attributes for each detected face.
Args:
image: Input image as numpy array with shape (H, W, C) in BGR format.
@@ -80,13 +80,24 @@ class FaceAnalyzer:
except Exception as e:
Logger.warning(f' Face {idx + 1}: Failed to extract embedding: {e}')
for attr in self.attributes:
attr_name = attr.__class__.__name__
if self.age_gender is not None:
try:
attr.predict(image, face)
Logger.debug(f' Face {idx + 1}: {attr_name} prediction succeeded')
result = self.age_gender.predict(image, face.bbox)
face.gender = result.gender
face.age = result.age
Logger.debug(f' Face {idx + 1}: Age={face.age}, Gender={face.sex}')
except Exception as e:
Logger.warning(f' Face {idx + 1}: {attr_name} prediction failed: {e}')
Logger.warning(f' Face {idx + 1}: Failed to predict age/gender: {e}')
if self.fairface is not None:
try:
result = self.fairface.predict(image, face.bbox)
face.gender = result.gender
face.age_group = result.age_group
face.race = result.race
Logger.debug(f' Face {idx + 1}: AgeGroup={face.age_group}, Gender={face.sex}, Race={face.race}')
except Exception as e:
Logger.warning(f' Face {idx + 1}: Failed to predict FairFace attributes: {e}')
Logger.info(f'Analysis complete: {len(faces)} face(s) processed')
return faces
@@ -95,6 +106,8 @@ class FaceAnalyzer:
parts = [f'FaceAnalyzer(detector={self.detector.__class__.__name__}']
if self.recognizer:
parts.append(f'recognizer={self.recognizer.__class__.__name__}')
for attr in self.attributes:
parts.append(f'{attr.__class__.__name__}')
if self.age_gender:
parts.append(f'age_gender={self.age_gender.__class__.__name__}')
if self.fairface:
parts.append(f'fairface={self.fairface.__class__.__name__}')
return ', '.join(parts) + ')'

View File

@@ -12,7 +12,7 @@ from uniface.attribute.age_gender import AgeGender
from uniface.attribute.base import Attribute
from uniface.attribute.fairface import FairFace
from uniface.constants import AgeGenderWeights, DDAMFNWeights, FairFaceWeights
from uniface.types import AttributeResult, EmotionResult, Face
from uniface.types import AttributeResult, EmotionResult
try:
from uniface.attribute.emotion import Emotion
@@ -30,7 +30,7 @@ except ImportError:
def _initialize_model(self) -> None: ...
def preprocess(self, image: np.ndarray, *args: Any) -> Any: ...
def postprocess(self, prediction: Any) -> Any: ...
def predict(self, image: np.ndarray, face: Face) -> Any: ...
def predict(self, image: np.ndarray, *args: Any) -> Any: ...
__all__ = [

View File

@@ -12,7 +12,7 @@ from uniface.face_utils import bbox_center_alignment
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from uniface.types import AttributeResult, Face
from uniface.types import AttributeResult
__all__ = ['AgeGender']
@@ -133,20 +133,17 @@ class AgeGender(Attribute):
age = int(np.round(prediction[2] * 100))
return AttributeResult(gender=gender, age=age)
def predict(self, image: np.ndarray, face: Face) -> AttributeResult:
"""Predict age and gender and enrich the Face in-place.
def predict(self, image: np.ndarray, bbox: list | np.ndarray) -> AttributeResult:
"""
Predicts age and gender for a single face specified by a bounding box.
Args:
image: The full input image in BGR format.
face: Detected face; ``face.bbox`` is used for alignment.
image (np.ndarray): The full input image in BGR format.
bbox (Union[List, np.ndarray]): The face bounding box coordinates [x1, y1, x2, y2].
Returns:
``AttributeResult`` with gender (0=Female, 1=Male) and age (years).
AttributeResult: Result containing gender (0=Female, 1=Male) and age (in years).
"""
face_blob = self.preprocess(image, face.bbox)
face_blob = self.preprocess(image, bbox)
prediction = self.session.run(self.output_names, {self.input_name: face_blob})[0][0]
result = self.postprocess(prediction)
face.gender = result.gender
face.age = result.age
return result
return self.postprocess(prediction)
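A short usage sketch for the bbox-based signature above, pairing it with a detector as the rest of the library does; the image path is a placeholder.

import cv2

from uniface import RetinaFace
from uniface.attribute.age_gender import AgeGender

detector = RetinaFace()
age_gender = AgeGender()

image = cv2.imread('face.jpg')
for face in detector.detect(image):
    result = age_gender.predict(image, face.bbox)
    print(f'age={result.age}, gender={"Male" if result.gender == 1 else "Female"}')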

View File

@@ -2,78 +2,95 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from __future__ import annotations
from abc import ABC, abstractmethod
from typing import Any
import numpy as np
from uniface.types import AttributeResult, EmotionResult, Face
from uniface.types import AttributeResult, EmotionResult
__all__ = ['Attribute', 'AttributeResult', 'EmotionResult']
class Attribute(ABC):
"""Abstract base class for face attribute models.
"""
Abstract base class for face attribute models.
All attribute models (age-gender, emotion, FairFace, etc.) implement this
interface so they can be used interchangeably inside ``FaceAnalyzer``.
The ``predict`` method accepts an image and a :class:`Face` object. Each
subclass extracts what it needs (bbox, landmarks) from the Face, runs
inference, writes the results back to the Face **and** returns a typed
result dataclass.
This class defines the common interface that all attribute models
(e.g., age-gender, emotion) must implement. It ensures a consistent API
across different attribute prediction modules in the library, making them
interchangeable and easy to use.
"""
@abstractmethod
def _initialize_model(self) -> None:
"""Load model weights and create the inference session."""
"""
Initializes the underlying model for inference.
This method should handle loading model weights, creating the
inference session (e.g., ONNX Runtime, PyTorch), and any necessary
warm-up procedures to prepare the model for prediction.
"""
raise NotImplementedError('Subclasses must implement the _initialize_model method.')
@abstractmethod
def preprocess(self, image: np.ndarray, *args: Any) -> Any:
"""Preprocess the input data for the model.
"""
Preprocesses the input data for the model.
This method should take a raw image and any other necessary data
(like bounding boxes or landmarks) and convert it into the format
expected by the model's inference engine (e.g., a blob or tensor).
Args:
image: The input image in BGR format.
*args: Subclass-specific data (bbox, landmarks, etc.).
image (np.ndarray): The input image containing the face, typically
in BGR format.
*args: Additional arguments required for preprocessing, such as
bounding boxes or facial landmarks.
Returns:
Preprocessed data ready for model inference.
The preprocessed data ready for model inference.
"""
raise NotImplementedError('Subclasses must implement the preprocess method.')
@abstractmethod
def postprocess(self, prediction: Any) -> Any:
"""Convert raw model output into a typed result dataclass.
"""
Postprocesses the raw model output into a human-readable format.
This method takes the raw output from the model's inference and
converts it into a meaningful result, such as an age value, a gender
label, or an emotion category.
Args:
prediction: Raw output from the model.
prediction (Any): The raw output from the model's inference.
Returns:
An ``AttributeResult`` or ``EmotionResult``.
The final, processed attributes.
"""
raise NotImplementedError('Subclasses must implement the postprocess method.')
@abstractmethod
def predict(self, image: np.ndarray, face: Face) -> AttributeResult | EmotionResult:
"""Run end-to-end prediction and enrich the Face in-place.
def predict(self, image: np.ndarray, *args: Any) -> Any:
"""
Performs end-to-end attribute prediction on a given image.
Each subclass extracts what it needs from *face* (e.g. ``face.bbox``
or ``face.landmarks``), runs the full preprocess-infer-postprocess
pipeline, writes relevant fields back to *face*, and returns the
result dataclass.
This method orchestrates the full pipeline: it calls the preprocess,
inference, and postprocess steps to return the final, user-friendly
attribute prediction.
Args:
image: The full input image in BGR format.
face: Detected face whose attribute fields will be populated.
image (np.ndarray): The input image containing the face.
*args: Additional data required for prediction, such as a bounding
box or landmarks.
Returns:
The prediction result (``AttributeResult`` or ``EmotionResult``).
The final predicted attributes.
"""
raise NotImplementedError('Subclasses must implement the predict method.')
def __call__(self, image: np.ndarray, face: Face) -> AttributeResult | EmotionResult:
"""Callable shortcut for :meth:`predict`."""
return self.predict(image, face)
def __call__(self, *args, **kwargs) -> Any:
"""
Provides a convenient, callable shortcut for the `predict` method.
"""
return self.predict(*args, **kwargs)
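To make the contract concrete, a toy subclass that satisfies the abstract interface above; the constant output is obviously not a real model and exists only to show which methods must be implemented.

from typing import Any

import numpy as np

from uniface.attribute.base import Attribute
from uniface.types import AttributeResult


class ConstantAttribute(Attribute):
    """Toy attribute predictor that always reports the same result."""

    def __init__(self) -> None:
        self._initialize_model()

    def _initialize_model(self) -> None:
        # A real implementation would load weights and create an inference session here.
        self.ready = True

    def preprocess(self, image: np.ndarray, *args: Any) -> np.ndarray:
        # A real implementation would crop/align using the bbox or landmarks passed in *args.
        return image.astype(np.float32)

    def postprocess(self, prediction: Any) -> AttributeResult:
        return AttributeResult(gender=1, age=int(prediction))

    def predict(self, image: np.ndarray, *args: Any) -> AttributeResult:
        self.preprocess(image, *args)  # a real model would run inference on this blob
        return self.postprocess(30)


print(ConstantAttribute()(np.zeros((112, 112, 3), dtype=np.uint8)))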

View File

@@ -12,7 +12,7 @@ from uniface.constants import DDAMFNWeights
from uniface.face_utils import face_alignment
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.types import EmotionResult, Face
from uniface.types import EmotionResult
__all__ = ['Emotion']
@@ -116,23 +116,14 @@ class Emotion(Attribute):
confidence = float(probabilities[pred_index])
return EmotionResult(emotion=emotion_label, confidence=confidence)
def predict(self, image: np.ndarray, face: Face) -> EmotionResult:
"""Predict emotion and enrich the Face in-place.
Args:
image: The full input image in BGR format.
face: Detected face; ``face.landmarks`` is used for alignment.
Returns:
``EmotionResult`` with emotion label and confidence score.
def predict(self, image: np.ndarray, landmark: list | np.ndarray) -> EmotionResult:
"""
input_tensor = self.preprocess(image, face.landmarks)
Predicts the emotion from a single face specified by its landmarks.
"""
input_tensor = self.preprocess(image, landmark)
with torch.no_grad():
output = self.model(input_tensor)
if isinstance(output, tuple):
output = output[0]
result = self.postprocess(output)
face.emotion = result.emotion
face.emotion_confidence = result.confidence
return result
return self.postprocess(output)
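Usage sketch for the landmark-based signature above; note the Emotion module is optional and needs torch, per the guarded import earlier in this changeset.

import cv2

from uniface import RetinaFace
from uniface.attribute.emotion import Emotion  # optional extra, requires torch

detector = RetinaFace()
emotion = Emotion()

image = cv2.imread('face.jpg')
for face in detector.detect(image):
    if face.landmarks is None:
        continue
    result = emotion.predict(image, face.landmarks)
    print(f'{result.emotion} ({result.confidence:.2f})')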

View File

@@ -11,7 +11,7 @@ from uniface.constants import FairFaceWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from uniface.types import AttributeResult, Face
from uniface.types import AttributeResult
__all__ = ['AGE_LABELS', 'RACE_LABELS', 'FairFace']
@@ -168,24 +168,29 @@ class FairFace(Attribute):
race=RACE_LABELS[race_idx],
)
def predict(self, image: np.ndarray, face: Face) -> AttributeResult:
"""Predict race, gender, and age and enrich the Face in-place.
def predict(self, image: np.ndarray, bbox: list | np.ndarray | None = None) -> AttributeResult:
"""
Predicts race, gender, and age for a face.
Args:
image: The full input image in BGR format.
face: Detected face; ``face.bbox`` is used for cropping.
image (np.ndarray): The input image in BGR format.
bbox (Optional[Union[List, np.ndarray]]): Face bounding box [x1, y1, x2, y2].
If None, uses the entire image.
Returns:
``AttributeResult`` with gender, age_group, and race.
AttributeResult: Result containing:
- gender: 0=Female, 1=Male
- age_group: Age range string like "20-29"
- race: Race/ethnicity label
"""
input_blob = self.preprocess(image, face.bbox)
outputs = self.session.run(self.output_names, {self.input_name: input_blob})
result = self.postprocess(outputs)
# Preprocess
input_blob = self.preprocess(image, bbox)
face.gender = result.gender
face.age_group = result.age_group
face.race = result.race
return result
# Inference
outputs = self.session.run(self.output_names, {self.input_name: input_blob})
# Postprocess
return self.postprocess(outputs)
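A companion sketch for the FairFace signature above; per the docstring, omitting the bbox runs the model on the whole image, which suits inputs that are already tight face crops.

import cv2

from uniface.attribute.fairface import FairFace

fairface = FairFace()

face_crop = cv2.imread('face_crop.jpg')  # placeholder path to an already-cropped face
result = fairface.predict(face_crop)     # bbox omitted: whole image is used
print(f'gender={result.gender}, age_group={result.age_group}, race={result.race}')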
@staticmethod
def _softmax(x: np.ndarray) -> np.ndarray:

View File

@@ -16,7 +16,6 @@ __all__ = [
'distance2bbox',
'distance2kps',
'generate_anchors',
'letterbox_resize',
'non_max_suppression',
'resize_image',
'xyxy_to_cxcywh',
@@ -278,70 +277,3 @@ def distance2kps(
preds.append(px)
preds.append(py)
return np.stack(preds, axis=-1)
def letterbox_resize(
image: np.ndarray,
target_size: int,
fill_value: int = 114,
) -> tuple[np.ndarray, float, tuple[int, int]]:
"""Letterbox resize with center padding for YOLO-style detectors.
Maintains aspect ratio by scaling the image to fit within target_size,
then center-pads with a constant fill value. Converts BGR to RGB,
normalizes to [0, 1], and transposes to NCHW format.
This preprocessing strategy is standard for YOLO models and ensures
no distortion while maintaining a square input size.
Args:
image: Input image in BGR format with shape (H, W, C).
target_size: Target square size (e.g., 640 for 640x640 input).
fill_value: Padding fill value (default: 114 for gray background).
Returns:
Tuple of (preprocessed_tensor, scale_ratio, padding):
- preprocessed_tensor: Shape (1, 3, target_size, target_size),
RGB, normalized [0, 1], NCHW format, float32, contiguous.
- scale_ratio: Resize scale factor for coordinate transformation.
- padding: Padding offsets as (pad_w, pad_h) for coordinate transformation.
Example:
>>> image = cv2.imread('face.jpg') # (480, 640, 3)
>>> tensor, scale, (pad_w, pad_h) = letterbox_resize(image, 640)
>>> tensor.shape
(1, 3, 640, 640)
>>> # To transform coordinates back to original:
>>> x_orig = (x_detected - pad_w) / scale
>>> y_orig = (y_detected - pad_h) / scale
"""
# Get original image shape
img_h, img_w = image.shape[:2]
# Calculate scale ratio to fit within target_size
scale = min(target_size / img_h, target_size / img_w)
new_h, new_w = int(img_h * scale), int(img_w * scale)
# Resize image maintaining aspect ratio
img_resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
# Create padded canvas with fill_value
img_padded = np.full((target_size, target_size, 3), fill_value, dtype=np.uint8)
# Calculate padding to center the image
pad_h = (target_size - new_h) // 2
pad_w = (target_size - new_w) // 2
# Place resized image in center of canvas
img_padded[pad_h : pad_h + new_h, pad_w : pad_w + new_w] = img_resized
# Convert BGR to RGB and normalize to [0, 1]
img_rgb = cv2.cvtColor(img_padded, cv2.COLOR_BGR2RGB)
img_normalized = img_rgb.astype(np.float32) / 255.0
# Transpose to CHW format and add batch dimension (NCHW)
img_transposed = np.transpose(img_normalized, (2, 0, 1))
img_batch = np.expand_dims(img_transposed, axis=0)
img_batch = np.ascontiguousarray(img_batch)
return img_batch, scale, (pad_w, pad_h)

View File

@@ -156,20 +156,6 @@ class GazeWeights(str, Enum):
MOBILEONE_S0 = "gaze_mobileone_s0"
class HeadPoseWeights(str, Enum):
"""
Head pose estimation models using 6D rotation representation.
Trained on 300W-LP dataset, evaluated on AFLW2000.
https://github.com/yakhyo/head-pose-estimation
"""
RESNET18 = "headpose_resnet18"
RESNET34 = "headpose_resnet34"
RESNET50 = "headpose_resnet50"
MOBILENET_V2 = "headpose_mobilenetv2"
MOBILENET_V3_SMALL = "headpose_mobilenetv3_small"
MOBILENET_V3_LARGE = "headpose_mobilenetv3_large"
class ParsingWeights(str, Enum):
"""
Face Parsing: Semantic Segmentation of Facial Components.
@@ -362,32 +348,6 @@ MODEL_REGISTRY: dict[Enum, ModelInfo] = {
sha256='8b4fdc4e3da44733c9a82e7776b411e4a39f94e8e285aee0fc85a548a55f7d9f'
),
# Head Pose
HeadPoseWeights.RESNET18: ModelInfo(
url='https://github.com/yakhyo/head-pose-estimation/releases/download/weights/resnet18.onnx',
sha256='61c34e877989412980d1ea80c52391250b074abc00d19a6100de5c8e999212ee'
),
HeadPoseWeights.RESNET34: ModelInfo(
url='https://github.com/yakhyo/head-pose-estimation/releases/download/weights/resnet34.onnx',
sha256='8da9f2ce4810298ebea68bd85fba1b6bd11716060c10534596f46be52cc908c9'
),
HeadPoseWeights.RESNET50: ModelInfo(
url='https://github.com/yakhyo/head-pose-estimation/releases/download/weights/resnet50.onnx',
sha256='50c74d57b7663361b8ede83b0e4122546171119ef502ec55b790dbd7fc360260'
),
HeadPoseWeights.MOBILENET_V2: ModelInfo(
url='https://github.com/yakhyo/head-pose-estimation/releases/download/weights/mobilenetv2.onnx',
sha256='1e902872868e483bd0e4f8f4a8ff2a4d61c2ccbca9dadf748e5479b5cc86a9e9'
),
HeadPoseWeights.MOBILENET_V3_SMALL: ModelInfo(
url='https://github.com/yakhyo/head-pose-estimation/releases/download/weights/mobilenetv3_small.onnx',
sha256='e8ae4d932b3d13221638fc72e171603e020c6da28b770753f76146867f40e190'
),
HeadPoseWeights.MOBILENET_V3_LARGE: ModelInfo(
url='https://github.com/yakhyo/head-pose-estimation/releases/download/weights/mobilenetv3_large.onnx',
sha256='3a68815fa00aba41ddc4e014bf631b637caba8619df71160383f1fee8c15a3c9'
),
# Parsing
ParsingWeights.RESNET18: ModelInfo(
url='https://github.com/yakhyo/face-parsing/releases/download/weights/resnet18.onnx',

View File

@@ -5,7 +5,7 @@
from __future__ import annotations
from abc import ABC, abstractmethod
from typing import Any, Literal
from typing import Any
import numpy as np
@@ -119,77 +119,3 @@ class BaseDetector(ABC):
List of detected Face objects.
"""
return self.detect(image, **kwargs)
def _select_top_detections(
self,
detections: np.ndarray,
landmarks: np.ndarray,
max_num: int,
original_shape: tuple[int, int],
metric: Literal['default', 'max'] = 'max',
center_weight: float = 2.0,
) -> tuple[np.ndarray, np.ndarray]:
"""Filter detections to keep only top max_num faces.
Ranks faces by area and/or distance from image center, then selects
the top max_num detections.
Args:
detections: Array of shape (N, 5) as [x1, y1, x2, y2, confidence].
landmarks: Array of shape (N, 5, 2) for 5-point landmarks.
max_num: Maximum number of faces to keep. If 0 or >= N, returns all.
original_shape: Original image shape as (height, width).
metric: Ranking metric:
- 'max': Rank by bounding box area only.
- 'default': Rank by area minus center distance penalty.
center_weight: Weight for center distance penalty (only used with 'default' metric).
Returns:
Filtered (detections, landmarks) tuple with at most max_num faces.
"""
if max_num <= 0 or detections.shape[0] <= max_num:
return detections, landmarks
# Calculate bounding box areas
area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])
# Calculate offsets from image center
center_y, center_x = original_shape[0] // 2, original_shape[1] // 2
offsets = np.vstack(
[
(detections[:, 0] + detections[:, 2]) / 2 - center_x,
(detections[:, 1] + detections[:, 3]) / 2 - center_y,
]
)
offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
# Calculate ranking scores based on metric
if metric == 'max':
scores = area
else:
scores = area - offset_dist_squared * center_weight
# Select top max_num by score
top_indices = np.argsort(scores)[::-1][:max_num]
return detections[top_indices], landmarks[top_indices]
@staticmethod
def _detections_to_faces(detections: np.ndarray, landmarks: np.ndarray) -> list[Face]:
"""Convert detection arrays to Face objects.
Args:
detections: Array of shape (N, 5) as [x1, y1, x2, y2, confidence].
landmarks: Array of shape (N, 5, 2) for 5-point landmarks.
Returns:
List of Face objects.
"""
faces = []
for i in range(detections.shape[0]):
face = Face(
bbox=detections[i, :4],
confidence=float(detections[i, 4]),
landmarks=landmarks[i],
)
faces.append(face)
return faces
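A self-contained NumPy sketch of the ranking described above, on two synthetic boxes in a 640x480 frame: 'max' keeps the largest faces, while 'default' trades area against squared distance from the image center.

import numpy as np

detections = np.array([
    [10, 10, 60, 60, 0.9],      # small box near the top-left corner
    [200, 140, 440, 420, 0.8],  # large box near the frame center
], dtype=np.float32)

area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])
center_y, center_x = 480 // 2, 640 // 2
offsets = np.vstack([
    (detections[:, 0] + detections[:, 2]) / 2 - center_x,
    (detections[:, 1] + detections[:, 3]) / 2 - center_y,
])
offset_dist_squared = np.sum(offsets**2, axis=0)

scores_max = area                                  # metric='max'
scores_default = area - offset_dist_squared * 2.0  # metric='default', center_weight=2.0
print(np.argsort(scores_max)[::-1], np.argsort(scores_default)[::-1])  # both rank box 1 first here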

View File

@@ -208,12 +208,42 @@ class RetinaFace(BaseDetector):
# Postprocessing
detections, landmarks = self.postprocess(outputs, resize_factor, shape=(width, height))
# Filter to top max_num faces if requested
detections, landmarks = self._select_top_detections(
detections, landmarks, max_num, (original_height, original_width), metric, center_weight
)
if max_num > 0 and detections.shape[0] > max_num:
# Calculate area of detections
areas = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])
return self._detections_to_faces(detections, landmarks)
# Calculate offsets from image center
center = (original_height // 2, original_width // 2)
offsets = np.vstack(
[
(detections[:, 0] + detections[:, 2]) / 2 - center[1],
(detections[:, 1] + detections[:, 3]) / 2 - center[0],
]
)
offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
# Calculate scores based on the chosen metric
if metric == 'max':
scores = areas
else:
scores = areas - offset_dist_squared * center_weight
# Sort by scores and select top `max_num`
sorted_indices = np.argsort(scores)[::-1][:max_num]
detections = detections[sorted_indices]
landmarks = landmarks[sorted_indices]
faces = []
for i in range(detections.shape[0]):
face = Face(
bbox=detections[i, :4],
confidence=float(detections[i, 4]),
landmarks=landmarks[i],
)
faces.append(face)
return faces
def postprocess(
self,

View File

@@ -272,9 +272,38 @@ class SCRFD(BaseDetector):
landmarks = landmarks[order, :, :]
landmarks = landmarks[keep, :, :].astype(np.float32)
# Filter to top max_num faces if requested
detections, landmarks = self._select_top_detections(
detections, landmarks, max_num, (original_height, original_width), metric, center_weight
)
if 0 < max_num < detections.shape[0]:
# Calculate area of detections
area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])
return self._detections_to_faces(detections, landmarks)
# Calculate offsets from image center
center = (original_height // 2, original_width // 2)
offsets = np.vstack(
[
(detections[:, 0] + detections[:, 2]) / 2 - center[1],
(detections[:, 1] + detections[:, 3]) / 2 - center[0],
]
)
# Calculate scores based on the chosen metric
offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
if metric == 'max':
values = area
else:
values = area - offset_dist_squared * center_weight
# Sort by scores and select top `max_num`
sorted_indices = np.argsort(values)[::-1][:max_num]
detections = detections[sorted_indices]
landmarks = landmarks[sorted_indices]
faces = []
for i in range(detections.shape[0]):
face = Face(
bbox=detections[i, :4],
confidence=float(detections[i, 4]),
landmarks=landmarks[i],
)
faces.append(face)
return faces

View File

@@ -4,9 +4,10 @@
from typing import Any, Literal
import cv2
import numpy as np
from uniface.common import letterbox_resize, non_max_suppression
from uniface.common import non_max_suppression
from uniface.constants import YOLOv5FaceWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
@@ -139,15 +140,45 @@ class YOLOv5Face(BaseDetector):
raise RuntimeError(f"Failed to initialize model session for '{model_path}'") from e
def preprocess(self, image: np.ndarray) -> tuple[np.ndarray, float, tuple[int, int]]:
"""Preprocess image using letterbox resize.
"""
Preprocess image for inference.
Args:
image: Input image in BGR format.
image (np.ndarray): Input image (BGR format)
Returns:
Tuple of (preprocessed_tensor, scale_ratio, padding).
Tuple[np.ndarray, float, Tuple[int, int]]: Preprocessed image, scale ratio, and padding
"""
return letterbox_resize(image, self.input_size)
# Get original image shape
img_h, img_w = image.shape[:2]
# Calculate scale ratio
scale = min(self.input_size / img_h, self.input_size / img_w)
new_h, new_w = int(img_h * scale), int(img_w * scale)
# Resize image
img_resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
# Create padded image
img_padded = np.full((self.input_size, self.input_size, 3), 114, dtype=np.uint8)
# Calculate padding
pad_h = (self.input_size - new_h) // 2
pad_w = (self.input_size - new_w) // 2
# Place resized image in center
img_padded[pad_h : pad_h + new_h, pad_w : pad_w + new_w] = img_resized
# Convert to RGB and normalize
img_rgb = cv2.cvtColor(img_padded, cv2.COLOR_BGR2RGB)
img_normalized = img_rgb.astype(np.float32) / 255.0
# Transpose to CHW format (HWC -> CHW) and add batch dimension
img_transposed = np.transpose(img_normalized, (2, 0, 1))
img_batch = np.expand_dims(img_transposed, axis=0)
img_batch = np.ascontiguousarray(img_batch)
return img_batch, scale, (pad_w, pad_h)
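The scale and (pad_w, pad_h) returned above are exactly what is needed to map letterboxed detections back to original-frame coordinates; a quick numeric check with assumed values for a 480x640 frame letterboxed into a 640x640 input (scale 1.0, 80 px of vertical padding).

import numpy as np

scale, pad_w, pad_h = 1.0, 0, 80

# A box reported in letterboxed coordinates...
box_letterboxed = np.array([100.0, 180.0, 200.0, 300.0])

# ...maps back by removing the padding and dividing by the scale.
box_original = (box_letterboxed - np.array([pad_w, pad_h, pad_w, pad_h])) / scale
print(box_original)  # [100. 100. 200. 220.]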
def inference(self, input_tensor: np.ndarray) -> list[np.ndarray]:
"""Perform model inference on the preprocessed image tensor.
@@ -306,9 +337,38 @@ class YOLOv5Face(BaseDetector):
if len(detections) == 0:
return []
# Filter to top max_num faces if requested
detections, landmarks = self._select_top_detections(
detections, landmarks, max_num, (original_height, original_width), metric, center_weight
)
if 0 < max_num < detections.shape[0]:
# Calculate area of detections
area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])
return self._detections_to_faces(detections, landmarks)
# Calculate offsets from image center
center = (original_height // 2, original_width // 2)
offsets = np.vstack(
[
(detections[:, 0] + detections[:, 2]) / 2 - center[1],
(detections[:, 1] + detections[:, 3]) / 2 - center[0],
]
)
# Calculate scores based on the chosen metric
offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
if metric == 'max':
values = area
else:
values = area - offset_dist_squared * center_weight
# Sort by scores and select top `max_num`
sorted_indices = np.argsort(values)[::-1][:max_num]
detections = detections[sorted_indices]
landmarks = landmarks[sorted_indices]
faces = []
for i in range(detections.shape[0]):
face = Face(
bbox=detections[i, :4],
confidence=float(detections[i, 4]),
landmarks=landmarks[i],
)
faces.append(face)
return faces

View File

@@ -11,9 +11,10 @@ Reference: https://github.com/yakhyo/yolov8-face-onnx-inference
from typing import Any, Literal
import cv2
import numpy as np
from uniface.common import letterbox_resize, non_max_suppression
from uniface.common import non_max_suppression
from uniface.constants import YOLOv8FaceWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
@@ -150,15 +151,45 @@ class YOLOv8Face(BaseDetector):
raise RuntimeError(f"Failed to initialize model session for '{model_path}'") from e
def preprocess(self, image: np.ndarray) -> tuple[np.ndarray, float, tuple[int, int]]:
"""Preprocess image using letterbox resize.
"""
Preprocess image for inference (letterbox resize with center padding).
Args:
image: Input image in BGR format.
image (np.ndarray): Input image (BGR format)
Returns:
Tuple of (preprocessed_tensor, scale_ratio, padding).
Tuple[np.ndarray, float, Tuple[int, int]]: Preprocessed image, scale ratio, and padding (pad_w, pad_h)
"""
return letterbox_resize(image, self.input_size)
# Get original image shape
img_h, img_w = image.shape[:2]
# Calculate scale ratio
scale = min(self.input_size / img_h, self.input_size / img_w)
new_h, new_w = int(img_h * scale), int(img_w * scale)
# Resize image
img_resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
# Create padded image with gray background (114, 114, 114)
img_padded = np.full((self.input_size, self.input_size, 3), 114, dtype=np.uint8)
# Calculate padding (center the image)
pad_h = (self.input_size - new_h) // 2
pad_w = (self.input_size - new_w) // 2
# Place resized image in center
img_padded[pad_h : pad_h + new_h, pad_w : pad_w + new_w] = img_resized
# Convert BGR to RGB and normalize
img_rgb = cv2.cvtColor(img_padded, cv2.COLOR_BGR2RGB)
img_normalized = img_rgb.astype(np.float32) / 255.0
# Transpose to CHW format (HWC -> CHW) and add batch dimension
img_transposed = np.transpose(img_normalized, (2, 0, 1))
img_batch = np.expand_dims(img_transposed, axis=0)
img_batch = np.ascontiguousarray(img_batch)
return img_batch, scale, (pad_w, pad_h)
def inference(self, input_tensor: np.ndarray) -> list[np.ndarray]:
"""Perform model inference on the preprocessed image tensor.
@@ -356,9 +387,38 @@ class YOLOv8Face(BaseDetector):
if len(detections) == 0:
return []
# Filter to top max_num faces if requested
detections, landmarks = self._select_top_detections(
detections, landmarks, max_num, (original_height, original_width), metric, center_weight
)
if 0 < max_num < detections.shape[0]:
# Calculate area of detections
area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])
return self._detections_to_faces(detections, landmarks)
# Calculate offsets from image center
center = (original_height // 2, original_width // 2)
offsets = np.vstack(
[
(detections[:, 0] + detections[:, 2]) / 2 - center[1],
(detections[:, 1] + detections[:, 3]) / 2 - center[0],
]
)
# Calculate scores based on the chosen metric
offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
if metric == 'max':
values = area
else:
values = area - offset_dist_squared * center_weight
# Sort by scores and select top `max_num`
sorted_indices = np.argsort(values)[::-1][:max_num]
detections = detections[sorted_indices]
landmarks = landmarks[sorted_indices]
faces = []
for i in range(detections.shape[0]):
face = Face(
bbox=detections[i, :4],
confidence=float(detections[i, 4]),
landmarks=landmarks[i],
)
faces.append(face)
return faces

View File

@@ -21,9 +21,6 @@ __all__ = [
'draw_corner_bbox',
'draw_detections',
'draw_gaze',
'draw_head_pose',
'draw_head_pose_axis',
'draw_head_pose_cube',
'draw_text_label',
'draw_tracks',
'vis_parsing_maps',
@@ -359,212 +356,6 @@ def draw_gaze(
)
def draw_head_pose_cube(
image: np.ndarray,
yaw: float,
pitch: float,
roll: float,
bbox: list[int] | np.ndarray,
size: int | None = None,
) -> None:
"""Draw a 3D wireframe cube representing head orientation on an image.
Projects a 3D cube onto the image plane based on yaw, pitch, and roll
angles, centered on the face bounding box.
Modifies the image in-place.
Args:
image: Input image to draw on (modified in-place).
yaw: Yaw angle in degrees.
pitch: Pitch angle in degrees.
roll: Roll angle in degrees.
bbox: Bounding box as ``[x_min, y_min, x_max, y_max]``.
size: Cube size in pixels. If None, uses the bounding box width.
Example:
>>> from uniface.draw import draw_head_pose_cube
>>> draw_head_pose_cube(image, yaw=10.0, pitch=-5.0, roll=2.0, bbox=[100, 100, 250, 280])
"""
x_min, y_min, x_max, y_max = map(int, bbox[:4])
if size is None:
size = x_max - x_min
h = size * 0.5
yaw_r, pitch_r, roll_r = np.radians([-yaw, pitch, roll])
cx = (x_min + x_max) * 0.5
cy = (y_min + y_max) * 0.5
cos_y, sin_y = np.cos(yaw_r), np.sin(yaw_r)
cos_p, sin_p = np.cos(pitch_r), np.sin(pitch_r)
cos_r, sin_r = np.cos(roll_r), np.sin(roll_r)
ex = np.array([cos_y * cos_r, cos_p * sin_r + cos_r * sin_p * sin_y])
ey = np.array([-cos_y * sin_r, cos_p * cos_r - sin_p * sin_y * sin_r])
ez = np.array([sin_y, -cos_y * sin_p])
center = np.array([cx, cy])
def _pt(v: np.ndarray) -> tuple[int, int]:
return (int(v[0]), int(v[1]))
f0 = center + h * (-ex - ey - ez)
f1 = center + h * (+ex - ey - ez)
f2 = center + h * (+ex + ey - ez)
f3 = center + h * (-ex + ey - ez)
b0 = center + h * (-ex - ey + ez)
b1 = center + h * (+ex - ey + ez)
b2 = center + h * (+ex + ey + ez)
b3 = center + h * (-ex + ey + ez)
red = (0, 0, 255)
green = (0, 255, 0)
blue = (255, 0, 0)
# Front face at head (red)
cv2.line(image, _pt(f0), _pt(f1), red, 2)
cv2.line(image, _pt(f1), _pt(f2), red, 2)
cv2.line(image, _pt(f2), _pt(f3), red, 2)
cv2.line(image, _pt(f3), _pt(f0), red, 2)
# Back face in looking direction (green)
cv2.line(image, _pt(b0), _pt(b1), green, 2)
cv2.line(image, _pt(b1), _pt(b2), green, 2)
cv2.line(image, _pt(b2), _pt(b3), green, 2)
cv2.line(image, _pt(b3), _pt(b0), green, 2)
# Side edges (blue)
cv2.line(image, _pt(f0), _pt(b0), blue, 2)
cv2.line(image, _pt(f1), _pt(b1), blue, 2)
cv2.line(image, _pt(f2), _pt(b2), blue, 2)
cv2.line(image, _pt(f3), _pt(b3), blue, 2)
def draw_head_pose_axis(
image: np.ndarray,
yaw: float,
pitch: float,
roll: float,
bbox: list[int] | np.ndarray,
size_ratio: float = 0.5,
) -> None:
"""Draw 3D coordinate axes representing head orientation on an image.
Draws X (red), Y (green), and Z (blue) axes from the center of the
bounding box, rotated according to yaw, pitch, and roll.
Modifies the image in-place.
Args:
image: Input image to draw on (modified in-place).
yaw: Yaw angle in degrees.
pitch: Pitch angle in degrees.
roll: Roll angle in degrees.
bbox: Bounding box as ``[x_min, y_min, x_max, y_max]``.
size_ratio: Axis length as a fraction of bbox size. Defaults to 0.5.
Example:
>>> from uniface.draw import draw_head_pose_axis
>>> draw_head_pose_axis(image, yaw=10.0, pitch=-5.0, roll=2.0, bbox=[100, 100, 250, 280])
"""
x_min, y_min, x_max, y_max = map(int, bbox[:4])
yaw_r, pitch_r, roll_r = np.radians([-yaw, pitch, roll])
tdx = int(x_min + (x_max - x_min) * 0.5)
tdy = int(y_min + (y_max - y_min) * 0.5)
bbox_size = min(x_max - x_min, y_max - y_min)
size = bbox_size * size_ratio
cos_yaw, sin_yaw = np.cos(yaw_r), np.sin(yaw_r)
cos_pitch, sin_pitch = np.cos(pitch_r), np.sin(pitch_r)
cos_roll, sin_roll = np.cos(roll_r), np.sin(roll_r)
# X-Axis (red)
x1 = int(size * (cos_yaw * cos_roll) + tdx)
y1 = int(size * (cos_pitch * sin_roll + cos_roll * sin_pitch * sin_yaw) + tdy)
# Y-Axis (green)
x2 = int(size * (-cos_yaw * sin_roll) + tdx)
y2 = int(size * (cos_pitch * cos_roll - sin_pitch * sin_yaw * sin_roll) + tdy)
# Z-Axis (blue)
x3 = int(size * sin_yaw + tdx)
y3 = int(size * (-cos_yaw * sin_pitch) + tdy)
cv2.line(image, (tdx, tdy), (x1, y1), (0, 0, 255), 2)
cv2.line(image, (tdx, tdy), (x2, y2), (0, 255, 0), 2)
cv2.line(image, (tdx, tdy), (x3, y3), (255, 0, 0), 2)
def draw_head_pose(
image: np.ndarray,
bbox: np.ndarray | list[int],
pitch: float,
yaw: float,
roll: float,
*,
draw_type: str = 'cube',
draw_bbox: bool = False,
corner_bbox: bool = True,
draw_angles: bool = True,
) -> None:
"""Draw head pose visualization with optional bounding box on an image.
High-level convenience function that combines bounding box drawing with
a 3D shape visualization of head orientation.
Modifies the image in-place.
Args:
image: Input image to draw on (modified in-place).
bbox: Face bounding box in xyxy format ``[x1, y1, x2, y2]``.
pitch: Pitch angle in degrees (rotation around X-axis).
yaw: Yaw angle in degrees (rotation around Y-axis).
roll: Roll angle in degrees (rotation around Z-axis).
draw_type: Visualization type, ``'cube'`` or ``'axis'``.
Defaults to ``'cube'``.
draw_bbox: Whether to draw the bounding box. Defaults to False.
corner_bbox: Use corner-style bounding box. Defaults to True.
draw_angles: Whether to display angle values as text. Defaults to True.
Example:
>>> from uniface.headpose import HeadPose
>>> from uniface.draw import draw_head_pose
>>> estimator = HeadPose()
>>> result = estimator.estimate(face_crop)
>>> draw_head_pose(image, bbox, result.pitch, result.yaw, result.roll)
"""
x_min, y_min, x_max, y_max = map(int, bbox[:4])
line_thickness = max(round(sum(image.shape[:2]) / 2 * 0.003), 2)
if draw_bbox:
if corner_bbox:
draw_corner_bbox(image, np.array(bbox), color=(0, 255, 0), thickness=line_thickness)
else:
cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), line_thickness)
bbox_list = [x_min, y_min, x_max, y_max]
if draw_type == 'axis':
draw_head_pose_axis(image, yaw, pitch, roll, bbox_list)
else:
draw_head_pose_cube(image, yaw, pitch, roll, bbox_list)
if draw_angles:
font_scale = max(0.4, min(0.7, (y_max - y_min) / 200))
draw_text_label(
image,
f'P:{pitch:.0f} Y:{yaw:.0f} R:{roll:.0f}',
x_min,
y_min,
bg_color=(0, 0, 255),
text_color=(255, 255, 255),
font_scale=font_scale,
)
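For reference, the axis overlay above boils down to a handful of trigonometric terms; a minimal standalone sketch reproducing the same projection for a sample pose, without any drawing.

import numpy as np

def project_axes(yaw: float, pitch: float, roll: float, center=(160, 160), size=80.0):
    """Project unit X/Y/Z head axes to 2D endpoints using the formulas from draw_head_pose_axis."""
    yaw_r, pitch_r, roll_r = np.radians([-yaw, pitch, roll])
    tdx, tdy = center
    cy, sy = np.cos(yaw_r), np.sin(yaw_r)
    cp, sp = np.cos(pitch_r), np.sin(pitch_r)
    cr, sr = np.cos(roll_r), np.sin(roll_r)
    x_axis = (int(size * cy * cr + tdx), int(size * (cp * sr + cr * sp * sy) + tdy))
    y_axis = (int(size * -cy * sr + tdx), int(size * (cp * cr - sp * sy * sr) + tdy))
    z_axis = (int(size * sy + tdx), int(size * -cy * sp + tdy))
    return x_axis, y_axis, z_axis

print(project_axes(yaw=20.0, pitch=-10.0, roll=5.0))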
def draw_tracks(
*,
image: np.ndarray,

View File

@@ -71,7 +71,8 @@ def estimate_norm(
alignment[:, 0] += diff_x
# Compute the transformation matrix
transform = SimilarityTransform.from_estimate(landmark, alignment)
transform = SimilarityTransform()
transform.estimate(landmark, alignment)
matrix = transform.params[0:2, :]
inverse_matrix = np.linalg.inv(transform.params)[0:2, :]
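The two call patterns above reflect different scikit-image versions; a hedged sketch of the estimate-based pattern, applying the resulting 2x3 matrix with OpenCV (the landmark and template values are placeholders, the template mirroring the common 112x112 ArcFace layout).

import cv2
import numpy as np
from skimage.transform import SimilarityTransform

src = np.array([[38.0, 51.0], [73.0, 51.0], [56.0, 71.0], [41.0, 92.0], [70.0, 92.0]], dtype=np.float32)
dst = np.array([[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
                [41.5493, 92.3655], [70.7299, 92.2041]], dtype=np.float32)

transform = SimilarityTransform()
transform.estimate(src, dst)          # fits scale, rotation, and translation
matrix = transform.params[0:2, :]     # 2x3 affine matrix usable by cv2.warpAffine

image = np.zeros((112, 112, 3), dtype=np.uint8)  # stand-in for a real face image
aligned = cv2.warpAffine(image, matrix, (112, 112))
print(aligned.shape)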

View File

@@ -1,53 +0,0 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from uniface.types import HeadPoseResult
from .base import BaseHeadPoseEstimator
from .models import HeadPose
def create_head_pose_estimator(method: str = 'headpose', **kwargs) -> BaseHeadPoseEstimator:
"""
Factory function to create head pose estimators.
This function initializes and returns a head pose estimator instance based on the
specified method. It acts as a high-level interface to the underlying model classes.
Args:
method (str): The head pose estimation method to use.
Options: 'headpose' (default).
**kwargs: Model-specific parameters passed to the estimator's constructor.
For example, `model_name` can be used to select a specific
backbone from `HeadPoseWeights` enum (RESNET18, RESNET34, RESNET50,
MOBILENET_V2, MOBILENET_V3_SMALL, MOBILENET_V3_LARGE).
Returns:
BaseHeadPoseEstimator: An initialized head pose estimator instance ready for use.
Raises:
ValueError: If the specified `method` is not supported.
Examples:
>>> # Create the default head pose estimator (ResNet18 backbone)
>>> estimator = create_head_pose_estimator()
>>> # Create with MobileNetV2 backbone
>>> from uniface.constants import HeadPoseWeights
>>> estimator = create_head_pose_estimator('headpose', model_name=HeadPoseWeights.MOBILENET_V2)
>>> # Use the estimator
>>> result = estimator.estimate(face_crop)
>>> print(f'Pitch: {result.pitch:.1f}°, Yaw: {result.yaw:.1f}°, Roll: {result.roll:.1f}°')
"""
method = method.lower()
if method in ('headpose', 'head_pose', '6drepnet'):
return HeadPose(**kwargs)
else:
available = ['headpose']
raise ValueError(f"Unsupported head pose estimation method: '{method}'. Available: {available}")
__all__ = ['BaseHeadPoseEstimator', 'HeadPose', 'HeadPoseResult', 'create_head_pose_estimator']

View File

@@ -1,115 +0,0 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from __future__ import annotations
from abc import ABC, abstractmethod
import numpy as np
from uniface.types import HeadPoseResult
__all__ = ['BaseHeadPoseEstimator', 'HeadPoseResult']
class BaseHeadPoseEstimator(ABC):
"""
Abstract base class for all head pose estimation models.
This class defines the common interface that all head pose estimators must implement,
ensuring consistency across different head pose estimation methods. Head pose estimation
predicts the orientation of a person's head based on their face image.
The head orientation is represented as Euler angles in degrees:
- Pitch: Rotation around X-axis (positive = looking down, negative = looking up)
- Yaw: Rotation around Y-axis (positive = looking right, negative = looking left)
- Roll: Rotation around Z-axis (positive = tilting clockwise, negative = tilting counter-clockwise)
"""
@abstractmethod
def _initialize_model(self) -> None:
"""
Initialize the underlying model for inference.
This method should handle loading model weights, creating the
inference session (e.g., ONNX Runtime), and any necessary
setup procedures to prepare the model for prediction.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
raise NotImplementedError('Subclasses must implement the _initialize_model method.')
@abstractmethod
def preprocess(self, face_image: np.ndarray) -> np.ndarray:
"""
Preprocess the input face image for model inference.
This method should take a raw face crop and convert it into the format
expected by the model's inference engine (e.g., normalized tensor).
Args:
face_image (np.ndarray): A cropped face image in BGR format with
shape (H, W, C).
Returns:
np.ndarray: The preprocessed image tensor ready for inference,
typically with shape (1, C, H, W).
"""
raise NotImplementedError('Subclasses must implement the preprocess method.')
@abstractmethod
def postprocess(self, rotation_matrix: np.ndarray) -> HeadPoseResult:
"""
Postprocess a rotation matrix into Euler angles.
This method takes the raw rotation matrix output from the model's
inference and converts it into pitch, yaw, and roll angles in degrees.
Args:
rotation_matrix: Rotation matrix with shape (B, 3, 3) from the
model inference.
Returns:
HeadPoseResult: Result containing pitch, yaw, and roll in degrees.
"""
raise NotImplementedError('Subclasses must implement the postprocess method.')
@abstractmethod
def estimate(self, face_image: np.ndarray) -> HeadPoseResult:
"""
Perform end-to-end head pose estimation on a face image.
This method orchestrates the full pipeline: preprocessing the input,
running inference, and postprocessing to return the head orientation.
Args:
face_image (np.ndarray): A cropped face image in BGR format.
The face should be roughly centered and
well-framed within the image.
Returns:
HeadPoseResult: Result containing Euler angles in degrees:
- pitch: Rotation around X-axis (positive = down)
- yaw: Rotation around Y-axis (positive = right)
- roll: Rotation around Z-axis (positive = clockwise)
Example:
>>> estimator = create_head_pose_estimator()
>>> result = estimator.estimate(face_crop)
>>> print(f'Pose: pitch={result.pitch:.1f}°, yaw={result.yaw:.1f}°, roll={result.roll:.1f}°')
"""
raise NotImplementedError('Subclasses must implement the estimate method.')
def __call__(self, face_image: np.ndarray) -> HeadPoseResult:
"""
Provides a convenient, callable shortcut for the `estimate` method.
Args:
face_image (np.ndarray): A cropped face image in BGR format.
Returns:
HeadPoseResult: Result containing pitch, yaw, and roll in degrees.
"""
return self.estimate(face_image)

View File

@@ -1,178 +0,0 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import cv2
import numpy as np
from uniface.constants import HeadPoseWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from uniface.types import HeadPoseResult
from .base import BaseHeadPoseEstimator
__all__ = ['HeadPose']
class HeadPose(BaseHeadPoseEstimator):
"""
Head Pose Estimation with ONNX Runtime using 6D Rotation Representation.
This model estimates head orientation from a single face image by predicting
a 3x3 rotation matrix (via continuous 6D representation) and converting it to
Euler angles (pitch, yaw, roll) in degrees.
Supports multiple backbone architectures: ResNet-18/34/50, MobileNetV2,
and MobileNetV3 (small/large).
Reference:
https://github.com/yakhyo/head-pose-estimation
Args:
model_name (HeadPoseWeights): The enum specifying the head pose model to load.
Options: RESNET18, RESNET34, RESNET50, MOBILENET_V2, MOBILENET_V3_SMALL,
MOBILENET_V3_LARGE. Defaults to `HeadPoseWeights.RESNET18`.
input_size (tuple[int, int]): The resolution (width, height) for the model's
input. Defaults to (224, 224).
providers (list[str] | None): ONNX Runtime execution providers. If None, auto-detects
the best available provider. Example: ['CPUExecutionProvider'] to force CPU.
Attributes:
input_size (tuple[int, int]): Model input dimensions.
input_mean (np.ndarray): Per-channel mean values for normalization (ImageNet).
input_std (np.ndarray): Per-channel std values for normalization (ImageNet).
Example:
>>> from uniface.headpose import HeadPose
>>> from uniface import RetinaFace
>>>
>>> detector = RetinaFace()
>>> head_pose = HeadPose()
>>>
>>> faces = detector.detect(image)
>>> for face in faces:
... bbox = face.bbox
... x1, y1, x2, y2 = map(int, bbox[:4])
... face_crop = image[y1:y2, x1:x2]
... result = head_pose.estimate(face_crop)
... print(f'Pose: pitch={result.pitch:.1f}°, yaw={result.yaw:.1f}°, roll={result.roll:.1f}°')
"""
def __init__(
self,
model_name: HeadPoseWeights = HeadPoseWeights.RESNET18,
input_size: tuple[int, int] = (224, 224),
providers: list[str] | None = None,
) -> None:
Logger.info(f'Initializing HeadPose with model={model_name}, input_size={input_size}')
self.input_size = input_size
self.input_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
self.input_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
self.providers = providers
self.model_path = verify_model_weights(model_name)
self._initialize_model()
def _initialize_model(self) -> None:
"""
Initialize the ONNX model from the stored model path.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
try:
self.session = create_onnx_session(self.model_path, providers=self.providers)
input_cfg = self.session.get_inputs()[0]
input_shape = input_cfg.shape
self.input_name = input_cfg.name
self.input_size = tuple(input_shape[2:4][::-1])
outputs = self.session.get_outputs()
self.output_names = [output.name for output in outputs]
if len(self.output_names) != 1:
raise ValueError(f'Expected 1 output node (rotation_matrix), got {len(self.output_names)}')
Logger.info(f'HeadPose initialized with input size {self.input_size}')
except Exception as e:
Logger.error(f"Failed to load head pose model from '{self.model_path}'", exc_info=True)
raise RuntimeError(f'Failed to initialize head pose model: {e}') from e
def preprocess(self, face_image: np.ndarray) -> np.ndarray:
"""
Preprocess a face crop for head pose estimation.
Args:
face_image (np.ndarray): A cropped face image in BGR format.
Returns:
np.ndarray: Preprocessed image tensor with shape (1, 3, H, W).
"""
image = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
image = cv2.resize(image, self.input_size)
image = image.astype(np.float32) / 255.0
image = (image - self.input_mean) / self.input_std
# HWC -> CHW -> NCHW
image = np.transpose(image, (2, 0, 1))
image = np.expand_dims(image, axis=0).astype(np.float32)
return image
@staticmethod
def rotation_matrix_to_euler(rotation_matrix: np.ndarray) -> np.ndarray:
"""Convert (B, 3, 3) rotation matrices to Euler angles in degrees.
Uses the ZYX convention to decompose rotation matrices into
pitch (X), yaw (Y), and roll (Z) angles.
Args:
rotation_matrix: Batch of rotation matrices with shape (B, 3, 3).
Returns:
np.ndarray: Euler angles with shape (B, 3) as [pitch, yaw, roll] in degrees.
"""
R = rotation_matrix
sy = np.sqrt(R[:, 0, 0] ** 2 + R[:, 1, 0] ** 2)
singular = sy < 1e-6
x = np.where(singular, np.arctan2(-R[:, 1, 2], R[:, 1, 1]), np.arctan2(R[:, 2, 1], R[:, 2, 2]))
y = np.arctan2(-R[:, 2, 0], sy)
z = np.where(singular, np.zeros_like(sy), np.arctan2(R[:, 1, 0], R[:, 0, 0]))
return np.degrees(np.stack([x, y, z], axis=1))
def postprocess(self, rotation_matrix: np.ndarray) -> HeadPoseResult:
"""
Convert a rotation matrix into Euler angles.
Args:
rotation_matrix: Rotation matrix with shape (B, 3, 3).
Returns:
HeadPoseResult: Result containing pitch, yaw, and roll in degrees.
"""
euler = self.rotation_matrix_to_euler(rotation_matrix)
return HeadPoseResult(
pitch=float(euler[0, 0]),
yaw=float(euler[0, 1]),
roll=float(euler[0, 2]),
)
def estimate(self, face_image: np.ndarray) -> HeadPoseResult:
"""
Perform end-to-end head pose estimation on a face image.
This method orchestrates the full pipeline: preprocessing the input,
running inference, and postprocessing to return the head orientation.
"""
input_tensor = self.preprocess(face_image)
outputs = self.session.run(self.output_names, {self.input_name: input_tensor})
rotation_matrix = outputs[0] # (1, 3, 3)
return self.postprocess(rotation_matrix)
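A quick standalone check of the ZYX decomposition used above: compose a rotation matrix from known angles and confirm the Euler angles come back out.

import numpy as np

def euler_to_matrix(pitch: float, yaw: float, roll: float) -> np.ndarray:
    """Compose R = Rz(roll) @ Ry(yaw) @ Rx(pitch), the convention the decomposition above inverts."""
    x, y, z = np.radians([pitch, yaw, roll])
    rx = np.array([[1, 0, 0], [0, np.cos(x), -np.sin(x)], [0, np.sin(x), np.cos(x)]])
    ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
    rz = np.array([[np.cos(z), -np.sin(z), 0], [np.sin(z), np.cos(z), 0], [0, 0, 1]])
    return rz @ ry @ rx

def matrix_to_euler(R: np.ndarray) -> np.ndarray:
    """Single-matrix version of the rotation_matrix_to_euler decomposition above."""
    sy = np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
    if sy < 1e-6:  # gimbal-lock branch
        x, y, z = np.arctan2(-R[1, 2], R[1, 1]), np.arctan2(-R[2, 0], sy), 0.0
    else:
        x, y, z = np.arctan2(R[2, 1], R[2, 2]), np.arctan2(-R[2, 0], sy), np.arctan2(R[1, 0], R[0, 0])
    return np.degrees([x, y, z])

print(matrix_to_euler(euler_to_matrix(pitch=-10.0, yaw=25.0, roll=5.0)))  # ~[-10. 25. 5.]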

View File

@@ -10,23 +10,17 @@ import numpy as np
class BaseFaceParser(ABC):
"""Abstract base class for all face parsing models.
"""
Abstract base class for all face parsing models.
This class defines the common interface that all face parsing models must implement,
ensuring consistency across different parsing methods. Face parsing segments a face
image into semantic regions such as skin, eyes, nose, mouth, hair, etc.
Subclasses must define a ``mask_type`` class attribute to indicate output format:
- ``"class_ids"``: uint8 mask with discrete class labels (e.g. BiSeNet: 0-18)
- ``"probability"``: float32 mask with continuous values in [0, 1] (e.g. XSeg)
Attributes:
mask_type (str): Output format identifier. Must be set by subclasses.
The output is a segmentation mask where each pixel is assigned a class label
representing a facial component.
"""
mask_type: str
@abstractmethod
def _initialize_model(self) -> None:
"""
@@ -92,17 +86,13 @@ class BaseFaceParser(ABC):
Ignored by parsers that do not need landmarks (e.g., BiSeNet).
Returns:
np.ndarray: Segmentation mask with the same size as input image.
Format depends on ``mask_type``:
- ``"class_ids"``: uint8 with discrete class labels
- ``"probability"``: float32 with values in [0, 1]
np.ndarray: Segmentation mask with the same size as input image,
where each pixel value represents a facial component class.
Example:
>>> parser = create_face_parser()
>>> mask = parser.parse(face_crop)
>>> print(f'Mask type: {parser.mask_type}')
>>> print(f'Mask shape: {mask.shape}, dtype: {mask.dtype}')
>>> print(f'Mask shape: {mask.shape}, unique classes: {np.unique(mask)}')
"""
raise NotImplementedError('Subclasses must implement the parse method.')

View File

@@ -18,7 +18,8 @@ __all__ = ['BiSeNet']
class BiSeNet(BaseFaceParser):
"""BiSeNet: Bilateral Segmentation Network for Face Parsing with ONNX Runtime.
"""
BiSeNet: Bilateral Segmentation Network for Face Parsing with ONNX Runtime.
BiSeNet is a semantic segmentation model that segments a face image into
different facial components such as skin, eyes, nose, mouth, hair, etc. The model
@@ -44,7 +45,6 @@ class BiSeNet(BaseFaceParser):
input_size (Tuple[int, int]): Model input dimensions.
input_mean (np.ndarray): Per-channel mean values for normalization (ImageNet).
input_std (np.ndarray): Per-channel std values for normalization (ImageNet).
mask_type (str): Output type identifier - "class_ids" for BiSeNet.
Example:
>>> from uniface.parsing import BiSeNet
@@ -61,11 +61,8 @@ class BiSeNet(BaseFaceParser):
... face_crop = image[y1:y2, x1:x2]
... mask = parser.parse(face_crop)
... print(f'Mask shape: {mask.shape}, unique classes: {np.unique(mask)}')
... print(f'Output type: {parser.mask_type}') # "class_ids"
"""
mask_type = 'class_ids'
def __init__(
self,
model_name: ParsingWeights = ParsingWeights.RESNET18,

View File

@@ -19,9 +19,10 @@ __all__ = ['XSeg']
class XSeg(BaseFaceParser):
"""XSeg: Face Segmentation Model from DeepFaceLab with ONNX Runtime.
"""
XSeg: Face Segmentation Model from DeepFaceLab with ONNX Runtime.
XSeg outputs a soft probability mask for face regions. Unlike BiSeNet which works
XSeg outputs a mask for face regions. Unlike BiSeNet which works
on bbox crops, XSeg requires 5-point landmarks for face alignment. The model
uses NHWC input format and outputs values in [0, 1] range.
@@ -42,7 +43,6 @@ class XSeg(BaseFaceParser):
align_size (int): Face alignment output size.
blur_sigma (float): Blur sigma for post-processing.
input_size (tuple[int, int]): Model input dimensions (width, height).
mask_type (str): Output type identifier - "probability" for XSeg.
Example:
>>> from uniface.parsing import XSeg
@@ -56,11 +56,8 @@ class XSeg(BaseFaceParser):
... if face.landmarks is not None:
... mask = parser.parse(image, landmarks=face.landmarks)
... print(f'Mask shape: {mask.shape}')
... print(f'Output type: {parser.mask_type}') # "probability"
"""
mask_type = 'probability'
def __init__(
self,
model_name: XSegWeights = XSegWeights.DEFAULT,

View File

@@ -141,7 +141,7 @@ class BaseRecognizer(ABC):
image is already aligned.
Returns:
Face embedding with shape (1, 512) — raw ONNX output with batch dimension.
Face embedding vector (typically 512-dimensional).
"""
# If landmarks are provided, align the face first
if landmarks is not None:
@@ -164,9 +164,9 @@ class BaseRecognizer(ABC):
landmarks: Facial landmarks (5 points for alignment).
Returns:
L2-normalized face embedding as a 1D vector with shape (512,).
L2-normalized face embedding vector (typically 512-dimensional).
"""
embedding = self.get_embedding(image, landmarks).ravel()
embedding = self.get_embedding(image, landmarks)
norm = np.linalg.norm(embedding)
return embedding / norm if norm > 0 else embedding
@@ -178,6 +178,6 @@ class BaseRecognizer(ABC):
landmarks: Facial landmarks (5 points for alignment).
Returns:
L2-normalized face embedding as a 1D vector with shape (512,).
L2-normalized face embedding vector (typically 512-dimensional).
"""
return self.get_normalized_embedding(image, landmarks)
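Because the normalized embeddings above are unit-norm, cosine similarity reduces to a dot product; a short sketch assuming already-constructed detector and recognizer instances (any concrete BaseRecognizer subclass).

import cv2
import numpy as np

# Assumed to exist: `detector` (e.g. RetinaFace) and `recognizer` (a BaseRecognizer subclass).
img_a, img_b = cv2.imread('a.jpg'), cv2.imread('b.jpg')
face_a = detector.detect(img_a)[0]
face_b = detector.detect(img_b)[0]

emb_a = recognizer.get_normalized_embedding(img_a, face_a.landmarks).ravel()
emb_b = recognizer.get_normalized_embedding(img_b, face_b.landmarks).ravel()

# Unit-norm vectors: the dot product is the cosine similarity.
print(f'cosine similarity: {float(np.dot(emb_a, emb_b)):.3f}')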

View File

@@ -28,7 +28,6 @@ __all__ = [
'EmotionResult',
'Face',
'GazeResult',
'HeadPoseResult',
'SpoofingResult',
]
@@ -49,24 +48,6 @@ class GazeResult:
return f'GazeResult(pitch={self.pitch:.4f}, yaw={self.yaw:.4f})'
@dataclass(slots=True, frozen=True)
class HeadPoseResult:
"""Result of head pose estimation.
Attributes:
pitch: Rotation around X-axis in degrees (positive = looking down).
yaw: Rotation around Y-axis in degrees (positive = looking right).
roll: Rotation around Z-axis in degrees (positive = tilting clockwise).
"""
pitch: float
yaw: float
roll: float
def __repr__(self) -> str:
return f'HeadPoseResult(pitch={self.pitch:.1f}, yaw={self.yaw:.1f}, roll={self.roll:.1f})'
@dataclass(slots=True, frozen=True)
class SpoofingResult:
"""Result of face anti-spoofing detection.
@@ -264,5 +245,5 @@ class Face:
if self.emotion is not None:
parts.append(f'emotion={self.emotion}')
if self.embedding is not None:
parts.append(f'embedding_dim={self.embedding.shape[-1]}')
parts.append(f'embedding_dim={self.embedding.shape[0]}')
return ', '.join(parts) + ')'