Mirror of https://github.com/yakhyo/uniface.git (synced 2026-05-14 20:35:28 +00:00)

Compare commits: v3.2.0...feat/unifa (3 commits)

| SHA1 |
|---|
| cb81d2fcf8 |
| f0bae6dd80 |
| eec8f99850 |
**.github/workflows/ci.yml** (vendored, 6 lines changed)

```diff
@@ -20,7 +20,7 @@ jobs:
       - uses: actions/checkout@v4
       - uses: actions/setup-python@v5
         with:
-          python-version: "3.11"
+          python-version: "3.10"
       - uses: pre-commit/action@v3.0.1

   test:
@@ -34,11 +34,9 @@ jobs:
         include:
           # Full Python range on Linux (fastest runner)
           - os: ubuntu-latest
-            python-version: "3.11"
+            python-version: "3.10"
           - os: ubuntu-latest
             python-version: "3.13"
-          - os: ubuntu-latest
-            python-version: "3.14"
           - os: macos-latest
             python-version: "3.13"
           - os: windows-latest
```
**.github/workflows/docs.yml** (vendored, 3 lines changed)

```diff
@@ -2,8 +2,7 @@ name: Deploy docs

 on:
   push:
-    tags:
-      - "v*.*.*"
+    branches: [main]
   workflow_dispatch:

 permissions:
```
**.github/workflows/publish.yml** (vendored, 4 lines changed)

```diff
@@ -54,7 +54,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.11", "3.13"]
+        python-version: ["3.10", "3.13"]

     steps:
       - name: Checkout code
@@ -92,7 +92,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
-          python-version: "3.11"
+          python-version: "3.10"
           cache: 'pip'

       - name: Install build tools
```
**.gitignore** (vendored, 1 line changed)

```diff
@@ -1,5 +1,6 @@
 tmp_*
 .vscode/
 *.onnx

 # Byte-compiled / optimized / DLL files
 __pycache__/
```
````diff
@@ -59,12 +59,12 @@ This project uses [Ruff](https://docs.astral.sh/ruff/) for linting and formattin

 #### General Rules

 - **Line length:** 120 characters maximum
-- **Python version:** 3.11+ (use modern syntax)
+- **Python version:** 3.10+ (use modern syntax)
 - **Quote style:** Single quotes for strings, double quotes for docstrings

 #### Type Hints

-Use modern Python 3.11+ type hints (PEP 585 and PEP 604):
+Use modern Python 3.10+ type hints (PEP 585 and PEP 604):

 ```python
 # Preferred (modern)
````
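The "preferred (modern)" code example is truncated in this view. As an editorial sketch of what PEP 585 and PEP 604 style hints look like on Python 3.10+ (the function and its parameters are illustrative, not part of the uniface API):

```python
# PEP 585: subscript builtin containers directly (list[str], dict[...]).
# PEP 604: write unions as X | Y instead of Optional/Union.
def load_bboxes(paths: list[str], limit: int | None = None) -> dict[str, list[float]]:
    # Hypothetical helper: map each path to a placeholder bounding box.
    return {p: [0.0, 0.0, 1.0, 1.0] for p in paths[:limit]}
```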
```diff
@@ -184,9 +184,6 @@ Example notebooks demonstrating library usage:

 | Face Parsing | [06_face_parsing.ipynb](examples/06_face_parsing.ipynb) |
 | Face Anonymization | [07_face_anonymization.ipynb](examples/07_face_anonymization.ipynb) |
 | Gaze Estimation | [08_gaze_estimation.ipynb](examples/08_gaze_estimation.ipynb) |
 | Face Segmentation | [09_face_segmentation.ipynb](examples/09_face_segmentation.ipynb) |
 | Face Vector Store | [10_face_vector_store.ipynb](examples/10_face_vector_store.ipynb) |
-| Head Pose Estimation | [11_head_pose_estimation.ipynb](examples/11_head_pose_estimation.ipynb) |

 ## Questions?
```
```diff
@@ -3,7 +3,7 @@

 <div align="center">

 [](https://pypi.org/project/uniface/)
-[](https://www.python.org/)
+[](https://www.python.org/)
 [](https://opensource.org/licenses/MIT)
 [](https://github.com/yakhyo/uniface/actions)
 [](https://pepy.tech/projects/uniface)
```
```diff
@@ -31,7 +31,6 @@

 - **Facial Landmarks** — 106-point landmark localization module (separate from 5-point detector landmarks)
 - **Face Parsing** — BiSeNet semantic segmentation (19 classes), XSeg face masking
 - **Gaze Estimation** — Real-time gaze direction with MobileGaze
-- **Head Pose Estimation** — 3D head orientation (pitch, yaw, roll) with 6D rotation representation
 - **Attribute Analysis** — Age, gender, race (FairFace), and emotion
 - **Vector Indexing** — FAISS-backed embedding store for fast multi-identity search
 - **Anti-Spoofing** — Face liveness detection with MiniFASNet
```
```diff
@@ -186,7 +185,6 @@ Full documentation: https://yakhyo.github.io/uniface/

 | Recognition | WebFace600K | ArcFace |
 | Recognition | WebFace4M / 12M | AdaFace |
 | Gaze | Gaze360 | MobileGaze |
-| Head Pose | 300W-LP | HeadPose (ResNet, MobileNet) |
 | Parsing | CelebAMask-HQ | BiSeNet |
 | Attributes | CelebA, FairFace, AffectNet | AgeGender, FairFace, Emotion |
```
```diff
@@ -208,7 +206,6 @@ Full documentation: https://yakhyo.github.io/uniface/

 | [08_gaze_estimation.ipynb](examples/08_gaze_estimation.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/08_gaze_estimation.ipynb) | Gaze direction estimation |
 | [09_face_segmentation.ipynb](examples/09_face_segmentation.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/09_face_segmentation.ipynb) | Face segmentation with XSeg |
 | [10_face_vector_store.ipynb](examples/10_face_vector_store.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/10_face_vector_store.ipynb) | FAISS-backed face database |
-| [11_head_pose_estimation.ipynb](examples/11_head_pose_estimation.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/11_head_pose_estimation.ipynb) | Head pose estimation (pitch, yaw, roll) |

 ---
```
```diff
@@ -237,7 +234,6 @@ If you plan commercial use, verify model license compatibility.

 | Parsing | [face-parsing](https://github.com/yakhyo/face-parsing) | ✓ | BiSeNet Face Parsing |
 | Parsing | [face-segmentation](https://github.com/yakhyo/face-segmentation) | - | XSeg Face Segmentation |
 | Gaze | [gaze-estimation](https://github.com/yakhyo/gaze-estimation) | ✓ | MobileGaze Training |
-| Head Pose | [head-pose-estimation](https://github.com/yakhyo/head-pose-estimation) | ✓ | Head Pose Training (6DRepNet-style) |
 | Anti-Spoofing | [face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) | - | MiniFASNet Inference |
 | Attributes | [fairface-onnx](https://github.com/yakhyo/fairface-onnx) | - | FairFace ONNX Inference |
```
````diff
@@ -39,20 +39,16 @@ recognizer = ArcFace(providers=['CPUExecutionProvider'])
 detector = RetinaFace(providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
 ```

-All **ONNX-based** model classes accept the `providers` parameter:
+All model classes accept the `providers` parameter:

 - Detection: `RetinaFace`, `SCRFD`, `YOLOv5Face`, `YOLOv8Face`
 - Recognition: `ArcFace`, `AdaFace`, `MobileFace`, `SphereFace`
 - Landmarks: `Landmark106`
 - Gaze: `MobileGaze`
-- Parsing: `BiSeNet`, `XSeg`
+- Parsing: `BiSeNet`
 - Attributes: `AgeGender`, `FairFace`
 - Anti-Spoofing: `MiniFASNet`

-!!! note "Non-ONNX components"
-    - **Emotion** uses TorchScript and selects its device automatically (`mps` / `cuda` / `cpu`). It does **not** accept the `providers` parameter.
-    - **BlurFace** is a pure OpenCV utility and does not load any model.

 ---

 ## Check Available Providers
````
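Editorial aside on the `providers` parameter in the hunk above: a minimal sketch, assuming `onnxruntime` is installed, of how one might check which execution providers the local runtime actually offers before passing them in:

```python
import onnxruntime as ort
from uniface.detection import RetinaFace

# Ask ONNX Runtime which providers this build supports, e.g.
# ['CUDAExecutionProvider', 'CPUExecutionProvider'] on a CUDA machine.
available = ort.get_available_providers()
print(available)

# Prefer CUDA when present, keeping the CPU fallback last in the list.
providers = [p for p in ('CUDAExecutionProvider', 'CPUExecutionProvider') if p in available]
detector = RetinaFace(providers=providers)
```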
````diff
@@ -106,27 +106,6 @@ print(f"Yaw: {np.degrees(result.yaw):.1f}°")

 ---

-### HeadPoseResult
-
-```python
-@dataclass(frozen=True)
-class HeadPoseResult:
-    pitch: float  # Rotation around X-axis (degrees), + = looking down
-    yaw: float    # Rotation around Y-axis (degrees), + = looking right
-    roll: float   # Rotation around Z-axis (degrees), + = tilting clockwise
-```
-
-**Usage:**
-
-```python
-result = head_pose.estimate(face_crop)
-print(f"Pitch: {result.pitch:.1f}°")
-print(f"Yaw: {result.yaw:.1f}°")
-print(f"Roll: {result.roll:.1f}°")
-```
-
----
-
 ### SpoofingResult

 ```python
````
````diff
@@ -165,11 +144,11 @@ class AttributeResult:

 ```python
 # AgeGender model
-result = age_gender.predict(image, face)
+result = age_gender.predict(image, face.bbox)
 print(f"{result.sex}, {result.age} years old")

 # FairFace model
-result = fairface.predict(image, face)
+result = fairface.predict(image, face.bbox)
 print(f"{result.sex}, {result.age_group}, {result.race}")
 ```
````
````diff
@@ -192,7 +171,7 @@ Face recognition models return normalized 512-dimensional embeddings:

 ```python
 embedding = recognizer.get_normalized_embedding(image, landmarks)
-print(f"Shape: {embedding.shape}")  # (512,)
+print(f"Shape: {embedding.shape}")  # (1, 512)
 print(f"Norm: {np.linalg.norm(embedding):.4f}")  # ~1.0
 ```
````
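The two sides of this hunk disagree on the embedding shape, (512,) versus (1, 512). A small editorial sketch, using a random stand-in vector rather than a real embedding, showing that the norm check works under either convention and how to move between them:

```python
import numpy as np

emb = np.random.randn(1, 512).astype(np.float32)  # the batched (1, 512) convention
emb /= np.linalg.norm(emb)                        # the L2 norm ignores the extra axis

flat = emb.ravel()  # the 1-D (512,) convention on the other side of the diff
assert flat.shape == (512,)
assert np.isclose(np.linalg.norm(flat), 1.0, atol=1e-5)
```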
```diff
@@ -23,7 +23,6 @@ graph TB
     LMK[Landmarks]
     ATTR[Attributes]
     GAZE[Gaze]
-    HPOSE[Head Pose]
     PARSE[Parsing]
     SPOOF[Anti-Spoofing]
     PRIV[Privacy]
@@ -46,7 +45,6 @@ graph TB
     DET --> LMK
     DET --> ATTR
     DET --> GAZE
-    DET --> HPOSE
     DET --> PARSE
     DET --> SPOOF
     DET --> PRIV
```
````diff
@@ -121,11 +119,10 @@ uniface/
 ├── attribute/     # Age, gender, emotion, race
 ├── parsing/       # Face semantic segmentation
 ├── gaze/          # Gaze estimation
-├── headpose/      # Head pose estimation
 ├── spoofing/      # Anti-spoofing
 ├── privacy/       # Face anonymization
 ├── indexing/      # Vector indexing (FAISS)
-├── types.py       # Dataclasses (Face, GazeResult, HeadPoseResult, etc.)
+├── types.py       # Dataclasses (Face, GazeResult, etc.)
 ├── constants.py   # Model weights and URLs
 ├── model_store.py # Model download and caching
 ├── onnx_utils.py  # ONNX Runtime utilities
@@ -161,7 +158,7 @@ for face in faces:
     embedding = recognizer.get_normalized_embedding(image, face.landmarks)

     # Attributes
-    attrs = age_gender.predict(image, face)
+    attrs = age_gender.predict(image, face.bbox)

     print(f"Face: {attrs.sex}, {attrs.age} years")
 ```
````
```diff
@@ -186,7 +183,8 @@ fairface = FairFace()

 analyzer = FaceAnalyzer(
     detector,
     recognizer=recognizer,
-    attributes=[age_gender, fairface],
+    age_gender=age_gender,
+    fairface=fairface,
 )

 faces = analyzer.analyze(image)
```
```diff
@@ -32,7 +32,7 @@ ruff check . --fix

 **Guidelines:**

 - Line length: 120
-- Python 3.11+ type hints
+- Python 3.10+ type hints
 - Google-style docstrings

 ---
```
```diff
@@ -183,30 +183,6 @@ data/

 ---

-### Head Pose Estimation
-
-#### 300W-LP
-
-Large-scale synthesized face dataset with large pose variations, generated from 300W by face profiling. Used for training head pose estimation models.
-
-| Property    | Value                  |
-| ----------- | ---------------------- |
-| Images      | ~122,000 (synthesized) |
-| Source      | 300W (profiled)        |
-| Pose range  | ±90° yaw               |
-| Evaluation  | AFLW2000               |
-| Used by     | All HeadPose models    |
-
-!!! info "Download & Reference"
-    **Paper**: [Face Alignment Across Large Poses: A 3D Solution](https://arxiv.org/abs/1511.07212)
-
-    **Training code**: [yakhyo/head-pose-estimation](https://github.com/yakhyo/head-pose-estimation)
-
-!!! note "UniFace Models"
-    All HeadPose models shipped with UniFace are trained on 300W-LP and evaluated on AFLW2000.
-
----
-
 ### Face Parsing

 #### CelebAMask-HQ
```
```diff
@@ -13,7 +13,7 @@ template: home.html

 <p class="hero-subtitle">All-in-One Open-Source Face Analysis Library</p>

 [](https://pypi.org/project/uniface/)
-[](https://www.python.org/)
+[](https://www.python.org/)
 [](https://opensource.org/licenses/MIT)
 [](https://github.com/yakhyo/uniface/actions)
 [](https://pepy.tech/projects/uniface)
```
```diff
@@ -59,11 +59,6 @@ BiSeNet semantic segmentation with 19 facial component classes.

 Real-time gaze direction prediction with MobileGaze models.
 </div>

-<div class="feature-card" markdown>
-### :material-axis-arrow: Head Pose
-3D head orientation (pitch, yaw, roll) estimation with 6D rotation models.
-</div>
-
 <div class="feature-card" markdown>
 ### :material-motion-play: Tracking
 Multi-object tracking with BYTETracker for persistent face IDs across video frames.
```
```diff
@@ -6,7 +6,7 @@ This guide covers all installation options for UniFace.

 ## Requirements

-- **Python**: 3.11 or higher
+- **Python**: 3.10 or higher
 - **Operating Systems**: macOS, Linux, Windows

 ---
```
````diff
@@ -159,11 +159,11 @@ print("Installation successful!")

 ### Import Errors

-If you encounter import errors, ensure you're using Python 3.11+:
+If you encounter import errors, ensure you're using Python 3.10+:

 ```bash
 python --version
-# Should show: Python 3.11.x or higher
+# Should show: Python 3.10.x or higher
 ```

 ### Model Download Issues
````
```diff
@@ -257,33 +257,6 @@ Gaze direction prediction models trained on [Gaze360](datasets.md#gaze360) datas

 ---

-## Head Pose Estimation Models
-
-### HeadPose Family
-
-Head pose estimation models using 6D rotation representation. Trained on [300W-LP](datasets.md#300w-lp) dataset, evaluated on AFLW2000. Returns pitch, yaw, and roll angles in degrees.
-
-| Model Name | Backbone | Size | MAE* |
-| -------------- | -------- | ------- | ----- |
-| `RESNET18` :material-check-circle: | ResNet18 | 43 MB | 5.22° |
-| `RESNET34` | ResNet34 | 82 MB | 5.07° |
-| `RESNET50` | ResNet50 | 91 MB | 4.83° |
-| `MOBILENET_V2` | MobileNetV2 | 9.6 MB | 5.72° |
-| `MOBILENET_V3_SMALL` | MobileNetV3-Small | 4.8 MB | 6.31° |
-| `MOBILENET_V3_LARGE` | MobileNetV3-Large | 16 MB | 5.58° |
-
-*MAE (Mean Absolute Error) in degrees on AFLW2000 test set — lower is better
-
-!!! info "Training Data"
-    **Dataset**: Trained on [300W-LP](datasets.md#300w-lp) (synthesized large-pose faces from 300W)
-
-    **Method**: 6D rotation representation (rotation matrix → Euler angles)
-
-!!! note "Input Requirements"
-    Requires face crop as input. Use face detection first to obtain bounding boxes.
-
----
-
 ## Face Parsing Models

 ### BiSeNet Family
```
```diff
@@ -399,7 +372,6 @@ See [Model Cache & Offline Use](concepts/model-cache-offline.md) for full detail

 - **AdaFace ONNX**: [yakhyo/adaface-onnx](https://github.com/yakhyo/adaface-onnx) - ONNX export and inference
 - **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
 - **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
-- **Head Pose Estimation**: [yakhyo/head-pose-estimation](https://github.com/yakhyo/head-pose-estimation) - 6D rotation head pose estimation training and ONNX models
 - **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet training code and pretrained weights
 - **Face Segmentation**: [yakhyo/face-segmentation](https://github.com/yakhyo/face-segmentation) - XSeg ONNX Inference
 - **Face Anti-Spoofing**: [yakhyo/face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) - MiniFASNet ONNX inference (weights from [minivision-ai/Silent-Face-Anti-Spoofing](https://github.com/minivision-ai/Silent-Face-Anti-Spoofing))
```
````diff
@@ -30,10 +30,9 @@ age_gender = AgeGender()
 faces = detector.detect(image)

 for face in faces:
-    result = age_gender.predict(image, face)
+    result = age_gender.predict(image, face.bbox)
     print(f"Gender: {result.sex}")  # "Female" or "Male"
     print(f"Age: {result.age} years")
-    # face.gender and face.age are also set automatically
 ```

 ### Output
@@ -65,11 +64,10 @@ fairface = FairFace()
 faces = detector.detect(image)

 for face in faces:
-    result = fairface.predict(image, face)
+    result = fairface.predict(image, face.bbox)
     print(f"Gender: {result.sex}")
     print(f"Age Group: {result.age_group}")
     print(f"Race: {result.race}")
-    # face.gender, face.age_group, face.race are also set automatically
 ```

 ### Output
````
````diff
@@ -134,7 +132,7 @@ emotion = Emotion(model_name=DDAMFNWeights.AFFECNET7)
 faces = detector.detect(image)

 for face in faces:
-    result = emotion.predict(image, face)
+    result = emotion.predict(image, face.landmarks)
     print(f"Emotion: {result.emotion}")
     print(f"Confidence: {result.confidence:.2%}")
 ```
````
````diff
@@ -181,22 +179,6 @@ emotion = Emotion(model_name=DDAMFNWeights.AFFECNET8)

 ---

-## Factory Function
-
-Use `create_attribute_predictor()` for dynamic model selection:
-
-```python
-from uniface import create_attribute_predictor
-
-age_gender = create_attribute_predictor('age_gender')
-fairface = create_attribute_predictor('fairface')
-emotion = create_attribute_predictor('emotion')
-```
-
-Available model names: `'age_gender'`, `'fairface'`, `'emotion'`.
-
----
-
 ## Combining Models

 ### Full Attribute Analysis
````
```diff
@@ -213,10 +195,10 @@ faces = detector.detect(image)

 for face in faces:
     # Get exact age from AgeGender
-    ag_result = age_gender.predict(image, face)
+    ag_result = age_gender.predict(image, face.bbox)

     # Get race from FairFace
-    ff_result = fairface.predict(image, face)
+    ff_result = fairface.predict(image, face.bbox)

     print(f"Gender: {ag_result.sex}")
     print(f"Exact Age: {ag_result.age}")
```
```diff
@@ -233,7 +215,7 @@ from uniface.detection import RetinaFace

 analyzer = FaceAnalyzer(
     RetinaFace(),
-    attributes=[AgeGender()],
+    age_gender=AgeGender(),
 )

 faces = analyzer.analyze(image)
```
```diff
@@ -275,7 +257,7 @@ def draw_attributes(image, face, result):

 # Usage
 for face in faces:
-    result = age_gender.predict(image, face)
+    result = age_gender.predict(image, face.bbox)
     image = draw_attributes(image, face, result)

 cv2.imwrite("attributes.jpg", image)
```
```diff
@@ -267,7 +267,6 @@ gaze = create_gaze_estimator()  # Returns MobileGaze

 ## Next Steps

-- [Head Pose Estimation](headpose.md) - 3D head orientation
 - [Anti-Spoofing](spoofing.md) - Face liveness detection
 - [Privacy](privacy.md) - Face anonymization
 - [Video Recipe](../recipes/video-webcam.md) - Real-time processing
```
@@ -1,232 +0,0 @@ (entire file removed in this compare)

# Head Pose Estimation

Head pose estimation predicts the 3D orientation of a person's head (pitch, yaw, and roll angles).

---

## Available Models

| Model | Backbone | Size | MAE* |
|-------|----------|------|------|
| **ResNet18** :material-check-circle: | ResNet18 | 43 MB | 5.22° |
| ResNet34 | ResNet34 | 82 MB | 5.07° |
| ResNet50 | ResNet50 | 91 MB | 4.83° |
| MobileNetV2 | MobileNetV2 | 9.6 MB | 5.72° |
| MobileNetV3-Small | MobileNetV3 | 4.8 MB | 6.31° |
| MobileNetV3-Large | MobileNetV3 | 16 MB | 5.58° |

*MAE = Mean Absolute Error on AFLW2000 test set (lower is better)

---

## Basic Usage

```python
import cv2
from uniface.detection import RetinaFace
from uniface.headpose import HeadPose

detector = RetinaFace()
head_pose = HeadPose()

image = cv2.imread("photo.jpg")
faces = detector.detect(image)

for face in faces:
    # Crop face
    x1, y1, x2, y2 = map(int, face.bbox)
    face_crop = image[y1:y2, x1:x2]

    if face_crop.size > 0:
        # Estimate head pose
        result = head_pose.estimate(face_crop)
        print(f"Pitch: {result.pitch:.1f}°, Yaw: {result.yaw:.1f}°, Roll: {result.roll:.1f}°")
```

---

## Model Variants

```python
from uniface.headpose import HeadPose
from uniface.constants import HeadPoseWeights

# Default (ResNet18, recommended balance of speed and accuracy)
hp = HeadPose()

# Lightweight for mobile/edge
hp = HeadPose(model_name=HeadPoseWeights.MOBILENET_V3_SMALL)

# Higher accuracy
hp = HeadPose(model_name=HeadPoseWeights.RESNET50)
```

---

## Output Format

```python
result = head_pose.estimate(face_crop)

# HeadPoseResult dataclass
result.pitch  # Rotation around X-axis in degrees
result.yaw    # Rotation around Y-axis in degrees
result.roll   # Rotation around Z-axis in degrees
```

### Angle Convention

```
        pitch > 0 (looking down)
              │
              │
yaw < 0 ──────┼────── yaw > 0
(looking left)│ (looking right)
              │
        pitch < 0 (looking up)

roll > 0 = clockwise tilt
roll < 0 = counter-clockwise tilt
```

- **Pitch**: Rotation around X-axis (positive = looking down)
- **Yaw**: Rotation around Y-axis (positive = looking right)
- **Roll**: Rotation around Z-axis (positive = tilting clockwise)

---
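An editorial sketch of the geometry behind these conventions: composing the three angles back into a rotation matrix. The axis order and signs here are an assumption for illustration, not taken from the uniface source (the library's models use a 6D rotation representation internally):

```python
import numpy as np

def euler_to_rotation_matrix(pitch: float, yaw: float, roll: float) -> np.ndarray:
    """Compose R = Rz(roll) @ Ry(yaw) @ Rx(pitch) from angles in degrees.

    Illustration only: the exact decomposition used by HeadPose is an
    assumption here; identity is returned for pitch = yaw = roll = 0.
    """
    p, y, r = np.radians([pitch, yaw, roll])
    Rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    Ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
    Rz = np.array([[np.cos(r), -np.sin(r), 0], [np.sin(r), np.cos(r), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

# Sanity check: zero angles give the identity rotation.
assert np.allclose(euler_to_rotation_matrix(0.0, 0.0, 0.0), np.eye(3))
```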
## Visualization

### 3D Cube (default)

The default visualization draws a wireframe cube oriented to match the head pose.

```python
from uniface.draw import draw_head_pose

faces = detector.detect(image)

for face in faces:
    x1, y1, x2, y2 = map(int, face.bbox)
    face_crop = image[y1:y2, x1:x2]

    if face_crop.size > 0:
        result = head_pose.estimate(face_crop)

        # Draw cube on image (default)
        draw_head_pose(image, face.bbox, result.pitch, result.yaw, result.roll)

cv2.imwrite("headpose_output.jpg", image)
```

### Axis Visualization

```python
from uniface.draw import draw_head_pose

# X/Y/Z coordinate axes
draw_head_pose(image, face.bbox, result.pitch, result.yaw, result.roll, draw_type='axis')
```

### Low-Level Drawing Functions

```python
from uniface.draw import draw_head_pose_cube, draw_head_pose_axis

# Draw cube directly
draw_head_pose_cube(image, yaw=10.0, pitch=-5.0, roll=2.0, bbox=[100, 100, 250, 280])

# Draw axes directly
draw_head_pose_axis(image, yaw=10.0, pitch=-5.0, roll=2.0, bbox=[100, 100, 250, 280])
```

---

## Real-Time Head Pose Tracking

```python
import cv2
from uniface.detection import RetinaFace
from uniface.headpose import HeadPose
from uniface.draw import draw_head_pose

detector = RetinaFace()
head_pose = HeadPose()

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    faces = detector.detect(frame)

    for face in faces:
        x1, y1, x2, y2 = map(int, face.bbox)
        face_crop = frame[y1:y2, x1:x2]

        if face_crop.size > 0:
            result = head_pose.estimate(face_crop)
            draw_head_pose(frame, face.bbox, result.pitch, result.yaw, result.roll)

    cv2.imshow("Head Pose Estimation", frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

---

## Use Cases

### Driver Drowsiness Detection

```python
def is_head_drooping(result, pitch_threshold=-15):
    """Check if the head is drooping (looking down significantly)."""
    return result.pitch < pitch_threshold

result = head_pose.estimate(face_crop)
if is_head_drooping(result):
    print("Warning: Head drooping detected")
```

### Attention Monitoring

```python
def is_facing_forward(result, threshold=20):
    """Check if the person is facing roughly forward."""
    return (
        abs(result.pitch) < threshold
        and abs(result.yaw) < threshold
        and abs(result.roll) < threshold
    )

result = head_pose.estimate(face_crop)
if is_facing_forward(result):
    print("Facing forward")
else:
    print("Looking away")
```

---

## Factory Function

```python
from uniface.headpose import create_head_pose_estimator

hp = create_head_pose_estimator()  # Returns HeadPose
```

---

## Next Steps

- [Gaze Estimation](gaze.md) - Eye gaze direction
- [Anti-Spoofing](spoofing.md) - Face liveness detection
- [Video Recipe](../recipes/video-webcam.md) - Real-time processing
```diff
@@ -18,7 +18,6 @@ Run UniFace examples directly in your browser with Google Colab, or download and

 | [Gaze Estimation](https://github.com/yakhyo/uniface/blob/main/examples/08_gaze_estimation.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/08_gaze_estimation.ipynb) | Gaze direction estimation |
 | [Face Segmentation](https://github.com/yakhyo/uniface/blob/main/examples/09_face_segmentation.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/09_face_segmentation.ipynb) | Face segmentation with XSeg |
 | [Face Vector Store](https://github.com/yakhyo/uniface/blob/main/examples/10_face_vector_store.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/10_face_vector_store.ipynb) | FAISS-backed face database |
-| [Head Pose Estimation](https://github.com/yakhyo/uniface/blob/main/examples/11_head_pose_estimation.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/11_head_pose_estimation.ipynb) | 3D head orientation estimation |

 ---
```
````diff
@@ -80,6 +80,7 @@ Compare two faces:

 ```python
 import cv2
+import numpy as np
 from uniface.detection import RetinaFace
 from uniface.recognition import ArcFace

@@ -96,13 +97,12 @@ faces1 = detector.detect(image1)
 faces2 = detector.detect(image2)

 if faces1 and faces2:
-    # Extract embeddings (normalized 1-D vectors)
+    # Extract embeddings
     emb1 = recognizer.get_normalized_embedding(image1, faces1[0].landmarks)
     emb2 = recognizer.get_normalized_embedding(image2, faces2[0].landmarks)

-    # Compute cosine similarity
-    from uniface import compute_similarity
-    similarity = compute_similarity(emb1, emb2, normalized=True)
+    # Compute similarity (cosine similarity)
+    similarity = np.dot(emb1, emb2.T)[0][0]

     # Interpret result
     if similarity > 0.6:
````
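An editorial aside on the `np.dot(emb1, emb2.T)[0][0]` line above: a self-contained sketch, using random stand-in vectors rather than real embeddings, of why that expression equals the cosine similarity when both embeddings are L2-normalized arrays of shape (1, 512):

```python
import numpy as np

rng = np.random.default_rng(0)
emb1 = rng.standard_normal((1, 512)).astype(np.float32)
emb2 = rng.standard_normal((1, 512)).astype(np.float32)
emb1 /= np.linalg.norm(emb1)
emb2 /= np.linalg.norm(emb2)

# (1, 512) @ (512, 1) -> (1, 1); the [0][0] indexing unwraps the scalar.
dot = np.dot(emb1, emb2.T)[0][0]

# Cosine similarity = dot / (|a| * |b|); both norms are 1 here, so the two agree.
cosine = (emb1 * emb2).sum() / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
assert np.isclose(dot, cosine, atol=1e-6)
```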
````diff
@@ -135,7 +135,7 @@ faces = detector.detect(image)

 # Predict attributes
 for i, face in enumerate(faces):
-    result = age_gender.predict(image, face)
+    result = age_gender.predict(image, face.bbox)
     print(f"Face {i+1}: {result.sex}, {result.age} years old")
 ```
@@ -164,7 +164,7 @@ image = cv2.imread("photo.jpg")
 faces = detector.detect(image)

 for i, face in enumerate(faces):
-    result = fairface.predict(image, face)
+    result = fairface.predict(image, face.bbox)
     print(f"Face {i+1}: {result.sex}, {result.age_group}, {result.race}")
 ```
````
````diff
@@ -234,36 +234,6 @@ cv2.imwrite("gaze_output.jpg", image)

 ---

-## Head Pose Estimation
-
-```python
-import cv2
-from uniface.detection import RetinaFace
-from uniface.headpose import HeadPose
-from uniface.draw import draw_head_pose
-
-detector = RetinaFace()
-head_pose = HeadPose()
-
-image = cv2.imread("photo.jpg")
-faces = detector.detect(image)
-
-for i, face in enumerate(faces):
-    x1, y1, x2, y2 = map(int, face.bbox[:4])
-    face_crop = image[y1:y2, x1:x2]
-
-    if face_crop.size > 0:
-        result = head_pose.estimate(face_crop)
-        print(f"Face {i+1}: pitch={result.pitch:.1f}°, yaw={result.yaw:.1f}°, roll={result.roll:.1f}°")
-
-        # Draw 3D cube visualization
-        draw_head_pose(image, face.bbox, result.pitch, result.yaw, result.roll)
-
-cv2.imwrite("headpose_output.jpg", image)
-```
-
----
-
 ## Face Parsing

 Segment face into semantic components:
````
```diff
@@ -454,7 +424,6 @@ For detailed model comparisons and benchmarks, see the [Model Zoo](models.md).

 | Recognition | `ArcFace`, `AdaFace`, `MobileFace`, `SphereFace` |
 | Tracking | `BYTETracker` |
 | Gaze | `MobileGaze` (ResNet18/34/50, MobileNetV2, MobileOneS0) |
-| Head Pose | `HeadPose` (ResNet18/34/50, MobileNetV2/V3) |
 | Parsing | `BiSeNet` (ResNet18/34) |
 | Attributes | `AgeGender`, `FairFace`, `Emotion` |
 | Anti-Spoofing | `MiniFASNet` (V1SE, V2) |
```
```diff
@@ -501,7 +470,6 @@ from uniface.recognition import ArcFace, AdaFace
 from uniface.attribute import AgeGender, FairFace
 from uniface.landmark import Landmark106
 from uniface.gaze import MobileGaze
-from uniface.headpose import HeadPose
 from uniface.parsing import BiSeNet, XSeg
 from uniface.privacy import BlurFace
 from uniface.spoofing import MiniFASNet
```
```diff
@@ -34,7 +34,7 @@ def process_image(image_path):
         embedding = recognizer.get_normalized_embedding(image, face.landmarks)

         # Step 3: Predict attributes
-        attrs = age_gender.predict(image, face)
+        attrs = age_gender.predict(image, face.bbox)

         results.append({
             'face_id': i,
```
```diff
@@ -83,7 +83,7 @@ age_gender = AgeGender()
 analyzer = FaceAnalyzer(
     detector,
     recognizer=recognizer,
-    attributes=[age_gender],
+    age_gender=age_gender,
 )

 # Process image
```
````diff
@@ -109,12 +109,11 @@ import numpy as np
 from uniface.attribute import AgeGender, FairFace
 from uniface.detection import RetinaFace
 from uniface.gaze import MobileGaze
-from uniface.headpose import HeadPose
 from uniface.landmark import Landmark106
 from uniface.recognition import ArcFace
 from uniface.parsing import BiSeNet
 from uniface.spoofing import MiniFASNet
-from uniface.draw import draw_detections, draw_gaze, draw_head_pose
+from uniface.draw import draw_detections, draw_gaze

 class FaceAnalysisPipeline:
     def __init__(self):
@@ -125,7 +124,6 @@ class FaceAnalysisPipeline:
         self.fairface = FairFace()
         self.landmarker = Landmark106()
         self.gaze = MobileGaze()
-        self.head_pose = HeadPose()
         self.parser = BiSeNet()
         self.spoofer = MiniFASNet()

@@ -147,12 +145,12 @@ class FaceAnalysisPipeline:
             )

             # Attributes
-            ag_result = self.age_gender.predict(image, face)
+            ag_result = self.age_gender.predict(image, face.bbox)
             result['age'] = ag_result.age
             result['gender'] = ag_result.sex

             # FairFace attributes
-            ff_result = self.fairface.predict(image, face)
+            ff_result = self.fairface.predict(image, face.bbox)
             result['age_group'] = ff_result.age_group
             result['race'] = ff_result.race

@@ -169,13 +167,6 @@ class FaceAnalysisPipeline:
             result['gaze_pitch'] = gaze_result.pitch
             result['gaze_yaw'] = gaze_result.yaw

-            # Head pose estimation
-            if face_crop.size > 0:
-                hp_result = self.head_pose.estimate(face_crop)
-                result['head_pitch'] = hp_result.pitch
-                result['head_yaw'] = hp_result.yaw
-                result['head_roll'] = hp_result.roll
-
             # Face parsing
             if face_crop.size > 0:
                 result['parsing_mask'] = self.parser.parse(face_crop)
@@ -198,7 +189,6 @@ for i, r in enumerate(results):
     print(f"  Gender: {r['gender']}, Age: {r['age']}")
     print(f"  Race: {r['race']}, Age Group: {r['age_group']}")
     print(f"  Gaze: pitch={np.degrees(r['gaze_pitch']):.1f}°")
-    print(f"  Head Pose: P={r['head_pitch']:.1f}° Y={r['head_yaw']:.1f}° R={r['head_roll']:.1f}°")
     print(f"  Real: {r['is_real']} ({r['spoof_confidence']:.1%})")
 ```
````
```diff
@@ -230,7 +220,7 @@ def visualize_analysis(image_path, output_path):
         cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

         # Age and gender
-        attrs = age_gender.predict(image, face)
+        attrs = age_gender.predict(image, face.bbox)
         label = f"{attrs.sex}, {attrs.age}y"
         cv2.putText(image, label, (x1, y1 - 10),
                     cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
```
```diff
@@ -278,11 +268,6 @@ def results_to_json(results):
             'gaze': {
                 'pitch_deg': float(np.degrees(r['gaze_pitch'])) if 'gaze_pitch' in r else None,
                 'yaw_deg': float(np.degrees(r['gaze_yaw'])) if 'gaze_yaw' in r else None
             },
-            'head_pose': {
-                'pitch': float(r['head_pitch']) if 'head_pitch' in r else None,
-                'yaw': float(r['head_yaw']) if 'head_yaw' in r else None,
-                'roll': float(r['head_roll']) if 'head_roll' in r else None
-            }
         }
         output.append(item)
@@ -306,4 +291,3 @@ with open('results.json', 'w') as f:
 - [Face Search](face-search.md) - Build a search system
 - [Detection Module](../modules/detection.md) - Detection options
 - [Recognition Module](../modules/recognition.md) - Recognition details
-- [Head Pose Module](../modules/headpose.md) - Head orientation estimation
```
```diff
@@ -175,4 +175,3 @@ while True:
 - [Batch Processing](batch-processing.md) - Process multiple files
 - [Detection Module](../modules/detection.md) - Detection options
 - [Gaze Module](../modules/gaze.md) - Gaze estimation
-- [Head Pose Module](../modules/headpose.md) - Head orientation estimation
```
```diff
@@ -51,7 +51,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "3.2.0\n"
+      "3.0.0\n"
      ]
     }
    ],
@@ -55,7 +55,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "3.2.0\n"
+      "3.0.0\n"
      ]
     }
    ],
@@ -44,7 +44,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "3.2.0\n"
+      "3.0.0\n"
      ]
     }
    ],
```

File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
```diff
@@ -53,7 +53,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "UniFace version: 3.2.0\n"
+      "UniFace version: 3.0.0\n"
      ]
     }
    ],
@@ -51,7 +51,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "UniFace version: 3.2.0\n"
+      "UniFace version: 3.0.0\n"
      ]
     }
    ],
@@ -51,7 +51,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "UniFace version: 3.2.0\n"
+      "UniFace version: 3.0.0\n"
      ]
     }
    ],
@@ -53,7 +53,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "UniFace version: 3.2.0\n"
+      "UniFace version: 3.0.0\n"
      ]
     }
    ],
@@ -54,7 +54,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "UniFace version: 3.2.0\n"
+      "UniFace version: 3.0.0\n"
      ]
     }
    ],
```

File diff suppressed because one or more lines are too long
```diff
@@ -151,7 +151,6 @@ nav:
     - Attributes: modules/attributes.md
     - Parsing: modules/parsing.md
     - Gaze: modules/gaze.md
-    - Head Pose: modules/headpose.md
     - Anti-Spoofing: modules/spoofing.md
     - Privacy: modules/privacy.md
     - Indexing: modules/indexing.md
```
```diff
@@ -1,6 +1,6 @@
 [project]
 name = "uniface"
-version = "3.2.0"
+version = "3.1.0"
 description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Tracking, Landmark Analysis, Face Parsing, Gaze Estimation, Age, and Gender Detection"
 readme = "README.md"
 license = "MIT"
@@ -9,7 +9,7 @@ maintainers = [
     { name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" },
 ]

-requires-python = ">=3.11,<3.15"
+requires-python = ">=3.10,<3.14"
 keywords = [
     "face-detection",
     "face-recognition",
```
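A quick editorial illustration of what the `requires-python` change means at runtime. This guard is hypothetical (pip enforces the range at install time; nothing like this exists in uniface itself):

```python
import sys

# Mirrors requires-python = ">=3.10,<3.14" from this side of the diff.
if not ((3, 10) <= sys.version_info[:2] < (3, 14)):
    raise RuntimeError(
        f"uniface 3.1.0 supports Python >=3.10,<3.14; found {sys.version.split()[0]}"
    )
```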
```diff
@@ -34,17 +34,17 @@ classifiers = [
     "Intended Audience :: Science/Research",
     "Operating System :: OS Independent",
     "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.10",
     "Programming Language :: Python :: 3.11",
     "Programming Language :: Python :: 3.12",
     "Programming Language :: Python :: 3.13",
-    "Programming Language :: Python :: 3.14",
 ]

 dependencies = [
     "numpy>=1.21.0",
     "opencv-python>=4.5.0",
     "onnxruntime>=1.16.0",
-    "scikit-image>=0.26.0",
+    "scikit-image>=0.19.0",
     "scipy>=1.7.0",
     "requests>=2.28.0",
     "tqdm>=4.64.0",
@@ -73,7 +73,7 @@ uniface = ["py.typed"]

 [tool.ruff]
 line-length = 120
-target-version = "py311"
+target-version = "py310"
 exclude = [
     ".git",
     ".ruff_cache",
```
```diff
@@ -1,7 +1,7 @@
 numpy>=1.21.0
 opencv-python>=4.5.0
 onnxruntime>=1.16.0
-scikit-image>=0.26.0
+scikit-image>=0.19.0
 scipy>=1.7.0
 requests>=2.28.0
 tqdm>=4.64.0
```
```diff
@@ -9,14 +9,6 @@ import numpy as np
 import pytest

 from uniface.attribute import AgeGender, AttributeResult
-from uniface.types import Face
-
-
-def _make_face(bbox: list[int] | np.ndarray) -> Face:
-    """Helper: build a minimal Face from a bounding box."""
-    bbox = np.asarray(bbox)
-    landmarks = np.zeros((5, 2), dtype=np.float32)
-    return Face(bbox=bbox, confidence=0.99, landmarks=landmarks)


 @pytest.fixture
@@ -30,30 +22,30 @@ def mock_image():


 @pytest.fixture
-def mock_face():
-    return _make_face([100, 100, 300, 300])
+def mock_bbox():
+    return [100, 100, 300, 300]


 def test_model_initialization(age_gender_model):
     assert age_gender_model is not None, 'AgeGender model initialization failed.'


-def test_prediction_output_format(age_gender_model, mock_image, mock_face):
-    result = age_gender_model.predict(mock_image, mock_face)
+def test_prediction_output_format(age_gender_model, mock_image, mock_bbox):
+    result = age_gender_model.predict(mock_image, mock_bbox)
     assert isinstance(result, AttributeResult), f'Result should be AttributeResult, got {type(result)}'
     assert isinstance(result.gender, int), f'Gender should be int, got {type(result.gender)}'
     assert isinstance(result.age, int), f'Age should be int, got {type(result.age)}'
     assert isinstance(result.sex, str), f'Sex should be str, got {type(result.sex)}'


-def test_gender_values(age_gender_model, mock_image, mock_face):
-    result = age_gender_model.predict(mock_image, mock_face)
+def test_gender_values(age_gender_model, mock_image, mock_bbox):
+    result = age_gender_model.predict(mock_image, mock_bbox)
     assert result.gender in [0, 1], f'Gender should be 0 (Female) or 1 (Male), got {result.gender}'
     assert result.sex in ['Female', 'Male'], f'Sex should be Female or Male, got {result.sex}'


-def test_age_range(age_gender_model, mock_image, mock_face):
-    result = age_gender_model.predict(mock_image, mock_face)
+def test_age_range(age_gender_model, mock_image, mock_bbox):
+    result = age_gender_model.predict(mock_image, mock_bbox)
     assert 0 <= result.age <= 120, f'Age should be between 0 and 120, got {result.age}'

@@ -65,52 +57,39 @@ def test_different_bbox_sizes(age_gender_model, mock_image):
     ]

     for bbox in test_bboxes:
-        face = _make_face(bbox)
-        result = age_gender_model.predict(mock_image, face)
+        result = age_gender_model.predict(mock_image, bbox)
         assert result.gender in [0, 1], f'Failed for bbox {bbox}'
         assert 0 <= result.age <= 120, f'Age out of range for bbox {bbox}'


-def test_different_image_sizes(age_gender_model):
+def test_different_image_sizes(age_gender_model, mock_bbox):
     test_sizes = [(480, 640, 3), (720, 1280, 3), (1080, 1920, 3)]
-    face = _make_face([100, 100, 300, 300])

     for size in test_sizes:
         mock_image = np.random.randint(0, 255, size, dtype=np.uint8)
-        result = age_gender_model.predict(mock_image, face)
+        result = age_gender_model.predict(mock_image, mock_bbox)
         assert result.gender in [0, 1], f'Failed for image size {size}'
         assert 0 <= result.age <= 120, f'Age out of range for image size {size}'


-def test_consistency(age_gender_model, mock_image, mock_face):
-    result1 = age_gender_model.predict(mock_image, mock_face)
-    result2 = age_gender_model.predict(mock_image, mock_face)
+def test_consistency(age_gender_model, mock_image, mock_bbox):
+    result1 = age_gender_model.predict(mock_image, mock_bbox)
+    result2 = age_gender_model.predict(mock_image, mock_bbox)

     assert result1.gender == result2.gender, 'Same input should produce same gender prediction'
     assert result1.age == result2.age, 'Same input should produce same age prediction'


-def test_face_enrichment(age_gender_model, mock_image, mock_face):
-    """predict() must write gender & age back to the Face object."""
-    assert mock_face.gender is None
-    assert mock_face.age is None
-
-    result = age_gender_model.predict(mock_image, mock_face)
-
-    assert mock_face.gender == result.gender
-    assert mock_face.age == result.age
-
-
 def test_bbox_list_format(age_gender_model, mock_image):
-    face = _make_face([100, 100, 300, 300])
-    result = age_gender_model.predict(mock_image, face)
+    bbox_list = [100, 100, 300, 300]
+    result = age_gender_model.predict(mock_image, bbox_list)
     assert result.gender in [0, 1], 'Should work with bbox as list'
     assert 0 <= result.age <= 120, 'Age should be in valid range'


 def test_bbox_array_format(age_gender_model, mock_image):
-    face = _make_face(np.array([100, 100, 300, 300]))
-    result = age_gender_model.predict(mock_image, face)
+    bbox_array = np.array([100, 100, 300, 300])
+    result = age_gender_model.predict(mock_image, bbox_array)
     assert result.gender in [0, 1], 'Should work with bbox as numpy array'
     assert 0 <= result.age <= 120, 'Age should be in valid range'

@@ -124,8 +103,7 @@ def test_multiple_predictions(age_gender_model, mock_image):

     results = []
     for bbox in bboxes:
-        face = _make_face(bbox)
-        result = age_gender_model.predict(mock_image, face)
+        result = age_gender_model.predict(mock_image, bbox)
         results.append(result)

     assert len(results) == 3, 'Should have 3 predictions'
@@ -134,26 +112,28 @@ def test_multiple_predictions(age_gender_model, mock_image):
         assert 0 <= result.age <= 120


-def test_age_is_positive(age_gender_model, mock_image, mock_face):
+def test_age_is_positive(age_gender_model, mock_image, mock_bbox):
     for _ in range(5):
-        result = age_gender_model.predict(mock_image, mock_face)
+        result = age_gender_model.predict(mock_image, mock_bbox)
         assert result.age >= 0, f'Age should be non-negative, got {result.age}'


-def test_output_format_for_visualization(age_gender_model, mock_image, mock_face):
-    result = age_gender_model.predict(mock_image, mock_face)
+def test_output_format_for_visualization(age_gender_model, mock_image, mock_bbox):
+    result = age_gender_model.predict(mock_image, mock_bbox)
     text = f'{result.sex}, {result.age}y'
     assert isinstance(text, str), 'Should be able to format as string'
     assert 'Male' in text or 'Female' in text, 'Text should contain gender'
     assert 'y' in text, "Text should contain 'y' for years"


-def test_attribute_result_fields(age_gender_model, mock_image, mock_face):
+def test_attribute_result_fields(age_gender_model, mock_image, mock_bbox):
     """Test that AttributeResult has correct fields for AgeGender model."""
-    result = age_gender_model.predict(mock_image, mock_face)
+    result = age_gender_model.predict(mock_image, mock_bbox)

     # AgeGender should set gender and age
     assert result.gender is not None
     assert result.age is not None

     # AgeGender should NOT set race and age_group (FairFace only)
     assert result.race is None
     assert result.age_group is None
```
```diff
@@ -9,14 +9,12 @@ import numpy as np
 import pytest

 from uniface import (
-    create_attribute_predictor,
     create_detector,
     create_landmarker,
     create_recognizer,
     list_available_detectors,
 )
-from uniface.attribute import AgeGender, FairFace
-from uniface.constants import AgeGenderWeights, FairFaceWeights, RetinaFaceWeights, SCRFDWeights
+from uniface.constants import RetinaFaceWeights, SCRFDWeights
 from uniface.spoofing import MiniFASNet, create_spoofer

@@ -167,7 +165,7 @@ def test_recognizer_inference_from_factory():

     embedding = recognizer.get_embedding(mock_image)
     assert embedding is not None, 'Recognizer should return embedding'
-    assert embedding.shape == (1, 512), 'get_embedding should return (1, 512) with batch dimension'
+    assert embedding.shape[1] == 512, 'Should return 512-dimensional embedding'


 def test_landmarker_inference_from_factory():
@@ -238,19 +236,3 @@ def test_create_spoofer_with_providers():
     """Test that create_spoofer forwards providers kwarg without TypeError."""
     spoofer = create_spoofer(providers=['CPUExecutionProvider'])
     assert isinstance(spoofer, MiniFASNet), 'Should return MiniFASNet instance'
-
-
-# create_attribute_predictor tests
-def test_create_attribute_predictor_age_gender():
-    predictor = create_attribute_predictor(AgeGenderWeights.DEFAULT)
-    assert isinstance(predictor, AgeGender), 'Should return AgeGender instance'
-
-
-def test_create_attribute_predictor_fairface():
-    predictor = create_attribute_predictor(FairFaceWeights.DEFAULT)
-    assert isinstance(predictor, FairFace), 'Should return FairFace instance'
-
-
-def test_create_attribute_predictor_invalid():
-    with pytest.raises(ValueError, match='Unsupported attribute model'):
-        create_attribute_predictor('invalid_model')
```
|
||||
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
|
||||
# Author: Yakhyokhuja Valikhujaev
|
||||
# GitHub: https://github.com/yakhyo
|
||||
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import numpy as np
|
||||
import pytest
|
||||
|
||||
from uniface import HeadPose, HeadPoseResult, create_head_pose_estimator
|
||||
from uniface.headpose import BaseHeadPoseEstimator
|
||||
from uniface.headpose.models import HeadPose as HeadPoseModel
|
||||
|
||||
|
||||
def test_create_head_pose_estimator_default():
|
||||
"""Test creating a head pose estimator with default parameters."""
|
||||
estimator = create_head_pose_estimator()
|
||||
assert isinstance(estimator, HeadPose), 'Should return HeadPose instance'
|
||||
|
||||
|
||||
def test_create_head_pose_estimator_aliases():
|
||||
"""Test that factory accepts all documented aliases."""
|
||||
for alias in ('headpose', 'head_pose', '6drepnet'):
|
||||
estimator = create_head_pose_estimator(alias)
|
||||
assert isinstance(estimator, HeadPose), f"Alias '{alias}' should return HeadPose"
|
||||
|
||||
|
||||
def test_create_head_pose_estimator_invalid():
|
||||
"""Test that invalid method raises ValueError."""
|
||||
with pytest.raises(ValueError, match='Unsupported head pose estimation method'):
|
||||
create_head_pose_estimator('invalid_method')
|
||||
|
||||
|
||||
def test_head_pose_inference():
|
||||
"""Test that HeadPose can run inference on a mock image."""
|
||||
estimator = HeadPose()
|
||||
mock_image = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
|
||||
result = estimator.estimate(mock_image)
|
||||
|
||||
assert isinstance(result, HeadPoseResult), 'Should return HeadPoseResult'
|
||||
assert isinstance(result.pitch, float), 'pitch should be float'
|
||||
assert isinstance(result.yaw, float), 'yaw should be float'
|
||||
assert isinstance(result.roll, float), 'roll should be float'
|
||||
|
||||
|
||||
def test_head_pose_callable():
|
||||
"""Test that HeadPose is callable via __call__."""
|
||||
estimator = HeadPose()
|
||||
mock_image = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
|
||||
result = estimator(mock_image)
|
||||
|
||||
assert isinstance(result, HeadPoseResult), '__call__ should return HeadPoseResult'
|
||||
|
||||
|
||||
def test_head_pose_result_repr():
|
||||
"""Test HeadPoseResult repr formatting."""
|
||||
result = HeadPoseResult(pitch=10.5, yaw=-20.3, roll=5.1)
|
||||
repr_str = repr(result)
|
||||
assert 'HeadPoseResult' in repr_str
|
||||
assert '10.5' in repr_str
|
||||
assert '-20.3' in repr_str
|
||||
assert '5.1' in repr_str
|
||||
|
||||
|
||||
def test_head_pose_result_frozen():
|
||||
"""Test that HeadPoseResult is immutable."""
|
||||
result = HeadPoseResult(pitch=1.0, yaw=2.0, roll=3.0)
|
||||
with pytest.raises(AttributeError):
|
||||
result.pitch = 99.0 # type: ignore[misc]
|
||||
|
||||
|
||||
def test_rotation_matrix_to_euler_identity():
|
||||
"""Test that identity rotation matrix gives zero angles."""
|
||||
identity = np.eye(3).reshape(1, 3, 3)
|
||||
euler = HeadPoseModel.rotation_matrix_to_euler(identity)
|
||||
|
||||
assert euler.shape == (1, 3), 'Should return (1, 3) shaped array'
|
||||
np.testing.assert_allclose(euler[0], [0.0, 0.0, 0.0], atol=1e-5)
|
||||
|
||||
|
||||
def test_rotation_matrix_to_euler_90deg_yaw():
|
||||
"""Test 90-degree yaw rotation."""
|
||||
angle = np.radians(90)
|
||||
R = np.array(
|
||||
[
|
||||
[np.cos(angle), 0, np.sin(angle)],
|
||||
[0, 1, 0],
|
||||
[-np.sin(angle), 0, np.cos(angle)],
|
||||
]
|
||||
).reshape(1, 3, 3)
|
||||
euler = HeadPoseModel.rotation_matrix_to_euler(R)
|
||||
|
||||
np.testing.assert_allclose(euler[0, 1], 90.0, atol=1e-3)
|
||||
|
||||
|
||||
def test_rotation_matrix_to_euler_batch():
|
||||
"""Test batch processing of rotation matrices."""
|
||||
batch = np.stack([np.eye(3), np.eye(3), np.eye(3)], axis=0)
|
||||
euler = HeadPoseModel.rotation_matrix_to_euler(batch)
|
||||
|
||||
assert euler.shape == (3, 3), 'Batch of 3 should return (3, 3)'
|
||||
np.testing.assert_allclose(euler, 0.0, atol=1e-5)
|
||||
|
||||
|
||||
def test_factory_returns_correct_type():
|
||||
"""Test that factory function returns BaseHeadPoseEstimator subclass."""
|
||||
estimator = create_head_pose_estimator()
|
||||
assert isinstance(estimator, BaseHeadPoseEstimator), 'Should be BaseHeadPoseEstimator subclass'
|
||||
|
||||
|
||||
def test_head_pose_with_providers():
|
||||
"""Test that HeadPose accepts providers kwarg."""
|
||||
estimator = HeadPose(providers=['CPUExecutionProvider'])
|
||||
assert isinstance(estimator, HeadPose), 'Should create with explicit providers'
|
||||
```diff
@@ -74,7 +74,7 @@ def test_arcface_embedding_shape(arcface_model, mock_aligned_face):
     """
     embedding = arcface_model.get_embedding(mock_aligned_face)

-    # ArcFace get_embedding returns raw ONNX output with batch dimension
+    # ArcFace typically produces 512-dimensional embeddings
     assert embedding.shape[1] == 512, f'Expected 512-dim embedding, got {embedding.shape[1]}'
     assert embedding.shape[0] == 1, 'Embedding should have batch dimension of 1'

@@ -88,8 +88,7 @@ def test_arcface_normalized_embedding(arcface_model, mock_landmarks):

     embedding = arcface_model.get_normalized_embedding(mock_image, mock_landmarks)

-    # Check shape and normalization
-    assert embedding.shape == (512,), f'Expected shape (512,), got {embedding.shape}'
+    # Check that embedding is normalized (L2 norm ≈ 1.0)
     norm = np.linalg.norm(embedding)
     assert np.isclose(norm, 1.0, atol=1e-5), f'Normalized embedding should have norm 1.0, got {norm}'

@@ -126,7 +125,7 @@ def test_mobileface_embedding_shape(mobileface_model, mock_aligned_face):
     """
     embedding = mobileface_model.get_embedding(mock_aligned_face)

-    # MobileFace get_embedding returns raw ONNX output with batch dimension
+    # MobileFace typically produces 512-dimensional embeddings
     assert embedding.shape[1] == 512, f'Expected 512-dim embedding, got {embedding.shape[1]}'
     assert embedding.shape[0] == 1, 'Embedding should have batch dimension of 1'

@@ -139,7 +138,6 @@ def test_mobileface_normalized_embedding(mobileface_model, mock_landmarks):

     embedding = mobileface_model.get_normalized_embedding(mock_image, mock_landmarks)

-    assert embedding.shape == (512,), f'Expected shape (512,), got {embedding.shape}'
     norm = np.linalg.norm(embedding)
     assert np.isclose(norm, 1.0, atol=1e-5), f'Normalized embedding should have norm 1.0, got {norm}'

@@ -158,7 +156,7 @@ def test_sphereface_embedding_shape(sphereface_model, mock_aligned_face):
     """
     embedding = sphereface_model.get_embedding(mock_aligned_face)

-    # SphereFace get_embedding returns raw ONNX output with batch dimension
+    # SphereFace typically produces 512-dimensional embeddings
     assert embedding.shape[1] == 512, f'Expected 512-dim embedding, got {embedding.shape[1]}'
     assert embedding.shape[0] == 1, 'Embedding should have batch dimension of 1'

@@ -171,7 +169,6 @@ def test_sphereface_normalized_embedding(sphereface_model, mock_landmarks):

     embedding = sphereface_model.get_normalized_embedding(mock_image, mock_landmarks)

-    assert embedding.shape == (512,), f'Expected shape (512,), got {embedding.shape}'
     norm = np.linalg.norm(embedding)
     assert np.isclose(norm, 1.0, atol=1e-5), f'Normalized embedding should have norm 1.0, got {norm}'
```
```diff
@@ -12,11 +12,9 @@ CLI utilities for testing and running UniFace features.

 | `anonymize.py` | Face anonymization/blurring for privacy |
 | `emotion.py` | Emotion detection (7 or 8 emotions) |
 | `gaze.py` | Gaze direction estimation |
-| `headpose.py` | Head pose estimation (pitch, yaw, roll) |
 | `landmarks.py` | 106-point facial landmark detection |
 | `recognize.py` | Face embedding extraction and comparison |
 | `search.py` | Real-time face matching against reference |
 | `faiss_search.py` | FAISS index build and multi-identity face search |
 | `fairface.py` | FairFace attribute prediction (race, gender, age) |
 | `attribute.py` | Age and gender prediction |
 | `spoofing.py` | Face anti-spoofing detection |

@@ -63,11 +61,6 @@ python tools/emotion.py --source 0
 python tools/gaze.py --source assets/test.jpg
 python tools/gaze.py --source 0

-# Head pose estimation
-python tools/headpose.py --source assets/test.jpg
-python tools/headpose.py --source 0
-python tools/headpose.py --source 0 --draw-type axis
-
 # Landmarks
 python tools/landmarks.py --source assets/test.jpg
 python tools/landmarks.py --source 0

@@ -115,7 +108,7 @@ python tools/download_model.py  # downloads all

 | Option | Description |
 |--------|-------------|
 | `--source` | Input source: image/video path or camera ID (0, 1, ...) |
-| `--detector` | Choose detector: `retinaface`, `scrfd`, `yolov5face`, `yolov8face` |
+| `--detector` | Choose detector: `retinaface`, `scrfd`, `yolov5face` |
 | `--threshold` | Visualization confidence threshold (default: varies) |
 | `--save-dir` | Output directory (default: `outputs`) |
```
@@ -167,9 +167,9 @@ def run_camera(analyzer, camera_id: int = 0):

while True:
    ret, frame = cap.read()
    frame = cv2.flip(frame, 1)
    if not ret:
        break
    frame = cv2.flip(frame, 1)

    faces = analyzer.analyze(frame)

@@ -201,7 +201,7 @@ def main():
detector = RetinaFace()
recognizer = ArcFace()
age_gender = AgeGender()
analyzer = FaceAnalyzer(detector, recognizer=recognizer, attributes=[age_gender])
analyzer = FaceAnalyzer(detector, recognizer, age_gender)

source_type = get_source_type(args.source)
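Two recurring fixes in this batch of hunks: the frame is flipped only after the `ret` check (flipping a failed read would pass `None` to `cv2.flip`), and `FaceAnalyzer` now receives its optional components as keyword arguments. A minimal sketch of the corrected loop, assuming the classes used in the tools above:

```python
import cv2
from uniface.detection import RetinaFace

detector = RetinaFace()
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:                  # check the read succeeded first...
        break
    frame = cv2.flip(frame, 1)   # ...then mirror for a selfie-style view

    faces = detector.detect(frame)

cap.release()

# FaceAnalyzer now takes its optional components by keyword, as in the hunk above:
# analyzer = FaceAnalyzer(detector, recognizer=recognizer, attributes=[age_gender])
```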
@@ -121,9 +121,9 @@ def run_camera(detector, blurrer: BlurFace, camera_id: int = 0):

while True:
    ret, frame = cap.read()
    frame = cv2.flip(frame, 1)
    if not ret:
        break
    frame = cv2.flip(frame, 1)

    faces = detector.detect(frame)
    if faces:

@@ -60,7 +60,7 @@ def process_image(
)

for i, face in enumerate(faces):
    result = age_gender.predict(image, face)
    result = age_gender.predict(image, face.bbox)
    print(f' Face {i + 1}: {result.sex}, {result.age} years old')
    draw_age_gender_label(image, face.bbox, result.sex, result.age)

@@ -112,7 +112,7 @@ def process_video(
)

for face in faces:
    result = age_gender.predict(frame, face)
    result = age_gender.predict(frame, face.bbox)
    draw_age_gender_label(frame, face.bbox, result.sex, result.age)

cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
@@ -137,9 +137,9 @@ def run_camera(detector, age_gender, camera_id: int = 0, threshold: float = 0.6)

while True:
    ret, frame = cap.read()
    frame = cv2.flip(frame, 1)
    if not ret:
        break
    frame = cv2.flip(frame, 1)

    faces = detector.detect(frame)

@@ -151,7 +151,7 @@ def run_camera(detector, age_gender, camera_id: int = 0, threshold: float = 0.6)
)

for face in faces:
    result = age_gender.predict(frame, face)
    result = age_gender.predict(frame, face.bbox)
    draw_age_gender_label(frame, face.bbox, result.sex, result.age)

cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

@@ -135,9 +135,9 @@ def run_camera(detector, camera_id: int = 0, threshold: float = 0.6):
prev_time = time.perf_counter()
while True:
    ret, frame = cap.read()
    frame = cv2.flip(frame, 1)
    if not ret:
        break
    frame = cv2.flip(frame, 1)

    faces = detector.detect(frame)

@@ -4,7 +4,6 @@ from uniface.constants import (
AgeGenderWeights,
ArcFaceWeights,
DDAMFNWeights,
HeadPoseWeights,
LandmarkWeights,
MobileFaceWeights,
RetinaFaceWeights,
@@ -22,7 +21,6 @@ MODEL_TYPES = {
'ddamfn': DDAMFNWeights,
'agegender': AgeGenderWeights,
'landmark': LandmarkWeights,
'headpose': HeadPoseWeights,
}

@@ -60,7 +60,7 @@ def process_image(
)

for i, face in enumerate(faces):
    result = emotion_predictor.predict(image, face)
    result = emotion_predictor.predict(image, face.landmarks)
    print(f' Face {i + 1}: {result.emotion} (confidence: {result.confidence:.3f})')
    draw_emotion_label(image, face.bbox, result.emotion, result.confidence)

@@ -112,7 +112,7 @@ def process_video(
)

for face in faces:
    result = emotion_predictor.predict(frame, face)
    result = emotion_predictor.predict(frame, face.landmarks)
    draw_emotion_label(frame, face.bbox, result.emotion, result.confidence)

cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
@@ -137,9 +137,9 @@ def run_camera(detector, emotion_predictor, camera_id: int = 0, threshold: float

while True:
    ret, frame = cap.read()
    frame = cv2.flip(frame, 1)
    if not ret:
        break
    frame = cv2.flip(frame, 1)

    faces = detector.detect(frame)

@@ -151,7 +151,7 @@ def run_camera(detector, emotion_predictor, camera_id: int = 0, threshold: float
)

for face in faces:
    result = emotion_predictor.predict(frame, face)
    result = emotion_predictor.predict(frame, face.landmarks)
    draw_emotion_label(frame, face.bbox, result.emotion, result.confidence)

cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

@@ -60,7 +60,7 @@ def process_image(
)

for i, face in enumerate(faces):
    result = fairface.predict(image, face)
    result = fairface.predict(image, face.bbox)
    print(f' Face {i + 1}: {result.sex}, {result.age_group}, {result.race}')
    draw_fairface_label(image, face.bbox, result.sex, result.age_group, result.race)

@@ -112,7 +112,7 @@ def process_video(
)

for face in faces:
    result = fairface.predict(frame, face)
    result = fairface.predict(frame, face.bbox)
    draw_fairface_label(frame, face.bbox, result.sex, result.age_group, result.race)

cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
@@ -137,9 +137,9 @@ def run_camera(detector, fairface, camera_id: int = 0, threshold: float = 0.6):

while True:
    ret, frame = cap.read()
    frame = cv2.flip(frame, 1)
    if not ret:
        break
    frame = cv2.flip(frame, 1)

    faces = detector.detect(frame)

@@ -151,7 +151,7 @@ def run_camera(detector, fairface, camera_id: int = 0, threshold: float = 0.6):
)

for face in faces:
    result = fairface.predict(frame, face)
    result = fairface.predict(frame, face.bbox)
    draw_fairface_label(frame, face.bbox, result.sex, result.age_group, result.race)

cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
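The predictor hunks above all make the same move: each model now receives exactly the slice of the detection it needs instead of the whole `face` object. `AgeGender` and FairFace take `face.bbox`, while the emotion predictor takes `face.landmarks`. A hedged sketch of the resulting call shapes (the three predictor objects are assumed to be constructed as in the tools above, e.g. `age_gender = AgeGender()`; only the `predict` signatures shown in this diff are used):

```python
import cv2
from uniface.detection import RetinaFace

detector = RetinaFace()
frame = cv2.imread('assets/test.jpg')  # placeholder path

# age_gender, fairface and emotion_predictor assumed constructed as in the tools above.
for face in detector.detect(frame):
    age_sex = age_gender.predict(frame, face.bbox)           # box crop -> .sex, .age
    demographics = fairface.predict(frame, face.bbox)        # box crop -> .sex, .age_group, .race
    mood = emotion_predictor.predict(frame, face.landmarks)  # landmark-based -> .emotion, .confidence
```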
@@ -97,9 +97,9 @@ def run_camera(detector, recognizer, store: FAISS, camera_id: int = 0, threshold

while True:
    ret, frame = cap.read()
    frame = cv2.flip(frame, 1)
    if not ret:
        break
    frame = cv2.flip(frame, 1)

    frame = process_frame(frame, detector, recognizer, store, threshold)
@@ -1,181 +0,0 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Head pose estimation on detected faces.

Usage:
    python tools/headpose.py --source path/to/image.jpg
    python tools/headpose.py --source path/to/video.mp4
    python tools/headpose.py --source 0 # webcam
    python tools/headpose.py --source path/to/image.jpg --draw-type axis
"""

from __future__ import annotations

import argparse
import os
from pathlib import Path

from _common import get_source_type
import cv2

from uniface.detection import RetinaFace
from uniface.draw import draw_head_pose
from uniface.headpose import HeadPose


def process_image(detector, head_pose_estimator, image_path: str, save_dir: str = 'outputs', draw_type: str = 'cube'):
    """Process a single image."""
    image = cv2.imread(image_path)
    if image is None:
        print(f"Error: Failed to load image from '{image_path}'")
        return

    faces = detector.detect(image)
    print(f'Detected {len(faces)} face(s)')

    for i, face in enumerate(faces):
        bbox = face.bbox
        x1, y1, x2, y2 = map(int, bbox[:4])
        face_crop = image[y1:y2, x1:x2]

        if face_crop.size == 0:
            continue

        result = head_pose_estimator.estimate(face_crop)
        print(f' Face {i + 1}: pitch={result.pitch:.1f}°, yaw={result.yaw:.1f}°, roll={result.roll:.1f}°')

        draw_head_pose(image, bbox, result.pitch, result.yaw, result.roll, draw_type=draw_type)

    os.makedirs(save_dir, exist_ok=True)
    output_path = os.path.join(save_dir, f'{Path(image_path).stem}_headpose.jpg')
    cv2.imwrite(output_path, image)
    print(f'Output saved: {output_path}')


def process_video(detector, head_pose_estimator, video_path: str, save_dir: str = 'outputs', draw_type: str = 'cube'):
    """Process a video file."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error: Cannot open video file '{video_path}'")
        return

    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    os.makedirs(save_dir, exist_ok=True)
    output_path = os.path.join(save_dir, f'{Path(video_path).stem}_headpose.mp4')
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

    print(f'Processing video: {video_path} ({total_frames} frames)')
    frame_count = 0

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame_count += 1
        faces = detector.detect(frame)

        for face in faces:
            bbox = face.bbox
            x1, y1, x2, y2 = map(int, bbox[:4])
            face_crop = frame[y1:y2, x1:x2]

            if face_crop.size == 0:
                continue

            result = head_pose_estimator.estimate(face_crop)
            draw_head_pose(frame, bbox, result.pitch, result.yaw, result.roll, draw_type=draw_type)

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        out.write(frame)

        if frame_count % 100 == 0:
            print(f' Processed {frame_count}/{total_frames} frames...')

    cap.release()
    out.release()
    print(f'Done! Output saved: {output_path}')


def run_camera(detector, head_pose_estimator, camera_id: int = 0, draw_type: str = 'cube'):
    """Run real-time detection on webcam."""
    cap = cv2.VideoCapture(camera_id)
    if not cap.isOpened():
        print(f'Cannot open camera {camera_id}')
        return

    print("Press 'q' to quit")

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame = cv2.flip(frame, 1)
        faces = detector.detect(frame)

        for face in faces:
            bbox = face.bbox
            x1, y1, x2, y2 = map(int, bbox[:4])
            face_crop = frame[y1:y2, x1:x2]

            if face_crop.size == 0:
                continue

            result = head_pose_estimator.estimate(face_crop)
            draw_head_pose(frame, bbox, result.pitch, result.yaw, result.roll, draw_type=draw_type)

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('Head Pose Estimation', frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


def main():
    parser = argparse.ArgumentParser(description='Run head pose estimation')
    parser.add_argument('--source', type=str, required=True, help='Image/video path or camera ID (0, 1, ...)')
    parser.add_argument('--save-dir', type=str, default='outputs', help='Output directory')
    parser.add_argument(
        '--draw-type',
        type=str,
        default='cube',
        choices=['cube', 'axis'],
        help='Visualization type: cube (default) or axis',
    )
    args = parser.parse_args()

    detector = RetinaFace()
    head_pose_estimator = HeadPose()

    source_type = get_source_type(args.source)

    if source_type == 'camera':
        run_camera(detector, head_pose_estimator, int(args.source), args.draw_type)
    elif source_type == 'image':
        if not os.path.exists(args.source):
            print(f'Error: Image not found: {args.source}')
            return
        process_image(detector, head_pose_estimator, args.source, args.save_dir, args.draw_type)
    elif source_type == 'video':
        if not os.path.exists(args.source):
            print(f'Error: Video not found: {args.source}')
            return
        process_video(detector, head_pose_estimator, args.source, args.save_dir, args.draw_type)
    else:
        print(f"Error: Unknown source type for '{args.source}'")
        print('Supported formats: images (.jpg, .png, ...), videos (.mp4, .avi, ...), or camera ID (0, 1, ...)')


if __name__ == '__main__':
    main()
@@ -114,9 +114,9 @@ def run_camera(detector, landmarker, camera_id: int = 0):

while True:
    ret, frame = cap.read()
    frame = cv2.flip(frame, 1)
    if not ret:
        break
    frame = cv2.flip(frame, 1)

    faces = detector.detect(frame)

@@ -41,13 +41,12 @@ def run_inference(detector, recognizer, image_path: str):

print(f'Detected {len(faces)} face(s). Extracting embedding for the first face...')

landmarks = faces[0].landmarks
landmarks = faces[0].landmarks  # 5-point landmarks for alignment (already np.ndarray)
embedding = recognizer.get_embedding(image, landmarks)
raw_norm = np.linalg.norm(embedding)
norm_embedding = embedding.ravel() / raw_norm if raw_norm > 0 else embedding.ravel()
norm_embedding = recognizer.get_normalized_embedding(image, landmarks)  # L2 normalized

print(f' Embedding shape: {embedding.shape}')
print(f' L2 norm (raw): {raw_norm:.4f}')
print(f' L2 norm (raw): {np.linalg.norm(embedding):.4f}')
print(f' L2 norm (normalized): {np.linalg.norm(norm_embedding):.4f}')
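The `recognize.py` hunk above swaps the hand-rolled normalization for the library helper; the two are numerically equivalent whenever the raw norm is positive. A small sketch that checks the equivalence (reusing the `recognizer`, `image`, and `landmarks` names from the hunk):

```python
import numpy as np

# Raw output keeps the batch dimension: shape (1, 512).
embedding = recognizer.get_embedding(image, landmarks)

manual = embedding.ravel() / np.linalg.norm(embedding)
helper = recognizer.get_normalized_embedding(image, landmarks)

# Both are flat unit vectors and should agree to floating-point tolerance.
assert helper.shape == manual.shape == (512,)
assert np.allclose(manual, helper, atol=1e-6)
```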
@@ -109,9 +109,9 @@ def run_camera(detector, recognizer, ref_embedding: np.ndarray, camera_id: int =

while True:
    ret, frame = cap.read()
    frame = cv2.flip(frame, 1)
    if not ret:
        break
    frame = cv2.flip(frame, 1)

    frame = process_frame(frame, detector, recognizer, ref_embedding, threshold)

@@ -134,9 +134,9 @@ def run_camera(

while True:
    ret, frame = cap.read()
    frame = cv2.flip(frame, 1)
    if not ret:
        break
    frame = cv2.flip(frame, 1)

    # Detect faces
    faces = detector.detect(frame)
180
uniface-cpp/.clang-format
Normal file
@@ -0,0 +1,180 @@
---
# Modern C++ style based on Google with enhancements
Language: Cpp
Standard: c++17

BasedOnStyle: Google
ColumnLimit: 100
IndentWidth: 4
TabWidth: 4
UseTab: Never

# Access modifiers
AccessModifierOffset: -4
IndentAccessModifiers: false

# Alignment
AlignAfterOpenBracket: BlockIndent
AlignArrayOfStructures: Right
AlignConsecutiveAssignments:
  Enabled: false
AlignConsecutiveBitFields:
  Enabled: true
AlignConsecutiveDeclarations:
  Enabled: false
AlignConsecutiveMacros:
  Enabled: true
AlignEscapedNewlines: Left
AlignOperands: AlignAfterOperator
AlignTrailingComments:
  Kind: Always
  OverEmptyLines: 1

# Arguments and parameters
AllowAllArgumentsOnNextLine: true
AllowAllParametersOfDeclarationOnNextLine: true
BinPackArguments: false
BinPackParameters: false

# Short forms
AllowShortBlocksOnASingleLine: Empty
AllowShortCaseLabelsOnASingleLine: false
AllowShortEnumsOnASingleLine: false
AllowShortFunctionsOnASingleLine: Inline
AllowShortIfStatementsOnASingleLine: Never
AllowShortLambdasOnASingleLine: Inline
AllowShortLoopsOnASingleLine: false

# Break behavior
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: true
AlwaysBreakTemplateDeclarations: Yes
BreakAfterAttributes: Leave
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Attach
BreakBeforeConceptDeclarations: Always
BreakBeforeTernaryOperators: true
BreakConstructorInitializers: BeforeComma
BreakInheritanceList: BeforeComma
BreakStringLiterals: true

# Braces
InsertBraces: false
RemoveBracesLLVM: false

# Constructors
PackConstructorInitializers: CurrentLine
ConstructorInitializerIndentWidth: 4

# Empty lines
EmptyLineAfterAccessModifier: Never
EmptyLineBeforeAccessModifier: LogicalBlock
KeepEmptyLinesAtTheStartOfBlocks: false
MaxEmptyLinesToKeep: 1
SeparateDefinitionBlocks: Always

# Includes
IncludeBlocks: Regroup
IncludeCategories:
  # Main header (same name as source file)
  - Regex: '^"([a-zA-Z0-9_]+)\.(h|hpp)"$'
    Priority: 1
    SortPriority: 1
    CaseSensitive: true
  # Project headers
  - Regex: '^".*"$'
    Priority: 2
    SortPriority: 2
  # C system headers
  - Regex: '^<(assert|complex|ctype|errno|fenv|float|inttypes|iso646|limits|locale|math|setjmp|signal|stdalign|stdarg|stdatomic|stdbool|stddef|stdint|stdio|stdlib|stdnoreturn|string|tgmath|threads|time|uchar|wchar|wctype)\.h>$'
    Priority: 3
    SortPriority: 3
  # C++ standard library
  - Regex: '^<[a-z_]+>$'
    Priority: 4
    SortPriority: 4
  # External libraries
  - Regex: '^<.*>$'
    Priority: 5
    SortPriority: 5
SortIncludes: CaseSensitive

# Indentation
IndentCaseBlocks: false
IndentCaseLabels: true
IndentExternBlock: NoIndent
IndentGotoLabels: false
IndentPPDirectives: AfterHash
IndentRequiresClause: true
IndentWrappedFunctionNames: false

# Lambdas
LambdaBodyIndentation: Signature

# Namespaces
CompactNamespaces: false
FixNamespaceComments: true
NamespaceIndentation: None
ShortNamespaceLines: 0

# Penalties (guide formatting decisions)
PenaltyBreakAssignment: 25
PenaltyBreakBeforeFirstCallParameter: 19
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakOpenParenthesis: 0
PenaltyBreakString: 1000
PenaltyBreakTemplateDeclaration: 10
PenaltyExcessCharacter: 1000000
PenaltyIndentedWhitespace: 0
PenaltyReturnTypeOnItsOwnLine: 200

# Pointers and references
DerivePointerAlignment: false
PointerAlignment: Left
ReferenceAlignment: Pointer
QualifierAlignment: Leave

# Requires clause (C++20 concepts)
RequiresClausePosition: OwnLine
RequiresExpressionIndentation: OuterScope

# Spacing
BitFieldColonSpacing: Both
SpaceAfterCStyleCast: false
SpaceAfterLogicalNot: false
SpaceAfterTemplateKeyword: true
SpaceAroundPointerQualifiers: Default
SpaceBeforeAssignmentOperators: true
SpaceBeforeCaseColon: false
SpaceBeforeCpp11BracedList: false
SpaceBeforeCtorInitializerColon: true
SpaceBeforeInheritanceColon: true
SpaceBeforeParens: ControlStatements
SpaceBeforeRangeBasedForLoopColon: true
SpaceBeforeSquareBrackets: false
SpaceInEmptyBlock: false
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 2
SpacesInAngles: Never
SpacesInCStyleCastParentheses: false
SpacesInConditionalStatement: false
SpacesInContainerLiterals: false
SpacesInLineCommentPrefix:
  Minimum: 1
  Maximum: -1
SpacesInParentheses: false
SpacesInSquareBrackets: false

# Other
Cpp11BracedListStyle: true
InsertNewlineAtEOF: true
InsertTrailingCommas: None
IntegerLiteralSeparator:
  Binary: 4
  Decimal: 3
  Hex: 4
ReflowComments: true
RemoveSemicolon: false
SortUsingDeclarations: LexicographicNumeric
...
51
uniface-cpp/CMakeLists.txt
Normal file
@@ -0,0 +1,51 @@
cmake_minimum_required(VERSION 3.14)

project(uniface
    VERSION 1.0.0
    DESCRIPTION "Uniface C++ face analysis library"
    LANGUAGES CXX
)

# Options
option(UNIFACE_BUILD_EXAMPLES "Build example programs" ON)

# C++ standard
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

# Compiler warnings
if(CMAKE_CXX_COMPILER_ID MATCHES "GNU|Clang")
    add_compile_options(-Wall -Wextra -Wpedantic)
elseif(MSVC)
    add_compile_options(/W4)
endif()

# Find dependencies
find_package(OpenCV REQUIRED COMPONENTS core imgproc dnn calib3d)

# Library
add_library(uniface
    src/utils.cpp
    src/detector.cpp
    src/recognizer.cpp
    src/landmarker.cpp
    src/analyzer.cpp
)

target_include_directories(uniface
    PUBLIC
        $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
        $<INSTALL_INTERFACE:include>
)

target_link_libraries(uniface
    PUBLIC
        ${OpenCV_LIBS}
)

# Examples
if(UNIFACE_BUILD_EXAMPLES)
    add_subdirectory(examples)
endif()
69
uniface-cpp/README.md
Normal file
@@ -0,0 +1,69 @@
# Uniface C++

C++ implementation of the Uniface face analysis library.

## Features

- **Face Detection** - RetinaFace detector with 5-point landmarks

## Requirements

- C++17 compiler
- CMake 3.14+
- OpenCV 4.x

## Build

```bash
mkdir build && cd build
cmake ..
make -j$(nproc)
```

## Usage

### Image Detection

```bash
./examples/detect <model_path> <image_path>
```

### Webcam Demo

```bash
./examples/webcam <model_path> [camera_id]
```

### Code Example

```cpp
#include <uniface/uniface.hpp>
#include <opencv2/highgui.hpp>

int main() {
    uniface::RetinaFace detector("retinaface.onnx");

    cv::Mat image = cv::imread("photo.jpg");
    auto faces = detector.detect(image);

    for (const auto& face : faces) {
        cv::rectangle(image, face.bbox, cv::Scalar(0, 255, 0), 2);
    }

    cv::imwrite("result.jpg", image);
    return 0;
}
```

## Models

Download models from the main uniface repository or use:

```bash
# RetinaFace MobileNet V2
wget https://github.com/your-repo/uniface/releases/download/v1.0/retinaface_mv2.onnx -P models/
```

## License

Same license as the main uniface project.
23
uniface-cpp/examples/CMakeLists.txt
Normal file
@@ -0,0 +1,23 @@
# Examples

find_package(OpenCV REQUIRED COMPONENTS highgui imgcodecs videoio)

# Image detection example
add_executable(detect detect.cpp)
target_link_libraries(detect PRIVATE uniface ${OpenCV_LIBS})

# Face recognition example
add_executable(recognize recognize.cpp)
target_link_libraries(recognize PRIVATE uniface ${OpenCV_LIBS})

# Facial landmarks example
add_executable(landmarks landmarks.cpp)
target_link_libraries(landmarks PRIVATE uniface ${OpenCV_LIBS})

# Face analyzer example
add_executable(analyzer analyzer.cpp)
target_link_libraries(analyzer PRIVATE uniface ${OpenCV_LIBS})

# Webcam example
add_executable(webcam webcam.cpp)
target_link_libraries(webcam PRIVATE uniface ${OpenCV_LIBS})
113
uniface-cpp/examples/analyzer.cpp
Normal file
@@ -0,0 +1,113 @@
#include <iomanip>
#include <iostream>

#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <uniface/uniface.hpp>

int main(int argc, char** argv) {
    if (argc < 3) {
        std::cout << "Usage: " << argv[0]
                  << " <detector_model> <image_path> [recognizer_model] [landmark_model]"
                  << std::endl;
        std::cout << "\nAnalyzes faces in an image using available models." << std::endl;
        std::cout << " - detector_model: Required. Path to face detector ONNX model." << std::endl;
        std::cout << " - recognizer_model: Optional. Path to face recognizer ONNX model."
                  << std::endl;
        std::cout << " - landmark_model: Optional. Path to 106-point landmark ONNX model."
                  << std::endl;
        return 1;
    }

    const std::string detector_path = argv[1];
    const std::string image_path = argv[2];
    const std::string recognizer_path = (argc > 3) ? argv[3] : "";
    const std::string landmark_path = (argc > 4) ? argv[4] : "";

    try {
        // Create analyzer and load components
        uniface::FaceAnalyzer analyzer;

        std::cout << "Loading detector: " << detector_path << std::endl;
        analyzer.loadDetector(detector_path);

        if (!recognizer_path.empty()) {
            std::cout << "Loading recognizer: " << recognizer_path << std::endl;
            analyzer.loadRecognizer(recognizer_path);
        }

        if (!landmark_path.empty()) {
            std::cout << "Loading landmarker: " << landmark_path << std::endl;
            analyzer.loadLandmarker(landmark_path);
        }

        // Load image
        cv::Mat image = cv::imread(image_path);
        if (image.empty()) {
            std::cerr << "Failed to load image: " << image_path << std::endl;
            return 1;
        }

        std::cout << "\nAnalyzing image..." << std::endl;

        // Analyze faces
        auto results = analyzer.analyze(image);

        std::cout << "Found " << results.size() << " face(s)\n" << std::endl;

        // Process each face
        for (size_t i = 0; i < results.size(); ++i) {
            const auto& result = results[i];

            std::cout << "Face " << (i + 1) << ":" << std::endl;
            std::cout << " BBox: [" << result.face.bbox.x << ", " << result.face.bbox.y << ", "
                      << result.face.bbox.width << ", " << result.face.bbox.height << "]"
                      << std::endl;
            std::cout << std::fixed << std::setprecision(3);
            std::cout << " Confidence: " << result.face.confidence << std::endl;

            // Draw bounding box
            cv::rectangle(image, result.face.bbox, cv::Scalar(0, 255, 0), 2);

            // Draw 5-point landmarks from detector
            for (const auto& pt : result.face.landmarks) {
                cv::circle(image, pt, 3, cv::Scalar(0, 0, 255), -1);
            }

            // If 106-point landmarks available
            if (result.landmarks) {
                std::cout << " Landmarks: 106 points detected" << std::endl;
                for (const auto& pt : result.landmarks->points) {
                    cv::circle(image, pt, 1, cv::Scalar(0, 255, 255), -1);
                }
            }

            // If embedding available
            if (result.embedding) {
                // Show first few values of embedding
                std::cout << " Embedding: [";
                for (size_t j = 0; j < 5; ++j) {
                    std::cout << (*result.embedding)[j];
                    if (j < 4)
                        std::cout << ", ";
                }
                std::cout << ", ... ] (512-dim)" << std::endl;
            }

            std::cout << std::endl;
        }

        // Save result
        cv::imwrite("analyzer_result.jpg", image);
        std::cout << "Saved result to analyzer_result.jpg" << std::endl;

    } catch (const cv::Exception& e) {
        std::cerr << "OpenCV Error: " << e.what() << std::endl;
        return 1;
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }

    return 0;
}
48
uniface-cpp/examples/detect.cpp
Normal file
@@ -0,0 +1,48 @@
#include <iostream>

#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <uniface/uniface.hpp>

int main(int argc, char** argv) {
    if (argc < 3) {
        std::cout << "Usage: " << argv[0] << " <model_path> <image_path>" << std::endl;
        return 1;
    }

    const std::string model_path = argv[1];
    const std::string image_path = argv[2];

    try {
        uniface::RetinaFace detector(model_path);

        cv::Mat image = cv::imread(image_path);
        if (image.empty()) {
            std::cerr << "Failed to load image: " << image_path << std::endl;
            return 1;
        }

        const auto faces = detector.detect(image);
        std::cout << "Detected " << faces.size() << " faces." << std::endl;

        // Draw results
        for (const auto& face : faces) {
            cv::rectangle(image, face.bbox, cv::Scalar(0, 255, 0), 2);
            for (const auto& pt : face.landmarks) {
                cv::circle(image, pt, 2, cv::Scalar(0, 0, 255), -1);
            }
        }

        cv::imwrite("result.jpg", image);
        std::cout << "Saved result to result.jpg" << std::endl;

    } catch (const cv::Exception& e) {
        std::cerr << "OpenCV Error: " << e.what() << std::endl;
        return 1;
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }

    return 0;
}
66
uniface-cpp/examples/landmarks.cpp
Normal file
@@ -0,0 +1,66 @@
#include <iostream>

#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <uniface/uniface.hpp>

int main(int argc, char** argv) {
    if (argc < 4) {
        std::cout << "Usage: " << argv[0] << " <detector_model> <landmark_model> <image_path>"
                  << std::endl;
        std::cout << "\nDetects 106-point facial landmarks and saves visualization." << std::endl;
        return 1;
    }

    const std::string detector_path = argv[1];
    const std::string landmark_path = argv[2];
    const std::string image_path = argv[3];

    try {
        // Load models
        uniface::RetinaFace detector(detector_path);
        uniface::Landmark106 landmarker(landmark_path);

        // Load image
        cv::Mat image = cv::imread(image_path);
        if (image.empty()) {
            std::cerr << "Failed to load image: " << image_path << std::endl;
            return 1;
        }

        // Detect faces
        auto faces = detector.detect(image);
        std::cout << "Detected " << faces.size() << " face(s)" << std::endl;

        // Process each face
        for (size_t i = 0; i < faces.size(); ++i) {
            const auto& face = faces[i];

            // Draw bounding box
            cv::rectangle(image, face.bbox, cv::Scalar(0, 255, 0), 2);

            // Get 106-point landmarks
            auto landmarks = landmarker.getLandmarks(image, face.bbox);

            // Draw all 106 points
            for (const auto& pt : landmarks.points) {
                cv::circle(image, pt, 1, cv::Scalar(0, 255, 255), -1);
            }

            std::cout << "Face " << (i + 1) << ": 106 landmarks detected" << std::endl;
        }

        // Save result
        cv::imwrite("landmarks_result.jpg", image);
        std::cout << "Saved result to landmarks_result.jpg" << std::endl;

    } catch (const cv::Exception& e) {
        std::cerr << "OpenCV Error: " << e.what() << std::endl;
        return 1;
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }

    return 0;
}
81
uniface-cpp/examples/recognize.cpp
Normal file
@@ -0,0 +1,81 @@
#include <iomanip>
#include <iostream>

#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <uniface/uniface.hpp>

int main(int argc, char** argv) {
    if (argc < 5) {
        std::cout << "Usage: " << argv[0]
                  << " <detector_model> <recognizer_model> <image1> <image2>" << std::endl;
        std::cout << "\nCompares faces from two images and outputs similarity score." << std::endl;
        return 1;
    }

    const std::string detector_path = argv[1];
    const std::string recognizer_path = argv[2];
    const std::string image1_path = argv[3];
    const std::string image2_path = argv[4];

    try {
        // Load models
        uniface::RetinaFace detector(detector_path);
        uniface::ArcFace recognizer(recognizer_path);

        // Load images
        cv::Mat image1 = cv::imread(image1_path);
        cv::Mat image2 = cv::imread(image2_path);

        if (image1.empty()) {
            std::cerr << "Failed to load image: " << image1_path << std::endl;
            return 1;
        }
        if (image2.empty()) {
            std::cerr << "Failed to load image: " << image2_path << std::endl;
            return 1;
        }

        // Detect faces
        auto faces1 = detector.detect(image1);
        auto faces2 = detector.detect(image2);

        if (faces1.empty()) {
            std::cerr << "No face detected in image1" << std::endl;
            return 1;
        }
        if (faces2.empty()) {
            std::cerr << "No face detected in image2" << std::endl;
            return 1;
        }

        std::cout << "Detected " << faces1.size() << " face(s) in image1" << std::endl;
        std::cout << "Detected " << faces2.size() << " face(s) in image2" << std::endl;

        // Get embeddings for first face in each image
        auto embedding1 = recognizer.getNormalizedEmbedding(image1, faces1[0].landmarks);
        auto embedding2 = recognizer.getNormalizedEmbedding(image2, faces2[0].landmarks);

        // Compute similarity
        float similarity = uniface::cosineSimilarity(embedding1, embedding2);

        std::cout << std::fixed << std::setprecision(4);
        std::cout << "\nCosine Similarity: " << similarity << std::endl;

        // Interpretation
        if (similarity > 0.4f) {
            std::cout << "Result: Same person (similarity > 0.4)" << std::endl;
        } else {
            std::cout << "Result: Different persons (similarity <= 0.4)" << std::endl;
        }

    } catch (const cv::Exception& e) {
        std::cerr << "OpenCV Error: " << e.what() << std::endl;
        return 1;
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }

    return 0;
}
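For comparison with the C++ example above, the same two-image verification flow sketched in Python with the uniface API seen earlier in this diff (the `uniface.recognition` import path is an assumption, and the 0.4 cutoff simply mirrors the C++ example rather than a tuned threshold):

```python
import cv2
import numpy as np

from uniface.detection import RetinaFace
from uniface.recognition import ArcFace  # module path is an assumption

detector = RetinaFace()
recognizer = ArcFace()

def first_face_embedding(path: str) -> np.ndarray:
    image = cv2.imread(path)
    faces = detector.detect(image)
    if not faces:
        raise RuntimeError(f'No face detected in {path}')
    return recognizer.get_normalized_embedding(image, faces[0].landmarks)

e1 = first_face_embedding('person_a.jpg')  # placeholder paths
e2 = first_face_embedding('person_b.jpg')

similarity = float(np.dot(e1, e2))  # embeddings are unit-norm, so dot == cosine
print('Same person' if similarity > 0.4 else 'Different persons', similarity)
```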
209
uniface-cpp/examples/webcam.cpp
Normal file
@@ -0,0 +1,209 @@
#include <cctype>   // std::isdigit (was missing in the original listing)
#include <chrono>
#include <cstdlib>  // std::atoi (was missing)
#include <cstring>  // std::strlen (was missing)
#include <iostream>
#include <memory>

#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/videoio.hpp>
#include <uniface/uniface.hpp>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cout << "Usage: " << argv[0] << " <detector_model> [landmark_model] [camera_id]"
                  << std::endl;
        std::cout << "\nArguments:" << std::endl;
        std::cout << " detector_model : Path to face detector ONNX model (required)" << std::endl;
        std::cout << " landmark_model : Path to 106-point landmark ONNX model (optional)"
                  << std::endl;
        std::cout << " camera_id : Camera device ID, default 0 (optional)" << std::endl;
        std::cout << "\nExamples:" << std::endl;
        std::cout << " " << argv[0] << " detector.onnx" << std::endl;
        std::cout << " " << argv[0] << " detector.onnx landmark.onnx" << std::endl;
        std::cout << " " << argv[0] << " detector.onnx landmark.onnx 1" << std::endl;
        return 1;
    }

    const std::string detector_path = argv[1];
    std::string landmark_path;
    int camera_id = 0;

    // Parse arguments - landmark_model is optional
    if (argc >= 3) {
        // Check if argv[2] is a number (camera_id) or a path (landmark_model)
        if (std::isdigit(argv[2][0]) && std::strlen(argv[2]) <= 2) {
            camera_id = std::atoi(argv[2]);
        } else {
            landmark_path = argv[2];
            if (argc >= 4) {
                camera_id = std::atoi(argv[3]);
            }
        }
    }

    try {
        // Load detector
        std::cout << "Loading detector: " << detector_path << std::endl;
        uniface::RetinaFace detector(detector_path);
        std::cout << "Detector loaded!" << std::endl;

        // Load landmark model if provided
        std::unique_ptr<uniface::Landmark106> landmarker;
        if (!landmark_path.empty()) {
            std::cout << "Loading landmarker: " << landmark_path << std::endl;
            landmarker = std::make_unique<uniface::Landmark106>(landmark_path);
            std::cout << "Landmarker loaded!" << std::endl;
        }

        // Open camera
        cv::VideoCapture cap(camera_id);
        if (!cap.isOpened()) {
            std::cerr << "Error: Cannot open camera " << camera_id << std::endl;
            return 1;
        }

        const int frame_width = static_cast<int>(cap.get(cv::CAP_PROP_FRAME_WIDTH));
        const int frame_height = static_cast<int>(cap.get(cv::CAP_PROP_FRAME_HEIGHT));
        std::cout << "\nCamera opened: " << frame_width << "x" << frame_height << std::endl;
        std::cout << "Press 'q' to quit, 's' to save screenshot, 'l' to toggle landmarks"
                  << std::endl;

        cv::Mat frame;
        int frame_count = 0;
        double total_time = 0.0;
        bool show_landmarks = true;  // Toggle for 106-point landmarks

        while (true) {
            cap >> frame;
            if (frame.empty()) {
                std::cerr << "Error: Empty frame captured" << std::endl;
                break;
            }

            const auto start = std::chrono::high_resolution_clock::now();

            // Detect faces
            const auto faces = detector.detect(frame);

            // Get 106-point landmarks if available
            std::vector<uniface::Landmarks> all_landmarks;
            if (landmarker && show_landmarks) {
                all_landmarks.reserve(faces.size());
                for (const auto& face : faces) {
                    all_landmarks.push_back(landmarker->getLandmarks(frame, face.bbox));
                }
            }

            const auto end = std::chrono::high_resolution_clock::now();
            const std::chrono::duration<double, std::milli> elapsed = end - start;
            const double inference_time = elapsed.count();

            ++frame_count;
            total_time += inference_time;
            const double avg_time = total_time / static_cast<double>(frame_count);
            const double fps = 1000.0 / avg_time;

            // Draw results
            for (size_t i = 0; i < faces.size(); ++i) {
                const auto& face = faces[i];

                // Draw bounding box
                cv::rectangle(frame, face.bbox, cv::Scalar(0, 255, 0), 2);

                // Draw 5-point landmarks from detector
                for (size_t j = 0; j < face.landmarks.size(); ++j) {
                    cv::Scalar color;
                    if (j < 2) {
                        color = cv::Scalar(255, 0, 0);  // Eyes - Blue
                    } else if (j == 2) {
                        color = cv::Scalar(0, 255, 0);  // Nose - Green
                    } else {
                        color = cv::Scalar(0, 0, 255);  // Mouth - Red
                    }
                    cv::circle(frame, face.landmarks[j], 3, color, -1);
                }

                // Draw 106-point landmarks if available
                if (i < all_landmarks.size()) {
                    const auto& lm = all_landmarks[i];

                    // Draw all 106 points
                    for (const auto& pt : lm.points) {
                        cv::circle(frame, pt, 1, cv::Scalar(0, 255, 255), -1);
                    }
                }

                // Draw confidence
                const std::string conf_text = cv::format("%.2f", face.confidence);
                const cv::Point text_org(
                    static_cast<int>(face.bbox.x), static_cast<int>(face.bbox.y) - 5
                );
                cv::putText(
                    frame,
                    conf_text,
                    text_org,
                    cv::FONT_HERSHEY_SIMPLEX,
                    0.5,
                    cv::Scalar(0, 255, 0),
                    1
                );
            }

            // Draw info overlay
            std::string mode = landmarker
                ? (show_landmarks ? "Detection + 106 Landmarks" : "Detection Only")
                : "Detection Only";
            const std::string info_text = cv::format(
                "FPS: %.1f | Faces: %zu | Time: %.1fms", fps, faces.size(), inference_time
            );
            cv::putText(
                frame,
                info_text,
                cv::Point(10, 30),
                cv::FONT_HERSHEY_SIMPLEX,
                0.7,
                cv::Scalar(0, 255, 0),
                2
            );
            cv::putText(
                frame,
                mode,
                cv::Point(10, 60),
                cv::FONT_HERSHEY_SIMPLEX,
                0.6,
                cv::Scalar(255, 255, 0),
                2
            );

            cv::imshow("Uniface - Face Detection & Landmarks", frame);

            const char key = static_cast<char>(cv::waitKey(1));
            if (key == 'q' || key == 27) {
                break;
            } else if (key == 's') {
                const std::string filename = cv::format("screenshot_%d.jpg", frame_count);
                cv::imwrite(filename, frame);
                std::cout << "Screenshot saved: " << filename << std::endl;
            } else if (key == 'l' && landmarker) {
                show_landmarks = !show_landmarks;
                std::cout << "106-point landmarks: " << (show_landmarks ? "ON" : "OFF")
                          << std::endl;
            }
        }

        cap.release();
        cv::destroyAllWindows();

        std::cout << "\n=== Statistics ===" << std::endl;
        std::cout << "Total frames: " << frame_count << std::endl;
        std::cout << "Average inference time: " << (total_time / frame_count) << " ms" << std::endl;

    } catch (const cv::Exception& e) {
        std::cerr << "OpenCV Error: " << e.what() << std::endl;
        return 1;
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }

    return 0;
}
63
uniface-cpp/include/uniface/analyzer.hpp
Normal file
@@ -0,0 +1,63 @@
#ifndef UNIFACE_ANALYZER_HPP_
#define UNIFACE_ANALYZER_HPP_

#include "uniface/detector.hpp"
#include "uniface/landmarker.hpp"
#include "uniface/recognizer.hpp"
#include "uniface/types.hpp"

#include <memory>
#include <optional>
#include <string>
#include <vector>

namespace uniface {

// Result of face analysis
struct AnalyzedFace {
    Face face;                           // detection result (bbox, confidence, 5-point landmarks)
    std::optional<Landmarks> landmarks;  // 106-point landmarks (if landmarker loaded)
    std::optional<Embedding> embedding;  // face embedding (if recognizer loaded)
};

// Unified face analysis combining detection, recognition, and landmarks
class FaceAnalyzer {
public:
    FaceAnalyzer() = default;
    ~FaceAnalyzer() = default;

    FaceAnalyzer(const FaceAnalyzer&) = delete;
    FaceAnalyzer& operator=(const FaceAnalyzer&) = delete;
    FaceAnalyzer(FaceAnalyzer&&) = default;
    FaceAnalyzer& operator=(FaceAnalyzer&&) = default;

    // Load components (returns *this for chaining)
    FaceAnalyzer& loadDetector(const std::string& path, const DetectorConfig& config = DetectorConfig{});
    FaceAnalyzer& loadRecognizer(const std::string& path, const RecognizerConfig& config = RecognizerConfig{});
    FaceAnalyzer& loadLandmarker(const std::string& path, const LandmarkerConfig& config = LandmarkerConfig{});

    // Analyze faces in BGR image (throws if detector not loaded)
    [[nodiscard]] std::vector<AnalyzedFace> analyze(const cv::Mat& image);

    // Component checks
    [[nodiscard]] bool hasDetector() const noexcept { return detector_ != nullptr; }
    [[nodiscard]] bool hasRecognizer() const noexcept { return recognizer_ != nullptr; }
    [[nodiscard]] bool hasLandmarker() const noexcept { return landmarker_ != nullptr; }

    // Direct component access
    [[nodiscard]] RetinaFace* detector() noexcept { return detector_.get(); }
    [[nodiscard]] ArcFace* recognizer() noexcept { return recognizer_.get(); }
    [[nodiscard]] Landmark106* landmarker() noexcept { return landmarker_.get(); }
    [[nodiscard]] const RetinaFace* detector() const noexcept { return detector_.get(); }
    [[nodiscard]] const ArcFace* recognizer() const noexcept { return recognizer_.get(); }
    [[nodiscard]] const Landmark106* landmarker() const noexcept { return landmarker_.get(); }

private:
    std::unique_ptr<RetinaFace> detector_;
    std::unique_ptr<ArcFace> recognizer_;
    std::unique_ptr<Landmark106> landmarker_;
};

} // namespace uniface

#endif // UNIFACE_ANALYZER_HPP_
47
uniface-cpp/include/uniface/detector.hpp
Normal file
@@ -0,0 +1,47 @@
#ifndef UNIFACE_DETECTOR_HPP_
#define UNIFACE_DETECTOR_HPP_

#include "uniface/types.hpp"

#include <array>
#include <string>
#include <vector>

#include <opencv2/dnn.hpp>

namespace uniface {

// RetinaFace detector using OpenCV DNN backend
class RetinaFace {
public:
    explicit RetinaFace(
        const std::string& model_path,
        float conf_thresh = 0.5f,
        float nms_thresh = 0.4f,
        cv::Size input_size = cv::Size(640, 640)
    );

    // Detect faces in BGR image, returns bboxes + 5-point landmarks
    [[nodiscard]] std::vector<Face> detect(const cv::Mat& image);

    // Accessors
    [[nodiscard]] float getConfidenceThreshold() const noexcept { return confidence_threshold_; }
    [[nodiscard]] float getNmsThreshold() const noexcept { return nms_threshold_; }
    [[nodiscard]] cv::Size getInputSize() const noexcept { return input_size_; }

    void setConfidenceThreshold(float threshold) noexcept { confidence_threshold_ = threshold; }
    void setNmsThreshold(float threshold) noexcept { nms_threshold_ = threshold; }

private:
    cv::dnn::Net net_;
    float confidence_threshold_;
    float nms_threshold_;
    cv::Size input_size_;
    std::vector<std::array<float, 4>> anchors_;

    void generateAnchors();
};

} // namespace uniface

#endif // UNIFACE_DETECTOR_HPP_
32
uniface-cpp/include/uniface/landmarker.hpp
Normal file
@@ -0,0 +1,32 @@
#ifndef UNIFACE_LANDMARKER_HPP_
#define UNIFACE_LANDMARKER_HPP_

#include "uniface/types.hpp"

#include <string>

#include <opencv2/dnn.hpp>

namespace uniface {

// 106-point facial landmark detector
class Landmark106 {
public:
    explicit Landmark106(const std::string& model_path, const LandmarkerConfig& config = LandmarkerConfig{});

    // Detect 106 landmarks for a face, returns points in original image coordinates
    [[nodiscard]] Landmarks getLandmarks(const cv::Mat& image, const cv::Rect2f& bbox);

    [[nodiscard]] cv::Size getInputSize() const noexcept { return config_.input_size; }

private:
    cv::dnn::Net net_;
    LandmarkerConfig config_;

    [[nodiscard]] cv::Mat preprocess(const cv::Mat& image, const cv::Rect2f& bbox, cv::Mat& transform);
    [[nodiscard]] Landmarks postprocess(const cv::Mat& predictions, const cv::Mat& transform);
};

} // namespace uniface

#endif // UNIFACE_LANDMARKER_HPP_
37
uniface-cpp/include/uniface/recognizer.hpp
Normal file
@@ -0,0 +1,37 @@
#ifndef UNIFACE_RECOGNIZER_HPP_
#define UNIFACE_RECOGNIZER_HPP_

#include "uniface/types.hpp"

#include <string>

#include <opencv2/dnn.hpp>

namespace uniface {

// ArcFace face recognition (MobileNet/ResNet backbones)
class ArcFace {
public:
    explicit ArcFace(const std::string& model_path, const RecognizerConfig& config = RecognizerConfig{});

    // Get 512-dim embedding from pre-aligned 112x112 face
    [[nodiscard]] Embedding getEmbedding(const cv::Mat& aligned_face);

    // Get 512-dim embedding with automatic alignment
    [[nodiscard]] Embedding getEmbedding(const cv::Mat& image, const std::array<cv::Point2f, 5>& landmarks);

    // Get L2-normalized embedding with automatic alignment
    [[nodiscard]] Embedding getNormalizedEmbedding(const cv::Mat& image, const std::array<cv::Point2f, 5>& landmarks);

    [[nodiscard]] cv::Size getInputSize() const noexcept { return config_.input_size; }

private:
    cv::dnn::Net net_;
    RecognizerConfig config_;

    [[nodiscard]] cv::Mat preprocess(const cv::Mat& face_image);
};

} // namespace uniface

#endif // UNIFACE_RECOGNIZER_HPP_
45
uniface-cpp/include/uniface/types.hpp
Normal file
@@ -0,0 +1,45 @@
#ifndef UNIFACE_TYPES_HPP_
#define UNIFACE_TYPES_HPP_

#include <array>
#include <vector>

#include <opencv2/core.hpp>

namespace uniface {

// Detected face with bbox, confidence, and 5-point landmarks
struct Face {
    cv::Rect2f bbox;
    float confidence;
    std::array<cv::Point2f, 5> landmarks;  // left_eye, right_eye, nose, left_mouth, right_mouth
};

// 512-dimensional face embedding
using Embedding = std::array<float, 512>;

// 106-point facial landmarks
struct Landmarks {
    std::array<cv::Point2f, 106> points;
};

// Configuration structs
struct DetectorConfig {
    float conf_thresh = 0.5f;
    float nms_thresh = 0.4f;
    cv::Size input_size = cv::Size(640, 640);
};

struct RecognizerConfig {
    float input_mean = 127.5f;
    float input_std = 127.5f;
    cv::Size input_size = cv::Size(112, 112);
};

struct LandmarkerConfig {
    cv::Size input_size = cv::Size(192, 192);
};

} // namespace uniface

#endif // UNIFACE_TYPES_HPP_
11
uniface-cpp/include/uniface/uniface.hpp
Normal file
@@ -0,0 +1,11 @@
#ifndef UNIFACE_HPP_
#define UNIFACE_HPP_

#include "uniface/analyzer.hpp"
#include "uniface/detector.hpp"
#include "uniface/landmarker.hpp"
#include "uniface/recognizer.hpp"
#include "uniface/types.hpp"
#include "uniface/utils.hpp"

#endif // UNIFACE_HPP_
58
uniface-cpp/include/uniface/utils.hpp
Normal file
@@ -0,0 +1,58 @@
#ifndef UNIFACE_UTILS_HPP_
#define UNIFACE_UTILS_HPP_

#include "uniface/types.hpp"

#include <array>
#include <cmath>

#include <opencv2/core.hpp>

namespace uniface {

// Reference 5-point landmarks for ArcFace alignment (112x112)
inline constexpr std::array<float, 10> kReferenceAlignment = {
    38.2946f, 51.6963f,  // left eye
    73.5318f, 51.5014f,  // right eye
    56.0252f, 71.7366f,  // nose
    41.5493f, 92.3655f,  // left mouth
    70.7299f, 92.2041f   // right mouth
};

// Align face using 5-point landmarks (default 112x112 for ArcFace)
[[nodiscard]] cv::Mat alignFace(
    const cv::Mat& image,
    const std::array<cv::Point2f, 5>& landmarks,
    cv::Size output_size = cv::Size(112, 112)
);

// Cosine similarity between embeddings, returns [-1, 1]
[[nodiscard]] float cosineSimilarity(const Embedding& a, const Embedding& b) noexcept;

// Apply 2x3 affine transform to points
template <size_t N>
[[nodiscard]] std::array<cv::Point2f, N> transformPoints2D(
    const std::array<cv::Point2f, N>& points, const cv::Mat& transform
) {
    std::array<cv::Point2f, N> result{};
    for (size_t i = 0; i < N; ++i) {
        const float x = points[i].x;
        const float y = points[i].y;
        result[i].x = static_cast<float>(
            transform.at<double>(0, 0) * x + transform.at<double>(0, 1) * y +
            transform.at<double>(0, 2)
        );
        result[i].y = static_cast<float>(
            transform.at<double>(1, 0) * x + transform.at<double>(1, 1) * y +
            transform.at<double>(1, 2)
        );
    }
    return result;
}

// Letterbox resize preserving aspect ratio, returns scale factor
[[nodiscard]] float letterboxResize(const cv::Mat& src, cv::Mat& dst, cv::Size target_size);

} // namespace uniface

#endif // UNIFACE_UTILS_HPP_
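The utilities above are the numerical core of recognition: warp the detected 5-point landmarks onto the fixed ArcFace template, then compare embeddings by cosine similarity. A Python sketch of both operations (the template coordinates are copied from `kReferenceAlignment`; fitting the transform with `cv2.estimateAffinePartial2D` is one common approach, not necessarily what `alignFace` does internally):

```python
import cv2
import numpy as np

# ArcFace 112x112 reference template, as in kReferenceAlignment above.
REFERENCE = np.array([
    [38.2946, 51.6963],  # left eye
    [73.5318, 51.5014],  # right eye
    [56.0252, 71.7366],  # nose
    [41.5493, 92.3655],  # left mouth
    [70.7299, 92.2041],  # right mouth
], dtype=np.float32)

def align_face(image: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
    """Warp a face so its 5 landmarks land on the reference template."""
    matrix, _ = cv2.estimateAffinePartial2D(landmarks.astype(np.float32), REFERENCE)
    return cv2.warpAffine(image, matrix, (112, 112))

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity in [-1, 1]; equals dot(a, b) for unit-norm inputs."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```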
0
uniface-cpp/models/.gitkeep
Normal file
55
uniface-cpp/src/analyzer.cpp
Normal file
@@ -0,0 +1,55 @@
#include "uniface/analyzer.hpp"

#include <stdexcept>

namespace uniface {

FaceAnalyzer& FaceAnalyzer::loadDetector(const std::string& path, const DetectorConfig& config) {
    detector_ = std::make_unique<RetinaFace>(
        path, config.conf_thresh, config.nms_thresh, config.input_size
    );
    return *this;
}

FaceAnalyzer& FaceAnalyzer::loadRecognizer(
    const std::string& path, const RecognizerConfig& config
) {
    recognizer_ = std::make_unique<ArcFace>(path, config);
    return *this;
}

FaceAnalyzer& FaceAnalyzer::loadLandmarker(
    const std::string& path, const LandmarkerConfig& config
) {
    landmarker_ = std::make_unique<Landmark106>(path, config);
    return *this;
}

std::vector<AnalyzedFace> FaceAnalyzer::analyze(const cv::Mat& image) {
    if (!detector_) {
        throw std::runtime_error("FaceAnalyzer: detector not loaded. Call loadDetector() first.");
    }

    auto faces = detector_->detect(image);

    std::vector<AnalyzedFace> results;
    results.reserve(faces.size());

    for (const auto& face : faces) {
        AnalyzedFace result;
        result.face = face;

        if (landmarker_) {
            result.landmarks = landmarker_->getLandmarks(image, face.bbox);
        }
        if (recognizer_) {
            result.embedding = recognizer_->getNormalizedEmbedding(image, face.landmarks);
        }

        results.push_back(std::move(result));
    }

    return results;
}

}  // namespace uniface
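A minimal usage sketch of the fluent loader API above. The model paths are placeholders, and it assumes `FaceAnalyzer` and the three config structs are default-constructible; `bbox` and `landmarks` are the only `Face` fields confirmed elsewhere in this diff:

```cpp
#include "uniface/analyzer.hpp"

#include <iostream>
#include <opencv2/imgcodecs.hpp>

int main() {
    uniface::FaceAnalyzer analyzer;
    analyzer.loadDetector("models/retinaface.onnx", uniface::DetectorConfig{})
        .loadLandmarker("models/landmark106.onnx", uniface::LandmarkerConfig{})
        .loadRecognizer("models/arcface.onnx", uniface::RecognizerConfig{});

    const cv::Mat image = cv::imread("face.jpg");
    if (image.empty()) return 1;

    // Each AnalyzedFace carries the detection plus optional landmarks/embedding.
    for (const auto& result : analyzer.analyze(image)) {
        std::cout << "face at (" << result.face.bbox.x << ", " << result.face.bbox.y << ")\n";
    }
    return 0;
}
```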
204
uniface-cpp/src/detector.cpp
Normal file
@@ -0,0 +1,204 @@
#include "uniface/detector.hpp"

#include "uniface/utils.hpp"

#include <cmath>
#include <iostream>

#include <opencv2/imgproc.hpp>

namespace uniface {

namespace {

// Model configuration constants
constexpr std::array<int, 3> kFeatureStrides = {8, 16, 32};
constexpr std::array<float, 2> kVariance = {0.1f, 0.2f};
constexpr int kNumLandmarks = 5;

// BGR mean values for image normalization
constexpr float kMeanB = 104.0f;
constexpr float kMeanG = 117.0f;
constexpr float kMeanR = 123.0f;

// Anchor min sizes for each feature map level
const std::vector<std::vector<int>> kMinSizes = {
    { 16, 32},
    { 64, 128},
    {256, 512}
};

}  // namespace

RetinaFace::RetinaFace(
    const std::string& model_path, float conf_thresh, float nms_thresh, cv::Size input_size
)
    : net_(cv::dnn::readNetFromONNX(model_path))
    , confidence_threshold_(conf_thresh)
    , nms_threshold_(nms_thresh)
    , input_size_(input_size) {
    generateAnchors();
}

void RetinaFace::generateAnchors() {
    anchors_.clear();

    size_t estimated_anchors = 0;
    for (size_t k = 0; k < kFeatureStrides.size(); ++k) {
        const int step = kFeatureStrides[k];
        const auto feature_h = static_cast<size_t>(
            std::ceil(static_cast<float>(input_size_.height) / static_cast<float>(step))
        );
        const auto feature_w = static_cast<size_t>(
            std::ceil(static_cast<float>(input_size_.width) / static_cast<float>(step))
        );
        estimated_anchors += feature_h * feature_w * kMinSizes[k].size();
    }
    anchors_.reserve(estimated_anchors);

    for (size_t k = 0; k < kFeatureStrides.size(); ++k) {
        const int step = kFeatureStrides[k];
        const int feature_h = static_cast<int>(
            std::ceil(static_cast<float>(input_size_.height) / static_cast<float>(step))
        );
        const int feature_w = static_cast<int>(
            std::ceil(static_cast<float>(input_size_.width) / static_cast<float>(step))
        );

        for (int i = 0; i < feature_h; ++i) {
            for (int j = 0; j < feature_w; ++j) {
                for (const int min_size : kMinSizes[k]) {
                    // Normalize x by input width and y by input height, matching the
                    // reference RetinaFace prior-box layout and the scale_w/scale_h
                    // decoding in detect() below.
                    const float s_kx = static_cast<float>(min_size) /
                                       static_cast<float>(input_size_.width);
                    const float s_ky = static_cast<float>(min_size) /
                                       static_cast<float>(input_size_.height);
                    const float cx = (static_cast<float>(j) + 0.5f) * static_cast<float>(step) /
                                     static_cast<float>(input_size_.width);
                    const float cy = (static_cast<float>(i) + 0.5f) * static_cast<float>(step) /
                                     static_cast<float>(input_size_.height);

                    anchors_.push_back({cx, cy, s_kx, s_ky});
                }
            }
        }
    }
}

std::vector<Face> RetinaFace::detect(const cv::Mat& image) {
    cv::Mat input_blob;
    const float resize_factor = letterboxResize(image, input_blob, input_size_);

    const cv::Mat blob = cv::dnn::blobFromImage(
        input_blob, 1.0, cv::Size(), cv::Scalar(kMeanB, kMeanG, kMeanR), false, false
    );

    net_.setInput(blob);
    const auto output_names = net_.getUnconnectedOutLayersNames();
    std::vector<cv::Mat> outputs;
    net_.forward(outputs, output_names);

    if (outputs.size() < 3) {
        std::cerr << "Error: Model output count mismatch. Expected at least 3, got "
                  << outputs.size() << std::endl;
        return {};
    }

    // Identify outputs by shape: loc(N,4), conf(N,2), landmarks(N,10)
    cv::Mat loc_output, conf_output, land_output;

    for (const auto& output : outputs) {
        switch (output.size[2]) {
            case 4: loc_output = output; break;
            case 2: conf_output = output; break;
            case 10: land_output = output; break;
            default: break;
        }
    }

    // Fall back to positional outputs
    if (loc_output.empty()) loc_output = outputs[0];
    if (conf_output.empty()) conf_output = outputs[1];
    if (land_output.empty()) land_output = outputs[2];

    const auto* loc_data = reinterpret_cast<const float*>(loc_output.data);
    const auto* conf_data = reinterpret_cast<const float*>(conf_output.data);
    const auto* land_data = reinterpret_cast<const float*>(land_output.data);
    const auto num_priors = static_cast<size_t>(loc_output.size[1]);

    if (num_priors != anchors_.size()) {
        std::cerr << "Error: Anchor count mismatch! Expected " << anchors_.size()
                  << " anchors but model output has " << num_priors << " priors.\n"
                  << "This usually means the input size doesn't match the model's "
                  << "expected size." << std::endl;
        return {};
    }

    std::vector<cv::Rect2f> decoded_boxes;
    std::vector<float> scores;
    std::vector<std::array<cv::Point2f, 5>> decoded_landmarks;
    decoded_boxes.reserve(num_priors);
    scores.reserve(num_priors);
    decoded_landmarks.reserve(num_priors);

    const auto scale_w = static_cast<float>(input_size_.width);
    const auto scale_h = static_cast<float>(input_size_.height);

    for (size_t i = 0; i < num_priors; ++i) {
        const float score = conf_data[i * 2 + 1];
        if (score < confidence_threshold_) continue;

        const float px = anchors_[i][0];
        const float py = anchors_[i][1];
        const float pw = anchors_[i][2];
        const float ph = anchors_[i][3];

        const float dx = loc_data[i * 4 + 0];
        const float dy = loc_data[i * 4 + 1];
        const float dw = loc_data[i * 4 + 2];
        const float dh = loc_data[i * 4 + 3];

        const float cx = px + dx * kVariance[0] * pw;
        const float cy = py + dy * kVariance[0] * ph;
        const float w = pw * std::exp(dw * kVariance[1]);
        const float h = ph * std::exp(dh * kVariance[1]);

        const float x1 = (cx - w / 2.0f) * scale_w / resize_factor;
        const float y1 = (cy - h / 2.0f) * scale_h / resize_factor;
        const float x2 = (cx + w / 2.0f) * scale_w / resize_factor;
        const float y2 = (cy + h / 2.0f) * scale_h / resize_factor;

        decoded_boxes.emplace_back(x1, y1, x2 - x1, y2 - y1);
        scores.push_back(score);

        std::array<cv::Point2f, 5> landmarks{};
        for (int k = 0; k < kNumLandmarks; ++k) {
            const float ldx = land_data[i * 10 + static_cast<size_t>(k) * 2 + 0];
            const float ldy = land_data[i * 10 + static_cast<size_t>(k) * 2 + 1];
            const float lx = (px + ldx * kVariance[0] * pw) * scale_w / resize_factor;
            const float ly = (py + ldy * kVariance[0] * ph) * scale_h / resize_factor;
            landmarks[static_cast<size_t>(k)] = cv::Point2f(lx, ly);
        }
        decoded_landmarks.push_back(landmarks);
    }

    // NMS
    std::vector<cv::Rect2d> boxes_for_nms;
    boxes_for_nms.reserve(decoded_boxes.size());
    for (const auto& box : decoded_boxes) {
        boxes_for_nms.emplace_back(box.x, box.y, box.width, box.height);
    }

    std::vector<int> nms_indices;
    cv::dnn::NMSBoxes(boxes_for_nms, scores, confidence_threshold_, nms_threshold_, nms_indices);

    std::vector<Face> results;
    results.reserve(nms_indices.size());
    for (const int idx : nms_indices) {
        const auto uidx = static_cast<size_t>(idx);
        results.push_back({decoded_boxes[uidx], scores[uidx], decoded_landmarks[uidx]});
    }

    return results;
}

}  // namespace uniface
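As a sanity check against the anchor-count mismatch error handled above: with strides {8, 16, 32}, two min-sizes per level, and a 640x640 input, the prior count is 80*80*2 + 40*40*2 + 20*20*2 = 16800. A standalone sketch of the same arithmetic (640 is an assumed input size, not a constant from this file):

```cpp
#include <array>
#include <cmath>
#include <cstdio>

int main() {
    constexpr std::array<int, 3> strides = {8, 16, 32};
    constexpr int min_sizes_per_level = 2;  // e.g. {16, 32} at stride 8
    constexpr int input = 640;

    long total = 0;
    for (const int step : strides) {
        // Feature-map cells per side, same ceil division as generateAnchors().
        const int cells = static_cast<int>(std::ceil(static_cast<double>(input) / step));
        total += static_cast<long>(cells) * cells * min_sizes_per_level;
    }
    std::printf("priors: %ld\n", total);  // prints 16800 for a 640x640 input
    return 0;
}
```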
90
uniface-cpp/src/landmarker.cpp
Normal file
@@ -0,0 +1,90 @@
#include "uniface/landmarker.hpp"

#include <cmath>

#include <opencv2/imgproc.hpp>

namespace uniface {

namespace {

constexpr int kNumLandmarks = 106;

cv::Mat computeCenterTransform(const cv::Point2f& center, float scale, int output_size) {
    cv::Mat transform = cv::Mat::zeros(2, 3, CV_64F);

    transform.at<double>(0, 0) = scale;
    transform.at<double>(1, 1) = scale;
    transform.at<double>(0, 2) = -center.x * scale + output_size / 2.0;
    transform.at<double>(1, 2) = -center.y * scale + output_size / 2.0;

    return transform;
}

}  // namespace

Landmark106::Landmark106(const std::string& model_path, const LandmarkerConfig& config)
    : net_(cv::dnn::readNetFromONNX(model_path))
    , config_(config) {}

cv::Mat Landmark106::preprocess(const cv::Mat& image, const cv::Rect2f& bbox, cv::Mat& transform) {
    const float width = bbox.width;
    const float height = bbox.height;
    const float center_x = bbox.x + width / 2.0f;
    const float center_y = bbox.y + height / 2.0f;

    const float max_dim = std::max(width, height);
    const float scale = static_cast<float>(config_.input_size.width) / (max_dim * 1.5f);

    transform = computeCenterTransform(cv::Point2f(center_x, center_y), scale, config_.input_size.width);

    cv::Mat aligned;
    cv::warpAffine(image, aligned, transform, config_.input_size, cv::INTER_LINEAR, cv::BORDER_CONSTANT);

    cv::Mat blob = cv::dnn::blobFromImage(aligned, 1.0, config_.input_size, cv::Scalar(0, 0, 0), true, false);

    return blob;
}

Landmarks Landmark106::postprocess(const cv::Mat& predictions, const cv::Mat& transform) {
    Landmarks result{};
    const auto* pred_data = reinterpret_cast<const float*>(predictions.data);

    cv::Mat inverse_transform;
    cv::invertAffineTransform(transform, inverse_transform);

    const int input_size = config_.input_size.width;
    const float half_size = static_cast<float>(input_size) / 2.0f;

    for (int i = 0; i < kNumLandmarks; ++i) {
        // Denormalize from [-1, 1] to pixel coordinates
        float x = (pred_data[i * 2 + 0] + 1.0f) * half_size;
        float y = (pred_data[i * 2 + 1] + 1.0f) * half_size;

        // Transform back to original image coordinates
        const float orig_x = static_cast<float>(
            inverse_transform.at<double>(0, 0) * x + inverse_transform.at<double>(0, 1) * y +
            inverse_transform.at<double>(0, 2)
        );
        const float orig_y = static_cast<float>(
            inverse_transform.at<double>(1, 0) * x + inverse_transform.at<double>(1, 1) * y +
            inverse_transform.at<double>(1, 2)
        );

        result.points[static_cast<size_t>(i)] = cv::Point2f(orig_x, orig_y);
    }

    return result;
}

Landmarks Landmark106::getLandmarks(const cv::Mat& image, const cv::Rect2f& bbox) {
    cv::Mat transform;
    cv::Mat blob = preprocess(image, bbox, transform);

    net_.setInput(blob);
    cv::Mat output = net_.forward();

    return postprocess(output, transform);
}

}  // namespace uniface
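Since the crop transform built by `computeCenterTransform` is a pure scale-plus-translation, `postprocess` can undo it exactly with `cv::invertAffineTransform`. A small round-trip sketch with illustrative numbers, not values taken from the library:

```cpp
#include <cstdio>

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

int main() {
    // Same construction as computeCenterTransform: scale, then re-center.
    const double scale = 0.5;
    const cv::Point2f center(200.0f, 150.0f);
    const int output_size = 192;

    cv::Mat t = cv::Mat::zeros(2, 3, CV_64F);
    t.at<double>(0, 0) = scale;
    t.at<double>(1, 1) = scale;
    t.at<double>(0, 2) = -center.x * scale + output_size / 2.0;
    t.at<double>(1, 2) = -center.y * scale + output_size / 2.0;

    // The bbox center maps to the crop center (96)...
    const double cx = t.at<double>(0, 0) * center.x + t.at<double>(0, 2);
    // ...and the inverse maps it back to the original x (200).
    cv::Mat inv;
    cv::invertAffineTransform(t, inv);
    const double back = inv.at<double>(0, 0) * cx + inv.at<double>(0, 2);
    std::printf("crop x: %.1f, original x: %.1f\n", cx, back);
    return 0;
}
```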
73
uniface-cpp/src/recognizer.cpp
Normal file
@@ -0,0 +1,73 @@
#include "uniface/recognizer.hpp"

#include "uniface/utils.hpp"

#include <cmath>

#include <opencv2/imgproc.hpp>

namespace uniface {

ArcFace::ArcFace(const std::string& model_path, const RecognizerConfig& config)
    : net_(cv::dnn::readNetFromONNX(model_path))
    , config_(config) {}

cv::Mat ArcFace::preprocess(const cv::Mat& face_image) {
    cv::Mat resized;
    if (face_image.size() != config_.input_size) {
        cv::resize(face_image, resized, config_.input_size);
    } else {
        resized = face_image;
    }

    // Normalize: (pixel - mean) / std, BGR -> RGB
    cv::Mat blob = cv::dnn::blobFromImage(
        resized, 1.0 / config_.input_std, config_.input_size,
        cv::Scalar(config_.input_mean, config_.input_mean, config_.input_mean), true, false
    );

    return blob;
}

Embedding ArcFace::getEmbedding(const cv::Mat& aligned_face) {
    cv::Mat blob = preprocess(aligned_face);

    net_.setInput(blob);
    cv::Mat output = net_.forward();

    Embedding embedding{};
    const auto* output_data = reinterpret_cast<const float*>(output.data);
    const size_t embedding_size = std::min(static_cast<size_t>(output.total()), embedding.size());

    for (size_t i = 0; i < embedding_size; ++i) {
        embedding[i] = output_data[i];
    }

    return embedding;
}

Embedding ArcFace::getEmbedding(const cv::Mat& image, const std::array<cv::Point2f, 5>& landmarks) {
    cv::Mat aligned = alignFace(image, landmarks, config_.input_size);
    return getEmbedding(aligned);
}

Embedding ArcFace::getNormalizedEmbedding(const cv::Mat& image, const std::array<cv::Point2f, 5>& landmarks) {
    Embedding embedding = getEmbedding(image, landmarks);

    // L2 normalize
    float norm = 0.0f;
    for (const float val : embedding) {
        norm += val * val;
    }
    norm = std::sqrt(norm);

    if (norm > 1e-8f) {
        for (float& val : embedding) {
            val /= norm;
        }
    }

    return embedding;
}

}  // namespace uniface
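A face-verification sketch combining `RetinaFace`, `ArcFace`, and `cosineSimilarity` from this port. The model paths are placeholders, `RecognizerConfig` is assumed to be default-constructible, and the 0.35 threshold is illustrative rather than a calibrated value:

```cpp
#include "uniface/detector.hpp"
#include "uniface/recognizer.hpp"
#include "uniface/utils.hpp"

#include <iostream>
#include <opencv2/imgcodecs.hpp>

int main() {
    uniface::RetinaFace detector("models/retinaface.onnx", 0.5f, 0.4f, cv::Size(640, 640));
    uniface::ArcFace recognizer("models/arcface.onnx", uniface::RecognizerConfig{});

    const cv::Mat img_a = cv::imread("person_a.jpg");
    const cv::Mat img_b = cv::imread("person_b.jpg");

    const auto faces_a = detector.detect(img_a);
    const auto faces_b = detector.detect(img_b);
    if (faces_a.empty() || faces_b.empty()) return 1;

    // Embeddings are L2-normalized, so cosine similarity is just the dot product.
    const auto emb_a = recognizer.getNormalizedEmbedding(img_a, faces_a[0].landmarks);
    const auto emb_b = recognizer.getNormalizedEmbedding(img_b, faces_b[0].landmarks);

    const float sim = uniface::cosineSimilarity(emb_a, emb_b);
    std::cout << (sim > 0.35f ? "same person" : "different person")
              << " (similarity " << sim << ")\n";
    return 0;
}
```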
82
uniface-cpp/src/utils.cpp
Normal file
@@ -0,0 +1,82 @@
#include "uniface/utils.hpp"

#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>

namespace uniface {

cv::Mat alignFace(const cv::Mat& image, const std::array<cv::Point2f, 5>& landmarks, cv::Size output_size) {
    const float ratio = static_cast<float>(output_size.width) / 112.0f;

    std::vector<cv::Point2f> dst_points(5);
    for (int i = 0; i < 5; ++i) {
        dst_points[i].x = kReferenceAlignment[static_cast<size_t>(i) * 2] * ratio;
        dst_points[i].y = kReferenceAlignment[static_cast<size_t>(i) * 2 + 1] * ratio;
    }

    std::vector<cv::Point2f> src_points(landmarks.begin(), landmarks.end());
    cv::Mat transform = cv::estimateAffinePartial2D(src_points, dst_points);

    if (transform.empty()) {
        cv::Mat resized;
        cv::resize(image, resized, output_size);
        return resized;
    }

    cv::Mat aligned;
    cv::warpAffine(image, aligned, transform, output_size, cv::INTER_LINEAR, cv::BORDER_CONSTANT);

    return aligned;
}

float cosineSimilarity(const Embedding& a, const Embedding& b) noexcept {
    float dot = 0.0f;
    float norm_a = 0.0f;
    float norm_b = 0.0f;

    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }

    const float denom = std::sqrt(norm_a) * std::sqrt(norm_b);
    if (denom < 1e-8f) {
        return 0.0f;
    }

    return dot / denom;
}

float letterboxResize(const cv::Mat& src, cv::Mat& dst, cv::Size target_size) {
    const auto src_height = static_cast<float>(src.rows);
    const auto src_width = static_cast<float>(src.cols);
    const auto target_height = static_cast<float>(target_size.height);
    const auto target_width = static_cast<float>(target_size.width);

    const float im_ratio = src_height / src_width;
    const float model_ratio = target_height / target_width;

    int new_width = 0;
    int new_height = 0;

    if (im_ratio > model_ratio) {
        new_height = static_cast<int>(target_height);
        new_width = static_cast<int>(static_cast<float>(new_height) / im_ratio);
    } else {
        new_width = static_cast<int>(target_width);
        new_height = static_cast<int>(static_cast<float>(new_width) * im_ratio);
    }

    const float resize_factor = static_cast<float>(new_height) / src_height;

    cv::Mat resized;
    cv::resize(src, resized, cv::Size(new_width, new_height));

    dst = cv::Mat::zeros(target_size, src.type());
    resized.copyTo(dst(cv::Rect(0, 0, new_width, new_height)));

    return resize_factor;
}

}  // namespace uniface
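Note that `letterboxResize` pads only on the right and bottom (the resized image is copied to the top-left corner), so mapping detections back to source coordinates needs no padding offset: dividing by the returned scale factor is enough, which is exactly what `RetinaFace::detect` does above. A sketch with an assumed 1920x1080 source:

```cpp
#include "uniface/utils.hpp"

#include <cstdio>
#include <opencv2/core.hpp>

int main() {
    cv::Mat src(1080, 1920, CV_8UC3, cv::Scalar(0, 0, 0));
    cv::Mat dst;
    const float scale = uniface::letterboxResize(src, dst, cv::Size(640, 640));
    // 1920x1080 fits as 640x360 in the 640x640 canvas, so scale == 360/1080 == 1/3.
    const float x_letterboxed = 320.0f;
    const float x_original = x_letterboxed / scale;  // == 960, the source-image center
    std::printf("scale=%.4f, x_original=%.1f\n", scale, x_original);
    return 0;
}
```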
@@ -20,7 +20,6 @@ This library provides unified APIs for:
- Facial landmarks (106-point detection)
- Face parsing (semantic segmentation)
- Gaze estimation
- Head pose estimation
- Age, gender, and emotion prediction
- Face anti-spoofing
- Privacy/anonymization
@@ -30,7 +29,7 @@ from __future__ import annotations

__license__ = 'MIT'
__author__ = 'Yakhyokhuja Valikhujaev'
__version__ = '3.2.0'
__version__ = '3.1.0'

import contextlib

@@ -39,7 +38,7 @@ from uniface.log import Logger, enable_logging
from uniface.model_store import download_models, get_cache_dir, set_cache_dir, verify_model_weights

from .analyzer import FaceAnalyzer
from .attribute import AgeGender, Emotion, FairFace, create_attribute_predictor
from .attribute import AgeGender, Emotion, FairFace
from .detection import (
    SCRFD,
    RetinaFace,
@@ -49,14 +48,13 @@ from .detection import (
    list_available_detectors,
)
from .gaze import MobileGaze, create_gaze_estimator
from .headpose import HeadPose, create_head_pose_estimator
from .landmark import Landmark106, create_landmarker
from .parsing import BiSeNet, XSeg, create_face_parser
from .privacy import BlurFace
from .recognition import AdaFace, ArcFace, MobileFace, SphereFace, create_recognizer
from .spoofing import MiniFASNet, create_spoofer
from .tracking import BYTETracker
from .types import AttributeResult, EmotionResult, Face, GazeResult, HeadPoseResult, SpoofingResult
from .types import AttributeResult, EmotionResult, Face, GazeResult, SpoofingResult

# Optional: FAISS vector store (requires `pip install faiss-cpu`)
with contextlib.suppress(ImportError):
@@ -74,7 +72,6 @@ __all__ = [
    'create_detector',
    'create_face_parser',
    'create_gaze_estimator',
    'create_head_pose_estimator',
    'create_landmarker',
    'create_recognizer',
    'create_spoofer',
@@ -94,16 +91,12 @@ __all__ = [
    # Gaze models
    'GazeResult',
    'MobileGaze',
    # Head pose models
    'HeadPose',
    'HeadPoseResult',
    # Parsing models
    'BiSeNet',
    'XSeg',
    # Attribute models
    'AgeGender',
    'AttributeResult',
    'create_attribute_predictor',
    'Emotion',
    'EmotionResult',
    'FairFace',
@@ -6,7 +6,8 @@ from __future__ import annotations

import numpy as np

from uniface.attribute.base import Attribute
from uniface.attribute.age_gender import AgeGender
from uniface.attribute.fairface import FairFace
from uniface.detection.base import BaseDetector
from uniface.log import Logger
from uniface.recognition.base import BaseRecognizer
@@ -20,24 +21,19 @@ class FaceAnalyzer:

    This class provides a high-level interface for face analysis by combining
    multiple components: face detection, recognition (embedding extraction),
    and an extensible list of attribute predictors (age, gender, race,
    emotion, etc.).

    Any :class:`~uniface.attribute.base.Attribute` subclass can be passed
    via the ``attributes`` list. Each predictor's ``predict(image, face)``
    is called once per detected face, enriching the :class:`Face` in-place.
    and attribute prediction (age, gender, race).

    Args:
        detector: Face detector instance for detecting faces in images.
        recognizer: Optional face recognizer for extracting embeddings.
        attributes: Optional list of ``Attribute`` predictors to run on
            each detected face (e.g. ``[AgeGender(), FairFace(), Emotion()]``).
        age_gender: Optional age/gender predictor.
        fairface: Optional FairFace predictor for demographics.

    Example:
        >>> from uniface import RetinaFace, ArcFace, AgeGender, FaceAnalyzer
        >>> from uniface import RetinaFace, ArcFace, FaceAnalyzer
        >>> detector = RetinaFace()
        >>> recognizer = ArcFace()
        >>> analyzer = FaceAnalyzer(detector, recognizer=recognizer, attributes=[AgeGender()])
        >>> analyzer = FaceAnalyzer(detector, recognizer=recognizer)
        >>> faces = analyzer.analyze(image)
    """
@@ -45,23 +41,27 @@ class FaceAnalyzer:
        self,
        detector: BaseDetector,
        recognizer: BaseRecognizer | None = None,
        attributes: list[Attribute] | None = None,
        age_gender: AgeGender | None = None,
        fairface: FairFace | None = None,
    ) -> None:
        self.detector = detector
        self.recognizer = recognizer
        self.attributes: list[Attribute] = attributes or []
        self.age_gender = age_gender
        self.fairface = fairface

        Logger.info(f'Initialized FaceAnalyzer with detector={detector.__class__.__name__}')
        if recognizer:
            Logger.info(f' - Recognition enabled: {recognizer.__class__.__name__}')
        for attr in self.attributes:
            Logger.info(f' - Attribute enabled: {attr.__class__.__name__}')
        if age_gender:
            Logger.info(f' - Age/Gender enabled: {age_gender.__class__.__name__}')
        if fairface:
            Logger.info(f' - FairFace enabled: {fairface.__class__.__name__}')

    def analyze(self, image: np.ndarray) -> list[Face]:
        """Analyze faces in an image.

        Performs face detection, optionally extracts embeddings, and runs
        every registered attribute predictor on each detected face.
        Performs face detection and optionally extracts embeddings and
        predicts attributes for each detected face.

        Args:
            image: Input image as numpy array with shape (H, W, C) in BGR format.

@@ -80,13 +80,24 @@ class FaceAnalyzer:
            except Exception as e:
                Logger.warning(f'  Face {idx + 1}: Failed to extract embedding: {e}')

            for attr in self.attributes:
                attr_name = attr.__class__.__name__
            if self.age_gender is not None:
                try:
                    attr.predict(image, face)
                    Logger.debug(f'  Face {idx + 1}: {attr_name} prediction succeeded')
                    result = self.age_gender.predict(image, face.bbox)
                    face.gender = result.gender
                    face.age = result.age
                    Logger.debug(f'  Face {idx + 1}: Age={face.age}, Gender={face.sex}')
                except Exception as e:
                    Logger.warning(f'  Face {idx + 1}: {attr_name} prediction failed: {e}')
                    Logger.warning(f'  Face {idx + 1}: Failed to predict age/gender: {e}')

            if self.fairface is not None:
                try:
                    result = self.fairface.predict(image, face.bbox)
                    face.gender = result.gender
                    face.age_group = result.age_group
                    face.race = result.race
                    Logger.debug(f'  Face {idx + 1}: AgeGroup={face.age_group}, Gender={face.sex}, Race={face.race}')
                except Exception as e:
                    Logger.warning(f'  Face {idx + 1}: Failed to predict FairFace attributes: {e}')

        Logger.info(f'Analysis complete: {len(faces)} face(s) processed')
        return faces
@@ -95,6 +106,8 @@ class FaceAnalyzer:
        parts = [f'FaceAnalyzer(detector={self.detector.__class__.__name__}']
        if self.recognizer:
            parts.append(f'recognizer={self.recognizer.__class__.__name__}')
        for attr in self.attributes:
            parts.append(f'{attr.__class__.__name__}')
        if self.age_gender:
            parts.append(f'age_gender={self.age_gender.__class__.__name__}')
        if self.fairface:
            parts.append(f'fairface={self.fairface.__class__.__name__}')
        return ', '.join(parts) + ')'
@@ -12,7 +12,7 @@ from uniface.attribute.age_gender import AgeGender
from uniface.attribute.base import Attribute
from uniface.attribute.fairface import FairFace
from uniface.constants import AgeGenderWeights, DDAMFNWeights, FairFaceWeights
from uniface.types import AttributeResult, EmotionResult, Face
from uniface.types import AttributeResult, EmotionResult

try:
    from uniface.attribute.emotion import Emotion
@@ -30,7 +30,7 @@ except ImportError:
        def _initialize_model(self) -> None: ...
        def preprocess(self, image: np.ndarray, *args: Any) -> Any: ...
        def postprocess(self, prediction: Any) -> Any: ...
        def predict(self, image: np.ndarray, face: Face) -> Any: ...
        def predict(self, image: np.ndarray, *args: Any) -> Any: ...


__all__ = [
@@ -12,7 +12,7 @@ from uniface.face_utils import bbox_center_alignment
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from uniface.types import AttributeResult, Face
from uniface.types import AttributeResult

__all__ = ['AgeGender']

@@ -133,20 +133,17 @@ class AgeGender(Attribute):
        age = int(np.round(prediction[2] * 100))
        return AttributeResult(gender=gender, age=age)

    def predict(self, image: np.ndarray, face: Face) -> AttributeResult:
        """Predict age and gender and enrich the Face in-place.
    def predict(self, image: np.ndarray, bbox: list | np.ndarray) -> AttributeResult:
        """
        Predicts age and gender for a single face specified by a bounding box.

        Args:
            image: The full input image in BGR format.
            face: Detected face; ``face.bbox`` is used for alignment.
            image (np.ndarray): The full input image in BGR format.
            bbox (Union[List, np.ndarray]): The face bounding box coordinates [x1, y1, x2, y2].

        Returns:
            ``AttributeResult`` with gender (0=Female, 1=Male) and age (years).
            AttributeResult: Result containing gender (0=Female, 1=Male) and age (in years).
        """
        face_blob = self.preprocess(image, face.bbox)
        face_blob = self.preprocess(image, bbox)
        prediction = self.session.run(self.output_names, {self.input_name: face_blob})[0][0]
        result = self.postprocess(prediction)

        face.gender = result.gender
        face.age = result.age
        return result
        return self.postprocess(prediction)
@@ -2,78 +2,95 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any

import numpy as np

from uniface.types import AttributeResult, EmotionResult, Face
from uniface.types import AttributeResult, EmotionResult

__all__ = ['Attribute', 'AttributeResult', 'EmotionResult']


class Attribute(ABC):
    """Abstract base class for face attribute models.
    """
    Abstract base class for face attribute models.

    All attribute models (age-gender, emotion, FairFace, etc.) implement this
    interface so they can be used interchangeably inside ``FaceAnalyzer``.

    The ``predict`` method accepts an image and a :class:`Face` object. Each
    subclass extracts what it needs (bbox, landmarks) from the Face, runs
    inference, writes the results back to the Face **and** returns a typed
    result dataclass.
    This class defines the common interface that all attribute models
    (e.g., age-gender, emotion) must implement. It ensures a consistent API
    across different attribute prediction modules in the library, making them
    interchangeable and easy to use.
    """

    @abstractmethod
    def _initialize_model(self) -> None:
        """Load model weights and create the inference session."""
        """
        Initializes the underlying model for inference.

        This method should handle loading model weights, creating the
        inference session (e.g., ONNX Runtime, PyTorch), and any necessary
        warm-up procedures to prepare the model for prediction.
        """
        raise NotImplementedError('Subclasses must implement the _initialize_model method.')

    @abstractmethod
    def preprocess(self, image: np.ndarray, *args: Any) -> Any:
        """Preprocess the input data for the model.
        """
        Preprocesses the input data for the model.

        This method should take a raw image and any other necessary data
        (like bounding boxes or landmarks) and convert it into the format
        expected by the model's inference engine (e.g., a blob or tensor).

        Args:
            image: The input image in BGR format.
            *args: Subclass-specific data (bbox, landmarks, etc.).
            image (np.ndarray): The input image containing the face, typically
                in BGR format.
            *args: Additional arguments required for preprocessing, such as
                bounding boxes or facial landmarks.

        Returns:
            Preprocessed data ready for model inference.
            The preprocessed data ready for model inference.
        """
        raise NotImplementedError('Subclasses must implement the preprocess method.')

    @abstractmethod
    def postprocess(self, prediction: Any) -> Any:
        """Convert raw model output into a typed result dataclass.
        """
        Postprocesses the raw model output into a human-readable format.

        This method takes the raw output from the model's inference and
        converts it into a meaningful result, such as an age value, a gender
        label, or an emotion category.

        Args:
            prediction: Raw output from the model.
            prediction (Any): The raw output from the model's inference.

        Returns:
            An ``AttributeResult`` or ``EmotionResult``.
            The final, processed attributes.
        """
        raise NotImplementedError('Subclasses must implement the postprocess method.')

    @abstractmethod
    def predict(self, image: np.ndarray, face: Face) -> AttributeResult | EmotionResult:
        """Run end-to-end prediction and enrich the Face in-place.
    def predict(self, image: np.ndarray, *args: Any) -> Any:
        """
        Performs end-to-end attribute prediction on a given image.

        Each subclass extracts what it needs from *face* (e.g. ``face.bbox``
        or ``face.landmarks``), runs the full preprocess-infer-postprocess
        pipeline, writes relevant fields back to *face*, and returns the
        result dataclass.
        This method orchestrates the full pipeline: it calls the preprocess,
        inference, and postprocess steps to return the final, user-friendly
        attribute prediction.

        Args:
            image: The full input image in BGR format.
            face: Detected face whose attribute fields will be populated.
            image (np.ndarray): The input image containing the face.
            *args: Additional data required for prediction, such as a bounding
                box or landmarks.

        Returns:
            The prediction result (``AttributeResult`` or ``EmotionResult``).
            The final predicted attributes.
        """
        raise NotImplementedError('Subclasses must implement the predict method.')

    def __call__(self, image: np.ndarray, face: Face) -> AttributeResult | EmotionResult:
        """Callable shortcut for :meth:`predict`."""
        return self.predict(image, face)
    def __call__(self, *args, **kwargs) -> Any:
        """
        Provides a convenient, callable shortcut for the `predict` method.
        """
        return self.predict(*args, **kwargs)
@@ -12,7 +12,7 @@ from uniface.constants import DDAMFNWeights
from uniface.face_utils import face_alignment
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.types import EmotionResult, Face
from uniface.types import EmotionResult

__all__ = ['Emotion']

@@ -116,23 +116,14 @@ class Emotion(Attribute):
        confidence = float(probabilities[pred_index])
        return EmotionResult(emotion=emotion_label, confidence=confidence)

    def predict(self, image: np.ndarray, face: Face) -> EmotionResult:
        """Predict emotion and enrich the Face in-place.

        Args:
            image: The full input image in BGR format.
            face: Detected face; ``face.landmarks`` is used for alignment.

        Returns:
            ``EmotionResult`` with emotion label and confidence score.
    def predict(self, image: np.ndarray, landmark: list | np.ndarray) -> EmotionResult:
        """
        input_tensor = self.preprocess(image, face.landmarks)
        Predicts the emotion from a single face specified by its landmarks.
        """
        input_tensor = self.preprocess(image, landmark)
        with torch.no_grad():
            output = self.model(input_tensor)
            if isinstance(output, tuple):
                output = output[0]

        result = self.postprocess(output)
        face.emotion = result.emotion
        face.emotion_confidence = result.confidence
        return result
        return self.postprocess(output)
@@ -11,7 +11,7 @@ from uniface.constants import FairFaceWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from uniface.types import AttributeResult, Face
from uniface.types import AttributeResult

__all__ = ['AGE_LABELS', 'RACE_LABELS', 'FairFace']

@@ -168,24 +168,29 @@ class FairFace(Attribute):
            race=RACE_LABELS[race_idx],
        )

    def predict(self, image: np.ndarray, face: Face) -> AttributeResult:
        """Predict race, gender, and age and enrich the Face in-place.
    def predict(self, image: np.ndarray, bbox: list | np.ndarray | None = None) -> AttributeResult:
        """
        Predicts race, gender, and age for a face.

        Args:
            image: The full input image in BGR format.
            face: Detected face; ``face.bbox`` is used for cropping.
            image (np.ndarray): The input image in BGR format.
            bbox (Optional[Union[List, np.ndarray]]): Face bounding box [x1, y1, x2, y2].
                If None, uses the entire image.

        Returns:
            ``AttributeResult`` with gender, age_group, and race.
            AttributeResult: Result containing:
                - gender: 0=Female, 1=Male
                - age_group: Age range string like "20-29"
                - race: Race/ethnicity label
        """
        input_blob = self.preprocess(image, face.bbox)
        outputs = self.session.run(self.output_names, {self.input_name: input_blob})
        result = self.postprocess(outputs)
        # Preprocess
        input_blob = self.preprocess(image, bbox)

        face.gender = result.gender
        face.age_group = result.age_group
        face.race = result.race
        return result
        # Inference
        outputs = self.session.run(self.output_names, {self.input_name: input_blob})

        # Postprocess
        return self.postprocess(outputs)

    @staticmethod
    def _softmax(x: np.ndarray) -> np.ndarray:
@@ -16,7 +16,6 @@ __all__ = [
    'distance2bbox',
    'distance2kps',
    'generate_anchors',
    'letterbox_resize',
    'non_max_suppression',
    'resize_image',
    'xyxy_to_cxcywh',
@@ -278,70 +277,3 @@ def distance2kps(
        preds.append(px)
        preds.append(py)
    return np.stack(preds, axis=-1)


def letterbox_resize(
    image: np.ndarray,
    target_size: int,
    fill_value: int = 114,
) -> tuple[np.ndarray, float, tuple[int, int]]:
    """Letterbox resize with center padding for YOLO-style detectors.

    Maintains aspect ratio by scaling the image to fit within target_size,
    then center-pads with a constant fill value. Converts BGR to RGB,
    normalizes to [0, 1], and transposes to NCHW format.

    This preprocessing strategy is standard for YOLO models and ensures
    no distortion while maintaining a square input size.

    Args:
        image: Input image in BGR format with shape (H, W, C).
        target_size: Target square size (e.g., 640 for 640x640 input).
        fill_value: Padding fill value (default: 114 for gray background).

    Returns:
        Tuple of (preprocessed_tensor, scale_ratio, padding):
            - preprocessed_tensor: Shape (1, 3, target_size, target_size),
              RGB, normalized [0, 1], NCHW format, float32, contiguous.
            - scale_ratio: Resize scale factor for coordinate transformation.
            - padding: Padding offsets as (pad_w, pad_h) for coordinate transformation.

    Example:
        >>> image = cv2.imread('face.jpg')  # (480, 640, 3)
        >>> tensor, scale, (pad_w, pad_h) = letterbox_resize(image, 640)
        >>> tensor.shape
        (1, 3, 640, 640)
        >>> # To transform coordinates back to original:
        >>> x_orig = (x_detected - pad_w) / scale
        >>> y_orig = (y_detected - pad_h) / scale
    """
    # Get original image shape
    img_h, img_w = image.shape[:2]

    # Calculate scale ratio to fit within target_size
    scale = min(target_size / img_h, target_size / img_w)
    new_h, new_w = int(img_h * scale), int(img_w * scale)

    # Resize image maintaining aspect ratio
    img_resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    # Create padded canvas with fill_value
    img_padded = np.full((target_size, target_size, 3), fill_value, dtype=np.uint8)

    # Calculate padding to center the image
    pad_h = (target_size - new_h) // 2
    pad_w = (target_size - new_w) // 2

    # Place resized image in center of canvas
    img_padded[pad_h : pad_h + new_h, pad_w : pad_w + new_w] = img_resized

    # Convert BGR to RGB and normalize to [0, 1]
    img_rgb = cv2.cvtColor(img_padded, cv2.COLOR_BGR2RGB)
    img_normalized = img_rgb.astype(np.float32) / 255.0

    # Transpose to CHW format and add batch dimension (NCHW)
    img_transposed = np.transpose(img_normalized, (2, 0, 1))
    img_batch = np.expand_dims(img_transposed, axis=0)
    img_batch = np.ascontiguousarray(img_batch)

    return img_batch, scale, (pad_w, pad_h)
@@ -156,20 +156,6 @@ class GazeWeights(str, Enum):
    MOBILEONE_S0 = "gaze_mobileone_s0"


class HeadPoseWeights(str, Enum):
    """
    Head pose estimation models using 6D rotation representation.
    Trained on 300W-LP dataset, evaluated on AFLW2000.
    https://github.com/yakhyo/head-pose-estimation
    """
    RESNET18 = "headpose_resnet18"
    RESNET34 = "headpose_resnet34"
    RESNET50 = "headpose_resnet50"
    MOBILENET_V2 = "headpose_mobilenetv2"
    MOBILENET_V3_SMALL = "headpose_mobilenetv3_small"
    MOBILENET_V3_LARGE = "headpose_mobilenetv3_large"


class ParsingWeights(str, Enum):
    """
    Face Parsing: Semantic Segmentation of Facial Components.
@@ -362,32 +348,6 @@ MODEL_REGISTRY: dict[Enum, ModelInfo] = {
        sha256='8b4fdc4e3da44733c9a82e7776b411e4a39f94e8e285aee0fc85a548a55f7d9f'
    ),

    # Head Pose
    HeadPoseWeights.RESNET18: ModelInfo(
        url='https://github.com/yakhyo/head-pose-estimation/releases/download/weights/resnet18.onnx',
        sha256='61c34e877989412980d1ea80c52391250b074abc00d19a6100de5c8e999212ee'
    ),
    HeadPoseWeights.RESNET34: ModelInfo(
        url='https://github.com/yakhyo/head-pose-estimation/releases/download/weights/resnet34.onnx',
        sha256='8da9f2ce4810298ebea68bd85fba1b6bd11716060c10534596f46be52cc908c9'
    ),
    HeadPoseWeights.RESNET50: ModelInfo(
        url='https://github.com/yakhyo/head-pose-estimation/releases/download/weights/resnet50.onnx',
        sha256='50c74d57b7663361b8ede83b0e4122546171119ef502ec55b790dbd7fc360260'
    ),
    HeadPoseWeights.MOBILENET_V2: ModelInfo(
        url='https://github.com/yakhyo/head-pose-estimation/releases/download/weights/mobilenetv2.onnx',
        sha256='1e902872868e483bd0e4f8f4a8ff2a4d61c2ccbca9dadf748e5479b5cc86a9e9'
    ),
    HeadPoseWeights.MOBILENET_V3_SMALL: ModelInfo(
        url='https://github.com/yakhyo/head-pose-estimation/releases/download/weights/mobilenetv3_small.onnx',
        sha256='e8ae4d932b3d13221638fc72e171603e020c6da28b770753f76146867f40e190'
    ),
    HeadPoseWeights.MOBILENET_V3_LARGE: ModelInfo(
        url='https://github.com/yakhyo/head-pose-estimation/releases/download/weights/mobilenetv3_large.onnx',
        sha256='3a68815fa00aba41ddc4e014bf631b637caba8619df71160383f1fee8c15a3c9'
    ),

    # Parsing
    ParsingWeights.RESNET18: ModelInfo(
        url='https://github.com/yakhyo/face-parsing/releases/download/weights/resnet18.onnx',
@@ -5,7 +5,7 @@
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Literal
from typing import Any

import numpy as np

@@ -119,77 +119,3 @@ class BaseDetector(ABC):
            List of detected Face objects.
        """
        return self.detect(image, **kwargs)

    def _select_top_detections(
        self,
        detections: np.ndarray,
        landmarks: np.ndarray,
        max_num: int,
        original_shape: tuple[int, int],
        metric: Literal['default', 'max'] = 'max',
        center_weight: float = 2.0,
    ) -> tuple[np.ndarray, np.ndarray]:
        """Filter detections to keep only top max_num faces.

        Ranks faces by area and/or distance from image center, then selects
        the top max_num detections.

        Args:
            detections: Array of shape (N, 5) as [x1, y1, x2, y2, confidence].
            landmarks: Array of shape (N, 5, 2) for 5-point landmarks.
            max_num: Maximum number of faces to keep. If 0 or >= N, returns all.
            original_shape: Original image shape as (height, width).
            metric: Ranking metric:
                - 'max': Rank by bounding box area only.
                - 'default': Rank by area minus center distance penalty.
            center_weight: Weight for center distance penalty (only used with 'default' metric).

        Returns:
            Filtered (detections, landmarks) tuple with at most max_num faces.
        """
        if max_num <= 0 or detections.shape[0] <= max_num:
            return detections, landmarks

        # Calculate bounding box areas
        area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])

        # Calculate offsets from image center
        center_y, center_x = original_shape[0] // 2, original_shape[1] // 2
        offsets = np.vstack(
            [
                (detections[:, 0] + detections[:, 2]) / 2 - center_x,
                (detections[:, 1] + detections[:, 3]) / 2 - center_y,
            ]
        )
        offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)

        # Calculate ranking scores based on metric
        if metric == 'max':
            scores = area
        else:
            scores = area - offset_dist_squared * center_weight

        # Select top max_num by score
        top_indices = np.argsort(scores)[::-1][:max_num]
        return detections[top_indices], landmarks[top_indices]

    @staticmethod
    def _detections_to_faces(detections: np.ndarray, landmarks: np.ndarray) -> list[Face]:
        """Convert detection arrays to Face objects.

        Args:
            detections: Array of shape (N, 5) as [x1, y1, x2, y2, confidence].
            landmarks: Array of shape (N, 5, 2) for 5-point landmarks.

        Returns:
            List of Face objects.
        """
        faces = []
        for i in range(detections.shape[0]):
            face = Face(
                bbox=detections[i, :4],
                confidence=float(detections[i, 4]),
                landmarks=landmarks[i],
            )
            faces.append(face)
        return faces
@@ -208,12 +208,42 @@ class RetinaFace(BaseDetector):
        # Postprocessing
        detections, landmarks = self.postprocess(outputs, resize_factor, shape=(width, height))

        # Filter to top max_num faces if requested
        detections, landmarks = self._select_top_detections(
            detections, landmarks, max_num, (original_height, original_width), metric, center_weight
        )
        if max_num > 0 and detections.shape[0] > max_num:
            # Calculate area of detections
            areas = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])

        return self._detections_to_faces(detections, landmarks)
            # Calculate offsets from image center
            center = (original_height // 2, original_width // 2)
            offsets = np.vstack(
                [
                    (detections[:, 0] + detections[:, 2]) / 2 - center[1],
                    (detections[:, 1] + detections[:, 3]) / 2 - center[0],
                ]
            )
            offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)

            # Calculate scores based on the chosen metric
            if metric == 'max':
                scores = areas
            else:
                scores = areas - offset_dist_squared * center_weight

            # Sort by scores and select top `max_num`
            sorted_indices = np.argsort(scores)[::-1][:max_num]

            detections = detections[sorted_indices]
            landmarks = landmarks[sorted_indices]

        faces = []
        for i in range(detections.shape[0]):
            face = Face(
                bbox=detections[i, :4],
                confidence=float(detections[i, 4]),
                landmarks=landmarks[i],
            )
            faces.append(face)

        return faces

    def postprocess(
        self,
@@ -272,9 +272,38 @@ class SCRFD(BaseDetector):
        landmarks = landmarks[order, :, :]
        landmarks = landmarks[keep, :, :].astype(np.float32)

        # Filter to top max_num faces if requested
        detections, landmarks = self._select_top_detections(
            detections, landmarks, max_num, (original_height, original_width), metric, center_weight
        )
        if 0 < max_num < detections.shape[0]:
            # Calculate area of detections
            area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])

        return self._detections_to_faces(detections, landmarks)
            # Calculate offsets from image center
            center = (original_height // 2, original_width // 2)
            offsets = np.vstack(
                [
                    (detections[:, 0] + detections[:, 2]) / 2 - center[1],
                    (detections[:, 1] + detections[:, 3]) / 2 - center[0],
                ]
            )

            # Calculate scores based on the chosen metric
            offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
            if metric == 'max':
                values = area
            else:
                values = area - offset_dist_squared * center_weight

            # Sort by scores and select top `max_num`
            sorted_indices = np.argsort(values)[::-1][:max_num]
            detections = detections[sorted_indices]
            landmarks = landmarks[sorted_indices]

        faces = []
        for i in range(detections.shape[0]):
            face = Face(
                bbox=detections[i, :4],
                confidence=float(detections[i, 4]),
                landmarks=landmarks[i],
            )
            faces.append(face)

        return faces
@@ -4,9 +4,10 @@

from typing import Any, Literal

import cv2
import numpy as np

from uniface.common import letterbox_resize, non_max_suppression
from uniface.common import non_max_suppression
from uniface.constants import YOLOv5FaceWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
@@ -139,15 +140,45 @@ class YOLOv5Face(BaseDetector):
            raise RuntimeError(f"Failed to initialize model session for '{model_path}'") from e

    def preprocess(self, image: np.ndarray) -> tuple[np.ndarray, float, tuple[int, int]]:
        """Preprocess image using letterbox resize.
        """
        Preprocess image for inference.

        Args:
            image: Input image in BGR format.
            image (np.ndarray): Input image (BGR format)

        Returns:
            Tuple of (preprocessed_tensor, scale_ratio, padding).
            Tuple[np.ndarray, float, Tuple[int, int]]: Preprocessed image, scale ratio, and padding
        """
        return letterbox_resize(image, self.input_size)
        # Get original image shape
        img_h, img_w = image.shape[:2]

        # Calculate scale ratio
        scale = min(self.input_size / img_h, self.input_size / img_w)
        new_h, new_w = int(img_h * scale), int(img_w * scale)

        # Resize image
        img_resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

        # Create padded image
        img_padded = np.full((self.input_size, self.input_size, 3), 114, dtype=np.uint8)

        # Calculate padding
        pad_h = (self.input_size - new_h) // 2
        pad_w = (self.input_size - new_w) // 2

        # Place resized image in center
        img_padded[pad_h : pad_h + new_h, pad_w : pad_w + new_w] = img_resized

        # Convert to RGB and normalize
        img_rgb = cv2.cvtColor(img_padded, cv2.COLOR_BGR2RGB)
        img_normalized = img_rgb.astype(np.float32) / 255.0

        # Transpose to CHW format (HWC -> CHW) and add batch dimension
        img_transposed = np.transpose(img_normalized, (2, 0, 1))
        img_batch = np.expand_dims(img_transposed, axis=0)
        img_batch = np.ascontiguousarray(img_batch)

        return img_batch, scale, (pad_w, pad_h)

    def inference(self, input_tensor: np.ndarray) -> list[np.ndarray]:
        """Perform model inference on the preprocessed image tensor.
@@ -306,9 +337,38 @@ class YOLOv5Face(BaseDetector):
        if len(detections) == 0:
            return []

        # Filter to top max_num faces if requested
        detections, landmarks = self._select_top_detections(
            detections, landmarks, max_num, (original_height, original_width), metric, center_weight
        )
        if 0 < max_num < detections.shape[0]:
            # Calculate area of detections
            area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])

        return self._detections_to_faces(detections, landmarks)
            # Calculate offsets from image center
            center = (original_height // 2, original_width // 2)
            offsets = np.vstack(
                [
                    (detections[:, 0] + detections[:, 2]) / 2 - center[1],
                    (detections[:, 1] + detections[:, 3]) / 2 - center[0],
                ]
            )

            # Calculate scores based on the chosen metric
            offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
            if metric == 'max':
                values = area
            else:
                values = area - offset_dist_squared * center_weight

            # Sort by scores and select top `max_num`
            sorted_indices = np.argsort(values)[::-1][:max_num]
            detections = detections[sorted_indices]
            landmarks = landmarks[sorted_indices]

        faces = []
        for i in range(detections.shape[0]):
            face = Face(
                bbox=detections[i, :4],
                confidence=float(detections[i, 4]),
                landmarks=landmarks[i],
            )
            faces.append(face)

        return faces
@@ -11,9 +11,10 @@ Reference: https://github.com/yakhyo/yolov8-face-onnx-inference

from typing import Any, Literal

import cv2
import numpy as np

from uniface.common import letterbox_resize, non_max_suppression
from uniface.common import non_max_suppression
from uniface.constants import YOLOv8FaceWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
@@ -150,15 +151,45 @@ class YOLOv8Face(BaseDetector):
            raise RuntimeError(f"Failed to initialize model session for '{model_path}'") from e

    def preprocess(self, image: np.ndarray) -> tuple[np.ndarray, float, tuple[int, int]]:
        """Preprocess image using letterbox resize.
        """
        Preprocess image for inference (letterbox resize with center padding).

        Args:
            image: Input image in BGR format.
            image (np.ndarray): Input image (BGR format)

        Returns:
            Tuple of (preprocessed_tensor, scale_ratio, padding).
            Tuple[np.ndarray, float, Tuple[int, int]]: Preprocessed image, scale ratio, and padding (pad_w, pad_h)
        """
        return letterbox_resize(image, self.input_size)
        # Get original image shape
        img_h, img_w = image.shape[:2]

        # Calculate scale ratio
        scale = min(self.input_size / img_h, self.input_size / img_w)
        new_h, new_w = int(img_h * scale), int(img_w * scale)

        # Resize image
        img_resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

        # Create padded image with gray background (114, 114, 114)
        img_padded = np.full((self.input_size, self.input_size, 3), 114, dtype=np.uint8)

        # Calculate padding (center the image)
        pad_h = (self.input_size - new_h) // 2
        pad_w = (self.input_size - new_w) // 2

        # Place resized image in center
        img_padded[pad_h : pad_h + new_h, pad_w : pad_w + new_w] = img_resized

        # Convert BGR to RGB and normalize
        img_rgb = cv2.cvtColor(img_padded, cv2.COLOR_BGR2RGB)
        img_normalized = img_rgb.astype(np.float32) / 255.0

        # Transpose to CHW format (HWC -> CHW) and add batch dimension
        img_transposed = np.transpose(img_normalized, (2, 0, 1))
        img_batch = np.expand_dims(img_transposed, axis=0)
        img_batch = np.ascontiguousarray(img_batch)

        return img_batch, scale, (pad_w, pad_h)

    def inference(self, input_tensor: np.ndarray) -> list[np.ndarray]:
        """Perform model inference on the preprocessed image tensor.
@@ -356,9 +387,38 @@ class YOLOv8Face(BaseDetector):
        if len(detections) == 0:
            return []

        # Filter to top max_num faces if requested
        detections, landmarks = self._select_top_detections(
            detections, landmarks, max_num, (original_height, original_width), metric, center_weight
        )
        if 0 < max_num < detections.shape[0]:
            # Calculate area of detections
            area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])

        return self._detections_to_faces(detections, landmarks)
            # Calculate offsets from image center
            center = (original_height // 2, original_width // 2)
            offsets = np.vstack(
                [
                    (detections[:, 0] + detections[:, 2]) / 2 - center[1],
                    (detections[:, 1] + detections[:, 3]) / 2 - center[0],
                ]
            )

            # Calculate scores based on the chosen metric
            offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
            if metric == 'max':
                values = area
            else:
                values = area - offset_dist_squared * center_weight

            # Sort by scores and select top `max_num`
            sorted_indices = np.argsort(values)[::-1][:max_num]
            detections = detections[sorted_indices]
            landmarks = landmarks[sorted_indices]

        faces = []
        for i in range(detections.shape[0]):
            face = Face(
                bbox=detections[i, :4],
                confidence=float(detections[i, 4]),
                landmarks=landmarks[i],
            )
            faces.append(face)

        return faces
209
uniface/draw.py
@@ -21,9 +21,6 @@ __all__ = [
    'draw_corner_bbox',
    'draw_detections',
    'draw_gaze',
    'draw_head_pose',
    'draw_head_pose_axis',
    'draw_head_pose_cube',
    'draw_text_label',
    'draw_tracks',
    'vis_parsing_maps',
@@ -359,212 +356,6 @@ def draw_gaze(
    )


def draw_head_pose_cube(
    image: np.ndarray,
    yaw: float,
    pitch: float,
    roll: float,
    bbox: list[int] | np.ndarray,
    size: int | None = None,
) -> None:
    """Draw a 3D wireframe cube representing head orientation on an image.

    Projects a 3D cube onto the image plane based on yaw, pitch, and roll
    angles, centered on the face bounding box.

    Modifies the image in-place.

    Args:
        image: Input image to draw on (modified in-place).
        yaw: Yaw angle in degrees.
        pitch: Pitch angle in degrees.
        roll: Roll angle in degrees.
        bbox: Bounding box as ``[x_min, y_min, x_max, y_max]``.
        size: Cube size in pixels. If None, uses the bounding box width.

    Example:
        >>> from uniface.draw import draw_head_pose_cube
        >>> draw_head_pose_cube(image, yaw=10.0, pitch=-5.0, roll=2.0, bbox=[100, 100, 250, 280])
    """
    x_min, y_min, x_max, y_max = map(int, bbox[:4])
    if size is None:
        size = x_max - x_min

    h = size * 0.5
    yaw_r, pitch_r, roll_r = np.radians([-yaw, pitch, roll])

    cx = (x_min + x_max) * 0.5
    cy = (y_min + y_max) * 0.5

    cos_y, sin_y = np.cos(yaw_r), np.sin(yaw_r)
    cos_p, sin_p = np.cos(pitch_r), np.sin(pitch_r)
    cos_r, sin_r = np.cos(roll_r), np.sin(roll_r)

    ex = np.array([cos_y * cos_r, cos_p * sin_r + cos_r * sin_p * sin_y])
    ey = np.array([-cos_y * sin_r, cos_p * cos_r - sin_p * sin_y * sin_r])
    ez = np.array([sin_y, -cos_y * sin_p])

    center = np.array([cx, cy])

    def _pt(v: np.ndarray) -> tuple[int, int]:
        return (int(v[0]), int(v[1]))

    f0 = center + h * (-ex - ey - ez)
    f1 = center + h * (+ex - ey - ez)
    f2 = center + h * (+ex + ey - ez)
    f3 = center + h * (-ex + ey - ez)
    b0 = center + h * (-ex - ey + ez)
    b1 = center + h * (+ex - ey + ez)
    b2 = center + h * (+ex + ey + ez)
    b3 = center + h * (-ex + ey + ez)

    red = (0, 0, 255)
    green = (0, 255, 0)
    blue = (255, 0, 0)

    # Front face at head (red)
    cv2.line(image, _pt(f0), _pt(f1), red, 2)
    cv2.line(image, _pt(f1), _pt(f2), red, 2)
    cv2.line(image, _pt(f2), _pt(f3), red, 2)
    cv2.line(image, _pt(f3), _pt(f0), red, 2)

    # Back face in looking direction (green)
    cv2.line(image, _pt(b0), _pt(b1), green, 2)
    cv2.line(image, _pt(b1), _pt(b2), green, 2)
    cv2.line(image, _pt(b2), _pt(b3), green, 2)
    cv2.line(image, _pt(b3), _pt(b0), green, 2)

    # Side edges (blue)
    cv2.line(image, _pt(f0), _pt(b0), blue, 2)
    cv2.line(image, _pt(f1), _pt(b1), blue, 2)
    cv2.line(image, _pt(f2), _pt(b2), blue, 2)
    cv2.line(image, _pt(f3), _pt(b3), blue, 2)


def draw_head_pose_axis(
    image: np.ndarray,
    yaw: float,
    pitch: float,
    roll: float,
    bbox: list[int] | np.ndarray,
    size_ratio: float = 0.5,
) -> None:
    """Draw 3D coordinate axes representing head orientation on an image.

    Draws X (red), Y (green), and Z (blue) axes from the center of the
    bounding box, rotated according to yaw, pitch, and roll.

    Modifies the image in-place.

    Args:
        image: Input image to draw on (modified in-place).
        yaw: Yaw angle in degrees.
        pitch: Pitch angle in degrees.
        roll: Roll angle in degrees.
        bbox: Bounding box as ``[x_min, y_min, x_max, y_max]``.
        size_ratio: Axis length as a fraction of bbox size. Defaults to 0.5.

    Example:
        >>> from uniface.draw import draw_head_pose_axis
        >>> draw_head_pose_axis(image, yaw=10.0, pitch=-5.0, roll=2.0, bbox=[100, 100, 250, 280])
    """
    x_min, y_min, x_max, y_max = map(int, bbox[:4])
    yaw_r, pitch_r, roll_r = np.radians([-yaw, pitch, roll])

    tdx = int(x_min + (x_max - x_min) * 0.5)
    tdy = int(y_min + (y_max - y_min) * 0.5)

    bbox_size = min(x_max - x_min, y_max - y_min)
    size = bbox_size * size_ratio

    cos_yaw, sin_yaw = np.cos(yaw_r), np.sin(yaw_r)
    cos_pitch, sin_pitch = np.cos(pitch_r), np.sin(pitch_r)
    cos_roll, sin_roll = np.cos(roll_r), np.sin(roll_r)

    # X-Axis (red)
    x1 = int(size * (cos_yaw * cos_roll) + tdx)
    y1 = int(size * (cos_pitch * sin_roll + cos_roll * sin_pitch * sin_yaw) + tdy)

    # Y-Axis (green)
    x2 = int(size * (-cos_yaw * sin_roll) + tdx)
    y2 = int(size * (cos_pitch * cos_roll - sin_pitch * sin_yaw * sin_roll) + tdy)

    # Z-Axis (blue)
    x3 = int(size * sin_yaw + tdx)
    y3 = int(size * (-cos_yaw * sin_pitch) + tdy)

    cv2.line(image, (tdx, tdy), (x1, y1), (0, 0, 255), 2)
    cv2.line(image, (tdx, tdy), (x2, y2), (0, 255, 0), 2)
    cv2.line(image, (tdx, tdy), (x3, y3), (255, 0, 0), 2)


def draw_head_pose(
    image: np.ndarray,
    bbox: np.ndarray | list[int],
    pitch: float,
    yaw: float,
    roll: float,
    *,
    draw_type: str = 'cube',
    draw_bbox: bool = False,
    corner_bbox: bool = True,
    draw_angles: bool = True,
) -> None:
    """Draw head pose visualization with optional bounding box on an image.

    High-level convenience function that combines bounding box drawing with
    a 3D shape visualization of head orientation.

    Modifies the image in-place.

    Args:
        image: Input image to draw on (modified in-place).
        bbox: Face bounding box in xyxy format ``[x1, y1, x2, y2]``.
        pitch: Pitch angle in degrees (rotation around X-axis).
        yaw: Yaw angle in degrees (rotation around Y-axis).
        roll: Roll angle in degrees (rotation around Z-axis).
        draw_type: Visualization type, ``'cube'`` or ``'axis'``.
            Defaults to ``'cube'``.
        draw_bbox: Whether to draw the bounding box. Defaults to False.
        corner_bbox: Use corner-style bounding box. Defaults to True.
        draw_angles: Whether to display angle values as text. Defaults to True.

    Example:
        >>> from uniface.headpose import HeadPose
        >>> from uniface.draw import draw_head_pose
        >>> estimator = HeadPose()
        >>> result = estimator.estimate(face_crop)
        >>> draw_head_pose(image, bbox, result.pitch, result.yaw, result.roll)
    """
    x_min, y_min, x_max, y_max = map(int, bbox[:4])

    line_thickness = max(round(sum(image.shape[:2]) / 2 * 0.003), 2)

    if draw_bbox:
        if corner_bbox:
            draw_corner_bbox(image, np.array(bbox), color=(0, 255, 0), thickness=line_thickness)
        else:
            cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), line_thickness)

    bbox_list = [x_min, y_min, x_max, y_max]
    if draw_type == 'axis':
        draw_head_pose_axis(image, yaw, pitch, roll, bbox_list)
    else:
        draw_head_pose_cube(image, yaw, pitch, roll, bbox_list)

    if draw_angles:
        font_scale = max(0.4, min(0.7, (y_max - y_min) / 200))
        draw_text_label(
            image,
            f'P:{pitch:.0f} Y:{yaw:.0f} R:{roll:.0f}',
            x_min,
            y_min,
            bg_color=(0, 0, 255),
            text_color=(255, 255, 255),
            font_scale=font_scale,
        )

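Taken together, the removed drawing helpers supported a detect-estimate-draw loop. A sketch of that pipeline, assuming the `RetinaFace` detector exported by uniface and the pre-removal `HeadPose` API shown in this diff (file paths are hypothetical):

```python
import cv2

from uniface import RetinaFace
from uniface.draw import draw_head_pose
from uniface.headpose import HeadPose

detector = RetinaFace()
estimator = HeadPose()

image = cv2.imread('people.jpg')  # hypothetical input path
for face in detector.detect(image):
    x1, y1, x2, y2 = map(int, face.bbox[:4])
    result = estimator.estimate(image[y1:y2, x1:x2])  # pose from the face crop
    draw_head_pose(image, face.bbox, result.pitch, result.yaw, result.roll, draw_type='cube')

cv2.imwrite('people_pose.jpg', image)
```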
def draw_tracks(
    *,
    image: np.ndarray,

@@ -71,7 +71,8 @@ def estimate_norm(
    alignment[:, 0] += diff_x

    # Compute the transformation matrix
    transform = SimilarityTransform.from_estimate(landmark, alignment)
    transform = SimilarityTransform()
    transform.estimate(landmark, alignment)

    matrix = transform.params[0:2, :]
    inverse_matrix = np.linalg.inv(transform.params)[0:2, :]

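This hunk swaps the `SimilarityTransform.from_estimate` classmethod for the older two-step `estimate()` API. A version-tolerant sketch that works with either scikit-image interface (assuming `from_estimate` is only present on newer releases):

```python
import numpy as np
from skimage.transform import SimilarityTransform

def fit_similarity(src: np.ndarray, dst: np.ndarray) -> SimilarityTransform:
    """Estimate a similarity transform across scikit-image versions."""
    if hasattr(SimilarityTransform, 'from_estimate'):  # newer classmethod API
        return SimilarityTransform.from_estimate(src, dst)
    tform = SimilarityTransform()
    tform.estimate(src, dst)  # older in-place API
    return tform

# matrix = fit_similarity(landmark, alignment).params[0:2, :]
```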
@@ -1,53 +0,0 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from uniface.types import HeadPoseResult

from .base import BaseHeadPoseEstimator
from .models import HeadPose


def create_head_pose_estimator(method: str = 'headpose', **kwargs) -> BaseHeadPoseEstimator:
    """
    Factory function to create head pose estimators.

    This function initializes and returns a head pose estimator instance based on the
    specified method. It acts as a high-level interface to the underlying model classes.

    Args:
        method (str): The head pose estimation method to use.
            Options: 'headpose' (default).
        **kwargs: Model-specific parameters passed to the estimator's constructor.
            For example, `model_name` can be used to select a specific
            backbone from `HeadPoseWeights` enum (RESNET18, RESNET34, RESNET50,
            MOBILENET_V2, MOBILENET_V3_SMALL, MOBILENET_V3_LARGE).

    Returns:
        BaseHeadPoseEstimator: An initialized head pose estimator instance ready for use.

    Raises:
        ValueError: If the specified `method` is not supported.

    Examples:
        >>> # Create the default head pose estimator (ResNet18 backbone)
        >>> estimator = create_head_pose_estimator()

        >>> # Create with MobileNetV2 backbone
        >>> from uniface.constants import HeadPoseWeights
        >>> estimator = create_head_pose_estimator('headpose', model_name=HeadPoseWeights.MOBILENET_V2)

        >>> # Use the estimator
        >>> result = estimator.estimate(face_crop)
        >>> print(f'Pitch: {result.pitch:.1f}°, Yaw: {result.yaw:.1f}°, Roll: {result.roll:.1f}°')
    """
    method = method.lower()

    if method in ('headpose', 'head_pose', '6drepnet'):
        return HeadPose(**kwargs)
    else:
        available = ['headpose']
        raise ValueError(f"Unsupported head pose estimation method: '{method}'. Available: {available}")


__all__ = ['BaseHeadPoseEstimator', 'HeadPose', 'HeadPoseResult', 'create_head_pose_estimator']
@@ -1,115 +0,0 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from __future__ import annotations

from abc import ABC, abstractmethod

import numpy as np

from uniface.types import HeadPoseResult

__all__ = ['BaseHeadPoseEstimator', 'HeadPoseResult']


class BaseHeadPoseEstimator(ABC):
    """
    Abstract base class for all head pose estimation models.

    This class defines the common interface that all head pose estimators must implement,
    ensuring consistency across different head pose estimation methods. Head pose estimation
    predicts the orientation of a person's head based on their face image.

    The head orientation is represented as Euler angles in degrees:
    - Pitch: Rotation around X-axis (positive = looking down, negative = looking up)
    - Yaw: Rotation around Y-axis (positive = looking right, negative = looking left)
    - Roll: Rotation around Z-axis (positive = tilting clockwise, negative = tilting counter-clockwise)
    """

    @abstractmethod
    def _initialize_model(self) -> None:
        """
        Initialize the underlying model for inference.

        This method should handle loading model weights, creating the
        inference session (e.g., ONNX Runtime), and any necessary
        setup procedures to prepare the model for prediction.

        Raises:
            RuntimeError: If the model fails to load or initialize.
        """
        raise NotImplementedError('Subclasses must implement the _initialize_model method.')

    @abstractmethod
    def preprocess(self, face_image: np.ndarray) -> np.ndarray:
        """
        Preprocess the input face image for model inference.

        This method should take a raw face crop and convert it into the format
        expected by the model's inference engine (e.g., normalized tensor).

        Args:
            face_image (np.ndarray): A cropped face image in BGR format with
                shape (H, W, C).

        Returns:
            np.ndarray: The preprocessed image tensor ready for inference,
                typically with shape (1, C, H, W).
        """
        raise NotImplementedError('Subclasses must implement the preprocess method.')

    @abstractmethod
    def postprocess(self, rotation_matrix: np.ndarray) -> HeadPoseResult:
        """
        Postprocess a rotation matrix into Euler angles.

        This method takes the raw rotation matrix output from the model's
        inference and converts it into pitch, yaw, and roll angles in degrees.

        Args:
            rotation_matrix: Rotation matrix with shape (B, 3, 3) from the
                model inference.

        Returns:
            HeadPoseResult: Result containing pitch, yaw, and roll in degrees.
        """
        raise NotImplementedError('Subclasses must implement the postprocess method.')

    @abstractmethod
    def estimate(self, face_image: np.ndarray) -> HeadPoseResult:
        """
        Perform end-to-end head pose estimation on a face image.

        This method orchestrates the full pipeline: preprocessing the input,
        running inference, and postprocessing to return the head orientation.

        Args:
            face_image (np.ndarray): A cropped face image in BGR format.
                The face should be roughly centered and
                well-framed within the image.

        Returns:
            HeadPoseResult: Result containing Euler angles in degrees:
                - pitch: Rotation around X-axis (positive = down)
                - yaw: Rotation around Y-axis (positive = right)
                - roll: Rotation around Z-axis (positive = clockwise)

        Example:
            >>> estimator = create_head_pose_estimator()
            >>> result = estimator.estimate(face_crop)
            >>> print(f'Pose: pitch={result.pitch:.1f}°, yaw={result.yaw:.1f}°, roll={result.roll:.1f}°')
        """
        raise NotImplementedError('Subclasses must implement the estimate method.')

    def __call__(self, face_image: np.ndarray) -> HeadPoseResult:
        """
        Provides a convenient, callable shortcut for the `estimate` method.

        Args:
            face_image (np.ndarray): A cropped face image in BGR format.

        Returns:
            HeadPoseResult: Result containing pitch, yaw, and roll in degrees.
        """
        return self.estimate(face_image)
@@ -1,178 +0,0 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

import cv2
import numpy as np

from uniface.constants import HeadPoseWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from uniface.types import HeadPoseResult

from .base import BaseHeadPoseEstimator

__all__ = ['HeadPose']


class HeadPose(BaseHeadPoseEstimator):
    """
    Head Pose Estimation with ONNX Runtime using 6D Rotation Representation.

    This model estimates head orientation from a single face image by predicting
    a 3x3 rotation matrix (via continuous 6D representation) and converting it to
    Euler angles (pitch, yaw, roll) in degrees.

    Supports multiple backbone architectures: ResNet-18/34/50, MobileNetV2,
    and MobileNetV3 (small/large).

    Reference:
        https://github.com/yakhyo/head-pose-estimation

    Args:
        model_name (HeadPoseWeights): The enum specifying the head pose model to load.
            Options: RESNET18, RESNET34, RESNET50, MOBILENET_V2, MOBILENET_V3_SMALL,
            MOBILENET_V3_LARGE. Defaults to `HeadPoseWeights.RESNET18`.
        input_size (tuple[int, int]): The resolution (width, height) for the model's
            input. Defaults to (224, 224).
        providers (list[str] | None): ONNX Runtime execution providers. If None, auto-detects
            the best available provider. Example: ['CPUExecutionProvider'] to force CPU.

    Attributes:
        input_size (tuple[int, int]): Model input dimensions.
        input_mean (np.ndarray): Per-channel mean values for normalization (ImageNet).
        input_std (np.ndarray): Per-channel std values for normalization (ImageNet).

    Example:
        >>> from uniface.headpose import HeadPose
        >>> from uniface import RetinaFace
        >>>
        >>> detector = RetinaFace()
        >>> head_pose = HeadPose()
        >>>
        >>> faces = detector.detect(image)
        >>> for face in faces:
        ...     bbox = face.bbox
        ...     x1, y1, x2, y2 = map(int, bbox[:4])
        ...     face_crop = image[y1:y2, x1:x2]
        ...     result = head_pose.estimate(face_crop)
        ...     print(f'Pose: pitch={result.pitch:.1f}°, yaw={result.yaw:.1f}°, roll={result.roll:.1f}°')
    """

    def __init__(
        self,
        model_name: HeadPoseWeights = HeadPoseWeights.RESNET18,
        input_size: tuple[int, int] = (224, 224),
        providers: list[str] | None = None,
    ) -> None:
        Logger.info(f'Initializing HeadPose with model={model_name}, input_size={input_size}')

        self.input_size = input_size
        self.input_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        self.input_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
        self.providers = providers

        self.model_path = verify_model_weights(model_name)
        self._initialize_model()

    def _initialize_model(self) -> None:
        """
        Initialize the ONNX model from the stored model path.

        Raises:
            RuntimeError: If the model fails to load or initialize.
        """
        try:
            self.session = create_onnx_session(self.model_path, providers=self.providers)

            input_cfg = self.session.get_inputs()[0]
            input_shape = input_cfg.shape
            self.input_name = input_cfg.name
            self.input_size = tuple(input_shape[2:4][::-1])

            outputs = self.session.get_outputs()
            self.output_names = [output.name for output in outputs]

            if len(self.output_names) != 1:
                raise ValueError(f'Expected 1 output node (rotation_matrix), got {len(self.output_names)}')

            Logger.info(f'HeadPose initialized with input size {self.input_size}')

        except Exception as e:
            Logger.error(f"Failed to load head pose model from '{self.model_path}'", exc_info=True)
            raise RuntimeError(f'Failed to initialize head pose model: {e}') from e

    def preprocess(self, face_image: np.ndarray) -> np.ndarray:
        """
        Preprocess a face crop for head pose estimation.

        Args:
            face_image (np.ndarray): A cropped face image in BGR format.

        Returns:
            np.ndarray: Preprocessed image tensor with shape (1, 3, H, W).
        """
        image = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, self.input_size)
        image = image.astype(np.float32) / 255.0
        image = (image - self.input_mean) / self.input_std

        # HWC -> CHW -> NCHW
        image = np.transpose(image, (2, 0, 1))
        image = np.expand_dims(image, axis=0).astype(np.float32)

        return image

    @staticmethod
    def rotation_matrix_to_euler(rotation_matrix: np.ndarray) -> np.ndarray:
        """Convert (B, 3, 3) rotation matrices to Euler angles in degrees.

        Uses the ZYX convention to decompose rotation matrices into
        pitch (X), yaw (Y), and roll (Z) angles.

        Args:
            rotation_matrix: Batch of rotation matrices with shape (B, 3, 3).

        Returns:
            np.ndarray: Euler angles with shape (B, 3) as [pitch, yaw, roll] in degrees.
        """
        R = rotation_matrix
        sy = np.sqrt(R[:, 0, 0] ** 2 + R[:, 1, 0] ** 2)
        singular = sy < 1e-6

        x = np.where(singular, np.arctan2(-R[:, 1, 2], R[:, 1, 1]), np.arctan2(R[:, 2, 1], R[:, 2, 2]))
        y = np.arctan2(-R[:, 2, 0], sy)
        z = np.where(singular, np.zeros_like(sy), np.arctan2(R[:, 1, 0], R[:, 0, 0]))

        return np.degrees(np.stack([x, y, z], axis=1))

    def postprocess(self, rotation_matrix: np.ndarray) -> HeadPoseResult:
        """
        Convert a rotation matrix into Euler angles.

        Args:
            rotation_matrix: Rotation matrix with shape (B, 3, 3).

        Returns:
            HeadPoseResult: Result containing pitch, yaw, and roll in degrees.
        """
        euler = self.rotation_matrix_to_euler(rotation_matrix)
        return HeadPoseResult(
            pitch=float(euler[0, 0]),
            yaw=float(euler[0, 1]),
            roll=float(euler[0, 2]),
        )

    def estimate(self, face_image: np.ndarray) -> HeadPoseResult:
        """
        Perform end-to-end head pose estimation on a face image.

        This method orchestrates the full pipeline: preprocessing the input,
        running inference, and postprocessing to return the head orientation.
        """
        input_tensor = self.preprocess(face_image)
        outputs = self.session.run(self.output_names, {self.input_name: input_tensor})
        rotation_matrix = outputs[0]  # (1, 3, 3)

        return self.postprocess(rotation_matrix)
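A quick sanity check for the decomposition above is a round trip: compose R = Rz(roll) @ Ry(yaw) @ Rx(pitch) and confirm the same angles come back. A standalone sketch, assuming the `HeadPose` class from the removed module above:

```python
import numpy as np

from uniface.headpose import HeadPose  # pre-removal module, as shown in this diff

def euler_to_matrix(pitch: float, yaw: float, roll: float) -> np.ndarray:
    """Compose R = Rz(roll) @ Ry(yaw) @ Rx(pitch), angles in degrees."""
    x, y, z = np.radians([pitch, yaw, roll])
    rx = np.array([[1, 0, 0], [0, np.cos(x), -np.sin(x)], [0, np.sin(x), np.cos(x)]])
    ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
    rz = np.array([[np.cos(z), -np.sin(z), 0], [np.sin(z), np.cos(z), 0], [0, 0, 1]])
    return rz @ ry @ rx

R = euler_to_matrix(10.0, -20.0, 5.0)[None]   # add batch dimension -> (1, 3, 3)
print(HeadPose.rotation_matrix_to_euler(R))   # ~[[ 10. -20.   5.]]
```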
@@ -10,23 +10,17 @@ import numpy as np


class BaseFaceParser(ABC):
    """Abstract base class for all face parsing models.
    """
    Abstract base class for all face parsing models.

    This class defines the common interface that all face parsing models must implement,
    ensuring consistency across different parsing methods. Face parsing segments a face
    image into semantic regions such as skin, eyes, nose, mouth, hair, etc.

    Subclasses must define a ``mask_type`` class attribute to indicate output format:

    - ``"class_ids"``: uint8 mask with discrete class labels (e.g. BiSeNet: 0-18)
    - ``"probability"``: float32 mask with continuous values in [0, 1] (e.g. XSeg)

    Attributes:
        mask_type (str): Output format identifier. Must be set by subclasses.
    The output is a segmentation mask where each pixel is assigned a class label
    representing a facial component.
    """

    mask_type: str

    @abstractmethod
    def _initialize_model(self) -> None:
        """
@@ -92,17 +86,13 @@ class BaseFaceParser(ABC):
                Ignored by parsers that do not need landmarks (e.g., BiSeNet).

        Returns:
            np.ndarray: Segmentation mask with the same size as input image.
            Format depends on ``mask_type``:

            - ``"class_ids"``: uint8 with discrete class labels
            - ``"probability"``: float32 with values in [0, 1]
            np.ndarray: Segmentation mask with the same size as input image,
                where each pixel value represents a facial component class.

        Example:
            >>> parser = create_face_parser()
            >>> mask = parser.parse(face_crop)
            >>> print(f'Mask type: {parser.mask_type}')
            >>> print(f'Mask shape: {mask.shape}, dtype: {mask.dtype}')
            >>> print(f'Mask shape: {mask.shape}, unique classes: {np.unique(mask)}')
        """
        raise NotImplementedError('Subclasses must implement the parse method.')


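Since the removed docstring text distinguished `class_ids` from `probability` outputs, downstream code has to branch on `mask_type` to treat both parsers uniformly. A minimal sketch that collapses either format to a binary face mask (the helper name and threshold are illustrative):

```python
import numpy as np

def to_binary_mask(mask: np.ndarray, mask_type: str, threshold: float = 0.5) -> np.ndarray:
    """Collapse either parser output into a uint8 0/1 face mask."""
    if mask_type == 'class_ids':
        return (mask > 0).astype(np.uint8)           # any non-background class counts as face
    return (mask >= threshold).astype(np.uint8)      # threshold the soft probability mask

# binary = to_binary_mask(parser.parse(face_crop), parser.mask_type)
```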
@@ -18,7 +18,8 @@ __all__ = ['BiSeNet']


class BiSeNet(BaseFaceParser):
    """BiSeNet: Bilateral Segmentation Network for Face Parsing with ONNX Runtime.
    """
    BiSeNet: Bilateral Segmentation Network for Face Parsing with ONNX Runtime.

    BiSeNet is a semantic segmentation model that segments a face image into
    different facial components such as skin, eyes, nose, mouth, hair, etc. The model
@@ -44,7 +45,6 @@ class BiSeNet(BaseFaceParser):
        input_size (Tuple[int, int]): Model input dimensions.
        input_mean (np.ndarray): Per-channel mean values for normalization (ImageNet).
        input_std (np.ndarray): Per-channel std values for normalization (ImageNet).
        mask_type (str): Output type identifier - "class_ids" for BiSeNet.

    Example:
        >>> from uniface.parsing import BiSeNet
@@ -61,11 +61,8 @@ class BiSeNet(BaseFaceParser):
        ...     face_crop = image[y1:y2, x1:x2]
        ...     mask = parser.parse(face_crop)
        ...     print(f'Mask shape: {mask.shape}, unique classes: {np.unique(mask)}')
        ...     print(f'Output type: {parser.mask_type}')  # "class_ids"
    """

    mask_type = 'class_ids'

    def __init__(
        self,
        model_name: ParsingWeights = ParsingWeights.RESNET18,

@@ -19,9 +19,10 @@ __all__ = ['XSeg']


class XSeg(BaseFaceParser):
    """XSeg: Face Segmentation Model from DeepFaceLab with ONNX Runtime.
    """
    XSeg: Face Segmentation Model from DeepFaceLab with ONNX Runtime.

    XSeg outputs a soft probability mask for face regions. Unlike BiSeNet which works
    XSeg outputs a mask for face regions. Unlike BiSeNet which works
    on bbox crops, XSeg requires 5-point landmarks for face alignment. The model
    uses NHWC input format and outputs values in [0, 1] range.

@@ -42,7 +43,6 @@ class XSeg(BaseFaceParser):
        align_size (int): Face alignment output size.
        blur_sigma (float): Blur sigma for post-processing.
        input_size (tuple[int, int]): Model input dimensions (width, height).
        mask_type (str): Output type identifier - "probability" for XSeg.

    Example:
        >>> from uniface.parsing import XSeg
@@ -56,11 +56,8 @@ class XSeg(BaseFaceParser):
        ...     if face.landmarks is not None:
        ...         mask = parser.parse(image, landmarks=face.landmarks)
        ...         print(f'Mask shape: {mask.shape}')
        ...         print(f'Output type: {parser.mask_type}')  # "probability"
    """

    mask_type = 'probability'

    def __init__(
        self,
        model_name: XSegWeights = XSegWeights.DEFAULT,

@@ -141,7 +141,7 @@ class BaseRecognizer(ABC):
            image is already aligned.

        Returns:
            Face embedding with shape (1, 512) — raw ONNX output with batch dimension.
            Face embedding vector (typically 512-dimensional).
        """
        # If landmarks are provided, align the face first
        if landmarks is not None:
@@ -164,9 +164,9 @@ class BaseRecognizer(ABC):
            landmarks: Facial landmarks (5 points for alignment).

        Returns:
            L2-normalized face embedding as a 1D vector with shape (512,).
            L2-normalized face embedding vector (typically 512-dimensional).
        """
        embedding = self.get_embedding(image, landmarks).ravel()
        embedding = self.get_embedding(image, landmarks)
        norm = np.linalg.norm(embedding)
        return embedding / norm if norm > 0 else embedding

@@ -178,6 +178,6 @@ class BaseRecognizer(ABC):
            landmarks: Facial landmarks (5 points for alignment).

        Returns:
            L2-normalized face embedding as a 1D vector with shape (512,).
            L2-normalized face embedding vector (typically 512-dimensional).
        """
        return self.get_normalized_embedding(image, landmarks)

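Because `get_normalized_embedding` returns unit-length vectors, cosine similarity between two faces reduces to a dot product; `.ravel()` also irons out the (1, 512) vs (512,) shape difference discussed in this hunk. A short sketch (the `recognizer`, inputs, and threshold are assumptions for illustration):

```python
import numpy as np

emb_a = recognizer.get_normalized_embedding(image_a, landmarks_a)
emb_b = recognizer.get_normalized_embedding(image_b, landmarks_b)

similarity = float(np.dot(emb_a.ravel(), emb_b.ravel()))  # cosine similarity in [-1, 1]
print(f'similarity={similarity:.3f}, same person: {similarity > 0.4}')  # threshold is illustrative
```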
@@ -28,7 +28,6 @@ __all__ = [
    'EmotionResult',
    'Face',
    'GazeResult',
    'HeadPoseResult',
    'SpoofingResult',
]

@@ -49,24 +48,6 @@ class GazeResult:
        return f'GazeResult(pitch={self.pitch:.4f}, yaw={self.yaw:.4f})'


@dataclass(slots=True, frozen=True)
class HeadPoseResult:
    """Result of head pose estimation.

    Attributes:
        pitch: Rotation around X-axis in degrees (positive = looking down).
        yaw: Rotation around Y-axis in degrees (positive = looking right).
        roll: Rotation around Z-axis in degrees (positive = tilting clockwise).
    """

    pitch: float
    yaw: float
    roll: float

    def __repr__(self) -> str:
        return f'HeadPoseResult(pitch={self.pitch:.1f}, yaw={self.yaw:.1f}, roll={self.roll:.1f})'


@dataclass(slots=True, frozen=True)
class SpoofingResult:
    """Result of face anti-spoofing detection.
@@ -264,5 +245,5 @@ class Face:
        if self.emotion is not None:
            parts.append(f'emotion={self.emotion}')
        if self.embedding is not None:
            parts.append(f'embedding_dim={self.embedding.shape[-1]}')
            parts.append(f'embedding_dim={self.embedding.shape[0]}')
        return ', '.join(parts) + ')'