4 Commits

| Author | SHA1 | Message | Date |
| ------ | ---- | ------- | ---- |
| Yakhyokhuja Valikhujaev | edbab5f7bf | fix: use Python 3.11 in validate job for tomllib support (#65) | 2026-01-07 00:29:48 +09:00 |
| Yakhyokhuja Valikhujaev | cd8077e460 | feat: Update release to v2.2.0 (#64) | 2026-01-07 00:16:29 +09:00 |
| Yakhyokhuja Valikhujaev | 452b3381a2 | Update badge links in README.md (#63) | 2026-01-06 23:32:36 +09:00 |
| Yakhyokhuja Valikhujaev | 07c8bd7b24 | feat: Add YOLOv8 Face Detection model support (#62)<br>docs: Update UniFace documentation<br>feat: Add YOLOv8 face detection model | 2026-01-03 19:08:41 +09:00 |
23 changed files with 775 additions and 207 deletions

View File

@@ -4,11 +4,9 @@ on:
push:
branches:
- main
- develop
pull_request:
branches:
- main
- develop
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
@@ -22,7 +20,7 @@ jobs:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.11'
python-version: "3.10"
- uses: pre-commit/action@v3.0.1
test:
@@ -33,8 +31,16 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: ["3.11", "3.13"]
include:
# Full Python range on Linux (fastest runner)
- os: ubuntu-latest
python-version: "3.10"
- os: ubuntu-latest
python-version: "3.13"
- os: macos-latest
python-version: "3.13"
- os: windows-latest
python-version: "3.13"
steps:
- name: Checkout code

View File

@@ -24,7 +24,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
python-version: "3.11" # Needs 3.11+ for tomllib
- name: Get version from tag and pyproject.toml
id: get_version
@@ -54,7 +54,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.11", "3.13"]
python-version: ["3.10", "3.13"]
steps:
- name: Checkout code
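For context on the `tomllib` note above: a minimal sketch of what the validate step's version check can look like, assuming the standard `pyproject.toml` layout. `tomllib` is in the standard library only from Python 3.11, which is why the validate job pins 3.11 even though the package itself now supports 3.10.

```python
import tomllib  # stdlib from Python 3.11; this import fails on 3.10

# Read the declared package version so CI can compare it against the git tag.
with open("pyproject.toml", "rb") as f:  # tomllib requires binary mode
    config = tomllib.load(f)

version = config["project"]["version"]
print(version)  # e.g. "2.2.0", per the pyproject.toml diff below
```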

View File

@@ -2,11 +2,11 @@
<div align="center">
[![PyPI](https://img.shields.io/pypi/v/uniface.svg)](https://pypi.org/project/uniface/)
[![Python](https://img.shields.io/badge/Python-3.11%2B-blue)](https://www.python.org/)
[![PyPI](https://img.shields.io/pypi/v/uniface.svg?label=PyPI)](https://pypi.org/project/uniface/)
[![Python](https://img.shields.io/badge/Python-3.10%2B-blue)](https://www.python.org/)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![CI](https://github.com/yakhyo/uniface/actions/workflows/ci.yml/badge.svg)](https://github.com/yakhyo/uniface/actions)
[![Downloads](https://static.pepy.tech/badge/uniface)](https://pepy.tech/project/uniface)
[![PyPI Downloads](https://static.pepy.tech/personalized-badge/uniface?period=total&units=INTERNATIONAL_SYSTEM&left_color=GRAY&right_color=BLUE&left_text=Downloads)](https://pepy.tech/projects/uniface)
[![Docs](https://img.shields.io/badge/Docs-UniFace-blue.svg)](https://yakhyo.github.io/uniface/)
</div>
@@ -23,7 +23,7 @@
## Features
- **Face Detection** — RetinaFace, SCRFD, and YOLOv5-Face with 5-point landmarks
- **Face Detection** — RetinaFace, SCRFD, YOLOv5-Face, and YOLOv8-Face with 5-point landmarks
- **Face Recognition** — ArcFace, MobileFace, and SphereFace embeddings
- **Facial Landmarks** — 106-point landmark localization
- **Face Parsing** — BiSeNet semantic segmentation (19 classes)
@@ -107,6 +107,7 @@ for face in faces:
- [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) — RetinaFace training
- [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) — YOLOv5-Face ONNX
- [yakhyo/yolov8-face-onnx-inference](https://github.com/yakhyo/yolov8-face-onnx-inference) — YOLOv8-Face ONNX
- [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) — ArcFace, MobileFace, SphereFace
- [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) — BiSeNet face parsing
- [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) — MobileGaze training

View File

@@ -155,6 +155,8 @@ If a model fails verification, it's re-downloaded automatically.
| YOLOv5n-Face | 11 MB | ✅ |
| YOLOv5s-Face | 28 MB | ✅ |
| YOLOv5m-Face | 82 MB | ✅ |
| YOLOv8-Lite-S | 7.4 MB | ✅ |
| YOLOv8n-Face | 12 MB | ✅ |
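A minimal sketch of the SHA256 check described above; `verify_model_weights` appearing later in this diff is the library's actual entry point, while the standalone function below is only illustrative. The expected digest is the pinned YOLOV8N value from `MODEL_SHA256` in `uniface/constants.py`.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a weights file in chunks so large models never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# If the computed digest differs from the pinned one, the file is re-downloaded.
expected = "33f3951af7fc0c4d9b321b29cdcd8c9a59d0a29a8d4bdc01fcb5507d5c714809"  # YOLOV8N
assert sha256_of(Path("yolov8n-face.onnx")) == expected
```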
### Recognition Models

View File

@@ -15,7 +15,7 @@ graph TB
end
subgraph Detection
DET[RetinaFace / SCRFD / YOLOv5Face]
DET[RetinaFace / SCRFD / YOLOv5Face / YOLOv8Face]
end
subgraph Analysis
@@ -97,7 +97,7 @@ def detect(self, image: np.ndarray) -> list[Face]:
```
uniface/
├── detection/ # Face detection (RetinaFace, SCRFD, YOLOv5Face)
├── detection/ # Face detection (RetinaFace, SCRFD, YOLOv5Face, YOLOv8Face)
├── recognition/ # Face recognition (AdaFace, ArcFace, MobileFace, SphereFace)
├── landmark/ # 106-point landmarks
├── attribute/ # Age, gender, emotion, race

View File

@@ -12,10 +12,10 @@ template: home.html
<p class="hero-subtitle">A lightweight, production-ready face analysis library built on ONNX Runtime</p>
[![PyPI](https://img.shields.io/pypi/v/uniface.svg)](https://pypi.org/project/uniface/)
[![Python](https://img.shields.io/badge/Python-3.11%2B-blue)](https://www.python.org/)
[![PyPI](https://img.shields.io/pypi/v/uniface.svg?label=PyPI)](https://pypi.org/project/uniface/)
[![Python](https://img.shields.io/badge/Python-3.10%2B-blue)](https://www.python.org/)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://static.pepy.tech/badge/uniface)](https://pepy.tech/project/uniface)
[![PyPI Downloads](https://static.pepy.tech/personalized-badge/uniface?period=total&units=INTERNATIONAL_SYSTEM&left_color=GRAY&right_color=BLUE&left_text=Downloads)](https://pepy.tech/projects/uniface)
[Get Started](quickstart.md){ .md-button .md-button--primary }
[View on GitHub](https://github.com/yakhyo/uniface){ .md-button }
@@ -26,7 +26,7 @@ template: home.html
<div class="feature-card" markdown>
### :material-face-recognition: Face Detection
ONNX-optimized RetinaFace, SCRFD, and YOLOv5-Face models with 5-point landmarks.
ONNX-optimized detectors (RetinaFace, SCRFD, YOLO) with 5-point landmarks.
</div>
<div class="feature-card" markdown>

View File

@@ -13,6 +13,8 @@ UniFace is released under the [MIT License](https://opensource.org/licenses/MIT)
| RetinaFace | [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) | MIT |
| SCRFD | [InsightFace](https://github.com/deepinsight/insightface) | MIT |
| YOLOv5-Face | [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) | GPL-3.0 |
| YOLOv8-Face | [yakhyo/yolov8-face-onnx-inference](https://github.com/yakhyo/yolov8-face-onnx-inference) | GPL-3.0 |
| AdaFace | [yakhyo/adaface-onnx](https://github.com/yakhyo/adaface-onnx) | MIT |
| ArcFace | [InsightFace](https://github.com/deepinsight/insightface) | MIT |
| MobileFace | [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) | MIT |
| SphereFace | [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) | MIT |

View File

@@ -1,6 +1,6 @@
# Model Zoo
Complete guide to all available models, their performance characteristics, and selection criteria.
Complete guide to all available models and their performance characteristics.
---
@@ -8,16 +8,16 @@ Complete guide to all available models, their performance characteristics, and s
### RetinaFace Family
RetinaFace models are trained on the WIDER FACE dataset and provide excellent accuracy-speed tradeoffs.
RetinaFace models are trained on the WIDER FACE dataset.
| Model Name | Params | Size | Easy | Medium | Hard | Use Case |
| -------------- | ------ | ----- | ------ | ------ | ------ | ----------------------------- |
| `MNET_025` | 0.4M | 1.7MB | 88.48% | 87.02% | 80.61% | Mobile/Edge devices |
| `MNET_050` | 1.0M | 2.6MB | 89.42% | 87.97% | 82.40% | Mobile/Edge devices |
| `MNET_V1` | 3.5M | 3.8MB | 90.59% | 89.14% | 84.13% | Balanced mobile |
| `MNET_V2` :material-check-circle: | 3.2M | 3.5MB | 91.70% | 91.03% | 86.60% | **Default** |
| `RESNET18` | 11.7M | 27MB | 92.50% | 91.02% | 86.63% | Server/High accuracy |
| `RESNET34` | 24.8M | 56MB | 94.16% | 93.12% | 88.90% | Maximum accuracy |
| Model Name | Params | Size | Easy | Medium | Hard |
| -------------- | ------ | ----- | ------ | ------ | ------ |
| `MNET_025` | 0.4M | 1.7MB | 88.48% | 87.02% | 80.61% |
| `MNET_050` | 1.0M | 2.6MB | 89.42% | 87.97% | 82.40% |
| `MNET_V1` | 3.5M | 3.8MB | 90.59% | 89.14% | 84.13% |
| `MNET_V2` :material-check-circle: | 3.2M | 3.5MB | 91.70% | 91.03% | 86.60% |
| `RESNET18` | 11.7M | 27MB | 92.50% | 91.02% | 86.63% |
| `RESNET34` | 24.8M | 56MB | 94.16% | 93.12% | 88.90% |
!!! info "Accuracy & Benchmarks"
**Accuracy**: WIDER FACE validation set (Easy/Medium/Hard subsets) - from [RetinaFace paper](https://arxiv.org/abs/1905.00641)
@@ -28,12 +28,12 @@ RetinaFace models are trained on the WIDER FACE dataset and provide excellent ac
### SCRFD Family
SCRFD (Sample and Computation Redistribution for Efficient Face Detection) models offer state-of-the-art speed-accuracy tradeoffs.
SCRFD (Sample and Computation Redistribution for Efficient Face Detection) models are trained on the WIDER FACE dataset.
| Model Name | Params | Size | Easy | Medium | Hard | Use Case |
| ---------------- | ------ | ----- | ------ | ------ | ------ | ------------------------------- |
| `SCRFD_500M` | 0.6M | 2.5MB | 90.57% | 88.12% | 68.51% | Real-time applications |
| `SCRFD_10G` :material-check-circle: | 4.2M | 17MB | 95.16% | 93.87% | 83.05% | **High accuracy + speed** |
| Model Name | Params | Size | Easy | Medium | Hard |
| ---------------- | ------ | ----- | ------ | ------ | ------ |
| `SCRFD_500M` | 0.6M | 2.5MB | 90.57% | 88.12% | 68.51% |
| `SCRFD_10G` :material-check-circle: | 4.2M | 17MB | 95.16% | 93.87% | 83.05% |
!!! info "Accuracy & Benchmarks"
**Accuracy**: WIDER FACE validation set - from [SCRFD paper](https://arxiv.org/abs/2105.04714)
@@ -44,13 +44,13 @@ SCRFD (Sample and Computation Redistribution for Efficient Face Detection) model
### YOLOv5-Face Family
YOLOv5-Face models provide excellent detection accuracy with 5-point facial landmarks, optimized for real-time applications.
YOLOv5-Face models provide detection with 5-point facial landmarks and are trained on the WIDER FACE dataset.
| Model Name | Size | Easy | Medium | Hard | Use Case |
| -------------- | ---- | ------ | ------ | ------ | ------------------------------ |
| `YOLOV5N` | 11MB | 93.61% | 91.52% | 80.53% | Lightweight/Mobile |
| `YOLOV5S` :material-check-circle: | 28MB | 94.33% | 92.61% | 83.15% | **Real-time + accuracy** |
| `YOLOV5M` | 82MB | 95.30% | 93.76% | 85.28% | High accuracy |
| Model Name | Size | Easy | Medium | Hard |
| -------------- | ---- | ------ | ------ | ------ |
| `YOLOV5N` | 11MB | 93.61% | 91.52% | 80.53% |
| `YOLOV5S` :material-check-circle: | 28MB | 94.33% | 92.61% | 83.15% |
| `YOLOV5M` | 82MB | 95.30% | 93.76% | 85.28% |
!!! info "Accuracy & Benchmarks"
**Accuracy**: WIDER FACE validation set - from [YOLOv5-Face paper](https://arxiv.org/abs/2105.12931)
@@ -58,7 +58,26 @@ YOLOv5-Face models provide excellent detection accuracy with 5-point facial land
**Speed**: Benchmark on your own hardware using `python tools/detection.py --source <image> --iterations 100`
!!! note "Fixed Input Size"
All YOLOv5-Face models use a fixed input size of 640×640. Models exported to ONNX from [deepcam-cn/yolov5-face](https://github.com/deepcam-cn/yolov5-face).
All YOLOv5-Face models use a fixed input size of 640×640.
---
### YOLOv8-Face Family
YOLOv8-Face models use an anchor-free design with DFL (Distribution Focal Loss) for bbox regression and provide detection with 5-point facial landmarks.
| Model Name | Size | Easy | Medium | Hard |
| ---------------- | ------ | ------ | ------ | ------ |
| `YOLOV8_LITE_S` | 7.4MB | 93.4% | 91.2% | 78.6% |
| `YOLOV8N` :material-check-circle: | 12MB | 94.6% | 92.3% | 79.6% |
!!! info "Accuracy & Benchmarks"
**Accuracy**: WIDER FACE validation set (Easy/Medium/Hard subsets)
**Speed**: Benchmark on your own hardware using `python tools/detection.py --source <image> --method yolov8face`
!!! note "Fixed Input Size"
All YOLOv8-Face models use a fixed input size of 640×640.
---
@@ -66,12 +85,12 @@ YOLOv5-Face models provide excellent detection accuracy with 5-point facial land
### AdaFace
High-quality face recognition using adaptive margin based on image quality. Achieves state-of-the-art results on challenging benchmarks.
Face recognition using an adaptive margin based on image quality.
| Model Name | Backbone | Dataset | Size | IJB-B TAR | IJB-C TAR | Use Case |
| ----------- | -------- | ----------- | ------ | --------- | --------- | --------------------- |
| `IR_18` :material-check-circle: | IR-18 | WebFace4M | 92 MB | 93.03% | 94.99% | **Balanced (default)** |
| `IR_101` | IR-101 | WebFace12M | 249 MB | - | 97.66% | Maximum accuracy |
| Model Name | Backbone | Dataset | Size | IJB-B TAR | IJB-C TAR |
| ----------- | -------- | ----------- | ------ | --------- | --------- |
| `IR_18` :material-check-circle: | IR-18 | WebFace4M | 92 MB | 93.03% | 94.99% |
| `IR_101` | IR-101 | WebFace12M | 249 MB | - | 97.66% |
!!! info "Training Data & Accuracy"
**Dataset**: WebFace4M (4M images) / WebFace12M (12M images)
@@ -81,55 +100,51 @@ High-quality face recognition using adaptive margin based on image quality. Achi
!!! tip "Key Innovation"
AdaFace introduces an adaptive margin that adjusts to image quality, giving better performance on low-quality images than fixed-margin approaches.
**Reference**: [AdaFace: Quality Adaptive Margin for Face Recognition](https://github.com/mk-minchul/AdaFace) | [ONNX Export](https://github.com/yakhyo/adaface-onnx)
---
### ArcFace
State-of-the-art face recognition using additive angular margin loss.
Face recognition using additive angular margin loss.
| Model Name | Backbone | Params | Size | Use Case |
| ----------- | --------- | ------ | ----- | -------------------------------- |
| `MNET` :material-check-circle: | MobileNet | 2.0M | 8MB | **Balanced (recommended)** |
| `RESNET` | ResNet50 | 43.6M | 166MB | Maximum accuracy |
| Model Name | Backbone | Params | Size | LFW | CFP-FP | AgeDB-30 | IJB-C |
| ----------- | --------- | ------ | ----- | ------ | ------ | -------- | ----- |
| `MNET` :material-check-circle: | MobileNet | 2.0M | 8MB | 99.70% | 98.00% | 96.58% | 95.02% |
| `RESNET` | ResNet50 | 43.6M | 166MB | 99.83% | 99.33% | 98.23% | 97.25% |
!!! info "Training Data"
**Dataset**: Trained on MS1M-V2 (5.8M images, 85K identities)
**Dataset**: Trained on WebFace600K (600K images)
**Accuracy**: Benchmark on your own dataset or use standard face verification benchmarks
**Accuracy**: IJB-C accuracy reported as TAR@FAR=1e-4
---
### MobileFace
Lightweight face recognition optimized for mobile devices.
Lightweight face recognition models with MobileNet backbones.
| Model Name | Backbone | Params | Size | LFW | CALFW | CPLFW | AgeDB-30 | Use Case |
| ----------------- | ---------------- | ------ | ---- | ------ | ------ | ------ | -------- | --------------------- |
| `MNET_025` | MobileNetV1 0.25 | 0.36M | 1MB | 98.76% | 92.02% | 82.37% | 90.02% | Ultra-lightweight |
| `MNET_V2` :material-check-circle: | MobileNetV2 | 2.29M | 4MB | 99.55% | 94.87% | 86.89% | 95.16% | **Mobile/Edge** |
| `MNET_V3_SMALL` | MobileNetV3-S | 1.25M | 3MB | 99.30% | 93.77% | 85.29% | 92.79% | Mobile optimized |
| `MNET_V3_LARGE` | MobileNetV3-L | 3.52M | 10MB | 99.53% | 94.56% | 86.79% | 95.13% | Balanced mobile |
| Model Name | Backbone | Params | Size | LFW | CALFW | CPLFW | AgeDB-30 |
| ----------------- | ---------------- | ------ | ---- | ------ | ------ | ------ | -------- |
| `MNET_025` | MobileNetV1 0.25 | 0.36M | 1MB | 98.76% | 92.02% | 82.37% | 90.02% |
| `MNET_V2` :material-check-circle: | MobileNetV2 | 2.29M | 4MB | 99.55% | 94.87% | 86.89% | 95.16% |
| `MNET_V3_SMALL` | MobileNetV3-S | 1.25M | 3MB | 99.30% | 93.77% | 85.29% | 92.79% |
| `MNET_V3_LARGE` | MobileNetV3-L | 3.52M | 10MB | 99.53% | 94.56% | 86.79% | 95.13% |
!!! info "Training Data"
**Dataset**: Trained on MS1M-V2 (5.8M images, 85K identities)
**Accuracy**: Evaluated on LFW, CALFW, CPLFW, and AgeDB-30 benchmarks
!!! tip "Use Case"
These models are lightweight alternatives to ArcFace for resource-constrained environments.
---
### SphereFace
Face recognition using angular softmax loss.
| Model Name | Backbone | Params | Size | LFW | CALFW | CPLFW | AgeDB-30 | Use Case |
| ------------ | -------- | ------ | ---- | ------ | ------ | ------ | -------- | ------------------- |
| `SPHERE20` | Sphere20 | 24.5M | 50MB | 99.67% | 95.61% | 88.75% | 96.58% | Research/Comparison |
| `SPHERE36` | Sphere36 | 34.6M | 92MB | 99.72% | 95.64% | 89.92% | 96.83% | Research/Comparison |
| Model Name | Backbone | Params | Size | LFW | CALFW | CPLFW | AgeDB-30 |
| ------------ | -------- | ------ | ---- | ------ | ------ | ------ | -------- |
| `SPHERE20` | Sphere20 | 24.5M | 50MB | 99.67% | 95.61% | 88.75% | 96.58% |
| `SPHERE36` | Sphere36 | 34.6M | 92MB | 99.72% | 95.64% | 89.92% | 96.83% |
!!! info "Training Data"
**Dataset**: Trained on MS1M-V2 (5.8M images, 85K identities)
@@ -145,11 +160,11 @@ Face recognition using angular softmax loss.
### 106-Point Landmark Detection
High-precision facial landmark localization.
Facial landmark localization model.
| Model Name | Points | Params | Size | Use Case |
| ---------- | ------ | ------ | ---- | ------------------------ |
| `2D106` | 106 | 3.7M | 14MB | Face alignment, analysis |
| Model Name | Points | Params | Size |
| ---------- | ------ | ------ | ---- |
| `2D106` | 106 | 3.7M | 14MB |
**Landmark Groups:**
@@ -167,9 +182,9 @@ High-precision facial landmark localization.
### Age & Gender Detection
| Model Name | Attributes | Params | Size | Use Case |
| ----------- | ----------- | ------ | ---- | --------------- |
| `AgeGender` | Age, Gender | 2.1M | 8MB | General purpose |
| Model Name | Attributes | Params | Size |
| ----------- | ----------- | ------ | ---- |
| `AgeGender` | Age, Gender | 2.1M | 8MB |
!!! info "Training Data"
**Dataset**: Trained on CelebA
@@ -181,9 +196,9 @@ High-precision facial landmark localization.
### FairFace Attributes
| Model Name | Attributes | Params | Size | Use Case |
| ----------- | --------------------- | ------ | ----- | --------------------------- |
| `FairFace` | Race, Gender, Age Group | - | 44MB | Balanced demographic prediction |
| Model Name | Attributes | Params | Size |
| ----------- | --------------------- | ------ | ----- |
| `FairFace` | Race, Gender, Age Group | - | 44MB |
!!! info "Training Data"
**Dataset**: Trained on FairFace dataset with balanced demographics
@@ -199,10 +214,10 @@ High-precision facial landmark localization.
### Emotion Detection
| Model Name | Classes | Params | Size | Use Case |
| ------------- | ------- | ------ | ---- | --------------- |
| `AFFECNET7` | 7 | 0.5M | 2MB | 7-class emotion |
| `AFFECNET8` | 8 | 0.5M | 2MB | 8-class emotion |
| Model Name | Classes | Params | Size |
| ------------- | ------- | ------ | ---- |
| `AFFECNET7` | 7 | 0.5M | 2MB |
| `AFFECNET8` | 8 | 0.5M | 2MB |
**Classes (7)**: Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger
@@ -220,15 +235,15 @@ High-precision facial landmark localization.
### MobileGaze Family
Real-time gaze direction prediction models trained on Gaze360 dataset. Returns pitch (vertical) and yaw (horizontal) angles in radians.
Gaze direction prediction models trained on the Gaze360 dataset; they return pitch (vertical) and yaw (horizontal) angles in radians.
| Model Name | Params | Size | MAE* | Use Case |
| -------------- | ------ | ------- | ----- | ----------------------------- |
| `RESNET18` | 11.7M | 43 MB | 12.84 | Balanced accuracy/speed |
| `RESNET34` :material-check-circle: | 24.8M | 81.6 MB | 11.33 | **Default** |
| `RESNET50` | 25.6M | 91.3 MB | 11.34 | High accuracy |
| `MOBILENET_V2` | 3.5M | 9.59 MB | 13.07 | Mobile/Edge devices |
| `MOBILEONE_S0` | 2.1M | 4.8 MB | 12.58 | Lightweight/Real-time |
| Model Name | Params | Size | MAE* |
| -------------- | ------ | ------- | ----- |
| `RESNET18` | 11.7M | 43 MB | 12.84 |
| `RESNET34` :material-check-circle: | 24.8M | 81.6 MB | 11.33 |
| `RESNET50` | 25.6M | 91.3 MB | 11.34 |
| `MOBILENET_V2` | 3.5M | 9.59 MB | 13.07 |
| `MOBILEONE_S0` | 2.1M | 4.8 MB | 12.58 |
*MAE (Mean Absolute Error) in degrees on Gaze360 test set - lower is better
@@ -248,10 +263,10 @@ Real-time gaze direction prediction models trained on Gaze360 dataset. Returns p
BiSeNet (Bilateral Segmentation Network) models for semantic face parsing. Segments face images into 19 facial component classes.
| Model Name | Params | Size | Classes | Use Case |
| -------------- | ------ | ------- | ------- | ----------------------------- |
| `RESNET18` :material-check-circle: | 13.3M | 50.7 MB | 19 | **Default** |
| `RESNET34` | 24.1M | 89.2 MB | 19 | Higher accuracy |
| Model Name | Params | Size | Classes |
| -------------- | ------ | ------- | ------- |
| `RESNET18` :material-check-circle: | 13.3M | 50.7 MB | 19 |
| `RESNET34` | 24.1M | 89.2 MB | 19 |
!!! info "Training Data"
**Dataset**: Trained on CelebAMask-HQ
@@ -289,18 +304,18 @@ BiSeNet (Bilateral Segmentation Network) models for semantic face parsing. Segme
### MiniFASNet Family
Lightweight face anti-spoofing models for liveness detection. Detect if a face is real (live) or fake (photo, video replay, mask).
Face anti-spoofing models for liveness detection; they detect whether a face is real (live) or fake (photo, video replay, mask).
| Model Name | Size | Scale | Use Case |
| ---------- | ------ | ----- | ----------------------------- |
| `V1SE` | 1.2 MB | 4.0 | Squeeze-and-excitation variant |
| `V2` :material-check-circle: | 1.2 MB | 2.7 | **Default** |
| Model Name | Size | Scale |
| ---------- | ------ | ----- |
| `V1SE` | 1.2 MB | 4.0 |
| `V2` :material-check-circle: | 1.2 MB | 2.7 |
!!! info "Output Format"
**Output**: Returns `SpoofingResult(is_real, confidence)`, where `is_real` is True for a real (live) face and False for a fake one
!!! note "Input Requirements"
Requires face bounding box from a detector. Use with RetinaFace, SCRFD, or YOLOv5Face.
Requires face bounding box from a detector.
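For orientation, a hedged sketch of consuming the documented `SpoofingResult(is_real, confidence)` output. The `MiniFASNet` import and its `predict(image, bbox)` call are hypothetical placeholders for illustration, not the confirmed uniface API; only the result fields come from the note above.

```python
import cv2

from uniface import RetinaFace
from uniface import MiniFASNet  # hypothetical import; class name assumed

image = cv2.imread("face.jpg")
detector = RetinaFace()
spoofer = MiniFASNet()  # hypothetical constructor

for face in detector.detect(image):
    result = spoofer.predict(image, face.bbox)  # hypothetical signature
    # SpoofingResult fields per the docs: is_real (bool), confidence (float)
    print("REAL" if result.is_real else "FAKE", f"{result.confidence:.2f}")
```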
---
@@ -321,6 +336,8 @@ Models are automatically downloaded and cached on first use.
- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) - PyTorch implementation and training code
- **YOLOv5-Face Original**: [deepcam-cn/yolov5-face](https://github.com/deepcam-cn/yolov5-face) - Original PyTorch implementation
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) - ONNX inference implementation
- **YOLOv8-Face Original**: [derronqi/yolov8-face](https://github.com/derronqi/yolov8-face) - Original PyTorch implementation
- **YOLOv8-Face ONNX**: [yakhyo/yolov8-face-onnx-inference](https://github.com/yakhyo/yolov8-face-onnx-inference) - ONNX inference implementation
- **AdaFace Original**: [mk-minchul/AdaFace](https://github.com/mk-minchul/AdaFace) - Original PyTorch implementation
- **AdaFace ONNX**: [yakhyo/adaface-onnx](https://github.com/yakhyo/adaface-onnx) - ONNX export and inference
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code

View File

@@ -1,22 +1,25 @@
# Detection
Face detection is the first step in any face analysis pipeline. UniFace provides three detection models.
Face detection is the first step in any face analysis pipeline. UniFace provides four detection models.
---
## Available Models
| Model | Backbone | Size | WIDER FACE (Easy/Medium/Hard) | Best For |
|-------|----------|------|-------------------------------|----------|
| **RetinaFace** | MobileNet V2 | 3.5 MB | 91.7% / 91.0% / 86.6% | Balanced (recommended) |
| **SCRFD** | SCRFD-10G | 17 MB | 95.2% / 93.9% / 83.1% | High accuracy |
| **YOLOv5-Face** | YOLOv5s | 28 MB | 94.3% / 92.6% / 83.2% | Real-time |
| Model | Backbone | Size | Easy | Medium | Hard | Landmarks |
|-------|----------|------|------|--------|------|:---------:|
| **RetinaFace** | MobileNet V2 | 3.5 MB | 91.7% | 91.0% | 86.6% | :material-check: |
| **SCRFD** | SCRFD-10G | 17 MB | 95.2% | 93.9% | 83.1% | :material-check: |
| **YOLOv5-Face** | YOLOv5s | 28 MB | 94.3% | 92.6% | 83.2% | :material-check: |
| **YOLOv8-Face** | YOLOv8n | 12 MB | 94.6% | 92.3% | 79.6% | :material-check: |
!!! note "Dataset"
All models are trained on the WIDER FACE dataset.
---
## RetinaFace
The recommended detector for most use cases.
Single-shot face detector with multi-scale feature pyramid.
### Basic Usage
@@ -159,7 +162,55 @@ detector = YOLOv5Face(model_name=YOLOv5FaceWeights.YOLOV5M)
detector = YOLOv5Face(
model_name=YOLOv5FaceWeights.YOLOV5S,
confidence_threshold=0.6,
nms_threshold=0.5
nms_threshold=0.5,
nms_mode='numpy' # or 'torchvision' for faster NMS
)
```
---
## YOLOv8-Face
Anchor-free detection with DFL (Distribution Focal Loss) for accurate bbox regression.
### Basic Usage
```python
from uniface import YOLOv8Face
detector = YOLOv8Face()
faces = detector.detect(image)
```
### Model Variants
```python
from uniface import YOLOv8Face
from uniface.constants import YOLOv8FaceWeights
# Lightweight
detector = YOLOv8Face(model_name=YOLOv8FaceWeights.YOLOV8_LITE_S)
# Recommended (default)
detector = YOLOv8Face(model_name=YOLOv8FaceWeights.YOLOV8N)
```
| Variant | Size | Easy | Medium | Hard |
|---------|------|------|--------|------|
| YOLOV8_LITE_S | 7.4 MB | 93.4% | 91.2% | 78.6% |
| **YOLOV8N** :material-check-circle: | 12 MB | 94.6% | 92.3% | 79.6% |
!!! note "Fixed Input Size"
YOLOv8-Face uses a fixed input size of 640×640.
### Configuration
```python
detector = YOLOv8Face(
model_name=YOLOv8FaceWeights.YOLOV8N,
confidence_threshold=0.5,
nms_threshold=0.45,
nms_mode='numpy' # or 'torchvision' for faster NMS
)
```
@@ -177,6 +228,8 @@ detector = create_detector('retinaface')
detector = create_detector('scrfd')
# or
detector = create_detector('yolov5face')
# or
detector = create_detector('yolov8face')
```
---
@@ -188,11 +241,11 @@ One-line detection:
```python
from uniface import detect_faces
faces = detect_faces(
image,
method='retinaface',
confidence_threshold=0.5
)
# Using RetinaFace (default)
faces = detect_faces(image, method='retinaface', confidence_threshold=0.5)
# Using YOLOv8-Face
faces = detect_faces(image, method='yolov8face', confidence_threshold=0.5)
```
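Each returned `Face` exposes `bbox`, `confidence`, and 5-point `landmarks` (see the `detect()` docstring later in this diff); continuing from the snippet above:

```python
# faces comes from detect_faces(...) above. Face fields per the detect() docs:
# bbox -> np.ndarray (4,) as [x1, y1, x2, y2], confidence -> float,
# landmarks -> np.ndarray (5, 2)
for face in faces:
    x1, y1, x2, y2 = face.bbox
    print(f"{face.confidence:.2f}", (x1, y1), (x2, y2), face.landmarks.shape)
```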
---

View File

@@ -6,13 +6,13 @@ Gaze estimation predicts where a person is looking (pitch and yaw angles).
## Available Models
| Model | Backbone | Size | MAE* | Best For |
|-------|----------|------|------|----------|
| ResNet18 | ResNet18 | 43 MB | 12.84° | Balanced |
| **ResNet34** :material-check-circle: | ResNet34 | 82 MB | 11.33° | Recommended |
| ResNet50 | ResNet50 | 91 MB | 11.34° | High accuracy |
| MobileNetV2 | MobileNetV2 | 9.6 MB | 13.07° | Mobile |
| MobileOne-S0 | MobileOne | 4.8 MB | 12.58° | Lightweight |
| Model | Backbone | Size | MAE* |
|-------|----------|------|------|
| ResNet18 | ResNet18 | 43 MB | 12.84° |
| **ResNet34** :material-check-circle: | ResNet34 | 82 MB | 11.33° |
| ResNet50 | ResNet50 | 91 MB | 11.34° |
| MobileNetV2 | MobileNetV2 | 9.6 MB | 13.07° |
| MobileOne-S0 | MobileOne | 4.8 MB | 12.58° |
*MAE = Mean Absolute Error on Gaze360 test set (lower is better)
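The models return pitch and yaw in radians (per the MobileGaze notes above), so converting to degrees for display is a one-liner. The `pitch, yaw` values below are placeholders standing in for a model's output:

```python
import numpy as np

pitch, yaw = -0.12, 0.31  # placeholder radian outputs from a gaze model
pitch_deg, yaw_deg = np.degrees([pitch, yaw])
print(f"pitch={pitch_deg:.1f} deg, yaw={yaw_deg:.1f} deg")
```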

View File

@@ -6,12 +6,12 @@ Facial landmark detection provides precise localization of facial features.
## Available Models
| Model | Points | Size | Use Case |
|-------|--------|------|----------|
| **Landmark106** | 106 | 14 MB | Detailed face analysis |
| Model | Points | Size |
|-------|--------|------|
| **Landmark106** | 106 | 14 MB |
!!! info "5-Point Landmarks"
Basic 5-point landmarks are included with all detection models (RetinaFace, SCRFD, YOLOv5-Face).
Basic 5-point landmarks are included with all detection models (RetinaFace, SCRFD, YOLOv5-Face, YOLOv8-Face).
---

View File

@@ -6,10 +6,10 @@ Face parsing segments faces into semantic components (skin, eyes, nose, mouth, h
## Available Models
| Model | Backbone | Size | Classes | Best For |
|-------|----------|------|---------|----------|
| **BiSeNet ResNet18** :material-check-circle: | ResNet18 | 51 MB | 19 | Balanced (recommended) |
| **BiSeNet ResNet34** | ResNet34 | 89 MB | 19 | Higher accuracy |
| Model | Backbone | Size | Classes |
|-------|----------|------|---------|
| **BiSeNet ResNet18** :material-check-circle: | ResNet18 | 51 MB | 19 |
| BiSeNet ResNet34 | ResNet34 | 89 MB | 19 |
---
@@ -71,10 +71,10 @@ parser = BiSeNet()
parser = BiSeNet(model_name=ParsingWeights.RESNET34)
```
| Variant | Params | Size | Notes |
|---------|--------|------|-------|
| **RESNET18** :material-check-circle: | 13.3M | 51 MB | Recommended |
| RESNET34 | 24.1M | 89 MB | Higher accuracy |
| Variant | Params | Size |
|---------|--------|------|
| **RESNET18** :material-check-circle: | 13.3M | 51 MB |
| RESNET34 | 24.1M | 89 MB |
---

View File

@@ -6,13 +6,13 @@ Face anonymization protects privacy by blurring or obscuring faces in images and
## Available Methods
| Method | Description | Use Case |
|--------|-------------|----------|
| **pixelate** | Blocky pixelation | News media standard |
| **gaussian** | Smooth blur | Natural appearance |
| **blackout** | Solid color fill | Maximum privacy |
| **elliptical** | Oval-shaped blur | Natural face shape |
| **median** | Edge-preserving blur | Artistic effect |
| Method | Description |
|--------|-------------|
| **pixelate** | Blocky pixelation |
| **gaussian** | Smooth blur |
| **blackout** | Solid color fill |
| **elliptical** | Oval-shaped blur |
| **median** | Edge-preserving blur |
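As an illustration of the `pixelate` method's effect (a plain OpenCV sketch, not the uniface API): downscale the face crop, then upscale with nearest-neighbor interpolation to get the blocky look.

```python
import cv2

def pixelate(image, bbox, blocks: int = 12):
    """Blocky pixelation of a face region, in the spirit of the table above."""
    x1, y1, x2, y2 = map(int, bbox)
    face = image[y1:y2, x1:x2]
    # Shrink to a blocks x blocks grid, then blow back up without smoothing.
    small = cv2.resize(face, (blocks, blocks), interpolation=cv2.INTER_LINEAR)
    image[y1:y2, x1:x2] = cv2.resize(
        small, (x2 - x1, y2 - y1), interpolation=cv2.INTER_NEAREST
    )
    return image
```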
---

View File

@@ -6,18 +6,18 @@ Face recognition extracts embeddings for identity verification and face search.
## Available Models
| Model | Backbone | Size | Embedding Dim | Best For |
|-------|----------|------|---------------|----------|
| **AdaFace** | IR-18/IR-101 | 92-249 MB | 512 | High-quality recognition |
| **ArcFace** | MobileNet/ResNet | 8-166 MB | 512 | General use (recommended) |
| **MobileFace** | MobileNet V2/V3 | 1-10 MB | 512 | Mobile/Edge |
| **SphereFace** | Sphere20/36 | 50-92 MB | 512 | Research |
| Model | Backbone | Size | Embedding Dim |
|-------|----------|------|---------------|
| **AdaFace** | IR-18/IR-101 | 92-249 MB | 512 |
| **ArcFace** | MobileNet/ResNet | 8-166 MB | 512 |
| **MobileFace** | MobileNet V2/V3 | 1-10 MB | 512 |
| **SphereFace** | Sphere20/36 | 50-92 MB | 512 |
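A short verification sketch built from names that appear elsewhere in this diff (`get_normalized_embedding` and `compute_similarity`). The `compute_similarity` signature is assumed to take two embeddings, and the decision threshold is an assumption to tune per deployment:

```python
import cv2

from uniface import ArcFace, RetinaFace
from uniface.face_utils import compute_similarity

detector = RetinaFace()
recognizer = ArcFace()

def embed(path: str):
    image = cv2.imread(path)
    face = detector.detect(image)[0]  # assume one face per enrollment image
    return recognizer.get_normalized_embedding(image, face.landmarks)

similarity = compute_similarity(embed("a.jpg"), embed("b.jpg"))
print("same person" if similarity > 0.35 else "different")  # threshold is illustrative
```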
---
## AdaFace
High-quality face recognition using adaptive margin based on image quality. AdaFace achieves state-of-the-art results on challenging benchmarks like IJB-B and IJB-C.
Face recognition using an adaptive margin based on image quality.
### Basic Usage
@@ -49,10 +49,10 @@ recognizer = AdaFace(model_name=AdaFaceWeights.IR_18)
recognizer = AdaFace(model_name=AdaFaceWeights.IR_101)
```
| Variant | Dataset | Size | IJB-B | IJB-C | Use Case |
|---------|---------|------|-------|-------|----------|
| **IR_18** :material-check-circle: | WebFace4M | 92 MB | 93.03% | 94.99% | Balanced (default) |
| IR_101 | WebFace12M | 249 MB | - | 97.66% | Maximum accuracy |
| Variant | Dataset | Size | IJB-B | IJB-C |
|---------|---------|------|-------|-------|
| **IR_18** :material-check-circle: | WebFace4M | 92 MB | 93.03% | 94.99% |
| IR_101 | WebFace12M | 249 MB | - | 97.66% |
!!! info "Benchmark Metrics"
IJB-B and IJB-C accuracy reported as TAR@FAR=0.01%
@@ -61,7 +61,7 @@ recognizer = AdaFace(model_name=AdaFaceWeights.IR_101)
## ArcFace
State-of-the-art recognition using additive angular margin loss.
Face recognition using additive angular margin loss.
### Basic Usage
@@ -93,16 +93,21 @@ recognizer = ArcFace(model_name=ArcFaceWeights.MNET)
recognizer = ArcFace(model_name=ArcFaceWeights.RESNET)
```
| Variant | Backbone | Size | Use Case |
|---------|----------|------|----------|
| **MNET** :material-check-circle: | MobileNet | 8 MB | Balanced (recommended) |
| RESNET | ResNet50 | 166 MB | Maximum accuracy |
| Variant | Backbone | Size | LFW | CFP-FP | AgeDB-30 | IJB-C |
|---------|----------|------|-----|--------|----------|-------|
| **MNET** :material-check-circle: | MobileNet | 8 MB | 99.70% | 98.00% | 96.58% | 95.02% |
| RESNET | ResNet50 | 166 MB | 99.83% | 99.33% | 98.23% | 97.25% |
!!! info "Training Data & Metrics"
**Dataset**: Trained on WebFace600K (600K images)
**Accuracy**: IJB-C reported as TAR@FAR=1e-4
---
## MobileFace
Lightweight recognition for resource-constrained environments.
Lightweight face recognition models with MobileNet backbones.
### Basic Usage
@@ -129,18 +134,18 @@ recognizer = MobileFace(model_name=MobileFaceWeights.MNET_V2)
recognizer = MobileFace(model_name=MobileFaceWeights.MNET_V3_LARGE)
```
| Variant | Params | Size | LFW | Use Case |
|---------|--------|------|-----|----------|
| MNET_025 | 0.36M | 1 MB | 98.8% | Ultra-lightweight |
| **MNET_V2** :material-check-circle: | 2.29M | 4 MB | 99.6% | Mobile/Edge |
| MNET_V3_SMALL | 1.25M | 3 MB | 99.3% | Mobile optimized |
| MNET_V3_LARGE | 3.52M | 10 MB | 99.5% | Balanced mobile |
| Variant | Params | Size | LFW | CALFW | CPLFW | AgeDB-30 |
|---------|--------|------|-----|-------|-------|----------|
| MNET_025 | 0.36M | 1 MB | 98.76% | 92.02% | 82.37% | 90.02% |
| **MNET_V2** :material-check-circle: | 2.29M | 4 MB | 99.55% | 94.87% | 86.89% | 95.16% |
| MNET_V3_SMALL | 1.25M | 3 MB | 99.30% | 93.77% | 85.29% | 92.79% |
| MNET_V3_LARGE | 3.52M | 10 MB | 99.53% | 94.56% | 86.79% | 95.13% |
---
## SphereFace
Recognition using angular softmax loss (A-Softmax).
Face recognition using angular softmax loss (A-Softmax).
### Basic Usage
@@ -152,10 +157,10 @@ recognizer = SphereFace(model_name=SphereFaceWeights.SPHERE20)
embedding = recognizer.get_normalized_embedding(image, landmarks)
```
| Variant | Params | Size | LFW | Use Case |
|---------|--------|------|-----|----------|
| SPHERE20 | 24.5M | 50 MB | 99.7% | Research |
| SPHERE36 | 34.6M | 92 MB | 99.7% | Research |
| Variant | Params | Size | LFW | CALFW | CPLFW | AgeDB-30 |
|---------|--------|------|-----|-------|-------|----------|
| SPHERE20 | 24.5M | 50 MB | 99.67% | 95.61% | 88.75% | 96.58% |
| SPHERE36 | 34.6M | 92 MB | 99.72% | 95.64% | 89.92% | 96.83% |
---

View File

@@ -6,10 +6,10 @@ Face anti-spoofing detects whether a face is real (live) or fake (photo, video r
## Available Models
| Model | Size | Notes |
|-------|------|-------|
| MiniFASNet V1SE | 1.2 MB | Squeeze-and-Excitation variant |
| **MiniFASNet V2** :material-check-circle: | 1.2 MB | Improved version (recommended) |
| Model | Size |
|-------|------|
| MiniFASNet V1SE | 1.2 MB |
| **MiniFASNet V2** :material-check-circle: | 1.2 MB |
---

View File

@@ -357,18 +357,18 @@ cv2.destroyAllWindows()
## Model Selection
For detailed model comparisons, benchmarks, and selection guidance, see the [Model Zoo](models.md).
For detailed model comparisons and benchmarks, see the [Model Zoo](models.md).
**Quick recommendations:**
**Available models by task:**
| Task | Recommended Model | Alternative |
|------|-------------------|-------------|
| Detection (balanced) | `RetinaFace` (MNET_V2) | `YOLOv5Face` (YOLOV5S) |
| Detection (speed) | `RetinaFace` (MNET_025) | `SCRFD` (SCRFD_500M) |
| Detection (accuracy) | `SCRFD` (SCRFD_10G) | `RetinaFace` (RESNET34) |
| Recognition | `ArcFace` (MNET) | `AdaFace` (IR_18) |
| Gaze | `MobileGaze` (RESNET34) | `MobileGaze` (MOBILEONE_S0) |
| Parsing | `BiSeNet` (RESNET18) | `BiSeNet` (RESNET34) |
| Task | Available Models |
|------|------------------|
| Detection | `RetinaFace`, `SCRFD`, `YOLOv5Face`, `YOLOv8Face` |
| Recognition | `ArcFace`, `AdaFace`, `MobileFace`, `SphereFace` |
| Gaze | `MobileGaze` (ResNet18/34/50, MobileNetV2, MobileOneS0) |
| Parsing | `BiSeNet` (ResNet18/34) |
| Attributes | `AgeGender`, `FairFace`, `Emotion` |
| Anti-Spoofing | `MiniFASNet` (V1SE, V2) |
---

View File

@@ -1,6 +1,6 @@
[project]
name = "uniface"
version = "2.1.0"
version = "2.2.0"
description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Face Parsing, Gaze Estimation, Age, and Gender Detection"
readme = "README.md"
license = "MIT"
@@ -9,7 +9,7 @@ maintainers = [
{ name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" },
]
requires-python = ">=3.11,<3.14"
requires-python = ">=3.10,<3.14"
keywords = [
"face-detection",
"face-recognition",
@@ -33,6 +33,7 @@ classifiers = [
"Intended Audience :: Science/Research",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
@@ -71,7 +72,7 @@ uniface = ["py.typed"]
[tool.ruff]
line-length = 120
target-version = "py311"
target-version = "py310"
exclude = [
".git",
".ruff_cache",
@@ -127,15 +128,6 @@ section-order = [
[tool.ruff.lint.pydocstyle]
convention = "google"
[tool.mypy]
python_version = "3.11"
warn_return_any = false
warn_unused_ignores = true
ignore_missing_imports = true
exclude = ["tests/", "scripts/", "examples/"]
# Disable strict return type checking for numpy operations
disable_error_code = ["no-any-return"]
[tool.bandit]
exclude_dirs = ["tests", "scripts", "examples"]
skips = ["B101", "B614"] # B101: assert, B614: torch.jit.load (models are SHA256 verified)

View File

@@ -18,7 +18,7 @@ from pathlib import Path
import cv2
from uniface.detection import SCRFD, RetinaFace, YOLOv5Face
from uniface.detection import SCRFD, RetinaFace, YOLOv5Face, YOLOv8Face
from uniface.visualization import draw_detections
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
@@ -157,7 +157,9 @@ def run_camera(detector, camera_id: int = 0, threshold: float = 0.6):
def main():
parser = argparse.ArgumentParser(description='Run face detection')
parser.add_argument('--source', type=str, required=True, help='Image/video path or camera ID (0, 1, ...)')
parser.add_argument('--method', type=str, default='retinaface', choices=['retinaface', 'scrfd', 'yolov5face'])
parser.add_argument(
'--method', type=str, default='retinaface', choices=['retinaface', 'scrfd', 'yolov5face', 'yolov8face']
)
parser.add_argument('--threshold', type=float, default=0.25, help='Visualization threshold')
parser.add_argument('--save-dir', type=str, default='outputs', help='Output directory')
args = parser.parse_args()
@@ -167,10 +169,14 @@ def main():
detector = RetinaFace()
elif args.method == 'scrfd':
detector = SCRFD()
else:
elif args.method == 'yolov5face':
from uniface.constants import YOLOv5FaceWeights
detector = YOLOv5Face(model_name=YOLOv5FaceWeights.YOLOV5M)
else: # yolov8face
from uniface.constants import YOLOv8FaceWeights
detector = YOLOv8Face(model_name=YOLOv8FaceWeights.YOLOV8N)
# Determine source type and process
source_type = get_source_type(args.source)

View File

@@ -14,7 +14,7 @@
"""UniFace: A comprehensive library for face analysis.
This library provides unified APIs for:
- Face detection (RetinaFace, SCRFD, YOLOv5Face)
- Face detection (RetinaFace, SCRFD, YOLOv5Face, YOLOv8Face)
- Face recognition (AdaFace, ArcFace, MobileFace, SphereFace)
- Facial landmarks (106-point detection)
- Face parsing (semantic segmentation)
@@ -28,7 +28,7 @@ from __future__ import annotations
__license__ = 'MIT'
__author__ = 'Yakhyokhuja Valikhujaev'
__version__ = '2.1.0'
__version__ = '2.2.0'
from uniface.face_utils import compute_similarity, face_alignment
from uniface.log import Logger, enable_logging
@@ -41,6 +41,7 @@ from .detection import (
SCRFD,
RetinaFace,
YOLOv5Face,
YOLOv8Face,
create_detector,
detect_faces,
list_available_detectors,
@@ -81,6 +82,7 @@ __all__ = [
'RetinaFace',
'SCRFD',
'YOLOv5Face',
'YOLOv8Face',
# Recognition models
'AdaFace',
'ArcFace',

View File

@@ -79,6 +79,20 @@ class YOLOv5FaceWeights(str, Enum):
YOLOV5M = "yolov5m"
class YOLOv8FaceWeights(str, Enum):
"""
YOLOv8-Face models trained on WIDER FACE dataset.
Uses anchor-free design with DFL (Distribution Focal Loss) for bbox regression.
Exported to ONNX from: https://github.com/yakhyo/yolov8-face-onnx-inference
Model Performance (WIDER FACE):
- YOLOV8_LITE_S: 7.4MB, 93.4% Easy / 91.2% Medium / 78.6% Hard (lightweight)
- YOLOV8N: 12MB, 94.6% Easy / 92.3% Medium / 79.6% Hard (recommended)
"""
YOLOV8_LITE_S = "yolov8_lite_s"
YOLOV8N = "yolov8n_face"
class DDAMFNWeights(str, Enum):
"""
Trained on AffectNet dataset.
@@ -179,6 +193,9 @@ MODEL_URLS: dict[Enum, str] = {
YOLOv5FaceWeights.YOLOV5N: 'https://github.com/yakhyo/yolov5-face-onnx-inference/releases/download/weights/yolov5n_face.onnx',
YOLOv5FaceWeights.YOLOV5S: 'https://github.com/yakhyo/yolov5-face-onnx-inference/releases/download/weights/yolov5s_face.onnx',
YOLOv5FaceWeights.YOLOV5M: 'https://github.com/yakhyo/yolov5-face-onnx-inference/releases/download/weights/yolov5m_face.onnx',
# YOLOv8-Face
YOLOv8FaceWeights.YOLOV8_LITE_S: 'https://github.com/yakhyo/yolov8-face-onnx-inference/releases/download/weights/yolov8-lite-s.onnx',
YOLOv8FaceWeights.YOLOV8N: 'https://github.com/yakhyo/yolov8-face-onnx-inference/releases/download/weights/yolov8n-face.onnx',
# DDAFM
DDAMFNWeights.AFFECNET7: 'https://github.com/yakhyo/uniface/releases/download/weights/affecnet7.script',
DDAMFNWeights.AFFECNET8: 'https://github.com/yakhyo/uniface/releases/download/weights/affecnet8.script',
@@ -231,6 +248,9 @@ MODEL_SHA256: dict[Enum, str] = {
YOLOv5FaceWeights.YOLOV5N: 'eb244a06e36999db732b317c2b30fa113cd6cfc1a397eaf738f2d6f33c01f640',
YOLOv5FaceWeights.YOLOV5S: 'fc682801cd5880e1e296184a14aea0035486b5146ec1a1389d2e7149cb134bb2',
YOLOv5FaceWeights.YOLOV5M: '04302ce27a15bde3e20945691b688e2dd018a10e92dd8932146bede6a49207b2',
# YOLOv8-Face
YOLOv8FaceWeights.YOLOV8_LITE_S: '11bc496be01356d2d960085bfd8abb8f103199900a034f239a8a1705a1b31dba',
YOLOv8FaceWeights.YOLOV8N: '33f3951af7fc0c4d9b321b29cdcd8c9a59d0a29a8d4bdc01fcb5507d5c714809',
# DDAFM
DDAMFNWeights.AFFECNET7: '10535bf8b6afe8e9d6ae26cea6c3add9a93036e9addb6adebfd4a972171d015d',
DDAMFNWeights.AFFECNET8: '8c66963bc71db42796a14dfcbfcd181b268b65a3fc16e87147d6a3a3d7e0f487',
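These two dicts are keyed by the same enums, so a weight variant resolves to both its download URL and its pinned digest; for example:

```python
from uniface.constants import MODEL_SHA256, MODEL_URLS, YOLOv8FaceWeights

weight = YOLOv8FaceWeights.YOLOV8N
print(MODEL_URLS[weight])    # release asset to download
print(MODEL_SHA256[weight])  # digest the download is verified against
```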

View File

@@ -14,6 +14,7 @@ from .base import BaseDetector
from .retinaface import RetinaFace
from .scrfd import SCRFD
from .yolov5 import YOLOv5Face
from .yolov8 import YOLOv8Face
# Global cache for detector instances (keyed by method name + config hash)
_detector_cache: dict[str, BaseDetector] = {}
@@ -27,7 +28,7 @@ def detect_faces(image: np.ndarray, method: str = 'retinaface', **kwargs: Any) -
Args:
image: Input image as numpy array with shape (H, W, C) in BGR format.
method: Detection method to use. Options: 'retinaface', 'scrfd', 'yolov5face'.
method: Detection method to use. Options: 'retinaface', 'scrfd', 'yolov5face', 'yolov8face'.
**kwargs: Additional arguments passed to the detector.
Returns:
@@ -66,6 +67,7 @@ def create_detector(method: str = 'retinaface', **kwargs: Any) -> BaseDetector:
- 'retinaface': RetinaFace detector (default)
- 'scrfd': SCRFD detector (fast and accurate)
- 'yolov5face': YOLOv5-Face detector (accurate with landmarks)
- 'yolov8face': YOLOv8-Face detector (anchor-free, accurate)
**kwargs: Detector-specific parameters.
Returns:
@@ -84,11 +86,9 @@ def create_detector(method: str = 'retinaface', **kwargs: Any) -> BaseDetector:
... 'scrfd', model_name=SCRFDWeights.SCRFD_10G_KPS, confidence_threshold=0.8, input_size=(640, 640)
... )
>>> # RetinaFace detector
>>> from uniface.constants import RetinaFaceWeights
>>> detector = create_detector(
... 'retinaface', model_name=RetinaFaceWeights.MNET_V2, confidence_threshold=0.8, nms_threshold=0.4
... )
>>> # YOLOv8-Face detector
>>> from uniface.constants import YOLOv8FaceWeights
>>> detector = create_detector('yolov8face', model_name=YOLOv8FaceWeights.YOLOV8N, confidence_threshold=0.5)
"""
method = method.lower()
@@ -101,8 +101,11 @@ def create_detector(method: str = 'retinaface', **kwargs: Any) -> BaseDetector:
elif method == 'yolov5face':
return YOLOv5Face(**kwargs)
elif method == 'yolov8face':
return YOLOv8Face(**kwargs)
else:
available_methods = ['retinaface', 'scrfd', 'yolov5face']
available_methods = ['retinaface', 'scrfd', 'yolov5face', 'yolov8face']
raise ValueError(f"Unsupported detection method: '{method}'. Available methods: {available_methods}")
@@ -147,6 +150,17 @@ def list_available_detectors() -> dict[str, dict[str, Any]]:
'input_size': 640,
},
},
'yolov8face': {
'description': 'YOLOv8-Face detector - anchor-free design with DFL for accurate detection',
'supports_landmarks': True,
'paper': 'https://github.com/derronqi/yolov8-face',
'default_params': {
'model_name': 'yolov8n_face',
'confidence_threshold': 0.5,
'nms_threshold': 0.45,
'input_size': 640,
},
},
}
@@ -155,6 +169,7 @@ __all__ = [
'BaseDetector',
'RetinaFace',
'YOLOv5Face',
'YOLOv8Face',
'create_detector',
'detect_faces',
'list_available_detectors',
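Putting the registry and factory shown above together, a minimal sketch:

```python
from uniface import create_detector, list_available_detectors

# Enumerate registered detectors, then build one by name.
for name, info in list_available_detectors().items():
    print(name, "->", info["description"])

detector = create_detector("yolov8face", confidence_threshold=0.5)
```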

View File

@@ -16,6 +16,15 @@ from uniface.types import Face
from .base import BaseDetector
# Optional torchvision import for faster NMS
try:
import torch
import torchvision
TORCHVISION_AVAILABLE = True
except ImportError:
TORCHVISION_AVAILABLE = False
__all__ = ['YOLOv5Face']
@@ -34,6 +43,8 @@ class YOLOv5Face(BaseDetector):
nms_threshold (float): Non-Maximum Suppression threshold. Defaults to 0.5.
input_size (int): Input image size. Defaults to 640.
Note: ONNX model is fixed at 640. Changing this will cause inference errors.
nms_mode (str): NMS implementation. Options: 'torchvision' (faster, requires torch)
or 'numpy' (no extra dependencies). Defaults to 'numpy'; if 'torchvision' is requested but unavailable, falls back to 'numpy' with a warning.
**kwargs: Advanced options:
max_det (int): Maximum number of detections to return. Defaults to 750.
@@ -42,6 +53,7 @@ class YOLOv5Face(BaseDetector):
confidence_threshold (float): Threshold used to filter low-confidence detections.
nms_threshold (float): Threshold used during NMS to suppress overlapping boxes.
input_size (int): Image size to which inputs are resized before inference.
nms_mode (str): NMS calculation method being used.
max_det (int): Maximum number of detections to return.
_model_path (str): Absolute path to the downloaded/verified model weights.
@@ -57,6 +69,7 @@ class YOLOv5Face(BaseDetector):
confidence_threshold: float = 0.6,
nms_threshold: float = 0.5,
input_size: int = 640,
nms_mode: Literal['torchvision', 'numpy'] = 'numpy',
**kwargs: Any,
) -> None:
super().__init__(
@@ -64,6 +77,7 @@ class YOLOv5Face(BaseDetector):
confidence_threshold=confidence_threshold,
nms_threshold=nms_threshold,
input_size=input_size,
nms_mode=nms_mode,
**kwargs,
)
self._supports_landmarks = True # YOLOv5-Face supports landmarks
@@ -79,12 +93,19 @@ class YOLOv5Face(BaseDetector):
self.nms_threshold = nms_threshold
self.input_size = input_size
# Set NMS mode with automatic fallback
if nms_mode == 'torchvision' and not TORCHVISION_AVAILABLE:
Logger.warning('torchvision not available, falling back to numpy NMS')
self.nms_mode = 'numpy'
else:
self.nms_mode = nms_mode
# Advanced options from kwargs
self.max_det = kwargs.get('max_det', 750)
Logger.info(
f'Initializing YOLOv5Face with model={self.model_name}, confidence_threshold={self.confidence_threshold}, '
f'nms_threshold={self.nms_threshold}, input_size={self.input_size}'
f'nms_threshold={self.nms_threshold}, input_size={self.input_size}, nms_mode={self.nms_mode}'
)
# Get path to model weights
@@ -205,9 +226,16 @@ class YOLOv5Face(BaseDetector):
# Get landmarks (5 points, 10 coordinates)
landmarks = predictions[:, 5:15].copy()
# Apply NMS
detections_for_nms = np.hstack((boxes, scores[:, None])).astype(np.float32, copy=False)
keep = non_max_suppression(detections_for_nms, self.nms_threshold)
# Apply NMS based on selected mode
if self.nms_mode == 'torchvision':
keep = torchvision.ops.nms(
torch.tensor(boxes, dtype=torch.float32),
torch.tensor(scores, dtype=torch.float32),
self.nms_threshold,
).numpy()
else:
detections = np.hstack((boxes, scores[:, None])).astype(np.float32, copy=False)
keep = non_max_suppression(detections, self.nms_threshold)
if len(keep) == 0:
return np.array([]), np.array([])

uniface/detection/yolov8.py (new file, 419 lines)
View File

@@ -0,0 +1,419 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
"""
YOLOv8-Face detector implementation.
Uses anchor-free design with DFL (Distribution Focal Loss) for bbox regression.
Reference: https://github.com/yakhyo/yolov8-face-onnx-inference
"""
from typing import Any, Literal
import cv2
import numpy as np
from uniface.common import non_max_suppression
from uniface.constants import YOLOv8FaceWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from uniface.types import Face
from .base import BaseDetector
# Optional torchvision import for faster NMS
try:
import torch
import torchvision
TORCHVISION_AVAILABLE = True
except ImportError:
TORCHVISION_AVAILABLE = False
__all__ = ['YOLOv8Face']
class YOLOv8Face(BaseDetector):
"""
Face detector based on the YOLOv8-Face architecture.
Uses anchor-free design with DFL (Distribution Focal Loss) for bbox regression.
Outputs 3 feature maps at different scales for multi-scale detection.
Reference: https://github.com/yakhyo/yolov8-face-onnx-inference
Args:
model_name (YOLOv8FaceWeights): Predefined model enum (e.g., `YOLOV8N`).
Specifies the YOLOv8-Face variant to load. Defaults to YOLOV8N.
confidence_threshold (float): Confidence threshold for filtering detections. Defaults to 0.5.
nms_threshold (float): Non-Maximum Suppression threshold. Defaults to 0.45.
input_size (int): Input image size. Defaults to 640.
Note: ONNX model is fixed at 640. Changing this will cause inference errors.
nms_mode (str): NMS implementation. Options: 'torchvision' (faster, requires torch)
or 'numpy' (no extra dependencies). Defaults to 'numpy'; if 'torchvision' is requested but unavailable, falls back to 'numpy' with a warning.
**kwargs: Advanced options:
max_det (int): Maximum number of detections to return. Defaults to 750.
Attributes:
model_name (YOLOv8FaceWeights): Selected model variant.
confidence_threshold (float): Threshold used to filter low-confidence detections.
nms_threshold (float): Threshold used during NMS to suppress overlapping boxes.
input_size (int): Image size to which inputs are resized before inference.
nms_mode (str): NMS calculation method being used.
max_det (int): Maximum number of detections to return.
_model_path (str): Absolute path to the downloaded/verified model weights.
Raises:
ValueError: If the model weights are invalid or not found.
RuntimeError: If the ONNX model fails to load or initialize.
"""
def __init__(
self,
*,
model_name: YOLOv8FaceWeights = YOLOv8FaceWeights.YOLOV8N,
confidence_threshold: float = 0.5,
nms_threshold: float = 0.45,
input_size: int = 640,
nms_mode: Literal['torchvision', 'numpy'] = 'numpy',
**kwargs: Any,
) -> None:
super().__init__(
model_name=model_name,
confidence_threshold=confidence_threshold,
nms_threshold=nms_threshold,
input_size=input_size,
nms_mode=nms_mode,
**kwargs,
)
self._supports_landmarks = True # YOLOv8-Face supports landmarks
# Validate input size
if input_size != 640:
raise ValueError(
f'YOLOv8Face only supports input_size=640 (got {input_size}). The ONNX model has a fixed input shape.'
)
self.model_name = model_name
self.confidence_threshold = confidence_threshold
self.nms_threshold = nms_threshold
self.input_size = input_size
# Set NMS mode with automatic fallback
if nms_mode == 'torchvision' and not TORCHVISION_AVAILABLE:
Logger.warning('torchvision not available, falling back to numpy NMS')
self.nms_mode = 'numpy'
else:
self.nms_mode = nms_mode
# Advanced options from kwargs
self.max_det = kwargs.get('max_det', 750)
# YOLOv8 strides for 640x640 input (3 feature maps: 80x80, 40x40, 20x20)
self.strides = [8, 16, 32]
Logger.info(
f'Initializing YOLOv8Face with model={self.model_name}, confidence_threshold={self.confidence_threshold}, '
f'nms_threshold={self.nms_threshold}, input_size={self.input_size}, nms_mode={self.nms_mode}'
)
# Get path to model weights
self._model_path = verify_model_weights(self.model_name)
Logger.info(f'Verified model weights located at: {self._model_path}')
# Initialize model
self._initialize_model(self._model_path)
def _initialize_model(self, model_path: str) -> None:
"""
Initializes an ONNX model session from the given path.
Args:
model_path (str): The file path to the ONNX model.
Raises:
RuntimeError: If the model fails to load, logs an error and raises an exception.
"""
try:
self.session = create_onnx_session(model_path)
self.input_names = self.session.get_inputs()[0].name
self.output_names = [x.name for x in self.session.get_outputs()]
Logger.info(f'Successfully initialized the model from {model_path}')
except Exception as e:
Logger.error(f"Failed to load model from '{model_path}': {e}", exc_info=True)
raise RuntimeError(f"Failed to initialize model session for '{model_path}'") from e
def preprocess(self, image: np.ndarray) -> tuple[np.ndarray, float, tuple[int, int]]:
"""
Preprocess image for inference (letterbox resize with center padding).
Args:
image (np.ndarray): Input image (BGR format)
Returns:
Tuple[np.ndarray, float, Tuple[int, int]]: Preprocessed image, scale ratio, and padding (pad_w, pad_h)
"""
# Get original image shape
img_h, img_w = image.shape[:2]
# Calculate scale ratio
scale = min(self.input_size / img_h, self.input_size / img_w)
new_h, new_w = int(img_h * scale), int(img_w * scale)
# Resize image
img_resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
# Create padded image with gray background (114, 114, 114)
img_padded = np.full((self.input_size, self.input_size, 3), 114, dtype=np.uint8)
# Calculate padding (center the image)
pad_h = (self.input_size - new_h) // 2
pad_w = (self.input_size - new_w) // 2
# Place resized image in center
img_padded[pad_h : pad_h + new_h, pad_w : pad_w + new_w] = img_resized
# Convert BGR to RGB and normalize
img_rgb = cv2.cvtColor(img_padded, cv2.COLOR_BGR2RGB)
img_normalized = img_rgb.astype(np.float32) / 255.0
# Transpose to CHW format (HWC -> CHW) and add batch dimension
img_transposed = np.transpose(img_normalized, (2, 0, 1))
img_batch = np.expand_dims(img_transposed, axis=0)
img_batch = np.ascontiguousarray(img_batch)
return img_batch, scale, (pad_w, pad_h)
def inference(self, input_tensor: np.ndarray) -> list[np.ndarray]:
"""Perform model inference on the preprocessed image tensor.
Args:
input_tensor (np.ndarray): Preprocessed input tensor.
Returns:
List[np.ndarray]: Raw model outputs (3 feature maps).
"""
return self.session.run(self.output_names, {self.input_names: input_tensor})
@staticmethod
def _softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
"""Compute softmax values for array x along specified axis."""
exp_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
return exp_x / np.sum(exp_x, axis=axis, keepdims=True)
def postprocess(
self,
predictions: list[np.ndarray],
scale: float,
padding: tuple[int, int],
original_shape: tuple[int, int],
) -> tuple[np.ndarray, np.ndarray]:
"""
Postprocess model predictions with DFL decoding and coordinate scaling.
Args:
predictions (List[np.ndarray]): Raw model outputs (3 feature maps)
scale (float): Scale ratio used in preprocessing
padding (Tuple[int, int]): Padding (pad_w, pad_h) used in preprocessing
original_shape (Tuple[int, int]): Original image shape (height, width)
Returns:
Tuple[np.ndarray, np.ndarray]: Filtered detections and landmarks
- detections: [N, 5] as [x1, y1, x2, y2, conf]
- landmarks: [N, 5, 2] for each detection
"""
# YOLOv8-Face outputs 3 feature maps with Pose head
# Each output: (1, 80, H, W) where 80 = 64 (bbox DFL) + 1 (class) + 15 (5 keypoints * 3)
boxes_list = []
scores_list = []
landmarks_list = []
for pred, stride in zip(predictions, self.strides, strict=False):
# pred shape: (1, 80, H, W)
batch_size, channels, height, width = pred.shape
# Reshape: (1, 80, H, W) -> (1, 80, H*W) -> (1, H*W, 80) -> (H*W, 80)
pred = pred.reshape(batch_size, channels, -1).transpose(0, 2, 1)[0]
# Create grid with 0.5 offset (matching PyTorch's make_anchors)
grid_y, grid_x = np.meshgrid(np.arange(height) + 0.5, np.arange(width) + 0.5, indexing='ij')
grid_x = grid_x.flatten()
grid_y = grid_y.flatten()
# Extract components
bbox_pred = pred[:, :64] # DFL bbox prediction (64 channels = 4 * 16)
cls_conf = pred[:, 64] # Class confidence (1 channel)
kpt_pred = pred[:, 65:] # Keypoints (15 channels = 5 points * 3: x, y, visibility)
# Decode bounding boxes from DFL
bbox_pred = bbox_pred.reshape(-1, 4, 16)
bbox_dist = self._softmax(bbox_pred, axis=-1) @ np.arange(16)
# Convert distances to xyxy format
x1 = (grid_x - bbox_dist[:, 0]) * stride
y1 = (grid_y - bbox_dist[:, 1]) * stride
x2 = (grid_x + bbox_dist[:, 2]) * stride
y2 = (grid_y + bbox_dist[:, 3]) * stride
boxes = np.stack([x1, y1, x2, y2], axis=-1)
# Decode keypoints: kpt = (kpt * 2.0 + grid) * stride
kpt_grid_y, kpt_grid_x = np.meshgrid(np.arange(height), np.arange(width), indexing='ij')
kpt_grid_x = kpt_grid_x.flatten()
kpt_grid_y = kpt_grid_y.flatten()
kpt_pred = kpt_pred.reshape(-1, 5, 3) # 5 points * (x, y, visibility)
kpt_x = (kpt_pred[:, :, 0] * 2.0 + kpt_grid_x[:, None]) * stride
kpt_y = (kpt_pred[:, :, 1] * 2.0 + kpt_grid_y[:, None]) * stride
# Ignore visibility (kpt_pred[:, :, 2]) for uniface compatibility
landmarks = np.stack([kpt_x, kpt_y], axis=-1).reshape(-1, 10)
# Apply sigmoid to class confidence
scores = 1 / (1 + np.exp(-cls_conf))
boxes_list.append(boxes)
scores_list.append(scores)
landmarks_list.append(landmarks)
# Concatenate all predictions from all feature maps
boxes = np.concatenate(boxes_list, axis=0)
scores = np.concatenate(scores_list, axis=0)
landmarks = np.concatenate(landmarks_list, axis=0)
# Filter by confidence threshold
mask = scores >= self.confidence_threshold
boxes = boxes[mask]
scores = scores[mask]
landmarks = landmarks[mask]
if len(boxes) == 0:
return np.array([]), np.array([])
# Apply NMS based on selected mode
if self.nms_mode == 'torchvision':
keep = torchvision.ops.nms(
torch.tensor(boxes, dtype=torch.float32),
torch.tensor(scores, dtype=torch.float32),
self.nms_threshold,
).numpy()
else:
detections_for_nms = np.hstack((boxes, scores[:, None])).astype(np.float32, copy=False)
keep = non_max_suppression(detections_for_nms, self.nms_threshold)
if len(keep) == 0:
return np.array([]), np.array([])
# Limit to max_det
keep = keep[: self.max_det]
boxes = boxes[keep]
scores = scores[keep]
landmarks = landmarks[keep]
# === SCALE TO ORIGINAL IMAGE COORDINATES ===
pad_w, pad_h = padding
# Scale boxes back to original image coordinates
boxes[:, [0, 2]] = (boxes[:, [0, 2]] - pad_w) / scale
boxes[:, [1, 3]] = (boxes[:, [1, 3]] - pad_h) / scale
# Clip boxes to image boundaries
boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, original_shape[1])
boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, original_shape[0])
# Scale landmarks back to original image coordinates
landmarks[:, 0::2] = (landmarks[:, 0::2] - pad_w) / scale # x coordinates
landmarks[:, 1::2] = (landmarks[:, 1::2] - pad_h) / scale # y coordinates
# Reshape landmarks to (N, 5, 2)
landmarks = landmarks.reshape(-1, 5, 2)
# Combine box and score
detections = np.concatenate([boxes, scores[:, None]], axis=1)
return detections, landmarks
def detect(
self,
image: np.ndarray,
*,
max_num: int = 0,
metric: Literal['default', 'max'] = 'max',
center_weight: float = 2.0,
) -> list[Face]:
"""
Perform face detection on an input image and return bounding boxes and facial landmarks.
Args:
image (np.ndarray): Input image as a NumPy array of shape (H, W, C) in BGR format.
max_num (int): Maximum number of detections to return. Use 0 to return all detections. Defaults to 0.
metric (Literal["default", "max"]): Metric for ranking detections when `max_num` is limited.
- "default": Prioritize detections closer to the image center.
- "max": Prioritize detections with larger bounding box areas.
center_weight (float): Weight for penalizing detections farther from the image center
when using the "default" metric. Defaults to 2.0.
Returns:
List[Face]: List of Face objects, each containing:
- bbox (np.ndarray): Bounding box coordinates with shape (4,) as [x1, y1, x2, y2]
- confidence (float): Detection confidence score (0.0 to 1.0)
- landmarks (np.ndarray): 5-point facial landmarks with shape (5, 2)
Example:
>>> faces = detector.detect(image)
>>> for face in faces:
... bbox = face.bbox # np.ndarray with shape (4,)
... confidence = face.confidence # float
... landmarks = face.landmarks # np.ndarray with shape (5, 2)
"""
original_height, original_width = image.shape[:2]
# Preprocess
image_tensor, scale, padding = self.preprocess(image)
# ONNXRuntime inference
outputs = self.inference(image_tensor)
# Postprocess with original image shape for clipping
detections, landmarks = self.postprocess(outputs, scale, padding, (original_height, original_width))
# Handle case when no faces are detected
if len(detections) == 0:
return []
if 0 < max_num < detections.shape[0]:
# Calculate area of detections
area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])
# Calculate offsets from image center
center = (original_height // 2, original_width // 2)
offsets = np.vstack(
[
(detections[:, 0] + detections[:, 2]) / 2 - center[1],
(detections[:, 1] + detections[:, 3]) / 2 - center[0],
]
)
# Calculate scores based on the chosen metric
offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
if metric == 'max':
values = area
else:
values = area - offset_dist_squared * center_weight
# Sort by scores and select top `max_num`
sorted_indices = np.argsort(values)[::-1][:max_num]
detections = detections[sorted_indices]
landmarks = landmarks[sorted_indices]
faces = []
for i in range(detections.shape[0]):
face = Face(
bbox=detections[i, :4],
confidence=float(detections[i, 4]),
landmarks=landmarks[i],
)
faces.append(face)
return faces
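To make the DFL step in `postprocess` concrete: each box side is predicted as a 16-bin distribution over distances, and `softmax(bbox_pred) @ np.arange(16)` is its expected value in stride units. A toy decode, assuming most of the probability mass sits near bin 3:

```python
import numpy as np

logits = np.zeros(16)
logits[3] = 8.0  # concentrate probability mass on bin 3

# Numerically stable softmax over the 16 bins.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

distance = probs @ np.arange(16)  # expectation over bin indices, ~3.0
print(distance)  # multiply by the feature-map stride (8/16/32) to get pixels
```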