61 Commits

Author SHA1 Message Date
yakhyo
df673c4a3f remove release guide 2025-11-11 21:38:57 +09:00
yakhyo
496de7a491 Release v1.0.0: First stable release with complete feature set
- Added high-level pipeline API (FacePipeline, process_faces, compare_faces)
- Implemented face detection (RetinaFace, SCRFD)
- Implemented face recognition (ArcFace, MobileFace, SphereFace)
- Added landmark detection (106 points)
- Added attribute analysis (age, gender, emotion)
- Comprehensive examples and documentation
- Cleaned up legacy code and simplified documentation structure
2025-11-11 21:38:57 +09:00
yakhyo
89a05e4689 fix ci badge reference 2025-11-08 01:46:28 +09:00
yakhyo
d3c2d959d0 add permissions for github release creation 2025-11-08 01:43:47 +09:00
Yakhyokhuja Valikhujaev
f3d28d5ef5 Merge pull request #17 from yakhyo/develop
merge: Merge develop into main
2025-11-08 01:39:23 +09:00
yakhyo
c9f8215e28 use automatic package discovery for submodules 2025-11-08 01:36:05 +09:00
yakhyo
0aea17d14d make emotion module optional to avoid pytorch dependency 2025-11-08 01:33:42 +09:00
yakhyo
3cf13f70d4 fix package configuration to include all submodules 2025-11-08 01:24:45 +09:00
yakhyo
666438909d improve logging system with verbose flag
- silent by default (only warnings/errors)
- add --verbose flag to all scripts
- add enable_logging() function for library users
- cleaner output for end users
2025-11-08 01:15:25 +09:00
yakhyo
77f14a616a add apple silicon support and update documentation
- add dynamic onnx provider selection for m1/m2/m3/m4 macs
- replace mkdocs with simple markdown files
- fix model download and scrfd detection issues
- update ci/cd workflows
2025-11-08 01:02:14 +09:00
yakhyo
98f8acc51b ref: Update attribute and landmark modules 2025-11-07 23:58:47 +09:00
yakhyo
30a177981d docs: Update documentation 2025-11-07 23:58:47 +09:00
yakhyo
0417f7531f feat: Update recognition, landmark modules 2025-11-07 23:58:47 +09:00
yakhyo
b15504dfc5 feat: Face detection module has been updated 2025-11-07 23:58:47 +09:00
yakhyo
fb29a919b1 ref: Update some modules and remove redundant parts 2025-11-07 23:58:47 +09:00
yakhyo
b35b1a3f7c ref: Several minor updates, does not affect performance 2025-11-07 23:58:47 +09:00
yakhyo
5bd6bb1673 Landmark model add 2025-11-07 23:58:47 +09:00
yakhyo
cf5d06729d Initial code for facial landmark model 2025-11-07 23:58:47 +09:00
yakhyo
29964df259 feat: Age and gender inference code updated 2025-11-07 23:58:47 +09:00
yakhyo
eef4a0624a feat: Use face alignment for emotion detection 2025-11-07 23:58:47 +09:00
yakhyo
6a7ba6fc0a feat: Add face emotion model 2025-11-07 23:58:47 +09:00
yakhyo
282737e0e9 fix: Typing error 2025-11-07 23:58:47 +09:00
yakhyo
597c86f997 feat: Fix and test face recognition 2025-11-07 23:58:47 +09:00
yakhyo
d0446827e9 feat: Add new models 2025-11-07 23:58:47 +09:00
yakhyo
5f88345830 chore: Some minor updates 2025-11-07 23:58:47 +09:00
yakhyo
ab56589f77 chore: Some minor changes 2025-11-07 23:58:47 +09:00
yakhyo
d05e609ddf chore: Some minor changes does not affect functionality 2025-11-07 23:58:47 +09:00
yakhyo
cee2b692ad ref: Some minor changes and additional models 2025-11-07 23:58:47 +09:00
yakhyo
67bb13c082 feat: Add support for different mean/std 2025-11-07 23:58:47 +09:00
yakhyo
f9b4ea492b docs: Create mkdocs 2025-11-07 23:58:47 +09:00
yakhyo
08f79e7d47 chore: Update logo images 2025-11-07 23:57:19 +09:00
yakhyo
3a0b0e21b1 chore: Add logo images 2025-11-07 23:57:19 +09:00
yakhyo
85cf413cb8 feat: Add face attribute detection model 2025-11-07 23:57:19 +09:00
yakhyo
d1830c7058 feat: Add new face recognition models and update existing code 2025-11-07 23:57:19 +09:00
yakhyo
204b1d75e1 feat: Add face recognition, rename and modify some files 2025-11-07 23:57:19 +09:00
Yakhyokhuja Valikhujaev
777333eb2d Merge pull request #16 from yakhyo/fix/install-issue
fix: Fix installation issue with onnxruntime-gpu version
2025-08-30 22:56:16 +09:00
yakhyo
c2d52e305a fix: Fix github actions 2025-08-30 22:50:28 +09:00
yakhyo
cd3ff79c2e fix: Fix installation issue with onnxruntime-gpu version 2025-08-30 22:43:26 +09:00
yakhyo
fbca77e050 Merge pull request #8 from yakhyo/develop
feat: Change to pyproject and model name signature
2025-03-26 11:59:29 +09:00
yakhyo
8dd7f3f101 feat: Add model name signature and several more updates 2025-03-26 11:55:56 +09:00
Yakhyokhuja Valikhujaev
0ae5714f99 Merge pull request #6 from yakhyo/np-adapt
Remove torch dependency and adapt numpy
2025-03-16 17:53:56 +09:00
yakhyo
64c9c2f452 chore: Update README and remove scripts 2025-03-16 17:52:41 +09:00
yakhyo
31f97da783 docs: Update README.md 2025-03-16 14:37:55 +09:00
yakhyo
d586cffb3a feat: Change setup.py to pyproject.toml and remove torch dependency 2025-03-16 14:36:35 +09:00
Yakhyokhuja Valikhujaev
4256407044 Merge pull request #5 from yakhyo/feat-inv
add inverse matrix for face alignment to rotate back
2025-03-13 23:36:57 +09:00
yakhyo
e54607292f Add latest version of python in ci 2025-03-13 23:34:36 +09:00
yakhyo
56ac8af432 chore: Just variable name change 2025-03-13 23:32:02 +09:00
yakhyo
904ba2be83 chore: Add license and remove version.py [skip ci] 2025-01-09 05:29:01 +00:00
yakhyo
ad661da2f3 feat: Update alignment and make transform inverse accessible through alignment function [skip ci] 2025-01-09 05:17:28 +00:00
Yakhyokhuja Valikhujaev
084b1132ad Update README.md [skip ci] 2024-11-28 18:19:27 +09:00
Yakhyokhuja Valikhujaev
f22e8f01fb Merge pull request #4 from yakhyo/feat
feat: Update face alignment and bump version to v0.1.5
2024-11-23 19:26:35 +09:00
yakhyo
da09d7497d docs: Update README.md and add type annotation [skip ci] 2024-11-23 10:25:09 +00:00
yakhyo
7330b4fd6e chore: Add repo badge [skip ci] 2024-11-21 09:38:20 +00:00
yakhyo
fa179c6a7a feat: Update face alignment following insightface style 2024-11-21 09:28:07 +00:00
Yakhyokhuja Valikhujaev
9acc6e344c Merge pull request #3 from yakhyo/feat
chore: Bump to version 0.1.4
2024-11-21 15:19:42 +09:00
yakhyo
9819520d76 chore: Bump to version 0.1.4 2024-11-21 06:17:00 +00:00
Yakhyokhuja Valikhujaev
db544d1a29 Merge pull request #2 from yakhyo/feat
feat: Add facial alignment and face detection examples
2024-11-21 15:02:55 +09:00
yakhyo
12ccac11b7 feat: Add usage examples for face detection and face alignment 2024-11-21 05:55:55 +00:00
yakhyo
a158e47f52 feat: Bump to version v0.1.3, face alignment feature added 2024-11-21 02:34:01 +00:00
Yakhyokhuja Valikhujaev
5b4148f824 Update README.md [skip ci] 2024-11-20 18:07:02 +09:00
yakhyo
6a69739e8e chore: Bump version to v0.1.2 2024-11-20 08:56:46 +00:00
58 changed files with 5555 additions and 816 deletions

BIN  .github/logos/logo_preview.jpg (vendored, new file, 826 KiB)
BIN  .github/logos/logo_readme.png (vendored, new file, 563 KiB)
BIN  .github/logos/logo_web.webp (vendored, new file, 33 KiB)

Deleted file (name not shown):

@@ -1,64 +0,0 @@
name: Build, Test, and Publish

on:
  push:
    branches:
      - main
    tags:
      - "v*.*.*" # Trigger publish on version tags
  pull_request:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10"]
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install .[dev] || pip install pytest # Use extras_require if available
      - name: Run Tests
        run: |
          pytest
  publish:
    runs-on: ubuntu-latest
    needs: build # Publish only if tests pass
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10" # Use a single Python version for publishing
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install build twine
      - name: Build Package
        run: python -m build
      - name: Publish to PyPI
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
        run: twine upload dist/*

.github/workflows/ci.yml (vendored, new file, 87 lines)

@@ -0,0 +1,87 @@
name: CI

on:
  push:
    branches:
      - main
      - develop
  pull_request:
    branches:
      - main
      - develop

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install .[dev]
      - name: Check ONNX Runtime providers
        run: |
          python -c "import onnxruntime as ort; print('Available providers:', ort.get_available_providers())"
      - name: Lint with ruff (if available)
        run: |
          pip install ruff || true
          ruff check . --exit-zero || true
        continue-on-error: true
      - name: Run tests
        run: pytest -v --tb=short
      - name: Test package imports
        run: |
          python -c "from uniface import RetinaFace, ArcFace, Landmark106, AgeGender; print('All imports successful')"
  build:
    runs-on: ubuntu-latest
    needs: test
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
          cache: 'pip'
      - name: Install build tools
        run: |
          python -m pip install --upgrade pip
          python -m pip install build
      - name: Build package
        run: python -m build
      - name: Check package
        run: |
          python -m pip install twine
          twine check dist/*
      - name: Upload build artifacts
        uses: actions/upload-artifact@v4
        with:
          name: dist-python-${{ github.sha }}
          path: dist/
          retention-days: 7

.github/workflows/publish.yml (vendored, new file, 108 lines)

@@ -0,0 +1,108 @@
name: Publish to PyPI

on:
  push:
    tags:
      - "v*.*.*" # Trigger only on version tags like v0.1.9

jobs:
  validate:
    runs-on: ubuntu-latest
    outputs:
      version: ${{ steps.get_version.outputs.version }}
      tag_version: ${{ steps.get_version.outputs.tag_version }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Get version from tag and pyproject.toml
        id: get_version
        run: |
          TAG_VERSION=${GITHUB_REF#refs/tags/v}
          echo "tag_version=$TAG_VERSION" >> $GITHUB_OUTPUT
          PYPROJECT_VERSION=$(grep -Po '(?<=^version = ")[^"]*' pyproject.toml)
          echo "version=$PYPROJECT_VERSION" >> $GITHUB_OUTPUT
          echo "Tag version: v$TAG_VERSION"
          echo "pyproject.toml version: $PYPROJECT_VERSION"
      - name: Verify version match
        run: |
          if [ "${{ steps.get_version.outputs.tag_version }}" != "${{ steps.get_version.outputs.version }}" ]; then
            echo "Error: Tag version (${{ steps.get_version.outputs.tag_version }}) does not match pyproject.toml version (${{ steps.get_version.outputs.version }})"
            exit 1
          fi
          echo "Version validation passed: ${{ steps.get_version.outputs.version }}"
  test:
    runs-on: ubuntu-latest
    needs: validate
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install .[dev]
      - name: Run tests
        run: pytest -v
  publish:
    runs-on: ubuntu-latest
    needs: [validate, test]
    permissions:
      contents: write
      id-token: write
    environment:
      name: pypi
      url: https://pypi.org/project/uniface/
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
          cache: 'pip'
      - name: Install build tools
        run: |
          python -m pip install --upgrade pip
          python -m pip install build twine
      - name: Build package
        run: python -m build
      - name: Check package
        run: twine check dist/*
      - name: Publish to PyPI
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
        run: twine upload dist/*
      - name: Create GitHub Release
        uses: softprops/action-gh-release@v1
        with:
          files: dist/*
          generate_release_notes: true

.gitignore (vendored, 2 lines added)

@@ -1,3 +1,5 @@
tmp_*
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

MODELS.md (new file, 395 lines)

@@ -0,0 +1,395 @@
# UniFace Model Zoo
Complete guide to all available models, their performance characteristics, and selection criteria.
---
## Face Detection Models
### RetinaFace Family
RetinaFace models are trained on the WIDER FACE dataset and provide excellent accuracy-speed tradeoffs.
| Model Name | Params | Size | Easy | Medium | Hard | Use Case |
|---------------------|--------|--------|--------|--------|--------|----------------------------|
| `MNET_025` | 0.4M | 1.7MB | 88.48% | 87.02% | 80.61% | Mobile/Edge devices |
| `MNET_050` | 1.0M | 2.6MB | 89.42% | 87.97% | 82.40% | Mobile/Edge devices |
| `MNET_V1` | 3.5M | 3.8MB | 90.59% | 89.14% | 84.13% | Balanced mobile |
| `MNET_V2` ⭐ | 3.2M | 3.5MB | 91.70% | 91.03% | 86.60% | **Recommended default** |
| `RESNET18` | 11.7M | 27MB | 92.50% | 91.02% | 86.63% | Server/High accuracy |
| `RESNET34` | 24.8M | 56MB | 94.16% | 93.12% | 88.90% | Maximum accuracy |
**Accuracy**: WIDER FACE validation set (Easy/Medium/Hard subsets) - from [RetinaFace paper](https://arxiv.org/abs/1905.00641)
**Speed**: Benchmark on your own hardware using `scripts/run_detection.py --iterations 100`
#### Usage
```python
from uniface import RetinaFace
from uniface.constants import RetinaFaceWeights
# Default (recommended)
detector = RetinaFace() # Uses MNET_V2
# Specific model
detector = RetinaFace(
    model_name=RetinaFaceWeights.MNET_025,  # Fastest
    conf_thresh=0.5,
    nms_thresh=0.4,
    input_size=(640, 640)
)
```
---
### SCRFD Family
SCRFD (Sample and Computation Redistribution for Efficient Face Detection) models offer state-of-the-art speed-accuracy tradeoffs.
| Model Name | Params | Size | Easy | Medium | Hard | Use Case |
|-----------------|--------|-------|--------|--------|--------|----------------------------|
| `SCRFD_500M` | 0.6M | 2.5MB | 90.57% | 88.12% | 68.51% | Real-time applications |
| `SCRFD_10G` ⭐ | 4.2M | 17MB | 95.16% | 93.87% | 83.05% | **High accuracy + speed** |
**Accuracy**: WIDER FACE validation set - from [SCRFD paper](https://arxiv.org/abs/2105.04714)
**Speed**: Benchmark on your own hardware using `scripts/run_detection.py --iterations 100`
#### Usage
```python
from uniface import SCRFD
from uniface.constants import SCRFDWeights
# Fast real-time detection
detector = SCRFD(
    model_name=SCRFDWeights.SCRFD_500M_KPS,
    conf_thresh=0.5,
    input_size=(640, 640)
)

# High accuracy
detector = SCRFD(
    model_name=SCRFDWeights.SCRFD_10G_KPS,
    conf_thresh=0.5
)
```
---
## Face Recognition Models
### ArcFace
State-of-the-art face recognition using additive angular margin loss.
| Model Name | Backbone | Params | Size | Use Case |
|-------------|-------------|--------|-------|----------------------------|
| `MNET` ⭐ | MobileNet | 2.0M | 8MB | **Balanced (recommended)** |
| `RESNET` | ResNet50 | 43.6M | 166MB | Maximum accuracy |
**Dataset**: Trained on MS1M-V2 (5.8M images, 85K identities)
**Accuracy**: Benchmark on your own dataset or use standard face verification benchmarks
#### Usage
```python
from uniface import ArcFace
from uniface.constants import ArcFaceWeights
# Default (MobileNet backbone)
recognizer = ArcFace()
# High accuracy (ResNet50 backbone)
recognizer = ArcFace(model_name=ArcFaceWeights.RESNET)
# Extract embedding
embedding = recognizer.get_normalized_embedding(image, landmarks)
# Returns: (1, 512) normalized embedding vector
```
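Since the embeddings are L2-normalized, two faces can be compared with a plain dot product (a minimal sketch; `emb1` and `emb2` are embeddings extracted as above):

```python
import numpy as np

# Cosine similarity of L2-normalized (1, 512) embeddings reduces to a dot product
similarity = float(np.dot(emb1, emb2.T)[0][0])
print(f"Cosine similarity: {similarity:.4f}")  # higher means more likely the same person
```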
---
### MobileFace
Lightweight face recognition optimized for mobile devices.
| Model Name | Backbone | Params | Size | Use Case |
|-----------------|-----------------|--------|------|--------------------|
| `MNET_025` | MobileNetV1 0.25| 0.2M | 1MB | Ultra-lightweight |
| `MNET_V2` ⭐ | MobileNetV2 | 1.0M | 4MB | **Mobile/Edge** |
| `MNET_V3_SMALL` | MobileNetV3-S | 0.8M | 3MB | Mobile optimized |
| `MNET_V3_LARGE` | MobileNetV3-L | 2.5M | 10MB | Balanced mobile |
**Note**: These models are lightweight alternatives to ArcFace for resource-constrained environments
#### Usage
```python
from uniface import MobileFace
from uniface.constants import MobileFaceWeights
# Lightweight
recognizer = MobileFace(model_name=MobileFaceWeights.MNET_V2)
```
---
### SphereFace
Face recognition using angular softmax loss.
| Model Name | Backbone | Params | Size | Use Case |
|-------------|----------|--------|------|----------------------|
| `SPHERE20` | Sphere20 | 13.0M | 50MB | Research/Comparison |
| `SPHERE36` | Sphere36 | 24.2M | 92MB | Research/Comparison |
**Note**: SphereFace uses angular softmax loss, an earlier approach before ArcFace
#### Usage
```python
from uniface import SphereFace
from uniface.constants import SphereFaceWeights
recognizer = SphereFace(model_name=SphereFaceWeights.SPHERE20)
```
---
## Facial Landmark Models
### 106-Point Landmark Detection
High-precision facial landmark localization.
| Model Name | Points | Params | Size | Use Case |
|------------|--------|--------|------|-----------------------------|
| `2D106` | 106 | 3.7M | 14MB | Face alignment, analysis |
**Note**: Provides 106 facial keypoints for detailed face analysis and alignment
#### Usage
```python
from uniface import Landmark106
landmarker = Landmark106()
landmarks = landmarker.get_landmarks(image, bbox)
# Returns: (106, 2) array of (x, y) coordinates
```
**Landmark Groups:**
- Face contour: 0-32 (33 points)
- Eyebrows: 33-50 (18 points)
- Nose: 51-62 (12 points)
- Eyes: 63-86 (24 points)
- Mouth: 87-105 (19 points)
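Given the index ranges above, the groups can be sliced straight out of the `(106, 2)` array (a minimal sketch; `landmarks` comes from `get_landmarks` as shown above):

```python
# landmarks: (106, 2) array from Landmark106.get_landmarks
groups = {
    "contour":  landmarks[0:33],    # face outline, 33 points
    "eyebrows": landmarks[33:51],   # 18 points
    "nose":     landmarks[51:63],   # 12 points
    "eyes":     landmarks[63:87],   # 24 points
    "mouth":    landmarks[87:106],  # 19 points
}
for name, points in groups.items():
    print(f"{name}: {len(points)} points, centroid {points.mean(axis=0)}")
```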
---
## Attribute Analysis Models
### Age & Gender Detection
| Model Name | Attributes | Params | Size | Use Case |
|------------|-------------|--------|------|-------------------|
| `DEFAULT` | Age, Gender | 2.1M | 8MB | General purpose |
**Dataset**: Trained on CelebA
**Note**: Accuracy varies by demographic and image quality. Test on your specific use case.
#### Usage
```python
from uniface import AgeGender
predictor = AgeGender()
gender, age = predictor.predict(image, bbox)
# Returns: ("Male"/"Female", age_in_years)
```
---
### Emotion Detection
| Model Name | Classes | Params | Size | Use Case |
|--------------|---------|--------|------|-----------------------|
| `AFFECNET7` | 7 | 0.5M | 2MB | 7-class emotion |
| `AFFECNET8` | 8 | 0.5M | 2MB | 8-class emotion |
**Classes (7)**: Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger
**Classes (8)**: Above + Contempt
**Dataset**: Trained on AffectNet
**Note**: Emotion detection accuracy depends heavily on facial expression clarity and cultural context
#### Usage
```python
from uniface import Emotion
from uniface.constants import DDAMFNWeights
predictor = Emotion(model_name=DDAMFNWeights.AFFECNET7)
emotion, confidence = predictor.predict(image, landmarks)
```
---
## Model Selection Guide
### By Use Case
#### Mobile/Edge Devices
- **Detection**: `RetinaFace(MNET_025)` or `SCRFD(SCRFD_500M)`
- **Recognition**: `MobileFace(MNET_V2)`
- **Priority**: Speed, small model size
#### Real-Time Applications (Webcam, Video)
- **Detection**: `RetinaFace(MNET_V2)` or `SCRFD(SCRFD_500M)`
- **Recognition**: `ArcFace(MNET)`
- **Priority**: Speed-accuracy balance
#### High-Accuracy Applications (Security, Verification)
- **Detection**: `SCRFD(SCRFD_10G)` or `RetinaFace(RESNET34)`
- **Recognition**: `ArcFace(RESNET)`
- **Priority**: Maximum accuracy
#### Server/Cloud Deployment
- **Detection**: `SCRFD(SCRFD_10G)`
- **Recognition**: `ArcFace(RESNET)`
- **Priority**: Accuracy, batch processing
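As a concrete example, the high-accuracy pairing above can be wired up like this (a sketch using the constructors documented in the usage sections; all other parameters at their defaults):

```python
from uniface import SCRFD, ArcFace
from uniface.constants import SCRFDWeights

# High-accuracy verification: SCRFD 10G detector + ArcFace embeddings
detector = SCRFD(model_name=SCRFDWeights.SCRFD_10G_KPS, conf_thresh=0.5)
recognizer = ArcFace()  # pass ArcFaceWeights.RESNET for maximum accuracy
```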
---
### By Hardware
#### Apple Silicon (M1/M2/M3/M4)
**Recommended**: All models work well with CoreML acceleration
```bash
pip install uniface[silicon]
```
**Recommended models**:
- **Fast**: `SCRFD(SCRFD_500M)` - Lightweight, real-time capable
- **Balanced**: `RetinaFace(MNET_V2)` - Good accuracy/speed tradeoff
- **Accurate**: `SCRFD(SCRFD_10G)` - High accuracy
**Benchmark on your M-series Mac**: `python scripts/run_detection.py --iterations 100`
#### NVIDIA GPU (CUDA)
**Recommended**: Larger models for maximum throughput
```bash
pip install uniface[gpu]
```
**Recommended models**:
- **Fast**: `SCRFD(SCRFD_500M)` - Maximum throughput
- **Balanced**: `SCRFD(SCRFD_10G)` - Best overall
- **Accurate**: `RetinaFace(RESNET34)` - Highest accuracy
#### CPU Only
**Recommended**: Lightweight models
**Recommended models**:
- **Fast**: `RetinaFace(MNET_025)` - Smallest, fastest
- **Balanced**: `RetinaFace(MNET_V2)` - Recommended default
- **Accurate**: `SCRFD(SCRFD_10G)` - Best accuracy on CPU
**Note**: FPS values vary significantly based on image size, number of faces, and hardware. Always benchmark on your specific setup.
---
## Benchmark Details
### How to Benchmark
Run benchmarks on your own hardware:
```bash
# Detection speed
python scripts/run_detection.py --image assets/test.jpg --iterations 100
# Compare models
python scripts/run_detection.py --image assets/test.jpg --method retinaface --iterations 100
python scripts/run_detection.py --image assets/test.jpg --method scrfd --iterations 100
```
### Accuracy Metrics Explained
- **WIDER FACE**: Standard face detection benchmark with three difficulty levels
- **Easy**: Large faces (>50px), clear backgrounds
- **Medium**: Medium-sized faces (30-50px), moderate occlusion
- **Hard**: Small faces (<30px), heavy occlusion, blur
*Accuracy values are from the original papers - see references below*
- **Model Size**: ONNX model file size (affects download time and memory)
- **Params**: Number of model parameters (affects inference speed)
### Important Notes
1. **Speed varies by**:
- Image resolution
- Number of faces in image
- Hardware (CPU/GPU/CoreML)
- Batch size
- Operating system
2. **Accuracy varies by**:
- Image quality
- Lighting conditions
- Face pose and occlusion
- Demographic factors
3. **Always benchmark on your specific use case** before choosing a model
---
## Model Updates
Models are automatically downloaded and cached on first use. Cache location: `~/.uniface/models/`
### Manual Model Management
```python
from uniface.model_store import verify_model_weights
from uniface.constants import RetinaFaceWeights
# Download specific model
model_path = verify_model_weights(
    RetinaFaceWeights.MNET_V2,
    root='./custom_cache'
)
# Models are verified with SHA-256 checksums
```
### Download All Models
```bash
# Using the provided script
python scripts/download_model.py
# Download specific model
python scripts/download_model.py --model MNET_V2
```
---
## References
### Model Training & Architectures
- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) - PyTorch implementation and training code
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights
### Papers
- **RetinaFace**: [Single-Shot Multi-Level Face Localisation in the Wild](https://arxiv.org/abs/1905.00641)
- **SCRFD**: [Sample and Computation Redistribution for Efficient Face Detection](https://arxiv.org/abs/2105.04714)
- **ArcFace**: [Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)
- **SphereFace**: [Deep Hypersphere Embedding for Face Recognition](https://arxiv.org/abs/1704.08063)

QUICKSTART.md (new file, 355 lines)

@@ -0,0 +1,355 @@
# UniFace Quick Start Guide
Get up and running with UniFace in 5 minutes! This guide covers the most common use cases.
---
## Installation
```bash
# macOS (Apple Silicon)
pip install uniface[silicon]
# Linux/Windows with NVIDIA GPU
pip install uniface[gpu]
# CPU-only (all platforms)
pip install uniface
```
---
## 1. Face Detection (30 seconds)
Detect faces in an image:
```python
import cv2
from uniface import RetinaFace
# Load image
image = cv2.imread("photo.jpg")
# Initialize detector (models auto-download on first use)
detector = RetinaFace()
# Detect faces
faces = detector.detect(image)
# Print results
for i, face in enumerate(faces):
print(f"Face {i+1}:")
print(f" Confidence: {face['confidence']:.2f}")
print(f" BBox: {face['bbox']}")
print(f" Landmarks: {len(face['landmarks'])} points")
```
**Output:**
```
Face 1:
  Confidence: 0.99
  BBox: [120.5, 85.3, 245.8, 210.6]
  Landmarks: 5 points
```
---
## 2. Visualize Detections (1 minute)
Draw bounding boxes and landmarks:
```python
import cv2
from uniface import RetinaFace
from uniface.visualization import draw_detections
# Detect faces
detector = RetinaFace()
image = cv2.imread("photo.jpg")
faces = detector.detect(image)
# Extract visualization data
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
# Draw on image
draw_detections(image, bboxes, scores, landmarks, vis_threshold=0.6)
# Save result
cv2.imwrite("output.jpg", image)
print("Saved output.jpg")
```
---
## 3. Face Recognition (2 minutes)
Compare two faces:
```python
import cv2
import numpy as np
from uniface import RetinaFace, ArcFace
# Initialize models
detector = RetinaFace()
recognizer = ArcFace()
# Load two images
image1 = cv2.imread("person1.jpg")
image2 = cv2.imread("person2.jpg")
# Detect faces
faces1 = detector.detect(image1)
faces2 = detector.detect(image2)
if faces1 and faces2:
    # Extract embeddings
    emb1 = recognizer.get_normalized_embedding(image1, faces1[0]['landmarks'])
    emb2 = recognizer.get_normalized_embedding(image2, faces2[0]['landmarks'])

    # Compute similarity (cosine similarity)
    similarity = np.dot(emb1, emb2.T)[0][0]

    # Interpret result
    if similarity > 0.6:
        print(f"✅ Same person (similarity: {similarity:.3f})")
    else:
        print(f"❌ Different people (similarity: {similarity:.3f})")
else:
    print("No faces detected")
```
**Similarity thresholds:**
- `> 0.6`: Same person (high confidence)
- `0.4 - 0.6`: Uncertain (manual review)
- `< 0.4`: Different people
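These bands can be wrapped in a small helper (a sketch; the cutoffs are the guideline values listed above, not library constants):

```python
def interpret_similarity(similarity: float) -> str:
    """Map a cosine similarity score to a verdict using the bands above."""
    if similarity > 0.6:
        return "same person (high confidence)"
    if similarity >= 0.4:
        return "uncertain (manual review)"
    return "different people"


print(interpret_similarity(0.72))  # same person (high confidence)
```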
---
## 4. Webcam Demo (2 minutes)
Real-time face detection:
```python
import cv2
from uniface import RetinaFace
from uniface.visualization import draw_detections
detector = RetinaFace()
cap = cv2.VideoCapture(0)
print("Press 'q' to quit")
while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Detect faces
    faces = detector.detect(frame)

    # Draw results
    bboxes = [f['bbox'] for f in faces]
    scores = [f['confidence'] for f in faces]
    landmarks = [f['landmarks'] for f in faces]
    draw_detections(frame, bboxes, scores, landmarks)

    # Show frame
    cv2.imshow("UniFace - Press 'q' to quit", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
---
## 5. Age & Gender Detection (2 minutes)
Detect age and gender:
```python
import cv2
from uniface import RetinaFace, AgeGender
# Initialize models
detector = RetinaFace()
age_gender = AgeGender()
# Load image
image = cv2.imread("photo.jpg")
faces = detector.detect(image)
# Predict attributes
for i, face in enumerate(faces):
    gender, age = age_gender.predict(image, face['bbox'])
    print(f"Face {i+1}: {gender}, {age} years old")
```
**Output:**
```
Face 1: Male, 32 years old
Face 2: Female, 28 years old
```
---
## 6. Facial Landmarks (2 minutes)
Detect 106 facial landmarks:
```python
import cv2
from uniface import RetinaFace, Landmark106
# Initialize models
detector = RetinaFace()
landmarker = Landmark106()
# Detect face and landmarks
image = cv2.imread("photo.jpg")
faces = detector.detect(image)
if faces:
    landmarks = landmarker.get_landmarks(image, faces[0]['bbox'])
    print(f"Detected {len(landmarks)} landmarks")

    # Draw landmarks
    for x, y in landmarks.astype(int):
        cv2.circle(image, (x, y), 2, (0, 255, 0), -1)

    cv2.imwrite("landmarks.jpg", image)
```
---
## 7. Batch Processing (3 minutes)
Process multiple images:
```python
import cv2
from pathlib import Path
from uniface import RetinaFace
detector = RetinaFace()
# Process all images in a folder
image_dir = Path("images/")
output_dir = Path("output/")
output_dir.mkdir(exist_ok=True)
for image_path in image_dir.glob("*.jpg"):
print(f"Processing {image_path.name}...")
image = cv2.imread(str(image_path))
faces = detector.detect(image)
print(f" Found {len(faces)} face(s)")
# Save results
output_path = output_dir / image_path.name
# ... draw and save ...
print("Done!")
```
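The `# ... draw and save ...` step can reuse `draw_detections` from section 2 (a sketch of the loop body; same dict-based API as above):

```python
from uniface.visualization import draw_detections

# Inside the loop, after detection:
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
draw_detections(image, bboxes, scores, landmarks, vis_threshold=0.6)
cv2.imwrite(str(output_path), image)
```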
---
## 8. Model Selection
Choose the right model for your use case:
```python
from uniface import create_detector
from uniface.constants import RetinaFaceWeights, SCRFDWeights
# Fast detection (mobile/edge devices)
detector = create_detector(
    'retinaface',
    model_name=RetinaFaceWeights.MNET_025,
    conf_thresh=0.7
)

# Balanced (recommended)
detector = create_detector(
    'retinaface',
    model_name=RetinaFaceWeights.MNET_V2
)

# High accuracy (server/GPU)
detector = create_detector(
    'scrfd',
    model_name=SCRFDWeights.SCRFD_10G_KPS,
    conf_thresh=0.5
)
```
---
## Common Issues
### 1. Models Not Downloading
```python
# Manually download a model
from uniface.model_store import verify_model_weights
from uniface.constants import RetinaFaceWeights
model_path = verify_model_weights(RetinaFaceWeights.MNET_V2)
print(f"Model downloaded to: {model_path}")
```
### 2. Check Hardware Acceleration
```python
import onnxruntime as ort
print("Available providers:", ort.get_available_providers())
# macOS M-series should show: ['CoreMLExecutionProvider', ...]
# NVIDIA GPU should show: ['CUDAExecutionProvider', ...]
```
### 3. Slow Performance on Mac
Make sure you installed with CoreML support:
```bash
pip install uniface[silicon]
```
### 4. Import Errors
```python
# ✅ Correct imports
from uniface import RetinaFace, ArcFace, Landmark106
from uniface.detection import create_detector
# ❌ Wrong imports
from uniface import retinaface # Module, not class
```
---
## Next Steps
- **Detailed Examples**: Check the [examples/](examples/) folder for Jupyter notebooks
- **Model Benchmarks**: See [MODELS.md](MODELS.md) for performance comparisons
- **Full Documentation**: Read [README.md](README.md) for complete API reference
---
## References
- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch)
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition)
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface)
---
Happy coding! 🚀

README.md (modified, 510 lines)

@@ -1,239 +1,447 @@
# UniFace: All-in-One Face Analysis Library
<div align="center">
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
![Python](https://img.shields.io/badge/Python-3.8%2B-blue)
![Python](https://img.shields.io/badge/Python-3.10%2B-blue)
[![PyPI Version](https://img.shields.io/pypi/v/uniface.svg)](https://pypi.org/project/uniface/)
[![Build Status](https://github.com/yakhyo/uniface/actions/workflows/build.yml/badge.svg)](https://github.com/yakhyo/uniface/actions)
[![CI](https://github.com/yakhyo/uniface/actions/workflows/ci.yml/badge.svg)](https://github.com/yakhyo/uniface/actions)
[![Downloads](https://pepy.tech/badge/uniface)](https://pepy.tech/project/uniface)
[![Code Style: PEP8](https://img.shields.io/badge/code%20style-PEP8-green.svg)](https://www.python.org/dev/peps/pep-0008/)
[![GitHub Release Downloads](https://img.shields.io/github/downloads/yakhyo/uniface/total.svg?label=Model%20Downloads)](https://github.com/yakhyo/uniface/releases)
<div align="center">
<img src=".github/logos/logo_web.webp" width=75%>
</div>
**uniface** is a lightweight face detection library designed for high-performance face localization and landmark detection. The library supports ONNX models and provides utilities for bounding box visualization and landmark plotting. To train the RetinaFace model, see https://github.com/yakhyo/retinaface-pytorch.
**UniFace** is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides high-performance face detection, recognition, landmark detection, and attribute analysis with hardware acceleration support across platforms.
---
## Features
- [ ] Age and gender detection (Planned).
- [ ] Face recognition (Planned).
- [x] High-speed face detection using ONNX models (Added: 2024-11-20).
- [x] Accurate facial landmark localization (e.g., eyes, nose, and mouth) (Added: 2024-11-20).
- [x] Easy-to-use API for inference and visualization (Added: 2024-11-20).
- **High-Speed Face Detection**: ONNX-optimized RetinaFace and SCRFD models
- **Facial Landmark Detection**: Accurate 106-point landmark localization
- **Face Recognition**: ArcFace, MobileFace, and SphereFace embeddings
- **Attribute Analysis**: Age, gender, and emotion detection
- **Face Alignment**: Precise alignment for downstream tasks
- **Hardware Acceleration**: CoreML (Apple Silicon), CUDA (NVIDIA), CPU fallback
- **Simple API**: Intuitive factory functions and clean interfaces
- **Production-Ready**: Type hints, comprehensive logging, PEP8 compliant
---
## Installation
### Using pip
### Quick Install (All Platforms)
```bash
pip install uniface
```
### Local installation using pip
### Platform-Specific Installation
**Clone the repository**
#### macOS (Apple Silicon - M1/M2/M3/M4)
For optimal performance with **CoreML acceleration** (3-5x faster):
```bash
# Standard installation (CPU only)
pip install uniface
# With CoreML acceleration (recommended for M-series chips)
pip install uniface[silicon]
```
**Verify CoreML is available:**
```python
import onnxruntime as ort
print(ort.get_available_providers())
# Should show: ['CoreMLExecutionProvider', 'CPUExecutionProvider']
```
#### Linux/Windows with NVIDIA GPU
```bash
# With CUDA acceleration
pip install uniface[gpu]
```
**Requirements:**
- CUDA 11.x or 12.x
- cuDNN 8.x
- See [ONNX Runtime GPU requirements](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html)
#### CPU-Only (All Platforms)
```bash
pip install uniface
```
### Install from Source
```bash
git clone https://github.com/yakhyo/uniface.git
cd uniface
```
**Install using pip**
```bash
pip install .
pip install -e .
```
---
## Quick Start
### Initialize the Model
### Face Detection
```python
import cv2
from uniface import RetinaFace
# Initialize the RetinaFace model
uniface_inference = RetinaFace(
    model="retinaface_mnet_v2",  # Model name
    conf_thresh=0.5,             # Confidence threshold
    pre_nms_topk=5000,           # Pre-NMS Top-K detections
    nms_thresh=0.4,              # NMS IoU threshold
    post_nms_topk=750            # Post-NMS Top-K detections
)
# Initialize detector
detector = RetinaFace()
# Load image
image = cv2.imread("image.jpg")
# Detect faces
faces = detector.detect(image)
# Process results
for face in faces:
    bbox = face['bbox']  # [x1, y1, x2, y2]
    confidence = face['confidence']
    landmarks = face['landmarks']  # 5-point landmarks
    print(f"Face detected with confidence: {confidence:.2f}")
```
### Face Recognition
```python
from uniface import ArcFace, RetinaFace
from uniface import compute_similarity
# Initialize models
detector = RetinaFace()
recognizer = ArcFace()
# Detect and extract embeddings
faces1 = detector.detect(image1)
faces2 = detector.detect(image2)
embedding1 = recognizer.get_normalized_embedding(image1, faces1[0]['landmarks'])
embedding2 = recognizer.get_normalized_embedding(image2, faces2[0]['landmarks'])
# Compare faces
similarity = compute_similarity(embedding1, embedding2)
print(f"Similarity: {similarity:.4f}")
```
### Facial Landmarks
```python
from uniface import RetinaFace, Landmark106
detector = RetinaFace()
landmarker = Landmark106()
faces = detector.detect(image)
landmarks = landmarker.get_landmarks(image, faces[0]['bbox'])
# Returns 106 (x, y) landmark points
```
### Age & Gender Detection
```python
from uniface import RetinaFace, AgeGender
detector = RetinaFace()
age_gender = AgeGender()
faces = detector.detect(image)
gender, age = age_gender.predict(image, faces[0]['bbox'])
print(f"{gender}, {age} years old")
```
---
## Documentation
- [**QUICKSTART.md**](QUICKSTART.md) - 5-minute getting started guide
- [**MODELS.md**](MODELS.md) - Model zoo, benchmarks, and selection guide
- [**Examples**](examples/) - Jupyter notebooks with detailed examples
---
## API Overview
### Factory Functions (Recommended)
```python
from uniface import create_detector, create_recognizer, create_landmarker
# Create detector with default settings
detector = create_detector('retinaface')
# Create with custom config
detector = create_detector(
    'scrfd',
    model_name='scrfd_10g_kps',
    conf_thresh=0.8,
    input_size=(640, 640)
)
# Recognition and landmarks
recognizer = create_recognizer('arcface')
landmarker = create_landmarker('2d106det')
```
### Run Inference
### Direct Model Instantiation
Inference on image:
```python
from uniface import RetinaFace, SCRFD, ArcFace, MobileFace
from uniface.constants import RetinaFaceWeights
# Detection
detector = RetinaFace(
    model_name=RetinaFaceWeights.MNET_V2,
    conf_thresh=0.5,
    nms_thresh=0.4
)
# Recognition
recognizer = ArcFace() # Uses default weights
recognizer = MobileFace() # Lightweight alternative
```
### High-Level Detection API
```python
from uniface import detect_faces
# One-line face detection
faces = detect_faces(image, method='retinaface', conf_thresh=0.8)
```
---
## Model Performance
### Face Detection (WIDER FACE Dataset)
| Model | Easy | Medium | Hard | Use Case |
|--------------------|--------|--------|--------|-------------------------|
| retinaface_mnet025 | 88.48% | 87.02% | 80.61% | Mobile/Edge devices |
| retinaface_mnet_v2 | 91.70% | 91.03% | 86.60% | Balanced (recommended) |
| retinaface_r34 | 94.16% | 93.12% | 88.90% | High accuracy |
| scrfd_500m | 90.57% | 88.12% | 68.51% | Real-time applications |
| scrfd_10g | 95.16% | 93.87% | 83.05% | Best accuracy/speed |
*Accuracy values from original papers: [RetinaFace](https://arxiv.org/abs/1905.00641), [SCRFD](https://arxiv.org/abs/2105.04714)*
**Benchmark on your hardware:**
```bash
python scripts/run_detection.py --image assets/test.jpg --iterations 100
```
See [MODELS.md](MODELS.md) for detailed model information and selection guide.
<div align="center">
<img src="assets/test_result.png">
</div>
---
## Examples
### Webcam Face Detection
```python
import cv2
from uniface import RetinaFace
from uniface.visualization import draw_detections
# Load an image
image_path = "assets/test.jpg"
original_image = cv2.imread(image_path)
# Perform inference
boxes, landmarks = uniface_inference.detect(original_image)
# Visualize results
draw_detections(original_image, (boxes, landmarks), vis_threshold=0.6)
# Save the output image
output_path = "output.jpg"
cv2.imwrite(output_path, original_image)
print(f"Saved output image to {output_path}")
```
Inference on video:
```python
import cv2
from uniface.visualization import draw_detections
# Initialize the webcam
detector = RetinaFace()
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("Error: Unable to access the webcam.")
exit()
while True:
    # Capture a frame from the webcam
    ret, frame = cap.read()
    if not ret:
        print("Error: Failed to read frame.")
        break

    # Perform inference
    boxes, landmarks = uniface_inference.detect(frame)
    faces = detector.detect(frame)

    # Draw detections on the frame
    draw_detections(frame, (boxes, landmarks), vis_threshold=0.6)

    # Extract data for visualization
    bboxes = [f['bbox'] for f in faces]
    scores = [f['confidence'] for f in faces]
    landmarks = [f['landmarks'] for f in faces]

    # Display the output
    cv2.imshow("Webcam Inference", frame)
    draw_detections(frame, bboxes, scores, landmarks, vis_threshold=0.6)

    # Exit if 'q' is pressed
    cv2.imshow("Face Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the webcam and close all OpenCV windows
cap.release()
cv2.destroyAllWindows()
```
### Face Search System
```python
import cv2
import numpy as np
from uniface import RetinaFace, ArcFace
detector = RetinaFace()
recognizer = ArcFace()
# Build face database
database = {}
# person_images: {person_id: image_path} mapping, assumed defined elsewhere
for person_id, image_path in person_images.items():
    image = cv2.imread(image_path)
    faces = detector.detect(image)
    if faces:
        embedding = recognizer.get_normalized_embedding(
            image, faces[0]['landmarks']
        )
        database[person_id] = embedding
# Search for a face
query_image = cv2.imread("query.jpg")
query_faces = detector.detect(query_image)
if query_faces:
    query_embedding = recognizer.get_normalized_embedding(
        query_image, query_faces[0]['landmarks']
    )

    # Find best match
    best_match = None
    best_similarity = -1
    for person_id, db_embedding in database.items():
        similarity = np.dot(query_embedding, db_embedding.T)[0][0]
        if similarity > best_similarity:
            best_similarity = similarity
            best_match = person_id

    print(f"Best match: {best_match} (similarity: {best_similarity:.4f})")
```
More examples in the [examples/](examples/) directory.
---
### Evaluation results of available models on WiderFace
## Advanced Configuration
| RetinaFace Models | Easy | Medium | Hard |
| ------------------ | ---------- | ---------- | ---------- |
| retinaface_mnet025 | 88.48% | 87.02% | 80.61% |
| retinaface_mnet050 | 89.42% | 87.97% | 82.40% |
| retinaface_mnet_v1 | 90.59% | 89.14% | 84.13% |
| retinaface_mnet_v2 | 91.70% | 91.03% | 86.60% |
| retinaface_r18 | 92.50% | 91.02% | 86.63% |
| retinaface_r34 | **94.16%** | **93.12%** | **88.90%** |
### Custom ONNX Runtime Providers
## API Reference
### `RetinaFace` Class
#### Initialization
```python
RetinaFace(
    model: str,
    conf_thresh: float = 0.5,
    pre_nms_topk: int = 5000,
    nms_thresh: float = 0.4,
    post_nms_topk: int = 750
)
from uniface.onnx_utils import get_available_providers, create_onnx_session
# Check available providers
providers = get_available_providers()
print(f"Available: {providers}")
# Force CPU-only execution
from uniface import RetinaFace
detector = RetinaFace()
# Internally uses create_onnx_session() which auto-selects best provider
```
### Model Download and Caching
Models are automatically downloaded on first use and cached in `~/.uniface/models/`.
```python
from uniface.model_store import verify_model_weights
from uniface.constants import RetinaFaceWeights
# Manually download and verify a model
model_path = verify_model_weights(
    RetinaFaceWeights.MNET_V2,
    root='./custom_models'  # Custom cache directory
)
```
**Parameters**:
- `model` *(str)*: Name of the model to use. Supported models:
- `retinaface_mnet025`, `retinaface_mnet050`, `retinaface_mnet_v1`, `retinaface_mnet_v2`
- `retinaface_r18`, `retinaface_r34`
- `conf_thresh` *(float, default=0.5)*: Minimum confidence score for detections.
- `pre_nms_topk` *(int, default=5000)*: Max detections to keep before NMS.
- `nms_thresh` *(float, default=0.4)*: IoU threshold for Non-Maximum Suppression.
- `post_nms_topk` *(int, default=750)*: Max detections to keep after NMS.
### Logging Configuration
```python
from uniface import Logger
import logging
# Set logging level
Logger.setLevel(logging.DEBUG) # DEBUG, INFO, WARNING, ERROR
# Disable logging
Logger.setLevel(logging.CRITICAL)
```
---
### `detect` Method
```python
detect(
    image: np.ndarray,
    max_num: int = 0,
    metric: str = "default",
    center_weight: float = 2.0
) -> Tuple[np.ndarray, np.ndarray]
## Testing
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=uniface --cov-report=html
# Run specific test file
pytest tests/test_retinaface.py -v
```
**Description**:
Detects faces in the given image and returns bounding boxes and landmarks.
**Parameters**:
- `image` *(np.ndarray)*: Input image in BGR format.
- `max_num` *(int, default=0)*: Maximum number of faces to return. `0` means return all.
- `metric` *(str, default="default")*: Metric for prioritizing detections:
- `"default"`: Prioritize detections closer to the image center.
- `"max"`: Prioritize larger bounding box areas.
- `center_weight` *(float, default=2.0)*: Weight for prioritizing center-aligned faces.
**Returns**:
- `bounding_boxes` *(np.ndarray)*: Array of detections as `[x_min, y_min, x_max, y_max, confidence]`.
- `landmarks` *(np.ndarray)*: Array of landmarks as `[(x1, y1), ..., (x5, y5)]`.
---
### Visualization Utilities
## Development
#### `draw_detections`
```python
draw_detections(
    image: np.ndarray,
    detections: Tuple[np.ndarray, np.ndarray],
    vis_threshold: float
) -> None
### Setup Development Environment
```bash
git clone https://github.com/yakhyo/uniface.git
cd uniface
# Install in editable mode with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black uniface/
isort uniface/
```
**Description**:
Draws bounding boxes and landmarks on the given image.
### Project Structure
**Parameters**:
- `image` *(np.ndarray)*: The input image in BGR format.
- `detections` *(Tuple[np.ndarray, np.ndarray])*: A tuple of bounding boxes and landmarks.
- `vis_threshold` *(float)*: Minimum confidence score for visualization.
```
uniface/
├── uniface/
│ ├── detection/ # Face detection models
│ ├── recognition/ # Face recognition models
│ ├── landmark/ # Landmark detection
│ ├── attribute/ # Age, gender, emotion
│ ├── onnx_utils.py # ONNX Runtime utilities
│ ├── model_store.py # Model download & caching
│ └── visualization.py # Drawing utilities
├── tests/ # Unit tests
├── examples/ # Example notebooks
└── scripts/ # Utility scripts
```
---
## References
### Model Training & Architectures
- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) - PyTorch implementation and training code
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights
### Papers
- **RetinaFace**: [Single-Shot Multi-Level Face Localisation in the Wild](https://arxiv.org/abs/1905.00641)
- **SCRFD**: [Sample and Computation Redistribution for Efficient Face Detection](https://arxiv.org/abs/2105.04714)
- **ArcFace**: [Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)
---
## Contributing
We welcome contributions to enhance the library! Feel free to:
Contributions are welcome! Please open an issue or submit a pull request on [GitHub](https://github.com/yakhyo/uniface).
- Submit bug reports or feature requests.
- Fork the repository and create a pull request.
---
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
---
## Acknowledgments
- Based on the RetinaFace model for face detection ([https://github.com/yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch)).
- Inspired by InsightFace and other face detection projects.
---

BIN  assets/alignment_result.png (new file, 996 KiB)
BIN  (file name not shown) (new file, 26 KiB)
BIN  (file name not shown) (new file, 6.6 KiB)
BIN  (file name not shown) (new file, 9.1 KiB)
BIN  (file name not shown) (new file, 9.8 KiB)
BIN  (file name not shown) (new file, 11 KiB)
BIN  assets/test_result.png (new file, 1.1 MiB)

examples/.gitkeep (new empty file)

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

pyproject.toml (new file, 38 lines)

@@ -0,0 +1,38 @@
[project]
name = "uniface"
version = "1.0.0"
description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Age, and Gender Detection"
readme = "README.md"
license = { text = "MIT" }
authors = [
    { name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" }
]
dependencies = [
    "numpy>=1.21.0",
    "opencv-python>=4.5.0",
    "onnx>=1.12.0",
    "onnxruntime>=1.16.0",
    "scikit-image>=0.19.0",
    "requests>=2.28.0",
    "tqdm>=4.64.0"
]
requires-python = ">=3.10"
[project.optional-dependencies]
dev = ["pytest>=7.0.0"]
gpu = ["onnxruntime-gpu>=1.16.0"]
silicon = ["onnxruntime-silicon>=1.16.0"]
[project.urls]
Homepage = "https://github.com/yakhyo/uniface"
Repository = "https://github.com/yakhyo/uniface"
[build-system]
requires = ["setuptools>=64", "wheel"]
build-backend = "setuptools.build_meta"
[tool.setuptools]
packages = { find = {} }
[tool.setuptools.package-data]
"uniface" = ["*.txt", "*.md"]

Modified file (name not shown):

@@ -1,8 +1,8 @@
pytest
numpy
opencv-python
opencv-python-headless
onnx
onnxruntime
requests
torch
numpy>=1.21.0
opencv-python>=4.5.0
onnx>=1.12.0
onnxruntime>=1.16.0
scikit-image>=0.19.0
requests>=2.28.0
pytest>=7.0.0
tqdm>=4.64.0

scripts/README.md (new file, 18 lines)

@@ -0,0 +1,18 @@
### `download_model.py`

Download all models:

```bash
python scripts/download_model.py
```

Download just RESNET18:

```bash
python scripts/download_model.py --model RESNET18
```

### `run_inference.py`

```bash
python scripts/run_inference.py --image assets/test.jpg --model MNET_V2 --iterations 10
```

scripts/TESTING.md (new file, 389 lines)

@@ -0,0 +1,389 @@
# Testing Scripts Guide
Complete guide to testing all scripts in the `scripts/` directory.
---
## 📁 Available Scripts
1. **download_model.py** - Download and verify model weights
2. **run_detection.py** - Face detection on images
3. **run_recognition.py** - Face recognition (extract embeddings)
4. **run_face_search.py** - Real-time face matching with webcam
5. **sha256_generate.py** - Generate SHA256 checksums for models
---
## Testing Each Script
### 1. Test Model Download
```bash
# Download a specific model
python scripts/download_model.py --model MNET_V2
# Download all RetinaFace models (takes ~5 minutes, ~200MB)
python scripts/download_model.py
# Verify models are cached
ls -lh ~/.uniface/models/
```
**Expected Output:**
```
📥 Downloading model: retinaface_mnet_v2
2025-11-08 00:00:00 - INFO - Downloading model 'RetinaFaceWeights.MNET_V2' from https://...
Downloading ~/.uniface/models/retinaface_mnet_v2.onnx: 100%|████| 3.5M/3.5M
2025-11-08 00:00:05 - INFO - Successfully downloaded 'RetinaFaceWeights.MNET_V2'
✅ All requested weights are ready and verified.
```
---
### 2. Test Face Detection
```bash
# Basic detection
python scripts/run_detection.py --image assets/test.jpg
# With custom settings
python scripts/run_detection.py \
--image assets/test.jpg \
--method scrfd \
--threshold 0.7 \
--save_dir outputs
# Benchmark mode (100 iterations)
python scripts/run_detection.py \
--image assets/test.jpg \
--iterations 100
```
**Expected Output:**
```
Initializing detector: retinaface
2025-11-08 00:00:00 - INFO - Initializing RetinaFace with model=RetinaFaceWeights.MNET_V2...
2025-11-08 00:00:01 - INFO - CoreML acceleration enabled (Apple Silicon)
✅ Output saved at: outputs/test_out.jpg
[1/1] ⏱️ Inference time: 0.0234 seconds
```
**Verify Output:**
```bash
# Check output image was created
ls -lh outputs/test_out.jpg
# View the image (macOS)
open outputs/test_out.jpg
```
---
### 3. Test Face Recognition (Embedding Extraction)
```bash
# Extract embeddings from an image
python scripts/run_recognition.py --image assets/test.jpg
# With different models
python scripts/run_recognition.py \
--image assets/test.jpg \
--detector scrfd \
--recognizer mobileface
```
**Expected Output:**
```
Initializing detector: retinaface
Initializing recognizer: arcface
2025-11-08 00:00:00 - INFO - Successfully initialized face encoder from ~/.uniface/models/w600k_mbf.onnx
Detected 1 face(s). Extracting embeddings for the first face...
- Embedding shape: (1, 512)
- L2 norm of unnormalized embedding: 64.2341
- L2 norm of normalized embedding: 1.0000
```
---
### 4. Test Real-Time Face Search (Webcam)
**Prerequisites:**
- Webcam connected
- Reference image with a clear face
```bash
# Basic usage
python scripts/run_face_search.py --image assets/test.jpg
# With custom models
python scripts/run_face_search.py \
--image assets/test.jpg \
--detector scrfd \
--recognizer arcface
```
**Expected Behavior:**
1. Webcam window opens
2. Faces are detected in real-time
3. Green box = Match (similarity > 0.4)
4. Red box = Unknown (similarity < 0.4)
5. Press 'q' to quit
**Expected Output:**
```
Initializing models...
2025-11-08 00:00:00 - INFO - CoreML acceleration enabled (Apple Silicon)
Extracting reference embedding...
Webcam started. Press 'q' to quit.
```
**Troubleshooting:**
```bash
# If webcam doesn't open
python -c "import cv2; cap = cv2.VideoCapture(0); print('Webcam OK' if cap.isOpened() else 'Webcam FAIL')"
# If no faces detected
# - Ensure good lighting
# - Face should be frontal and clearly visible
# - Try lowering threshold: edit script line 29, change 0.4 to 0.3
```
---
### 5. Test SHA256 Generator (For Developers)
```bash
# Generate checksum for a model file
python scripts/sha256_generate.py ~/.uniface/models/retinaface_mnet_v2.onnx
# Generate for all models
for model in ~/.uniface/models/*.onnx; do
python scripts/sha256_generate.py "$model"
done
```
---
## 🔍 Quick Verification Tests
### Test 1: Imports Work
```bash
python -c "
from uniface.detection import create_detector
from uniface.recognition import create_recognizer
print('✅ Imports successful')
"
```
### Test 2: Models Download
```bash
python -c "
from uniface import RetinaFace
detector = RetinaFace()
print('✅ Model downloaded and loaded')
"
```
### Test 3: Detection Works
```bash
python -c "
import cv2
import numpy as np
from uniface import RetinaFace
detector = RetinaFace()
image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
faces = detector.detect(image)
print(f'✅ Detection works, found {len(faces)} faces')
"
```
### Test 4: Recognition Works
```bash
python -c "
import cv2
import numpy as np
from uniface import RetinaFace, ArcFace
detector = RetinaFace()
recognizer = ArcFace()
image = cv2.imread('assets/test.jpg')
faces = detector.detect(image)
if faces:
    landmarks = np.array(faces[0]['landmarks'])
    embedding = recognizer.get_normalized_embedding(image, landmarks)
    print(f'✅ Recognition works, embedding shape: {embedding.shape}')
else:
    print('⚠️ No faces detected in test image')
"
```
---
## End-to-End Test Workflow
Run this complete workflow to verify everything works:
```bash
#!/bin/bash
# Save as test_all_scripts.sh
echo "=== Testing UniFace Scripts ==="
echo ""
# Test 1: Download models
echo "1⃣ Testing model download..."
python scripts/download_model.py --model MNET_V2
if [ $? -eq 0 ]; then
echo "✅ Model download: PASS"
else
echo "❌ Model download: FAIL"
exit 1
fi
echo ""
# Test 2: Face detection
echo "2⃣ Testing face detection..."
python scripts/run_detection.py --image assets/test.jpg --save_dir /tmp/uniface_test
if [ $? -eq 0 ] && [ -f /tmp/uniface_test/test_out.jpg ]; then
echo "✅ Face detection: PASS"
else
echo "❌ Face detection: FAIL"
exit 1
fi
echo ""
# Test 3: Face recognition
echo "3⃣ Testing face recognition..."
python scripts/run_recognition.py --image assets/test.jpg > /tmp/uniface_recognition.log
if [ $? -eq 0 ] && grep -q "Embedding shape" /tmp/uniface_recognition.log; then
echo "✅ Face recognition: PASS"
else
echo "❌ Face recognition: FAIL"
exit 1
fi
echo ""
echo "=== All Tests Passed! 🎉 ==="
```
**Run the test suite:**
```bash
chmod +x test_all_scripts.sh
./test_all_scripts.sh
```
---
## Performance Benchmarking
### Benchmark Detection Speed
```bash
# Test different models
for model in retinaface scrfd; do
echo "Testing $model..."
python scripts/run_detection.py \
--image assets/test.jpg \
--method $model \
--iterations 50
done
```
### Benchmark Recognition Speed
```bash
# Test different recognizers
for recognizer in arcface mobileface; do
echo "Testing $recognizer..."
time python scripts/run_recognition.py \
--image assets/test.jpg \
--recognizer $recognizer
done
```
---
## 🐛 Common Issues
### Issue: "No module named 'uniface'"
```bash
# Solution: Install in editable mode
pip install -e .
```
### Issue: "Failed to load image"
```bash
# Check image exists
ls -lh assets/test.jpg
# Try with absolute path
python scripts/run_detection.py --image $(pwd)/assets/test.jpg
```
### Issue: "No faces detected"
```bash
# Lower confidence threshold
python scripts/run_detection.py \
--image assets/test.jpg \
--threshold 0.3
```
### Issue: Models downloading slowly
```bash
# Check internet connection
curl -I https://github.com/yakhyo/uniface/releases
# Or download manually
wget https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_mv2.onnx \
-O ~/.uniface/models/retinaface_mnet_v2.onnx
```
### Issue: CoreML not available on Mac
```bash
# Install CoreML-enabled ONNX Runtime
pip uninstall onnxruntime
pip install onnxruntime-silicon
# Verify
python -c "import onnxruntime as ort; print(ort.get_available_providers())"
# Should show: ['CoreMLExecutionProvider', 'CPUExecutionProvider']
```
---
## ✅ Script Status Summary
| Script | Status | API Updated | Tested |
|-----------------------|--------|-------------|--------|
| download_model.py | ✅ | ✅ | ✅ |
| run_detection.py | ✅ | ✅ | ✅ |
| run_recognition.py | ✅ | ✅ | ✅ |
| run_face_search.py | ✅ | ✅ | ✅ |
| sha256_generate.py | ✅ | N/A | ✅ |
All scripts are updated and working with the new dict-based API! 🎉
---
## 📝 Notes
- All scripts now use the factory functions (`create_detector`, `create_recognizer`)
- Scripts work with the new dict-based detection API (see the sketch below)
- Model download bug is fixed (enum vs string issue)
- CoreML acceleration is automatically detected on Apple Silicon
- All scripts include proper error handling
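Putting these notes together, a minimal sketch of the factory functions and the dict-based results they return (model choices here are illustrative):
```python
import cv2
import numpy as np
from uniface.detection import create_detector
from uniface.recognition import create_recognizer

detector = create_detector(method='retinaface')
recognizer = create_recognizer(method='arcface')

image = cv2.imread('assets/test.jpg')
for face in detector.detect(image):
    # Each detection is a dict with 'bbox', 'confidence', and 'landmarks' keys
    embedding = recognizer.get_normalized_embedding(image, np.array(face['landmarks']))
    print(face['bbox'], face['confidence'], embedding.shape)
```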
---
Need help with a specific script? Check the main [README.md](../README.md) or [QUICKSTART.md](../QUICKSTART.md)!

scripts/download_model.py Normal file

@@ -0,0 +1,31 @@
import argparse
from uniface.constants import RetinaFaceWeights
from uniface.model_store import verify_model_weights
def main():
parser = argparse.ArgumentParser(description="Download and verify RetinaFace model weights.")
parser.add_argument(
"--model",
type=str,
choices=[m.name for m in RetinaFaceWeights],
help="Model to download (e.g. MNET_V2). If not specified, all models will be downloaded.",
)
args = parser.parse_args()
if args.model:
weight = RetinaFaceWeights[args.model]
print(f"📥 Downloading model: {weight.value}")
verify_model_weights(weight) # Pass enum, not string
else:
print("📥 Downloading all models...")
for weight in RetinaFaceWeights:
verify_model_weights(weight) # Pass enum, not string
print("✅ All requested weights are ready and verified.")
if __name__ == "__main__":
main()


@@ -1,23 +0,0 @@
#!/bin/bash
# Exit on errors
set -e
cd "$(dirname "$0")"/..
echo "Deleting existing release-related files..."
rm -rf dist/ build/ *.egg-info
pip install --upgrade pip
pip install twine
echo "Creating a package for the current release (PyPI compatible)..."
python3 setup.py sdist bdist_wheel
echo "Release package created successfully in the 'dist/' folder."
echo "Uploading the package to PyPI..."
twine upload dist/*
echo "Release uploaded successfully!"

scripts/run_detection.py Normal file

@@ -0,0 +1,87 @@
import os
import cv2
import time
import argparse
import numpy as np
# UPDATED: Use the factory function and import from the new location
from uniface.detection import create_detector
from uniface.visualization import draw_detections
def run_inference(detector, image_path: str, vis_threshold: float = 0.6, save_dir: str = "outputs"):
"""
Run face detection on a single image.
Args:
detector: Initialized face detector.
image_path (str): Path to input image.
vis_threshold (float): Threshold for drawing detections.
save_dir (str): Directory to save output image.
"""
image = cv2.imread(image_path)
if image is None:
print(f"❌ Error: Failed to load image from '{image_path}'")
return
# 1. Get the list of face dictionaries from the detector
faces = detector.detect(image)
if faces:
# 2. Unpack the data into separate lists
bboxes = [face['bbox'] for face in faces]
scores = [face['confidence'] for face in faces]
landmarks = [face['landmarks'] for face in faces]
        # 3. Pass the unpacked lists to the drawing function, honoring the CLI threshold
        draw_detections(image, bboxes, scores, landmarks, vis_threshold=vis_threshold)
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, f"{os.path.splitext(os.path.basename(image_path))[0]}_out.jpg")
cv2.imwrite(output_path, image)
print(f"✅ Output saved at: {output_path}")
def main():
parser = argparse.ArgumentParser(description="Run face detection on an image.")
parser.add_argument("--image", type=str, required=True, help="Path to the input image")
parser.add_argument(
"--method",
type=str,
default="retinaface",
choices=['retinaface', 'scrfd'],
help="Detection method to use."
)
parser.add_argument("--threshold", type=float, default=0.6, help="Visualization confidence threshold")
parser.add_argument("--iterations", type=int, default=1, help="Number of inference runs for benchmarking")
parser.add_argument("--save_dir", type=str, default="outputs", help="Directory to save output images")
parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
args = parser.parse_args()
if args.verbose:
from uniface import enable_logging
enable_logging()
print(f"Initializing detector: {args.method}")
detector = create_detector(method=args.method)
avg_time = 0
for i in range(args.iterations):
start = time.time()
run_inference(detector, args.image, args.threshold, args.save_dir)
elapsed = time.time() - start
print(f"[{i + 1}/{args.iterations}] ⏱️ Inference time: {elapsed:.4f} seconds")
        if i > 0:  # Skip the first run; it includes model warm-up overhead
            avg_time += elapsed
    if args.iterations > 1:
        # Average excludes the first (warm-up) run
        effective_iterations = args.iterations - 1
        print(
            f"\n🔥 Average inference time over {effective_iterations} runs: {avg_time / effective_iterations:.4f} seconds")
if __name__ == "__main__":
main()

scripts/run_face_search.py Normal file

@@ -0,0 +1,104 @@
import argparse
import cv2
import numpy as np
# Use the new high-level factory functions
from uniface.detection import create_detector
from uniface.face_utils import compute_similarity
from uniface.recognition import create_recognizer
def extract_reference_embedding(detector, recognizer, image_path: str) -> np.ndarray:
"""Extracts a normalized embedding from the first face found in an image."""
image = cv2.imread(image_path)
if image is None:
raise RuntimeError(f"Failed to load image: {image_path}")
faces = detector.detect(image)
if not faces:
raise RuntimeError("No faces found in reference image.")
# Get landmarks from the first detected face dictionary
landmarks = np.array(faces[0]["landmarks"])
# Use normalized embedding for more reliable similarity comparison
embedding = recognizer.get_normalized_embedding(image, landmarks)
return embedding
def run_video(detector, recognizer, ref_embedding: np.ndarray, threshold: float = 0.4):
"""Run real-time face recognition from a webcam feed."""
cap = cv2.VideoCapture(0)
if not cap.isOpened():
raise RuntimeError("Webcam could not be opened.")
print("Webcam started. Press 'q' to quit.")
while True:
ret, frame = cap.read()
if not ret:
break
faces = detector.detect(frame)
# Loop through each detected face
for face in faces:
# Extract bbox and landmarks from the dictionary
bbox = face["bbox"]
landmarks = np.array(face["landmarks"])
x1, y1, x2, y2 = map(int, bbox)
# Get the normalized embedding for the current face
embedding = recognizer.get_normalized_embedding(frame, landmarks)
# Compare with the reference embedding
sim = compute_similarity(ref_embedding, embedding)
# Draw results
label = f"Match ({sim:.2f})" if sim > threshold else f"Unknown ({sim:.2f})"
color = (0, 255, 0) if sim > threshold else (0, 0, 255)
cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
cv2.putText(frame, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
cv2.imshow("Face Recognition", frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
cap.release()
cv2.destroyAllWindows()
def main():
parser = argparse.ArgumentParser(description="Face recognition using a reference image.")
parser.add_argument("--image", type=str, required=True, help="Path to the reference face image.")
parser.add_argument(
"--detector", type=str, default="scrfd", choices=["retinaface", "scrfd"], help="Face detection method."
)
parser.add_argument(
"--recognizer",
type=str,
default="arcface",
choices=["arcface", "mobileface", "sphereface"],
help="Face recognition method.",
)
parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
args = parser.parse_args()
if args.verbose:
from uniface import enable_logging
enable_logging()
print("Initializing models...")
detector = create_detector(method=args.detector)
recognizer = create_recognizer(method=args.recognizer)
print("Extracting reference embedding...")
ref_embedding = extract_reference_embedding(detector, recognizer, args.image)
run_video(detector, recognizer, ref_embedding)
if __name__ == "__main__":
main()

scripts/run_recognition.py Normal file

@@ -0,0 +1,81 @@
import cv2
import argparse
import numpy as np
# Use the new high-level factory functions for consistency
from uniface.detection import create_detector
from uniface.recognition import create_recognizer
def run_inference(detector, recognizer, image_path: str):
"""
Detect faces and extract embeddings from a single image.
Args:
detector: Initialized face detector.
recognizer: Initialized face recognition model.
image_path (str): Path to the input image.
"""
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
return
faces = detector.detect(image)
if not faces:
print("No faces detected.")
return
print(f"Detected {len(faces)} face(s). Extracting embeddings for the first face...")
# Process the first detected face
first_face = faces[0]
landmarks = np.array(first_face['landmarks']) # Convert landmarks to numpy array
# Extract embedding using the landmarks from the face dictionary
embedding = recognizer.get_embedding(image, landmarks)
norm_embedding = recognizer.get_normalized_embedding(image, landmarks)
# Print some info about the embeddings
print(f" - Embedding shape: {embedding.shape}")
print(f" - L2 norm of unnormalized embedding: {np.linalg.norm(embedding):.4f}")
print(f" - L2 norm of normalized embedding: {np.linalg.norm(norm_embedding):.4f}")
def main():
parser = argparse.ArgumentParser(description="Extract face embeddings from a single image.")
parser.add_argument("--image", type=str, required=True, help="Path to the input image.")
parser.add_argument(
"--detector",
type=str,
default="retinaface",
choices=['retinaface', 'scrfd'],
help="Face detection method to use."
)
parser.add_argument(
"--recognizer",
type=str,
default="arcface",
choices=['arcface', 'mobileface', 'sphereface'],
help="Face recognition method to use."
)
parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
args = parser.parse_args()
if args.verbose:
from uniface import enable_logging
enable_logging()
print(f"Initializing detector: {args.detector}")
detector = create_detector(method=args.detector)
print(f"Initializing recognizer: {args.recognizer}")
recognizer = create_recognizer(method=args.recognizer)
run_inference(detector, recognizer, args.image)
if __name__ == "__main__":
main()

scripts/sha256_generate.py Normal file

@@ -0,0 +1,35 @@
import argparse
import hashlib
from pathlib import Path
def compute_sha256(file_path: Path, chunk_size: int = 8192) -> str:
sha256_hash = hashlib.sha256()
with file_path.open("rb") as f:
for chunk in iter(lambda: f.read(chunk_size), b""):
sha256_hash.update(chunk)
return sha256_hash.hexdigest()
def main():
parser = argparse.ArgumentParser(
description="Compute SHA256 hash of a model weight file."
)
parser.add_argument(
"file",
type=Path,
help="Path to the model weight file (.onnx, .pth, etc)."
)
args = parser.parse_args()
if not args.file.exists() or not args.file.is_file():
print(f"File does not exist: {args.file}")
return
sha256 = compute_sha256(args.file)
print(f"`SHA256 hash for '{args.file.name}':\n{sha256}")
if __name__ == "__main__":
main()

setup.py

@@ -1,43 +0,0 @@
import os
from setuptools import setup, find_packages
# Read the README file for the long description
long_description = ""
if os.path.exists("README.md"):
with open("README.md", "r", encoding="utf-8") as f:
long_description = f.read()
setup(
name="uniface",
version="0.1.1",
packages=find_packages(),
install_requires=[
"numpy",
"opencv-python",
"onnx",
"onnxruntime",
"requests",
"torch"
],
extras_require={
"dev": ["pytest"],
},
description="UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Age, and Gender Detection",
long_description=long_description,
long_description_content_type="text/markdown",
author="Yakhyokhuja Valikhujaev",
author_email="yakhyo9696@gmail.com",
url="https://github.com/yakhyo/uniface",
license="MIT",
classifiers=[
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Topic :: Software Development :: Libraries :: Python Modules",
],
keywords="face detection, face recognition, facial landmark, facial attribute, onnx, opencv, retinaface",
python_requires=">=3.8",
)

test.py

@@ -1,57 +0,0 @@
import os
import cv2
import numpy as np
from uniface import RetinaFace, draw_detections
def run_inference(image_path, save_image=False, vis_threshold=0.6):
"""
Perform inference on an image, draw detections, and optionally save the output image.
Args:
image_path (str): Path to the input image.
save_image (bool): Whether to save the output image with detections.
vis_threshold (float): Confidence threshold for displaying detections.
"""
# Load the image
original_image = cv2.imread(image_path)
if original_image is None:
print(f"Error: Could not read image from {image_path}")
return
# Perform face detection
boxes, landmarks = retinaface_inference.detect(original_image)
# Draw detections on the image
draw_detections(original_image, (boxes, landmarks), vis_threshold)
# Save the output image if requested
if save_image:
im_name = os.path.splitext(os.path.basename(image_path))[0]
save_name = f"{im_name}_out.jpg"
cv2.imwrite(save_name, original_image)
print(f"Image saved at '{save_name}'")
if __name__ == '__main__':
import time
# Initialize and run the ONNX inference
retinaface_inference = RetinaFace(
model="retinaface_mnet_v2",
conf_thresh=0.5,
pre_nms_topk=5000,
nms_thresh=0.4,
post_nms_topk=750,
)
img_path = "assets/test.jpg"
avg = 0
for _ in range(50):
st = time.time()
run_inference(img_path, save_image=True, vis_threshold=0.6)
d = time.time() - st
print(d)
avg += d
print("avg", avg / 50)


@@ -1,6 +1,8 @@
-import pytest
 import numpy as np
-from uniface import RetinaFace
+import pytest
+from uniface.constants import RetinaFaceWeights
+from uniface.detection import RetinaFace
 @pytest.fixture
@@ -9,7 +11,7 @@ def retinaface_model():
     Fixture to initialize the RetinaFace model for testing.
     """
     return RetinaFace(
-        model="retinaface_mnet_v2",
+        model_name=RetinaFaceWeights.MNET_V2,
         conf_thresh=0.5,
         pre_nms_topk=5000,
         nms_thresh=0.4,
@@ -31,20 +33,27 @@ def test_inference_on_640x640_image(retinaface_model):
     # Generate a mock 640x640 BGR image
     mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
-    # Run inference
-    detections, landmarks = retinaface_model.detect(mock_image)
+    # Run inference - returns list of dictionaries
+    faces = retinaface_model.detect(mock_image)
-    # Check output types
-    assert isinstance(detections, np.ndarray), "Detections should be a numpy array."
-    assert isinstance(landmarks, np.ndarray), "Landmarks should be a numpy array."
+    # Check output type
+    assert isinstance(faces, list), "Detections should be a list."
-    # Check that detections have the expected shape
-    if detections.size > 0:  # If faces are detected
-        assert detections.shape[1] == 5, "Each detection should have 5 values (x1, y1, x2, y2, score)."
+    # Check that each face has the expected structure
+    for face in faces:
+        assert isinstance(face, dict), "Each detection should be a dictionary."
+        assert "bbox" in face, "Each detection should have a 'bbox' key."
+        assert "confidence" in face, "Each detection should have a 'confidence' key."
+        assert "landmarks" in face, "Each detection should have a 'landmarks' key."
-    # Check landmarks shape
-    if landmarks.size > 0:
-        assert landmarks.shape[1:] == (5, 2), "Landmarks should have shape (N, 5, 2)."
+        # Check bbox format
+        bbox = face["bbox"]
+        assert len(bbox) == 4, "BBox should have 4 values (x1, y1, x2, y2)."
+        # Check landmarks format
+        landmarks = face["landmarks"]
+        assert len(landmarks) == 5, "Should have 5 landmark points."
+        assert all(len(pt) == 2 for pt in landmarks), "Each landmark should be (x, y)."
 def test_confidence_threshold(retinaface_model):
@@ -55,12 +64,12 @@ def test_confidence_threshold(retinaface_model):
     mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
     # Run inference
-    detections, _ = retinaface_model.detect(mock_image)
+    faces = retinaface_model.detect(mock_image)
     # Ensure all detections have confidence scores above the threshold
-    if detections.size > 0:  # If faces are detected
-        confidence_scores = detections[:, 4]
-        assert (confidence_scores >= 0.5).all(), "Some detections have confidence below the threshold."
+    for face in faces:
+        confidence = face["confidence"]
+        assert confidence >= 0.5, f"Detection has confidence {confidence} below threshold 0.5"
 def test_no_faces_detected(retinaface_model):
@@ -71,8 +80,7 @@ def test_no_faces_detected(retinaface_model):
     empty_image = np.zeros((640, 640, 3), dtype=np.uint8)
     # Run inference
-    detections, landmarks = retinaface_model.detect(empty_image)
+    faces = retinaface_model.detect(empty_image)
-    # Ensure no detections or landmarks are found
-    assert detections.size == 0, "Detections should be empty for a blank image."
-    assert landmarks.size == 0, "Landmarks should be empty for a blank image."
+    # Ensure no detections are found
+    assert len(faces) == 0, "Should detect no faces in a blank image."

uniface/__init__.py

@@ -1,4 +1,4 @@
-# Copyright 2024 Yakhyokhuja Valikhujaev
+# Copyright 2025 Yakhyokhuja Valikhujaev
 #
 # Licensed under the MIT License.
 # You may obtain a copy of the License at
@@ -11,18 +11,52 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 __license__ = "MIT"
-__author__ = "Yakhyokhuja Valikhujaev"
-__version__ = "1.0.0"
-from uniface.retinaface import RetinaFace
-from uniface.log import Logger
+from uniface.face_utils import compute_similarity, face_alignment
+from uniface.log import Logger, enable_logging
 from uniface.model_store import verify_model_weights
+from uniface.version import __version__, __author__
 from uniface.visualization import draw_detections
+from .attribute import AgeGender
+try:
+    from .attribute import Emotion
+except ImportError:
+    Emotion = None  # PyTorch not installed
+from .detection import SCRFD, RetinaFace, create_detector, detect_faces, list_available_detectors
+from .landmark import Landmark106, create_landmarker
+from .recognition import ArcFace, MobileFace, SphereFace, create_recognizer
 __all__ = [
+    "__author__",
     "__license__",
     "__version__",
-    "__author__"
+    # Factory functions
+    "create_detector",
+    "create_landmarker",
+    "create_recognizer",
+    "detect_faces",
+    "list_available_detectors",
+    # Detection models
     "RetinaFace",
-    "Logger",
+    "SCRFD",
+    # Recognition models
+    "ArcFace",
+    "MobileFace",
+    "SphereFace",
+    # Landmark models
+    "Landmark106",
+    # Attribute models
+    "AgeGender",
+    "Emotion",
+    # Utilities
+    "compute_similarity",
+    "draw_detections",
+    "face_alignment",
+    "verify_model_weights",
-    "draw_detections"
+    "Logger",
+    "enable_logging",
 ]

uniface/attribute/__init__.py Normal file

@@ -0,0 +1,106 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Dict, Any, List, Union
import numpy as np
from uniface.attribute.age_gender import AgeGender
from uniface.attribute.base import Attribute
from uniface.constants import AgeGenderWeights, DDAMFNWeights
# Emotion requires PyTorch - make it optional
try:
from uniface.attribute.emotion import Emotion
_EMOTION_AVAILABLE = True
except ImportError:
Emotion = None
_EMOTION_AVAILABLE = False
# Public API for the attribute module
__all__ = [
"AgeGender",
"Emotion",
"create_attribute_predictor",
"predict_attributes"
]
# A mapping from model enums to their corresponding attribute classes
_ATTRIBUTE_MODELS = {
**{model: AgeGender for model in AgeGenderWeights},
}
# Add Emotion models only if PyTorch is available
if _EMOTION_AVAILABLE:
_ATTRIBUTE_MODELS.update({model: Emotion for model in DDAMFNWeights})
def create_attribute_predictor(
model_name: Union[AgeGenderWeights, DDAMFNWeights],
**kwargs: Any
) -> Attribute:
"""
Factory function to create an attribute predictor instance.
This high-level API simplifies the creation of attribute models by
dynamically selecting the correct class based on the provided model enum.
Args:
model_name: The enum corresponding to the desired attribute model
(e.g., AgeGenderWeights.DEFAULT or DDAMFNWeights.AFFECNET7).
**kwargs: Additional keyword arguments to pass to the model's constructor.
Returns:
An initialized instance of an Attribute predictor class (e.g., AgeGender).
Raises:
ValueError: If the provided model_name is not a supported enum.
"""
model_class = _ATTRIBUTE_MODELS.get(model_name)
if model_class is None:
raise ValueError(f"Unsupported attribute model: {model_name}. "
f"Please choose from AgeGenderWeights or DDAMFNWeights.")
# Pass model_name to the constructor, as some classes might need it
return model_class(model_name=model_name, **kwargs)
def predict_attributes(
image: np.ndarray,
detections: List[Dict[str, np.ndarray]],
predictor: Attribute
) -> List[Dict[str, Any]]:
"""
High-level API to predict attributes for multiple detected faces.
This function iterates through a list of face detections, runs the
specified attribute predictor on each one, and appends the results back
into the detection dictionary.
Args:
image (np.ndarray): The full input image in BGR format.
detections (List[Dict]): A list of detection results, where each dict
            must contain a 'bbox' and, for emotion models, a 'landmarks' entry.
predictor (Attribute): An initialized attribute predictor instance,
created by `create_attribute_predictor`.
Returns:
The list of detections, where each dictionary is updated with a new
'attributes' key containing the prediction result.
"""
for face in detections:
# Initialize attributes dict if it doesn't exist
if 'attributes' not in face:
face['attributes'] = {}
if isinstance(predictor, AgeGender):
gender, age = predictor(image, face['bbox'])
face['attributes']['gender'] = gender
face['attributes']['age'] = age
        elif _EMOTION_AVAILABLE and isinstance(predictor, Emotion):
            emotion, confidence = predictor(image, face['landmarks'])
face['attributes']['emotion'] = emotion
face['attributes']['confidence'] = confidence
return detections
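# Illustrative usage (a sketch, not part of the module): chain the detection
# and attribute factories defined above.
#   predictor = create_attribute_predictor(AgeGenderWeights.DEFAULT)
#   faces = detect_faces(image)  # from uniface.detection
#   faces = predict_attributes(image, faces, predictor)
#   faces[0]['attributes']  # e.g. {'gender': 'Male', 'age': 31}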

uniface/attribute/age_gender.py Normal file

@@ -0,0 +1,172 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import List, Tuple, Union
import cv2
import numpy as np
from uniface.attribute.base import Attribute
from uniface.constants import AgeGenderWeights
from uniface.face_utils import bbox_center_alignment
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
__all__ = ["AgeGender"]
class AgeGender(Attribute):
"""
Age and gender prediction model using ONNX Runtime.
This class inherits from the base `Attribute` class and implements the
functionality for predicting age (in years) and gender (0 for female,
1 for male) from a face image. It requires a bounding box to locate the face.
"""
def __init__(self, model_name: AgeGenderWeights = AgeGenderWeights.DEFAULT) -> None:
"""
Initializes the AgeGender prediction model.
Args:
model_name (AgeGenderWeights): The enum specifying the model weights
to load.
"""
Logger.info(f"Initializing AgeGender with model={model_name.name}")
self.model_path = verify_model_weights(model_name)
self._initialize_model()
def _initialize_model(self) -> None:
"""
Initializes the ONNX model and creates an inference session.
"""
try:
self.session = create_onnx_session(self.model_path)
# Get model input details from the loaded model
input_meta = self.session.get_inputs()[0]
self.input_name = input_meta.name
self.input_size = tuple(input_meta.shape[2:4]) # (height, width)
self.output_names = [output.name for output in self.session.get_outputs()]
Logger.info(f"Successfully initialized AgeGender model with input size {self.input_size}")
except Exception as e:
Logger.error(f"Failed to load AgeGender model from '{self.model_path}'", exc_info=True)
raise RuntimeError(f"Failed to initialize AgeGender model: {e}")
def preprocess(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> np.ndarray:
"""
Aligns the face based on the bounding box and preprocesses it for inference.
Args:
image (np.ndarray): The full input image in BGR format.
bbox (Union[List, np.ndarray]): The face bounding box coordinates [x1, y1, x2, y2].
Returns:
np.ndarray: The preprocessed image blob ready for inference.
"""
bbox = np.asarray(bbox)
width, height = bbox[2] - bbox[0], bbox[3] - bbox[1]
center = ((bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2)
scale = self.input_size[1] / (max(width, height) * 1.5)
        # No rotation is applied during the center-crop alignment
        rotation = 0.0
aligned_face, _ = bbox_center_alignment(image, center, self.input_size[1], scale, rotation)
blob = cv2.dnn.blobFromImage(
aligned_face, scalefactor=1.0, size=self.input_size[::-1], mean=(0.0, 0.0, 0.0), swapRB=True
)
return blob
def postprocess(self, prediction: np.ndarray) -> Tuple[str, int]:
"""
Processes the raw model output to extract gender and age.
Args:
prediction (np.ndarray): The raw output from the model inference.
Returns:
Tuple[str, int]: A tuple containing the predicted gender label ("Female" or "Male")
and age (in years).
"""
# First two values are gender logits
gender_id = int(np.argmax(prediction[:2]))
gender = "Female" if gender_id == 0 else "Male"
# Third value is normalized age, scaled by 100
age = int(np.round(prediction[2] * 100))
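        # Worked example (illustrative): prediction = [0.1, 0.9, 0.27] -> ("Male", 27)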
return gender, age
def predict(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> Tuple[str, int]:
"""
Predicts age and gender for a single face specified by a bounding box.
Args:
image (np.ndarray): The full input image in BGR format.
bbox (Union[List, np.ndarray]): The face bounding box coordinates [x1, y1, x2, y2].
Returns:
Tuple[str, int]: A tuple containing the predicted gender label and age.
"""
face_blob = self.preprocess(image, bbox)
prediction = self.session.run(self.output_names, {self.input_name: face_blob})[0][0]
gender, age = self.postprocess(prediction)
return gender, age
# TODO: below is only for testing, remove it later
if __name__ == "__main__":
# To run this script, you need to have uniface.detection installed
# or available in your path.
from uniface.constants import RetinaFaceWeights
from uniface.detection import create_detector
print("Initializing models for live inference...")
# 1. Initialize the face detector
# Using a smaller model for faster real-time performance
detector = create_detector(model_name=RetinaFaceWeights.MNET_V2)
# 2. Initialize the attribute predictor
age_gender_predictor = AgeGender()
# 3. Start webcam capture
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("Error: Could not open webcam.")
exit()
print("Starting webcam feed. Press 'q' to quit.")
while True:
ret, frame = cap.read()
if not ret:
print("Error: Failed to capture frame.")
break
# Detect faces in the current frame
detections = detector.detect(frame)
# For each detected face, predict age and gender
for detection in detections:
box = detection["bbox"]
x1, y1, x2, y2 = map(int, box)
# Predict attributes
gender, age = age_gender_predictor.predict(frame, box)
# Prepare text and draw on the frame
label = f"{gender}, {age}"
cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.putText(frame, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
# Display the resulting frame
cv2.imshow("Age and Gender Inference (Press 'q' to quit)", frame)
# Break the loop if 'q' is pressed
if cv2.waitKey(1) & 0xFF == ord("q"):
break
# Release resources
cap.release()
cv2.destroyAllWindows()
print("Inference stopped.")

uniface/attribute/base.py Normal file

@@ -0,0 +1,91 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from abc import ABC, abstractmethod
from typing import Any
import numpy as np
class Attribute(ABC):
"""
Abstract base class for face attribute models.
This class defines the common interface that all attribute models
(e.g., age-gender, emotion) must implement. It ensures a consistent API
across different attribute prediction modules in the library, making them
interchangeable and easy to use.
"""
@abstractmethod
def _initialize_model(self) -> None:
"""
Initializes the underlying model for inference.
This method should handle loading model weights, creating the
inference session (e.g., ONNX Runtime, PyTorch), and any necessary
warm-up procedures to prepare the model for prediction.
"""
raise NotImplementedError("Subclasses must implement the _initialize_model method.")
@abstractmethod
def preprocess(self, image: np.ndarray, *args: Any) -> Any:
"""
Preprocesses the input data for the model.
This method should take a raw image and any other necessary data
(like bounding boxes or landmarks) and convert it into the format
expected by the model's inference engine (e.g., a blob or tensor).
Args:
image (np.ndarray): The input image containing the face, typically
in BGR format.
*args: Additional arguments required for preprocessing, such as
bounding boxes or facial landmarks.
Returns:
The preprocessed data ready for model inference.
"""
raise NotImplementedError("Subclasses must implement the preprocess method.")
@abstractmethod
def postprocess(self, prediction: Any) -> Any:
"""
Postprocesses the raw model output into a human-readable format.
This method takes the raw output from the model's inference and
converts it into a meaningful result, such as an age value, a gender
label, or an emotion category.
Args:
prediction (Any): The raw output from the model's inference.
Returns:
The final, processed attributes.
"""
raise NotImplementedError("Subclasses must implement the postprocess method.")
@abstractmethod
def predict(self, image: np.ndarray, *args: Any) -> Any:
"""
Performs end-to-end attribute prediction on a given image.
This method orchestrates the full pipeline: it calls the preprocess,
inference, and postprocess steps to return the final, user-friendly
attribute prediction.
Args:
image (np.ndarray): The input image containing the face.
*args: Additional data required for prediction, such as a bounding
box or landmarks.
Returns:
The final predicted attributes.
"""
raise NotImplementedError("Subclasses must implement the predict method.")
def __call__(self, *args, **kwargs) -> Any:
"""
Provides a convenient, callable shortcut for the `predict` method.
"""
return self.predict(*args, **kwargs)

uniface/attribute/emotion.py Normal file

@@ -0,0 +1,170 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import cv2
import torch
import numpy as np
from typing import Tuple, Union, List
from uniface.attribute.base import Attribute
from uniface.log import Logger
from uniface.constants import DDAMFNWeights
from uniface.face_utils import face_alignment
from uniface.model_store import verify_model_weights
__all__ = ["Emotion"]
class Emotion(Attribute):
"""
Emotion recognition model using a TorchScript model.
This class inherits from the base `Attribute` class and implements the
functionality for predicting one of several emotion categories from a face
image. It requires 5-point facial landmarks for alignment.
"""
def __init__(
self,
model_weights: DDAMFNWeights = DDAMFNWeights.AFFECNET7,
input_size: Tuple[int, int] = (112, 112),
) -> None:
"""
Initializes the emotion recognition model.
Args:
model_weights (DDAMFNWeights): The enum for the model weights to load.
input_size (Tuple[int, int]): The expected input size for the model.
"""
Logger.info(f"Initializing Emotion with model={model_weights.name}")
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self.input_size = input_size
self.model_path = verify_model_weights(model_weights)
# Define emotion labels based on the selected model
self.emotion_labels = ["Neutral", "Happy", "Sad", "Surprise", "Fear", "Disgust", "Angry"]
if model_weights == DDAMFNWeights.AFFECNET8:
self.emotion_labels.append("Contempt")
self._initialize_model()
def _initialize_model(self) -> None:
"""
Loads and initializes the TorchScript model for inference.
"""
try:
self.model = torch.jit.load(self.model_path, map_location=self.device)
self.model.eval()
# Warm-up with a dummy input for faster first inference
dummy_input = torch.randn(1, 3, *self.input_size).to(self.device)
with torch.no_grad():
self.model(dummy_input)
Logger.info(f"Successfully initialized Emotion model on {self.device}")
except Exception as e:
Logger.error(f"Failed to load Emotion model from '{self.model_path}'", exc_info=True)
raise RuntimeError(f"Failed to initialize Emotion model: {e}")
def preprocess(self, image: np.ndarray, landmark: Union[List, np.ndarray]) -> torch.Tensor:
"""
Aligns the face using landmarks and preprocesses it into a tensor.
Args:
image (np.ndarray): The full input image in BGR format.
landmark (Union[List, np.ndarray]): The 5-point facial landmarks.
Returns:
torch.Tensor: The preprocessed image tensor ready for inference.
"""
landmark = np.asarray(landmark)
aligned_image, _ = face_alignment(image, landmark)
# Convert BGR to RGB, resize, normalize, and convert to a CHW tensor
rgb_image = cv2.cvtColor(aligned_image, cv2.COLOR_BGR2RGB)
resized_image = cv2.resize(rgb_image, self.input_size).astype(np.float32) / 255.0
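        # Standard ImageNet statistics used to normalize the RGB input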
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
normalized_image = (resized_image - mean) / std
transposed_image = normalized_image.transpose((2, 0, 1))
return torch.from_numpy(transposed_image).unsqueeze(0).to(self.device)
def postprocess(self, prediction: torch.Tensor) -> Tuple[str, float]:
"""
Processes the raw model output to get the emotion label and confidence score.
"""
probabilities = torch.nn.functional.softmax(prediction, dim=1).squeeze().cpu().numpy()
pred_index = np.argmax(probabilities)
emotion_label = self.emotion_labels[pred_index]
confidence = float(probabilities[pred_index])
return emotion_label, confidence
def predict(self, image: np.ndarray, landmark: Union[List, np.ndarray]) -> Tuple[str, float]:
"""
Predicts the emotion from a single face specified by its landmarks.
"""
input_tensor = self.preprocess(image, landmark)
with torch.no_grad():
output = self.model(input_tensor)
if isinstance(output, tuple):
output = output[0]
return self.postprocess(output)
# TODO: below is only for testing, remove it later
if __name__ == "__main__":
from uniface.detection import create_detector
from uniface.constants import RetinaFaceWeights
print("Initializing models for live inference...")
# 1. Initialize the face detector
# Using a smaller model for faster real-time performance
detector = create_detector(model_name=RetinaFaceWeights.MNET_V2)
# 2. Initialize the attribute predictor
emotion_predictor = Emotion()
# 3. Start webcam capture
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("Error: Could not open webcam.")
exit()
print("Starting webcam feed. Press 'q' to quit.")
while True:
ret, frame = cap.read()
if not ret:
print("Error: Failed to capture frame.")
break
# Detect faces in the current frame.
# This method returns a list of dictionaries for each detected face.
detections = detector.detect(frame)
# For each detected face, predict the emotion
for detection in detections:
box = detection['bbox']
landmark = detection['landmarks']
x1, y1, x2, y2 = map(int, box)
# Predict attributes using the landmark
emotion, confidence = emotion_predictor.predict(frame, landmark)
# Prepare text and draw on the frame
label = f"{emotion} ({confidence:.2f})"
cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
cv2.putText(frame, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 0, 0), 2)
# Display the resulting frame
cv2.imshow("Emotion Inference (Press 'q' to quit)", frame)
# Break the loop if 'q' is pressed
if cv2.waitKey(1) & 0xFF == ord('q'):
break
# Release resources
cap.release()
cv2.destroyAllWindows()
print("Inference stopped.")


@@ -1,4 +1,4 @@
-# Copyright 2024 Yakhyokhuja Valikhujaev
+# Copyright 2025 Yakhyokhuja Valikhujaev
 # Author: Yakhyokhuja Valikhujaev
 # GitHub: https://github.com/yakhyo
@@ -7,7 +7,6 @@ import math
 import itertools
 import numpy as np
-import torch
 from typing import Tuple, List
@@ -44,7 +43,7 @@ def resize_image(frame, target_shape: Tuple[int, int] = (640, 640)) -> Tuple[np.
     return image, resize_factor
-def generate_anchors(image_size: Tuple[int, int] = (640, 640)) -> torch.Tensor:
+def generate_anchors(image_size: Tuple[int, int] = (640, 640)) -> np.ndarray:
     """
     Generate anchor boxes for a given image size.
@@ -52,7 +51,7 @@ def generate_anchors(image_size: Tuple[int, int] = (640, 640)) -> torch.Tensor:
         image_size (Tuple[int, int]): Input image size (width, height). Defaults to (640, 640).
     Returns:
-        torch.Tensor: Anchor box coordinates as a tensor.
+        np.ndarray: Anchor box coordinates as a NumPy array.
     """
     image_size = image_size
@@ -79,11 +78,11 @@ def generate_anchors(image_size: Tuple[int, int] = (640, 640)) -> torch.Tensor:
     for cy, cx in itertools.product(dense_cy, dense_cx):
         anchors += [cx, cy, s_kx, s_ky]
-    output = torch.Tensor(anchors).view(-1, 4)
+    output = np.array(anchors, dtype=np.float32).reshape(-1, 4)
     return output
-def nms(dets: List[np.ndarray], threshold: float):
+def non_max_supression(dets: List[np.ndarray], threshold: float):
     """
     Apply Non-Maximum Suppression (NMS) to reduce overlapping bounding boxes based on a threshold.
@@ -123,56 +122,57 @@ def nms(dets: List[np.ndarray], threshold: float):
     return keep
-def decode_boxes(loc, priors, variances=[0.1, 0.2]) -> torch.Tensor:
+def decode_boxes(loc, priors, variances=[0.1, 0.2]) -> np.ndarray:
     """
     Decode locations from predictions using priors to undo
     the encoding done for offset regression at train time.
     Args:
-        loc (tensor): Location predictions for loc layers, shape: [num_priors, 4]
-        priors (tensor): Prior boxes in center-offset form, shape: [num_priors, 4]
+        loc (np.ndarray): Location predictions for loc layers, shape: [num_priors, 4]
+        priors (np.ndarray): Prior boxes in center-offset form, shape: [num_priors, 4]
         variances (list[float]): Variances of prior boxes
     Returns:
-        tensor: Decoded bounding box predictions
+        np.ndarray: Decoded bounding box predictions
     """
     # Compute centers of predicted boxes
     cxcy = priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:]
     # Compute widths and heights of predicted boxes
-    wh = priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])
+    wh = priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])
     # Convert center, size to corner coordinates
-    boxes = torch.empty_like(loc)
+    boxes = np.zeros_like(loc)
     boxes[:, :2] = cxcy - wh / 2  # xmin, ymin
     boxes[:, 2:] = cxcy + wh / 2  # xmax, ymax
     return boxes
-def decode_landmarks(predictions, priors, variances=[0.1, 0.2]) -> torch.Tensor:
+def decode_landmarks(predictions, priors, variances=[0.1, 0.2]) -> np.ndarray:
     """
-    Decode landmarks from predictions using prior boxes to reverse the encoding done during training.
+    Decode landmark predictions using prior boxes.
     Args:
-        predictions (tensor): Landmark predictions for localization layers.
-            Shape: [num_priors, 10] where each prior contains 5 landmark (x, y) pairs.
-        priors (tensor): Prior boxes in center-offset form.
-            Shape: [num_priors, 4], where each prior has (cx, cy, width, height).
-        variances (list[float]): Variances of the prior boxes to scale the decoded values.
+        predictions (np.ndarray): Landmark predictions, shape: [num_priors, 10]
+        priors (np.ndarray): Prior boxes, shape: [num_priors, 4]
+        variances (list): Scaling factors for landmark offsets.
     Returns:
-        landmarks (tensor): Decoded landmark predictions.
-            Shape: [num_priors, 10] where each row contains the decoded (x, y) pairs for 5 landmarks.
+        np.ndarray: Decoded landmarks, shape: [num_priors, 10]
     """
-    # Reshape predictions to [num_priors, 5, 2] to handle each pair (x, y) in a batch
-    predictions = predictions.view(predictions.size(0), 5, 2)
+    # Reshape predictions to [num_priors, 5, 2] to process landmark points
+    predictions = predictions.reshape(predictions.shape[0], 5, 2)
-    # Perform the same operation on all landmark pairs at once
-    landmarks = priors[:, :2].unsqueeze(1) + predictions * variances[0] * priors[:, 2:].unsqueeze(1)
+    # Expand priors to match (num_priors, 5, 2)
+    priors_xy = np.repeat(priors[:, :2][:, np.newaxis, :], 5, axis=1)  # (num_priors, 5, 2)
+    priors_wh = np.repeat(priors[:, 2:][:, np.newaxis, :], 5, axis=1)  # (num_priors, 5, 2)
+    # Compute absolute landmark positions
+    landmarks = priors_xy + predictions * variances[0] * priors_wh
     # Flatten back to [num_priors, 10]
-    landmarks = landmarks.view(landmarks.size(0), -1)
+    landmarks = landmarks.reshape(landmarks.shape[0], -1)
     return landmarks

uniface/constants.py

@@ -1,26 +1,164 @@
-# Copyright 2024 Yakhyokhuja Valikhujaev
+# Copyright 2025 Yakhyokhuja Valikhujaev
 # Author: Yakhyokhuja Valikhujaev
 # GitHub: https://github.com/yakhyo
+from enum import Enum
 from typing import Dict
+# fmt: off
+class SphereFaceWeights(str, Enum):
+    """
+    Trained on MS1M V2 dataset with 5.8 million images of 85k identities.
+    https://github.com/yakhyo/face-recognition
+    """
+    SPHERE20 = "sphere20"
+    SPHERE36 = "sphere36"
-MODEL_URLS: Dict[str, str] = {
-    'retinaface_mnet025': 'https://github.com/yakhyo/uniface/releases/download/v0.1.1/retinaface_mv1_0.25.onnx',
-    'retinaface_mnet050': 'https://github.com/yakhyo/uniface/releases/download/v0.1.1/retinaface_mv1_0.50.onnx',
-    'retinaface_mnet_v1': 'https://github.com/yakhyo/uniface/releases/download/v0.1.1/retinaface_mv1.onnx',
-    'retinaface_mnet_v2': 'https://github.com/yakhyo/uniface/releases/download/v0.1.1/retinaface_mv2.onnx',
-    'retinaface_r18': 'https://github.com/yakhyo/uniface/releases/download/v0.1.1/retinaface_r18.onnx',
-    'retinaface_r34': 'https://github.com/yakhyo/uniface/releases/download/v0.1.1/retinaface_r34.onnx'
+class MobileFaceWeights(str, Enum):
+    """
+    Trained on MS1M V2 dataset with 5.8 million images of 85k identities.
+    https://github.com/yakhyo/face-recognition
+    """
+    MNET_025 = "mobilenetv1_025"
+    MNET_V2 = "mobilenetv2"
+    MNET_V3_SMALL = "mobilenetv3_small"
+    MNET_V3_LARGE = "mobilenetv3_large"
+class ArcFaceWeights(str, Enum):
+    """
+    Pretrained weights from ArcFace model (insightface).
+    https://github.com/deepinsight/insightface
+    """
+    MNET = "arcface_mnet"
+    RESNET = "arcface_resnet"
+class RetinaFaceWeights(str, Enum):
+    """
+    Trained on WIDER FACE dataset.
+    https://github.com/yakhyo/retinaface-pytorch
+    """
+    MNET_025 = "retinaface_mnet025"
+    MNET_050 = "retinaface_mnet050"
+    MNET_V1 = "retinaface_mnet_v1"
+    MNET_V2 = "retinaface_mnet_v2"
+    RESNET18 = "retinaface_r18"
+    RESNET34 = "retinaface_r34"
+class SCRFDWeights(str, Enum):
+    """
+    Trained on WIDER FACE dataset.
+    https://github.com/deepinsight/insightface
+    """
+    SCRFD_10G_KPS = "scrfd_10g"
+    SCRFD_500M_KPS = "scrfd_500m"
+class DDAMFNWeights(str, Enum):
+    """
+    Trained on AffectNet dataset.
+    https://github.com/SainingZhang/DDAMFN/tree/main/DDAMFN
+    """
+    AFFECNET7 = "affecnet7"
+    AFFECNET8 = "affecnet8"
+class AgeGenderWeights(str, Enum):
+    """
+    Trained on CelebA dataset.
+    https://github.com/deepinsight/insightface
+    """
+    DEFAULT = "age_gender"
+class LandmarkWeights(str, Enum):
+    """
+    MobileNet 0.5 from Insightface
+    https://github.com/deepinsight/insightface/tree/master/alignment/coordinate_reg
+    """
+    DEFAULT = "2d_106"
+# fmt: on
+MODEL_URLS: Dict[Enum, str] = {
+    # RetinaFace
+    RetinaFaceWeights.MNET_025: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_mv1_0.25.onnx',
+    RetinaFaceWeights.MNET_050: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_mv1_0.50.onnx',
+    RetinaFaceWeights.MNET_V1: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_mv1.onnx',
+    RetinaFaceWeights.MNET_V2: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_mv2.onnx',
+    RetinaFaceWeights.RESNET18: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_r18.onnx',
+    RetinaFaceWeights.RESNET34: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_r34.onnx',
+    # MobileFace
+    MobileFaceWeights.MNET_025: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/###',
+    MobileFaceWeights.MNET_V2: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/###',
+    MobileFaceWeights.MNET_V3_SMALL: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/###',
+    MobileFaceWeights.MNET_V3_LARGE: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/###',
+    # SphereFace
+    SphereFaceWeights.SPHERE20: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/###',
+    SphereFaceWeights.SPHERE36: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/###',
+    # ArcFace
+    ArcFaceWeights.MNET: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/w600k_mbf.onnx',
+    ArcFaceWeights.RESNET: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/w600k_r50.onnx',
+    # SCRFD
+    SCRFDWeights.SCRFD_10G_KPS: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/scrfd_10g_kps.onnx',
+    SCRFDWeights.SCRFD_500M_KPS: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/scrfd_500m_kps.onnx',
+    # DDAMFN
+    DDAMFNWeights.AFFECNET7: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/affecnet7.script',
+    DDAMFNWeights.AFFECNET8: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/affecnet8.script',
+    # AgeGender
+    AgeGenderWeights.DEFAULT: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/genderage.onnx',
+    # Landmarks
+    LandmarkWeights.DEFAULT: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/2d106det.onnx',
 }
-MODEL_SHA256: Dict[str, str] = {
-    'retinaface_mnet025': 'b7a7acab55e104dce6f32cdfff929bd83946da5cd869b9e2e9bdffafd1b7e4a5',
-    'retinaface_mnet050': 'd8977186f6037999af5b4113d42ba77a84a6ab0c996b17c713cc3d53b88bfc37',
-    'retinaface_mnet_v1': '75c961aaf0aff03d13c074e9ec656e5510e174454dd4964a161aab4fe5f04153',
-    'retinaface_mnet_v2': '3ca44c045651cabeed1193a1fae8946ad1f3a55da8fa74b341feab5a8319f757',
-    'retinaface_r18': 'e8b5ddd7d2c3c8f7c942f9f10cec09d8e319f78f09725d3f709631de34fb649d',
-    'retinaface_r34': 'bd0263dc2a465d32859555cb1741f2d98991eb0053696e8ee33fec583d30e630'
+MODEL_SHA256: Dict[Enum, str] = {
+    # RetinaFace
+    RetinaFaceWeights.MNET_025: 'b7a7acab55e104dce6f32cdfff929bd83946da5cd869b9e2e9bdffafd1b7e4a5',
+    RetinaFaceWeights.MNET_050: 'd8977186f6037999af5b4113d42ba77a84a6ab0c996b17c713cc3d53b88bfc37',
+    RetinaFaceWeights.MNET_V1: '75c961aaf0aff03d13c074e9ec656e5510e174454dd4964a161aab4fe5f04153',
+    RetinaFaceWeights.MNET_V2: '3ca44c045651cabeed1193a1fae8946ad1f3a55da8fa74b341feab5a8319f757',
+    RetinaFaceWeights.RESNET18: 'e8b5ddd7d2c3c8f7c942f9f10cec09d8e319f78f09725d3f709631de34fb649d',
+    RetinaFaceWeights.RESNET34: 'bd0263dc2a465d32859555cb1741f2d98991eb0053696e8ee33fec583d30e630',
+    # MobileFace
+    MobileFaceWeights.MNET_025: '#',
+    MobileFaceWeights.MNET_V2: '#',
+    MobileFaceWeights.MNET_V3_SMALL: '#',
+    MobileFaceWeights.MNET_V3_LARGE: '#',
+    # SphereFace
+    SphereFaceWeights.SPHERE20: '#',
+    SphereFaceWeights.SPHERE36: '#',
+    # ArcFace
+    ArcFaceWeights.MNET: '9cc6e4a75f0e2bf0b1aed94578f144d15175f357bdc05e815e5c4a02b319eb4f',
+    ArcFaceWeights.RESNET: '4c06341c33c2ca1f86781dab0e829f88ad5b64be9fba56e56bc9ebdefc619e43',
+    # SCRFD
+    SCRFDWeights.SCRFD_10G_KPS: '5838f7fe053675b1c7a08b633df49e7af5495cee0493c7dcf6697200b85b5b91',
+    SCRFDWeights.SCRFD_500M_KPS: '5e4447f50245bbd7966bd6c0fa52938c61474a04ec7def48753668a9d8b4ea3a',
+    # DDAMFN
+    DDAMFNWeights.AFFECNET7: '10535bf8b6afe8e9d6ae26cea6c3add9a93036e9addb6adebfd4a972171d015d',
+    DDAMFNWeights.AFFECNET8: '8c66963bc71db42796a14dfcbfcd181b268b65a3fc16e87147d6a3a3d7e0f487',
+    # AgeGender
+    AgeGenderWeights.DEFAULT: '4fde69b1c810857b88c64a335084f1c3fe8f01246c9a191b48c7bb756d6652fb',
+    # Landmark
+    LandmarkWeights.DEFAULT: 'f001b856447c413801ef5c42091ed0cd516fcd21f2d6b79635b1e733a7109dbf',
 }
 CHUNK_SIZE = 8192

uniface/detection/__init__.py Normal file

@@ -0,0 +1,145 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import numpy as np
from typing import Tuple, Dict, Any, List
from .scrfd import SCRFD
from .base import BaseDetector
from .retinaface import RetinaFace
# Global cache for detector instances
_detector_cache: Dict[str, BaseDetector] = {}
def detect_faces(image: np.ndarray, method: str = 'retinaface', **kwargs) -> List[Dict[str, Any]]:
"""
High-level face detection function.
Args:
image (np.ndarray): Input image as numpy array.
method (str): Detection method to use. Options: 'retinaface', 'scrfd'.
**kwargs: Additional arguments passed to the detector.
Returns:
List[Dict[str, Any]]: A list of dictionaries, where each dictionary represents a detected face and contains:
- 'bbox' (List[float]): [x1, y1, x2, y2] bounding box coordinates.
- 'confidence' (float): The confidence score of the detection.
- 'landmarks' (List[List[float]]): 5-point facial landmarks.
Example:
>>> from uniface import detect_faces
>>> image = cv2.imread("your_image.jpg")
>>> faces = detect_faces(image, method='retinaface', conf_thresh=0.8)
>>> for face in faces:
... print(f"Found face with confidence: {face['confidence']}")
... print(f"BBox: {face['bbox']}")
"""
method_name = method.lower()
sorted_kwargs = sorted(kwargs.items())
cache_key = f"{method_name}_{str(sorted_kwargs)}"
if cache_key not in _detector_cache:
# Pass kwargs to create the correctly configured detector
_detector_cache[cache_key] = create_detector(method, **kwargs)
detector = _detector_cache[cache_key]
return detector.detect(image)
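# Note on caching: repeated calls with the same method and kwargs reuse a single
# detector instance, e.g. calling detect_faces(img, 'scrfd', conf_thresh=0.8)
# twice builds one SCRFD; changing any kwarg creates and caches a new instance.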
def create_detector(method: str = 'retinaface', **kwargs) -> BaseDetector:
"""
Factory function to create face detectors.
Args:
method (str): Detection method. Options:
- 'retinaface': RetinaFace detector (default)
- 'scrfd': SCRFD detector (fast and accurate)
**kwargs: Detector-specific parameters
Returns:
BaseDetector: Initialized detector instance
Raises:
ValueError: If method is not supported
Examples:
>>> # Basic usage
>>> detector = create_detector('retinaface')
>>> # SCRFD detector with custom parameters
>>> detector = create_detector(
... 'scrfd',
... model_name=SCRFDWeights.SCRFD_10G_KPS,
... conf_thresh=0.8,
... input_size=(640, 640)
... )
>>> # RetinaFace detector
>>> detector = create_detector(
... 'retinaface',
... model_name=RetinaFaceWeights.MNET_V2,
... conf_thresh=0.8,
... nms_thresh=0.4
... )
"""
method = method.lower()
if method == 'retinaface':
return RetinaFace(**kwargs)
elif method == 'scrfd':
return SCRFD(**kwargs)
else:
available_methods = ['retinaface', 'scrfd']
raise ValueError(
f"Unsupported detection method: '{method}'. "
f"Available methods: {available_methods}"
)
def list_available_detectors() -> Dict[str, Dict[str, Any]]:
"""
List all available detection methods with their descriptions and parameters.
Returns:
Dict[str, Dict[str, Any]]: Dictionary of detector information
"""
return {
'retinaface': {
'description': 'RetinaFace detector with high accuracy',
'supports_landmarks': True,
'paper': 'https://arxiv.org/abs/1905.00641',
'default_params': {
'model_name': 'mnet_v2',
'conf_thresh': 0.5,
'nms_thresh': 0.4,
'input_size': (640, 640)
}
},
'scrfd': {
'description': 'SCRFD detector - fast and accurate with efficient architecture',
'supports_landmarks': True,
'paper': 'https://arxiv.org/abs/2105.04714',
'default_params': {
'model_name': 'scrfd_10g_kps',
'conf_thresh': 0.5,
'nms_thresh': 0.4,
'input_size': (640, 640)
}
}
}
__all__ = [
'detect_faces',
'create_detector',
'list_available_detectors',
'SCRFD',
'RetinaFace',
'BaseDetector',
]

uniface/detection/base.py Normal file

@@ -0,0 +1,100 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
"""
Base classes for face detection.
"""
import numpy as np
from abc import ABC, abstractmethod
from typing import Tuple, Dict, Any, List
class BaseDetector(ABC):
"""
Abstract base class for all face detectors.
This class defines the interface that all face detectors must implement,
ensuring consistency across different detection methods.
"""
def __init__(self, **kwargs):
"""Initialize the detector with configuration parameters."""
self.config = kwargs
@abstractmethod
    def detect(self, image: np.ndarray, **kwargs) -> List[Dict[str, Any]]:
        """
        Detect faces in an image.
        Args:
            image (np.ndarray): Input image as numpy array with shape (H, W, C)
            **kwargs: Additional detection parameters
        Returns:
            List[Dict[str, Any]]: One dictionary per detected face, containing:
                - 'bbox': [x1, y1, x2, y2] bounding box coordinates
                - 'confidence': detection confidence score
                - 'landmarks': facial landmark points (five (x, y) pairs for the
                  bundled detectors); empty if the detector does not support landmarks
"""
pass
@abstractmethod
def preprocess(self, image: np.ndarray) -> np.ndarray:
"""
Preprocess input image for detection.
Args:
image (np.ndarray): Input image
Returns:
np.ndarray: Preprocessed image tensor
"""
pass
@abstractmethod
def postprocess(self, outputs, **kwargs) -> Tuple[np.ndarray, np.ndarray]:
"""
Postprocess model outputs to get final detections.
Args:
outputs: Raw model outputs
**kwargs: Additional postprocessing parameters
Returns:
Tuple[np.ndarray, np.ndarray]: (detections, landmarks)
"""
pass
def __str__(self) -> str:
"""String representation of the detector."""
return f"{self.__class__.__name__}({self.config})"
def __repr__(self) -> str:
"""Detailed string representation."""
return self.__str__()
@property
def supports_landmarks(self) -> bool:
"""
Whether this detector supports landmark detection.
Returns:
bool: True if landmarks are supported, False otherwise
"""
return hasattr(self, '_supports_landmarks') and self._supports_landmarks
def get_info(self) -> Dict[str, Any]:
"""
Get detector information and configuration.
Returns:
Dict[str, Any]: Detector information
"""
        return {
            'name': self.__class__.__name__,
            'supports_landmarks': self.supports_landmarks,  # use the guarded property
            'config': self.config
        }

uniface/detection/retinaface.py Normal file

@@ -0,0 +1,329 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import numpy as np
from typing import Tuple, List, Literal, Dict, Any
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.constants import RetinaFaceWeights
from uniface.onnx_utils import create_onnx_session
from .base import BaseDetector
from .utils import (
non_max_supression,
resize_image,
decode_boxes,
generate_anchors,
decode_landmarks
)
class RetinaFace(BaseDetector):
"""
Face detector based on the RetinaFace architecture.
Title: "RetinaFace: Single-stage Dense Face Localisation in the Wild"
Paper: https://arxiv.org/abs/1905.00641
Args:
**kwargs: Keyword arguments passed to BaseDetector and RetinaFace. Supported keys include:
model_name (RetinaFaceWeights, optional): Model weights to use. Defaults to `RetinaFaceWeights.MNET_V2`.
conf_thresh (float, optional): Confidence threshold for filtering detections. Defaults to 0.5.
nms_thresh (float, optional): Non-maximum suppression (NMS) IoU threshold. Defaults to 0.4.
pre_nms_topk (int, optional): Number of top-scoring boxes considered before NMS. Defaults to 5000.
post_nms_topk (int, optional): Max number of detections kept after NMS. Defaults to 750.
dynamic_size (bool, optional): If True, generate anchors dynamically per input image. Defaults to False.
input_size (Tuple[int, int], optional): Fixed input size (width, height) if `dynamic_size=False`. Defaults to (640, 640).
Attributes:
model_name (RetinaFaceWeights): Selected model variant.
conf_thresh (float): Threshold for confidence-based filtering.
nms_thresh (float): IoU threshold used for NMS.
pre_nms_topk (int): Limit on proposals before applying NMS.
post_nms_topk (int): Limit on retained detections after NMS.
dynamic_size (bool): Flag indicating dynamic or static input sizing.
input_size (Tuple[int, int]): Static input size if `dynamic_size=False`.
_model_path (str): Absolute path to the verified model weights.
_priors (np.ndarray): Precomputed anchor boxes (if static size).
_supports_landmarks (bool): Indicates landmark prediction support.
Raises:
ValueError: If the model weights are invalid or not found.
RuntimeError: If the ONNX model fails to load or initialize.
"""
def __init__(self, **kwargs) -> None:
super().__init__(**kwargs)
self._supports_landmarks = True # RetinaFace supports landmarks
self.model_name = kwargs.get('model_name', RetinaFaceWeights.MNET_V2)
self.conf_thresh = kwargs.get('conf_thresh', 0.5)
self.nms_thresh = kwargs.get('nms_thresh', 0.4)
self.pre_nms_topk = kwargs.get('pre_nms_topk', 5000)
self.post_nms_topk = kwargs.get('post_nms_topk', 750)
self.dynamic_size = kwargs.get('dynamic_size', False)
self.input_size = kwargs.get('input_size', (640, 640))
Logger.info(
f"Initializing RetinaFace with model={self.model_name}, conf_thresh={self.conf_thresh}, nms_thresh={self.nms_thresh}, "
f"input_size={self.input_size}"
)
# Get path to model weights
self._model_path = verify_model_weights(self.model_name)
Logger.info(f"Verified model weights located at: {self._model_path}")
# Precompute anchors if using static size
if not self.dynamic_size and self.input_size is not None:
self._priors = generate_anchors(image_size=self.input_size)
Logger.debug("Generated anchors for static input size.")
# Initialize model
self._initialize_model(self._model_path)
def _initialize_model(self, model_path: str) -> None:
"""
Initializes an ONNX model session from the given path.
Args:
model_path (str): The file path to the ONNX model.
Raises:
RuntimeError: If the model fails to load, logs an error and raises an exception.
"""
try:
self.session = create_onnx_session(model_path)
self.input_names = self.session.get_inputs()[0].name
self.output_names = [x.name for x in self.session.get_outputs()]
Logger.info(f"Successfully initialized the model from {model_path}")
except Exception as e:
Logger.error(f"Failed to load model from '{model_path}': {e}", exc_info=True)
raise RuntimeError(f"Failed to initialize model session for '{model_path}'") from e
def preprocess(self, image: np.ndarray) -> np.ndarray:
"""Preprocess input image for model inference.
Args:
image (np.ndarray): Input image.
Returns:
np.ndarray: Preprocessed image tensor with shape (1, C, H, W)
"""
image = np.float32(image) - np.array([104, 117, 123], dtype=np.float32)
image = image.transpose(2, 0, 1) # HWC to CHW
image = np.expand_dims(image, axis=0) # Add batch dimension (1, C, H, W)
return image
def inference(self, input_tensor: np.ndarray) -> List[np.ndarray]:
"""Perform model inference on the preprocessed image tensor.
Args:
input_tensor (np.ndarray): Preprocessed input tensor.
Returns:
            List[np.ndarray]: Raw model outputs.
"""
return self.session.run(self.output_names, {self.input_names: input_tensor})
def detect(
self,
image: np.ndarray,
max_num: int = 0,
metric: Literal["default", "max"] = "max",
center_weight: float = 2.0
) -> List[Dict[str, Any]]:
"""
Perform face detection on an input image and return bounding boxes and facial landmarks.
Args:
image (np.ndarray): Input image as a NumPy array of shape (H, W, C).
max_num (int): Maximum number of detections to return. Use 0 to return all detections. Defaults to 0.
metric (Literal["default", "max"]): Metric for ranking detections when `max_num` is limited.
- "default": Prioritize detections closer to the image center.
- "max": Prioritize detections with larger bounding box areas.
center_weight (float): Weight for penalizing detections farther from the image center
when using the "default" metric. Defaults to 2.0.
Returns:
List[Dict[str, Any]]: List of face detection dictionaries, each containing:
- 'bbox': [x1, y1, x2, y2] - Bounding box coordinates
- 'confidence': float - Detection confidence score
- 'landmarks': [[x1, y1], [x2, y2], [x3, y3], [x4, y4], [x5, y5]] - 5-point facial landmarks
"""
original_height, original_width = image.shape[:2]
if self.dynamic_size:
height, width, _ = image.shape
self._priors = generate_anchors(image_size=(height, width)) # generate anchors for each input image
resize_factor = 1.0 # No resizing
else:
image, resize_factor = resize_image(image, target_shape=self.input_size)
height, width, _ = image.shape
image_tensor = self.preprocess(image)
# ONNXRuntime inference
outputs = self.inference(image_tensor)
# Postprocessing
detections, landmarks = self.postprocess(outputs, resize_factor, shape=(width, height))
if max_num > 0 and detections.shape[0] > max_num:
# Calculate area of detections
areas = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])
# Calculate offsets from image center
center = (original_height // 2, original_width // 2)
offsets = np.vstack([
(detections[:, 0] + detections[:, 2]) / 2 - center[1],
(detections[:, 1] + detections[:, 3]) / 2 - center[0]
])
offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
# Calculate scores based on the chosen metric
if metric == 'max':
scores = areas
else:
scores = areas - offset_dist_squared * center_weight
# Sort by scores and select top `max_num`
sorted_indices = np.argsort(scores)[::-1][:max_num]
detections = detections[sorted_indices]
landmarks = landmarks[sorted_indices]
faces = []
for i in range(detections.shape[0]):
face_dict = {
'bbox': detections[i, :4].astype(float).tolist(),
'confidence': detections[i, 4].item(),
'landmarks': landmarks[i].astype(float).tolist()
}
faces.append(face_dict)
return faces
def postprocess(self, outputs: List[np.ndarray], resize_factor: float, shape: Tuple[int, int]) -> Tuple[np.ndarray, np.ndarray]:
"""
Process the model outputs into final detection results.
Args:
outputs (List[np.ndarray]): Raw outputs from the detection model.
- outputs[0]: Location predictions (bounding box coordinates).
- outputs[1]: Class confidence scores.
- outputs[2]: Landmark predictions.
resize_factor (float): Factor used to resize the input image during preprocessing.
            shape (Tuple[int, int]): Size of the resized network input as (width, height).
Returns:
Tuple[np.ndarray, np.ndarray]: Processed results containing:
- detections (np.ndarray): Array of detected bounding boxes with confidence scores.
Shape: (num_detections, 5), where each row is [x_min, y_min, x_max, y_max, score].
- landmarks (np.ndarray): Array of detected facial landmarks.
Shape: (num_detections, 5, 2), where each row contains 5 landmark points (x, y).
"""
loc, conf, landmarks = outputs[0].squeeze(0), outputs[1].squeeze(0), outputs[2].squeeze(0)
# Decode boxes and landmarks
boxes = decode_boxes(loc, self._priors)
landmarks = decode_landmarks(landmarks, self._priors)
        boxes, landmarks = self._scale_detections(boxes, landmarks, resize_factor, shape=shape)
# Extract confidence scores for the face class
scores = conf[:, 1]
mask = scores > self.conf_thresh
# Filter by confidence threshold
boxes, landmarks, scores = boxes[mask], landmarks[mask], scores[mask]
# Sort by scores
order = scores.argsort()[::-1][:self.pre_nms_topk]
boxes, landmarks, scores = boxes[order], landmarks[order], scores[order]
# Apply NMS
detections = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False)
        keep = non_max_suppression(detections, self.nms_thresh)
detections, landmarks = detections[keep], landmarks[keep]
# Keep top-k detections
detections, landmarks = detections[:self.post_nms_topk], landmarks[:self.post_nms_topk]
landmarks = landmarks.reshape(-1, 5, 2).astype(np.int32)
return detections, landmarks
def _scale_detections(self, boxes: np.ndarray, landmarks: np.ndarray, resize_factor: float, shape: Tuple[int, int]) -> Tuple[np.ndarray, np.ndarray]:
# Scale bounding boxes and landmarks to the original image size.
bbox_scale = np.array([shape[0], shape[1]] * 2)
boxes = boxes * bbox_scale / resize_factor
landmark_scale = np.array([shape[0], shape[1]] * 5)
landmarks = landmarks * landmark_scale / resize_factor
return boxes, landmarks
# TODO: below is only for testing, remove it later
def draw_bbox(frame, bbox, score, color=(0, 255, 0), thickness=2):
x1, y1, x2, y2 = map(int, bbox) # Unpack 4 bbox values
cv2.rectangle(frame, (x1, y1), (x2, y2), color, thickness)
cv2.putText(frame, f"{score:.2f}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
def draw_keypoints(frame, points, color=(0, 0, 255), radius=2):
for (x, y) in points.astype(np.int32):
cv2.circle(frame, (int(x), int(y)), radius, color, -1)
if __name__ == "__main__":
import cv2
detector = RetinaFace(model_name=RetinaFaceWeights.MNET_050)
print(detector.get_info())
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("❌ Failed to open webcam.")
exit()
print("📷 Webcam started. Press 'q' to exit.")
while True:
ret, frame = cap.read()
if not ret:
print("❌ Failed to read frame.")
break
# Get face detections as list of dictionaries
faces = detector.detect(frame)
# Process each detected face
for face in faces:
# Extract bbox and landmarks from dictionary
bbox = face['bbox'] # [x1, y1, x2, y2]
landmarks = face['landmarks'] # [[x1, y1], [x2, y2], ...]
confidence = face['confidence']
# Pass bbox and confidence separately
draw_bbox(frame, bbox, confidence)
# Convert landmarks to numpy array format if needed
if landmarks is not None and len(landmarks) > 0:
# Convert list of [x, y] pairs to numpy array
points = np.array(landmarks, dtype=np.float32) # Shape: (5, 2)
draw_keypoints(frame, points)
# Display face count
cv2.putText(frame, f"Faces: {len(faces)}", (10, 30),
cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
cv2.imshow("FaceDetection", frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
cap.release()
cv2.destroyAllWindows()

uniface/detection/scrfd.py Normal file

@@ -0,0 +1,321 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Any, Dict, List, Literal, Tuple
import cv2
import numpy as np
from uniface.constants import SCRFDWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from .base import BaseDetector
from .utils import distance2bbox, distance2kps, non_max_suppression, resize_image
__all__ = ["SCRFD"]
class SCRFD(BaseDetector):
"""
Face detector based on the SCRFD architecture.
Title: "Sample and Computation Redistribution for Efficient Face Detection"
Paper: https://arxiv.org/abs/2105.04714
Args:
**kwargs: Keyword arguments passed to BaseDetector and SCRFD. Supported keys include:
model_name (SCRFDWeights, optional): Predefined model enum (e.g., `SCRFD_10G_KPS`).
Specifies the SCRFD variant to load. Defaults to SCRFD_10G_KPS.
conf_thresh (float, optional): Confidence threshold for filtering detections. Defaults to 0.5.
nms_thresh (float, optional): Non-Maximum Suppression threshold. Defaults to 0.4.
input_size (Tuple[int, int], optional): Input image size (width, height). Defaults to (640, 640).
Attributes:
conf_thresh (float): Threshold used to filter low-confidence detections.
nms_thresh (float): Threshold used during NMS to suppress overlapping boxes.
input_size (Tuple[int, int]): Image size to which inputs are resized before inference.
_fmc (int): Number of feature map levels used in the model.
_feat_stride_fpn (List[int]): Feature map strides corresponding to each detection level.
_num_anchors (int): Number of anchors per feature location.
_center_cache (Dict): Cached anchor centers for efficient forward passes.
_model_path (str): Absolute path to the downloaded/verified model weights.
Raises:
ValueError: If the model weights are invalid or not found.
RuntimeError: If the ONNX model fails to load or initialize.
"""
def __init__(self, **kwargs) -> None:
super().__init__(**kwargs)
self._supports_landmarks = True # SCRFD supports landmarks
model_name = kwargs.get("model_name", SCRFDWeights.SCRFD_10G_KPS)
conf_thresh = kwargs.get("conf_thresh", 0.5)
nms_thresh = kwargs.get("nms_thresh", 0.4)
input_size = kwargs.get("input_size", (640, 640))
self.conf_thresh = conf_thresh
self.nms_thresh = nms_thresh
self.input_size = input_size
# ------- SCRFD model params ------
self._fmc = 3
self._feat_stride_fpn = [8, 16, 32]
self._num_anchors = 2
self._center_cache = {}
# ---------------------------------
Logger.info(
f"Initializing SCRFD with model={model_name}, conf_thresh={conf_thresh}, nms_thresh={nms_thresh}, "
f"input_size={input_size}"
)
# Get path to model weights
self._model_path = verify_model_weights(model_name)
Logger.info(f"Verified model weights located at: {self._model_path}")
# Initialize model
self._initialize_model(self._model_path)
def _initialize_model(self, model_path: str) -> None:
"""
Initializes an ONNX model session from the given path.
Args:
model_path (str): The file path to the ONNX model.
Raises:
RuntimeError: If the model fails to load, logs an error and raises an exception.
"""
try:
self.session = create_onnx_session(model_path)
self.input_names = self.session.get_inputs()[0].name
self.output_names = [x.name for x in self.session.get_outputs()]
Logger.info(f"Successfully initialized the model from {model_path}")
except Exception as e:
Logger.error(f"Failed to load model from '{model_path}': {e}", exc_info=True)
raise RuntimeError(f"Failed to initialize model session for '{model_path}'") from e
    def preprocess(self, image: np.ndarray) -> np.ndarray:
        """Preprocess image for inference.
        Args:
            image (np.ndarray): Input image.
        Returns:
            np.ndarray: Preprocessed image tensor with shape (1, C, H, W).
        """
image = image.astype(np.float32)
image = (image - 127.5) / 127.5
image = image.transpose(2, 0, 1) # HWC to CHW
image = np.expand_dims(image, axis=0)
return image
def inference(self, input_tensor: np.ndarray) -> List[np.ndarray]:
"""Perform model inference on the preprocessed image tensor.
Args:
input_tensor (np.ndarray): Preprocessed input tensor.
Returns:
            List[np.ndarray]: Raw model outputs.
"""
return self.session.run(self.output_names, {self.input_names: input_tensor})
    def postprocess(self, outputs: List[np.ndarray], image_size: Tuple[int, int]) -> Tuple[List[np.ndarray], List[np.ndarray], List[np.ndarray]]:
        """Decode raw FPN outputs into per-level scores, boxes, and keypoints.
        Args:
            outputs (List[np.ndarray]): Raw model outputs, ordered as scores, bbox offsets, and keypoint offsets per stride.
            image_size (Tuple[int, int]): Size (height, width) of the network input.
        Returns:
            Tuple[List[np.ndarray], List[np.ndarray], List[np.ndarray]]: Per-level scores, boxes, and keypoints above `conf_thresh`.
        """
        scores_list = []
        bboxes_list = []
        kpss_list = []
fmc = self._fmc
for idx, stride in enumerate(self._feat_stride_fpn):
scores = outputs[idx]
bbox_preds = outputs[fmc + idx] * stride
kps_preds = outputs[2 * fmc + idx] * stride
# Generate anchors
fm_height = image_size[0] // stride
fm_width = image_size[1] // stride
cache_key = (fm_height, fm_width, stride)
if cache_key in self._center_cache:
anchor_centers = self._center_cache[cache_key]
else:
y, x = np.mgrid[:fm_height, :fm_width]
anchor_centers = np.stack((x, y), axis=-1).astype(np.float32)
anchor_centers = (anchor_centers * stride).reshape(-1, 2)
if self._num_anchors > 1:
anchor_centers = np.tile(anchor_centers[:, None, :], (1, self._num_anchors, 1)).reshape(-1, 2)
if len(self._center_cache) < 100:
self._center_cache[cache_key] = anchor_centers
pos_indices = np.where(scores >= self.conf_thresh)[0]
if len(pos_indices) == 0:
continue
bboxes = distance2bbox(anchor_centers, bbox_preds)[pos_indices]
scores_selected = scores[pos_indices]
scores_list.append(scores_selected)
bboxes_list.append(bboxes)
landmarks = distance2kps(anchor_centers, kps_preds)
landmarks = landmarks.reshape((landmarks.shape[0], -1, 2))
kpss_list.append(landmarks[pos_indices])
return scores_list, bboxes_list, kpss_list
def detect(
self, image: np.ndarray, max_num: int = 0, metric: Literal["default", "max"] = "max", center_weight: float = 2
) -> List[Dict[str, Any]]:
"""
Perform face detection on an input image and return bounding boxes and facial landmarks.
Args:
image (np.ndarray): Input image as a NumPy array of shape (H, W, C).
max_num (int): Maximum number of detections to return. Use 0 to return all detections. Defaults to 0.
metric (Literal["default", "max"]): Metric for ranking detections when `max_num` is limited.
- "default": Prioritize detections closer to the image center.
- "max": Prioritize detections with larger bounding box areas.
center_weight (float): Weight for penalizing detections farther from the image center
when using the "default" metric. Defaults to 2.0.
Returns:
List[Dict[str, Any]]: List of face detection dictionaries, each containing:
- 'bbox': [x1, y1, x2, y2] - Bounding box coordinates
- 'confidence': float - Detection confidence score
- 'landmarks': [[x1, y1], [x2, y2], [x3, y3], [x4, y4], [x5, y5]] - 5-point facial landmarks
"""
original_height, original_width = image.shape[:2]
image, resize_factor = resize_image(image, target_shape=self.input_size)
image_tensor = self.preprocess(image)
# ONNXRuntime inference
outputs = self.inference(image_tensor)
scores_list, bboxes_list, kpss_list = self.postprocess(outputs, image_size=image.shape[:2])
# Handle case when no faces are detected
if not scores_list:
return []
scores = np.vstack(scores_list)
scores_ravel = scores.ravel()
order = scores_ravel.argsort()[::-1]
bboxes = np.vstack(bboxes_list) / resize_factor
landmarks = np.vstack(kpss_list) / resize_factor
pre_det = np.hstack((bboxes, scores)).astype(np.float32, copy=False)
pre_det = pre_det[order, :]
        keep = non_max_suppression(pre_det, threshold=self.nms_thresh)
detections = pre_det[keep, :]
landmarks = landmarks[order, :, :]
landmarks = landmarks[keep, :, :].astype(np.int32)
if 0 < max_num < detections.shape[0]:
# Calculate area of detections
area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])
# Calculate offsets from image center
center = (original_height // 2, original_width // 2)
offsets = np.vstack(
[
(detections[:, 0] + detections[:, 2]) / 2 - center[1],
(detections[:, 1] + detections[:, 3]) / 2 - center[0],
]
)
# Calculate scores based on the chosen metric
offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
if metric == "max":
values = area
else:
values = area - offset_dist_squared * center_weight
# Sort by scores and select top `max_num`
sorted_indices = np.argsort(values)[::-1][:max_num]
detections = detections[sorted_indices]
landmarks = landmarks[sorted_indices]
faces = []
for i in range(detections.shape[0]):
face_dict = {
"bbox": detections[i, :4].astype(float).tolist(),
"confidence": detections[i, 4].item(),
"landmarks": landmarks[i].astype(float).tolist(),
}
faces.append(face_dict)
return faces
# TODO: below is only for testing, remove it later
def draw_bbox(frame, bbox, score, color=(0, 255, 0), thickness=2):
x1, y1, x2, y2 = map(int, bbox) # Unpack 4 bbox values
cv2.rectangle(frame, (x1, y1), (x2, y2), color, thickness)
cv2.putText(frame, f"{score:.2f}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
def draw_keypoints(frame, points, color=(0, 0, 255), radius=2):
for x, y in points.astype(np.int32):
cv2.circle(frame, (int(x), int(y)), radius, color, -1)
if __name__ == "__main__":
detector = SCRFD(model_name=SCRFDWeights.SCRFD_500M_KPS)
print(detector.get_info())
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("❌ Failed to open webcam.")
exit()
print("📷 Webcam started. Press 'q' to exit.")
while True:
ret, frame = cap.read()
if not ret:
print("❌ Failed to read frame.")
break
# Get face detections as list of dictionaries
faces = detector.detect(frame)
# Process each detected face
for face in faces:
# Extract bbox and landmarks from dictionary
bbox = face["bbox"] # [x1, y1, x2, y2]
landmarks = face["landmarks"] # [[x1, y1], [x2, y2], ...]
confidence = face["confidence"]
# Pass bbox and confidence separately
draw_bbox(frame, bbox, confidence)
# Convert landmarks to numpy array format if needed
if landmarks is not None and len(landmarks) > 0:
# Convert list of [x, y] pairs to numpy array
points = np.array(landmarks, dtype=np.float32) # Shape: (5, 2)
draw_keypoints(frame, points)
# Display face count
cv2.putText(frame, f"Faces: {len(faces)}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
cv2.imshow("FaceDetection", frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
cap.release()
cv2.destroyAllWindows()

uniface/detection/utils.py Normal file

@@ -0,0 +1,232 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import cv2
import math
import itertools
import numpy as np
from typing import Tuple, List
def resize_image(frame, target_shape: Tuple[int, int] = (640, 640)) -> Tuple[np.ndarray, float]:
"""
Resize an image to fit within a target shape while keeping its aspect ratio.
Args:
frame (np.ndarray): Input image.
target_shape (Tuple[int, int]): Target size (width, height). Defaults to (640, 640).
Returns:
Tuple[np.ndarray, float]: Resized image on a blank canvas and the resize factor.
"""
width, height = target_shape
# Aspect-ratio preserving resize
im_ratio = float(frame.shape[0]) / frame.shape[1]
model_ratio = height / width
if im_ratio > model_ratio:
new_height = height
new_width = int(new_height / im_ratio)
else:
new_width = width
new_height = int(new_width * im_ratio)
resize_factor = float(new_height) / frame.shape[0]
resized_frame = cv2.resize(frame, (new_width, new_height))
# Create blank image and place resized image on it
image = np.zeros((height, width, 3), dtype=np.uint8)
image[:new_height, :new_width, :] = resized_frame
return image, resize_factor
def generate_anchors(image_size: Tuple[int, int] = (640, 640)) -> np.ndarray:
"""
Generate anchor boxes for a given image size.
Args:
image_size (Tuple[int, int]): Input image size (width, height). Defaults to (640, 640).
Returns:
np.ndarray: Anchor box coordinates as a NumPy array.
"""
steps = [8, 16, 32]
min_sizes = [[16, 32], [64, 128], [256, 512]]
anchors = []
feature_maps = [
[
math.ceil(image_size[0] / step),
math.ceil(image_size[1] / step)
] for step in steps
]
for k, (map_height, map_width) in enumerate(feature_maps):
step = steps[k]
for i, j in itertools.product(range(map_height), range(map_width)):
for min_size in min_sizes[k]:
s_kx = min_size / image_size[1]
s_ky = min_size / image_size[0]
dense_cx = [x * step / image_size[1] for x in [j + 0.5]]
dense_cy = [y * step / image_size[0] for y in [i + 0.5]]
for cy, cx in itertools.product(dense_cy, dense_cx):
anchors += [cx, cy, s_kx, s_ky]
output = np.array(anchors, dtype=np.float32).reshape(-1, 4)
return output
def non_max_suppression(dets: np.ndarray, threshold: float) -> List[int]:
"""
Apply Non-Maximum Suppression (NMS) to reduce overlapping bounding boxes based on a threshold.
Args:
dets (numpy.ndarray): Array of detections with each row as [x1, y1, x2, y2, score].
threshold (float): IoU threshold for suppression.
Returns:
list: Indices of bounding boxes retained after suppression.
"""
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
scores = dets[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= threshold)[0]
order = order[inds + 1]
return keep
def decode_boxes(loc, priors, variances=[0.1, 0.2]) -> np.ndarray:
"""
Decode locations from predictions using priors to undo
the encoding done for offset regression at train time.
Args:
loc (np.ndarray): Location predictions for loc layers, shape: [num_priors, 4]
priors (np.ndarray): Prior boxes in center-offset form, shape: [num_priors, 4]
variances (list[float]): Variances of prior boxes
Returns:
np.ndarray: Decoded bounding box predictions
"""
# Compute centers of predicted boxes
cxcy = priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:]
# Compute widths and heights of predicted boxes
wh = priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])
# Convert center, size to corner coordinates
boxes = np.zeros_like(loc)
boxes[:, :2] = cxcy - wh / 2 # xmin, ymin
boxes[:, 2:] = cxcy + wh / 2 # xmax, ymax
return boxes
def decode_landmarks(predictions, priors, variances=[0.1, 0.2]) -> np.ndarray:
"""
Decode landmark predictions using prior boxes.
Args:
predictions (np.ndarray): Landmark predictions, shape: [num_priors, 10]
priors (np.ndarray): Prior boxes, shape: [num_priors, 4]
variances (list): Scaling factors for landmark offsets.
Returns:
np.ndarray: Decoded landmarks, shape: [num_priors, 10]
"""
# Reshape predictions to [num_priors, 5, 2] to process landmark points
predictions = predictions.reshape(predictions.shape[0], 5, 2)
# Expand priors to match (num_priors, 5, 2)
priors_xy = np.repeat(priors[:, :2][:, np.newaxis, :], 5, axis=1) # (num_priors, 5, 2)
priors_wh = np.repeat(priors[:, 2:][:, np.newaxis, :], 5, axis=1) # (num_priors, 5, 2)
# Compute absolute landmark positions
landmarks = priors_xy + predictions * variances[0] * priors_wh
# Flatten back to [num_priors, 10]
landmarks = landmarks.reshape(landmarks.shape[0], -1)
return landmarks
def distance2bbox(points, distance, max_shape=None):
    """Decode distance predictions to bounding boxes.
    Args:
        points (np.ndarray): Shape (n, 2), [x, y] anchor centers.
        distance (np.ndarray): Distances from each point to the 4
            boundaries (left, top, right, bottom).
        max_shape (tuple, optional): Image shape (height, width) used to clip boxes.
    Returns:
        np.ndarray: Decoded bounding boxes with shape (n, 4).
    """
    x1 = points[:, 0] - distance[:, 0]
    y1 = points[:, 1] - distance[:, 1]
    x2 = points[:, 0] + distance[:, 2]
    y2 = points[:, 1] + distance[:, 3]
    if max_shape is not None:
        # Inputs are NumPy arrays, so clip with np.clip rather than Tensor.clamp
        x1 = np.clip(x1, 0, max_shape[1])
        y1 = np.clip(y1, 0, max_shape[0])
        x2 = np.clip(x2, 0, max_shape[1])
        y2 = np.clip(y2, 0, max_shape[0])
    else:
        x1 = np.maximum(x1, 0)
        y1 = np.maximum(y1, 0)
        x2 = np.maximum(x2, 0)
        y2 = np.maximum(y2, 0)
    return np.stack([x1, y1, x2, y2], axis=-1)
def distance2kps(points, distance, max_shape=None):
    """Decode distance predictions to keypoints.
    Args:
        points (np.ndarray): Shape (n, 2), [x, y] anchor centers.
        distance (np.ndarray): Offsets from each point to its keypoints, shape (n, 2k).
        max_shape (tuple, optional): Image shape (height, width) used to clip keypoints.
    Returns:
        np.ndarray: Decoded keypoints with shape (n, 2k).
    """
    preds = []
    for i in range(0, distance.shape[1], 2):
        px = points[:, i % 2] + distance[:, i]
        py = points[:, i % 2 + 1] + distance[:, i + 1]
        if max_shape is not None:
            # NumPy equivalent of the Tensor.clamp used in the original PyTorch code
            px = np.clip(px, 0, max_shape[1])
            py = np.clip(py, 0, max_shape[0])
        preds.append(px)
        preds.append(py)
    return np.stack(preds, axis=-1)

uniface/face_utils.py Normal file

@@ -0,0 +1,170 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import cv2
import numpy as np
from skimage.transform import SimilarityTransform
from typing import Tuple
__all__ = ["face_alignment", "compute_similarity", "bbox_center_alignment", "transform_points_2d"]
# Reference alignment for facial landmarks (ArcFace)
reference_alignment: np.ndarray = np.array(
[
[38.2946, 51.6963],
[73.5318, 51.5014],
[56.0252, 71.7366],
[41.5493, 92.3655],
[70.7299, 92.2041]
],
dtype=np.float32
)
def estimate_norm(landmark: np.ndarray, image_size: int = 112) -> Tuple[np.ndarray, np.ndarray]:
"""
Estimate the normalization transformation matrix for facial landmarks.
Args:
landmark (np.ndarray): Array of shape (5, 2) representing the coordinates of the facial landmarks.
image_size (int, optional): The size of the output image. Default is 112.
Returns:
np.ndarray: The 2x3 transformation matrix for aligning the landmarks.
np.ndarray: The 2x3 inverse transformation matrix for aligning the landmarks.
Raises:
AssertionError: If the input landmark array does not have the shape (5, 2)
or if image_size is not a multiple of 112 or 128.
"""
assert landmark.shape == (5, 2), "Landmark array must have shape (5, 2)."
assert image_size % 112 == 0 or image_size % 128 == 0, "Image size must be a multiple of 112 or 128."
if image_size % 112 == 0:
ratio = float(image_size) / 112.0
diff_x = 0.0
else:
ratio = float(image_size) / 128.0
diff_x = 8.0 * ratio
# Adjust reference alignment based on ratio and diff_x
alignment = reference_alignment * ratio
alignment[:, 0] += diff_x
# Compute the transformation matrix
transform = SimilarityTransform()
transform.estimate(landmark, alignment)
matrix = transform.params[0:2, :]
inverse_matrix = np.linalg.inv(transform.params)[0:2, :]
return matrix, inverse_matrix
def face_alignment(image: np.ndarray, landmark: np.ndarray, image_size: int = 112) -> Tuple[np.ndarray, np.ndarray]:
"""
Align the face in the input image based on the given facial landmarks.
Args:
image (np.ndarray): Input image as a NumPy array.
landmark (np.ndarray): Array of shape (5, 2) representing the coordinates of the facial landmarks.
image_size (int, optional): The size of the aligned output image. Default is 112.
Returns:
np.ndarray: The aligned face as a NumPy array.
np.ndarray: The 2x3 transformation matrix used for alignment.
"""
# Get the transformation matrix
M, M_inv = estimate_norm(landmark, image_size)
# Warp the input image to align the face
warped = cv2.warpAffine(image, M, (image_size, image_size), borderValue=0.0)
return warped, M_inv
def compute_similarity(feat1: np.ndarray, feat2: np.ndarray, normalized: bool = False) -> np.float32:
"""Computing Similarity between two faces.
Args:
feat1 (np.ndarray): First embedding.
feat2 (np.ndarray): Second embedding.
normalized (bool): Set True if the embeddings are already L2 normalized.
Returns:
np.float32: Cosine similarity.
"""
feat1 = feat1.ravel()
feat2 = feat2.ravel()
if normalized:
return np.dot(feat1, feat2)
else:
return np.dot(feat1, feat2) / (np.linalg.norm(feat1) * np.linalg.norm(feat2) + 1e-5)
def bbox_center_alignment(image, center, output_size, scale, rotation):
"""
Apply center-based alignment, scaling, and rotation to an image.
Args:
image (np.ndarray): Input image.
center (Tuple[float, float]): Center point (e.g., face center from bbox).
output_size (int): Desired output image size (square).
scale (float): Scaling factor to zoom in/out.
rotation (float): Rotation angle in degrees (clockwise).
Returns:
cropped (np.ndarray): Aligned and cropped image.
M (np.ndarray): 2x3 affine transform matrix used.
"""
# Convert rotation from degrees to radians
rot = float(rotation) * np.pi / 180.0
# Scale the image
t1 = SimilarityTransform(scale=scale)
# Translate the center point to the origin (after scaling)
cx = center[0] * scale
cy = center[1] * scale
t2 = SimilarityTransform(translation=(-1 * cx, -1 * cy))
# Apply rotation around origin (center of face)
t3 = SimilarityTransform(rotation=rot)
# Translate origin to center of output image
t4 = SimilarityTransform(translation=(output_size / 2, output_size / 2))
# Combine all transformations in order: scale → center shift → rotate → recentralize
t = t1 + t2 + t3 + t4
# Extract 2x3 affine matrix
M = t.params[0:2]
# Warp the image using OpenCV
cropped = cv2.warpAffine(image, M, (output_size, output_size), borderValue=0.0)
return cropped, M
def transform_points_2d(points: np.ndarray, transform: np.ndarray) -> np.ndarray:
"""
Apply a 2D affine transformation to an array of 2D points.
Args:
points (np.ndarray): An (N, 2) array of 2D points.
transform (np.ndarray): A (2, 3) affine transformation matrix.
Returns:
np.ndarray: Transformed (N, 2) array of points.
"""
transformed = np.zeros_like(points, dtype=np.float32)
for i in range(points.shape[0]):
point = np.array([points[i, 0], points[i, 1], 1.0], dtype=np.float32)
result = np.dot(transform, point)
transformed[i] = result[:2]
return transformed

uniface/landmark/__init__.py Normal file

@@ -0,0 +1,32 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from .models import Landmark106
from .base import BaseLandmarker
def create_landmarker(method: str = '2d106det', **kwargs) -> BaseLandmarker:
"""
Factory function to create facial landmark predictors.
Args:
        method (str): Landmark prediction method. Options: '2d106det'.
**kwargs: Model-specific parameters.
Returns:
Initialized landmarker instance.
"""
method = method.lower()
if method == '2d106det':
return Landmark106(**kwargs)
else:
available = ['2d106det']
raise ValueError(f"Unsupported method: '{method}'. Available: {available}")
__all__ = [
"create_landmarker",
"Landmark106",
"BaseLandmarker"
]

uniface/landmark/base.py Normal file

@@ -0,0 +1,30 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from abc import ABC, abstractmethod
import numpy as np
class BaseLandmarker(ABC):
"""
Abstract Base Class for all facial landmark models.
"""
@abstractmethod
def get_landmarks(self, image: np.ndarray, bbox: np.ndarray) -> np.ndarray:
"""
Predicts facial landmarks for a given face bounding box.
This method defines the standard interface for all landmark predictors.
It takes a full image and a bounding box for a single face and returns
the predicted keypoints for that face.
Args:
image (np.ndarray): The full source image in BGR format.
bbox (np.ndarray): A bounding box of a face [x1, y1, x2, y2].
Returns:
np.ndarray: An array of predicted landmark points with shape (N, 2),
where N is the number of landmarks.
"""
raise NotImplementedError

uniface/landmark/models.py Normal file

@@ -0,0 +1,214 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import cv2
import numpy as np
from typing import Tuple
from uniface.log import Logger
from uniface.constants import LandmarkWeights
from uniface.model_store import verify_model_weights
from uniface.face_utils import bbox_center_alignment, transform_points_2d
from uniface.onnx_utils import create_onnx_session
from .base import BaseLandmarker
__all__ = ['Landmark106']
class Landmark106(BaseLandmarker):
"""Facial landmark model for predicting 106 facial keypoints.
This class implements the BaseLandmarker and provides an end-to-end
pipeline for 106-point facial landmark detection. It handles model
loading, preprocessing of a face crop based on a bounding box,
inference, and post-processing to map landmarks back to the
original image coordinates.
Args:
model_name (LandmarkWeights): The enum specifying the landmark model to load.
Defaults to `LandmarkWeights.DEFAULT`.
input_size (Tuple[int, int]): The resolution (width, height) for the model's
input. Defaults to (192, 192).
Example:
>>> # Assume 'image' is a loaded image and 'bbox' is a face bounding box
>>> # bbox = [x1, y1, x2, y2]
>>>
>>> landmarker = Landmark106()
>>> landmarks = landmarker.get_landmarks(image, bbox)
>>> print(landmarks.shape)
(106, 2)
"""
def __init__(
self,
model_name: LandmarkWeights = LandmarkWeights.DEFAULT,
input_size: Tuple[int, int] = (192, 192)
) -> None:
Logger.info(
f"Initializing Facial Landmark with model={model_name}, "
f"input_size={input_size}"
)
self.input_size = input_size
self.input_std = 1.0
self.input_mean = 0.0
self.model_path = verify_model_weights(model_name)
self._initialize_model()
def _initialize_model(self):
"""
Initialize the ONNX model from the stored model path.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
try:
self.session = create_onnx_session(self.model_path)
# Get input configuration
input_metadata = self.session.get_inputs()[0]
input_shape = input_metadata.shape
self.input_size = tuple(input_shape[2:4][::-1]) # Update input size from model
# Get input/output names
self.input_names = [input.name for input in self.session.get_inputs()]
self.output_names = [output.name for output in self.session.get_outputs()]
# Determine landmark dimensions from output shape
output_shape = self.session.get_outputs()[0].shape
self.lmk_dim = 2 # x,y coordinates
self.lmk_num = output_shape[1] // self.lmk_dim # Number of landmarks
Logger.info(f"Model initialized with {self.lmk_num} landmarks")
except Exception as e:
Logger.error(f"Failed to load landmark model from '{self.model_path}'", exc_info=True)
raise RuntimeError(f"Failed to initialize landmark model: {e}")
def preprocess(self, image: np.ndarray, bbox: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
"""Prepares a face crop for inference.
This method takes a face bounding box, performs a center alignment to
warp the face into the model's required input size, and then creates
a normalized blob ready for the ONNX session.
Args:
image (np.ndarray): The full source image in BGR format.
bbox (np.ndarray): The bounding box of the face [x1, y1, x2, y2].
Returns:
Tuple[np.ndarray, np.ndarray]: A tuple containing:
- The preprocessed image blob ready for inference.
- The affine transformation matrix used for alignment.
"""
width, height = bbox[2] - bbox[0], bbox[3] - bbox[1]
center = ((bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2)
scale = self.input_size[0] / (max(width, height) * 1.5)
aligned_face, transform_matrix = bbox_center_alignment(image, center, self.input_size[0], scale, 0.0)
face_blob = cv2.dnn.blobFromImage(
aligned_face, 1.0 / self.input_std, self.input_size,
(self.input_mean, self.input_mean, self.input_mean), swapRB=True
)
return face_blob, transform_matrix
def postprocess(self, predictions: np.ndarray, transform_matrix: np.ndarray) -> np.ndarray:
"""Converts raw model predictions back to original image coordinates.
This method reshapes the model's flat output array into landmark points,
denormalizes them to the model's input space, and then applies an
inverse affine transformation to map them back to the original image space.
Args:
predictions (np.ndarray): Raw landmark coordinates from the model output.
transform_matrix (np.ndarray): The affine transformation matrix from preprocessing.
Returns:
np.ndarray: An array of landmark points in the original image's coordinates.
"""
landmarks = predictions.reshape((-1, 2))
landmarks[:, 0:2] += 1
landmarks[:, 0:2] *= (self.input_size[0] // 2)
inverse_matrix = cv2.invertAffineTransform(transform_matrix)
landmarks = transform_points_2d(landmarks, inverse_matrix)
return landmarks
def get_landmarks(self, image: np.ndarray, bbox: np.ndarray) -> np.ndarray:
"""Predicts facial landmarks for the given image and face bounding box.
This is the main public method that orchestrates the full pipeline of
preprocessing, inference, and post-processing.
Args:
image (np.ndarray): The full source image in BGR format.
bbox (np.ndarray): A bounding box of a face [x1, y1, x2, y2].
Returns:
np.ndarray: An array of predicted landmark points with shape (106, 2).
"""
face_blob, transform_matrix = self.preprocess(image, bbox)
raw_predictions = self.session.run(
self.output_names, {self.input_names[0]: face_blob}
)[0][0]
landmarks = self.postprocess(raw_predictions, transform_matrix)
return landmarks
# TODO: For testing purposes only, remove later
if __name__ == "__main__":
# UPDATED: Use the high-level factory functions
from uniface.detection import create_detector
from uniface.landmark import create_landmarker
# 1. Create the detector and landmarker using the new API
face_detector = create_detector('retinaface')
landmarker = create_landmarker() # Uses the default '2d106det' method
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("Webcam not available.")
exit()
print("Press 'q' to quit.")
while True:
ret, frame = cap.read()
if not ret:
print("Frame capture failed.")
break
# 2. The detect method returns a list of dictionaries
faces = face_detector.detect(frame)
if not faces:
cv2.imshow("Facial Landmark Detection", frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
continue
# 3. Loop through the list of face dictionaries
for face in faces:
# Extract the bounding box
bbox = face['bbox']
# 4. Get landmarks for the current face using its bounding box
landmarks = landmarker.get_landmarks(frame, bbox)
# --- Drawing Logic ---
# Draw the landmarks
for (x, y) in landmarks.astype(int):
cv2.circle(frame, (x, y), 2, (0, 255, 0), -1)
# Draw the bounding box
x1, y1, x2, y2 = map(int, bbox)
cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
cv2.imshow("Facial Landmark Detection", frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()

uniface/log.py

@@ -1,7 +1,28 @@
import logging
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s"
)
Logger = logging.getLogger("retinaface")
# Create logger for uniface
Logger = logging.getLogger("uniface")
Logger.setLevel(logging.WARNING) # Only show warnings/errors by default
Logger.addHandler(logging.NullHandler())
def enable_logging(level=logging.INFO):
"""
Enable verbose logging for uniface.
Args:
level: Logging level (logging.DEBUG, logging.INFO, etc.)
Example:
>>> from uniface import enable_logging
>>> enable_logging() # Show INFO logs
"""
Logger.handlers.clear()
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
"%(asctime)s - %(levelname)s - %(message)s",
datefmt="%Y-%m-%d %H:%M:%S"
))
Logger.addHandler(handler)
Logger.setLevel(level)
Logger.propagate = False

uniface/model_store.py

@@ -1,102 +1,112 @@
# Copyright 2024 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import os
import hashlib
import requests
from uniface.log import Logger
import uniface.constants as const
def verify_model_weights(model_name: str, root: str = '~/.uniface/models') -> str:
"""
Ensures model weights are available by downloading if missing and verifying integrity with a SHA-256 hash.
Checks if the specified model weights file exists in `root`. If missing, downloads from a predefined URL.
The file is then verified using its SHA-256 hash. If verification fails, the corrupted file is deleted,
and an error is raised.
Args:
model_name (str): Name of the model weights to verify or download.
root (str, optional): Directory to store the model weights. Defaults to '~/.uniface/models'.
Returns:
str: Path to the verified model weights file.
Raises:
ValueError: If the model is not found or if verification fails.
ConnectionError: If downloading the file fails.
Examples:
>>> # Download and verify 'retinaface_mnet025' weights
>>> verify_model_weights('retinaface_mnet025')
'/home/user/.uniface/models/retinaface_mnet025.onnx'
>>> # Use a custom directory
>>> verify_model_weights('retinaface_r34', root='/custom/dir')
'/custom/dir/retinaface_r34.onnx'
"""
root = os.path.expanduser(root)
os.makedirs(root, exist_ok=True)
model_path = os.path.join(root, f'{model_name}.onnx')
if not os.path.exists(model_path):
url = const.MODEL_URLS.get(model_name)
if not url:
Logger.error(f"No URL found for model '{model_name}'")
raise ValueError(f"No URL found for model '{model_name}'")
Logger.info(f"Downloading '{model_name}' from {url}")
download_file(url, model_path)
Logger.info(f"Successfully '{model_name}' downloaded to {model_path}")
expected_hash = const.MODEL_SHA256.get(model_name)
if expected_hash and not verify_file_hash(model_path, expected_hash):
os.remove(model_path) # Remove corrupted file
Logger.warning("Corrupted weight detected. Removing...")
raise ValueError(f"Hash mismatch for '{model_name}'. The file may be corrupted; please try downloading again.")
return model_path
def download_file(url: str, dest_path: str) -> None:
"""Download a file from a URL in chunks and save it to the destination path."""
try:
response = requests.get(url, stream=True)
response.raise_for_status()
with open(dest_path, "wb") as file:
for chunk in response.iter_content(chunk_size=const.CHUNK_SIZE):
if chunk:
file.write(chunk)
except requests.RequestException as e:
raise ConnectionError(f"Failed to download file from {url}. Error: {e}")
def verify_file_hash(file_path: str, expected_hash: str) -> bool:
"""Compute the SHA-256 hash of the file and compare it with the expected hash."""
file_hash = hashlib.sha256()
with open(file_path, "rb") as f:
for chunk in iter(lambda: f.read(const.CHUNK_SIZE), b""):
file_hash.update(chunk)
actual_hash = file_hash.hexdigest()
if actual_hash != expected_hash:
Logger.warning(f"Expected hash: {expected_hash}, but got: {actual_hash}")
return actual_hash == expected_hash
if __name__ == "__main__":
model_names = [
'retinaface_mnet025',
'retinaface_mnet050',
'retinaface_mnet_v1',
'retinaface_mnet_v2',
'retinaface_r18',
'retinaface_r34'
]
# Download each model in the list
for model_name in model_names:
model_path = verify_model_weights(model_name)
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import os
import hashlib
import requests
from tqdm import tqdm
from uniface.log import Logger
import uniface.constants as const
__all__ = ['verify_model_weights']
def verify_model_weights(model_name, root: str = '~/.uniface/models') -> str:
"""
Ensure model weights are present, downloading and verifying them using SHA-256 if necessary.
Given a model identifier from an Enum class (e.g., `RetinaFaceWeights.MNET_V2`), this function checks if
the corresponding `.onnx` weight file exists locally. If not, it downloads the file from a predefined URL.
    After download, the file's integrity is verified using a SHA-256 hash. If verification fails, the file is deleted
and an error is raised.
Args:
model_name (Enum): Model weight identifier (e.g., `RetinaFaceWeights.MNET_V2`, `ArcFaceWeights.RESNET`, etc.).
root (str, optional): Directory to store or locate the model weights. Defaults to '~/.uniface/models'.
Returns:
str: Absolute path to the verified model weights file.
Raises:
ValueError: If the model is unknown or SHA-256 verification fails.
ConnectionError: If downloading the file fails.
Examples:
        >>> from uniface.constants import RetinaFaceWeights
        >>> from uniface.model_store import verify_model_weights
>>> verify_model_weights(RetinaFaceWeights.MNET_V2)
'/home/user/.uniface/models/retinaface_mnet_v2.onnx'
>>> verify_model_weights(RetinaFaceWeights.RESNET34, root='/custom/dir')
'/custom/dir/retinaface_r34.onnx'
"""
root = os.path.expanduser(root)
os.makedirs(root, exist_ok=True)
# Keep model_name as enum for dictionary lookup
url = const.MODEL_URLS.get(model_name)
if not url:
Logger.error(f"No URL found for model '{model_name}'")
raise ValueError(f"No URL found for model '{model_name}'")
file_ext = os.path.splitext(url)[1]
model_path = os.path.normpath(os.path.join(root, f'{model_name.value}{file_ext}'))
if not os.path.exists(model_path):
Logger.info(f"Downloading model '{model_name}' from {url}")
try:
download_file(url, model_path)
Logger.info(f"Successfully downloaded '{model_name}' to {model_path}")
except Exception as e:
Logger.error(f"Failed to download model '{model_name}': {e}")
raise ConnectionError(f"Download failed for '{model_name}'")
expected_hash = const.MODEL_SHA256.get(model_name)
if expected_hash and not verify_file_hash(model_path, expected_hash):
os.remove(model_path) # Remove corrupted file
Logger.warning("Corrupted weight detected. Removing...")
raise ValueError(f"Hash mismatch for '{model_name}'. The file may be corrupted; please try downloading again.")
return model_path
def download_file(url: str, dest_path: str) -> None:
"""Download a file from a URL in chunks and save it to the destination path."""
try:
response = requests.get(url, stream=True)
response.raise_for_status()
with open(dest_path, "wb") as file, tqdm(
desc=f"Downloading {dest_path}",
unit='B',
unit_scale=True,
unit_divisor=1024
) as progress:
for chunk in response.iter_content(chunk_size=const.CHUNK_SIZE):
if chunk:
file.write(chunk)
progress.update(len(chunk))
except requests.RequestException as e:
raise ConnectionError(f"Failed to download file from {url}. Error: {e}")
def verify_file_hash(file_path: str, expected_hash: str) -> bool:
"""Compute the SHA-256 hash of the file and compare it with the expected hash."""
file_hash = hashlib.sha256()
with open(file_path, "rb") as f:
for chunk in iter(lambda: f.read(const.CHUNK_SIZE), b""):
file_hash.update(chunk)
actual_hash = file_hash.hexdigest()
if actual_hash != expected_hash:
Logger.warning(f"Expected hash: {expected_hash}, but got: {actual_hash}")
return actual_hash == expected_hash
if __name__ == "__main__":
    # Download each RetinaFace variant (pass enum members, not their string values,
    # since verify_model_weights uses model_name.value for the filename)
    for model_name in const.RetinaFaceWeights:
        model_path = verify_model_weights(model_name)

uniface/onnx_utils.py Normal file

@@ -0,0 +1,87 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
"""
Utilities for ONNX Runtime configuration and provider selection.
"""
from typing import List, Optional
import onnxruntime as ort
from uniface.log import Logger
def get_available_providers() -> List[str]:
"""
Get list of available ONNX Runtime execution providers for the current platform.
Automatically detects and prioritizes hardware acceleration:
- CoreML on Apple Silicon (M1/M2/M3/M4)
- CUDA on NVIDIA GPUs
- CPU as fallback (always available)
Returns:
List[str]: Ordered list of execution providers to use
Examples:
>>> providers = get_available_providers()
>>> # On M4 Mac: ['CoreMLExecutionProvider', 'CPUExecutionProvider']
>>> # On Linux with CUDA: ['CUDAExecutionProvider', 'CPUExecutionProvider']
>>> # On CPU-only: ['CPUExecutionProvider']
"""
available = ort.get_available_providers()
providers = []
# Priority order: CoreML > CUDA > CPU
if "CoreMLExecutionProvider" in available:
providers.append("CoreMLExecutionProvider")
Logger.info("CoreML acceleration enabled (Apple Silicon)")
if "CUDAExecutionProvider" in available:
providers.append("CUDAExecutionProvider")
Logger.info("CUDA acceleration enabled (NVIDIA GPU)")
# CPU is always available as fallback
providers.append("CPUExecutionProvider")
if len(providers) == 1:
Logger.info("Using CPU execution (no hardware acceleration detected)")
return providers
def create_onnx_session(model_path: str, providers: Optional[List[str]] = None) -> ort.InferenceSession:
"""
Create an ONNX Runtime inference session with optimal provider selection.
Args:
model_path (str): Path to the ONNX model file
providers (List[str], optional): List of providers to use.
If None, automatically detects best available providers.
Returns:
ort.InferenceSession: Configured ONNX Runtime session
Raises:
RuntimeError: If session creation fails
Examples:
>>> session = create_onnx_session("model.onnx")
>>> # Automatically uses best available providers
>>> session = create_onnx_session("model.onnx", providers=["CPUExecutionProvider"])
>>> # Force CPU-only execution
"""
if providers is None:
providers = get_available_providers()
try:
session = ort.InferenceSession(model_path, providers=providers)
active_provider = session.get_providers()[0]
Logger.debug(f"Session created with provider: {active_provider}")
return session
except Exception as e:
Logger.error(f"Failed to create ONNX session: {e}", exc_info=True)
raise RuntimeError(f"Failed to initialize ONNX Runtime session: {e}") from e

uniface/recognition/__init__.py Normal file

@@ -0,0 +1,63 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Dict
from .models import ArcFace, MobileFace, SphereFace
from .base import BaseRecognizer
from uniface.constants import ArcFaceWeights, MobileFaceWeights, SphereFaceWeights
def create_recognizer(method: str = 'arcface', **kwargs) -> BaseRecognizer:
"""
Factory function to create face recognizers.
This function initializes and returns a face recognizer instance based on the
specified method. It acts as a high-level interface to the underlying
model classes like ArcFace, MobileFace, etc.
Args:
method (str): The recognition method to use.
Options: 'arcface' (default), 'mobileface', 'sphereface'.
**kwargs: Model-specific parameters passed to the recognizer's constructor.
For example, `model_name` can be used to select a specific
pre-trained weight from the available enums (e.g., `ArcFaceWeights.MNET`).
Returns:
BaseRecognizer: An initialized recognizer instance ready for use.
Raises:
ValueError: If the specified `method` is not supported.
Examples:
>>> # Create the default ArcFace recognizer
>>> recognizer = create_recognizer()
>>> # Create a specific MobileFace recognizer
>>> from uniface.constants import MobileFaceWeights
>>> recognizer = create_recognizer(
... 'mobileface',
... model_name=MobileFaceWeights.MNET_V2
... )
>>> # Create a SphereFace recognizer
>>> recognizer = create_recognizer('sphereface')
"""
method = method.lower()
if method == 'arcface':
return ArcFace(**kwargs)
elif method == 'mobileface':
return MobileFace(**kwargs)
elif method == 'sphereface':
return SphereFace(**kwargs)
else:
available = ['arcface', 'mobileface', 'sphereface']
raise ValueError(f"Unsupported method: '{method}'. Available: {available}")
__all__ = [
"create_recognizer",
"ArcFace",
"MobileFace",
"SphereFace",
"BaseRecognizer",
]

uniface/recognition/base.py Normal file

@@ -0,0 +1,145 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from abc import ABC, abstractmethod
import cv2
import numpy as np
from dataclasses import dataclass
from typing import Tuple, Union, List
from uniface.log import Logger
from uniface.face_utils import face_alignment
from uniface.onnx_utils import create_onnx_session
@dataclass
class PreprocessConfig:
"""
Configuration for preprocessing images before feeding them into the model.
"""
input_mean: Union[float, List[float]] = 127.5
input_std: Union[float, List[float]] = 127.5
input_size: Tuple[int, int] = (112, 112)
class BaseRecognizer(ABC):
"""
Abstract Base Class for all face recognition models.
It provides the core functionality for preprocessing, inference, and embedding extraction.
"""
@abstractmethod
def __init__(self, model_path: str, preprocessing: PreprocessConfig) -> None:
"""
Initializes the model. Subclasses must call this.
Args:
model_path (str): The direct path to the verified ONNX model.
preprocessing (PreprocessConfig): The configuration for preprocessing.
"""
self.input_mean = preprocessing.input_mean
self.input_std = preprocessing.input_std
self.input_size = preprocessing.input_size
self.model_path = model_path
self._initialize_model()
def _initialize_model(self) -> None:
"""
Loads the ONNX model and prepares it for inference.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
try:
# Initialize model session with available providers
self.session = create_onnx_session(self.model_path)
# Extract input configuration
input_cfg = self.session.get_inputs()[0]
self.input_name = input_cfg.name
# Verify input dimensions match our configuration
input_shape = input_cfg.shape
model_input_size = tuple(input_shape[2:4][::-1]) # (width, height)
if model_input_size != self.input_size:
Logger.warning(f"Model input size {model_input_size} differs from configured size {self.input_size}")
# Extract output configuration
self.output_names = [output.name for output in self.session.get_outputs()]
self.output_shape = self.session.get_outputs()[0].shape
assert len(self.output_names) == 1, "Expected only one output node."
Logger.info(f"Successfully initialized face encoder from {self.model_path}")
except Exception as e:
Logger.error(f"Failed to load face encoder model from '{self.model_path}'", exc_info=True)
raise RuntimeError(f"Failed to initialize model session for '{self.model_path}'") from e
def preprocess(self, face_img: np.ndarray) -> np.ndarray:
"""
Preprocess the image: resize, normalize, and convert it to a blob.
Args:
face_img: Input image in BGR format.
Returns:
Preprocessed image as a NumPy array ready for inference.
"""
resized_img = cv2.resize(face_img, self.input_size)
if isinstance(self.input_std, (list, tuple)):
# Per-channel normalization
rgb_img = cv2.cvtColor(resized_img, cv2.COLOR_BGR2RGB).astype(np.float32)
normalized_img = (rgb_img - np.array(self.input_mean, dtype=np.float32)) / \
np.array(self.input_std, dtype=np.float32)
# Change to NCHW (batch, channels, height, width)
blob = np.transpose(normalized_img, (2, 0, 1)) # CHW
blob = np.expand_dims(blob, axis=0) # NCHW
else:
# Single-value normalization
blob = cv2.dnn.blobFromImage(
resized_img,
scalefactor=1.0 / self.input_std,
size=self.input_size,
mean=(self.input_mean, self.input_mean, self.input_mean),
swapRB=True # Convert BGR to RGB
)
return blob
def get_embedding(self, image: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
"""
Extracts face embedding from an image.
Args:
image: Input face image (BGR format).
landmarks: Facial landmarks (5 points for alignment).
Returns:
Face embedding vector (typically 512-dimensional).
"""
# Align face using landmarks
aligned_face, _ = face_alignment(image, landmarks)
# Generate embedding from aligned face
face_blob = self.preprocess(aligned_face)
embedding = self.session.run(self.output_names, {self.input_name: face_blob})[0]
return embedding
def get_normalized_embedding(self, image: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
"""
        Extracts an L2-normalized face embedding vector from an image.
Args:
image: Input face image (BGR format).
landmarks: Facial landmarks (5 points for alignment).
Returns:
Normalized face embedding vector (typically 512-dimensional).
"""
embedding = self.get_embedding(image, landmarks)
norm = np.linalg.norm(embedding)
return embedding / norm if norm > 0 else embedding

uniface/recognition/models.py Normal file

@@ -0,0 +1,114 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Optional
from uniface.constants import ArcFaceWeights, MobileFaceWeights, SphereFaceWeights
from uniface.model_store import verify_model_weights
from .base import BaseRecognizer, PreprocessConfig
__all__ = ["ArcFace", "MobileFace", "SphereFace"]
class ArcFace(BaseRecognizer):
"""ArcFace model for robust face recognition.
This class provides a concrete implementation of the BaseRecognizer,
pre-configured for ArcFace models. It handles the loading of specific
ArcFace weights and sets up the appropriate default preprocessing.
Args:
model_name (ArcFaceWeights): The specific ArcFace model variant to use.
Defaults to `ArcFaceWeights.MNET`.
preprocessing (Optional[PreprocessConfig]): An optional custom preprocessing
configuration. If None, a default config for ArcFace is used.
Example:
>>> from uniface.recognition import ArcFace
>>> recognizer = ArcFace()
>>> # embedding = recognizer.get_normalized_embedding(image, landmarks)
"""
def __init__(
self,
model_name: ArcFaceWeights = ArcFaceWeights.MNET,
preprocessing: Optional[PreprocessConfig] = None
) -> None:
if preprocessing is None:
preprocessing = PreprocessConfig(
input_mean=127.5,
input_std=127.5,
input_size=(112, 112)
)
model_path = verify_model_weights(model_name)
super().__init__(model_path=model_path, preprocessing=preprocessing)
class MobileFace(BaseRecognizer):
"""Lightweight MobileFaceNet model for fast face recognition.
This class provides a concrete implementation of the BaseRecognizer,
pre-configured for MobileFaceNet models. It is optimized for speed,
making it suitable for edge devices.
Args:
model_name (MobileFaceWeights): The specific MobileFaceNet model variant to use.
Defaults to `MobileFaceWeights.MNET_V2`.
preprocessing (Optional[PreprocessConfig]): An optional custom preprocessing
configuration. If None, a default config for MobileFaceNet is used.
Example:
>>> from uniface.recognition import MobileFace
>>> recognizer = MobileFace()
>>> # embedding = recognizer.get_normalized_embedding(image, landmarks)
"""
def __init__(
self,
model_name: MobileFaceWeights = MobileFaceWeights.MNET_V2,
preprocessing: Optional[PreprocessConfig] = None
) -> None:
if preprocessing is None:
preprocessing = PreprocessConfig(
input_mean=127.5,
input_std=127.5,
input_size=(112, 112)
)
model_path = verify_model_weights(model_name)
super().__init__(model_path=model_path, preprocessing=preprocessing)
class SphereFace(BaseRecognizer):
"""SphereFace model using angular margin for face recognition.
This class provides a concrete implementation of the BaseRecognizer,
pre-configured for SphereFace models, which were among the first to
introduce angular margin loss functions.
Args:
model_name (SphereFaceWeights): The specific SphereFace model variant to use.
Defaults to `SphereFaceWeights.SPHERE20`.
preprocessing (Optional[PreprocessConfig]): An optional custom preprocessing
configuration. If None, a default config for SphereFace is used.
Example:
>>> from uniface.recognition import SphereFace
>>> recognizer = SphereFace()
>>> # embedding = recognizer.get_normalized_embedding(image, landmarks)
"""
def __init__(
self,
model_name: SphereFaceWeights = SphereFaceWeights.SPHERE20,
preprocessing: Optional[PreprocessConfig] = None
) -> None:
if preprocessing is None:
preprocessing = PreprocessConfig(
input_mean=127.5,
input_std=127.5,
input_size=(112, 112)
)
model_path = verify_model_weights(model_name)
super().__init__(model_path=model_path, preprocessing=preprocessing)
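All three classes expose the same BaseRecognizer interface, so they are drop-in replacements for one another. A rough usage sketch, assuming landmarks is a (5, 2) array obtained from a face detector and the image path is illustrative:
import cv2
from uniface.recognition import ArcFace
image = cv2.imread("face.jpg")  # BGR image, path illustrative
recognizer = ArcFace()  # or MobileFace() / SphereFace(), interchangeably
embedding = recognizer.get_normalized_embedding(image, landmarks)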

View File

@@ -1,256 +0,0 @@
# Copyright 2024 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import os
import cv2
import numpy as np
import onnxruntime as ort
import torch
from typing import Tuple, List, Optional, Literal
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.common import (
nms,
resize_image,
decode_boxes,
generate_anchors,
decode_landmarks
)
class RetinaFace:
"""
A class for face detection using the RetinaFace model.
Args:
model (str): Path or identifier of the model weights.
conf_thresh (float): Confidence threshold for detections. Defaults to 0.5.
nms_thresh (float): Non-maximum suppression threshold. Defaults to 0.4.
pre_nms_topk (int): Maximum number of detections before NMS. Defaults to 5000.
post_nms_topk (int): Maximum number of detections after NMS. Defaults to 750.
dynamic_size (Optional[bool]): Whether to adjust anchor generation dynamically based on image size. Defaults to False.
input_size (Optional[Tuple[int, int]]): Static input size for the model (width, height). Defaults to (640, 640).
Attributes:
conf_thresh (float): Confidence threshold for filtering detections.
nms_thresh (float): Threshold for NMS to remove duplicate detections.
pre_nms_topk (int): Maximum detections to consider before applying NMS.
post_nms_topk (int): Maximum detections retained after applying NMS.
dynamic_size (bool): Indicates if input size and anchors are dynamically adjusted.
input_size (Tuple[int, int]): The model's input image size.
_model_path (str): Path to the model weights.
_priors (torch.Tensor): Precomputed anchor boxes for static input size.
"""
def __init__(
self,
model: str,
conf_thresh: float = 0.5,
nms_thresh: float = 0.4,
pre_nms_topk: int = 5000,
post_nms_topk: int = 750,
dynamic_size: Optional[bool] = False,
input_size: Optional[Tuple[int, int]] = (640, 640), # Default input size if dynamic_size=False
) -> None:
self.conf_thresh = conf_thresh
self.nms_thresh = nms_thresh
self.pre_nms_topk = pre_nms_topk
self.post_nms_topk = post_nms_topk
self.dynamic_size = dynamic_size
self.input_size = input_size
Logger.info(
f"Initializing RetinaFace with model={model}, conf_thresh={conf_thresh}, nms_thresh={nms_thresh}, "
f"pre_nms_topk={pre_nms_topk}, post_nms_topk={post_nms_topk}, dynamic_size={dynamic_size}, "
f"input_size={input_size}"
)
# Get path to model weights
self._model_path = verify_model_weights(model)
Logger.info(f"Verified model weights located at: {self._model_path}")
# Precompute anchors if using static size
if not dynamic_size and input_size is not None:
self._priors = generate_anchors(image_size=input_size)
Logger.debug("Generated anchors for static input size.")
# Initialize model
self._initialize_model(self._model_path)
def _initialize_model(self, model_path: str) -> None:
"""
Initializes an ONNX model session from the given path.
Args:
model_path (str): The file path to the ONNX model.
Raises:
RuntimeError: If the model fails to load, logs an error and raises an exception.
"""
try:
self.session = ort.InferenceSession(model_path)
self.input_name = self.session.get_inputs()[0].name
Logger.info(f"Successfully initialized the model from {model_path}")
except Exception as e:
Logger.error(f"Failed to load model from '{model_path}': {e}")
raise RuntimeError(f"Failed to initialize model session for '{model_path}'") from e
def preprocess(self, image: np.ndarray) -> np.ndarray:
"""Preprocess input image for model inference.
Args:
image (np.ndarray): Input image.
Returns:
np.ndarray: Preprocessed image tensor with shape (1, C, H, W)
"""
image = np.float32(image) - np.array([104, 117, 123], dtype=np.float32)  # subtract BGR channel means
image = image.transpose(2, 0, 1) # HWC to CHW
image = np.expand_dims(image, axis=0) # Add batch dimension (1, C, H, W)
return image
def inference(self, input_tensor: np.ndarray) -> List[np.ndarray]:
"""Perform model inference on the preprocessed image tensor.
Args:
input_tensor (np.ndarray): Preprocessed input tensor.
Returns:
List[np.ndarray]: Raw model outputs (location, confidence, and landmark predictions).
"""
return self.session.run(None, {self.input_name: input_tensor})
def detect(
self,
image: np.ndarray,
max_num: int = 0,
metric: Literal["default", "max"] = "default",
center_weight: float = 2.0
) -> Tuple[np.ndarray, np.ndarray]:
"""
Perform face detection on an input image and return bounding boxes and landmarks.
Args:
image (np.ndarray): Input image as a NumPy array of shape (height, width, channels).
max_num (int, optional): Maximum number of detections to return; 0 keeps all detections. Defaults to 0.
metric (str, optional): Metric for ranking detections when `max_num` is specified.
Options:
- "default": Prioritize detections closer to the image center.
- "max": Prioritize detections with larger bounding box areas.
center_weight (float, optional): Weight for penalizing detections farther from the image center
when using the "default" metric. Defaults to 2.0.
Returns:
Tuple[np.ndarray, np.ndarray]: Detection results containing:
- detections (np.ndarray): Array of detected bounding boxes with confidence scores.
Shape: (num_detections, 5), where each row is [x_min, y_min, x_max, y_max, score].
- landmarks (np.ndarray): Array of detected facial landmarks.
Shape: (num_detections, 5, 2), where each row contains 5 landmark points (x, y).
"""
if self.dynamic_size:
height, width, _ = image.shape
self._priors = generate_anchors(image_size=(height, width)) # generate anchors for each input image
resize_factor = 1.0 # No resizing
else:
image, resize_factor = resize_image(image, target_shape=self.input_size)
height, width, _ = image.shape
image_tensor = self.preprocess(image)
# ONNXRuntime inference
outputs = self.inference(image_tensor)
# Postprocessing
detections, landmarks = self.postprocess(outputs, resize_factor, shape=(width, height))
if max_num > 0 and detections.shape[0] > max_num:
# Calculate area of detections
areas = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])
# Calculate offsets from image center
center = (height // 2, width // 2)
offsets = np.vstack([
(detections[:, 0] + detections[:, 2]) / 2 - center[1],
(detections[:, 1] + detections[:, 3]) / 2 - center[0]
])
offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
# Calculate scores based on the chosen metric
if metric == 'max':
scores = areas
else:
scores = areas - offset_dist_squared * center_weight
# Sort by scores and select top `max_num`
sorted_indices = np.argsort(scores)[::-1][:max_num]
detections = detections[sorted_indices]
landmarks = landmarks[sorted_indices]
return detections, landmarks
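# Worked example of the "default" metric: a 100x100 px face 50 px from the
# image center scores 100*100 - 50**2 * 2.0 = 5000, while a larger 120x120 px
# face 100 px off-center scores 120*120 - 100**2 * 2.0 = -5600, so the more
# central face is ranked first despite its smaller area.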
def postprocess(self, outputs: List[np.ndarray], resize_factor: float, shape: Tuple[int, int]) -> Tuple[np.ndarray, np.ndarray]:
"""
Process the model outputs into final detection results.
Args:
outputs (List[np.ndarray]): Raw outputs from the detection model.
- outputs[0]: Location predictions (bounding box coordinates).
- outputs[1]: Class confidence scores.
- outputs[2]: Landmark predictions.
resize_factor (float): Factor used to resize the input image during preprocessing.
shape (Tuple[int, int]): Size of the image fed to the network as (width, height).
Returns:
Tuple[np.ndarray, np.ndarray]: Processed results containing:
- detections (np.ndarray): Array of detected bounding boxes with confidence scores.
Shape: (num_detections, 5), where each row is [x_min, y_min, x_max, y_max, score].
- landmarks (np.ndarray): Array of detected facial landmarks.
Shape: (num_detections, 5, 2), where each row contains 5 landmark points (x, y).
"""
loc, conf, landmarks = outputs[0].squeeze(0), outputs[1].squeeze(0), outputs[2].squeeze(0)
# Decode boxes and landmarks
boxes = decode_boxes(torch.tensor(loc), self._priors).cpu().numpy()
landmarks = decode_landmarks(torch.tensor(landmarks), self._priors).cpu().numpy()
boxes, landmarks = self._scale_detections(boxes, landmarks, resize_factor, shape=(shape[0], shape[1]))
# Extract confidence scores for the face class
scores = conf[:, 1]
mask = scores > self.conf_thresh
# Filter by confidence threshold
boxes, landmarks, scores = boxes[mask], landmarks[mask], scores[mask]
# Sort by scores
order = scores.argsort()[::-1][:self.pre_nms_topk]
boxes, landmarks, scores = boxes[order], landmarks[order], scores[order]
# Apply NMS
detections = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False)
keep = nms(detections, self.nms_thresh)
detections, landmarks = detections[keep], landmarks[keep]
# Keep top-k detections
detections, landmarks = detections[:self.post_nms_topk], landmarks[:self.post_nms_topk]
landmarks = landmarks.reshape(-1, 5, 2).astype(np.int32)
return detections, landmarks
def _scale_detections(self, boxes: np.ndarray, landmarks: np.ndarray, resize_factor: float, shape: Tuple[int, int]) -> Tuple[np.ndarray, np.ndarray]:
"""Scale bounding boxes and landmarks to the original image size."""
bbox_scale = np.array([shape[0], shape[1]] * 2)
boxes = boxes * bbox_scale / resize_factor
landmark_scale = np.array([shape[0], shape[1]] * 5)
landmarks = landmarks * landmark_scale / resize_factor
return boxes, landmarks
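For reference, the removed detector was driven roughly as follows (a minimal sketch; the weight identifier passed to model is illustrative and must be one accepted by verify_model_weights):
import cv2
detector = RetinaFace(model="retinaface_mnet025")  # illustrative weight name
image = cv2.imread("group.jpg")  # illustrative path
detections, landmarks = detector.detect(image)
# detections: (N, 5) rows of [x_min, y_min, x_max, y_max, score]
# landmarks: (N, 5, 2), five (x, y) points per face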

View File

@@ -1,15 +0,0 @@
# Copyright 2024 Yakhyokhuja Valikhujaev
#
# Licensed under the MIT License.
# You may obtain a copy of the License at
#
# https://opensource.org/licenses/MIT
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
__version__ = "0.1.1"
__author__ = "Yakhyokhuja Valikhujaev"

View File

@@ -1,38 +1,50 @@
-# Copyright 2024 Yakhyokhuja Valikhujaev
+# Copyright 2025 Yakhyokhuja Valikhujaev
 # Author: Yakhyokhuja Valikhujaev
 # GitHub: https://github.com/yakhyo
 import cv2
 import numpy as np
+from typing import List, Union
-def draw_detections(image, detections, vis_threshold=0.6):
+def draw_detections(
+    image: np.ndarray,
+    bboxes: Union[np.ndarray, List[List[float]]],
+    scores: Union[np.ndarray, List[float]],
+    landmarks: Union[np.ndarray, List[List[List[float]]]],
+    vis_threshold: float = 0.6
+):
     """
-    Draw bounding boxes and landmarks on the image.
+    Draws bounding boxes, scores, and landmarks from separate lists onto an image.
     Args:
-        image (ndarray): Image to draw detections on.
-        detections (tuple): (bounding boxes, landmarks) as NumPy arrays.
-        vis_threshold (float): Confidence threshold for filtering detections.
+        image (np.ndarray): The image to draw on.
+        bboxes (list or np.ndarray): A list of bounding boxes, e.g., [[x1,y1,x2,y2], ...].
+        scores (list or np.ndarray): A list of confidence scores.
+        landmarks (list or np.ndarray): A list of landmark sets, e.g., [[[x,y],...],...].
+        vis_threshold (float): Confidence threshold for filtering which detections to draw.
     """
     _colors = [(0, 0, 255), (0, 255, 255), (255, 0, 255), (0, 255, 0), (255, 0, 0)]
-    # Unpack detections
-    boxes, landmarks = detections
-    scores = boxes[:, 4]
-    # Filter detections by confidence threshold
-    filtered = scores >= vis_threshold
-    boxes = boxes[filtered, :4].astype(np.int32)
-    landmarks = landmarks[filtered]
-    scores = scores[filtered]
-    print(f"#faces: {len(scores)}")
-    # Draw bounding boxes, scores, and landmarks
-    for box, score, landmark in zip(boxes, scores, landmarks):
-        cv2.rectangle(image, box[:2], box[2:], (0, 0, 255), 2)
-        cv2.putText(image, f"{score:.2f}", (box[0], box[1] + 12), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
-        for point, color in zip(landmark, _colors):
-            cv2.circle(image, tuple(point), 2, color, -1)
+    # Filter detections by score
+    keep_indices = [i for i, score in enumerate(scores) if score >= vis_threshold]
+    # Draw the filtered detections
+    for i in keep_indices:
+        bbox = np.array(bboxes[i], dtype=np.int32)
+        score = scores[i]
+        landmark_set = np.array(landmarks[i], dtype=np.int32)
+        # Calculate adaptive thickness
+        thickness = max(1, int(min(bbox[2] - bbox[0], bbox[3] - bbox[1]) / 100))
+        # Draw bounding box
+        cv2.rectangle(image, tuple(bbox[:2]), tuple(bbox[2:]), (0, 0, 255), thickness)
+        # Draw score
+        cv2.putText(image, f"{score:.2f}", (bbox[0], bbox[1] - 10),
+                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), thickness)
+        # Draw landmarks
+        for j, point in enumerate(landmark_set):
+            cv2.circle(image, tuple(point), thickness + 1, _colors[j], -1)
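With the new signature, scores are passed separately from the boxes, so the detector output shown earlier needs one small split. A minimal sketch, assuming detections has shape (N, 5) and landmarks has shape (N, 5, 2):
bboxes = detections[:, :4]
scores = detections[:, 4]
draw_detections(image, bboxes, scores, landmarks, vis_threshold=0.6)
cv2.imwrite("output.jpg", image)  # illustrative output path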