36 Commits

Author SHA1 Message Date
Yakhyokhuja Valikhujaev
7c98a60d26 fix: Python 3.10 does not support tomlib (#43) 2025-12-24 00:51:36 +09:00
Yakhyokhuja Valikhujaev
d97a3b2cb2 Merge pull request #42 from yakhyo/feat/standardize-outputs
feat: Standardize detection output and several other updates
2025-12-24 00:38:32 +09:00
yakhyo
2200ba063c docs: Update related docs and ruff formatting 2025-12-24 00:34:24 +09:00
yakhyo
9bcbfa65c2 feat: Update detection module output to datalasses 2025-12-24 00:00:00 +09:00
yakhyo
96306a0910 feat: Update github actions 2025-12-23 23:59:15 +09:00
Yakhyokhuja Valikhujaev
3389aa3e4c feat: Add MiniFasNet for Face Anti Spoofing (#41) 2025-12-20 22:34:47 +09:00
Yakhyokhuja Valikhujaev
b282e6ccc1 docs: Update related docs to face anonymization (#40) 2025-12-20 21:27:26 +09:00
Yakhyokhuja Valikhujaev
d085c6a822 feat: Add face blurring for privacy (#39)
* feat: Add face blurring for privacy

* chore: Revert back the version
2025-12-20 20:57:42 +09:00
yakhyo
13b518e96d chore: Upgrade version to v1.5.3 2025-12-15 15:09:54 +09:00
yakhyo
1b877bc9fc fix: Fix the version 2025-12-15 14:53:36 +09:00
Yakhyokhuja Valikhujaev
bb1d209f3b feat: Add BiSeNet face parsing model (#36)
* Add BiSeNet face parsing implementation

* Add parsing model weights configuration

* Export BiSeNet in main package

* Add face parsing tests

* Add face parsing examples and script

* Bump version to 1.5.0

* Update documentation for face parsing

* Fix face parsing notebook to use lips instead of mouth

* chore: Update the face parsing example

* fix: Fix model argument to use Enum

* ref: Move vis_parsing_map function into visualization.py

* docs: Update README.md
2025-12-15 14:50:15 +09:00
Yakhyokhuja Valikhujaev
54b769c0f1 feat: Add Face Parsing model BiSeNet model trained on CelebMask dataset (#35)
* Add BiSeNet face parsing implementation

* Add parsing model weights configuration

* Export BiSeNet in main package

* Add face parsing tests

* Add face parsing examples and script

* Bump version to 1.5.0

* Update documentation for face parsing

* Fix face parsing notebook to use lips instead of mouth

* chore: Update the face parsing example

* fix: Fix model argument to use Enum

* ref: Move vis_parsing_map function into visualization.py

* docs: Update README.md
2025-12-14 21:13:53 +09:00
Yakhyokhuja Valikhujaev
4d1921e531 feat: Add 2D Gaze estimation models (#34)
* feat: Add Gaze Estimation, update docs and Add example notebook, inference code

* docs: Update README.md
2025-12-14 14:07:46 +09:00
yakhyo
da8a5cf35b feat: Add yolov5n, update docs and ruff code format 2025-12-11 01:02:18 +09:00
Yakhyokhuja Valikhujaev
3982d677a9 fix: Fix type conversion and remove redundant type conversion (#29)
* ref: Remove type conversion and update face class

* fix: change the type to float32

* chore: Update all examples, testing with latest version

* docs: Update docs reflecting the recent changes
2025-12-10 00:18:11 +09:00
Yakhyokhuja Valikhujaev
f4458f0550 Revise model configurations in README.md
Updated model names and confidence thresholds for SCRFD and YOLOv5Face in the README.
2025-12-08 10:07:30 +09:00
Yakhyokhuja Valikhujaev
637316f077 feat: Update examples and some minor changes to UniFace API (#28)
* chore: Style changes and create jupyter notebook template

* docs: Update docstring for detection

* feat: Keyword only for common parameters: model_name, conf_thresh, nms_thresh, input_size

* chore: Update drawing and let the conf text optional for drawing

* feat: add fancy bbox draw

* docs: Add examples of using UniFace

* feat: Add version to all examples
2025-12-07 19:51:08 +09:00
Yakhyokhuja Valikhujaev
6b1d2a1ce6 feat: Add YOLOv5 face detection support (#26)
* feat: Add YOLOv5 face detection model

* docs: Update docs, add new model information

* feat: Add YOLOv5 face detection model

* test: Add testing and running
2025-12-03 23:35:56 +09:00
Yakhyokhuja Valikhujaev
a5e97ac484 Update README.md 2025-12-01 13:19:25 +09:00
Yakhyokhuja Valikhujaev
0c93598007 feat: Enhace emotion inference speed on ARM and add FaceAnalyzer, Face classes for ease of use. (#25)
* feat: Update linting and type annotations, return types in detect

* feat: add face analyzer and face classes

* chore: Update the format and clean up some docstrings

* docs: Update usage documentation

* feat: Change AgeGender model output to 0, 1 instead of string (Female, Male)

* test: Update testing code

* feat: Add Apple silicon backend for torchscript inference

* feat: Add face analyzer example and add run emotion for testing
2025-11-30 20:32:07 +09:00
Yakhyokhuja Valikhujaev
779952e3f8 Merge pull request #23 from yakhyo/test-files-update
feat: Some minor changes to code style and warning supression
2025-11-26 00:16:49 +09:00
yakhyo
39b50b62bd chore: Update the version 2025-11-26 00:15:45 +09:00
yakhyo
db7532ecf1 feat: Supress the warning and give info about onnx backend 2025-11-26 00:06:39 +09:00
yakhyo
4b8dc2c0f9 feat: Update jupyter notebooks to match the latest version of UniFace 2025-11-26 00:06:13 +09:00
yakhyo
0a2a10e165 docs: Update README.md 2025-11-26 00:05:40 +09:00
yakhyo
84cda5f56c chore: Code style formatting changes 2025-11-26 00:05:24 +09:00
yakhyo
0771a7959a chore: Code style formatting changes 2025-11-25 23:45:50 +09:00
yakhyo
15947eb605 chore: Change import order and style changes by Ruff 2025-11-25 23:35:00 +09:00
yakhyo
1ccc4f6b77 chore: Update print 2025-11-25 23:28:42 +09:00
yakhyo
189755a1a6 ref: Update some refactoring files for testing 2025-11-25 23:19:45 +09:00
Yakhyokhuja Valikhujaev
11363fe0a8 Merge pull request #18 from yakhyo/feat-20251115
ref: Add comprehensive test suite and enhance model functionality
2025-11-15 21:32:10 +09:00
yakhyo
fe3e70a352 release: Update release version to v1.1.0 2025-11-15 21:31:56 +09:00
yakhyo
8e218321a4 fix: Fix test issue where landmark variable name wrongly used 2025-11-15 21:28:21 +09:00
yakhyo
2c78f39e5d ref: Add comprehensive test suite and enhance model functionality
- Add new test files for age_gender, factory, landmark, recognition, scrfd, and utils
- Add new scripts for age_gender, landmarks, and video detection
- Update documentation in README.md, MODELS.md, QUICKSTART.md
- Improve model constants and face utilities
- Update detection models (retinaface, scrfd) with enhanced functionality
- Update project configuration in pyproject.toml
2025-11-15 21:09:37 +09:00
yakhyo
df673c4a3f remove release guide 2025-11-11 21:38:57 +09:00
yakhyo
496de7a491 Release v1.0.0: First stable release with complete feature set
- Added high-level pipeline API (FacePipeline, process_faces, compare_faces)
- Implemented face detection (RetinaFace, SCRFD)
- Implemented face recognition (ArcFace, MobileFace, SphereFace)
- Added landmark detection (106 points)
- Added attribute analysis (age, gender, emotion)
- Comprehensive examples and documentation
- Cleaned up legacy code and simplified documentation structure
2025-11-11 21:38:57 +09:00
82 changed files with 8984 additions and 1807 deletions

BIN
.github/logos/gaze_crop.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 716 KiB

BIN
.github/logos/gaze_org.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 673 KiB

View File

@@ -10,14 +10,20 @@ on:
- main
- develop
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
test:
runs-on: ubuntu-latest
runs-on: ${{ matrix.os }}
timeout-minutes: 15
strategy:
fail-fast: false
matrix:
python-version: ["3.10", "3.11", "3.12", "3.13"]
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: ["3.11", "3.13"]
steps:
- name: Checkout code
@@ -27,7 +33,7 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache: "pip"
- name: Install dependencies
run: |
@@ -38,21 +44,18 @@ jobs:
run: |
python -c "import onnxruntime as ort; print('Available providers:', ort.get_available_providers())"
- name: Lint with ruff (if available)
run: |
pip install ruff || true
ruff check . --exit-zero || true
continue-on-error: true
- name: Lint with ruff
run: ruff check .
- name: Run tests
run: pytest -v --tb=short
- name: Test package imports
run: |
python -c "from uniface import RetinaFace, ArcFace, Landmark106, AgeGender; print('All imports successful')"
run: python -c "import uniface; print(f'uniface {uniface.__version__} loaded with {len(uniface.__all__)} exports')"
build:
runs-on: ubuntu-latest
timeout-minutes: 10
needs: test
steps:
@@ -63,7 +66,7 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: "3.10"
cache: 'pip'
cache: "pip"
- name: Install build tools
run: |
@@ -84,4 +87,3 @@ jobs:
name: dist-python-${{ github.sha }}
path: dist/
retention-days: 7

View File

@@ -5,9 +5,14 @@ on:
tags:
- "v*.*.*" # Trigger only on version tags like v0.1.9
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
validate:
runs-on: ubuntu-latest
timeout-minutes: 5
outputs:
version: ${{ steps.get_version.outputs.version }}
tag_version: ${{ steps.get_version.outputs.tag_version }}
@@ -16,13 +21,18 @@ jobs:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Get version from tag and pyproject.toml
id: get_version
run: |
TAG_VERSION=${GITHUB_REF#refs/tags/v}
echo "tag_version=$TAG_VERSION" >> $GITHUB_OUTPUT
PYPROJECT_VERSION=$(grep -Po '(?<=^version = ")[^"]*' pyproject.toml)
PYPROJECT_VERSION=$(python -c "import tomllib; print(tomllib.load(open('pyproject.toml','rb'))['project']['version'])")
echo "version=$PYPROJECT_VERSION" >> $GITHUB_OUTPUT
echo "Tag version: v$TAG_VERSION"
@@ -38,12 +48,13 @@ jobs:
test:
runs-on: ubuntu-latest
timeout-minutes: 15
needs: validate
strategy:
fail-fast: false
matrix:
python-version: ["3.10", "3.11", "3.12", "3.13"]
python-version: ["3.11", "3.13"]
steps:
- name: Checkout code
@@ -65,6 +76,7 @@ jobs:
publish:
runs-on: ubuntu-latest
timeout-minutes: 10
needs: [validate, test]
permissions:
contents: write

81
CONTRIBUTING.md Normal file
View File

@@ -0,0 +1,81 @@
# Contributing to UniFace
Thank you for considering contributing to UniFace! We welcome contributions of all kinds.
## How to Contribute
### Reporting Issues
- Use GitHub Issues to report bugs or suggest features
- Include clear descriptions and reproducible examples
- Check existing issues before creating new ones
### Pull Requests
1. Fork the repository
2. Create a new branch for your feature
3. Write clear, documented code with type hints
4. Add tests for new functionality
5. Ensure all tests pass
6. Submit a pull request with a clear description
### Code Style
This project uses [Ruff](https://docs.astral.sh/ruff/) for linting and formatting.
```bash
# Check for linting errors
ruff check .
# Auto-fix linting errors
ruff check . --fix
# Format code
ruff format .
```
**Guidelines:**
- Follow PEP8 guidelines
- Use type hints (Python 3.10+)
- Write docstrings for public APIs
- Line length: 120 characters
- Keep code simple and readable
All PRs must pass `ruff check .` before merging.
## Development Setup
```bash
git clone https://github.com/yakhyo/uniface.git
cd uniface
pip install -e ".[dev]"
```
## Running Tests
```bash
pytest tests/
```
## Examples
Example notebooks demonstrating library usage:
| Example | Notebook |
|---------|----------|
| Face Detection | [face_detection.ipynb](examples/face_detection.ipynb) |
| Face Alignment | [face_alignment.ipynb](examples/face_alignment.ipynb) |
| Face Recognition | [face_analyzer.ipynb](examples/face_analyzer.ipynb) |
| Face Verification | [face_verification.ipynb](examples/face_verification.ipynb) |
| Face Search | [face_search.ipynb](examples/face_search.ipynb) |
| Face Anonymization | [face_anonymization.ipynb](examples/face_anonymization.ipynb) |
## Questions?
Open an issue or start a discussion on GitHub.

356
MODELS.md
View File

@@ -10,14 +10,14 @@ Complete guide to all available models, their performance characteristics, and s
RetinaFace models are trained on the WIDER FACE dataset and provide excellent accuracy-speed tradeoffs.
| Model Name | Params | Size | Easy | Medium | Hard | Use Case |
|---------------------|--------|--------|--------|--------|--------|----------------------------|
| `MNET_025` | 0.4M | 1.7MB | 88.48% | 87.02% | 80.61% | Mobile/Edge devices |
| `MNET_050` | 1.0M | 2.6MB | 89.42% | 87.97% | 82.40% | Mobile/Edge devices |
| `MNET_V1` | 3.5M | 3.8MB | 90.59% | 89.14% | 84.13% | Balanced mobile |
| `MNET_V2` | 3.2M | 3.5MB | 91.70% | 91.03% | 86.60% | **Recommended default** |
| `RESNET18` | 11.7M | 27MB | 92.50% | 91.02% | 86.63% | Server/High accuracy |
| `RESNET34` | 24.8M | 56MB | 94.16% | 93.12% | 88.90% | Maximum accuracy |
| Model Name | Params | Size | Easy | Medium | Hard | Use Case |
| -------------- | ------ | ----- | ------ | ------ | ------ | ----------------------------- |
| `MNET_025` | 0.4M | 1.7MB | 88.48% | 87.02% | 80.61% | Mobile/Edge devices |
| `MNET_050` | 1.0M | 2.6MB | 89.42% | 87.97% | 82.40% | Mobile/Edge devices |
| `MNET_V1` | 3.5M | 3.8MB | 90.59% | 89.14% | 84.13% | Balanced mobile |
| `MNET_V2` ⭐ | 3.2M | 3.5MB | 91.70% | 91.03% | 86.60% | **Recommended default** |
| `RESNET18` | 11.7M | 27MB | 92.50% | 91.02% | 86.63% | Server/High accuracy |
| `RESNET34` | 24.8M | 56MB | 94.16% | 93.12% | 88.90% | Maximum accuracy |
**Accuracy**: WIDER FACE validation set (Easy/Medium/Hard subsets) - from [RetinaFace paper](https://arxiv.org/abs/1905.00641)
**Speed**: Benchmark on your own hardware using `scripts/run_detection.py --iterations 100`
@@ -46,10 +46,10 @@ detector = RetinaFace(
SCRFD (Sample and Computation Redistribution for Efficient Face Detection) models offer state-of-the-art speed-accuracy tradeoffs.
| Model Name | Params | Size | Easy | Medium | Hard | Use Case |
|-----------------|--------|-------|--------|--------|--------|----------------------------|
| `SCRFD_500M` | 0.6M | 2.5MB | 90.57% | 88.12% | 68.51% | Real-time applications |
| `SCRFD_10G` | 4.2M | 17MB | 95.16% | 93.87% | 83.05% | **High accuracy + speed** |
| Model Name | Params | Size | Easy | Medium | Hard | Use Case |
| ---------------- | ------ | ----- | ------ | ------ | ------ | ------------------------------- |
| `SCRFD_500M` | 0.6M | 2.5MB | 90.57% | 88.12% | 68.51% | Real-time applications |
| `SCRFD_10G` ⭐ | 4.2M | 17MB | 95.16% | 93.87% | 83.05% | **High accuracy + speed** |
**Accuracy**: WIDER FACE validation set - from [SCRFD paper](https://arxiv.org/abs/2105.04714)
**Speed**: Benchmark on your own hardware using `scripts/run_detection.py --iterations 100`
@@ -76,16 +76,66 @@ detector = SCRFD(
---
### YOLOv5-Face Family
YOLOv5-Face models provide excellent detection accuracy with 5-point facial landmarks, optimized for real-time applications.
| Model Name | Size | Easy | Medium | Hard | Use Case |
| -------------- | ---- | ------ | ------ | ------ | ------------------------------ |
| `YOLOV5N` | 11MB | 93.61% | 91.52% | 80.53% | Lightweight/Mobile |
| `YOLOV5S` ⭐ | 28MB | 94.33% | 92.61% | 83.15% | **Real-time + accuracy** |
| `YOLOV5M` | 82MB | 95.30% | 93.76% | 85.28% | High accuracy |
**Accuracy**: WIDER FACE validation set - from [YOLOv5-Face paper](https://arxiv.org/abs/2105.12931)
**Speed**: Benchmark on your own hardware using `scripts/run_detection.py --iterations 100`
**Note**: Fixed input size of 640×640. Models exported to ONNX from [deepcam-cn/yolov5-face](https://github.com/deepcam-cn/yolov5-face)
#### Usage
```python
from uniface import YOLOv5Face
from uniface.constants import YOLOv5FaceWeights
# Lightweight/Mobile
detector = YOLOv5Face(
model_name=YOLOv5FaceWeights.YOLOV5N,
conf_thresh=0.6,
nms_thresh=0.5
)
# Real-time detection (recommended)
detector = YOLOv5Face(
model_name=YOLOv5FaceWeights.YOLOV5S,
conf_thresh=0.6,
nms_thresh=0.5
)
# High accuracy
detector = YOLOv5Face(
model_name=YOLOv5FaceWeights.YOLOV5M,
conf_thresh=0.6
)
# Detect faces with landmarks
faces = detector.detect(image)
for face in faces:
bbox = face['bbox'] # [x1, y1, x2, y2]
confidence = face['confidence']
landmarks = face['landmarks'] # 5-point landmarks (5, 2)
```
---
## Face Recognition Models
### ArcFace
State-of-the-art face recognition using additive angular margin loss.
| Model Name | Backbone | Params | Size | Use Case |
|-------------|-------------|--------|-------|----------------------------|
| `MNET` | MobileNet | 2.0M | 8MB | **Balanced (recommended)** |
| `RESNET` | ResNet50 | 43.6M | 166MB | Maximum accuracy |
| Model Name | Backbone | Params | Size | Use Case |
| ----------- | --------- | ------ | ----- | -------------------------------- |
| `MNET` ⭐ | MobileNet | 2.0M | 8MB | **Balanced (recommended)** |
| `RESNET` | ResNet50 | 43.6M | 166MB | Maximum accuracy |
**Dataset**: Trained on MS1M-V2 (5.8M images, 85K identities)
**Accuracy**: Benchmark on your own dataset or use standard face verification benchmarks
@@ -113,13 +163,15 @@ embedding = recognizer.get_normalized_embedding(image, landmarks)
Lightweight face recognition optimized for mobile devices.
| Model Name | Backbone | Params | Size | Use Case |
|-----------------|-----------------|--------|------|--------------------|
| `MNET_025` | MobileNetV1 0.25| 0.2M | 1MB | Ultra-lightweight |
| `MNET_V2` ⭐ | MobileNetV2 | 1.0M | 4MB | **Mobile/Edge** |
| `MNET_V3_SMALL` | MobileNetV3-S | 0.8M | 3MB | Mobile optimized |
| `MNET_V3_LARGE` | MobileNetV3-L | 2.5M | 10MB | Balanced mobile |
| Model Name | Backbone | Params | Size | LFW | CALFW | CPLFW | AgeDB-30 | Use Case |
| ----------------- | ---------------- | ------ | ---- | ------ | ------ | ------ | -------- | --------------------- |
| `MNET_025` | MobileNetV1 0.25 | 0.36M | 1MB | 98.76% | 92.02% | 82.37% | 90.02% | Ultra-lightweight |
| `MNET_V2` ⭐ | MobileNetV2 | 2.29M | 4MB | 99.55% | 94.87% | 86.89% | 95.16% | **Mobile/Edge** |
| `MNET_V3_SMALL` | MobileNetV3-S | 1.25M | 3MB | 99.30% | 93.77% | 85.29% | 92.79% | Mobile optimized |
| `MNET_V3_LARGE` | MobileNetV3-L | 3.52M | 10MB | 99.53% | 94.56% | 86.79% | 95.13% | Balanced mobile |
**Dataset**: Trained on MS1M-V2 (5.8M images, 85K identities)
**Accuracy**: Evaluated on LFW, CALFW, CPLFW, and AgeDB-30 benchmarks
**Note**: These models are lightweight alternatives to ArcFace for resource-constrained environments
#### Usage
@@ -138,12 +190,14 @@ recognizer = MobileFace(model_name=MobileFaceWeights.MNET_V2)
Face recognition using angular softmax loss.
| Model Name | Backbone | Params | Size | Use Case |
|-------------|----------|--------|------|----------------------|
| `SPHERE20` | Sphere20 | 13.0M | 50MB | Research/Comparison |
| `SPHERE36` | Sphere36 | 24.2M | 92MB | Research/Comparison |
| Model Name | Backbone | Params | Size | LFW | CALFW | CPLFW | AgeDB-30 | Use Case |
| ------------ | -------- | ------ | ---- | ------ | ------ | ------ | -------- | ------------------- |
| `SPHERE20` | Sphere20 | 24.5M | 50MB | 99.67% | 95.61% | 88.75% | 96.58% | Research/Comparison |
| `SPHERE36` | Sphere36 | 34.6M | 92MB | 99.72% | 95.64% | 89.92% | 96.83% | Research/Comparison |
**Note**: SphereFace uses angular softmax loss, an earlier approach before ArcFace
**Dataset**: Trained on MS1M-V2 (5.8M images, 85K identities)
**Accuracy**: Evaluated on LFW, CALFW, CPLFW, and AgeDB-30 benchmarks
**Note**: SphereFace uses angular softmax loss, an earlier approach before ArcFace. These models provide good accuracy with moderate resource requirements.
#### Usage
@@ -162,9 +216,9 @@ recognizer = SphereFace(model_name=SphereFaceWeights.SPHERE20)
High-precision facial landmark localization.
| Model Name | Points | Params | Size | Use Case |
|------------|--------|--------|------|-----------------------------|
| `2D106` | 106 | 3.7M | 14MB | Face alignment, analysis |
| Model Name | Points | Params | Size | Use Case |
| ---------- | ------ | ------ | ---- | ------------------------ |
| `2D106` | 106 | 3.7M | 14MB | Face alignment, analysis |
**Note**: Provides 106 facial keypoints for detailed face analysis and alignment
@@ -179,6 +233,7 @@ landmarks = landmarker.get_landmarks(image, bbox)
```
**Landmark Groups:**
- Face contour: 0-32 (33 points)
- Eyebrows: 33-50 (18 points)
- Nose: 51-62 (12 points)
@@ -191,9 +246,9 @@ landmarks = landmarker.get_landmarks(image, bbox)
### Age & Gender Detection
| Model Name | Attributes | Params | Size | Use Case |
|------------|-------------|--------|------|-------------------|
| `DEFAULT` | Age, Gender | 2.1M | 8MB | General purpose |
| Model Name | Attributes | Params | Size | Use Case |
| ----------- | ----------- | ------ | ---- | --------------- |
| `DEFAULT` | Age, Gender | 2.1M | 8MB | General purpose |
**Dataset**: Trained on CelebA
**Note**: Accuracy varies by demographic and image quality. Test on your specific use case.
@@ -205,17 +260,18 @@ from uniface import AgeGender
predictor = AgeGender()
gender, age = predictor.predict(image, bbox)
# Returns: ("Male"/"Female", age_in_years)
# Returns: (gender, age_in_years)
# gender: 0 for Female, 1 for Male
```
---
### Emotion Detection
| Model Name | Classes | Params | Size | Use Case |
|--------------|---------|--------|------|-----------------------|
| `AFFECNET7` | 7 | 0.5M | 2MB | 7-class emotion |
| `AFFECNET8` | 8 | 0.5M | 2MB | 8-class emotion |
| Model Name | Classes | Params | Size | Use Case |
| ------------- | ------- | ------ | ---- | --------------- |
| `AFFECNET7` | 7 | 0.5M | 2MB | 7-class emotion |
| `AFFECNET8` | 8 | 0.5M | 2MB | 8-class emotion |
**Classes (7)**: Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger
**Classes (8)**: Above + Contempt
@@ -235,115 +291,157 @@ emotion, confidence = predictor.predict(image, landmarks)
---
## Model Selection Guide
## Gaze Estimation Models
### By Use Case
### MobileGaze Family
#### Mobile/Edge Devices
- **Detection**: `RetinaFace(MNET_025)` or `SCRFD(SCRFD_500M)`
- **Recognition**: `MobileFace(MNET_V2)`
- **Priority**: Speed, small model size
Real-time gaze direction prediction models trained on Gaze360 dataset. Returns pitch (vertical) and yaw (horizontal) angles in radians.
#### Real-Time Applications (Webcam, Video)
- **Detection**: `RetinaFace(MNET_V2)` or `SCRFD(SCRFD_500M)`
- **Recognition**: `ArcFace(MNET)`
- **Priority**: Speed-accuracy balance
| Model Name | Params | Size | MAE* | Use Case |
| -------------- | ------ | ------- | ----- | ----------------------------- |
| `RESNET18` | 11.7M | 43 MB | 12.84 | Balanced accuracy/speed |
| `RESNET34` ⭐ | 24.8M | 81.6 MB | 11.33 | **Recommended default** |
| `RESNET50` | 25.6M | 91.3 MB | 11.34 | High accuracy |
| `MOBILENET_V2` | 3.5M | 9.59 MB | 13.07 | Mobile/Edge devices |
| `MOBILEONE_S0` | 2.1M | 4.8 MB | 12.58 | Lightweight/Real-time |
#### High-Accuracy Applications (Security, Verification)
- **Detection**: `SCRFD(SCRFD_10G)` or `RetinaFace(RESNET34)`
- **Recognition**: `ArcFace(RESNET)`
- **Priority**: Maximum accuracy
*MAE (Mean Absolute Error) in degrees on Gaze360 test set - lower is better
#### Server/Cloud Deployment
- **Detection**: `SCRFD(SCRFD_10G)`
- **Recognition**: `ArcFace(RESNET)`
- **Priority**: Accuracy, batch processing
**Dataset**: Trained on Gaze360 (indoor/outdoor scenes with diverse head poses)
**Training**: 200 epochs with classification-based approach (binned angles)
#### Usage
```python
from uniface import MobileGaze
from uniface.constants import GazeWeights
import numpy as np
# Default (recommended)
gaze_estimator = MobileGaze() # Uses RESNET34
# Lightweight model
gaze_estimator = MobileGaze(model_name=GazeWeights.MOBILEONE_S0)
# Estimate gaze from face crop
pitch, yaw = gaze_estimator.estimate(face_crop)
print(f"Pitch: {np.degrees(pitch):.1f}°, Yaw: {np.degrees(yaw):.1f}°")
```
**Note**: Requires face crop as input. Use face detection first to obtain bounding boxes.
---
### By Hardware
## Face Parsing Models
#### Apple Silicon (M1/M2/M3/M4)
**Recommended**: All models work well with CoreML acceleration
### BiSeNet Family
```bash
pip install uniface[silicon]
BiSeNet (Bilateral Segmentation Network) models for semantic face parsing. Segments face images into 19 facial component classes.
| Model Name | Params | Size | Classes | Use Case |
| -------------- | ------ | ------- | ------- | ----------------------------- |
| `RESNET18` ⭐ | 13.3M | 50.7 MB | 19 | **Recommended default** |
| `RESNET34` | 24.1M | 89.2 MB | 19 | Higher accuracy |
**19 Facial Component Classes:**
1. Background
2. Skin
3. Left Eyebrow
4. Right Eyebrow
5. Left Eye
6. Right Eye
7. Eye Glasses
8. Left Ear
9. Right Ear
10. Ear Ring
11. Nose
12. Mouth
13. Upper Lip
14. Lower Lip
15. Neck
16. Neck Lace
17. Cloth
18. Hair
19. Hat
**Dataset**: Trained on CelebAMask-HQ
**Architecture**: BiSeNet with ResNet backbone
**Input Size**: 512×512 (automatically resized)
#### Usage
```python
from uniface.parsing import BiSeNet
from uniface.constants import ParsingWeights
from uniface.visualization import vis_parsing_maps
import cv2
# Default (recommended)
parser = BiSeNet() # Uses RESNET18
# Higher accuracy model
parser = BiSeNet(model_name=ParsingWeights.RESNET34)
# Parse face image (already cropped)
mask = parser.parse(face_image)
# Visualize with overlay
face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)
# mask shape: (H, W) with values 0-18 representing classes
print(f"Detected {len(np.unique(mask))} facial components")
```
**Recommended models**:
- **Fast**: `SCRFD(SCRFD_500M)` - Lightweight, real-time capable
- **Balanced**: `RetinaFace(MNET_V2)` - Good accuracy/speed tradeoff
- **Accurate**: `SCRFD(SCRFD_10G)` - High accuracy
**Applications:**
- Face makeup and beauty applications
- Virtual try-on systems
- Face editing and manipulation
- Facial feature extraction
- Portrait segmentation
**Benchmark on your M4**: `python scripts/run_detection.py --iterations 100`
#### NVIDIA GPU (CUDA)
**Recommended**: Larger models for maximum throughput
```bash
pip install uniface[gpu]
```
**Recommended models**:
- **Fast**: `SCRFD(SCRFD_500M)` - Maximum throughput
- **Balanced**: `SCRFD(SCRFD_10G)` - Best overall
- **Accurate**: `RetinaFace(RESNET34)` - Highest accuracy
#### CPU Only
**Recommended**: Lightweight models
**Recommended models**:
- **Fast**: `RetinaFace(MNET_025)` - Smallest, fastest
- **Balanced**: `RetinaFace(MNET_V2)` - Recommended default
- **Accurate**: `SCRFD(SCRFD_10G)` - Best accuracy on CPU
**Note**: FPS values vary significantly based on image size, number of faces, and hardware. Always benchmark on your specific setup.
**Note**: Input should be a cropped face image. For full pipeline, use face detection first to obtain face crops.
---
## Benchmark Details
## Anti-Spoofing Models
### How to Benchmark
### MiniFASNet Family
Run benchmarks on your own hardware:
Lightweight face anti-spoofing models for liveness detection. Detect if a face is real (live) or fake (photo, video replay, mask).
```bash
# Detection speed
python scripts/run_detection.py --image assets/test.jpg --iterations 100
| Model Name | Size | Scale | Use Case |
| ---------- | ------ | ----- | ----------------------------- |
| `V1SE` | 1.2 MB | 4.0 | Squeeze-and-excitation variant |
| `V2` ⭐ | 1.2 MB | 2.7 | **Recommended default** |
# Compare models
python scripts/run_detection.py --image assets/test.jpg --method retinaface --iterations 100
python scripts/run_detection.py --image assets/test.jpg --method scrfd --iterations 100
**Dataset**: Trained on face anti-spoofing datasets
**Output**: Returns (label_idx, score) where label_idx: 0=Fake, 1=Real
#### Usage
```python
from uniface import RetinaFace
from uniface.spoofing import MiniFASNet
from uniface.constants import MiniFASNetWeights
# Default (V2, recommended)
detector = RetinaFace()
spoofer = MiniFASNet()
# V1SE variant
spoofer = MiniFASNet(model_name=MiniFASNetWeights.V1SE)
# Detect and check liveness
faces = detector.detect(image)
for face in faces:
label_idx, score = spoofer.predict(image, face['bbox'])
# label_idx: 0 = Fake, 1 = Real
label = 'Real' if label_idx == 1 else 'Fake'
print(f"{label}: {score:.1%}")
```
### Accuracy Metrics Explained
- **WIDER FACE**: Standard face detection benchmark with three difficulty levels
- **Easy**: Large faces (>50px), clear backgrounds
- **Medium**: Medium-sized faces (30-50px), moderate occlusion
- **Hard**: Small faces (<30px), heavy occlusion, blur
*Accuracy values are from the original papers - see references below*
- **Model Size**: ONNX model file size (affects download time and memory)
- **Params**: Number of model parameters (affects inference speed)
### Important Notes
1. **Speed varies by**:
- Image resolution
- Number of faces in image
- Hardware (CPU/GPU/CoreML)
- Batch size
- Operating system
2. **Accuracy varies by**:
- Image quality
- Lighting conditions
- Face pose and occlusion
- Demographic factors
3. **Always benchmark on your specific use case** before choosing a model
**Note**: Requires face bounding box from a detector. Use with RetinaFace, SCRFD, or YOLOv5Face.
---
@@ -383,13 +481,19 @@ python scripts/download_model.py --model MNET_V2
### Model Training & Architectures
- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) - PyTorch implementation and training code
- **YOLOv5-Face Original**: [deepcam-cn/yolov5-face](https://github.com/deepcam-cn/yolov5-face) - Original PyTorch implementation
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) - ONNX inference implementation
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet training code and pretrained weights
- **Face Anti-Spoofing**: [yakhyo/face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) - MiniFASNet ONNX inference (weights from [minivision-ai/Silent-Face-Anti-Spoofing](https://github.com/minivision-ai/Silent-Face-Anti-Spoofing))
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights
### Papers
- **RetinaFace**: [Single-Shot Multi-Level Face Localisation in the Wild](https://arxiv.org/abs/1905.00641)
- **SCRFD**: [Sample and Computation Redistribution for Efficient Face Detection](https://arxiv.org/abs/2105.04714)
- **YOLOv5-Face**: [YOLO5Face: Why Reinventing a Face Detector](https://arxiv.org/abs/2105.12931)
- **ArcFace**: [Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)
- **SphereFace**: [Deep Hypersphere Embedding for Face Recognition](https://arxiv.org/abs/1704.08063)
- **BiSeNet**: [Bilateral Segmentation Network for Real-time Semantic Segmentation](https://arxiv.org/abs/1808.00897)

View File

@@ -7,8 +7,8 @@ Get up and running with UniFace in 5 minutes! This guide covers the most common
## Installation
```bash
# macOS (Apple Silicon)
pip install uniface[silicon]
# macOS (Apple Silicon) - automatically includes ARM64 optimizations
pip install uniface
# Linux/Windows with NVIDIA GPU
pip install uniface[gpu]
@@ -39,12 +39,13 @@ faces = detector.detect(image)
# Print results
for i, face in enumerate(faces):
print(f"Face {i+1}:")
print(f" Confidence: {face['confidence']:.2f}")
print(f" BBox: {face['bbox']}")
print(f" Landmarks: {len(face['landmarks'])} points")
print(f" Confidence: {face.confidence:.2f}")
print(f" BBox: {face.bbox}")
print(f" Landmarks: {len(face.landmarks)} points")
```
**Output:**
```
Face 1:
Confidence: 0.99
@@ -69,12 +70,18 @@ image = cv2.imread("photo.jpg")
faces = detector.detect(image)
# Extract visualization data
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
bboxes = [f.bbox for f in faces]
scores = [f.confidence for f in faces]
landmarks = [f.landmarks for f in faces]
# Draw on image
draw_detections(image, bboxes, scores, landmarks, vis_threshold=0.6)
draw_detections(
image=image,
bboxes=bboxes,
scores=scores,
landmarks=landmarks,
vis_threshold=0.6,
)
# Save result
cv2.imwrite("output.jpg", image)
@@ -106,22 +113,23 @@ faces2 = detector.detect(image2)
if faces1 and faces2:
# Extract embeddings
emb1 = recognizer.get_normalized_embedding(image1, faces1[0]['landmarks'])
emb2 = recognizer.get_normalized_embedding(image2, faces2[0]['landmarks'])
emb1 = recognizer.get_normalized_embedding(image1, faces1[0].landmarks)
emb2 = recognizer.get_normalized_embedding(image2, faces2[0].landmarks)
# Compute similarity (cosine similarity)
similarity = np.dot(emb1, emb2.T)[0][0]
# Interpret result
if similarity > 0.6:
print(f"Same person (similarity: {similarity:.3f})")
print(f"Same person (similarity: {similarity:.3f})")
else:
print(f"Different people (similarity: {similarity:.3f})")
print(f"Different people (similarity: {similarity:.3f})")
else:
print("No faces detected")
```
**Similarity thresholds:**
- `> 0.6`: Same person (high confidence)
- `0.4 - 0.6`: Uncertain (manual review)
- `< 0.4`: Different people
@@ -151,10 +159,15 @@ while True:
faces = detector.detect(frame)
# Draw results
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
draw_detections(frame, bboxes, scores, landmarks)
bboxes = [f.bbox for f in faces]
scores = [f.confidence for f in faces]
landmarks = [f.landmarks for f in faces]
draw_detections(
image=frame,
bboxes=bboxes,
scores=scores,
landmarks=landmarks,
)
# Show frame
cv2.imshow("UniFace - Press 'q' to quit", frame)
@@ -186,11 +199,13 @@ faces = detector.detect(image)
# Predict attributes
for i, face in enumerate(faces):
gender, age = age_gender.predict(image, face['bbox'])
print(f"Face {i+1}: {gender}, {age} years old")
gender, age = age_gender.predict(image, face.bbox)
gender_str = 'Female' if gender == 0 else 'Male'
print(f"Face {i+1}: {gender_str}, {age} years old")
```
**Output:**
```
Face 1: Male, 32 years old
Face 2: Female, 28 years old
@@ -215,7 +230,7 @@ image = cv2.imread("photo.jpg")
faces = detector.detect(image)
if faces:
landmarks = landmarker.get_landmarks(image, faces[0]['bbox'])
landmarks = landmarker.get_landmarks(image, faces[0].bbox)
print(f"Detected {len(landmarks)} landmarks")
# Draw landmarks
@@ -227,7 +242,223 @@ if faces:
---
## 7. Batch Processing (3 minutes)
## 7. Gaze Estimation (2 minutes)
Estimate where a person is looking:
```python
import cv2
import numpy as np
from uniface import RetinaFace, MobileGaze
from uniface.visualization import draw_gaze
# Initialize models
detector = RetinaFace()
gaze_estimator = MobileGaze()
# Load image
image = cv2.imread("photo.jpg")
faces = detector.detect(image)
# Estimate gaze for each face
for i, face in enumerate(faces):
x1, y1, x2, y2 = map(int, face.bbox[:4])
face_crop = image[y1:y2, x1:x2]
if face_crop.size > 0:
pitch, yaw = gaze_estimator.estimate(face_crop)
print(f"Face {i+1}: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°")
# Draw gaze direction
draw_gaze(image, face.bbox, pitch, yaw)
cv2.imwrite("gaze_output.jpg", image)
```
**Output:**
```
Face 1: pitch=5.2°, yaw=-12.3°
Face 2: pitch=-8.1°, yaw=15.7°
```
---
## 8. Face Parsing (2 minutes)
Segment face into semantic components (skin, eyes, nose, mouth, hair, etc.):
```python
import cv2
import numpy as np
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps
# Initialize parser
parser = BiSeNet() # Uses ResNet18 by default
# Load face image (already cropped)
face_image = cv2.imread("face.jpg")
# Parse face into 19 components
mask = parser.parse(face_image)
# Visualize with overlay
face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)
# Convert back to BGR for saving
vis_bgr = cv2.cvtColor(vis_result, cv2.COLOR_RGB2BGR)
cv2.imwrite("parsed_face.jpg", vis_bgr)
print(f"Detected {len(np.unique(mask))} facial components")
```
**Output:**
```
Detected 12 facial components
```
**19 Facial Component Classes:**
- Background, Skin, Eyebrows (L/R), Eyes (L/R), Eye Glasses
- Ears (L/R), Ear Ring, Nose, Mouth, Lips (Upper/Lower)
- Neck, Neck Lace, Cloth, Hair, Hat
---
## 9. Face Anonymization (2 minutes)
Automatically blur faces for privacy protection:
```python
from uniface.privacy import anonymize_faces
import cv2
# One-liner: automatic detection and blurring
image = cv2.imread("group_photo.jpg")
anonymized = anonymize_faces(image, method='pixelate')
cv2.imwrite("anonymized.jpg", anonymized)
print("Faces anonymized successfully!")
```
**Manual control with custom parameters:**
```python
from uniface import RetinaFace
from uniface.privacy import BlurFace
# Initialize detector and blurrer
detector = RetinaFace()
blurrer = BlurFace(method='gaussian', blur_strength=5.0)
# Detect and anonymize
faces = detector.detect(image)
anonymized = blurrer.anonymize(image, faces)
cv2.imwrite("output.jpg", anonymized)
```
**Available blur methods:**
```python
# Pixelation (news media standard)
blurrer = BlurFace(method='pixelate', pixel_blocks=8)
# Gaussian blur (smooth, natural)
blurrer = BlurFace(method='gaussian', blur_strength=4.0)
# Black boxes (maximum privacy)
blurrer = BlurFace(method='blackout', color=(0, 0, 0))
# Elliptical blur (natural face shape)
blurrer = BlurFace(method='elliptical', blur_strength=3.0, margin=30)
# Median blur (edge-preserving)
blurrer = BlurFace(method='median', blur_strength=3.0)
```
**Webcam anonymization:**
```python
import cv2
from uniface import RetinaFace
from uniface.privacy import BlurFace
detector = RetinaFace()
blurrer = BlurFace(method='pixelate')
cap = cv2.VideoCapture(0)
while True:
ret, frame = cap.read()
if not ret:
break
faces = detector.detect(frame)
frame = blurrer.anonymize(frame, faces, inplace=True)
cv2.imshow('Anonymized', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
```
**Command-line tool:**
```bash
# Anonymize image with pixelation
python scripts/run_anonymization.py --image photo.jpg
# Real-time webcam anonymization
python scripts/run_anonymization.py --webcam --method gaussian
# Custom blur strength
python scripts/run_anonymization.py --image photo.jpg --method gaussian --blur-strength 5.0
```
---
## 10. Face Anti-Spoofing (2 minutes)
Detect if a face is real or fake (photo, video replay, mask):
```python
from uniface import RetinaFace
from uniface.spoofing import MiniFASNet
detector = RetinaFace()
spoofer = MiniFASNet() # Uses V2 by default
image = cv2.imread("photo.jpg")
faces = detector.detect(image)
for i, face in enumerate(faces):
label_idx, score = spoofer.predict(image, face.bbox)
# label_idx: 0 = Fake, 1 = Real
label = 'Real' if label_idx == 1 else 'Fake'
print(f"Face {i+1}: {label} ({score:.1%})")
```
**Output:**
```
Face 1: Real (98.5%)
```
**Command-line tool:**
```bash
# Image
python scripts/run_spoofing.py --image photo.jpg
# Webcam
python scripts/run_spoofing.py --source 0
```
---
## 11. Batch Processing (3 minutes)
Process multiple images:
@@ -260,35 +491,86 @@ print("Done!")
---
## 8. Model Selection
## 12. Model Selection
Choose the right model for your use case:
### Detection Models
```python
from uniface import create_detector
from uniface.constants import RetinaFaceWeights, SCRFDWeights
from uniface.detection import RetinaFace, SCRFD, YOLOv5Face
from uniface.constants import RetinaFaceWeights, SCRFDWeights, YOLOv5FaceWeights
# Fast detection (mobile/edge devices)
detector = create_detector(
'retinaface',
detector = RetinaFace(
model_name=RetinaFaceWeights.MNET_025,
conf_thresh=0.7
)
# Balanced (recommended)
detector = create_detector(
'retinaface',
detector = RetinaFace(
model_name=RetinaFaceWeights.MNET_V2
)
# Real-time with high accuracy
detector = YOLOv5Face(
model_name=YOLOv5FaceWeights.YOLOV5S,
conf_thresh=0.6,
nms_thresh=0.5
)
# High accuracy (server/GPU)
detector = create_detector(
'scrfd',
detector = SCRFD(
model_name=SCRFDWeights.SCRFD_10G_KPS,
conf_thresh=0.5
)
```
### Recognition Models
```python
from uniface import ArcFace, MobileFace, SphereFace
from uniface.constants import MobileFaceWeights, SphereFaceWeights
# ArcFace (recommended for most use cases)
recognizer = ArcFace() # Best accuracy
# MobileFace (lightweight for mobile/edge)
recognizer = MobileFace(model_name=MobileFaceWeights.MNET_V2) # Fast, small size
# SphereFace (angular margin approach)
recognizer = SphereFace(model_name=SphereFaceWeights.SPHERE20) # Alternative method
```
### Gaze Estimation Models
```python
from uniface import MobileGaze
from uniface.constants import GazeWeights
# Default (recommended)
gaze_estimator = MobileGaze() # Uses RESNET34
# Lightweight (mobile/edge devices)
gaze_estimator = MobileGaze(model_name=GazeWeights.MOBILEONE_S0)
# High accuracy
gaze_estimator = MobileGaze(model_name=GazeWeights.RESNET50)
```
### Face Parsing Models
```python
from uniface.parsing import BiSeNet
from uniface.constants import ParsingWeights
# Default (recommended, 50.7 MB)
parser = BiSeNet() # Uses RESNET18
# Higher accuracy (89.2 MB)
parser = BiSeNet(model_name=ParsingWeights.RESNET34)
```
---
## Common Issues
@@ -316,20 +598,22 @@ print("Available providers:", ort.get_available_providers())
### 3. Slow Performance on Mac
Make sure you installed with CoreML support:
The standard installation includes ARM64 optimizations for Apple Silicon. If performance is slow, verify you're using the ARM64 build of Python:
```bash
pip install uniface[silicon]
python -c "import platform; print(platform.machine())"
# Should show: arm64 (not x86_64)
```
### 4. Import Errors
```python
# Correct imports
from uniface import RetinaFace, ArcFace, Landmark106
from uniface.detection import create_detector
# Correct imports
from uniface.detection import RetinaFace
from uniface.recognition import ArcFace
from uniface.landmark import Landmark106
# Wrong imports
# Wrong imports
from uniface import retinaface # Module, not class
```
@@ -337,7 +621,23 @@ from uniface import retinaface # Module, not class
## Next Steps
- **Detailed Examples**: Check the [examples/](examples/) folder for Jupyter notebooks
### Jupyter Notebook Examples
Explore interactive examples for common tasks:
| Example | Description | Notebook |
|---------|-------------|----------|
| **Face Detection** | Detect faces and facial landmarks | [face_detection.ipynb](examples/face_detection.ipynb) |
| **Face Alignment** | Align and crop faces for recognition | [face_alignment.ipynb](examples/face_alignment.ipynb) |
| **Face Recognition** | Extract face embeddings and compare faces | [face_analyzer.ipynb](examples/face_analyzer.ipynb) |
| **Face Verification** | Compare two faces to verify identity | [face_verification.ipynb](examples/face_verification.ipynb) |
| **Face Search** | Find a person in a group photo | [face_search.ipynb](examples/face_search.ipynb) |
| **Face Parsing** | Segment face into semantic components | [face_parsing.ipynb](examples/face_parsing.ipynb) |
| **Face Anonymization** | Blur or pixelate faces for privacy protection | [face_anonymization.ipynb](examples/face_anonymization.ipynb) |
| **Gaze Estimation** | Estimate gaze direction | [gaze_estimation.ipynb](examples/gaze_estimation.ipynb) |
### Additional Resources
- **Model Benchmarks**: See [MODELS.md](MODELS.md) for performance comparisons
- **Full Documentation**: Read [README.md](README.md) for complete API reference
@@ -346,10 +646,8 @@ from uniface import retinaface # Module, not class
## References
- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch)
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference)
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition)
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation)
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing)
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface)
---
Happy coding! 🚀

347
README.md
View File

@@ -1,27 +1,36 @@
# UniFace: All-in-One Face Analysis Library
<div align="center">
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
![Python](https://img.shields.io/badge/Python-3.10%2B-blue)
[![PyPI Version](https://img.shields.io/pypi/v/uniface.svg)](https://pypi.org/project/uniface/)
[![Python](https://img.shields.io/badge/Python-3.10%2B-blue)](https://www.python.org/)
[![PyPI](https://img.shields.io/pypi/v/uniface.svg)](https://pypi.org/project/uniface/)
[![CI](https://github.com/yakhyo/uniface/actions/workflows/ci.yml/badge.svg)](https://github.com/yakhyo/uniface/actions)
[![Downloads](https://pepy.tech/badge/uniface)](https://pepy.tech/project/uniface)
[![Downloads](https://static.pepy.tech/badge/uniface)](https://pepy.tech/project/uniface)
[![DeepWiki](https://img.shields.io/badge/DeepWiki-AI_Docs-blue.svg?logo=bookstack)](https://deepwiki.com/yakhyo/uniface)
</div>
<div align="center">
<img src=".github/logos/logo_web.webp" width=75%>
</div>
**UniFace** is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides high-performance face detection, recognition, landmark detection, and attribute analysis with hardware acceleration support across platforms.
**UniFace** is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides high-performance face detection, recognition, landmark detection, face parsing, gaze estimation, and attribute analysis with hardware acceleration support across platforms.
---
## Features
- **High-Speed Face Detection**: ONNX-optimized RetinaFace and SCRFD models
- **High-Speed Face Detection**: ONNX-optimized RetinaFace, SCRFD, and YOLOv5-Face models
- **Facial Landmark Detection**: Accurate 106-point landmark localization
- **Face Recognition**: ArcFace, MobileFace, and SphereFace embeddings
- **Face Parsing**: BiSeNet-based semantic segmentation with 19 facial component classes
- **Gaze Estimation**: Real-time gaze direction prediction with MobileGaze
- **Attribute Analysis**: Age, gender, and emotion detection
- **Anti-Spoofing**: Face liveness detection with MiniFASNet models
- **Face Anonymization**: Privacy-preserving face blurring with 5 methods (pixelate, gaussian, blackout, elliptical, median)
- **Face Alignment**: Precise alignment for downstream tasks
- **Hardware Acceleration**: CoreML (Apple Silicon), CUDA (NVIDIA), CPU fallback
- **Hardware Acceleration**: ARM64 optimizations (Apple Silicon), CUDA (NVIDIA), CPU fallback
- **Simple API**: Intuitive factory functions and clean interfaces
- **Production-Ready**: Type hints, comprehensive logging, PEP8 compliant
@@ -39,31 +48,24 @@ pip install uniface
#### macOS (Apple Silicon - M1/M2/M3/M4)
For optimal performance with **CoreML acceleration** (3-5x faster):
For Apple Silicon Macs, the standard installation automatically includes optimized ARM64 support:
```bash
# Standard installation (CPU only)
pip install uniface
# With CoreML acceleration (recommended for M-series chips)
pip install uniface[silicon]
```
**Verify CoreML is available:**
```python
import onnxruntime as ort
print(ort.get_available_providers())
# Should show: ['CoreMLExecutionProvider', 'CPUExecutionProvider']
```
The base `onnxruntime` package (included with uniface) has native Apple Silicon support with ARM64 optimizations built-in since version 1.13+.
#### Linux/Windows with NVIDIA GPU
For CUDA acceleration on NVIDIA GPUs:
```bash
# With CUDA acceleration
pip install uniface[gpu]
```
**Requirements:**
- CUDA 11.x or 12.x
- cuDNN 8.x
- See [ONNX Runtime GPU requirements](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html)
@@ -103,9 +105,9 @@ faces = detector.detect(image)
# Process results
for face in faces:
bbox = face['bbox'] # [x1, y1, x2, y2]
confidence = face['confidence']
landmarks = face['landmarks'] # 5-point landmarks
bbox = face.bbox # np.ndarray [x1, y1, x2, y2]
confidence = face.confidence
landmarks = face.landmarks # np.ndarray (5, 2) landmarks
print(f"Face detected with confidence: {confidence:.2f}")
```
@@ -123,8 +125,8 @@ recognizer = ArcFace()
faces1 = detector.detect(image1)
faces2 = detector.detect(image2)
embedding1 = recognizer.get_normalized_embedding(image1, faces1[0]['landmarks'])
embedding2 = recognizer.get_normalized_embedding(image2, faces2[0]['landmarks'])
embedding1 = recognizer.get_normalized_embedding(image1, faces1[0].landmarks)
embedding2 = recognizer.get_normalized_embedding(image2, faces2[0].landmarks)
# Compare faces
similarity = compute_similarity(embedding1, embedding2)
@@ -140,7 +142,7 @@ detector = RetinaFace()
landmarker = Landmark106()
faces = detector.detect(image)
landmarks = landmarker.get_landmarks(image, faces[0]['bbox'])
landmarks = landmarker.get_landmarks(image, faces[0].bbox)
# Returns 106 (x, y) landmark points
```
@@ -153,10 +155,125 @@ detector = RetinaFace()
age_gender = AgeGender()
faces = detector.detect(image)
gender, age = age_gender.predict(image, faces[0]['bbox'])
print(f"{gender}, {age} years old")
gender, age = age_gender.predict(image, faces[0].bbox)
gender_str = 'Female' if gender == 0 else 'Male'
print(f"{gender_str}, {age} years old")
```
### Gaze Estimation
```python
from uniface import RetinaFace, MobileGaze
from uniface.visualization import draw_gaze
import numpy as np
detector = RetinaFace()
gaze_estimator = MobileGaze()
faces = detector.detect(image)
for face in faces:
x1, y1, x2, y2 = map(int, face.bbox[:4])
face_crop = image[y1:y2, x1:x2]
pitch, yaw = gaze_estimator.estimate(face_crop)
print(f"Gaze: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°")
# Visualize
draw_gaze(image, face.bbox, pitch, yaw)
```
### Face Parsing
```python
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps
# Initialize parser
parser = BiSeNet() # Uses ResNet18 by default
# Parse face image (already cropped)
mask = parser.parse(face_image)
# Visualize with overlay
import cv2
face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)
# mask contains 19 classes: skin, eyes, nose, mouth, hair, etc.
print(f"Unique classes: {len(np.unique(mask))}")
```
### Face Anti-Spoofing
Detect if a face is real or fake (photo, video replay, mask):
```python
from uniface import RetinaFace
from uniface.spoofing import MiniFASNet
detector = RetinaFace()
spoofer = MiniFASNet() # Uses V2 by default
faces = detector.detect(image)
for face in faces:
label_idx, score = spoofer.predict(image, face.bbox)
# label_idx: 0 = Fake, 1 = Real
label = 'Real' if label_idx == 1 else 'Fake'
print(f"{label}: {score:.1%}")
```
### Face Anonymization
Protect privacy by blurring or pixelating faces with 5 different methods:
```python
from uniface import RetinaFace
from uniface.privacy import BlurFace, anonymize_faces
import cv2
# Method 1: One-liner with automatic detection
image = cv2.imread("photo.jpg")
anonymized = anonymize_faces(image, method='pixelate')
cv2.imwrite("anonymized.jpg", anonymized)
# Method 2: Manual control with custom parameters
detector = RetinaFace()
blurrer = BlurFace(method='gaussian', blur_strength=5.0)
faces = detector.detect(image)
anonymized = blurrer.anonymize(image, faces)
# Available blur methods:
methods = {
'pixelate': BlurFace(method='pixelate', pixel_blocks=10), # Blocky effect (news media standard)
'gaussian': BlurFace(method='gaussian', blur_strength=3.0), # Smooth, natural blur
'blackout': BlurFace(method='blackout', color=(0, 0, 0)), # Solid color boxes (maximum privacy)
'elliptical': BlurFace(method='elliptical', margin=20), # Soft oval blur (natural face shape)
'median': BlurFace(method='median', blur_strength=3.0) # Edge-preserving blur
}
# Real-time webcam anonymization
cap = cv2.VideoCapture(0)
detector = RetinaFace()
blurrer = BlurFace(method='pixelate')
while True:
ret, frame = cap.read()
if not ret:
break
faces = detector.detect(frame)
frame = blurrer.anonymize(frame, faces, inplace=True)
cv2.imshow('Anonymized', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
```
---
## Documentation
@@ -172,29 +289,34 @@ print(f"{gender}, {age} years old")
### Factory Functions (Recommended)
```python
from uniface import create_detector, create_recognizer, create_landmarker
from uniface.detection import RetinaFace, SCRFD
from uniface.recognition import ArcFace
from uniface.landmark import Landmark106
from uniface.privacy import BlurFace, anonymize_faces
from uniface.constants import SCRFDWeights
# Create detector with default settings
detector = create_detector('retinaface')
detector = RetinaFace()
# Create with custom config
detector = create_detector(
'scrfd',
model_name='scrfd_10g_kps',
conf_thresh=0.8,
detector = SCRFD(
model_name=SCRFDWeights.SCRFD_10G_KPS, # SCRFDWeights.SCRFD_500M_KPS
conf_thresh=0.4,
input_size=(640, 640)
)
# Or with defaults settings: detector = SCRFD()
# Recognition and landmarks
recognizer = create_recognizer('arcface')
landmarker = create_landmarker('2d106det')
recognizer = ArcFace()
landmarker = Landmark106()
```
### Direct Model Instantiation
```python
from uniface import RetinaFace, SCRFD, ArcFace, MobileFace
from uniface.constants import RetinaFaceWeights
from uniface import RetinaFace, SCRFD, YOLOv5Face, ArcFace, MobileFace, SphereFace
from uniface.constants import RetinaFaceWeights, YOLOv5FaceWeights
# Detection
detector = RetinaFace(
@@ -202,10 +324,20 @@ detector = RetinaFace(
conf_thresh=0.5,
nms_thresh=0.4
)
# Or detector = RetinaFace()
# YOLOv5-Face detection
detector = YOLOv5Face(
model_name=YOLOv5FaceWeights.YOLOV5S,
conf_thresh=0.6,
nms_thresh=0.5
)
# Or detector = YOLOv5Face
# Recognition
recognizer = ArcFace() # Uses default weights
recognizer = MobileFace() # Lightweight alternative
recognizer = SphereFace() # Angular softmax alternative
```
### High-Level Detection API
@@ -214,26 +346,74 @@ recognizer = MobileFace() # Lightweight alternative
from uniface import detect_faces
# One-line face detection
faces = detect_faces(image, method='retinaface', conf_thresh=0.8)
faces = detect_faces(image, method='retinaface', conf_thresh=0.8) # methods: retinaface, scrfd, yolov5face
```
### Key Parameters (quick reference)
**Detection**
| Class | Key params (defaults) | Notes |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------- |
| `RetinaFace` | `model_name=RetinaFaceWeights.MNET_V2`, `conf_thresh=0.5`, `nms_thresh=0.4`, `input_size=(640, 640)`, `dynamic_size=False` | Supports 5-point landmarks |
| `SCRFD` | `model_name=SCRFDWeights.SCRFD_10G_KPS`, `conf_thresh=0.5`, `nms_thresh=0.4`, `input_size=(640, 640)` | Supports 5-point landmarks |
| `YOLOv5Face` | `model_name=YOLOv5FaceWeights.YOLOV5S`, `conf_thresh=0.6`, `nms_thresh=0.5`, `input_size=640` (fixed) | Supports 5-point landmarks; models: YOLOV5N/S/M; `input_size` must be 640 |
**Recognition**
| Class | Key params (defaults) | Notes |
| -------------- | ----------------------------------------- | ------------------------------------- |
| `ArcFace` | `model_name=ArcFaceWeights.MNET` | Returns 512-dim normalized embeddings |
| `MobileFace` | `model_name=MobileFaceWeights.MNET_V2` | Lightweight embeddings |
| `SphereFace` | `model_name=SphereFaceWeights.SPHERE20` | Angular softmax variant |
**Landmark & Attributes**
| Class | Key params (defaults) | Notes |
| --------------- | --------------------------------------------------------------------- | --------------------------------------- |
| `Landmark106` | No required params | 106-point landmarks |
| `AgeGender` | `model_name=AgeGenderWeights.DEFAULT`; `input_size` auto-detected | Requires bbox; ONNXRuntime |
| `Emotion` | `model_weights=DDAMFNWeights.AFFECNET7`, `input_size=(112, 112)` | Requires 5-point landmarks; TorchScript |
**Gaze Estimation**
| Class | Key params (defaults) | Notes |
| ------------- | ------------------------------------------ | ------------------------------------ |
| `MobileGaze` | `model_name=GazeWeights.RESNET34` | Returns (pitch, yaw) angles in radians; trained on Gaze360 |
**Face Parsing**
| Class | Key params (defaults) | Notes |
| ---------- | ---------------------------------------- | ------------------------------------ |
| `BiSeNet` | `model_name=ParsingWeights.RESNET18`, `input_size=(512, 512)` | 19 facial component classes; BiSeNet architecture with ResNet backbone |
**Anti-Spoofing**
| Class | Key params (defaults) | Notes |
| ------------- | ----------------------------------------- | ------------------------------------ |
| `MiniFASNet` | `model_name=MiniFASNetWeights.V2` | Returns (label_idx, score); 0=Fake, 1=Real |
---
## Model Performance
### Face Detection (WIDER FACE Dataset)
| Model | Easy | Medium | Hard | Use Case |
|--------------------|--------|--------|--------|-------------------------|
| retinaface_mnet025 | 88.48% | 87.02% | 80.61% | Mobile/Edge devices |
| retinaface_mnet_v2 | 91.70% | 91.03% | 86.60% | Balanced (recommended) |
| retinaface_r34 | 94.16% | 93.12% | 88.90% | High accuracy |
| scrfd_500m | 90.57% | 88.12% | 68.51% | Real-time applications |
| scrfd_10g | 95.16% | 93.87% | 83.05% | Best accuracy/speed |
| Model | Easy | Medium | Hard | Use Case |
| ------------------ | ------ | ------ | ------ | ---------------------- |
| retinaface_mnet025 | 88.48% | 87.02% | 80.61% | Mobile/Edge devices |
| retinaface_mnet_v2 | 91.70% | 91.03% | 86.60% | Balanced (recommended) |
| retinaface_r34 | 94.16% | 93.12% | 88.90% | High accuracy |
| scrfd_500m | 90.57% | 88.12% | 68.51% | Real-time applications |
| scrfd_10g | 95.16% | 93.87% | 83.05% | Best accuracy/speed |
| yolov5n_face | 93.61% | 91.52% | 80.53% | Lightweight/Mobile |
| yolov5s_face | 94.33% | 92.61% | 83.15% | Real-time + accuracy |
| yolov5m_face | 95.30% | 93.76% | 85.28% | High accuracy |
*Accuracy values from original papers: [RetinaFace](https://arxiv.org/abs/1905.00641), [SCRFD](https://arxiv.org/abs/2105.04714)*
_Accuracy values from original papers: [RetinaFace](https://arxiv.org/abs/1905.00641), [SCRFD](https://arxiv.org/abs/2105.04714), [YOLOv5-Face](https://arxiv.org/abs/2105.12931)_
**Benchmark on your hardware:**
```bash
python scripts/run_detection.py --image assets/test.jpg --iterations 100
```
@@ -248,6 +428,21 @@ See [MODELS.md](MODELS.md) for detailed model information and selection guide.
## Examples
### Jupyter Notebooks
Interactive examples covering common face analysis tasks:
| Example | Description | Notebook |
|---------|-------------|----------|
| **Face Detection** | Detect faces and facial landmarks | [face_detection.ipynb](examples/face_detection.ipynb) |
| **Face Alignment** | Align and crop faces for recognition | [face_alignment.ipynb](examples/face_alignment.ipynb) |
| **Face Recognition** | Extract face embeddings and compare faces | [face_analyzer.ipynb](examples/face_analyzer.ipynb) |
| **Face Verification** | Compare two faces to verify identity | [face_verification.ipynb](examples/face_verification.ipynb) |
| **Face Search** | Find a person in a group photo | [face_search.ipynb](examples/face_search.ipynb) |
| **Face Parsing** | Segment face into semantic components | [face_parsing.ipynb](examples/face_parsing.ipynb) |
| **Face Anonymization** | Blur or pixelate faces for privacy protection | [face_anonymization.ipynb](examples/face_anonymization.ipynb) |
| **Gaze Estimation** | Estimate gaze direction from face images | [gaze_estimation.ipynb](examples/gaze_estimation.ipynb) |
### Webcam Face Detection
```python
@@ -266,11 +461,17 @@ while True:
faces = detector.detect(frame)
# Extract data for visualization
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
bboxes = [f.bbox for f in faces]
scores = [f.confidence for f in faces]
landmarks = [f.landmarks for f in faces]
draw_detections(frame, bboxes, scores, landmarks, vis_threshold=0.6)
draw_detections(
image=frame,
bboxes=bboxes,
scores=scores,
landmarks=landmarks,
vis_threshold=0.6,
)
cv2.imshow("Face Detection", frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
@@ -296,7 +497,7 @@ for person_id, image_path in person_images.items():
faces = detector.detect(image)
if faces:
embedding = recognizer.get_normalized_embedding(
image, faces[0]['landmarks']
image, faces[0].landmarks
)
database[person_id] = embedding
@@ -305,7 +506,7 @@ query_image = cv2.imread("query.jpg")
query_faces = detector.detect(query_image)
if query_faces:
query_embedding = recognizer.get_normalized_embedding(
query_image, query_faces[0]['landmarks']
query_image, query_faces[0].landmarks
)
# Find best match
@@ -400,12 +601,29 @@ pip install -e ".[dev]"
# Run tests
pytest
# Format code
black uniface/
isort uniface/
```
### Code Formatting
This project uses [Ruff](https://docs.astral.sh/ruff/) for linting and formatting.
```bash
# Format code
ruff format .
# Check for linting errors
ruff check .
# Auto-fix linting errors
ruff check . --fix
```
Ruff configuration is in `pyproject.toml`. Key settings:
- Line length: 120
- Python target: 3.10+
- Import sorting: `uniface` as first-party
### Project Structure
```
@@ -414,7 +632,11 @@ uniface/
│ ├── detection/ # Face detection models
│ ├── recognition/ # Face recognition models
│ ├── landmark/ # Landmark detection
│ ├── parsing/ # Face parsing
│ ├── gaze/ # Gaze estimation
│ ├── attribute/ # Age, gender, emotion
│ ├── spoofing/ # Face anti-spoofing
│ ├── privacy/ # Face anonymization & blurring
│ ├── onnx_utils.py # ONNX Runtime utilities
│ ├── model_store.py # Model download & caching
│ └── visualization.py # Drawing utilities
@@ -427,21 +649,14 @@ uniface/
## References
### Model Training & Architectures
- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) - PyTorch implementation and training code
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) - ONNX inference implementation
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet face parsing training code and pretrained weights
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
- **Face Anti-Spoofing**: [yakhyo/face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) - MiniFASNet ONNX inference (weights from [minivision-ai/Silent-Face-Anti-Spoofing](https://github.com/minivision-ai/Silent-Face-Anti-Spoofing))
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights
### Papers
- **RetinaFace**: [Single-Shot Multi-Level Face Localisation in the Wild](https://arxiv.org/abs/1905.00641)
- **SCRFD**: [Sample and Computation Redistribution for Efficient Face Detection](https://arxiv.org/abs/2105.04714)
- **ArcFace**: [Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)
---
## Contributing
Contributions are welcome! Please open an issue or submit a pull request on [GitHub](https://github.com/yakhyo/uniface).

BIN
assets/einstien.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.3 MiB

BIN
assets/scientists.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.9 MiB

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

387
examples/face_parsing.ipynb Normal file

File diff suppressed because one or more lines are too long

368
examples/face_search.ipynb Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -1,12 +1,44 @@
[project]
name = "uniface"
version = "0.1.9"
description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Age, and Gender Detection"
version = "1.6.0"
description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Face Parsing, Gaze Estimation, Age, and Gender Detection"
readme = "README.md"
license = { text = "MIT" }
authors = [
{ name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" }
authors = [{ name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" }]
maintainers = [
{ name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" },
]
requires-python = ">=3.11,<3.14"
keywords = [
"face-detection",
"face-recognition",
"facial-landmarks",
"face-parsing",
"face-segmentation",
"gaze-estimation",
"age-detection",
"gender-detection",
"computer-vision",
"deep-learning",
"onnx",
"onnxruntime",
"face-analysis",
"bisenet",
]
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
]
dependencies = [
"numpy>=1.21.0",
"opencv-python>=4.5.0",
@@ -14,25 +46,56 @@ dependencies = [
"onnxruntime>=1.16.0",
"scikit-image>=0.19.0",
"requests>=2.28.0",
"tqdm>=4.64.0"
"tqdm>=4.64.0",
]
requires-python = ">=3.10"
[project.optional-dependencies]
dev = ["pytest>=7.0.0"]
dev = ["pytest>=7.0.0", "ruff>=0.4.0"]
gpu = ["onnxruntime-gpu>=1.16.0"]
silicon = ["onnxruntime-silicon>=1.16.0"]
[project.urls]
Homepage = "https://github.com/yakhyo/uniface"
Repository = "https://github.com/yakhyo/uniface"
Documentation = "https://github.com/yakhyo/uniface/blob/main/README.md"
"Quick Start" = "https://github.com/yakhyo/uniface/blob/main/QUICKSTART.md"
"Model Zoo" = "https://github.com/yakhyo/uniface/blob/main/MODELS.md"
[build-system]
requires = ["setuptools>=64", "wheel"]
build-backend = "setuptools.build_meta"
[tool.setuptools]
packages = { find = {} }
packages = { find = { where = ["."], include = ["uniface*"] } }
[tool.setuptools.package-data]
"uniface" = ["*.txt", "*.md"]
uniface = ["py.typed"]
[tool.ruff]
line-length = 120
target-version = "py311"
exclude = [
".git",
".ruff_cache",
"__pycache__",
"build",
"dist",
"*.egg-info",
".venv",
"venv",
".pytest_cache",
".mypy_cache",
"*.ipynb",
]
[tool.ruff.format]
quote-style = "single"
[tool.ruff.lint]
select = ["E", "F", "I", "W"]
[tool.ruff.lint.flake8-quotes]
docstring-quotes = "double"
[tool.ruff.lint.isort]
known-first-party = ["uniface"]

View File

@@ -1,18 +1,85 @@
### `download_model.py`
# Scripts
# Download all models
Scripts for testing UniFace features.
## Available Scripts
| Script | Description |
|--------|-------------|
| `run_detection.py` | Face detection on image or webcam |
| `run_anonymization.py` | Face anonymization/blurring for privacy |
| `run_age_gender.py` | Age and gender prediction |
| `run_emotion.py` | Emotion detection (7 or 8 emotions) |
| `run_gaze_estimation.py` | Gaze direction estimation |
| `run_landmarks.py` | 106-point facial landmark detection |
| `run_recognition.py` | Face embedding extraction and comparison |
| `run_face_analyzer.py` | Complete face analysis (detection + recognition + attributes) |
| `run_face_search.py` | Real-time face matching against reference |
| `run_video_detection.py` | Face detection on video files |
| `batch_process.py` | Batch process folder of images |
| `download_model.py` | Download model weights |
| `sha256_generate.py` | Generate SHA256 hash for model files |
## Usage Examples
```bash
python scripts/download_model.py
# Face detection
python scripts/run_detection.py --image assets/test.jpg
python scripts/run_detection.py --webcam
# Face anonymization
python scripts/run_anonymization.py --image assets/test.jpg --method pixelate
python scripts/run_anonymization.py --webcam --method gaussian
python scripts/run_anonymization.py --image photo.jpg --method pixelate --pixel-blocks 5
# Age and gender
python scripts/run_age_gender.py --image assets/test.jpg
python scripts/run_age_gender.py --webcam
# Emotion detection
python scripts/run_emotion.py --image assets/test.jpg
python scripts/run_emotion.py --webcam
# Gaze estimation
python scripts/run_gaze_estimation.py --image assets/test.jpg
python scripts/run_gaze_estimation.py --webcam
# Landmarks
python scripts/run_landmarks.py --image assets/test.jpg
python scripts/run_landmarks.py --webcam
# Face recognition (extract embedding)
python scripts/run_recognition.py --image assets/test.jpg
# Face comparison
python scripts/run_recognition.py --image1 face1.jpg --image2 face2.jpg
# Face search (match webcam against reference)
python scripts/run_face_search.py --image reference.jpg
# Video processing
python scripts/run_video_detection.py --input video.mp4 --output output.mp4
# Batch processing
python scripts/batch_process.py --input images/ --output results/
# Download models
python scripts/download_model.py --model-type retinaface
python scripts/download_model.py # downloads all
```
# Download just RESNET18
## Common Options
| Option | Description |
|--------|-------------|
| `--image` | Path to input image |
| `--webcam` | Use webcam instead of image |
| `--method` | Choose detector: `retinaface`, `scrfd`, `yolov5face` |
| `--threshold` | Visualization confidence threshold (default: 0.25) |
| `--save_dir` | Output directory (default: `outputs`) |
## Quick Test
```bash
python scripts/download_model.py --model RESNET18
python scripts/run_detection.py --image assets/test.jpg
```
### `run_inference.py`
```bash
python scripts/run_inference.py --image assets/test.jpg --model MNET_V2 --iterations 10
```

View File

@@ -1,389 +0,0 @@
# Testing Scripts Guide
Complete guide to testing all scripts in the `scripts/` directory.
---
## 📁 Available Scripts
1. **download_model.py** - Download and verify model weights
2. **run_detection.py** - Face detection on images
3. **run_recognition.py** - Face recognition (extract embeddings)
4. **run_face_search.py** - Real-time face matching with webcam
5. **sha256_generate.py** - Generate SHA256 checksums for models
---
## Testing Each Script
### 1. Test Model Download
```bash
# Download a specific model
python scripts/download_model.py --model MNET_V2
# Download all RetinaFace models (takes ~5 minutes, ~200MB)
python scripts/download_model.py
# Verify models are cached
ls -lh ~/.uniface/models/
```
**Expected Output:**
```
📥 Downloading model: retinaface_mnet_v2
2025-11-08 00:00:00 - INFO - Downloading model 'RetinaFaceWeights.MNET_V2' from https://...
Downloading ~/.uniface/models/retinaface_mnet_v2.onnx: 100%|████| 3.5M/3.5M
2025-11-08 00:00:05 - INFO - Successfully downloaded 'RetinaFaceWeights.MNET_V2'
✅ All requested weights are ready and verified.
```
---
### 2. Test Face Detection
```bash
# Basic detection
python scripts/run_detection.py --image assets/test.jpg
# With custom settings
python scripts/run_detection.py \
--image assets/test.jpg \
--method scrfd \
--threshold 0.7 \
--save_dir outputs
# Benchmark mode (100 iterations)
python scripts/run_detection.py \
--image assets/test.jpg \
--iterations 100
```
**Expected Output:**
```
Initializing detector: retinaface
2025-11-08 00:00:00 - INFO - Initializing RetinaFace with model=RetinaFaceWeights.MNET_V2...
2025-11-08 00:00:01 - INFO - CoreML acceleration enabled (Apple Silicon)
✅ Output saved at: outputs/test_out.jpg
[1/1] ⏱️ Inference time: 0.0234 seconds
```
**Verify Output:**
```bash
# Check output image was created
ls -lh outputs/test_out.jpg
# View the image (macOS)
open outputs/test_out.jpg
```
---
### 3. Test Face Recognition (Embedding Extraction)
```bash
# Extract embeddings from an image
python scripts/run_recognition.py --image assets/test.jpg
# With different models
python scripts/run_recognition.py \
--image assets/test.jpg \
--detector scrfd \
--recognizer mobileface
```
**Expected Output:**
```
Initializing detector: retinaface
Initializing recognizer: arcface
2025-11-08 00:00:00 - INFO - Successfully initialized face encoder from ~/.uniface/models/w600k_mbf.onnx
Detected 1 face(s). Extracting embeddings for the first face...
- Embedding shape: (1, 512)
- L2 norm of unnormalized embedding: 64.2341
- L2 norm of normalized embedding: 1.0000
```
---
### 4. Test Real-Time Face Search (Webcam)
**Prerequisites:**
- Webcam connected
- Reference image with a clear face
```bash
# Basic usage
python scripts/run_face_search.py --image assets/test.jpg
# With custom models
python scripts/run_face_search.py \
--image assets/test.jpg \
--detector scrfd \
--recognizer arcface
```
**Expected Behavior:**
1. Webcam window opens
2. Faces are detected in real-time
3. Green box = Match (similarity > 0.4)
4. Red box = Unknown (similarity < 0.4)
5. Press 'q' to quit
**Expected Output:**
```
Initializing models...
2025-11-08 00:00:00 - INFO - CoreML acceleration enabled (Apple Silicon)
Extracting reference embedding...
Webcam started. Press 'q' to quit.
```
**Troubleshooting:**
```bash
# If webcam doesn't open
python -c "import cv2; cap = cv2.VideoCapture(0); print('Webcam OK' if cap.isOpened() else 'Webcam FAIL')"
# If no faces detected
# - Ensure good lighting
# - Face should be frontal and clearly visible
# - Try lowering threshold: edit script line 29, change 0.4 to 0.3
```
---
### 5. Test SHA256 Generator (For Developers)
```bash
# Generate checksum for a model file
python scripts/sha256_generate.py ~/.uniface/models/retinaface_mnet_v2.onnx
# Generate for all models
for model in ~/.uniface/models/*.onnx; do
python scripts/sha256_generate.py "$model"
done
```
---
## 🔍 Quick Verification Tests
### Test 1: Imports Work
```bash
python -c "
from uniface.detection import create_detector
from uniface.recognition import create_recognizer
print('✅ Imports successful')
"
```
### Test 2: Models Download
```bash
python -c "
from uniface import RetinaFace
detector = RetinaFace()
print('✅ Model downloaded and loaded')
"
```
### Test 3: Detection Works
```bash
python -c "
import cv2
import numpy as np
from uniface import RetinaFace
detector = RetinaFace()
image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
faces = detector.detect(image)
print(f'✅ Detection works, found {len(faces)} faces')
"
```
### Test 4: Recognition Works
```bash
python -c "
import cv2
import numpy as np
from uniface import RetinaFace, ArcFace
detector = RetinaFace()
recognizer = ArcFace()
image = cv2.imread('assets/test.jpg')
faces = detector.detect(image)
if faces:
landmarks = np.array(faces[0]['landmarks'])
embedding = recognizer.get_normalized_embedding(image, landmarks)
print(f'✅ Recognition works, embedding shape: {embedding.shape}')
else:
print('⚠️ No faces detected in test image')
"
```
---
## End-to-End Test Workflow
Run this complete workflow to verify everything works:
```bash
#!/bin/bash
# Save as test_all_scripts.sh
echo "=== Testing UniFace Scripts ==="
echo ""
# Test 1: Download models
echo "1⃣ Testing model download..."
python scripts/download_model.py --model MNET_V2
if [ $? -eq 0 ]; then
echo "✅ Model download: PASS"
else
echo "❌ Model download: FAIL"
exit 1
fi
echo ""
# Test 2: Face detection
echo "2⃣ Testing face detection..."
python scripts/run_detection.py --image assets/test.jpg --save_dir /tmp/uniface_test
if [ $? -eq 0 ] && [ -f /tmp/uniface_test/test_out.jpg ]; then
echo "✅ Face detection: PASS"
else
echo "❌ Face detection: FAIL"
exit 1
fi
echo ""
# Test 3: Face recognition
echo "3⃣ Testing face recognition..."
python scripts/run_recognition.py --image assets/test.jpg > /tmp/uniface_recognition.log
if [ $? -eq 0 ] && grep -q "Embedding shape" /tmp/uniface_recognition.log; then
echo "✅ Face recognition: PASS"
else
echo "❌ Face recognition: FAIL"
exit 1
fi
echo ""
echo "=== All Tests Passed! 🎉 ==="
```
**Run the test suite:**
```bash
chmod +x test_all_scripts.sh
./test_all_scripts.sh
```
---
## Performance Benchmarking
### Benchmark Detection Speed
```bash
# Test different models
for model in retinaface scrfd; do
echo "Testing $model..."
python scripts/run_detection.py \
--image assets/test.jpg \
--method $model \
--iterations 50
done
```
### Benchmark Recognition Speed
```bash
# Test different recognizers
for recognizer in arcface mobileface; do
echo "Testing $recognizer..."
time python scripts/run_recognition.py \
--image assets/test.jpg \
--recognizer $recognizer
done
```
---
## 🐛 Common Issues
### Issue: "No module named 'uniface'"
```bash
# Solution: Install in editable mode
pip install -e .
```
### Issue: "Failed to load image"
```bash
# Check image exists
ls -lh assets/test.jpg
# Try with absolute path
python scripts/run_detection.py --image $(pwd)/assets/test.jpg
```
### Issue: "No faces detected"
```bash
# Lower confidence threshold
python scripts/run_detection.py \
--image assets/test.jpg \
--threshold 0.3
```
### Issue: Models downloading slowly
```bash
# Check internet connection
curl -I https://github.com/yakhyo/uniface/releases
# Or download manually
wget https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_mv2.onnx \
-O ~/.uniface/models/retinaface_mnet_v2.onnx
```
### Issue: CoreML not available on Mac
```bash
# Install CoreML-enabled ONNX Runtime
pip uninstall onnxruntime
pip install onnxruntime-silicon
# Verify
python -c "import onnxruntime as ort; print(ort.get_available_providers())"
# Should show: ['CoreMLExecutionProvider', 'CPUExecutionProvider']
```
---
## ✅ Script Status Summary
| Script | Status | API Updated | Tested |
|-----------------------|--------|-------------|--------|
| download_model.py | ✅ | ✅ | ✅ |
| run_detection.py | ✅ | ✅ | ✅ |
| run_recognition.py | ✅ | ✅ | ✅ |
| run_face_search.py | ✅ | ✅ | ✅ |
| sha256_generate.py | ✅ | N/A | ✅ |
All scripts are updated and working with the new dict-based API! 🎉
---
## 📝 Notes
- All scripts now use the factory functions (`create_detector`, `create_recognizer`)
- Scripts work with the new dict-based detection API
- Model download bug is fixed (enum vs string issue)
- CoreML acceleration is automatically detected on Apple Silicon
- All scripts include proper error handling
---
Need help with a specific script? Check the main [README.md](../README.md) or [QUICKSTART.md](../QUICKSTART.md)!

98
scripts/batch_process.py Normal file
View File

@@ -0,0 +1,98 @@
# Batch face detection on a folder of images
# Usage: python batch_process.py --input images/ --output results/
import argparse
from pathlib import Path
import cv2
from tqdm import tqdm
from uniface import SCRFD, RetinaFace
from uniface.visualization import draw_detections
def get_image_files(input_dir: Path, extensions: tuple) -> list:
files = []
for ext in extensions:
files.extend(input_dir.glob(f'*.{ext}'))
files.extend(input_dir.glob(f'*.{ext.upper()}'))
return sorted(files)
def process_image(detector, image_path: Path, output_path: Path, threshold: float) -> int:
"""Process single image. Returns face count or -1 on error."""
image = cv2.imread(str(image_path))
if image is None:
return -1
faces = detector.detect(image)
# unpack face data for visualization
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
draw_detections(
image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
)
cv2.putText(
image,
f'Faces: {len(faces)}',
(10, 30),
cv2.FONT_HERSHEY_SIMPLEX,
1,
(0, 255, 0),
2,
)
cv2.imwrite(str(output_path), image)
return len(faces)
def main():
parser = argparse.ArgumentParser(description='Batch process images with face detection')
parser.add_argument('--input', type=str, required=True, help='Input directory')
parser.add_argument('--output', type=str, required=True, help='Output directory')
parser.add_argument('--detector', type=str, default='retinaface', choices=['retinaface', 'scrfd'])
parser.add_argument('--threshold', type=float, default=0.6, help='Visualization threshold')
parser.add_argument('--extensions', type=str, default='jpg,jpeg,png,bmp', help='Image extensions')
args = parser.parse_args()
input_path = Path(args.input)
output_path = Path(args.output)
if not input_path.exists():
print(f"Error: Input directory '{args.input}' does not exist")
return
output_path.mkdir(parents=True, exist_ok=True)
extensions = tuple(ext.strip() for ext in args.extensions.split(','))
image_files = get_image_files(input_path, extensions)
if not image_files:
print(f'No images found with extensions {extensions}')
return
print(f'Found {len(image_files)} images')
detector = RetinaFace() if args.detector == 'retinaface' else SCRFD()
success, errors, total_faces = 0, 0, 0
for img_path in tqdm(image_files, desc='Processing', unit='img'):
out_path = output_path / f'{img_path.stem}_detected{img_path.suffix}'
result = process_image(detector, img_path, out_path, args.threshold)
if result >= 0:
success += 1
total_faces += result
else:
errors += 1
print(f'\nFailed: {img_path.name}')
print(f'\nDone! {success} processed, {errors} errors, {total_faces} faces total')
if __name__ == '__main__':
main()

View File

@@ -1,31 +1,60 @@
import argparse
from uniface.constants import RetinaFaceWeights
from uniface.constants import (
AgeGenderWeights,
ArcFaceWeights,
DDAMFNWeights,
LandmarkWeights,
MobileFaceWeights,
RetinaFaceWeights,
SCRFDWeights,
SphereFaceWeights,
)
from uniface.model_store import verify_model_weights
MODEL_TYPES = {
'retinaface': RetinaFaceWeights,
'sphereface': SphereFaceWeights,
'mobileface': MobileFaceWeights,
'arcface': ArcFaceWeights,
'scrfd': SCRFDWeights,
'ddamfn': DDAMFNWeights,
'agegender': AgeGenderWeights,
'landmark': LandmarkWeights,
}
def download_models(model_enum):
for weight in model_enum:
print(f'Downloading: {weight.value}')
try:
verify_model_weights(weight)
print(f' Done: {weight.value}')
except Exception as e:
print(f' Failed: {e}')
def main():
parser = argparse.ArgumentParser(description="Download and verify RetinaFace model weights.")
parser = argparse.ArgumentParser(description='Download model weights')
parser.add_argument(
"--model",
'--model-type',
type=str,
choices=[m.name for m in RetinaFaceWeights],
help="Model to download (e.g. MNET_V2). If not specified, all models will be downloaded.",
choices=list(MODEL_TYPES.keys()),
help='Model type to download. If not specified, downloads all.',
)
args = parser.parse_args()
if args.model:
weight = RetinaFaceWeights[args.model]
print(f"📥 Downloading model: {weight.value}")
verify_model_weights(weight) # Pass enum, not string
if args.model_type:
print(f'Downloading {args.model_type} models...')
download_models(MODEL_TYPES[args.model_type])
else:
print("📥 Downloading all models...")
for weight in RetinaFaceWeights:
verify_model_weights(weight) # Pass enum, not string
print('Downloading all models...')
for name, model_enum in MODEL_TYPES.items():
print(f'\n{name}:')
download_models(model_enum)
print("✅ All requested weights are ready and verified.")
print('\nDone!')
if __name__ == "__main__":
if __name__ == '__main__':
main()

130
scripts/run_age_gender.py Normal file
View File

@@ -0,0 +1,130 @@
# Age and gender prediction on detected faces
# Usage: python run_age_gender.py --image path/to/image.jpg
# python run_age_gender.py --webcam
import argparse
import os
from pathlib import Path
import cv2
from uniface import SCRFD, AgeGender, RetinaFace
from uniface.visualization import draw_detections
def draw_age_gender_label(image, bbox, gender_id: int, age: int):
"""Draw age/gender label above the bounding box."""
x1, y1 = int(bbox[0]), int(bbox[1])
gender_str = 'Female' if gender_id == 0 else 'Male'
text = f'{gender_str}, {age}y'
(tw, th), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 2)
cv2.rectangle(image, (x1, y1 - th - 10), (x1 + tw + 10, y1), (0, 255, 0), -1)
cv2.putText(image, text, (x1 + 5, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 2)
def process_image(
detector,
age_gender,
image_path: str,
save_dir: str = 'outputs',
threshold: float = 0.6,
):
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
return
faces = detector.detect(image)
print(f'Detected {len(faces)} face(s)')
if not faces:
return
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
draw_detections(
image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
)
for i, face in enumerate(faces):
gender_id, age = age_gender.predict(image, face['bbox'])
gender_str = 'Female' if gender_id == 0 else 'Male'
print(f' Face {i + 1}: {gender_str}, {age} years old')
draw_age_gender_label(image, face['bbox'], gender_id, age)
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, f'{Path(image_path).stem}_age_gender.jpg')
cv2.imwrite(output_path, image)
print(f'Output saved: {output_path}')
def run_webcam(detector, age_gender, threshold: float = 0.6):
cap = cv2.VideoCapture(0) # 0 = default webcam
if not cap.isOpened():
print('Cannot open webcam')
return
print("Press 'q' to quit")
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1) # mirror for natural interaction
if not ret:
break
faces = detector.detect(frame)
# unpack face data for visualization
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
draw_detections(
image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
)
for face in faces:
gender_id, age = age_gender.predict(frame, face['bbox']) # predict per face
draw_age_gender_label(frame, face['bbox'], gender_id, age)
cv2.putText(
frame,
f'Faces: {len(faces)}',
(10, 30),
cv2.FONT_HERSHEY_SIMPLEX,
1,
(0, 255, 0),
2,
)
cv2.imshow('Age & Gender Detection', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
def main():
parser = argparse.ArgumentParser(description='Run age and gender detection')
parser.add_argument('--image', type=str, help='Path to input image')
parser.add_argument('--webcam', action='store_true', help='Use webcam')
parser.add_argument('--detector', type=str, default='retinaface', choices=['retinaface', 'scrfd'])
parser.add_argument('--threshold', type=float, default=0.6, help='Visualization threshold')
parser.add_argument('--save_dir', type=str, default='outputs')
args = parser.parse_args()
if not args.image and not args.webcam:
parser.error('Either --image or --webcam must be specified')
detector = RetinaFace() if args.detector == 'retinaface' else SCRFD()
age_gender = AgeGender()
if args.webcam:
run_webcam(detector, age_gender, args.threshold)
else:
process_image(detector, age_gender, args.image, args.save_dir, args.threshold)
if __name__ == '__main__':
main()

View File

@@ -0,0 +1,207 @@
# Face anonymization/blurring for privacy
# Usage: python run_anonymization.py --image path/to/image.jpg --method pixelate
# python run_anonymization.py --webcam --method gaussian
import argparse
import os
import cv2
from uniface import RetinaFace
from uniface.privacy import BlurFace
def process_image(
detector,
blurrer: BlurFace,
image_path: str,
save_dir: str = 'outputs',
show_detections: bool = False,
):
"""Process a single image."""
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
return
# Detect faces
faces = detector.detect(image)
print(f'Detected {len(faces)} face(s)')
# Optionally draw detection boxes before blurring
if show_detections and faces:
from uniface.visualization import draw_detections
preview = image.copy()
bboxes = [face['bbox'] for face in faces]
scores = [face['confidence'] for face in faces]
landmarks = [face['landmarks'] for face in faces]
draw_detections(preview, bboxes, scores, landmarks)
# Show preview
cv2.imshow('Detections (Press any key to continue)', preview)
cv2.waitKey(0)
cv2.destroyAllWindows()
# Anonymize faces
if faces:
anonymized = blurrer.anonymize(image, faces)
else:
anonymized = image
# Save output
os.makedirs(save_dir, exist_ok=True)
basename = os.path.splitext(os.path.basename(image_path))[0]
output_path = os.path.join(save_dir, f'{basename}_anonymized.jpg')
cv2.imwrite(output_path, anonymized)
print(f'Output saved: {output_path}')
def run_webcam(detector, blurrer: BlurFace):
"""Run real-time anonymization on webcam."""
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print('Cannot open webcam')
return
print("Press 'q' to quit")
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1) # mirror for natural interaction
if not ret:
break
# Detect and anonymize
faces = detector.detect(frame)
if faces:
frame = blurrer.anonymize(frame, faces, inplace=True)
# Display info
cv2.putText(
frame,
f'Faces blurred: {len(faces)} | Method: {blurrer.method}',
(10, 30),
cv2.FONT_HERSHEY_SIMPLEX,
0.7,
(0, 255, 0),
2,
)
cv2.imshow('Face Anonymization (Press q to quit)', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
def main():
parser = argparse.ArgumentParser(
description='Face anonymization using various blur methods',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Anonymize image with pixelation (default)
python run_anonymization.py --image photo.jpg
# Use Gaussian blur with custom strength
python run_anonymization.py --image photo.jpg --method gaussian --blur-strength 5.0
# Real-time webcam anonymization
python run_anonymization.py --webcam --method pixelate
# Black boxes for maximum privacy
python run_anonymization.py --image photo.jpg --method blackout
# Custom pixelation intensity
python run_anonymization.py --image photo.jpg --method pixelate --pixel-blocks 5
""",
)
# Input/output
parser.add_argument('--image', type=str, help='Path to input image')
parser.add_argument('--webcam', action='store_true', help='Use webcam for real-time anonymization')
parser.add_argument('--save-dir', type=str, default='outputs', help='Output directory (default: outputs)')
# Blur method
parser.add_argument(
'--method',
type=str,
default='pixelate',
choices=['gaussian', 'pixelate', 'blackout', 'elliptical', 'median'],
help='Blur method (default: pixelate)',
)
# Method-specific parameters
parser.add_argument(
'--blur-strength',
type=float,
default=3.0,
help='Blur strength for gaussian/elliptical/median (default: 3.0)',
)
parser.add_argument(
'--pixel-blocks',
type=int,
default=20,
help='Number of pixel blocks for pixelate (default: 10, lower=more pixelated)',
)
parser.add_argument(
'--color',
type=str,
default='0,0,0',
help='Fill color for blackout as R,G,B (default: 0,0,0 for black)',
)
parser.add_argument('--margin', type=int, default=20, help='Margin for elliptical blur (default: 20)')
# Detection
parser.add_argument(
'--conf-thresh',
type=float,
default=0.5,
help='Detection confidence threshold (default: 0.5)',
)
# Visualization
parser.add_argument(
'--show-detections',
action='store_true',
help='Show detection boxes before blurring (image mode only)',
)
args = parser.parse_args()
# Validate input
if not args.image and not args.webcam:
parser.error('Either --image or --webcam must be specified')
# Parse color
color_values = [int(x) for x in args.color.split(',')]
if len(color_values) != 3:
parser.error('--color must be in format R,G,B (e.g., 0,0,0)')
color = tuple(color_values)
# Initialize detector
print(f'Initializing face detector (conf_thresh={args.conf_thresh})...')
detector = RetinaFace(conf_thresh=args.conf_thresh)
# Initialize blurrer
print(f'Initializing blur method: {args.method}')
blurrer = BlurFace(
method=args.method,
blur_strength=args.blur_strength,
pixel_blocks=args.pixel_blocks,
color=color,
margin=args.margin,
)
# Run
if args.webcam:
run_webcam(detector, blurrer)
else:
process_image(detector, blurrer, args.image, args.save_dir, args.show_detections)
if __name__ == '__main__':
main()

View File

@@ -1,87 +1,110 @@
import os
import cv2
import time
import argparse
import numpy as np
# Face detection on image or webcam
# Usage: python run_detection.py --image path/to/image.jpg
# python run_detection.py --webcam
# UPDATED: Use the factory function and import from the new location
from uniface.detection import create_detector
import argparse
import os
import cv2
from uniface.detection import SCRFD, RetinaFace, YOLOv5Face
from uniface.visualization import draw_detections
def run_inference(detector, image_path: str, vis_threshold: float = 0.6, save_dir: str = "outputs"):
"""
Run face detection on a single image.
Args:
detector: Initialized face detector.
image_path (str): Path to input image.
vis_threshold (float): Threshold for drawing detections.
save_dir (str): Directory to save output image.
"""
def process_image(detector, image_path: str, threshold: float = 0.6, save_dir: str = 'outputs'):
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
print(f"Error: Failed to load image from '{image_path}'")
return
# 1. Get the list of face dictionaries from the detector
faces = detector.detect(image)
if faces:
# 2. Unpack the data into separate lists
bboxes = [face['bbox'] for face in faces]
scores = [face['confidence'] for face in faces]
landmarks = [face['landmarks'] for face in faces]
# 3. Pass the unpacked lists to the drawing function
draw_detections(image, bboxes, scores, landmarks, vis_threshold=0.6)
draw_detections(image, bboxes, scores, landmarks, vis_threshold=threshold)
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, f"{os.path.splitext(os.path.basename(image_path))[0]}_out.jpg")
output_path = os.path.join(save_dir, f'{os.path.splitext(os.path.basename(image_path))[0]}_out.jpg')
cv2.imwrite(output_path, image)
print(f"Output saved at: {output_path}")
print(f'Output saved: {output_path}')
def run_webcam(detector, threshold: float = 0.6):
cap = cv2.VideoCapture(0) # 0 = default webcam
if not cap.isOpened():
print('Cannot open webcam')
return
print("Press 'q' to quit")
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1) # mirror for natural interaction
if not ret:
break
faces = detector.detect(frame)
# unpack face data for visualization
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
draw_detections(
image=frame,
bboxes=bboxes,
scores=scores,
landmarks=landmarks,
vis_threshold=threshold,
draw_score=True,
fancy_bbox=True,
)
cv2.putText(
frame,
f'Faces: {len(faces)}',
(10, 30),
cv2.FONT_HERSHEY_SIMPLEX,
1,
(0, 255, 0),
2,
)
cv2.imshow('Face Detection', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
def main():
parser = argparse.ArgumentParser(description="Run face detection on an image.")
parser.add_argument("--image", type=str, required=True, help="Path to the input image")
parser.add_argument(
"--method",
type=str,
default="retinaface",
choices=['retinaface', 'scrfd'],
help="Detection method to use."
)
parser.add_argument("--threshold", type=float, default=0.6, help="Visualization confidence threshold")
parser.add_argument("--iterations", type=int, default=1, help="Number of inference runs for benchmarking")
parser.add_argument("--save_dir", type=str, default="outputs", help="Directory to save output images")
parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
parser = argparse.ArgumentParser(description='Run face detection')
parser.add_argument('--image', type=str, help='Path to input image')
parser.add_argument('--webcam', action='store_true', help='Use webcam')
parser.add_argument('--method', type=str, default='retinaface', choices=['retinaface', 'scrfd', 'yolov5face'])
parser.add_argument('--threshold', type=float, default=0.25, help='Visualization threshold')
parser.add_argument('--save_dir', type=str, default='outputs')
args = parser.parse_args()
if args.verbose:
from uniface import enable_logging
enable_logging()
if not args.image and not args.webcam:
parser.error('Either --image or --webcam must be specified')
print(f"Initializing detector: {args.method}")
detector = create_detector(method=args.method)
if args.method == 'retinaface':
detector = RetinaFace()
elif args.method == 'scrfd':
detector = SCRFD()
else:
from uniface.constants import YOLOv5FaceWeights
avg_time = 0
for i in range(args.iterations):
start = time.time()
run_inference(detector, args.image, args.threshold, args.save_dir)
elapsed = time.time() - start
print(f"[{i + 1}/{args.iterations}] ⏱️ Inference time: {elapsed:.4f} seconds")
if i >= 0: # Avoid counting the first run if it includes model loading time
avg_time += elapsed
detector = YOLOv5Face(model_name=YOLOv5FaceWeights.YOLOV5M)
if args.iterations > 1:
# Adjust average calculation to exclude potential first-run overhead
effective_iterations = max(1, args.iterations)
print(
f"\n🔥 Average inference time over {effective_iterations} runs: {avg_time / effective_iterations:.4f} seconds")
if args.webcam:
run_webcam(detector, args.threshold)
else:
process_image(detector, args.image, args.threshold, args.save_dir)
if __name__ == "__main__":
if __name__ == '__main__':
main()

126
scripts/run_emotion.py Normal file
View File

@@ -0,0 +1,126 @@
# Emotion detection on detected faces
# Usage: python run_emotion.py --image path/to/image.jpg
# python run_emotion.py --webcam
import argparse
import os
from pathlib import Path
import cv2
from uniface import SCRFD, Emotion, RetinaFace
from uniface.visualization import draw_detections
def draw_emotion_label(image, bbox, emotion: str, confidence: float):
"""Draw emotion label above the bounding box."""
x1, y1 = int(bbox[0]), int(bbox[1])
text = f'{emotion} ({confidence:.2f})'
(tw, th), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 2)
cv2.rectangle(image, (x1, y1 - th - 10), (x1 + tw + 10, y1), (255, 0, 0), -1)
cv2.putText(image, text, (x1 + 5, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2)
def process_image(
detector,
emotion_predictor,
image_path: str,
save_dir: str = 'outputs',
threshold: float = 0.6,
):
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
return
faces = detector.detect(image)
print(f'Detected {len(faces)} face(s)')
if not faces:
return
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
draw_detections(
image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
)
for i, face in enumerate(faces):
emotion, confidence = emotion_predictor.predict(image, face['landmarks'])
print(f' Face {i + 1}: {emotion} (confidence: {confidence:.3f})')
draw_emotion_label(image, face['bbox'], emotion, confidence)
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, f'{Path(image_path).stem}_emotion.jpg')
cv2.imwrite(output_path, image)
print(f'Output saved: {output_path}')
def run_webcam(detector, emotion_predictor, threshold: float = 0.6):
cap = cv2.VideoCapture(0) # 0 = default webcam
if not cap.isOpened():
print('Cannot open webcam')
return
print("Press 'q' to quit")
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1) # mirror for natural interaction
if not ret:
break
faces = detector.detect(frame)
# unpack face data for visualization
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
draw_detections(frame, bboxes, scores, landmarks, vis_threshold=threshold)
for face in faces:
emotion, confidence = emotion_predictor.predict(frame, face['landmarks'])
draw_emotion_label(frame, face['bbox'], emotion, confidence)
cv2.putText(
frame,
f'Faces: {len(faces)}',
(10, 30),
cv2.FONT_HERSHEY_SIMPLEX,
1,
(0, 255, 0),
2,
)
cv2.imshow('Emotion Detection', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
def main():
parser = argparse.ArgumentParser(description='Run emotion detection')
parser.add_argument('--image', type=str, help='Path to input image')
parser.add_argument('--webcam', action='store_true', help='Use webcam')
parser.add_argument('--detector', type=str, default='retinaface', choices=['retinaface', 'scrfd'])
parser.add_argument('--threshold', type=float, default=0.6, help='Visualization threshold')
parser.add_argument('--save_dir', type=str, default='outputs')
args = parser.parse_args()
if not args.image and not args.webcam:
parser.error('Either --image or --webcam must be specified')
detector = RetinaFace() if args.detector == 'retinaface' else SCRFD()
emotion_predictor = Emotion()
if args.webcam:
run_webcam(detector, emotion_predictor, args.threshold)
else:
process_image(detector, emotion_predictor, args.image, args.save_dir, args.threshold)
if __name__ == '__main__':
main()

View File

@@ -0,0 +1,116 @@
# Face analysis using FaceAnalyzer
# Usage: python run_face_analyzer.py --image path/to/image.jpg
import argparse
import os
from pathlib import Path
import cv2
import numpy as np
from uniface import AgeGender, ArcFace, FaceAnalyzer, RetinaFace
from uniface.visualization import draw_detections
def draw_face_info(image, face, face_id):
"""Draw face ID and attributes above bounding box."""
x1, y1, x2, y2 = map(int, face.bbox)
lines = [f'ID: {face_id}', f'Conf: {face.confidence:.2f}']
if face.age and face.sex:
lines.append(f'{face.sex}, {face.age}y')
for i, line in enumerate(lines):
y_pos = y1 - 10 - (len(lines) - 1 - i) * 25
if y_pos < 20:
y_pos = y2 + 20 + i * 25
(tw, th), _ = cv2.getTextSize(line, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 2)
cv2.rectangle(image, (x1, y_pos - th - 5), (x1 + tw + 10, y_pos + 5), (0, 255, 0), -1)
cv2.putText(image, line, (x1 + 5, y_pos), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 2)
def process_image(analyzer, image_path: str, save_dir: str = 'outputs', show_similarity: bool = True):
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
return
faces = analyzer.analyze(image)
print(f'Detected {len(faces)} face(s)')
if not faces:
return
for i, face in enumerate(faces, 1):
info = f' Face {i}: {face.sex}, {face.age}y' if face.age and face.sex else f' Face {i}'
if face.embedding is not None:
info += f' (embedding: {face.embedding.shape})'
print(info)
if show_similarity and len(faces) >= 2:
print('\nSimilarity Matrix:')
n = len(faces)
sim_matrix = np.zeros((n, n))
for i in range(n):
for j in range(i, n):
if i == j:
sim_matrix[i][j] = 1.0
else:
sim = faces[i].compute_similarity(faces[j])
sim_matrix[i][j] = sim
sim_matrix[j][i] = sim
print(' ', end='')
for i in range(n):
print(f' F{i + 1:2d} ', end='')
print('\n ' + '-' * (7 * n))
for i in range(n):
print(f'F{i + 1:2d} | ', end='')
for j in range(n):
print(f'{sim_matrix[i][j]:6.3f} ', end='')
print()
pairs = [(i, j, sim_matrix[i][j]) for i in range(n) for j in range(i + 1, n)]
pairs.sort(key=lambda x: x[2], reverse=True)
print('\nTop matches (>0.4 = same person):')
for i, j, sim in pairs[:3]:
status = 'Same' if sim > 0.4 else 'Different'
print(f' Face {i + 1} ↔ Face {j + 1}: {sim:.3f} ({status})')
bboxes = [f.bbox for f in faces]
scores = [f.confidence for f in faces]
landmarks = [f.landmarks for f in faces]
draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, fancy_bbox=True)
for i, face in enumerate(faces, 1):
draw_face_info(image, face, i)
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, f'{Path(image_path).stem}_analysis.jpg')
cv2.imwrite(output_path, image)
print(f'Output saved: {output_path}')
def main():
parser = argparse.ArgumentParser(description='Face analysis with detection, recognition, and attributes')
parser.add_argument('--image', type=str, required=True, help='Path to input image')
parser.add_argument('--save_dir', type=str, default='outputs', help='Output directory')
parser.add_argument('--no-similarity', action='store_true', help='Skip similarity matrix computation')
args = parser.parse_args()
if not os.path.exists(args.image):
print(f'Error: Image not found: {args.image}')
return
detector = RetinaFace()
recognizer = ArcFace()
age_gender = AgeGender()
analyzer = FaceAnalyzer(detector, recognizer, age_gender)
process_image(analyzer, args.image, args.save_dir, show_similarity=not args.no_similarity)
if __name__ == '__main__':
main()

126
scripts/run_face_parsing.py Normal file
View File

@@ -0,0 +1,126 @@
# Face parsing on detected faces
# Usage: python run_face_parsing.py --image path/to/image.jpg
# python run_face_parsing.py --webcam
import argparse
import os
from pathlib import Path
import cv2
from uniface import RetinaFace
from uniface.constants import ParsingWeights
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps
def process_image(detector, parser, image_path: str, save_dir: str = 'outputs'):
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
return
faces = detector.detect(image)
print(f'Detected {len(faces)} face(s)')
result_image = image.copy()
for i, face in enumerate(faces):
bbox = face['bbox']
x1, y1, x2, y2 = map(int, bbox[:4])
face_crop = image[y1:y2, x1:x2]
if face_crop.size == 0:
continue
# Parse the face
mask = parser.parse(face_crop)
print(f' Face {i + 1}: parsed with {len(set(mask.flatten()))} unique classes')
# Visualize the parsing result
face_crop_rgb = cv2.cvtColor(face_crop, cv2.COLOR_BGR2RGB)
vis_result = vis_parsing_maps(face_crop_rgb, mask, save_image=False)
# Place the visualization back on the original image
result_image[y1:y2, x1:x2] = vis_result
# Draw bounding box
cv2.rectangle(result_image, (x1, y1), (x2, y2), (0, 255, 0), 2)
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, f'{Path(image_path).stem}_parsing.jpg')
cv2.imwrite(output_path, result_image)
print(f'Output saved: {output_path}')
def run_webcam(detector, parser):
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print('Cannot open webcam')
return
print("Press 'q' to quit")
while True:
ret, frame = cap.read()
if not ret:
break
frame = cv2.flip(frame, 1)
faces = detector.detect(frame)
for face in faces:
bbox = face['bbox']
x1, y1, x2, y2 = map(int, bbox[:4])
face_crop = frame[y1:y2, x1:x2]
if face_crop.size == 0:
continue
# Parse the face
mask = parser.parse(face_crop)
# Visualize the parsing result
face_crop_rgb = cv2.cvtColor(face_crop, cv2.COLOR_BGR2RGB)
vis_result = vis_parsing_maps(face_crop_rgb, mask, save_image=False)
# Place the visualization back on the frame
frame[y1:y2, x1:x2] = vis_result
# Draw bounding box
cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
cv2.imshow('Face Parsing', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
def main():
parser_arg = argparse.ArgumentParser(description='Run face parsing')
parser_arg.add_argument('--image', type=str, help='Path to input image')
parser_arg.add_argument('--webcam', action='store_true', help='Use webcam')
parser_arg.add_argument('--save_dir', type=str, default='outputs')
parser_arg.add_argument(
'--model', type=str, default=ParsingWeights.RESNET18, choices=[ParsingWeights.RESNET18, ParsingWeights.RESNET34]
)
args = parser_arg.parse_args()
if not args.image and not args.webcam:
parser_arg.error('Either --image or --webcam must be specified')
detector = RetinaFace()
parser = BiSeNet(model_name=ParsingWeights.RESNET34)
if args.webcam:
run_webcam(detector, parser)
else:
process_image(detector, parser, args.image, args.save_dir)
if __name__ == '__main__':
main()

View File

@@ -1,68 +1,70 @@
# Real-time face search: match webcam faces against a reference image
# Usage: python run_face_search.py --image reference.jpg
import argparse
import cv2
import numpy as np
# Use the new high-level factory functions
from uniface.detection import create_detector
from uniface.detection import SCRFD, RetinaFace
from uniface.face_utils import compute_similarity
from uniface.recognition import create_recognizer
from uniface.recognition import ArcFace, MobileFace, SphereFace
def get_recognizer(name: str):
if name == 'arcface':
return ArcFace()
elif name == 'mobileface':
return MobileFace()
else:
return SphereFace()
def extract_reference_embedding(detector, recognizer, image_path: str) -> np.ndarray:
"""Extracts a normalized embedding from the first face found in an image."""
image = cv2.imread(image_path)
if image is None:
raise RuntimeError(f"Failed to load image: {image_path}")
raise RuntimeError(f'Failed to load image: {image_path}')
faces = detector.detect(image)
if not faces:
raise RuntimeError("No faces found in reference image.")
raise RuntimeError('No faces found in reference image.')
# Get landmarks from the first detected face dictionary
landmarks = np.array(faces[0]["landmarks"])
# Use normalized embedding for more reliable similarity comparison
embedding = recognizer.get_normalized_embedding(image, landmarks)
return embedding
landmarks = faces[0]['landmarks']
return recognizer.get_normalized_embedding(image, landmarks)
def run_video(detector, recognizer, ref_embedding: np.ndarray, threshold: float = 0.4):
"""Run real-time face recognition from a webcam feed."""
cap = cv2.VideoCapture(0)
def run_webcam(detector, recognizer, ref_embedding: np.ndarray, threshold: float = 0.4):
cap = cv2.VideoCapture(0) # 0 = default webcam
if not cap.isOpened():
raise RuntimeError("Webcam could not be opened.")
print("Webcam started. Press 'q' to quit.")
raise RuntimeError('Webcam could not be opened.')
print("Press 'q' to quit")
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1) # mirror for natural interaction
if not ret:
break
faces = detector.detect(frame)
# Loop through each detected face
for face in faces:
# Extract bbox and landmarks from the dictionary
bbox = face["bbox"]
landmarks = np.array(face["landmarks"])
bbox = face['bbox']
landmarks = face['landmarks']
x1, y1, x2, y2 = map(int, bbox)
# Get the normalized embedding for the current face
embedding = recognizer.get_normalized_embedding(frame, landmarks)
sim = compute_similarity(ref_embedding, embedding) # compare with reference
# Compare with the reference embedding
sim = compute_similarity(ref_embedding, embedding)
# Draw results
label = f"Match ({sim:.2f})" if sim > threshold else f"Unknown ({sim:.2f})"
# green = match, red = unknown
label = f'Match ({sim:.2f})' if sim > threshold else f'Unknown ({sim:.2f})'
color = (0, 255, 0) if sim > threshold else (0, 0, 255)
cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
cv2.putText(frame, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
cv2.imshow("Face Recognition", frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
cv2.imshow('Face Recognition', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
@@ -70,35 +72,26 @@ def run_video(detector, recognizer, ref_embedding: np.ndarray, threshold: float
def main():
parser = argparse.ArgumentParser(description="Face recognition using a reference image.")
parser.add_argument("--image", type=str, required=True, help="Path to the reference face image.")
parser = argparse.ArgumentParser(description='Face search using a reference image')
parser.add_argument('--image', type=str, required=True, help='Reference face image')
parser.add_argument('--threshold', type=float, default=0.4, help='Match threshold')
parser.add_argument('--detector', type=str, default='scrfd', choices=['retinaface', 'scrfd'])
parser.add_argument(
"--detector", type=str, default="scrfd", choices=["retinaface", "scrfd"], help="Face detection method."
)
parser.add_argument(
"--recognizer",
'--recognizer',
type=str,
default="arcface",
choices=["arcface", "mobileface", "sphereface"],
help="Face recognition method.",
default='arcface',
choices=['arcface', 'mobileface', 'sphereface'],
)
parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
args = parser.parse_args()
if args.verbose:
from uniface import enable_logging
detector = RetinaFace() if args.detector == 'retinaface' else SCRFD()
recognizer = get_recognizer(args.recognizer)
enable_logging()
print("Initializing models...")
detector = create_detector(method=args.detector)
recognizer = create_recognizer(method=args.recognizer)
print("Extracting reference embedding...")
print(f'Loading reference: {args.image}')
ref_embedding = extract_reference_embedding(detector, recognizer, args.image)
run_video(detector, recognizer, ref_embedding)
run_webcam(detector, recognizer, ref_embedding, args.threshold)
if __name__ == "__main__":
if __name__ == '__main__':
main()

View File

@@ -0,0 +1,104 @@
# Gaze estimation on detected faces
# Usage: python run_gaze_estimation.py --image path/to/image.jpg
# python run_gaze_estimation.py --webcam
import argparse
import os
from pathlib import Path
import cv2
import numpy as np
from uniface import RetinaFace
from uniface.gaze import MobileGaze
from uniface.visualization import draw_gaze
def process_image(detector, gaze_estimator, image_path: str, save_dir: str = 'outputs'):
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
return
faces = detector.detect(image)
print(f'Detected {len(faces)} face(s)')
for i, face in enumerate(faces):
bbox = face['bbox']
x1, y1, x2, y2 = map(int, bbox[:4])
face_crop = image[y1:y2, x1:x2]
if face_crop.size == 0:
continue
pitch, yaw = gaze_estimator.estimate(face_crop)
print(f' Face {i + 1}: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°')
# Draw both bbox and gaze arrow with angle text
draw_gaze(image, bbox, pitch, yaw, draw_angles=True)
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, f'{Path(image_path).stem}_gaze.jpg')
cv2.imwrite(output_path, image)
print(f'Output saved: {output_path}')
def run_webcam(detector, gaze_estimator):
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print('Cannot open webcam')
return
print("Press 'q' to quit")
while True:
ret, frame = cap.read()
if not ret:
break
frame = cv2.flip(frame, 1)
faces = detector.detect(frame)
for face in faces:
bbox = face['bbox']
x1, y1, x2, y2 = map(int, bbox[:4])
face_crop = frame[y1:y2, x1:x2]
if face_crop.size == 0:
continue
pitch, yaw = gaze_estimator.estimate(face_crop)
# Draw both bbox and gaze arrow
draw_gaze(frame, bbox, pitch, yaw)
cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
cv2.imshow('Gaze Estimation', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
def main():
parser = argparse.ArgumentParser(description='Run gaze estimation')
parser.add_argument('--image', type=str, help='Path to input image')
parser.add_argument('--webcam', action='store_true', help='Use webcam')
parser.add_argument('--save_dir', type=str, default='outputs')
args = parser.parse_args()
if not args.image and not args.webcam:
parser.error('Either --image or --webcam must be specified')
detector = RetinaFace()
gaze_estimator = MobileGaze()
if args.webcam:
run_webcam(detector, gaze_estimator)
else:
process_image(detector, gaze_estimator, args.image, args.save_dir)
if __name__ == '__main__':
main()

117
scripts/run_landmarks.py Normal file
View File

@@ -0,0 +1,117 @@
# 106-point facial landmark detection
# Usage: python run_landmarks.py --image path/to/image.jpg
# python run_landmarks.py --webcam
import argparse
import os
from pathlib import Path
import cv2
from uniface import SCRFD, Landmark106, RetinaFace
def process_image(detector, landmarker, image_path: str, save_dir: str = 'outputs'):
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
return
faces = detector.detect(image)
print(f'Detected {len(faces)} face(s)')
if not faces:
return
for i, face in enumerate(faces):
bbox = face['bbox']
x1, y1, x2, y2 = map(int, bbox)
cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
landmarks = landmarker.get_landmarks(image, bbox)
print(f' Face {i + 1}: {len(landmarks)} landmarks')
for x, y in landmarks.astype(int):
cv2.circle(image, (x, y), 1, (0, 255, 0), -1)
cv2.putText(
image,
f'Face {i + 1}',
(x1, y1 - 10),
cv2.FONT_HERSHEY_SIMPLEX,
0.5,
(0, 255, 0),
2,
)
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, f'{Path(image_path).stem}_landmarks.jpg')
cv2.imwrite(output_path, image)
print(f'Output saved: {output_path}')
def run_webcam(detector, landmarker):
cap = cv2.VideoCapture(0) # 0 = default webcam
if not cap.isOpened():
print('Cannot open webcam')
return
print("Press 'q' to quit")
while True:
ret, frame = cap.read()
frame = cv2.flip(frame, 1) # mirror for natural interaction
if not ret:
break
faces = detector.detect(frame)
for face in faces:
bbox = face['bbox']
x1, y1, x2, y2 = map(int, bbox)
cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
landmarks = landmarker.get_landmarks(frame, bbox) # 106 points
for x, y in landmarks.astype(int):
cv2.circle(frame, (x, y), 1, (0, 255, 0), -1)
cv2.putText(
frame,
f'Faces: {len(faces)}',
(10, 30),
cv2.FONT_HERSHEY_SIMPLEX,
1,
(0, 255, 0),
2,
)
cv2.imshow('106-Point Landmarks', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
def main():
parser = argparse.ArgumentParser(description='Run facial landmark detection')
parser.add_argument('--image', type=str, help='Path to input image')
parser.add_argument('--webcam', action='store_true', help='Use webcam')
parser.add_argument('--detector', type=str, default='retinaface', choices=['retinaface', 'scrfd'])
parser.add_argument('--save_dir', type=str, default='outputs')
args = parser.parse_args()
if not args.image and not args.webcam:
parser.error('Either --image or --webcam must be specified')
detector = RetinaFace() if args.detector == 'retinaface' else SCRFD()
landmarker = Landmark106()
if args.webcam:
run_webcam(detector, landmarker)
else:
process_image(detector, landmarker, args.image, args.save_dir)
if __name__ == '__main__':
main()

View File

@@ -1,81 +1,103 @@
import cv2
# Face recognition: extract embeddings or compare two faces
# Usage: python run_recognition.py --image path/to/image.jpg
# python run_recognition.py --image1 face1.jpg --image2 face2.jpg
import argparse
import cv2
import numpy as np
# Use the new high-level factory functions for consistency
from uniface.detection import create_detector
from uniface.recognition import create_recognizer
from uniface.detection import SCRFD, RetinaFace
from uniface.face_utils import compute_similarity
from uniface.recognition import ArcFace, MobileFace, SphereFace
def get_recognizer(name: str):
if name == 'arcface':
return ArcFace()
elif name == 'mobileface':
return MobileFace()
else:
return SphereFace()
def run_inference(detector, recognizer, image_path: str):
"""
Detect faces and extract embeddings from a single image.
Args:
detector: Initialized face detector.
recognizer: Initialized face recognition model.
image_path (str): Path to the input image.
"""
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
return
faces = detector.detect(image)
if not faces:
print("No faces detected.")
print('No faces detected.')
return
print(f"Detected {len(faces)} face(s). Extracting embeddings for the first face...")
print(f'Detected {len(faces)} face(s). Extracting embedding for the first face...')
# Process the first detected face
first_face = faces[0]
landmarks = np.array(first_face['landmarks']) # Convert landmarks to numpy array
# Extract embedding using the landmarks from the face dictionary
landmarks = faces[0]['landmarks'] # 5-point landmarks for alignment (already np.ndarray)
embedding = recognizer.get_embedding(image, landmarks)
norm_embedding = recognizer.get_normalized_embedding(image, landmarks)
norm_embedding = recognizer.get_normalized_embedding(image, landmarks) # L2 normalized
# Print some info about the embeddings
print(f" - Embedding shape: {embedding.shape}")
print(f" - L2 norm of unnormalized embedding: {np.linalg.norm(embedding):.4f}")
print(f" - L2 norm of normalized embedding: {np.linalg.norm(norm_embedding):.4f}")
print(f' Embedding shape: {embedding.shape}')
print(f' L2 norm (raw): {np.linalg.norm(embedding):.4f}')
print(f' L2 norm (normalized): {np.linalg.norm(norm_embedding):.4f}')
def compare_faces(detector, recognizer, image1_path: str, image2_path: str, threshold: float = 0.35):
img1 = cv2.imread(image1_path)
img2 = cv2.imread(image2_path)
if img1 is None or img2 is None:
print('Error: Failed to load one or both images')
return
faces1 = detector.detect(img1)
faces2 = detector.detect(img2)
if not faces1 or not faces2:
print('Error: No faces detected in one or both images')
return
landmarks1 = faces1[0]['landmarks']
landmarks2 = faces2[0]['landmarks']
embedding1 = recognizer.get_normalized_embedding(img1, landmarks1)
embedding2 = recognizer.get_normalized_embedding(img2, landmarks2)
# cosine similarity for normalized embeddings
similarity = compute_similarity(embedding1, embedding2, normalized=True)
is_match = similarity > threshold
print(f'Similarity: {similarity:.4f}')
print(f'Result: {"Same person" if is_match else "Different person"} (threshold: {threshold})')
def main():
parser = argparse.ArgumentParser(description="Extract face embeddings from a single image.")
parser.add_argument("--image", type=str, required=True, help="Path to the input image.")
parser = argparse.ArgumentParser(description='Face recognition and comparison')
parser.add_argument('--image', type=str, help='Single image for embedding extraction')
parser.add_argument('--image1', type=str, help='First image for comparison')
parser.add_argument('--image2', type=str, help='Second image for comparison')
parser.add_argument('--threshold', type=float, default=0.35, help='Similarity threshold')
parser.add_argument('--detector', type=str, default='retinaface', choices=['retinaface', 'scrfd'])
parser.add_argument(
"--detector",
'--recognizer',
type=str,
default="retinaface",
choices=['retinaface', 'scrfd'],
help="Face detection method to use."
)
parser.add_argument(
"--recognizer",
type=str,
default="arcface",
default='arcface',
choices=['arcface', 'mobileface', 'sphereface'],
help="Face recognition method to use."
)
parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
args = parser.parse_args()
if args.verbose:
from uniface import enable_logging
enable_logging()
detector = RetinaFace() if args.detector == 'retinaface' else SCRFD()
recognizer = get_recognizer(args.recognizer)
print(f"Initializing detector: {args.detector}")
detector = create_detector(method=args.detector)
print(f"Initializing recognizer: {args.recognizer}")
recognizer = create_recognizer(method=args.recognizer)
run_inference(detector, recognizer, args.image)
if args.image1 and args.image2:
compare_faces(detector, recognizer, args.image1, args.image2, args.threshold)
elif args.image:
run_inference(detector, recognizer, args.image)
else:
print('Error: Provide --image or both --image1 and --image2')
parser.print_help()
if __name__ == "__main__":
if __name__ == '__main__':
main()

201
scripts/run_spoofing.py Normal file
View File

@@ -0,0 +1,201 @@
# Face Anti-Spoofing Detection
# Usage:
# Image: python run_spoofing.py --image path/to/image.jpg
# Video: python run_spoofing.py --video path/to/video.mp4
# Webcam: python run_spoofing.py --source 0
import argparse
import os
from pathlib import Path
import cv2
import numpy as np
from uniface import RetinaFace
from uniface.constants import MiniFASNetWeights
from uniface.spoofing import create_spoofer
def draw_spoofing_result(
image: np.ndarray,
bbox: list,
label_idx: int,
score: float,
thickness: int = 2,
) -> None:
"""Draw bounding box with anti-spoofing result.
Args:
image: Input image to draw on.
bbox: Bounding box in [x1, y1, x2, y2] format.
label_idx: Prediction label index (0 = Fake, 1 = Real).
score: Confidence score (0.0 to 1.0).
thickness: Line thickness for bounding box.
"""
x1, y1, x2, y2 = map(int, bbox[:4])
# Color based on result (green for real, red for fake)
is_real = label_idx == 1
color = (0, 255, 0) if is_real else (0, 0, 255)
# Draw bounding box
cv2.rectangle(image, (x1, y1), (x2, y2), color, thickness)
# Prepare label
label = 'Real' if is_real else 'Fake'
text = f'{label}: {score:.1%}'
# Draw label background
(tw, th), baseline = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.7, 2)
cv2.rectangle(image, (x1, y1 - th - 10), (x1 + tw + 10, y1), color, -1)
# Draw label text
cv2.putText(image, text, (x1 + 5, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
def process_image(detector, spoofer, image_path: str, save_dir: str = 'outputs') -> None:
"""Process a single image for face anti-spoofing detection."""
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
return
# Detect faces
faces = detector.detect(image)
print(f'Detected {len(faces)} face(s)')
if not faces:
print('No faces detected in the image.')
return
# Run anti-spoofing on each face
for i, face in enumerate(faces, 1):
label_idx, score = spoofer.predict(image, face['bbox'])
# label_idx: 0 = Fake, 1 = Real
label = 'Real' if label_idx == 1 else 'Fake'
print(f' Face {i}: {label} ({score:.1%})')
# Draw result on image
draw_spoofing_result(image, face['bbox'], label_idx, score)
# Save output
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, f'{Path(image_path).stem}_spoofing.jpg')
cv2.imwrite(output_path, image)
print(f'Output saved: {output_path}')
def process_video(detector, spoofer, source, save_dir: str = 'outputs') -> None:
"""Process video or webcam stream for face anti-spoofing detection."""
# Handle webcam or video file
if isinstance(source, int) or source.isdigit():
cap = cv2.VideoCapture(int(source))
is_webcam = True
output_name = 'webcam_spoofing.mp4'
else:
cap = cv2.VideoCapture(source)
is_webcam = False
output_name = f'{Path(source).stem}_spoofing.mp4'
if not cap.isOpened():
print(f'Error: Failed to open video source: {source}')
return
# Get video properties
fps = int(cap.get(cv2.CAP_PROP_FPS)) if not is_webcam else 30
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
# Setup video writer
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, output_name)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
writer = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
print("Processing video... Press 'q' to quit")
frame_count = 0
try:
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
frame_count += 1
# Detect faces
faces = detector.detect(frame)
# Run anti-spoofing on each face
for face in faces:
label_idx, score = spoofer.predict(frame, face['bbox'])
draw_spoofing_result(frame, face['bbox'], label_idx, score)
# Write frame
writer.write(frame)
# Display frame
cv2.imshow('Face Anti-Spoofing', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
print('Stopped by user.')
break
finally:
cap.release()
writer.release()
cv2.destroyAllWindows()
print(f'Processed {frame_count} frames')
if not is_webcam:
print(f'Output saved: {output_path}')
def main():
parser = argparse.ArgumentParser(description='Face Anti-Spoofing Detection')
parser.add_argument('--image', type=str, help='Path to input image')
parser.add_argument('--video', type=str, help='Path to input video')
parser.add_argument('--source', type=str, help='Video source (0 for webcam)')
parser.add_argument(
'--model',
type=str,
default='v2',
choices=['v1se', 'v2'],
help='Model variant: v1se or v2 (default: v2)',
)
parser.add_argument('--scale', type=float, default=None, help='Custom crop scale (default: auto)')
parser.add_argument('--save_dir', type=str, default='outputs', help='Output directory')
args = parser.parse_args()
# Check that at least one input source is provided
if not any([args.image, args.video, args.source]):
parser.print_help()
print('\nError: Please provide --image, --video, or --source')
return
# Select model variant
model_name = MiniFASNetWeights.V1SE if args.model == 'v1se' else MiniFASNetWeights.V2
# Initialize models
print(f'Initializing models (MiniFASNet {args.model.upper()})...')
detector = RetinaFace()
spoofer = create_spoofer(model_name=model_name, scale=args.scale)
# Process input
if args.image:
if not os.path.exists(args.image):
print(f'Error: Image not found: {args.image}')
return
process_image(detector, spoofer, args.image, args.save_dir)
elif args.video:
if not os.path.exists(args.video):
print(f'Error: Video not found: {args.video}')
return
process_video(detector, spoofer, args.video, args.save_dir)
elif args.source:
process_video(detector, spoofer, args.source, args.save_dir)
if __name__ == '__main__':
main()

View File

@@ -0,0 +1,109 @@
# Face detection on video files
# Usage: python run_video_detection.py --input video.mp4 --output output.mp4
import argparse
from pathlib import Path
import cv2
from tqdm import tqdm
from uniface import SCRFD, RetinaFace
from uniface.visualization import draw_detections
def process_video(
detector,
input_path: str,
output_path: str,
threshold: float = 0.6,
show_preview: bool = False,
):
cap = cv2.VideoCapture(input_path)
if not cap.isOpened():
print(f"Error: Cannot open video file '{input_path}'")
return
# get video properties
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
print(f'Input: {input_path} ({width}x{height}, {fps:.1f} fps, {total_frames} frames)')
print(f'Output: {output_path}')
fourcc = cv2.VideoWriter_fourcc(*'mp4v') # codec for .mp4
out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
if not out.isOpened():
print(f"Error: Cannot create output video '{output_path}'")
cap.release()
return
frame_count = 0
total_faces = 0
for _ in tqdm(range(total_frames), desc='Processing', unit='frames'):
ret, frame = cap.read()
if not ret:
break
frame_count += 1
faces = detector.detect(frame)
total_faces += len(faces)
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
draw_detections(
image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
)
cv2.putText(
frame,
f'Faces: {len(faces)}',
(10, 30),
cv2.FONT_HERSHEY_SIMPLEX,
1,
(0, 255, 0),
2,
)
out.write(frame)
if show_preview:
cv2.imshow("Processing - Press 'q' to cancel", frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
print('\nCancelled by user')
break
cap.release()
out.release()
if show_preview:
cv2.destroyAllWindows()
avg_faces = total_faces / frame_count if frame_count > 0 else 0
print(f'\nDone! {frame_count} frames, {total_faces} faces ({avg_faces:.1f} avg/frame)')
print(f'Saved: {output_path}')
def main():
parser = argparse.ArgumentParser(description='Process video with face detection')
parser.add_argument('--input', type=str, required=True, help='Input video path')
parser.add_argument('--output', type=str, required=True, help='Output video path')
parser.add_argument('--detector', type=str, default='retinaface', choices=['retinaface', 'scrfd'])
parser.add_argument('--threshold', type=float, default=0.6, help='Visualization threshold')
parser.add_argument('--preview', action='store_true', help='Show live preview')
args = parser.parse_args()
if not Path(args.input).exists():
print(f"Error: Input file '{args.input}' does not exist")
return
Path(args.output).parent.mkdir(parents=True, exist_ok=True)
detector = RetinaFace() if args.detector == 'retinaface' else SCRFD()
process_video(detector, args.input, args.output, args.threshold, args.preview)
if __name__ == '__main__':
main()

View File

@@ -5,31 +5,24 @@ from pathlib import Path
def compute_sha256(file_path: Path, chunk_size: int = 8192) -> str:
sha256_hash = hashlib.sha256()
with file_path.open("rb") as f:
for chunk in iter(lambda: f.read(chunk_size), b""):
with file_path.open('rb') as f:
for chunk in iter(lambda: f.read(chunk_size), b''):
sha256_hash.update(chunk)
return sha256_hash.hexdigest()
def main():
parser = argparse.ArgumentParser(
description="Compute SHA256 hash of a model weight file."
)
parser.add_argument(
"file",
type=Path,
help="Path to the model weight file (.onnx, .pth, etc)."
)
parser = argparse.ArgumentParser(description='Compute SHA256 hash of a file')
parser.add_argument('file', type=Path, help='Path to file')
args = parser.parse_args()
if not args.file.exists() or not args.file.is_file():
print(f"File does not exist: {args.file}")
print(f'File does not exist: {args.file}')
return
sha256 = compute_sha256(args.file)
print(f"`SHA256 hash for '{args.file.name}':\n{sha256}")
print(f"SHA256 hash for '{args.file.name}':\n{sha256}")
if __name__ == "__main__":
if __name__ == '__main__':
main()

117
tests/test_age_gender.py Normal file
View File

@@ -0,0 +1,117 @@
import numpy as np
import pytest
from uniface.attribute import AgeGender
@pytest.fixture
def age_gender_model():
return AgeGender()
@pytest.fixture
def mock_image():
return np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
@pytest.fixture
def mock_bbox():
return [100, 100, 300, 300]
def test_model_initialization(age_gender_model):
assert age_gender_model is not None, 'AgeGender model initialization failed.'
def test_prediction_output_format(age_gender_model, mock_image, mock_bbox):
gender_id, age = age_gender_model.predict(mock_image, mock_bbox)
assert isinstance(gender_id, int), f'Gender ID should be int, got {type(gender_id)}'
assert isinstance(age, int), f'Age should be int, got {type(age)}'
def test_gender_values(age_gender_model, mock_image, mock_bbox):
gender_id, age = age_gender_model.predict(mock_image, mock_bbox)
assert gender_id in [0, 1], f'Gender ID should be 0 (Female) or 1 (Male), got {gender_id}'
def test_age_range(age_gender_model, mock_image, mock_bbox):
gender_id, age = age_gender_model.predict(mock_image, mock_bbox)
assert 0 <= age <= 120, f'Age should be between 0 and 120, got {age}'
def test_different_bbox_sizes(age_gender_model, mock_image):
test_bboxes = [
[50, 50, 150, 150],
[100, 100, 300, 300],
[50, 50, 400, 400],
]
for bbox in test_bboxes:
gender_id, age = age_gender_model.predict(mock_image, bbox)
assert gender_id in [0, 1], f'Failed for bbox {bbox}'
assert 0 <= age <= 120, f'Age out of range for bbox {bbox}'
def test_different_image_sizes(age_gender_model, mock_bbox):
test_sizes = [(480, 640, 3), (720, 1280, 3), (1080, 1920, 3)]
for size in test_sizes:
mock_image = np.random.randint(0, 255, size, dtype=np.uint8)
gender_id, age = age_gender_model.predict(mock_image, mock_bbox)
assert gender_id in [0, 1], f'Failed for image size {size}'
assert 0 <= age <= 120, f'Age out of range for image size {size}'
def test_consistency(age_gender_model, mock_image, mock_bbox):
gender_id1, age1 = age_gender_model.predict(mock_image, mock_bbox)
gender_id2, age2 = age_gender_model.predict(mock_image, mock_bbox)
assert gender_id1 == gender_id2, 'Same input should produce same gender prediction'
assert age1 == age2, 'Same input should produce same age prediction'
def test_bbox_list_format(age_gender_model, mock_image):
bbox_list = [100, 100, 300, 300]
gender_id, age = age_gender_model.predict(mock_image, bbox_list)
assert gender_id in [0, 1], 'Should work with bbox as list'
assert 0 <= age <= 120, 'Age should be in valid range'
def test_bbox_array_format(age_gender_model, mock_image):
bbox_array = np.array([100, 100, 300, 300])
gender_id, age = age_gender_model.predict(mock_image, bbox_array)
assert gender_id in [0, 1], 'Should work with bbox as numpy array'
assert 0 <= age <= 120, 'Age should be in valid range'
def test_multiple_predictions(age_gender_model, mock_image):
bboxes = [
[50, 50, 150, 150],
[200, 200, 350, 350],
[400, 400, 550, 550],
]
results = []
for bbox in bboxes:
gender_id, age = age_gender_model.predict(mock_image, bbox)
results.append((gender_id, age))
assert len(results) == 3, 'Should have 3 predictions'
for gender_id, age in results:
assert gender_id in [0, 1]
assert 0 <= age <= 120
def test_age_is_positive(age_gender_model, mock_image, mock_bbox):
for _ in range(5):
gender_id, age = age_gender_model.predict(mock_image, mock_bbox)
assert age >= 0, f'Age should be non-negative, got {age}'
def test_output_format_for_visualization(age_gender_model, mock_image, mock_bbox):
gender_id, age = age_gender_model.predict(mock_image, mock_bbox)
gender_str = 'Female' if gender_id == 0 else 'Male'
text = f'{gender_str}, {age}y'
assert isinstance(text, str), 'Should be able to format as string'
assert 'Male' in text or 'Female' in text, 'Text should contain gender'
assert 'y' in text, "Text should contain 'y' for years"

274
tests/test_factory.py Normal file
View File

@@ -0,0 +1,274 @@
import numpy as np
import pytest
from uniface import (
create_detector,
create_landmarker,
create_recognizer,
detect_faces,
list_available_detectors,
)
from uniface.constants import RetinaFaceWeights, SCRFDWeights
# create_detector tests
def test_create_detector_retinaface():
"""
Test creating a RetinaFace detector using factory function.
"""
detector = create_detector('retinaface')
assert detector is not None, 'Failed to create RetinaFace detector'
def test_create_detector_scrfd():
"""
Test creating a SCRFD detector using factory function.
"""
detector = create_detector('scrfd')
assert detector is not None, 'Failed to create SCRFD detector'
def test_create_detector_with_config():
"""
Test creating detector with custom configuration.
"""
detector = create_detector(
'retinaface',
model_name=RetinaFaceWeights.MNET_V2,
conf_thresh=0.8,
nms_thresh=0.3,
)
assert detector is not None, 'Failed to create detector with custom config'
def test_create_detector_invalid_method():
"""
Test that invalid detector method raises an error.
"""
with pytest.raises((ValueError, KeyError)):
create_detector('invalid_method')
def test_create_detector_scrfd_with_model():
"""
Test creating SCRFD detector with specific model.
"""
detector = create_detector('scrfd', model_name=SCRFDWeights.SCRFD_10G_KPS, conf_thresh=0.5)
assert detector is not None, 'Failed to create SCRFD with specific model'
# create_recognizer tests
def test_create_recognizer_arcface():
"""
Test creating an ArcFace recognizer using factory function.
"""
recognizer = create_recognizer('arcface')
assert recognizer is not None, 'Failed to create ArcFace recognizer'
def test_create_recognizer_mobileface():
"""
Test creating a MobileFace recognizer using factory function.
"""
recognizer = create_recognizer('mobileface')
assert recognizer is not None, 'Failed to create MobileFace recognizer'
def test_create_recognizer_sphereface():
"""
Test creating a SphereFace recognizer using factory function.
"""
recognizer = create_recognizer('sphereface')
assert recognizer is not None, 'Failed to create SphereFace recognizer'
def test_create_recognizer_invalid_method():
"""
Test that invalid recognizer method raises an error.
"""
with pytest.raises((ValueError, KeyError)):
create_recognizer('invalid_method')
# create_landmarker tests
def test_create_landmarker():
"""
Test creating a Landmark106 detector using factory function.
"""
landmarker = create_landmarker('2d106det')
assert landmarker is not None, 'Failed to create Landmark106 detector'
def test_create_landmarker_default():
"""
Test creating landmarker with default parameters.
"""
landmarker = create_landmarker()
assert landmarker is not None, 'Failed to create default landmarker'
def test_create_landmarker_invalid_method():
"""
Test that invalid landmarker method raises an error.
"""
with pytest.raises((ValueError, KeyError)):
create_landmarker('invalid_method')
# detect_faces tests
def test_detect_faces_retinaface():
"""
Test high-level detect_faces function with RetinaFace.
"""
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
faces = detect_faces(mock_image, method='retinaface')
assert isinstance(faces, list), 'detect_faces should return a list'
def test_detect_faces_scrfd():
"""
Test high-level detect_faces function with SCRFD.
"""
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
faces = detect_faces(mock_image, method='scrfd')
assert isinstance(faces, list), 'detect_faces should return a list'
def test_detect_faces_with_threshold():
"""
Test detect_faces with custom confidence threshold.
"""
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
faces = detect_faces(mock_image, method='retinaface', conf_thresh=0.8)
assert isinstance(faces, list), 'detect_faces should return a list'
# All detections should respect threshold
for face in faces:
assert face['confidence'] >= 0.8, 'All detections should meet confidence threshold'
def test_detect_faces_default_method():
"""
Test detect_faces with default method (should use retinaface).
"""
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
faces = detect_faces(mock_image) # No method specified
assert isinstance(faces, list), 'detect_faces should return a list with default method'
def test_detect_faces_empty_image():
"""
Test detect_faces on a blank image.
"""
empty_image = np.zeros((640, 640, 3), dtype=np.uint8)
faces = detect_faces(empty_image, method='retinaface')
assert isinstance(faces, list), 'Should return a list even for empty image'
assert len(faces) == 0, 'Should detect no faces in blank image'
# list_available_detectors tests
def test_list_available_detectors():
"""
Test that list_available_detectors returns a dictionary.
"""
detectors = list_available_detectors()
assert isinstance(detectors, dict), 'Should return a dictionary of detectors'
assert len(detectors) > 0, 'Should have at least one detector available'
def test_list_available_detectors_contents():
"""
Test that list includes known detectors.
"""
detectors = list_available_detectors()
# Should include at least these detectors
assert 'retinaface' in detectors, "Should include 'retinaface'"
assert 'scrfd' in detectors, "Should include 'scrfd'"
# Integration tests
def test_detector_inference_from_factory():
"""
Test that detector created from factory can perform inference.
"""
detector = create_detector('retinaface')
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
faces = detector.detect(mock_image)
assert isinstance(faces, list), 'Detector should return list of faces'
def test_recognizer_inference_from_factory():
"""
Test that recognizer created from factory can perform inference.
"""
recognizer = create_recognizer('arcface')
mock_image = np.random.randint(0, 255, (112, 112, 3), dtype=np.uint8)
embedding = recognizer.get_embedding(mock_image)
assert embedding is not None, 'Recognizer should return embedding'
assert embedding.shape[1] == 512, 'Should return 512-dimensional embedding'
def test_landmarker_inference_from_factory():
"""
Test that landmarker created from factory can perform inference.
"""
landmarker = create_landmarker('2d106det')
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
mock_bbox = [100, 100, 300, 300]
landmarks = landmarker.get_landmarks(mock_image, mock_bbox)
assert landmarks is not None, 'Landmarker should return landmarks'
assert landmarks.shape == (106, 2), 'Should return 106 landmarks'
def test_multiple_detector_creation():
"""
Test that multiple detectors can be created independently.
"""
detector1 = create_detector('retinaface')
detector2 = create_detector('scrfd')
assert detector1 is not None
assert detector2 is not None
assert detector1 is not detector2, 'Should create separate instances'
def test_detector_with_different_configs():
"""
Test creating multiple detectors with different configurations.
"""
detector_high_thresh = create_detector('retinaface', conf_thresh=0.9)
detector_low_thresh = create_detector('retinaface', conf_thresh=0.3)
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
faces_high = detector_high_thresh.detect(mock_image)
faces_low = detector_low_thresh.detect(mock_image)
# Both should work
assert isinstance(faces_high, list)
assert isinstance(faces_low, list)
def test_factory_returns_correct_types():
"""
Test that factory functions return instances of the correct types.
"""
from uniface import ArcFace, Landmark106, RetinaFace
detector = create_detector('retinaface')
recognizer = create_recognizer('arcface')
landmarker = create_landmarker('2d106det')
assert isinstance(detector, RetinaFace), 'Should return RetinaFace instance'
assert isinstance(recognizer, ArcFace), 'Should return ArcFace instance'
assert isinstance(landmarker, Landmark106), 'Should return Landmark106 instance'

107
tests/test_landmark.py Normal file
View File

@@ -0,0 +1,107 @@
import numpy as np
import pytest
from uniface.landmark import Landmark106
@pytest.fixture
def landmark_model():
return Landmark106()
@pytest.fixture
def mock_image():
return np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
@pytest.fixture
def mock_bbox():
return [100, 100, 300, 300]
def test_model_initialization(landmark_model):
assert landmark_model is not None, 'Landmark106 model initialization failed.'
def test_landmark_detection(landmark_model, mock_image, mock_bbox):
landmarks = landmark_model.get_landmarks(mock_image, mock_bbox)
assert landmarks.shape == (106, 2), f'Expected shape (106, 2), got {landmarks.shape}'
def test_landmark_dtype(landmark_model, mock_image, mock_bbox):
landmarks = landmark_model.get_landmarks(mock_image, mock_bbox)
assert landmarks.dtype == np.float32, f'Expected float32, got {landmarks.dtype}'
def test_landmark_coordinates_within_image(landmark_model, mock_image, mock_bbox):
landmarks = landmark_model.get_landmarks(mock_image, mock_bbox)
x_coords = landmarks[:, 0]
y_coords = landmarks[:, 1]
x1, y1, x2, y2 = mock_bbox
margin = 50
x_in_bounds = np.sum((x_coords >= x1 - margin) & (x_coords <= x2 + margin))
y_in_bounds = np.sum((y_coords >= y1 - margin) & (y_coords <= y2 + margin))
assert x_in_bounds >= 95, f'Only {x_in_bounds}/106 x-coordinates within bounds'
assert y_in_bounds >= 95, f'Only {y_in_bounds}/106 y-coordinates within bounds'
def test_different_bbox_sizes(landmark_model, mock_image):
test_bboxes = [
[50, 50, 150, 150],
[100, 100, 300, 300],
[50, 50, 400, 400],
]
for bbox in test_bboxes:
landmarks = landmark_model.get_landmarks(mock_image, bbox)
assert landmarks.shape == (106, 2), f'Failed for bbox {bbox}'
def test_landmark_array_format(landmark_model, mock_image, mock_bbox):
landmarks = landmark_model.get_landmarks(mock_image, mock_bbox)
landmarks_int = landmarks.astype(int)
assert landmarks_int.shape == (106, 2), 'Integer conversion should preserve shape'
assert landmarks_int.dtype in [np.int32, np.int64], 'Should convert to integer type'
def test_consistency(landmark_model, mock_image, mock_bbox):
landmarks1 = landmark_model.get_landmarks(mock_image, mock_bbox)
landmarks2 = landmark_model.get_landmarks(mock_image, mock_bbox)
assert np.allclose(landmarks1, landmarks2), 'Same input should produce same landmarks'
def test_different_image_sizes(landmark_model, mock_bbox):
test_sizes = [(480, 640, 3), (720, 1280, 3), (1080, 1920, 3)]
for size in test_sizes:
mock_image = np.random.randint(0, 255, size, dtype=np.uint8)
landmarks = landmark_model.get_landmarks(mock_image, mock_bbox)
assert landmarks.shape == (106, 2), f'Failed for image size {size}'
def test_bbox_list_format(landmark_model, mock_image):
bbox_list = [100, 100, 300, 300]
landmarks = landmark_model.get_landmarks(mock_image, bbox_list)
assert landmarks.shape == (106, 2), 'Should work with bbox as list'
def test_bbox_array_format(landmark_model, mock_image):
bbox_array = np.array([100, 100, 300, 300])
landmarks = landmark_model.get_landmarks(mock_image, bbox_array)
assert landmarks.shape == (106, 2), 'Should work with bbox as numpy array'
def test_landmark_distribution(landmark_model, mock_image, mock_bbox):
landmarks = landmark_model.get_landmarks(mock_image, mock_bbox)
x_variance = np.var(landmarks[:, 0])
y_variance = np.var(landmarks[:, 1])
assert x_variance > 0, 'Landmarks should have variation in x-coordinates'
assert y_variance > 0, 'Landmarks should have variation in y-coordinates'

118
tests/test_parsing.py Normal file
View File

@@ -0,0 +1,118 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import numpy as np
import pytest
from uniface.constants import ParsingWeights
from uniface.parsing import BiSeNet, create_face_parser
def test_bisenet_initialization():
"""Test BiSeNet initialization."""
parser = BiSeNet()
assert parser is not None
assert parser.input_size == (512, 512)
def test_bisenet_with_different_models():
"""Test BiSeNet with different model weights."""
parser_resnet18 = BiSeNet(model_name=ParsingWeights.RESNET18)
parser_resnet34 = BiSeNet(model_name=ParsingWeights.RESNET34)
assert parser_resnet18 is not None
assert parser_resnet34 is not None
def test_bisenet_preprocess():
"""Test preprocessing."""
parser = BiSeNet()
# Create a dummy face image
face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
# Preprocess
preprocessed = parser.preprocess(face_image)
assert preprocessed.shape == (1, 3, 512, 512)
assert preprocessed.dtype == np.float32
def test_bisenet_postprocess():
"""Test postprocessing."""
parser = BiSeNet()
# Create dummy model output (batch_size=1, num_classes=19, H=512, W=512)
dummy_output = np.random.randn(1, 19, 512, 512).astype(np.float32)
# Postprocess
mask = parser.postprocess(dummy_output, original_size=(256, 256))
assert mask.shape == (256, 256)
assert mask.dtype == np.uint8
assert mask.min() >= 0
assert mask.max() < 19 # 19 classes (0-18)
def test_bisenet_parse():
"""Test end-to-end parsing."""
parser = BiSeNet()
# Create a dummy face image
face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
# Parse
mask = parser.parse(face_image)
assert mask.shape == (256, 256)
assert mask.dtype == np.uint8
assert mask.min() >= 0
assert mask.max() < 19
def test_bisenet_callable():
"""Test that BiSeNet is callable."""
parser = BiSeNet()
face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
# Should work as callable
mask = parser(face_image)
assert mask.shape == (256, 256)
assert mask.dtype == np.uint8
def test_create_face_parser_with_enum():
"""Test factory function with enum."""
parser = create_face_parser(ParsingWeights.RESNET18)
assert parser is not None
assert isinstance(parser, BiSeNet)
def test_create_face_parser_with_string():
"""Test factory function with string."""
parser = create_face_parser('parsing_resnet18')
assert parser is not None
assert isinstance(parser, BiSeNet)
def test_create_face_parser_invalid_model():
"""Test factory function with invalid model name."""
with pytest.raises(ValueError, match='Unknown face parsing model'):
create_face_parser('invalid_model')
def test_bisenet_different_input_sizes():
"""Test parsing with different input image sizes."""
parser = BiSeNet()
# Test with different sizes
sizes = [(128, 128), (256, 256), (512, 512), (640, 480)]
for h, w in sizes:
face_image = np.random.randint(0, 255, (h, w, 3), dtype=np.uint8)
mask = parser.parse(face_image)
assert mask.shape == (h, w), f'Failed for size {h}x{w}'
assert mask.dtype == np.uint8

215
tests/test_recognition.py Normal file
View File

@@ -0,0 +1,215 @@
import numpy as np
import pytest
from uniface.recognition import ArcFace, MobileFace, SphereFace
@pytest.fixture
def arcface_model():
"""
Fixture to initialize the ArcFace model for testing.
"""
return ArcFace()
@pytest.fixture
def mobileface_model():
"""
Fixture to initialize the MobileFace model for testing.
"""
return MobileFace()
@pytest.fixture
def sphereface_model():
"""
Fixture to initialize the SphereFace model for testing.
"""
return SphereFace()
@pytest.fixture
def mock_aligned_face():
"""
Create a mock 112x112 aligned face image.
"""
return np.random.randint(0, 255, (112, 112, 3), dtype=np.uint8)
@pytest.fixture
def mock_landmarks():
"""
Create mock 5-point facial landmarks.
"""
return np.array(
[
[38.2946, 51.6963],
[73.5318, 51.5014],
[56.0252, 71.7366],
[41.5493, 92.3655],
[70.7299, 92.2041],
],
dtype=np.float32,
)
# ArcFace Tests
def test_arcface_initialization(arcface_model):
"""
Test that the ArcFace model initializes correctly.
"""
assert arcface_model is not None, 'ArcFace model initialization failed.'
def test_arcface_embedding_shape(arcface_model, mock_aligned_face):
"""
Test that ArcFace produces embeddings with the correct shape.
"""
embedding = arcface_model.get_embedding(mock_aligned_face)
# ArcFace typically produces 512-dimensional embeddings
assert embedding.shape[1] == 512, f'Expected 512-dim embedding, got {embedding.shape[1]}'
assert embedding.shape[0] == 1, 'Embedding should have batch dimension of 1'
def test_arcface_normalized_embedding(arcface_model, mock_landmarks):
"""
Test that normalized embeddings have unit length.
"""
# Create a larger mock image for alignment
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
embedding = arcface_model.get_normalized_embedding(mock_image, mock_landmarks)
# Check that embedding is normalized (L2 norm ≈ 1.0)
norm = np.linalg.norm(embedding)
assert np.isclose(norm, 1.0, atol=1e-5), f'Normalized embedding should have norm 1.0, got {norm}'
def test_arcface_embedding_dtype(arcface_model, mock_aligned_face):
"""
Test that embeddings have the correct data type.
"""
embedding = arcface_model.get_embedding(mock_aligned_face)
assert embedding.dtype == np.float32, f'Expected float32, got {embedding.dtype}'
def test_arcface_consistency(arcface_model, mock_aligned_face):
"""
Test that the same input produces the same embedding.
"""
embedding1 = arcface_model.get_embedding(mock_aligned_face)
embedding2 = arcface_model.get_embedding(mock_aligned_face)
assert np.allclose(embedding1, embedding2), 'Same input should produce same embedding'
# MobileFace Tests
def test_mobileface_initialization(mobileface_model):
"""
Test that the MobileFace model initializes correctly.
"""
assert mobileface_model is not None, 'MobileFace model initialization failed.'
def test_mobileface_embedding_shape(mobileface_model, mock_aligned_face):
"""
Test that MobileFace produces embeddings with the correct shape.
"""
embedding = mobileface_model.get_embedding(mock_aligned_face)
# MobileFace typically produces 512-dimensional embeddings
assert embedding.shape[1] == 512, f'Expected 512-dim embedding, got {embedding.shape[1]}'
assert embedding.shape[0] == 1, 'Embedding should have batch dimension of 1'
def test_mobileface_normalized_embedding(mobileface_model, mock_landmarks):
"""
Test that MobileFace normalized embeddings have unit length.
"""
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
embedding = mobileface_model.get_normalized_embedding(mock_image, mock_landmarks)
norm = np.linalg.norm(embedding)
assert np.isclose(norm, 1.0, atol=1e-5), f'Normalized embedding should have norm 1.0, got {norm}'
# SphereFace Tests
def test_sphereface_initialization(sphereface_model):
"""
Test that the SphereFace model initializes correctly.
"""
assert sphereface_model is not None, 'SphereFace model initialization failed.'
def test_sphereface_embedding_shape(sphereface_model, mock_aligned_face):
"""
Test that SphereFace produces embeddings with the correct shape.
"""
embedding = sphereface_model.get_embedding(mock_aligned_face)
# SphereFace typically produces 512-dimensional embeddings
assert embedding.shape[1] == 512, f'Expected 512-dim embedding, got {embedding.shape[1]}'
assert embedding.shape[0] == 1, 'Embedding should have batch dimension of 1'
def test_sphereface_normalized_embedding(sphereface_model, mock_landmarks):
"""
Test that SphereFace normalized embeddings have unit length.
"""
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
embedding = sphereface_model.get_normalized_embedding(mock_image, mock_landmarks)
norm = np.linalg.norm(embedding)
assert np.isclose(norm, 1.0, atol=1e-5), f'Normalized embedding should have norm 1.0, got {norm}'
# Cross-model comparison tests
def test_different_models_different_embeddings(arcface_model, mobileface_model, mock_aligned_face):
"""
Test that different models produce different embeddings for the same input.
"""
arcface_emb = arcface_model.get_embedding(mock_aligned_face)
mobileface_emb = mobileface_model.get_embedding(mock_aligned_face)
# Embeddings should be different (with high probability for random input)
# We check that they're not identical
assert not np.allclose(arcface_emb, mobileface_emb), 'Different models should produce different embeddings'
def test_embedding_similarity_computation(arcface_model, mock_aligned_face):
"""
Test computing similarity between embeddings.
"""
# Get two embeddings
emb1 = arcface_model.get_embedding(mock_aligned_face)
# Create a slightly different image
mock_aligned_face2 = mock_aligned_face.copy()
mock_aligned_face2[:10, :10] = 0 # Modify a small region
emb2 = arcface_model.get_embedding(mock_aligned_face2)
# Compute cosine similarity
from uniface import compute_similarity
similarity = compute_similarity(emb1, emb2)
# Similarity should be between -1 and 1
assert -1.0 <= similarity <= 1.0, f'Similarity should be in [-1, 1], got {similarity}'
def test_same_face_high_similarity(arcface_model, mock_aligned_face):
"""
Test that the same face produces high similarity.
"""
emb1 = arcface_model.get_embedding(mock_aligned_face)
emb2 = arcface_model.get_embedding(mock_aligned_face)
from uniface import compute_similarity
similarity = compute_similarity(emb1, emb2)
# Same image should have similarity close to 1.0
assert similarity > 0.99, f'Same face should have similarity > 0.99, got {similarity}'

View File

@@ -7,9 +7,6 @@ from uniface.detection import RetinaFace
@pytest.fixture
def retinaface_model():
"""
Fixture to initialize the RetinaFace model for testing.
"""
return RetinaFace(
model_name=RetinaFaceWeights.MNET_V2,
conf_thresh=0.5,
@@ -20,67 +17,39 @@ def retinaface_model():
def test_model_initialization(retinaface_model):
"""
Test that the RetinaFace model initializes correctly.
"""
assert retinaface_model is not None, "Model initialization failed."
assert retinaface_model is not None, 'Model initialization failed.'
def test_inference_on_640x640_image(retinaface_model):
"""
Test inference on a 640x640 BGR image.
"""
# Generate a mock 640x640 BGR image
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
# Run inference - returns list of dictionaries
faces = retinaface_model.detect(mock_image)
# Check output type
assert isinstance(faces, list), "Detections should be a list."
assert isinstance(faces, list), 'Detections should be a list.'
# Check that each face has the expected structure
for face in faces:
assert isinstance(face, dict), "Each detection should be a dictionary."
assert "bbox" in face, "Each detection should have a 'bbox' key."
assert "confidence" in face, "Each detection should have a 'confidence' key."
assert "landmarks" in face, "Each detection should have a 'landmarks' key."
assert isinstance(face, dict), 'Each detection should be a dictionary.'
assert 'bbox' in face, "Each detection should have a 'bbox' key."
assert 'confidence' in face, "Each detection should have a 'confidence' key."
assert 'landmarks' in face, "Each detection should have a 'landmarks' key."
# Check bbox format
bbox = face["bbox"]
assert len(bbox) == 4, "BBox should have 4 values (x1, y1, x2, y2)."
bbox = face['bbox']
assert len(bbox) == 4, 'BBox should have 4 values (x1, y1, x2, y2).'
# Check landmarks format
landmarks = face["landmarks"]
assert len(landmarks) == 5, "Should have 5 landmark points."
assert all(len(pt) == 2 for pt in landmarks), "Each landmark should be (x, y)."
landmarks = face['landmarks']
assert len(landmarks) == 5, 'Should have 5 landmark points.'
assert all(len(pt) == 2 for pt in landmarks), 'Each landmark should be (x, y).'
def test_confidence_threshold(retinaface_model):
"""
Test that detections respect the confidence threshold.
"""
# Generate a mock 640x640 BGR image
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
# Run inference
faces = retinaface_model.detect(mock_image)
# Ensure all detections have confidence scores above the threshold
for face in faces:
confidence = face["confidence"]
assert confidence >= 0.5, f"Detection has confidence {confidence} below threshold 0.5"
confidence = face['confidence']
assert confidence >= 0.5, f'Detection has confidence {confidence} below threshold 0.5'
def test_no_faces_detected(retinaface_model):
"""
Test inference on an image without detectable faces.
"""
# Generate an empty (black) 640x640 image
empty_image = np.zeros((640, 640, 3), dtype=np.uint8)
# Run inference
faces = retinaface_model.detect(empty_image)
# Ensure no detections are found
assert len(faces) == 0, "Should detect no faces in a blank image."
assert len(faces) == 0, 'Should detect no faces in a blank image.'

71
tests/test_scrfd.py Normal file
View File

@@ -0,0 +1,71 @@
import numpy as np
import pytest
from uniface.constants import SCRFDWeights
from uniface.detection import SCRFD
@pytest.fixture
def scrfd_model():
return SCRFD(
model_name=SCRFDWeights.SCRFD_500M_KPS,
conf_thresh=0.5,
nms_thresh=0.4,
)
def test_model_initialization(scrfd_model):
assert scrfd_model is not None, 'Model initialization failed.'
def test_inference_on_640x640_image(scrfd_model):
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
faces = scrfd_model.detect(mock_image)
assert isinstance(faces, list), 'Detections should be a list.'
for face in faces:
assert isinstance(face, dict), 'Each detection should be a dictionary.'
assert 'bbox' in face, "Each detection should have a 'bbox' key."
assert 'confidence' in face, "Each detection should have a 'confidence' key."
assert 'landmarks' in face, "Each detection should have a 'landmarks' key."
bbox = face['bbox']
assert len(bbox) == 4, 'BBox should have 4 values (x1, y1, x2, y2).'
landmarks = face['landmarks']
assert len(landmarks) == 5, 'Should have 5 landmark points.'
assert all(len(pt) == 2 for pt in landmarks), 'Each landmark should be (x, y).'
def test_confidence_threshold(scrfd_model):
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
faces = scrfd_model.detect(mock_image)
for face in faces:
confidence = face['confidence']
assert confidence >= 0.5, f'Detection has confidence {confidence} below threshold 0.5'
def test_no_faces_detected(scrfd_model):
empty_image = np.zeros((640, 640, 3), dtype=np.uint8)
faces = scrfd_model.detect(empty_image)
assert len(faces) == 0, 'Should detect no faces in a blank image.'
def test_different_input_sizes(scrfd_model):
test_sizes = [(480, 640, 3), (720, 1280, 3), (1080, 1920, 3)]
for size in test_sizes:
mock_image = np.random.randint(0, 255, size, dtype=np.uint8)
faces = scrfd_model.detect(mock_image)
assert isinstance(faces, list), f'Should return list for size {size}'
def test_scrfd_10g_model():
model = SCRFD(model_name=SCRFDWeights.SCRFD_10G_KPS, conf_thresh=0.5)
assert model is not None, 'SCRFD 10G model initialization failed.'
mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
faces = model.detect(mock_image)
assert isinstance(faces, list), 'SCRFD 10G should return list of detections.'

262
tests/test_utils.py Normal file
View File

@@ -0,0 +1,262 @@
import numpy as np
import pytest
from uniface import compute_similarity, face_alignment
@pytest.fixture
def mock_image():
"""
Create a mock 640x640 BGR image.
"""
return np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
@pytest.fixture
def mock_landmarks():
"""
Create mock 5-point facial landmarks.
Standard positions for a face roughly centered at (112/2, 112/2).
"""
return np.array(
[
[38.2946, 51.6963], # Left eye
[73.5318, 51.5014], # Right eye
[56.0252, 71.7366], # Nose
[41.5493, 92.3655], # Left mouth corner
[70.7299, 92.2041], # Right mouth corner
],
dtype=np.float32,
)
# compute_similarity tests
def test_compute_similarity_same_embedding():
"""
Test that similarity of an embedding with itself is 1.0.
"""
embedding = np.random.randn(1, 512).astype(np.float32)
embedding = embedding / np.linalg.norm(embedding) # Normalize
similarity = compute_similarity(embedding, embedding)
assert np.isclose(similarity, 1.0, atol=1e-5), f'Self-similarity should be 1.0, got {similarity}'
def test_compute_similarity_range():
"""
Test that similarity is always in the range [-1, 1].
"""
# Test with multiple random embeddings
for _ in range(10):
emb1 = np.random.randn(1, 512).astype(np.float32)
emb2 = np.random.randn(1, 512).astype(np.float32)
# Normalize
emb1 = emb1 / np.linalg.norm(emb1)
emb2 = emb2 / np.linalg.norm(emb2)
similarity = compute_similarity(emb1, emb2)
assert -1.0 <= similarity <= 1.0, f'Similarity should be in [-1, 1], got {similarity}'
def test_compute_similarity_orthogonal():
"""
Test that orthogonal embeddings have similarity close to 0.
"""
# Create orthogonal embeddings
emb1 = np.zeros((1, 512), dtype=np.float32)
emb1[0, 0] = 1.0 # [1, 0, 0, ..., 0]
emb2 = np.zeros((1, 512), dtype=np.float32)
emb2[0, 1] = 1.0 # [0, 1, 0, ..., 0]
similarity = compute_similarity(emb1, emb2)
assert np.isclose(similarity, 0.0, atol=1e-5), f'Orthogonal embeddings should have similarity 0.0, got {similarity}'
def test_compute_similarity_opposite():
"""
Test that opposite embeddings have similarity close to -1.
"""
emb1 = np.ones((1, 512), dtype=np.float32)
emb1 = emb1 / np.linalg.norm(emb1)
emb2 = -emb1 # Opposite direction
similarity = compute_similarity(emb1, emb2)
assert np.isclose(similarity, -1.0, atol=1e-5), f'Opposite embeddings should have similarity -1.0, got {similarity}'
def test_compute_similarity_symmetry():
"""
Test that similarity(A, B) == similarity(B, A).
"""
emb1 = np.random.randn(1, 512).astype(np.float32)
emb2 = np.random.randn(1, 512).astype(np.float32)
# Normalize
emb1 = emb1 / np.linalg.norm(emb1)
emb2 = emb2 / np.linalg.norm(emb2)
sim_12 = compute_similarity(emb1, emb2)
sim_21 = compute_similarity(emb2, emb1)
assert np.isclose(sim_12, sim_21), 'Similarity should be symmetric'
def test_compute_similarity_dtype():
"""
Test that compute_similarity returns a float.
"""
emb1 = np.random.randn(1, 512).astype(np.float32)
emb2 = np.random.randn(1, 512).astype(np.float32)
# Normalize
emb1 = emb1 / np.linalg.norm(emb1)
emb2 = emb2 / np.linalg.norm(emb2)
similarity = compute_similarity(emb1, emb2)
assert isinstance(similarity, (float, np.floating)), f'Similarity should be float, got {type(similarity)}'
# face_alignment tests
def test_face_alignment_output_shape(mock_image, mock_landmarks):
"""
Test that face_alignment produces output with the correct shape.
"""
aligned, _ = face_alignment(mock_image, mock_landmarks, image_size=(112, 112))
assert aligned.shape == (112, 112, 3), f'Expected shape (112, 112, 3), got {aligned.shape}'
def test_face_alignment_dtype(mock_image, mock_landmarks):
"""
Test that aligned face has the correct data type.
"""
aligned, _ = face_alignment(mock_image, mock_landmarks, image_size=(112, 112))
assert aligned.dtype == np.uint8, f'Expected uint8, got {aligned.dtype}'
def test_face_alignment_different_sizes(mock_image, mock_landmarks):
"""
Test face alignment with different output sizes.
"""
# Only test sizes that are multiples of 112 or 128 as required by the function
test_sizes = [(112, 112), (128, 128), (224, 224)]
for size in test_sizes:
aligned, _ = face_alignment(mock_image, mock_landmarks, image_size=size)
assert aligned.shape == (*size, 3), f'Failed for size {size}'
def test_face_alignment_consistency(mock_image, mock_landmarks):
"""
Test that the same input produces the same aligned face.
"""
aligned1, _ = face_alignment(mock_image, mock_landmarks, image_size=(112, 112))
aligned2, _ = face_alignment(mock_image, mock_landmarks, image_size=(112, 112))
assert np.allclose(aligned1, aligned2), 'Same input should produce same aligned face'
def test_face_alignment_landmarks_as_list(mock_image):
"""
Test that landmarks can be passed as a list of lists (converted to array).
"""
landmarks_list = [
[38.2946, 51.6963],
[73.5318, 51.5014],
[56.0252, 71.7366],
[41.5493, 92.3655],
[70.7299, 92.2041],
]
# Convert list to numpy array before passing to face_alignment
landmarks_array = np.array(landmarks_list, dtype=np.float32)
aligned, _ = face_alignment(mock_image, landmarks_array, image_size=(112, 112))
assert aligned.shape == (112, 112, 3), 'Should work with landmarks as array'
def test_face_alignment_value_range(mock_image, mock_landmarks):
"""
Test that aligned face pixel values are in valid range [0, 255].
"""
aligned, _ = face_alignment(mock_image, mock_landmarks, image_size=(112, 112))
assert np.all(aligned >= 0), 'Pixel values should be >= 0'
assert np.all(aligned <= 255), 'Pixel values should be <= 255'
def test_face_alignment_not_all_zeros(mock_image, mock_landmarks):
"""
Test that aligned face is not all zeros (actual transformation occurred).
"""
aligned, _ = face_alignment(mock_image, mock_landmarks, image_size=(112, 112))
# At least some pixels should be non-zero
assert np.any(aligned > 0), 'Aligned face should have some non-zero pixels'
def test_face_alignment_from_different_positions(mock_image):
"""
Test alignment with landmarks at different positions in the image.
"""
# Landmarks at different positions
positions = [
np.array(
[[100, 100], [150, 100], [125, 130], [110, 150], [140, 150]],
dtype=np.float32,
),
np.array(
[[300, 200], [350, 200], [325, 230], [310, 250], [340, 250]],
dtype=np.float32,
),
np.array(
[[500, 400], [550, 400], [525, 430], [510, 450], [540, 450]],
dtype=np.float32,
),
]
for landmarks in positions:
aligned, _ = face_alignment(mock_image, landmarks, image_size=(112, 112))
assert aligned.shape == (112, 112, 3), f'Failed for landmarks at {landmarks[0]}'
def test_face_alignment_landmark_count(mock_image):
"""
Test that face_alignment works specifically with 5-point landmarks.
"""
# Standard 5-point landmarks
landmarks_5pt = np.array(
[
[38.2946, 51.6963],
[73.5318, 51.5014],
[56.0252, 71.7366],
[41.5493, 92.3655],
[70.7299, 92.2041],
],
dtype=np.float32,
)
aligned, _ = face_alignment(mock_image, landmarks_5pt, image_size=(112, 112))
assert aligned.shape == (112, 112, 3), 'Should work with 5-point landmarks'
def test_compute_similarity_with_recognition_embeddings():
"""
Test compute_similarity with realistic embedding dimensions.
"""
# Simulate ArcFace/MobileFace/SphereFace embeddings (512-dim)
emb1 = np.random.randn(1, 512).astype(np.float32)
emb2 = np.random.randn(1, 512).astype(np.float32)
# Normalize (as done in get_normalized_embedding)
emb1 = emb1 / np.linalg.norm(emb1)
emb2 = emb2 / np.linalg.norm(emb2)
similarity = compute_similarity(emb1, emb2)
# Should be a valid similarity score
assert -1.0 <= similarity <= 1.0
assert isinstance(similarity, (float, np.floating))

View File

@@ -11,52 +11,83 @@
# See the License for the specific language governing permissions and
# limitations under the License.
__license__ = "MIT"
__author__ = "Yakhyokhuja Valikhujaev"
__version__ = "0.1.9"
__license__ = 'MIT'
__author__ = 'Yakhyokhuja Valikhujaev'
__version__ = '1.6.0'
from uniface.face_utils import compute_similarity, face_alignment
from uniface.log import Logger, enable_logging
from uniface.model_store import verify_model_weights
from uniface.visualization import draw_detections
from uniface.visualization import draw_detections, vis_parsing_maps
from .analyzer import FaceAnalyzer
from .attribute import AgeGender
from .face import Face
try:
from .attribute import Emotion
except ImportError:
Emotion = None # PyTorch not installed
from .detection import SCRFD, RetinaFace, create_detector, detect_faces, list_available_detectors
from .detection import (
SCRFD,
RetinaFace,
YOLOv5Face,
create_detector,
detect_faces,
list_available_detectors,
)
from .gaze import MobileGaze, create_gaze_estimator
from .landmark import Landmark106, create_landmarker
from .parsing import BiSeNet, create_face_parser
from .privacy import BlurFace, anonymize_faces
from .recognition import ArcFace, MobileFace, SphereFace, create_recognizer
from .spoofing import MiniFASNet, create_spoofer
__all__ = [
"__author__",
"__license__",
"__version__",
'__author__',
'__license__',
'__version__',
# Core classes
'Face',
'FaceAnalyzer',
# Factory functions
"create_detector",
"create_landmarker",
"create_recognizer",
"detect_faces",
"list_available_detectors",
'create_detector',
'create_face_parser',
'create_gaze_estimator',
'create_landmarker',
'create_recognizer',
'create_spoofer',
'detect_faces',
'list_available_detectors',
# Detection models
"RetinaFace",
"SCRFD",
'RetinaFace',
'SCRFD',
'YOLOv5Face',
# Recognition models
"ArcFace",
"MobileFace",
"SphereFace",
'ArcFace',
'MobileFace',
'SphereFace',
# Landmark models
"Landmark106",
'Landmark106',
# Gaze models
'MobileGaze',
# Parsing models
'BiSeNet',
# Attribute models
"AgeGender",
"Emotion",
'AgeGender',
'Emotion',
# Spoofing models
'MiniFASNet',
# Privacy
'BlurFace',
'anonymize_faces',
# Utilities
"compute_similarity",
"draw_detections",
"face_alignment",
"verify_model_weights",
"Logger",
"enable_logging",
'compute_similarity',
'draw_detections',
'vis_parsing_maps',
'face_alignment',
'verify_model_weights',
'Logger',
'enable_logging',
]

66
uniface/analyzer.py Normal file
View File

@@ -0,0 +1,66 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import List, Optional
import numpy as np
from uniface.attribute.age_gender import AgeGender
from uniface.detection.base import BaseDetector
from uniface.face import Face
from uniface.log import Logger
from uniface.recognition.base import BaseRecognizer
__all__ = ['FaceAnalyzer']
class FaceAnalyzer:
"""Unified face analyzer combining detection, recognition, and attributes."""
def __init__(
self,
detector: BaseDetector,
recognizer: Optional[BaseRecognizer] = None,
age_gender: Optional[AgeGender] = None,
) -> None:
self.detector = detector
self.recognizer = recognizer
self.age_gender = age_gender
Logger.info(f'Initialized FaceAnalyzer with detector={detector.__class__.__name__}')
if recognizer:
Logger.info(f' - Recognition enabled: {recognizer.__class__.__name__}')
if age_gender:
Logger.info(f' - Age/Gender enabled: {age_gender.__class__.__name__}')
def analyze(self, image: np.ndarray) -> List[Face]:
"""Analyze faces in an image."""
faces = self.detector.detect(image)
Logger.debug(f'Detected {len(faces)} face(s)')
for idx, face in enumerate(faces):
if self.recognizer is not None:
try:
face.embedding = self.recognizer.get_normalized_embedding(image, face.landmarks)
Logger.debug(f' Face {idx + 1}: Extracted embedding with shape {face.embedding.shape}')
except Exception as e:
Logger.warning(f' Face {idx + 1}: Failed to extract embedding: {e}')
if self.age_gender is not None:
try:
face.gender, face.age = self.age_gender.predict(image, face.bbox)
Logger.debug(f' Face {idx + 1}: Age={face.age}, Gender={face.gender}')
except Exception as e:
Logger.warning(f' Face {idx + 1}: Failed to predict age/gender: {e}')
Logger.info(f'Analysis complete: {len(faces)} face(s) processed')
return faces
def __repr__(self) -> str:
parts = [f'FaceAnalyzer(detector={self.detector.__class__.__name__}']
if self.recognizer:
parts.append(f'recognizer={self.recognizer.__class__.__name__}')
if self.age_gender:
parts.append(f'age_gender={self.age_gender.__class__.__name__}')
return ', '.join(parts) + ')'

View File

@@ -2,7 +2,8 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Dict, Any, List, Union
from typing import Any, Dict, List, Union
import numpy as np
from uniface.attribute.age_gender import AgeGender
@@ -12,18 +13,14 @@ from uniface.constants import AgeGenderWeights, DDAMFNWeights
# Emotion requires PyTorch - make it optional
try:
from uniface.attribute.emotion import Emotion
_EMOTION_AVAILABLE = True
except ImportError:
Emotion = None
_EMOTION_AVAILABLE = False
# Public API for the attribute module
__all__ = [
"AgeGender",
"Emotion",
"create_attribute_predictor",
"predict_attributes"
]
__all__ = ['AgeGender', 'Emotion', 'create_attribute_predictor', 'predict_attributes']
# A mapping from model enums to their corresponding attribute classes
_ATTRIBUTE_MODELS = {
@@ -35,10 +32,7 @@ if _EMOTION_AVAILABLE:
_ATTRIBUTE_MODELS.update({model: Emotion for model in DDAMFNWeights})
def create_attribute_predictor(
model_name: Union[AgeGenderWeights, DDAMFNWeights],
**kwargs: Any
) -> Attribute:
def create_attribute_predictor(model_name: Union[AgeGenderWeights, DDAMFNWeights], **kwargs: Any) -> Attribute:
"""
Factory function to create an attribute predictor instance.
@@ -59,17 +53,16 @@ def create_attribute_predictor(
model_class = _ATTRIBUTE_MODELS.get(model_name)
if model_class is None:
raise ValueError(f"Unsupported attribute model: {model_name}. "
f"Please choose from AgeGenderWeights or DDAMFNWeights.")
raise ValueError(
f'Unsupported attribute model: {model_name}. Please choose from AgeGenderWeights or DDAMFNWeights.'
)
# Pass model_name to the constructor, as some classes might need it
return model_class(model_name=model_name, **kwargs)
def predict_attributes(
image: np.ndarray,
detections: List[Dict[str, np.ndarray]],
predictor: Attribute
image: np.ndarray, detections: List[Dict[str, np.ndarray]], predictor: Attribute
) -> List[Dict[str, Any]]:
"""
High-level API to predict attributes for multiple detected faces.
@@ -95,8 +88,8 @@ def predict_attributes(
face['attributes'] = {}
if isinstance(predictor, AgeGender):
gender, age = predictor(image, face['bbox'])
face['attributes']['gender'] = gender
gender_id, age = predictor(image, face['bbox'])
face['attributes']['gender_id'] = gender_id
face['attributes']['age'] = age
elif isinstance(predictor, Emotion):
emotion, confidence = predictor(image, face['landmark'])

View File

@@ -2,7 +2,7 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import List, Tuple, Union
from typing import List, Optional, Tuple, Union
import cv2
import numpy as np
@@ -14,7 +14,7 @@ from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
__all__ = ["AgeGender"]
__all__ = ['AgeGender']
class AgeGender(Attribute):
@@ -22,20 +22,32 @@ class AgeGender(Attribute):
Age and gender prediction model using ONNX Runtime.
This class inherits from the base `Attribute` class and implements the
functionality for predicting age (in years) and gender (0 for female,
1 for male) from a face image. It requires a bounding box to locate the face.
functionality for predicting age (in years) and gender ID (0 for Female,
1 for Male) from a face image. It requires a bounding box to locate the face.
Args:
model_name (AgeGenderWeights): The enum specifying the model weights to load.
Defaults to `AgeGenderWeights.DEFAULT`.
input_size (Optional[Tuple[int, int]]): Input size (height, width).
If None, automatically detected from model metadata. Defaults to None.
"""
def __init__(self, model_name: AgeGenderWeights = AgeGenderWeights.DEFAULT) -> None:
def __init__(
self,
model_name: AgeGenderWeights = AgeGenderWeights.DEFAULT,
input_size: Optional[Tuple[int, int]] = None,
) -> None:
"""
Initializes the AgeGender prediction model.
Args:
model_name (AgeGenderWeights): The enum specifying the model weights
to load.
model_name (AgeGenderWeights): The enum specifying the model weights to load.
input_size (Optional[Tuple[int, int]]): Input size (height, width).
If None, automatically detected from model metadata. Defaults to None.
"""
Logger.info(f"Initializing AgeGender with model={model_name.name}")
Logger.info(f'Initializing AgeGender with model={model_name.name}')
self.model_path = verify_model_weights(model_name)
self._user_input_size = input_size # Store user preference
self._initialize_model()
def _initialize_model(self) -> None:
@@ -47,12 +59,27 @@ class AgeGender(Attribute):
# Get model input details from the loaded model
input_meta = self.session.get_inputs()[0]
self.input_name = input_meta.name
self.input_size = tuple(input_meta.shape[2:4]) # (height, width)
# Use user-provided size if given, otherwise auto-detect from model
model_input_size = tuple(input_meta.shape[2:4]) # (height, width)
if self._user_input_size is not None:
self.input_size = self._user_input_size
if self._user_input_size != model_input_size:
Logger.warning(
f'Using custom input_size {self.input_size}, '
f'but model expects {model_input_size}. This may affect accuracy.'
)
else:
self.input_size = model_input_size
self.output_names = [output.name for output in self.session.get_outputs()]
Logger.info(f"Successfully initialized AgeGender model with input size {self.input_size}")
Logger.info(f'Successfully initialized AgeGender model with input size {self.input_size}')
except Exception as e:
Logger.error(f"Failed to load AgeGender model from '{self.model_path}'", exc_info=True)
raise RuntimeError(f"Failed to initialize AgeGender model: {e}")
Logger.error(
f"Failed to load AgeGender model from '{self.model_path}'",
exc_info=True,
)
raise RuntimeError(f'Failed to initialize AgeGender model: {e}') from e
def preprocess(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> np.ndarray:
"""
@@ -76,11 +103,15 @@ class AgeGender(Attribute):
aligned_face, _ = bbox_center_alignment(image, center, self.input_size[1], scale, rotation)
blob = cv2.dnn.blobFromImage(
aligned_face, scalefactor=1.0, size=self.input_size[::-1], mean=(0.0, 0.0, 0.0), swapRB=True
aligned_face,
scalefactor=1.0,
size=self.input_size[::-1],
mean=(0.0, 0.0, 0.0),
swapRB=True,
)
return blob
def postprocess(self, prediction: np.ndarray) -> Tuple[str, int]:
def postprocess(self, prediction: np.ndarray) -> Tuple[int, int]:
"""
Processes the raw model output to extract gender and age.
@@ -88,17 +119,16 @@ class AgeGender(Attribute):
prediction (np.ndarray): The raw output from the model inference.
Returns:
Tuple[str, int]: A tuple containing the predicted gender label ("Female" or "Male")
Tuple[int, int]: A tuple containing the predicted gender ID (0 for Female, 1 for Male)
and age (in years).
"""
# First two values are gender logits
gender_id = int(np.argmax(prediction[:2]))
gender = "Female" if gender_id == 0 else "Male"
# Third value is normalized age, scaled by 100
age = int(np.round(prediction[2] * 100))
return gender, age
return gender_id, age
def predict(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> Tuple[str, int]:
def predict(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> Tuple[int, int]:
"""
Predicts age and gender for a single face specified by a bounding box.
@@ -107,22 +137,22 @@ class AgeGender(Attribute):
bbox (Union[List, np.ndarray]): The face bounding box coordinates [x1, y1, x2, y2].
Returns:
Tuple[str, int]: A tuple containing the predicted gender label and age.
Tuple[int, int]: A tuple containing the predicted gender ID (0 for Female, 1 for Male) and age.
"""
face_blob = self.preprocess(image, bbox)
prediction = self.session.run(self.output_names, {self.input_name: face_blob})[0][0]
gender, age = self.postprocess(prediction)
return gender, age
gender_id, age = self.postprocess(prediction)
return gender_id, age
# TODO: below is only for testing, remove it later
if __name__ == "__main__":
if __name__ == '__main__':
# To run this script, you need to have uniface.detection installed
# or available in your path.
from uniface.constants import RetinaFaceWeights
from uniface.detection import create_detector
print("Initializing models for live inference...")
print('Initializing models for live inference...')
# 1. Initialize the face detector
# Using a smaller model for faster real-time performance
detector = create_detector(model_name=RetinaFaceWeights.MNET_V2)
@@ -133,14 +163,14 @@ if __name__ == "__main__":
# 3. Start webcam capture
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("Error: Could not open webcam.")
print('Error: Could not open webcam.')
exit()
print("Starting webcam feed. Press 'q' to quit.")
while True:
ret, frame = cap.read()
if not ret:
print("Error: Failed to capture frame.")
print('Error: Failed to capture frame.')
break
# Detect faces in the current frame
@@ -148,25 +178,34 @@ if __name__ == "__main__":
# For each detected face, predict age and gender
for detection in detections:
box = detection["bbox"]
box = detection['bbox']
x1, y1, x2, y2 = map(int, box)
# Predict attributes
gender, age = age_gender_predictor.predict(frame, box)
gender_id, age = age_gender_predictor.predict(frame, box)
gender_str = 'Female' if gender_id == 0 else 'Male'
# Prepare text and draw on the frame
label = f"{gender}, {age}"
label = f'{gender_str}, {age}'
cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.putText(frame, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
cv2.putText(
frame,
label,
(x1, y1 - 10),
cv2.FONT_HERSHEY_SIMPLEX,
0.8,
(0, 255, 0),
2,
)
# Display the resulting frame
cv2.imshow("Age and Gender Inference (Press 'q' to quit)", frame)
# Break the loop if 'q' is pressed
if cv2.waitKey(1) & 0xFF == ord("q"):
if cv2.waitKey(1) & 0xFF == ord('q'):
break
# Release resources
cap.release()
cv2.destroyAllWindows()
print("Inference stopped.")
print('Inference stopped.')

View File

@@ -4,6 +4,7 @@
from abc import ABC, abstractmethod
from typing import Any
import numpy as np
@@ -26,7 +27,7 @@ class Attribute(ABC):
inference session (e.g., ONNX Runtime, PyTorch), and any necessary
warm-up procedures to prepare the model for prediction.
"""
raise NotImplementedError("Subclasses must implement the _initialize_model method.")
raise NotImplementedError('Subclasses must implement the _initialize_model method.')
@abstractmethod
def preprocess(self, image: np.ndarray, *args: Any) -> Any:
@@ -46,7 +47,7 @@ class Attribute(ABC):
Returns:
The preprocessed data ready for model inference.
"""
raise NotImplementedError("Subclasses must implement the preprocess method.")
raise NotImplementedError('Subclasses must implement the preprocess method.')
@abstractmethod
def postprocess(self, prediction: Any) -> Any:
@@ -63,7 +64,7 @@ class Attribute(ABC):
Returns:
The final, processed attributes.
"""
raise NotImplementedError("Subclasses must implement the postprocess method.")
raise NotImplementedError('Subclasses must implement the postprocess method.')
@abstractmethod
def predict(self, image: np.ndarray, *args: Any) -> Any:
@@ -82,7 +83,7 @@ class Attribute(ABC):
Returns:
The final predicted attributes.
"""
raise NotImplementedError("Subclasses must implement the predict method.")
raise NotImplementedError('Subclasses must implement the predict method.')
def __call__(self, *args, **kwargs) -> Any:
"""

View File

@@ -2,18 +2,19 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import List, Tuple, Union
import cv2
import torch
import numpy as np
from typing import Tuple, Union, List
import torch
from uniface.attribute.base import Attribute
from uniface.log import Logger
from uniface.constants import DDAMFNWeights
from uniface.face_utils import face_alignment
from uniface.log import Logger
from uniface.model_store import verify_model_weights
__all__ = ["Emotion"]
__all__ = ['Emotion']
class Emotion(Attribute):
@@ -37,15 +38,30 @@ class Emotion(Attribute):
model_weights (DDAMFNWeights): The enum for the model weights to load.
input_size (Tuple[int, int]): The expected input size for the model.
"""
Logger.info(f"Initializing Emotion with model={model_weights.name}")
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Logger.info(f'Initializing Emotion with model={model_weights.name}')
if torch.backends.mps.is_available():
self.device = torch.device('mps')
elif torch.cuda.is_available():
self.device = torch.device('cuda')
else:
self.device = torch.device('cpu')
self.input_size = input_size
self.model_path = verify_model_weights(model_weights)
# Define emotion labels based on the selected model
self.emotion_labels = ["Neutral", "Happy", "Sad", "Surprise", "Fear", "Disgust", "Angry"]
self.emotion_labels = [
'Neutral',
'Happy',
'Sad',
'Surprise',
'Fear',
'Disgust',
'Angry',
]
if model_weights == DDAMFNWeights.AFFECNET8:
self.emotion_labels.append("Contempt")
self.emotion_labels.append('Contempt')
self._initialize_model()
@@ -60,10 +76,10 @@ class Emotion(Attribute):
dummy_input = torch.randn(1, 3, *self.input_size).to(self.device)
with torch.no_grad():
self.model(dummy_input)
Logger.info(f"Successfully initialized Emotion model on {self.device}")
Logger.info(f'Successfully initialized Emotion model on {self.device}')
except Exception as e:
Logger.error(f"Failed to load Emotion model from '{self.model_path}'", exc_info=True)
raise RuntimeError(f"Failed to initialize Emotion model: {e}")
raise RuntimeError(f'Failed to initialize Emotion model: {e}') from e
def preprocess(self, image: np.ndarray, landmark: Union[List, np.ndarray]) -> torch.Tensor:
"""
@@ -77,7 +93,7 @@ class Emotion(Attribute):
torch.Tensor: The preprocessed image tensor ready for inference.
"""
landmark = np.asarray(landmark)
aligned_image, _ = face_alignment(image, landmark)
# Convert BGR to RGB, resize, normalize, and convert to a CHW tensor
@@ -114,11 +130,11 @@ class Emotion(Attribute):
# TODO: below is only for testing, remove it later
if __name__ == "__main__":
from uniface.detection import create_detector
if __name__ == '__main__':
from uniface.constants import RetinaFaceWeights
from uniface.detection import create_detector
print("Initializing models for live inference...")
print('Initializing models for live inference...')
# 1. Initialize the face detector
# Using a smaller model for faster real-time performance
detector = create_detector(model_name=RetinaFaceWeights.MNET_V2)
@@ -129,14 +145,14 @@ if __name__ == "__main__":
# 3. Start webcam capture
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("Error: Could not open webcam.")
print('Error: Could not open webcam.')
exit()
print("Starting webcam feed. Press 'q' to quit.")
while True:
ret, frame = cap.read()
if not ret:
print("Error: Failed to capture frame.")
print('Error: Failed to capture frame.')
break
# Detect faces in the current frame.
@@ -151,11 +167,19 @@ if __name__ == "__main__":
# Predict attributes using the landmark
emotion, confidence = emotion_predictor.predict(frame, landmark)
# Prepare text and draw on the frame
label = f"{emotion} ({confidence:.2f})"
label = f'{emotion} ({confidence:.2f})'
cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
cv2.putText(frame, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 0, 0), 2)
cv2.putText(
frame,
label,
(x1, y1 - 10),
cv2.FONT_HERSHEY_SIMPLEX,
0.8,
(255, 0, 0),
2,
)
# Display the resulting frame
cv2.imshow("Emotion Inference (Press 'q' to quit)", frame)
@@ -167,4 +191,4 @@ if __name__ == "__main__":
# Release resources
cap.release()
cv2.destroyAllWindows()
print("Inference stopped.")
print('Inference stopped.')

View File

@@ -2,12 +2,22 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import cv2
import math
import itertools
import math
from typing import List, Optional, Tuple
import cv2
import numpy as np
from typing import Tuple, List
__all__ = [
'resize_image',
'generate_anchors',
'non_max_suppression',
'decode_boxes',
'decode_landmarks',
'distance2bbox',
'distance2kps',
]
def resize_image(frame, target_shape: Tuple[int, int] = (640, 640)) -> Tuple[np.ndarray, float]:
@@ -45,26 +55,19 @@ def resize_image(frame, target_shape: Tuple[int, int] = (640, 640)) -> Tuple[np.
def generate_anchors(image_size: Tuple[int, int] = (640, 640)) -> np.ndarray:
"""
Generate anchor boxes for a given image size.
Generate anchor boxes for a given image size (RetinaFace specific).
Args:
image_size (Tuple[int, int]): Input image size (width, height). Defaults to (640, 640).
Returns:
np.ndarray: Anchor box coordinates as a NumPy array.
np.ndarray: Anchor box coordinates as a NumPy array with shape (num_anchors, 4).
"""
image_size = image_size
steps = [8, 16, 32]
min_sizes = [[16, 32], [64, 128], [256, 512]]
anchors = []
feature_maps = [
[
math.ceil(image_size[0] / step),
math.ceil(image_size[1] / step)
] for step in steps
]
feature_maps = [[math.ceil(image_size[0] / step), math.ceil(image_size[1] / step)] for step in steps]
for k, (map_height, map_width) in enumerate(feature_maps):
step = steps[k]
@@ -82,16 +85,16 @@ def generate_anchors(image_size: Tuple[int, int] = (640, 640)) -> np.ndarray:
return output
def non_max_supression(dets: List[np.ndarray], threshold: float):
def non_max_suppression(dets: np.ndarray, threshold: float) -> List[int]:
"""
Apply Non-Maximum Suppression (NMS) to reduce overlapping bounding boxes based on a threshold.
Args:
dets (numpy.ndarray): Array of detections with each row as [x1, y1, x2, y2, score].
dets (np.ndarray): Array of detections with each row as [x1, y1, x2, y2, score].
threshold (float): IoU threshold for suppression.
Returns:
list: Indices of bounding boxes retained after suppression.
List[int]: Indices of bounding boxes retained after suppression.
"""
x1 = dets[:, 0]
y1 = dets[:, 1]
@@ -122,19 +125,21 @@ def non_max_supression(dets: List[np.ndarray], threshold: float):
return keep
def decode_boxes(loc, priors, variances=[0.1, 0.2]) -> np.ndarray:
def decode_boxes(loc: np.ndarray, priors: np.ndarray, variances: Optional[List[float]] = None) -> np.ndarray:
"""
Decode locations from predictions using priors to undo
the encoding done for offset regression at train time.
the encoding done for offset regression at train time (RetinaFace specific).
Args:
loc (np.ndarray): Location predictions for loc layers, shape: [num_priors, 4]
priors (np.ndarray): Prior boxes in center-offset form, shape: [num_priors, 4]
variances (list[float]): Variances of prior boxes
variances (Optional[List[float]]): Variances of prior boxes. Defaults to [0.1, 0.2].
Returns:
np.ndarray: Decoded bounding box predictions
np.ndarray: Decoded bounding box predictions with shape [num_priors, 4]
"""
if variances is None:
variances = [0.1, 0.2]
# Compute centers of predicted boxes
cxcy = priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:]
@@ -149,18 +154,22 @@ def decode_boxes(loc, priors, variances=[0.1, 0.2]) -> np.ndarray:
return boxes
def decode_landmarks(predictions, priors, variances=[0.1, 0.2]) -> np.ndarray:
def decode_landmarks(
predictions: np.ndarray, priors: np.ndarray, variances: Optional[List[float]] = None
) -> np.ndarray:
"""
Decode landmark predictions using prior boxes.
Decode landmark predictions using prior boxes (RetinaFace specific).
Args:
predictions (np.ndarray): Landmark predictions, shape: [num_priors, 10]
priors (np.ndarray): Prior boxes, shape: [num_priors, 4]
variances (list): Scaling factors for landmark offsets.
variances (Optional[List[float]]): Scaling factors for landmark offsets. Defaults to [0.1, 0.2].
Returns:
np.ndarray: Decoded landmarks, shape: [num_priors, 10]
"""
if variances is None:
variances = [0.1, 0.2]
# Reshape predictions to [num_priors, 5, 2] to process landmark points
predictions = predictions.reshape(predictions.shape[0], 5, 2)
@@ -176,3 +185,59 @@ def decode_landmarks(predictions, priors, variances=[0.1, 0.2]) -> np.ndarray:
landmarks = landmarks.reshape(landmarks.shape[0], -1)
return landmarks
def distance2bbox(points: np.ndarray, distance: np.ndarray, max_shape: Optional[Tuple[int, int]] = None) -> np.ndarray:
"""
Decode distance prediction to bounding box (SCRFD specific).
Args:
points (np.ndarray): Anchor points with shape (n, 2), [x, y].
distance (np.ndarray): Distance from the given point to 4
boundaries (left, top, right, bottom) with shape (n, 4).
max_shape (Optional[Tuple[int, int]]): Shape of the image (height, width) for clipping.
Returns:
np.ndarray: Decoded bounding boxes with shape (n, 4) as [x1, y1, x2, y2].
"""
x1 = points[:, 0] - distance[:, 0]
y1 = points[:, 1] - distance[:, 1]
x2 = points[:, 0] + distance[:, 2]
y2 = points[:, 1] + distance[:, 3]
if max_shape is not None:
x1 = np.clip(x1, 0, max_shape[1])
y1 = np.clip(y1, 0, max_shape[0])
x2 = np.clip(x2, 0, max_shape[1])
y2 = np.clip(y2, 0, max_shape[0])
else:
x1 = np.maximum(x1, 0)
y1 = np.maximum(y1, 0)
x2 = np.maximum(x2, 0)
y2 = np.maximum(y2, 0)
return np.stack([x1, y1, x2, y2], axis=-1)
def distance2kps(points: np.ndarray, distance: np.ndarray, max_shape: Optional[Tuple[int, int]] = None) -> np.ndarray:
"""
Decode distance prediction to keypoints (SCRFD specific).
Args:
points (np.ndarray): Anchor points with shape (n, 2), [x, y].
distance (np.ndarray): Distance from the given point to keypoints with shape (n, 2k).
max_shape (Optional[Tuple[int, int]]): Shape of the image (height, width) for clipping.
Returns:
np.ndarray: Decoded keypoints with shape (n, 2k).
"""
preds = []
for i in range(0, distance.shape[1], 2):
px = points[:, i % 2] + distance[:, i]
py = points[:, i % 2 + 1] + distance[:, i + 1]
if max_shape is not None:
px = np.clip(px, 0, max_shape[1])
py = np.clip(py, 0, max_shape[0])
preds.append(px)
preds.append(py)
return np.stack(preds, axis=-1)

View File

@@ -5,6 +5,7 @@
from enum import Enum
from typing import Dict
# fmt: off
class SphereFaceWeights(str, Enum):
"""
@@ -29,7 +30,7 @@ class ArcFaceWeights(str, Enum):
Pretrained weights from ArcFace model (insightface).
https://github.com/deepinsight/insightface
"""
MNET = "arcface_mnet"
MNET = "arcface_mnet"
RESNET = "arcface_resnet"
class RetinaFaceWeights(str, Enum):
@@ -54,6 +55,22 @@ class SCRFDWeights(str, Enum):
SCRFD_500M_KPS = "scrfd_500m"
class YOLOv5FaceWeights(str, Enum):
"""
Trained on WIDER FACE dataset.
Original implementation: https://github.com/deepcam-cn/yolov5-face
Exported to ONNX from: https://github.com/yakhyo/yolov5-face-onnx-inference
Model Performance (WIDER FACE):
- YOLOV5N: 11MB, 93.61% Easy / 91.52% Medium / 80.53% Hard
- YOLOV5S: 28MB, 94.33% Easy / 92.61% Medium / 83.15% Hard
- YOLOV5M: 82MB, 95.30% Easy / 93.76% Medium / 85.28% Hard
"""
YOLOV5N = "yolov5n"
YOLOV5S = "yolov5s"
YOLOV5M = "yolov5m"
class DDAMFNWeights(str, Enum):
"""
Trained on AffectNet dataset.
@@ -78,87 +95,136 @@ class LandmarkWeights(str, Enum):
"""
DEFAULT = "2d_106"
# fmt: on
class GazeWeights(str, Enum):
"""
MobileGaze: Real-Time Gaze Estimation models.
Trained on Gaze360 dataset.
https://github.com/yakhyo/gaze-estimation
"""
RESNET18 = "gaze_resnet18"
RESNET34 = "gaze_resnet34"
RESNET50 = "gaze_resnet50"
MOBILENET_V2 = "gaze_mobilenetv2"
MOBILEONE_S0 = "gaze_mobileone_s0"
class ParsingWeights(str, Enum):
"""
Face Parsing: Semantic Segmentation of Facial Components.
Trained on CelebAMask-HQ dataset.
https://github.com/yakhyo/face-parsing
"""
RESNET18 = "parsing_resnet18"
RESNET34 = "parsing_resnet34"
class MiniFASNetWeights(str, Enum):
"""
MiniFASNet: Lightweight Face Anti-Spoofing models.
Trained on face anti-spoofing datasets.
https://github.com/yakhyo/face-anti-spoofing
Model Variants:
- V1SE: Uses scale=4.0 for face crop (squeese-and-excitation version)
- V2: Uses scale=2.7 for face crop (improved version)
"""
V1SE = "minifasnet_v1se"
V2 = "minifasnet_v2"
MODEL_URLS: Dict[Enum, str] = {
# RetinaFace
RetinaFaceWeights.MNET_025: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_mv1_0.25.onnx',
RetinaFaceWeights.MNET_050: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_mv1_0.50.onnx',
RetinaFaceWeights.MNET_V1: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_mv1.onnx',
RetinaFaceWeights.MNET_V2: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_mv2.onnx',
RetinaFaceWeights.RESNET18: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_r18.onnx',
RetinaFaceWeights.RESNET34: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/retinaface_r34.onnx',
RetinaFaceWeights.MNET_025: 'https://github.com/yakhyo/uniface/releases/download/weights/retinaface_mv1_0.25.onnx',
RetinaFaceWeights.MNET_050: 'https://github.com/yakhyo/uniface/releases/download/weights/retinaface_mv1_0.50.onnx',
RetinaFaceWeights.MNET_V1: 'https://github.com/yakhyo/uniface/releases/download/weights/retinaface_mv1.onnx',
RetinaFaceWeights.MNET_V2: 'https://github.com/yakhyo/uniface/releases/download/weights/retinaface_mv2.onnx',
RetinaFaceWeights.RESNET18: 'https://github.com/yakhyo/uniface/releases/download/weights/retinaface_r18.onnx',
RetinaFaceWeights.RESNET34: 'https://github.com/yakhyo/uniface/releases/download/weights/retinaface_r34.onnx',
# MobileFace
MobileFaceWeights.MNET_025: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/###',
MobileFaceWeights.MNET_V2: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/###',
MobileFaceWeights.MNET_V3_SMALL: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/###',
MobileFaceWeights.MNET_V3_LARGE: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/###',
MobileFaceWeights.MNET_025: 'https://github.com/yakhyo/uniface/releases/download/weights/mobilenetv1_0.25.onnx',
MobileFaceWeights.MNET_V2: 'https://github.com/yakhyo/uniface/releases/download/weights/mobilenetv2.onnx',
MobileFaceWeights.MNET_V3_SMALL: 'https://github.com/yakhyo/uniface/releases/download/weights/mobilenetv3_small.onnx',
MobileFaceWeights.MNET_V3_LARGE: 'https://github.com/yakhyo/uniface/releases/download/weights/mobilenetv3_large.onnx',
# SphereFace
SphereFaceWeights.SPHERE20: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/###',
SphereFaceWeights.SPHERE36: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/###',
SphereFaceWeights.SPHERE20: 'https://github.com/yakhyo/uniface/releases/download/weights/sphere20.onnx',
SphereFaceWeights.SPHERE36: 'https://github.com/yakhyo/uniface/releases/download/weights/sphere36.onnx',
# ArcFace
ArcFaceWeights.MNET: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/w600k_mbf.onnx',
ArcFaceWeights.RESNET: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/w600k_r50.onnx',
ArcFaceWeights.MNET: 'https://github.com/yakhyo/uniface/releases/download/weights/w600k_mbf.onnx',
ArcFaceWeights.RESNET: 'https://github.com/yakhyo/uniface/releases/download/weights/w600k_r50.onnx',
# SCRFD
SCRFDWeights.SCRFD_10G_KPS: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/scrfd_10g_kps.onnx',
SCRFDWeights.SCRFD_500M_KPS: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/scrfd_500m_kps.onnx',
SCRFDWeights.SCRFD_10G_KPS: 'https://github.com/yakhyo/uniface/releases/download/weights/scrfd_10g_kps.onnx',
SCRFDWeights.SCRFD_500M_KPS: 'https://github.com/yakhyo/uniface/releases/download/weights/scrfd_500m_kps.onnx',
# YOLOv5-Face
YOLOv5FaceWeights.YOLOV5N: 'https://github.com/yakhyo/yolov5-face-onnx-inference/releases/download/weights/yolov5n_face.onnx',
YOLOv5FaceWeights.YOLOV5S: 'https://github.com/yakhyo/yolov5-face-onnx-inference/releases/download/weights/yolov5s_face.onnx',
YOLOv5FaceWeights.YOLOV5M: 'https://github.com/yakhyo/yolov5-face-onnx-inference/releases/download/weights/yolov5m_face.onnx',
# DDAFM
DDAMFNWeights.AFFECNET7: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/affecnet7.script',
DDAMFNWeights.AFFECNET8: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/affecnet8.script',
DDAMFNWeights.AFFECNET7: 'https://github.com/yakhyo/uniface/releases/download/weights/affecnet7.script',
DDAMFNWeights.AFFECNET8: 'https://github.com/yakhyo/uniface/releases/download/weights/affecnet8.script',
# AgeGender
AgeGenderWeights.DEFAULT: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/genderage.onnx',
AgeGenderWeights.DEFAULT: 'https://github.com/yakhyo/uniface/releases/download/weights/genderage.onnx',
# Landmarks
LandmarkWeights.DEFAULT: 'https://github.com/yakhyo/uniface/releases/download/v0.1.2/2d106det.onnx',
LandmarkWeights.DEFAULT: 'https://github.com/yakhyo/uniface/releases/download/weights/2d106det.onnx',
# Gaze (MobileGaze)
GazeWeights.RESNET18: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/resnet18_gaze.onnx',
GazeWeights.RESNET34: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/resnet34_gaze.onnx',
GazeWeights.RESNET50: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/resnet50_gaze.onnx',
GazeWeights.MOBILENET_V2: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/mobilenetv2_gaze.onnx',
GazeWeights.MOBILEONE_S0: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/mobileone_s0_gaze.onnx',
# Parsing
ParsingWeights.RESNET18: 'https://github.com/yakhyo/face-parsing/releases/download/weights/resnet18.onnx',
ParsingWeights.RESNET34: 'https://github.com/yakhyo/face-parsing/releases/download/weights/resnet34.onnx',
# Anti-Spoofing (MiniFASNet)
MiniFASNetWeights.V1SE: 'https://github.com/yakhyo/face-anti-spoofing/releases/download/weights/MiniFASNetV1SE.onnx',
MiniFASNetWeights.V2: 'https://github.com/yakhyo/face-anti-spoofing/releases/download/weights/MiniFASNetV2.onnx',
}
MODEL_SHA256: Dict[Enum, str] = {
# RetinaFace
RetinaFaceWeights.MNET_025: 'b7a7acab55e104dce6f32cdfff929bd83946da5cd869b9e2e9bdffafd1b7e4a5',
RetinaFaceWeights.MNET_050: 'd8977186f6037999af5b4113d42ba77a84a6ab0c996b17c713cc3d53b88bfc37',
RetinaFaceWeights.MNET_V1: '75c961aaf0aff03d13c074e9ec656e5510e174454dd4964a161aab4fe5f04153',
RetinaFaceWeights.MNET_V2: '3ca44c045651cabeed1193a1fae8946ad1f3a55da8fa74b341feab5a8319f757',
RetinaFaceWeights.RESNET18: 'e8b5ddd7d2c3c8f7c942f9f10cec09d8e319f78f09725d3f709631de34fb649d',
RetinaFaceWeights.RESNET34: 'bd0263dc2a465d32859555cb1741f2d98991eb0053696e8ee33fec583d30e630',
RetinaFaceWeights.MNET_025: 'b7a7acab55e104dce6f32cdfff929bd83946da5cd869b9e2e9bdffafd1b7e4a5',
RetinaFaceWeights.MNET_050: 'd8977186f6037999af5b4113d42ba77a84a6ab0c996b17c713cc3d53b88bfc37',
RetinaFaceWeights.MNET_V1: '75c961aaf0aff03d13c074e9ec656e5510e174454dd4964a161aab4fe5f04153',
RetinaFaceWeights.MNET_V2: '3ca44c045651cabeed1193a1fae8946ad1f3a55da8fa74b341feab5a8319f757',
RetinaFaceWeights.RESNET18: 'e8b5ddd7d2c3c8f7c942f9f10cec09d8e319f78f09725d3f709631de34fb649d',
RetinaFaceWeights.RESNET34: 'bd0263dc2a465d32859555cb1741f2d98991eb0053696e8ee33fec583d30e630',
# MobileFace
MobileFaceWeights.MNET_025: '#',
MobileFaceWeights.MNET_V2: '#',
MobileFaceWeights.MNET_V3_SMALL: '#',
MobileFaceWeights.MNET_V3_LARGE: '#',
MobileFaceWeights.MNET_025: 'eeda7d23d9c2b40cf77fa8da8e895b5697465192648852216074679657f8ee8b',
MobileFaceWeights.MNET_V2: '38b148284dd48cc898d5d4453104252fbdcbacc105fe3f0b80e78954d9d20d89',
MobileFaceWeights.MNET_V3_SMALL: 'd4acafa1039a82957aa8a9a1dac278a401c353a749c39df43de0e29cc1c127c3',
MobileFaceWeights.MNET_V3_LARGE: '0e48f8e11f070211716d03e5c65a3db35a5e917cfb5bc30552358629775a142a',
# SphereFace
SphereFaceWeights.SPHERE20: '#',
SphereFaceWeights.SPHERE36: '#',
SphereFaceWeights.SPHERE20: 'c02878cf658eb1861f580b7e7144b0d27cc29c440bcaa6a99d466d2854f14c9d',
SphereFaceWeights.SPHERE36: '13b3890cd5d7dec2b63f7c36fd7ce07403e5a0bbb701d9647c0289e6cbe7bb20',
# ArcFace
ArcFaceWeights.MNET: '9cc6e4a75f0e2bf0b1aed94578f144d15175f357bdc05e815e5c4a02b319eb4f',
ArcFaceWeights.RESNET: '4c06341c33c2ca1f86781dab0e829f88ad5b64be9fba56e56bc9ebdefc619e43',
ArcFaceWeights.MNET: '9cc6e4a75f0e2bf0b1aed94578f144d15175f357bdc05e815e5c4a02b319eb4f',
ArcFaceWeights.RESNET: '4c06341c33c2ca1f86781dab0e829f88ad5b64be9fba56e56bc9ebdefc619e43',
# SCRFD
SCRFDWeights.SCRFD_10G_KPS: '5838f7fe053675b1c7a08b633df49e7af5495cee0493c7dcf6697200b85b5b91',
SCRFDWeights.SCRFD_500M_KPS: '5e4447f50245bbd7966bd6c0fa52938c61474a04ec7def48753668a9d8b4ea3a',
SCRFDWeights.SCRFD_10G_KPS: '5838f7fe053675b1c7a08b633df49e7af5495cee0493c7dcf6697200b85b5b91',
SCRFDWeights.SCRFD_500M_KPS: '5e4447f50245bbd7966bd6c0fa52938c61474a04ec7def48753668a9d8b4ea3a',
# YOLOv5-Face
YOLOv5FaceWeights.YOLOV5N: 'eb244a06e36999db732b317c2b30fa113cd6cfc1a397eaf738f2d6f33c01f640',
YOLOv5FaceWeights.YOLOV5S: 'fc682801cd5880e1e296184a14aea0035486b5146ec1a1389d2e7149cb134bb2',
YOLOv5FaceWeights.YOLOV5M: '04302ce27a15bde3e20945691b688e2dd018a10e92dd8932146bede6a49207b2',
# DDAFM
DDAMFNWeights.AFFECNET7: '10535bf8b6afe8e9d6ae26cea6c3add9a93036e9addb6adebfd4a972171d015d',
DDAMFNWeights.AFFECNET8: '8c66963bc71db42796a14dfcbfcd181b268b65a3fc16e87147d6a3a3d7e0f487',
DDAMFNWeights.AFFECNET7: '10535bf8b6afe8e9d6ae26cea6c3add9a93036e9addb6adebfd4a972171d015d',
DDAMFNWeights.AFFECNET8: '8c66963bc71db42796a14dfcbfcd181b268b65a3fc16e87147d6a3a3d7e0f487',
# AgeGender
AgeGenderWeights.DEFAULT: '4fde69b1c810857b88c64a335084f1c3fe8f01246c9a191b48c7bb756d6652fb',
AgeGenderWeights.DEFAULT: '4fde69b1c810857b88c64a335084f1c3fe8f01246c9a191b48c7bb756d6652fb',
# Landmark
LandmarkWeights.DEFAULT: 'f001b856447c413801ef5c42091ed0cd516fcd21f2d6b79635b1e733a7109dbf',
LandmarkWeights.DEFAULT: 'f001b856447c413801ef5c42091ed0cd516fcd21f2d6b79635b1e733a7109dbf',
# MobileGaze (trained on Gaze360)
GazeWeights.RESNET18: '23d5d7e4f6f40dce8c35274ce9d08b45b9e22cbaaf5af73182f473229d713d31',
GazeWeights.RESNET34: '4457ee5f7acd1a5ab02da4b61f02fc3a0b17adbf3844dd0ba3cd4288f2b5e1de',
GazeWeights.RESNET50: 'e1eaf98f5ec7c89c6abe7cfe39f7be83e747163f98d1ff945c0603b3c521be22',
GazeWeights.MOBILENET_V2: 'fdcdb84e3e6421b5a79e8f95139f249fc258d7f387eed5ddac2b80a9a15ce076',
GazeWeights.MOBILEONE_S0: 'c0b5a4f4a0ffd24f76ab3c1452354bb2f60110899fd9a88b464c75bafec0fde8',
# Face Parsing
ParsingWeights.RESNET18: '0d9bd318e46987c3bdbfacae9e2c0f461cae1c6ac6ea6d43bbe541a91727e33f',
ParsingWeights.RESNET34: '5b805bba7b5660ab7070b5a381dcf75e5b3e04199f1e9387232a77a00095102e',
# Anti-Spoofing (MiniFASNet)
MiniFASNetWeights.V1SE: 'ebab7f90c7833fbccd46d3a555410e78d969db5438e169b6524be444862b3676',
MiniFASNetWeights.V2: 'b32929adc2d9c34b9486f8c4c7bc97c1b69bc0ea9befefc380e4faae4e463907',
}
CHUNK_SIZE = 8192

View File

@@ -3,44 +3,48 @@
# GitHub: https://github.com/yakhyo
import numpy as np
from typing import Tuple, Dict, Any, List
from typing import Any, Dict, List
import numpy as np
from uniface.face import Face
from .scrfd import SCRFD
from .base import BaseDetector
from .retinaface import RetinaFace
from .scrfd import SCRFD
from .yolov5 import YOLOv5Face
# Global cache for detector instances
_detector_cache: Dict[str, BaseDetector] = {}
def detect_faces(image: np.ndarray, method: str = 'retinaface', **kwargs) -> List[Dict[str, Any]]:
def detect_faces(image: np.ndarray, method: str = 'retinaface', **kwargs) -> List[Face]:
"""
High-level face detection function.
Args:
image (np.ndarray): Input image as numpy array.
method (str): Detection method to use. Options: 'retinaface', 'scrfd'.
method (str): Detection method to use. Options: 'retinaface', 'scrfd', 'yolov5face'.
**kwargs: Additional arguments passed to the detector.
Returns:
List[Dict[str, Any]]: A list of dictionaries, where each dictionary represents a detected face and contains:
- 'bbox' (List[float]): [x1, y1, x2, y2] bounding box coordinates.
- 'confidence' (float): The confidence score of the detection.
- 'landmarks' (List[List[float]]): 5-point facial landmarks.
List[Face]: A list of Face objects, each containing:
- bbox (np.ndarray): [x1, y1, x2, y2] bounding box coordinates.
- confidence (float): The confidence score of the detection.
- landmarks (np.ndarray): 5-point facial landmarks with shape (5, 2).
Example:
>>> from uniface import detect_faces
>>> image = cv2.imread("your_image.jpg")
>>> faces = detect_faces(image, method='retinaface', conf_thresh=0.8)
>>> for face in faces:
... print(f"Found face with confidence: {face['confidence']}")
... print(f"BBox: {face['bbox']}")
... print(f"Found face with confidence: {face.confidence}")
... print(f"BBox: {face.bbox}")
"""
method_name = method.lower()
sorted_kwargs = sorted(kwargs.items())
cache_key = f"{method_name}_{str(sorted_kwargs)}"
cache_key = f'{method_name}_{str(sorted_kwargs)}'
if cache_key not in _detector_cache:
# Pass kwargs to create the correctly configured detector
@@ -58,6 +62,7 @@ def create_detector(method: str = 'retinaface', **kwargs) -> BaseDetector:
method (str): Detection method. Options:
- 'retinaface': RetinaFace detector (default)
- 'scrfd': SCRFD detector (fast and accurate)
- 'yolov5face': YOLOv5-Face detector (accurate with landmarks)
**kwargs: Detector-specific parameters
Returns:
@@ -85,6 +90,14 @@ def create_detector(method: str = 'retinaface', **kwargs) -> BaseDetector:
... conf_thresh=0.8,
... nms_thresh=0.4
... )
>>> # YOLOv5-Face detector
>>> detector = create_detector(
... 'yolov5face',
... model_name=YOLOv5FaceWeights.YOLOV5S,
... conf_thresh=0.25,
... nms_thresh=0.45
... )
"""
method = method.lower()
@@ -94,12 +107,12 @@ def create_detector(method: str = 'retinaface', **kwargs) -> BaseDetector:
elif method == 'scrfd':
return SCRFD(**kwargs)
elif method == 'yolov5face':
return YOLOv5Face(**kwargs)
else:
available_methods = ['retinaface', 'scrfd']
raise ValueError(
f"Unsupported detection method: '{method}'. "
f"Available methods: {available_methods}"
)
available_methods = ['retinaface', 'scrfd', 'yolov5face']
raise ValueError(f"Unsupported detection method: '{method}'. Available methods: {available_methods}")
def list_available_detectors() -> Dict[str, Dict[str, Any]]:
@@ -118,8 +131,8 @@ def list_available_detectors() -> Dict[str, Dict[str, Any]]:
'model_name': 'mnet_v2',
'conf_thresh': 0.5,
'nms_thresh': 0.4,
'input_size': (640, 640)
}
'input_size': (640, 640),
},
},
'scrfd': {
'description': 'SCRFD detector - fast and accurate with efficient architecture',
@@ -129,9 +142,20 @@ def list_available_detectors() -> Dict[str, Dict[str, Any]]:
'model_name': 'scrfd_10g_kps',
'conf_thresh': 0.5,
'nms_thresh': 0.4,
'input_size': (640, 640)
}
}
'input_size': (640, 640),
},
},
'yolov5face': {
'description': 'YOLOv5-Face detector - accurate face detection with landmarks',
'supports_landmarks': True,
'paper': 'https://arxiv.org/abs/2105.12931',
'default_params': {
'model_name': 'yolov5s_face',
'conf_thresh': 0.25,
'nms_thresh': 0.45,
'input_size': 640,
},
},
}
@@ -141,5 +165,6 @@ __all__ = [
'list_available_detectors',
'SCRFD',
'RetinaFace',
'YOLOv5Face',
'BaseDetector',
]

View File

@@ -2,13 +2,12 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
"""
Base classes for face detection.
"""
from abc import ABC, abstractmethod
from typing import Any, Dict, List
import numpy as np
from abc import ABC, abstractmethod
from typing import Tuple, Dict, Any
from uniface.face import Face
class BaseDetector(ABC):
@@ -24,7 +23,7 @@ class BaseDetector(ABC):
self.config = kwargs
@abstractmethod
def detect(self, image: np.ndarray, **kwargs) -> Tuple[np.ndarray, np.ndarray]:
def detect(self, image: np.ndarray, **kwargs) -> List[Face]:
"""
Detect faces in an image.
@@ -33,11 +32,17 @@ class BaseDetector(ABC):
**kwargs: Additional detection parameters
Returns:
Tuple[np.ndarray, np.ndarray]: (detections, landmarks)
- detections: Bounding boxes with confidence scores, shape (N, 5)
Format: [x_min, y_min, x_max, y_max, confidence]
- landmarks: Facial landmark points, shape (N, 5, 2) for 5-point landmarks
or (N, 68, 2) for 68-point landmarks. Empty array if not supported.
List[Face]: List of detected Face objects, each containing:
- bbox (np.ndarray): Bounding box coordinates with shape (4,) as [x1, y1, x2, y2]
- confidence (float): Detection confidence score (0.0 to 1.0)
- landmarks (np.ndarray): Facial landmarks with shape (5, 2) for 5-point landmarks
Example:
>>> faces = detector.detect(image)
>>> for face in faces:
... bbox = face.bbox # np.ndarray with shape (4,)
... confidence = face.confidence # float
... landmarks = face.landmarks # np.ndarray with shape (5, 2)
"""
pass
@@ -55,7 +60,7 @@ class BaseDetector(ABC):
pass
@abstractmethod
def postprocess(self, outputs, **kwargs) -> Tuple[np.ndarray, np.ndarray]:
def postprocess(self, outputs, **kwargs) -> Any:
"""
Postprocess model outputs to get final detections.
@@ -64,13 +69,13 @@ class BaseDetector(ABC):
**kwargs: Additional postprocessing parameters
Returns:
Tuple[np.ndarray, np.ndarray]: (detections, landmarks)
Any: Processed outputs (implementation-specific format, typically tuple of arrays)
"""
pass
def __str__(self) -> str:
"""String representation of the detector."""
return f"{self.__class__.__name__}({self.config})"
return f'{self.__class__.__name__}({self.config})'
def __repr__(self) -> str:
"""Detailed string representation."""
@@ -96,5 +101,5 @@ class BaseDetector(ABC):
return {
'name': self.__class__.__name__,
'supports_landmarks': self._supports_landmarks,
'config': self.config
'config': self.config,
}

View File

@@ -2,23 +2,24 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Any, List, Literal, Tuple
import numpy as np
from typing import Tuple, List, Literal, Dict, Any
from uniface.common import (
decode_boxes,
decode_landmarks,
generate_anchors,
non_max_suppression,
resize_image,
)
from uniface.constants import RetinaFaceWeights
from uniface.face import Face
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.constants import RetinaFaceWeights
from uniface.onnx_utils import create_onnx_session
from .base import BaseDetector
from .utils import (
non_max_supression,
resize_image,
decode_boxes,
generate_anchors,
decode_landmarks
)
class RetinaFace(BaseDetector):
@@ -27,16 +28,19 @@ class RetinaFace(BaseDetector):
Title: "RetinaFace: Single-stage Dense Face Localisation in the Wild"
Paper: https://arxiv.org/abs/1905.00641
Code: https://github.com/yakhyo/retinaface-pytorch
Args:
**kwargs: Keyword arguments passed to BaseDetector and RetinaFace. Supported keys include:
model_name (RetinaFaceWeights, optional): Model weights to use. Defaults to `RetinaFaceWeights.MNET_V2`.
conf_thresh (float, optional): Confidence threshold for filtering detections. Defaults to 0.5.
nms_thresh (float, optional): Non-maximum suppression (NMS) IoU threshold. Defaults to 0.4.
pre_nms_topk (int, optional): Number of top-scoring boxes considered before NMS. Defaults to 5000.
post_nms_topk (int, optional): Max number of detections kept after NMS. Defaults to 750.
dynamic_size (bool, optional): If True, generate anchors dynamically per input image. Defaults to False.
input_size (Tuple[int, int], optional): Fixed input size (width, height) if `dynamic_size=False`. Defaults to (640, 640).
model_name (RetinaFaceWeights): Model weights to use. Defaults to `RetinaFaceWeights.MNET_V2`.
conf_thresh (float): Confidence threshold for filtering detections. Defaults to 0.5.
nms_thresh (float): Non-maximum suppression (NMS) IoU threshold. Defaults to 0.4.
input_size (Tuple[int, int]): Fixed input size (width, height) if `dynamic_size=False`.
Defaults to (640, 640).
Note: Non-default sizes may cause slower inference and CoreML compatibility issues.
**kwargs: Advanced options:
pre_nms_topk (int): Number of top-scoring boxes considered before NMS. Defaults to 5000.
post_nms_topk (int): Max number of detections kept after NMS. Defaults to 750.
dynamic_size (bool): If True, generate anchors dynamically per input image. Defaults to False.
Attributes:
model_name (RetinaFaceWeights): Selected model variant.
@@ -55,31 +59,47 @@ class RetinaFace(BaseDetector):
RuntimeError: If the ONNX model fails to load or initialize.
"""
def __init__(self, **kwargs) -> None:
super().__init__(**kwargs)
def __init__(
self,
*,
model_name: RetinaFaceWeights = RetinaFaceWeights.MNET_V2,
conf_thresh: float = 0.5,
nms_thresh: float = 0.4,
input_size: Tuple[int, int] = (640, 640),
**kwargs: Any,
) -> None:
super().__init__(
model_name=model_name,
conf_thresh=conf_thresh,
nms_thresh=nms_thresh,
input_size=input_size,
**kwargs,
)
self._supports_landmarks = True # RetinaFace supports landmarks
self.model_name = kwargs.get('model_name', RetinaFaceWeights.MNET_V2)
self.conf_thresh = kwargs.get('conf_thresh', 0.5)
self.nms_thresh = kwargs.get('nms_thresh', 0.4)
self.model_name = model_name
self.conf_thresh = conf_thresh
self.nms_thresh = nms_thresh
self.input_size = input_size
# Advanced options from kwargs
self.pre_nms_topk = kwargs.get('pre_nms_topk', 5000)
self.post_nms_topk = kwargs.get('post_nms_topk', 750)
self.dynamic_size = kwargs.get('dynamic_size', False)
self.input_size = kwargs.get('input_size', (640, 640))
Logger.info(
f"Initializing RetinaFace with model={self.model_name}, conf_thresh={self.conf_thresh}, nms_thresh={self.nms_thresh}, "
f"input_size={self.input_size}"
f'Initializing RetinaFace with model={self.model_name}, conf_thresh={self.conf_thresh}, '
f'nms_thresh={self.nms_thresh}, input_size={self.input_size}'
)
# Get path to model weights
self._model_path = verify_model_weights(self.model_name)
Logger.info(f"Verified model weights located at: {self._model_path}")
Logger.info(f'Verified model weights located at: {self._model_path}')
# Precompute anchors if using static size
if not self.dynamic_size and self.input_size is not None:
self._priors = generate_anchors(image_size=self.input_size)
Logger.debug("Generated anchors for static input size.")
Logger.debug('Generated anchors for static input size.')
# Initialize model
self._initialize_model(self._model_path)
@@ -98,7 +118,7 @@ class RetinaFace(BaseDetector):
self.session = create_onnx_session(model_path)
self.input_names = self.session.get_inputs()[0].name
self.output_names = [x.name for x in self.session.get_outputs()]
Logger.info(f"Successfully initialized the model from {model_path}")
Logger.info(f'Successfully initialized the model from {model_path}')
except Exception as e:
Logger.error(f"Failed to load model from '{model_path}': {e}", exc_info=True)
raise RuntimeError(f"Failed to initialize model session for '{model_path}'") from e
@@ -131,10 +151,11 @@ class RetinaFace(BaseDetector):
def detect(
self,
image: np.ndarray,
*,
max_num: int = 0,
metric: Literal["default", "max"] = "max",
center_weight: float = 2.0
) -> List[Dict[str, Any]]:
metric: Literal['default', 'max'] = 'max',
center_weight: float = 2.0,
) -> List[Face]:
"""
Perform face detection on an input image and return bounding boxes and facial landmarks.
@@ -148,10 +169,19 @@ class RetinaFace(BaseDetector):
when using the "default" metric. Defaults to 2.0.
Returns:
List[Dict[str, Any]]: List of face detection dictionaries, each containing:
- 'bbox': [x1, y1, x2, y2] - Bounding box coordinates
- 'confidence': float - Detection confidence score
- 'landmarks': [[x1, y1], [x2, y2], [x3, y3], [x4, y4], [x5, y5]] - 5-point facial landmarks
List[Face]: List of Face objects, each containing:
- bbox (np.ndarray): Bounding box coordinates with shape (4,) as [x1, y1, x2, y2]
- confidence (float): Detection confidence score (0.0 to 1.0)
- landmarks (np.ndarray): 5-point facial landmarks with shape (5, 2)
Example:
>>> faces = detector.detect(image)
>>> for face in faces:
... bbox = face.bbox # np.ndarray with shape (4,)
... confidence = face.confidence # float
... landmarks = face.landmarks # np.ndarray with shape (5, 2)
... # Can pass landmarks directly to recognition
... embedding = recognizer.get_normalized_embedding(image, face.landmarks)
"""
original_height, original_width = image.shape[:2]
@@ -178,10 +208,12 @@ class RetinaFace(BaseDetector):
# Calculate offsets from image center
center = (original_height // 2, original_width // 2)
offsets = np.vstack([
(detections[:, 0] + detections[:, 2]) / 2 - center[1],
(detections[:, 1] + detections[:, 3]) / 2 - center[0]
])
offsets = np.vstack(
[
(detections[:, 0] + detections[:, 2]) / 2 - center[1],
(detections[:, 1] + detections[:, 3]) / 2 - center[0],
]
)
offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
# Calculate scores based on the chosen metric
@@ -198,16 +230,18 @@ class RetinaFace(BaseDetector):
faces = []
for i in range(detections.shape[0]):
face_dict = {
'bbox': detections[i, :4].astype(float).tolist(),
'confidence': detections[i, 4].item(),
'landmarks': landmarks[i].astype(float).tolist()
}
faces.append(face_dict)
face = Face(
bbox=detections[i, :4],
confidence=float(detections[i, 4]),
landmarks=landmarks[i],
)
faces.append(face)
return faces
def postprocess(self, outputs: List[np.ndarray], resize_factor: float, shape: Tuple[int, int]) -> Tuple[np.ndarray, np.ndarray]:
def postprocess(
self, outputs: List[np.ndarray], resize_factor: float, shape: Tuple[int, int]
) -> Tuple[np.ndarray, np.ndarray]:
"""
Process the model outputs into final detection results.
@@ -226,7 +260,11 @@ class RetinaFace(BaseDetector):
- landmarks (np.ndarray): Array of detected facial landmarks.
Shape: (num_detections, 5, 2), where each row contains 5 landmark points (x, y).
"""
loc, conf, landmarks = outputs[0].squeeze(0), outputs[1].squeeze(0), outputs[2].squeeze(0)
loc, conf, landmarks = (
outputs[0].squeeze(0),
outputs[1].squeeze(0),
outputs[2].squeeze(0),
)
# Decode boxes and landmarks
boxes = decode_boxes(loc, self._priors)
@@ -242,22 +280,31 @@ class RetinaFace(BaseDetector):
boxes, landmarks, scores = boxes[mask], landmarks[mask], scores[mask]
# Sort by scores
order = scores.argsort()[::-1][:self.pre_nms_topk]
order = scores.argsort()[::-1][: self.pre_nms_topk]
boxes, landmarks, scores = boxes[order], landmarks[order], scores[order]
# Apply NMS
detections = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False)
keep = non_max_supression(detections, self.nms_thresh)
keep = non_max_suppression(detections, self.nms_thresh)
detections, landmarks = detections[keep], landmarks[keep]
# Keep top-k detections
detections, landmarks = detections[:self.post_nms_topk], landmarks[:self.post_nms_topk]
detections, landmarks = (
detections[: self.post_nms_topk],
landmarks[: self.post_nms_topk],
)
landmarks = landmarks.reshape(-1, 5, 2).astype(np.int32)
landmarks = landmarks.reshape(-1, 5, 2).astype(np.float32)
return detections, landmarks
def _scale_detections(self, boxes: np.ndarray, landmarks: np.ndarray, resize_factor: float, shape: Tuple[int, int]) -> Tuple[np.ndarray, np.ndarray]:
def _scale_detections(
self,
boxes: np.ndarray,
landmarks: np.ndarray,
resize_factor: float,
shape: Tuple[int, int],
) -> Tuple[np.ndarray, np.ndarray]:
# Scale bounding boxes and landmarks to the original image size.
bbox_scale = np.array([shape[0], shape[1]] * 2)
boxes = boxes * bbox_scale / resize_factor
@@ -272,30 +319,31 @@ class RetinaFace(BaseDetector):
def draw_bbox(frame, bbox, score, color=(0, 255, 0), thickness=2):
x1, y1, x2, y2 = map(int, bbox) # Unpack 4 bbox values
cv2.rectangle(frame, (x1, y1), (x2, y2), color, thickness)
cv2.putText(frame, f"{score:.2f}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
cv2.putText(frame, f'{score:.2f}', (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
def draw_keypoints(frame, points, color=(0, 0, 255), radius=2):
for (x, y) in points.astype(np.int32):
for x, y in points.astype(np.int32):
cv2.circle(frame, (int(x), int(y)), radius, color, -1)
if __name__ == "__main__":
if __name__ == '__main__':
import cv2
detector = RetinaFace(model_name=RetinaFaceWeights.MNET_050)
print(detector.get_info())
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("Failed to open webcam.")
print('Failed to open webcam.')
exit()
print("📷 Webcam started. Press 'q' to exit.")
print("Webcam started. Press 'q' to exit.")
while True:
ret, frame = cap.read()
if not ret:
print("Failed to read frame.")
print('Failed to read frame.')
break
# Get face detections as list of dictionaries
@@ -303,26 +351,26 @@ if __name__ == "__main__":
# Process each detected face
for face in faces:
# Extract bbox and landmarks from dictionary
bbox = face['bbox'] # [x1, y1, x2, y2]
landmarks = face['landmarks'] # [[x1, y1], [x2, y2], ...]
confidence = face['confidence']
# Extract bbox and landmarks from Face object
draw_bbox(frame, face.bbox, face.confidence)
# Pass bbox and confidence separately
draw_bbox(frame, bbox, confidence)
# Convert landmarks to numpy array format if needed
if landmarks is not None and len(landmarks) > 0:
# Convert list of [x, y] pairs to numpy array
points = np.array(landmarks, dtype=np.float32) # Shape: (5, 2)
draw_keypoints(frame, points)
# Draw landmarks if available
if face.landmarks is not None and len(face.landmarks) > 0:
draw_keypoints(frame, face.landmarks)
# Display face count
cv2.putText(frame, f"Faces: {len(faces)}", (10, 30),
cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
cv2.putText(
frame,
f'Faces: {len(faces)}',
(10, 30),
cv2.FONT_HERSHEY_SIMPLEX,
0.7,
(255, 255, 255),
2,
)
cv2.imshow("FaceDetection", frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
cv2.imshow('FaceDetection', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()

View File

@@ -2,20 +2,21 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Any, Dict, List, Literal, Tuple
from typing import Any, List, Literal, Tuple
import cv2
import numpy as np
from uniface.common import distance2bbox, distance2kps, non_max_suppression, resize_image
from uniface.constants import SCRFDWeights
from uniface.face import Face
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from .base import BaseDetector
from .utils import distance2bbox, distance2kps, non_max_supression, resize_image
__all__ = ["SCRFD"]
__all__ = ['SCRFD']
class SCRFD(BaseDetector):
@@ -24,16 +25,20 @@ class SCRFD(BaseDetector):
Title: "Sample and Computation Redistribution for Efficient Face Detection"
Paper: https://arxiv.org/abs/2105.04714
Code: https://github.com/insightface/insightface
Args:
**kwargs: Keyword arguments passed to BaseDetector and SCRFD. Supported keys include:
model_name (SCRFDWeights, optional): Predefined model enum (e.g., `SCRFD_10G_KPS`).
Specifies the SCRFD variant to load. Defaults to SCRFD_10G_KPS.
conf_thresh (float, optional): Confidence threshold for filtering detections. Defaults to 0.5.
nms_thresh (float, optional): Non-Maximum Suppression threshold. Defaults to 0.4.
input_size (Tuple[int, int], optional): Input image size (width, height). Defaults to (640, 640).
model_name (SCRFDWeights): Predefined model enum (e.g., `SCRFD_10G_KPS`).
Specifies the SCRFD variant to load. Defaults to SCRFD_10G_KPS.
conf_thresh (float): Confidence threshold for filtering detections. Defaults to 0.5.
nms_thresh (float): Non-Maximum Suppression threshold. Defaults to 0.4.
input_size (Tuple[int, int]): Input image size (width, height).
Defaults to (640, 640).
Note: Non-default sizes may cause slower inference and CoreML compatibility issues.
**kwargs: Reserved for future advanced options.
Attributes:
model_name (SCRFDWeights): Selected model variant.
conf_thresh (float): Threshold used to filter low-confidence detections.
nms_thresh (float): Threshold used during NMS to suppress overlapping boxes.
input_size (Tuple[int, int]): Image size to which inputs are resized before inference.
@@ -48,15 +53,25 @@ class SCRFD(BaseDetector):
RuntimeError: If the ONNX model fails to load or initialize.
"""
def __init__(self, **kwargs) -> None:
super().__init__(**kwargs)
def __init__(
self,
*,
model_name: SCRFDWeights = SCRFDWeights.SCRFD_10G_KPS,
conf_thresh: float = 0.5,
nms_thresh: float = 0.4,
input_size: Tuple[int, int] = (640, 640),
**kwargs: Any,
) -> None:
super().__init__(
model_name=model_name,
conf_thresh=conf_thresh,
nms_thresh=nms_thresh,
input_size=input_size,
**kwargs,
)
self._supports_landmarks = True # SCRFD supports landmarks
model_name = kwargs.get("model_name", SCRFDWeights.SCRFD_10G_KPS)
conf_thresh = kwargs.get("conf_thresh", 0.5)
nms_thresh = kwargs.get("nms_thresh", 0.4)
input_size = kwargs.get("input_size", (640, 640))
self.model_name = model_name
self.conf_thresh = conf_thresh
self.nms_thresh = nms_thresh
self.input_size = input_size
@@ -69,13 +84,13 @@ class SCRFD(BaseDetector):
# ---------------------------------
Logger.info(
f"Initializing SCRFD with model={model_name}, conf_thresh={conf_thresh}, nms_thresh={nms_thresh}, "
f"input_size={input_size}"
f'Initializing SCRFD with model={self.model_name}, conf_thresh={self.conf_thresh}, '
f'nms_thresh={self.nms_thresh}, input_size={self.input_size}'
)
# Get path to model weights
self._model_path = verify_model_weights(model_name)
Logger.info(f"Verified model weights located at: {self._model_path}")
self._model_path = verify_model_weights(self.model_name)
Logger.info(f'Verified model weights located at: {self._model_path}')
# Initialize model
self._initialize_model(self._model_path)
@@ -94,7 +109,7 @@ class SCRFD(BaseDetector):
self.session = create_onnx_session(model_path)
self.input_names = self.session.get_inputs()[0].name
self.output_names = [x.name for x in self.session.get_outputs()]
Logger.info(f"Successfully initialized the model from {model_path}")
Logger.info(f'Successfully initialized the model from {model_path}')
except Exception as e:
Logger.error(f"Failed to load model from '{model_path}': {e}", exc_info=True)
raise RuntimeError(f"Failed to initialize model session for '{model_path}'") from e
@@ -173,8 +188,13 @@ class SCRFD(BaseDetector):
return scores_list, bboxes_list, kpss_list
def detect(
self, image: np.ndarray, max_num: int = 0, metric: Literal["default", "max"] = "max", center_weight: float = 2
) -> List[Dict[str, Any]]:
self,
image: np.ndarray,
*,
max_num: int = 0,
metric: Literal['default', 'max'] = 'max',
center_weight: float = 2.0,
) -> List[Face]:
"""
Perform face detection on an input image and return bounding boxes and facial landmarks.
@@ -188,10 +208,19 @@ class SCRFD(BaseDetector):
when using the "default" metric. Defaults to 2.0.
Returns:
List[Dict[str, Any]]: List of face detection dictionaries, each containing:
- 'bbox': [x1, y1, x2, y2] - Bounding box coordinates
- 'confidence': float - Detection confidence score
- 'landmarks': [[x1, y1], [x2, y2], [x3, y3], [x4, y4], [x5, y5]] - 5-point facial landmarks
List[Face]: List of Face objects, each containing:
- bbox (np.ndarray): Bounding box coordinates with shape (4,) as [x1, y1, x2, y2]
- confidence (float): Detection confidence score (0.0 to 1.0)
- landmarks (np.ndarray): 5-point facial landmarks with shape (5, 2)
Example:
>>> faces = detector.detect(image)
>>> for face in faces:
... bbox = face.bbox # np.ndarray with shape (4,)
... confidence = face.confidence # float
... landmarks = face.landmarks # np.ndarray with shape (5, 2)
... # Can pass landmarks directly to recognition
... embedding = recognizer.get_normalized_embedding(image, face.landmarks)
"""
original_height, original_width = image.shape[:2]
@@ -219,11 +248,11 @@ class SCRFD(BaseDetector):
pre_det = np.hstack((bboxes, scores)).astype(np.float32, copy=False)
pre_det = pre_det[order, :]
keep = non_max_supression(pre_det, threshold=self.nms_thresh)
keep = non_max_suppression(pre_det, threshold=self.nms_thresh)
detections = pre_det[keep, :]
landmarks = landmarks[order, :, :]
landmarks = landmarks[keep, :, :].astype(np.int32)
landmarks = landmarks[keep, :, :].astype(np.float32)
if 0 < max_num < detections.shape[0]:
# Calculate area of detections
@@ -240,7 +269,7 @@ class SCRFD(BaseDetector):
# Calculate scores based on the chosen metric
offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
if metric == "max":
if metric == 'max':
values = area
else:
values = area - offset_dist_squared * center_weight
@@ -252,12 +281,12 @@ class SCRFD(BaseDetector):
faces = []
for i in range(detections.shape[0]):
face_dict = {
"bbox": detections[i, :4].astype(float).tolist(),
"confidence": detections[i, 4].item(),
"landmarks": landmarks[i].astype(float).tolist(),
}
faces.append(face_dict)
face = Face(
bbox=detections[i, :4],
confidence=float(detections[i, 4]),
landmarks=landmarks[i],
)
faces.append(face)
return faces
@@ -266,7 +295,7 @@ class SCRFD(BaseDetector):
def draw_bbox(frame, bbox, score, color=(0, 255, 0), thickness=2):
x1, y1, x2, y2 = map(int, bbox) # Unpack 4 bbox values
cv2.rectangle(frame, (x1, y1), (x2, y2), color, thickness)
cv2.putText(frame, f"{score:.2f}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
cv2.putText(frame, f'{score:.2f}', (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
def draw_keypoints(frame, points, color=(0, 0, 255), radius=2):
@@ -274,21 +303,21 @@ def draw_keypoints(frame, points, color=(0, 0, 255), radius=2):
cv2.circle(frame, (int(x), int(y)), radius, color, -1)
if __name__ == "__main__":
if __name__ == '__main__':
detector = SCRFD(model_name=SCRFDWeights.SCRFD_500M_KPS)
print(detector.get_info())
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("Failed to open webcam.")
print('Failed to open webcam.')
exit()
print("📷 Webcam started. Press 'q' to exit.")
print("Webcam started. Press 'q' to exit.")
while True:
ret, frame = cap.read()
if not ret:
print("Failed to read frame.")
print('Failed to read frame.')
break
# Get face detections as list of dictionaries
@@ -296,25 +325,26 @@ if __name__ == "__main__":
# Process each detected face
for face in faces:
# Extract bbox and landmarks from dictionary
bbox = face["bbox"] # [x1, y1, x2, y2]
landmarks = face["landmarks"] # [[x1, y1], [x2, y2], ...]
confidence = face["confidence"]
# Extract bbox and landmarks from Face object
draw_bbox(frame, face.bbox, face.confidence)
# Pass bbox and confidence separately
draw_bbox(frame, bbox, confidence)
# Convert landmarks to numpy array format if needed
if landmarks is not None and len(landmarks) > 0:
# Convert list of [x, y] pairs to numpy array
points = np.array(landmarks, dtype=np.float32) # Shape: (5, 2)
draw_keypoints(frame, points)
# Draw landmarks if available
if face.landmarks is not None and len(face.landmarks) > 0:
draw_keypoints(frame, face.landmarks)
# Display face count
cv2.putText(frame, f"Faces: {len(faces)}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
cv2.putText(
frame,
f'Faces: {len(faces)}',
(10, 30),
cv2.FONT_HERSHEY_SIMPLEX,
0.7,
(255, 255, 255),
2,
)
cv2.imshow("FaceDetection", frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
cv2.imshow('FaceDetection', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()

View File

@@ -1,232 +0,0 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import cv2
import math
import itertools
import numpy as np
from typing import Tuple, List
def resize_image(frame, target_shape: Tuple[int, int] = (640, 640)) -> Tuple[np.ndarray, float]:
"""
Resize an image to fit within a target shape while keeping its aspect ratio.
Args:
frame (np.ndarray): Input image.
target_shape (Tuple[int, int]): Target size (width, height). Defaults to (640, 640).
Returns:
Tuple[np.ndarray, float]: Resized image on a blank canvas and the resize factor.
"""
width, height = target_shape
# Aspect-ratio preserving resize
im_ratio = float(frame.shape[0]) / frame.shape[1]
model_ratio = height / width
if im_ratio > model_ratio:
new_height = height
new_width = int(new_height / im_ratio)
else:
new_width = width
new_height = int(new_width * im_ratio)
resize_factor = float(new_height) / frame.shape[0]
resized_frame = cv2.resize(frame, (new_width, new_height))
# Create blank image and place resized image on it
image = np.zeros((height, width, 3), dtype=np.uint8)
image[:new_height, :new_width, :] = resized_frame
return image, resize_factor
def generate_anchors(image_size: Tuple[int, int] = (640, 640)) -> np.ndarray:
"""
Generate anchor boxes for a given image size.
Args:
image_size (Tuple[int, int]): Input image size (width, height). Defaults to (640, 640).
Returns:
np.ndarray: Anchor box coordinates as a NumPy array.
"""
image_size = image_size
steps = [8, 16, 32]
min_sizes = [[16, 32], [64, 128], [256, 512]]
anchors = []
feature_maps = [
[
math.ceil(image_size[0] / step),
math.ceil(image_size[1] / step)
] for step in steps
]
for k, (map_height, map_width) in enumerate(feature_maps):
step = steps[k]
for i, j in itertools.product(range(map_height), range(map_width)):
for min_size in min_sizes[k]:
s_kx = min_size / image_size[1]
s_ky = min_size / image_size[0]
dense_cx = [x * step / image_size[1] for x in [j + 0.5]]
dense_cy = [y * step / image_size[0] for y in [i + 0.5]]
for cy, cx in itertools.product(dense_cy, dense_cx):
anchors += [cx, cy, s_kx, s_ky]
output = np.array(anchors, dtype=np.float32).reshape(-1, 4)
return output
def non_max_supression(dets: List[np.ndarray], threshold: float):
"""
Apply Non-Maximum Suppression (NMS) to reduce overlapping bounding boxes based on a threshold.
Args:
dets (numpy.ndarray): Array of detections with each row as [x1, y1, x2, y2, score].
threshold (float): IoU threshold for suppression.
Returns:
list: Indices of bounding boxes retained after suppression.
"""
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
scores = dets[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= threshold)[0]
order = order[inds + 1]
return keep
def decode_boxes(loc, priors, variances=[0.1, 0.2]) -> np.ndarray:
"""
Decode locations from predictions using priors to undo
the encoding done for offset regression at train time.
Args:
loc (np.ndarray): Location predictions for loc layers, shape: [num_priors, 4]
priors (np.ndarray): Prior boxes in center-offset form, shape: [num_priors, 4]
variances (list[float]): Variances of prior boxes
Returns:
np.ndarray: Decoded bounding box predictions
"""
# Compute centers of predicted boxes
cxcy = priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:]
# Compute widths and heights of predicted boxes
wh = priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])
# Convert center, size to corner coordinates
boxes = np.zeros_like(loc)
boxes[:, :2] = cxcy - wh / 2 # xmin, ymin
boxes[:, 2:] = cxcy + wh / 2 # xmax, ymax
return boxes
def decode_landmarks(predictions, priors, variances=[0.1, 0.2]) -> np.ndarray:
"""
Decode landmark predictions using prior boxes.
Args:
predictions (np.ndarray): Landmark predictions, shape: [num_priors, 10]
priors (np.ndarray): Prior boxes, shape: [num_priors, 4]
variances (list): Scaling factors for landmark offsets.
Returns:
np.ndarray: Decoded landmarks, shape: [num_priors, 10]
"""
# Reshape predictions to [num_priors, 5, 2] to process landmark points
predictions = predictions.reshape(predictions.shape[0], 5, 2)
# Expand priors to match (num_priors, 5, 2)
priors_xy = np.repeat(priors[:, :2][:, np.newaxis, :], 5, axis=1) # (num_priors, 5, 2)
priors_wh = np.repeat(priors[:, 2:][:, np.newaxis, :], 5, axis=1) # (num_priors, 5, 2)
# Compute absolute landmark positions
landmarks = priors_xy + predictions * variances[0] * priors_wh
# Flatten back to [num_priors, 10]
landmarks = landmarks.reshape(landmarks.shape[0], -1)
return landmarks
def distance2bbox(points, distance, max_shape=None):
"""Decode distance prediction to bounding box.
Args:
points (Tensor): Shape (n, 2), [x, y].
distance (Tensor): Distance from the given point to 4
boundaries (left, top, right, bottom).
max_shape (tuple): Shape of the image.
Returns:
Tensor: Decoded bounding boxes with shape (n, 4).
"""
x1 = points[:, 0] - distance[:, 0]
y1 = points[:, 1] - distance[:, 1]
x2 = points[:, 0] + distance[:, 2]
y2 = points[:, 1] + distance[:, 3]
if max_shape is not None:
x1 = x1.clamp(min=0, max=max_shape[1])
y1 = y1.clamp(min=0, max=max_shape[0])
x2 = x2.clamp(min=0, max=max_shape[1])
y2 = y2.clamp(min=0, max=max_shape[0])
else:
x1 = np.maximum(x1, 0)
y1 = np.maximum(y1, 0)
x2 = np.maximum(x2, 0)
y2 = np.maximum(y2, 0)
return np.stack([x1, y1, x2, y2], axis=-1)
def distance2kps(points, distance, max_shape=None):
"""Decode distance prediction to keypoints.
Args:
points (Tensor): Shape (n, 2), [x, y].
distance (Tensor): Distance from the given point to 4
boundaries (left, top, right, bottom).
max_shape (tuple): Shape of the image.
Returns:
Tensor: Decoded keypoints with shape (n, 2k).
"""
preds = []
for i in range(0, distance.shape[1], 2):
px = points[:, i % 2] + distance[:, i]
py = points[:, i % 2 + 1] + distance[:, i + 1]
if max_shape is not None:
px = px.clamp(min=0, max=max_shape[1])
py = py.clamp(min=0, max=max_shape[0])
preds.append(px)
preds.append(py)
return np.stack(preds, axis=-1)

341
uniface/detection/yolov5.py Normal file
View File

@@ -0,0 +1,341 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Any, List, Literal, Tuple
import cv2
import numpy as np
from uniface.common import non_max_suppression
from uniface.constants import YOLOv5FaceWeights
from uniface.face import Face
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from .base import BaseDetector
__all__ = ['YOLOv5Face']
class YOLOv5Face(BaseDetector):
"""
Face detector based on the YOLOv5-Face architecture.
Title: "YOLO5Face: Why Reinventing a Face Detector"
Paper: https://arxiv.org/abs/2105.12931
Code: https://github.com/yakhyo/yolov5-face-onnx-inference (ONNX inference implementation)
Args:
model_name (YOLOv5FaceWeights): Predefined model enum (e.g., `YOLOV5S`).
Specifies the YOLOv5-Face variant to load. Defaults to YOLOV5S.
conf_thresh (float): Confidence threshold for filtering detections. Defaults to 0.6.
nms_thresh (float): Non-Maximum Suppression threshold. Defaults to 0.5.
input_size (int): Input image size. Defaults to 640.
Note: ONNX model is fixed at 640. Changing this will cause inference errors.
**kwargs: Advanced options:
max_det (int): Maximum number of detections to return. Defaults to 750.
Attributes:
model_name (YOLOv5FaceWeights): Selected model variant.
conf_thresh (float): Threshold used to filter low-confidence detections.
nms_thresh (float): Threshold used during NMS to suppress overlapping boxes.
input_size (int): Image size to which inputs are resized before inference.
max_det (int): Maximum number of detections to return.
_model_path (str): Absolute path to the downloaded/verified model weights.
Raises:
ValueError: If the model weights are invalid or not found.
RuntimeError: If the ONNX model fails to load or initialize.
"""
def __init__(
self,
*,
model_name: YOLOv5FaceWeights = YOLOv5FaceWeights.YOLOV5S,
conf_thresh: float = 0.6,
nms_thresh: float = 0.5,
input_size: int = 640,
**kwargs: Any,
) -> None:
super().__init__(
model_name=model_name,
conf_thresh=conf_thresh,
nms_thresh=nms_thresh,
input_size=input_size,
**kwargs,
)
self._supports_landmarks = True # YOLOv5-Face supports landmarks
# Validate input size
if input_size != 640:
raise ValueError(
f'YOLOv5Face only supports input_size=640 (got {input_size}). The ONNX model has a fixed input shape.'
)
self.model_name = model_name
self.conf_thresh = conf_thresh
self.nms_thresh = nms_thresh
self.input_size = input_size
# Advanced options from kwargs
self.max_det = kwargs.get('max_det', 750)
Logger.info(
f'Initializing YOLOv5Face with model={self.model_name}, conf_thresh={self.conf_thresh}, '
f'nms_thresh={self.nms_thresh}, input_size={self.input_size}'
)
# Get path to model weights
self._model_path = verify_model_weights(self.model_name)
Logger.info(f'Verified model weights located at: {self._model_path}')
# Initialize model
self._initialize_model(self._model_path)
def _initialize_model(self, model_path: str) -> None:
"""
Initializes an ONNX model session from the given path.
Args:
model_path (str): The file path to the ONNX model.
Raises:
RuntimeError: If the model fails to load, logs an error and raises an exception.
"""
try:
self.session = create_onnx_session(model_path)
self.input_names = self.session.get_inputs()[0].name
self.output_names = [x.name for x in self.session.get_outputs()]
Logger.info(f'Successfully initialized the model from {model_path}')
except Exception as e:
Logger.error(f"Failed to load model from '{model_path}': {e}", exc_info=True)
raise RuntimeError(f"Failed to initialize model session for '{model_path}'") from e
def preprocess(self, image: np.ndarray) -> Tuple[np.ndarray, float, Tuple[int, int]]:
"""
Preprocess image for inference.
Args:
image (np.ndarray): Input image (BGR format)
Returns:
Tuple[np.ndarray, float, Tuple[int, int]]: Preprocessed image, scale ratio, and padding
"""
# Get original image shape
img_h, img_w = image.shape[:2]
# Calculate scale ratio
scale = min(self.input_size / img_h, self.input_size / img_w)
new_h, new_w = int(img_h * scale), int(img_w * scale)
# Resize image
img_resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
# Create padded image
img_padded = np.full((self.input_size, self.input_size, 3), 114, dtype=np.uint8)
# Calculate padding
pad_h = (self.input_size - new_h) // 2
pad_w = (self.input_size - new_w) // 2
# Place resized image in center
img_padded[pad_h : pad_h + new_h, pad_w : pad_w + new_w] = img_resized
# Convert to RGB and normalize
img_rgb = cv2.cvtColor(img_padded, cv2.COLOR_BGR2RGB)
img_normalized = img_rgb.astype(np.float32) / 255.0
# Transpose to CHW format (HWC -> CHW) and add batch dimension
img_transposed = np.transpose(img_normalized, (2, 0, 1))
img_batch = np.expand_dims(img_transposed, axis=0)
img_batch = np.ascontiguousarray(img_batch)
return img_batch, scale, (pad_w, pad_h)
def inference(self, input_tensor: np.ndarray) -> List[np.ndarray]:
"""Perform model inference on the preprocessed image tensor.
Args:
input_tensor (np.ndarray): Preprocessed input tensor.
Returns:
List[np.ndarray]: Raw model outputs.
"""
return self.session.run(self.output_names, {self.input_names: input_tensor})
def postprocess(
self,
predictions: np.ndarray,
scale: float,
padding: Tuple[int, int],
) -> Tuple[np.ndarray, np.ndarray]:
"""
Postprocess model predictions.
Args:
predictions (np.ndarray): Raw model output
scale (float): Scale ratio used in preprocessing
padding (Tuple[int, int]): Padding used in preprocessing
Returns:
Tuple[np.ndarray, np.ndarray]: Filtered detections and landmarks
- detections: [x1, y1, x2, y2, conf]
- landmarks: [5, 2] for each detection
"""
# predictions shape: (1, 25200, 16)
# 16 = [x, y, w, h, obj_conf, cls_conf, 10 landmarks (5 points * 2 coords)]
predictions = predictions[0] # Remove batch dimension
# Filter by confidence
mask = predictions[:, 4] >= self.conf_thresh
predictions = predictions[mask]
if len(predictions) == 0:
return np.array([]), np.array([])
# Convert from xywh to xyxy
boxes = self._xywh2xyxy(predictions[:, :4])
# Get confidence scores
scores = predictions[:, 4]
# Get landmarks (5 points, 10 coordinates)
landmarks = predictions[:, 5:15].copy()
# Apply NMS
detections_for_nms = np.hstack((boxes, scores[:, None])).astype(np.float32, copy=False)
keep = non_max_suppression(detections_for_nms, self.nms_thresh)
if len(keep) == 0:
return np.array([]), np.array([])
# Filter detections and limit to max_det
keep = keep[: self.max_det]
boxes = boxes[keep]
scores = scores[keep]
landmarks = landmarks[keep]
# Scale back to original image coordinates
pad_w, pad_h = padding
boxes[:, [0, 2]] = (boxes[:, [0, 2]] - pad_w) / scale
boxes[:, [1, 3]] = (boxes[:, [1, 3]] - pad_h) / scale
# Scale landmarks
for i in range(5):
landmarks[:, i * 2] = (landmarks[:, i * 2] - pad_w) / scale
landmarks[:, i * 2 + 1] = (landmarks[:, i * 2 + 1] - pad_h) / scale
# Reshape landmarks to (N, 5, 2)
landmarks = landmarks.reshape(-1, 5, 2)
# Combine results
detections = np.concatenate([boxes, scores[:, None]], axis=1)
return detections, landmarks
def _xywh2xyxy(self, x: np.ndarray) -> np.ndarray:
"""
Convert bounding box format from xywh to xyxy.
Args:
x (np.ndarray): Boxes in [x, y, w, h] format
Returns:
np.ndarray: Boxes in [x1, y1, x2, y2] format
"""
y = np.copy(x)
y[..., 0] = x[..., 0] - x[..., 2] / 2 # x1
y[..., 1] = x[..., 1] - x[..., 3] / 2 # y1
y[..., 2] = x[..., 0] + x[..., 2] / 2 # x2
y[..., 3] = x[..., 1] + x[..., 3] / 2 # y2
return y
def detect(
self,
image: np.ndarray,
*,
max_num: int = 0,
metric: Literal['default', 'max'] = 'max',
center_weight: float = 2.0,
) -> List[Face]:
"""
Perform face detection on an input image and return bounding boxes and facial landmarks.
Args:
image (np.ndarray): Input image as a NumPy array of shape (H, W, C).
max_num (int): Maximum number of detections to return. Use 0 to return all detections. Defaults to 0.
metric (Literal["default", "max"]): Metric for ranking detections when `max_num` is limited.
- "default": Prioritize detections closer to the image center.
- "max": Prioritize detections with larger bounding box areas.
center_weight (float): Weight for penalizing detections farther from the image center
when using the "default" metric. Defaults to 2.0.
Returns:
List[Face]: List of Face objects, each containing:
- bbox (np.ndarray): Bounding box coordinates with shape (4,) as [x1, y1, x2, y2]
- confidence (float): Detection confidence score (0.0 to 1.0)
- landmarks (np.ndarray): 5-point facial landmarks with shape (5, 2)
Example:
>>> faces = detector.detect(image)
>>> for face in faces:
... bbox = face.bbox # np.ndarray with shape (4,)
... confidence = face.confidence # float
... landmarks = face.landmarks # np.ndarray with shape (5, 2)
... # Can pass landmarks directly to recognition
... embedding = recognizer.get_normalized_embedding(image, face.landmarks)
"""
original_height, original_width = image.shape[:2]
# Preprocess
image_tensor, scale, padding = self.preprocess(image)
# ONNXRuntime inference
outputs = self.inference(image_tensor)
# Postprocess
detections, landmarks = self.postprocess(outputs[0], scale, padding)
# Handle case when no faces are detected
if len(detections) == 0:
return []
if 0 < max_num < detections.shape[0]:
# Calculate area of detections
area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])
# Calculate offsets from image center
center = (original_height // 2, original_width // 2)
offsets = np.vstack(
[
(detections[:, 0] + detections[:, 2]) / 2 - center[1],
(detections[:, 1] + detections[:, 3]) / 2 - center[0],
]
)
# Calculate scores based on the chosen metric
offset_dist_squared = np.sum(np.power(offsets, 2.0), axis=0)
if metric == 'max':
values = area
else:
values = area - offset_dist_squared * center_weight
# Sort by scores and select top `max_num`
sorted_indices = np.argsort(values)[::-1][:max_num]
detections = detections[sorted_indices]
landmarks = landmarks[sorted_indices]
faces = []
for i in range(detections.shape[0]):
face = Face(
bbox=detections[i, :4],
confidence=float(detections[i, 4]),
landmarks=landmarks[i],
)
faces.append(face)
return faces

66
uniface/face.py Normal file
View File

@@ -0,0 +1,66 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from dataclasses import asdict, dataclass
from typing import Optional
import numpy as np
from uniface.face_utils import compute_similarity
__all__ = ['Face']
@dataclass
class Face:
"""
Detected face with analysis results.
"""
# Required attributes
bbox: np.ndarray
confidence: float
landmarks: np.ndarray
# Optional attributes
embedding: Optional[np.ndarray] = None
age: Optional[int] = None
gender: Optional[int] = None # 0 or 1
def compute_similarity(self, other: 'Face') -> float:
"""Compute cosine similarity with another face."""
if self.embedding is None or other.embedding is None:
raise ValueError('Both faces must have embeddings for similarity computation')
return float(compute_similarity(self.embedding, other.embedding))
def to_dict(self) -> dict:
"""Convert to dictionary."""
return asdict(self)
@property
def sex(self) -> str:
"""Get gender as a string label (Female or Male)."""
if self.gender is None:
return None
return 'Female' if self.gender == 0 else 'Male'
@property
def bbox_xyxy(self) -> np.ndarray:
"""Get bounding box coordinates in (x1, y1, x2, y2) format."""
return self.bbox.copy()
@property
def bbox_xywh(self) -> np.ndarray:
"""Get bounding box coordinates in (x1, y1, w, h) format."""
return np.array([self.bbox[0], self.bbox[1], self.bbox[2] - self.bbox[0], self.bbox[3] - self.bbox[1]])
def __repr__(self) -> str:
parts = [f'Face(confidence={self.confidence:.3f}']
if self.age is not None:
parts.append(f'age={self.age}')
if self.gender is not None:
parts.append(f'sex={self.sex}')
if self.embedding is not None:
parts.append(f'embedding_dim={self.embedding.shape[0]}')
return ', '.join(parts) + ')'

View File

@@ -2,13 +2,18 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Tuple, Union
import cv2
import numpy as np
from skimage.transform import SimilarityTransform
from typing import Tuple
__all__ = ["face_alignment", "compute_similarity", "bbox_center_alignment", "transform_points_2d"]
__all__ = [
'face_alignment',
'compute_similarity',
'bbox_center_alignment',
'transform_points_2d',
]
# Reference alignment for facial landmarks (ArcFace)
@@ -18,19 +23,20 @@ reference_alignment: np.ndarray = np.array(
[73.5318, 51.5014],
[56.0252, 71.7366],
[41.5493, 92.3655],
[70.7299, 92.2041]
[70.7299, 92.2041],
],
dtype=np.float32
dtype=np.float32,
)
def estimate_norm(landmark: np.ndarray, image_size: int = 112) -> Tuple[np.ndarray, np.ndarray]:
def estimate_norm(landmark: np.ndarray, image_size: Union[int, Tuple[int, int]] = 112) -> Tuple[np.ndarray, np.ndarray]:
"""
Estimate the normalization transformation matrix for facial landmarks.
Args:
landmark (np.ndarray): Array of shape (5, 2) representing the coordinates of the facial landmarks.
image_size (int, optional): The size of the output image. Default is 112.
image_size (Union[int, Tuple[int, int]], optional): The size of the output image.
Can be an integer (for square images) or a tuple (width, height). Default is 112.
Returns:
np.ndarray: The 2x3 transformation matrix for aligning the landmarks.
@@ -40,14 +46,21 @@ def estimate_norm(landmark: np.ndarray, image_size: int = 112) -> Tuple[np.ndarr
AssertionError: If the input landmark array does not have the shape (5, 2)
or if image_size is not a multiple of 112 or 128.
"""
assert landmark.shape == (5, 2), "Landmark array must have shape (5, 2)."
assert image_size % 112 == 0 or image_size % 128 == 0, "Image size must be a multiple of 112 or 128."
assert landmark.shape == (5, 2), 'Landmark array must have shape (5, 2).'
if image_size % 112 == 0:
ratio = float(image_size) / 112.0
# Handle both int and tuple inputs
if isinstance(image_size, tuple):
size = image_size[0] # Use width for ratio calculation
else:
size = image_size
assert size % 112 == 0 or size % 128 == 0, 'Image size must be a multiple of 112 or 128.'
if size % 112 == 0:
ratio = float(size) / 112.0
diff_x = 0.0
else:
ratio = float(image_size) / 128.0
ratio = float(size) / 128.0
diff_x = 8.0 * ratio
# Adjust reference alignment based on ratio and diff_x
@@ -64,14 +77,19 @@ def estimate_norm(landmark: np.ndarray, image_size: int = 112) -> Tuple[np.ndarr
return matrix, inverse_matrix
def face_alignment(image: np.ndarray, landmark: np.ndarray, image_size: int = 112) -> Tuple[np.ndarray, np.ndarray]:
def face_alignment(
image: np.ndarray,
landmark: np.ndarray,
image_size: Union[int, Tuple[int, int]] = 112,
) -> Tuple[np.ndarray, np.ndarray]:
"""
Align the face in the input image based on the given facial landmarks.
Args:
image (np.ndarray): Input image as a NumPy array.
landmark (np.ndarray): Array of shape (5, 2) representing the coordinates of the facial landmarks.
image_size (int, optional): The size of the aligned output image. Default is 112.
image_size (Union[int, Tuple[int, int]], optional): The size of the aligned output image.
Can be an integer (for square images) or a tuple (width, height). Default is 112.
Returns:
np.ndarray: The aligned face as a NumPy array.
@@ -80,8 +98,14 @@ def face_alignment(image: np.ndarray, landmark: np.ndarray, image_size: int = 11
# Get the transformation matrix
M, M_inv = estimate_norm(landmark, image_size)
# Handle both int and tuple for warpAffine output size
if isinstance(image_size, int):
output_size = (image_size, image_size)
else:
output_size = image_size
# Warp the input image to align the face
warped = cv2.warpAffine(image, M, (image_size, image_size), borderValue=0.0)
warped = cv2.warpAffine(image, M, output_size, borderValue=0.0)
return warped, M_inv

54
uniface/gaze/__init__.py Normal file
View File

@@ -0,0 +1,54 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from .base import BaseGazeEstimator
from .models import MobileGaze
def create_gaze_estimator(method: str = 'mobilegaze', **kwargs) -> BaseGazeEstimator:
"""
Factory function to create gaze estimators.
This function initializes and returns a gaze estimator instance based on the
specified method. It acts as a high-level interface to the underlying
model classes.
Args:
method (str): The gaze estimation method to use.
Options: 'mobilegaze' (default).
**kwargs: Model-specific parameters passed to the estimator's constructor.
For example, `model_name` can be used to select a specific
backbone from `GazeWeights` enum (RESNET18, RESNET34, RESNET50,
MOBILENET_V2, MOBILEONE_S0).
Returns:
BaseGazeEstimator: An initialized gaze estimator instance ready for use.
Raises:
ValueError: If the specified `method` is not supported.
Examples:
>>> # Create the default MobileGaze estimator (ResNet18 backbone)
>>> estimator = create_gaze_estimator()
>>> # Create with MobileNetV2 backbone
>>> from uniface.constants import GazeWeights
>>> estimator = create_gaze_estimator(
... 'mobilegaze',
... model_name=GazeWeights.MOBILENET_V2
... )
>>> # Use the estimator
>>> pitch, yaw = estimator.estimate(face_crop)
"""
method = method.lower()
if method in ('mobilegaze', 'mobile_gaze', 'gaze'):
return MobileGaze(**kwargs)
else:
available = ['mobilegaze']
raise ValueError(f"Unsupported gaze estimation method: '{method}'. Available: {available}")
__all__ = ['create_gaze_estimator', 'MobileGaze', 'BaseGazeEstimator']

108
uniface/gaze/base.py Normal file
View File

@@ -0,0 +1,108 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from abc import ABC, abstractmethod
from typing import Tuple
import numpy as np
class BaseGazeEstimator(ABC):
"""
Abstract base class for all gaze estimation models.
This class defines the common interface that all gaze estimators must implement,
ensuring consistency across different gaze estimation methods. Gaze estimation
predicts the direction a person is looking based on their face image.
The gaze direction is represented as pitch and yaw angles in radians:
- Pitch: Vertical angle (positive = looking up, negative = looking down)
- Yaw: Horizontal angle (positive = looking right, negative = looking left)
"""
@abstractmethod
def _initialize_model(self) -> None:
"""
Initialize the underlying model for inference.
This method should handle loading model weights, creating the
inference session (e.g., ONNX Runtime), and any necessary
setup procedures to prepare the model for prediction.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
raise NotImplementedError('Subclasses must implement the _initialize_model method.')
@abstractmethod
def preprocess(self, face_image: np.ndarray) -> np.ndarray:
"""
Preprocess the input face image for model inference.
This method should take a raw face crop and convert it into the format
expected by the model's inference engine (e.g., normalized tensor).
Args:
face_image (np.ndarray): A cropped face image in BGR format with
shape (H, W, C).
Returns:
np.ndarray: The preprocessed image tensor ready for inference,
typically with shape (1, C, H, W).
"""
raise NotImplementedError('Subclasses must implement the preprocess method.')
@abstractmethod
def postprocess(self, outputs: Tuple[np.ndarray, np.ndarray]) -> Tuple[float, float]:
"""
Postprocess raw model outputs into gaze angles.
This method takes the raw output from the model's inference and
converts it into pitch and yaw angles in radians.
Args:
outputs: Raw outputs from the model inference. The format depends
on the specific model architecture.
Returns:
Tuple[float, float]: A tuple of (pitch, yaw) angles in radians.
"""
raise NotImplementedError('Subclasses must implement the postprocess method.')
@abstractmethod
def estimate(self, face_image: np.ndarray) -> Tuple[float, float]:
"""
Perform end-to-end gaze estimation on a face image.
This method orchestrates the full pipeline: preprocessing the input,
running inference, and postprocessing to return the gaze direction.
Args:
face_image (np.ndarray): A cropped face image in BGR format.
The face should be roughly centered and
well-framed within the image.
Returns:
Tuple[float, float]: A tuple of (pitch, yaw) angles in radians:
- pitch: Vertical gaze angle (positive = up, negative = down)
- yaw: Horizontal gaze angle (positive = right, negative = left)
Example:
>>> estimator = create_gaze_estimator()
>>> pitch, yaw = estimator.estimate(face_crop)
>>> print(f"Looking: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°")
"""
raise NotImplementedError('Subclasses must implement the estimate method.')
def __call__(self, face_image: np.ndarray) -> Tuple[float, float]:
"""
Provides a convenient, callable shortcut for the `estimate` method.
Args:
face_image (np.ndarray): A cropped face image in BGR format.
Returns:
Tuple[float, float]: A tuple of (pitch, yaw) angles in radians.
"""
return self.estimate(face_image)

187
uniface/gaze/models.py Normal file
View File

@@ -0,0 +1,187 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Tuple
import cv2
import numpy as np
from uniface.constants import GazeWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from .base import BaseGazeEstimator
__all__ = ['MobileGaze']
class MobileGaze(BaseGazeEstimator):
"""
MobileGaze: Real-Time Gaze Estimation with ONNX Runtime.
MobileGaze is a gaze estimation model that predicts gaze direction from a single
face image. It supports multiple backbone architectures including ResNet 18/34/50,
MobileNetV2, and MobileOne S0. The model uses a classification approach with binned
angles, which are then decoded to continuous pitch and yaw values.
The model outputs gaze direction as pitch (vertical) and yaw (horizontal) angles
in radians.
Reference:
https://github.com/yakhyo/gaze-estimation
Args:
model_name (GazeWeights): The enum specifying the gaze model backbone to load.
Options: RESNET18, RESNET34, RESNET50, MOBILENET_V2, MOBILEONE_S0.
Defaults to `GazeWeights.RESNET18`.
input_size (Tuple[int, int]): The resolution (width, height) for the model's
input. Defaults to (448, 448).
Attributes:
input_size (Tuple[int, int]): Model input dimensions.
input_mean (list): Per-channel mean values for normalization (ImageNet).
input_std (list): Per-channel std values for normalization (ImageNet).
Example:
>>> from uniface.gaze import MobileGaze
>>> from uniface import RetinaFace
>>>
>>> detector = RetinaFace()
>>> gaze_estimator = MobileGaze()
>>>
>>> # Detect faces and estimate gaze for each
>>> faces = detector.detect(image)
>>> for face in faces:
... bbox = face['bbox']
... x1, y1, x2, y2 = map(int, bbox[:4])
... face_crop = image[y1:y2, x1:x2]
... pitch, yaw = gaze_estimator.estimate(face_crop)
... print(f"Gaze: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°")
"""
def __init__(
self,
model_name: GazeWeights = GazeWeights.RESNET34,
input_size: Tuple[int, int] = (448, 448),
) -> None:
Logger.info(f'Initializing MobileGaze with model={model_name}, input_size={input_size}')
self.input_size = input_size
self.input_mean = [0.485, 0.456, 0.406]
self.input_std = [0.229, 0.224, 0.225]
# Model specific parameters for bin-based classification (Gaze360 config)
self._bins = 90
self._binwidth = 4
self._angle_offset = 180
self._idx_tensor = np.arange(self._bins, dtype=np.float32)
self.model_path = verify_model_weights(model_name)
self._initialize_model()
def _initialize_model(self) -> None:
"""
Initialize the ONNX model from the stored model path.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
try:
self.session = create_onnx_session(self.model_path)
# Get input configuration
input_cfg = self.session.get_inputs()[0]
input_shape = input_cfg.shape
self.input_name = input_cfg.name
self.input_size = tuple(input_shape[2:4][::-1]) # Update from model
# Get output configuration
outputs = self.session.get_outputs()
self.output_names = [output.name for output in outputs]
if len(self.output_names) != 2:
raise ValueError(f'Expected 2 output nodes (pitch, yaw), got {len(self.output_names)}')
Logger.info(f'MobileGaze initialized with input size {self.input_size}')
except Exception as e:
Logger.error(f"Failed to load gaze model from '{self.model_path}'", exc_info=True)
raise RuntimeError(f'Failed to initialize gaze model: {e}') from e
def preprocess(self, face_image: np.ndarray) -> np.ndarray:
"""
Preprocess a face crop for gaze estimation.
Args:
face_image (np.ndarray): A cropped face image in BGR format.
Returns:
np.ndarray: Preprocessed image tensor with shape (1, 3, H, W).
"""
# Convert BGR to RGB
image = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
# Resize to model input size
image = cv2.resize(image, self.input_size)
# Normalize to [0, 1] and apply normalization
image = image.astype(np.float32) / 255.0
mean = np.array(self.input_mean, dtype=np.float32)
std = np.array(self.input_std, dtype=np.float32)
image = (image - mean) / std
# HWC -> CHW -> NCHW
image = np.transpose(image, (2, 0, 1))
image = np.expand_dims(image, axis=0).astype(np.float32)
return image
def _softmax(self, x: np.ndarray) -> np.ndarray:
"""Apply softmax along axis 1."""
e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
return e_x / e_x.sum(axis=1, keepdims=True)
def postprocess(self, outputs: Tuple[np.ndarray, np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
"""
Postprocess raw model outputs into gaze angles.
This method takes the raw output from the model's inference and
converts it into pitch and yaw angles in radians.
Args:
outputs: Raw outputs from the model inference. The format depends
on the specific model architecture.
Returns:
Tuple[np.ndarray, np.ndarray]: A tuple of (pitch, yaw) angles in radians.
"""
pitch_logits, yaw_logits = outputs
# Convert logits to probabilities
pitch_probs = self._softmax(pitch_logits)
yaw_probs = self._softmax(yaw_logits)
# Compute expected bin index (soft-argmax)
pitch_deg = np.sum(pitch_probs * self._idx_tensor, axis=1) * self._binwidth - self._angle_offset
yaw_deg = np.sum(yaw_probs * self._idx_tensor, axis=1) * self._binwidth - self._angle_offset
# Convert degrees to radians
pitch = np.radians(pitch_deg[0])
yaw = np.radians(yaw_deg[0])
return pitch, yaw
def estimate(self, face_image: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
"""
Perform end-to-end gaze estimation on a face image.
This method orchestrates the full pipeline: preprocessing the input,
running inference, and postprocessing to return the gaze direction.
"""
input_tensor = self.preprocess(face_image)
outputs = self.session.run(self.output_names, {self.input_name: input_tensor})
pitch, yaw = self.postprocess((outputs[0], outputs[1]))
return pitch, yaw

View File

@@ -2,8 +2,8 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from .models import Landmark106
from .base import BaseLandmarker
from .models import Landmark106
def create_landmarker(method: str = '2d106det', **kwargs) -> BaseLandmarker:
@@ -25,8 +25,4 @@ def create_landmarker(method: str = '2d106det', **kwargs) -> BaseLandmarker:
raise ValueError(f"Unsupported method: '{method}'. Available: {available}")
__all__ = [
"create_landmarker",
"Landmark106",
"BaseLandmarker"
]
__all__ = ['create_landmarker', 'Landmark106', 'BaseLandmarker']

View File

@@ -3,6 +3,7 @@
# GitHub: https://github.com/yakhyo
from abc import ABC, abstractmethod
import numpy as np
@@ -10,6 +11,7 @@ class BaseLandmarker(ABC):
"""
Abstract Base Class for all facial landmark models.
"""
@abstractmethod
def get_landmarks(self, image: np.ndarray, bbox: np.ndarray) -> np.ndarray:
"""

View File

@@ -2,18 +2,20 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import cv2
import numpy as np
from typing import Tuple
from uniface.log import Logger
import cv2
import numpy as np
from uniface.constants import LandmarkWeights
from uniface.model_store import verify_model_weights
from uniface.face_utils import bbox_center_alignment, transform_points_2d
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from .base import BaseLandmarker
__all__ = ['Landmark']
__all__ = ['Landmark106']
class Landmark106(BaseLandmarker):
@@ -40,15 +42,13 @@ class Landmark106(BaseLandmarker):
>>> print(landmarks.shape)
(106, 2)
"""
def __init__(
self,
model_name: LandmarkWeights = LandmarkWeights.DEFAULT,
input_size: Tuple[int, int] = (192, 192)
input_size: Tuple[int, int] = (192, 192),
) -> None:
Logger.info(
f"Initializing Facial Landmark with model={model_name}, "
f"input_size={input_size}"
)
Logger.info(f'Initializing Facial Landmark with model={model_name}, input_size={input_size}')
self.input_size = input_size
self.input_std = 1.0
self.input_mean = 0.0
@@ -79,11 +79,11 @@ class Landmark106(BaseLandmarker):
self.lmk_dim = 2 # x,y coordinates
self.lmk_num = output_shape[1] // self.lmk_dim # Number of landmarks
Logger.info(f"Model initialized with {self.lmk_num} landmarks")
Logger.info(f'Model initialized with {self.lmk_num} landmarks')
except Exception as e:
Logger.error(f"Failed to load landmark model from '{self.model_path}'", exc_info=True)
raise RuntimeError(f"Failed to initialize landmark model: {e}")
raise RuntimeError(f'Failed to initialize landmark model: {e}') from e
def preprocess(self, image: np.ndarray, bbox: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
"""Prepares a face crop for inference.
@@ -108,8 +108,11 @@ class Landmark106(BaseLandmarker):
aligned_face, transform_matrix = bbox_center_alignment(image, center, self.input_size[0], scale, 0.0)
face_blob = cv2.dnn.blobFromImage(
aligned_face, 1.0 / self.input_std, self.input_size,
(self.input_mean, self.input_mean, self.input_mean), swapRB=True
aligned_face,
1.0 / self.input_std,
self.input_size,
(self.input_mean, self.input_mean, self.input_mean),
swapRB=True,
)
return face_blob, transform_matrix
@@ -129,7 +132,7 @@ class Landmark106(BaseLandmarker):
"""
landmarks = predictions.reshape((-1, 2))
landmarks[:, 0:2] += 1
landmarks[:, 0:2] *= (self.input_size[0] // 2)
landmarks[:, 0:2] *= self.input_size[0] // 2
inverse_matrix = cv2.invertAffineTransform(transform_matrix)
landmarks = transform_points_2d(landmarks, inverse_matrix)
@@ -149,27 +152,22 @@ class Landmark106(BaseLandmarker):
np.ndarray: An array of predicted landmark points with shape (106, 2).
"""
face_blob, transform_matrix = self.preprocess(image, bbox)
raw_predictions = self.session.run(
self.output_names, {self.input_names[0]: face_blob}
)[0][0]
raw_predictions = self.session.run(self.output_names, {self.input_names[0]: face_blob})[0][0]
landmarks = self.postprocess(raw_predictions, transform_matrix)
return landmarks
# Testing code
if __name__ == '__main__':
from uniface.detection import RetinaFace
from uniface.landmark import Landmark106
# TODO: For testing purposes only, remote later
if __name__ == "__main__":
# UPDATED: Use the high-level factory functions
from uniface.detection import create_detector
from uniface.landmark import create_landmarker
# 1. Create the detector and landmarker using the new API
face_detector = create_detector('retinaface')
landmarker = create_landmarker() # Uses the default '2d106det' method
face_detector = RetinaFace()
landmarker = Landmark106()
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("Webcam not available.")
print('Webcam not available.')
exit()
print("Press 'q' to quit.")
@@ -177,14 +175,14 @@ if __name__ == "__main__":
while True:
ret, frame = cap.read()
if not ret:
print("Frame capture failed.")
print('Frame capture failed.')
break
# 2. The detect method returns a list of dictionaries
faces = face_detector.detect(frame)
if not faces:
cv2.imshow("Facial Landmark Detection", frame)
cv2.imshow('Facial Landmark Detection', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
continue
@@ -199,14 +197,14 @@ if __name__ == "__main__":
# --- Drawing Logic ---
# Draw the landmarks
for (x, y) in landmarks.astype(int):
for x, y in landmarks.astype(int):
cv2.circle(frame, (x, y), 2, (0, 255, 0), -1)
# Draw the bounding box
x1, y1, x2, y2 = map(int, bbox)
cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
cv2.imshow("Facial Landmark Detection", frame)
cv2.imshow('Facial Landmark Detection', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break

View File

@@ -1,7 +1,7 @@
import logging
# Create logger for uniface
Logger = logging.getLogger("uniface")
Logger = logging.getLogger('uniface')
Logger.setLevel(logging.WARNING) # Only show warnings/errors by default
Logger.addHandler(logging.NullHandler())
@@ -19,10 +19,7 @@ def enable_logging(level=logging.INFO):
"""
Logger.handlers.clear()
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
"%(asctime)s - %(levelname)s - %(message)s",
datefmt="%Y-%m-%d %H:%M:%S"
))
handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s', datefmt='%Y-%m-%d %H:%M:%S'))
Logger.addHandler(handler)
Logger.setLevel(level)
Logger.propagate = False

View File

@@ -2,14 +2,14 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import os
import hashlib
import os
import requests
from tqdm import tqdm
from uniface.log import Logger
import uniface.constants as const
from uniface.log import Logger
__all__ = ['verify_model_weights']
@@ -62,12 +62,12 @@ def verify_model_weights(model_name: str, root: str = '~/.uniface/models') -> st
Logger.info(f"Successfully downloaded '{model_name}' to {model_path}")
except Exception as e:
Logger.error(f"Failed to download model '{model_name}': {e}")
raise ConnectionError(f"Download failed for '{model_name}'")
raise ConnectionError(f"Download failed for '{model_name}'") from e
expected_hash = const.MODEL_SHA256.get(model_name)
if expected_hash and not verify_file_hash(model_path, expected_hash):
os.remove(model_path) # Remove corrupted file
Logger.warning("Corrupted weight detected. Removing...")
Logger.warning('Corrupted weight detected. Removing...')
raise ValueError(f"Hash mismatch for '{model_name}'. The file may be corrupted; please try downloading again.")
return model_path
@@ -78,33 +78,36 @@ def download_file(url: str, dest_path: str) -> None:
try:
response = requests.get(url, stream=True)
response.raise_for_status()
with open(dest_path, "wb") as file, tqdm(
desc=f"Downloading {dest_path}",
unit='B',
unit_scale=True,
unit_divisor=1024
) as progress:
with (
open(dest_path, 'wb') as file,
tqdm(
desc=f'Downloading {dest_path}',
unit='B',
unit_scale=True,
unit_divisor=1024,
) as progress,
):
for chunk in response.iter_content(chunk_size=const.CHUNK_SIZE):
if chunk:
file.write(chunk)
progress.update(len(chunk))
except requests.RequestException as e:
raise ConnectionError(f"Failed to download file from {url}. Error: {e}")
raise ConnectionError(f'Failed to download file from {url}. Error: {e}') from e
def verify_file_hash(file_path: str, expected_hash: str) -> bool:
"""Compute the SHA-256 hash of the file and compare it with the expected hash."""
file_hash = hashlib.sha256()
with open(file_path, "rb") as f:
for chunk in iter(lambda: f.read(const.CHUNK_SIZE), b""):
with open(file_path, 'rb') as f:
for chunk in iter(lambda: f.read(const.CHUNK_SIZE), b''):
file_hash.update(chunk)
actual_hash = file_hash.hexdigest()
if actual_hash != expected_hash:
Logger.warning(f"Expected hash: {expected_hash}, but got: {actual_hash}")
Logger.warning(f'Expected hash: {expected_hash}, but got: {actual_hash}')
return actual_hash == expected_hash
if __name__ == "__main__":
if __name__ == '__main__':
model_names = [model.value for model in const.RetinaFaceWeights]
# Download each model in the list

View File

@@ -2,10 +2,6 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
"""
Utilities for ONNX Runtime configuration and provider selection.
"""
from typing import List
import onnxruntime as ort
@@ -35,19 +31,19 @@ def get_available_providers() -> List[str]:
providers = []
# Priority order: CoreML > CUDA > CPU
if "CoreMLExecutionProvider" in available:
providers.append("CoreMLExecutionProvider")
Logger.info("CoreML acceleration enabled (Apple Silicon)")
if 'CoreMLExecutionProvider' in available:
providers.append('CoreMLExecutionProvider')
Logger.info('CoreML acceleration enabled (Apple Silicon)')
if "CUDAExecutionProvider" in available:
providers.append("CUDAExecutionProvider")
Logger.info("CUDA acceleration enabled (NVIDIA GPU)")
if 'CUDAExecutionProvider' in available:
providers.append('CUDAExecutionProvider')
Logger.info('CUDA acceleration enabled (NVIDIA GPU)')
# CPU is always available as fallback
providers.append("CPUExecutionProvider")
providers.append('CPUExecutionProvider')
if len(providers) == 1:
Logger.info("Using CPU execution (no hardware acceleration detected)")
Logger.info('Using CPU execution (no hardware acceleration detected)')
return providers
@@ -77,11 +73,27 @@ def create_onnx_session(model_path: str, providers: List[str] = None) -> ort.Inf
if providers is None:
providers = get_available_providers()
# Suppress ONNX Runtime warnings (e.g., CoreML partition warnings)
# Log levels: 0=VERBOSE, 1=INFO, 2=WARNING, 3=ERROR, 4=FATAL
sess_options = ort.SessionOptions()
sess_options.log_severity_level = 3 # Only show ERROR and FATAL
try:
session = ort.InferenceSession(model_path, providers=providers)
session = ort.InferenceSession(model_path, sess_options=sess_options, providers=providers)
active_provider = session.get_providers()[0]
Logger.debug(f"Session created with provider: {active_provider}")
Logger.debug(f'Session created with provider: {active_provider}')
# Show user-friendly message about which provider is being used
provider_names = {
'CoreMLExecutionProvider': 'CoreML (Apple Silicon)',
'CUDAExecutionProvider': 'CUDA (NVIDIA GPU)',
'CPUExecutionProvider': 'CPU',
}
provider_display = provider_names.get(active_provider, active_provider)
Logger.debug(f'Model loaded with provider: {active_provider}')
print(f'✓ Model loaded ({provider_display})')
return session
except Exception as e:
Logger.error(f"Failed to create ONNX session: {e}", exc_info=True)
raise RuntimeError(f"Failed to initialize ONNX Runtime session: {e}") from e
Logger.error(f'Failed to create ONNX session: {e}', exc_info=True)
raise RuntimeError(f'Failed to initialize ONNX Runtime session: {e}') from e

View File

@@ -0,0 +1,61 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Union
from uniface.constants import ParsingWeights
from .base import BaseFaceParser
from .bisenet import BiSeNet
__all__ = ['BaseFaceParser', 'BiSeNet', 'create_face_parser']
def create_face_parser(
model_name: Union[str, ParsingWeights] = ParsingWeights.RESNET18,
) -> BaseFaceParser:
"""
Factory function to create a face parsing model instance.
This function provides a convenient way to instantiate face parsing models
without directly importing the specific model classes. It supports both
string-based and enum-based model selection.
Args:
model_name (Union[str, ParsingWeights]): The face parsing model to create.
Can be either a string or a ParsingWeights enum value.
Available options:
- 'parsing_resnet18' or ParsingWeights.RESNET18 (default)
- 'parsing_resnet34' or ParsingWeights.RESNET34
Returns:
BaseFaceParser: An instance of the requested face parsing model.
Raises:
ValueError: If the model_name is not recognized.
Examples:
>>> # Using enum
>>> from uniface.parsing import create_face_parser
>>> from uniface.constants import ParsingWeights
>>> parser = create_face_parser(ParsingWeights.RESNET18)
>>>
>>> # Using string
>>> parser = create_face_parser('parsing_resnet18')
>>>
>>> # Parse a face image
>>> mask = parser.parse(face_crop)
"""
# Convert string to enum if necessary
if isinstance(model_name, str):
try:
model_name = ParsingWeights(model_name)
except ValueError as e:
valid_models = [e.value for e in ParsingWeights]
raise ValueError(
f"Unknown face parsing model: '{model_name}'. Valid options are: {', '.join(valid_models)}"
) from e
# All parsing models use the same BiSeNet class
return BiSeNet(model_name=model_name)

106
uniface/parsing/base.py Normal file
View File

@@ -0,0 +1,106 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from abc import ABC, abstractmethod
from typing import Tuple
import numpy as np
class BaseFaceParser(ABC):
"""
Abstract base class for all face parsing models.
This class defines the common interface that all face parsing models must implement,
ensuring consistency across different parsing methods. Face parsing segments a face
image into semantic regions such as skin, eyes, nose, mouth, hair, etc.
The output is a segmentation mask where each pixel is assigned a class label
representing a facial component.
"""
@abstractmethod
def _initialize_model(self) -> None:
"""
Initialize the underlying model for inference.
This method should handle loading model weights, creating the
inference session (e.g., ONNX Runtime), and any necessary
setup procedures to prepare the model for prediction.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
raise NotImplementedError('Subclasses must implement the _initialize_model method.')
@abstractmethod
def preprocess(self, face_image: np.ndarray) -> np.ndarray:
"""
Preprocess the input face image for model inference.
This method should take a raw face crop and convert it into the format
expected by the model's inference engine (e.g., normalized tensor).
Args:
face_image (np.ndarray): A face image in BGR format with
shape (H, W, C).
Returns:
np.ndarray: The preprocessed image tensor ready for inference,
typically with shape (1, C, H, W).
"""
raise NotImplementedError('Subclasses must implement the preprocess method.')
@abstractmethod
def postprocess(self, outputs: np.ndarray, original_size: Tuple[int, int]) -> np.ndarray:
"""
Postprocess raw model outputs into a segmentation mask.
This method takes the raw output from the model's inference and
converts it into a segmentation mask at the original image size.
Args:
outputs (np.ndarray): Raw outputs from the model inference.
original_size (Tuple[int, int]): Original image size (width, height).
Returns:
np.ndarray: Segmentation mask with the same size as the original image.
"""
raise NotImplementedError('Subclasses must implement the postprocess method.')
@abstractmethod
def parse(self, face_image: np.ndarray) -> np.ndarray:
"""
Perform end-to-end face parsing on a face image.
This method orchestrates the full pipeline: preprocessing the input,
running inference, and postprocessing to return the segmentation mask.
Args:
face_image (np.ndarray): A face image in BGR format.
The face should be roughly centered and
well-framed within the image.
Returns:
np.ndarray: Segmentation mask with the same size as input image,
where each pixel value represents a facial component class.
Example:
>>> parser = create_face_parser()
>>> mask = parser.parse(face_crop)
>>> print(f"Mask shape: {mask.shape}, unique classes: {np.unique(mask)}")
"""
raise NotImplementedError('Subclasses must implement the parse method.')
def __call__(self, face_image: np.ndarray) -> np.ndarray:
"""
Provides a convenient, callable shortcut for the `parse` method.
Args:
face_image (np.ndarray): A face image in BGR format.
Returns:
np.ndarray: Segmentation mask with the same size as input image.
"""
return self.parse(face_image)

166
uniface/parsing/bisenet.py Normal file
View File

@@ -0,0 +1,166 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Tuple
import cv2
import numpy as np
from uniface.constants import ParsingWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from .base import BaseFaceParser
__all__ = ['BiSeNet']
class BiSeNet(BaseFaceParser):
"""
BiSeNet: Bilateral Segmentation Network for Face Parsing with ONNX Runtime.
BiSeNet is a semantic segmentation model that segments a face image into
different facial components such as skin, eyes, nose, mouth, hair, etc. The model
uses a BiSeNet architecture with ResNet backbone and outputs a segmentation mask
where each pixel is assigned a class label.
The model supports 19 facial component classes including:
- Background, skin, eyebrows, eyes, nose, mouth, lips, ears, hair, etc.
Reference:
https://github.com/yakhyo/face-parsing
Args:
model_name (ParsingWeights): The enum specifying the parsing model to load.
Options: RESNET18, RESNET34.
Defaults to `ParsingWeights.RESNET18`.
input_size (Tuple[int, int]): The resolution (width, height) for the model's
input. Defaults to (512, 512).
Attributes:
input_size (Tuple[int, int]): Model input dimensions.
input_mean (np.ndarray): Per-channel mean values for normalization (ImageNet).
input_std (np.ndarray): Per-channel std values for normalization (ImageNet).
Example:
>>> from uniface.parsing import BiSeNet
>>> from uniface import RetinaFace
>>>
>>> detector = RetinaFace()
>>> parser = BiSeNet()
>>>
>>> # Detect faces and parse each face
>>> faces = detector.detect(image)
>>> for face in faces:
... bbox = face['bbox']
... x1, y1, x2, y2 = map(int, bbox[:4])
... face_crop = image[y1:y2, x1:x2]
... mask = parser.parse(face_crop)
... print(f"Mask shape: {mask.shape}, unique classes: {np.unique(mask)}")
"""
def __init__(
self,
model_name: ParsingWeights = ParsingWeights.RESNET18,
input_size: Tuple[int, int] = (512, 512),
) -> None:
Logger.info(f'Initializing BiSeNet with model={model_name}, input_size={input_size}')
self.input_size = input_size
self.input_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
self.input_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
self.model_path = verify_model_weights(model_name)
self._initialize_model()
def _initialize_model(self) -> None:
"""
Initialize the ONNX model from the stored model path.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
try:
self.session = create_onnx_session(self.model_path)
# Get input configuration
input_cfg = self.session.get_inputs()[0]
input_shape = input_cfg.shape
self.input_name = input_cfg.name
self.input_size = tuple(input_shape[2:4][::-1]) # Update from model
# Get output configuration
outputs = self.session.get_outputs()
self.output_names = [output.name for output in outputs]
Logger.info(f'BiSeNet initialized with input size {self.input_size}')
except Exception as e:
Logger.error(f"Failed to load parsing model from '{self.model_path}'", exc_info=True)
raise RuntimeError(f'Failed to initialize parsing model: {e}') from e
def preprocess(self, face_image: np.ndarray) -> np.ndarray:
"""
Preprocess a face image for parsing.
Args:
face_image (np.ndarray): A face image in BGR format.
Returns:
np.ndarray: Preprocessed image tensor with shape (1, 3, H, W).
"""
# Convert BGR to RGB
image = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
# Resize to model input size
image = cv2.resize(image, self.input_size, interpolation=cv2.INTER_LINEAR)
# Normalize to [0, 1] and apply normalization
image = image.astype(np.float32) / 255.0
image = (image - self.input_mean) / self.input_std
# HWC -> CHW -> NCHW
image = np.transpose(image, (2, 0, 1))
image = np.expand_dims(image, axis=0).astype(np.float32)
return image
def postprocess(self, outputs: np.ndarray, original_size: Tuple[int, int]) -> np.ndarray:
"""
Postprocess model output to segmentation mask.
Args:
outputs (np.ndarray): Raw model output.
original_size (Tuple[int, int]): Original image size (width, height).
Returns:
np.ndarray: Segmentation mask resized to original dimensions.
"""
# Get the class with highest probability for each pixel
predicted_mask = outputs.squeeze(0).argmax(0).astype(np.uint8)
# Resize back to original size
restored_mask = cv2.resize(predicted_mask, original_size, interpolation=cv2.INTER_NEAREST)
return restored_mask
def parse(self, face_image: np.ndarray) -> np.ndarray:
"""
Perform end-to-end face parsing on a face image.
This method orchestrates the full pipeline: preprocessing the input,
running inference, and postprocessing to return the segmentation mask.
Args:
face_image (np.ndarray): A face image in BGR format.
Returns:
np.ndarray: Segmentation mask with the same size as input image.
"""
original_size = (face_image.shape[1], face_image.shape[0]) # (width, height)
input_tensor = self.preprocess(face_image)
outputs = self.session.run(self.output_names, {self.input_name: input_tensor})
return self.postprocess(outputs[0], original_size)

View File

@@ -0,0 +1,52 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Optional
import numpy as np
from .blur import BlurFace
def anonymize_faces(
image: np.ndarray,
detector: Optional[object] = None,
method: str = 'pixelate',
blur_strength: float = 3.0,
pixel_blocks: int = 10,
conf_thresh: float = 0.5,
**kwargs,
) -> np.ndarray:
"""One-line face anonymization with automatic detection.
Args:
image (np.ndarray): Input image (BGR format).
detector: Face detector instance. Creates RetinaFace if None.
method (str): Blur method name. Defaults to 'pixelate'.
blur_strength (float): Blur intensity. Defaults to 3.0.
pixel_blocks (int): Block count for pixelate. Defaults to 10.
conf_thresh (float): Detection confidence threshold. Defaults to 0.5.
**kwargs: Additional detector arguments.
Returns:
np.ndarray: Anonymized image.
Example:
>>> from uniface.privacy import anonymize_faces
>>> anonymized = anonymize_faces(image, method='pixelate')
"""
if detector is None:
try:
from uniface import RetinaFace
detector = RetinaFace(conf_thresh=conf_thresh, **kwargs)
except ImportError as err:
raise ImportError('Could not import RetinaFace. Please ensure UniFace is properly installed.') from err
faces = detector.detect(image)
blurrer = BlurFace(method=method, blur_strength=blur_strength, pixel_blocks=pixel_blocks)
return blurrer.anonymize(image, faces)
__all__ = ['BlurFace', 'anonymize_faces']

193
uniface/privacy/blur.py Normal file
View File

@@ -0,0 +1,193 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Dict, List, Tuple, Union
import cv2
import numpy as np
__all__ = ['BlurFace']
def _gaussian_blur(region: np.ndarray, strength: float = 3.0) -> np.ndarray:
"""Apply Gaussian blur to a region."""
h, w = region.shape[:2]
kernel_size = max(3, int((min(h, w) / 7) * strength)) | 1
return cv2.GaussianBlur(region, (kernel_size, kernel_size), 0)
def _median_blur(region: np.ndarray, strength: float = 3.0) -> np.ndarray:
"""Apply median blur to a region."""
h, w = region.shape[:2]
kernel_size = max(3, int((min(h, w) / 7) * strength)) | 1
return cv2.medianBlur(region, kernel_size)
def _pixelate_blur(region: np.ndarray, blocks: int = 10) -> np.ndarray:
"""Apply pixelation to a region."""
h, w = region.shape[:2]
temp_h, temp_w = max(1, h // blocks), max(1, w // blocks)
temp = cv2.resize(region, (temp_w, temp_h), interpolation=cv2.INTER_LINEAR)
return cv2.resize(temp, (w, h), interpolation=cv2.INTER_NEAREST)
def _blackout_blur(region: np.ndarray, color: Tuple[int, int, int] = (0, 0, 0)) -> np.ndarray:
"""Replace region with solid color."""
return np.full_like(region, color)
class EllipticalBlur:
"""Elliptical blur with soft, feathered edges.
This blur applies Gaussian blur within an elliptical mask that follows
the natural oval shape of faces, requiring full image context for proper blending.
Args:
blur_strength (float): Blur intensity multiplier. Defaults to 3.0.
margin (int): Extra pixels to extend ellipse beyond bbox. Defaults to 20.
"""
def __init__(self, blur_strength: float = 3.0, margin: int = 20):
self.blur_strength = blur_strength
self.margin = margin
def __call__(
self,
image: np.ndarray,
bboxes: List[Union[Tuple, List]],
inplace: bool = False,
) -> np.ndarray:
if not inplace:
image = image.copy()
h, w = image.shape[:2]
for bbox in bboxes:
x1, y1, x2, y2 = map(int, bbox)
center_x, center_y = (x1 + x2) // 2, (y1 + y2) // 2
axes_x = (x2 - x1) // 2 + self.margin
axes_y = (y2 - y1) // 2 + self.margin
# Create soft elliptical mask
mask = np.zeros((h, w), dtype=np.float32)
cv2.ellipse(mask, (center_x, center_y), (axes_x, axes_y), 0, 0, 360, 255, -1)
mask = cv2.GaussianBlur(mask, (51, 51), 0) / 255.0
mask = mask[:, :, np.newaxis]
kernel_size = max(3, int((min(axes_y, axes_x) * 2 / 7) * self.blur_strength)) | 1
blurred = cv2.GaussianBlur(image, (kernel_size, kernel_size), 0)
image = (blurred * mask + image * (1 - mask)).astype(np.uint8)
return image
class BlurFace:
"""Face blurring with multiple anonymization methods.
Args:
method (str): Blur method - 'gaussian', 'pixelate', 'blackout', 'elliptical', or 'median'.
Defaults to 'pixelate'.
blur_strength (float): Intensity for gaussian/elliptical/median. Defaults to 3.0.
pixel_blocks (int): Block count for pixelate. Defaults to 10.
color (Tuple[int, int, int]): Fill color (BGR) for blackout. Defaults to (0, 0, 0).
margin (int): Edge margin for elliptical. Defaults to 20.
Example:
>>> blurrer = BlurFace(method='pixelate')
>>> anonymized = blurrer.anonymize(image, faces)
"""
VALID_METHODS = {'gaussian', 'pixelate', 'blackout', 'elliptical', 'median'}
def __init__(
self,
method: str = 'pixelate',
blur_strength: float = 3.0,
pixel_blocks: int = 15,
color: Tuple[int, int, int] = (0, 0, 0),
margin: int = 20,
):
self.method = method.lower()
self._blur_strength = blur_strength
self._pixel_blocks = pixel_blocks
self._color = color
self._margin = margin
if self.method not in self.VALID_METHODS:
raise ValueError(f"Invalid blur method: '{method}'. Choose from: {sorted(self.VALID_METHODS)}")
if self.method == 'elliptical':
self._elliptical = EllipticalBlur(blur_strength, margin)
def _blur_region(self, region: np.ndarray) -> np.ndarray:
if self.method == 'gaussian':
return _gaussian_blur(region, self._blur_strength)
elif self.method == 'median':
return _median_blur(region, self._blur_strength)
elif self.method == 'pixelate':
return _pixelate_blur(region, self._pixel_blocks)
elif self.method == 'blackout':
return _blackout_blur(region, self._color)
def anonymize(
self,
image: np.ndarray,
faces: List[Dict],
inplace: bool = False,
) -> np.ndarray:
"""Anonymize faces in an image.
Args:
image (np.ndarray): Input image (BGR format).
faces (List[Dict]): Face detections with 'bbox' key containing [x1, y1, x2, y2].
inplace (bool): Modify image in-place if True. Defaults to False.
Returns:
np.ndarray: Image with anonymized faces.
"""
if not faces:
return image if inplace else image.copy()
bboxes = [face['bbox'] for face in faces]
return self.blur_regions(image, bboxes, inplace)
def blur_regions(
self,
image: np.ndarray,
bboxes: List[Union[Tuple, List]],
inplace: bool = False,
) -> np.ndarray:
"""Blur specific rectangular regions in an image.
Args:
image (np.ndarray): Input image (BGR format).
bboxes (List): Bounding boxes as [x1, y1, x2, y2].
inplace (bool): Modify image in-place if True. Defaults to False.
Returns:
np.ndarray: Image with blurred regions.
"""
if not bboxes:
return image if inplace else image.copy()
if self.method == 'elliptical':
return self._elliptical(image, bboxes, inplace)
if not inplace:
image = image.copy()
h, w = image.shape[:2]
for bbox in bboxes:
x1, y1, x2, y2 = map(int, bbox)
x1, y1 = max(0, x1), max(0, y1)
x2, y2 = min(w, x2), min(h, y2)
if x2 > x1 and y2 > y1:
image[y1:y2, x1:x2] = self._blur_region(image[y1:y2, x1:x2])
return image
def __repr__(self) -> str:
return f"BlurFace(method='{self.method}')"

0
uniface/py.typed Normal file
View File

View File

@@ -2,10 +2,10 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Dict
from .models import ArcFace, MobileFace, SphereFace
from .base import BaseRecognizer
from uniface.constants import ArcFaceWeights, MobileFaceWeights, SphereFaceWeights
from .models import ArcFace, MobileFace, SphereFace
def create_recognizer(method: str = 'arcface', **kwargs) -> BaseRecognizer:
"""
@@ -54,10 +54,5 @@ def create_recognizer(method: str = 'arcface', **kwargs) -> BaseRecognizer:
available = ['arcface', 'mobileface', 'sphereface']
raise ValueError(f"Unsupported method: '{method}'. Available: {available}")
__all__ = [
"create_recognizer",
"ArcFace",
"MobileFace",
"SphereFace",
"BaseRecognizer",
]
__all__ = ['create_recognizer', 'BaseRecognizer', 'ArcFace', 'MobileFace', 'SphereFace']

View File

@@ -3,13 +3,14 @@
# GitHub: https://github.com/yakhyo
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple, Union
import cv2
import numpy as np
from dataclasses import dataclass
from typing import Tuple, Union, List
from uniface.log import Logger
from uniface.face_utils import face_alignment
from uniface.log import Logger
from uniface.onnx_utils import create_onnx_session
@@ -18,6 +19,7 @@ class PreprocessConfig:
"""
Configuration for preprocessing images before feeding them into the model.
"""
input_mean: Union[float, List[float]] = 127.5
input_std: Union[float, List[float]] = 127.5
input_size: Tuple[int, int] = (112, 112)
@@ -28,6 +30,7 @@ class BaseRecognizer(ABC):
Abstract Base Class for all face recognition models.
It provides the core functionality for preprocessing, inference, and embedding extraction.
"""
@abstractmethod
def __init__(self, model_path: str, preprocessing: PreprocessConfig) -> None:
"""
@@ -63,17 +66,20 @@ class BaseRecognizer(ABC):
input_shape = input_cfg.shape
model_input_size = tuple(input_shape[2:4][::-1]) # (width, height)
if model_input_size != self.input_size:
Logger.warning(f"Model input size {model_input_size} differs from configured size {self.input_size}")
Logger.warning(f'Model input size {model_input_size} differs from configured size {self.input_size}')
# Extract output configuration
self.output_names = [output.name for output in self.session.get_outputs()]
self.output_shape = self.session.get_outputs()[0].shape
assert len(self.output_names) == 1, "Expected only one output node."
Logger.info(f"Successfully initialized face encoder from {self.model_path}")
assert len(self.output_names) == 1, 'Expected only one output node.'
Logger.info(f'Successfully initialized face encoder from {self.model_path}')
except Exception as e:
Logger.error(f"Failed to load face encoder model from '{self.model_path}'", exc_info=True)
Logger.error(
f"Failed to load face encoder model from '{self.model_path}'",
exc_info=True,
)
raise RuntimeError(f"Failed to initialize model session for '{self.model_path}'") from e
def preprocess(self, face_img: np.ndarray) -> np.ndarray:
@@ -91,8 +97,9 @@ class BaseRecognizer(ABC):
if isinstance(self.input_std, (list, tuple)):
# Per-channel normalization
rgb_img = cv2.cvtColor(resized_img, cv2.COLOR_BGR2RGB).astype(np.float32)
normalized_img = (rgb_img - np.array(self.input_mean, dtype=np.float32)) / \
np.array(self.input_std, dtype=np.float32)
normalized_img = (rgb_img - np.array(self.input_mean, dtype=np.float32)) / np.array(
self.input_std, dtype=np.float32
)
# Change to NCHW (batch, channels, height, width)
blob = np.transpose(normalized_img, (2, 0, 1)) # CHW
@@ -104,24 +111,28 @@ class BaseRecognizer(ABC):
scalefactor=1.0 / self.input_std,
size=self.input_size,
mean=(self.input_mean, self.input_mean, self.input_mean),
swapRB=True # Convert BGR to RGB
swapRB=True, # Convert BGR to RGB
)
return blob
def get_embedding(self, image: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
def get_embedding(self, image: np.ndarray, landmarks: np.ndarray = None) -> np.ndarray:
"""
Extracts face embedding from an image.
Args:
image: Input face image (BGR format).
landmarks: Facial landmarks (5 points for alignment).
image: Input face image (BGR format). If already aligned (112x112), landmarks can be None.
landmarks: Facial landmarks (5 points for alignment). Optional if image is already aligned.
Returns:
Face embedding vector (typically 512-dimensional).
"""
# Align face using landmarks
aligned_face, _ = face_alignment(image, landmarks)
# If landmarks are provided, align the face first
if landmarks is not None:
aligned_face, _ = face_alignment(image, landmarks, image_size=self.input_size)
else:
# Assume image is already aligned
aligned_face = image
# Generate embedding from aligned face
face_blob = self.preprocess(aligned_face)

View File

@@ -6,9 +6,10 @@ from typing import Optional
from uniface.constants import ArcFaceWeights, MobileFaceWeights, SphereFaceWeights
from uniface.model_store import verify_model_weights
from .base import BaseRecognizer, PreprocessConfig
__all__ = ["ArcFace", "MobileFace", "SphereFace"]
__all__ = ['ArcFace', 'MobileFace', 'SphereFace']
class ArcFace(BaseRecognizer):
@@ -33,14 +34,10 @@ class ArcFace(BaseRecognizer):
def __init__(
self,
model_name: ArcFaceWeights = ArcFaceWeights.MNET,
preprocessing: Optional[PreprocessConfig] = None
preprocessing: Optional[PreprocessConfig] = None,
) -> None:
if preprocessing is None:
preprocessing = PreprocessConfig(
input_mean=127.5,
input_std=127.5,
input_size=(112, 112)
)
preprocessing = PreprocessConfig(input_mean=127.5, input_std=127.5, input_size=(112, 112))
model_path = verify_model_weights(model_name)
super().__init__(model_path=model_path, preprocessing=preprocessing)
@@ -67,14 +64,10 @@ class MobileFace(BaseRecognizer):
def __init__(
self,
model_name: MobileFaceWeights = MobileFaceWeights.MNET_V2,
preprocessing: Optional[PreprocessConfig] = None
preprocessing: Optional[PreprocessConfig] = None,
) -> None:
if preprocessing is None:
preprocessing = PreprocessConfig(
input_mean=127.5,
input_std=127.5,
input_size=(112, 112)
)
preprocessing = PreprocessConfig(input_mean=127.5, input_std=127.5, input_size=(112, 112))
model_path = verify_model_weights(model_name)
super().__init__(model_path=model_path, preprocessing=preprocessing)
@@ -101,14 +94,10 @@ class SphereFace(BaseRecognizer):
def __init__(
self,
model_name: SphereFaceWeights = SphereFaceWeights.SPHERE20,
preprocessing: Optional[PreprocessConfig] = None
preprocessing: Optional[PreprocessConfig] = None,
) -> None:
if preprocessing is None:
preprocessing = PreprocessConfig(
input_mean=127.5,
input_std=127.5,
input_size=(112, 112)
)
preprocessing = PreprocessConfig(input_mean=127.5, input_std=127.5, input_size=(112, 112))
model_path = verify_model_weights(model_name)
super().__init__(model_path=model_path, preprocessing=preprocessing)

View File

@@ -0,0 +1,64 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Optional
from uniface.constants import MiniFASNetWeights
from .base import BaseSpoofer
from .minifasnet import MiniFASNet
__all__ = [
'BaseSpoofer',
'MiniFASNet',
'MiniFASNetWeights',
'create_spoofer',
]
def create_spoofer(
model_name: MiniFASNetWeights = MiniFASNetWeights.V2,
scale: Optional[float] = None,
) -> MiniFASNet:
"""
Factory function to create a face anti-spoofing model.
This is a convenience function that creates a MiniFASNet instance
with the specified model variant and optional custom scale.
Args:
model_name (MiniFASNetWeights): The model variant to use.
Options:
- MiniFASNetWeights.V2: Improved version (default), uses scale=2.7
- MiniFASNetWeights.V1SE: Squeeze-and-excitation version, uses scale=4.0
Defaults to MiniFASNetWeights.V2.
scale (Optional[float]): Custom crop scale factor for face region.
If None, uses the default scale for the selected model variant.
Returns:
MiniFASNet: An initialized face anti-spoofing model.
Example:
>>> from uniface.spoofing import create_spoofer, MiniFASNetWeights
>>> from uniface import RetinaFace
>>>
>>> # Create with default settings (V2 model)
>>> spoofer = create_spoofer()
>>>
>>> # Create with V1SE model
>>> spoofer = create_spoofer(model_name=MiniFASNetWeights.V1SE)
>>>
>>> # Create with custom scale
>>> spoofer = create_spoofer(scale=3.0)
>>>
>>> # Use with face detector
>>> detector = RetinaFace()
>>> faces = detector.detect(image)
>>> for face in faces:
... label_idx, score = spoofer.predict(image, face['bbox'])
... # label_idx: 0 = Fake, 1 = Real
... label = 'Real' if label_idx == 1 else 'Fake'
... print(f'{label}: {score:.2%}')
"""
return MiniFASNet(model_name=model_name, scale=scale)

117
uniface/spoofing/base.py Normal file
View File

@@ -0,0 +1,117 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from abc import ABC, abstractmethod
from typing import List, Tuple, Union
import numpy as np
class BaseSpoofer(ABC):
"""
Abstract base class for all face anti-spoofing models.
This class defines the common interface that all anti-spoofing models must implement,
ensuring consistency across different spoofing detection methods. Anti-spoofing models
detect whether a face is real (live person) or fake (photo, video, mask, etc.).
The prediction returns a tuple of (label_idx, score):
- label_idx: 0 = Fake (spoof), 1 = Real (live)
- score: Confidence score for the predicted label (0.0 to 1.0)
"""
@abstractmethod
def _initialize_model(self) -> None:
"""
Initialize the underlying model for inference.
This method should handle loading model weights, creating the
inference session (e.g., ONNX Runtime), and any necessary
setup procedures to prepare the model for prediction.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
raise NotImplementedError('Subclasses must implement the _initialize_model method.')
@abstractmethod
def preprocess(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> np.ndarray:
"""
Preprocess the input image for model inference.
This method should crop the face region using the bounding box,
resize it to the model's expected input size, and normalize
the pixel values as required by the model.
Args:
image (np.ndarray): Input image in BGR format with shape (H, W, C).
bbox (Union[List, np.ndarray]): Face bounding box in [x1, y1, x2, y2] format.
Returns:
np.ndarray: The preprocessed image tensor ready for inference,
typically with shape (1, C, H, W).
"""
raise NotImplementedError('Subclasses must implement the preprocess method.')
@abstractmethod
def postprocess(self, outputs: np.ndarray) -> Tuple[int, float]:
"""
Postprocess raw model outputs into prediction result.
This method takes the raw output from the model's inference and
converts it into a label index and confidence score.
Args:
outputs (np.ndarray): Raw outputs from the model inference (logits).
Returns:
Tuple[int, float]: A tuple of (label_idx, score) where:
- label_idx: 0 = Fake (spoof), 1 = Real (live)
- score: Confidence score for the predicted label (0.0 to 1.0)
"""
raise NotImplementedError('Subclasses must implement the postprocess method.')
@abstractmethod
def predict(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> Tuple[int, float]:
"""
Perform end-to-end anti-spoofing prediction on a face.
This method orchestrates the full pipeline: preprocessing the input,
running inference, and postprocessing to return the prediction.
Args:
image (np.ndarray): Input image in BGR format containing the face.
bbox (Union[List, np.ndarray]): Face bounding box in [x1, y1, x2, y2] format.
This is typically obtained from a face detector.
Returns:
Tuple[int, float]: A tuple of (label_idx, score) where:
- label_idx: 0 = Fake (spoof), 1 = Real (live)
- score: Confidence score for the predicted label (0.0 to 1.0)
Example:
>>> spoofer = MiniFASNet()
>>> detector = RetinaFace()
>>> faces = detector.detect(image)
>>> for face in faces:
... label_idx, score = spoofer.predict(image, face['bbox'])
... label = 'Real' if label_idx == 1 else 'Fake'
... print(f'{label}: {score:.2%}')
"""
raise NotImplementedError('Subclasses must implement the predict method.')
def __call__(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> Tuple[int, float]:
"""
Provides a convenient, callable shortcut for the `predict` method.
Args:
image (np.ndarray): Input image in BGR format.
bbox (Union[List, np.ndarray]): Face bounding box in [x1, y1, x2, y2] format.
Returns:
Tuple[int, float]: A tuple of (label_idx, score) where:
- label_idx: 0 = Fake (spoof), 1 = Real (live)
- score: Confidence score for the predicted label (0.0 to 1.0)
"""
return self.predict(image, bbox)

View File

@@ -0,0 +1,225 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import List, Optional, Tuple, Union
import cv2
import numpy as np
from uniface.constants import MiniFASNetWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from .base import BaseSpoofer
__all__ = ['MiniFASNet']
# Default crop scales for each model variant
DEFAULT_SCALES = {
MiniFASNetWeights.V1SE: 4.0,
MiniFASNetWeights.V2: 2.7,
}
class MiniFASNet(BaseSpoofer):
"""
MiniFASNet: Lightweight Face Anti-Spoofing with ONNX Runtime.
MiniFASNet is a face anti-spoofing model that detects whether a face is real
(live person) or fake (photo, video replay, mask, etc.). It supports two model
variants: V1SE (with squeeze-and-excitation) and V2 (improved version).
The model takes a face region cropped from the image using a bounding box
and predicts whether it's a real or spoofed face.
Reference:
https://github.com/yakhyo/face-anti-spoofing
Args:
model_name (MiniFASNetWeights): The enum specifying the model variant to load.
Options: V1SE (scale=4.0), V2 (scale=2.7).
Defaults to `MiniFASNetWeights.V2`.
scale (Optional[float]): Custom crop scale factor for face region.
If None, uses the default scale for the selected model variant.
V1SE uses 4.0, V2 uses 2.7.
Attributes:
scale (float): Crop scale factor for face region extraction.
input_size (Tuple[int, int]): Model input dimensions (width, height).
Example:
>>> from uniface.spoofing import MiniFASNet
>>> from uniface import RetinaFace
>>>
>>> detector = RetinaFace()
>>> spoofer = MiniFASNet()
>>>
>>> # Detect faces and check if they are real
>>> faces = detector.detect(image)
>>> for face in faces:
... label_idx, score = spoofer.predict(image, face['bbox'])
... # label_idx: 0 = Fake, 1 = Real
... label = 'Real' if label_idx == 1 else 'Fake'
... print(f'{label}: {score:.2%}')
"""
def __init__(
self,
model_name: MiniFASNetWeights = MiniFASNetWeights.V2,
scale: Optional[float] = None,
) -> None:
Logger.info(f'Initializing MiniFASNet with model={model_name.name}')
# Use default scale for the model variant if not specified
self.scale = scale if scale is not None else DEFAULT_SCALES.get(model_name, 2.7)
self.model_path = verify_model_weights(model_name)
self._initialize_model()
def _initialize_model(self) -> None:
"""
Initialize the ONNX model from the stored model path.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
try:
self.session = create_onnx_session(self.model_path)
# Get input configuration
input_cfg = self.session.get_inputs()[0]
self.input_name = input_cfg.name
# Input shape is (batch, channels, height, width) - we need (width, height)
self.input_size = tuple(input_cfg.shape[2:4][::-1]) # (width, height)
# Get output configuration
output_cfg = self.session.get_outputs()[0]
self.output_name = output_cfg.name
Logger.info(f'MiniFASNet initialized with input size {self.input_size}, scale={self.scale}')
except Exception as e:
Logger.error(f"Failed to load MiniFASNet model from '{self.model_path}'", exc_info=True)
raise RuntimeError(f'Failed to initialize MiniFASNet model: {e}') from e
def _xyxy_to_xywh(self, bbox: Union[List, np.ndarray]) -> List[int]:
"""Convert bounding box from [x1, y1, x2, y2] to [x, y, w, h] format."""
x1, y1, x2, y2 = bbox[:4]
return [int(x1), int(y1), int(x2 - x1), int(y2 - y1)]
def _crop_face(self, image: np.ndarray, bbox_xywh: List[int]) -> np.ndarray:
"""
Crop and resize face region from image using scale factor.
The crop is centered on the face bounding box and scaled to capture
more context around the face, which is important for anti-spoofing.
Args:
image: Input image in BGR format.
bbox_xywh: Face bounding box in [x, y, w, h] format.
Returns:
Cropped and resized face region.
"""
src_h, src_w = image.shape[:2]
x, y, box_w, box_h = bbox_xywh
# Calculate the scale to apply based on image and face size
scale = min((src_h - 1) / box_h, (src_w - 1) / box_w, self.scale)
new_w = box_w * scale
new_h = box_h * scale
# Calculate center of the bounding box
center_x = x + box_w / 2
center_y = y + box_h / 2
# Calculate new bounding box coordinates
x1 = max(0, int(center_x - new_w / 2))
y1 = max(0, int(center_y - new_h / 2))
x2 = min(src_w - 1, int(center_x + new_w / 2))
y2 = min(src_h - 1, int(center_y + new_h / 2))
# Crop and resize
cropped = image[y1 : y2 + 1, x1 : x2 + 1]
resized = cv2.resize(cropped, self.input_size)
return resized
def preprocess(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> np.ndarray:
"""
Preprocess the input image for model inference.
Crops the face region, converts to float32, and arranges
dimensions for the model (NCHW format).
Args:
image: Input image in BGR format with shape (H, W, C).
bbox: Face bounding box in [x1, y1, x2, y2] format.
Returns:
Preprocessed image tensor with shape (1, C, H, W).
"""
# Convert bbox format
bbox_xywh = self._xyxy_to_xywh(bbox)
# Crop and resize face region
face = self._crop_face(image, bbox_xywh)
# Convert to float32 (no normalization needed for this model)
face = face.astype(np.float32)
# HWC -> CHW -> NCHW
face = np.transpose(face, (2, 0, 1))
face = np.expand_dims(face, axis=0)
return face
def _softmax(self, x: np.ndarray) -> np.ndarray:
"""Apply softmax to logits along axis 1."""
e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
return e_x / e_x.sum(axis=1, keepdims=True)
def postprocess(self, outputs: np.ndarray) -> Tuple[int, float]:
"""
Postprocess raw model outputs into prediction result.
Applies softmax to convert logits to probabilities and
returns the predicted label index and confidence score.
Args:
outputs: Raw outputs from the model inference (logits).
Returns:
Tuple[int, float]: A tuple of (label_idx, score) where:
- label_idx: 0 = Fake (spoof), 1 = Real (live)
- score: Confidence score for the predicted label (0.0 to 1.0)
"""
probs = self._softmax(outputs)
label_idx = int(np.argmax(probs))
score = float(probs[0, label_idx])
return label_idx, score
def predict(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> Tuple[int, float]:
"""
Perform end-to-end anti-spoofing prediction on a face.
Args:
image: Input image in BGR format containing the face.
bbox: Face bounding box in [x1, y1, x2, y2] format.
Returns:
Tuple[int, float]: A tuple of (label_idx, score) where:
- label_idx: 0 = Fake (spoof), 1 = Real (live)
- score: Confidence score for the predicted label (0.0 to 1.0)
"""
# Preprocess
input_tensor = self.preprocess(image, bbox)
# Run inference
outputs = self.session.run([self.output_name], {self.input_name: input_tensor})[0]
# Postprocess and return
return self.postprocess(outputs)

View File

@@ -2,49 +2,329 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import List, Tuple, Union
import cv2
import numpy as np
from typing import List, Union
# Face parsing component names (19 classes)
FACE_PARSING_LABELS = [
'background',
'skin',
'l_brow',
'r_brow',
'l_eye',
'r_eye',
'eye_g',
'l_ear',
'r_ear',
'ear_r',
'nose',
'mouth',
'u_lip',
'l_lip',
'neck',
'neck_l',
'cloth',
'hair',
'hat',
]
# Color palette for face parsing visualization
FACE_PARSING_COLORS = [
[0, 0, 0],
[255, 85, 0],
[255, 170, 0],
[255, 0, 85],
[255, 0, 170],
[0, 255, 0],
[85, 255, 0],
[170, 255, 0],
[0, 255, 85],
[0, 255, 170],
[0, 0, 255],
[85, 0, 255],
[170, 0, 255],
[0, 85, 255],
[0, 170, 255],
[255, 255, 0],
[255, 255, 85],
[255, 255, 170],
[255, 0, 255],
]
def draw_detections(
*,
image: np.ndarray,
bboxes: Union[np.ndarray, List[List[float]]],
bboxes: Union[List[np.ndarray], List[List[float]]],
scores: Union[np.ndarray, List[float]],
landmarks: Union[np.ndarray, List[List[List[float]]]],
vis_threshold: float = 0.6
landmarks: Union[List[np.ndarray], List[List[List[float]]]],
vis_threshold: float = 0.6,
draw_score: bool = False,
fancy_bbox: bool = True,
):
"""
Draws bounding boxes, scores, and landmarks from separate lists onto an image.
Draws bounding boxes, landmarks, and optional scores on an image.
Args:
image (np.ndarray): The image to draw on.
bboxes (list or np.ndarray): A list of bounding boxes, e.g., [[x1,y1,x2,y2], ...].
scores (list or np.ndarray): A list of confidence scores.
landmarks (list or np.ndarray): A list of landmark sets, e.g., [[[x,y],...],...].
vis_threshold (float): Confidence threshold for filtering which detections to draw.
image: Input image to draw on.
bboxes: List of bounding boxes [x1, y1, x2, y2].
scores: List of confidence scores.
landmarks: List of landmark sets with shape (5, 2).
vis_threshold: Confidence threshold for filtering. Defaults to 0.6.
draw_score: Whether to draw confidence scores. Defaults to False.
"""
_colors = [(0, 0, 255), (0, 255, 255), (255, 0, 255), (0, 255, 0), (255, 0, 0)]
colors = [(0, 0, 255), (0, 255, 255), (255, 0, 255), (0, 255, 0), (255, 0, 0)]
# Filter detections by score
# Calculate line thickness based on image size
line_thickness = max(round(sum(image.shape[:2]) / 2 * 0.003), 2)
# Filter detections by confidence threshold
keep_indices = [i for i, score in enumerate(scores) if score >= vis_threshold]
# Draw the filtered detections
for i in keep_indices:
bbox = np.array(bboxes[i], dtype=np.int32)
score = scores[i]
landmark_set = np.array(landmarks[i], dtype=np.int32)
# Calculate adaptive thickness
thickness = max(1, int(min(bbox[2] - bbox[0], bbox[3] - bbox[1]) / 100))
# Calculate dynamic font scale based on bbox height
bbox_h = bbox[3] - bbox[1]
font_scale = max(0.4, min(0.7, bbox_h / 200))
font_thickness = 2
# Draw bounding box
cv2.rectangle(image, tuple(bbox[:2]), tuple(bbox[2:]), (0, 0, 255), thickness)
if fancy_bbox:
draw_fancy_bbox(image, bbox, color=(0, 255, 0), thickness=line_thickness, proportion=0.2)
else:
cv2.rectangle(image, tuple(bbox[:2]), tuple(bbox[2:]), (0, 255, 0), line_thickness)
# Draw score
cv2.putText(image, f"{score:.2f}", (bbox[0], bbox[1] - 10),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), thickness)
# Draw confidence score with background
if draw_score:
text = f'{score:.2f}'
(text_width, text_height), baseline = cv2.getTextSize(
text, cv2.FONT_HERSHEY_SIMPLEX, font_scale, font_thickness
)
# Draw background rectangle
cv2.rectangle(
image,
(bbox[0], bbox[1] - text_height - baseline - 10),
(bbox[0] + text_width + 10, bbox[1]),
(0, 255, 0),
-1,
)
# Draw text
cv2.putText(
image,
text,
(bbox[0] + 5, bbox[1] - 5),
cv2.FONT_HERSHEY_SIMPLEX,
font_scale,
(0, 0, 0),
font_thickness,
)
# Draw landmarks
for j, point in enumerate(landmark_set):
cv2.circle(image, tuple(point), thickness + 1, _colors[j], -1)
cv2.circle(image, tuple(point), line_thickness + 1, colors[j], -1)
def draw_fancy_bbox(
image: np.ndarray,
bbox: np.ndarray,
color: Tuple[int, int, int] = (0, 255, 0),
thickness: int = 3,
proportion: float = 0.2,
):
"""
Draws a bounding box with fancy corners on an image.
Args:
image: Input image to draw on.
bbox: Bounding box coordinates [x1, y1, x2, y2].
color: Color of the bounding box. Defaults to green.
thickness: Thickness of the bounding box lines. Defaults to 3.
proportion: Proportion of the corner length to the width/height of the bounding box. Defaults to 0.2.
"""
x1, y1, x2, y2 = map(int, bbox)
width = x2 - x1
height = y2 - y1
corner_length = int(proportion * min(width, height))
# Draw the rectangle
cv2.rectangle(image, (x1, y1), (x2, y2), color, 1)
# Top-left corner
cv2.line(image, (x1, y1), (x1 + corner_length, y1), color, thickness)
cv2.line(image, (x1, y1), (x1, y1 + corner_length), color, thickness)
# Top-right corner
cv2.line(image, (x2, y1), (x2 - corner_length, y1), color, thickness)
cv2.line(image, (x2, y1), (x2, y1 + corner_length), color, thickness)
# Bottom-left corner
cv2.line(image, (x1, y2), (x1, y2 - corner_length), color, thickness)
cv2.line(image, (x1, y2), (x1 + corner_length, y2), color, thickness)
# Bottom-right corner
cv2.line(image, (x2, y2), (x2, y2 - corner_length), color, thickness)
cv2.line(image, (x2, y2), (x2 - corner_length, y2), color, thickness)
def draw_gaze(
image: np.ndarray,
bbox: np.ndarray,
pitch: np.ndarray,
yaw: np.ndarray,
*,
draw_bbox: bool = True,
fancy_bbox: bool = True,
draw_angles: bool = True,
):
"""
Draws gaze direction with optional bounding box on an image.
Args:
image: Input image to draw on (modified in-place).
bbox: Face bounding box [x1, y1, x2, y2].
pitch: Vertical gaze angle in radians.
yaw: Horizontal gaze angle in radians.
draw_bbox: Whether to draw the bounding box. Defaults to True.
fancy_bbox: Use fancy corner-style bbox. Defaults to True.
draw_angles: Whether to display pitch/yaw values as text. Defaults to False.
"""
x_min, y_min, x_max, y_max = map(int, bbox[:4])
# Calculate dynamic line thickness based on image size (same as draw_detections)
line_thickness = max(round(sum(image.shape[:2]) / 2 * 0.003), 2)
# Calculate dynamic font scale based on bbox height (same as draw_detections)
bbox_h = y_max - y_min
font_scale = max(0.4, min(0.7, bbox_h / 200))
font_thickness = 2
# Draw bounding box if requested
if draw_bbox:
if fancy_bbox:
draw_fancy_bbox(image, bbox, color=(0, 255, 0), thickness=line_thickness)
else:
cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), line_thickness)
# Calculate center of the bounding box
x_center = (x_min + x_max) // 2
y_center = (y_min + y_max) // 2
# Calculate the direction of the gaze
length = x_max - x_min
dx = int(-length * np.sin(pitch) * np.cos(yaw))
dy = int(-length * np.sin(yaw))
point1 = (x_center, y_center)
point2 = (x_center + dx, y_center + dy)
# Calculate dynamic center point radius based on line thickness
center_radius = max(line_thickness + 1, 4)
# Draw gaze direction
cv2.circle(image, (x_center, y_center), radius=center_radius, color=(0, 0, 255), thickness=-1)
cv2.arrowedLine(
image,
point1,
point2,
color=(0, 0, 255),
thickness=line_thickness,
line_type=cv2.LINE_AA,
tipLength=0.25,
)
# Draw angle values
if draw_angles:
text = f'P:{np.degrees(pitch):.0f}deg Y:{np.degrees(yaw):.0f}deg'
(text_width, text_height), baseline = cv2.getTextSize(
text, cv2.FONT_HERSHEY_SIMPLEX, font_scale, font_thickness
)
# Draw background rectangle for text
cv2.rectangle(
image,
(x_min, y_min - text_height - baseline - 10),
(x_min + text_width + 10, y_min),
(0, 0, 255),
-1,
)
# Draw text
cv2.putText(
image,
text,
(x_min + 5, y_min - 5),
cv2.FONT_HERSHEY_SIMPLEX,
font_scale,
(255, 255, 255),
font_thickness,
)
def vis_parsing_maps(
image: np.ndarray,
segmentation_mask: np.ndarray,
*,
save_image: bool = False,
save_path: str = 'result.png',
) -> np.ndarray:
"""
Visualizes face parsing segmentation mask by overlaying colored regions on the image.
Args:
image: Input face image in RGB format with shape (H, W, 3).
segmentation_mask: Segmentation mask with shape (H, W) where each pixel
value represents a facial component class (0-18).
save_image: Whether to save the visualization to disk. Defaults to False.
save_path: Path to save the visualization if save_image is True.
Returns:
np.ndarray: Blended image with segmentation overlay in BGR format.
Example:
>>> import cv2
>>> from uniface.parsing import BiSeNet
>>> from uniface.visualization import vis_parsing_maps
>>>
>>> parser = BiSeNet()
>>> face_image = cv2.imread('face.jpg')
>>> mask = parser.parse(face_image)
>>>
>>> # Visualize
>>> face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
>>> result = vis_parsing_maps(face_rgb, mask)
>>> cv2.imwrite('parsed_face.jpg', result)
"""
# Create numpy arrays for image and segmentation mask
image = np.array(image).copy().astype(np.uint8)
segmentation_mask = segmentation_mask.copy().astype(np.uint8)
# Create a color mask
segmentation_mask_color = np.zeros((segmentation_mask.shape[0], segmentation_mask.shape[1], 3))
num_classes = np.max(segmentation_mask)
for class_index in range(1, num_classes + 1):
class_pixels = np.where(segmentation_mask == class_index)
segmentation_mask_color[class_pixels[0], class_pixels[1], :] = FACE_PARSING_COLORS[class_index]
segmentation_mask_color = segmentation_mask_color.astype(np.uint8)
# Convert image to BGR format for blending
bgr_image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
# Blend the image with the segmentation mask
blended_image = cv2.addWeighted(bgr_image, 0.6, segmentation_mask_color, 0.4, 0)
# Save the result if required
if save_image:
cv2.imwrite(save_path, blended_image, [int(cv2.IMWRITE_JPEG_QUALITY), 100])
return blended_image