# UniFace Model Zoo

Complete guide to all available models, their performance characteristics, and selection criteria.

---

## Face Detection Models

### RetinaFace Family

RetinaFace models are trained on the WIDER FACE dataset and provide excellent accuracy-speed tradeoffs.

| Model Name  | Params | Size  | Easy   | Medium | Hard   | Use Case                |
| ----------- | ------ | ----- | ------ | ------ | ------ | ----------------------- |
| `MNET_025`  | 0.4M   | 1.7MB | 88.48% | 87.02% | 80.61% | Mobile/Edge devices     |
| `MNET_050`  | 1.0M   | 2.6MB | 89.42% | 87.97% | 82.40% | Mobile/Edge devices     |
| `MNET_V1`   | 3.5M   | 3.8MB | 90.59% | 89.14% | 84.13% | Balanced mobile         |
| `MNET_V2` ⭐ | 3.2M   | 3.5MB | 91.70% | 91.03% | 86.60% | **Recommended default** |
| `RESNET18`  | 11.7M  | 27MB  | 92.50% | 91.02% | 86.63% | Server/High accuracy    |
| `RESNET34`  | 24.8M  | 56MB  | 94.16% | 93.12% | 88.90% | Maximum accuracy        |

**Accuracy**: WIDER FACE validation set (Easy/Medium/Hard subsets) - from [RetinaFace paper](https://arxiv.org/abs/1905.00641)
**Speed**: Benchmark on your own hardware using `scripts/run_detection.py --iterations 100`

#### Usage

```python
from uniface import RetinaFace
from uniface.constants import RetinaFaceWeights

# Default (recommended)
detector = RetinaFace()  # Uses MNET_V2

# Specific model
detector = RetinaFace(
    model_name=RetinaFaceWeights.MNET_025,  # Fastest
    conf_thresh=0.5,
    nms_thresh=0.4,
    input_size=(640, 640)
)
```
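
For a quick sanity check without the script, average latency can also be measured in-process. This is a minimal sketch; the all-black synthetic frame is a stand-in for real input, so treat the numbers as relative only:

```python
import time

import numpy as np
from uniface import RetinaFace

detector = RetinaFace()
image = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a real BGR frame

detector.detect(image)  # warm-up call (first run may include model loading)

iterations = 100
start = time.perf_counter()
for _ in range(iterations):
    detector.detect(image)
elapsed = time.perf_counter() - start
print(f"Average latency: {elapsed / iterations * 1000:.1f} ms")
```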
---

### SCRFD Family

SCRFD (Sample and Computation Redistribution for Efficient Face Detection) models offer state-of-the-art speed-accuracy tradeoffs.

| Model Name    | Params | Size  | Easy   | Medium | Hard   | Use Case                  |
| ------------- | ------ | ----- | ------ | ------ | ------ | ------------------------- |
| `SCRFD_500M`  | 0.6M   | 2.5MB | 90.57% | 88.12% | 68.51% | Real-time applications    |
| `SCRFD_10G` ⭐ | 4.2M   | 17MB  | 95.16% | 93.87% | 83.05% | **High accuracy + speed** |

**Accuracy**: WIDER FACE validation set - from [SCRFD paper](https://arxiv.org/abs/2105.04714)
**Speed**: Benchmark on your own hardware using `scripts/run_detection.py --iterations 100`

#### Usage

```python
from uniface import SCRFD
from uniface.constants import SCRFDWeights

# Fast real-time detection
detector = SCRFD(
    model_name=SCRFDWeights.SCRFD_500M_KPS,
    conf_thresh=0.5,
    input_size=(640, 640)
)

# High accuracy
detector = SCRFD(
    model_name=SCRFDWeights.SCRFD_10G_KPS,
    conf_thresh=0.5
)
```
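
Continuing from either constructor above, detections are consumed the same way as with the other detectors. This sketch assumes `detect()` returns the per-face dictionaries (`bbox`, `confidence`, `landmarks`) shown in the YOLOv5-Face section below:

```python
import cv2

image = cv2.imread("group_photo.jpg")  # any BGR image

faces = detector.detect(image)
for face in faces:
    x1, y1, x2, y2 = map(int, face['bbox'])
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
print(f"Detected {len(faces)} face(s)")
```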
---

### YOLOv5-Face Family

YOLOv5-Face models provide excellent detection accuracy with 5-point facial landmarks, optimized for real-time applications.

| Model Name  | Size | Easy   | Medium | Hard   | Use Case                 |
| ----------- | ---- | ------ | ------ | ------ | ------------------------ |
| `YOLOV5N`   | 11MB | 93.61% | 91.52% | 80.53% | Lightweight/Mobile       |
| `YOLOV5S` ⭐ | 28MB | 94.33% | 92.61% | 83.15% | **Real-time + accuracy** |
| `YOLOV5M`   | 82MB | 95.30% | 93.76% | 85.28% | High accuracy            |

**Accuracy**: WIDER FACE validation set - from [YOLOv5-Face paper](https://arxiv.org/abs/2105.12931)
**Speed**: Benchmark on your own hardware using `scripts/run_detection.py --iterations 100`
**Note**: Fixed input size of 640×640. Models exported to ONNX from [deepcam-cn/yolov5-face](https://github.com/deepcam-cn/yolov5-face)

#### Usage

```python
from uniface import YOLOv5Face
from uniface.constants import YOLOv5FaceWeights

# Lightweight/Mobile
detector = YOLOv5Face(
    model_name=YOLOv5FaceWeights.YOLOV5N,
    conf_thresh=0.6,
    nms_thresh=0.5
)

# Real-time detection (recommended)
detector = YOLOv5Face(
    model_name=YOLOv5FaceWeights.YOLOV5S,
    conf_thresh=0.6,
    nms_thresh=0.5
)

# High accuracy
detector = YOLOv5Face(
    model_name=YOLOv5FaceWeights.YOLOV5M,
    conf_thresh=0.6
)

# Detect faces with landmarks
faces = detector.detect(image)
for face in faces:
    bbox = face['bbox']            # [x1, y1, x2, y2]
    confidence = face['confidence']
    landmarks = face['landmarks']  # 5-point landmarks, shape (5, 2)
```
---

## Face Recognition Models

### ArcFace

State-of-the-art face recognition using additive angular margin loss.

| Model Name | Backbone  | Params | Size  | Use Case                   |
| ---------- | --------- | ------ | ----- | -------------------------- |
| `MNET` ⭐   | MobileNet | 2.0M   | 8MB   | **Balanced (recommended)** |
| `RESNET`   | ResNet50  | 43.6M  | 166MB | Maximum accuracy           |

**Dataset**: Trained on MS1M-V2 (5.8M images, 85K identities)
**Accuracy**: Benchmark on your own dataset or use standard face verification benchmarks

#### Usage

```python
from uniface import ArcFace
from uniface.constants import ArcFaceWeights

# Default (MobileNet backbone)
recognizer = ArcFace()

# High accuracy (ResNet50 backbone)
recognizer = ArcFace(model_name=ArcFaceWeights.RESNET)

# Extract embedding
embedding = recognizer.get_normalized_embedding(image, landmarks)
# Returns: (1, 512) normalized embedding vector
```
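
Since `get_normalized_embedding` returns L2-normalized vectors, the cosine similarity of two faces reduces to a dot product. A minimal verification sketch; `image_a`/`landmarks_a` and `image_b`/`landmarks_b` come from a detector, and the 0.4 threshold is only an illustrative starting point, not a library constant:

```python
import numpy as np

emb_a = recognizer.get_normalized_embedding(image_a, landmarks_a)
emb_b = recognizer.get_normalized_embedding(image_b, landmarks_b)

# Dot product of L2-normalized vectors equals their cosine similarity
similarity = float(np.dot(emb_a, emb_b.T).squeeze())
is_same_person = similarity > 0.4  # tune this threshold on your own data
print(f"Similarity: {similarity:.3f}")
```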
---

### MobileFace

Lightweight face recognition optimized for mobile devices.

| Model Name      | Backbone         | Params | Size | LFW    | CALFW  | CPLFW  | AgeDB-30 | Use Case          |
| --------------- | ---------------- | ------ | ---- | ------ | ------ | ------ | -------- | ----------------- |
| `MNET_025`      | MobileNetV1 0.25 | 0.36M  | 1MB  | 98.76% | 92.02% | 82.37% | 90.02%   | Ultra-lightweight |
| `MNET_V2` ⭐     | MobileNetV2      | 2.29M  | 4MB  | 99.55% | 94.87% | 86.89% | 95.16%   | **Mobile/Edge**   |
| `MNET_V3_SMALL` | MobileNetV3-S    | 1.25M  | 3MB  | 99.30% | 93.77% | 85.29% | 92.79%   | Mobile optimized  |
| `MNET_V3_LARGE` | MobileNetV3-L    | 3.52M  | 10MB | 99.53% | 94.56% | 86.79% | 95.13%   | Balanced mobile   |

**Dataset**: Trained on MS1M-V2 (5.8M images, 85K identities)
**Accuracy**: Evaluated on LFW, CALFW, CPLFW, and AgeDB-30 benchmarks
**Note**: These models are lightweight alternatives to ArcFace for resource-constrained environments

#### Usage

```python
from uniface import MobileFace
from uniface.constants import MobileFaceWeights

# Lightweight
recognizer = MobileFace(model_name=MobileFaceWeights.MNET_V2)
```
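
MobileFace is used like ArcFace. The sketch below assumes it exposes the same `get_normalized_embedding` interface shown above; check the API if your version differs:

```python
# Assumption: MobileFace shares ArcFace's embedding interface shown earlier.
# `image` and `landmarks` come from a detector such as RetinaFace.
embedding = recognizer.get_normalized_embedding(image, landmarks)
print(embedding.shape)  # expected: (1, 512), as with ArcFace
```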
---

### SphereFace

Face recognition using an angular softmax loss.

| Model Name | Backbone | Params | Size | LFW    | CALFW  | CPLFW  | AgeDB-30 | Use Case            |
| ---------- | -------- | ------ | ---- | ------ | ------ | ------ | -------- | ------------------- |
| `SPHERE20` | Sphere20 | 24.5M  | 50MB | 99.67% | 95.61% | 88.75% | 96.58%   | Research/Comparison |
| `SPHERE36` | Sphere36 | 34.6M  | 92MB | 99.72% | 95.64% | 89.92% | 96.83%   | Research/Comparison |

**Dataset**: Trained on MS1M-V2 (5.8M images, 85K identities)
**Accuracy**: Evaluated on LFW, CALFW, CPLFW, and AgeDB-30 benchmarks
**Note**: SphereFace uses an angular softmax loss, an earlier approach that predates ArcFace. These models provide good accuracy with moderate resource requirements.

#### Usage

```python
from uniface import SphereFace
from uniface.constants import SphereFaceWeights

recognizer = SphereFace(model_name=SphereFaceWeights.SPHERE20)
```
---

## Facial Landmark Models

### 106-Point Landmark Detection

High-precision facial landmark localization.

| Model Name | Points | Params | Size | Use Case                 |
| ---------- | ------ | ------ | ---- | ------------------------ |
| `2D106`    | 106    | 3.7M   | 14MB | Face alignment, analysis |

**Note**: Provides 106 facial keypoints for detailed face analysis and alignment

#### Usage

```python
from uniface import Landmark106

landmarker = Landmark106()
landmarks = landmarker.get_landmarks(image, bbox)
# Returns: (106, 2) array of (x, y) coordinates
```

**Landmark Groups** (see the slicing sketch after this list):

- Face contour: 0-32 (33 points)
- Eyebrows: 33-50 (18 points)
- Nose: 51-62 (12 points)
- Eyes: 63-86 (24 points)
- Mouth: 87-105 (19 points)

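Because the groups are contiguous index ranges, extracting a region is a plain array slice over the `(106, 2)` result. A minimal sketch using the ranges above:

```python
# Slice the (106, 2) landmark array into its semantic groups.
contour  = landmarks[0:33]    # face contour
eyebrows = landmarks[33:51]   # left + right eyebrows
nose     = landmarks[51:63]
eyes     = landmarks[63:87]   # left + right eyes
mouth    = landmarks[87:106]

print(mouth.shape)  # (19, 2)
```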
---

## Attribute Analysis Models

### Age & Gender Detection

| Model Name | Attributes  | Params | Size | Use Case        |
| ---------- | ----------- | ------ | ---- | --------------- |
| `DEFAULT`  | Age, Gender | 2.1M   | 8MB  | General purpose |

**Dataset**: Trained on CelebA
**Note**: Accuracy varies by demographic and image quality. Test on your specific use case.

#### Usage

```python
from uniface import AgeGender

predictor = AgeGender()
gender, age = predictor.predict(image, bbox)
# Returns: (gender, age_in_years)
# gender: 0 for Female, 1 for Male
```
---

### Emotion Detection

| Model Name  | Classes | Params | Size | Use Case        |
| ----------- | ------- | ------ | ---- | --------------- |
| `AFFECNET7` | 7       | 0.5M   | 2MB  | 7-class emotion |
| `AFFECNET8` | 8       | 0.5M   | 2MB  | 8-class emotion |

**Classes (7)**: Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger
**Classes (8)**: Above + Contempt

**Dataset**: Trained on AffectNet
**Note**: Emotion detection accuracy depends heavily on facial expression clarity and cultural context

#### Usage

```python
from uniface import Emotion
from uniface.constants import DDAMFNWeights

predictor = Emotion(model_name=DDAMFNWeights.AFFECNET7)
emotion, confidence = predictor.predict(image, landmarks)
```
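
Since `predict` takes landmarks, emotion estimation is usually chained after a landmark-aware detector. A minimal sketch, assuming the detector returns the per-face dictionaries (`landmarks` key) shown in the YOLOv5-Face section; `face.jpg` is a placeholder path:

```python
import cv2
from uniface import RetinaFace

detector = RetinaFace()
image = cv2.imread("face.jpg")

for face in detector.detect(image):
    emotion, confidence = predictor.predict(image, face['landmarks'])
    print(f"{emotion} ({confidence:.1%})")
```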
---

## Gaze Estimation Models

### MobileGaze Family

Real-time gaze direction prediction models trained on the Gaze360 dataset. Each model returns pitch (vertical) and yaw (horizontal) angles in radians.

| Model Name     | Params | Size    | MAE*  | Use Case                |
| -------------- | ------ | ------- | ----- | ----------------------- |
| `RESNET18`     | 11.7M  | 43 MB   | 12.84 | Balanced accuracy/speed |
| `RESNET34` ⭐   | 24.8M  | 81.6 MB | 11.33 | **Recommended default** |
| `RESNET50`     | 25.6M  | 91.3 MB | 11.34 | High accuracy           |
| `MOBILENET_V2` | 3.5M   | 9.59 MB | 13.07 | Mobile/Edge devices     |
| `MOBILEONE_S0` | 2.1M   | 4.8 MB  | 12.58 | Lightweight/Real-time   |

*MAE (Mean Absolute Error) in degrees on the Gaze360 test set - lower is better

**Dataset**: Trained on Gaze360 (indoor/outdoor scenes with diverse head poses)
**Training**: 200 epochs with a classification-based approach (binned angles)

#### Usage

```python
from uniface import MobileGaze
from uniface.constants import GazeWeights
import numpy as np

# Default (recommended)
gaze_estimator = MobileGaze()  # Uses RESNET34

# Lightweight model
gaze_estimator = MobileGaze(model_name=GazeWeights.MOBILEONE_S0)

# Estimate gaze from face crop
pitch, yaw = gaze_estimator.estimate(face_crop)
print(f"Pitch: {np.degrees(pitch):.1f}°, Yaw: {np.degrees(yaw):.1f}°")
```

**Note**: Requires a face crop as input. Run face detection first to obtain bounding boxes.

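To use the angles geometrically, for example to draw a gaze arrow, pitch and yaw can be converted to a 3D direction vector. A minimal sketch using one common spherical convention; verify the axis signs against your coordinate system:

```python
import numpy as np

def gaze_to_vector(pitch: float, yaw: float) -> np.ndarray:
    """Convert pitch/yaw in radians to a unit 3D gaze direction (one common convention)."""
    x = -np.cos(pitch) * np.sin(yaw)
    y = -np.sin(pitch)
    z = -np.cos(pitch) * np.cos(yaw)
    return np.array([x, y, z])

direction = gaze_to_vector(pitch, yaw)  # pitch, yaw from estimate() above
dx, dy = direction[0], direction[1]     # 2D projection for drawing an arrow
```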
---

## Face Parsing Models

### BiSeNet Family

BiSeNet (Bilateral Segmentation Network) models for semantic face parsing. They segment face images into 19 facial component classes.

| Model Name   | Params | Size    | Classes | Use Case                |
| ------------ | ------ | ------- | ------- | ----------------------- |
| `RESNET18` ⭐ | 13.3M  | 50.7 MB | 19      | **Recommended default** |
| `RESNET34`   | 24.1M  | 89.2 MB | 19      | Higher accuracy         |

**19 Facial Component Classes:**

1. Background
2. Skin
3. Left Eyebrow
4. Right Eyebrow
5. Left Eye
6. Right Eye
7. Eye Glasses
8. Left Ear
9. Right Ear
10. Ear Ring
11. Nose
12. Mouth
13. Upper Lip
14. Lower Lip
15. Neck
16. Neck Lace
17. Cloth
18. Hair
19. Hat

**Dataset**: Trained on CelebAMask-HQ
**Architecture**: BiSeNet with a ResNet backbone
**Input Size**: 512×512 (automatically resized)

#### Usage

```python
from uniface.parsing import BiSeNet
from uniface.constants import ParsingWeights
from uniface.visualization import vis_parsing_maps
import cv2
import numpy as np

# Default (recommended)
parser = BiSeNet()  # Uses RESNET18

# Higher accuracy model
parser = BiSeNet(model_name=ParsingWeights.RESNET34)

# Parse face image (already cropped)
mask = parser.parse(face_image)

# Visualize with overlay
face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)

# mask shape: (H, W) with values 0-18 representing classes
print(f"Detected {len(np.unique(mask))} facial components")
```
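
Because the mask stores one class index per pixel, isolating a single component is an equality test. A minimal sketch extracting the hair region (class value 17, i.e., item 18 in the list above, since values start at 0 for Background):

```python
import numpy as np

HAIR = 17  # class value = list position - 1 (Background = 0)

hair_mask = (mask == HAIR).astype(np.uint8) * 255  # binary mask, 0 or 255
coverage = (mask == HAIR).mean()
print(f"Hair covers {coverage:.1%} of the crop")
```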

**Applications:**

- Face makeup and beauty applications
- Virtual try-on systems
- Face editing and manipulation
- Facial feature extraction
- Portrait segmentation

**Note**: Input should be a cropped face image. For a full pipeline, run face detection first to obtain face crops.

---

## Anti-Spoofing Models

### MiniFASNet Family

Lightweight face anti-spoofing models for liveness detection. They detect whether a face is real (live) or fake (photo, video replay, mask).

| Model Name | Size   | Scale | Use Case                       |
| ---------- | ------ | ----- | ------------------------------ |
| `V1SE`     | 1.2 MB | 4.0   | Squeeze-and-excitation variant |
| `V2` ⭐     | 1.2 MB | 2.7   | **Recommended default**        |

**Dataset**: Trained on face anti-spoofing datasets
**Output**: Returns `(label_idx, score)`, where `label_idx` is 0 (Fake) or 1 (Real)

#### Usage

```python
from uniface import RetinaFace
from uniface.spoofing import MiniFASNet
from uniface.constants import MiniFASNetWeights

# Default (V2, recommended)
detector = RetinaFace()
spoofer = MiniFASNet()

# V1SE variant
spoofer = MiniFASNet(model_name=MiniFASNetWeights.V1SE)

# Detect and check liveness
faces = detector.detect(image)
for face in faces:
    label_idx, score = spoofer.predict(image, face['bbox'])
    # label_idx: 0 = Fake, 1 = Real
    label = 'Real' if label_idx == 1 else 'Fake'
    print(f"{label}: {score:.1%}")
```

**Note**: Requires a face bounding box from a detector. Use with RetinaFace, SCRFD, or YOLOv5Face.

---

## Model Updates

Models are automatically downloaded and cached on first use. Cache location: `~/.uniface/models/`

### Manual Model Management

```python
from uniface.model_store import verify_model_weights
from uniface.constants import RetinaFaceWeights

# Download specific model
model_path = verify_model_weights(
    RetinaFaceWeights.MNET_V2,
    root='./custom_cache'
)

# Models are verified with SHA-256 checksums
```
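
To inspect what is already downloaded, listing the default cache directory is enough. A minimal sketch based on the location stated above:

```python
from pathlib import Path

cache_dir = Path.home() / ".uniface" / "models"
for model_file in sorted(cache_dir.glob("*")):
    if model_file.is_file():
        size_mb = model_file.stat().st_size / 1e6
        print(f"{model_file.name}: {size_mb:.1f} MB")
```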

### Download All Models

```bash
# Using the provided script
python scripts/download_model.py

# Download specific model
python scripts/download_model.py --model MNET_V2
```
---

## References

### Model Training & Architectures

- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) - PyTorch implementation and training code
- **YOLOv5-Face Original**: [deepcam-cn/yolov5-face](https://github.com/deepcam-cn/yolov5-face) - Original PyTorch implementation
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) - ONNX inference implementation
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, and SphereFace training code
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet training code and pretrained weights
- **Face Anti-Spoofing**: [yakhyo/face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) - MiniFASNet ONNX inference (weights from [minivision-ai/Silent-Face-Anti-Spoofing](https://github.com/minivision-ai/Silent-Face-Anti-Spoofing))
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights

### Papers

- **RetinaFace**: [Single-Shot Multi-Level Face Localisation in the Wild](https://arxiv.org/abs/1905.00641)
- **SCRFD**: [Sample and Computation Redistribution for Efficient Face Detection](https://arxiv.org/abs/2105.04714)
- **YOLOv5-Face**: [YOLO5Face: Why Reinventing a Face Detector](https://arxiv.org/abs/2105.12931)
- **ArcFace**: [Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)
- **SphereFace**: [Deep Hypersphere Embedding for Face Recognition](https://arxiv.org/abs/1704.08063)
- **BiSeNet**: [Bilateral Segmentation Network for Real-time Semantic Segmentation](https://arxiv.org/abs/1808.00897)