# Model Zoo

Complete guide to all available models and their performance characteristics.

---

## Face Detection Models

### RetinaFace Family

RetinaFace models are trained on the [WIDER FACE](datasets.md#wider-face) dataset.

| Model Name | Params | Size | Easy | Medium | Hard |
| -------------- | ------ | ----- | ------ | ------ | ------ |
| `MNET_025` | 0.4M | 1.7MB | 88.48% | 87.02% | 80.61% |
| `MNET_050` | 1.0M | 2.6MB | 89.42% | 87.97% | 82.40% |
| `MNET_V1` | 3.5M | 3.8MB | 90.59% | 89.14% | 84.13% |
| `MNET_V2` :material-check-circle: | 3.2M | 3.5MB | 91.70% | 91.03% | 86.60% |
| `RESNET18` | 11.7M | 27MB | 92.50% | 91.02% | 86.63% |
| `RESNET34` | 24.8M | 56MB | 94.16% | 93.12% | 88.90% |

!!! info "Accuracy & Benchmarks"
    **Accuracy**: WIDER FACE validation set (Easy/Medium/Hard subsets) - from [RetinaFace paper](https://arxiv.org/abs/1905.00641)

    **Speed**: Benchmark on your own hardware using `python tools/detect.py --source `

---

### SCRFD Family

SCRFD (Sample and Computation Redistribution for Efficient Face Detection) models are trained on the [WIDER FACE](datasets.md#wider-face) dataset.

| Model Name | Params | Size | Easy | Medium | Hard |
| ---------------- | ------ | ----- | ------ | ------ | ------ |
| `SCRFD_500M_KPS` | 0.6M | 2.5MB | 90.57% | 88.12% | 68.51% |
| `SCRFD_10G_KPS` :material-check-circle: | 4.2M | 17MB | 95.16% | 93.87% | 83.05% |

!!! info "Accuracy & Benchmarks"
    **Accuracy**: WIDER FACE validation set - from [SCRFD paper](https://arxiv.org/abs/2105.04714)

    **Speed**: Benchmark on your own hardware using `python tools/detect.py --source `

---

### YOLOv5-Face Family

YOLOv5-Face models provide detection with 5-point facial landmarks and are trained on the [WIDER FACE](datasets.md#wider-face) dataset.

| Model Name | Size | Easy | Medium | Hard |
| -------------- | ---- | ------ | ------ | ------ |
| `YOLOV5N` | 11MB | 93.61% | 91.52% | 80.53% |
| `YOLOV5S` :material-check-circle: | 28MB | 94.33% | 92.61% | 83.15% |
| `YOLOV5M` | 82MB | 95.30% | 93.76% | 85.28% |

!!! info "Accuracy & Benchmarks"
    **Accuracy**: WIDER FACE validation set - from [YOLOv5-Face paper](https://arxiv.org/abs/2105.12931)

    **Speed**: Benchmark on your own hardware using `python tools/detect.py --source `

!!! note "Fixed Input Size"
    All YOLOv5-Face models use a fixed input size of 640×640.

---

### YOLOv8-Face Family

YOLOv8-Face models use an anchor-free design with DFL (Distribution Focal Loss) for bounding-box regression. They provide detection with 5-point facial landmarks.

| Model Name | Size | Easy | Medium | Hard |
| ---------------- | ------ | ------ | ------ | ------ |
| `YOLOV8_LITE_S` | 7.4MB | 93.4% | 91.2% | 78.6% |
| `YOLOV8N` :material-check-circle: | 12MB | 94.6% | 92.3% | 79.6% |

!!! info "Accuracy & Benchmarks"
    **Accuracy**: WIDER FACE validation set (Easy/Medium/Hard subsets)

    **Speed**: Benchmark on your own hardware using `python tools/detect.py --source --method yolov8face`

!!! note "Fixed Input Size"
    All YOLOv8-Face models use a fixed input size of 640×640.

---

## Face Recognition Models

### AdaFace

Face recognition using adaptive margin based on image quality.

| Model Name | Backbone | Dataset | Size | IJB-B TAR | IJB-C TAR |
| ----------- | -------- | ----------- | ------ | --------- | --------- |
| `IR_18` :material-check-circle: | IR-18 | WebFace4M | 92 MB | 93.03% | 94.99% |
| `IR_101` | IR-101 | WebFace12M | 249 MB | - | 97.66% |

!!! info "Training Data & Accuracy"
    **Dataset**: [WebFace4M / WebFace12M](datasets.md#webface4m--webface12m) (4M / 12M images)

    **Accuracy**: IJB-B and IJB-C benchmarks, TAR@FAR=0.01%

!!! tip "Key Innovation"
    AdaFace introduces an adaptive margin that adjusts based on image quality, providing better performance on low-quality images compared to fixed-margin approaches.

---

### ArcFace

Face recognition using additive angular margin loss.

| Model Name | Backbone | Params | Size | LFW | CFP-FP | AgeDB-30 | IJB-C |
| ----------- | --------- | ------ | ----- | ------ | ------ | -------- | ----- |
| `MNET` :material-check-circle: | MobileNet | 2.0M | 8MB | 99.70% | 98.00% | 96.58% | 95.02% |
| `RESNET` | ResNet50 | 43.6M | 166MB | 99.83% | 99.33% | 98.23% | 97.25% |

!!! info "Training Data"
    **Dataset**: Trained on [WebFace600K](datasets.md#webface600k) (600K images)

    **Accuracy**: IJB-C accuracy reported as TAR@FAR=1e-4

---

### MobileFace

Lightweight face recognition models with MobileNet backbones.

| Model Name | Backbone | Params | Size | LFW | CALFW | CPLFW | AgeDB-30 |
| ----------------- | ---------------- | ------ | ---- | ------ | ------ | ------ | -------- |
| `MNET_025` | MobileNetV1 0.25 | 0.36M | 1MB | 98.76% | 92.02% | 82.37% | 90.02% |
| `MNET_V2` :material-check-circle: | MobileNetV2 | 2.29M | 4MB | 99.55% | 94.87% | 86.89% | 95.16% |
| `MNET_V3_SMALL` | MobileNetV3-S | 1.25M | 3MB | 99.30% | 93.77% | 85.29% | 92.79% |
| `MNET_V3_LARGE` | MobileNetV3-L | 3.52M | 10MB | 99.53% | 94.56% | 86.79% | 95.13% |

!!! info "Training Data"
    **Dataset**: Trained on [MS1MV2](datasets.md#ms1mv2) (5.8M images, 85K identities)

    **Accuracy**: Evaluated on LFW, CALFW, CPLFW, and AgeDB-30 benchmarks

---

### SphereFace

Face recognition using angular softmax loss.

| Model Name | Backbone | Params | Size | LFW | CALFW | CPLFW | AgeDB-30 |
| ------------ | -------- | ------ | ---- | ------ | ------ | ------ | -------- |
| `SPHERE20` | Sphere20 | 24.5M | 50MB | 99.67% | 95.61% | 88.75% | 96.58% |
| `SPHERE36` | Sphere36 | 34.6M | 92MB | 99.72% | 95.64% | 89.92% | 96.83% |

!!! info "Training Data"
    **Dataset**: Trained on [MS1MV2](datasets.md#ms1mv2) (5.8M images, 85K identities)

    **Accuracy**: Evaluated on LFW, CALFW, CPLFW, and AgeDB-30 benchmarks

!!! note "Architecture"
    SphereFace uses angular softmax loss, an approach that predates ArcFace. These models provide good accuracy with moderate resource requirements.

---

### EdgeFace

Efficient face recognition designed for edge devices, using an EdgeNeXt backbone with optional LoRA compression.

| Model Name | Backbone | Params | MFLOPs | Size | LFW | CALFW | CPLFW | CFP-FP | AgeDB-30 |
| --------------- | -------- | ------ | ------ | ----- | ------ | ------ | ------ | ------ | -------- |
| `XXS` :material-check-circle: | EdgeNeXt | 1.24M | 94 | ~5 MB | 99.57% | 94.83% | 90.27% | 93.63% | 94.92% |
| `XS_GAMMA_06` | EdgeNeXt | 1.77M | 154 | ~7 MB | 99.73% | 95.28% | 91.58% | 94.71% | 96.08% |
| `S_GAMMA_05` | EdgeNeXt | 3.65M | 306 | ~14 MB | 99.78% | 95.55% | 92.48% | 95.74% | 97.03% |
| `BASE` | EdgeNeXt | 18.2M | 1399 | ~70 MB | 99.83% | 96.07% | 93.75% | 97.01% | 97.60% |

!!! info "Training Data & Reference"
    **Paper**: [EdgeFace: Efficient Face Recognition Model for Edge Devices](https://arxiv.org/abs/2307.01838v2) (IEEE T-BIOM 2024)

    **Source**: [github.com/otroshi/edgeface](https://github.com/otroshi/edgeface) | [github.com/yakhyo/edgeface-onnx](https://github.com/yakhyo/edgeface-onnx)

---

## Facial Landmark Models

### 106-Point Landmark Detection

Facial landmark localization model.

| Model Name | Points | Params | Size |
| ---------- | ------ | ------ | ---- |
| `2D106` | 106 | 3.7M | 14MB |

**Landmark Groups:**

| Group | Points | Count |
|-------|--------|-------|
| Face contour | 0-32 | 33 points |
| Eyebrows | 33-50 | 18 points |
| Nose | 51-62 | 12 points |
| Eyes | 63-86 | 24 points |
| Mouth | 87-105 | 19 points |

---

## Attribute Analysis Models

### Age & Gender Detection

| Model Name | Attributes | Params | Size |
| ----------- | ----------- | ------ | ---- |
| `AgeGender` | Age, Gender | 2.1M | 8MB |

!!! info "Training Data"
    **Dataset**: Trained on [CelebA](datasets.md#celeba)

!!! warning "Accuracy Note"
    Accuracy varies by demographic and image quality. Test on your specific use case.

---

### FairFace Attributes

| Model Name | Attributes | Params | Size |
| ----------- | --------------------- | ------ | ----- |
| `FairFace` | Race, Gender, Age Group | - | 44MB |

!!! info "Training Data"
    **Dataset**: Trained on [FairFace](datasets.md#fairface) dataset with balanced demographics

!!! tip "Equitable Predictions"
    FairFace provides more equitable predictions across different racial and gender groups.

**Race Categories (7):** White, Black, Latino Hispanic, East Asian, Southeast Asian, Indian, Middle Eastern

**Age Groups (9):** 0-2, 3-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70+

---

### Emotion Detection

| Model Name | Classes | Params | Size |
| ------------- | ------- | ------ | ---- |
| `AFFECNET7` | 7 | 0.5M | 2MB |
| `AFFECNET8` | 8 | 0.5M | 2MB |

**Classes (7)**: Neutral, Happy, Sad, Surprise, Fear, Disgust, Angry

**Classes (8)**: Above + Contempt

!!! info "Training Data"
    **Dataset**: Trained on [AffectNet](datasets.md#affectnet)

!!! note "Accuracy Note"
    Emotion detection accuracy depends heavily on facial expression clarity and cultural context.

---

## Gaze Estimation Models

### MobileGaze Family

Gaze direction prediction models trained on [Gaze360](datasets.md#gaze360) dataset.
Returns pitch (vertical) and yaw (horizontal) angles in radians.

| Model Name | Params | Size | MAE* |
| -------------- | ------ | ------- | ----- |
| `RESNET18` | 11.7M | 43 MB | 12.84 |
| `RESNET34` :material-check-circle: | 24.8M | 81.6 MB | 11.33 |
| `RESNET50` | 25.6M | 91.3 MB | 11.34 |
| `MOBILENET_V2` | 3.5M | 9.59 MB | 13.07 |
| `MOBILEONE_S0` | 2.1M | 4.8 MB | 12.58 |

*MAE (Mean Absolute Error) in degrees on Gaze360 test set - lower is better

!!! info "Training Data"
    **Dataset**: Trained on [Gaze360](datasets.md#gaze360) (indoor/outdoor scenes with diverse head poses)

    **Training**: 200 epochs with classification-based approach (binned angles)

!!! note "Input Requirements"
    Requires face crop as input. Use face detection first to obtain bounding boxes.

---

## Head Pose Estimation Models

### HeadPose Family

Head pose estimation models using 6D rotation representation. Trained on [300W-LP](datasets.md#300w-lp) dataset, evaluated on AFLW2000. Returns pitch, yaw, and roll angles in degrees.

| Model Name | Backbone | Size | MAE* |
| -------------- | -------- | ------- | ----- |
| `RESNET18` :material-check-circle: | ResNet18 | 43 MB | 5.22° |
| `RESNET34` | ResNet34 | 82 MB | 5.07° |
| `RESNET50` | ResNet50 | 91 MB | 4.83° |
| `MOBILENET_V2` | MobileNetV2 | 9.6 MB | 5.72° |
| `MOBILENET_V3_SMALL` | MobileNetV3-Small | 4.8 MB | 6.31° |
| `MOBILENET_V3_LARGE` | MobileNetV3-Large | 16 MB | 5.58° |

*MAE (Mean Absolute Error) in degrees on AFLW2000 test set - lower is better

!!! info "Training Data"
    **Dataset**: Trained on [300W-LP](datasets.md#300w-lp) (synthesized large-pose faces from 300W)

    **Method**: 6D rotation representation (rotation matrix → Euler angles)

!!! note "Input Requirements"
    Requires face crop as input. Use face detection first to obtain bounding boxes.

---

## Face Parsing Models

### BiSeNet Family

BiSeNet (Bilateral Segmentation Network) models for semantic face parsing. Segments face images into 19 facial component classes.

| Model Name | Params | Size | Classes |
| -------------- | ------ | ------- | ------- |
| `RESNET18` :material-check-circle: | 13.3M | 50.7 MB | 19 |
| `RESNET34` | 24.1M | 89.2 MB | 19 |

!!! info "Training Data"
    **Dataset**: Trained on [CelebAMask-HQ](datasets.md#celebamask-hq)

    **Architecture**: BiSeNet with ResNet backbone

    **Input Size**: 512×512 (automatically resized)

**19 Facial Component Classes:**

| # | Class | # | Class | # | Class |
|---|-------|---|-------|---|-------|
| 0 | Background | 7 | Left Ear | 14 | Neck |
| 1 | Skin | 8 | Right Ear | 15 | Neck Lace |
| 2 | Left Eyebrow | 9 | Ear Ring | 16 | Cloth |
| 3 | Right Eyebrow | 10 | Nose | 17 | Hair |
| 4 | Left Eye | 11 | Mouth | 18 | Hat |
| 5 | Right Eye | 12 | Upper Lip | | |
| 6 | Eye Glasses | 13 | Lower Lip | | |

**Applications:**

- Face makeup and beauty applications
- Virtual try-on systems
- Face editing and manipulation
- Facial feature extraction
- Portrait segmentation

!!! note "Input Requirements"
    Input should be a cropped face image. For a full pipeline, use face detection first to obtain face crops.

---

### XSeg

XSeg from DeepFaceLab outputs masks for face regions. Requires 5-point landmarks for face alignment.

| Model Name | Size | Output |
|------------|--------|--------|
| `DEFAULT` | 67 MB | Mask [0, 1] |

!!! info "Model Details"
    **Origin**: DeepFaceLab

    **Input**: NHWC format, normalized to [0, 1]

    **Alignment**: Requires 5-point landmarks (not bbox crops)

**Applications:**

- Face region extraction
- Face swapping pipelines
- Occlusion handling

!!! note "Input Requirements"
    Requires 5-point facial landmarks. Use a face detector like RetinaFace to obtain landmarks first.

---

## Portrait Matting Models

### MODNet

MODNet (Real-Time Trimap-Free Portrait Matting) produces soft alpha mattes from full images without requiring a trimap. It uses a MobileNetV2 backbone with low-resolution, high-resolution, and fusion branches.

| Model Name | Variant | Size | Use Case |
| ---------- | ------- | ---- | -------- |
| `PHOTOGRAPHIC` :material-check-circle: | High-quality | 25 MB | Portrait photos |
| `WEBCAM` | Real-time | 25 MB | Webcam feeds |

!!! info "Model Details"
    **Paper**: [MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition](https://arxiv.org/abs/2011.11961) (AAAI 2022)

    **Source**: [yakhyo/modnet](https://github.com/yakhyo/modnet) - ported weights and clean inference codebase

    **Output**: Alpha matte `(H, W)` in `[0, 1]`

**Applications:**

- Background removal / replacement
- Green screen compositing
- Video conferencing virtual backgrounds
- Portrait editing

!!! note "Input Requirements"
    Operates on full images (not face crops). No trimap or face detection required.

---

## Anti-Spoofing Models

### MiniFASNet Family

Face anti-spoofing models for liveness detection. They detect whether a face is real (live) or fake (photo, video replay, mask).

| Model Name | Size | Scale |
| ---------- | ------ | ----- |
| `V1SE` | 1.2 MB | 4.0 |
| `V2` :material-check-circle: | 1.2 MB | 2.7 |

!!! info "Output Format"
    **Output**: Returns `SpoofingResult(is_real, confidence)`, where `is_real` is `True` for a real face and `False` for a fake one

!!! note "Input Requirements"
    Requires face bounding box from a detector.

---

## Model Management

Models are automatically downloaded and cached on first use.

- **Cache location**: `~/.uniface/models/` (configurable via `set_cache_dir()` or `UNIFACE_CACHE_DIR` env var)
- **Inspect cache path**: `get_cache_dir()` returns the resolved active path
- **Verification**: Models are verified with SHA-256 checksums
- **Concurrent download**: `download_models([...])` fetches multiple models in parallel
- **Manual download**: Use `python tools/download_model.py` to pre-download models

See [Model Cache & Offline Use](concepts/model-cache-offline.md) for full details.
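
The cache lookup and SHA-256 verification described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the library's actual implementation: `resolve_cache_dir`, `sha256_of_file`, and `is_verified` are hypothetical helper names, and the sketch assumes the `UNIFACE_CACHE_DIR` environment variable simply overrides the default path (any interaction with `set_cache_dir()` is not modeled).

```python
import hashlib
import os
from pathlib import Path

# Default cache location as documented above.
DEFAULT_CACHE_DIR = Path("~/.uniface/models").expanduser()


def resolve_cache_dir(env=os.environ) -> Path:
    """Hypothetical helper: prefer UNIFACE_CACHE_DIR, else the default path."""
    override = env.get("UNIFACE_CACHE_DIR")
    return Path(override).expanduser() if override else DEFAULT_CACHE_DIR


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_verified(path, expected_sha256: str) -> bool:
    """Hypothetical helper: compare a file's digest with a known checksum."""
    return sha256_of_file(path) == expected_sha256.lower()
```

A caller would typically run `is_verified(...)` on a cached file before loading it and re-download on a mismatch; the expected checksum has to come from a trusted source such as the project's own model registry.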

---

## References

### Model Training & Architectures

- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) - PyTorch implementation and training code
- **YOLOv5-Face Original**: [deepcam-cn/yolov5-face](https://github.com/deepcam-cn/yolov5-face) - Original PyTorch implementation
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) - ONNX inference implementation
- **YOLOv8-Face Original**: [derronqi/yolov8-face](https://github.com/derronqi/yolov8-face) - Original PyTorch implementation
- **YOLOv8-Face ONNX**: [yakhyo/yolov8-face-onnx-inference](https://github.com/yakhyo/yolov8-face-onnx-inference) - ONNX inference implementation
- **AdaFace Original**: [mk-minchul/AdaFace](https://github.com/mk-minchul/AdaFace) - Original PyTorch implementation
- **AdaFace ONNX**: [yakhyo/adaface-onnx](https://github.com/yakhyo/adaface-onnx) - ONNX export and inference
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
- **Head Pose Estimation**: [yakhyo/head-pose-estimation](https://github.com/yakhyo/head-pose-estimation) - 6D rotation head pose estimation training and ONNX models
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet training code and pretrained weights
- **Face Segmentation**: [yakhyo/face-segmentation](https://github.com/yakhyo/face-segmentation) - XSeg ONNX inference
- **Portrait Matting**: [yakhyo/modnet](https://github.com/yakhyo/modnet) - MODNet ported weights and inference (from [ZHKKKe/MODNet](https://github.com/ZHKKKe/MODNet))
- **Face Anti-Spoofing**: [yakhyo/face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) - MiniFASNet ONNX inference (weights from [minivision-ai/Silent-Face-Anti-Spoofing](https://github.com/minivision-ai/Silent-Face-Anti-Spoofing))
- **FairFace**: [yakhyo/fairface-onnx](https://github.com/yakhyo/fairface-onnx) - FairFace ONNX inference for race, gender, age prediction
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights

### Papers

- **RetinaFace**: [Single-Shot Multi-Level Face Localisation in the Wild](https://arxiv.org/abs/1905.00641)
- **SCRFD**: [Sample and Computation Redistribution for Efficient Face Detection](https://arxiv.org/abs/2105.04714)
- **YOLOv5-Face**: [YOLO5Face: Why Reinventing a Face Detector](https://arxiv.org/abs/2105.12931)
- **AdaFace**: [AdaFace: Quality Adaptive Margin for Face Recognition](https://arxiv.org/abs/2204.00964)
- **ArcFace**: [Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)
- **SphereFace**: [Deep Hypersphere Embedding for Face Recognition](https://arxiv.org/abs/1704.08063)
- **MODNet**: [Real-Time Trimap-Free Portrait Matting via Objective Decomposition](https://arxiv.org/abs/2011.11961)
- **BiSeNet**: [Bilateral Segmentation Network for Real-time Semantic Segmentation](https://arxiv.org/abs/1808.00897)