feat: update face detection module

yakhyo
2025-07-02 16:32:50 +09:00
parent fb29a919b1
commit b15504dfc5
21 changed files with 656 additions and 157 deletions

View File

@@ -19,14 +19,14 @@
## Features
| Date | Feature Description |
| ---------- | --------------------------------------------------------------------------------------------------------------- |
| Planned | 🎭 **Age and Gender Detection**: Planned feature for predicting age and gender from facial images. |
| Planned | 🧩 **Face Recognition**: Upcoming capability to identify and verify faces. |
| 2024-11-21 | 🔄 **Face Alignment**: Added precise face alignment for better downstream tasks. |
| 2024-11-20 | ⚡ **High-Speed Face Detection**: ONNX model integration for faster and efficient face detection. |
| 2024-11-20 | 🎯 **Facial Landmark Localization**: Accurate detection of key facial features like eyes, nose, and mouth. |
| 2024-11-20 | 🛠 **API for Inference and Visualization**: Simplified API for seamless inference and visual results generation. |
| Date | Feature Description |
| ---------- | --------------------------------------------------------------------------------------------------------------------- |
| Planned | 🎭 **Age and Gender Detection**: Planned feature for predicting age and gender from facial images. |
| Planned | 🧩 **Face Recognition**: Upcoming capability to identify and verify faces. |
| 2024-11-21 | 🔄 **Face Alignment**: Added precise face alignment for better downstream tasks. |
| 2024-11-20 | ⚡ **High-Speed Face Detection**: ONNX model integration for faster, more efficient face detection. |
| 2024-11-20 | 🎯 **Facial Landmark Localization**: Accurate detection of key facial features like eyes, nose, and mouth. |
| 2024-11-20 | 🛠 **API for Inference and Visualization**: Simplified API for seamless inference and visual results generation. |
---
@@ -43,7 +43,7 @@ To work with the latest version of **UniFace**, which may not yet be released on
```bash
git clone https://github.com/yakhyo/uniface.git
cd uniface
pip install .
pip install -e .
```
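Editable (`-e`) installs pick up local source changes without reinstalling. A quick sanity check that the package resolves from the checkout:

```bash
python -c "import uniface; print(uniface.__version__)"
```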
---
@@ -179,13 +179,13 @@ cv2.destroyAllWindows()
### Evaluation results of available models on WiderFace
| RetinaFace Models | Easy | Medium | Hard |
| ------------------ | ---------- | ---------- | ---------- |
| retinaface_mnet025 | 88.48% | 87.02% | 80.61% |
| retinaface_mnet050 | 89.42% | 87.97% | 82.40% |
| retinaface_mnet_v1 | 90.59% | 89.14% | 84.13% |
| retinaface_mnet_v2 | 91.70% | 91.03% | 86.60% |
| retinaface_r18 | 92.50% | 91.02% | 86.63% |
| RetinaFace Models | Easy | Medium | Hard |
| ------------------ | ---------------- | ---------------- | ---------------- |
| retinaface_mnet025 | 88.48% | 87.02% | 80.61% |
| retinaface_mnet050 | 89.42% | 87.97% | 82.40% |
| retinaface_mnet_v1 | 90.59% | 89.14% | 84.13% |
| retinaface_mnet_v2 | 91.70% | 91.03% | 86.60% |
| retinaface_r18 | 92.50% | 91.02% | 86.63% |
| retinaface_r34 | **94.16%** | **93.12%** | **88.90%** |
<div align="center">

View File

@@ -1,3 +1,37 @@
# Installation
# 🚀 Installation
Instructions to install UniFace.
## 📦 Install from PyPI
### CPU-only (default):
```bash
pip install uniface
```
This installs the CPU-compatible version of ONNX Runtime (`onnxruntime`) and all core dependencies.
### GPU support:
```bash
pip install "uniface[gpu]"
```
This installs `onnxruntime-gpu` for accelerated inference on supported NVIDIA GPUs.
Make sure your system meets the [ONNX Runtime GPU requirements](https://onnxruntime.ai/docs/build/eps.html#cuda).
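One way to confirm the GPU build is active is to list the available execution providers. A minimal check, assuming `onnxruntime-gpu` installed cleanly:

```python
import onnxruntime as ort

# Expect "CUDAExecutionProvider" in this list when the GPU build is usable
print(ort.get_available_providers())
```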
---
## 🔧 Install from GitHub (latest version)
Clone the repository and install it manually:
```bash
git clone https://github.com/yakhyo/uniface.git
cd uniface
# CPU version
pip install .
# Or with GPU support
pip install ".[gpu]"
```

View File

@@ -0,0 +1,9 @@
# Facial Attribute API Reference
# Age and Gender Model
::: uniface.attribute.age_gender.AgeGender
# Emotion Model
::: uniface.attribute.emotion.Emotion

View File

@@ -1,3 +1,10 @@
# Detection API Reference
# Face Detection API Reference
::: uniface.RetinaFace
# RetinaFace
::: uniface.detection.retinaface.RetinaFace
# SCRFD
::: uniface.detection.scrfd.SCRFD

View File

@@ -0,0 +1,5 @@
# Landmark API Reference
# Landmark Model
::: uniface.landmark.model.Landmark

View File

@@ -1,13 +1,17 @@
# Recognition API Reference
# Face Recognition API Reference
# SphereFace
# SphereFace Model
::: uniface.recognition.models.SphereFace
# MobileFace
# MobileFace Model
::: uniface.recognition.models.MobileFace
# ArcFace
# Base Face Encoder Model
::: uniface.recognition.base.BaseFaceEncoder
::: uniface.recognition.models.ArcFace
# BaseFaceEncoder class
::: uniface.recognition.base.BaseFaceEncoder

View File

@@ -30,7 +30,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -265,7 +265,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "base",
"display_name": "uniface",
"language": "python",
"name": "python3"
},
@@ -279,7 +279,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
"version": "3.12.11"
}
},
"nbformat": 4,

View File

@@ -70,6 +70,8 @@ nav:
- API Reference:
- Detection: reference/detection.md
- Recognition: reference/recognition.md
- Landmark: reference/landmark.md
- Attribute: reference/attribute.md
- About:
- Changelog: about/changelog.md
- License: about/license.md

View File

@@ -1 +1,3 @@
mkdocs-material
mkdocs-minify-plugin
mkdocstrings[python]
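These additions complete the docs toolchain: `mkdocstrings[python]` is what renders the `::: dotted.path` directives used by the new reference pages. A hedged sketch for previewing the site locally, assuming a standard MkDocs layout:

```bash
pip install mkdocs-material mkdocs-minify-plugin "mkdocstrings[python]"
mkdocs serve  # serves the docs at http://127.0.0.1:8000 by default
```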

View File

@@ -29,11 +29,11 @@ def run_inference(detector, recognizer, image_path):
print(f"Detected {len(boxes)} face(s). Extracting embeddings...")
for i, landmark in enumerate(landmarks):
for i, landmark in enumerate(landmarks[:1]):
embedding = recognizer.get_embedding(image, landmark)
norm = np.linalg.norm(embedding)
print(f"\nFace {i} embedding (L2 norm = {norm:.4f}):")
print(embedding)
norm_embedding = recognizer.get_normalized_embedding(image, landmark)
print("embedding:", np.sum(embedding))
print("norm embedding:",np.sum(norm_embedding))
def main():

View File

@@ -15,21 +15,32 @@ __license__ = "MIT"
__author__ = "Yakhyokhuja Valikhujaev"
__version__ = "0.1.8"
from .detection import detect_faces, create_detector, list_available_detectors
from uniface.retinaface import RetinaFace
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.face_utils import face_alignment, compute_similarity
from uniface.model_store import verify_model_weights
from uniface.visualization import draw_detections
from uniface.log import Logger
__all__ = [
# Metadata
"__version__",
"__author__",
"__license__",
"RetinaFace",
"Logger",
"verify_model_weights",
"draw_detections",
# Core functions
'detect_faces',
'create_detector',
'list_available_detectors',
# Utility functions
"face_alignment",
"compute_similarity",
"verify_model_weights",
"draw_detections",
# Classes
"Logger",
]
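Taken together, the reorganized public API can be exercised in a few lines. A minimal sketch, assuming a local test image (`image.jpg` is a placeholder path):

```python
import cv2
from uniface import detect_faces, list_available_detectors

print(list(list_available_detectors()))  # ['retinaface', 'scrfd']

image = cv2.imread("image.jpg")  # placeholder path
for face in detect_faces(image, method="retinaface", conf_thresh=0.8):
    print(face["confidence"], face["bbox"])
```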

View File

@@ -1,6 +1,11 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import cv2
import numpy as np
import onnxruntime as ort
from typing import Tuple
from uniface.log import Logger
@@ -14,10 +19,24 @@ __all__ = ["AgeGender"]
class AgeGender:
"""
Age and Gender Prediction Model.
Age and gender prediction model using ONNX Runtime.
This model predicts both a person's gender (male/female) and age from a facial image.
Gender is returned as an integer (0: female, 1: male) and age as years.
Loads a pretrained ONNX model to predict both age (in years) and gender
(0: female, 1: male) from a detected face region. Handles model loading,
preprocessing, inference, and output interpretation.
Attributes:
input_size (Tuple[int, int]): Model's expected input resolution (width, height).
input_mean (float): Mean value used for input normalization.
input_std (float): Standard deviation used for input normalization.
model_path (str): Path to the verified ONNX model file.
session (onnxruntime.InferenceSession): ONNX Runtime session for inference.
input_names (List[str]): List of input node names.
output_names (List[str]): List of output node names.
Args:
model_name (AgeGenderWeights): Enum specifying the age-gender model to load.
input_size (Tuple[int, int]): Resolution for model input; defaults to (112, 112).
"""
def __init__(

View File

@@ -3,9 +3,8 @@
# GitHub: https://github.com/yakhyo
import cv2
import numpy as np
import torch
from PIL import Image
import numpy as np
from typing import Tuple, Union

View File

@@ -1,3 +1,145 @@
from .retinaface import RetinaFace
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import numpy as np
from typing import Tuple, Dict, Any, List
from .scrfd import SCRFD
from uniface.visualization import draw_detections
from .base import BaseDetector
from .retinaface import RetinaFace
# Global cache for detector instances
_detector_cache: Dict[str, BaseDetector] = {}
def detect_faces(image: np.ndarray, method: str = 'retinaface', **kwargs) -> List[Dict[str, Any]]:
"""
High-level face detection function.
Args:
image (np.ndarray): Input image as numpy array.
method (str): Detection method to use. Options: 'retinaface', 'scrfd'.
**kwargs: Additional arguments passed to the detector.
Returns:
List[Dict[str, Any]]: A list of dictionaries, where each dictionary represents a detected face and contains:
- 'bbox' (List[float]): [x1, y1, x2, y2] bounding box coordinates.
- 'confidence' (float): The confidence score of the detection.
- 'landmarks' (List[List[float]]): 5-point facial landmarks.
Example:
>>> from uniface import detect_faces
>>> image = cv2.imread("your_image.jpg")
>>> faces = detect_faces(image, method='retinaface', conf_thresh=0.8)
>>> for face in faces:
... print(f"Found face with confidence: {face['confidence']}")
... print(f"BBox: {face['bbox']}")
"""
method_name = method.lower()
sorted_kwargs = sorted(kwargs.items())
cache_key = f"{method_name}_{str(sorted_kwargs)}"
if cache_key not in _detector_cache:
# Pass kwargs to create the correctly configured detector
_detector_cache[cache_key] = create_detector(method, **kwargs)
detector = _detector_cache[cache_key]
return detector.detect(image)
def create_detector(method: str = 'retinaface', **kwargs) -> BaseDetector:
"""
Factory function to create face detectors.
Args:
method (str): Detection method. Options:
- 'retinaface': RetinaFace detector (default)
- 'scrfd': SCRFD detector (fast and accurate)
**kwargs: Detector-specific parameters
Returns:
BaseDetector: Initialized detector instance
Raises:
ValueError: If method is not supported
Examples:
>>> # Basic usage
>>> detector = create_detector('retinaface')
>>> # SCRFD detector with custom parameters
>>> detector = create_detector(
... 'scrfd',
... model_name=SCRFDWeights.SCRFD_10G_KPS,
... conf_thresh=0.8,
... input_size=(640, 640)
... )
>>> # RetinaFace detector
>>> detector = create_detector(
... 'retinaface',
... model_name=RetinaFaceWeights.MNET_V2,
... conf_thresh=0.8,
... nms_thresh=0.4
... )
"""
method = method.lower()
if method == 'retinaface':
return RetinaFace(**kwargs)
elif method == 'scrfd':
return SCRFD(**kwargs)
else:
available_methods = ['retinaface', 'scrfd']
raise ValueError(
f"Unsupported detection method: '{method}'. "
f"Available methods: {available_methods}"
)
def list_available_detectors() -> Dict[str, Dict[str, Any]]:
"""
List all available detection methods with their descriptions and parameters.
Returns:
Dict[str, Dict[str, Any]]: Dictionary of detector information
"""
return {
'retinaface': {
'description': 'RetinaFace detector with high accuracy',
'supports_landmarks': True,
'paper': 'https://arxiv.org/abs/1905.00641',
'default_params': {
'model_name': 'mnet_v2',
'conf_thresh': 0.5,
'nms_thresh': 0.4,
'input_size': (640, 640)
}
},
'scrfd': {
'description': 'SCRFD detector - fast and accurate with efficient architecture',
'supports_landmarks': True,
'paper': 'https://arxiv.org/abs/2105.04714',
'default_params': {
'model_name': 'scrfd_10g_kps',
'conf_thresh': 0.5,
'nms_thresh': 0.4,
'input_size': (640, 640)
}
}
}
__all__ = [
'detect_faces',
'create_detector',
'list_available_detectors',
'SCRFD',
'RetinaFace',
'BaseDetector',
]
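Because `detect_faces` keys its cache on the method name plus the sorted kwargs, repeated calls with identical arguments reuse one detector instance instead of re-initializing the model. A minimal sketch of both entry points (the zero image is a stand-in for a real frame):

```python
import numpy as np
from uniface.detection import detect_faces, create_detector

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real frame

# First call builds and caches the detector; the second call is a cache hit.
detect_faces(frame, method="scrfd", conf_thresh=0.6)
detect_faces(frame, method="scrfd", conf_thresh=0.6)

# For explicit control over the instance, use the factory directly.
detector = create_detector("retinaface", conf_thresh=0.8)
faces = detector.detect(frame)
```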

uniface/detection/base.py Normal file
View File

@@ -0,0 +1,100 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
"""
Base classes for face detection.
"""
import numpy as np
from abc import ABC, abstractmethod
from typing import Tuple, Dict, Any
class BaseDetector(ABC):
"""
Abstract base class for all face detectors.
This class defines the interface that all face detectors must implement,
ensuring consistency across different detection methods.
"""
def __init__(self, **kwargs):
"""Initialize the detector with configuration parameters."""
self.config = kwargs
@abstractmethod
def detect(self, image: np.ndarray, **kwargs) -> Tuple[np.ndarray, np.ndarray]:
"""
Detect faces in an image.
Args:
image (np.ndarray): Input image as numpy array with shape (H, W, C)
**kwargs: Additional detection parameters
Returns:
Tuple[np.ndarray, np.ndarray]: (detections, landmarks)
- detections: Bounding boxes with confidence scores, shape (N, 5)
Format: [x_min, y_min, x_max, y_max, confidence]
- landmarks: Facial landmark points, shape (N, 5, 2) for 5-point landmarks
or (N, 68, 2) for 68-point landmarks. Empty array if not supported.
"""
pass
@abstractmethod
def preprocess(self, image: np.ndarray) -> np.ndarray:
"""
Preprocess input image for detection.
Args:
image (np.ndarray): Input image
Returns:
np.ndarray: Preprocessed image tensor
"""
pass
@abstractmethod
def postprocess(self, outputs, **kwargs) -> Tuple[np.ndarray, np.ndarray]:
"""
Postprocess model outputs to get final detections.
Args:
outputs: Raw model outputs
**kwargs: Additional postprocessing parameters
Returns:
Tuple[np.ndarray, np.ndarray]: (detections, landmarks)
"""
pass
def __str__(self) -> str:
"""String representation of the detector."""
return f"{self.__class__.__name__}({self.config})"
def __repr__(self) -> str:
"""Detailed string representation."""
return self.__str__()
@property
def supports_landmarks(self) -> bool:
"""
Whether this detector supports landmark detection.
Returns:
bool: True if landmarks are supported, False otherwise
"""
return hasattr(self, '_supports_landmarks') and self._supports_landmarks
def get_info(self) -> Dict[str, Any]:
"""
Get detector information and configuration.
Returns:
Dict[str, Any]: Detector information
"""
return {
'name': self.__class__.__name__,
'supports_landmarks': self._supports_landmarks,
'config': self.config
}
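For a new backend, the contract reduces to three methods plus the `_supports_landmarks` flag. A minimal sketch of a hypothetical subclass (the class and its empty outputs are illustrative only, not part of the library):

```python
import numpy as np
from typing import Any, Tuple
from uniface.detection.base import BaseDetector


class DummyDetector(BaseDetector):
    """Hypothetical detector that finds no faces; shows the required interface."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._supports_landmarks = False  # read by the supports_landmarks property

    def preprocess(self, image: np.ndarray) -> np.ndarray:
        return image.astype(np.float32)

    def postprocess(self, outputs: Any, **kwargs) -> Tuple[np.ndarray, np.ndarray]:
        return np.empty((0, 5)), np.empty((0, 5, 2))

    def detect(self, image: np.ndarray, **kwargs) -> Tuple[np.ndarray, np.ndarray]:
        return self.postprocess(self.preprocess(image))


print(DummyDetector(conf_thresh=0.9).get_info())
```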

View File

@@ -2,16 +2,16 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import os
import cv2
import numpy as np
import onnxruntime as ort
from typing import Tuple, List, Optional, Literal
from typing import Tuple, List, Literal, Dict, Any
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.constants import RetinaFaceWeights
from .base import BaseDetector
from .utils import (
non_max_supression,
resize_image,
@@ -21,64 +21,64 @@ from .utils import (
)
class RetinaFace:
class RetinaFace(BaseDetector):
"""
Face detector based on the RetinaFace architecture.
Title: "RetinaFace: Single-stage Dense Face Localisation in the Wild"
Paper: https://arxiv.org/abs/1905.00641
Args:
model_name (RetinaFaceWeights): Model weights to use. Defaults to `RetinaFaceWeights.MNET_V2`.
conf_thresh (float): Confidence threshold for filtering detections. Defaults to 0.5.
nms_thresh (float): Non-maximum suppression (NMS) threshold. Defaults to 0.4.
pre_nms_topk (int): Number of top-scoring boxes considered before applying NMS. Defaults to 5000.
post_nms_topk (int): Maximum number of final detections retained after NMS. Defaults to 750.
dynamic_size (bool): If True, anchors are generated dynamically per input image size. Defaults to False.
input_size (Tuple[int, int]): Fixed input size (width, height) used when `dynamic_size` is False. Ignored if `dynamic_size=True`.
**kwargs: Keyword arguments passed to BaseDetector and RetinaFace. Supported keys include:
model_name (RetinaFaceWeights, optional): Model weights to use. Defaults to `RetinaFaceWeights.MNET_V2`.
conf_thresh (float, optional): Confidence threshold for filtering detections. Defaults to 0.5.
nms_thresh (float, optional): Non-maximum suppression (NMS) IoU threshold. Defaults to 0.4.
pre_nms_topk (int, optional): Number of top-scoring boxes considered before NMS. Defaults to 5000.
post_nms_topk (int, optional): Max number of detections kept after NMS. Defaults to 750.
dynamic_size (bool, optional): If True, generate anchors dynamically per input image. Defaults to False.
input_size (Tuple[int, int], optional): Fixed input size (width, height) if `dynamic_size=False`. Defaults to (640, 640).
Attributes:
conf_thresh (float): Threshold for filtering detections based on confidence score.
nms_thresh (float): IoU threshold for NMS.
pre_nms_topk (int): Limit on boxes considered before NMS.
post_nms_topk (int): Limit on detections kept after NMS.
dynamic_size (bool): Whether anchors are generated dynamically.
input_size (Tuple[int, int]): Static input size when `dynamic_size` is False.
_model_path (str): Path to verified model weights. (Internal)
_priors (np.ndarray): Anchor boxes used for detection. Precomputed if static input size is used. (Internal)
model_name (RetinaFaceWeights): Selected model variant.
conf_thresh (float): Threshold for confidence-based filtering.
nms_thresh (float): IoU threshold used for NMS.
pre_nms_topk (int): Limit on proposals before applying NMS.
post_nms_topk (int): Limit on retained detections after NMS.
dynamic_size (bool): Flag indicating dynamic or static input sizing.
input_size (Tuple[int, int]): Static input size if `dynamic_size=False`.
_model_path (str): Absolute path to the verified model weights.
_priors (np.ndarray): Precomputed anchor boxes (if static size).
_supports_landmarks (bool): Indicates landmark prediction support.
Raises:
ValueError: If model weights are invalid or not found.
RuntimeError: If the model fails to initialize.
ValueError: If the model weights are invalid or not found.
RuntimeError: If the ONNX model fails to load or initialize.
"""
def __init__(
self,
model_name: RetinaFaceWeights = RetinaFaceWeights.MNET_V2,
conf_thresh: float = 0.5,
nms_thresh: float = 0.4,
pre_nms_topk: int = 5000,
post_nms_topk: int = 750,
dynamic_size: bool = False,
input_size: Tuple[int, int] = (640, 640), # Default input size if dynamic_size=False
) -> None:
def __init__(self, **kwargs) -> None:
super().__init__(**kwargs)
self._supports_landmarks = True # RetinaFace supports landmarks
self.conf_thresh = conf_thresh
self.nms_thresh = nms_thresh
self.pre_nms_topk = pre_nms_topk
self.post_nms_topk = post_nms_topk
self.dynamic_size = dynamic_size
self.input_size = input_size
self.model_name = kwargs.get('model_name', RetinaFaceWeights.MNET_V2)
self.conf_thresh = kwargs.get('conf_thresh', 0.5)
self.nms_thresh = kwargs.get('nms_thresh', 0.4)
self.pre_nms_topk = kwargs.get('pre_nms_topk', 5000)
self.post_nms_topk = kwargs.get('post_nms_topk', 750)
self.dynamic_size = kwargs.get('dynamic_size', False)
self.input_size = kwargs.get('input_size', (640, 640))
Logger.info(
f"Initializing RetinaFace with model={model_name}, conf_thresh={conf_thresh}, nms_thresh={nms_thresh}, "
f"input_size={input_size}"
f"Initializing RetinaFace with model={self.model_name}, conf_thresh={self.conf_thresh}, nms_thresh={self.nms_thresh}, "
f"input_size={self.input_size}"
)
# Get path to model weights
self._model_path = verify_model_weights(model_name)
self._model_path = verify_model_weights(self.model_name)
Logger.info(f"Verified model weights located at: {self._model_path}")
# Precompute anchors if using static size
if not dynamic_size and input_size is not None:
self._priors = generate_anchors(image_size=input_size)
if not self.dynamic_size and self.input_size is not None:
self._priors = generate_anchors(image_size=self.input_size)
Logger.debug("Generated anchors for static input size.")
# Initialize model
@@ -137,7 +137,7 @@ class RetinaFace:
max_num: int = 0,
metric: Literal["default", "max"] = "max",
center_weight: float = 2.0
) -> Tuple[np.ndarray, np.ndarray]:
) -> List[Dict[str, Any]]:
"""
Perform face detection on an input image and return bounding boxes and facial landmarks.
@@ -151,9 +151,10 @@ class RetinaFace:
when using the "default" metric. Defaults to 2.0.
Returns:
Tuple[np.ndarray, np.ndarray]:
- detections: Bounding boxes with confidence scores. Shape (N, 5), each row as [x_min, y_min, x_max, y_max, score].
- landmarks: Facial landmark coordinates. Shape (N, 5, 2), where each row contains 5 (x, y) points.
List[Dict[str, Any]]: List of face detection dictionaries, each containing:
- 'bbox': [x1, y1, x2, y2] - Bounding box coordinates
- 'confidence': float - Detection confidence score
- 'landmarks': [[x1, y1], [x2, y2], [x3, y3], [x4, y4], [x5, y5]] - 5-point facial landmarks
"""
original_height, original_width = image.shape[:2]
@@ -198,7 +199,16 @@ class RetinaFace:
detections = detections[sorted_indices]
landmarks = landmarks[sorted_indices]
return detections, landmarks
faces = []
for i in range(detections.shape[0]):
face_dict = {
'bbox': detections[i, :4].astype(float).tolist(),
'confidence': detections[i, 4].item(),
'landmarks': landmarks[i].astype(float).tolist()
}
faces.append(face_dict)
return faces
def postprocess(self, outputs: List[np.ndarray], resize_factor: float, shape: Tuple[int, int]) -> Tuple[np.ndarray, np.ndarray]:
"""
@@ -259,3 +269,64 @@ class RetinaFace:
landmarks = landmarks * landmark_scale / resize_factor
return boxes, landmarks
# TODO: below is only for testing, remove it later
def draw_bbox(frame, bbox, score, color=(0, 255, 0), thickness=2):
x1, y1, x2, y2 = map(int, bbox) # Unpack 4 bbox values
cv2.rectangle(frame, (x1, y1), (x2, y2), color, thickness)
cv2.putText(frame, f"{score:.2f}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
def draw_keypoints(frame, points, color=(0, 0, 255), radius=2):
for (x, y) in points.astype(np.int32):
cv2.circle(frame, (int(x), int(y)), radius, color, -1)
if __name__ == "__main__":
import cv2
detector = RetinaFace(model_name=RetinaFaceWeights.MNET_050)
print(detector.get_info())
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("❌ Failed to open webcam.")
exit()
print("📷 Webcam started. Press 'q' to exit.")
while True:
ret, frame = cap.read()
if not ret:
print("❌ Failed to read frame.")
break
# Get face detections as list of dictionaries
faces = detector.detect(frame)
# Process each detected face
for face in faces:
# Extract bbox and landmarks from dictionary
bbox = face['bbox'] # [x1, y1, x2, y2]
landmarks = face['landmarks'] # [[x1, y1], [x2, y2], ...]
confidence = face['confidence']
# Pass bbox and confidence separately
draw_bbox(frame, bbox, confidence)
# Convert landmarks to numpy array format if needed
if landmarks is not None and len(landmarks) > 0:
# Convert list of [x, y] pairs to numpy array
points = np.array(landmarks, dtype=np.float32) # Shape: (5, 2)
draw_keypoints(frame, points)
# Display face count
cv2.putText(frame, f"Faces: {len(faces)}", (10, 30),
cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
cv2.imshow("FaceDetection", frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
cap.release()
cv2.destroyAllWindows()

View File

@@ -1,25 +1,24 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
# Modified from insightface repository
import os
import cv2
import numpy as np
import onnxruntime as ort
from typing import Tuple, List, Literal
from typing import Tuple, List, Literal, Dict, Any
from uniface.log import Logger
from uniface.constants import SCRFDWeights
from uniface.model_store import verify_model_weights
from .base import BaseDetector
from .utils import non_max_supression, distance2bbox, distance2kps, resize_image
__all__ = ['SCRFD']
class SCRFD:
class SCRFD(BaseDetector):
"""
Face detector based on the SCRFD architecture.
@@ -27,10 +26,12 @@ class SCRFD:
Paper: https://arxiv.org/abs/2105.04714
Args:
model_name (SCRFDWeights): Predefined model enum (e.g., `SCRFD_10G_KPS`). Specifies the SCRFD variant to load.
conf_thresh (float): Confidence threshold for filtering detections. Defaults to 0.5.
nms_thresh (float): Non-Maximum Suppression (NMS) threshold. Defaults to 0.4.
input_size (Tuple[int, int]): Target input resolution (width, height) to resize images. Defaults to (640, 640).
**kwargs: Keyword arguments passed to BaseDetector and SCRFD. Supported keys include:
model_name (SCRFDWeights, optional): Predefined model enum (e.g., `SCRFD_10G_KPS`).
Specifies the SCRFD variant to load. Defaults to SCRFD_10G_KPS.
conf_thresh (float, optional): Confidence threshold for filtering detections. Defaults to 0.5.
nms_thresh (float, optional): Non-Maximum Suppression threshold. Defaults to 0.4.
input_size (Tuple[int, int], optional): Input image size (width, height). Defaults to (640, 640).
Attributes:
conf_thresh (float): Threshold used to filter low-confidence detections.
@@ -47,13 +48,14 @@ class SCRFD:
RuntimeError: If the ONNX model fails to load or initialize.
"""
def __init__(
self,
model_name: SCRFDWeights = SCRFDWeights.SCRFD_10G_KPS,
conf_thresh: float = 0.5,
nms_thresh: float = 0.4,
input_size: Tuple[int, int] = (640, 640),
) -> None:
def __init__(self, **kwargs) -> None:
super().__init__(**kwargs)
self._supports_landmarks = True # SCRFD supports landmarks
model_name = kwargs.get('model_name', SCRFDWeights.SCRFD_10G_KPS)
conf_thresh = kwargs.get('conf_thresh', 0.5)
nms_thresh = kwargs.get('nms_thresh', 0.4)
input_size = kwargs.get('input_size', (640, 640))
self.conf_thresh = conf_thresh
self.nms_thresh = nms_thresh
@@ -179,7 +181,7 @@ class SCRFD:
max_num: int = 0,
metric: Literal["default", "max"] = "max",
center_weight: float = 2
) -> Tuple[np.ndarray, np.ndarray]:
) -> List[Dict[str, Any]]:
"""
Perform face detection on an input image and return bounding boxes and facial landmarks.
@@ -193,9 +195,10 @@ class SCRFD:
when using the "default" metric. Defaults to 2.0.
Returns:
Tuple[np.ndarray, np.ndarray]:
- detections: Bounding boxes with confidence scores. Shape (N, 5), each row as [x_min, y_min, x_max, y_max, score].
- landmarks: Facial landmark coordinates. Shape (N, 5, 2), where each row contains 5 (x, y) points.
List[Dict[str, Any]]: List of face detection dictionaries, each containing:
- 'bbox': [x1, y1, x2, y2] - Bounding box coordinates
- 'confidence': float - Detection confidence score
- 'landmarks': [[x1, y1], [x2, y2], [x3, y3], [x4, y4], [x5, y5]] - 5-point facial landmarks
"""
original_height, original_width = image.shape[:2]
@@ -221,20 +224,20 @@ class SCRFD:
keep = non_max_supression(pre_det, threshold=self.nms_thresh)
det = pre_det[keep, :]
detections = pre_det[keep, :]
landmarks = landmarks[order, :, :]
landmarks = landmarks[keep, :, :].astype(np.int32)
if 0 < max_num < det.shape[0]:
if 0 < max_num < detections.shape[0]:
# Calculate area of detections
area = (det[:, 2] - det[:, 0]) * (det[:, 3] - det[:, 1])
area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])
# Calculate offsets from image center
center = (original_height // 2, original_width // 2)
offsets = np.vstack(
[
(det[:, 0] + det[:, 2]) / 2 - center[1],
(det[:, 1] + det[:, 3]) / 2 - center[0],
(detections[:, 0] + detections[:, 2]) / 2 - center[1],
(detections[:, 1] + detections[:, 3]) / 2 - center[0],
]
)
@@ -247,30 +250,36 @@ class SCRFD:
# Sort by scores and select top `max_num`
sorted_indices = np.argsort(values)[::-1][:max_num]
det = det[sorted_indices]
detections = detections[sorted_indices]
landmarks = landmarks[sorted_indices]
return det, landmarks
faces = []
for i in range(detections.shape[0]):
face_dict = {
'bbox': detections[i, :4].astype(float).tolist(),
'confidence': detections[i, 4].item(),
'landmarks': landmarks[i].astype(float).tolist()
}
faces.append(face_dict)
return faces
# TODO: below is only for testing, remove it later
def draw_bbox(frame, bbox, color=(0, 255, 0), thickness=2):
x1, y1, x2, y2 = bbox[:4].astype(np.int32)
def draw_bbox(frame, bbox, score, color=(0, 255, 0), thickness=2):
x1, y1, x2, y2 = map(int, bbox) # Unpack 4 bbox values
cv2.rectangle(frame, (x1, y1), (x2, y2), color, thickness)
score = bbox[4]
cv2.putText(frame, f"{score:.2f}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
def draw_keypoints(frame, points, color=(0, 0, 255), radius=2):
for (x, y) in points.astype(np.int32):
cv2.circle(frame, (x, y), radius, color, -1)
cv2.circle(frame, (int(x), int(y)), radius, color, -1)
# TODO: Remove later, just for testing
if __name__ == "__main__":
detector = SCRFD(model_name=SCRFDWeights.SCRFD_500M_KPS)
print(detector.get_info())
cap = cv2.VideoCapture(0)
if not cap.isOpened():
@@ -285,14 +294,29 @@ if __name__ == "__main__":
print("❌ Failed to read frame.")
break
boxes_list, points_list = detector.detect(frame)
# Get face detections as list of dictionaries
faces = detector.detect(frame)
for boxes, points in zip(boxes_list, points_list):
draw_bbox(frame, boxes)
# Process each detected face
for face in faces:
# Extract bbox and landmarks from dictionary
bbox = face['bbox'] # [x1, y1, x2, y2]
landmarks = face['landmarks'] # [[x1, y1], [x2, y2], ...]
confidence = face['confidence']
if points is not None:
# Pass bbox and confidence separately
draw_bbox(frame, bbox, confidence)
# Convert landmarks to numpy array format if needed
if landmarks is not None and len(landmarks) > 0:
# Convert list of [x, y] pairs to numpy array
points = np.array(landmarks, dtype=np.float32) # Shape: (5, 2)
draw_keypoints(frame, points)
# Display face count
cv2.putText(frame, f"Faces: {len(faces)}", (10, 30),
cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
cv2.imshow("FaceDetection", frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
break

View File

@@ -0,0 +1 @@
from .model import Landmark

View File

@@ -1,7 +1,10 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import cv2
import onnx
import onnxruntime as ort
import numpy as np
import onnxruntime as ort
from typing import Tuple
@@ -15,12 +18,32 @@ __all__ = ['Landmark']
class Landmark:
"""
Facial landmark detection model for predicting facial keypoints.
"""
Facial landmark detection model for predicting 106 facial keypoints using an ONNX model.
This class wraps a pretrained facial landmark model to detect 106 key facial points
such as eyes, eyebrows, nose, lips, and jawline from a given face bounding box.
It handles model verification, input preprocessing, ONNX inference execution,
and projection of landmark coordinates back to the original image space.
Attributes:
input_size (Tuple[int, int]): Model's expected input resolution (width, height).
input_mean (float): Mean value used for input normalization.
input_std (float): Standard deviation used for input normalization.
model_path (str): Path to the verified ONNX model file.
session (onnxruntime.InferenceSession): ONNX Runtime session for inference.
input_names (List[str]): List of input node names.
output_names (List[str]): List of output node names.
lmk_dim (int): Number of dimensions per landmark point (typically 2 for x, y).
lmk_num (int): Total number of landmark points predicted by the model (106).
Args:
model_name (LandmarkWeights): Enum specifying the landmark model to load.
input_size (Tuple[int, int]): Resolution for model input; defaults to (192, 192).
"""
def __init__(
self,
model_name: LandmarkWeights = LandmarkWeights.DEFAULT,
self,
model_name: LandmarkWeights = LandmarkWeights.DEFAULT,
input_size: Tuple[int, int] = (192, 192)
) -> None:
"""
@@ -50,7 +73,7 @@ class Landmark:
def _initialize_model(self):
"""
Initialize the ONNX model from the stored model path.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
@@ -73,7 +96,7 @@ class Landmark:
output_shape = self.session.get_outputs()[0].shape
self.lmk_dim = 2 # x,y coordinates
self.lmk_num = output_shape[1] // self.lmk_dim # Number of landmarks
Logger.info(f"Model initialized with {self.lmk_num} landmarks")
except Exception as e:
@@ -96,7 +119,7 @@ class Landmark:
# Calculate face dimensions and center
width, height = bbox[2] - bbox[0], bbox[3] - bbox[1]
center = (bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2
# Determine scale to fit face with some margin
scale = self.input_size[0] / (max(width, height) * 1.5)
rotation = 0.0
@@ -105,7 +128,7 @@ class Landmark:
aligned_face, transform_matrix = bbox_center_alignment(
image, center, self.input_size[0], scale, rotation
)
# Convert to blob format for inference
face_blob = cv2.dnn.blobFromImage(
aligned_face,
@@ -114,7 +137,7 @@ class Landmark:
(self.input_mean, self.input_mean, self.input_mean),
swapRB=True # Convert BGR to RGB
)
return face_blob, transform_matrix
def postprocess(self, predictions: np.ndarray, transform_matrix: np.ndarray) -> np.ndarray:
@@ -154,13 +177,13 @@ class Landmark:
"""
# Preprocess image
face_blob, transform_matrix = self.preprocess(image, bbox)
# Run inference
raw_predictions = self.session.run(
self.output_names,
self.output_names,
{self.input_names[0]: face_blob}
)[0][0]
# Postprocess to get landmarks in original image space
landmarks = self.postprocess(raw_predictions, transform_matrix)
@@ -172,7 +195,7 @@ class Landmark:
if __name__ == "__main__":
from uniface.detection import RetinaFace
from uniface.constants import RetinaFaceWeights
face_detector = RetinaFace(
model_name=RetinaFaceWeights.MNET_V2,
conf_thresh=0.5,

View File

@@ -1,22 +1,21 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
# Modified from insightface repository
import os
import cv2
import numpy as np
import onnxruntime as ort
from typing import Tuple, Optional, Union, List
from dataclasses import dataclass
from typing import Tuple, Union, List
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.face_utils import compute_similarity, face_alignment
from uniface.face_utils import face_alignment
from uniface.constants import SphereFaceWeights, MobileFaceWeights, ArcFaceWeights
__all__ = ["BaseFaceEncoder", "PreprocessConfig"]
__all__ = ["BaseModel", "PreprocessConfig"]
@dataclass
@@ -29,7 +28,7 @@ class PreprocessConfig:
input_size: Tuple[int, int] = (112, 112)
class BaseFaceEncoder:
class BaseModel:
"""
Unified Face Encoder supporting multiple model families (e.g., SphereFace, MobileFace).
"""
@@ -133,7 +132,7 @@ class BaseFaceEncoder:
def get_embedding(self, image: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
"""
Extracts face embedding from an aligned image.
Extracts face embedding from an image.
Args:
image: Input face image (BGR format).
@@ -150,3 +149,17 @@ class BaseFaceEncoder:
embedding = self.session.run(self.output_names, {self.input_name: face_blob})[0]
return embedding
def get_normalized_embedding(self, image: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
"""
Extracts an L2-normalized face embedding vector from an image.
Args:
image: Input face image (BGR format).
landmarks: Facial landmarks (5 points for alignment).
Returns:
Normalized face embedding vector (typically 512-dimensional).
"""
embedding = self.get_embedding(image, landmarks)
return embedding / np.linalg.norm(embedding)
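Since `get_normalized_embedding` returns a unit-length vector, the dot product of two such vectors is their cosine similarity. A minimal sketch, assuming one face per image and placeholder image paths:

```python
import cv2
import numpy as np
from uniface import detect_faces
from uniface.recognition.models import ArcFace

encoder = ArcFace()

def face_vector(path: str) -> np.ndarray:
    image = cv2.imread(path)       # placeholder image path
    face = detect_faces(image)[0]  # assumes at least one detected face
    landmarks = np.array(face["landmarks"], dtype=np.float32)
    return encoder.get_normalized_embedding(image, landmarks).ravel()

# For unit vectors, the dot product equals the cosine similarity.
similarity = float(np.dot(face_vector("a.jpg"), face_vector("b.jpg")))
print(f"cosine similarity: {similarity:.4f}")
```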

View File

@@ -5,13 +5,24 @@
from typing import Optional
from uniface.constants import SphereFaceWeights, MobileFaceWeights, ArcFaceWeights
from .base import BaseFaceEncoder, PreprocessConfig
from .base import BaseModel, PreprocessConfig
__all__ = ["SphereFace", "MobileFace", "ArcFace"]
class SphereFace(BaseFaceEncoder):
class SphereFace(BaseModel):
"""
SphereFace face encoder class.
This class loads a SphereFace model for face embedding extraction.
It supports configurable preprocessing, with a default mean/std and input size of 112x112.
Args:
model_name (SphereFaceWeights): Enum value representing the model to load. Defaults to SphereFaceWeights.SPHERE20.
preprocessing (Optional[PreprocessConfig]): Preprocessing config (mean, std, size). Defaults to standard 112x112 with normalization.
"""
def __init__(
self, model_name: SphereFaceWeights = SphereFaceWeights.SPHERE20,
preprocessing: Optional[PreprocessConfig] = None
@@ -25,7 +36,18 @@ class SphereFace(BaseFaceEncoder):
super().__init__(model_name=model_name, preprocessing=preprocessing)
class MobileFace(BaseFaceEncoder):
class MobileFace(BaseModel):
"""
MobileFace face encoder class.
Loads a lightweight MobileFaceNet model for fast face embedding extraction.
Default input normalization and resizing applied if preprocessing is not provided.
Args:
model_name (MobileFaceWeights): Enum value specifying the MobileFace model. Defaults to MobileFaceWeights.MNET_V2.
preprocessing (Optional[PreprocessConfig]): Preprocessing config. If None, uses standard normalization and 112x112 input size.
"""
def __init__(
self, model_name: MobileFaceWeights = MobileFaceWeights.MNET_V2,
preprocessing: Optional[PreprocessConfig] = None
@@ -39,7 +61,18 @@ class MobileFace(BaseFaceEncoder):
super().__init__(model_name=model_name)
class ArcFace(BaseFaceEncoder):
class ArcFace(BaseModel):
"""
ArcFace face encoder class.
Loads an ArcFace model (e.g., ResNet-based) for robust face recognition embedding generation.
Applies standard preprocessing unless overridden.
Args:
model_name (ArcFaceWeights): Enum for the ArcFace model variant. Defaults to ArcFaceWeights.MNET.
preprocessing (Optional[PreprocessConfig]): Preprocessing settings. Defaults to standard normalization and resizing if not specified.
"""
def __init__(
self, model_name: ArcFaceWeights = ArcFaceWeights.MNET,
preprocessing: Optional[PreprocessConfig] = None