5 Commits

Author SHA1 Message Date
yakhyo
13b518e96d chore: Upgrade version to v1.5.3 2025-12-15 15:09:54 +09:00
yakhyo
1b877bc9fc fix: Fix the version 2025-12-15 14:53:36 +09:00
Yakhyokhuja Valikhujaev
bb1d209f3b feat: Add BiSeNet face parsing model (#36)
* Add BiSeNet face parsing implementation

* Add parsing model weights configuration

* Export BiSeNet in main package

* Add face parsing tests

* Add face parsing examples and script

* Bump version to 1.5.0

* Update documentation for face parsing

* Fix face parsing notebook to use lips instead of mouth

* chore: Update the face parsing example

* fix: Fix model argument to use Enum

* ref: Move vis_parsing_map function into visualization.py

* docs: Update README.md
2025-12-15 14:50:15 +09:00
Yakhyokhuja Valikhujaev
54b769c0f1 feat: Add Face Parsing model BiSeNet model trained on CelebMask dataset (#35)
* Add BiSeNet face parsing implementation

* Add parsing model weights configuration

* Export BiSeNet in main package

* Add face parsing tests

* Add face parsing examples and script

* Bump version to 1.5.0

* Update documentation for face parsing

* Fix face parsing notebook to use lips instead of mouth

* chore: Update the face parsing example

* fix: Fix model argument to use Enum

* ref: Move vis_parsing_map function into visualization.py

* docs: Update README.md
2025-12-14 21:13:53 +09:00
Yakhyokhuja Valikhujaev
4d1921e531 feat: Add 2D Gaze estimation models (#34)
* feat: Add Gaze Estimation, update docs and Add example notebook, inference code

* docs: Update README.md
2025-12-14 14:07:46 +09:00
22 changed files with 2266 additions and 9 deletions

MODELS.md

@@ -291,6 +291,119 @@ emotion, confidence = predictor.predict(image, landmarks)
---
## Gaze Estimation Models
### MobileGaze Family
Real-time gaze direction prediction models trained on the Gaze360 dataset. Each model returns pitch (vertical) and yaw (horizontal) angles in radians.
| Model Name | Params | Size | MAE* | Use Case |
| -------------- | ------ | ------- | ----- | ----------------------------- |
| `RESNET18` | 11.7M | 43 MB | 12.84 | Balanced accuracy/speed |
| `RESNET34` ⭐ | 24.8M | 81.6 MB | 11.33 | **Recommended default** |
| `RESNET50` | 25.6M | 91.3 MB | 11.34 | High accuracy |
| `MOBILENET_V2` | 3.5M | 9.59 MB | 13.07 | Mobile/Edge devices |
| `MOBILEONE_S0` | 2.1M | 4.8 MB | 12.58 | Lightweight/Real-time |
*MAE (Mean Absolute Error) in degrees on the Gaze360 test set (lower is better)
**Dataset**: Trained on Gaze360 (indoor/outdoor scenes with diverse head poses)
**Training**: 200 epochs with a classification-based approach (binned angles); the binned logits are decoded to continuous angles as sketched below
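The bundled `MobileGaze` decodes its binned classification output with a soft-argmax (90 bins of 4°, offset by 180°, then converted to radians). A minimal standalone sketch of that decoding; the helper name is illustrative and the parameter values are taken from the implementation in this release:

```python
import numpy as np

def decode_gaze_logits(logits: np.ndarray, bins: int = 90, binwidth: float = 4.0, offset: float = 180.0) -> float:
    """Soft-argmax decoding of binned gaze logits into a continuous angle (radians)."""
    probs = np.exp(logits - logits.max())         # numerically stable softmax
    probs /= probs.sum()
    expected_bin = float(np.sum(probs * np.arange(bins)))
    angle_deg = expected_bin * binwidth - offset  # map expected bin index to degrees
    return float(np.radians(angle_deg))

# Real pitch/yaw logits come from the ONNX session; random values only illustrate the shape.
pitch = decode_gaze_logits(np.random.randn(90).astype(np.float32))
yaw = decode_gaze_logits(np.random.randn(90).astype(np.float32))
```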
#### Usage
```python
from uniface import MobileGaze
from uniface.constants import GazeWeights
import numpy as np
# Default (recommended)
gaze_estimator = MobileGaze() # Uses RESNET34
# Lightweight model
gaze_estimator = MobileGaze(model_name=GazeWeights.MOBILEONE_S0)
# Estimate gaze from face crop
pitch, yaw = gaze_estimator.estimate(face_crop)
print(f"Pitch: {np.degrees(pitch):.1f}°, Yaw: {np.degrees(yaw):.1f}°")
```
**Note**: Requires a face crop as input. Run face detection first to obtain bounding boxes, as in the sketch below.
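For reference, a minimal detection-and-crop step using the bundled `RetinaFace` detector (the same pattern as the README and quick-start examples; `photo.jpg` is a placeholder path):

```python
import cv2
from uniface import RetinaFace, MobileGaze

detector = RetinaFace()
gaze_estimator = MobileGaze()

image = cv2.imread("photo.jpg")
for face in detector.detect(image):
    x1, y1, x2, y2 = map(int, face['bbox'][:4])
    face_crop = image[y1:y2, x1:x2]
    if face_crop.size > 0:
        pitch, yaw = gaze_estimator.estimate(face_crop)
```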
---
## Face Parsing Models
### BiSeNet Family
BiSeNet (Bilateral Segmentation Network) models for semantic face parsing, segmenting face images into 19 facial component classes.
| Model Name | Params | Size | Classes | Use Case |
| -------------- | ------ | ------- | ------- | ----------------------------- |
| `RESNET18` ⭐ | 13.3M | 50.7 MB | 19 | **Recommended default** |
| `RESNET34` | 24.1M | 89.2 MB | 19 | Higher accuracy |
**19 Facial Component Classes** (mask values 0-18, in the order listed):
1. Background
2. Skin
3. Left Eyebrow
4. Right Eyebrow
5. Left Eye
6. Right Eye
7. Eye Glasses
8. Left Ear
9. Right Ear
10. Ear Ring
11. Nose
12. Mouth
13. Upper Lip
14. Lower Lip
15. Neck
16. Neck Lace
17. Cloth
18. Hair
19. Hat
**Dataset**: Trained on CelebAMask-HQ
**Architecture**: BiSeNet with ResNet backbone
**Input Size**: 512×512 (automatically resized)
#### Usage
```python
from uniface.parsing import BiSeNet
from uniface.constants import ParsingWeights
from uniface.visualization import vis_parsing_maps
import cv2
import numpy as np
# Default (recommended)
parser = BiSeNet() # Uses RESNET18
# Higher accuracy model
parser = BiSeNet(model_name=ParsingWeights.RESNET34)
# Parse face image (already cropped)
mask = parser.parse(face_image)
# Visualize with overlay
face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)
# mask shape: (H, W) with values 0-18 representing classes
print(f"Detected {len(np.unique(mask))} facial components")
```
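Since the mask stores class indices 0-18 in the order listed above (matching `FACE_PARSING_LABELS` in `uniface.visualization`), individual components can be isolated with a simple comparison. A small sketch, assuming the `mask` from the usage example; the index constants are derived from the class list:

```python
import numpy as np

SKIN, HAIR = 1, 17  # indices follow the 19-class list above (background = 0)

skin_mask = (mask == SKIN)   # boolean mask of skin pixels
hair_mask = (mask == HAIR)
print(f"Skin covers {skin_mask.mean():.1%} of the crop")
```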
**Applications:**
- Face makeup and beauty applications
- Virtual try-on systems
- Face editing and manipulation
- Facial feature extraction
- Portrait segmentation
**Note**: Input should be a cropped face image. For the full pipeline, run face detection first to obtain face crops; a condensed version of the bundled script is sketched below.
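A shortened version of the detect-crop-parse loop used in `scripts/run_face_parsing.py` from this release (the overlay is pasted back onto the original image):

```python
import cv2
from uniface import RetinaFace
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps

detector = RetinaFace()
parser = BiSeNet()

image = cv2.imread("photo.jpg")
for face in detector.detect(image):
    x1, y1, x2, y2 = map(int, face['bbox'][:4])
    face_crop = image[y1:y2, x1:x2]
    if face_crop.size == 0:
        continue
    mask = parser.parse(face_crop)
    face_rgb = cv2.cvtColor(face_crop, cv2.COLOR_BGR2RGB)
    image[y1:y2, x1:x2] = vis_parsing_maps(face_rgb, mask, save_image=False)

cv2.imwrite("parsed_faces.jpg", image)
```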
---
## Model Updates
Models are automatically downloaded and cached on first use. Cache location: `~/.uniface/models/`
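To warm the cache ahead of time (for example in a container build), the exported `verify_model_weights` helper fetches a model if it is not cached and returns the local path; a brief sketch using the new gaze and parsing weights:

```python
from uniface import verify_model_weights
from uniface.constants import GazeWeights, ParsingWeights

# Downloads on first call, then reuses the cached file and returns its path
gaze_path = verify_model_weights(GazeWeights.RESNET34)
parsing_path = verify_model_weights(ParsingWeights.RESNET18)
print(gaze_path, parsing_path)
```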
@@ -330,6 +443,8 @@ python scripts/download_model.py --model MNET_V2
- **YOLOv5-Face Original**: [deepcam-cn/yolov5-face](https://github.com/deepcam-cn/yolov5-face) - Original PyTorch implementation
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) - ONNX inference implementation
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet training code and pretrained weights
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights
### Papers
@@ -339,3 +454,4 @@ python scripts/download_model.py --model MNET_V2
- **YOLOv5-Face**: [YOLO5Face: Why Reinventing a Face Detector](https://arxiv.org/abs/2105.12931)
- **ArcFace**: [Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)
- **SphereFace**: [Deep Hypersphere Embedding for Face Recognition](https://arxiv.org/abs/1704.08063)
- **BiSeNet**: [Bilateral Segmentation Network for Real-time Semantic Segmentation](https://arxiv.org/abs/1808.00897)


@@ -242,7 +242,93 @@ if faces:
---
## 7. Batch Processing (3 minutes)
## 7. Gaze Estimation (2 minutes)
Estimate where a person is looking:
```python
import cv2
import numpy as np
from uniface import RetinaFace, MobileGaze
from uniface.visualization import draw_gaze
# Initialize models
detector = RetinaFace()
gaze_estimator = MobileGaze()
# Load image
image = cv2.imread("photo.jpg")
faces = detector.detect(image)
# Estimate gaze for each face
for i, face in enumerate(faces):
    bbox = face['bbox']
    x1, y1, x2, y2 = map(int, bbox[:4])
    face_crop = image[y1:y2, x1:x2]

    if face_crop.size > 0:
        pitch, yaw = gaze_estimator.estimate(face_crop)
        print(f"Face {i+1}: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°")

        # Draw gaze direction
        draw_gaze(image, bbox, pitch, yaw)

cv2.imwrite("gaze_output.jpg", image)
```
**Output:**
```
Face 1: pitch=5.2°, yaw=-12.3°
Face 2: pitch=-8.1°, yaw=15.7°
```
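Pitch and yaw follow the convention documented in the gaze estimator (positive pitch = looking up, positive yaw = looking right). A small helper for turning the angles into a coarse textual label; the 10° threshold is an arbitrary choice for illustration:

```python
import numpy as np

def coarse_direction(pitch: float, yaw: float, threshold_deg: float = 10.0) -> str:
    """Map gaze angles in radians to a rough direction label."""
    pitch_deg, yaw_deg = np.degrees(pitch), np.degrees(yaw)
    vertical = "up" if pitch_deg > threshold_deg else "down" if pitch_deg < -threshold_deg else ""
    horizontal = "right" if yaw_deg > threshold_deg else "left" if yaw_deg < -threshold_deg else ""
    return (vertical + " " + horizontal).strip() or "center"

print(coarse_direction(np.radians(5.2), np.radians(-12.3)))  # -> "left"
```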
---
## 8. Face Parsing (2 minutes)
Segment face into semantic components (skin, eyes, nose, mouth, hair, etc.):
```python
import cv2
import numpy as np
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps
# Initialize parser
parser = BiSeNet() # Uses ResNet18 by default
# Load face image (already cropped)
face_image = cv2.imread("face.jpg")
# Parse face into 19 components
mask = parser.parse(face_image)
# Visualize with overlay
face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)
# vis_parsing_maps returns a BGR image, ready to save with OpenCV
cv2.imwrite("parsed_face.jpg", vis_result)
print(f"Detected {len(np.unique(mask))} facial components")
```
**Output:**
```
Detected 12 facial components
```
**19 Facial Component Classes** (mask values 0-18; see the sketch below for mapping indices to names):
- Background, Skin, Eyebrows (L/R), Eyes (L/R), Eye Glasses
- Ears (L/R), Ear Ring, Nose, Mouth, Lips (Upper/Lower)
- Neck, Neck Lace, Cloth, Hair, Hat
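The mask stores class indices 0-18 in the order above, and `FACE_PARSING_LABELS` in `uniface.visualization` carries the same ordering, so per-component statistics are straightforward. A small sketch, assuming the `mask` from the example above:

```python
import numpy as np
from uniface.visualization import FACE_PARSING_LABELS

# Report how much of the crop each detected component covers
for class_id in np.unique(mask):
    name = FACE_PARSING_LABELS[class_id]
    coverage = (mask == class_id).mean()
    print(f"{name:10s}: {coverage:.1%}")
```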
---
## 9. Batch Processing (3 minutes)
Process multiple images:
@@ -275,7 +361,7 @@ print("Done!")
---
## 8. Model Selection
## 10. Model Selection
Choose the right model for your use case:
@@ -326,6 +412,35 @@ recognizer = MobileFace(model_name=MobileFaceWeights.MNET_V2) # Fast, small siz
recognizer = SphereFace(model_name=SphereFaceWeights.SPHERE20) # Alternative method
```
### Gaze Estimation Models
```python
from uniface import MobileGaze
from uniface.constants import GazeWeights
# Default (recommended)
gaze_estimator = MobileGaze() # Uses RESNET34
# Lightweight (mobile/edge devices)
gaze_estimator = MobileGaze(model_name=GazeWeights.MOBILEONE_S0)
# High accuracy
gaze_estimator = MobileGaze(model_name=GazeWeights.RESNET50)
```
### Face Parsing Models
```python
from uniface.parsing import BiSeNet
from uniface.constants import ParsingWeights
# Default (recommended, 50.7 MB)
parser = BiSeNet() # Uses RESNET18
# Higher accuracy (89.2 MB)
parser = BiSeNet(model_name=ParsingWeights.RESNET34)
```
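Both families are also exposed through factory helpers (`create_gaze_estimator`, `create_face_parser`) exported from the package root; a brief sketch of the equivalent calls:

```python
from uniface import create_face_parser, create_gaze_estimator
from uniface.constants import GazeWeights, ParsingWeights

gaze_estimator = create_gaze_estimator('mobilegaze', model_name=GazeWeights.MOBILEONE_S0)
parser = create_face_parser(ParsingWeights.RESNET34)
```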
---
## Common Issues
@@ -387,6 +502,8 @@ Explore interactive examples for common tasks:
| **Face Recognition** | Extract face embeddings and compare faces | [face_analyzer.ipynb](examples/face_analyzer.ipynb) |
| **Face Verification** | Compare two faces to verify identity | [face_verification.ipynb](examples/face_verification.ipynb) |
| **Face Search** | Find a person in a group photo | [face_search.ipynb](examples/face_search.ipynb) |
| **Face Parsing** | Segment face into semantic components | [face_parsing.ipynb](examples/face_parsing.ipynb) |
| **Gaze Estimation** | Estimate gaze direction | [gaze_estimation.ipynb](examples/gaze_estimation.ipynb) |
### Additional Resources
@@ -400,4 +517,6 @@ Explore interactive examples for common tasks:
- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch)
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference)
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition)
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation)
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing)
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface)


@@ -11,7 +11,7 @@
<img src=".github/logos/logo_web.webp" width=75%>
</div>
**UniFace** is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides high-performance face detection, recognition, landmark detection, and attribute analysis with hardware acceleration support across platforms.
**UniFace** is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides high-performance face detection, recognition, landmark detection, face parsing, gaze estimation, and attribute analysis with hardware acceleration support across platforms.
---
@@ -20,6 +20,8 @@
- **High-Speed Face Detection**: ONNX-optimized RetinaFace, SCRFD, and YOLOv5-Face models
- **Facial Landmark Detection**: Accurate 106-point landmark localization
- **Face Recognition**: ArcFace, MobileFace, and SphereFace embeddings
- **Face Parsing**: BiSeNet-based semantic segmentation with 19 facial component classes
- **Gaze Estimation**: Real-time gaze direction prediction with MobileGaze
- **Attribute Analysis**: Age, gender, and emotion detection
- **Face Alignment**: Precise alignment for downstream tasks
- **Hardware Acceleration**: ARM64 optimizations (Apple Silicon), CUDA (NVIDIA), CPU fallback
@@ -152,6 +154,50 @@ gender_str = 'Female' if gender == 0 else 'Male'
print(f"{gender_str}, {age} years old")
```
### Gaze Estimation
```python
from uniface import RetinaFace, MobileGaze
from uniface.visualization import draw_gaze
import numpy as np
detector = RetinaFace()
gaze_estimator = MobileGaze()
faces = detector.detect(image)
for face in faces:
    bbox = face['bbox']
    x1, y1, x2, y2 = map(int, bbox[:4])
    face_crop = image[y1:y2, x1:x2]

    pitch, yaw = gaze_estimator.estimate(face_crop)
    print(f"Gaze: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°")

    # Visualize
    draw_gaze(image, bbox, pitch, yaw)
```
### Face Parsing
```python
import cv2
import numpy as np

from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps
# Initialize parser
parser = BiSeNet() # Uses ResNet18 by default
# Parse face image (already cropped)
mask = parser.parse(face_image)
# Visualize with overlay
face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)
# mask contains 19 classes: skin, eyes, nose, mouth, hair, etc.
print(f"Unique classes: {len(np.unique(mask))}")
```
---
## Documentation
@@ -252,6 +298,18 @@ faces = detect_faces(image, method='retinaface', conf_thresh=0.8) # methods: re
| `AgeGender` | `model_name=AgeGenderWeights.DEFAULT`; `input_size` auto-detected | Requires bbox; ONNXRuntime |
| `Emotion` | `model_weights=DDAMFNWeights.AFFECNET7`, `input_size=(112, 112)` | Requires 5-point landmarks; TorchScript |
**Gaze Estimation**
| Class | Key params (defaults) | Notes |
| ------------- | ------------------------------------------ | ------------------------------------ |
| `MobileGaze` | `model_name=GazeWeights.RESNET34` | Returns (pitch, yaw) angles in radians; trained on Gaze360 |
**Face Parsing**
| Class | Key params (defaults) | Notes |
| ---------- | ---------------------------------------- | ------------------------------------ |
| `BiSeNet` | `model_name=ParsingWeights.RESNET18`, `input_size=(512, 512)` | 19 facial component classes; BiSeNet architecture with ResNet backbone |
---
## Model Performance
@@ -298,6 +356,8 @@ Interactive examples covering common face analysis tasks:
| **Face Recognition** | Extract face embeddings and compare faces | [face_analyzer.ipynb](examples/face_analyzer.ipynb) |
| **Face Verification** | Compare two faces to verify identity | [face_verification.ipynb](examples/face_verification.ipynb) |
| **Face Search** | Find a person in a group photo | [face_search.ipynb](examples/face_search.ipynb) |
| **Face Parsing** | Segment face into semantic components | [face_parsing.ipynb](examples/face_parsing.ipynb) |
| **Gaze Estimation** | Estimate gaze direction from face images | [gaze_estimation.ipynb](examples/gaze_estimation.ipynb) |
### Webcam Face Detection
@@ -488,6 +548,8 @@ uniface/
│ ├── detection/ # Face detection models
│ ├── recognition/ # Face recognition models
│ ├── landmark/ # Landmark detection
│ ├── parsing/ # Face parsing
│ ├── gaze/ # Gaze estimation
│ ├── attribute/ # Age, gender, emotion
│ ├── onnx_utils.py # ONNX Runtime utilities
│ ├── model_store.py # Model download & caching
@@ -504,6 +566,8 @@ uniface/
- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) - PyTorch implementation and training code
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) - ONNX inference implementation
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet face parsing training code and pretrained weights
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights
## Contributing

examples/face_parsing.ipynb Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -1,7 +1,7 @@
[project]
name = "uniface"
version = "1.3.2"
description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Age, and Gender Detection"
version = "1.5.3"
description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Face Parsing, Gaze Estimation, Age, and Gender Detection"
readme = "README.md"
license = { text = "MIT" }
authors = [{ name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" }]
@@ -14,6 +14,9 @@ keywords = [
"face-detection",
"face-recognition",
"facial-landmarks",
"face-parsing",
"face-segmentation",
"gaze-estimation",
"age-detection",
"gender-detection",
"computer-vision",
@@ -21,6 +24,7 @@ keywords = [
"onnx",
"onnxruntime",
"face-analysis",
"bisenet",
]
classifiers = [


@@ -9,6 +9,7 @@ Scripts for testing UniFace features.
| `run_detection.py` | Face detection on image or webcam |
| `run_age_gender.py` | Age and gender prediction |
| `run_emotion.py` | Emotion detection (7 or 8 emotions) |
| `run_gaze_estimation.py` | Gaze direction estimation |
| `run_landmarks.py` | 106-point facial landmark detection |
| `run_recognition.py` | Face embedding extraction and comparison |
| `run_face_analyzer.py` | Complete face analysis (detection + recognition + attributes) |
@@ -33,6 +34,10 @@ python scripts/run_age_gender.py --webcam
python scripts/run_emotion.py --image assets/test.jpg
python scripts/run_emotion.py --webcam
# Gaze estimation
python scripts/run_gaze_estimation.py --image assets/test.jpg
python scripts/run_gaze_estimation.py --webcam
# Landmarks
python scripts/run_landmarks.py --image assets/test.jpg
python scripts/run_landmarks.py --webcam


@@ -79,7 +79,9 @@ def run_webcam(detector, age_gender, threshold: float = 0.6):
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
draw_detections(frame, bboxes, scores, landmarks, vis_threshold=threshold)
draw_detections(
image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
)
for face in faces:
gender_id, age = age_gender.predict(frame, face['bbox']) # predict per face


@@ -98,7 +98,7 @@ def main():
else:
from uniface.constants import YOLOv5FaceWeights
detector = YOLOv5Face(model_name=YOLOv5FaceWeights.YOLOV5N)
detector = YOLOv5Face(model_name=YOLOv5FaceWeights.YOLOV5M)
if args.webcam:
run_webcam(detector, args.threshold)

scripts/run_face_parsing.py Normal file

@@ -0,0 +1,126 @@
# Face parsing on detected faces
# Usage: python run_face_parsing.py --image path/to/image.jpg
# python run_face_parsing.py --webcam
import argparse
import os
from pathlib import Path
import cv2
from uniface import RetinaFace
from uniface.constants import ParsingWeights
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps
def process_image(detector, parser, image_path: str, save_dir: str = 'outputs'):
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
return
faces = detector.detect(image)
print(f'Detected {len(faces)} face(s)')
result_image = image.copy()
for i, face in enumerate(faces):
bbox = face['bbox']
x1, y1, x2, y2 = map(int, bbox[:4])
face_crop = image[y1:y2, x1:x2]
if face_crop.size == 0:
continue
# Parse the face
mask = parser.parse(face_crop)
print(f' Face {i + 1}: parsed with {len(set(mask.flatten()))} unique classes')
# Visualize the parsing result
face_crop_rgb = cv2.cvtColor(face_crop, cv2.COLOR_BGR2RGB)
vis_result = vis_parsing_maps(face_crop_rgb, mask, save_image=False)
# Place the visualization back on the original image
result_image[y1:y2, x1:x2] = vis_result
# Draw bounding box
cv2.rectangle(result_image, (x1, y1), (x2, y2), (0, 255, 0), 2)
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, f'{Path(image_path).stem}_parsing.jpg')
cv2.imwrite(output_path, result_image)
print(f'Output saved: {output_path}')
def run_webcam(detector, parser):
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print('Cannot open webcam')
return
print("Press 'q' to quit")
while True:
ret, frame = cap.read()
if not ret:
break
frame = cv2.flip(frame, 1)
faces = detector.detect(frame)
for face in faces:
bbox = face['bbox']
x1, y1, x2, y2 = map(int, bbox[:4])
face_crop = frame[y1:y2, x1:x2]
if face_crop.size == 0:
continue
# Parse the face
mask = parser.parse(face_crop)
# Visualize the parsing result
face_crop_rgb = cv2.cvtColor(face_crop, cv2.COLOR_BGR2RGB)
vis_result = vis_parsing_maps(face_crop_rgb, mask, save_image=False)
# Place the visualization back on the frame
frame[y1:y2, x1:x2] = vis_result
# Draw bounding box
cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
cv2.imshow('Face Parsing', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
def main():
parser_arg = argparse.ArgumentParser(description='Run face parsing')
parser_arg.add_argument('--image', type=str, help='Path to input image')
parser_arg.add_argument('--webcam', action='store_true', help='Use webcam')
parser_arg.add_argument('--save_dir', type=str, default='outputs')
parser_arg.add_argument(
'--model', type=str, default=ParsingWeights.RESNET18, choices=[ParsingWeights.RESNET18, ParsingWeights.RESNET34]
)
args = parser_arg.parse_args()
if not args.image and not args.webcam:
parser_arg.error('Either --image or --webcam must be specified')
detector = RetinaFace()
parser = BiSeNet(model_name=ParsingWeights(args.model))  # respect the --model choice
if args.webcam:
run_webcam(detector, parser)
else:
process_image(detector, parser, args.image, args.save_dir)
if __name__ == '__main__':
main()


@@ -0,0 +1,104 @@
# Gaze estimation on detected faces
# Usage: python run_gaze_estimation.py --image path/to/image.jpg
# python run_gaze_estimation.py --webcam
import argparse
import os
from pathlib import Path
import cv2
import numpy as np
from uniface import RetinaFace
from uniface.gaze import MobileGaze
from uniface.visualization import draw_gaze
def process_image(detector, gaze_estimator, image_path: str, save_dir: str = 'outputs'):
image = cv2.imread(image_path)
if image is None:
print(f"Error: Failed to load image from '{image_path}'")
return
faces = detector.detect(image)
print(f'Detected {len(faces)} face(s)')
for i, face in enumerate(faces):
bbox = face['bbox']
x1, y1, x2, y2 = map(int, bbox[:4])
face_crop = image[y1:y2, x1:x2]
if face_crop.size == 0:
continue
pitch, yaw = gaze_estimator.estimate(face_crop)
print(f' Face {i + 1}: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°')
# Draw both bbox and gaze arrow with angle text
draw_gaze(image, bbox, pitch, yaw, draw_angles=True)
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, f'{Path(image_path).stem}_gaze.jpg')
cv2.imwrite(output_path, image)
print(f'Output saved: {output_path}')
def run_webcam(detector, gaze_estimator):
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print('Cannot open webcam')
return
print("Press 'q' to quit")
while True:
ret, frame = cap.read()
if not ret:
break
frame = cv2.flip(frame, 1)
faces = detector.detect(frame)
for face in faces:
bbox = face['bbox']
x1, y1, x2, y2 = map(int, bbox[:4])
face_crop = frame[y1:y2, x1:x2]
if face_crop.size == 0:
continue
pitch, yaw = gaze_estimator.estimate(face_crop)
# Draw both bbox and gaze arrow
draw_gaze(frame, bbox, pitch, yaw)
cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
cv2.imshow('Gaze Estimation', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
def main():
parser = argparse.ArgumentParser(description='Run gaze estimation')
parser.add_argument('--image', type=str, help='Path to input image')
parser.add_argument('--webcam', action='store_true', help='Use webcam')
parser.add_argument('--save_dir', type=str, default='outputs')
args = parser.parse_args()
if not args.image and not args.webcam:
parser.error('Either --image or --webcam must be specified')
detector = RetinaFace()
gaze_estimator = MobileGaze()
if args.webcam:
run_webcam(detector, gaze_estimator)
else:
process_image(detector, gaze_estimator, args.image, args.save_dir)
if __name__ == '__main__':
main()

tests/test_parsing.py Normal file

@@ -0,0 +1,118 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
import numpy as np
import pytest
from uniface.constants import ParsingWeights
from uniface.parsing import BiSeNet, create_face_parser
def test_bisenet_initialization():
"""Test BiSeNet initialization."""
parser = BiSeNet()
assert parser is not None
assert parser.input_size == (512, 512)
def test_bisenet_with_different_models():
"""Test BiSeNet with different model weights."""
parser_resnet18 = BiSeNet(model_name=ParsingWeights.RESNET18)
parser_resnet34 = BiSeNet(model_name=ParsingWeights.RESNET34)
assert parser_resnet18 is not None
assert parser_resnet34 is not None
def test_bisenet_preprocess():
"""Test preprocessing."""
parser = BiSeNet()
# Create a dummy face image
face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
# Preprocess
preprocessed = parser.preprocess(face_image)
assert preprocessed.shape == (1, 3, 512, 512)
assert preprocessed.dtype == np.float32
def test_bisenet_postprocess():
"""Test postprocessing."""
parser = BiSeNet()
# Create dummy model output (batch_size=1, num_classes=19, H=512, W=512)
dummy_output = np.random.randn(1, 19, 512, 512).astype(np.float32)
# Postprocess
mask = parser.postprocess(dummy_output, original_size=(256, 256))
assert mask.shape == (256, 256)
assert mask.dtype == np.uint8
assert mask.min() >= 0
assert mask.max() < 19 # 19 classes (0-18)
def test_bisenet_parse():
"""Test end-to-end parsing."""
parser = BiSeNet()
# Create a dummy face image
face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
# Parse
mask = parser.parse(face_image)
assert mask.shape == (256, 256)
assert mask.dtype == np.uint8
assert mask.min() >= 0
assert mask.max() < 19
def test_bisenet_callable():
"""Test that BiSeNet is callable."""
parser = BiSeNet()
face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
# Should work as callable
mask = parser(face_image)
assert mask.shape == (256, 256)
assert mask.dtype == np.uint8
def test_create_face_parser_with_enum():
"""Test factory function with enum."""
parser = create_face_parser(ParsingWeights.RESNET18)
assert parser is not None
assert isinstance(parser, BiSeNet)
def test_create_face_parser_with_string():
"""Test factory function with string."""
parser = create_face_parser('parsing_resnet18')
assert parser is not None
assert isinstance(parser, BiSeNet)
def test_create_face_parser_invalid_model():
"""Test factory function with invalid model name."""
with pytest.raises(ValueError, match='Unknown face parsing model'):
create_face_parser('invalid_model')
def test_bisenet_different_input_sizes():
"""Test parsing with different input image sizes."""
parser = BiSeNet()
# Test with different sizes
sizes = [(128, 128), (256, 256), (512, 512), (640, 480)]
for h, w in sizes:
face_image = np.random.randint(0, 255, (h, w, 3), dtype=np.uint8)
mask = parser.parse(face_image)
assert mask.shape == (h, w), f'Failed for size {h}x{w}'
assert mask.dtype == np.uint8


@@ -13,13 +13,13 @@
__license__ = 'MIT'
__author__ = 'Yakhyokhuja Valikhujaev'
__version__ = '1.3.2'
__version__ = '1.5.3'
from uniface.face_utils import compute_similarity, face_alignment
from uniface.log import Logger, enable_logging
from uniface.model_store import verify_model_weights
from uniface.visualization import draw_detections
from uniface.visualization import draw_detections, vis_parsing_maps
from .analyzer import FaceAnalyzer
from .attribute import AgeGender
@@ -37,7 +37,9 @@ from .detection import (
detect_faces,
list_available_detectors,
)
from .gaze import MobileGaze, create_gaze_estimator
from .landmark import Landmark106, create_landmarker
from .parsing import BiSeNet, create_face_parser
from .recognition import ArcFace, MobileFace, SphereFace, create_recognizer
__all__ = [
@@ -49,6 +51,8 @@ __all__ = [
'FaceAnalyzer',
# Factory functions
'create_detector',
'create_face_parser',
'create_gaze_estimator',
'create_landmarker',
'create_recognizer',
'detect_faces',
@@ -63,12 +67,17 @@ __all__ = [
'SphereFace',
# Landmark models
'Landmark106',
# Gaze models
'MobileGaze',
# Parsing models
'BiSeNet',
# Attribute models
'AgeGender',
'Emotion',
# Utilities
'compute_similarity',
'draw_detections',
'vis_parsing_maps',
'face_alignment',
'verify_model_weights',
'Logger',

View File

@@ -96,6 +96,29 @@ class LandmarkWeights(str, Enum):
DEFAULT = "2d_106"
class GazeWeights(str, Enum):
"""
MobileGaze: Real-Time Gaze Estimation models.
Trained on Gaze360 dataset.
https://github.com/yakhyo/gaze-estimation
"""
RESNET18 = "gaze_resnet18"
RESNET34 = "gaze_resnet34"
RESNET50 = "gaze_resnet50"
MOBILENET_V2 = "gaze_mobilenetv2"
MOBILEONE_S0 = "gaze_mobileone_s0"
class ParsingWeights(str, Enum):
"""
Face Parsing: Semantic Segmentation of Facial Components.
Trained on CelebAMask-HQ dataset.
https://github.com/yakhyo/face-parsing
"""
RESNET18 = "parsing_resnet18"
RESNET34 = "parsing_resnet34"
MODEL_URLS: Dict[Enum, str] = {
# RetinaFace
RetinaFaceWeights.MNET_025: 'https://github.com/yakhyo/uniface/releases/download/weights/retinaface_mv1_0.25.onnx',
@@ -129,6 +152,15 @@ MODEL_URLS: Dict[Enum, str] = {
AgeGenderWeights.DEFAULT: 'https://github.com/yakhyo/uniface/releases/download/weights/genderage.onnx',
# Landmarks
LandmarkWeights.DEFAULT: 'https://github.com/yakhyo/uniface/releases/download/weights/2d106det.onnx',
# Gaze (MobileGaze)
GazeWeights.RESNET18: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/resnet18_gaze.onnx',
GazeWeights.RESNET34: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/resnet34_gaze.onnx',
GazeWeights.RESNET50: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/resnet50_gaze.onnx',
GazeWeights.MOBILENET_V2: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/mobilenetv2_gaze.onnx',
GazeWeights.MOBILEONE_S0: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/mobileone_s0_gaze.onnx',
# Parsing
ParsingWeights.RESNET18: 'https://github.com/yakhyo/face-parsing/releases/download/weights/resnet18.onnx',
ParsingWeights.RESNET34: 'https://github.com/yakhyo/face-parsing/releases/download/weights/resnet34.onnx',
}
MODEL_SHA256: Dict[Enum, str] = {
@@ -164,6 +196,15 @@ MODEL_SHA256: Dict[Enum, str] = {
AgeGenderWeights.DEFAULT: '4fde69b1c810857b88c64a335084f1c3fe8f01246c9a191b48c7bb756d6652fb',
# Landmark
LandmarkWeights.DEFAULT: 'f001b856447c413801ef5c42091ed0cd516fcd21f2d6b79635b1e733a7109dbf',
# MobileGaze (trained on Gaze360)
GazeWeights.RESNET18: '23d5d7e4f6f40dce8c35274ce9d08b45b9e22cbaaf5af73182f473229d713d31',
GazeWeights.RESNET34: '4457ee5f7acd1a5ab02da4b61f02fc3a0b17adbf3844dd0ba3cd4288f2b5e1de',
GazeWeights.RESNET50: 'e1eaf98f5ec7c89c6abe7cfe39f7be83e747163f98d1ff945c0603b3c521be22',
GazeWeights.MOBILENET_V2: 'fdcdb84e3e6421b5a79e8f95139f249fc258d7f387eed5ddac2b80a9a15ce076',
GazeWeights.MOBILEONE_S0: 'c0b5a4f4a0ffd24f76ab3c1452354bb2f60110899fd9a88b464c75bafec0fde8',
# Face Parsing
ParsingWeights.RESNET18: '0d9bd318e46987c3bdbfacae9e2c0f461cae1c6ac6ea6d43bbe541a91727e33f',
ParsingWeights.RESNET34: '5b805bba7b5660ab7070b5a381dcf75e5b3e04199f1e9387232a77a00095102e',
}
CHUNK_SIZE = 8192

uniface/gaze/__init__.py Normal file

@@ -0,0 +1,58 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from .base import BaseGazeEstimator
from .models import MobileGaze
def create_gaze_estimator(method: str = 'mobilegaze', **kwargs) -> BaseGazeEstimator:
"""
Factory function to create gaze estimators.
This function initializes and returns a gaze estimator instance based on the
specified method. It acts as a high-level interface to the underlying
model classes.
Args:
method (str): The gaze estimation method to use.
Options: 'mobilegaze' (default).
**kwargs: Model-specific parameters passed to the estimator's constructor.
For example, `model_name` can be used to select a specific
backbone from `GazeWeights` enum (RESNET18, RESNET34, RESNET50,
MOBILENET_V2, MOBILEONE_S0).
Returns:
BaseGazeEstimator: An initialized gaze estimator instance ready for use.
Raises:
ValueError: If the specified `method` is not supported.
Examples:
>>> # Create the default MobileGaze estimator (ResNet34 backbone)
>>> estimator = create_gaze_estimator()
>>> # Create with MobileNetV2 backbone
>>> from uniface.constants import GazeWeights
>>> estimator = create_gaze_estimator(
... 'mobilegaze',
... model_name=GazeWeights.MOBILENET_V2
... )
>>> # Use the estimator
>>> pitch, yaw = estimator.estimate(face_crop)
"""
method = method.lower()
if method in ('mobilegaze', 'mobile_gaze', 'gaze'):
return MobileGaze(**kwargs)
else:
available = ['mobilegaze']
raise ValueError(f"Unsupported gaze estimation method: '{method}'. Available: {available}")
__all__ = [
'create_gaze_estimator',
'MobileGaze',
'BaseGazeEstimator',
]

uniface/gaze/base.py Normal file

@@ -0,0 +1,108 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from abc import ABC, abstractmethod
from typing import Tuple
import numpy as np
class BaseGazeEstimator(ABC):
"""
Abstract base class for all gaze estimation models.
This class defines the common interface that all gaze estimators must implement,
ensuring consistency across different gaze estimation methods. Gaze estimation
predicts the direction a person is looking based on their face image.
The gaze direction is represented as pitch and yaw angles in radians:
- Pitch: Vertical angle (positive = looking up, negative = looking down)
- Yaw: Horizontal angle (positive = looking right, negative = looking left)
"""
@abstractmethod
def _initialize_model(self) -> None:
"""
Initialize the underlying model for inference.
This method should handle loading model weights, creating the
inference session (e.g., ONNX Runtime), and any necessary
setup procedures to prepare the model for prediction.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
raise NotImplementedError('Subclasses must implement the _initialize_model method.')
@abstractmethod
def preprocess(self, face_image: np.ndarray) -> np.ndarray:
"""
Preprocess the input face image for model inference.
This method should take a raw face crop and convert it into the format
expected by the model's inference engine (e.g., normalized tensor).
Args:
face_image (np.ndarray): A cropped face image in BGR format with
shape (H, W, C).
Returns:
np.ndarray: The preprocessed image tensor ready for inference,
typically with shape (1, C, H, W).
"""
raise NotImplementedError('Subclasses must implement the preprocess method.')
@abstractmethod
def postprocess(self, outputs: Tuple[np.ndarray, np.ndarray]) -> Tuple[float, float]:
"""
Postprocess raw model outputs into gaze angles.
This method takes the raw output from the model's inference and
converts it into pitch and yaw angles in radians.
Args:
outputs: Raw outputs from the model inference. The format depends
on the specific model architecture.
Returns:
Tuple[float, float]: A tuple of (pitch, yaw) angles in radians.
"""
raise NotImplementedError('Subclasses must implement the postprocess method.')
@abstractmethod
def estimate(self, face_image: np.ndarray) -> Tuple[float, float]:
"""
Perform end-to-end gaze estimation on a face image.
This method orchestrates the full pipeline: preprocessing the input,
running inference, and postprocessing to return the gaze direction.
Args:
face_image (np.ndarray): A cropped face image in BGR format.
The face should be roughly centered and
well-framed within the image.
Returns:
Tuple[float, float]: A tuple of (pitch, yaw) angles in radians:
- pitch: Vertical gaze angle (positive = up, negative = down)
- yaw: Horizontal gaze angle (positive = right, negative = left)
Example:
>>> estimator = create_gaze_estimator()
>>> pitch, yaw = estimator.estimate(face_crop)
>>> print(f"Looking: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°")
"""
raise NotImplementedError('Subclasses must implement the estimate method.')
def __call__(self, face_image: np.ndarray) -> Tuple[float, float]:
"""
Provides a convenient, callable shortcut for the `estimate` method.
Args:
face_image (np.ndarray): A cropped face image in BGR format.
Returns:
Tuple[float, float]: A tuple of (pitch, yaw) angles in radians.
"""
return self.estimate(face_image)

uniface/gaze/models.py Normal file

@@ -0,0 +1,187 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Tuple
import cv2
import numpy as np
from uniface.constants import GazeWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from .base import BaseGazeEstimator
__all__ = ['MobileGaze']
class MobileGaze(BaseGazeEstimator):
"""
MobileGaze: Real-Time Gaze Estimation with ONNX Runtime.
MobileGaze is a gaze estimation model that predicts gaze direction from a single
face image. It supports multiple backbone architectures including ResNet 18/34/50,
MobileNetV2, and MobileOne S0. The model uses a classification approach with binned
angles, which are then decoded to continuous pitch and yaw values.
The model outputs gaze direction as pitch (vertical) and yaw (horizontal) angles
in radians.
Reference:
https://github.com/yakhyo/gaze-estimation
Args:
model_name (GazeWeights): The enum specifying the gaze model backbone to load.
Options: RESNET18, RESNET34, RESNET50, MOBILENET_V2, MOBILEONE_S0.
Defaults to `GazeWeights.RESNET34`.
input_size (Tuple[int, int]): The resolution (width, height) for the model's
input. Defaults to (448, 448).
Attributes:
input_size (Tuple[int, int]): Model input dimensions.
input_mean (list): Per-channel mean values for normalization (ImageNet).
input_std (list): Per-channel std values for normalization (ImageNet).
Example:
>>> from uniface.gaze import MobileGaze
>>> from uniface import RetinaFace
>>>
>>> detector = RetinaFace()
>>> gaze_estimator = MobileGaze()
>>>
>>> # Detect faces and estimate gaze for each
>>> faces = detector.detect(image)
>>> for face in faces:
... bbox = face['bbox']
... x1, y1, x2, y2 = map(int, bbox[:4])
... face_crop = image[y1:y2, x1:x2]
... pitch, yaw = gaze_estimator.estimate(face_crop)
... print(f"Gaze: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°")
"""
def __init__(
self,
model_name: GazeWeights = GazeWeights.RESNET34,
input_size: Tuple[int, int] = (448, 448),
) -> None:
Logger.info(f'Initializing MobileGaze with model={model_name}, input_size={input_size}')
self.input_size = input_size
self.input_mean = [0.485, 0.456, 0.406]
self.input_std = [0.229, 0.224, 0.225]
# Model specific parameters for bin-based classification (Gaze360 config)
self._bins = 90
self._binwidth = 4
self._angle_offset = 180
self._idx_tensor = np.arange(self._bins, dtype=np.float32)
self.model_path = verify_model_weights(model_name)
self._initialize_model()
def _initialize_model(self) -> None:
"""
Initialize the ONNX model from the stored model path.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
try:
self.session = create_onnx_session(self.model_path)
# Get input configuration
input_cfg = self.session.get_inputs()[0]
input_shape = input_cfg.shape
self.input_name = input_cfg.name
self.input_size = tuple(input_shape[2:4][::-1]) # Update from model
# Get output configuration
outputs = self.session.get_outputs()
self.output_names = [output.name for output in outputs]
if len(self.output_names) != 2:
raise ValueError(f'Expected 2 output nodes (pitch, yaw), got {len(self.output_names)}')
Logger.info(f'MobileGaze initialized with input size {self.input_size}')
except Exception as e:
Logger.error(f"Failed to load gaze model from '{self.model_path}'", exc_info=True)
raise RuntimeError(f'Failed to initialize gaze model: {e}') from e
def preprocess(self, face_image: np.ndarray) -> np.ndarray:
"""
Preprocess a face crop for gaze estimation.
Args:
face_image (np.ndarray): A cropped face image in BGR format.
Returns:
np.ndarray: Preprocessed image tensor with shape (1, 3, H, W).
"""
# Convert BGR to RGB
image = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
# Resize to model input size
image = cv2.resize(image, self.input_size)
# Normalize to [0, 1] and apply normalization
image = image.astype(np.float32) / 255.0
mean = np.array(self.input_mean, dtype=np.float32)
std = np.array(self.input_std, dtype=np.float32)
image = (image - mean) / std
# HWC -> CHW -> NCHW
image = np.transpose(image, (2, 0, 1))
image = np.expand_dims(image, axis=0).astype(np.float32)
return image
def _softmax(self, x: np.ndarray) -> np.ndarray:
"""Apply softmax along axis 1."""
e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
return e_x / e_x.sum(axis=1, keepdims=True)
def postprocess(self, outputs: Tuple[np.ndarray, np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
"""
Postprocess raw model outputs into gaze angles.
This method takes the raw output from the model's inference and
converts it into pitch and yaw angles in radians.
Args:
outputs: Raw outputs from the model inference. The format depends
on the specific model architecture.
Returns:
Tuple[np.ndarray, np.ndarray]: A tuple of (pitch, yaw) angles in radians.
"""
pitch_logits, yaw_logits = outputs
# Convert logits to probabilities
pitch_probs = self._softmax(pitch_logits)
yaw_probs = self._softmax(yaw_logits)
# Compute expected bin index (soft-argmax)
pitch_deg = np.sum(pitch_probs * self._idx_tensor, axis=1) * self._binwidth - self._angle_offset
yaw_deg = np.sum(yaw_probs * self._idx_tensor, axis=1) * self._binwidth - self._angle_offset
# Convert degrees to radians
pitch = np.radians(pitch_deg[0])
yaw = np.radians(yaw_deg[0])
return pitch, yaw
def estimate(self, face_image: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
"""
Perform end-to-end gaze estimation on a face image.
This method orchestrates the full pipeline: preprocessing the input,
running inference, and postprocessing to return the gaze direction.
"""
input_tensor = self.preprocess(face_image)
outputs = self.session.run(self.output_names, {self.input_name: input_tensor})
pitch, yaw = self.postprocess((outputs[0], outputs[1]))
return pitch, yaw


@@ -0,0 +1,61 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Union
from uniface.constants import ParsingWeights
from .base import BaseFaceParser
from .bisenet import BiSeNet
__all__ = ['BaseFaceParser', 'BiSeNet', 'create_face_parser']
def create_face_parser(
model_name: Union[str, ParsingWeights] = ParsingWeights.RESNET18,
) -> BaseFaceParser:
"""
Factory function to create a face parsing model instance.
This function provides a convenient way to instantiate face parsing models
without directly importing the specific model classes. It supports both
string-based and enum-based model selection.
Args:
model_name (Union[str, ParsingWeights]): The face parsing model to create.
Can be either a string or a ParsingWeights enum value.
Available options:
- 'parsing_resnet18' or ParsingWeights.RESNET18 (default)
- 'parsing_resnet34' or ParsingWeights.RESNET34
Returns:
BaseFaceParser: An instance of the requested face parsing model.
Raises:
ValueError: If the model_name is not recognized.
Examples:
>>> # Using enum
>>> from uniface.parsing import create_face_parser
>>> from uniface.constants import ParsingWeights
>>> parser = create_face_parser(ParsingWeights.RESNET18)
>>>
>>> # Using string
>>> parser = create_face_parser('parsing_resnet18')
>>>
>>> # Parse a face image
>>> mask = parser.parse(face_crop)
"""
# Convert string to enum if necessary
if isinstance(model_name, str):
try:
model_name = ParsingWeights(model_name)
except ValueError as e:
valid_models = [e.value for e in ParsingWeights]
raise ValueError(
f"Unknown face parsing model: '{model_name}'. Valid options are: {', '.join(valid_models)}"
) from e
# All parsing models use the same BiSeNet class
return BiSeNet(model_name=model_name)

uniface/parsing/base.py Normal file

@@ -0,0 +1,106 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from abc import ABC, abstractmethod
from typing import Tuple
import numpy as np
class BaseFaceParser(ABC):
"""
Abstract base class for all face parsing models.
This class defines the common interface that all face parsing models must implement,
ensuring consistency across different parsing methods. Face parsing segments a face
image into semantic regions such as skin, eyes, nose, mouth, hair, etc.
The output is a segmentation mask where each pixel is assigned a class label
representing a facial component.
"""
@abstractmethod
def _initialize_model(self) -> None:
"""
Initialize the underlying model for inference.
This method should handle loading model weights, creating the
inference session (e.g., ONNX Runtime), and any necessary
setup procedures to prepare the model for prediction.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
raise NotImplementedError('Subclasses must implement the _initialize_model method.')
@abstractmethod
def preprocess(self, face_image: np.ndarray) -> np.ndarray:
"""
Preprocess the input face image for model inference.
This method should take a raw face crop and convert it into the format
expected by the model's inference engine (e.g., normalized tensor).
Args:
face_image (np.ndarray): A face image in BGR format with
shape (H, W, C).
Returns:
np.ndarray: The preprocessed image tensor ready for inference,
typically with shape (1, C, H, W).
"""
raise NotImplementedError('Subclasses must implement the preprocess method.')
@abstractmethod
def postprocess(self, outputs: np.ndarray, original_size: Tuple[int, int]) -> np.ndarray:
"""
Postprocess raw model outputs into a segmentation mask.
This method takes the raw output from the model's inference and
converts it into a segmentation mask at the original image size.
Args:
outputs (np.ndarray): Raw outputs from the model inference.
original_size (Tuple[int, int]): Original image size (width, height).
Returns:
np.ndarray: Segmentation mask with the same size as the original image.
"""
raise NotImplementedError('Subclasses must implement the postprocess method.')
@abstractmethod
def parse(self, face_image: np.ndarray) -> np.ndarray:
"""
Perform end-to-end face parsing on a face image.
This method orchestrates the full pipeline: preprocessing the input,
running inference, and postprocessing to return the segmentation mask.
Args:
face_image (np.ndarray): A face image in BGR format.
The face should be roughly centered and
well-framed within the image.
Returns:
np.ndarray: Segmentation mask with the same size as input image,
where each pixel value represents a facial component class.
Example:
>>> parser = create_face_parser()
>>> mask = parser.parse(face_crop)
>>> print(f"Mask shape: {mask.shape}, unique classes: {np.unique(mask)}")
"""
raise NotImplementedError('Subclasses must implement the parse method.')
def __call__(self, face_image: np.ndarray) -> np.ndarray:
"""
Provides a convenient, callable shortcut for the `parse` method.
Args:
face_image (np.ndarray): A face image in BGR format.
Returns:
np.ndarray: Segmentation mask with the same size as input image.
"""
return self.parse(face_image)

uniface/parsing/bisenet.py Normal file

@@ -0,0 +1,166 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from typing import Tuple
import cv2
import numpy as np
from uniface.constants import ParsingWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from .base import BaseFaceParser
__all__ = ['BiSeNet']
class BiSeNet(BaseFaceParser):
"""
BiSeNet: Bilateral Segmentation Network for Face Parsing with ONNX Runtime.
BiSeNet is a semantic segmentation model that segments a face image into
different facial components such as skin, eyes, nose, mouth, hair, etc. The model
uses a BiSeNet architecture with ResNet backbone and outputs a segmentation mask
where each pixel is assigned a class label.
The model supports 19 facial component classes including:
- Background, skin, eyebrows, eyes, nose, mouth, lips, ears, hair, etc.
Reference:
https://github.com/yakhyo/face-parsing
Args:
model_name (ParsingWeights): The enum specifying the parsing model to load.
Options: RESNET18, RESNET34.
Defaults to `ParsingWeights.RESNET18`.
input_size (Tuple[int, int]): The resolution (width, height) for the model's
input. Defaults to (512, 512).
Attributes:
input_size (Tuple[int, int]): Model input dimensions.
input_mean (np.ndarray): Per-channel mean values for normalization (ImageNet).
input_std (np.ndarray): Per-channel std values for normalization (ImageNet).
Example:
>>> from uniface.parsing import BiSeNet
>>> from uniface import RetinaFace
>>>
>>> detector = RetinaFace()
>>> parser = BiSeNet()
>>>
>>> # Detect faces and parse each face
>>> faces = detector.detect(image)
>>> for face in faces:
... bbox = face['bbox']
... x1, y1, x2, y2 = map(int, bbox[:4])
... face_crop = image[y1:y2, x1:x2]
... mask = parser.parse(face_crop)
... print(f"Mask shape: {mask.shape}, unique classes: {np.unique(mask)}")
"""
def __init__(
self,
model_name: ParsingWeights = ParsingWeights.RESNET18,
input_size: Tuple[int, int] = (512, 512),
) -> None:
Logger.info(f'Initializing BiSeNet with model={model_name}, input_size={input_size}')
self.input_size = input_size
self.input_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
self.input_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
self.model_path = verify_model_weights(model_name)
self._initialize_model()
def _initialize_model(self) -> None:
"""
Initialize the ONNX model from the stored model path.
Raises:
RuntimeError: If the model fails to load or initialize.
"""
try:
self.session = create_onnx_session(self.model_path)
# Get input configuration
input_cfg = self.session.get_inputs()[0]
input_shape = input_cfg.shape
self.input_name = input_cfg.name
self.input_size = tuple(input_shape[2:4][::-1]) # Update from model
# Get output configuration
outputs = self.session.get_outputs()
self.output_names = [output.name for output in outputs]
Logger.info(f'BiSeNet initialized with input size {self.input_size}')
except Exception as e:
Logger.error(f"Failed to load parsing model from '{self.model_path}'", exc_info=True)
raise RuntimeError(f'Failed to initialize parsing model: {e}') from e
def preprocess(self, face_image: np.ndarray) -> np.ndarray:
"""
Preprocess a face image for parsing.
Args:
face_image (np.ndarray): A face image in BGR format.
Returns:
np.ndarray: Preprocessed image tensor with shape (1, 3, H, W).
"""
# Convert BGR to RGB
image = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
# Resize to model input size
image = cv2.resize(image, self.input_size, interpolation=cv2.INTER_LINEAR)
# Normalize to [0, 1] and apply normalization
image = image.astype(np.float32) / 255.0
image = (image - self.input_mean) / self.input_std
# HWC -> CHW -> NCHW
image = np.transpose(image, (2, 0, 1))
image = np.expand_dims(image, axis=0).astype(np.float32)
return image
def postprocess(self, outputs: np.ndarray, original_size: Tuple[int, int]) -> np.ndarray:
"""
Postprocess model output to segmentation mask.
Args:
outputs (np.ndarray): Raw model output.
original_size (Tuple[int, int]): Original image size (width, height).
Returns:
np.ndarray: Segmentation mask resized to original dimensions.
"""
# Get the class with highest probability for each pixel
predicted_mask = outputs.squeeze(0).argmax(0).astype(np.uint8)
# Resize back to original size
restored_mask = cv2.resize(predicted_mask, original_size, interpolation=cv2.INTER_NEAREST)
return restored_mask
def parse(self, face_image: np.ndarray) -> np.ndarray:
"""
Perform end-to-end face parsing on a face image.
This method orchestrates the full pipeline: preprocessing the input,
running inference, and postprocessing to return the segmentation mask.
Args:
face_image (np.ndarray): A face image in BGR format.
Returns:
np.ndarray: Segmentation mask with the same size as input image.
"""
original_size = (face_image.shape[1], face_image.shape[0]) # (width, height)
input_tensor = self.preprocess(face_image)
outputs = self.session.run(self.output_names, {self.input_name: input_tensor})
return self.postprocess(outputs[0], original_size)


@@ -7,6 +7,52 @@ from typing import List, Tuple, Union
import cv2
import numpy as np
# Face parsing component names (19 classes)
FACE_PARSING_LABELS = [
'background',
'skin',
'l_brow',
'r_brow',
'l_eye',
'r_eye',
'eye_g',
'l_ear',
'r_ear',
'ear_r',
'nose',
'mouth',
'u_lip',
'l_lip',
'neck',
'neck_l',
'cloth',
'hair',
'hat',
]
# Color palette for face parsing visualization
FACE_PARSING_COLORS = [
[0, 0, 0],
[255, 85, 0],
[255, 170, 0],
[255, 0, 85],
[255, 0, 170],
[0, 255, 0],
[85, 255, 0],
[170, 255, 0],
[0, 255, 85],
[0, 255, 170],
[0, 0, 255],
[85, 0, 255],
[170, 0, 255],
[0, 85, 255],
[0, 170, 255],
[255, 255, 0],
[255, 255, 85],
[255, 255, 170],
[255, 0, 255],
]
def draw_detections(
*,
@@ -126,3 +172,159 @@ def draw_fancy_bbox(
# Bottom-right corner
cv2.line(image, (x2, y2), (x2, y2 - corner_length), color, thickness)
cv2.line(image, (x2, y2), (x2 - corner_length, y2), color, thickness)
def draw_gaze(
image: np.ndarray,
bbox: np.ndarray,
pitch: np.ndarray,
yaw: np.ndarray,
*,
draw_bbox: bool = True,
fancy_bbox: bool = True,
draw_angles: bool = True,
):
"""
Draws gaze direction with optional bounding box on an image.
Args:
image: Input image to draw on (modified in-place).
bbox: Face bounding box [x1, y1, x2, y2].
pitch: Vertical gaze angle in radians.
yaw: Horizontal gaze angle in radians.
draw_bbox: Whether to draw the bounding box. Defaults to True.
fancy_bbox: Use fancy corner-style bbox. Defaults to True.
draw_angles: Whether to display pitch/yaw values as text. Defaults to True.
"""
x_min, y_min, x_max, y_max = map(int, bbox[:4])
# Calculate dynamic line thickness based on image size (same as draw_detections)
line_thickness = max(round(sum(image.shape[:2]) / 2 * 0.003), 2)
# Calculate dynamic font scale based on bbox height (same as draw_detections)
bbox_h = y_max - y_min
font_scale = max(0.4, min(0.7, bbox_h / 200))
font_thickness = 2
# Draw bounding box if requested
if draw_bbox:
if fancy_bbox:
draw_fancy_bbox(image, bbox, color=(0, 255, 0), thickness=line_thickness)
else:
cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), line_thickness)
# Calculate center of the bounding box
x_center = (x_min + x_max) // 2
y_center = (y_min + y_max) // 2
# Calculate the direction of the gaze
length = x_max - x_min
dx = int(-length * np.sin(pitch) * np.cos(yaw))
dy = int(-length * np.sin(yaw))
point1 = (x_center, y_center)
point2 = (x_center + dx, y_center + dy)
# Calculate dynamic center point radius based on line thickness
center_radius = max(line_thickness + 1, 4)
# Draw gaze direction
cv2.circle(image, (x_center, y_center), radius=center_radius, color=(0, 0, 255), thickness=-1)
cv2.arrowedLine(
image,
point1,
point2,
color=(0, 0, 255),
thickness=line_thickness,
line_type=cv2.LINE_AA,
tipLength=0.25,
)
# Draw angle values
if draw_angles:
text = f'P:{np.degrees(pitch):.0f}deg Y:{np.degrees(yaw):.0f}deg'
(text_width, text_height), baseline = cv2.getTextSize(
text, cv2.FONT_HERSHEY_SIMPLEX, font_scale, font_thickness
)
# Draw background rectangle for text
cv2.rectangle(
image,
(x_min, y_min - text_height - baseline - 10),
(x_min + text_width + 10, y_min),
(0, 0, 255),
-1,
)
# Draw text
cv2.putText(
image,
text,
(x_min + 5, y_min - 5),
cv2.FONT_HERSHEY_SIMPLEX,
font_scale,
(255, 255, 255),
font_thickness,
)
def vis_parsing_maps(
image: np.ndarray,
segmentation_mask: np.ndarray,
*,
save_image: bool = False,
save_path: str = 'result.png',
) -> np.ndarray:
"""
Visualizes face parsing segmentation mask by overlaying colored regions on the image.
Args:
image: Input face image in RGB format with shape (H, W, 3).
segmentation_mask: Segmentation mask with shape (H, W) where each pixel
value represents a facial component class (0-18).
save_image: Whether to save the visualization to disk. Defaults to False.
save_path: Path to save the visualization if save_image is True.
Returns:
np.ndarray: Blended image with segmentation overlay in BGR format.
Example:
>>> import cv2
>>> from uniface.parsing import BiSeNet
>>> from uniface.visualization import vis_parsing_maps
>>>
>>> parser = BiSeNet()
>>> face_image = cv2.imread('face.jpg')
>>> mask = parser.parse(face_image)
>>>
>>> # Visualize
>>> face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
>>> result = vis_parsing_maps(face_rgb, mask)
>>> cv2.imwrite('parsed_face.jpg', result)
"""
# Create numpy arrays for image and segmentation mask
image = np.array(image).copy().astype(np.uint8)
segmentation_mask = segmentation_mask.copy().astype(np.uint8)
# Create a color mask
segmentation_mask_color = np.zeros((segmentation_mask.shape[0], segmentation_mask.shape[1], 3))
num_classes = np.max(segmentation_mask)
for class_index in range(1, num_classes + 1):
class_pixels = np.where(segmentation_mask == class_index)
segmentation_mask_color[class_pixels[0], class_pixels[1], :] = FACE_PARSING_COLORS[class_index]
segmentation_mask_color = segmentation_mask_color.astype(np.uint8)
# Convert image to BGR format for blending
bgr_image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
# Blend the image with the segmentation mask
blended_image = cv2.addWeighted(bgr_image, 0.6, segmentation_mask_color, 0.4, 0)
# Save the result if required
if save_image:
cv2.imwrite(save_path, blended_image, [int(cv2.IMWRITE_JPEG_QUALITY), 100])
return blended_image