Mirror of https://github.com/yakhyo/uniface.git (synced 2025-12-30 09:02:25 +00:00)

feat: Add BiSeNet face parsing model trained on the CelebAMask-HQ dataset (#35)

* Add BiSeNet face parsing implementation
* Add parsing model weights configuration
* Export BiSeNet in main package
* Add face parsing tests
* Add face parsing examples and script
* Bump version to 1.5.0
* Update documentation for face parsing
* Fix face parsing notebook to use lips instead of mouth
* chore: Update the face parsing example
* fix: Fix model argument to use Enum
* ref: Move vis_parsing_map function into visualization.py
* docs: Update README.md

Committed by GitHub
Parent: 4d1921e531
Commit: 54b769c0f1
MODELS.md (74 lines changed)

@@ -332,6 +332,78 @@ print(f"Pitch: {np.degrees(pitch):.1f}°, Yaw: {np.degrees(yaw):.1f}°")

---

+## Face Parsing Models
+
+### BiSeNet Family
+
+BiSeNet (Bilateral Segmentation Network) models for semantic face parsing. Segments face images into 19 facial component classes.
+
+| Model Name     | Params | Size    | Classes | Use Case                |
+| -------------- | ------ | ------- | ------- | ----------------------- |
+| `RESNET18` ⭐  | 13.3M  | 50.7 MB | 19      | **Recommended default** |
+| `RESNET34`     | 24.1M  | 89.2 MB | 19      | Higher accuracy         |
+
+**19 Facial Component Classes:**
+
+1. Background
+2. Skin
+3. Left Eyebrow
+4. Right Eyebrow
+5. Left Eye
+6. Right Eye
+7. Eye Glasses
+8. Left Ear
+9. Right Ear
+10. Ear Ring
+11. Nose
+12. Mouth
+13. Upper Lip
+14. Lower Lip
+15. Neck
+16. Neck Lace
+17. Cloth
+18. Hair
+19. Hat
+
+**Dataset**: Trained on CelebAMask-HQ
+
+**Architecture**: BiSeNet with ResNet backbone
+
+**Input Size**: 512×512 (automatically resized)
+
+#### Usage
+
+```python
+import cv2
+import numpy as np
+from uniface.parsing import BiSeNet
+from uniface.constants import ParsingWeights
+from uniface.visualization import vis_parsing_maps
+
+# Default (recommended)
+parser = BiSeNet()  # Uses RESNET18
+
+# Higher accuracy model
+parser = BiSeNet(model_name=ParsingWeights.RESNET34)
+
+# Parse face image (already cropped)
+face_image = cv2.imread("face.jpg")
+mask = parser.parse(face_image)
+
+# Visualize with overlay
+face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
+vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)
+
+# mask shape: (H, W) with values 0-18 representing classes
+print(f"Detected {len(np.unique(mask))} facial components")
+```
+
+**Applications:**
+
+- Face makeup and beauty applications
+- Virtual try-on systems
+- Face editing and manipulation
+- Facial feature extraction
+- Portrait segmentation
+
+**Note**: Input should be a cropped face image. For a full pipeline, run face detection first to obtain face crops.
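
For context, a minimal end-to-end sketch combining detection and parsing, following the `scripts/run_face_parsing.py` script added in this commit (the `RetinaFace` detector and its `detect()` output format are as used there; the image path is illustrative):

```python
import cv2

from uniface import RetinaFace
from uniface.parsing import BiSeNet

detector = RetinaFace()
parser = BiSeNet()

image = cv2.imread("group_photo.jpg")
for face in detector.detect(image):
    x1, y1, x2, y2 = map(int, face['bbox'][:4])
    face_crop = image[y1:y2, x1:x2]
    if face_crop.size == 0:  # skip boxes that fall outside the frame
        continue
    mask = parser.parse(face_crop)  # (H, W) uint8 mask with values 0-18
```
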
+---

## Model Updates

Models are automatically downloaded and cached on first use. Cache location: `~/.uniface/models/`
@@ -372,6 +444,7 @@ python scripts/download_model.py --model MNET_V2
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) - ONNX inference implementation
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
+- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet training code and pretrained weights
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights

### Papers
@@ -381,3 +454,4 @@ python scripts/download_model.py --model MNET_V2
- **YOLOv5-Face**: [YOLO5Face: Why Reinventing a Face Detector](https://arxiv.org/abs/2105.12931)
- **ArcFace**: [Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)
- **SphereFace**: [Deep Hypersphere Embedding for Face Recognition](https://arxiv.org/abs/1704.08063)
+- **BiSeNet**: [Bilateral Segmentation Network for Real-time Semantic Segmentation](https://arxiv.org/abs/1808.00897)
@@ -285,7 +285,50 @@ Face 2: pitch=-8.1°, yaw=15.7°

---

-## 8. Batch Processing (3 minutes)
+## 8. Face Parsing (2 minutes)
+
+Segment the face into semantic components (skin, eyes, nose, mouth, hair, etc.):
+
+```python
+import cv2
+import numpy as np
+from uniface.parsing import BiSeNet
+from uniface.visualization import vis_parsing_maps
+
+# Initialize parser
+parser = BiSeNet()  # Uses ResNet18 by default
+
+# Load face image (already cropped)
+face_image = cv2.imread("face.jpg")
+
+# Parse face into 19 components
+mask = parser.parse(face_image)
+
+# Visualize with overlay (vis_parsing_maps expects RGB input and returns BGR)
+face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
+vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)
+
+# The returned overlay is already BGR, so it can be saved directly
+cv2.imwrite("parsed_face.jpg", vis_result)
+
+print(f"Detected {len(np.unique(mask))} facial components")
+```
+
+**Output:**
+
+```
+Detected 12 facial components
+```
+
+**19 Facial Component Classes:**
+
+- Background, Skin, Eyebrows (L/R), Eyes (L/R), Eye Glasses
+- Ears (L/R), Ear Ring, Nose, Mouth, Lips (Upper/Lower)
+- Neck, Neck Lace, Cloth, Hair, Hat
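
To work with a single component, select its class index from the list above (a sketch continuing from the snippet; index 17 is 'hair' in the `FACE_PARSING_LABELS` order this commit adds to `uniface/visualization.py`):

```python
import numpy as np

# Boolean mask for one component; 17 = 'hair' in FACE_PARSING_LABELS
hair = mask == 17
print(f"Hair covers {hair.mean():.1%} of the crop")

# Keep only the selected component, black out the rest
hair_only = face_image.copy()
hair_only[~hair] = 0
```
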
+
+---
+
+## 9. Batch Processing (3 minutes)

Process multiple images:
@@ -318,7 +361,7 @@ print("Done!")

---

-## 9. Model Selection
+## 10. Model Selection

Choose the right model for your use case:
@@ -385,6 +428,19 @@ gaze_estimator = MobileGaze(model_name=GazeWeights.MOBILEONE_S0)
gaze_estimator = MobileGaze(model_name=GazeWeights.RESNET50)
```

+### Face Parsing Models
+
+```python
+from uniface.parsing import BiSeNet
+from uniface.constants import ParsingWeights
+
+# Default (recommended, 50.7 MB)
+parser = BiSeNet()  # Uses RESNET18
+
+# Higher accuracy (89.2 MB)
+parser = BiSeNet(model_name=ParsingWeights.RESNET34)
+```

---

## Common Issues
@@ -446,6 +502,8 @@ Explore interactive examples for common tasks:
| **Face Recognition** | Extract face embeddings and compare faces | [face_analyzer.ipynb](examples/face_analyzer.ipynb) |
| **Face Verification** | Compare two faces to verify identity | [face_verification.ipynb](examples/face_verification.ipynb) |
| **Face Search** | Find a person in a group photo | [face_search.ipynb](examples/face_search.ipynb) |
+| **Face Parsing** | Segment face into semantic components | [face_parsing.ipynb](examples/face_parsing.ipynb) |
+| **Gaze Estimation** | Estimate gaze direction | [gaze_estimation.ipynb](examples/gaze_estimation.ipynb) |

### Additional Resources
@@ -460,4 +518,5 @@ Explore interactive examples for common tasks:
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference)
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition)
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation)
+- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing)
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface)

README.md (33 lines changed)
@@ -11,7 +11,7 @@
<img src=".github/logos/logo_web.webp" width=75%>
</div>

-**UniFace** is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides high-performance face detection, recognition, landmark detection, and attribute analysis with hardware acceleration support across platforms.
+**UniFace** is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides high-performance face detection, recognition, landmark detection, face parsing, gaze estimation, and attribute analysis with hardware acceleration support across platforms.

---
@@ -20,6 +20,7 @@
- **High-Speed Face Detection**: ONNX-optimized RetinaFace, SCRFD, and YOLOv5-Face models
- **Facial Landmark Detection**: Accurate 106-point landmark localization
- **Face Recognition**: ArcFace, MobileFace, and SphereFace embeddings
+- **Face Parsing**: BiSeNet-based semantic segmentation with 19 facial component classes
- **Gaze Estimation**: Real-time gaze direction prediction with MobileGaze
- **Attribute Analysis**: Age, gender, and emotion detection
- **Face Alignment**: Precise alignment for downstream tasks
@@ -176,6 +177,27 @@ for face in faces:
    draw_gaze(image, bbox, pitch, yaw)
```

+### Face Parsing
+
+```python
+import cv2
+import numpy as np
+
+from uniface.parsing import BiSeNet
+from uniface.visualization import vis_parsing_maps
+
+# Initialize parser
+parser = BiSeNet()  # Uses ResNet18 by default
+
+# Parse face image (already cropped)
+face_image = cv2.imread("face.jpg")
+mask = parser.parse(face_image)
+
+# Visualize with overlay
+face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
+vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)
+
+# mask contains 19 classes: skin, eyes, nose, mouth, hair, etc.
+print(f"Unique classes: {len(np.unique(mask))}")
+```
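
As an illustrative follow-on (not part of the commit), per-component pixel coverage can be computed from the mask using the `FACE_PARSING_LABELS` list this commit adds in `uniface/visualization.py`:

```python
import numpy as np

from uniface.visualization import FACE_PARSING_LABELS

def component_coverage(mask: np.ndarray) -> dict:
    """Fraction of pixels assigned to each facial component present in the mask."""
    indices, counts = np.unique(mask, return_counts=True)
    return {FACE_PARSING_LABELS[i]: c / mask.size for i, c in zip(indices, counts)}
```
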
---

## Documentation
@@ -282,6 +304,12 @@ faces = detect_faces(image, method='retinaface', conf_thresh=0.8) # methods: re
| ------------- | ------------------------------------------ | ------------------------------------ |
| `MobileGaze` | `model_name=GazeWeights.RESNET34` | Returns (pitch, yaw) angles in radians; trained on Gaze360 |
+
+**Face Parsing**
+
+| Class | Key params (defaults) | Notes |
+| ---------- | ---------------------------------------- | ------------------------------------ |
+| `BiSeNet` | `model_name=ParsingWeights.RESNET18`, `input_size=(512, 512)` | 19 facial component classes; BiSeNet architecture with ResNet backbone |

---

## Model Performance
@@ -328,6 +356,7 @@ Interactive examples covering common face analysis tasks:
| **Face Recognition** | Extract face embeddings and compare faces | [face_analyzer.ipynb](examples/face_analyzer.ipynb) |
| **Face Verification** | Compare two faces to verify identity | [face_verification.ipynb](examples/face_verification.ipynb) |
| **Face Search** | Find a person in a group photo | [face_search.ipynb](examples/face_search.ipynb) |
+| **Face Parsing** | Segment face into semantic components | [face_parsing.ipynb](examples/face_parsing.ipynb) |
| **Gaze Estimation** | Estimate gaze direction from face images | [gaze_estimation.ipynb](examples/gaze_estimation.ipynb) |

### Webcam Face Detection
@@ -519,6 +548,7 @@ uniface/
│   ├── detection/      # Face detection models
│   ├── recognition/    # Face recognition models
│   ├── landmark/       # Landmark detection
+│   ├── parsing/        # Face parsing
│   ├── gaze/           # Gaze estimation
│   ├── attribute/      # Age, gender, emotion
│   ├── onnx_utils.py   # ONNX Runtime utilities
@@ -536,6 +566,7 @@ uniface/
- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) - PyTorch implementation and training code
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) - ONNX inference implementation
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
+- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet face parsing training code and pretrained weights
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights

examples/face_parsing.ipynb (new file, 387 lines)

File diff suppressed because one or more lines are too long
@@ -1,7 +1,7 @@
[project]
name = "uniface"
-version = "1.4.0"
+version = "1.5.0"
-description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Gaze Estimation, Age, and Gender Detection"
+description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Face Parsing, Gaze Estimation, Age, and Gender Detection"
readme = "README.md"
license = { text = "MIT" }
authors = [{ name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" }]
@@ -14,6 +14,8 @@ keywords = [
    "face-detection",
    "face-recognition",
    "facial-landmarks",
+    "face-parsing",
+    "face-segmentation",
    "gaze-estimation",
    "age-detection",
    "gender-detection",
@@ -22,6 +24,7 @@ keywords = [
    "onnx",
    "onnxruntime",
    "face-analysis",
+    "bisenet",
]

classifiers = [

scripts/run_face_parsing.py (new file, 126 lines)
@@ -0,0 +1,126 @@
# Face parsing on detected faces
# Usage: python run_face_parsing.py --image path/to/image.jpg
#        python run_face_parsing.py --webcam

import argparse
import os
from pathlib import Path

import cv2

from uniface import RetinaFace
from uniface.constants import ParsingWeights
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps


def process_image(detector, parser, image_path: str, save_dir: str = 'outputs'):
    image = cv2.imread(image_path)
    if image is None:
        print(f"Error: Failed to load image from '{image_path}'")
        return

    faces = detector.detect(image)
    print(f'Detected {len(faces)} face(s)')

    result_image = image.copy()

    for i, face in enumerate(faces):
        bbox = face['bbox']
        x1, y1, x2, y2 = map(int, bbox[:4])
        face_crop = image[y1:y2, x1:x2]

        if face_crop.size == 0:
            continue

        # Parse the face
        mask = parser.parse(face_crop)
        print(f'  Face {i + 1}: parsed with {len(set(mask.flatten()))} unique classes')

        # Visualize the parsing result
        face_crop_rgb = cv2.cvtColor(face_crop, cv2.COLOR_BGR2RGB)
        vis_result = vis_parsing_maps(face_crop_rgb, mask, save_image=False)

        # Place the visualization back on the original image
        result_image[y1:y2, x1:x2] = vis_result

        # Draw bounding box
        cv2.rectangle(result_image, (x1, y1), (x2, y2), (0, 255, 0), 2)

    os.makedirs(save_dir, exist_ok=True)
    output_path = os.path.join(save_dir, f'{Path(image_path).stem}_parsing.jpg')
    cv2.imwrite(output_path, result_image)
    print(f'Output saved: {output_path}')


def run_webcam(detector, parser):
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        print('Cannot open webcam')
        return

    print("Press 'q' to quit")

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame = cv2.flip(frame, 1)
        faces = detector.detect(frame)

        for face in faces:
            bbox = face['bbox']
            x1, y1, x2, y2 = map(int, bbox[:4])
            face_crop = frame[y1:y2, x1:x2]

            if face_crop.size == 0:
                continue

            # Parse the face
            mask = parser.parse(face_crop)

            # Visualize the parsing result
            face_crop_rgb = cv2.cvtColor(face_crop, cv2.COLOR_BGR2RGB)
            vis_result = vis_parsing_maps(face_crop_rgb, mask, save_image=False)

            # Place the visualization back on the frame
            frame[y1:y2, x1:x2] = vis_result

            # Draw bounding box
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('Face Parsing', frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


def main():
    parser_arg = argparse.ArgumentParser(description='Run face parsing')
    parser_arg.add_argument('--image', type=str, help='Path to input image')
    parser_arg.add_argument('--webcam', action='store_true', help='Use webcam')
    parser_arg.add_argument('--save_dir', type=str, default='outputs')
    parser_arg.add_argument(
        '--model', type=str, default=ParsingWeights.RESNET18, choices=[ParsingWeights.RESNET18, ParsingWeights.RESNET34]
    )
    args = parser_arg.parse_args()

    if not args.image and not args.webcam:
        parser_arg.error('Either --image or --webcam must be specified')

    detector = RetinaFace()
    # Honor the --model flag (convert the CLI string to the enum)
    parser = BiSeNet(model_name=ParsingWeights(args.model))

    if args.webcam:
        run_webcam(detector, parser)
    else:
        process_image(detector, parser, args.image, args.save_dir)


if __name__ == '__main__':
    main()
tests/test_parsing.py (new file, 118 lines)

@@ -0,0 +1,118 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

import numpy as np
import pytest

from uniface.constants import ParsingWeights
from uniface.parsing import BiSeNet, create_face_parser


def test_bisenet_initialization():
    """Test BiSeNet initialization."""
    parser = BiSeNet()
    assert parser is not None
    assert parser.input_size == (512, 512)


def test_bisenet_with_different_models():
    """Test BiSeNet with different model weights."""
    parser_resnet18 = BiSeNet(model_name=ParsingWeights.RESNET18)
    parser_resnet34 = BiSeNet(model_name=ParsingWeights.RESNET34)

    assert parser_resnet18 is not None
    assert parser_resnet34 is not None


def test_bisenet_preprocess():
    """Test preprocessing."""
    parser = BiSeNet()

    # Create a dummy face image
    face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)

    # Preprocess
    preprocessed = parser.preprocess(face_image)

    assert preprocessed.shape == (1, 3, 512, 512)
    assert preprocessed.dtype == np.float32


def test_bisenet_postprocess():
    """Test postprocessing."""
    parser = BiSeNet()

    # Create dummy model output (batch_size=1, num_classes=19, H=512, W=512)
    dummy_output = np.random.randn(1, 19, 512, 512).astype(np.float32)

    # Postprocess
    mask = parser.postprocess(dummy_output, original_size=(256, 256))

    assert mask.shape == (256, 256)
    assert mask.dtype == np.uint8
    assert mask.min() >= 0
    assert mask.max() < 19  # 19 classes (0-18)


def test_bisenet_parse():
    """Test end-to-end parsing."""
    parser = BiSeNet()

    # Create a dummy face image
    face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)

    # Parse
    mask = parser.parse(face_image)

    assert mask.shape == (256, 256)
    assert mask.dtype == np.uint8
    assert mask.min() >= 0
    assert mask.max() < 19


def test_bisenet_callable():
    """Test that BiSeNet is callable."""
    parser = BiSeNet()
    face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)

    # Should work as callable
    mask = parser(face_image)

    assert mask.shape == (256, 256)
    assert mask.dtype == np.uint8


def test_create_face_parser_with_enum():
    """Test factory function with enum."""
    parser = create_face_parser(ParsingWeights.RESNET18)
    assert parser is not None
    assert isinstance(parser, BiSeNet)


def test_create_face_parser_with_string():
    """Test factory function with string."""
    parser = create_face_parser('parsing_resnet18')
    assert parser is not None
    assert isinstance(parser, BiSeNet)


def test_create_face_parser_invalid_model():
    """Test factory function with invalid model name."""
    with pytest.raises(ValueError, match='Unknown face parsing model'):
        create_face_parser('invalid_model')


def test_bisenet_different_input_sizes():
    """Test parsing with different input image sizes."""
    parser = BiSeNet()

    # Test with different sizes
    sizes = [(128, 128), (256, 256), (512, 512), (640, 480)]

    for h, w in sizes:
        face_image = np.random.randint(0, 255, (h, w, 3), dtype=np.uint8)
        mask = parser.parse(face_image)

        assert mask.shape == (h, w), f'Failed for size {h}x{w}'
        assert mask.dtype == np.uint8
@@ -13,13 +13,13 @@
__license__ = 'MIT'
__author__ = 'Yakhyokhuja Valikhujaev'
-__version__ = '1.4.0'
+__version__ = '1.5.0'

from uniface.face_utils import compute_similarity, face_alignment
from uniface.log import Logger, enable_logging
from uniface.model_store import verify_model_weights
-from uniface.visualization import draw_detections
+from uniface.visualization import draw_detections, vis_parsing_maps

from .analyzer import FaceAnalyzer
from .attribute import AgeGender
@@ -39,6 +39,7 @@ from .detection import (
)
from .gaze import MobileGaze, create_gaze_estimator
from .landmark import Landmark106, create_landmarker
+from .parsing import BiSeNet, create_face_parser
from .recognition import ArcFace, MobileFace, SphereFace, create_recognizer

__all__ = [
@@ -50,6 +51,7 @@ __all__ = [
    'FaceAnalyzer',
    # Factory functions
    'create_detector',
+    'create_face_parser',
    'create_gaze_estimator',
    'create_landmarker',
    'create_recognizer',
@@ -67,12 +69,15 @@ __all__ = [
    'Landmark106',
    # Gaze models
    'MobileGaze',
+    # Parsing models
+    'BiSeNet',
    # Attribute models
    'AgeGender',
    'Emotion',
    # Utilities
    'compute_similarity',
    'draw_detections',
+    'vis_parsing_maps',
    'face_alignment',
    'verify_model_weights',
    'Logger',
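
With these exports in place, the new API is reachable from the package root (a small sketch, not part of the diff):

```python
import uniface

parser = uniface.create_face_parser()  # BiSeNet with RESNET18 weights by default
print(type(parser).__name__)           # -> 'BiSeNet'
print('vis_parsing_maps' in uniface.__all__)  # -> True
```
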
@@ -109,6 +109,16 @@ class GazeWeights(str, Enum):
    MOBILEONE_S0 = "gaze_mobileone_s0"


+class ParsingWeights(str, Enum):
+    """
+    Face Parsing: Semantic Segmentation of Facial Components.
+
+    Trained on CelebAMask-HQ dataset.
+    https://github.com/yakhyo/face-parsing
+    """
+
+    RESNET18 = "parsing_resnet18"
+    RESNET34 = "parsing_resnet34"
+
+
MODEL_URLS: Dict[Enum, str] = {
    # RetinaFace
    RetinaFaceWeights.MNET_025: 'https://github.com/yakhyo/uniface/releases/download/weights/retinaface_mv1_0.25.onnx',
@@ -148,6 +158,9 @@ MODEL_URLS: Dict[Enum, str] = {
    GazeWeights.RESNET50: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/resnet50_gaze.onnx',
    GazeWeights.MOBILENET_V2: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/mobilenetv2_gaze.onnx',
    GazeWeights.MOBILEONE_S0: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/mobileone_s0_gaze.onnx',
+    # Parsing
+    ParsingWeights.RESNET18: 'https://github.com/yakhyo/face-parsing/releases/download/weights/resnet18.onnx',
+    ParsingWeights.RESNET34: 'https://github.com/yakhyo/face-parsing/releases/download/weights/resnet34.onnx',
}

MODEL_SHA256: Dict[Enum, str] = {
@@ -189,6 +202,9 @@ MODEL_SHA256: Dict[Enum, str] = {
    GazeWeights.RESNET50: 'e1eaf98f5ec7c89c6abe7cfe39f7be83e747163f98d1ff945c0603b3c521be22',
    GazeWeights.MOBILENET_V2: 'fdcdb84e3e6421b5a79e8f95139f249fc258d7f387eed5ddac2b80a9a15ce076',
    GazeWeights.MOBILEONE_S0: 'c0b5a4f4a0ffd24f76ab3c1452354bb2f60110899fd9a88b464c75bafec0fde8',
+    # Face Parsing
+    ParsingWeights.RESNET18: '0d9bd318e46987c3bdbfacae9e2c0f461cae1c6ac6ea6d43bbe541a91727e33f',
+    ParsingWeights.RESNET34: '5b805bba7b5660ab7070b5a381dcf75e5b3e04199f1e9387232a77a00095102e',
}

CHUNK_SIZE = 8192
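
For illustration (not part of the commit), a download-integrity check against tables like `MODEL_SHA256` can be performed by streaming the file through SHA-256 in fixed-size blocks; `verify_model_weights` is the library's actual entry point, and its internals are not shown in this diff:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 8192) -> str:
    """Hash a file in chunks, mirroring the CHUNK_SIZE constant above."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```
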
uniface/parsing/__init__.py (new file, 61 lines)

@@ -0,0 +1,61 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from typing import Union

from uniface.constants import ParsingWeights

from .base import BaseFaceParser
from .bisenet import BiSeNet

__all__ = ['BaseFaceParser', 'BiSeNet', 'create_face_parser']


def create_face_parser(
    model_name: Union[str, ParsingWeights] = ParsingWeights.RESNET18,
) -> BaseFaceParser:
    """
    Factory function to create a face parsing model instance.

    This function provides a convenient way to instantiate face parsing models
    without directly importing the specific model classes. It supports both
    string-based and enum-based model selection.

    Args:
        model_name (Union[str, ParsingWeights]): The face parsing model to create.
            Can be either a string or a ParsingWeights enum value.
            Available options:
            - 'parsing_resnet18' or ParsingWeights.RESNET18 (default)
            - 'parsing_resnet34' or ParsingWeights.RESNET34

    Returns:
        BaseFaceParser: An instance of the requested face parsing model.

    Raises:
        ValueError: If the model_name is not recognized.

    Examples:
        >>> # Using enum
        >>> from uniface.parsing import create_face_parser
        >>> from uniface.constants import ParsingWeights
        >>> parser = create_face_parser(ParsingWeights.RESNET18)
        >>>
        >>> # Using string
        >>> parser = create_face_parser('parsing_resnet18')
        >>>
        >>> # Parse a face image
        >>> mask = parser.parse(face_crop)
    """
    # Convert string to enum if necessary
    if isinstance(model_name, str):
        try:
            model_name = ParsingWeights(model_name)
        except ValueError as e:
            valid_models = [e.value for e in ParsingWeights]
            raise ValueError(
                f"Unknown face parsing model: '{model_name}'. Valid options are: {', '.join(valid_models)}"
            ) from e

    # All parsing models use the same BiSeNet class
    return BiSeNet(model_name=model_name)
uniface/parsing/base.py (new file, 106 lines)

@@ -0,0 +1,106 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from abc import ABC, abstractmethod
from typing import Tuple

import numpy as np


class BaseFaceParser(ABC):
    """
    Abstract base class for all face parsing models.

    This class defines the common interface that all face parsing models must implement,
    ensuring consistency across different parsing methods. Face parsing segments a face
    image into semantic regions such as skin, eyes, nose, mouth, hair, etc.

    The output is a segmentation mask where each pixel is assigned a class label
    representing a facial component.
    """

    @abstractmethod
    def _initialize_model(self) -> None:
        """
        Initialize the underlying model for inference.

        This method should handle loading model weights, creating the
        inference session (e.g., ONNX Runtime), and any necessary
        setup procedures to prepare the model for prediction.

        Raises:
            RuntimeError: If the model fails to load or initialize.
        """
        raise NotImplementedError('Subclasses must implement the _initialize_model method.')

    @abstractmethod
    def preprocess(self, face_image: np.ndarray) -> np.ndarray:
        """
        Preprocess the input face image for model inference.

        This method should take a raw face crop and convert it into the format
        expected by the model's inference engine (e.g., normalized tensor).

        Args:
            face_image (np.ndarray): A face image in BGR format with shape (H, W, C).

        Returns:
            np.ndarray: The preprocessed image tensor ready for inference,
                typically with shape (1, C, H, W).
        """
        raise NotImplementedError('Subclasses must implement the preprocess method.')

    @abstractmethod
    def postprocess(self, outputs: np.ndarray, original_size: Tuple[int, int]) -> np.ndarray:
        """
        Postprocess raw model outputs into a segmentation mask.

        This method takes the raw output from the model's inference and
        converts it into a segmentation mask at the original image size.

        Args:
            outputs (np.ndarray): Raw outputs from the model inference.
            original_size (Tuple[int, int]): Original image size (width, height).

        Returns:
            np.ndarray: Segmentation mask with the same size as the original image.
        """
        raise NotImplementedError('Subclasses must implement the postprocess method.')

    @abstractmethod
    def parse(self, face_image: np.ndarray) -> np.ndarray:
        """
        Perform end-to-end face parsing on a face image.

        This method orchestrates the full pipeline: preprocessing the input,
        running inference, and postprocessing to return the segmentation mask.

        Args:
            face_image (np.ndarray): A face image in BGR format. The face should
                be roughly centered and well-framed within the image.

        Returns:
            np.ndarray: Segmentation mask with the same size as input image,
                where each pixel value represents a facial component class.

        Example:
            >>> parser = create_face_parser()
            >>> mask = parser.parse(face_crop)
            >>> print(f"Mask shape: {mask.shape}, unique classes: {np.unique(mask)}")
        """
        raise NotImplementedError('Subclasses must implement the parse method.')

    def __call__(self, face_image: np.ndarray) -> np.ndarray:
        """
        Provides a convenient, callable shortcut for the `parse` method.

        Args:
            face_image (np.ndarray): A face image in BGR format.

        Returns:
            np.ndarray: Segmentation mask with the same size as input image.
        """
        return self.parse(face_image)
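
To make the contract concrete, here is a hypothetical toy subclass (purely illustrative, not part of the commit): it satisfies the interface by labeling every pixel as class 1 ('skin'):

```python
from typing import Tuple

import numpy as np

class ConstantParser(BaseFaceParser):
    """Toy parser that marks every pixel as skin; demonstrates the interface only."""

    def _initialize_model(self) -> None:
        pass  # nothing to load for this toy example

    def preprocess(self, face_image: np.ndarray) -> np.ndarray:
        # (H, W, C) uint8 -> (1, C, H, W) float32, as the interface expects
        return np.transpose(face_image, (2, 0, 1))[None].astype(np.float32)

    def postprocess(self, outputs: np.ndarray, original_size: Tuple[int, int]) -> np.ndarray:
        width, height = original_size
        return np.ones((height, width), dtype=np.uint8)  # class 1 everywhere

    def parse(self, face_image: np.ndarray) -> np.ndarray:
        original_size = (face_image.shape[1], face_image.shape[0])
        return self.postprocess(self.preprocess(face_image), original_size)
```
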
uniface/parsing/bisenet.py (new file, 166 lines)

@@ -0,0 +1,166 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from typing import Tuple

import cv2
import numpy as np

from uniface.constants import ParsingWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session

from .base import BaseFaceParser

__all__ = ['BiSeNet']


class BiSeNet(BaseFaceParser):
    """
    BiSeNet: Bilateral Segmentation Network for Face Parsing with ONNX Runtime.

    BiSeNet is a semantic segmentation model that segments a face image into
    different facial components such as skin, eyes, nose, mouth, hair, etc. The model
    uses a BiSeNet architecture with ResNet backbone and outputs a segmentation mask
    where each pixel is assigned a class label.

    The model supports 19 facial component classes including:
    - Background, skin, eyebrows, eyes, nose, mouth, lips, ears, hair, etc.

    Reference:
        https://github.com/yakhyo/face-parsing

    Args:
        model_name (ParsingWeights): The enum specifying the parsing model to load.
            Options: RESNET18, RESNET34.
            Defaults to `ParsingWeights.RESNET18`.
        input_size (Tuple[int, int]): The resolution (width, height) for the model's
            input. Defaults to (512, 512).

    Attributes:
        input_size (Tuple[int, int]): Model input dimensions.
        input_mean (np.ndarray): Per-channel mean values for normalization (ImageNet).
        input_std (np.ndarray): Per-channel std values for normalization (ImageNet).

    Example:
        >>> from uniface.parsing import BiSeNet
        >>> from uniface import RetinaFace
        >>>
        >>> detector = RetinaFace()
        >>> parser = BiSeNet()
        >>>
        >>> # Detect faces and parse each face
        >>> faces = detector.detect(image)
        >>> for face in faces:
        ...     bbox = face['bbox']
        ...     x1, y1, x2, y2 = map(int, bbox[:4])
        ...     face_crop = image[y1:y2, x1:x2]
        ...     mask = parser.parse(face_crop)
        ...     print(f"Mask shape: {mask.shape}, unique classes: {np.unique(mask)}")
    """

    def __init__(
        self,
        model_name: ParsingWeights = ParsingWeights.RESNET18,
        input_size: Tuple[int, int] = (512, 512),
    ) -> None:
        Logger.info(f'Initializing BiSeNet with model={model_name}, input_size={input_size}')

        self.input_size = input_size
        self.input_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        self.input_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

        self.model_path = verify_model_weights(model_name)
        self._initialize_model()

    def _initialize_model(self) -> None:
        """
        Initialize the ONNX model from the stored model path.

        Raises:
            RuntimeError: If the model fails to load or initialize.
        """
        try:
            self.session = create_onnx_session(self.model_path)

            # Get input configuration
            input_cfg = self.session.get_inputs()[0]
            input_shape = input_cfg.shape
            self.input_name = input_cfg.name
            self.input_size = tuple(input_shape[2:4][::-1])  # Update from model

            # Get output configuration
            outputs = self.session.get_outputs()
            self.output_names = [output.name for output in outputs]

            Logger.info(f'BiSeNet initialized with input size {self.input_size}')

        except Exception as e:
            Logger.error(f"Failed to load parsing model from '{self.model_path}'", exc_info=True)
            raise RuntimeError(f'Failed to initialize parsing model: {e}') from e

    def preprocess(self, face_image: np.ndarray) -> np.ndarray:
        """
        Preprocess a face image for parsing.

        Args:
            face_image (np.ndarray): A face image in BGR format.

        Returns:
            np.ndarray: Preprocessed image tensor with shape (1, 3, H, W).
        """
        # Convert BGR to RGB
        image = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)

        # Resize to model input size
        image = cv2.resize(image, self.input_size, interpolation=cv2.INTER_LINEAR)

        # Scale to [0, 1] and apply ImageNet mean/std normalization
        image = image.astype(np.float32) / 255.0
        image = (image - self.input_mean) / self.input_std

        # HWC -> CHW -> NCHW
        image = np.transpose(image, (2, 0, 1))
        image = np.expand_dims(image, axis=0).astype(np.float32)

        return image

    def postprocess(self, outputs: np.ndarray, original_size: Tuple[int, int]) -> np.ndarray:
        """
        Postprocess model output to segmentation mask.

        Args:
            outputs (np.ndarray): Raw model output.
            original_size (Tuple[int, int]): Original image size (width, height).

        Returns:
            np.ndarray: Segmentation mask resized to original dimensions.
        """
        # Get the class with highest probability for each pixel
        predicted_mask = outputs.squeeze(0).argmax(0).astype(np.uint8)

        # Resize back to original size
        restored_mask = cv2.resize(predicted_mask, original_size, interpolation=cv2.INTER_NEAREST)

        return restored_mask

    def parse(self, face_image: np.ndarray) -> np.ndarray:
        """
        Perform end-to-end face parsing on a face image.

        This method orchestrates the full pipeline: preprocessing the input,
        running inference, and postprocessing to return the segmentation mask.

        Args:
            face_image (np.ndarray): A face image in BGR format.

        Returns:
            np.ndarray: Segmentation mask with the same size as input image.
        """
        original_size = (face_image.shape[1], face_image.shape[0])  # (width, height)
        input_tensor = self.preprocess(face_image)
        outputs = self.session.run(self.output_names, {self.input_name: input_tensor})

        return self.postprocess(outputs[0], original_size)
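
To make the shape flow concrete, a quick sketch with dummy data (mirroring the assertions in `tests/test_parsing.py` added by this commit; running it downloads the RESNET18 weights on first use):

```python
import numpy as np

from uniface.parsing import BiSeNet

parser = BiSeNet()
face = np.random.randint(0, 255, (240, 180, 3), dtype=np.uint8)  # dummy BGR crop

tensor = parser.preprocess(face)
assert tensor.shape == (1, 3, 512, 512) and tensor.dtype == np.float32

mask = parser.parse(face)
assert mask.shape == (240, 180) and mask.dtype == np.uint8  # values in 0-18
```
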
@@ -7,6 +7,52 @@ from typing import List, Tuple, Union
import cv2
import numpy as np

+# Face parsing component names (19 classes)
+FACE_PARSING_LABELS = [
+    'background',
+    'skin',
+    'l_brow',
+    'r_brow',
+    'l_eye',
+    'r_eye',
+    'eye_g',
+    'l_ear',
+    'r_ear',
+    'ear_r',
+    'nose',
+    'mouth',
+    'u_lip',
+    'l_lip',
+    'neck',
+    'neck_l',
+    'cloth',
+    'hair',
+    'hat',
+]
+
+# Color palette for face parsing visualization
+FACE_PARSING_COLORS = [
+    [0, 0, 0],
+    [255, 85, 0],
+    [255, 170, 0],
+    [255, 0, 85],
+    [255, 0, 170],
+    [0, 255, 0],
+    [85, 255, 0],
+    [170, 255, 0],
+    [0, 255, 85],
+    [0, 255, 170],
+    [0, 0, 255],
+    [85, 0, 255],
+    [170, 0, 255],
+    [0, 85, 255],
+    [0, 170, 255],
+    [255, 255, 0],
+    [255, 255, 85],
+    [255, 255, 170],
+    [255, 0, 255],
+]
+
+
def draw_detections(
    *,
@@ -220,3 +266,65 @@ def draw_gaze(
        (255, 255, 255),
        font_thickness,
    )

+def vis_parsing_maps(
+    image: np.ndarray,
+    segmentation_mask: np.ndarray,
+    *,
+    save_image: bool = False,
+    save_path: str = 'result.png',
+) -> np.ndarray:
+    """
+    Visualizes a face parsing segmentation mask by overlaying colored regions on the image.
+
+    Args:
+        image: Input face image in RGB format with shape (H, W, 3).
+        segmentation_mask: Segmentation mask with shape (H, W) where each pixel
+            value represents a facial component class (0-18).
+        save_image: Whether to save the visualization to disk. Defaults to False.
+        save_path: Path to save the visualization if save_image is True.
+
+    Returns:
+        np.ndarray: Blended image with segmentation overlay in BGR format.
+
+    Example:
+        >>> import cv2
+        >>> from uniface.parsing import BiSeNet
+        >>> from uniface.visualization import vis_parsing_maps
+        >>>
+        >>> parser = BiSeNet()
+        >>> face_image = cv2.imread('face.jpg')
+        >>> mask = parser.parse(face_image)
+        >>>
+        >>> # Visualize
+        >>> face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
+        >>> result = vis_parsing_maps(face_rgb, mask)
+        >>> cv2.imwrite('parsed_face.jpg', result)
+    """
+    # Copy the inputs and ensure uint8 dtype
+    image = np.array(image).copy().astype(np.uint8)
+    segmentation_mask = segmentation_mask.copy().astype(np.uint8)
+
+    # Create a color mask
+    segmentation_mask_color = np.zeros((segmentation_mask.shape[0], segmentation_mask.shape[1], 3))
+
+    num_classes = np.max(segmentation_mask)
+
+    for class_index in range(1, num_classes + 1):
+        class_pixels = np.where(segmentation_mask == class_index)
+        segmentation_mask_color[class_pixels[0], class_pixels[1], :] = FACE_PARSING_COLORS[class_index]
+
+    segmentation_mask_color = segmentation_mask_color.astype(np.uint8)
+
+    # Convert image to BGR format for blending
+    bgr_image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
+
+    # Blend the image with the segmentation mask
+    blended_image = cv2.addWeighted(bgr_image, 0.6, segmentation_mask_color, 0.4, 0)
+
+    # Save the result if required
+    if save_image:
+        cv2.imwrite(save_path, blended_image, [int(cv2.IMWRITE_JPEG_QUALITY), 100])
+
+    return blended_image
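
A small illustrative addition (not part of the commit): a color legend can be rendered from the palette and label list defined above, which helps when reading the overlay:

```python
import cv2
import numpy as np

def parsing_legend(swatch: int = 24, width: int = 180) -> np.ndarray:
    """Stack one colored row per class, annotated with its component name."""
    rows = []
    for name, color in zip(FACE_PARSING_LABELS, FACE_PARSING_COLORS):
        row = np.full((swatch, width, 3), color, dtype=np.uint8)
        cv2.putText(row, name, (swatch + 6, swatch - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
        rows.append(row)
    return np.vstack(rows)

# cv2.imwrite('parsing_legend.png', parsing_legend())
```
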