Mirror of https://github.com/yakhyo/uniface.git (synced 2025-12-30 09:02:25 +00:00)

feat: Add BiSeNet face parsing model trained on the CelebAMask-HQ dataset (#35)

* Add BiSeNet face parsing implementation
* Add parsing model weights configuration
* Export BiSeNet in main package
* Add face parsing tests
* Add face parsing examples and script
* Bump version to 1.5.0
* Update documentation for face parsing
* Fix face parsing notebook to use lips instead of mouth
* chore: Update the face parsing example
* fix: Fix model argument to use Enum
* ref: Move vis_parsing_map function into visualization.py
* docs: Update README.md

Committed by GitHub
Parent: 4d1921e531
Commit: 54b769c0f1
MODELS.md (74 lines changed)

@@ -332,6 +332,78 @@ print(f"Pitch: {np.degrees(pitch):.1f}°, Yaw: {np.degrees(yaw):.1f}°")

---

+## Face Parsing Models
+
+### BiSeNet Family
+
+BiSeNet (Bilateral Segmentation Network) models for semantic face parsing. Segments face images into 19 facial component classes.
+
+| Model Name     | Params | Size    | Classes | Use Case                |
+| -------------- | ------ | ------- | ------- | ----------------------- |
+| `RESNET18` ⭐  | 13.3M  | 50.7 MB | 19      | **Recommended default** |
+| `RESNET34`     | 24.1M  | 89.2 MB | 19      | Higher accuracy         |
+
+**19 Facial Component Classes:**
+
+1. Background
+2. Skin
+3. Left Eyebrow
+4. Right Eyebrow
+5. Left Eye
+6. Right Eye
+7. Eye Glasses
+8. Left Ear
+9. Right Ear
+10. Ear Ring
+11. Nose
+12. Mouth
+13. Upper Lip
+14. Lower Lip
+15. Neck
+16. Neck Lace
+17. Cloth
+18. Hair
+19. Hat
+
+**Dataset**: Trained on CelebAMask-HQ
+
+**Architecture**: BiSeNet with ResNet backbone
+
+**Input Size**: 512×512 (automatically resized)
+
+#### Usage
+
+```python
+import cv2
+import numpy as np
+from uniface.parsing import BiSeNet
+from uniface.constants import ParsingWeights
+from uniface.visualization import vis_parsing_maps
+
+# Default (recommended)
+parser = BiSeNet()  # Uses RESNET18
+
+# Higher accuracy model
+parser = BiSeNet(model_name=ParsingWeights.RESNET34)
+
+# Parse face image (already cropped)
+face_image = cv2.imread("face.jpg")
+mask = parser.parse(face_image)
+
+# Visualize with overlay
+face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
+vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)
+
+# mask shape: (H, W) with values 0-18 representing classes
+print(f"Detected {len(np.unique(mask))} facial components")
+```
+
+**Applications:**
+
+- Face makeup and beauty applications
+- Virtual try-on systems
+- Face editing and manipulation
+- Facial feature extraction
+- Portrait segmentation
+
+**Note**: Input should be a cropped face image. For a full pipeline, run face detection first to obtain face crops.
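
For context, a minimal end-to-end sketch combining detection and parsing, following the `scripts/run_face_parsing.py` script added in this commit (the `RetinaFace` detector and its `detect()` output format are as used there; the image path is illustrative):

```python
import cv2

from uniface import RetinaFace
from uniface.parsing import BiSeNet

detector = RetinaFace()
parser = BiSeNet()

image = cv2.imread("group_photo.jpg")
for face in detector.detect(image):
    x1, y1, x2, y2 = map(int, face['bbox'][:4])
    face_crop = image[y1:y2, x1:x2]
    if face_crop.size == 0:  # skip boxes that fall outside the frame
        continue
    mask = parser.parse(face_crop)  # (H, W) uint8 mask with values 0-18
```
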
+---

## Model Updates

Models are automatically downloaded and cached on first use. Cache location: `~/.uniface/models/`
@@ -372,6 +444,7 @@ python scripts/download_model.py --model MNET_V2
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) - ONNX inference implementation
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
+- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet training code and pretrained weights
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights

### Papers
@@ -381,3 +454,4 @@ python scripts/download_model.py --model MNET_V2
- **YOLOv5-Face**: [YOLO5Face: Why Reinventing a Face Detector](https://arxiv.org/abs/2105.12931)
- **ArcFace**: [Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)
- **SphereFace**: [Deep Hypersphere Embedding for Face Recognition](https://arxiv.org/abs/1704.08063)
+- **BiSeNet**: [Bilateral Segmentation Network for Real-time Semantic Segmentation](https://arxiv.org/abs/1808.00897)
@@ -285,7 +285,50 @@ Face 2: pitch=-8.1°, yaw=15.7°

---

-## 8. Batch Processing (3 minutes)
+## 8. Face Parsing (2 minutes)
+
+Segment the face into semantic components (skin, eyes, nose, mouth, hair, etc.):
+
+```python
+import cv2
+import numpy as np
+from uniface.parsing import BiSeNet
+from uniface.visualization import vis_parsing_maps
+
+# Initialize parser
+parser = BiSeNet()  # Uses ResNet18 by default
+
+# Load face image (already cropped)
+face_image = cv2.imread("face.jpg")
+
+# Parse face into 19 components
+mask = parser.parse(face_image)
+
+# Visualize with overlay (vis_parsing_maps expects RGB input and returns BGR)
+face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
+vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)
+
+# The returned overlay is already BGR, so it can be saved directly
+cv2.imwrite("parsed_face.jpg", vis_result)
+
+print(f"Detected {len(np.unique(mask))} facial components")
+```
+
+**Output:**
+
+```
+Detected 12 facial components
+```
+
+**19 Facial Component Classes:**
+
+- Background, Skin, Eyebrows (L/R), Eyes (L/R), Eye Glasses
+- Ears (L/R), Ear Ring, Nose, Mouth, Lips (Upper/Lower)
+- Neck, Neck Lace, Cloth, Hair, Hat
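
To work with a single component, select its class index from the list above (a sketch continuing from the snippet; index 17 is 'hair' in the `FACE_PARSING_LABELS` order this commit adds to `uniface/visualization.py`):

```python
import numpy as np

# Boolean mask for one component; 17 = 'hair' in FACE_PARSING_LABELS
hair = mask == 17
print(f"Hair covers {hair.mean():.1%} of the crop")

# Keep only the selected component, black out the rest
hair_only = face_image.copy()
hair_only[~hair] = 0
```
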
+
+---
+
+## 9. Batch Processing (3 minutes)

Process multiple images:
@@ -318,7 +361,7 @@ print("Done!")

---

-## 9. Model Selection
+## 10. Model Selection

Choose the right model for your use case:
@@ -385,6 +428,19 @@ gaze_estimator = MobileGaze(model_name=GazeWeights.MOBILEONE_S0)
gaze_estimator = MobileGaze(model_name=GazeWeights.RESNET50)
```

+### Face Parsing Models
+
+```python
+from uniface.parsing import BiSeNet
+from uniface.constants import ParsingWeights
+
+# Default (recommended, 50.7 MB)
+parser = BiSeNet()  # Uses RESNET18
+
+# Higher accuracy (89.2 MB)
+parser = BiSeNet(model_name=ParsingWeights.RESNET34)
+```

---

## Common Issues
@@ -446,6 +502,8 @@ Explore interactive examples for common tasks:
| **Face Recognition** | Extract face embeddings and compare faces | [face_analyzer.ipynb](examples/face_analyzer.ipynb) |
| **Face Verification** | Compare two faces to verify identity | [face_verification.ipynb](examples/face_verification.ipynb) |
| **Face Search** | Find a person in a group photo | [face_search.ipynb](examples/face_search.ipynb) |
+| **Face Parsing** | Segment face into semantic components | [face_parsing.ipynb](examples/face_parsing.ipynb) |
+| **Gaze Estimation** | Estimate gaze direction | [gaze_estimation.ipynb](examples/gaze_estimation.ipynb) |

### Additional Resources
@@ -460,4 +518,5 @@ Explore interactive examples for common tasks:
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference)
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition)
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation)
+- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing)
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface)

README.md (33 lines changed)
@@ -11,7 +11,7 @@
<img src=".github/logos/logo_web.webp" width=75%>
</div>

-**UniFace** is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides high-performance face detection, recognition, landmark detection, and attribute analysis with hardware acceleration support across platforms.
+**UniFace** is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides high-performance face detection, recognition, landmark detection, face parsing, gaze estimation, and attribute analysis with hardware acceleration support across platforms.

---
@@ -20,6 +20,7 @@
- **High-Speed Face Detection**: ONNX-optimized RetinaFace, SCRFD, and YOLOv5-Face models
- **Facial Landmark Detection**: Accurate 106-point landmark localization
- **Face Recognition**: ArcFace, MobileFace, and SphereFace embeddings
+- **Face Parsing**: BiSeNet-based semantic segmentation with 19 facial component classes
- **Gaze Estimation**: Real-time gaze direction prediction with MobileGaze
- **Attribute Analysis**: Age, gender, and emotion detection
- **Face Alignment**: Precise alignment for downstream tasks
@@ -176,6 +177,27 @@ for face in faces:
    draw_gaze(image, bbox, pitch, yaw)
```

+### Face Parsing
+
+```python
+import cv2
+import numpy as np
+
+from uniface.parsing import BiSeNet
+from uniface.visualization import vis_parsing_maps
+
+# Initialize parser
+parser = BiSeNet()  # Uses ResNet18 by default
+
+# Parse face image (already cropped)
+face_image = cv2.imread("face.jpg")
+mask = parser.parse(face_image)
+
+# Visualize with overlay
+face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
+vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)
+
+# mask contains 19 classes: skin, eyes, nose, mouth, hair, etc.
+print(f"Unique classes: {len(np.unique(mask))}")
+```
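
As an illustrative follow-on (not part of the commit), per-component pixel coverage can be computed from the mask using the `FACE_PARSING_LABELS` list this commit adds in `uniface/visualization.py`:

```python
import numpy as np

from uniface.visualization import FACE_PARSING_LABELS

def component_coverage(mask: np.ndarray) -> dict:
    """Fraction of pixels assigned to each facial component present in the mask."""
    indices, counts = np.unique(mask, return_counts=True)
    return {FACE_PARSING_LABELS[i]: c / mask.size for i, c in zip(indices, counts)}
```
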
---

## Documentation
@@ -282,6 +304,12 @@ faces = detect_faces(image, method='retinaface', conf_thresh=0.8) # methods: re
| ------------- | ------------------------------------------ | ------------------------------------ |
| `MobileGaze` | `model_name=GazeWeights.RESNET34` | Returns (pitch, yaw) angles in radians; trained on Gaze360 |
+
+**Face Parsing**
+
+| Class | Key params (defaults) | Notes |
+| ---------- | ---------------------------------------- | ------------------------------------ |
+| `BiSeNet` | `model_name=ParsingWeights.RESNET18`, `input_size=(512, 512)` | 19 facial component classes; BiSeNet architecture with ResNet backbone |

---

## Model Performance
@@ -328,6 +356,7 @@ Interactive examples covering common face analysis tasks:
| **Face Recognition** | Extract face embeddings and compare faces | [face_analyzer.ipynb](examples/face_analyzer.ipynb) |
| **Face Verification** | Compare two faces to verify identity | [face_verification.ipynb](examples/face_verification.ipynb) |
| **Face Search** | Find a person in a group photo | [face_search.ipynb](examples/face_search.ipynb) |
+| **Face Parsing** | Segment face into semantic components | [face_parsing.ipynb](examples/face_parsing.ipynb) |
| **Gaze Estimation** | Estimate gaze direction from face images | [gaze_estimation.ipynb](examples/gaze_estimation.ipynb) |

### Webcam Face Detection
@@ -519,6 +548,7 @@ uniface/
│   ├── detection/      # Face detection models
│   ├── recognition/    # Face recognition models
│   ├── landmark/       # Landmark detection
+│   ├── parsing/        # Face parsing
│   ├── gaze/           # Gaze estimation
│   ├── attribute/      # Age, gender, emotion
│   ├── onnx_utils.py   # ONNX Runtime utilities
@@ -536,6 +566,7 @@ uniface/
- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) - PyTorch implementation and training code
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) - ONNX inference implementation
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
+- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet face parsing training code and pretrained weights
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights

examples/face_parsing.ipynb (new file, 387 lines)

File diff suppressed because one or more lines are too long
@@ -1,7 +1,7 @@
[project]
name = "uniface"
-version = "1.4.0"
+version = "1.5.0"
-description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Gaze Estimation, Age, and Gender Detection"
+description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Face Parsing, Gaze Estimation, Age, and Gender Detection"
readme = "README.md"
license = { text = "MIT" }
authors = [{ name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" }]
@@ -14,6 +14,8 @@ keywords = [
    "face-detection",
    "face-recognition",
    "facial-landmarks",
+    "face-parsing",
+    "face-segmentation",
    "gaze-estimation",
    "age-detection",
    "gender-detection",
@@ -22,6 +24,7 @@ keywords = [
    "onnx",
    "onnxruntime",
    "face-analysis",
+    "bisenet",
]

classifiers = [

scripts/run_face_parsing.py (new file, 126 lines)
@@ -0,0 +1,126 @@
# Face parsing on detected faces
# Usage: python run_face_parsing.py --image path/to/image.jpg
#        python run_face_parsing.py --webcam

import argparse
import os
from pathlib import Path

import cv2

from uniface import RetinaFace
from uniface.constants import ParsingWeights
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps


def process_image(detector, parser, image_path: str, save_dir: str = 'outputs'):
    image = cv2.imread(image_path)
    if image is None:
        print(f"Error: Failed to load image from '{image_path}'")
        return

    faces = detector.detect(image)
    print(f'Detected {len(faces)} face(s)')

    result_image = image.copy()

    for i, face in enumerate(faces):
        bbox = face['bbox']
        x1, y1, x2, y2 = map(int, bbox[:4])
        face_crop = image[y1:y2, x1:x2]

        if face_crop.size == 0:
            continue

        # Parse the face
        mask = parser.parse(face_crop)
        print(f'  Face {i + 1}: parsed with {len(set(mask.flatten()))} unique classes')

        # Visualize the parsing result
        face_crop_rgb = cv2.cvtColor(face_crop, cv2.COLOR_BGR2RGB)
        vis_result = vis_parsing_maps(face_crop_rgb, mask, save_image=False)

        # Place the visualization back on the original image
        result_image[y1:y2, x1:x2] = vis_result

        # Draw bounding box
        cv2.rectangle(result_image, (x1, y1), (x2, y2), (0, 255, 0), 2)

    os.makedirs(save_dir, exist_ok=True)
    output_path = os.path.join(save_dir, f'{Path(image_path).stem}_parsing.jpg')
    cv2.imwrite(output_path, result_image)
    print(f'Output saved: {output_path}')


def run_webcam(detector, parser):
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        print('Cannot open webcam')
        return

    print("Press 'q' to quit")

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame = cv2.flip(frame, 1)
        faces = detector.detect(frame)

        for face in faces:
            bbox = face['bbox']
            x1, y1, x2, y2 = map(int, bbox[:4])
            face_crop = frame[y1:y2, x1:x2]

            if face_crop.size == 0:
                continue

            # Parse the face
            mask = parser.parse(face_crop)

            # Visualize the parsing result
            face_crop_rgb = cv2.cvtColor(face_crop, cv2.COLOR_BGR2RGB)
            vis_result = vis_parsing_maps(face_crop_rgb, mask, save_image=False)

            # Place the visualization back on the frame
            frame[y1:y2, x1:x2] = vis_result

            # Draw bounding box
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('Face Parsing', frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


def main():
    parser_arg = argparse.ArgumentParser(description='Run face parsing')
    parser_arg.add_argument('--image', type=str, help='Path to input image')
    parser_arg.add_argument('--webcam', action='store_true', help='Use webcam')
    parser_arg.add_argument('--save_dir', type=str, default='outputs')
    parser_arg.add_argument(
        '--model', type=str, default=ParsingWeights.RESNET18, choices=[ParsingWeights.RESNET18, ParsingWeights.RESNET34]
    )
    args = parser_arg.parse_args()

    if not args.image and not args.webcam:
        parser_arg.error('Either --image or --webcam must be specified')

    detector = RetinaFace()
    # Honor the --model flag (convert the CLI string to the enum)
    parser = BiSeNet(model_name=ParsingWeights(args.model))

    if args.webcam:
        run_webcam(detector, parser)
    else:
        process_image(detector, parser, args.image, args.save_dir)


if __name__ == '__main__':
    main()
tests/test_parsing.py (new file, 118 lines)

@@ -0,0 +1,118 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

import numpy as np
import pytest

from uniface.constants import ParsingWeights
from uniface.parsing import BiSeNet, create_face_parser


def test_bisenet_initialization():
    """Test BiSeNet initialization."""
    parser = BiSeNet()
    assert parser is not None
    assert parser.input_size == (512, 512)


def test_bisenet_with_different_models():
    """Test BiSeNet with different model weights."""
    parser_resnet18 = BiSeNet(model_name=ParsingWeights.RESNET18)
    parser_resnet34 = BiSeNet(model_name=ParsingWeights.RESNET34)

    assert parser_resnet18 is not None
    assert parser_resnet34 is not None


def test_bisenet_preprocess():
    """Test preprocessing."""
    parser = BiSeNet()

    # Create a dummy face image
    face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)

    # Preprocess
    preprocessed = parser.preprocess(face_image)

    assert preprocessed.shape == (1, 3, 512, 512)
    assert preprocessed.dtype == np.float32


def test_bisenet_postprocess():
    """Test postprocessing."""
    parser = BiSeNet()

    # Create dummy model output (batch_size=1, num_classes=19, H=512, W=512)
    dummy_output = np.random.randn(1, 19, 512, 512).astype(np.float32)

    # Postprocess
    mask = parser.postprocess(dummy_output, original_size=(256, 256))

    assert mask.shape == (256, 256)
    assert mask.dtype == np.uint8
    assert mask.min() >= 0
    assert mask.max() < 19  # 19 classes (0-18)


def test_bisenet_parse():
    """Test end-to-end parsing."""
    parser = BiSeNet()

    # Create a dummy face image
    face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)

    # Parse
    mask = parser.parse(face_image)

    assert mask.shape == (256, 256)
    assert mask.dtype == np.uint8
    assert mask.min() >= 0
    assert mask.max() < 19


def test_bisenet_callable():
    """Test that BiSeNet is callable."""
    parser = BiSeNet()
    face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)

    # Should work as callable
    mask = parser(face_image)

    assert mask.shape == (256, 256)
    assert mask.dtype == np.uint8


def test_create_face_parser_with_enum():
    """Test factory function with enum."""
    parser = create_face_parser(ParsingWeights.RESNET18)
    assert parser is not None
    assert isinstance(parser, BiSeNet)


def test_create_face_parser_with_string():
    """Test factory function with string."""
    parser = create_face_parser('parsing_resnet18')
    assert parser is not None
    assert isinstance(parser, BiSeNet)


def test_create_face_parser_invalid_model():
    """Test factory function with invalid model name."""
    with pytest.raises(ValueError, match='Unknown face parsing model'):
        create_face_parser('invalid_model')


def test_bisenet_different_input_sizes():
    """Test parsing with different input image sizes."""
    parser = BiSeNet()

    # Test with different sizes
    sizes = [(128, 128), (256, 256), (512, 512), (640, 480)]

    for h, w in sizes:
        face_image = np.random.randint(0, 255, (h, w, 3), dtype=np.uint8)
        mask = parser.parse(face_image)

        assert mask.shape == (h, w), f'Failed for size {h}x{w}'
        assert mask.dtype == np.uint8
@@ -13,13 +13,13 @@
__license__ = 'MIT'
__author__ = 'Yakhyokhuja Valikhujaev'
-__version__ = '1.4.0'
+__version__ = '1.5.0'

from uniface.face_utils import compute_similarity, face_alignment
from uniface.log import Logger, enable_logging
from uniface.model_store import verify_model_weights
-from uniface.visualization import draw_detections
+from uniface.visualization import draw_detections, vis_parsing_maps

from .analyzer import FaceAnalyzer
from .attribute import AgeGender
@@ -39,6 +39,7 @@ from .detection import (
)
from .gaze import MobileGaze, create_gaze_estimator
from .landmark import Landmark106, create_landmarker
+from .parsing import BiSeNet, create_face_parser
from .recognition import ArcFace, MobileFace, SphereFace, create_recognizer

__all__ = [
@@ -50,6 +51,7 @@ __all__ = [
    'FaceAnalyzer',
    # Factory functions
    'create_detector',
+    'create_face_parser',
    'create_gaze_estimator',
    'create_landmarker',
    'create_recognizer',
@@ -67,12 +69,15 @@ __all__ = [
    'Landmark106',
    # Gaze models
    'MobileGaze',
+    # Parsing models
+    'BiSeNet',
    # Attribute models
    'AgeGender',
    'Emotion',
    # Utilities
    'compute_similarity',
    'draw_detections',
+    'vis_parsing_maps',
    'face_alignment',
    'verify_model_weights',
    'Logger',
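
With these exports in place, the new API is reachable from the package root (a small sketch, not part of the diff):

```python
import uniface

parser = uniface.create_face_parser()  # BiSeNet with RESNET18 weights by default
print(type(parser).__name__)           # -> 'BiSeNet'
print('vis_parsing_maps' in uniface.__all__)  # -> True
```
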
@@ -109,6 +109,16 @@ class GazeWeights(str, Enum):
    MOBILEONE_S0 = "gaze_mobileone_s0"


+class ParsingWeights(str, Enum):
+    """
+    Face Parsing: Semantic Segmentation of Facial Components.
+
+    Trained on CelebAMask-HQ dataset.
+    https://github.com/yakhyo/face-parsing
+    """
+
+    RESNET18 = "parsing_resnet18"
+    RESNET34 = "parsing_resnet34"
+
+
MODEL_URLS: Dict[Enum, str] = {
    # RetinaFace
    RetinaFaceWeights.MNET_025: 'https://github.com/yakhyo/uniface/releases/download/weights/retinaface_mv1_0.25.onnx',
@@ -148,6 +158,9 @@ MODEL_URLS: Dict[Enum, str] = {
    GazeWeights.RESNET50: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/resnet50_gaze.onnx',
    GazeWeights.MOBILENET_V2: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/mobilenetv2_gaze.onnx',
    GazeWeights.MOBILEONE_S0: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/mobileone_s0_gaze.onnx',
+    # Parsing
+    ParsingWeights.RESNET18: 'https://github.com/yakhyo/face-parsing/releases/download/weights/resnet18.onnx',
+    ParsingWeights.RESNET34: 'https://github.com/yakhyo/face-parsing/releases/download/weights/resnet34.onnx',
}

MODEL_SHA256: Dict[Enum, str] = {
@@ -189,6 +202,9 @@ MODEL_SHA256: Dict[Enum, str] = {
    GazeWeights.RESNET50: 'e1eaf98f5ec7c89c6abe7cfe39f7be83e747163f98d1ff945c0603b3c521be22',
    GazeWeights.MOBILENET_V2: 'fdcdb84e3e6421b5a79e8f95139f249fc258d7f387eed5ddac2b80a9a15ce076',
    GazeWeights.MOBILEONE_S0: 'c0b5a4f4a0ffd24f76ab3c1452354bb2f60110899fd9a88b464c75bafec0fde8',
+    # Face Parsing
+    ParsingWeights.RESNET18: '0d9bd318e46987c3bdbfacae9e2c0f461cae1c6ac6ea6d43bbe541a91727e33f',
+    ParsingWeights.RESNET34: '5b805bba7b5660ab7070b5a381dcf75e5b3e04199f1e9387232a77a00095102e',
}

CHUNK_SIZE = 8192
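
For illustration (not part of the commit), a download-integrity check against tables like `MODEL_SHA256` can be performed by streaming the file through SHA-256 in fixed-size blocks; `verify_model_weights` is the library's actual entry point, and its internals are not shown in this diff:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 8192) -> str:
    """Hash a file in chunks, mirroring the CHUNK_SIZE constant above."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```
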
uniface/parsing/__init__.py (new file, 61 lines)

@@ -0,0 +1,61 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from typing import Union

from uniface.constants import ParsingWeights

from .base import BaseFaceParser
from .bisenet import BiSeNet

__all__ = ['BaseFaceParser', 'BiSeNet', 'create_face_parser']


def create_face_parser(
    model_name: Union[str, ParsingWeights] = ParsingWeights.RESNET18,
) -> BaseFaceParser:
    """
    Factory function to create a face parsing model instance.

    This function provides a convenient way to instantiate face parsing models
    without directly importing the specific model classes. It supports both
    string-based and enum-based model selection.

    Args:
        model_name (Union[str, ParsingWeights]): The face parsing model to create.
            Can be either a string or a ParsingWeights enum value.
            Available options:
            - 'parsing_resnet18' or ParsingWeights.RESNET18 (default)
            - 'parsing_resnet34' or ParsingWeights.RESNET34

    Returns:
        BaseFaceParser: An instance of the requested face parsing model.

    Raises:
        ValueError: If the model_name is not recognized.

    Examples:
        >>> # Using enum
        >>> from uniface.parsing import create_face_parser
        >>> from uniface.constants import ParsingWeights
        >>> parser = create_face_parser(ParsingWeights.RESNET18)
        >>>
        >>> # Using string
        >>> parser = create_face_parser('parsing_resnet18')
        >>>
        >>> # Parse a face image
        >>> mask = parser.parse(face_crop)
    """
    # Convert string to enum if necessary
    if isinstance(model_name, str):
        try:
            model_name = ParsingWeights(model_name)
        except ValueError as e:
            valid_models = [e.value for e in ParsingWeights]
            raise ValueError(
                f"Unknown face parsing model: '{model_name}'. Valid options are: {', '.join(valid_models)}"
            ) from e

    # All parsing models use the same BiSeNet class
    return BiSeNet(model_name=model_name)
uniface/parsing/base.py (new file, 106 lines)

@@ -0,0 +1,106 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from abc import ABC, abstractmethod
from typing import Tuple

import numpy as np


class BaseFaceParser(ABC):
    """
    Abstract base class for all face parsing models.

    This class defines the common interface that all face parsing models must implement,
    ensuring consistency across different parsing methods. Face parsing segments a face
    image into semantic regions such as skin, eyes, nose, mouth, hair, etc.

    The output is a segmentation mask where each pixel is assigned a class label
    representing a facial component.
    """

    @abstractmethod
    def _initialize_model(self) -> None:
        """
        Initialize the underlying model for inference.

        This method should handle loading model weights, creating the
        inference session (e.g., ONNX Runtime), and any necessary
        setup procedures to prepare the model for prediction.

        Raises:
            RuntimeError: If the model fails to load or initialize.
        """
        raise NotImplementedError('Subclasses must implement the _initialize_model method.')

    @abstractmethod
    def preprocess(self, face_image: np.ndarray) -> np.ndarray:
        """
        Preprocess the input face image for model inference.

        This method should take a raw face crop and convert it into the format
        expected by the model's inference engine (e.g., normalized tensor).

        Args:
            face_image (np.ndarray): A face image in BGR format with shape (H, W, C).

        Returns:
            np.ndarray: The preprocessed image tensor ready for inference,
                typically with shape (1, C, H, W).
        """
        raise NotImplementedError('Subclasses must implement the preprocess method.')

    @abstractmethod
    def postprocess(self, outputs: np.ndarray, original_size: Tuple[int, int]) -> np.ndarray:
        """
        Postprocess raw model outputs into a segmentation mask.

        This method takes the raw output from the model's inference and
        converts it into a segmentation mask at the original image size.

        Args:
            outputs (np.ndarray): Raw outputs from the model inference.
            original_size (Tuple[int, int]): Original image size (width, height).

        Returns:
            np.ndarray: Segmentation mask with the same size as the original image.
        """
        raise NotImplementedError('Subclasses must implement the postprocess method.')

    @abstractmethod
    def parse(self, face_image: np.ndarray) -> np.ndarray:
        """
        Perform end-to-end face parsing on a face image.

        This method orchestrates the full pipeline: preprocessing the input,
        running inference, and postprocessing to return the segmentation mask.

        Args:
            face_image (np.ndarray): A face image in BGR format. The face should
                be roughly centered and well-framed within the image.

        Returns:
            np.ndarray: Segmentation mask with the same size as input image,
                where each pixel value represents a facial component class.

        Example:
            >>> parser = create_face_parser()
            >>> mask = parser.parse(face_crop)
            >>> print(f"Mask shape: {mask.shape}, unique classes: {np.unique(mask)}")
        """
        raise NotImplementedError('Subclasses must implement the parse method.')

    def __call__(self, face_image: np.ndarray) -> np.ndarray:
        """
        Provides a convenient, callable shortcut for the `parse` method.

        Args:
            face_image (np.ndarray): A face image in BGR format.

        Returns:
            np.ndarray: Segmentation mask with the same size as input image.
        """
        return self.parse(face_image)
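
To make the contract concrete, here is a hypothetical toy subclass (purely illustrative, not part of the commit): it satisfies the interface by labeling every pixel as class 1 ('skin'):

```python
from typing import Tuple

import numpy as np

class ConstantParser(BaseFaceParser):
    """Toy parser that marks every pixel as skin; demonstrates the interface only."""

    def _initialize_model(self) -> None:
        pass  # nothing to load for this toy example

    def preprocess(self, face_image: np.ndarray) -> np.ndarray:
        # (H, W, C) uint8 -> (1, C, H, W) float32, as the interface expects
        return np.transpose(face_image, (2, 0, 1))[None].astype(np.float32)

    def postprocess(self, outputs: np.ndarray, original_size: Tuple[int, int]) -> np.ndarray:
        width, height = original_size
        return np.ones((height, width), dtype=np.uint8)  # class 1 everywhere

    def parse(self, face_image: np.ndarray) -> np.ndarray:
        original_size = (face_image.shape[1], face_image.shape[0])
        return self.postprocess(self.preprocess(face_image), original_size)
```
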
uniface/parsing/bisenet.py (new file, 166 lines)

@@ -0,0 +1,166 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from typing import Tuple

import cv2
import numpy as np

from uniface.constants import ParsingWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session

from .base import BaseFaceParser

__all__ = ['BiSeNet']


class BiSeNet(BaseFaceParser):
    """
    BiSeNet: Bilateral Segmentation Network for Face Parsing with ONNX Runtime.

    BiSeNet is a semantic segmentation model that segments a face image into
    different facial components such as skin, eyes, nose, mouth, hair, etc. The model
    uses a BiSeNet architecture with ResNet backbone and outputs a segmentation mask
    where each pixel is assigned a class label.

    The model supports 19 facial component classes including:
    - Background, skin, eyebrows, eyes, nose, mouth, lips, ears, hair, etc.

    Reference:
        https://github.com/yakhyo/face-parsing

    Args:
        model_name (ParsingWeights): The enum specifying the parsing model to load.
            Options: RESNET18, RESNET34.
            Defaults to `ParsingWeights.RESNET18`.
        input_size (Tuple[int, int]): The resolution (width, height) for the model's
            input. Defaults to (512, 512).

    Attributes:
        input_size (Tuple[int, int]): Model input dimensions.
        input_mean (np.ndarray): Per-channel mean values for normalization (ImageNet).
        input_std (np.ndarray): Per-channel std values for normalization (ImageNet).

    Example:
        >>> from uniface.parsing import BiSeNet
        >>> from uniface import RetinaFace
        >>>
        >>> detector = RetinaFace()
        >>> parser = BiSeNet()
        >>>
        >>> # Detect faces and parse each face
        >>> faces = detector.detect(image)
        >>> for face in faces:
        ...     bbox = face['bbox']
        ...     x1, y1, x2, y2 = map(int, bbox[:4])
        ...     face_crop = image[y1:y2, x1:x2]
        ...     mask = parser.parse(face_crop)
        ...     print(f"Mask shape: {mask.shape}, unique classes: {np.unique(mask)}")
    """

    def __init__(
        self,
        model_name: ParsingWeights = ParsingWeights.RESNET18,
        input_size: Tuple[int, int] = (512, 512),
    ) -> None:
        Logger.info(f'Initializing BiSeNet with model={model_name}, input_size={input_size}')

        self.input_size = input_size
        self.input_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        self.input_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

        self.model_path = verify_model_weights(model_name)
        self._initialize_model()

    def _initialize_model(self) -> None:
        """
        Initialize the ONNX model from the stored model path.

        Raises:
            RuntimeError: If the model fails to load or initialize.
        """
        try:
            self.session = create_onnx_session(self.model_path)

            # Get input configuration
            input_cfg = self.session.get_inputs()[0]
            input_shape = input_cfg.shape
            self.input_name = input_cfg.name
            self.input_size = tuple(input_shape[2:4][::-1])  # Update from model

            # Get output configuration
            outputs = self.session.get_outputs()
            self.output_names = [output.name for output in outputs]

            Logger.info(f'BiSeNet initialized with input size {self.input_size}')

        except Exception as e:
            Logger.error(f"Failed to load parsing model from '{self.model_path}'", exc_info=True)
            raise RuntimeError(f'Failed to initialize parsing model: {e}') from e

    def preprocess(self, face_image: np.ndarray) -> np.ndarray:
        """
        Preprocess a face image for parsing.

        Args:
            face_image (np.ndarray): A face image in BGR format.

        Returns:
            np.ndarray: Preprocessed image tensor with shape (1, 3, H, W).
        """
        # Convert BGR to RGB
        image = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)

        # Resize to model input size
        image = cv2.resize(image, self.input_size, interpolation=cv2.INTER_LINEAR)

        # Scale to [0, 1] and apply ImageNet mean/std normalization
        image = image.astype(np.float32) / 255.0
        image = (image - self.input_mean) / self.input_std

        # HWC -> CHW -> NCHW
        image = np.transpose(image, (2, 0, 1))
        image = np.expand_dims(image, axis=0).astype(np.float32)

        return image

    def postprocess(self, outputs: np.ndarray, original_size: Tuple[int, int]) -> np.ndarray:
        """
        Postprocess model output to segmentation mask.

        Args:
            outputs (np.ndarray): Raw model output.
            original_size (Tuple[int, int]): Original image size (width, height).

        Returns:
            np.ndarray: Segmentation mask resized to original dimensions.
        """
        # Get the class with highest probability for each pixel
        predicted_mask = outputs.squeeze(0).argmax(0).astype(np.uint8)

        # Resize back to original size
        restored_mask = cv2.resize(predicted_mask, original_size, interpolation=cv2.INTER_NEAREST)

        return restored_mask

    def parse(self, face_image: np.ndarray) -> np.ndarray:
        """
        Perform end-to-end face parsing on a face image.

        This method orchestrates the full pipeline: preprocessing the input,
        running inference, and postprocessing to return the segmentation mask.

        Args:
            face_image (np.ndarray): A face image in BGR format.

        Returns:
            np.ndarray: Segmentation mask with the same size as input image.
        """
        original_size = (face_image.shape[1], face_image.shape[0])  # (width, height)
        input_tensor = self.preprocess(face_image)
        outputs = self.session.run(self.output_names, {self.input_name: input_tensor})

        return self.postprocess(outputs[0], original_size)
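
To make the shape flow concrete, a quick sketch with dummy data (mirroring the assertions in `tests/test_parsing.py` added by this commit; running it downloads the RESNET18 weights on first use):

```python
import numpy as np

from uniface.parsing import BiSeNet

parser = BiSeNet()
face = np.random.randint(0, 255, (240, 180, 3), dtype=np.uint8)  # dummy BGR crop

tensor = parser.preprocess(face)
assert tensor.shape == (1, 3, 512, 512) and tensor.dtype == np.float32

mask = parser.parse(face)
assert mask.shape == (240, 180) and mask.dtype == np.uint8  # values in 0-18
```
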
@@ -7,6 +7,52 @@ from typing import List, Tuple, Union
import cv2
import numpy as np

+# Face parsing component names (19 classes)
+FACE_PARSING_LABELS = [
+    'background',
+    'skin',
+    'l_brow',
+    'r_brow',
+    'l_eye',
+    'r_eye',
+    'eye_g',
+    'l_ear',
+    'r_ear',
+    'ear_r',
+    'nose',
+    'mouth',
+    'u_lip',
+    'l_lip',
+    'neck',
+    'neck_l',
+    'cloth',
+    'hair',
+    'hat',
+]
+
+# Color palette for face parsing visualization
+FACE_PARSING_COLORS = [
+    [0, 0, 0],
+    [255, 85, 0],
+    [255, 170, 0],
+    [255, 0, 85],
+    [255, 0, 170],
+    [0, 255, 0],
+    [85, 255, 0],
+    [170, 255, 0],
+    [0, 255, 85],
+    [0, 255, 170],
+    [0, 0, 255],
+    [85, 0, 255],
+    [170, 0, 255],
+    [0, 85, 255],
+    [0, 170, 255],
+    [255, 255, 0],
+    [255, 255, 85],
+    [255, 255, 170],
+    [255, 0, 255],
+]
+
+
def draw_detections(
    *,
@@ -220,3 +266,65 @@ def draw_gaze(
        (255, 255, 255),
        font_thickness,
    )

+def vis_parsing_maps(
+    image: np.ndarray,
+    segmentation_mask: np.ndarray,
+    *,
+    save_image: bool = False,
+    save_path: str = 'result.png',
+) -> np.ndarray:
+    """
+    Visualizes a face parsing segmentation mask by overlaying colored regions on the image.
+
+    Args:
+        image: Input face image in RGB format with shape (H, W, 3).
+        segmentation_mask: Segmentation mask with shape (H, W) where each pixel
+            value represents a facial component class (0-18).
+        save_image: Whether to save the visualization to disk. Defaults to False.
+        save_path: Path to save the visualization if save_image is True.
+
+    Returns:
+        np.ndarray: Blended image with segmentation overlay in BGR format.
+
+    Example:
+        >>> import cv2
+        >>> from uniface.parsing import BiSeNet
+        >>> from uniface.visualization import vis_parsing_maps
+        >>>
+        >>> parser = BiSeNet()
+        >>> face_image = cv2.imread('face.jpg')
+        >>> mask = parser.parse(face_image)
+        >>>
+        >>> # Visualize
+        >>> face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
+        >>> result = vis_parsing_maps(face_rgb, mask)
+        >>> cv2.imwrite('parsed_face.jpg', result)
+    """
+    # Copy the inputs and ensure uint8 dtype
+    image = np.array(image).copy().astype(np.uint8)
+    segmentation_mask = segmentation_mask.copy().astype(np.uint8)
+
+    # Create a color mask
+    segmentation_mask_color = np.zeros((segmentation_mask.shape[0], segmentation_mask.shape[1], 3))
+
+    num_classes = np.max(segmentation_mask)
+
+    for class_index in range(1, num_classes + 1):
+        class_pixels = np.where(segmentation_mask == class_index)
+        segmentation_mask_color[class_pixels[0], class_pixels[1], :] = FACE_PARSING_COLORS[class_index]
+
+    segmentation_mask_color = segmentation_mask_color.astype(np.uint8)
+
+    # Convert image to BGR format for blending
+    bgr_image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
+
+    # Blend the image with the segmentation mask
+    blended_image = cv2.addWeighted(bgr_image, 0.6, segmentation_mask_color, 0.4, 0)
+
+    # Save the result if required
+    if save_image:
+        cv2.imwrite(save_path, blended_image, [int(cv2.IMWRITE_JPEG_QUALITY), 100])
+
+    return blended_image
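
A small illustrative addition (not part of the commit): a color legend can be rendered from the palette and label list defined above, which helps when reading the overlay:

```python
import cv2
import numpy as np

def parsing_legend(swatch: int = 24, width: int = 180) -> np.ndarray:
    """Stack one colored row per class, annotated with its component name."""
    rows = []
    for name, color in zip(FACE_PARSING_LABELS, FACE_PARSING_COLORS):
        row = np.full((swatch, width, 3), color, dtype=np.uint8)
        cv2.putText(row, name, (swatch + 6, swatch - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
        rows.append(row)
    return np.vstack(rows)

# cv2.imwrite('parsing_legend.png', parsing_legend())
```
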