Compare commits

18 commits:

02c77ce5db
d70d6a254f
7d37633b1a
bc413df4a8
8db0577991
3682a2124f
2ef6a1ebe8
78a2dba7c7
87e496d1f5
5604ebf4f1
971775b2e8
c520ea2df2
2a8cb54d31
331f46be7c
9991fae62a
b74ab95d39
d2b0303bfe
5f74487eb3
Binary files changed:

- .github/logos/gaze_crop.png (vendored, BIN): before 716 KiB
- .github/logos/gaze_org.png (vendored, BIN): before 673 KiB
- .github/logos/logo_preview.jpg (vendored, BIN): before 826 KiB
- .github/logos/logo_readme.png (vendored, BIN): before 563 KiB
- .github/logos/logo_web.webp (vendored, BIN): before 33 KiB
- .github/logos/uniface_enhanced.webp (vendored, BIN, new file): after 427 KiB
- .github/logos/uniface_high_res_original.png (vendored, BIN, new file): after 1.7 MiB
- .github/logos/uniface_rounded.png (vendored, BIN, new file): after 1.8 MiB
- .github/logos/uniface_rounded_150px.png (vendored, BIN, new file): after 1.9 MiB
- .github/logos/uniface_rounded_q80.png (vendored, BIN, new file): after 872 KiB
- .github/logos/uniface_rounded_q80.webp (vendored, BIN, new file): after 62 KiB
8 changes: .github/workflows/ci.yml (vendored)

@@ -1,4 +1,4 @@
name: CI
name: Build

on:
  push:
@@ -20,7 +20,7 @@ jobs:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
          python-version: "3.11"
      - uses: pre-commit/action@v3.0.1

  test:
@@ -34,9 +34,11 @@ jobs:
        include:
          # Full Python range on Linux (fastest runner)
          - os: ubuntu-latest
            python-version: "3.10"
            python-version: "3.11"
          - os: ubuntu-latest
            python-version: "3.13"
          - os: ubuntu-latest
            python-version: "3.14"
          - os: macos-latest
            python-version: "3.13"
          - os: windows-latest
3 changes: .github/workflows/docs.yml (vendored)

@@ -2,7 +2,8 @@ name: Deploy docs

on:
  push:
    branches: [main]
    tags:
      - "v*.*.*"
  workflow_dispatch:

permissions:
4 changes: .github/workflows/publish.yml (vendored)

@@ -54,7 +54,7 @@ jobs:
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.13"]
        python-version: ["3.11", "3.13"]

    steps:
      - name: Checkout code
@@ -92,7 +92,7 @@ jobs:
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
          python-version: "3.11"
          cache: 'pip'

      - name: Install build tools
1 change: .gitignore (vendored)

@@ -1,4 +1,5 @@
tmp_*
.vscode/

# Byte-compiled / optimized / DLL files
__pycache__/
@@ -59,12 +59,12 @@ This project uses [Ruff](https://docs.astral.sh/ruff/) for linting and formattin
#### General Rules

- **Line length:** 120 characters maximum
- **Python version:** 3.10+ (use modern syntax)
- **Python version:** 3.11+ (use modern syntax)
- **Quote style:** Single quotes for strings, double quotes for docstrings

#### Type Hints

Use modern Python 3.10+ type hints (PEP 585 and PEP 604):
Use modern Python 3.11+ type hints (PEP 585 and PEP 604):

```python
# Preferred (modern)
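# (The remainder of this example falls outside the diff window. A representative
# modern/legacy pair, inferred from the old-style signature visible in the next
# hunk header; the return type there is truncated, so tuple[str, ...] is a guess.)
def process(items: list[str], config: dict[str, int] | None = None) -> tuple[str, ...]:
    ...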
```
@@ -82,23 +82,23 @@ def process(items: List[str], config: Optional[Dict[str, int]] = None) -> Tuple[
Use [Google-style docstrings](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings) for all public APIs:

```python
def detect_faces(image: np.ndarray, threshold: float = 0.5) -> list[Face]:
    """Detect faces in an image.
def create_detector(method: str = 'retinaface', **kwargs: Any) -> BaseDetector:
    """Factory function to create face detectors.

    Args:
        image: Input image as a numpy array with shape (H, W, C) in BGR format.
        threshold: Confidence threshold for filtering detections. Defaults to 0.5.
        method: Detection method. Options: 'retinaface', 'scrfd', 'yolov5face', 'yolov8face'.
        **kwargs: Detector-specific parameters.

    Returns:
        List of Face objects containing bounding boxes, confidence scores,
        and facial landmarks.
        Initialized detector instance.

    Raises:
        ValueError: If the input image has invalid dimensions.
        ValueError: If method is not supported.

    Example:
        >>> from uniface import detect_faces
        >>> faces = detect_faces(image, threshold=0.8)
        >>> from uniface import create_detector
        >>> detector = create_detector('retinaface', confidence_threshold=0.8)
        >>> faces = detector.detect(image)
        >>> print(f"Found {len(faces)} faces")
    """
```
@@ -174,16 +174,19 @@ When adding a new model or feature:

Example notebooks demonstrating library usage:

| Example | Notebook |
|---------|----------|
| Face Detection | [01_face_detection.ipynb](examples/01_face_detection.ipynb) |
| Face Alignment | [02_face_alignment.ipynb](examples/02_face_alignment.ipynb) |
| Face Verification | [03_face_verification.ipynb](examples/03_face_verification.ipynb) |
| Face Search | [04_face_search.ipynb](examples/04_face_search.ipynb) |
| Face Analyzer | [05_face_analyzer.ipynb](examples/05_face_analyzer.ipynb) |
| Face Parsing | [06_face_parsing.ipynb](examples/06_face_parsing.ipynb) |
| Example | Notebook |
| ------------------ | ------------------------------------------------------------------- |
| Face Detection | [01_face_detection.ipynb](examples/01_face_detection.ipynb) |
| Face Alignment | [02_face_alignment.ipynb](examples/02_face_alignment.ipynb) |
| Face Verification | [03_face_verification.ipynb](examples/03_face_verification.ipynb) |
| Face Search | [04_face_search.ipynb](examples/04_face_search.ipynb) |
| Face Analyzer | [05_face_analyzer.ipynb](examples/05_face_analyzer.ipynb) |
| Face Parsing | [06_face_parsing.ipynb](examples/06_face_parsing.ipynb) |
| Face Anonymization | [07_face_anonymization.ipynb](examples/07_face_anonymization.ipynb) |
| Gaze Estimation | [08_gaze_estimation.ipynb](examples/08_gaze_estimation.ipynb) |
| Face Segmentation | [09_face_segmentation.ipynb](examples/09_face_segmentation.ipynb) |
| Face Vector Store | [10_face_vector_store.ipynb](examples/10_face_vector_store.ipynb) |
| Head Pose Estimation | [11_head_pose_estimation.ipynb](examples/11_head_pose_estimation.ipynb) |

## Questions?

184 changes: README.md

@@ -1,34 +1,39 @@
# UniFace: All-in-One Face Analysis Library
<h1 align="center">UniFace: All-in-One Face Analysis Library</h1>

<div align="center">

[](https://pypi.org/project/uniface/)
[](https://www.python.org/)
[](https://pypi.org/project/uniface/)
[](https://www.python.org/)
[](https://opensource.org/licenses/MIT)
[](https://github.com/yakhyo/uniface/actions)
[](https://github.com/yakhyo/uniface/actions)
[](https://pepy.tech/projects/uniface)
[](https://yakhyo.github.io/uniface/)
[](https://yakhyo.github.io/uniface/)
[](https://www.kaggle.com/yakhyokhuja/code)
[](https://discord.gg/wdzrjr7R5j)

</div>

<div align="center">
  <img src=".github/logos/logo_web.webp" width=80%>
  <img src="https://raw.githubusercontent.com/yakhyo/uniface/main/.github/logos/uniface_rounded_q80.webp" width="90%" alt="UniFace - All-in-One Open-Source Face Analysis Library">
</div>

---

**UniFace** is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides high-performance face detection, recognition, landmark detection, face parsing, gaze estimation, and attribute analysis with hardware acceleration support across platforms.

> 💬 **Have questions?** [Chat with this codebase on DeepWiki](https://deepwiki.com/yakhyo/uniface) - AI-powered docs that let you ask anything about UniFace.

---

## Features

- **Face Detection** — RetinaFace, SCRFD, YOLOv5-Face, and YOLOv8-Face with 5-point landmarks
- **Face Recognition** — ArcFace, MobileFace, and SphereFace embeddings
- **Facial Landmarks** — 106-point landmark localization
- **Face Parsing** — BiSeNet semantic segmentation (19 classes)
- **Face Tracking** — Multi-object tracking with [BYTETracker](https://github.com/yakhyo/bytetrack-tracker) for persistent IDs across video frames
- **Facial Landmarks** — 106-point landmark localization module (separate from 5-point detector landmarks)
- **Face Parsing** — BiSeNet semantic segmentation (19 classes), XSeg face masking
- **Gaze Estimation** — Real-time gaze direction with MobileGaze
- **Head Pose Estimation** — 3D head orientation (pitch, yaw, roll) with 6D rotation representation
- **Attribute Analysis** — Age, gender, race (FairFace), and emotion
- **Vector Indexing** — FAISS-backed embedding store for fast multi-identity search
- **Anti-Spoofing** — Face liveness detection with MiniFASNet
- **Face Anonymization** — 5 blur methods for privacy protection
- **Hardware Acceleration** — ARM64 (Apple Silicon), CUDA (NVIDIA), CPU
@@ -37,31 +42,72 @@

## Installation

**Standard installation**

```bash
# Standard installation
pip install uniface
```

# GPU support (CUDA)
**GPU support (CUDA)**

```bash
pip install uniface[gpu]
```

# From source
**From source (latest version)**

```bash
git clone https://github.com/yakhyo/uniface.git
cd uniface && pip install -e .
```

**FAISS vector indexing**

```bash
pip install faiss-cpu  # or faiss-gpu for CUDA
```

**Optional dependencies**

- Emotion model uses TorchScript and requires `torch`:
  `pip install torch` (choose the correct build for your OS/CUDA)
- YOLOv5-Face and YOLOv8-Face support faster NMS with `torchvision`:
  `pip install torch torchvision` then use `nms_mode='torchvision'` (see the sketch below)
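
For illustration, a minimal sketch of enabling the faster NMS path (assuming `nms_mode` is a constructor parameter of the YOLO detectors, as the note above implies):

```python
from uniface.detection import YOLOv5Face

# Requires torch and torchvision to be installed
detector = YOLOv5Face(nms_mode='torchvision')
```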

---

## Quick Example
## Model Downloads and Cache

Models are downloaded automatically on first use and verified via SHA-256.

Default cache location: `~/.uniface/models`

Override with the programmatic API or environment variable:

```python
from uniface.model_store import get_cache_dir, set_cache_dir

set_cache_dir('/data/models')
print(get_cache_dir())  # /data/models
```

```bash
export UNIFACE_CACHE_DIR=/data/models
```

---

## Quick Example (Detection)

```python
import cv2
from uniface import RetinaFace
from uniface.detection import RetinaFace

# Initialize detector (models auto-download on first use)
detector = RetinaFace()

# Detect faces
image = cv2.imread("photo.jpg")
if image is None:
    raise ValueError("Failed to load image. Check the path to 'photo.jpg'.")

faces = detector.detect(image)

for face in faces:
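    # (Loop body unchanged by this diff and therefore not shown; a representative
    # sketch, assuming the bbox and confidence fields from the Face dataclass)
    x1, y1, x2, y2 = map(int, face.bbox)
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    print(f"Detected face with confidence {face.confidence:.2f}")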
@@ -71,14 +117,54 @@ for face in faces:
```

<div align="center">
  <img src="assets/test_result.png">
  <img src="https://raw.githubusercontent.com/yakhyo/uniface/main/assets/test_result.png" width="90%">
  <p>Face Detection Model Output</p>
</div>

---

## Example (Face Analyzer)

```python
import cv2
from uniface.analyzer import FaceAnalyzer
from uniface.detection import RetinaFace
from uniface.recognition import ArcFace

detector = RetinaFace()
recognizer = ArcFace()

analyzer = FaceAnalyzer(detector, recognizer=recognizer)

image = cv2.imread("photo.jpg")
if image is None:
    raise ValueError("Failed to load image. Check the path to 'photo.jpg'.")

faces = analyzer.analyze(image)

for face in faces:
    print(face.bbox, face.embedding.shape if face.embedding is not None else None)
```

---

## Execution Providers (ONNX Runtime)

```python
from uniface.detection import RetinaFace

# Force CPU-only inference
detector = RetinaFace(providers=["CPUExecutionProvider"])
```

See more in the docs:
https://yakhyo.github.io/uniface/concepts/execution-providers/

---

## Documentation

📚 **Full documentation**: [yakhyo.github.io/uniface](https://yakhyo.github.io/uniface/)
Full documentation: https://yakhyo.github.io/uniface/

| Resource | Description |
|----------|-------------|
@@ -87,8 +173,28 @@ for face in faces:
| [API Reference](https://yakhyo.github.io/uniface/modules/detection/) | Detailed module documentation |
| [Tutorials](https://yakhyo.github.io/uniface/recipes/image-pipeline/) | Step-by-step workflow examples |
| [Guides](https://yakhyo.github.io/uniface/concepts/overview/) | Architecture and design principles |
| [Datasets](https://yakhyo.github.io/uniface/datasets/) | Training data and evaluation benchmarks |

### Jupyter Notebooks
---

## Datasets

| Task | Training Dataset | Models |
|------|-----------------|--------|
| Detection | WIDER FACE | RetinaFace, SCRFD, YOLOv5-Face, YOLOv8-Face |
| Recognition | MS1MV2 | MobileFace, SphereFace |
| Recognition | WebFace600K | ArcFace |
| Recognition | WebFace4M / 12M | AdaFace |
| Gaze | Gaze360 | MobileGaze |
| Head Pose | 300W-LP | HeadPose (ResNet, MobileNet) |
| Parsing | CelebAMask-HQ | BiSeNet |
| Attributes | CelebA, FairFace, AffectNet | AgeGender, FairFace, Emotion |

> See [Datasets documentation](https://yakhyo.github.io/uniface/datasets/) for download links, benchmarks, and details.

---

## Jupyter Notebooks

| Example | Colab | Description |
|---------|:-----:|-------------|
@@ -100,6 +206,22 @@ for face in faces:
| [06_face_parsing.ipynb](examples/06_face_parsing.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/06_face_parsing.ipynb) | Semantic face segmentation |
| [07_face_anonymization.ipynb](examples/07_face_anonymization.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/07_face_anonymization.ipynb) | Privacy-preserving blur |
| [08_gaze_estimation.ipynb](examples/08_gaze_estimation.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/08_gaze_estimation.ipynb) | Gaze direction estimation |
| [09_face_segmentation.ipynb](examples/09_face_segmentation.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/09_face_segmentation.ipynb) | Face segmentation with XSeg |
| [10_face_vector_store.ipynb](examples/10_face_vector_store.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/10_face_vector_store.ipynb) | FAISS-backed face database |
| [11_head_pose_estimation.ipynb](examples/11_head_pose_estimation.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/11_head_pose_estimation.ipynb) | Head pose estimation (pitch, yaw, roll) |

---

## Licensing and Model Usage

UniFace is MIT-licensed, but several pretrained models carry their own licenses.
Review: https://yakhyo.github.io/uniface/license-attribution/

Notable examples:
- YOLOv5-Face and YOLOv8-Face weights are GPL-3.0
- FairFace weights are CC BY 4.0

If you plan commercial use, verify model license compatibility.

---

@@ -107,12 +229,15 @@ for face in faces:

| Feature | Repository | Training | Description |
|---------|------------|:--------:|-------------|
| Detection | [retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) | [x] | RetinaFace PyTorch Training & Export |
| Detection | [retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) | ✓ | RetinaFace PyTorch Training & Export |
| Detection | [yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) | - | YOLOv5-Face ONNX Inference |
| Detection | [yolov8-face-onnx-inference](https://github.com/yakhyo/yolov8-face-onnx-inference) | - | YOLOv8-Face ONNX Inference |
| Recognition | [face-recognition](https://github.com/yakhyo/face-recognition) | [x] | MobileFace, SphereFace Training |
| Parsing | [face-parsing](https://github.com/yakhyo/face-parsing) | [x] | BiSeNet Face Parsing |
| Gaze | [gaze-estimation](https://github.com/yakhyo/gaze-estimation) | [x] | MobileGaze Training |
| Tracking | [bytetrack-tracker](https://github.com/yakhyo/bytetrack-tracker) | - | BYTETracker Multi-Object Tracking |
| Recognition | [face-recognition](https://github.com/yakhyo/face-recognition) | ✓ | MobileFace, SphereFace Training |
| Parsing | [face-parsing](https://github.com/yakhyo/face-parsing) | ✓ | BiSeNet Face Parsing |
| Parsing | [face-segmentation](https://github.com/yakhyo/face-segmentation) | - | XSeg Face Segmentation |
| Gaze | [gaze-estimation](https://github.com/yakhyo/gaze-estimation) | ✓ | MobileGaze Training |
| Head Pose | [head-pose-estimation](https://github.com/yakhyo/head-pose-estimation) | ✓ | Head Pose Training (6DRepNet-style) |
| Anti-Spoofing | [face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) | - | MiniFASNet Inference |
| Attributes | [fairface-onnx](https://github.com/yakhyo/fairface-onnx) | - | FairFace ONNX Inference |

@@ -122,7 +247,16 @@ for face in faces:

## Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
Contributions are welcome. Please see [CONTRIBUTING.md](CONTRIBUTING.md).

## Support

If you find this project useful, consider giving it a ⭐ on GitHub — it helps others discover it!

Questions or feedback:
- Discord: https://discord.gg/wdzrjr7R5j
- GitHub Issues: https://github.com/yakhyo/uniface/issues
- DeepWiki Q&A: https://deepwiki.com/yakhyo/uniface

## License

BIN assets/einstein/img_0.png (new file): after 99 KiB
@@ -93,7 +93,7 @@ landmarks = face.landmarks # Shape: (5, 2)
Returned by `Landmark106`:

```python
from uniface import Landmark106
from uniface.landmark import Landmark106

landmarker = Landmark106()
landmarks = landmarker.get_landmarks(image, face.bbox)
```
@@ -174,7 +174,7 @@ yaw = -90° ────┼──── yaw = +90°
Face alignment uses 5-point landmarks to normalize face orientation:

```python
from uniface import face_alignment
from uniface.face_utils import face_alignment

# Align face to standard template
aligned_face = face_alignment(image, face.landmarks)
```
@@ -9,7 +9,7 @@ UniFace uses ONNX Runtime for model inference, which supports multiple hardware
UniFace automatically selects the optimal execution provider based on available hardware:

```python
from uniface import RetinaFace
from uniface.detection import RetinaFace

# Automatically uses best available provider
detector = RetinaFace()
```
@@ -17,8 +17,8 @@ detector = RetinaFace()

**Priority order:**

1. **CUDAExecutionProvider** - NVIDIA GPU
2. **CoreMLExecutionProvider** - Apple Silicon
1. **CoreMLExecutionProvider** - Apple Silicon
2. **CUDAExecutionProvider** - NVIDIA GPU
3. **CPUExecutionProvider** - Fallback
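
The priority above only applies to providers that your ONNX Runtime build actually exposes; you can check directly:

```python
import onnxruntime as ort

# Lists the providers compiled into the installed onnxruntime package
print(ort.get_available_providers())
# e.g. ['CoreMLExecutionProvider', 'CPUExecutionProvider'] on Apple Silicon
```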

---
@@ -28,7 +28,8 @@ detector = RetinaFace()
You can specify which execution provider to use by passing the `providers` parameter:

```python
from uniface import RetinaFace, ArcFace
from uniface.detection import RetinaFace
from uniface.recognition import ArcFace

# Force CPU execution (even if GPU is available)
detector = RetinaFace(providers=['CPUExecutionProvider'])
```
@@ -38,16 +39,20 @@ recognizer = ArcFace(providers=['CPUExecutionProvider'])
```python
detector = RetinaFace(providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
```

All model classes accept the `providers` parameter:
All **ONNX-based** model classes accept the `providers` parameter:

- Detection: `RetinaFace`, `SCRFD`, `YOLOv5Face`, `YOLOv8Face`
- Recognition: `ArcFace`, `AdaFace`, `MobileFace`, `SphereFace`
- Landmarks: `Landmark106`
- Gaze: `MobileGaze`
- Parsing: `BiSeNet`
- Parsing: `BiSeNet`, `XSeg`
- Attributes: `AgeGender`, `FairFace`
- Anti-Spoofing: `MiniFASNet`

!!! note "Non-ONNX components"
    - **Emotion** uses TorchScript and selects its device automatically (`mps` / `cuda` / `cpu`). It does **not** accept the `providers` parameter.
    - **BlurFace** is a pure OpenCV utility and does not load any model.

---

## Check Available Providers
@@ -174,7 +179,7 @@ pip install uniface[gpu]
Smaller input sizes are faster but may reduce accuracy:

```python
from uniface import RetinaFace
from uniface.detection import RetinaFace

# Faster, lower accuracy
detector = RetinaFace(input_size=(320, 320))
```
@@ -53,6 +53,7 @@ class Face:
    race: str | None = None                 # "East Asian", etc.
    emotion: str | None = None              # "Happy", etc.
    emotion_confidence: float | None = None
    track_id: int | None = None             # Persistent ID from tracker
```

### Properties
@@ -105,6 +106,27 @@ print(f"Yaw: {np.degrees(result.yaw):.1f}°")

---

### HeadPoseResult

```python
@dataclass(frozen=True)
class HeadPoseResult:
    pitch: float  # Rotation around X-axis (degrees), + = looking down
    yaw: float    # Rotation around Y-axis (degrees), + = looking right
    roll: float   # Rotation around Z-axis (degrees), + = tilting clockwise
```

**Usage:**

```python
result = head_pose.estimate(face_crop)
print(f"Pitch: {result.pitch:.1f}°")
print(f"Yaw: {result.yaw:.1f}°")
print(f"Roll: {result.roll:.1f}°")
```

---

### SpoofingResult

```python
```
@@ -143,11 +165,11 @@ class AttributeResult:

```python
# AgeGender model
result = age_gender.predict(image, face.bbox)
result = age_gender.predict(image, face)
print(f"{result.sex}, {result.age} years old")

# FairFace model
result = fairface.predict(image, face.bbox)
result = fairface.predict(image, face)
print(f"{result.sex}, {result.age_group}, {result.race}")
```

@@ -170,14 +192,14 @@ Face recognition models return normalized 512-dimensional embeddings:

```python
embedding = recognizer.get_normalized_embedding(image, landmarks)
print(f"Shape: {embedding.shape}")  # (1, 512)
print(f"Shape: {embedding.shape}")  # (512,)
print(f"Norm: {np.linalg.norm(embedding):.4f}")  # ~1.0
```

### Similarity Computation

```python
from uniface import compute_similarity
from uniface.face_utils import compute_similarity

similarity = compute_similarity(embedding1, embedding2)
# Returns: float between -1 and 1 (cosine similarity)
```
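
A typical verification decision on top of this value (the threshold is illustrative, not a library default; see the thresholds guide for tuning):

```python
THRESHOLD = 0.35  # illustrative value; tune per model and use case

if compute_similarity(embedding1, embedding2) > THRESHOLD:
    print('Same person')
else:
    print('Different person')
```
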
@@ -199,16 +221,16 @@ print(f"Classes: {np.unique(mask)}")  # [0, 1, 2, ...]

| ID | Class | ID | Class |
|----|-------|----|-------|
| 0 | Background | 10 | Ear Ring |
| 1 | Skin | 11 | Nose |
| 2 | Left Eyebrow | 12 | Mouth |
| 3 | Right Eyebrow | 13 | Upper Lip |
| 4 | Left Eye | 14 | Lower Lip |
| 5 | Right Eye | 15 | Neck |
| 6 | Eye Glasses | 16 | Neck Lace |
| 7 | Left Ear | 17 | Cloth |
| 8 | Right Ear | 18 | Hair |
| 9 | Hat | | |
| 0 | Background | 10 | Nose |
| 1 | Skin | 11 | Mouth |
| 2 | Left Eyebrow | 12 | Upper Lip |
| 3 | Right Eyebrow | 13 | Lower Lip |
| 4 | Left Eye | 14 | Neck |
| 5 | Right Eye | 15 | Necklace |
| 6 | Eyeglasses | 16 | Cloth |
| 7 | Left Ear | 17 | Hair |
| 8 | Right Ear | 18 | Hat |
| 9 | Earring | | |
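
For orientation, a minimal sketch of consuming the mask under the corrected IDs (assumes `mask` is the 2-D array of per-pixel class IDs from the `np.unique(mask)` example above):

```python
import numpy as np

# Stand-in for a real BiSeNet output; in practice use the parser's mask
mask = np.random.randint(0, 19, size=(512, 512))

skin = (mask == 1)    # Skin
hair = (mask == 17)   # Hair, per the updated ID table
print(f"Skin covers {skin.mean() * 100:.1f}% of the image")
```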

---

@@ -9,7 +9,7 @@ UniFace automatically downloads and caches models. This page explains how model
Models are downloaded on first use:

```python
from uniface import RetinaFace
from uniface.detection import RetinaFace

# First run: downloads model to cache
detector = RetinaFace()  # ~3.5 MB download
```
@@ -32,9 +32,9 @@ Default cache directory:

```
~/.uniface/models/
├── retinaface_mv2.onnx
├── w600k_mbf.onnx
├── 2d106det.onnx
├── retinaface_mnet_v2.onnx
├── arcface_mnet.onnx
├── 2d_106.onnx
├── gaze_resnet34.onnx
├── parsing_resnet18.onnx
└── ...
```
@@ -44,44 +44,57 @@ Default cache directory:

## Custom Cache Directory

Specify a custom cache location:
Use the programmatic API to change the cache location at runtime:

```python
from uniface.model_store import verify_model_weights
from uniface.constants import RetinaFaceWeights
from uniface.model_store import get_cache_dir, set_cache_dir

# Download to custom directory
model_path = verify_model_weights(
    RetinaFaceWeights.MNET_V2,
    root='./my_models'
)
print(f"Model at: {model_path}")
# Set a custom cache directory
set_cache_dir('/data/models')

# Verify the current path
print(get_cache_dir())  # /data/models

# All subsequent model loads use the new directory
from uniface.detection import RetinaFace
detector = RetinaFace()  # Downloads to /data/models/
```

Or set the `UNIFACE_CACHE_DIR` environment variable (see [Environment Variables](#environment-variables) below).

---

## Pre-Download Models

Download models before deployment:
Download models before deployment using the concurrent downloader:

```python
from uniface.model_store import verify_model_weights
from uniface.model_store import download_models
from uniface.constants import (
    RetinaFaceWeights,
    ArcFaceWeights,
    AgeGenderWeights,
)

# Download all needed models
models = [
# Download multiple models concurrently (up to 4 threads by default)
paths = download_models([
    RetinaFaceWeights.MNET_V2,
    ArcFaceWeights.MNET,
    AgeGenderWeights.DEFAULT,
]
])

for model in models:
    path = verify_model_weights(model)
    print(f"Downloaded: {path}")
for model, path in paths.items():
    print(f"{model.value} -> {path}")
```

Or download one at a time:

```python
from uniface.model_store import verify_model_weights
from uniface.constants import RetinaFaceWeights

path = verify_model_weights(RetinaFaceWeights.MNET_V2)
print(f"Downloaded: {path}")
```

Or use the CLI tool:
@@ -115,11 +128,20 @@ print(f"Copy from: {path}")
```bash
scp -r ~/.uniface/models/ user@offline-machine:~/.uniface/models/
```

### 3. Use normally
### 3. Point to the cache (if non-default location)

```python
from uniface.model_store import set_cache_dir

# Only needed if the models are not at ~/.uniface/models/
set_cache_dir('/path/to/copied/models')
```

### 4. Use normally

```python
# Models load from local cache
from uniface import RetinaFace
from uniface.detection import RetinaFace
detector = RetinaFace()  # No network required
```
@@ -182,7 +204,12 @@ If a model fails verification, it's re-downloaded automatically.
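
For orientation, a minimal sketch of what such a SHA-256 file check looks like (the helper and digest below are hypothetical, not UniFace API):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file in 1 MiB chunks to avoid loading it whole."""
    digest = hashlib.sha256()
    with path.open('rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            digest.update(chunk)
    return digest.hexdigest()

model = Path.home() / '.uniface' / 'models' / 'retinaface_mnet_v2.onnx'
expected = '...'  # hypothetical pinned checksum shipped with the library
print('valid' if sha256_of(model) == expected else 're-download needed')
```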

## Clear Cache

Remove cached models:
Find and remove cached models:

```python
from uniface.model_store import get_cache_dir
print(get_cache_dir())  # shows the active cache path
```

```bash
# Remove all cached models
```
@@ -198,20 +225,35 @@ Models will be re-downloaded on next use.

## Environment Variables

Set custom cache location via environment variable:
There are three equivalent ways to configure the cache directory:

```bash
export UNIFACE_CACHE_DIR=/path/to/custom/cache
```

**1. Programmatic API (recommended)**

```python
from uniface.model_store import get_cache_dir, set_cache_dir

set_cache_dir('/path/to/custom/cache')
print(get_cache_dir())  # /path/to/custom/cache
```

**2. Direct environment variable (Python)**

```python
import os
os.environ['UNIFACE_CACHE_DIR'] = '/path/to/custom/cache'

from uniface import RetinaFace
from uniface.detection import RetinaFace
detector = RetinaFace()  # Uses custom cache
```

**3. Shell environment variable**

```bash
export UNIFACE_CACHE_DIR=/path/to/custom/cache
```

All three methods set the same `UNIFACE_CACHE_DIR` environment variable under the hood. `get_cache_dir()` always returns the resolved path.

---

## Next Steps

@@ -23,11 +23,20 @@ graph TB
        LMK[Landmarks]
        ATTR[Attributes]
        GAZE[Gaze]
        HPOSE[Head Pose]
        PARSE[Parsing]
        SPOOF[Anti-Spoofing]
        PRIV[Privacy]
    end

    subgraph Tracking
        TRK[BYTETracker]
    end

    subgraph Indexing
        IDX[FAISS Vector Store]
    end

    subgraph Output
        FACE[Face Objects]
    end
@@ -37,12 +46,16 @@ graph TB
    DET --> LMK
    DET --> ATTR
    DET --> GAZE
    DET --> HPOSE
    DET --> PARSE
    DET --> SPOOF
    DET --> PRIV
    DET --> TRK
    REC --> IDX
    REC --> FACE
    LMK --> FACE
    ATTR --> FACE
    TRK --> FACE
```

---
@@ -51,12 +64,14 @@ graph TB

### 1. ONNX-First

All models use ONNX Runtime for inference:
UniFace runs inference primarily via ONNX Runtime for core components:

- **Cross-platform**: Same models work on macOS, Linux, Windows
- **Hardware acceleration**: Automatic selection of optimal provider
- **Production-ready**: No Python-only dependencies for inference

Some optional components (e.g., emotion TorchScript, torchvision NMS) require PyTorch.

### 2. Minimal Dependencies

Core dependencies are kept minimal:
@@ -74,12 +89,14 @@ tqdm # Progress bars
Factory functions and direct instantiation:

```python
# Factory function
detector = create_detector('retinaface')
from uniface.detection import RetinaFace

# Direct instantiation (recommended)
from uniface import RetinaFace
detector = RetinaFace()

# Or via factory function
from uniface.detection import create_detector

detector = create_detector('retinaface')
```

### 4. Type Safety
@@ -99,17 +116,20 @@ def detect(self, image: np.ndarray) -> list[Face]:
uniface/
├── detection/        # Face detection (RetinaFace, SCRFD, YOLOv5Face, YOLOv8Face)
├── recognition/      # Face recognition (AdaFace, ArcFace, MobileFace, SphereFace)
├── tracking/         # Multi-object tracking (BYTETracker)
├── landmark/         # 106-point landmarks
├── attribute/        # Age, gender, emotion, race
├── parsing/          # Face semantic segmentation
├── gaze/             # Gaze estimation
├── headpose/         # Head pose estimation
├── spoofing/         # Anti-spoofing
├── privacy/          # Face anonymization
├── types.py          # Dataclasses (Face, GazeResult, etc.)
├── indexing/         # Vector indexing (FAISS)
├── types.py          # Dataclasses (Face, GazeResult, HeadPoseResult, etc.)
├── constants.py      # Model weights and URLs
├── model_store.py    # Model download and caching
├── onnx_utils.py     # ONNX Runtime utilities
└── visualization.py  # Drawing utilities
└── draw.py           # Drawing utilities
```

---
@@ -120,7 +140,9 @@ A typical face analysis workflow:

```python
import cv2
from uniface import RetinaFace, ArcFace, AgeGender
from uniface.attribute import AgeGender
from uniface.detection import RetinaFace
from uniface.recognition import ArcFace

# 1. Initialize models
detector = RetinaFace()
```
@@ -139,7 +161,7 @@ for face in faces:
```python
    embedding = recognizer.get_normalized_embedding(image, face.landmarks)

    # Attributes
    attrs = age_gender.predict(image, face.bbox)
    attrs = age_gender.predict(image, face)

    print(f"Face: {attrs.sex}, {attrs.age} years")
```
@@ -151,12 +173,20 @@ for face in faces:

For convenience, `FaceAnalyzer` combines multiple modules:

```python
from uniface import FaceAnalyzer
from uniface.analyzer import FaceAnalyzer
from uniface.attribute import AgeGender, FairFace
from uniface.detection import RetinaFace
from uniface.recognition import ArcFace

detector = RetinaFace()
recognizer = ArcFace()
age_gender = AgeGender()
fairface = FairFace()

analyzer = FaceAnalyzer(
    detect=True,
    recognize=True,
    attributes=True
    detector,
    recognizer=recognizer,
    attributes=[age_gender, fairface],
)

faces = analyzer.analyze(image)
```
@@ -170,7 +200,7 @@ for face in faces:

## Model Lifecycle

1. **First use**: Model is downloaded from GitHub releases
2. **Cached**: Stored in `~/.uniface/models/`
2. **Cached**: Stored in `~/.uniface/models/` (configurable via `set_cache_dir()` or `UNIFACE_CACHE_DIR`)
3. **Verified**: SHA-256 checksum validation
4. **Loaded**: ONNX Runtime session created
5. **Inference**: Hardware-accelerated execution
@@ -179,6 +209,11 @@ for face in faces:
```python
# Models auto-download on first use
detector = RetinaFace()  # Downloads if not cached

# Optionally configure cache location
from uniface.model_store import get_cache_dir, set_cache_dir
set_cache_dir('/data/models')
print(get_cache_dir())  # /data/models

# Or manually pre-download
from uniface.model_store import verify_model_weights
from uniface.constants import RetinaFaceWeights
```
@@ -11,7 +11,7 @@ This page explains how to tune detection and recognition thresholds for your use
Controls minimum confidence for face detection:

```python
from uniface import RetinaFace
from uniface.detection import RetinaFace

# Default (balanced)
detector = RetinaFace(confidence_threshold=0.5)
```
@@ -81,7 +81,7 @@ For identity verification (same person check):

```python
import numpy as np
from uniface import compute_similarity
from uniface.face_utils import compute_similarity

similarity = compute_similarity(embedding1, embedding2)
```
@@ -199,7 +199,7 @@ else:
For drawing detections, filter by confidence:

```python
from uniface.visualization import draw_detections
from uniface.draw import draw_detections

# Only draw high-confidence detections
bboxes = [f.bbox for f in faces if f.confidence > 0.7]
```
@@ -32,7 +32,7 @@ ruff check . --fix
**Guidelines:**

- Line length: 120
- Python 3.10+ type hints
- Python 3.11+ type hints
- Google-style docstrings

---

348 additions: docs/datasets.md (new file)

@@ -0,0 +1,348 @@
# Datasets

Overview of all training datasets and evaluation benchmarks used by UniFace models.

---

## Quick Reference

| Task        | Dataset                                          | Scale                  | Models                                      |
| ----------- | ------------------------------------------------ | ---------------------- | ------------------------------------------- |
| Detection   | [WIDER FACE](#wider-face)                        | 32K images             | RetinaFace, SCRFD, YOLOv5-Face, YOLOv8-Face |
| Recognition | [MS1MV2](#ms1mv2)                                | 5.8M images, 85.7K IDs | MobileFace, SphereFace                      |
| Recognition | [WebFace600K](#webface600k)                      | 600K images            | ArcFace                                     |
| Recognition | [WebFace4M / WebFace12M](#webface4m--webface12m) | 4M / 12M images        | AdaFace                                     |
| Gaze        | [Gaze360](#gaze360)                              | 238 subjects           | MobileGaze                                  |
| Parsing     | [CelebAMask-HQ](#celebamask-hq)                  | 30K images             | BiSeNet                                     |
| Attributes  | [CelebA](#celeba)                                | 200K images            | AgeGender                                   |
| Attributes  | [FairFace](#fairface)                            | Balanced demographics  | FairFace                                    |
| Attributes  | [AffectNet](#affectnet)                          | Emotion labels         | Emotion                                     |

---

## Training Datasets

### Face Detection

#### WIDER FACE

Large-scale face detection benchmark with images across 61 event categories. Contains faces with a high degree of variability in scale, pose, occlusion, expression, and illumination.

| Property | Value                                       |
| -------- | ------------------------------------------- |
| Images   | ~32,000 (train/val/test split)              |
| Faces    | ~394,000 annotated                          |
| Subsets  | Easy, Medium, Hard                          |
| Used by  | RetinaFace, SCRFD, YOLOv5-Face, YOLOv8-Face |

!!! info "Download & References"
    **Paper**: [WIDER FACE: A Face Detection Benchmark](https://arxiv.org/abs/1511.06523)

    **Download**: [http://shuoyang1213.me/WIDERFACE/](http://shuoyang1213.me/WIDERFACE/)

---

### Face Recognition

#### MS1MV2

Refined version of the MS-Celeb-1M dataset, cleaned by InsightFace. Widely used for training face recognition models.

| Property   | Value                          |
| ---------- | ------------------------------ |
| Identities | 85.7K                          |
| Images     | 5.8M                           |
| Format     | Aligned and cropped to 112x112 |
| Used by    | MobileFace, SphereFace         |

!!! info "Download"
    **Kaggle (aligned 112x112)**: [ms1m-arcface-dataset](https://www.kaggle.com/datasets/yakhyokhuja/ms1m-arcface-dataset) (from InsightFace)

    **Training code**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition)

---

#### WebFace600K

Medium-scale face recognition dataset from the WebFace series.

| Property | Value   |
| -------- | ------- |
| Images   | ~600K   |
| Used by  | ArcFace |

!!! info "Source"
    **Origin**: [InsightFace](https://github.com/deepinsight/insightface)

    **Paper**: [ArcFace: Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)

---

#### WebFace4M / WebFace12M

Large-scale face recognition datasets from the WebFace260M collection. Used for training AdaFace models with adaptive quality-aware margin.

| Property | WebFace4M     | WebFace12M     |
| -------- | ------------- | -------------- |
| Images   | ~4M           | ~12M           |
| Used by  | AdaFace IR_18 | AdaFace IR_101 |

!!! info "Source"
    **Paper**: [AdaFace: Quality Adaptive Margin for Face Recognition](https://arxiv.org/abs/2204.00964)

    **Original code**: [mk-minchul/AdaFace](https://github.com/mk-minchul/AdaFace)

---

#### CASIA-WebFace

Smaller-scale face recognition dataset suitable for academic research and lighter training runs.

| Property   | Value                          |
| ---------- | ------------------------------ |
| Identities | 10.6K                          |
| Images     | 491K                           |
| Format     | Aligned and cropped to 112x112 |
| Used by    | Alternative training set       |

!!! info "Download"
    **Kaggle (aligned 112x112)**: [webface-112x112](https://www.kaggle.com/datasets/yakhyokhuja/webface-112x112) (from OpenSphere)

---

#### VGGFace2

Large-scale dataset with wide variations in pose, age, illumination, ethnicity, and profession.

| Property   | Value                          |
| ---------- | ------------------------------ |
| Identities | 8.6K                           |
| Images     | 3.1M                           |
| Format     | Aligned and cropped to 112x112 |
| Used by    | Alternative training set       |

!!! info "Download"
    **Kaggle (aligned 112x112)**: [vggface2-112x112](https://www.kaggle.com/datasets/yakhyokhuja/vggface2-112x112) (from OpenSphere)

---

### Gaze Estimation

#### Gaze360

Large-scale gaze estimation dataset collected in indoor and outdoor environments with diverse head poses and wide gaze ranges (up to 360 degrees).

| Property    | Value                 |
| ----------- | --------------------- |
| Subjects    | 238                   |
| Environment | Indoor and outdoor    |
| Used by     | All MobileGaze models |

!!! info "Download & Preprocessing"
    **Download**: [gaze360.csail.mit.edu/download.php](https://gaze360.csail.mit.edu/download.php)

    **Preprocessing**: [GazeHub - Gaze360](https://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/#gaze360)

!!! note "UniFace Models"
    All MobileGaze models shipped with UniFace are trained exclusively on Gaze360 for 200 epochs.

**Dataset structure:**

```
data/
└── Gaze360/
    ├── Image/
    └── Label/
```

---

#### MPIIFaceGaze

Dataset for appearance-based gaze estimation from laptop webcam images of participants during everyday laptop usage. Supported by the gaze estimation training code but not used for the UniFace pretrained weights.

| Property    | Value                                    |
| ----------- | ---------------------------------------- |
| Subjects    | 15                                       |
| Environment | Everyday laptop usage                    |
| Used by     | Supported (not used for UniFace weights) |

!!! info "Download & Preprocessing"
    **Download**: [MPIIFaceGaze download page](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/gaze-based-human-computer-interaction/its-written-all-over-your-face-full-face-appearance-based-gaze-estimation)

    **Preprocessing**: [GazeHub - MPIIFaceGaze](https://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/#mpiifacegaze)

**Dataset structure:**

```
data/
└── MPIIFaceGaze/
    ├── Image/
    └── Label/
```

---

### Head Pose Estimation

#### 300W-LP

Large-scale synthesized face dataset with large pose variations, generated from 300W by face profiling. Used for training head pose estimation models.

| Property   | Value                  |
| ---------- | ---------------------- |
| Images     | ~122,000 (synthesized) |
| Source     | 300W (profiled)        |
| Pose range | ±90° yaw               |
| Evaluation | AFLW2000               |
| Used by    | All HeadPose models    |

!!! info "Download & Reference"
    **Paper**: [Face Alignment Across Large Poses: A 3D Solution](https://arxiv.org/abs/1511.07212)

    **Training code**: [yakhyo/head-pose-estimation](https://github.com/yakhyo/head-pose-estimation)

!!! note "UniFace Models"
    All HeadPose models shipped with UniFace are trained on 300W-LP and evaluated on AFLW2000.

---

### Face Parsing

#### CelebAMask-HQ

High-quality face parsing dataset with pixel-level annotations for 19 facial component classes.

| Property   | Value                        |
| ---------- | ---------------------------- |
| Images     | 30,000                       |
| Classes    | 19 facial components         |
| Resolution | High quality                 |
| Used by    | BiSeNet (ResNet18, ResNet34) |

!!! info "Source"
    **GitHub**: [switchablenorms/CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ)

    **Training code**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing)

**Dataset structure:**

```
dataset/
├── images/   # Input face images
│   ├── image1.jpg
│   └── ...
└── labels/   # Segmentation masks
    ├── image1.png
    └── ...
```

---

### Attribute Analysis

#### CelebA

Large-scale face attributes dataset widely used for training age and gender prediction models.

| Property   | Value                |
| ---------- | -------------------- |
| Images     | ~200K                |
| Attributes | 40 binary attributes |
| Used by    | AgeGender            |

!!! info "Reference"
    **Paper**: [Deep Learning Face Attributes in the Wild](https://arxiv.org/abs/1411.7766)

---

#### FairFace

Face attribute dataset designed for balanced representation across race, gender, and age groups. Provides more equitable predictions compared to imbalanced datasets.

| Property   | Value                               |
| ---------- | ----------------------------------- |
| Attributes | Race (7), Gender (2), Age Group (9) |
| Used by    | FairFace                            |
| License    | CC BY 4.0                           |

!!! info "Reference"
    **Paper**: [FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age](https://arxiv.org/abs/1908.04913)

    **ONNX inference**: [yakhyo/fairface-onnx](https://github.com/yakhyo/fairface-onnx)

---

#### AffectNet

Large-scale facial expression dataset for emotion recognition training.

| Property | Value                                                                   |
| -------- | ----------------------------------------------------------------------- |
| Classes  | 7 or 8 (Neutral, Happy, Sad, Surprise, Fear, Disgust, Angry + Contempt) |
| Used by  | Emotion (AFFECNET7, AFFECNET8)                                          |

!!! info "Reference"
    **Paper**: [AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild](https://ieeexplore.ieee.org/document/8013713)

---

## Evaluation Benchmarks

### Face Detection

#### WIDER FACE Validation Set

The standard benchmark for face detection models. Results are reported across three difficulty subsets.

| Subset | Criteria                                      |
| ------ | --------------------------------------------- |
| Easy   | Large, clear, unoccluded faces                |
| Medium | Moderate scale and occlusion                  |
| Hard   | Small, heavily occluded, or challenging faces |

See [Model Zoo - Detection](models.md#face-detection-models) for per-model accuracy on each subset.

---

### Face Recognition

Recognition models are evaluated across multiple benchmarks. Aligned 112x112 validation datasets are available as a single download.

!!! info "Download"
    **Kaggle**: [agedb-30-calfw-cplfw-lfw-aligned-112x112](https://www.kaggle.com/datasets/yakhyokhuja/agedb-30-calfw-cplfw-lfw-aligned-112x112)

| Benchmark    | Description                                                       | Used by                         |
| ------------ | ----------------------------------------------------------------- | ------------------------------- |
| **LFW**      | Labeled Faces in the Wild - standard face verification benchmark  | ArcFace, MobileFace, SphereFace |
| **CALFW**    | Cross-Age LFW - face verification across age gaps                 | MobileFace, SphereFace          |
| **CPLFW**    | Cross-Pose LFW - face verification across pose variations         | MobileFace, SphereFace          |
| **AgeDB-30** | Age database with 30-year age gaps                                | ArcFace, MobileFace, SphereFace |
| **CFP-FP**   | Celebrities in Frontal-Profile - frontal vs. profile verification | ArcFace                         |
| **IJB-B**    | IARPA Janus Benchmark B - TAR@FAR=0.01%                           | AdaFace                         |
| **IJB-C**    | IARPA Janus Benchmark C - TAR@FAR=1e-4                            | AdaFace, ArcFace                |

See [Model Zoo - Recognition](models.md#face-recognition-models) for per-model accuracy on each benchmark.

---

### Gaze Estimation

| Benchmark            | Metric        | Description                                  |
| -------------------- | ------------- | -------------------------------------------- |
| **Gaze360 test set** | MAE (degrees) | Mean Absolute Error in gaze angle prediction |

See [Model Zoo - Gaze](models.md#gaze-estimation-models) for per-model MAE scores.

---

## Training Repositories

For training your own models or reproducing results, see the following repositories:

| Task        | Repository                                                                 | Datasets Supported              |
| ----------- | -------------------------------------------------------------------------- | ------------------------------- |
| Detection   | [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) | WIDER FACE                      |
| Recognition | [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition)     | MS1MV2, CASIA-WebFace, VGGFace2 |
| Gaze        | [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation)       | Gaze360, MPIIFaceGaze           |
| Parsing     | [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing)             | CelebAMask-HQ                   |
@@ -10,12 +10,17 @@ template: home.html

# UniFace { .hero-title }

<p class="hero-subtitle">A lightweight, production-ready face analysis library built on ONNX Runtime</p>
<p class="hero-subtitle">All-in-One Open-Source Face Analysis Library</p>

[](https://pypi.org/project/uniface/)
[](https://www.python.org/)
[](https://pypi.org/project/uniface/)
[](https://www.python.org/)
[](https://opensource.org/licenses/MIT)
[](https://github.com/yakhyo/uniface/actions)
[](https://pepy.tech/projects/uniface)
[](https://www.kaggle.com/yakhyokhuja/code)
[](https://discord.gg/wdzrjr7R5j)

<!-- <img src="https://raw.githubusercontent.com/yakhyo/uniface/main/.github/logos/uniface_rounded_q80.webp" alt="UniFace - All-in-One Open-Source Face Analysis Library" style="max-width: 70%; margin: 1rem 0;"> -->

[Get Started](quickstart.md){ .md-button .md-button--primary }
[View on GitHub](https://github.com/yakhyo/uniface){ .md-button }
@@ -54,6 +59,16 @@ BiSeNet semantic segmentation with 19 facial component classes.
Real-time gaze direction prediction with MobileGaze models.
</div>

<div class="feature-card" markdown>
### :material-axis-arrow: Head Pose
3D head orientation (pitch, yaw, roll) estimation with 6D rotation models.
</div>

<div class="feature-card" markdown>
### :material-motion-play: Tracking
Multi-object tracking with BYTETracker for persistent face IDs across video frames.
</div>

<div class="feature-card" markdown>
### :material-shield-check: Anti-Spoofing
Face liveness detection with MiniFASNet to prevent fraud.
@@ -64,31 +79,35 @@ Face liveness detection with MiniFASNet to prevent fraud.
Face anonymization with 5 blur methods for privacy protection.
</div>

<div class="feature-card" markdown>
### :material-database-search: Vector Indexing
FAISS-backed embedding store for fast multi-identity face search.
</div>

</div>

---

## Installation

=== "Standard"
UniFace runs inference primarily via **ONNX Runtime**; some optional components (e.g., emotion TorchScript, torchvision NMS) require **PyTorch**.

    ```bash
    pip install uniface
    ```
**Standard**
```bash
pip install uniface
```

=== "GPU (CUDA)"
**GPU (CUDA)**
```bash
pip install uniface[gpu]
```

    ```bash
    pip install uniface[gpu]
    ```

=== "From Source"

    ```bash
    git clone https://github.com/yakhyo/uniface.git
    cd uniface
    pip install -e .
    ```
**From Source**
```bash
git clone https://github.com/yakhyo/uniface.git
cd uniface
pip install -e .
```

---

@@ -6,7 +6,7 @@ This guide covers all installation options for UniFace.

## Requirements

- **Python**: 3.10 or higher
- **Python**: 3.11 or higher
- **Operating Systems**: macOS, Linux, Windows

---
@@ -55,11 +55,10 @@ pip install uniface[gpu]

**Requirements:**

- CUDA 11.x or 12.x
- cuDNN 8.x
- `uniface[gpu]` automatically installs `onnxruntime-gpu`. Requirements depend on the ORT version and execution provider.

!!! info "CUDA Compatibility"
    See [ONNX Runtime GPU requirements](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html) for detailed compatibility matrix.
    See the [ONNX Runtime GPU compatibility matrix](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html) for matching CUDA and cuDNN versions.

Verify GPU installation:

@@ -71,6 +70,19 @@ print("Available providers:", ort.get_available_providers())

---

### FAISS Vector Indexing

For fast multi-identity face search using a FAISS index:

```bash
pip install faiss-cpu  # CPU
pip install faiss-gpu  # NVIDIA GPU (CUDA)
```

See the [Indexing module](modules/indexing.md) for usage.
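
For orientation, a raw-FAISS sketch of the underlying idea (plain `faiss` API, not the UniFace wrapper): cosine search over L2-normalized 512-dimensional embeddings via inner product.

```python
import faiss
import numpy as np

dim = 512                               # embedding size of the recognizers
index = faiss.IndexFlatIP(dim)          # inner product == cosine on unit vectors

gallery = np.random.rand(1000, dim).astype('float32')  # stand-in embeddings
faiss.normalize_L2(gallery)             # normalize in place
index.add(gallery)

query = np.random.rand(1, dim).astype('float32')
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)    # top-5 nearest gallery entries
print(ids[0], scores[0])
```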

---

### CPU-Only (All Platforms)

```bash
@@ -108,9 +120,19 @@ UniFace has minimal dependencies:

| `numpy` | Array operations |
| `opencv-python` | Image processing |
| `onnxruntime` | Model inference |
| `scikit-image` | Geometric transforms |
| `requests` | Model download |
| `tqdm` | Progress bars |

**Optional:**

| Package | Install extra | Purpose |
|---------|---------------|---------|
| `faiss-cpu` / `faiss-gpu` | `pip install faiss-cpu` | FAISS vector indexing |
| `onnxruntime-gpu` | `uniface[gpu]` | CUDA acceleration |
| `torch` | `pip install torch` | Emotion model uses TorchScript |
| `torchvision` | `pip install torchvision` | Faster NMS for YOLO detectors |
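
A small sketch for checking at runtime which optional extras are available, using only the standard library:

```python
from importlib.util import find_spec

# Top-level module names of the optional dependencies above
for name in ("faiss", "onnxruntime", "torch", "torchvision"):
    status = "installed" if find_spec(name) is not None else "missing"
    print(f"{name:13s} {status}")
```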

---

## Verify Installation

@@ -126,7 +148,7 @@ import onnxruntime as ort

print(f"Available providers: {ort.get_available_providers()}")

# Quick test
from uniface import RetinaFace
from uniface.detection import RetinaFace
detector = RetinaFace()
print("Installation successful!")
```

@@ -137,11 +159,11 @@ print("Installation successful!")

### Import Errors

If you encounter import errors, ensure you're using Python 3.10+:
If you encounter import errors, ensure you're using Python 3.11+:

```bash
python --version
# Should show: Python 3.10.x or higher
# Should show: Python 3.11.x or higher
```

### Model Download Issues

115 docs/models.md

@@ -8,7 +8,7 @@ Complete guide to all available models and their performance characteristics.

### RetinaFace Family

RetinaFace models are trained on the WIDER FACE dataset.
RetinaFace models are trained on the [WIDER FACE](datasets.md#wider-face) dataset.

| Model Name | Params | Size | Easy | Medium | Hard |
| -------------- | ------ | ----- | ------ | ------ | ------ |
@@ -22,29 +22,29 @@ RetinaFace models are trained on the WIDER FACE dataset.

!!! info "Accuracy & Benchmarks"
    **Accuracy**: WIDER FACE validation set (Easy/Medium/Hard subsets) - from [RetinaFace paper](https://arxiv.org/abs/1905.00641)

    **Speed**: Benchmark on your own hardware using `python tools/detection.py --source <image> --iterations 100`
    **Speed**: Benchmark on your own hardware using `python tools/detect.py --source <image>`

---

### SCRFD Family

SCRFD (Sample and Computation Redistribution for Efficient Face Detection) models trained on WIDER FACE dataset.
SCRFD (Sample and Computation Redistribution for Efficient Face Detection) models trained on [WIDER FACE](datasets.md#wider-face) dataset.

| Model Name | Params | Size | Easy | Medium | Hard |
| ---------------- | ------ | ----- | ------ | ------ | ------ |
| `SCRFD_500M` | 0.6M | 2.5MB | 90.57% | 88.12% | 68.51% |
| `SCRFD_10G` :material-check-circle: | 4.2M | 17MB | 95.16% | 93.87% | 83.05% |
| `SCRFD_500M_KPS` | 0.6M | 2.5MB | 90.57% | 88.12% | 68.51% |
| `SCRFD_10G_KPS` :material-check-circle: | 4.2M | 17MB | 95.16% | 93.87% | 83.05% |

!!! info "Accuracy & Benchmarks"
    **Accuracy**: WIDER FACE validation set - from [SCRFD paper](https://arxiv.org/abs/2105.04714)

    **Speed**: Benchmark on your own hardware using `python tools/detection.py --source <image> --iterations 100`
    **Speed**: Benchmark on your own hardware using `python tools/detect.py --source <image>`

---

### YOLOv5-Face Family

YOLOv5-Face models provide detection with 5-point facial landmarks, trained on WIDER FACE dataset.
YOLOv5-Face models provide detection with 5-point facial landmarks, trained on [WIDER FACE](datasets.md#wider-face) dataset.

| Model Name | Size | Easy | Medium | Hard |
| -------------- | ---- | ------ | ------ | ------ |
@@ -55,7 +55,7 @@ YOLOv5-Face models provide detection with 5-point facial landmarks, trained on W

!!! info "Accuracy & Benchmarks"
    **Accuracy**: WIDER FACE validation set - from [YOLOv5-Face paper](https://arxiv.org/abs/2105.12931)

    **Speed**: Benchmark on your own hardware using `python tools/detection.py --source <image> --iterations 100`
    **Speed**: Benchmark on your own hardware using `python tools/detect.py --source <image>`

!!! note "Fixed Input Size"
    All YOLOv5-Face models use a fixed input size of 640×640.

@@ -74,7 +74,7 @@ YOLOv8-Face models use anchor-free design with DFL (Distribution Focal Loss) for

!!! info "Accuracy & Benchmarks"
    **Accuracy**: WIDER FACE validation set (Easy/Medium/Hard subsets)

    **Speed**: Benchmark on your own hardware using `python tools/detection.py --source <image> --method yolov8face`
    **Speed**: Benchmark on your own hardware using `python tools/detect.py --source <image> --method yolov8face`

!!! note "Fixed Input Size"
    All YOLOv8-Face models use a fixed input size of 640×640.

@@ -93,7 +93,7 @@ Face recognition using adaptive margin based on image quality.

| `IR_101` | IR-101 | WebFace12M | 249 MB | - | 97.66% |

!!! info "Training Data & Accuracy"
    **Dataset**: WebFace4M (4M images) / WebFace12M (12M images)
    **Dataset**: [WebFace4M / WebFace12M](datasets.md#webface4m--webface12m) (4M / 12M images)

    **Accuracy**: IJB-B and IJB-C benchmarks, TAR@FAR=0.01%

@@ -113,7 +113,7 @@ Face recognition using additive angular margin loss.

| `RESNET` | ResNet50 | 43.6M | 166MB | 99.83% | 99.33% | 98.23% | 97.25% |

!!! info "Training Data"
    **Dataset**: Trained on WebFace600K (600K images)
    **Dataset**: Trained on [WebFace600K](datasets.md#webface600k) (600K images)

    **Accuracy**: IJB-C accuracy reported as TAR@FAR=1e-4

@@ -131,7 +131,7 @@ Lightweight face recognition models with MobileNet backbones.

| `MNET_V3_LARGE` | MobileNetV3-L | 3.52M | 10MB | 99.53% | 94.56% | 86.79% | 95.13% |

!!! info "Training Data"
    **Dataset**: Trained on MS1M-V2 (5.8M images, 85K identities)
    **Dataset**: Trained on [MS1MV2](datasets.md#ms1mv2) (5.8M images, 85K identities)

    **Accuracy**: Evaluated on LFW, CALFW, CPLFW, and AgeDB-30 benchmarks

@@ -147,7 +147,7 @@ Face recognition using angular softmax loss.

| `SPHERE36` | Sphere36 | 34.6M | 92MB | 99.72% | 95.64% | 89.92% | 96.83% |

!!! info "Training Data"
    **Dataset**: Trained on MS1M-V2 (5.8M images, 85K identities)
    **Dataset**: Trained on [MS1MV2](datasets.md#ms1mv2) (5.8M images, 85K identities)

    **Accuracy**: Evaluated on LFW, CALFW, CPLFW, and AgeDB-30 benchmarks

@@ -187,7 +187,7 @@ Facial landmark localization model.

| `AgeGender` | Age, Gender | 2.1M | 8MB |

!!! info "Training Data"
    **Dataset**: Trained on CelebA
    **Dataset**: Trained on [CelebA](datasets.md#celeba)

!!! warning "Accuracy Note"
    Accuracy varies by demographic and image quality. Test on your specific use case.

@@ -201,7 +201,7 @@ Facial landmark localization model.

| `FairFace` | Race, Gender, Age Group | - | 44MB |

!!! info "Training Data"
    **Dataset**: Trained on FairFace dataset with balanced demographics
    **Dataset**: Trained on [FairFace](datasets.md#fairface) dataset with balanced demographics

!!! tip "Equitable Predictions"
    FairFace provides more equitable predictions across different racial and gender groups.

@@ -219,12 +219,12 @@ Facial landmark localization model.

| `AFFECNET7` | 7 | 0.5M | 2MB |
| `AFFECNET8` | 8 | 0.5M | 2MB |

**Classes (7)**: Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger
**Classes (7)**: Neutral, Happy, Sad, Surprise, Fear, Disgust, Angry

**Classes (8)**: Above + Contempt

!!! info "Training Data"
    **Dataset**: Trained on AffectNet
    **Dataset**: Trained on [AffectNet](datasets.md#affectnet)

!!! note "Accuracy Note"
    Emotion detection accuracy depends heavily on facial expression clarity and cultural context.

@@ -235,7 +235,7 @@ Facial landmark localization model.

### MobileGaze Family

Gaze direction prediction models trained on Gaze360 dataset. Returns pitch (vertical) and yaw (horizontal) angles in radians.
Gaze direction prediction models trained on [Gaze360](datasets.md#gaze360) dataset. Returns pitch (vertical) and yaw (horizontal) angles in radians.

| Model Name | Params | Size | MAE* |
| -------------- | ------ | ------- | ----- |
@@ -248,7 +248,7 @@ Gaze direction prediction models trained on Gaze360 dataset. Returns pitch (vert

*MAE (Mean Absolute Error) in degrees on Gaze360 test set - lower is better

!!! info "Training Data"
    **Dataset**: Trained on Gaze360 (indoor/outdoor scenes with diverse head poses)
    **Dataset**: Trained on [Gaze360](datasets.md#gaze360) (indoor/outdoor scenes with diverse head poses)

    **Training**: 200 epochs with classification-based approach (binned angles)

@@ -257,6 +257,33 @@ Gaze direction prediction models trained on Gaze360 dataset. Returns pitch (vert

---

## Head Pose Estimation Models

### HeadPose Family

Head pose estimation models using 6D rotation representation. Trained on [300W-LP](datasets.md#300w-lp) dataset, evaluated on AFLW2000. Returns pitch, yaw, and roll angles in degrees.

| Model Name | Backbone | Size | MAE* |
| -------------- | -------- | ------- | ----- |
| `RESNET18` :material-check-circle: | ResNet18 | 43 MB | 5.22° |
| `RESNET34` | ResNet34 | 82 MB | 5.07° |
| `RESNET50` | ResNet50 | 91 MB | 4.83° |
| `MOBILENET_V2` | MobileNetV2 | 9.6 MB | 5.72° |
| `MOBILENET_V3_SMALL` | MobileNetV3-Small | 4.8 MB | 6.31° |
| `MOBILENET_V3_LARGE` | MobileNetV3-Large | 16 MB | 5.58° |

*MAE (Mean Absolute Error) in degrees on AFLW2000 test set — lower is better

!!! info "Training Data"
    **Dataset**: Trained on [300W-LP](datasets.md#300w-lp) (synthesized large-pose faces from 300W)

    **Method**: 6D rotation representation (rotation matrix → Euler angles)

!!! note "Input Requirements"
    Requires face crop as input. Use face detection first to obtain bounding boxes.

---

## Face Parsing Models

### BiSeNet Family

@@ -269,7 +296,7 @@ BiSeNet (Bilateral Segmentation Network) models for semantic face parsing. Segme

| `RESNET34` | 24.1M | 89.2 MB | 19 |

!!! info "Training Data"
    **Dataset**: Trained on CelebAMask-HQ
    **Dataset**: Trained on [CelebAMask-HQ](datasets.md#celebamask-hq)

    **Architecture**: BiSeNet with ResNet backbone

@@ -279,13 +306,13 @@ BiSeNet ...

| # | Class | # | Class | # | Class |
|---|-------|---|-------|---|-------|
| 1 | Background | 8 | Left Ear | 15 | Neck |
| 2 | Skin | 9 | Right Ear | 16 | Neck Lace |
| 3 | Left Eyebrow | 10 | Ear Ring | 17 | Cloth |
| 4 | Right Eyebrow | 11 | Nose | 18 | Hair |
| 5 | Left Eye | 12 | Mouth | 19 | Hat |
| 6 | Right Eye | 13 | Upper Lip | | |
| 7 | Eye Glasses | 14 | Lower Lip | | |
| 0 | Background | 7 | Left Ear | 14 | Neck |
| 1 | Skin | 8 | Right Ear | 15 | Neck Lace |
| 2 | Left Eyebrow | 9 | Ear Ring | 16 | Cloth |
| 3 | Right Eyebrow | 10 | Nose | 17 | Hair |
| 4 | Left Eye | 11 | Mouth | 18 | Hat |
| 5 | Right Eye | 12 | Upper Lip | | |
| 6 | Eye Glasses | 13 | Lower Lip | | |

**Applications:**

@@ -300,6 +327,32 @@ BiSeNet ...

---

### XSeg

XSeg from DeepFaceLab outputs masks for face regions. Requires 5-point landmarks for face alignment.

| Model Name | Size | Output |
|------------|--------|--------|
| `DEFAULT` | 67 MB | Mask [0, 1] |

!!! info "Model Details"
    **Origin**: DeepFaceLab

    **Input**: NHWC format, normalized to [0, 1]

    **Alignment**: Requires 5-point landmarks (not bbox crops)

**Applications:**

- Face region extraction
- Face swapping pipelines
- Occlusion handling

!!! note "Input Requirements"
    Requires 5-point facial landmarks. Use a face detector like RetinaFace to obtain landmarks first.

---

## Anti-Spoofing Models

### MiniFASNet Family

@@ -323,10 +376,14 @@ Face anti-spoofing models for liveness detection. Detect if a face is real (live

Models are automatically downloaded and cached on first use.

- **Cache location**: `~/.uniface/models/`
- **Cache location**: `~/.uniface/models/` (configurable via `set_cache_dir()` or `UNIFACE_CACHE_DIR` env var)
- **Inspect cache path**: `get_cache_dir()` returns the resolved active path
- **Verification**: Models are verified with SHA-256 checksums
- **Concurrent download**: `download_models([...])` fetches multiple models in parallel
- **Manual download**: Use `python tools/download_model.py` to pre-download models

See [Model Cache & Offline Use](concepts/model-cache-offline.md) for full details.
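
For example, to redirect the cache and pre-fetch models (a minimal sketch; the import path is an assumption, so adjust it to wherever your version exposes these helpers, and substitute real model names):

```python
# Assumed to be importable from the package root; see the cache docs for the exact location
from uniface import download_models, get_cache_dir, set_cache_dir

set_cache_dir("/data/uniface_models")  # equivalent: export UNIFACE_CACHE_DIR=/data/uniface_models
print(get_cache_dir())                 # resolved active path

# Fetches several models in parallel; the names here are placeholders
download_models(["retinaface", "arcface"])
```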

---

## References

@@ -342,7 +399,9 @@ Models are automatically downloaded and cached on first use.

- **AdaFace ONNX**: [yakhyo/adaface-onnx](https://github.com/yakhyo/adaface-onnx) - ONNX export and inference
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
- **Head Pose Estimation**: [yakhyo/head-pose-estimation](https://github.com/yakhyo/head-pose-estimation) - 6D rotation head pose estimation training and ONNX models
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet training code and pretrained weights
- **Face Segmentation**: [yakhyo/face-segmentation](https://github.com/yakhyo/face-segmentation) - XSeg ONNX inference
- **Face Anti-Spoofing**: [yakhyo/face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) - MiniFASNet ONNX inference (weights from [minivision-ai/Silent-Face-Anti-Spoofing](https://github.com/minivision-ai/Silent-Face-Anti-Spoofing))
- **FairFace**: [yakhyo/fairface-onnx](https://github.com/yakhyo/fairface-onnx) - FairFace ONNX inference for race, gender, age prediction
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights

@@ -21,7 +21,8 @@ Predicts exact age and binary gender.

### Basic Usage

```python
from uniface import RetinaFace, AgeGender
from uniface.attribute import AgeGender
from uniface.detection import RetinaFace

detector = RetinaFace()
age_gender = AgeGender()
@@ -29,9 +30,10 @@ age_gender = AgeGender()
faces = detector.detect(image)

for face in faces:
    result = age_gender.predict(image, face.bbox)
    result = age_gender.predict(image, face)
    print(f"Gender: {result.sex}")  # "Female" or "Male"
    print(f"Age: {result.age} years")
    # face.gender and face.age are also set automatically
```

### Output

@@ -54,7 +56,8 @@ Predicts gender, age group, and race with balanced demographics.

### Basic Usage

```python
from uniface import RetinaFace, FairFace
from uniface.attribute import FairFace
from uniface.detection import RetinaFace

detector = RetinaFace()
fairface = FairFace()
@@ -62,10 +65,11 @@ fairface = FairFace()
faces = detector.detect(image)

for face in faces:
    result = fairface.predict(image, face.bbox)
    result = fairface.predict(image, face)
    print(f"Gender: {result.sex}")
    print(f"Age Group: {result.age_group}")
    print(f"Race: {result.race}")
    # face.gender, face.age_group, face.race are also set automatically
```

### Output

@@ -120,7 +124,7 @@ Predicts facial emotions. Requires PyTorch.

### Basic Usage

```python
from uniface import RetinaFace
from uniface.detection import RetinaFace
from uniface.attribute import Emotion
from uniface.constants import DDAMFNWeights

@@ -130,7 +134,7 @@ emotion = Emotion(model_name=DDAMFNWeights.AFFECNET7)
faces = detector.detect(image)

for face in faces:
    result = emotion.predict(image, face.landmarks)
    result = emotion.predict(image, face)
    print(f"Emotion: {result.emotion}")
    print(f"Confidence: {result.confidence:.2%}")
```

@@ -147,7 +151,7 @@ for face in faces:

| Surprise |
| Fear |
| Disgust |
| Anger |
| Angry |

=== "8-Class (AFFECNET8)"

@@ -159,7 +163,7 @@ for face in faces:

| Surprise |
| Fear |
| Disgust |
| Anger |
| Angry |
| Contempt |

### Model Variants

@@ -177,12 +181,29 @@ emotion = Emotion(model_name=DDAMFNWeights.AFFECNET8)

---

## Factory Function

Use `create_attribute_predictor()` for dynamic model selection:

```python
from uniface import create_attribute_predictor

age_gender = create_attribute_predictor('age_gender')
fairface = create_attribute_predictor('fairface')
emotion = create_attribute_predictor('emotion')
```

Available model names: `'age_gender'`, `'fairface'`, `'emotion'`.

---

## Combining Models

### Full Attribute Analysis

```python
from uniface import RetinaFace, AgeGender, FairFace
from uniface.attribute import AgeGender, FairFace
from uniface.detection import RetinaFace

detector = RetinaFace()
age_gender = AgeGender()
@@ -192,10 +213,10 @@ faces = detector.detect(image)

for face in faces:
    # Get exact age from AgeGender
    ag_result = age_gender.predict(image, face.bbox)
    ag_result = age_gender.predict(image, face)

    # Get race from FairFace
    ff_result = fairface.predict(image, face.bbox)
    ff_result = fairface.predict(image, face)

    print(f"Gender: {ag_result.sex}")
    print(f"Exact Age: {ag_result.age}")
@@ -206,12 +227,13 @@ for face in faces:

### Using FaceAnalyzer

```python
from uniface import FaceAnalyzer
from uniface.analyzer import FaceAnalyzer
from uniface.attribute import AgeGender
from uniface.detection import RetinaFace

analyzer = FaceAnalyzer(
    detect=True,
    recognize=False,
    attributes=True  # Uses AgeGender
    RetinaFace(),
    attributes=[AgeGender()],
)

faces = analyzer.analyze(image)

@@ -253,7 +275,7 @@ def draw_attributes(image, face, result):

# Usage
for face in faces:
    result = age_gender.predict(image, face.bbox)
    result = age_gender.predict(image, face)
    image = draw_attributes(image, face, result)

cv2.imwrite("attributes.jpg", image)

@@ -24,7 +24,7 @@ Single-shot face detector with multi-scale feature pyramid.

### Basic Usage

```python
from uniface import RetinaFace
from uniface.detection import RetinaFace

detector = RetinaFace()
faces = detector.detect(image)
@@ -38,7 +38,7 @@ for face in faces:

### Model Variants

```python
from uniface import RetinaFace
from uniface.detection import RetinaFace
from uniface.constants import RetinaFaceWeights

# Lightweight (mobile/edge)
@@ -82,7 +82,7 @@ State-of-the-art detection with excellent accuracy-speed tradeoff.

### Basic Usage

```python
from uniface import SCRFD
from uniface.detection import SCRFD

detector = SCRFD()
faces = detector.detect(image)
@@ -91,7 +91,7 @@ faces = detector.detect(image)

### Model Variants

```python
from uniface import SCRFD
from uniface.detection import SCRFD
from uniface.constants import SCRFDWeights

# Real-time (lightweight)
@@ -127,7 +127,7 @@ YOLO-based detection optimized for faces.

### Basic Usage

```python
from uniface import YOLOv5Face
from uniface.detection import YOLOv5Face

detector = YOLOv5Face()
faces = detector.detect(image)
@@ -136,7 +136,7 @@ faces = detector.detect(image)

### Model Variants

```python
from uniface import YOLOv5Face
from uniface.detection import YOLOv5Face
from uniface.constants import YOLOv5FaceWeights

# Lightweight
@@ -179,7 +179,7 @@ Anchor-free detection with DFL (Distribution Focal Loss) for accurate bbox regre

### Basic Usage

```python
from uniface import YOLOv8Face
from uniface.detection import YOLOv8Face

detector = YOLOv8Face()
faces = detector.detect(image)
@@ -188,7 +188,7 @@ faces = detector.detect(image)

### Model Variants

```python
from uniface import YOLOv8Face
from uniface.detection import YOLOv8Face
from uniface.constants import YOLOv8FaceWeights

# Lightweight
@@ -225,7 +225,7 @@ detector = YOLOv8Face(

Create detectors dynamically:

```python
from uniface import create_detector
from uniface.detection import create_detector

detector = create_detector('retinaface')
# or
@@ -238,22 +238,6 @@ detector = create_detector('yolov8face')

---

## High-Level API

One-line detection:

```python
from uniface import detect_faces

# Using RetinaFace (default)
faces = detect_faces(image, method='retinaface', confidence_threshold=0.5)

# Using YOLOv8-Face
faces = detect_faces(image, method='yolov8face', confidence_threshold=0.5)
```

---

## Output Format

All detectors return `list[Face]`:

@@ -276,7 +260,7 @@ for face in faces:

## Visualization

```python
from uniface.visualization import draw_detections
from uniface.draw import draw_detections

draw_detections(
    image=image,
@@ -296,7 +280,7 @@ cv2.imwrite("result.jpg", image)

Benchmark on your hardware:

```bash
python tools/detection.py --source image.jpg --iterations 100
python tools/detect.py --source image.jpg
```

---

@@ -23,7 +23,8 @@ Gaze estimation predicts where a person is looking (pitch and yaw angles).

```python
import cv2
import numpy as np
from uniface import RetinaFace, MobileGaze
from uniface.detection import RetinaFace
from uniface.gaze import MobileGaze

detector = RetinaFace()
gaze_estimator = MobileGaze()
@@ -52,7 +53,7 @@ for face in faces:

## Model Variants

```python
from uniface import MobileGaze
from uniface.gaze import MobileGaze
from uniface.constants import GazeWeights

# Default (ResNet34, recommended)
@@ -102,7 +103,7 @@ yaw = -90° ────┼──── yaw = +90°

## Visualization

```python
from uniface.visualization import draw_gaze
from uniface.draw import draw_gaze

# Detect faces
faces = detector.detect(image)

@@ -154,8 +155,9 @@ def draw_gaze_custom(image, bbox, pitch, yaw, length=100, color=(0, 255, 0)):

```python
import cv2
import numpy as np
from uniface import RetinaFace, MobileGaze
from uniface.visualization import draw_gaze
from uniface.detection import RetinaFace
from uniface.gaze import MobileGaze
from uniface.draw import draw_gaze

detector = RetinaFace()
gaze_estimator = MobileGaze()
@@ -256,7 +258,7 @@ print(f"Looking: {direction}")

## Factory Function

```python
from uniface import create_gaze_estimator
from uniface.gaze import create_gaze_estimator

gaze = create_gaze_estimator()  # Returns MobileGaze
```

@@ -265,6 +267,7 @@ gaze = create_gaze_estimator()  # Returns MobileGaze

## Next Steps

- [Head Pose Estimation](headpose.md) - 3D head orientation
- [Anti-Spoofing](spoofing.md) - Face liveness detection
- [Privacy](privacy.md) - Face anonymization
- [Video Recipe](../recipes/video-webcam.md) - Real-time processing

232 docs/modules/headpose.md (Normal file)

@@ -0,0 +1,232 @@

# Head Pose Estimation

Head pose estimation predicts the 3D orientation of a person's head (pitch, yaw, and roll angles).

---

## Available Models

| Model | Backbone | Size | MAE* |
|-------|----------|------|------|
| **ResNet18** :material-check-circle: | ResNet18 | 43 MB | 5.22° |
| ResNet34 | ResNet34 | 82 MB | 5.07° |
| ResNet50 | ResNet50 | 91 MB | 4.83° |
| MobileNetV2 | MobileNetV2 | 9.6 MB | 5.72° |
| MobileNetV3-Small | MobileNetV3 | 4.8 MB | 6.31° |
| MobileNetV3-Large | MobileNetV3 | 16 MB | 5.58° |

*MAE = Mean Absolute Error on AFLW2000 test set (lower is better)

---

## Basic Usage

```python
import cv2
from uniface.detection import RetinaFace
from uniface.headpose import HeadPose

detector = RetinaFace()
head_pose = HeadPose()

image = cv2.imread("photo.jpg")
faces = detector.detect(image)

for face in faces:
    # Crop face
    x1, y1, x2, y2 = map(int, face.bbox)
    face_crop = image[y1:y2, x1:x2]

    if face_crop.size > 0:
        # Estimate head pose
        result = head_pose.estimate(face_crop)
        print(f"Pitch: {result.pitch:.1f}°, Yaw: {result.yaw:.1f}°, Roll: {result.roll:.1f}°")
```

---

## Model Variants

```python
from uniface.headpose import HeadPose
from uniface.constants import HeadPoseWeights

# Default (ResNet18, recommended balance of speed and accuracy)
hp = HeadPose()

# Lightweight for mobile/edge
hp = HeadPose(model_name=HeadPoseWeights.MOBILENET_V3_SMALL)

# Higher accuracy
hp = HeadPose(model_name=HeadPoseWeights.RESNET50)
```

---

## Output Format

```python
result = head_pose.estimate(face_crop)

# HeadPoseResult dataclass
result.pitch  # Rotation around X-axis in degrees
result.yaw    # Rotation around Y-axis in degrees
result.roll   # Rotation around Z-axis in degrees
```

### Angle Convention

```
        pitch > 0 (looking down)
                │
                │
 yaw < 0 ─────┼───── yaw > 0
(looking left)  │  (looking right)
                │
        pitch < 0 (looking up)

roll > 0 = clockwise tilt
roll < 0 = counter-clockwise tilt
```

- **Pitch**: Rotation around X-axis (positive = looking down)
- **Yaw**: Rotation around Y-axis (positive = looking right)
- **Roll**: Rotation around Z-axis (positive = tilting clockwise)
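
As an illustration of these conventions, a small helper (not part of the library) that turns a `HeadPoseResult` into a rough text description:

```python
def describe_pose(result, threshold=15.0):
    """Map pitch/yaw/roll in degrees to a coarse label, using the sign conventions above."""
    parts = []
    if result.pitch > threshold:
        parts.append("looking down")
    elif result.pitch < -threshold:
        parts.append("looking up")
    if result.yaw > threshold:
        parts.append("turned right")
    elif result.yaw < -threshold:
        parts.append("turned left")
    if abs(result.roll) > threshold:
        parts.append("tilted")
    return ", ".join(parts) if parts else "facing forward"

print(describe_pose(head_pose.estimate(face_crop)))
```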

---

## Visualization

### 3D Cube (default)

The default visualization draws a wireframe cube oriented to match the head pose.

```python
from uniface.draw import draw_head_pose

faces = detector.detect(image)

for face in faces:
    x1, y1, x2, y2 = map(int, face.bbox)
    face_crop = image[y1:y2, x1:x2]

    if face_crop.size > 0:
        result = head_pose.estimate(face_crop)

        # Draw cube on image (default)
        draw_head_pose(image, face.bbox, result.pitch, result.yaw, result.roll)

cv2.imwrite("headpose_output.jpg", image)
```

### Axis Visualization

```python
from uniface.draw import draw_head_pose

# X/Y/Z coordinate axes
draw_head_pose(image, face.bbox, result.pitch, result.yaw, result.roll, draw_type='axis')
```

### Low-Level Drawing Functions

```python
from uniface.draw import draw_head_pose_cube, draw_head_pose_axis

# Draw cube directly
draw_head_pose_cube(image, yaw=10.0, pitch=-5.0, roll=2.0, bbox=[100, 100, 250, 280])

# Draw axes directly
draw_head_pose_axis(image, yaw=10.0, pitch=-5.0, roll=2.0, bbox=[100, 100, 250, 280])
```

---

## Real-Time Head Pose Tracking

```python
import cv2
from uniface.detection import RetinaFace
from uniface.headpose import HeadPose
from uniface.draw import draw_head_pose

detector = RetinaFace()
head_pose = HeadPose()

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    faces = detector.detect(frame)

    for face in faces:
        x1, y1, x2, y2 = map(int, face.bbox)
        face_crop = frame[y1:y2, x1:x2]

        if face_crop.size > 0:
            result = head_pose.estimate(face_crop)
            draw_head_pose(frame, face.bbox, result.pitch, result.yaw, result.roll)

    cv2.imshow("Head Pose Estimation", frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

---

## Use Cases

### Driver Drowsiness Detection

```python
def is_head_drooping(result, pitch_threshold=-15):
    """Check if the head is drooping (looking down significantly)."""
    return result.pitch < pitch_threshold

result = head_pose.estimate(face_crop)
if is_head_drooping(result):
    print("Warning: Head drooping detected")
```

### Attention Monitoring

```python
def is_facing_forward(result, threshold=20):
    """Check if the person is facing roughly forward."""
    return (
        abs(result.pitch) < threshold
        and abs(result.yaw) < threshold
        and abs(result.roll) < threshold
    )

result = head_pose.estimate(face_crop)
if is_facing_forward(result):
    print("Facing forward")
else:
    print("Looking away")
```

---

## Factory Function

```python
from uniface.headpose import create_head_pose_estimator

hp = create_head_pose_estimator()  # Returns HeadPose
```

---

## Next Steps

- [Gaze Estimation](gaze.md) - Eye gaze direction
- [Anti-Spoofing](spoofing.md) - Face liveness detection
- [Video Recipe](../recipes/video-webcam.md) - Real-time processing

172 docs/modules/indexing.md (Normal file)

@@ -0,0 +1,172 @@

# Indexing

FAISS-backed vector store for fast similarity search over embeddings.

!!! info "Optional dependency"
    ```bash
    pip install faiss-cpu
    ```

---

## FAISS

```python
from uniface.indexing import FAISS
```

A thin wrapper around a FAISS `IndexFlatIP` (inner-product) index. Vectors **must** be L2-normalised before adding so that inner product equals cosine similarity. The store does not normalise internally.

Each vector is paired with a metadata `dict` that can carry any
JSON-serialisable payload (person ID, name, source path, etc.).
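
The recognizers' `get_normalized_embedding()` already returns unit-length vectors. If your embeddings come from elsewhere, normalise them yourself before calling `add()` or `search()` (a minimal NumPy sketch):

```python
import numpy as np

def l2_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale v to unit length so that inner product equals cosine similarity."""
    return v / max(np.linalg.norm(v), eps)

embedding = l2_normalize(raw_embedding)  # raw_embedding: your (512,) float vector
```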

### Constructor

```python
store = FAISS(embedding_size=512, db_path="./vector_index")
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `embedding_size` | `int` | `512` | Dimension of embedding vectors |
| `db_path` | `str` | `"./vector_index"` | Directory for persisting index and metadata |

---

### Methods

#### `add(embedding, metadata)`

Add a single embedding with associated metadata.

```python
store.add(embedding, {"person_id": "alice", "source": "photo.jpg"})
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `embedding` | `np.ndarray` | L2-normalised embedding vector |
| `metadata` | `dict[str, Any]` | Arbitrary JSON-serialisable key-value pairs |

---

#### `search(embedding, threshold=0.4)`

Find the closest match for a query embedding.

```python
result, similarity = store.search(query_embedding, threshold=0.4)
if result:
    print(result["person_id"], similarity)
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `embedding` | `np.ndarray` | — | L2-normalised query vector |
| `threshold` | `float` | `0.4` | Minimum cosine similarity to accept a match |

**Returns:** `(metadata, similarity)` if a match is found, or `(None, similarity)` when below threshold or the index is empty.

---

#### `remove(key, value)`

Remove all entries where `metadata[key] == value` and rebuild the index.

```python
removed = store.remove("person_id", "bob")
print(f"Removed {removed} entries")
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `key` | `str` | Metadata key to match |
| `value` | `Any` | Value to match |

**Returns:** Number of entries removed.

---

#### `save()`

Persist the FAISS index and metadata to disk.

```python
store.save()
```

Writes two files to `db_path`:

- `faiss_index.bin` — binary FAISS index
- `metadata.json` — JSON array of metadata dicts

---

#### `load()`

Load a previously saved index and metadata.

```python
store = FAISS(db_path="./vector_index")
loaded = store.load()  # True if files exist
```

**Returns:** `True` if loaded successfully, `False` if files are missing.

**Raises:** `RuntimeError` if files exist but cannot be read.

---

### Properties

| Property | Type | Description |
|----------|------|-------------|
| `size` | `int` | Number of vectors in the index |
| `len(store)` | `int` | Same as `size` |
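
Both report the current number of stored vectors:

```python
store = FAISS(embedding_size=512)
print(store.size)   # 0 for a fresh index
print(len(store))   # same value via __len__
```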

---

## Example: End-to-End

```python
import cv2
from uniface.detection import RetinaFace
from uniface.recognition import ArcFace
from uniface.indexing import FAISS

detector = RetinaFace()
recognizer = ArcFace()

# Build
store = FAISS(db_path="./my_index")

image = cv2.imread("alice.jpg")
faces = detector.detect(image)
embedding = recognizer.get_normalized_embedding(image, faces[0].landmarks)
store.add(embedding, {"person_id": "alice"})
store.save()

# Search
store2 = FAISS(db_path="./my_index")
store2.load()

query = cv2.imread("unknown.jpg")
faces = detector.detect(query)
emb = recognizer.get_normalized_embedding(query, faces[0].landmarks)

result, sim = store2.search(emb)
if result:
    print(f"Matched: {result['person_id']} (similarity: {sim:.3f})")
else:
    print(f"No match (similarity: {sim:.3f})")
```

---

## See Also

- [Face Search Recipe](../recipes/face-search.md) - Building and querying indexes
- [Recognition Module](recognition.md) - Embedding extraction
- [Thresholds Guide](../concepts/thresholds-calibration.md) - Tuning similarity thresholds

@@ -20,7 +20,8 @@ Facial landmark detection provides precise localization of facial features.

### Basic Usage

```python
from uniface import RetinaFace, Landmark106
from uniface.detection import RetinaFace
from uniface.landmark import Landmark106

detector = RetinaFace()
landmarker = Landmark106()
@@ -78,7 +79,7 @@ mouth = landmarks[87:106]

All detection models provide 5-point landmarks:

```python
from uniface import RetinaFace
from uniface.detection import RetinaFace

detector = RetinaFace()
faces = detector.detect(image)
@@ -152,7 +153,7 @@ def draw_landmarks_with_connections(image, landmarks):

### Face Alignment

```python
from uniface import face_alignment
from uniface.face_utils import face_alignment

# Align face using 5-point landmarks
aligned = face_alignment(image, faces[0].landmarks)
@@ -236,7 +237,7 @@ def estimate_head_pose(landmarks, image_shape):

## Factory Function

```python
from uniface import create_landmarker
from uniface.landmark import create_landmarker

landmarker = create_landmarker()  # Returns Landmark106
```

@@ -1,15 +1,16 @@

# Parsing

Face parsing segments faces into semantic components (skin, eyes, nose, mouth, hair, etc.).
Face parsing segments faces into semantic components or face regions.

---

## Available Models

| Model | Backbone | Size | Classes |
|-------|----------|------|---------|
| **BiSeNet ResNet18** :material-check-circle: | ResNet18 | 51 MB | 19 |
| BiSeNet ResNet34 | ResNet34 | 89 MB | 19 |
| Model | Backbone | Size | Output |
|-------|----------|------|--------|
| **BiSeNet ResNet18** :material-check-circle: | ResNet18 | 51 MB | 19 classes |
| BiSeNet ResNet34 | ResNet34 | 89 MB | 19 classes |
| XSeg | - | 67 MB | Mask |

---

@@ -18,7 +19,7 @@ Face parsing segments faces into semantic components (skin, eyes, nose, mouth, h

```python
import cv2
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps
from uniface.draw import vis_parsing_maps

# Initialize parser
parser = BiSeNet()
@@ -45,16 +46,16 @@ cv2.imwrite("parsed.jpg", vis_bgr)

| ID | Class | ID | Class |
|----|-------|----|-------|
| 0 | Background | 10 | Ear Ring |
| 1 | Skin | 11 | Nose |
| 2 | Left Eyebrow | 12 | Mouth |
| 3 | Right Eyebrow | 13 | Upper Lip |
| 4 | Left Eye | 14 | Lower Lip |
| 5 | Right Eye | 15 | Neck |
| 6 | Eye Glasses | 16 | Neck Lace |
| 7 | Left Ear | 17 | Cloth |
| 8 | Right Ear | 18 | Hair |
| 9 | Hat | | |
| 0 | Background | 10 | Nose |
| 1 | Skin | 11 | Mouth |
| 2 | Left Eyebrow | 12 | Upper Lip |
| 3 | Right Eyebrow | 13 | Lower Lip |
| 4 | Left Eye | 14 | Neck |
| 5 | Right Eye | 15 | Necklace |
| 6 | Eyeglasses | 16 | Cloth |
| 7 | Left Ear | 17 | Hair |
| 8 | Right Ear | 18 | Hat |
| 9 | Earring | | |

---

@@ -84,9 +85,9 @@ parser = BiSeNet(model_name=ParsingWeights.RESNET34)

```python
import cv2
from uniface import RetinaFace
from uniface.detection import RetinaFace
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps
from uniface.draw import vis_parsing_maps

detector = RetinaFace()
parser = BiSeNet()
@@ -125,7 +126,7 @@ mask = parser.parse(face_image)

# Extract specific component
SKIN = 1
HAIR = 18
HAIR = 17
LEFT_EYE = 4
RIGHT_EYE = 5

@@ -148,10 +149,10 @@ mask = parser.parse(face_image)

component_names = {
    0: 'Background', 1: 'Skin', 2: 'L-Eyebrow', 3: 'R-Eyebrow',
    4: 'L-Eye', 5: 'R-Eye', 6: 'Glasses', 7: 'L-Ear', 8: 'R-Ear',
    9: 'Hat', 10: 'Earring', 11: 'Nose', 12: 'Mouth',
    13: 'U-Lip', 14: 'L-Lip', 15: 'Neck', 16: 'Necklace',
    17: 'Cloth', 18: 'Hair'
    4: 'L-Eye', 5: 'R-Eye', 6: 'Eyeglasses', 7: 'L-Ear', 8: 'R-Ear',
    9: 'Earring', 10: 'Nose', 11: 'Mouth',
    12: 'U-Lip', 13: 'L-Lip', 14: 'Neck', 15: 'Necklace',
    16: 'Cloth', 17: 'Hair', 18: 'Hat'
}

for class_id in np.unique(mask):
@@ -176,23 +177,19 @@ def apply_lip_color(image, mask, color=(180, 50, 50)):
    """Apply lip color using parsing mask."""
    result = image.copy()

    # Get lip mask (upper + lower lip)
    lip_mask = ((mask == 13) | (mask == 14)).astype(np.uint8)
    # Get lip mask (upper lip=12, lower lip=13)
    lip_mask = ((mask == 12) | (mask == 13)).astype(np.uint8)

    # Create color overlay
    overlay = np.zeros_like(image)
    overlay[:] = color

    # Blend with original
    lip_region = cv2.bitwise_and(overlay, overlay, mask=lip_mask)
    non_lip = cv2.bitwise_and(result, result, mask=1 - lip_mask)

    # Combine with alpha blending
    # Alpha blend lip region
    alpha = 0.4
    result = cv2.addWeighted(result, 1 - alpha * lip_mask[:,:,np.newaxis] / 255,
                             lip_region, alpha, 0)
    mask_3ch = lip_mask[:, :, np.newaxis]
    result = np.where(mask_3ch, (image * (1 - alpha) + overlay * alpha).astype(np.uint8), result)

    return result.astype(np.uint8)
    return result
```

### Background Replacement

@@ -218,7 +215,7 @@ def replace_background(image, mask, background):

```python
def get_hair_mask(mask):
    """Extract clean hair mask."""
    hair_mask = (mask == 18).astype(np.uint8) * 255
    hair_mask = (mask == 17).astype(np.uint8) * 255

    # Clean up with morphological operations
    kernel = np.ones((5, 5), np.uint8)
@@ -233,7 +230,7 @@ def get_hair_mask(mask):

## Visualization Options

```python
from uniface.visualization import vis_parsing_maps
from uniface.draw import vis_parsing_maps

# Default visualization
vis_result = vis_parsing_maps(face_rgb, mask)
@@ -248,12 +245,83 @@ vis_result = vis_parsing_maps(

---

## XSeg

XSeg outputs a mask for face regions. Unlike BiSeNet which works on bbox crops, XSeg requires 5-point landmarks for face alignment.

### Basic Usage

```python
import cv2
from uniface.detection import RetinaFace
from uniface.parsing import XSeg

detector = RetinaFace()
parser = XSeg()

image = cv2.imread("photo.jpg")
faces = detector.detect(image)

for face in faces:
    if face.landmarks is not None:
        mask = parser.parse(image, landmarks=face.landmarks)
        print(f"Mask shape: {mask.shape}")  # (H, W), values in [0, 1]
```

### Parameters

```python
from uniface.parsing import XSeg

# Default settings
parser = XSeg()

# Custom settings
parser = XSeg(
    align_size=256,  # Face alignment size
    blur_sigma=5,  # Gaussian blur for smoothing (0 = raw)
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `align_size` | 256 | Face alignment output size |
| `blur_sigma` | 0 | Mask smoothing (0 = no blur) |

### Methods

```python
# Full pipeline: align -> segment -> warp back to original space
mask = parser.parse(image, landmarks=landmarks)

# For pre-aligned face crops
mask = parser.parse_aligned(face_crop)

# Get mask + crop + inverse matrix for custom warping
mask, face_crop, inverse_matrix = parser.parse_with_inverse(image, landmarks)
```

### BiSeNet vs XSeg

| Feature | BiSeNet | XSeg |
|---------|---------|------|
| Output | 19 class labels | Mask [0, 1] |
| Input | Bbox crop | Requires landmarks |
| Use case | Facial components | Face region extraction |

---

## Factory Function

```python
from uniface import create_face_parser
from uniface.parsing import create_face_parser
from uniface.constants import ParsingWeights, XSegWeights

parser = create_face_parser()  # Returns BiSeNet
# BiSeNet (default)
parser = create_face_parser()

# XSeg
parser = create_face_parser(XSegWeights.DEFAULT)
```

---

@@ -18,25 +18,8 @@ Face anonymization protects privacy by blurring or obscuring faces in images and

## Quick Start

### One-Line Anonymization

```python
from uniface.privacy import anonymize_faces
import cv2

image = cv2.imread("group_photo.jpg")
anonymized = anonymize_faces(image, method='pixelate')
cv2.imwrite("anonymized.jpg", anonymized)
```

---

## BlurFace Class

For more control, use the `BlurFace` class:

```python
from uniface import RetinaFace
from uniface.detection import RetinaFace
from uniface.privacy import BlurFace
import cv2

@@ -59,12 +42,12 @@ cv2.imwrite("anonymized.jpg", anonymized)

Blocky pixelation effect (common in news media):

```python
blurrer = BlurFace(method='pixelate', pixel_blocks=10)
blurrer = BlurFace(method='pixelate', pixel_blocks=15)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `pixel_blocks` | 10 | Number of blocks (lower = more pixelated) |
| `pixel_blocks` | 15 | Number of blocks (lower = more pixelated) |
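
Under the hood, pixelation is just an aggressive downscale followed by a nearest-neighbour upscale. A standalone sketch of the idea (illustrative; not the library's exact implementation):

```python
import cv2

def pixelate_region(image, bbox, blocks=15):
    """Pixelate a bbox in place: shrink to blocks x blocks, then scale back up."""
    x1, y1, x2, y2 = map(int, bbox)
    roi = image[y1:y2, x1:x2]
    if roi.size == 0:
        return image
    small = cv2.resize(roi, (blocks, blocks), interpolation=cv2.INTER_LINEAR)
    image[y1:y2, x1:x2] = cv2.resize(small, (x2 - x1, y2 - y1), interpolation=cv2.INTER_NEAREST)
    return image
```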

### Gaussian

@@ -137,7 +120,7 @@ result = blurrer.anonymize(image, faces, inplace=True)

```python
import cv2
from uniface import RetinaFace
from uniface.detection import RetinaFace
from uniface.privacy import BlurFace

detector = RetinaFace()
@@ -166,7 +149,7 @@ cv2.destroyAllWindows()

```python
import cv2
from uniface import RetinaFace
from uniface.detection import RetinaFace
from uniface.privacy import BlurFace

detector = RetinaFace()
@@ -238,7 +221,7 @@ def anonymize_low_confidence(image, faces, blurrer, confidence_threshold=0.8):

```python
import cv2
from uniface import RetinaFace
from uniface.detection import RetinaFace
from uniface.privacy import BlurFace

detector = RetinaFace()
@@ -259,13 +242,13 @@ for method in methods:

```bash
# Anonymize image with pixelation
python tools/face_anonymize.py --source photo.jpg
python tools/anonymize.py --source photo.jpg

# Real-time webcam
python tools/face_anonymize.py --source 0 --method gaussian
python tools/anonymize.py --source 0 --method gaussian

# Custom blur strength
python tools/face_anonymize.py --source photo.jpg --method gaussian --blur-strength 5.0
python tools/anonymize.py --source photo.jpg --method gaussian --blur-strength 5.0
```

---

@@ -22,7 +22,8 @@ Face recognition using adaptive margin based on image quality.

### Basic Usage

```python
from uniface import RetinaFace, AdaFace
from uniface.detection import RetinaFace
from uniface.recognition import AdaFace

detector = RetinaFace()
recognizer = AdaFace()
@@ -39,7 +40,7 @@ if faces:

### Model Variants

```python
from uniface import AdaFace
from uniface.recognition import AdaFace
from uniface.constants import AdaFaceWeights

# Lightweight (default)
@@ -69,7 +70,8 @@ Face recognition using additive angular margin loss.

### Basic Usage

```python
from uniface import RetinaFace, ArcFace
from uniface.detection import RetinaFace
from uniface.recognition import ArcFace

detector = RetinaFace()
recognizer = ArcFace()
@@ -86,7 +88,7 @@ if faces:

### Model Variants

```python
from uniface import ArcFace
from uniface.recognition import ArcFace
from uniface.constants import ArcFaceWeights

# Lightweight (default)
@@ -118,7 +120,7 @@ Lightweight face recognition models with MobileNet backbones.

### Basic Usage

```python
from uniface import MobileFace
from uniface.recognition import MobileFace

recognizer = MobileFace()
embedding = recognizer.get_normalized_embedding(image, landmarks)
@@ -127,7 +129,7 @@ embedding = recognizer.get_normalized_embedding(image, landmarks)

### Model Variants

```python
from uniface import MobileFace
from uniface.recognition import MobileFace
from uniface.constants import MobileFaceWeights

# Ultra-lightweight
@@ -156,7 +158,7 @@ Face recognition using angular softmax loss (A-Softmax).

### Basic Usage

```python
from uniface import SphereFace
from uniface.recognition import SphereFace
from uniface.constants import SphereFaceWeights

recognizer = SphereFace(model_name=SphereFaceWeights.SPHERE20)
@@ -175,7 +177,7 @@ embedding = recognizer.get_normalized_embedding(image, landmarks)

### Compute Similarity

```python
from uniface import compute_similarity
from uniface.face_utils import compute_similarity
import numpy as np

# Extract embeddings
@@ -211,7 +213,7 @@ Recognition models require aligned faces. UniFace handles this internally:

embedding = recognizer.get_normalized_embedding(image, landmarks)

# Or manually align
from uniface import face_alignment
from uniface.face_utils import face_alignment

aligned_face = face_alignment(image, landmarks)
# Returns: 112x112 aligned face image
@@ -223,7 +225,8 @@ aligned_face = face_alignment(image, landmarks)

```python
import numpy as np
from uniface import RetinaFace, ArcFace
from uniface.detection import RetinaFace
from uniface.recognition import ArcFace

detector = RetinaFace()
recognizer = ArcFace()
@@ -282,7 +285,7 @@ else:

## Factory Function

```python
from uniface import create_recognizer
from uniface.recognition import create_recognizer

# Available methods: 'arcface', 'adaface', 'mobileface', 'sphereface'
recognizer = create_recognizer('arcface')

@@ -17,7 +17,7 @@ Face anti-spoofing detects whether a face is real (live) or fake (photo, video r

```python
import cv2
from uniface import RetinaFace
from uniface.detection import RetinaFace
from uniface.spoofing import MiniFASNet

detector = RetinaFace()
@@ -69,20 +69,21 @@ spoofer = MiniFASNet(model_name=MiniFASNetWeights.V1SE)

## Confidence Thresholds

The default threshold is 0.5. Adjust for your use case:
`result.is_real` is based on the model's top predicted class (argmax). If you want stricter behavior, apply your own confidence threshold:

```python
result = spoofer.predict(image, face.bbox)

# High security (fewer false accepts)
HIGH_THRESHOLD = 0.7
if result.confidence > HIGH_THRESHOLD:
if result.is_real and result.confidence > HIGH_THRESHOLD:
    print("Real (high confidence)")
else:
    print("Suspicious")

# Balanced
if result.is_real:  # Uses default 0.5 threshold
# Balanced (argmax decision)
if result.is_real:
    print("Real")
else:
    print("Fake")
@@ -127,7 +128,7 @@ cv2.imwrite("spoofing_result.jpg", image)

```python
import cv2
from uniface import RetinaFace
from uniface.detection import RetinaFace
from uniface.spoofing import MiniFASNet

detector = RetinaFace()
@@ -252,7 +253,7 @@ python tools/spoofing.py --source 0

## Factory Function

```python
from uniface import create_spoofer
from uniface.spoofing import create_spoofer

spoofer = create_spoofer()  # Returns MiniFASNet
```

263 docs/modules/tracking.md (Normal file)

@@ -0,0 +1,263 @@

# Tracking

Multi-object tracking using [BYTETracker](https://github.com/yakhyo/bytetrack-tracker) with Kalman filtering and IoU-based association. The tracker assigns persistent IDs to detected objects across video frames using a two-stage association strategy — first matching high-confidence detections, then low-confidence ones.

---

## How It Works

BYTETracker takes detection bounding boxes as input and returns tracked bounding boxes with persistent IDs. It does not depend on any specific detector — any source of `[x1, y1, x2, y2, score]` arrays will work.

Each frame, the tracker:

1. Splits detections into high-confidence and low-confidence groups
2. Matches high-confidence detections to existing tracks using IoU
3. Matches remaining tracks to low-confidence detections (second chance)
4. Starts new tracks for unmatched high-confidence detections
5. Removes tracks that have been lost for too long

The Kalman filter predicts where each track will be in the next frame, which helps maintain associations even when detections are noisy.
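
The tracker does all of this internally. The sketch below only illustrates the two-stage split-and-match idea with a greedy IoU matcher; the real implementation matches against Kalman-predicted boxes and uses optimal assignment rather than this greedy loop:

```python
import numpy as np

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def associate(track_boxes, dets, track_thresh=0.5, min_iou=0.3):
    """Two-stage greedy matching: high-score detections first, low-score ones second.

    dets: list of [x1, y1, x2, y2, score]; min_iou is an illustrative cutoff,
    and the low_thresh pre-filtering step is omitted for brevity.
    """
    order = sorted(range(len(dets)), key=lambda i: dets[i][4] >= track_thresh, reverse=True)
    free_tracks = set(range(len(track_boxes)))
    matches = []
    for di in order:
        if not free_tracks:
            break
        best = max(free_tracks, key=lambda ti: iou(track_boxes[ti], dets[di][:4]))
        if iou(track_boxes[best], dets[di][:4]) >= min_iou:
            matches.append((best, di))
            free_tracks.discard(best)
    return matches  # unmatched high-score detections would seed new tracks
```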

---

## Basic Usage

```python
import cv2
import numpy as np
from uniface.common import xyxy_to_cxcywh
from uniface.detection import SCRFD
from uniface.tracking import BYTETracker
from uniface.draw import draw_tracks

detector = SCRFD()
tracker = BYTETracker(track_thresh=0.5, track_buffer=30)

cap = cv2.VideoCapture("video.mp4")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # 1. Detect faces
    faces = detector.detect(frame)

    # 2. Build detections array: [x1, y1, x2, y2, score]
    dets = np.array([[*f.bbox, f.confidence] for f in faces])
    dets = dets if len(dets) > 0 else np.empty((0, 5))

    # 3. Update tracker
    tracks = tracker.update(dets)

    # 4. Map track IDs back to face objects
    if len(tracks) > 0 and len(faces) > 0:
        face_bboxes = np.array([f.bbox for f in faces], dtype=np.float32)
        track_ids = tracks[:, 4].astype(int)

        face_centers = xyxy_to_cxcywh(face_bboxes)[:, :2]
        track_centers = xyxy_to_cxcywh(tracks[:, :4])[:, :2]

        for ti in range(len(tracks)):
            dists = (track_centers[ti, 0] - face_centers[:, 0]) ** 2 + (track_centers[ti, 1] - face_centers[:, 1]) ** 2
            faces[int(np.argmin(dists))].track_id = track_ids[ti]

    # 5. Draw
    tracked_faces = [f for f in faces if f.track_id is not None]
    draw_tracks(image=frame, faces=tracked_faces)
    cv2.imshow("Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
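
The ID mapping in step 4 keys on box centers from `xyxy_to_cxcywh`, which converts corner boxes to center-based `[cx, cy, w, h]` form (a quick sanity check of that assumption):

```python
import numpy as np
from uniface.common import xyxy_to_cxcywh

box = np.array([[100, 50, 200, 160]], dtype=np.float32)
print(xyxy_to_cxcywh(box))         # [[150. 105. 100. 110.]] -> [cx, cy, w, h]
print(xyxy_to_cxcywh(box)[:, :2])  # [[150. 105.]] -> centers used for matching
```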

Each track ID gets a deterministic color via golden-ratio hue stepping, so the same person keeps the same color across the entire video.
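
A sketch of how such deterministic colors can be derived (illustrative only, not necessarily UniFace's exact implementation), stepping the hue by the golden-ratio conjugate per ID:

```python
import colorsys

GOLDEN_RATIO_CONJUGATE = 0.61803398875

def id_to_color(track_id: int) -> tuple[int, int, int]:
    # Consecutive IDs land far apart on the hue wheel, and the mapping is stable
    hue = (track_id * GOLDEN_RATIO_CONJUGATE) % 1.0
    r, g, b = colorsys.hsv_to_rgb(hue, 0.85, 0.95)
    return int(b * 255), int(g * 255), int(r * 255)  # BGR order for OpenCV

print(id_to_color(1), id_to_color(2))  # distinct, reproducible colors
```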

---

## Webcam Tracking

```python
import cv2
import numpy as np
from uniface.common import xyxy_to_cxcywh
from uniface.detection import SCRFD
from uniface.tracking import BYTETracker
from uniface.draw import draw_tracks

detector = SCRFD()
tracker = BYTETracker(track_thresh=0.5, track_buffer=30)
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    faces = detector.detect(frame)
    dets = np.array([[*f.bbox, f.confidence] for f in faces])
    dets = dets if len(dets) > 0 else np.empty((0, 5))

    tracks = tracker.update(dets)

    if len(tracks) > 0 and len(faces) > 0:
        face_bboxes = np.array([f.bbox for f in faces], dtype=np.float32)
        track_ids = tracks[:, 4].astype(int)

        face_centers = xyxy_to_cxcywh(face_bboxes)[:, :2]
        track_centers = xyxy_to_cxcywh(tracks[:, :4])[:, :2]

        for ti in range(len(tracks)):
            dists = (track_centers[ti, 0] - face_centers[:, 0]) ** 2 + (track_centers[ti, 1] - face_centers[:, 1]) ** 2
            faces[int(np.argmin(dists))].track_id = track_ids[ti]

    draw_tracks(image=frame, faces=[f for f in faces if f.track_id is not None])
    cv2.imshow("Face Tracking - Press 'q' to quit", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

---

## Parameters

```python
from uniface.tracking import BYTETracker

tracker = BYTETracker(
    track_thresh=0.5,
    track_buffer=30,
    match_thresh=0.8,
    low_thresh=0.1,
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `track_thresh` | 0.5 | Detections above this score go through first-pass association |
| `track_buffer` | 30 | How many frames to keep a lost track before removing it |
| `match_thresh` | 0.8 | IoU threshold for matching tracks to detections |
| `low_thresh` | 0.1 | Detections below this score are discarded entirely |

---

## Input / Output

**Input** — `(N, 5)` numpy array with `[x1, y1, x2, y2, confidence]` per detection:

```python
detections = np.array([
    [100, 50, 200, 160, 0.95],
    [300, 80, 380, 200, 0.87],
])
```

**Output** — `(M, 5)` numpy array with `[x1, y1, x2, y2, track_id]` per active track:

```python
tracks = tracker.update(detections)
# array([[101.2,  51.3, 199.8, 159.8,   1.],
#        [300.5,  80.2, 379.7, 200.1,   2.]])
```

The output bounding boxes come from the Kalman filter prediction, so they may differ slightly from the input. Track IDs are integers that persist across frames for the same object.

---

## Resetting the Tracker

When switching to a different video or scene, reset the tracker to clear all internal state:

```python
tracker.reset()
```

This clears all active, lost, and removed tracks, resets the frame counter, and resets the ID counter back to zero.
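
For example, a minimal sketch that reuses one tracker instance across several clips (the file names and the processing helper are placeholders):

```python
from uniface.tracking import BYTETracker

tracker = BYTETracker()

for path in ["clip_a.mp4", "clip_b.mp4"]:  # placeholder paths
    tracker.reset()                         # IDs restart for each clip
    process_clip(path, tracker)             # hypothetical helper: your per-video loop
```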

---

## Visualization

`draw_tracks` draws bounding boxes color-coded by track ID:

```python
from uniface.draw import draw_tracks

draw_tracks(
    image=frame,
    faces=tracked_faces,
    draw_landmarks=True,
    draw_id=True,
    corner_bbox=True,
)
```

---

## Small Face Performance

!!! warning "Tracking performance with small faces"
    The tracker relies on IoU (Intersection over Union) to match detections across
    frames. When faces occupy a small portion of the image — for example in
    surveillance footage or wide-angle cameras — even slight movement between frames
    can cause a large drop in IoU. This makes it harder for the tracker to maintain
    consistent IDs, and you may see IDs switching or resetting more often than expected.

    This is not specific to BYTETracker; it applies to any IoU-based tracker. A few
    things that can help:

    - **Lower `match_thresh`** (e.g. `0.5` or `0.6`) so the tracker accepts lower
      overlap as a valid match.
    - **Increase `track_buffer`** (e.g. `60` or higher) to hold onto lost tracks
      longer before discarding them.
    - **Use a higher-resolution input** if possible, so face bounding boxes are
      larger in pixel terms.

    ```python
    tracker = BYTETracker(
        track_thresh=0.4,
        track_buffer=60,
        match_thresh=0.6,
    )
    ```

---

## CLI Tool

```bash
# Track faces in a video
python tools/track.py --source video.mp4

# Webcam
python tools/track.py --source 0

# Save output
python tools/track.py --source video.mp4 --output tracked.mp4

# Use RetinaFace instead of SCRFD
python tools/track.py --source video.mp4 --detector retinaface

# Keep lost tracks longer
python tools/track.py --source video.mp4 --track-buffer 60
```

---

## References

- [yakhyo/bytetrack-tracker](https://github.com/yakhyo/bytetrack-tracker) — standalone BYTETracker implementation used in UniFace
- [ByteTrack paper](https://arxiv.org/abs/2110.06864) — Zhang et al., "ByteTrack: Multi-Object Tracking by Associating Every Detection Box"

---

## See Also

- [Detection](detection.md) — face detection models
- [Video & Webcam](../recipes/video-webcam.md) — video processing patterns
- [Inputs & Outputs](../concepts/inputs-outputs.md) — data types and formats
@@ -16,6 +16,9 @@ Run UniFace examples directly in your browser with Google Colab, or download and
| [Face Parsing](https://github.com/yakhyo/uniface/blob/main/examples/06_face_parsing.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/06_face_parsing.ipynb) | Semantic face segmentation |
| [Face Anonymization](https://github.com/yakhyo/uniface/blob/main/examples/07_face_anonymization.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/07_face_anonymization.ipynb) | Privacy-preserving blur |
| [Gaze Estimation](https://github.com/yakhyo/uniface/blob/main/examples/08_gaze_estimation.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/08_gaze_estimation.ipynb) | Gaze direction estimation |
+| [Face Segmentation](https://github.com/yakhyo/uniface/blob/main/examples/09_face_segmentation.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/09_face_segmentation.ipynb) | Face segmentation with XSeg |
+| [Face Vector Store](https://github.com/yakhyo/uniface/blob/main/examples/10_face_vector_store.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/10_face_vector_store.ipynb) | FAISS-backed face database |
+| [Head Pose Estimation](https://github.com/yakhyo/uniface/blob/main/examples/11_head_pose_estimation.ipynb) | [](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/11_head_pose_estimation.ipynb) | 3D head orientation estimation |

---

7
docs/overrides/main.html
Normal file
@@ -0,0 +1,7 @@
{% extends "base.html" %}

{% block announce %}
<a href="https://github.com/yakhyo/uniface" target="_blank" rel="noopener">
  Support our work — give UniFace a <span class="twemoji">{% include ".icons/octicons/star-fill-16.svg" %}</span> on <strong>GitHub</strong> and help us reach more developers!
</a>
{% endblock %}
@@ -10,7 +10,7 @@ Detect faces in an image:

```python
import cv2
-from uniface import RetinaFace
+from uniface.detection import RetinaFace

# Load image
image = cv2.imread("photo.jpg")
```

@@ -46,8 +46,8 @@ Draw bounding boxes and landmarks:

```python
import cv2
-from uniface import RetinaFace
-from uniface.visualization import draw_detections
+from uniface.detection import RetinaFace
+from uniface.draw import draw_detections

# Detect faces
detector = RetinaFace()
```

@@ -80,8 +80,8 @@ Compare two faces:

```python
import cv2
import numpy as np
-from uniface import RetinaFace, ArcFace
+from uniface.detection import RetinaFace
+from uniface.recognition import ArcFace

# Initialize models
detector = RetinaFace()
```

@@ -96,12 +96,13 @@ faces1 = detector.detect(image1)
faces2 = detector.detect(image2)

if faces1 and faces2:
-    # Extract embeddings
+    # Extract embeddings (normalized 1-D vectors)
    emb1 = recognizer.get_normalized_embedding(image1, faces1[0].landmarks)
    emb2 = recognizer.get_normalized_embedding(image2, faces2[0].landmarks)

-    # Compute similarity (cosine similarity)
-    similarity = np.dot(emb1, emb2.T)[0][0]
+    # Compute cosine similarity
+    from uniface import compute_similarity
+    similarity = compute_similarity(emb1, emb2, normalized=True)

    # Interpret result
    if similarity > 0.6:

@@ -121,7 +122,8 @@ if faces1 and faces2:

```python
import cv2
-from uniface import RetinaFace, AgeGender
+from uniface.attribute import AgeGender
+from uniface.detection import RetinaFace

# Initialize models
detector = RetinaFace()

@@ -133,7 +135,7 @@ faces = detector.detect(image)

# Predict attributes
for i, face in enumerate(faces):
-    result = age_gender.predict(image, face.bbox)
+    result = age_gender.predict(image, face)
    print(f"Face {i+1}: {result.sex}, {result.age} years old")
```

@@ -152,7 +154,8 @@ Detect race, gender, and age group:

```python
import cv2
-from uniface import RetinaFace, FairFace
+from uniface.attribute import FairFace
+from uniface.detection import RetinaFace

detector = RetinaFace()
fairface = FairFace()

@@ -161,7 +164,7 @@ image = cv2.imread("photo.jpg")
faces = detector.detect(image)

for i, face in enumerate(faces):
-    result = fairface.predict(image, face.bbox)
+    result = fairface.predict(image, face)
    print(f"Face {i+1}: {result.sex}, {result.age_group}, {result.race}")
```

@@ -178,7 +181,8 @@ Face 2: Female, 20-29, White

```python
import cv2
-from uniface import RetinaFace, Landmark106
+from uniface.detection import RetinaFace
+from uniface.landmark import Landmark106

detector = RetinaFace()
landmarker = Landmark106()

@@ -204,8 +208,9 @@ if faces:

```python
import cv2
import numpy as np
-from uniface import RetinaFace, MobileGaze
-from uniface.visualization import draw_gaze
+from uniface.detection import RetinaFace
+from uniface.gaze import MobileGaze
+from uniface.draw import draw_gaze

detector = RetinaFace()
gaze_estimator = MobileGaze()

@@ -229,6 +234,36 @@ cv2.imwrite("gaze_output.jpg", image)

---

## Head Pose Estimation

```python
import cv2
from uniface.detection import RetinaFace
from uniface.headpose import HeadPose
from uniface.draw import draw_head_pose

detector = RetinaFace()
head_pose = HeadPose()

image = cv2.imread("photo.jpg")
faces = detector.detect(image)

for i, face in enumerate(faces):
    x1, y1, x2, y2 = map(int, face.bbox[:4])
    face_crop = image[y1:y2, x1:x2]

    if face_crop.size > 0:
        result = head_pose.estimate(face_crop)
        print(f"Face {i+1}: pitch={result.pitch:.1f}°, yaw={result.yaw:.1f}°, roll={result.roll:.1f}°")

        # Draw 3D cube visualization
        draw_head_pose(image, face.bbox, result.pitch, result.yaw, result.roll)

cv2.imwrite("headpose_output.jpg", image)
```

---

## Face Parsing

Segment face into semantic components:

@@ -237,7 +272,7 @@ Segment face into semantic components:
import cv2
import numpy as np
from uniface.parsing import BiSeNet
-from uniface.visualization import vis_parsing_maps
+from uniface.draw import vis_parsing_maps

parser = BiSeNet()

@@ -261,26 +296,24 @@ print(f"Detected {len(np.unique(mask))} facial components")
Blur faces for privacy protection:

```python
-from uniface.privacy import anonymize_faces
import cv2
-
-# One-liner: automatic detection and blurring
-image = cv2.imread("group_photo.jpg")
-anonymized = anonymize_faces(image, method='pixelate')
-cv2.imwrite("anonymized.jpg", anonymized)
-```
-
-**Manual control:**
-
-```python
-from uniface import RetinaFace
+from uniface.detection import RetinaFace
from uniface.privacy import BlurFace

detector = RetinaFace()
-blurrer = BlurFace(method='gaussian', blur_strength=5.0)
+blurrer = BlurFace(method='pixelate')

image = cv2.imread("group_photo.jpg")
faces = detector.detect(image)
anonymized = blurrer.anonymize(image, faces)
cv2.imwrite("anonymized.jpg", anonymized)
```

+**Custom blur settings:**
+
+```python
+blurrer = BlurFace(method='gaussian', blur_strength=5.0)
+anonymized = blurrer.anonymize(image, faces)
+```

**Available methods:**

@@ -301,7 +334,7 @@ Detect real vs. fake faces:

```python
import cv2
-from uniface import RetinaFace
+from uniface.detection import RetinaFace
from uniface.spoofing import MiniFASNet

detector = RetinaFace()
```

@@ -324,8 +357,8 @@ Real-time face detection:

```python
import cv2
-from uniface import RetinaFace
-from uniface.visualization import draw_detections
+from uniface.detection import RetinaFace
+from uniface.draw import draw_detections

detector = RetinaFace()
cap = cv2.VideoCapture(0)
```

@@ -355,6 +388,60 @@ cv2.destroyAllWindows()

---

## Face Tracking

Track faces across video frames with persistent IDs:

```python
import cv2
import numpy as np
from uniface.common import xyxy_to_cxcywh
from uniface.detection import SCRFD
from uniface.tracking import BYTETracker
from uniface.draw import draw_tracks

detector = SCRFD()
tracker = BYTETracker(track_thresh=0.5, track_buffer=30)

cap = cv2.VideoCapture("video.mp4")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    faces = detector.detect(frame)
    dets = np.array([[*f.bbox, f.confidence] for f in faces])
    dets = dets if len(dets) > 0 else np.empty((0, 5))

    tracks = tracker.update(dets)

    # Assign track IDs to faces
    if len(tracks) > 0 and len(faces) > 0:
        face_bboxes = np.array([f.bbox for f in faces], dtype=np.float32)
        track_ids = tracks[:, 4].astype(int)

        face_centers = xyxy_to_cxcywh(face_bboxes)[:, :2]
        track_centers = xyxy_to_cxcywh(tracks[:, :4])[:, :2]

        for ti in range(len(tracks)):
            dists = (track_centers[ti, 0] - face_centers[:, 0]) ** 2 + (track_centers[ti, 1] - face_centers[:, 1]) ** 2
            faces[int(np.argmin(dists))].track_id = track_ids[ti]

    tracked_faces = [f for f in faces if f.track_id is not None]
    draw_tracks(image=frame, faces=tracked_faces)
    cv2.imshow("Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

For more details, see the [Tracking module](modules/tracking.md).

---

## Model Selection

For detailed model comparisons and benchmarks, see the [Model Zoo](models.md).

@@ -365,7 +452,9 @@ For detailed model comparisons and benchmarks, see the [Model Zoo](models.md).
|------|------------------|
| Detection | `RetinaFace`, `SCRFD`, `YOLOv5Face`, `YOLOv8Face` |
| Recognition | `ArcFace`, `AdaFace`, `MobileFace`, `SphereFace` |
+| Tracking | `BYTETracker` |
| Gaze | `MobileGaze` (ResNet18/34/50, MobileNetV2, MobileOneS0) |
+| Head Pose | `HeadPose` (ResNet18/34/50, MobileNetV2/V3) |
| Parsing | `BiSeNet` (ResNet18/34) |
| Attributes | `AgeGender`, `FairFace`, `Emotion` |
| Anti-Spoofing | `MiniFASNet` (V1SE, V2) |

@@ -407,13 +496,19 @@ python -c "import platform; print(platform.machine())"
### Import Errors

```python
# Correct imports
-from uniface.detection import RetinaFace
-from uniface.recognition import ArcFace
+from uniface.detection import RetinaFace, SCRFD
+from uniface.recognition import ArcFace, AdaFace
from uniface.attribute import AgeGender, FairFace
from uniface.landmark import Landmark106
-
-# Also works (re-exported at package level)
-from uniface import RetinaFace, ArcFace, Landmark106
+from uniface.gaze import MobileGaze
+from uniface.headpose import HeadPose
+from uniface.parsing import BiSeNet, XSeg
+from uniface.privacy import BlurFace
+from uniface.spoofing import MiniFASNet
+from uniface.tracking import BYTETracker
+from uniface.analyzer import FaceAnalyzer
+from uniface.indexing import FAISS  # pip install faiss-cpu
+from uniface.draw import draw_detections, draw_tracks
```
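
If imports still fail, a quick check that the installed package is the version these docs describe (`uniface.__version__` is the same attribute printed in the project notebooks):

```python
import uniface

print(uniface.__version__)  # should match the documented release
```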

---

@@ -11,7 +11,7 @@ Blur faces in real-time video streams for privacy protection.

```python
import cv2
-from uniface import RetinaFace
+from uniface.detection import RetinaFace
from uniface.privacy import BlurFace

detector = RetinaFace()
```

@@ -40,7 +40,7 @@ cv2.destroyAllWindows()

```python
import cv2
-from uniface import RetinaFace
+from uniface.detection import RetinaFace
from uniface.privacy import BlurFace

detector = RetinaFace()
```

@@ -67,14 +67,19 @@ out.release()

---

-## One-Liner for Images
+## Single Image

```python
-from uniface.privacy import anonymize_faces
import cv2
+from uniface.detection import RetinaFace
+from uniface.privacy import BlurFace
+
+detector = RetinaFace()
+blurrer = BlurFace(method='pixelate')

image = cv2.imread("photo.jpg")
-result = anonymize_faces(image, method='pixelate')
+faces = detector.detect(image)
+result = blurrer.anonymize(image, faces)
cv2.imwrite("anonymized.jpg", result)
```

@@ -84,7 +89,7 @@ cv2.imwrite("anonymized.jpg", result)

| Method | Usage |
|--------|-------|
-| Pixelate | `BlurFace(method='pixelate', pixel_blocks=10)` |
+| Pixelate | `BlurFace(method='pixelate', pixel_blocks=15)` |
| Gaussian | `BlurFace(method='gaussian', blur_strength=3.0)` |
| Blackout | `BlurFace(method='blackout', color=(0,0,0))` |
| Elliptical | `BlurFace(method='elliptical', margin=20)` |

@@ -12,7 +12,7 @@ Process multiple images efficiently.
```python
import cv2
from pathlib import Path
-from uniface import RetinaFace
+from uniface.detection import RetinaFace

detector = RetinaFace()
```

@@ -54,7 +54,8 @@ for image_path in tqdm(image_files, desc="Processing"):
## Extract Embeddings

```python
-from uniface import RetinaFace, ArcFace
+from uniface.detection import RetinaFace
+from uniface.recognition import ArcFace
import numpy as np

detector = RetinaFace()
```

@@ -27,29 +27,29 @@ import numpy as np

class MyDetector(BaseDetector):
    def __init__(self, model_path: str, confidence_threshold: float = 0.5):
+        super().__init__(confidence_threshold=confidence_threshold)
        self.session = create_onnx_session(model_path)
-        self.threshold = confidence_threshold

+    def preprocess(self, image: np.ndarray) -> np.ndarray:
+        # Your preprocessing logic
+        # e.g., resize, normalize, transpose
+        raise NotImplementedError
+
+    def postprocess(self, outputs, shape) -> list[Face]:
+        # Your postprocessing logic
+        # e.g., decode boxes, apply NMS, create Face objects
+        raise NotImplementedError

    def detect(self, image: np.ndarray) -> list[Face]:
        # 1. Preprocess image
-        input_tensor = self._preprocess(image)
+        input_tensor = self.preprocess(image)

        # 2. Run inference
        outputs = self.session.run(None, {'input': input_tensor})

        # 3. Postprocess outputs to Face objects
-        faces = self._postprocess(outputs, image.shape)
-        return faces
-
-    def _preprocess(self, image):
-        # Your preprocessing logic
-        # e.g., resize, normalize, transpose
-        pass
-
-    def _postprocess(self, outputs, shape):
-        # Your postprocessing logic
-        # e.g., decode boxes, apply NMS, create Face objects
-        pass
+        return self.postprocess(outputs, image.shape)
```
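
As a hedged illustration of filling in the two hooks (the input size and normalization constants below are placeholders, not the values any particular model expects):

```python
import cv2
import numpy as np
from uniface.types import Face

class ToyDetector(MyDetector):
    def preprocess(self, image: np.ndarray) -> np.ndarray:
        resized = cv2.resize(image, (640, 640))
        blob = (resized.astype(np.float32) - 127.5) / 128.0  # placeholder normalization
        return blob.transpose(2, 0, 1)[None]                 # HWC -> NCHW batch

    def postprocess(self, outputs, shape) -> list[Face]:
        # Decode raw outputs into Face objects here; returning [] keeps the sketch runnable
        return []
```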

---

@@ -57,36 +57,14 @@ class MyDetector(BaseDetector):

## Add Custom Recognition Model

```python
-from uniface.recognition.base import BaseRecognizer
-from uniface.onnx_utils import create_onnx_session
-from uniface import face_alignment
import numpy as np
+from uniface.recognition.base import BaseRecognizer, PreprocessConfig

class MyRecognizer(BaseRecognizer):
-    def __init__(self, model_path: str):
-        self.session = create_onnx_session(model_path)
+    def __init__(self, model_path: str, providers=None):
+        preprocessing = PreprocessConfig(input_mean=127.5, input_std=127.5, input_size=(112, 112))
+        super().__init__(model_path, preprocessing, providers=providers)

-    def get_normalized_embedding(
-        self,
-        image: np.ndarray,
-        landmarks: np.ndarray
-    ) -> np.ndarray:
-        # 1. Align face
-        aligned = face_alignment(image, landmarks)
-
-        # 2. Preprocess
-        input_tensor = self._preprocess(aligned)
-
-        # 3. Run inference
-        embedding = self.session.run(None, {'input': input_tensor})[0]
-
-        # 4. Normalize
-        embedding = embedding / np.linalg.norm(embedding)
-        return embedding
-
-    def _preprocess(self, image):
-        # Your preprocessing logic
-        pass
+# Optional: override preprocess() if your model expects custom normalization.
```

---

@@ -1,178 +1,166 @@
# Face Search

-Build a face search system for finding people in images.
+Find and identify people in images and video streams.

-!!! note "Work in Progress"
-    This page contains example code patterns. Test thoroughly before using in production.
+UniFace supports two search approaches:
+
+| Approach             | Use case                                         | Tool                    |
+| -------------------- | ------------------------------------------------ | ----------------------- |
+| **Reference search** | "Is this specific person in the video?"          | `tools/search.py`       |
+| **Vector search**    | "Who is this?" against a database of known faces | `tools/faiss_search.py` |

---

-## Basic Face Database
+## Reference Search (single image)
+
+Compare every detected face against a single reference photo:

```python
import cv2
import numpy as np
from uniface.detection import RetinaFace
from uniface.recognition import ArcFace
from uniface.face_utils import compute_similarity

detector = RetinaFace()
recognizer = ArcFace()

ref_image = cv2.imread("reference.jpg")
ref_faces = detector.detect(ref_image)
ref_embedding = recognizer.get_normalized_embedding(ref_image, ref_faces[0].landmarks)

query_image = cv2.imread("group_photo.jpg")
faces = detector.detect(query_image)

for face in faces:
    embedding = recognizer.get_normalized_embedding(query_image, face.landmarks)
    sim = compute_similarity(ref_embedding, embedding)

    label = f"Match ({sim:.2f})" if sim > 0.4 else f"Unknown ({sim:.2f})"
    print(label)
```

**CLI tool:**

```bash
python tools/search.py --reference ref.jpg --source video.mp4
python tools/search.py --reference ref.jpg --source 0   # webcam
```

---

## Vector Search (FAISS index)

For identifying faces against a database of many known people, use the [`FAISS`](../modules/indexing.md) vector store.

!!! info "Install extra"

    ```bash
    pip install faiss-cpu
    ```

### Build an index

Organise face images in person sub-folders:

```
dataset/
├── alice/
│   ├── 001.jpg
│   └── 002.jpg
├── bob/
│   └── 001.jpg
└── charlie/
    ├── 001.jpg
    └── 002.jpg
```

```python
import cv2
from pathlib import Path
-from uniface import RetinaFace, ArcFace
+from uniface.detection import RetinaFace
+from uniface.recognition import ArcFace
+from uniface.indexing import FAISS

-class FaceDatabase:
-    def __init__(self):
-        self.detector = RetinaFace()
-        self.recognizer = ArcFace()
-        self.embeddings = {}
+detector = RetinaFace()
+recognizer = ArcFace()
+store = FAISS(db_path="./my_index")

-    def add_face(self, person_id, image):
-        """Add a face to the database."""
-        faces = self.detector.detect(image)
-        if not faces:
-            raise ValueError(f"No face found for {person_id}")
+for person_dir in sorted(Path("dataset").iterdir()):
+    if not person_dir.is_dir():
+        continue
+    for img_path in person_dir.glob("*.jpg"):
+        image = cv2.imread(str(img_path))
+        faces = detector.detect(image)
+        if faces:
+            emb = recognizer.get_normalized_embedding(image, faces[0].landmarks)
+            store.add(emb, {"person_id": person_dir.name, "source": str(img_path)})

-        face = max(faces, key=lambda f: f.confidence)
-        embedding = self.recognizer.get_normalized_embedding(image, face.landmarks)
-        self.embeddings[person_id] = embedding
-        return True
-
-    def search(self, image, threshold=0.6):
-        """Search for faces in an image."""
-        faces = self.detector.detect(image)
-        results = []
-
-        for face in faces:
-            embedding = self.recognizer.get_normalized_embedding(image, face.landmarks)
-
-            best_match = None
-            best_similarity = -1
-
-            for person_id, db_embedding in self.embeddings.items():
-                similarity = np.dot(embedding, db_embedding.T)[0][0]
-                if similarity > best_similarity:
-                    best_similarity = similarity
-                    best_match = person_id
-
-            results.append({
-                'bbox': face.bbox,
-                'match': best_match if best_similarity >= threshold else None,
-                'similarity': best_similarity
-            })
-
-        return results
-
-    def save(self, path):
-        """Save database to file."""
-        np.savez(path, embeddings=dict(self.embeddings))
-
-    def load(self, path):
-        """Load database from file."""
-        data = np.load(path, allow_pickle=True)
-        self.embeddings = data['embeddings'].item()
-
-# Usage
-db = FaceDatabase()
-
-# Add faces
-for image_path in Path("known_faces/").glob("*.jpg"):
-    person_id = image_path.stem
-    image = cv2.imread(str(image_path))
-    try:
-        db.add_face(person_id, image)
-        print(f"Added: {person_id}")
-    except ValueError as e:
-        print(f"Skipped: {e}")
-
-# Save database
-db.save("face_database.npz")
-
-# Search
-query_image = cv2.imread("group_photo.jpg")
-results = db.search(query_image)
-
-for r in results:
-    if r['match']:
-        print(f"Found: {r['match']} (similarity: {r['similarity']:.3f})")
+store.save()
+print(f"Index saved: {store}")
```

----
-
-## Visualization
+**CLI tool:**

+```bash
+python tools/faiss_search.py build --faces-dir dataset/ --db-path ./my_index
+```

+### Search against the index

```python
import cv2
+from uniface.detection import RetinaFace
+from uniface.recognition import ArcFace
+from uniface.indexing import FAISS

-def visualize_search_results(image, results):
-    """Draw search results on image."""
-    for r in results:
-        x1, y1, x2, y2 = map(int, r['bbox'])
+detector = RetinaFace()
+recognizer = ArcFace()

-        if r['match']:
-            color = (0, 255, 0)  # Green for match
-            label = f"{r['match']} ({r['similarity']:.2f})"
-        else:
-            color = (0, 0, 255)  # Red for unknown
-            label = f"Unknown ({r['similarity']:.2f})"
+store = FAISS(db_path="./my_index")
+store.load()

-        cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)
-        cv2.putText(image, label, (x1, y1 - 10),
-                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
+image = cv2.imread("query.jpg")
+faces = detector.detect(image)

-    return image
+for face in faces:
+    embedding = recognizer.get_normalized_embedding(image, face.landmarks)
+    result, similarity = store.search(embedding, threshold=0.4)

-# Usage
-results = db.search(image)
-annotated = visualize_search_results(image.copy(), results)
-cv2.imwrite("search_result.jpg", annotated)
+    if result:
+        print(f"Matched: {result['person_id']} ({similarity:.2f})")
+    else:
+        print(f"Unknown ({similarity:.2f})")
```

---

-## Real-Time Search
+**CLI tool:**

+```bash
+python tools/faiss_search.py run --db-path ./my_index --source video.mp4
+python tools/faiss_search.py run --db-path ./my_index --source 0   # webcam
+```

+### Manage the index

```python
-import cv2
+from uniface.indexing import FAISS

-def realtime_search(db):
-    """Real-time face search from webcam."""
-    cap = cv2.VideoCapture(0)
+store = FAISS(db_path="./my_index")
+store.load()

-    while True:
-        ret, frame = cap.read()
-        if not ret:
-            break
+print(f"Total vectors: {len(store)}")

-        results = db.search(frame, threshold=0.5)
+removed = store.remove("person_id", "bob")
+print(f"Removed {removed} entries")

-        for r in results:
-            x1, y1, x2, y2 = map(int, r['bbox'])
-
-            if r['match']:
-                color = (0, 255, 0)
-                label = r['match']
-            else:
-                color = (0, 0, 255)
-                label = "Unknown"
-
-            cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
-            cv2.putText(frame, label, (x1, y1 - 10),
-                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
-
-        cv2.imshow("Face Search", frame)
-
-        if cv2.waitKey(1) & 0xFF == ord('q'):
-            break
-
-    cap.release()
-    cv2.destroyAllWindows()
-
-# Usage
-db = FaceDatabase()
-db.load("face_database.npz")
-realtime_search(db)
+store.save()
```

---

## See Also

- [Indexing Module](../modules/indexing.md) - Full `FAISS` API reference
- [Recognition Module](../modules/recognition.md) - Face recognition details
- [Batch Processing](batch-processing.md) - Process multiple files
- [Video & Webcam](video-webcam.md) - Real-time processing
- [Concepts: Thresholds](../concepts/thresholds-calibration.md) - Tuning similarity thresholds

@@ -8,8 +8,10 @@ A complete pipeline for processing images with detection, recognition, and attri

```python
import cv2
-from uniface import RetinaFace, ArcFace, AgeGender
-from uniface.visualization import draw_detections
+from uniface.attribute import AgeGender
+from uniface.detection import RetinaFace
+from uniface.recognition import ArcFace
+from uniface.draw import draw_detections

# Initialize models
detector = RetinaFace()

@@ -32,7 +34,7 @@ def process_image(image_path):
    embedding = recognizer.get_normalized_embedding(image, face.landmarks)

    # Step 3: Predict attributes
-    attrs = age_gender.predict(image, face.bbox)
+    attrs = age_gender.predict(image, face)

    results.append({
        'face_id': i,

@@ -67,14 +69,21 @@ cv2.imwrite("result.jpg", result_image)
For convenience, use the built-in `FaceAnalyzer`:

```python
-from uniface import FaceAnalyzer
+from uniface.analyzer import FaceAnalyzer
+from uniface.attribute import AgeGender
+from uniface.detection import RetinaFace
+from uniface.recognition import ArcFace
import cv2

# Initialize with desired modules
+detector = RetinaFace()
+recognizer = ArcFace()
+age_gender = AgeGender()
+
analyzer = FaceAnalyzer(
-    detect=True,
-    recognize=True,
-    attributes=True
+    detector,
+    recognizer=recognizer,
+    attributes=[age_gender],
)

# Process image

@@ -97,13 +106,15 @@ Complete pipeline with all modules:
```python
import cv2
import numpy as np
-from uniface import (
-    RetinaFace, ArcFace, AgeGender, FairFace,
-    Landmark106, MobileGaze
-)
+from uniface.attribute import AgeGender, FairFace
+from uniface.detection import RetinaFace
+from uniface.gaze import MobileGaze
+from uniface.headpose import HeadPose
+from uniface.landmark import Landmark106
+from uniface.recognition import ArcFace
from uniface.parsing import BiSeNet
from uniface.spoofing import MiniFASNet
-from uniface.visualization import draw_detections, draw_gaze
+from uniface.draw import draw_detections, draw_gaze, draw_head_pose

class FaceAnalysisPipeline:
    def __init__(self):

@@ -114,6 +125,7 @@ class FaceAnalysisPipeline:
        self.fairface = FairFace()
        self.landmarker = Landmark106()
        self.gaze = MobileGaze()
+        self.head_pose = HeadPose()
        self.parser = BiSeNet()
        self.spoofer = MiniFASNet()

@@ -135,12 +147,12 @@ class FaceAnalysisPipeline:
        )

        # Attributes
-        ag_result = self.age_gender.predict(image, face.bbox)
+        ag_result = self.age_gender.predict(image, face)
        result['age'] = ag_result.age
        result['gender'] = ag_result.sex

        # FairFace attributes
-        ff_result = self.fairface.predict(image, face.bbox)
+        ff_result = self.fairface.predict(image, face)
        result['age_group'] = ff_result.age_group
        result['race'] = ff_result.race

@@ -157,6 +169,13 @@ class FaceAnalysisPipeline:
        result['gaze_pitch'] = gaze_result.pitch
        result['gaze_yaw'] = gaze_result.yaw

+        # Head pose estimation
+        if face_crop.size > 0:
+            hp_result = self.head_pose.estimate(face_crop)
+            result['head_pitch'] = hp_result.pitch
+            result['head_yaw'] = hp_result.yaw
+            result['head_roll'] = hp_result.roll
+
        # Face parsing
        if face_crop.size > 0:
            result['parsing_mask'] = self.parser.parse(face_crop)

@@ -179,6 +198,7 @@ for i, r in enumerate(results):
    print(f"  Gender: {r['gender']}, Age: {r['age']}")
    print(f"  Race: {r['race']}, Age Group: {r['age_group']}")
    print(f"  Gaze: pitch={np.degrees(r['gaze_pitch']):.1f}°")
+    print(f"  Head Pose: P={r['head_pitch']:.1f}° Y={r['head_yaw']:.1f}° R={r['head_roll']:.1f}°")
    print(f"  Real: {r['is_real']} ({r['spoof_confidence']:.1%})")
```

@@ -189,8 +209,10 @@ for i, r in enumerate(results):
```python
import cv2
import numpy as np
-from uniface import RetinaFace, AgeGender, MobileGaze
-from uniface.visualization import draw_detections, draw_gaze
+from uniface.attribute import AgeGender
+from uniface.detection import RetinaFace
+from uniface.gaze import MobileGaze
+from uniface.draw import draw_detections, draw_gaze

def visualize_analysis(image_path, output_path):
    """Create annotated visualization of face analysis."""

@@ -208,7 +230,7 @@ def visualize_analysis(image_path, output_path):
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

    # Age and gender
-    attrs = age_gender.predict(image, face.bbox)
+    attrs = age_gender.predict(image, face)
    label = f"{attrs.sex}, {attrs.age}y"
    cv2.putText(image, label, (x1, y1 - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

@@ -256,6 +278,11 @@ def results_to_json(results):
        'gaze': {
            'pitch_deg': float(np.degrees(r['gaze_pitch'])) if 'gaze_pitch' in r else None,
            'yaw_deg': float(np.degrees(r['gaze_yaw'])) if 'gaze_yaw' in r else None
        },
+        'head_pose': {
+            'pitch': float(r['head_pitch']) if 'head_pitch' in r else None,
+            'yaw': float(r['head_yaw']) if 'head_yaw' in r else None,
+            'roll': float(r['head_roll']) if 'head_roll' in r else None
+        }
    }
    output.append(item)

@@ -279,3 +306,4 @@ with open('results.json', 'w') as f:
- [Face Search](face-search.md) - Build a search system
- [Detection Module](../modules/detection.md) - Detection options
- [Recognition Module](../modules/recognition.md) - Recognition details
+- [Head Pose Module](../modules/headpose.md) - Head orientation estimation

@@ -11,8 +11,8 @@ Real-time face analysis for video streams.

```python
import cv2
-from uniface import RetinaFace
-from uniface.visualization import draw_detections
+from uniface.detection import RetinaFace
+from uniface.draw import draw_detections

detector = RetinaFace()
cap = cv2.VideoCapture(0)
```

@@ -48,7 +48,7 @@ cv2.destroyAllWindows()

```python
import cv2
-from uniface import RetinaFace
+from uniface.detection import RetinaFace

def process_video(input_path, output_path):
    """Process a video file."""
```

@@ -83,6 +83,57 @@ process_video("input.mp4", "output.mp4")

---

## Webcam Tracking

To track faces across frames with persistent IDs, pair a detector with `BYTETracker`:

```python
import cv2
import numpy as np
from uniface.common import xyxy_to_cxcywh
from uniface.detection import SCRFD
from uniface.tracking import BYTETracker
from uniface.draw import draw_tracks

detector = SCRFD()
tracker = BYTETracker(track_thresh=0.5, track_buffer=30)
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    faces = detector.detect(frame)
    dets = np.array([[*f.bbox, f.confidence] for f in faces])
    dets = dets if len(dets) > 0 else np.empty((0, 5))

    tracks = tracker.update(dets)

    if len(tracks) > 0 and len(faces) > 0:
        face_bboxes = np.array([f.bbox for f in faces], dtype=np.float32)
        track_ids = tracks[:, 4].astype(int)

        face_centers = xyxy_to_cxcywh(face_bboxes)[:, :2]
        track_centers = xyxy_to_cxcywh(tracks[:, :4])[:, :2]

        for ti in range(len(tracks)):
            dists = (track_centers[ti, 0] - face_centers[:, 0]) ** 2 + (track_centers[ti, 1] - face_centers[:, 1]) ** 2
            faces[int(np.argmin(dists))].track_id = track_ids[ti]

    draw_tracks(image=frame, faces=[f for f in faces if f.track_id is not None])
    cv2.imshow("Face Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

For more details on tracker parameters and tuning, see [Tracking](../modules/tracking.md).

---

## Performance Tips

### Skip Frames
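
A minimal sketch of the pattern this section names (run the detector only every Nth frame and reuse the last detections in between; `detector` and `draw_detections` as above):

```python
import cv2
from uniface.detection import RetinaFace
from uniface.draw import draw_detections

detector = RetinaFace()
cap = cv2.VideoCapture(0)

DETECT_EVERY = 3  # run detection on every 3rd frame
frame_idx = 0
faces = []

while True:
    ret, frame = cap.read()
    if not ret:
        break

    if frame_idx % DETECT_EVERY == 0:
        faces = detector.detect(frame)  # refresh detections
    frame_idx += 1

    # Reuse the most recent detections on skipped frames
    draw_detections(
        image=frame,
        bboxes=[f.bbox for f in faces],
        scores=[f.confidence for f in faces],
        landmarks=[f.landmarks for f in faces],
    )
    cv2.imshow("Skip Frames", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```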

@@ -119,7 +170,9 @@ while True:

## See Also

+- [Tracking Module](../modules/tracking.md) - Face tracking with BYTETracker
- [Anonymize Stream](anonymize-stream.md) - Privacy protection in video
- [Batch Processing](batch-processing.md) - Process multiple files
- [Detection Module](../modules/detection.md) - Detection options
-- [Gaze Module](../modules/gaze.md) - Gaze tracking
+- [Gaze Module](../modules/gaze.md) - Gaze estimation
+- [Head Pose Module](../modules/headpose.md) - Head orientation estimation

@@ -51,7 +51,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"2.0.0\n"
+"3.2.0\n"
]
}
],
@@ -62,7 +62,7 @@
"\n",
"import uniface\n",
"from uniface.detection import RetinaFace\n",
-"from uniface.visualization import draw_detections\n",
+"from uniface.draw import draw_detections\n",
"\n",
"print(uniface.__version__)"
]
@@ -162,7 +162,7 @@
"landmarks = [f.landmarks for f in faces]\n",
"\n",
"# Draw detections\n",
-"draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=0.6, fancy_bbox=True)\n",
+"draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=0.6, corner_bbox=True)\n",
"\n",
"# Display result\n",
"output_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n",
@@ -214,7 +214,7 @@
"scores = [f.confidence for f in faces]\n",
"landmarks = [f.landmarks for f in faces]\n",
"\n",
-"draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=0.6, fancy_bbox=True)\n",
+"draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=0.6, corner_bbox=True)\n",
"\n",
"output_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n",
"display.display(Image.fromarray(output_image))"
@@ -261,7 +261,7 @@
"scores = [f.confidence for f in faces]\n",
"landmarks = [f.landmarks for f in faces]\n",
"\n",
-"draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=0.6, fancy_bbox=True)\n",
+"draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=0.6, corner_bbox=True)\n",
"\n",
"output_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n",
"display.display(Image.fromarray(output_image))"

@@ -55,7 +55,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"2.0.0\n"
+"3.2.0\n"
]
}
],
@@ -67,7 +67,7 @@
"import uniface\n",
"from uniface.detection import RetinaFace\n",
"from uniface.face_utils import face_alignment\n",
-"from uniface.visualization import draw_detections\n",
+"from uniface.draw import draw_detections\n",
"\n",
"print(uniface.__version__)"
]
@@ -142,7 +142,7 @@
" bboxes = [f.bbox for f in faces]\n",
" scores = [f.confidence for f in faces]\n",
" landmarks = [f.landmarks for f in faces]\n",
-" draw_detections(image=bbox_image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=0.6, fancy_bbox=True)\n",
+" draw_detections(image=bbox_image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=0.6, corner_bbox=True)\n",
"\n",
" # Align first detected face (returns aligned image and inverse transform matrix)\n",
" first_landmarks = faces[0].landmarks\n",

@@ -44,7 +44,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"2.0.0\n"
+"3.2.0\n"
]
}
],
@@ -53,7 +53,7 @@
"import matplotlib.pyplot as plt\n",
"\n",
"import uniface\n",
-"from uniface import FaceAnalyzer\n",
+"from uniface.analyzer import FaceAnalyzer\n",
"from uniface.detection import RetinaFace\n",
"from uniface.recognition import ArcFace\n",
"\n",

@@ -15,7 +15,7 @@
},
{
"cell_type": "code",
-"execution_count": null,
+"execution_count": 1,
"metadata": {},
"outputs": [
{
@@ -53,7 +53,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"UniFace version: 2.0.0\n"
+"UniFace version: 3.2.0\n"
]
}
],
@@ -66,7 +66,7 @@
"import uniface\n",
"from uniface.parsing import BiSeNet\n",
"from uniface.constants import ParsingWeights\n",
-"from uniface.visualization import vis_parsing_maps\n",
+"from uniface.draw import vis_parsing_maps\n",
"\n",
"print(f\"UniFace version: {uniface.__version__}\")"
]
@@ -82,15 +82,7 @@
"cell_type": "code",
"execution_count": 3,
"metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"✓ Model loaded (CoreML (Apple Silicon))\n"
-]
-}
-],
+"outputs": [],
"source": [
"# Initialize face parser (uses ResNet18 by default)\n",
"parser = BiSeNet(model_name=ParsingWeights.RESNET34) # use resnet34 for better accuracy"

@@ -51,7 +51,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"UniFace version: 2.0.0\n"
+"UniFace version: 3.2.0\n"
]
}
],
@@ -65,7 +65,7 @@
"import uniface\n",
"from uniface.detection import RetinaFace\n",
"from uniface.gaze import MobileGaze\n",
-"from uniface.visualization import draw_gaze\n",
+"from uniface.draw import draw_gaze\n",
"\n",
"print(f\"UniFace version: {uniface.__version__}\")"
]
@@ -110,19 +110,19 @@
"text": [
"Processing: image0.jpg\n",
" Detected 1 face(s)\n",
-" Face 1: pitch=-0.0°, yaw=7.1°\n",
+" Face 1: pitch=7.1°, yaw=-0.0°\n",
"Processing: image1.jpg\n",
" Detected 1 face(s)\n",
-" Face 1: pitch=-3.3°, yaw=-5.6°\n",
+" Face 1: pitch=-5.6°, yaw=-3.3°\n",
"Processing: image2.jpg\n",
" Detected 1 face(s)\n",
-" Face 1: pitch=-3.9°, yaw=-0.3°\n",
+" Face 1: pitch=-0.3°, yaw=-3.9°\n",
"Processing: image3.jpg\n",
" Detected 1 face(s)\n",
-" Face 1: pitch=-22.1°, yaw=1.0°\n",
+" Face 1: pitch=1.0°, yaw=-22.1°\n",
"Processing: image4.jpg\n",
" Detected 1 face(s)\n",
-" Face 1: pitch=2.1°, yaw=5.0°\n",
+" Face 1: pitch=5.0°, yaw=2.1°\n",
"\n",
"Processed 5 images\n"
]
503
examples/09_face_segmentation.ipynb
Normal file
366
examples/10_face_vector_store.ipynb
Normal file
259
examples/11_head_pose_estimation.ipynb
Normal file
@@ -48,6 +48,7 @@ theme:
    - content.action.edit
    - content.action.view
    - content.tabs.link
+    - announce.dismiss
    - toc.follow

  icon:
@@ -134,6 +135,7 @@ nav:
  - Quickstart: quickstart.md
  - Notebooks: notebooks.md
  - Model Zoo: models.md
+  - Datasets: datasets.md
  - Tutorials:
      - Image Pipeline: recipes/image-pipeline.md
      - Video & Webcam: recipes/video-webcam.md
@@ -144,12 +146,15 @@ nav:
  - API Reference:
      - Detection: modules/detection.md
      - Recognition: modules/recognition.md
+      - Tracking: modules/tracking.md
      - Landmarks: modules/landmarks.md
      - Attributes: modules/attributes.md
      - Parsing: modules/parsing.md
      - Gaze: modules/gaze.md
+      - Head Pose: modules/headpose.md
      - Anti-Spoofing: modules/spoofing.md
      - Privacy: modules/privacy.md
+      - Indexing: modules/indexing.md
  - Guides:
      - Overview: concepts/overview.md
      - Inputs & Outputs: concepts/inputs-outputs.md

@@ -1,7 +1,7 @@
[project]
name = "uniface"
-version = "2.2.1"
-description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Face Parsing, Gaze Estimation, Age, and Gender Detection"
+version = "3.2.0"
+description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Tracking, Landmark Analysis, Face Parsing, Gaze Estimation, Age, and Gender Detection"
readme = "README.md"
license = "MIT"
authors = [{ name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" }]
@@ -9,10 +9,11 @@ maintainers = [
    { name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" },
]

-requires-python = ">=3.10,<3.14"
+requires-python = ">=3.11,<3.15"
keywords = [
    "face-detection",
    "face-recognition",
+    "face-tracking",
    "facial-landmarks",
    "face-parsing",
    "face-segmentation",
@@ -28,23 +29,23 @@ keywords = [
]

classifiers = [
-    "Development Status :: 4 - Beta",
+    "Development Status :: 5 - Production/Stable",
    "Intended Audience :: Developers",
    "Intended Audience :: Science/Research",
    "Operating System :: OS Independent",
    "Programming Language :: Python :: 3",
-    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
+    "Programming Language :: Python :: 3.14",
]

dependencies = [
    "numpy>=1.21.0",
    "opencv-python>=4.5.0",
    "onnx>=1.12.0",
    "onnxruntime>=1.16.0",
-    "scikit-image>=0.19.0",
+    "scikit-image>=0.26.0",
    "scipy>=1.7.0",
    "requests>=2.28.0",
    "tqdm>=4.64.0",
]
@@ -56,9 +57,9 @@ gpu = ["onnxruntime-gpu>=1.16.0"]
[project.urls]
Homepage = "https://github.com/yakhyo/uniface"
Repository = "https://github.com/yakhyo/uniface"
-Documentation = "https://github.com/yakhyo/uniface/blob/main/README.md"
-"Quick Start" = "https://github.com/yakhyo/uniface/blob/main/QUICKSTART.md"
-"Model Zoo" = "https://github.com/yakhyo/uniface/blob/main/MODELS.md"
+Documentation = "https://yakhyo.github.io/uniface"
+"Quick Start" = "https://yakhyo.github.io/uniface/quickstart/"
+"Model Zoo" = "https://yakhyo.github.io/uniface/models/"

[build-system]
requires = ["setuptools>=64", "wheel"]
@@ -72,7 +73,7 @@ uniface = ["py.typed"]

[tool.ruff]
line-length = 120
-target-version = "py310"
+target-version = "py311"
exclude = [
    ".git",
    ".ruff_cache",

@@ -1,8 +1,7 @@
numpy>=1.21.0
opencv-python>=4.5.0
onnx>=1.12.0
onnxruntime>=1.16.0
-scikit-image>=0.19.0
+scikit-image>=0.26.0
scipy>=1.7.0
requests>=2.28.0
-pytest>=7.0.0
tqdm>=4.64.0

@@ -2,7 +2,6 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Tests for AgeGender attribute predictor."""

from __future__ import annotations

@@ -10,6 +9,14 @@ import numpy as np
import pytest

from uniface.attribute import AgeGender, AttributeResult
from uniface.types import Face


def _make_face(bbox: list[int] | np.ndarray) -> Face:
    """Helper: build a minimal Face from a bounding box."""
    bbox = np.asarray(bbox)
    landmarks = np.zeros((5, 2), dtype=np.float32)
    return Face(bbox=bbox, confidence=0.99, landmarks=landmarks)


@pytest.fixture
@@ -23,30 +30,30 @@ def mock_image():


@pytest.fixture
def mock_bbox():
    return [100, 100, 300, 300]
def mock_face():
    return _make_face([100, 100, 300, 300])


def test_model_initialization(age_gender_model):
    assert age_gender_model is not None, 'AgeGender model initialization failed.'


def test_prediction_output_format(age_gender_model, mock_image, mock_bbox):
    result = age_gender_model.predict(mock_image, mock_bbox)
def test_prediction_output_format(age_gender_model, mock_image, mock_face):
    result = age_gender_model.predict(mock_image, mock_face)
    assert isinstance(result, AttributeResult), f'Result should be AttributeResult, got {type(result)}'
    assert isinstance(result.gender, int), f'Gender should be int, got {type(result.gender)}'
    assert isinstance(result.age, int), f'Age should be int, got {type(result.age)}'
    assert isinstance(result.sex, str), f'Sex should be str, got {type(result.sex)}'


def test_gender_values(age_gender_model, mock_image, mock_bbox):
    result = age_gender_model.predict(mock_image, mock_bbox)
def test_gender_values(age_gender_model, mock_image, mock_face):
    result = age_gender_model.predict(mock_image, mock_face)
    assert result.gender in [0, 1], f'Gender should be 0 (Female) or 1 (Male), got {result.gender}'
    assert result.sex in ['Female', 'Male'], f'Sex should be Female or Male, got {result.sex}'


def test_age_range(age_gender_model, mock_image, mock_bbox):
    result = age_gender_model.predict(mock_image, mock_bbox)
def test_age_range(age_gender_model, mock_image, mock_face):
    result = age_gender_model.predict(mock_image, mock_face)
    assert 0 <= result.age <= 120, f'Age should be between 0 and 120, got {result.age}'


@@ -58,39 +65,52 @@ def test_different_bbox_sizes(age_gender_model, mock_image):
    ]

    for bbox in test_bboxes:
        result = age_gender_model.predict(mock_image, bbox)
        face = _make_face(bbox)
        result = age_gender_model.predict(mock_image, face)
        assert result.gender in [0, 1], f'Failed for bbox {bbox}'
        assert 0 <= result.age <= 120, f'Age out of range for bbox {bbox}'


def test_different_image_sizes(age_gender_model, mock_bbox):
def test_different_image_sizes(age_gender_model):
    test_sizes = [(480, 640, 3), (720, 1280, 3), (1080, 1920, 3)]
    face = _make_face([100, 100, 300, 300])

    for size in test_sizes:
        mock_image = np.random.randint(0, 255, size, dtype=np.uint8)
        result = age_gender_model.predict(mock_image, mock_bbox)
        result = age_gender_model.predict(mock_image, face)
        assert result.gender in [0, 1], f'Failed for image size {size}'
        assert 0 <= result.age <= 120, f'Age out of range for image size {size}'


def test_consistency(age_gender_model, mock_image, mock_bbox):
    result1 = age_gender_model.predict(mock_image, mock_bbox)
    result2 = age_gender_model.predict(mock_image, mock_bbox)
def test_consistency(age_gender_model, mock_image, mock_face):
    result1 = age_gender_model.predict(mock_image, mock_face)
    result2 = age_gender_model.predict(mock_image, mock_face)

    assert result1.gender == result2.gender, 'Same input should produce same gender prediction'
    assert result1.age == result2.age, 'Same input should produce same age prediction'


def test_face_enrichment(age_gender_model, mock_image, mock_face):
    """predict() must write gender & age back to the Face object."""
    assert mock_face.gender is None
    assert mock_face.age is None

    result = age_gender_model.predict(mock_image, mock_face)

    assert mock_face.gender == result.gender
    assert mock_face.age == result.age


def test_bbox_list_format(age_gender_model, mock_image):
    bbox_list = [100, 100, 300, 300]
    result = age_gender_model.predict(mock_image, bbox_list)
    face = _make_face([100, 100, 300, 300])
    result = age_gender_model.predict(mock_image, face)
    assert result.gender in [0, 1], 'Should work with bbox as list'
    assert 0 <= result.age <= 120, 'Age should be in valid range'


def test_bbox_array_format(age_gender_model, mock_image):
    bbox_array = np.array([100, 100, 300, 300])
    result = age_gender_model.predict(mock_image, bbox_array)
    face = _make_face(np.array([100, 100, 300, 300]))
    result = age_gender_model.predict(mock_image, face)
    assert result.gender in [0, 1], 'Should work with bbox as numpy array'
    assert 0 <= result.age <= 120, 'Age should be in valid range'

@@ -104,7 +124,8 @@ def test_multiple_predictions(age_gender_model, mock_image):

    results = []
    for bbox in bboxes:
        result = age_gender_model.predict(mock_image, bbox)
        face = _make_face(bbox)
        result = age_gender_model.predict(mock_image, face)
        results.append(result)

    assert len(results) == 3, 'Should have 3 predictions'
@@ -113,28 +134,26 @@ def test_multiple_predictions(age_gender_model, mock_image):
        assert 0 <= result.age <= 120


def test_age_is_positive(age_gender_model, mock_image, mock_bbox):
def test_age_is_positive(age_gender_model, mock_image, mock_face):
    for _ in range(5):
        result = age_gender_model.predict(mock_image, mock_bbox)
        result = age_gender_model.predict(mock_image, mock_face)
        assert result.age >= 0, f'Age should be non-negative, got {result.age}'


def test_output_format_for_visualization(age_gender_model, mock_image, mock_bbox):
    result = age_gender_model.predict(mock_image, mock_bbox)
def test_output_format_for_visualization(age_gender_model, mock_image, mock_face):
    result = age_gender_model.predict(mock_image, mock_face)
    text = f'{result.sex}, {result.age}y'
    assert isinstance(text, str), 'Should be able to format as string'
    assert 'Male' in text or 'Female' in text, 'Text should contain gender'
    assert 'y' in text, "Text should contain 'y' for years"


def test_attribute_result_fields(age_gender_model, mock_image, mock_bbox):
def test_attribute_result_fields(age_gender_model, mock_image, mock_face):
    """Test that AttributeResult has correct fields for AgeGender model."""
    result = age_gender_model.predict(mock_image, mock_bbox)
    result = age_gender_model.predict(mock_image, mock_face)

    # AgeGender should set gender and age
    assert result.gender is not None
    assert result.age is not None

    # AgeGender should NOT set race and age_group (FairFace only)
    assert result.race is None
    assert result.age_group is None

61 tests/test_draw.py Normal file
@@ -0,0 +1,61 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from __future__ import annotations

import numpy as np

from uniface.draw import draw_gaze


def _compute_gaze_delta(bbox: np.ndarray, pitch: float, yaw: float) -> tuple[int, int]:
    """Replicate draw_gaze dx/dy math for verification."""
    x_min, _, x_max, _ = map(int, bbox[:4])
    length = x_max - x_min
    dx = int(-length * np.sin(yaw) * np.cos(pitch))
    dy = int(-length * np.sin(pitch))
    return dx, dy


def test_draw_gaze_yaw_only_moves_horizontally():
    """Yaw-only input (pitch=0) should produce horizontal displacement only."""
    image = np.zeros((200, 200, 3), dtype=np.uint8)
    bbox = np.array([50, 50, 150, 150], dtype=np.float32)

    yaw = 0.5
    pitch = 0.0
    dx, dy = _compute_gaze_delta(bbox, pitch, yaw)

    assert dx != 0, 'Yaw-only should produce horizontal displacement'
    assert dy == 0, 'Yaw-only should produce zero vertical displacement'

    # Should not raise
    draw_gaze(image, bbox, pitch, yaw, draw_bbox=False, draw_angles=False)


def test_draw_gaze_pitch_only_moves_vertically():
    """Pitch-only input (yaw=0) should produce vertical displacement only."""
    image = np.zeros((200, 200, 3), dtype=np.uint8)
    bbox = np.array([50, 50, 150, 150], dtype=np.float32)

    yaw = 0.0
    pitch = 0.5
    dx, dy = _compute_gaze_delta(bbox, pitch, yaw)

    assert dx == 0, 'Pitch-only should produce zero horizontal displacement'
    assert dy != 0, 'Pitch-only should produce vertical displacement'

    # Should not raise
    draw_gaze(image, bbox, pitch, yaw, draw_bbox=False, draw_angles=False)


def test_draw_gaze_modifies_image():
    """draw_gaze should modify the image in place."""
    image = np.zeros((200, 200, 3), dtype=np.uint8)
    bbox = np.array([50, 50, 150, 150], dtype=np.float32)

    original = image.copy()
    draw_gaze(image, bbox, 0.3, 0.3)

    assert not np.array_equal(image, original), 'draw_gaze should modify the image'
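For intuition, here is a quick numeric check of the dx/dy math that `_compute_gaze_delta` replicates (a sketch only; note that `int()` truncates toward zero):

```python
import numpy as np

# bbox from the tests above: x_min=50, x_max=150 -> length = 100 px
length = 150 - 50

# yaw = 0.5 rad, pitch = 0 rad
dx = int(-length * np.sin(0.5) * np.cos(0.0))  # int(-47.94...) == -47
dy = int(-length * np.sin(0.0))                # 0

assert (dx, dy) == (-47, 0)
```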
@@ -2,7 +2,6 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Tests for factory functions (create_detector, create_recognizer, etc.)."""

from __future__ import annotations

@@ -10,13 +9,15 @@ import numpy as np
import pytest

from uniface import (
    create_attribute_predictor,
    create_detector,
    create_landmarker,
    create_recognizer,
    detect_faces,
    list_available_detectors,
)
from uniface.constants import RetinaFaceWeights, SCRFDWeights
from uniface.attribute import AgeGender, FairFace
from uniface.constants import AgeGenderWeights, FairFaceWeights, RetinaFaceWeights, SCRFDWeights
from uniface.spoofing import MiniFASNet, create_spoofer


# create_detector tests
@@ -123,62 +124,6 @@ def test_create_landmarker_invalid_method():
    create_landmarker('invalid_method')


# detect_faces tests
def test_detect_faces_retinaface():
    """
    Test high-level detect_faces function with RetinaFace.
    """
    mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
    faces = detect_faces(mock_image, method='retinaface')

    assert isinstance(faces, list), 'detect_faces should return a list'


def test_detect_faces_scrfd():
    """
    Test high-level detect_faces function with SCRFD.
    """
    mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
    faces = detect_faces(mock_image, method='scrfd')

    assert isinstance(faces, list), 'detect_faces should return a list'


def test_detect_faces_with_threshold():
    """
    Test detect_faces with custom confidence threshold.
    """
    mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
    faces = detect_faces(mock_image, method='retinaface', confidence_threshold=0.8)

    assert isinstance(faces, list), 'detect_faces should return a list'

    # All detections should respect threshold
    for face in faces:
        assert face.confidence >= 0.8, 'All detections should meet confidence threshold'


def test_detect_faces_default_method():
    """
    Test detect_faces with default method (should use retinaface).
    """
    mock_image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
    faces = detect_faces(mock_image)  # No method specified

    assert isinstance(faces, list), 'detect_faces should return a list with default method'


def test_detect_faces_empty_image():
    """
    Test detect_faces on a blank image.
    """
    empty_image = np.zeros((640, 640, 3), dtype=np.uint8)
    faces = detect_faces(empty_image, method='retinaface')

    assert isinstance(faces, list), 'Should return a list even for empty image'
    assert len(faces) == 0, 'Should detect no faces in blank image'


# list_available_detectors tests
def test_list_available_detectors():
    """
@@ -222,7 +167,7 @@ def test_recognizer_inference_from_factory():

    embedding = recognizer.get_embedding(mock_image)
    assert embedding is not None, 'Recognizer should return embedding'
    assert embedding.shape[1] == 512, 'Should return 512-dimensional embedding'
    assert embedding.shape == (1, 512), 'get_embedding should return (1, 512) with batch dimension'


def test_landmarker_inference_from_factory():
@@ -280,3 +225,32 @@ def test_factory_returns_correct_types():
    assert isinstance(detector, RetinaFace), 'Should return RetinaFace instance'
    assert isinstance(recognizer, ArcFace), 'Should return ArcFace instance'
    assert isinstance(landmarker, Landmark106), 'Should return Landmark106 instance'


# create_spoofer tests
def test_create_spoofer_default():
    """Test creating a spoofer with default parameters."""
    spoofer = create_spoofer()
    assert isinstance(spoofer, MiniFASNet), 'Should return MiniFASNet instance'


def test_create_spoofer_with_providers():
    """Test that create_spoofer forwards providers kwarg without TypeError."""
    spoofer = create_spoofer(providers=['CPUExecutionProvider'])
    assert isinstance(spoofer, MiniFASNet), 'Should return MiniFASNet instance'


# create_attribute_predictor tests
def test_create_attribute_predictor_age_gender():
    predictor = create_attribute_predictor(AgeGenderWeights.DEFAULT)
    assert isinstance(predictor, AgeGender), 'Should return AgeGender instance'


def test_create_attribute_predictor_fairface():
    predictor = create_attribute_predictor(FairFaceWeights.DEFAULT)
    assert isinstance(predictor, FairFace), 'Should return FairFace instance'


def test_create_attribute_predictor_invalid():
    with pytest.raises(ValueError, match='Unsupported attribute model'):
        create_attribute_predictor('invalid_model')
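A minimal sketch tying the factory functions tested above together; the random image stands in for a real photo (so no faces will actually be found), but the call pattern is the one the tests exercise:

```python
import numpy as np

from uniface import create_attribute_predictor, detect_faces
from uniface.constants import AgeGenderWeights

image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)  # stand-in for a real photo
faces = detect_faces(image, method='retinaface', confidence_threshold=0.8)

predictor = create_attribute_predictor(AgeGenderWeights.DEFAULT)
for face in faces:
    result = predictor.predict(image, face)  # also writes gender/age back onto the Face
    print(result.sex, result.age)
```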
115 tests/test_headpose.py Normal file
@@ -0,0 +1,115 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo


from __future__ import annotations

import numpy as np
import pytest

from uniface import HeadPose, HeadPoseResult, create_head_pose_estimator
from uniface.headpose import BaseHeadPoseEstimator
from uniface.headpose.models import HeadPose as HeadPoseModel


def test_create_head_pose_estimator_default():
    """Test creating a head pose estimator with default parameters."""
    estimator = create_head_pose_estimator()
    assert isinstance(estimator, HeadPose), 'Should return HeadPose instance'


def test_create_head_pose_estimator_aliases():
    """Test that factory accepts all documented aliases."""
    for alias in ('headpose', 'head_pose', '6drepnet'):
        estimator = create_head_pose_estimator(alias)
        assert isinstance(estimator, HeadPose), f"Alias '{alias}' should return HeadPose"


def test_create_head_pose_estimator_invalid():
    """Test that invalid method raises ValueError."""
    with pytest.raises(ValueError, match='Unsupported head pose estimation method'):
        create_head_pose_estimator('invalid_method')


def test_head_pose_inference():
    """Test that HeadPose can run inference on a mock image."""
    estimator = HeadPose()
    mock_image = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
    result = estimator.estimate(mock_image)

    assert isinstance(result, HeadPoseResult), 'Should return HeadPoseResult'
    assert isinstance(result.pitch, float), 'pitch should be float'
    assert isinstance(result.yaw, float), 'yaw should be float'
    assert isinstance(result.roll, float), 'roll should be float'


def test_head_pose_callable():
    """Test that HeadPose is callable via __call__."""
    estimator = HeadPose()
    mock_image = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
    result = estimator(mock_image)

    assert isinstance(result, HeadPoseResult), '__call__ should return HeadPoseResult'


def test_head_pose_result_repr():
    """Test HeadPoseResult repr formatting."""
    result = HeadPoseResult(pitch=10.5, yaw=-20.3, roll=5.1)
    repr_str = repr(result)
    assert 'HeadPoseResult' in repr_str
    assert '10.5' in repr_str
    assert '-20.3' in repr_str
    assert '5.1' in repr_str


def test_head_pose_result_frozen():
    """Test that HeadPoseResult is immutable."""
    result = HeadPoseResult(pitch=1.0, yaw=2.0, roll=3.0)
    with pytest.raises(AttributeError):
        result.pitch = 99.0  # type: ignore[misc]


def test_rotation_matrix_to_euler_identity():
    """Test that identity rotation matrix gives zero angles."""
    identity = np.eye(3).reshape(1, 3, 3)
    euler = HeadPoseModel.rotation_matrix_to_euler(identity)

    assert euler.shape == (1, 3), 'Should return (1, 3) shaped array'
    np.testing.assert_allclose(euler[0], [0.0, 0.0, 0.0], atol=1e-5)


def test_rotation_matrix_to_euler_90deg_yaw():
    """Test 90-degree yaw rotation."""
    angle = np.radians(90)
    R = np.array(
        [
            [np.cos(angle), 0, np.sin(angle)],
            [0, 1, 0],
            [-np.sin(angle), 0, np.cos(angle)],
        ]
    ).reshape(1, 3, 3)
    euler = HeadPoseModel.rotation_matrix_to_euler(R)

    np.testing.assert_allclose(euler[0, 1], 90.0, atol=1e-3)


def test_rotation_matrix_to_euler_batch():
    """Test batch processing of rotation matrices."""
    batch = np.stack([np.eye(3), np.eye(3), np.eye(3)], axis=0)
    euler = HeadPoseModel.rotation_matrix_to_euler(batch)

    assert euler.shape == (3, 3), 'Batch of 3 should return (3, 3)'
    np.testing.assert_allclose(euler, 0.0, atol=1e-5)


def test_factory_returns_correct_type():
    """Test that factory function returns BaseHeadPoseEstimator subclass."""
    estimator = create_head_pose_estimator()
    assert isinstance(estimator, BaseHeadPoseEstimator), 'Should be BaseHeadPoseEstimator subclass'


def test_head_pose_with_providers():
    """Test that HeadPose accepts providers kwarg."""
    estimator = HeadPose(providers=['CPUExecutionProvider'])
    assert isinstance(estimator, HeadPose), 'Should create with explicit providers'
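A minimal end-to-end sketch of the head-pose API exercised above (the random array stands in for a cropped BGR face image):

```python
import numpy as np

from uniface import create_head_pose_estimator

estimator = create_head_pose_estimator('6drepnet')  # 'headpose' and 'head_pose' are equivalent aliases
face_crop = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)  # stand-in for a real face crop

result = estimator(face_crop)  # HeadPose is callable; same as estimator.estimate(face_crop)
print(f'pitch={result.pitch:.1f}, yaw={result.yaw:.1f}, roll={result.roll:.1f}')
```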
@@ -2,7 +2,6 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Tests for 106-point facial landmark detector."""

from __future__ import annotations

@@ -2,15 +2,14 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Tests for BiSeNet face parsing model."""

from __future__ import annotations

import numpy as np
import pytest

from uniface.constants import ParsingWeights
from uniface.parsing import BiSeNet, create_face_parser
from uniface.constants import ParsingWeights, XSegWeights
from uniface.parsing import BiSeNet, XSeg, create_face_parser


def test_bisenet_initialization():
@@ -120,3 +119,151 @@ def test_bisenet_different_input_sizes():

        assert mask.shape == (h, w), f'Failed for size {h}x{w}'
        assert mask.dtype == np.uint8


# XSeg Tests


def test_xseg_initialization():
    """Test XSeg initialization."""
    parser = XSeg()
    assert parser is not None
    assert parser.input_size == (256, 256)
    assert parser.align_size == 256
    assert parser.blur_sigma == 0


def test_xseg_with_custom_params():
    """Test XSeg with custom parameters."""
    parser = XSeg(align_size=512, blur_sigma=5)
    assert parser.align_size == 512
    assert parser.blur_sigma == 5


def test_xseg_preprocess():
    """Test XSeg preprocessing."""
    parser = XSeg()

    # Create a dummy aligned face crop
    face_crop = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)

    # Preprocess
    preprocessed = parser.preprocess(face_crop)

    assert preprocessed.shape == (1, 256, 256, 3)  # NHWC format
    assert preprocessed.dtype == np.float32
    assert preprocessed.min() >= 0
    assert preprocessed.max() <= 1


def test_xseg_postprocess():
    """Test XSeg postprocessing."""
    parser = XSeg()

    # Create dummy model output (NHWC format)
    dummy_output = np.random.rand(1, 256, 256, 1).astype(np.float32)

    # Postprocess
    mask = parser.postprocess(dummy_output, crop_size=(256, 256))

    assert mask.shape == (256, 256)
    assert mask.dtype == np.float32
    assert mask.min() >= 0
    assert mask.max() <= 1


def test_xseg_parse_aligned():
    """Test XSeg parse_aligned method."""
    parser = XSeg()

    # Create a dummy aligned face crop
    face_crop = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)

    # Parse
    mask = parser.parse_aligned(face_crop)

    assert mask.shape == (256, 256)
    assert mask.dtype == np.float32
    assert mask.min() >= 0
    assert mask.max() <= 1


def test_xseg_parse_with_landmarks():
    """Test XSeg parse method with landmarks."""
    parser = XSeg()

    # Create a dummy image
    image = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)

    # Create dummy 5-point landmarks
    landmarks = np.array(
        [
            [250, 200],  # left eye
            [390, 200],  # right eye
            [320, 280],  # nose
            [260, 350],  # left mouth
            [380, 350],  # right mouth
        ],
        dtype=np.float32,
    )

    # Parse
    mask = parser.parse(image, landmarks=landmarks)

    assert mask.shape == (480, 640)
    assert mask.dtype == np.float32
    assert mask.min() >= 0
    assert mask.max() <= 1


def test_xseg_parse_invalid_landmarks():
    """Test XSeg parse with invalid landmarks shape."""
    parser = XSeg()
    image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)

    # Wrong shape
    invalid_landmarks = np.array([[0, 0], [1, 1], [2, 2]])

    with pytest.raises(ValueError, match='Landmarks must have shape'):
        parser.parse(image, landmarks=invalid_landmarks)


def test_xseg_parse_with_inverse():
    """Test XSeg parse_with_inverse method."""
    parser = XSeg()

    # Create a dummy image
    image = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)

    # Create dummy 5-point landmarks
    landmarks = np.array(
        [
            [250, 200],
            [390, 200],
            [320, 280],
            [260, 350],
            [380, 350],
        ],
        dtype=np.float32,
    )

    # Parse with inverse
    mask, face_crop, inverse_matrix = parser.parse_with_inverse(image, landmarks)

    assert mask.shape == (256, 256)
    assert face_crop.shape == (256, 256, 3)
    assert inverse_matrix.shape == (2, 3)


def test_create_face_parser_xseg_enum():
    """Test factory function with XSeg enum."""
    parser = create_face_parser(XSegWeights.DEFAULT)
    assert parser is not None
    assert isinstance(parser, XSeg)


def test_create_face_parser_xseg_string():
    """Test factory function with XSeg string."""
    parser = create_face_parser('xseg')
    assert parser is not None
    assert isinstance(parser, XSeg)
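A hedged sketch of the XSeg flow covered by these tests (the image and landmark values are made up; `parse` returns a float32 mask in [0, 1] at the input image's resolution, and the 0.5 threshold here is an arbitrary choice):

```python
import numpy as np

from uniface.parsing import XSeg

parser = XSeg()
image = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)  # stand-in for a real frame
landmarks = np.array(
    [[250, 200], [390, 200], [320, 280], [260, 350], [380, 350]],  # eyes, nose, mouth corners
    dtype=np.float32,
)

mask = parser.parse(image, landmarks=landmarks)  # float32, shape (480, 640), values in [0, 1]
binary = (mask > 0.5).astype(np.uint8)           # hard mask, if one is needed
```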
@@ -2,7 +2,6 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Tests for face recognition models (ArcFace, MobileFace, SphereFace)."""

from __future__ import annotations

@@ -75,7 +74,7 @@ def test_arcface_embedding_shape(arcface_model, mock_aligned_face):
    """
    embedding = arcface_model.get_embedding(mock_aligned_face)

    # ArcFace typically produces 512-dimensional embeddings
    # ArcFace get_embedding returns raw ONNX output with batch dimension
    assert embedding.shape[1] == 512, f'Expected 512-dim embedding, got {embedding.shape[1]}'
    assert embedding.shape[0] == 1, 'Embedding should have batch dimension of 1'

@@ -89,7 +88,8 @@ def test_arcface_normalized_embedding(arcface_model, mock_landmarks):

    embedding = arcface_model.get_normalized_embedding(mock_image, mock_landmarks)

    # Check that embedding is normalized (L2 norm ≈ 1.0)
    # Check shape and normalization
    assert embedding.shape == (512,), f'Expected shape (512,), got {embedding.shape}'
    norm = np.linalg.norm(embedding)
    assert np.isclose(norm, 1.0, atol=1e-5), f'Normalized embedding should have norm 1.0, got {norm}'

@@ -126,7 +126,7 @@ def test_mobileface_embedding_shape(mobileface_model, mock_aligned_face):
    """
    embedding = mobileface_model.get_embedding(mock_aligned_face)

    # MobileFace typically produces 512-dimensional embeddings
    # MobileFace get_embedding returns raw ONNX output with batch dimension
    assert embedding.shape[1] == 512, f'Expected 512-dim embedding, got {embedding.shape[1]}'
    assert embedding.shape[0] == 1, 'Embedding should have batch dimension of 1'

@@ -139,6 +139,7 @@ def test_mobileface_normalized_embedding(mobileface_model, mock_landmarks):

    embedding = mobileface_model.get_normalized_embedding(mock_image, mock_landmarks)

    assert embedding.shape == (512,), f'Expected shape (512,), got {embedding.shape}'
    norm = np.linalg.norm(embedding)
    assert np.isclose(norm, 1.0, atol=1e-5), f'Normalized embedding should have norm 1.0, got {norm}'

@@ -157,7 +158,7 @@ def test_sphereface_embedding_shape(sphereface_model, mock_aligned_face):
    """
    embedding = sphereface_model.get_embedding(mock_aligned_face)

    # SphereFace typically produces 512-dimensional embeddings
    # SphereFace get_embedding returns raw ONNX output with batch dimension
    assert embedding.shape[1] == 512, f'Expected 512-dim embedding, got {embedding.shape[1]}'
    assert embedding.shape[0] == 1, 'Embedding should have batch dimension of 1'

@@ -170,6 +171,7 @@ def test_sphereface_normalized_embedding(sphereface_model, mock_landmarks):

    embedding = sphereface_model.get_normalized_embedding(mock_image, mock_landmarks)

    assert embedding.shape == (512,), f'Expected shape (512,), got {embedding.shape}'
    norm = np.linalg.norm(embedding)
    assert np.isclose(norm, 1.0, atol=1e-5), f'Normalized embedding should have norm 1.0, got {norm}'
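Because `get_normalized_embedding` returns unit-length vectors, cosine similarity between two faces reduces to a dot product; a sketch with stand-in images and landmarks:

```python
import numpy as np

from uniface.recognition import ArcFace

recognizer = ArcFace()

image1 = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)  # stand-ins for real photos
image2 = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
landmarks = np.array(
    [[250, 200], [390, 200], [320, 280], [260, 350], [380, 350]], dtype=np.float32
)

emb1 = recognizer.get_normalized_embedding(image1, landmarks)  # shape (512,), L2 norm 1.0
emb2 = recognizer.get_normalized_embedding(image2, landmarks)

similarity = float(np.dot(emb1, emb2))  # cosine similarity, since both are unit length
```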
@@ -2,7 +2,6 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Tests for RetinaFace detector."""

from __future__ import annotations

@@ -2,7 +2,6 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Tests for SCRFD detector."""

from __future__ import annotations

@@ -2,7 +2,6 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Tests for UniFace type definitions (dataclasses)."""

from __future__ import annotations

@@ -2,7 +2,6 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Tests for utility functions (compute_similarity, face_alignment, etc.)."""

from __future__ import annotations

@@ -6,26 +6,29 @@ CLI utilities for testing and running UniFace features.
| Tool | Description |
|------|-------------|
| `detection.py` | Face detection on image, video, or webcam |
| `face_anonymize.py` | Face anonymization/blurring for privacy |
| `age_gender.py` | Age and gender prediction |
| `face_emotion.py` | Emotion detection (7 or 8 emotions) |
| `gaze_estimation.py` | Gaze direction estimation |
| `detect.py` | Face detection on image, video, or webcam |
| `track.py` | Face tracking on video with ByteTrack |
| `analyze.py` | Complete face analysis (detection + recognition + attributes) |
| `anonymize.py` | Face anonymization/blurring for privacy |
| `emotion.py` | Emotion detection (7 or 8 emotions) |
| `gaze.py` | Gaze direction estimation |
| `headpose.py` | Head pose estimation (pitch, yaw, roll) |
| `landmarks.py` | 106-point facial landmark detection |
| `recognition.py` | Face embedding extraction and comparison |
| `face_analyzer.py` | Complete face analysis (detection + recognition + attributes) |
| `face_search.py` | Real-time face matching against reference |
| `recognize.py` | Face embedding extraction and comparison |
| `search.py` | Real-time face matching against reference |
| `faiss_search.py` | FAISS index build and multi-identity face search |
| `fairface.py` | FairFace attribute prediction (race, gender, age) |
| `attribute.py` | Age and gender prediction |
| `spoofing.py` | Face anti-spoofing detection |
| `face_parsing.py` | Face semantic segmentation |
| `video_detection.py` | Face detection on video files with progress bar |
| `parse.py` | Face semantic segmentation (BiSeNet) |
| `xseg.py` | Face segmentation (XSeg) |
| `batch_process.py` | Batch process folder of images |
| `download_model.py` | Download model weights |
| `sha256_generate.py` | Generate SHA256 hash for model files |

## Unified `--source` Pattern

All tools use a unified `--source` argument that accepts:
Most tools use a unified `--source` argument that accepts:
- **Image path**: `--source photo.jpg`
- **Video path**: `--source video.mp4`
- **Camera ID**: `--source 0` (default webcam), `--source 1` (external camera)
@@ -34,26 +37,36 @@ All tools use a unified `--source` argument that accepts:

```bash
# Face detection
python tools/detection.py --source assets/test.jpg   # image
python tools/detection.py --source video.mp4         # video
python tools/detection.py --source 0                 # webcam
python tools/detect.py --source assets/test.jpg   # image
python tools/detect.py --source video.mp4         # video
python tools/detect.py --source 0                 # webcam

# Face tracking
python tools/track.py --source video.mp4
python tools/track.py --source video.mp4 --output tracked.mp4
python tools/track.py --source 0   # webcam

# Face anonymization
python tools/face_anonymize.py --source assets/test.jpg --method pixelate
python tools/face_anonymize.py --source video.mp4 --method gaussian
python tools/face_anonymize.py --source 0 --method pixelate
python tools/anonymize.py --source assets/test.jpg --method pixelate
python tools/anonymize.py --source video.mp4 --method gaussian
python tools/anonymize.py --source 0 --method pixelate

# Age and gender
python tools/age_gender.py --source assets/test.jpg
python tools/age_gender.py --source 0
python tools/attribute.py --source assets/test.jpg
python tools/attribute.py --source 0

# Emotion detection
python tools/face_emotion.py --source assets/test.jpg
python tools/face_emotion.py --source 0
python tools/emotion.py --source assets/test.jpg
python tools/emotion.py --source 0

# Gaze estimation
python tools/gaze_estimation.py --source assets/test.jpg
python tools/gaze_estimation.py --source 0
python tools/gaze.py --source assets/test.jpg
python tools/gaze.py --source 0

# Head pose estimation
python tools/headpose.py --source assets/test.jpg
python tools/headpose.py --source 0
python tools/headpose.py --source 0 --draw-type axis

# Landmarks
python tools/landmarks.py --source assets/test.jpg
@@ -63,31 +76,31 @@ python tools/landmarks.py --source 0
python tools/fairface.py --source assets/test.jpg
python tools/fairface.py --source 0

# Face parsing
python tools/face_parsing.py --source assets/test.jpg
python tools/face_parsing.py --source 0
# Face parsing (BiSeNet)
python tools/parse.py --source assets/test.jpg
python tools/parse.py --source 0

# Face segmentation (XSeg)
python tools/xseg.py --source assets/test.jpg
python tools/xseg.py --source 0

# Face anti-spoofing
python tools/spoofing.py --source assets/test.jpg
python tools/spoofing.py --source 0

# Face analyzer
python tools/face_analyzer.py --source assets/test.jpg
python tools/face_analyzer.py --source 0
python tools/analyze.py --source assets/test.jpg
python tools/analyze.py --source 0

# Face recognition (extract embedding)
python tools/recognition.py --image assets/test.jpg
python tools/recognize.py --image assets/test.jpg

# Face comparison
python tools/recognition.py --image1 face1.jpg --image2 face2.jpg
python tools/recognize.py --image1 face1.jpg --image2 face2.jpg

# Face search (match against reference)
python tools/face_search.py --reference person.jpg --source 0
python tools/face_search.py --reference person.jpg --source video.mp4

# Video processing with progress bar
python tools/video_detection.py --source video.mp4
python tools/video_detection.py --source video.mp4 --output output.mp4
python tools/search.py --reference person.jpg --source 0
python tools/search.py --reference person.jpg --source video.mp4

# Batch processing
python tools/batch_process.py --input images/ --output results/
@@ -102,7 +115,7 @@ python tools/download_model.py   # downloads all

| Option | Description |
|--------|-------------|
| `--source` | Input source: image/video path or camera ID (0, 1, ...) |
| `--detector` | Choose detector: `retinaface`, `scrfd`, `yolov5face` |
| `--detector` | Choose detector: `retinaface`, `scrfd`, `yolov5face`, `yolov8face` |
| `--threshold` | Visualization confidence threshold (default: varies) |
| `--save-dir` | Output directory (default: `outputs`) |

@@ -117,5 +130,5 @@ python tools/download_model.py   # downloads all
## Quick Test

```bash
python tools/detection.py --source assets/test.jpg
python tools/detect.py --source assets/test.jpg
```

29 tools/_common.py Normal file
@@ -0,0 +1,29 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from __future__ import annotations

from pathlib import Path

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv', '.webm', '.flv'}


def get_source_type(source: str) -> str:
    """Determine if source is image, video, or camera.

    Args:
        source: File path or camera ID string (e.g. ``"0"``).

    Returns:
        One of ``"image"``, ``"video"``, ``"camera"``, or ``"unknown"``.
    """
    if source.isdigit():
        return 'camera'
    suffix = Path(source).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return 'image'
    if suffix in VIDEO_EXTENSIONS:
        return 'video'
    return 'unknown'
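The helper above is what the tools now import instead of each redefining their own copy; a quick check of its behavior, run from the `tools/` directory:

```python
from _common import get_source_type

assert get_source_type('assets/test.jpg') == 'image'
assert get_source_type('video.mp4') == 'video'
assert get_source_type('0') == 'camera'
assert get_source_type('notes.txt') == 'unknown'  # unrecognized extension
```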
@@ -5,9 +5,9 @@
"""Face analysis using FaceAnalyzer.

Usage:
    python tools/face_analyzer.py --source path/to/image.jpg
    python tools/face_analyzer.py --source path/to/video.mp4
    python tools/face_analyzer.py --source 0  # webcam
    python tools/analyze.py --source path/to/image.jpg
    python tools/analyze.py --source path/to/video.mp4
    python tools/analyze.py --source 0  # webcam
"""

from __future__ import annotations

@@ -16,28 +16,15 @@ import argparse
import os
from pathlib import Path

from _common import get_source_type
import cv2
import numpy as np

from uniface import AgeGender, ArcFace, FaceAnalyzer, RetinaFace
from uniface.visualization import draw_detections

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv', '.webm', '.flv'}


def get_source_type(source: str) -> str:
    """Determine if source is image, video, or camera."""
    if source.isdigit():
        return 'camera'
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return 'image'
    elif suffix in VIDEO_EXTENSIONS:
        return 'video'
    else:
        return 'unknown'
from uniface.analyzer import FaceAnalyzer
from uniface.attribute import AgeGender
from uniface.detection import RetinaFace
from uniface.draw import draw_detections
from uniface.recognition import ArcFace


def draw_face_info(image, face, face_id):
@@ -111,7 +98,7 @@ def process_image(analyzer, image_path: str, save_dir: str = 'outputs', show_sim
    bboxes = [f.bbox for f in faces]
    scores = [f.confidence for f in faces]
    landmarks = [f.landmarks for f in faces]
    draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, fancy_bbox=True)
    draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, corner_bbox=True)

    for i, face in enumerate(faces, 1):
        draw_face_info(image, face, i)
@@ -153,7 +140,7 @@ def process_video(analyzer, video_path: str, save_dir: str = 'outputs'):
    bboxes = [f.bbox for f in faces]
    scores = [f.confidence for f in faces]
    landmarks = [f.landmarks for f in faces]
    draw_detections(image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, fancy_bbox=True)
    draw_detections(image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, corner_bbox=True)

    for i, face in enumerate(faces, 1):
        draw_face_info(frame, face, i)
@@ -180,16 +167,16 @@ def run_camera(analyzer, camera_id: int = 0):

    while True:
        ret, frame = cap.read()
        frame = cv2.flip(frame, 1)
        if not ret:
            break
        frame = cv2.flip(frame, 1)

        faces = analyzer.analyze(frame)

        bboxes = [f.bbox for f in faces]
        scores = [f.confidence for f in faces]
        landmarks = [f.landmarks for f in faces]
        draw_detections(image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, fancy_bbox=True)
        draw_detections(image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, corner_bbox=True)

        for i, face in enumerate(faces, 1):
            draw_face_info(frame, face, i)
@@ -214,7 +201,7 @@ def main():
    detector = RetinaFace()
    recognizer = ArcFace()
    age_gender = AgeGender()
    analyzer = FaceAnalyzer(detector, recognizer, age_gender)
    analyzer = FaceAnalyzer(detector, recognizer=recognizer, attributes=[age_gender])

    source_type = get_source_type(args.source)
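The `FaceAnalyzer` wiring in `main()` changed to keyword arguments; a minimal sketch of the same composition (the random frame stands in for a real capture):

```python
import numpy as np

from uniface.analyzer import FaceAnalyzer
from uniface.attribute import AgeGender
from uniface.detection import RetinaFace
from uniface.recognition import ArcFace

analyzer = FaceAnalyzer(RetinaFace(), recognizer=ArcFace(), attributes=[AgeGender()])

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)  # stand-in for a camera frame
faces = analyzer.analyze(frame)  # each Face carries bbox, confidence, and landmarks
```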
@@ -5,9 +5,9 @@
"""Face anonymization/blurring for privacy.

Usage:
    python tools/face_anonymize.py --source path/to/image.jpg --method pixelate
    python tools/face_anonymize.py --source path/to/video.mp4 --method gaussian
    python tools/face_anonymize.py --source 0 --method pixelate  # webcam
    python tools/anonymize.py --source path/to/image.jpg --method pixelate
    python tools/anonymize.py --source path/to/video.mp4 --method gaussian
    python tools/anonymize.py --source 0 --method pixelate  # webcam
"""

from __future__ import annotations

@@ -16,28 +16,12 @@ import argparse
import os
from pathlib import Path

from _common import get_source_type
import cv2

from uniface import RetinaFace
from uniface.detection import RetinaFace
from uniface.privacy import BlurFace

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv', '.webm', '.flv'}


def get_source_type(source: str) -> str:
    """Determine if source is image, video, or camera."""
    if source.isdigit():
        return 'camera'
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return 'image'
    elif suffix in VIDEO_EXTENSIONS:
        return 'video'
    else:
        return 'unknown'


def process_image(
    detector,
@@ -56,7 +40,7 @@ def process_image(
    print(f'Detected {len(faces)} face(s)')

    if show_detections and faces:
        from uniface.visualization import draw_detections
        from uniface.draw import draw_detections

        preview = image.copy()
        bboxes = [face.bbox for face in faces]
@@ -137,9 +121,9 @@ def run_camera(detector, blurrer: BlurFace, camera_id: int = 0):

    while True:
        ret, frame = cap.read()
        frame = cv2.flip(frame, 1)
        if not ret:
            break
        frame = cv2.flip(frame, 1)

        faces = detector.detect(frame)
        if faces:
@@ -171,19 +155,19 @@ def main():
        epilog="""
Examples:
  # Anonymize image with pixelation (default)
  python run_anonymization.py --source photo.jpg
  python tools/anonymize.py --source photo.jpg

  # Use Gaussian blur with custom strength
  python run_anonymization.py --source photo.jpg --method gaussian --blur-strength 5.0
  python tools/anonymize.py --source photo.jpg --method gaussian --blur-strength 5.0

  # Real-time webcam anonymization
  python run_anonymization.py --source 0 --method pixelate
  python tools/anonymize.py --source 0 --method pixelate

  # Black boxes for maximum privacy
  python run_anonymization.py --source photo.jpg --method blackout
  python tools/anonymize.py --source photo.jpg --method blackout

  # Custom pixelation intensity
  python run_anonymization.py --source photo.jpg --method pixelate --pixel-blocks 5
  python tools/anonymize.py --source photo.jpg --method pixelate --pixel-blocks 5
""",
    )
@@ -5,9 +5,9 @@
"""Age and gender prediction on detected faces.

Usage:
    python tools/age_gender.py --source path/to/image.jpg
    python tools/age_gender.py --source path/to/video.mp4
    python tools/age_gender.py --source 0  # webcam
    python tools/attribute.py --source path/to/image.jpg
    python tools/attribute.py --source path/to/video.mp4
    python tools/attribute.py --source 0  # webcam
"""

from __future__ import annotations

@@ -16,27 +16,12 @@ import argparse
import os
from pathlib import Path

from _common import get_source_type
import cv2

from uniface import SCRFD, AgeGender, RetinaFace
from uniface.visualization import draw_detections

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv', '.webm', '.flv'}


def get_source_type(source: str) -> str:
    """Determine if source is image, video, or camera."""
    if source.isdigit():
        return 'camera'
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return 'image'
    elif suffix in VIDEO_EXTENSIONS:
        return 'video'
    else:
        return 'unknown'
from uniface.attribute import AgeGender
from uniface.detection import SCRFD, RetinaFace
from uniface.draw import draw_detections


def draw_age_gender_label(image, bbox, sex: str, age: int):
@@ -71,11 +56,11 @@ def process_image(
    scores = [f.confidence for f in faces]
    landmarks = [f.landmarks for f in faces]
    draw_detections(
        image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
        image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, corner_bbox=True
    )

    for i, face in enumerate(faces):
        result = age_gender.predict(image, face.bbox)
        result = age_gender.predict(image, face)
        print(f'  Face {i + 1}: {result.sex}, {result.age} years old')
        draw_age_gender_label(image, face.bbox, result.sex, result.age)

@@ -123,11 +108,11 @@ def process_video(
    scores = [f.confidence for f in faces]
    landmarks = [f.landmarks for f in faces]
    draw_detections(
        image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
        image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, corner_bbox=True
    )

    for face in faces:
        result = age_gender.predict(frame, face.bbox)
        result = age_gender.predict(frame, face)
        draw_age_gender_label(frame, face.bbox, result.sex, result.age)

    cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
@@ -152,9 +137,9 @@ def run_camera(detector, age_gender, camera_id: int = 0, threshold: float = 0.6)

    while True:
        ret, frame = cap.read()
        frame = cv2.flip(frame, 1)
        if not ret:
            break
        frame = cv2.flip(frame, 1)

        faces = detector.detect(frame)

@@ -162,11 +147,11 @@ def run_camera(detector, age_gender, camera_id: int = 0, threshold: float = 0.6)
        scores = [f.confidence for f in faces]
        landmarks = [f.landmarks for f in faces]
        draw_detections(
            image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
            image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, corner_bbox=True
        )

        for face in faces:
            result = age_gender.predict(frame, face.bbox)
            result = age_gender.predict(frame, face)
            draw_age_gender_label(frame, face.bbox, result.sex, result.age)

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
@@ -14,8 +14,8 @@ from pathlib import Path
import cv2
from tqdm import tqdm

from uniface import SCRFD, RetinaFace
from uniface.visualization import draw_detections
from uniface.detection import SCRFD, RetinaFace
from uniface.draw import draw_detections


def get_image_files(input_dir: Path, extensions: tuple) -> list:
@@ -39,7 +39,7 @@ def process_image(detector, image_path: Path, output_path: Path, threshold: floa
    scores = [f.confidence for f in faces]
    landmarks = [f.landmarks for f in faces]
    draw_detections(
        image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
        image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, corner_bbox=True
    )

    cv2.putText(
@@ -5,9 +5,9 @@
"""Face detection on image, video, or webcam.

Usage:
    python tools/detection.py --source path/to/image.jpg
    python tools/detection.py --source path/to/video.mp4
    python tools/detection.py --source 0  # webcam
    python tools/detect.py --source path/to/image.jpg
    python tools/detect.py --source path/to/video.mp4
    python tools/detect.py --source 0  # webcam
"""

from __future__ import annotations

@@ -15,28 +15,14 @@ from __future__ import annotations
import argparse
import os
from pathlib import Path
import time

from _common import get_source_type
import cv2
from tqdm import tqdm

from uniface.detection import SCRFD, RetinaFace, YOLOv5Face, YOLOv8Face
from uniface.visualization import draw_detections

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv', '.webm', '.flv'}


def get_source_type(source: str) -> str:
    """Determine if source is image, video, or camera."""
    if source.isdigit():
        return 'camera'
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return 'image'
    elif suffix in VIDEO_EXTENSIONS:
        return 'video'
    else:
        return 'unknown'
from uniface.draw import draw_detections


def process_image(detector, image_path: str, threshold: float = 0.6, save_dir: str = 'outputs'):
@@ -52,7 +38,7 @@ def process_image(detector, image_path: str, threshold: float = 0.6, save_dir: s
    bboxes = [face.bbox for face in faces]
    scores = [face.confidence for face in faces]
    landmarks = [face.landmarks for face in faces]
    draw_detections(image, bboxes, scores, landmarks, vis_threshold=threshold)
    draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold)

    os.makedirs(save_dir, exist_ok=True)
    output_path = os.path.join(save_dir, f'{os.path.splitext(os.path.basename(image_path))[0]}_out.jpg')
@@ -60,34 +46,48 @@ def process_image(detector, image_path: str, threshold: float = 0.6, save_dir: s
    print(f'Detected {len(faces)} face(s). Output saved: {output_path}')


def process_video(detector, video_path: str, threshold: float = 0.6, save_dir: str = 'outputs'):
    """Process a video file."""
    cap = cv2.VideoCapture(video_path)
def process_video(
    detector,
    input_path: str,
    output_path: str,
    threshold: float = 0.6,
    show_preview: bool = False,
):
    """Process a video file with progress bar."""
    cap = cv2.VideoCapture(input_path)
    if not cap.isOpened():
        print(f"Error: Cannot open video file '{video_path}'")
        print(f"Error: Cannot open video file '{input_path}'")
        return

    # Get video properties
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    os.makedirs(save_dir, exist_ok=True)
    output_path = os.path.join(save_dir, f'{Path(video_path).stem}_out.mp4')
    print(f'Input: {input_path} ({width}x{height}, {fps:.1f} fps, {total_frames} frames)')
    print(f'Output: {output_path}')

    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

    print(f'Processing video: {video_path} ({total_frames} frames)')
    frame_count = 0
    if not out.isOpened():
        print(f"Error: Cannot create output video '{output_path}'")
        cap.release()
        return

    while True:
    frame_count = 0
    total_faces = 0

    for _ in tqdm(range(total_frames), desc='Processing', unit='frames'):
        ret, frame = cap.read()
        if not ret:
            break

        t0 = time.perf_counter()
        frame_count += 1
        faces = detector.detect(frame)
        total_faces += len(faces)

        bboxes = [f.bbox for f in faces]
        scores = [f.confidence for f in faces]
@@ -99,19 +99,28 @@ def process_video(detector, video_path: str, threshold: float = 0.6, save_dir: s
            landmarks=landmarks,
            vis_threshold=threshold,
            draw_score=True,
            fancy_bbox=True,
            corner_bbox=True,
        )

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        inference_fps = 1.0 / max(time.perf_counter() - t0, 1e-9)
        cv2.putText(frame, f'FPS: {inference_fps:.1f}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.putText(frame, f'Faces: {len(faces)}', (10, 65), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        out.write(frame)

        # Show progress
        if frame_count % 100 == 0:
            print(f'  Processed {frame_count}/{total_frames} frames...')
        if show_preview:
            cv2.imshow("Processing - Press 'q' to cancel", frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                print('\nCancelled by user')
                break

    cap.release()
    out.release()
    print(f'Done! Output saved: {output_path}')
    if show_preview:
        cv2.destroyAllWindows()

    avg_faces = total_faces / frame_count if frame_count > 0 else 0
    print(f'\nDone! {frame_count} frames, {total_faces} faces ({avg_faces:.1f} avg/frame)')
    print(f'Saved: {output_path}')


def run_camera(detector, camera_id: int = 0, threshold: float = 0.6):
@@ -123,11 +132,12 @@ def run_camera(detector, camera_id: int = 0, threshold: float = 0.6):

    print("Press 'q' to quit")

    prev_time = time.perf_counter()
    while True:
        ret, frame = cap.read()
        frame = cv2.flip(frame, 1)  # mirror for natural interaction
        if not ret:
            break
        frame = cv2.flip(frame, 1)

        faces = detector.detect(frame)

@@ -141,10 +151,14 @@ def run_camera(detector, camera_id: int = 0, threshold: float = 0.6):
            landmarks=landmarks,
            vis_threshold=threshold,
            draw_score=True,
            fancy_bbox=True,
            corner_bbox=True,
        )

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        curr_time = time.perf_counter()
        fps = 1.0 / max(curr_time - prev_time, 1e-9)
        prev_time = curr_time
        cv2.putText(frame, f'FPS: {fps:.1f}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.putText(frame, f'Faces: {len(faces)}', (10, 65), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('Face Detection', frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
@@ -158,18 +172,24 @@ def main():
    parser = argparse.ArgumentParser(description='Run face detection')
    parser.add_argument('--source', type=str, required=True, help='Image/video path or camera ID (0, 1, ...)')
    parser.add_argument(
        '--method', type=str, default='retinaface', choices=['retinaface', 'scrfd', 'yolov5face', 'yolov8face']
        '--detector',
        '--method',
        type=str,
        default='retinaface',
        choices=['retinaface', 'scrfd', 'yolov5face', 'yolov8face'],
    )
    parser.add_argument('--threshold', type=float, default=0.25, help='Visualization threshold')
    parser.add_argument('--preview', action='store_true', help='Show live preview during video processing')
    parser.add_argument('--save-dir', type=str, default='outputs', help='Output directory')
    parser.add_argument('--output', type=str, default=None, help='Output video path (auto-generated if not specified)')
    args = parser.parse_args()

    # Initialize detector
    if args.method == 'retinaface':
    if args.detector == 'retinaface':
        detector = RetinaFace()
    elif args.method == 'scrfd':
    elif args.detector == 'scrfd':
        detector = SCRFD()
    elif args.method == 'yolov5face':
    elif args.detector == 'yolov5face':
        from uniface.constants import YOLOv5FaceWeights

        detector = YOLOv5Face(model_name=YOLOv5FaceWeights.YOLOV5M)
@@ -178,7 +198,6 @@ def main():

        detector = YOLOv8Face(model_name=YOLOv8FaceWeights.YOLOV8N)

    # Determine source type and process
    source_type = get_source_type(args.source)

    if source_type == 'camera':
@@ -192,7 +211,12 @@ def main():
        if not os.path.exists(args.source):
            print(f'Error: Video not found: {args.source}')
            return
        process_video(detector, args.source, args.threshold, args.save_dir)
        if args.output:
            output_path = args.output
        else:
            os.makedirs(args.save_dir, exist_ok=True)
            output_path = os.path.join(args.save_dir, f'{Path(args.source).stem}_detected.mp4')
        process_video(detector, args.source, output_path, args.threshold, args.preview)
    else:
        print(f"Error: Unknown source type for '{args.source}'")
        print('Supported formats: images (.jpg, .png, ...), videos (.mp4, .avi, ...), or camera ID (0, 1, ...)')
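A compact sketch of the single-image path the tool implements, using the same `detect`/`draw_detections` calls as above (the random image stands in for `cv2.imread(...)`):

```python
import cv2
import numpy as np

from uniface.detection import RetinaFace
from uniface.draw import draw_detections

detector = RetinaFace()
image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)  # stand-in for cv2.imread(path)

faces = detector.detect(image)
draw_detections(
    image=image,
    bboxes=[f.bbox for f in faces],
    scores=[f.confidence for f in faces],
    landmarks=[f.landmarks for f in faces],
    vis_threshold=0.25,
)
cv2.imwrite('test_out.jpg', image)
```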
@@ -4,6 +4,7 @@ from uniface.constants import (
    AgeGenderWeights,
    ArcFaceWeights,
    DDAMFNWeights,
    HeadPoseWeights,
    LandmarkWeights,
    MobileFaceWeights,
    RetinaFaceWeights,
@@ -21,6 +22,7 @@ MODEL_TYPES = {
    'ddamfn': DDAMFNWeights,
    'agegender': AgeGenderWeights,
    'landmark': LandmarkWeights,
    'headpose': HeadPoseWeights,
}

@@ -5,9 +5,9 @@
"""Emotion detection on detected faces.

Usage:
    python tools/face_emotion.py --source path/to/image.jpg
    python tools/face_emotion.py --source path/to/video.mp4
    python tools/face_emotion.py --source 0  # webcam
    python tools/emotion.py --source path/to/image.jpg
    python tools/emotion.py --source path/to/video.mp4
    python tools/emotion.py --source 0  # webcam
"""

from __future__ import annotations
@@ -16,27 +16,12 @@ import argparse
import os
from pathlib import Path

from _common import get_source_type
import cv2

from uniface import SCRFD, Emotion, RetinaFace
from uniface.visualization import draw_detections

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv', '.webm', '.flv'}


def get_source_type(source: str) -> str:
    """Determine if source is image, video, or camera."""
    if source.isdigit():
        return 'camera'
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return 'image'
    elif suffix in VIDEO_EXTENSIONS:
        return 'video'
    else:
        return 'unknown'
from uniface.attribute import Emotion
from uniface.detection import SCRFD, RetinaFace
from uniface.draw import draw_detections


def draw_emotion_label(image, bbox, emotion: str, confidence: float):
@@ -71,11 +56,11 @@ def process_image(
    scores = [f.confidence for f in faces]
    landmarks = [f.landmarks for f in faces]
    draw_detections(
        image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
        image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, corner_bbox=True
    )

    for i, face in enumerate(faces):
        result = emotion_predictor.predict(image, face.landmarks)
        result = emotion_predictor.predict(image, face)
        print(f'  Face {i + 1}: {result.emotion} (confidence: {result.confidence:.3f})')
        draw_emotion_label(image, face.bbox, result.emotion, result.confidence)

@@ -123,11 +108,11 @@ def process_video(
        scores = [f.confidence for f in faces]
        landmarks = [f.landmarks for f in faces]
        draw_detections(
            image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
            image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, corner_bbox=True
        )

        for face in faces:
            result = emotion_predictor.predict(frame, face.landmarks)
            result = emotion_predictor.predict(frame, face)
            draw_emotion_label(frame, face.bbox, result.emotion, result.confidence)

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
@@ -152,9 +137,9 @@ def run_camera(detector, emotion_predictor, camera_id: int = 0, threshold: float

    while True:
        ret, frame = cap.read()
        frame = cv2.flip(frame, 1)
        if not ret:
            break
        frame = cv2.flip(frame, 1)

        faces = detector.detect(frame)

@@ -162,11 +147,11 @@ def run_camera(detector, emotion_predictor, camera_id: int = 0, threshold: float
        scores = [f.confidence for f in faces]
        landmarks = [f.landmarks for f in faces]
        draw_detections(
            image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
            image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, corner_bbox=True
        )

        for face in faces:
            result = emotion_predictor.predict(frame, face.landmarks)
            result = emotion_predictor.predict(frame, face)
            draw_emotion_label(frame, face.bbox, result.emotion, result.confidence)

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
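The hunks above move Emotion under uniface.attribute and change predict() to take the detected Face object instead of raw landmarks. A minimal end-to-end sketch using only calls shown in this diff (the no-argument Emotion() constructor with default weights is an assumption, and the image path is a placeholder):

import cv2

from uniface.attribute import Emotion
from uniface.detection import RetinaFace

detector = RetinaFace()
emotion_predictor = Emotion()

image = cv2.imread('face.jpg')  # placeholder path
for face in detector.detect(image):
    result = emotion_predictor.predict(image, face)  # Face object, not face.landmarks
    print(result.emotion, f'{result.confidence:.3f}')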
@@ -16,28 +16,12 @@ import argparse
import os
from pathlib import Path

from _common import get_source_type
import cv2

from uniface import SCRFD, RetinaFace
from uniface.attribute import FairFace
from uniface.visualization import draw_detections

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv', '.webm', '.flv'}


def get_source_type(source: str) -> str:
    """Determine if source is image, video, or camera."""
    if source.isdigit():
        return 'camera'
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return 'image'
    elif suffix in VIDEO_EXTENSIONS:
        return 'video'
    else:
        return 'unknown'
from uniface.detection import SCRFD, RetinaFace
from uniface.draw import draw_detections


def draw_fairface_label(image, bbox, sex: str, age_group: str, race: str):
@@ -72,11 +56,11 @@ def process_image(
    scores = [f.confidence for f in faces]
    landmarks = [f.landmarks for f in faces]
    draw_detections(
        image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
        image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, corner_bbox=True
    )

    for i, face in enumerate(faces):
        result = fairface.predict(image, face.bbox)
        result = fairface.predict(image, face)
        print(f'  Face {i + 1}: {result.sex}, {result.age_group}, {result.race}')
        draw_fairface_label(image, face.bbox, result.sex, result.age_group, result.race)

@@ -124,11 +108,11 @@ def process_video(
        scores = [f.confidence for f in faces]
        landmarks = [f.landmarks for f in faces]
        draw_detections(
            image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
            image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, corner_bbox=True
        )

        for face in faces:
            result = fairface.predict(frame, face.bbox)
            result = fairface.predict(frame, face)
            draw_fairface_label(frame, face.bbox, result.sex, result.age_group, result.race)

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
@@ -153,9 +137,9 @@ def run_camera(detector, fairface, camera_id: int = 0, threshold: float = 0.6):

    while True:
        ret, frame = cap.read()
        frame = cv2.flip(frame, 1)
        if not ret:
            break
        frame = cv2.flip(frame, 1)

        faces = detector.detect(frame)

@@ -163,11 +147,11 @@ def run_camera(detector, fairface, camera_id: int = 0, threshold: float = 0.6):
        scores = [f.confidence for f in faces]
        landmarks = [f.landmarks for f in faces]
        draw_detections(
            image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
            image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, corner_bbox=True
        )

        for face in faces:
            result = fairface.predict(frame, face.bbox)
            result = fairface.predict(frame, face)
            draw_fairface_label(frame, face.bbox, result.sex, result.age_group, result.race)

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
208
tools/faiss_search.py
Normal file
@@ -0,0 +1,208 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""FAISS index build and multi-identity face search.

Build a vector index from a directory of person sub-folders, then search
against it in a video or webcam stream.

Usage:
    python tools/faiss_search.py build --faces-dir dataset/ --db-path ./vector_index
    python tools/faiss_search.py run --db-path ./vector_index --source video.mp4
    python tools/faiss_search.py run --db-path ./vector_index --source 0  # webcam
"""

from __future__ import annotations

import argparse
import os
from pathlib import Path

from _common import IMAGE_EXTENSIONS, get_source_type
import cv2

from uniface import create_detector, create_recognizer
from uniface.draw import draw_corner_bbox, draw_text_label
from uniface.indexing import FAISS


def _draw_face(image, bbox, text: str, color: tuple[int, int, int]) -> None:
    x1, y1, x2, y2 = map(int, bbox[:4])
    thickness = max(round(sum(image.shape[:2]) / 2 * 0.003), 2)
    font_scale = max(0.4, min(0.7, (y2 - y1) / 200))
    draw_corner_bbox(image, (x1, y1, x2, y2), color=color, thickness=thickness)
    draw_text_label(image, text, x1, y1, bg_color=color, font_scale=font_scale)


def process_frame(frame, detector, recognizer, store: FAISS, threshold: float = 0.4):
    faces = detector.detect(frame)
    if not faces:
        return frame

    for face in faces:
        embedding = recognizer.get_normalized_embedding(frame, face.landmarks)
        result, sim = store.search(embedding, threshold=threshold)

        text = f'{result["person_id"]} ({sim:.2f})' if result else f'Unknown ({sim:.2f})'
        color = (0, 255, 0) if result else (0, 0, 255)
        _draw_face(frame, face.bbox, text, color)

    return frame


def process_video(detector, recognizer, store: FAISS, video_path: str, save_dir: str, threshold: float = 0.4):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error: Cannot open video file '{video_path}'")
        return

    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    os.makedirs(save_dir, exist_ok=True)
    output_path = os.path.join(save_dir, f'{Path(video_path).stem}_faiss_search.mp4')
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

    print(f'Processing video: {video_path} ({total_frames} frames)')
    frame_count = 0

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame_count += 1
        frame = process_frame(frame, detector, recognizer, store, threshold)
        out.write(frame)

        if frame_count % 100 == 0:
            print(f'  Processed {frame_count}/{total_frames} frames...')

    cap.release()
    out.release()
    print(f'Done! Output saved: {output_path}')


def run_camera(detector, recognizer, store: FAISS, camera_id: int = 0, threshold: float = 0.4):
    cap = cv2.VideoCapture(camera_id)
    if not cap.isOpened():
        print(f'Cannot open camera {camera_id}')
        return

    print("Press 'q' to quit")

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv2.flip(frame, 1)

        frame = process_frame(frame, detector, recognizer, store, threshold)

        cv2.imshow('Vector Search', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


def build(args: argparse.Namespace) -> None:
    faces_dir = Path(args.faces_dir)
    if not faces_dir.is_dir():
        print(f"Error: '{faces_dir}' is not a directory")
        return

    detector = create_detector()
    recognizer = create_recognizer()
    store = FAISS(db_path=args.db_path)

    persons = sorted(p.name for p in faces_dir.iterdir() if p.is_dir())
    if not persons:
        print(f"Error: No sub-folders found in '{faces_dir}'")
        return

    print(f'Found {len(persons)} persons: {", ".join(persons)}')

    total_added = 0
    for person_id in persons:
        person_dir = faces_dir / person_id
        images = [f for f in person_dir.iterdir() if f.suffix.lower() in IMAGE_EXTENSIONS]

        added = 0
        for img_path in images:
            image = cv2.imread(str(img_path))
            if image is None:
                print(f'  Warning: Failed to read {img_path}, skipping')
                continue

            faces = detector.detect(image)
            if not faces:
                print(f'  Warning: No face detected in {img_path}, skipping')
                continue

            embedding = recognizer.get_normalized_embedding(image, faces[0].landmarks)
            store.add(embedding, {'person_id': person_id, 'source': str(img_path)})
            added += 1

        total_added += added
        if added:
            print(f'  {person_id}: {added} embeddings added')
        else:
            print(f'  {person_id}: no valid faces found')

    store.save()
    print(f'\nIndex saved to {args.db_path} ({total_added} vectors, {len(persons)} persons)')


def run(args: argparse.Namespace) -> None:
    detector = create_detector()
    recognizer = create_recognizer()

    store = FAISS(db_path=args.db_path)
    if not store.load():
        print(f"Error: No index found at '{args.db_path}'")
        return
    print(f'Loaded FAISS index: {store}')

    source_type = get_source_type(args.source)

    if source_type == 'camera':
        run_camera(detector, recognizer, store, int(args.source), args.threshold)
    elif source_type == 'video':
        if not os.path.exists(args.source):
            print(f'Error: Video not found: {args.source}')
            return
        process_video(detector, recognizer, store, args.source, args.save_dir, args.threshold)
    else:
        print(f"Error: Source must be a video file or camera ID, not '{args.source}'")


def main():
    parser = argparse.ArgumentParser(description='FAISS vector search')
    sub = parser.add_subparsers(dest='command', required=True)

    build_p = sub.add_parser('build', help='Build a FAISS index from person sub-folders')
    build_p.add_argument('--faces-dir', type=str, required=True, help='Directory with person sub-folders')
    build_p.add_argument('--db-path', type=str, default='./vector_index', help='Where to save the index')

    run_p = sub.add_parser('run', help='Search faces against a FAISS index')
    run_p.add_argument('--db-path', type=str, required=True, help='Path to saved FAISS index')
    run_p.add_argument('--source', type=str, required=True, help='Video path or camera ID')
    run_p.add_argument('--threshold', type=float, default=0.4, help='Similarity threshold')
    run_p.add_argument('--save-dir', type=str, default='outputs', help='Output directory')

    args = parser.parse_args()

    if args.command == 'build':
        build(args)
    elif args.command == 'run':
        run(args)


if __name__ == '__main__':
    main()
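The FAISS store used above exposes add/search/save/load over normalized embeddings. For readers unfamiliar with what that wraps, here is a minimal sketch of cosine search built directly on the faiss library: an inner-product index over L2-normalized vectors. The faiss-cpu dependency and the 512-dimensional embedding size are assumptions, not part of this PR:

import faiss
import numpy as np

dim = 512                       # typical ArcFace embedding size (assumed)
index = faiss.IndexFlatIP(dim)  # inner product == cosine on unit vectors
metadata: list[dict] = []       # payloads, parallel to index rows

def add(embedding: np.ndarray, payload: dict) -> None:
    index.add(embedding.reshape(1, dim).astype(np.float32))
    metadata.append(payload)

def search(embedding: np.ndarray, threshold: float = 0.4):
    sims, ids = index.search(embedding.reshape(1, dim).astype(np.float32), 1)
    sim, idx = float(sims[0, 0]), int(ids[0, 0])
    # faiss returns -1 when the index is empty
    return (metadata[idx], sim) if idx >= 0 and sim >= threshold else (None, sim)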
@@ -5,9 +5,9 @@
"""Gaze estimation on detected faces.

Usage:
    python tools/gaze_estimation.py --source path/to/image.jpg
    python tools/gaze_estimation.py --source path/to/video.mp4
    python tools/gaze_estimation.py --source 0  # webcam
    python tools/gaze.py --source path/to/image.jpg
    python tools/gaze.py --source path/to/video.mp4
    python tools/gaze.py --source 0  # webcam
"""

from __future__ import annotations
@@ -16,29 +16,13 @@ import argparse
import os
from pathlib import Path

from _common import get_source_type
import cv2
import numpy as np

from uniface import RetinaFace
from uniface.detection import RetinaFace
from uniface.draw import draw_gaze
from uniface.gaze import MobileGaze
from uniface.visualization import draw_gaze

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv', '.webm', '.flv'}


def get_source_type(source: str) -> str:
    """Determine if source is image, video, or camera."""
    if source.isdigit():
        return 'camera'
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return 'image'
    elif suffix in VIDEO_EXTENSIONS:
        return 'video'
    else:
        return 'unknown'


def process_image(detector, gaze_estimator, image_path: str, save_dir: str = 'outputs'):
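draw_gaze (now imported from uniface.draw) presumably renders a pitch/yaw pair as a 2D arrow on the frame. One common projection, shown here as a sketch rather than the library's actual implementation (angles in radians):

import numpy as np

def gaze_to_offset(pitch: float, yaw: float, length: float = 100.0) -> tuple[float, float]:
    """Map (pitch, yaw) to a 2D arrow offset (dx, dy) in pixels."""
    dx = -length * np.sin(yaw) * np.cos(pitch)
    dy = -length * np.sin(pitch)
    return dx, dy

The arrow is then drawn from the eye or face center to (cx + dx, cy + dy).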
181
tools/headpose.py
Normal file
@@ -0,0 +1,181 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Head pose estimation on detected faces.

Usage:
    python tools/headpose.py --source path/to/image.jpg
    python tools/headpose.py --source path/to/video.mp4
    python tools/headpose.py --source 0  # webcam
    python tools/headpose.py --source path/to/image.jpg --draw-type axis
"""

from __future__ import annotations

import argparse
import os
from pathlib import Path

from _common import get_source_type
import cv2

from uniface.detection import RetinaFace
from uniface.draw import draw_head_pose
from uniface.headpose import HeadPose


def process_image(detector, head_pose_estimator, image_path: str, save_dir: str = 'outputs', draw_type: str = 'cube'):
    """Process a single image."""
    image = cv2.imread(image_path)
    if image is None:
        print(f"Error: Failed to load image from '{image_path}'")
        return

    faces = detector.detect(image)
    print(f'Detected {len(faces)} face(s)')

    for i, face in enumerate(faces):
        bbox = face.bbox
        x1, y1, x2, y2 = map(int, bbox[:4])
        face_crop = image[y1:y2, x1:x2]

        if face_crop.size == 0:
            continue

        result = head_pose_estimator.estimate(face_crop)
        print(f'  Face {i + 1}: pitch={result.pitch:.1f}°, yaw={result.yaw:.1f}°, roll={result.roll:.1f}°')

        draw_head_pose(image, bbox, result.pitch, result.yaw, result.roll, draw_type=draw_type)

    os.makedirs(save_dir, exist_ok=True)
    output_path = os.path.join(save_dir, f'{Path(image_path).stem}_headpose.jpg')
    cv2.imwrite(output_path, image)
    print(f'Output saved: {output_path}')


def process_video(detector, head_pose_estimator, video_path: str, save_dir: str = 'outputs', draw_type: str = 'cube'):
    """Process a video file."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error: Cannot open video file '{video_path}'")
        return

    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    os.makedirs(save_dir, exist_ok=True)
    output_path = os.path.join(save_dir, f'{Path(video_path).stem}_headpose.mp4')
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

    print(f'Processing video: {video_path} ({total_frames} frames)')
    frame_count = 0

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame_count += 1
        faces = detector.detect(frame)

        for face in faces:
            bbox = face.bbox
            x1, y1, x2, y2 = map(int, bbox[:4])
            face_crop = frame[y1:y2, x1:x2]

            if face_crop.size == 0:
                continue

            result = head_pose_estimator.estimate(face_crop)
            draw_head_pose(frame, bbox, result.pitch, result.yaw, result.roll, draw_type=draw_type)

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        out.write(frame)

        if frame_count % 100 == 0:
            print(f'  Processed {frame_count}/{total_frames} frames...')

    cap.release()
    out.release()
    print(f'Done! Output saved: {output_path}')


def run_camera(detector, head_pose_estimator, camera_id: int = 0, draw_type: str = 'cube'):
    """Run real-time detection on webcam."""
    cap = cv2.VideoCapture(camera_id)
    if not cap.isOpened():
        print(f'Cannot open camera {camera_id}')
        return

    print("Press 'q' to quit")

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame = cv2.flip(frame, 1)
        faces = detector.detect(frame)

        for face in faces:
            bbox = face.bbox
            x1, y1, x2, y2 = map(int, bbox[:4])
            face_crop = frame[y1:y2, x1:x2]

            if face_crop.size == 0:
                continue

            result = head_pose_estimator.estimate(face_crop)
            draw_head_pose(frame, bbox, result.pitch, result.yaw, result.roll, draw_type=draw_type)

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('Head Pose Estimation', frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


def main():
    parser = argparse.ArgumentParser(description='Run head pose estimation')
    parser.add_argument('--source', type=str, required=True, help='Image/video path or camera ID (0, 1, ...)')
    parser.add_argument('--save-dir', type=str, default='outputs', help='Output directory')
    parser.add_argument(
        '--draw-type',
        type=str,
        default='cube',
        choices=['cube', 'axis'],
        help='Visualization type: cube (default) or axis',
    )
    args = parser.parse_args()

    detector = RetinaFace()
    head_pose_estimator = HeadPose()

    source_type = get_source_type(args.source)

    if source_type == 'camera':
        run_camera(detector, head_pose_estimator, int(args.source), args.draw_type)
    elif source_type == 'image':
        if not os.path.exists(args.source):
            print(f'Error: Image not found: {args.source}')
            return
        process_image(detector, head_pose_estimator, args.source, args.save_dir, args.draw_type)
    elif source_type == 'video':
        if not os.path.exists(args.source):
            print(f'Error: Video not found: {args.source}')
            return
        process_video(detector, head_pose_estimator, args.source, args.save_dir, args.draw_type)
    else:
        print(f"Error: Unknown source type for '{args.source}'")
        print('Supported formats: images (.jpg, .png, ...), videos (.mp4, .avi, ...), or camera ID (0, 1, ...)')


if __name__ == '__main__':
    main()
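The cube and axis drawings take the (pitch, yaw, roll) triple returned by HeadPose.estimate and rotate a 3D template before projecting it onto the frame. A sketch of the usual Euler-to-rotation-matrix composition (the intrinsic X-Y-Z order is an assumption; conventions vary between implementations):

import numpy as np

def euler_to_rotation(pitch: float, yaw: float, roll: float) -> np.ndarray:
    """Compose R = Rz(roll) @ Ry(yaw) @ Rx(pitch); angles in radians."""
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx

Note the tool prints degrees (the °-formatted output above), so a caller of a sketch like this would convert with np.radians first.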
@@ -16,26 +16,11 @@ import argparse
import os
from pathlib import Path

from _common import get_source_type
import cv2

from uniface import SCRFD, Landmark106, RetinaFace

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv', '.webm', '.flv'}


def get_source_type(source: str) -> str:
    """Determine if source is image, video, or camera."""
    if source.isdigit():
        return 'camera'
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return 'image'
    elif suffix in VIDEO_EXTENSIONS:
        return 'video'
    else:
        return 'unknown'
from uniface.detection import SCRFD, RetinaFace
from uniface.landmark import Landmark106


def process_image(detector, landmarker, image_path: str, save_dir: str = 'outputs'):
@@ -129,9 +114,9 @@ def run_camera(detector, landmarker, camera_id: int = 0):

    while True:
        ret, frame = cap.read()
        frame = cv2.flip(frame, 1)
        if not ret:
            break
        frame = cv2.flip(frame, 1)

        faces = detector.detect(frame)

@@ -5,9 +5,9 @@
"""Face parsing on detected faces.

Usage:
    python tools/face_parsing.py --source path/to/image.jpg
    python tools/face_parsing.py --source path/to/video.mp4
    python tools/face_parsing.py --source 0  # webcam
    python tools/parse.py --source path/to/image.jpg
    python tools/parse.py --source path/to/video.mp4
    python tools/parse.py --source 0  # webcam
"""

from __future__ import annotations
@@ -16,30 +16,14 @@ import argparse
import os
from pathlib import Path

from _common import get_source_type
import cv2
import numpy as np

from uniface import RetinaFace
from uniface.constants import ParsingWeights
from uniface.detection import RetinaFace
from uniface.draw import vis_parsing_maps
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv', '.webm', '.flv'}


def get_source_type(source: str) -> str:
    """Determine if source is image, video, or camera."""
    if source.isdigit():
        return 'camera'
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return 'image'
    elif suffix in VIDEO_EXTENSIONS:
        return 'video'
    else:
        return 'unknown'


def expand_bbox(
@@ -225,7 +209,7 @@ def main():
    args = parser_arg.parse_args()

    detector = RetinaFace()
    parser = BiSeNet(model_name=ParsingWeights.RESNET34)
    parser = BiSeNet(model_name=args.model)

    source_type = get_source_type(args.source)

@@ -5,8 +5,8 @@
"""Face recognition: extract embeddings or compare two faces.

Usage:
    python tools/recognition.py --image path/to/image.jpg
    python tools/recognition.py --image1 face1.jpg --image2 face2.jpg
    python tools/recognize.py --image path/to/image.jpg
    python tools/recognize.py --image1 face1.jpg --image2 face2.jpg
"""

import argparse
@@ -41,12 +41,13 @@ def run_inference(detector, recognizer, image_path: str):

    print(f'Detected {len(faces)} face(s). Extracting embedding for the first face...')

    landmarks = faces[0]['landmarks']  # 5-point landmarks for alignment (already np.ndarray)
    landmarks = faces[0].landmarks
    embedding = recognizer.get_embedding(image, landmarks)
    norm_embedding = recognizer.get_normalized_embedding(image, landmarks)  # L2 normalized
    raw_norm = np.linalg.norm(embedding)
    norm_embedding = embedding.ravel() / raw_norm if raw_norm > 0 else embedding.ravel()

    print(f'  Embedding shape: {embedding.shape}')
    print(f'  L2 norm (raw): {np.linalg.norm(embedding):.4f}')
    print(f'  L2 norm (raw): {raw_norm:.4f}')
    print(f'  L2 norm (normalized): {np.linalg.norm(norm_embedding):.4f}')


@@ -65,8 +66,8 @@ def compare_faces(detector, recognizer, image1_path: str, image2_path: str, thre
        print('Error: No faces detected in one or both images')
        return

    landmarks1 = faces1[0]['landmarks']
    landmarks2 = faces2[0]['landmarks']
    landmarks1 = faces1[0].landmarks
    landmarks2 = faces2[0].landmarks

    embedding1 = recognizer.get_normalized_embedding(img1, landmarks1)
    embedding2 = recognizer.get_normalized_embedding(img2, landmarks2)
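The manual normalization above makes the underlying identity explicit: once both embeddings have unit L2 norm, cosine similarity is just their dot product, which is presumably what compute_similarity reduces to for normalized inputs. A sketch (the 0.4 threshold mirrors the tools' default and is empirical, not universal):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """For L2-normalized vectors, cos(a, b) == a . b."""
    return float(np.dot(a.ravel(), b.ravel()))

# e.g. declare a match when the similarity clears a chosen threshold:
# is_match = cosine_similarity(embedding1, embedding2) > 0.4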
@@ -2,11 +2,14 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Real-time face search: match faces against a reference image.
"""Single-reference face search on video or webcam.

Given a reference face image, detects faces in the source and shows
whether each face matches the reference.

Usage:
    python tools/face_search.py --reference person.jpg --source 0  # webcam
    python tools/face_search.py --reference person.jpg --source video.mp4
    python tools/search.py --reference ref.jpg --source video.mp4
    python tools/search.py --reference ref.jpg --source 0  # webcam
"""

from __future__ import annotations
@@ -15,43 +18,16 @@ import argparse
import os
from pathlib import Path

from _common import get_source_type
import cv2
import numpy as np

from uniface.detection import SCRFD, RetinaFace
from uniface import create_detector, create_recognizer
from uniface.draw import draw_corner_bbox, draw_text_label
from uniface.face_utils import compute_similarity
from uniface.recognition import ArcFace, MobileFace, SphereFace

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv', '.webm', '.flv'}


def get_source_type(source: str) -> str:
    """Determine if source is image, video, or camera."""
    if source.isdigit():
        return 'camera'
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return 'image'
    elif suffix in VIDEO_EXTENSIONS:
        return 'video'
    else:
        return 'unknown'


def get_recognizer(name: str):
    """Get recognizer by name."""
    if name == 'arcface':
        return ArcFace()
    elif name == 'mobileface':
        return MobileFace()
    else:
        return SphereFace()


def extract_reference_embedding(detector, recognizer, image_path: str) -> np.ndarray:
    """Extract embedding from reference image."""
    image = cv2.imread(image_path)
    if image is None:
        raise RuntimeError(f'Failed to load image: {image_path}')
@@ -60,33 +36,34 @@ def extract_reference_embedding(detector, recognizer, image_path: str) -> np.nda
    if not faces:
        raise RuntimeError('No faces found in reference image.')

    landmarks = faces[0].landmarks
    return recognizer.get_normalized_embedding(image, landmarks)
    return recognizer.get_normalized_embedding(image, faces[0].landmarks)


def _draw_face(image, bbox, text: str, color: tuple[int, int, int]) -> None:
    x1, y1, x2, y2 = map(int, bbox[:4])
    thickness = max(round(sum(image.shape[:2]) / 2 * 0.003), 2)
    font_scale = max(0.4, min(0.7, (y2 - y1) / 200))
    draw_corner_bbox(image, (x1, y1, x2, y2), color=color, thickness=thickness)
    draw_text_label(image, text, x1, y1, bg_color=color, font_scale=font_scale)


def process_frame(frame, detector, recognizer, ref_embedding: np.ndarray, threshold: float = 0.4):
    """Process a single frame and return annotated frame."""
    faces = detector.detect(frame)

    for face in faces:
        bbox = face.bbox
        landmarks = face.landmarks
        x1, y1, x2, y2 = map(int, bbox)

        embedding = recognizer.get_normalized_embedding(frame, landmarks)
        embedding = recognizer.get_normalized_embedding(frame, face.landmarks)
        sim = compute_similarity(ref_embedding, embedding)

        label = f'Match ({sim:.2f})' if sim > threshold else f'Unknown ({sim:.2f})'
        text = f'Match ({sim:.2f})' if sim > threshold else f'Unknown ({sim:.2f})'
        color = (0, 255, 0) if sim > threshold else (0, 0, 255)

        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        cv2.putText(frame, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
        _draw_face(frame, face.bbox, text, color)

    return frame


def process_video(detector, recognizer, ref_embedding: np.ndarray, video_path: str, save_dir: str, threshold: float):
    """Process a video file."""
def process_video(
    detector, recognizer, video_path: str, save_dir: str, ref_embedding: np.ndarray, threshold: float = 0.4
):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error: Cannot open video file '{video_path}'")
@@ -123,7 +100,6 @@ def process_video(detector, recognizer, ref_embedding: np.ndarray, video_path: s


def run_camera(detector, recognizer, ref_embedding: np.ndarray, camera_id: int = 0, threshold: float = 0.4):
    """Run real-time face search on webcam."""
    cap = cv2.VideoCapture(camera_id)
    if not cap.isOpened():
        print(f'Cannot open camera {camera_id}')
@@ -133,13 +109,13 @@ def run_camera(detector, recognizer, ref_embedding: np.ndarray, camera_id: int =

    while True:
        ret, frame = cap.read()
        frame = cv2.flip(frame, 1)
        if not ret:
            break
        frame = cv2.flip(frame, 1)

        frame = process_frame(frame, detector, recognizer, ref_embedding, threshold)

        cv2.imshow('Face Recognition', frame)
        cv2.imshow('Face Search', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

@@ -148,17 +124,10 @@ def run_camera(detector, recognizer, ref_embedding: np.ndarray, camera_id: int =


def main():
    parser = argparse.ArgumentParser(description='Face search using a reference image')
    parser = argparse.ArgumentParser(description='Single-reference face search')
    parser.add_argument('--reference', type=str, required=True, help='Reference face image')
    parser.add_argument('--source', type=str, required=True, help='Video path or camera ID (0, 1, ...)')
    parser.add_argument('--threshold', type=float, default=0.4, help='Match threshold')
    parser.add_argument('--detector', type=str, default='scrfd', choices=['retinaface', 'scrfd'])
    parser.add_argument(
        '--recognizer',
        type=str,
        default='arcface',
        choices=['arcface', 'mobileface', 'sphereface'],
    )
    parser.add_argument('--source', type=str, required=True, help='Video path or camera ID')
    parser.add_argument('--threshold', type=float, default=0.4, help='Similarity threshold')
    parser.add_argument('--save-dir', type=str, default='outputs', help='Output directory')
    args = parser.parse_args()

@@ -166,8 +135,8 @@ def main():
        print(f'Error: Reference image not found: {args.reference}')
        return

    detector = RetinaFace() if args.detector == 'retinaface' else SCRFD()
    recognizer = get_recognizer(args.recognizer)
    detector = create_detector()
    recognizer = create_recognizer()

    print(f'Loading reference: {args.reference}')
    ref_embedding = extract_reference_embedding(detector, recognizer, args.reference)
@@ -180,10 +149,9 @@ def main():
        if not os.path.exists(args.source):
            print(f'Error: Video not found: {args.source}')
            return
        process_video(detector, recognizer, ref_embedding, args.source, args.save_dir, args.threshold)
        process_video(detector, recognizer, args.source, args.save_dir, ref_embedding, args.threshold)
    else:
        print(f"Error: Source must be a video file or camera ID, not '{args.source}'")
        print('Supported formats: videos (.mp4, .avi, ...) or camera ID (0, 1, ...)')


if __name__ == '__main__':
@@ -16,30 +16,14 @@ import argparse
import os
from pathlib import Path

from _common import get_source_type
import cv2
import numpy as np

from uniface import RetinaFace
from uniface.constants import MiniFASNetWeights
from uniface.detection import RetinaFace
from uniface.spoofing import create_spoofer

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv', '.webm', '.flv'}


def get_source_type(source: str) -> str:
    """Determine if source is image, video, or camera."""
    if source.isdigit():
        return 'camera'
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return 'image'
    elif suffix in VIDEO_EXTENSIONS:
        return 'video'
    else:
        return 'unknown'


def draw_spoofing_result(
    image: np.ndarray,
199
tools/track.py
Normal file
@@ -0,0 +1,199 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Face tracking on video files using ByteTrack.

Usage:
    python tools/track.py --source video.mp4
    python tools/track.py --source video.mp4 --output outputs/tracked.mp4
    python tools/track.py --source 0  # webcam
"""

from __future__ import annotations

import argparse
import os
from pathlib import Path

from _common import VIDEO_EXTENSIONS
import cv2
import numpy as np
from tqdm import tqdm

from uniface.common import xyxy_to_cxcywh
from uniface.detection import SCRFD, RetinaFace
from uniface.draw import draw_tracks
from uniface.tracking import BYTETracker


def _assign_track_ids(faces, tracks) -> list:
    """Match tracker outputs back to Face objects by center distance."""
    if len(tracks) == 0 or len(faces) == 0:
        return []

    face_bboxes = np.array([f.bbox for f in faces], dtype=np.float32)
    track_ids = tracks[:, 4].astype(int)

    face_centers = xyxy_to_cxcywh(face_bboxes)[:, :2]  # (N, 2) -> [cx, cy]
    track_centers = xyxy_to_cxcywh(tracks[:, :4])[:, :2]  # (M, 2) -> [cx, cy]

    for ti in range(len(tracks)):
        dists = (track_centers[ti, 0] - face_centers[:, 0]) ** 2 + (track_centers[ti, 1] - face_centers[:, 1]) ** 2
        faces[int(np.argmin(dists))].track_id = track_ids[ti]

    return [f for f in faces if f.track_id is not None]


def process_video(
    detector,
    tracker: BYTETracker,
    input_path: str,
    output_path: str,
    threshold: float = 0.5,
    show_preview: bool = False,
):
    """Process a video file with face tracking."""
    cap = cv2.VideoCapture(input_path)
    if not cap.isOpened():
        print(f"Error: Cannot open video file '{input_path}'")
        return

    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    print(f'Input: {input_path} ({width}x{height}, {fps:.1f} fps, {total_frames} frames)')
    print(f'Output: {output_path}')

    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

    if not out.isOpened():
        print(f"Error: Cannot create output video '{output_path}'")
        cap.release()
        return

    frame_count = 0
    total_tracks = 0

    for _ in tqdm(range(total_frames), desc='Tracking', unit='frames'):
        ret, frame = cap.read()
        if not ret:
            break

        frame_count += 1

        # Detect faces
        faces = detector.detect(frame)
        dets = np.array([[*f.bbox, f.confidence] for f in faces if f.confidence >= threshold])
        dets = dets if len(dets) > 0 else np.empty((0, 5))

        # Update tracker
        tracks = tracker.update(dets)
        tracked_faces = _assign_track_ids(faces, tracks)
        total_tracks += len(tracked_faces)

        # Draw tracked faces
        draw_tracks(image=frame, faces=tracked_faces)

        cv2.putText(frame, f'Tracks: {len(tracked_faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        out.write(frame)

        if show_preview:
            cv2.imshow("Tracking - Press 'q' to cancel", frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                print('\nCancelled by user')
                break

    cap.release()
    out.release()
    if show_preview:
        cv2.destroyAllWindows()

    avg_tracks = total_tracks / frame_count if frame_count > 0 else 0
    print(f'\nDone! {frame_count} frames, {total_tracks} tracks ({avg_tracks:.1f} avg/frame)')
    print(f'Saved: {output_path}')


def run_camera(
    detector,
    tracker: BYTETracker,
    camera_id: int = 0,
    threshold: float = 0.5,
):
    """Run real-time face tracking on webcam."""
    cap = cv2.VideoCapture(camera_id)
    if not cap.isOpened():
        print(f'Cannot open camera {camera_id}')
        return

    print("Press 'q' to quit")

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv2.flip(frame, 1)

        # Detect faces
        faces = detector.detect(frame)
        dets = np.array([[*f.bbox, f.confidence] for f in faces if f.confidence >= threshold])
        dets = dets if len(dets) > 0 else np.empty((0, 5))

        # Update tracker
        tracks = tracker.update(dets)
        tracked_faces = _assign_track_ids(faces, tracks)

        # Draw tracked faces
        draw_tracks(image=frame, faces=tracked_faces)

        cv2.putText(frame, f'Tracks: {len(tracked_faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('Face Tracking', frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


def main():
    parser = argparse.ArgumentParser(description='Face tracking on video using ByteTrack')
    parser.add_argument('--source', type=str, required=True, help='Video path or camera ID (0, 1, ...)')
    parser.add_argument('--output', type=str, default=None, help='Output video path')
    parser.add_argument('--detector', type=str, default='scrfd', choices=['retinaface', 'scrfd'])
    parser.add_argument('--threshold', type=float, default=0.5, help='Detection confidence threshold')
    parser.add_argument('--track-buffer', type=int, default=30, help='Max frames to keep lost tracks')
    parser.add_argument('--preview', action='store_true', help='Show live preview')
    parser.add_argument('--save-dir', type=str, default='outputs', help='Output directory')
    args = parser.parse_args()

    detector = RetinaFace() if args.detector == 'retinaface' else SCRFD()
    tracker = BYTETracker(track_thresh=args.threshold, track_buffer=args.track_buffer)

    if args.source.isdigit():
        run_camera(detector, tracker, int(args.source), args.threshold)
    else:
        if not os.path.exists(args.source):
            print(f'Error: Video not found: {args.source}')
            return

        ext = Path(args.source).suffix.lower()
        if ext not in VIDEO_EXTENSIONS:
            print(f"Error: Unsupported format '{ext}'. Supported: {VIDEO_EXTENSIONS}")
            return

        if args.output:
            output_path = args.output
        else:
            os.makedirs(args.save_dir, exist_ok=True)
            output_path = os.path.join(args.save_dir, f'{Path(args.source).stem}_tracked.mp4')

        process_video(detector, tracker, args.source, output_path, args.threshold, args.preview)


if __name__ == '__main__':
    main()
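_assign_track_ids above greedily writes each track ID onto the nearest face center, so two nearby tracks can overwrite the same face. A hedged sketch of a strict one-to-one alternative using the Hungarian solver (scipy is not a dependency this PR adds; it is shown only for comparison):

import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_one_to_one(track_centers: np.ndarray, face_centers: np.ndarray) -> list[tuple[int, int]]:
    """Return (track_idx, face_idx) pairs minimizing total squared center distance."""
    # Pairwise squared distances, shape (M tracks, N faces)
    cost = ((track_centers[:, None, :] - face_centers[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))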
@@ -1,180 +0,0 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

"""Face detection on video files with progress tracking.

Usage:
    python tools/video_detection.py --source video.mp4
    python tools/video_detection.py --source video.mp4 --output output.mp4
    python tools/video_detection.py --source 0  # webcam
"""

from __future__ import annotations

import argparse
import os
from pathlib import Path

import cv2
from tqdm import tqdm

from uniface import SCRFD, RetinaFace
from uniface.visualization import draw_detections

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp', '.tiff'}
VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv', '.webm', '.flv'}


def get_source_type(source: str) -> str:
    """Determine if source is image, video, or camera."""
    if source.isdigit():
        return 'camera'
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return 'image'
    elif suffix in VIDEO_EXTENSIONS:
        return 'video'
    else:
        return 'unknown'


def process_video(
    detector,
    input_path: str,
    output_path: str,
    threshold: float = 0.6,
    show_preview: bool = False,
):
    """Process a video file with progress bar."""
    cap = cv2.VideoCapture(input_path)
    if not cap.isOpened():
        print(f"Error: Cannot open video file '{input_path}'")
        return

    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    print(f'Input: {input_path} ({width}x{height}, {fps:.1f} fps, {total_frames} frames)')
    print(f'Output: {output_path}')

    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

    if not out.isOpened():
        print(f"Error: Cannot create output video '{output_path}'")
        cap.release()
        return

    frame_count = 0
    total_faces = 0

    for _ in tqdm(range(total_frames), desc='Processing', unit='frames'):
        ret, frame = cap.read()
        if not ret:
            break

        frame_count += 1
        faces = detector.detect(frame)
        total_faces += len(faces)

        bboxes = [f.bbox for f in faces]
        scores = [f.confidence for f in faces]
        landmarks = [f.landmarks for f in faces]
        draw_detections(
            image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
        )

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        out.write(frame)

        if show_preview:
            cv2.imshow("Processing - Press 'q' to cancel", frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                print('\nCancelled by user')
                break

    cap.release()
    out.release()
    if show_preview:
        cv2.destroyAllWindows()

    avg_faces = total_faces / frame_count if frame_count > 0 else 0
    print(f'\nDone! {frame_count} frames, {total_faces} faces ({avg_faces:.1f} avg/frame)')
    print(f'Saved: {output_path}')


def run_camera(detector, camera_id: int = 0, threshold: float = 0.6):
    """Run real-time detection on webcam."""
    cap = cv2.VideoCapture(camera_id)
    if not cap.isOpened():
        print(f'Cannot open camera {camera_id}')
        return

    print("Press 'q' to quit")

    while True:
        ret, frame = cap.read()
        frame = cv2.flip(frame, 1)
        if not ret:
            break

        faces = detector.detect(frame)

        bboxes = [f.bbox for f in faces]
        scores = [f.confidence for f in faces]
        landmarks = [f.landmarks for f in faces]
        draw_detections(
            image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
        )

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('Face Detection', frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


def main():
    parser = argparse.ArgumentParser(description='Process video with face detection')
    parser.add_argument('--source', type=str, required=True, help='Video path or camera ID (0, 1, ...)')
    parser.add_argument('--output', type=str, default=None, help='Output video path (auto-generated if not specified)')
    parser.add_argument('--detector', type=str, default='retinaface', choices=['retinaface', 'scrfd'])
    parser.add_argument('--threshold', type=float, default=0.6, help='Visualization threshold')
    parser.add_argument('--preview', action='store_true', help='Show live preview')
    parser.add_argument('--save-dir', type=str, default='outputs', help='Output directory (if --output not specified)')
    args = parser.parse_args()

    detector = RetinaFace() if args.detector == 'retinaface' else SCRFD()

    source_type = get_source_type(args.source)

    if source_type == 'camera':
        run_camera(detector, int(args.source), args.threshold)
    elif source_type == 'video':
        if not os.path.exists(args.source):
            print(f'Error: Video not found: {args.source}')
            return

        # Determine output path
        if args.output:
            output_path = args.output
        else:
            os.makedirs(args.save_dir, exist_ok=True)
            output_path = os.path.join(args.save_dir, f'{Path(args.source).stem}_detected.mp4')

        process_video(detector, args.source, output_path, args.threshold, args.preview)
    else:
        print(f"Error: Unknown source type for '{args.source}'")
        print('Supported formats: videos (.mp4, .avi, ...) or camera ID (0, 1, ...)')


if __name__ == '__main__':
    main()
235
tools/xseg.py
Normal file
@@ -0,0 +1,235 @@
|
||||
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
|
||||
# Author: Yakhyokhuja Valikhujaev
|
||||
# GitHub: https://github.com/yakhyo
|
||||
|
||||
"""XSeg face segmentation on detected faces.
|
||||
|
||||
Usage:
|
||||
python tools/xseg.py --source path/to/image.jpg
|
||||
python tools/xseg.py --source path/to/video.mp4
|
||||
python tools/xseg.py --source 0 # webcam
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import os
|
||||
from pathlib import Path
|
||||
|
||||
from _common import get_source_type
|
||||
import cv2
|
||||
import numpy as np
|
||||
|
||||
from uniface.detection import RetinaFace
|
||||
from uniface.parsing import XSeg
|
||||
|
||||
|
||||
def apply_mask_visualization(image: np.ndarray, mask: np.ndarray, alpha: float = 0.5) -> np.ndarray:
|
||||
"""Apply colored mask overlay for visualization."""
|
||||
overlay = image.copy().astype(np.float32)
|
||||
mask_3ch = np.stack([mask * 0.3, mask * 0.7, mask * 0.3], axis=-1)
|
||||
overlay = overlay * (1 - mask[..., None] * alpha) + mask_3ch * 255 * alpha
|
||||
|
||||
return overlay.clip(0, 255).astype(np.uint8)
|
||||
|
||||
|
||||
def process_image(
|
||||
detector: RetinaFace,
|
||||
parser: XSeg,
|
||||
image_path: str,
|
||||
save_dir: str = 'outputs',
|
||||
) -> None:
|
||||
"""Process a single image."""
|
||||
image = cv2.imread(image_path)
|
||||
if image is None:
|
||||
print(f"Error: Failed to load image from '{image_path}'")
|
||||
return
|
||||
|
||||
faces = detector.detect(image)
|
||||
print(f'Detected {len(faces)} face(s)')
|
||||
|
||||
if len(faces) == 0:
|
||||
print('No faces detected.')
|
||||
return
|
||||
|
||||
# Accumulate masks from all faces
|
||||
full_mask = np.zeros((image.shape[0], image.shape[1]), dtype=np.float32)
|
||||
for i, face in enumerate(faces):
|
||||
if face.landmarks is None:
|
||||
print(f' Face {i + 1}: skipped (no landmarks)')
|
||||
continue
|
||||
|
||||
mask = parser.parse(image, landmarks=face.landmarks)
|
||||
full_mask = np.maximum(full_mask, mask)
|
||||
print(f' Face {i + 1}: done')
|
||||
|
||||
# Apply visualization
|
||||
result_image = apply_mask_visualization(image, full_mask)
|
||||
|
||||
# Draw bounding boxes
|
||||
for face in faces:
|
||||
x1, y1, x2, y2 = map(int, face.bbox[:4])
|
||||
cv2.rectangle(result_image, (x1, y1), (x2, y2), (0, 255, 0), 2)
|
||||
|
||||
# Save results
|
||||
os.makedirs(save_dir, exist_ok=True)
|
||||
output_path = os.path.join(save_dir, f'{Path(image_path).stem}_xseg.jpg')
|
||||
cv2.imwrite(output_path, result_image)
|
||||
print(f'Output saved: {output_path}')
|
||||
|
||||
mask_path = os.path.join(save_dir, f'{Path(image_path).stem}_xseg_mask.png')
|
||||
mask_uint8 = (full_mask * 255).astype(np.uint8)
|
||||
cv2.imwrite(mask_path, mask_uint8)
|
||||
print(f'Mask saved: {mask_path}')
|
||||
|
||||
|
||||
def process_video(
|
||||
detector: RetinaFace,
|
||||
parser: XSeg,
|
||||
video_path: str,
|
||||
save_dir: str = 'outputs',
|
||||
) -> None:
|
||||
"""Process a video file."""
|
||||
cap = cv2.VideoCapture(video_path)
|
||||
if not cap.isOpened():
|
||||
print(f"Error: Cannot open video file '{video_path}'")
|
||||
return
|
||||
|
||||
fps = cap.get(cv2.CAP_PROP_FPS)
|
||||
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||
|
||||
os.makedirs(save_dir, exist_ok=True)
|
||||
output_path = os.path.join(save_dir, f'{Path(video_path).stem}_xseg.mp4')
|
||||
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
|
||||
out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
|
||||
|
||||
print(f'Processing video: {video_path} ({total_frames} frames)')
|
||||
frame_count = 0
|
||||
|
||||
while True:
|
||||
ret, frame = cap.read()
|
||||
if not ret:
|
||||
break
|
||||
|
||||
frame_count += 1
|
||||
faces = detector.detect(frame)
|
||||
|
||||
# Accumulate masks from all faces
|
||||
full_mask = np.zeros((frame.shape[0], frame.shape[1]), dtype=np.float32)
|
||||
for face in faces:
|
||||
if face.landmarks is None:
|
||||
continue
|
||||
mask = parser.parse(frame, landmarks=face.landmarks)
|
||||
full_mask = np.maximum(full_mask, mask)
|
||||
|
||||
# Apply visualization
|
||||
result_frame = apply_mask_visualization(frame, full_mask)
|
||||
|
||||
# Draw bounding boxes
|
||||
for face in faces:
|
||||
x1, y1, x2, y2 = map(int, face.bbox[:4])
|
||||
cv2.rectangle(result_frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
|
||||
|
||||
cv2.putText(result_frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
|
||||
out.write(result_frame)
|
||||
|
||||
if frame_count % 100 == 0:
|
||||
print(f' Processed {frame_count}/{total_frames} frames...')
|
||||
|
||||
cap.release()
|
||||
out.release()
|
||||
print(f'Done! Output saved: {output_path}')
|
||||
|
||||
|
||||
def run_camera(
    detector: RetinaFace,
    parser: XSeg,
    camera_id: int = 0,
) -> None:
    """Run real-time detection on webcam."""
    cap = cv2.VideoCapture(camera_id)
    if not cap.isOpened():
        print(f'Cannot open camera {camera_id}')
        return

    print("Press 'q' to quit")

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame = cv2.flip(frame, 1)
        faces = detector.detect(frame)

        # Accumulate masks from all faces
        full_mask = np.zeros((frame.shape[0], frame.shape[1]), dtype=np.float32)
        for face in faces:
            if face.landmarks is None:
                continue
            mask = parser.parse(frame, landmarks=face.landmarks)
            full_mask = np.maximum(full_mask, mask)

        # Apply visualization
        result_frame = apply_mask_visualization(frame, full_mask)

        # Draw bounding boxes
        for face in faces:
            x1, y1, x2, y2 = map(int, face.bbox[:4])
            cv2.rectangle(result_frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

        cv2.putText(result_frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('XSeg Face Segmentation', result_frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


def main() -> None:
    arg_parser = argparse.ArgumentParser(description='XSeg face segmentation')
    arg_parser.add_argument('--source', type=str, required=True, help='Image/video path or camera ID (0, 1, ...)')
    arg_parser.add_argument('--save-dir', type=str, default='outputs', help='Output directory')
    arg_parser.add_argument(
        '--blur',
        type=float,
        default=0,
        help='Gaussian blur sigma for mask smoothing (default: 0 = raw)',
    )
    arg_parser.add_argument(
        '--align-size',
        type=int,
        default=256,
        help='Face alignment size (default: 256)',
    )
    args = arg_parser.parse_args()

    # Initialize models
    detector = RetinaFace()
    parser = XSeg(blur_sigma=args.blur, align_size=args.align_size)

    source_type = get_source_type(args.source)

    if source_type == 'camera':
        run_camera(detector, parser, int(args.source))
    elif source_type == 'image':
        if not os.path.exists(args.source):
            print(f'Error: Image not found: {args.source}')
            return
        process_image(detector, parser, args.source, args.save_dir)
    elif source_type == 'video':
        if not os.path.exists(args.source):
            print(f'Error: Video not found: {args.source}')
            return
        process_video(detector, parser, args.source, args.save_dir)
    else:
        print(f"Error: Unknown source type for '{args.source}'")
        print('Supported formats: images (.jpg, .png, ...), videos (.mp4, .avi, ...), or camera ID (0, 1, ...)')


if __name__ == '__main__':
    main()
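
For reference, the finished example script above can be driven in three modes (a sketch; `xseg_demo.py` is only a stand-in name for wherever this example file lives in the repo, and the media paths are illustrative):

    python xseg_demo.py --source assets/face.jpg --blur 2.0   # single image
    python xseg_demo.py --source clips/input.mp4              # video file
    python xseg_demo.py --source 0                            # webcam by camera ID
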
uniface/__init__.py
@@ -16,9 +16,11 @@
This library provides unified APIs for:
- Face detection (RetinaFace, SCRFD, YOLOv5Face, YOLOv8Face)
- Face recognition (AdaFace, ArcFace, MobileFace, SphereFace)
- Face tracking (ByteTrack with Kalman filtering)
- Facial landmarks (106-point detection)
- Face parsing (semantic segmentation)
- Gaze estimation
- Head pose estimation
- Age, gender, and emotion prediction
- Face anti-spoofing
- Privacy/anonymization
@@ -28,38 +30,37 @@ from __future__ import annotations

__license__ = 'MIT'
__author__ = 'Yakhyokhuja Valikhujaev'
__version__ = '2.2.1'
__version__ = '3.2.0'

import contextlib

from uniface.face_utils import compute_similarity, face_alignment
from uniface.log import Logger, enable_logging
from uniface.model_store import verify_model_weights
from uniface.visualization import draw_detections, vis_parsing_maps
from uniface.model_store import download_models, get_cache_dir, set_cache_dir, verify_model_weights

from .analyzer import FaceAnalyzer
from .attribute import AgeGender, FairFace
from .attribute import AgeGender, Emotion, FairFace, create_attribute_predictor
from .detection import (
    SCRFD,
    RetinaFace,
    YOLOv5Face,
    YOLOv8Face,
    create_detector,
    detect_faces,
    list_available_detectors,
)
from .gaze import MobileGaze, create_gaze_estimator
from .headpose import HeadPose, create_head_pose_estimator
from .landmark import Landmark106, create_landmarker
from .parsing import BiSeNet, create_face_parser
from .privacy import BlurFace, anonymize_faces
from .parsing import BiSeNet, XSeg, create_face_parser
from .privacy import BlurFace
from .recognition import AdaFace, ArcFace, MobileFace, SphereFace, create_recognizer
from .spoofing import MiniFASNet, create_spoofer
from .types import AttributeResult, EmotionResult, Face, GazeResult, SpoofingResult
from .tracking import BYTETracker
from .types import AttributeResult, EmotionResult, Face, GazeResult, HeadPoseResult, SpoofingResult

# Optional: Emotion requires PyTorch
Emotion: type | None
try:
    from .attribute import Emotion
except ImportError:
    Emotion = None
# Optional: FAISS vector store (requires `pip install faiss-cpu`)
with contextlib.suppress(ImportError):
    from .indexing import FAISS

__all__ = [
    # Metadata
@@ -73,10 +74,10 @@ __all__ = [
    'create_detector',
    'create_face_parser',
    'create_gaze_estimator',
    'create_head_pose_estimator',
    'create_landmarker',
    'create_recognizer',
    'create_spoofer',
    'detect_faces',
    'list_available_detectors',
    # Detection models
    'RetinaFace',
@@ -93,26 +94,35 @@ __all__ = [
    # Gaze models
    'GazeResult',
    'MobileGaze',
    # Head pose models
    'HeadPose',
    'HeadPoseResult',
    # Parsing models
    'BiSeNet',
    'XSeg',
    # Attribute models
    'AgeGender',
    'AttributeResult',
    'create_attribute_predictor',
    'Emotion',
    'EmotionResult',
    'FairFace',
    # Spoofing models
    'MiniFASNet',
    'SpoofingResult',
    # Tracking
    'BYTETracker',
    # Privacy
    'BlurFace',
    'anonymize_faces',
    # Indexing (optional)
    'FAISS',
    # Utilities
    'Logger',
    'compute_similarity',
    'draw_detections',
    'download_models',
    'enable_logging',
    'face_alignment',
    'get_cache_dir',
    'set_cache_dir',
    'verify_model_weights',
    'vis_parsing_maps',
]

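The net effect of the `__init__.py` hunks above: version 3.2.0 promotes XSeg, BYTETracker, HeadPose, and the cache helpers (download_models, get_cache_dir, set_cache_dir) to package-root exports. A minimal sketch of the new surface (default constructors assumed, mirroring the example script; the image path is illustrative):

import cv2

from uniface import RetinaFace, XSeg, get_cache_dir

print(get_cache_dir())  # where downloaded model weights are kept

detector = RetinaFace()
parser = XSeg()

image = cv2.imread('face.jpg')  # BGR
for face in detector.detect(image):
    if face.landmarks is not None:
        mask = parser.parse(image, landmarks=face.landmarks)  # float mask, as in the script above
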
uniface/analyzer.py
@@ -6,8 +6,7 @@ from __future__ import annotations

import numpy as np

from uniface.attribute.age_gender import AgeGender
from uniface.attribute.fairface import FairFace
from uniface.attribute.base import Attribute
from uniface.detection.base import BaseDetector
from uniface.log import Logger
from uniface.recognition.base import BaseRecognizer
@@ -21,19 +20,24 @@ class FaceAnalyzer:

    This class provides a high-level interface for face analysis by combining
    multiple components: face detection, recognition (embedding extraction),
    and attribute prediction (age, gender, race).
    and an extensible list of attribute predictors (age, gender, race,
    emotion, etc.).

    Any :class:`~uniface.attribute.base.Attribute` subclass can be passed
    via the ``attributes`` list. Each predictor's ``predict(image, face)``
    is called once per detected face, enriching the :class:`Face` in-place.

    Args:
        detector: Face detector instance for detecting faces in images.
        recognizer: Optional face recognizer for extracting embeddings.
        age_gender: Optional age/gender predictor.
        fairface: Optional FairFace predictor for demographics.
        attributes: Optional list of ``Attribute`` predictors to run on
            each detected face (e.g. ``[AgeGender(), FairFace(), Emotion()]``).

    Example:
        >>> from uniface import RetinaFace, ArcFace, FaceAnalyzer
        >>> from uniface import RetinaFace, ArcFace, AgeGender, FaceAnalyzer
        >>> detector = RetinaFace()
        >>> recognizer = ArcFace()
        >>> analyzer = FaceAnalyzer(detector, recognizer=recognizer)
        >>> analyzer = FaceAnalyzer(detector, recognizer=recognizer, attributes=[AgeGender()])
        >>> faces = analyzer.analyze(image)
    """

@@ -41,27 +45,23 @@ class FaceAnalyzer:
        self,
        detector: BaseDetector,
        recognizer: BaseRecognizer | None = None,
        age_gender: AgeGender | None = None,
        fairface: FairFace | None = None,
        attributes: list[Attribute] | None = None,
    ) -> None:
        self.detector = detector
        self.recognizer = recognizer
        self.age_gender = age_gender
        self.fairface = fairface
        self.attributes: list[Attribute] = attributes or []

        Logger.info(f'Initialized FaceAnalyzer with detector={detector.__class__.__name__}')
        if recognizer:
            Logger.info(f' - Recognition enabled: {recognizer.__class__.__name__}')
        if age_gender:
            Logger.info(f' - Age/Gender enabled: {age_gender.__class__.__name__}')
        if fairface:
            Logger.info(f' - FairFace enabled: {fairface.__class__.__name__}')
        for attr in self.attributes:
            Logger.info(f' - Attribute enabled: {attr.__class__.__name__}')

    def analyze(self, image: np.ndarray) -> list[Face]:
        """Analyze faces in an image.

        Performs face detection and optionally extracts embeddings and
        predicts attributes for each detected face.
        Performs face detection, optionally extracts embeddings, and runs
        every registered attribute predictor on each detected face.

        Args:
            image: Input image as numpy array with shape (H, W, C) in BGR format.
@@ -80,24 +80,13 @@ class FaceAnalyzer:
            except Exception as e:
                Logger.warning(f' Face {idx + 1}: Failed to extract embedding: {e}')

            if self.age_gender is not None:
            for attr in self.attributes:
                attr_name = attr.__class__.__name__
                try:
                    result = self.age_gender.predict(image, face.bbox)
                    face.gender = result.gender
                    face.age = result.age
                    Logger.debug(f' Face {idx + 1}: Age={face.age}, Gender={face.sex}')
                    attr.predict(image, face)
                    Logger.debug(f' Face {idx + 1}: {attr_name} prediction succeeded')
                except Exception as e:
                    Logger.warning(f' Face {idx + 1}: Failed to predict age/gender: {e}')

            if self.fairface is not None:
                try:
                    result = self.fairface.predict(image, face.bbox)
                    face.gender = result.gender
                    face.age_group = result.age_group
                    face.race = result.race
                    Logger.debug(f' Face {idx + 1}: AgeGroup={face.age_group}, Gender={face.sex}, Race={face.race}')
                except Exception as e:
                    Logger.warning(f' Face {idx + 1}: Failed to predict FairFace attributes: {e}')
                    Logger.warning(f' Face {idx + 1}: {attr_name} prediction failed: {e}')

        Logger.info(f'Analysis complete: {len(faces)} face(s) processed')
        return faces
@@ -106,8 +95,6 @@ class FaceAnalyzer:
        parts = [f'FaceAnalyzer(detector={self.detector.__class__.__name__}']
        if self.recognizer:
            parts.append(f'recognizer={self.recognizer.__class__.__name__}')
        if self.age_gender:
            parts.append(f'age_gender={self.age_gender.__class__.__name__}')
        if self.fairface:
            parts.append(f'fairface={self.fairface.__class__.__name__}')
        for attr in self.attributes:
            parts.append(f'{attr.__class__.__name__}')
        return ', '.join(parts) + ')'

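To make the new contract concrete, a short sketch of the attributes-list API (extends the updated docstring example; the image path is illustrative):

import cv2

from uniface import AgeGender, ArcFace, FaceAnalyzer, FairFace, RetinaFace

analyzer = FaceAnalyzer(
    RetinaFace(),
    recognizer=ArcFace(),
    attributes=[AgeGender(), FairFace()],  # any Attribute subclass can be added here
)

image = cv2.imread('group.jpg')  # BGR, as analyze() expects
for face in analyzer.analyze(image):
    # these fields are populated in-place by the corresponding predictors
    print(face.age, face.gender, face.age_group, face.race)
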
uniface/attribute/__init__.py
@@ -14,16 +14,25 @@ from uniface.attribute.fairface import FairFace
from uniface.constants import AgeGenderWeights, DDAMFNWeights, FairFaceWeights
from uniface.types import AttributeResult, EmotionResult, Face

# Emotion requires PyTorch - make it optional
try:
    from uniface.attribute.emotion import Emotion

    _EMOTION_AVAILABLE = True
except ImportError:
    Emotion = None
    _EMOTION_AVAILABLE = False

# Public API for the attribute module
    class Emotion(Attribute):  # type: ignore[no-redef]
        """Stub for Emotion when PyTorch is not installed."""

        def __init__(self, *args: Any, **kwargs: Any) -> None:
            raise ImportError("Emotion requires optional dependency 'torch'. Install with: pip install torch")

        def _initialize_model(self) -> None: ...
        def preprocess(self, image: np.ndarray, *args: Any) -> Any: ...
        def postprocess(self, prediction: Any) -> Any: ...
        def predict(self, image: np.ndarray, face: Face) -> Any: ...


__all__ = [
    'AgeGender',
    'AttributeResult',
@@ -31,16 +40,13 @@ __all__ = [
    'EmotionResult',
    'FairFace',
    'create_attribute_predictor',
    'predict_attributes',
]

# A mapping from model enums to their corresponding attribute classes
_ATTRIBUTE_MODELS = {
    **dict.fromkeys(AgeGenderWeights, AgeGender),
    **dict.fromkeys(FairFaceWeights, FairFace),
}

# Add Emotion models only if PyTorch is available
if _EMOTION_AVAILABLE:
    _ATTRIBUTE_MODELS.update(dict.fromkeys(DDAMFNWeights, Emotion))

@@ -48,21 +54,16 @@ if _EMOTION_AVAILABLE:
def create_attribute_predictor(
    model_name: AgeGenderWeights | DDAMFNWeights | FairFaceWeights, **kwargs: Any
) -> Attribute:
    """
    Factory function to create an attribute predictor instance.

    This high-level API simplifies the creation of attribute models by
    dynamically selecting the correct class based on the provided model enum.
    """Factory function to create an attribute predictor instance.

    Args:
        model_name: The enum corresponding to the desired attribute model
            (e.g., AgeGenderWeights.DEFAULT, DDAMFNWeights.AFFECNET7,
            or FairFaceWeights.DEFAULT).
        **kwargs: Additional keyword arguments to pass to the model's constructor.
            (e.g., AgeGenderWeights.DEFAULT, DDAMFNWeights.AFFECNET7,
            or FairFaceWeights.DEFAULT).
        **kwargs: Additional keyword arguments passed to the model constructor.

    Returns:
        An initialized instance of an Attribute predictor class
        (e.g., AgeGender, FairFace, or Emotion).
        An initialized Attribute predictor (AgeGender, FairFace, or Emotion).

    Raises:
        ValueError: If the provided model_name is not a supported enum.
@@ -75,40 +76,4 @@ def create_attribute_predictor(
            f'Please choose from AgeGenderWeights, FairFaceWeights, or DDAMFNWeights.'
        )

    # Pass model_name to the constructor, as some classes might need it
    return model_class(model_name=model_name, **kwargs)


def predict_attributes(image: np.ndarray, faces: list[Face], predictor: Attribute) -> list[Face]:
    """
    High-level API to predict attributes for multiple detected faces.

    This function iterates through a list of Face objects, runs the
    specified attribute predictor on each one, and updates the Face
    objects with the predicted attributes.

    Args:
        image (np.ndarray): The full input image in BGR format.
        faces (List[Face]): A list of Face objects from face detection.
        predictor (Attribute): An initialized attribute predictor instance,
            created by `create_attribute_predictor`.

    Returns:
        List[Face]: The list of Face objects with updated attribute fields.
    """
    for face in faces:
        if isinstance(predictor, AgeGender):
            result = predictor(image, face.bbox)
            face.gender = result.gender
            face.age = result.age
        elif isinstance(predictor, FairFace):
            result = predictor(image, face.bbox)
            face.gender = result.gender
            face.age_group = result.age_group
            face.race = result.race
        elif isinstance(predictor, Emotion):
            result = predictor(image, face.landmarks)
            face.emotion = result.emotion
            face.emotion_confidence = result.confidence

    return faces

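A sketch of the factory that now stands alone after `predict_attributes` was removed (the enum value is taken from the docstring above; the image path and detector choice are illustrative):

import cv2

from uniface import RetinaFace, create_attribute_predictor
from uniface.constants import AgeGenderWeights

predictor = create_attribute_predictor(AgeGenderWeights.DEFAULT)

image = cv2.imread('face.jpg')
face = RetinaFace().detect(image)[0]  # assumes at least one detection
result = predictor(image, face)       # __call__ forwards to predict(image, face)
print(result.age, result.gender)      # face.age / face.gender are also set in-place
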
uniface/attribute/age_gender.py
@@ -12,7 +12,7 @@ from uniface.face_utils import bbox_center_alignment
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session
from uniface.types import AttributeResult
from uniface.types import AttributeResult, Face

__all__ = ['AgeGender']

@@ -133,17 +133,20 @@ class AgeGender(Attribute):
        age = int(np.round(prediction[2] * 100))
        return AttributeResult(gender=gender, age=age)

    def predict(self, image: np.ndarray, bbox: list | np.ndarray) -> AttributeResult:
        """
        Predicts age and gender for a single face specified by a bounding box.
    def predict(self, image: np.ndarray, face: Face) -> AttributeResult:
        """Predict age and gender and enrich the Face in-place.

        Args:
            image (np.ndarray): The full input image in BGR format.
            bbox (Union[List, np.ndarray]): The face bounding box coordinates [x1, y1, x2, y2].
            image: The full input image in BGR format.
            face: Detected face; ``face.bbox`` is used for alignment.

        Returns:
            AttributeResult: Result containing gender (0=Female, 1=Male) and age (in years).
            ``AttributeResult`` with gender (0=Female, 1=Male) and age (years).
        """
        face_blob = self.preprocess(image, bbox)
        face_blob = self.preprocess(image, face.bbox)
        prediction = self.session.run(self.output_names, {self.input_name: face_blob})[0][0]
        return self.postprocess(prediction)
        result = self.postprocess(prediction)

        face.gender = result.gender
        face.age = result.age
        return result

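The call-site migration this hunk implies, shown side by side (a sketch; `age_gender` is an AgeGender instance and `face` a detection result):

# 2.x: the caller passed a raw bbox and copied fields over manually
result = age_gender.predict(image, face.bbox)
face.gender, face.age = result.gender, result.age

# 3.2: the caller passes the Face itself; predict() enriches it and still returns the result
result = age_gender.predict(image, face)
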
uniface/attribute/base.py
@@ -2,95 +2,78 @@
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any

import numpy as np

from uniface.types import AttributeResult, EmotionResult
from uniface.types import AttributeResult, EmotionResult, Face

__all__ = ['Attribute', 'AttributeResult', 'EmotionResult']


class Attribute(ABC):
    """
    Abstract base class for face attribute models.
    """Abstract base class for face attribute models.

    This class defines the common interface that all attribute models
    (e.g., age-gender, emotion) must implement. It ensures a consistent API
    across different attribute prediction modules in the library, making them
    interchangeable and easy to use.
    All attribute models (age-gender, emotion, FairFace, etc.) implement this
    interface so they can be used interchangeably inside ``FaceAnalyzer``.

    The ``predict`` method accepts an image and a :class:`Face` object. Each
    subclass extracts what it needs (bbox, landmarks) from the Face, runs
    inference, writes the results back to the Face **and** returns a typed
    result dataclass.
    """

    @abstractmethod
    def _initialize_model(self) -> None:
        """
        Initializes the underlying model for inference.

        This method should handle loading model weights, creating the
        inference session (e.g., ONNX Runtime, PyTorch), and any necessary
        warm-up procedures to prepare the model for prediction.
        """
        """Load model weights and create the inference session."""
        raise NotImplementedError('Subclasses must implement the _initialize_model method.')

    @abstractmethod
    def preprocess(self, image: np.ndarray, *args: Any) -> Any:
        """
        Preprocesses the input data for the model.

        This method should take a raw image and any other necessary data
        (like bounding boxes or landmarks) and convert it into the format
        expected by the model's inference engine (e.g., a blob or tensor).
        """Preprocess the input data for the model.

        Args:
            image (np.ndarray): The input image containing the face, typically
                in BGR format.
            *args: Additional arguments required for preprocessing, such as
                bounding boxes or facial landmarks.
            image: The input image in BGR format.
            *args: Subclass-specific data (bbox, landmarks, etc.).

        Returns:
            The preprocessed data ready for model inference.
            Preprocessed data ready for model inference.
        """
        raise NotImplementedError('Subclasses must implement the preprocess method.')

    @abstractmethod
    def postprocess(self, prediction: Any) -> Any:
        """
        Postprocesses the raw model output into a human-readable format.

        This method takes the raw output from the model's inference and
        converts it into a meaningful result, such as an age value, a gender
        label, or an emotion category.
        """Convert raw model output into a typed result dataclass.

        Args:
            prediction (Any): The raw output from the model's inference.
            prediction: Raw output from the model.

        Returns:
            The final, processed attributes.
            An ``AttributeResult`` or ``EmotionResult``.
        """
        raise NotImplementedError('Subclasses must implement the postprocess method.')

    @abstractmethod
    def predict(self, image: np.ndarray, *args: Any) -> Any:
        """
        Performs end-to-end attribute prediction on a given image.
    def predict(self, image: np.ndarray, face: Face) -> AttributeResult | EmotionResult:
        """Run end-to-end prediction and enrich the Face in-place.

        This method orchestrates the full pipeline: it calls the preprocess,
        inference, and postprocess steps to return the final, user-friendly
        attribute prediction.
        Each subclass extracts what it needs from *face* (e.g. ``face.bbox``
        or ``face.landmarks``), runs the full preprocess-infer-postprocess
        pipeline, writes relevant fields back to *face*, and returns the
        result dataclass.

        Args:
            image (np.ndarray): The input image containing the face.
            *args: Additional data required for prediction, such as a bounding
                box or landmarks.
            image: The full input image in BGR format.
            face: Detected face whose attribute fields will be populated.

        Returns:
            The final predicted attributes.
            The prediction result (``AttributeResult`` or ``EmotionResult``).
        """
        raise NotImplementedError('Subclasses must implement the predict method.')

    def __call__(self, *args, **kwargs) -> Any:
        """
        Provides a convenient, callable shortcut for the `predict` method.
        """
        return self.predict(*args, **kwargs)
    def __call__(self, image: np.ndarray, face: Face) -> AttributeResult | EmotionResult:
        """Callable shortcut for :meth:`predict`."""
        return self.predict(image, face)

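With `predict(image, face)` now fixed as the contract, a third-party predictor only has to fill in the four methods. A toy sketch (the class, its constant output, and the assumption that the base class needs no constructor arguments are all illustrative, not part of uniface):

from __future__ import annotations

from typing import Any

import numpy as np

from uniface.attribute.base import Attribute
from uniface.types import AttributeResult, Face


class ConstantAge(Attribute):
    """Toy predictor that tags every face as a 30-year-old male."""

    def _initialize_model(self) -> None:
        pass  # nothing to load for this stub

    def preprocess(self, image: np.ndarray, *args: Any) -> np.ndarray:
        return image  # a real model would crop/align via face.bbox or face.landmarks

    def postprocess(self, prediction: Any) -> AttributeResult:
        gender, age = prediction
        return AttributeResult(gender=gender, age=age)

    def predict(self, image: np.ndarray, face: Face) -> AttributeResult:
        result = self.postprocess((1, 30))  # fixed 'prediction'
        face.gender, face.age = result.gender, result.age  # enrich in-place
        return result

Dropped into FaceAnalyzer(detector, attributes=[ConstantAge()]), it would run once per detected face like any built-in predictor.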