Author: Yakhyokhuja Valikhujaev
Commit: 6ce397b811
Date: 2026-04-11 23:30:32 +09:00

feat: Add MODNet portrait matting (#108)

* feat: Add MODNet portrait matting
* docs: Update docs and example of portrait matting
* fix: Fix linting issue

17 changed files with 1010 additions and 2 deletions


@@ -30,6 +30,7 @@
- **Face Tracking** — Multi-object tracking with [BYTETracker](https://github.com/yakhyo/bytetrack-tracker) for persistent IDs across video frames
- **Facial Landmarks** — 106-point landmark localization module (separate from 5-point detector landmarks)
- **Face Parsing** — BiSeNet semantic segmentation (19 classes), XSeg face masking
- **Portrait Matting** — Trimap-free alpha matte with MODNet (background removal, green screen, compositing)
- **Gaze Estimation** — Real-time gaze direction with MobileGaze
- **Head Pose Estimation** — 3D head orientation (pitch, yaw, roll) with 6D rotation representation
- **Attribute Analysis** — Age, gender, race (FairFace), and emotion
@@ -63,6 +64,9 @@
<tr>
<td align="center" colspan="2"><b>Face Segmentation</b><br><img src="https://raw.githubusercontent.com/yakhyo/uniface/main/assets/demos/segmentation.jpg" width="80%"></td>
</tr>
<tr>
<td align="center" colspan="2"><b>Portrait Matting</b><br><img src="https://raw.githubusercontent.com/yakhyo/uniface/main/assets/demos/matting.jpg" width="100%"></td>
</tr>
<tr>
<td align="center" colspan="2"><b>Face Anonymization</b><br><img src="https://raw.githubusercontent.com/yakhyo/uniface/main/assets/demos/anonymization.jpg" width="100%"></td>
</tr>
@@ -186,6 +190,32 @@ for face in faces:
---
## Example (Portrait Matting)
```python
import cv2
import numpy as np
from uniface.matting import MODNet
matting = MODNet()
image = cv2.imread("portrait.jpg")
matte = matting.predict(image) # (H, W) float32 in [0, 1]
# Transparent PNG
rgba = cv2.cvtColor(image, cv2.COLOR_BGR2BGRA)
rgba[:, :, 3] = (matte * 255).astype(np.uint8)
cv2.imwrite("transparent.png", rgba)
# Green screen
matte_3ch = matte[:, :, np.newaxis]
bg = np.full_like(image, (0, 177, 64), dtype=np.uint8)
result = (image * matte_3ch + bg * (1 - matte_3ch)).astype(np.uint8)
cv2.imwrite("green_screen.jpg", result)
```
---
## Jupyter Notebooks
| Example | Colab | Description |
@@ -202,6 +232,7 @@ for face in faces:
| [10_face_vector_store.ipynb](examples/10_face_vector_store.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/10_face_vector_store.ipynb) | FAISS-backed face database |
| [11_head_pose_estimation.ipynb](examples/11_head_pose_estimation.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/11_head_pose_estimation.ipynb) | Head pose estimation (pitch, yaw, roll) |
| [12_face_recognition.ipynb](examples/12_face_recognition.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/12_face_recognition.ipynb) | Standalone face recognition pipeline |
| [13_portrait_matting.ipynb](examples/13_portrait_matting.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/13_portrait_matting.ipynb) | Portrait matting with MODNet |
---
@@ -279,6 +310,7 @@ If you plan commercial use, verify model license compatibility.
| Parsing | [face-segmentation](https://github.com/yakhyo/face-segmentation) | - | XSeg Face Segmentation |
| Gaze | [gaze-estimation](https://github.com/yakhyo/gaze-estimation) | ✓ | MobileGaze Training |
| Head Pose | [head-pose-estimation](https://github.com/yakhyo/head-pose-estimation) | ✓ | Head Pose Training (6DRepNet-style) |
| Matting | [modnet](https://github.com/yakhyo/modnet) | - | MODNet Portrait Matting |
| Anti-Spoofing | [face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) | - | MiniFASNet Inference |
| Attributes | [fairface-onnx](https://github.com/yakhyo/fairface-onnx) | - | FairFace ONNX Inference |
@@ -302,3 +334,6 @@ Questions or feedback:
## License
This project is licensed under the [MIT License](LICENSE).
> **Disclaimer:** This project is not affiliated with or related to
> [Uniface](https://uniface.com/) by Rocket Software.

assets/demos/matting.jpg (new binary file, 938 KiB)

@@ -26,6 +26,7 @@ graph TB
HPOSE[Head Pose]
PARSE[Parsing]
SPOOF[Anti-Spoofing]
MATT[Matting]
PRIV[Privacy]
end
@@ -42,6 +43,7 @@ graph TB
end
IMG --> DET
IMG --> MATT
DET --> REC
DET --> LMK
DET --> ATTR
@@ -120,6 +122,7 @@ uniface/
├── landmark/ # 106-point landmarks
├── attribute/ # Age, gender, emotion, race
├── parsing/ # Face semantic segmentation
├── matting/ # Portrait matting (MODNet)
├── gaze/ # Gaze estimation
├── headpose/ # Head pose estimation
├── spoofing/ # Anti-spoofing


@@ -20,5 +20,6 @@ UniFace is released under the [MIT License](https://opensource.org/licenses/MIT)
| SphereFace | [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) | MIT |
| BiSeNet | [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) | MIT |
| MobileGaze | [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) | MIT |
| MODNet | [yakhyo/modnet](https://github.com/yakhyo/modnet) | Apache-2.0 |
| MiniFASNet | [yakhyo/face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) | Apache-2.0 |
| FairFace | [yakhyo/fairface-onnx](https://github.com/yakhyo/fairface-onnx) | CC BY 4.0 |


@@ -371,6 +371,36 @@ XSeg from DeepFaceLab outputs masks for face regions. Requires 5-point landmarks
---
## Portrait Matting Models
### MODNet
MODNet (Real-Time Trimap-Free Portrait Matting) produces soft alpha mattes from full images without requiring a trimap. It uses a MobileNetV2 backbone with low-resolution, high-resolution, and fusion branches.
| Model Name | Variant | Size | Use Case |
| ---------- | ------- | ---- | -------- |
| `PHOTOGRAPHIC` :material-check-circle: | High-quality | 25 MB | Portrait photos |
| `WEBCAM` | Real-time | 25 MB | Webcam feeds |
!!! info "Model Details"
**Paper**: [MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition](https://arxiv.org/abs/2011.11961) (AAAI 2022)
**Source**: [yakhyo/modnet](https://github.com/yakhyo/modnet) — ported weights and clean inference codebase
**Output**: Alpha matte `(H, W)` in `[0, 1]`
**Applications:**
- Background removal / replacement
- Green screen compositing
- Video conferencing virtual backgrounds
- Portrait editing
!!! note "Input Requirements"
Operates on full images (not face crops). No trimap or face detection required.
---
## Anti-Spoofing Models
### MiniFASNet Family
@@ -420,6 +450,7 @@ See [Model Cache & Offline Use](concepts/model-cache-offline.md) for full detail
- **Head Pose Estimation**: [yakhyo/head-pose-estimation](https://github.com/yakhyo/head-pose-estimation) - 6D rotation head pose estimation training and ONNX models
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet training code and pretrained weights
- **Face Segmentation**: [yakhyo/face-segmentation](https://github.com/yakhyo/face-segmentation) - XSeg ONNX Inference
- **Portrait Matting**: [yakhyo/modnet](https://github.com/yakhyo/modnet) - MODNet ported weights and inference (from [ZHKKKe/MODNet](https://github.com/ZHKKKe/MODNet))
- **Face Anti-Spoofing**: [yakhyo/face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) - MiniFASNet ONNX inference (weights from [minivision-ai/Silent-Face-Anti-Spoofing](https://github.com/minivision-ai/Silent-Face-Anti-Spoofing))
- **FairFace**: [yakhyo/fairface-onnx](https://github.com/yakhyo/fairface-onnx) - FairFace ONNX inference for race, gender, age prediction
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights
@@ -432,4 +463,5 @@ See [Model Cache & Offline Use](concepts/model-cache-offline.md) for full detail
- **AdaFace**: [AdaFace: Quality Adaptive Margin for Face Recognition](https://arxiv.org/abs/2204.00964)
- **ArcFace**: [Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)
- **SphereFace**: [Deep Hypersphere Embedding for Face Recognition](https://arxiv.org/abs/1704.08063)
- **MODNet**: [Real-Time Trimap-Free Portrait Matting via Objective Decomposition](https://arxiv.org/abs/2011.11961)
- **BiSeNet**: [Bilateral Segmentation Network for Real-time Semantic Segmentation](https://arxiv.org/abs/1808.00897)

docs/modules/matting.md (new file, 157 lines)

@@ -0,0 +1,157 @@
# Portrait Matting
Portrait matting produces a soft alpha matte separating the foreground (person) from the background — no trimap needed.
<figure markdown="span">
![Portrait Matting](https://raw.githubusercontent.com/yakhyo/uniface/main/assets/demos/matting.jpg){ width="100%" }
<figcaption>MODNet: Input → Matte → Green Screen</figcaption>
</figure>
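Each compositing recipe on this page applies the standard alpha-compositing equation per pixel: the matte value $\alpha \in [0, 1]$ weights the foreground $F$ against a new background $B$, so $\alpha = 1$ keeps the original pixel and $\alpha = 0$ replaces it entirely.

$$
C(x, y) = \alpha(x, y)\,F(x, y) + \bigl(1 - \alpha(x, y)\bigr)\,B(x, y)
$$

In the code, `matte[:, :, np.newaxis]` plays the role of $\alpha$, broadcast across the three color channels. The examples use the input image itself as $F$ rather than a separately estimated foreground, so some of the original background color can remain in semi-transparent edge pixels such as hair.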
---
## Available Models
| Model | Variant | Size | Use Case |
|-------|---------|------|----------|
| **MODNet Photographic** :material-check-circle: | PHOTOGRAPHIC | 25 MB | High-quality portrait photos |
| MODNet Webcam | WEBCAM | 25 MB | Real-time webcam feeds |
---
## Basic Usage
```python
import cv2
from uniface.matting import MODNet
matting = MODNet()
image = cv2.imread("photo.jpg")
matte = matting.predict(image)
print(f"Matte shape: {matte.shape}") # (H, W)
print(f"Matte dtype: {matte.dtype}") # float32
print(f"Matte range: [{matte.min():.2f}, {matte.max():.2f}]") # [0, 1]
```
---
## Model Variants
```python
from uniface.matting import MODNet
from uniface.constants import MODNetWeights
# Photographic (default) — best for photos
matting = MODNet()
# Webcam — optimized for real-time
matting = MODNet(model_name=MODNetWeights.WEBCAM)
# Custom input size
matting = MODNet(input_size=256)
```
| Parameter | Default | Description |
|-----------|---------|-------------|
| `model_name` | `PHOTOGRAPHIC` | Model variant to load |
| `input_size` | `512` | Target shorter-side size for preprocessing |
| `providers` | `None` | ONNX Runtime execution providers |
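The `input_size` description and the multiple-of-32 tensor shapes asserted in `tests/test_matting.py` suggest a resize rule along these lines: scale the shorter side to `input_size`, keep the aspect ratio, then round each side down to a multiple of 32. The sketch below is an assumption for illustration, not UniFace's actual `preprocess` code; `target_size` is a hypothetical helper.

```python
STRIDE = 32  # assumed: spatial sides must be multiples of the network stride


def target_size(height: int, width: int, input_size: int = 512) -> tuple[int, int]:
    """Hypothetical sketch of the (height, width) fed to MODNet.

    Scales the shorter side to ``input_size`` while keeping the aspect ratio,
    then rounds both sides down to a multiple of STRIDE.
    """
    short_side = min(height, width)
    # Integer arithmetic avoids float surprises such as 511.999... -> 511
    new_h = height * input_size // short_side
    new_w = width * input_size // short_side
    # Round down to the stride, but never below one stride
    new_h = max(STRIDE, new_h - new_h % STRIDE)
    new_w = max(STRIDE, new_w - new_w % STRIDE)
    return new_h, new_w


print(target_size(480, 640))    # (512, 672)
print(target_size(1080, 1920))  # (512, 896)
print(target_size(128, 128))    # (512, 512)
```

The 480x640 case gives the 512x672 shape used for the dummy output in `test_modnet_postprocess`. Other MODNet inference scripts may resize differently (for example, leaving mid-sized images at their native resolution), so check `MODNet.preprocess` before depending on exact sizes.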
---
## Applications
### Transparent Background (RGBA)
```python
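import cv2
import numpy as np
from uniface.matting import MODNet

matting = MODNet()
image = cv2.imread("photo.jpg")
matte = matting.predict(image)

# Use the matte as the alpha channel of a BGRA image
rgba = cv2.cvtColor(image, cv2.COLOR_BGR2BGRA)
rgba[:, :, 3] = (matte * 255).astype(np.uint8)
cv2.imwrite("transparent.png", rgba)  # PNG preserves the alpha channel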
```
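`(matte * 255).astype(np.uint8)` truncates toward zero, so a matte value of 0.999 becomes 254 and a fully opaque pixel can end up slightly transparent. A small sketch that rounds and clips instead; `matte_to_alpha` is a hypothetical helper, not part of UniFace.

```python
import numpy as np


def matte_to_alpha(matte: np.ndarray) -> np.ndarray:
    """Convert a float matte in [0, 1] to an 8-bit alpha channel.

    Clips first so tiny out-of-range values cannot wrap around, then
    rounds to the nearest integer (ties to even) instead of truncating.
    """
    scaled = np.clip(matte.astype(np.float32), 0.0, 1.0) * 255.0
    return np.rint(scaled).astype(np.uint8)


matte = np.array([[0.0, 0.5, 0.999, 1.0]], dtype=np.float32)
print((matte * 255).astype(np.uint8))  # truncation gives 0, 127, 254, 255
print(matte_to_alpha(matte))           # rounding gives 0, 128, 255, 255
```

With a real matte, the drop-in replacement is `rgba[:, :, 3] = matte_to_alpha(matte)`.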
### Green Screen
```python
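import cv2
import numpy as np

# Reuses `image` and `matte` from the example above
matte_3ch = matte[:, :, np.newaxis]  # (H, W, 1) broadcasts over the 3 color channels
bg = np.full_like(image, (0, 177, 64), dtype=np.uint8)  # solid green, BGR order
green = (image * matte_3ch + bg * (1 - matte_3ch)).astype(np.uint8)
cv2.imwrite("green_screen.jpg", green)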
```
### Custom Background
```python
import cv2
import numpy as np
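# Reuses `image` and `matte` from the examples above
background = cv2.imread("beach.jpg")
# cv2.resize expects (width, height); this stretches the background to the image size
background = cv2.resize(background, (image.shape[1], image.shape[0]))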
matte_3ch = matte[:, :, np.newaxis]
result = (image * matte_3ch + background * (1 - matte_3ch)).astype(np.uint8)
cv2.imwrite("custom_bg.jpg", result)
```
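Resizing straight to the image size stretches the background whenever its aspect ratio differs from the photo's. A sketch of a center crop that matches the aspect ratio first, so the subsequent `cv2.resize` only scales; `crop_to_aspect` is a hypothetical helper written for this page.

```python
import numpy as np


def crop_to_aspect(background: np.ndarray, target_w: int, target_h: int) -> np.ndarray:
    """Center-crop ``background`` to the aspect ratio target_w : target_h.

    Ratios are compared by integer cross-multiplication, so no float
    rounding is involved in deciding which side to crop.
    """
    bg_h, bg_w = background.shape[:2]
    if bg_w * target_h > bg_h * target_w:
        # Too wide: keep the full height, crop the width around the center
        new_w = max(1, bg_h * target_w // target_h)
        x0 = (bg_w - new_w) // 2
        return background[:, x0:x0 + new_w]
    # Too tall (or already matching): keep the full width, crop the height
    new_h = max(1, bg_w * target_h // target_w)
    y0 = (bg_h - new_h) // 2
    return background[y0:y0 + new_h, :]


wide = np.zeros((100, 400, 3), dtype=np.uint8)
print(crop_to_aspect(wide, target_w=200, target_h=100).shape)  # (100, 200, 3)
```

Then resize as before: `background = cv2.resize(crop_to_aspect(background, w, h), (w, h))` with `h, w = image.shape[:2]`.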
### Webcam Matting
```python
import cv2
import numpy as np
from uniface.constants import MODNetWeights
from uniface.matting import MODNet

matting = MODNet(model_name=MODNetWeights.WEBCAM)
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    matte = matting.predict(frame)
    matte_3ch = matte[:, :, np.newaxis]
    bg = np.full_like(frame, (0, 177, 64), dtype=np.uint8)
    result = (frame * matte_3ch + bg * (1 - matte_3ch)).astype(np.uint8)

    # Show the input and the composite side by side; press "q" to quit
    cv2.imshow("Matting", np.hstack([frame, result]))
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```
---
## Factory Function
```python
from uniface.matting import create_matting_model
from uniface.constants import MODNetWeights
# Default (Photographic)
matting = create_matting_model()
# With enum
matting = create_matting_model(MODNetWeights.WEBCAM)
# With string
matting = create_matting_model("modnet_webcam")
```
---
## Next Steps
- [Parsing](parsing.md) - Face semantic segmentation
- [Privacy](privacy.md) - Face anonymization
- [Detection](detection.md) - Face detection


@@ -20,6 +20,7 @@ Run UniFace examples directly in your browser with Google Colab, or download and
| [Face Vector Store](https://github.com/yakhyo/uniface/blob/main/examples/10_face_vector_store.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/10_face_vector_store.ipynb) | FAISS-backed face database |
| [Head Pose Estimation](https://github.com/yakhyo/uniface/blob/main/examples/11_head_pose_estimation.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/11_head_pose_estimation.ipynb) | 3D head orientation estimation |
| [Face Recognition](https://github.com/yakhyo/uniface/blob/main/examples/12_face_recognition.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/12_face_recognition.ipynb) | Standalone face recognition pipeline |
| [Portrait Matting](https://github.com/yakhyo/uniface/blob/main/examples/13_portrait_matting.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/13_portrait_matting.ipynb) | Portrait matting with MODNet |
---


@@ -280,6 +280,34 @@ print(f"Detected {len(np.unique(mask))} facial components")
---
## Portrait Matting
Remove backgrounds without a trimap:
```python
import cv2
import numpy as np
from uniface.matting import MODNet
matting = MODNet()
image = cv2.imread("portrait.jpg")
matte = matting.predict(image) # (H, W) float32 in [0, 1]
# Transparent PNG
rgba = cv2.cvtColor(image, cv2.COLOR_BGR2BGRA)
rgba[:, :, 3] = (matte * 255).astype(np.uint8)
cv2.imwrite("transparent.png", rgba)
# Green screen
matte_3ch = matte[:, :, np.newaxis]
bg = np.full_like(image, (0, 177, 64), dtype=np.uint8)
result = (image * matte_3ch + bg * (1 - matte_3ch)).astype(np.uint8)
cv2.imwrite("green_screen.jpg", result)
```
---
## Face Anonymization
Blur faces for privacy protection:


@@ -0,0 +1,265 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Portrait Matting with MODNet\n",
"\n",
"<div style=\"display:flex; flex-wrap:wrap; align-items:center;\">\n",
" <a style=\"margin-right:10px; margin-bottom:6px;\" href=\"https://pepy.tech/projects/uniface\"><img alt=\"PyPI Downloads\" src=\"https://static.pepy.tech/personalized-badge/uniface?period=total&units=international_system&left_color=grey&right_color=blue&left_text=Downloads\"></a>\n",
" <a style=\"margin-right:10px; margin-bottom:6px;\" href=\"https://pypi.org/project/uniface/\"><img alt=\"PyPI Version\" src=\"https://img.shields.io/pypi/v/uniface.svg\"></a>\n",
" <a style=\"margin-right:10px; margin-bottom:6px;\" href=\"https://opensource.org/licenses/MIT\"><img alt=\"License\" src=\"https://img.shields.io/badge/License-MIT-blue.svg\"></a>\n",
" <a style=\"margin-bottom:6px;\" href=\"https://github.com/yakhyo/uniface\"><img alt=\"GitHub Stars\" src=\"https://img.shields.io/github/stars/yakhyo/uniface.svg?style=social\"></a>\n",
"</div>\n",
"\n",
"**UniFace** is a lightweight, production-ready, all-in-one face analysis library built on ONNX Runtime.\n",
"\n",
"🔗 **GitHub**: [github.com/yakhyo/uniface](https://github.com/yakhyo/uniface) | 📚 **Docs**: [yakhyo.github.io/uniface](https://yakhyo.github.io/uniface)\n",
"\n",
"---\n",
"\n",
"This notebook demonstrates portrait matting using **MODNet** — a trimap-free model that produces soft alpha mattes from full images. No face detection or cropping required.\n",
"\n",
"## 1. Install UniFace"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -q uniface\n",
"\n",
"# Clone repo for assets (Colab only)\n",
"import os\n",
"if 'COLAB_GPU' in os.environ or 'COLAB_RELEASE_TAG' in os.environ:\n",
"    if not os.path.exists('uniface'):\n",
"        !git clone --depth 1 https://github.com/yakhyo/uniface.git\n",
"    os.chdir('uniface/examples')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Import Libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cv2\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import uniface\n",
"from uniface.matting import MODNet\n",
"\n",
"print(f\"UniFace version: {uniface.__version__}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Initialize Model\n",
"\n",
"MODNet has two variants:\n",
"- **PHOTOGRAPHIC** (default): optimized for high-quality portrait photos\n",
"- **WEBCAM**: optimized for real-time webcam feeds"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"matting = MODNet()\n",
"\n",
"print(f\"Input size: {matting.input_size}\")\n",
"print(f\"Input name: {matting.input_name}\")\n",
"print(f\"Output names: {matting.output_names}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Helper Functions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def compose(image, matte, background=None):\n",
"    \"\"\"Composite foreground over a background using the alpha matte.\"\"\"\n",
"    h, w = image.shape[:2]\n",
"    matte_3ch = matte[:, :, np.newaxis]\n",
"\n",
"    if background is None:\n",
"        bg = np.full_like(image, (0, 177, 64), dtype=np.uint8)\n",
"    else:\n",
"        bg = cv2.resize(background, (w, h), interpolation=cv2.INTER_AREA)\n",
"\n",
"    return (image * matte_3ch + bg * (1 - matte_3ch)).astype(np.uint8)\n",
"\n",
"\n",
"def show_results(image, matte):\n",
"    \"\"\"Display original, matte, and green screen as a single merged image.\"\"\"\n",
"    matte_vis = cv2.cvtColor((matte * 255).astype(np.uint8), cv2.COLOR_GRAY2BGR)\n",
"    green = compose(image, matte)\n",
"    merged = np.hstack([image, matte_vis, green])\n",
"\n",
"    plt.figure(figsize=(18, 6))\n",
"    plt.imshow(cv2.cvtColor(merged, cv2.COLOR_BGR2RGB))\n",
"    plt.axis(\"off\")\n",
"    plt.tight_layout()\n",
"    plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Basic Matting"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image = cv2.imread(\"../assets/demos/src_portrait1.jpg\")\n",
"print(f\"Image shape: {image.shape}\")\n",
"\n",
"matte = matting.predict(image)\n",
"print(f\"Matte shape: {matte.shape}\")\n",
"print(f\"Matte dtype: {matte.dtype}\")\n",
"print(f\"Matte range: [{matte.min():.3f}, {matte.max():.3f}]\")\n",
"\n",
"show_results(image, matte)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Transparent Background (RGBA)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"alpha = (matte * 255).astype(np.uint8)\n",
"rgba = cv2.cvtColor(image, cv2.COLOR_BGR2BGRA)\n",
"rgba[:, :, 3] = alpha\n",
"\n",
"# Checkerboard background to visualize transparency\n",
"h, w = image.shape[:2]\n",
"checker = np.zeros((h, w, 3), dtype=np.uint8)\n",
"block = 20\n",
"for y in range(0, h, block):\n",
"    for x in range(0, w, block):\n",
"        if (y // block + x // block) % 2 == 0:\n",
"            checker[y:y+block, x:x+block] = 200\n",
"        else:\n",
"            checker[y:y+block, x:x+block] = 255\n",
"\n",
"matte_3ch = matte[:, :, np.newaxis]\n",
"rgba_vis = (image * matte_3ch + checker * (1 - matte_3ch)).astype(np.uint8)\n",
"\n",
"merged = np.hstack([image, rgba_vis])\n",
"\n",
"plt.figure(figsize=(16, 5))\n",
"plt.imshow(cv2.cvtColor(merged, cv2.COLOR_BGR2RGB))\n",
"plt.axis(\"off\")\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"print(f\"RGBA shape: {rgba.shape}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Custom Background"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a gradient background\n",
"h, w = image.shape[:2]\n",
"gradient = np.zeros((h, w, 3), dtype=np.uint8)\n",
"for y in range(h):\n",
"    ratio = y / h\n",
"    gradient[y, :] = [int(180 * (1 - ratio)), int(100 + 80 * ratio), int(220 * ratio)]\n",
"\n",
"custom_bg = compose(image, matte, gradient)\n",
"green_bg = compose(image, matte)\n",
"\n",
"merged = np.hstack([image, green_bg, custom_bg])\n",
"\n",
"plt.figure(figsize=(18, 6))\n",
"plt.imshow(cv2.cvtColor(merged, cv2.COLOR_BGR2RGB))\n",
"plt.axis(\"off\")\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"MODNet provides trimap-free portrait matting:\n",
"\n",
"- **`predict(image)`** — returns `(H, W)` float32 alpha matte in `[0, 1]`\n",
"- **No face detection needed** — works on full images directly\n",
"- **Two variants** — `PHOTOGRAPHIC` for photos, `WEBCAM` for real-time\n",
"- **Compositing** — use the matte for transparent PNGs, green screen, or custom backgrounds\n",
"\n",
"For more details, see the [Matting docs](https://yakhyo.github.io/uniface/modules/matting/)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
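The notebook builds its checkerboard and gradient backgrounds with per-pixel and per-row Python loops. A vectorized NumPy sketch that produces the same arrays; `make_checkerboard` and `make_gradient` are hypothetical helper names, not part of the notebook.

```python
import numpy as np


def make_checkerboard(h: int, w: int, block: int = 20) -> np.ndarray:
    """Gray/white checkerboard with the same values as the nested loop."""
    yy, xx = np.indices((h, w))
    light = (yy // block + xx // block) % 2 == 0  # True where the loop writes 200
    values = np.where(light, 200, 255).astype(np.uint8)
    return np.repeat(values[:, :, np.newaxis], 3, axis=2)


def make_gradient(h: int, w: int) -> np.ndarray:
    """Vertical BGR gradient with the same values as the per-row loop."""
    ratio = np.arange(h) / h  # one ratio per row, as in `ratio = y / h`
    rows = np.stack([180 * (1 - ratio), 100 + 80 * ratio, 220 * ratio], axis=1)
    rows = rows.astype(np.uint8)  # truncates toward zero, like int() in the loop
    # Repeat each row's color across all columns
    return np.repeat(rows[:, np.newaxis, :], w, axis=1)


checker = make_checkerboard(240, 320)
gradient = make_gradient(240, 320)
print(checker.shape, gradient.shape)  # (240, 320, 3) (240, 320, 3)
```

In sections 6 and 7, `checker = make_checkerboard(h, w)` and `gradient = make_gradient(h, w)` can stand in for the loops.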


@@ -150,6 +150,7 @@ nav:
- Landmarks: modules/landmarks.md
- Attributes: modules/attributes.md
- Parsing: modules/parsing.md
- Matting: modules/matting.md
- Gaze: modules/gaze.md
- Head Pose: modules/headpose.md
- Anti-Spoofing: modules/spoofing.md


@@ -1,6 +1,6 @@
[project]
name = "uniface"
-version = "3.4.0"
+version = "3.5.0"
description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Tracking, Landmark Analysis, Face Parsing, Gaze Estimation, Age, and Gender Detection"
readme = "README.md"
license = "MIT"

tests/test_matting.py (new file, 158 lines)

@@ -0,0 +1,158 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from __future__ import annotations

import numpy as np
import pytest

from uniface.constants import MODNetWeights
from uniface.matting import MODNet, create_matting_model


def test_modnet_initialization():
    """Test MODNet initialization with default weights."""
    matting = MODNet()
    assert matting is not None
    assert matting.input_size == 512


def test_modnet_with_webcam_weights():
    """Test MODNet initialization with webcam variant."""
    matting = MODNet(model_name=MODNetWeights.WEBCAM)
    assert matting is not None
    assert matting.input_size == 512


def test_modnet_custom_input_size():
    """Test MODNet with custom input size."""
    matting = MODNet(input_size=256)
    assert matting.input_size == 256


def test_modnet_preprocess():
    """Test preprocessing produces correct tensor shape and dtype."""
    matting = MODNet()
    image = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
    tensor, orig_h, orig_w = matting.preprocess(image)
    assert tensor.dtype == np.float32
    assert tensor.ndim == 4
    assert tensor.shape[0] == 1
    assert tensor.shape[1] == 3
    assert tensor.shape[2] % 32 == 0
    assert tensor.shape[3] % 32 == 0
    assert orig_h == 480
    assert orig_w == 640


def test_modnet_preprocess_small_image():
    """Test preprocessing with image smaller than input_size."""
    matting = MODNet(input_size=512)
    image = np.random.randint(0, 255, (128, 128, 3), dtype=np.uint8)
    tensor, orig_h, orig_w = matting.preprocess(image)
    assert tensor.shape[2] % 32 == 0
    assert tensor.shape[3] % 32 == 0
    assert orig_h == 128
    assert orig_w == 128


def test_modnet_preprocess_large_image():
    """Test preprocessing with image larger than input_size."""
    matting = MODNet(input_size=512)
    image = np.random.randint(0, 255, (1080, 1920, 3), dtype=np.uint8)
    tensor, orig_h, orig_w = matting.preprocess(image)
    assert tensor.shape[2] % 32 == 0
    assert tensor.shape[3] % 32 == 0
    assert orig_h == 1080
    assert orig_w == 1920


def test_modnet_postprocess():
    """Test postprocessing resizes matte to original dimensions."""
    matting = MODNet()
    dummy_output = np.random.rand(1, 1, 512, 672).astype(np.float32)
    matte = matting.postprocess(dummy_output, original_size=(640, 480))
    assert matte.shape == (480, 640)
    assert matte.dtype == np.float32


def test_modnet_predict():
    """Test end-to-end prediction."""
    matting = MODNet()
    image = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
    matte = matting.predict(image)
    assert matte.shape == (480, 640)
    assert matte.dtype == np.float32
    assert matte.min() >= 0.0
    assert matte.max() <= 1.0


def test_modnet_callable():
    """Test that MODNet is callable via __call__."""
    matting = MODNet()
    image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
    matte = matting(image)
    assert matte.shape == (256, 256)
    assert matte.dtype == np.float32


def test_modnet_different_input_sizes():
    """Test prediction with various image dimensions."""
    matting = MODNet()
    sizes = [(256, 256), (480, 640), (720, 1280), (300, 500)]
    for h, w in sizes:
        image = np.random.randint(0, 255, (h, w, 3), dtype=np.uint8)
        matte = matting.predict(image)
        assert matte.shape == (h, w), f'Failed for size {h}x{w}'
        assert matte.dtype == np.float32


# Factory tests


def test_create_matting_model_default():
    """Test factory with default parameters."""
    matting = create_matting_model()
    assert matting is not None
    assert isinstance(matting, MODNet)


def test_create_matting_model_with_enum():
    """Test factory with enum."""
    matting = create_matting_model(MODNetWeights.WEBCAM)
    assert isinstance(matting, MODNet)


def test_create_matting_model_with_string():
    """Test factory with string model name."""
    matting = create_matting_model('modnet_photographic')
    assert isinstance(matting, MODNet)


def test_create_matting_model_webcam_string():
    """Test factory with webcam string model name."""
    matting = create_matting_model('modnet_webcam')
    assert isinstance(matting, MODNet)


def test_create_matting_model_invalid():
    """Test factory with invalid model name."""
    with pytest.raises(ValueError, match='Unknown matting model'):
        create_matting_model('invalid_model')


@@ -19,6 +19,7 @@ This library provides unified APIs for:
- Face tracking (ByteTrack with Kalman filtering)
- Facial landmarks (106-point detection)
- Face parsing (semantic segmentation)
- Portrait matting (trimap-free alpha matte)
- Gaze estimation
- Head pose estimation
- Age, gender, and emotion prediction
@@ -30,7 +31,7 @@ from __future__ import annotations
__license__ = 'MIT'
__author__ = 'Yakhyokhuja Valikhujaev'
-__version__ = '3.4.0'
+__version__ = '3.5.0'
import contextlib
@@ -51,6 +52,7 @@ from .detection import (
from .gaze import MobileGaze, create_gaze_estimator
from .headpose import HeadPose, create_head_pose_estimator
from .landmark import Landmark106, create_landmarker
from .matting import MODNet, create_matting_model
from .parsing import BiSeNet, XSeg, create_face_parser
from .privacy import BlurFace
from .recognition import AdaFace, ArcFace, EdgeFace, MobileFace, SphereFace, create_recognizer
@@ -74,6 +76,7 @@ __all__ = [
    'create_detector',
    'create_face_parser',
    'create_gaze_estimator',
    'create_matting_model',
    'create_head_pose_estimator',
    'create_landmarker',
    'create_recognizer',
@@ -98,6 +101,8 @@ __all__ = [
    # Head pose models
    'HeadPose',
    'HeadPoseResult',
    # Matting models
    'MODNet',
    # Parsing models
    'BiSeNet',
    'XSeg',


@@ -201,6 +201,15 @@ class XSegWeights(str, Enum):
    DEFAULT = "xseg"


class MODNetWeights(str, Enum):
    """
    MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition.
    https://github.com/yakhyo/modnet
    """
    PHOTOGRAPHIC = "modnet_photographic"
    WEBCAM = "modnet_webcam"


class MiniFASNetWeights(str, Enum):
    """
    MiniFASNet: Lightweight Face Anti-Spoofing models.
@@ -443,6 +452,16 @@ MODEL_REGISTRY: dict[Enum, ModelInfo] = {
        url='https://github.com/yakhyo/face-segmentation/releases/download/weights/xseg.onnx',
        sha256='0b57328efcb839d85973164b617ceee9dfe6cfcb2c82e8a033bba9f4f09b27e5'
    ),

    # MODNet (Portrait Matting)
    MODNetWeights.PHOTOGRAPHIC: ModelInfo(
        url='https://github.com/yakhyo/modnet/releases/download/weights/modnet_photographic.onnx',
        sha256='5069a5e306b9f5e9f4f2b0360264c9f8ea13b257c7c39943c7cf6a2ec3a102ae'
    ),
    MODNetWeights.WEBCAM: ModelInfo(
        url='https://github.com/yakhyo/modnet/releases/download/weights/modnet_webcam.onnx',
        sha256='de03cc16f3c91f25b7c2f0b42ea1a8d34f40a752234f3887572655e744e55306'
    ),
}
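Each `MODEL_REGISTRY` entry pairs a download URL with a SHA-256 digest. The internals of `verify_model_weights` are not part of this diff; as an assumption about how such a digest is typically checked, here is a minimal streaming sketch using the standard `hashlib` module (`file_sha256` and `matches_registry_hash` are hypothetical names).

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_sha256(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large ONNX files stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_registry_hash(path: str | Path, expected_sha256: str) -> bool:
    """Compare a downloaded file against a digest recorded in the registry."""
    return file_sha256(path) == expected_sha256.lower()
```

For example, a downloaded `modnet_webcam.onnx` would be compared against the `sha256` value of the `MODNetWeights.WEBCAM` entry above, and a mismatch would indicate a corrupted or altered download.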


@@ -0,0 +1,53 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo
from __future__ import annotations
from uniface.constants import MODNetWeights
from .base import BaseMatting
from .modnet import MODNet
__all__ = ['BaseMatting', 'MODNet', 'create_matting_model']
def create_matting_model(
    model_name: str | MODNetWeights = MODNetWeights.PHOTOGRAPHIC,
    **kwargs,
) -> BaseMatting:
    """Factory function to create a portrait matting model.

    Args:
        model_name: Model to create. Options: ``MODNetWeights.PHOTOGRAPHIC``
            (high-quality photos), ``MODNetWeights.WEBCAM`` (real-time webcam).
            Also accepts string values like ``"modnet_photographic"`` or
            ``"modnet_webcam"``.
        **kwargs: Additional arguments passed to the model constructor
            (e.g. ``input_size``, ``providers``).

    Returns:
        An instance of the requested matting model.

    Raises:
        ValueError: If the model_name is not recognized.

    Example:
        >>> matting = create_matting_model()
        >>> matte = matting.predict(image)
    """
    if isinstance(model_name, MODNetWeights):
        return MODNet(model_name=model_name, **kwargs)

    if isinstance(model_name, str):
        try:
            weights = MODNetWeights(model_name)
        except ValueError:
            pass
        else:
            # Construct outside the try block so a ValueError raised by the
            # constructor is not misreported as an unknown model name
            return MODNet(model_name=weights, **kwargs)

    valid_models = [m.value for m in MODNetWeights]
    raise ValueError(f"Unknown matting model: '{model_name}'. Valid options are: {', '.join(valid_models)}")

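`create_matting_model` relies on `MODNetWeights` being a `str`-valued `Enum`: calling the class with a value returns the matching member, passing an existing member returns it unchanged, and an unknown value raises `ValueError`. A self-contained sketch of that lookup; the local `Weights` enum and `resolve` function are illustrative copies that mirror the two members above, not UniFace code.

```python
from __future__ import annotations

from enum import Enum


class Weights(str, Enum):
    # Local mirror of MODNetWeights, for illustration only
    PHOTOGRAPHIC = "modnet_photographic"
    WEBCAM = "modnet_webcam"


def resolve(name: str | Weights) -> Weights:
    """Map a member or its string value to a member, as the factory does."""
    try:
        return Weights(name)  # lookup by value; a member maps to itself
    except ValueError:
        valid = ", ".join(m.value for m in Weights)
        raise ValueError(f"Unknown matting model: '{name}'. Valid options are: {valid}") from None


print(resolve("modnet_webcam") is Weights.WEBCAM)             # True
print(resolve(Weights.PHOTOGRAPHIC) is Weights.PHOTOGRAPHIC)  # True
```

Because the members subclass `str`, `Weights.WEBCAM == "modnet_webcam"` also holds, which is why the enum and plain-string forms can be used interchangeably in the docs.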
uniface/matting/base.py (new file, 88 lines)

@@ -0,0 +1,88 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from __future__ import annotations

from abc import ABC, abstractmethod

import numpy as np


class BaseMatting(ABC):
    """Abstract base class for portrait matting models.

    Unlike face parsers that operate on face crops and produce class labels or
    face-region masks, matting models operate on full images and produce a soft
    alpha matte (float32 in [0, 1]) separating foreground from background.

    Subclasses must implement the full pipeline: model initialization,
    preprocessing, postprocessing, and the ``predict`` entry point.
    """

    @abstractmethod
    def _initialize_model(self) -> None:
        """Initialize the underlying model for inference.

        This method should handle loading model weights, creating the
        inference session (e.g., ONNX Runtime), and any necessary
        setup procedures to prepare the model for prediction.

        Raises:
            RuntimeError: If the model fails to load or initialize.
        """
        raise NotImplementedError('Subclasses must implement the _initialize_model method.')

    @abstractmethod
    def preprocess(self, image: np.ndarray) -> tuple[np.ndarray, int, int]:
        """Preprocess the input image for model inference.

        Args:
            image: An image in BGR format with shape ``(H, W, 3)``.

        Returns:
            A tuple of ``(tensor, orig_h, orig_w)`` where *tensor* is the
            preprocessed image ready for inference.
        """
        raise NotImplementedError('Subclasses must implement the preprocess method.')

    @abstractmethod
    def postprocess(self, outputs: np.ndarray, original_size: tuple[int, int]) -> np.ndarray:
        """Postprocess raw model outputs into an alpha matte.

        Args:
            outputs: Raw outputs from the model inference.
            original_size: Original image size as ``(width, height)``.

        Returns:
            Alpha matte with shape ``(H, W)`` and values in ``[0, 1]``.
        """
        raise NotImplementedError('Subclasses must implement the postprocess method.')

    @abstractmethod
    def predict(self, image: np.ndarray) -> np.ndarray:
        """Run end-to-end matting on an image.

        Args:
            image: An image in BGR format with shape ``(H, W, 3)``.

        Returns:
            Alpha matte with shape ``(H, W)``, float32 in ``[0, 1]``.

        Example:
            >>> matting = create_matting_model()
            >>> matte = matting.predict(image)
            >>> print(f'Matte shape: {matte.shape}, dtype: {matte.dtype}')
        """
        raise NotImplementedError('Subclasses must implement the predict method.')

    def __call__(self, image: np.ndarray) -> np.ndarray:
        """Callable shortcut for :meth:`predict`.

        Args:
            image: An image in BGR format with shape ``(H, W, 3)``.

        Returns:
            Alpha matte with shape ``(H, W)``, float32 in ``[0, 1]``.
        """
        return self.predict(image)
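
The `BaseMatting` contract (a BGR `(H, W, 3)` image in, a float32 `(H, W)` matte in `[0, 1]` out, with `__call__` delegating to `predict`) can be exercised without any model weights. The sketch below is a hypothetical illustration, not part of uniface: `MattingContract` is a trimmed local copy of the abstract interface so the snippet runs standalone, and `LumaThresholdMatting` is a made-up subclass that swaps network inference for a simple brightness threshold.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

import numpy as np


class MattingContract(ABC):
    """Hypothetical, trimmed stand-in for BaseMatting so this runs without uniface."""

    @abstractmethod
    def _initialize_model(self) -> None: ...

    @abstractmethod
    def preprocess(self, image: np.ndarray) -> tuple[np.ndarray, int, int]: ...

    @abstractmethod
    def postprocess(self, outputs: np.ndarray, original_size: tuple[int, int]) -> np.ndarray: ...

    @abstractmethod
    def predict(self, image: np.ndarray) -> np.ndarray: ...

    def __call__(self, image: np.ndarray) -> np.ndarray:
        # Same shortcut as BaseMatting.__call__
        return self.predict(image)


class LumaThresholdMatting(MattingContract):
    """Hypothetical toy model: pixels brighter than `threshold` become foreground."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self._initialize_model()

    def _initialize_model(self) -> None:
        # A real subclass would create its ONNX Runtime session here
        self.session = None

    def preprocess(self, image: np.ndarray) -> tuple[np.ndarray, int, int]:
        orig_h, orig_w = image.shape[:2]
        bgr = image.astype(np.float32) / 255.0
        # Rec. 601 luma from BGR channels, shaped (1, 1, H, W) like a model input
        luma = 0.114 * bgr[..., 0] + 0.587 * bgr[..., 1] + 0.299 * bgr[..., 2]
        return luma[np.newaxis, np.newaxis], orig_h, orig_w

    def postprocess(self, outputs: np.ndarray, original_size: tuple[int, int]) -> np.ndarray:
        width, height = original_size
        matte = outputs[0, 0]
        if matte.shape != (height, width):
            raise ValueError('toy model never resizes, so shapes must already match')
        return np.clip(matte, 0.0, 1.0).astype(np.float32)

    def predict(self, image: np.ndarray) -> np.ndarray:
        tensor, orig_h, orig_w = self.preprocess(image)
        # Stand-in for session.run(...): a hard threshold instead of a network
        outputs = (tensor > self.threshold).astype(np.float32)
        return self.postprocess(outputs, (orig_w, orig_h))


# Usage: left half black, right half white -> matte is 0 on the left, 1 on the right
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[:, 3:] = 255
matte = LumaThresholdMatting()(image)
print(matte.shape, matte.dtype)  # (4, 6) float32
```

Because every pipeline step is declared abstract, Python raises `TypeError` when instantiating a subclass that forgets one of them, which is the same guard `BaseMatting` provides.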

uniface/matting/modnet.py

@@ -0,0 +1,162 @@
# Copyright 2025-2026 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from __future__ import annotations

import cv2
import numpy as np

from uniface.constants import MODNetWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session

from .base import BaseMatting

__all__ = ['MODNet']

STRIDE = 32


class MODNet(BaseMatting):
    """MODNet: Real-Time Trimap-Free Portrait Matting with ONNX Runtime.

    MODNet produces a soft alpha matte from a full image without requiring
    a trimap. It uses a MobileNetV2 backbone with low-resolution, high-resolution,
    and fusion branches to generate accurate mattes at real-time speed.

    Two pretrained variants are available:
        - ``PHOTOGRAPHIC``: optimized for high-quality portrait photos.
        - ``WEBCAM``: optimized for real-time webcam feeds.

    Reference:
        Ke et al., "MODNet: Real-Time Trimap-Free Portrait Matting via
        Objective Decomposition", AAAI 2022.
        https://github.com/yakhyo/modnet

    Args:
        model_name: The enum specifying the MODNet variant to load.
            Defaults to ``MODNetWeights.PHOTOGRAPHIC``.
        input_size: Reference size for the shorter side during preprocessing.
            If the image's longer side is smaller than this value, or its
            shorter side is larger, the image is resized so its shorter side
            equals this value (aspect ratio preserved); otherwise it keeps its
            native size. Both dimensions are then floored to multiples of 32.
            Defaults to 512.
        providers: ONNX Runtime execution providers. If ``None``, auto-detects
            the best available provider.

    Attributes:
        input_size (int): Target shorter-side size for preprocessing.

    Example:
        >>> from uniface.matting import MODNet
        >>>
        >>> matting = MODNet()
        >>> matte = matting.predict(image)  # (H, W) float32 in [0, 1]
        >>>
        >>> # Composite onto a green background (BGR order)
        >>> import numpy as np
        >>> bg = np.full_like(image, (0, 177, 64), dtype=np.uint8)
        >>> alpha = matte[..., np.newaxis]
        >>> result = (image * alpha + bg * (1 - alpha)).astype(np.uint8)
    """

    def __init__(
        self,
        model_name: MODNetWeights = MODNetWeights.PHOTOGRAPHIC,
        input_size: int = 512,
        providers: list[str] | None = None,
    ) -> None:
        Logger.info(f'Initializing MODNet with model={model_name}, input_size={input_size}')

        self.input_size = input_size
        self.providers = providers

        self.model_path = verify_model_weights(model_name)
        self._initialize_model()

    def _initialize_model(self) -> None:
        """Initialize the ONNX model from the stored model path.

        Raises:
            RuntimeError: If the model fails to load or initialize.
        """
        try:
            self.session = create_onnx_session(self.model_path, providers=self.providers)

            input_cfg = self.session.get_inputs()[0]
            self.input_name = input_cfg.name

            outputs = self.session.get_outputs()
            self.output_names = [output.name for output in outputs]

            Logger.info(f'MODNet initialized with input_size={self.input_size}')

        except Exception as e:
            Logger.error(f"Failed to load MODNet model from '{self.model_path}'", exc_info=True)
            raise RuntimeError(f'Failed to initialize MODNet model: {e}') from e

    def preprocess(self, image: np.ndarray) -> tuple[np.ndarray, int, int]:
        """Preprocess a BGR image for MODNet inference.

        The image is converted to RGB and, if its longer side is smaller than
        ``input_size`` or its shorter side is larger, resized so its shorter
        side equals ``input_size`` (aspect ratio preserved). Both dimensions
        are then floored to multiples of 32, and pixel values are normalized
        to ``[-1, 1]``.

        Args:
            image: Input image in BGR format with shape ``(H, W, 3)``.

        Returns:
            A tuple of ``(tensor, orig_h, orig_w)`` where *tensor* has shape
            ``(1, 3, H', W')`` in float32.
        """
        orig_h, orig_w = image.shape[:2]
        rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        if max(orig_h, orig_w) < self.input_size or min(orig_h, orig_w) > self.input_size:
            if orig_w >= orig_h:
                new_h = self.input_size
                new_w = int(orig_w / orig_h * self.input_size)
            else:
                new_w = self.input_size
                new_h = int(orig_h / orig_w * self.input_size)
        else:
            new_h, new_w = orig_h, orig_w

        new_h = new_h - (new_h % STRIDE)
        new_w = new_w - (new_w % STRIDE)
        rgb = cv2.resize(rgb, (new_w, new_h), interpolation=cv2.INTER_AREA)

        x = rgb.astype(np.float32) / 255.0
        x = (x - 0.5) / 0.5
        x = np.transpose(x, (2, 0, 1))
        return np.expand_dims(x, axis=0), orig_h, orig_w

    def postprocess(self, outputs: np.ndarray, original_size: tuple[int, int]) -> np.ndarray:
        """Postprocess raw model output into an alpha matte.

        Args:
            outputs: Raw ONNX output with shape ``(1, 1, H', W')``.
            original_size: Target size as ``(width, height)``.

        Returns:
            Alpha matte with shape ``(H, W)``, float32 in ``[0, 1]``.
        """
        matte = outputs[0, 0]
        matte = cv2.resize(matte, original_size, interpolation=cv2.INTER_AREA)
        return matte

    def predict(self, image: np.ndarray) -> np.ndarray:
        """Run portrait matting on a BGR image.

        Args:
            image: Input image in BGR format with shape ``(H, W, 3)``.

        Returns:
            Alpha matte with shape ``(H, W)``, float32 in ``[0, 1]``.
        """
        tensor, orig_h, orig_w = self.preprocess(image)
        outputs = self.session.run(self.output_names, {self.input_name: tensor})
        return self.postprocess(outputs[0], (orig_w, orig_h))
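
`MODNet.preprocess` only rescales when the image is smaller than `input_size` on both sides or larger on both sides; when `input_size` falls between the two sides it keeps the native resolution and just floors each side to a multiple of 32. The NumPy-only sketch below re-expresses that sizing rule, the `[-1, 1]` normalization, and the alpha blend from the class docstring so they can be checked without OpenCV or model weights. The helper names (`modnet_target_size`, `to_modnet_tensor`, `composite`) are hypothetical and not part of uniface, and the actual resize step (`cv2.resize(rgb, (new_w, new_h), ...)`, width first) is left out.

```python
from __future__ import annotations

import numpy as np

STRIDE = 32  # MODNet.preprocess floors both sides to multiples of this


def modnet_target_size(orig_h: int, orig_w: int, input_size: int = 512) -> tuple[int, int]:
    """Return (new_h, new_w) using the same rule as MODNet.preprocess."""
    if max(orig_h, orig_w) < input_size or min(orig_h, orig_w) > input_size:
        # Smaller or larger on both sides: shorter side -> input_size, aspect kept
        if orig_w >= orig_h:
            new_h, new_w = input_size, int(orig_w / orig_h * input_size)
        else:
            new_h, new_w = int(orig_h / orig_w * input_size), input_size
    else:
        # input_size lies between the two sides: keep the native resolution
        new_h, new_w = orig_h, orig_w
    return new_h - new_h % STRIDE, new_w - new_w % STRIDE


def to_modnet_tensor(rgb: np.ndarray) -> np.ndarray:
    """Map an already-resized uint8 RGB (H, W, 3) image to a (1, 3, H, W) float32 tensor in [-1, 1]."""
    x = rgb.astype(np.float32) / 255.0
    x = (x - 0.5) / 0.5
    return np.transpose(x, (2, 0, 1))[np.newaxis]


def composite(image: np.ndarray, matte: np.ndarray, background) -> np.ndarray:
    """Blend out = alpha * image + (1 - alpha) * background, rounded back to uint8."""
    alpha = matte[..., np.newaxis].astype(np.float32)
    bg = np.asarray(background, dtype=np.float32)  # (H, W, 3) image or a single colour
    out = image.astype(np.float32) * alpha + bg * (1.0 - alpha)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


print(modnet_target_size(1080, 1920))  # (512, 896): rescaled, then floored
print(modnet_target_size(600, 400))    # (576, 384): native size, only floored
```

Unlike the docstring example, `composite` rounds to the nearest integer instead of truncating with `astype(np.uint8)`; that is a small design choice of this sketch, not of uniface.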