mirror of
https://github.com/yakhyo/uniface.git
synced 2025-12-30 09:02:25 +00:00
16 Commits

| SHA1 |
|---|
| 7c98a60d26 |
| d97a3b2cb2 |
| 2200ba063c |
| 9bcbfa65c2 |
| 96306a0910 |
| 3389aa3e4c |
| b282e6ccc1 |
| d085c6a822 |
| 13b518e96d |
| 1b877bc9fc |
| bb1d209f3b |
| 54b769c0f1 |
| 4d1921e531 |
| da8a5cf35b |
| 3982d677a9 |
| f4458f0550 |

BIN .github/logos/gaze_crop.png
Binary file not shown (after: 716 KiB).

BIN .github/logos/gaze_org.png
Binary file not shown (after: 673 KiB).

26 .github/workflows/ci.yml
@@ -10,14 +10,20 @@ on:
      - main
      - develop

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ubuntu-latest
    runs-on: ${{ matrix.os }}
    timeout-minutes: 15

    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: ["3.11", "3.13"]

    steps:
      - name: Checkout code
@@ -27,7 +33,7 @@ jobs:
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
          cache: "pip"

      - name: Install dependencies
        run: |
@@ -38,21 +44,18 @@ jobs:
        run: |
          python -c "import onnxruntime as ort; print('Available providers:', ort.get_available_providers())"

      - name: Lint with ruff (if available)
        run: |
          pip install ruff || true
          ruff check . --exit-zero || true
        continue-on-error: true
      - name: Lint with ruff
        run: ruff check .

      - name: Run tests
        run: pytest -v --tb=short

      - name: Test package imports
        run: |
          python -c "from uniface import RetinaFace, ArcFace, Landmark106, AgeGender; print('All imports successful')"
        run: python -c "import uniface; print(f'uniface {uniface.__version__} loaded with {len(uniface.__all__)} exports')"

  build:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    needs: test

    steps:
@@ -63,7 +66,7 @@ jobs:
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
          cache: 'pip'
          cache: "pip"

      - name: Install build tools
        run: |
@@ -84,4 +87,3 @@ jobs:
          name: dist-python-${{ github.sha }}
          path: dist/
          retention-days: 7

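The updated `test` job fans out over a `matrix` of three operating systems and two Python versions. `fail-fast: false` lets every combination finish even if one of them fails. GitHub Actions runs one job per combination of matrix values. The sketch below enumerates those combinations with `itertools.product`. It ignores `include`/`exclude`, which this workflow does not use, and the helper name `expand_matrix` is only illustrative:

```python
from itertools import product

# Matrix values copied from the updated ci.yml test job.
matrix = {
    "os": ["ubuntu-latest", "macos-latest", "windows-latest"],
    "python-version": ["3.11", "3.13"],
}


def expand_matrix(matrix: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one dict per job: the cross product of all matrix keys."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in product(*(matrix[k] for k in keys))]


jobs = expand_matrix(matrix)
for job in jobs:
    print(f"test ({job['os']}, {job['python-version']})")
```

The new matrix therefore produces 6 test jobs. Before the change, the single-OS matrix over four Python versions produced 4.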

16 .github/workflows/publish.yml
@@ -5,9 +5,14 @@ on:
    tags:
      - "v*.*.*" # Trigger only on version tags like v0.1.9

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  validate:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    outputs:
      version: ${{ steps.get_version.outputs.version }}
      tag_version: ${{ steps.get_version.outputs.tag_version }}
@@ -16,13 +21,18 @@ jobs:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Get version from tag and pyproject.toml
        id: get_version
        run: |
          TAG_VERSION=${GITHUB_REF#refs/tags/v}
          echo "tag_version=$TAG_VERSION" >> $GITHUB_OUTPUT

          PYPROJECT_VERSION=$(grep -Po '(?<=^version = ")[^"]*' pyproject.toml)
          PYPROJECT_VERSION=$(python -c "import tomllib; print(tomllib.load(open('pyproject.toml','rb'))['project']['version'])")
          echo "version=$PYPROJECT_VERSION" >> $GITHUB_OUTPUT

          echo "Tag version: v$TAG_VERSION"
@@ -38,12 +48,13 @@ jobs:

  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    needs: validate

    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]
        python-version: ["3.11", "3.13"]

    steps:
      - name: Checkout code
@@ -65,6 +76,7 @@ jobs:

  publish:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    needs: [validate, test]
    permissions:
      contents: write

@@ -21,11 +21,28 @@ Thank you for considering contributing to UniFace! We welcome contributions of a

### Code Style

This project uses [Ruff](https://docs.astral.sh/ruff/) for linting and formatting.

```bash
# Check for linting errors
ruff check .

# Auto-fix linting errors
ruff check . --fix

# Format code
ruff format .
```

**Guidelines:**
- Follow PEP8 guidelines
- Use type hints (Python 3.10+)
- Write docstrings for public APIs
- Line length: 120 characters
- Keep code simple and readable

All PRs must pass `ruff check .` before merging.

## Development Setup

```bash
@@ -51,8 +68,14 @@ Example notebooks demonstrating library usage:
| Face Recognition | [face_analyzer.ipynb](examples/face_analyzer.ipynb) |
| Face Verification | [face_verification.ipynb](examples/face_verification.ipynb) |
| Face Search | [face_search.ipynb](examples/face_search.ipynb) |
| Face Anonymization | [face_anonymization.ipynb](examples/face_anonymization.ipynb) |

## Questions?

Open an issue or start a discussion on GitHub.

180 MODELS.md
@@ -80,10 +80,11 @@ detector = SCRFD(

YOLOv5-Face models provide excellent detection accuracy with 5-point facial landmarks, optimized for real-time applications.

| Model Name | Params | Size | Easy | Medium | Hard | FLOPs (G) | Use Case |
| -------------- | ------ | ---- | ------ | ------ | ------ | --------- | ------------------------------ |
| `YOLOV5S` ⭐ | 7.1M | 28MB | 94.33% | 92.61% | 83.15% | 5.751 | **Real-time + accuracy** |
| `YOLOV5M` | 21.1M | 84MB | 95.30% | 93.76% | 85.28% | 18.146 | High accuracy |
| Model Name | Size | Easy | Medium | Hard | Use Case |
| -------------- | ---- | ------ | ------ | ------ | ------------------------------ |
| `YOLOV5N` | 11MB | 93.61% | 91.52% | 80.53% | Lightweight/Mobile |
| `YOLOV5S` ⭐ | 28MB | 94.33% | 92.61% | 83.15% | **Real-time + accuracy** |
| `YOLOV5M` | 82MB | 95.30% | 93.76% | 85.28% | High accuracy |

**Accuracy**: WIDER FACE validation set - from [YOLOv5-Face paper](https://arxiv.org/abs/2105.12931)
**Speed**: Benchmark on your own hardware using `scripts/run_detection.py --iterations 100`
@@ -95,6 +96,13 @@ YOLOv5-Face models provide excellent detection accuracy with 5-point facial land
from uniface import YOLOv5Face
from uniface.constants import YOLOv5FaceWeights

# Lightweight/Mobile
detector = YOLOv5Face(
    model_name=YOLOv5FaceWeights.YOLOV5N,
    conf_thresh=0.6,
    nms_thresh=0.5
)

# Real-time detection (recommended)
detector = YOLOv5Face(
    model_name=YOLOv5FaceWeights.YOLOV5S,
@@ -251,9 +259,9 @@ landmarks = landmarker.get_landmarks(image, bbox)
from uniface import AgeGender

predictor = AgeGender()
gender_id, age = predictor.predict(image, bbox)
# Returns: (gender_id, age_in_years)
# gender_id: 0 for Female, 1 for Male
gender, age = predictor.predict(image, bbox)
# Returns: (gender, age_in_years)
# gender: 0 for Female, 1 for Male
```

---
@@ -283,6 +291,160 @@ emotion, confidence = predictor.predict(image, landmarks)

---

## Gaze Estimation Models

### MobileGaze Family

Real-time gaze direction prediction models trained on Gaze360 dataset. Returns pitch (vertical) and yaw (horizontal) angles in radians.

| Model Name | Params | Size | MAE* | Use Case |
| -------------- | ------ | ------- | ----- | ----------------------------- |
| `RESNET18` | 11.7M | 43 MB | 12.84 | Balanced accuracy/speed |
| `RESNET34` ⭐ | 24.8M | 81.6 MB | 11.33 | **Recommended default** |
| `RESNET50` | 25.6M | 91.3 MB | 11.34 | High accuracy |
| `MOBILENET_V2` | 3.5M | 9.59 MB | 13.07 | Mobile/Edge devices |
| `MOBILEONE_S0` | 2.1M | 4.8 MB | 12.58 | Lightweight/Real-time |

*MAE (Mean Absolute Error) in degrees on Gaze360 test set - lower is better

**Dataset**: Trained on Gaze360 (indoor/outdoor scenes with diverse head poses)
**Training**: 200 epochs with classification-based approach (binned angles)

#### Usage

```python
from uniface import MobileGaze
from uniface.constants import GazeWeights
import numpy as np

# Default (recommended)
gaze_estimator = MobileGaze() # Uses RESNET34

# Lightweight model
gaze_estimator = MobileGaze(model_name=GazeWeights.MOBILEONE_S0)

# Estimate gaze from face crop
pitch, yaw = gaze_estimator.estimate(face_crop)
print(f"Pitch: {np.degrees(pitch):.1f}°, Yaw: {np.degrees(yaw):.1f}°")
```

**Note**: Requires face crop as input. Use face detection first to obtain bounding boxes.

---

## Face Parsing Models

### BiSeNet Family

BiSeNet (Bilateral Segmentation Network) models for semantic face parsing. Segments face images into 19 facial component classes.

| Model Name | Params | Size | Classes | Use Case |
| -------------- | ------ | ------- | ------- | ----------------------------- |
| `RESNET18` ⭐ | 13.3M | 50.7 MB | 19 | **Recommended default** |
| `RESNET34` | 24.1M | 89.2 MB | 19 | Higher accuracy |

**19 Facial Component Classes:**
1. Background
2. Skin
3. Left Eyebrow
4. Right Eyebrow
5. Left Eye
6. Right Eye
7. Eye Glasses
8. Left Ear
9. Right Ear
10. Ear Ring
11. Nose
12. Mouth
13. Upper Lip
14. Lower Lip
15. Neck
16. Neck Lace
17. Cloth
18. Hair
19. Hat

**Dataset**: Trained on CelebAMask-HQ
**Architecture**: BiSeNet with ResNet backbone
**Input Size**: 512×512 (automatically resized)

#### Usage

```python
from uniface.parsing import BiSeNet
from uniface.constants import ParsingWeights
from uniface.visualization import vis_parsing_maps
import cv2

# Default (recommended)
parser = BiSeNet() # Uses RESNET18

# Higher accuracy model
parser = BiSeNet(model_name=ParsingWeights.RESNET34)

# Parse face image (already cropped)
mask = parser.parse(face_image)

# Visualize with overlay
face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)

# mask shape: (H, W) with values 0-18 representing classes
print(f"Detected {len(np.unique(mask))} facial components")
```

**Applications:**
- Face makeup and beauty applications
- Virtual try-on systems
- Face editing and manipulation
- Facial feature extraction
- Portrait segmentation

**Note**: Input should be a cropped face image. For full pipeline, use face detection first to obtain face crops.

---

## Anti-Spoofing Models

### MiniFASNet Family

Lightweight face anti-spoofing models for liveness detection. Detect if a face is real (live) or fake (photo, video replay, mask).

| Model Name | Size | Scale | Use Case |
| ---------- | ------ | ----- | ----------------------------- |
| `V1SE` | 1.2 MB | 4.0 | Squeeze-and-excitation variant |
| `V2` ⭐ | 1.2 MB | 2.7 | **Recommended default** |

**Dataset**: Trained on face anti-spoofing datasets
**Output**: Returns (label_idx, score) where label_idx: 0=Fake, 1=Real

#### Usage

```python
from uniface import RetinaFace
from uniface.spoofing import MiniFASNet
from uniface.constants import MiniFASNetWeights

# Default (V2, recommended)
detector = RetinaFace()
spoofer = MiniFASNet()

# V1SE variant
spoofer = MiniFASNet(model_name=MiniFASNetWeights.V1SE)

# Detect and check liveness
faces = detector.detect(image)
for face in faces:
    label_idx, score = spoofer.predict(image, face['bbox'])
    # label_idx: 0 = Fake, 1 = Real
    label = 'Real' if label_idx == 1 else 'Fake'
    print(f"{label}: {score:.1%}")
```

**Note**: Requires face bounding box from a detector. Use with RetinaFace, SCRFD, or YOLOv5Face.

---

## Model Updates

Models are automatically downloaded and cached on first use. Cache location: `~/.uniface/models/`
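MobileGaze returns pitch and yaw in radians, while the MAE column above is in degrees. That is why the usage snippet converts the angles with `np.degrees`. The sketch below shows what a gaze arrow such as the one drawn by `draw_gaze` might represent by projecting the two angles onto the image plane. It assumes a convention common in open-source gaze-estimation demos:

- The arrow starts at the box center.
- The x offset is `-L * sin(yaw) * cos(pitch)`.
- The y offset is `-L * sin(pitch)`.

uniface's `draw_gaze` may use different signs or a different arrow length. The helper `gaze_arrow` is hypothetical:

```python
import math


def gaze_arrow(bbox, pitch, yaw, length=None):
    """Hypothetical helper: start and end points of a 2D gaze arrow.

    bbox: (x1, y1, x2, y2) in pixels; pitch, yaw: angles in radians.
    Assumed convention: yaw > 0 points toward image-left, pitch > 0 toward image-top.
    """
    x1, y1, x2, y2 = bbox[:4]
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    if length is None:
        length = x2 - x1  # arrow as long as the box is wide (an arbitrary choice)
    dx = -length * math.sin(yaw) * math.cos(pitch)
    dy = -length * math.sin(pitch)  # image y grows downward, so "up" is negative
    return (round(cx), round(cy)), (round(cx + dx), round(cy + dy))


pitch, yaw = math.radians(5.2), math.radians(-12.3)  # example angles
print(f"Pitch: {math.degrees(pitch):.1f} deg, Yaw: {math.degrees(yaw):.1f} deg")
print(gaze_arrow((100, 100, 200, 220), pitch, yaw))
```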
@@ -322,6 +484,9 @@ python scripts/download_model.py --model MNET_V2
- **YOLOv5-Face Original**: [deepcam-cn/yolov5-face](https://github.com/deepcam-cn/yolov5-face) - Original PyTorch implementation
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) - ONNX inference implementation
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet training code and pretrained weights
- **Face Anti-Spoofing**: [yakhyo/face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) - MiniFASNet ONNX inference (weights from [minivision-ai/Silent-Face-Anti-Spoofing](https://github.com/minivision-ai/Silent-Face-Anti-Spoofing))
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights

### Papers
@@ -331,3 +496,4 @@ python scripts/download_model.py --model MNET_V2
- **YOLOv5-Face**: [YOLO5Face: Why Reinventing a Face Detector](https://arxiv.org/abs/2105.12931)
- **ArcFace**: [Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)
- **SphereFace**: [Deep Hypersphere Embedding for Face Recognition](https://arxiv.org/abs/1704.08063)
- **BiSeNet**: [Bilateral Segmentation Network for Real-time Semantic Segmentation](https://arxiv.org/abs/1808.00897)
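The parsing model returns a mask of shape (H, W) whose values 0-18 index the 19 classes listed in MODELS.md. List item n corresponds to mask value n - 1. `len(np.unique(mask))` in the usage snippet only counts how many classes appear, whereas a per-class pixel count is often more useful. The sketch below uses only the standard library: a plain list of lists stands in for the NumPy mask, and `class_histogram` is a hypothetical helper:

```python
from collections import Counter

# Class names in mask-value order (0-18), taken from the 19-class list in MODELS.md.
FACE_PARSING_CLASSES = [
    "Background", "Skin", "Left Eyebrow", "Right Eyebrow", "Left Eye", "Right Eye",
    "Eye Glasses", "Left Ear", "Right Ear", "Ear Ring", "Nose", "Mouth",
    "Upper Lip", "Lower Lip", "Neck", "Neck Lace", "Cloth", "Hair", "Hat",
]


def class_histogram(mask):
    """Hypothetical helper: map class name -> pixel count for a 2D label mask."""
    counts = Counter(value for row in mask for value in row)
    return {FACE_PARSING_CLASSES[value]: n for value, n in sorted(counts.items())}


# Tiny made-up 3x4 mask: background, skin, hair, and one nose pixel.
mask = [
    [0, 17, 17, 0],
    [0, 1, 1, 0],
    [0, 1, 10, 0],
]
hist = class_histogram(mask)
print(hist)  # len(hist) is the number of classes present
```

With the real NumPy mask, `np.bincount(mask.ravel(), minlength=19)` returns the same counts as an array indexed by class value.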

284 QUICKSTART.md
@@ -39,9 +39,9 @@ faces = detector.detect(image)
# Print results
for i, face in enumerate(faces):
    print(f"Face {i+1}:")
    print(f" Confidence: {face['confidence']:.2f}")
    print(f" BBox: {face['bbox']}")
    print(f" Landmarks: {len(face['landmarks'])} points")
    print(f" Confidence: {face.confidence:.2f}")
    print(f" BBox: {face.bbox}")
    print(f" Landmarks: {len(face.landmarks)} points")
```

**Output:**
@@ -70,9 +70,9 @@ image = cv2.imread("photo.jpg")
faces = detector.detect(image)

# Extract visualization data
bboxes = [f['bbox'] for f in faces]
scores = [f['confidence'] for f in faces]
landmarks = [f['landmarks'] for f in faces]
bboxes = [f.bbox for f in faces]
scores = [f.confidence for f in faces]
landmarks = [f.landmarks for f in faces]

# Draw on image
draw_detections(
@@ -113,8 +113,8 @@ faces2 = detector.detect(image2)

if faces1 and faces2:
    # Extract embeddings
    emb1 = recognizer.get_normalized_embedding(image1, faces1[0]['landmarks'])
    emb2 = recognizer.get_normalized_embedding(image2, faces2[0]['landmarks'])
    emb1 = recognizer.get_normalized_embedding(image1, faces1[0].landmarks)
    emb2 = recognizer.get_normalized_embedding(image2, faces2[0].landmarks)

    # Compute similarity (cosine similarity)
    similarity = np.dot(emb1, emb2.T)[0][0]
@@ -159,9 +159,9 @@ while True:
    faces = detector.detect(frame)

    # Draw results
    bboxes = [f['bbox'] for f in faces]
    scores = [f['confidence'] for f in faces]
    landmarks = [f['landmarks'] for f in faces]
    bboxes = [f.bbox for f in faces]
    scores = [f.confidence for f in faces]
    landmarks = [f.landmarks for f in faces]
    draw_detections(
        image=frame,
        bboxes=bboxes,
@@ -199,9 +199,9 @@ faces = detector.detect(image)

# Predict attributes
for i, face in enumerate(faces):
    gender_id, age = age_gender.predict(image, face['bbox'])
    gender = 'Female' if gender_id == 0 else 'Male'
    print(f"Face {i+1}: {gender}, {age} years old")
    gender, age = age_gender.predict(image, face.bbox)
    gender_str = 'Female' if gender == 0 else 'Male'
    print(f"Face {i+1}: {gender_str}, {age} years old")
```

**Output:**
@@ -230,7 +230,7 @@ image = cv2.imread("photo.jpg")
faces = detector.detect(image)

if faces:
    landmarks = landmarker.get_landmarks(image, faces[0]['bbox'])
    landmarks = landmarker.get_landmarks(image, faces[0].bbox)
    print(f"Detected {len(landmarks)} landmarks")

    # Draw landmarks
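The QUICKSTART.md hunks above switch from dictionary access (`face['bbox']`) to attribute access (`face.bbox`). This diff does not include the definition of the detection result type. The following hypothetical dataclass only illustrates the shape of an attribute-based result, using plain tuples instead of NumPy arrays so that it stays dependency-free. The optional `__getitem__` shim shows one way a library can keep old `face['bbox']` code working. Whether uniface provides such a shim is not shown here:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Face:
    """Hypothetical stand-in for a detection result with attribute access."""

    bbox: tuple[float, float, float, float]    # (x1, y1, x2, y2)
    confidence: float
    landmarks: tuple[tuple[float, float], ...]  # five (x, y) points

    def __getitem__(self, key: str):
        # Optional backward-compatibility shim: face["bbox"] -> face.bbox.
        if key not in self.__dataclass_fields__:
            raise KeyError(key)
        return getattr(self, key)


face = Face(
    bbox=(48.0, 60.0, 152.0, 190.0),
    confidence=0.98,
    landmarks=((80, 110), (120, 110), (100, 135), (85, 160), (115, 160)),
)
print(f"Face detected with confidence: {face.confidence:.2f}")
print(len(face.landmarks), face["bbox"] == face.bbox)
```

A frozen dataclass also makes results immutable, so a detection cannot be edited by accident after it is returned.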
@@ -242,7 +242,223 @@ if faces:

---

## 7. Batch Processing (3 minutes)
## 7. Gaze Estimation (2 minutes)

Estimate where a person is looking:

```python
import cv2
import numpy as np
from uniface import RetinaFace, MobileGaze
from uniface.visualization import draw_gaze

# Initialize models
detector = RetinaFace()
gaze_estimator = MobileGaze()

# Load image
image = cv2.imread("photo.jpg")
faces = detector.detect(image)

# Estimate gaze for each face
for i, face in enumerate(faces):
    x1, y1, x2, y2 = map(int, face.bbox[:4])
    face_crop = image[y1:y2, x1:x2]

    if face_crop.size > 0:
        pitch, yaw = gaze_estimator.estimate(face_crop)
        print(f"Face {i+1}: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°")

        # Draw gaze direction
        draw_gaze(image, face.bbox, pitch, yaw)

cv2.imwrite("gaze_output.jpg", image)
```

**Output:**

```
Face 1: pitch=5.2°, yaw=-12.3°
Face 2: pitch=-8.1°, yaw=15.7°
```

---

## 8. Face Parsing (2 minutes)

Segment face into semantic components (skin, eyes, nose, mouth, hair, etc.):

```python
import cv2
import numpy as np
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps

# Initialize parser
parser = BiSeNet() # Uses ResNet18 by default

# Load face image (already cropped)
face_image = cv2.imread("face.jpg")

# Parse face into 19 components
mask = parser.parse(face_image)

# Visualize with overlay
face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)

# Convert back to BGR for saving
vis_bgr = cv2.cvtColor(vis_result, cv2.COLOR_RGB2BGR)
cv2.imwrite("parsed_face.jpg", vis_bgr)

print(f"Detected {len(np.unique(mask))} facial components")
```

**Output:**

```
Detected 12 facial components
```

**19 Facial Component Classes:**
- Background, Skin, Eyebrows (L/R), Eyes (L/R), Eye Glasses
- Ears (L/R), Ear Ring, Nose, Mouth, Lips (Upper/Lower)
- Neck, Neck Lace, Cloth, Hair, Hat

---

## 9. Face Anonymization (2 minutes)

Automatically blur faces for privacy protection:

```python
from uniface.privacy import anonymize_faces
import cv2

# One-liner: automatic detection and blurring
image = cv2.imread("group_photo.jpg")
anonymized = anonymize_faces(image, method='pixelate')
cv2.imwrite("anonymized.jpg", anonymized)
print("Faces anonymized successfully!")
```

**Manual control with custom parameters:**

```python
from uniface import RetinaFace
from uniface.privacy import BlurFace

# Initialize detector and blurrer
detector = RetinaFace()
blurrer = BlurFace(method='gaussian', blur_strength=5.0)

# Detect and anonymize
faces = detector.detect(image)
anonymized = blurrer.anonymize(image, faces)
cv2.imwrite("output.jpg", anonymized)
```

**Available blur methods:**

```python
# Pixelation (news media standard)
blurrer = BlurFace(method='pixelate', pixel_blocks=8)

# Gaussian blur (smooth, natural)
blurrer = BlurFace(method='gaussian', blur_strength=4.0)

# Black boxes (maximum privacy)
blurrer = BlurFace(method='blackout', color=(0, 0, 0))

# Elliptical blur (natural face shape)
blurrer = BlurFace(method='elliptical', blur_strength=3.0, margin=30)

# Median blur (edge-preserving)
blurrer = BlurFace(method='median', blur_strength=3.0)
```

**Webcam anonymization:**

```python
import cv2
from uniface import RetinaFace
from uniface.privacy import BlurFace

detector = RetinaFace()
blurrer = BlurFace(method='pixelate')
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    faces = detector.detect(frame)
    frame = blurrer.anonymize(frame, faces, inplace=True)

    cv2.imshow('Anonymized', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

**Command-line tool:**

```bash
# Anonymize image with pixelation
python scripts/run_anonymization.py --image photo.jpg

# Real-time webcam anonymization
python scripts/run_anonymization.py --webcam --method gaussian

# Custom blur strength
python scripts/run_anonymization.py --image photo.jpg --method gaussian --blur-strength 5.0
```

---

## 10. Face Anti-Spoofing (2 minutes)

Detect if a face is real or fake (photo, video replay, mask):

```python
from uniface import RetinaFace
from uniface.spoofing import MiniFASNet

detector = RetinaFace()
spoofer = MiniFASNet() # Uses V2 by default

image = cv2.imread("photo.jpg")
faces = detector.detect(image)

for i, face in enumerate(faces):
    label_idx, score = spoofer.predict(image, face.bbox)
    # label_idx: 0 = Fake, 1 = Real
    label = 'Real' if label_idx == 1 else 'Fake'
    print(f"Face {i+1}: {label} ({score:.1%})")
```

**Output:**

```
Face 1: Real (98.5%)
```

**Command-line tool:**

```bash
# Image
python scripts/run_spoofing.py --image photo.jpg

# Webcam
python scripts/run_spoofing.py --source 0
```

---

## 11. Batch Processing (3 minutes)

Process multiple images:

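Of the anonymization methods listed in section 9, pixelation is the simplest to reason about. The face region is split into a grid of blocks, and each block is replaced by its average value. The sketch below applies that idea to a grayscale region stored as a list of lists. It is not uniface's `BlurFace` implementation, and reading `pixel_blocks` as "number of blocks per side" is an assumption made for this illustration:

```python
def pixelate(region, blocks):
    """Replace each cell of a blocks x blocks grid with its rounded mean value.

    region: 2D list of grayscale pixel values (rows of equal length).
    blocks: number of blocks per side (assumed meaning of `pixel_blocks`).
    """
    h, w = len(region), len(region[0])
    out = [row[:] for row in region]  # leave the input untouched
    for by in range(blocks):
        y0, y1 = by * h // blocks, (by + 1) * h // blocks
        for bx in range(blocks):
            x0, x1 = bx * w // blocks, (bx + 1) * w // blocks
            cells = [region[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            if not cells:  # more blocks than pixels along an axis
                continue
            mean = round(sum(cells) / len(cells))
            for y in range(y0, y1):
                for x in range(x0, x1):
                    out[y][x] = mean
    return out


face = [
    [10, 20, 200, 210],
    [30, 40, 220, 230],
    [0, 0, 100, 100],
    [0, 0, 100, 100],
]
print(pixelate(face, 2))
```

With OpenCV, the same effect is commonly achieved by shrinking the region with `cv2.resize` and enlarging it again using `cv2.INTER_NEAREST` interpolation.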
@@ -275,7 +491,7 @@ print("Done!")

---

## 8. Model Selection
## 12. Model Selection

Choose the right model for your use case:

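The gaze example in section 7 crops faces with `image[y1:y2, x1:x2]` and skips empty crops via `face_crop.size > 0`. Detector boxes can extend past the image border. A negative start index in NumPy slicing counts from the end of the axis, so a box with `x1 < 0` typically yields an empty crop, and that face is silently skipped. The sketch below clamps the box before cropping. The helper `clamp_bbox` is hypothetical, and its `margin` option is an illustrative addition, not a uniface parameter:

```python
def clamp_bbox(bbox, width, height, margin=0.0):
    """Hypothetical helper: integer (x1, y1, x2, y2) clipped to a width x height image.

    margin: optional fraction of the box width/height to pad on each side.
    Returns None when the clipped box is empty.
    """
    x1, y1, x2, y2 = bbox[:4]
    pad_x, pad_y = (x2 - x1) * margin, (y2 - y1) * margin
    # int() truncates toward zero, like map(int, ...) in the snippet; then clip to the image.
    x1 = max(0, int(x1 - pad_x))
    y1 = max(0, int(y1 - pad_y))
    x2 = min(width, int(x2 + pad_x))
    y2 = min(height, int(y2 + pad_y))
    if x2 <= x1 or y2 <= y1:
        return None
    return x1, y1, x2, y2


# A box that starts left of the image edge (x1 < 0) on a 640x480 frame.
print(clamp_bbox((-12.4, 30.0, 90.7, 150.2), 640, 480))
```

In the section 7 loop, this would be called as `clamp_bbox(face.bbox, image.shape[1], image.shape[0])` before slicing, skipping the face when it returns `None`.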
@@ -326,6 +542,35 @@ recognizer = MobileFace(model_name=MobileFaceWeights.MNET_V2) # Fast, small siz
recognizer = SphereFace(model_name=SphereFaceWeights.SPHERE20) # Alternative method
```

### Gaze Estimation Models

```python
from uniface import MobileGaze
from uniface.constants import GazeWeights

# Default (recommended)
gaze_estimator = MobileGaze() # Uses RESNET34

# Lightweight (mobile/edge devices)
gaze_estimator = MobileGaze(model_name=GazeWeights.MOBILEONE_S0)

# High accuracy
gaze_estimator = MobileGaze(model_name=GazeWeights.RESNET50)
```

### Face Parsing Models

```python
from uniface.parsing import BiSeNet
from uniface.constants import ParsingWeights

# Default (recommended, 50.7 MB)
parser = BiSeNet() # Uses RESNET18

# Higher accuracy (89.2 MB)
parser = BiSeNet(model_name=ParsingWeights.RESNET34)
```

---

## Common Issues
@@ -387,6 +632,9 @@ Explore interactive examples for common tasks:
| **Face Recognition** | Extract face embeddings and compare faces | [face_analyzer.ipynb](examples/face_analyzer.ipynb) |
| **Face Verification** | Compare two faces to verify identity | [face_verification.ipynb](examples/face_verification.ipynb) |
| **Face Search** | Find a person in a group photo | [face_search.ipynb](examples/face_search.ipynb) |
| **Face Parsing** | Segment face into semantic components | [face_parsing.ipynb](examples/face_parsing.ipynb) |
| **Face Anonymization** | Blur or pixelate faces for privacy protection | [face_anonymization.ipynb](examples/face_anonymization.ipynb) |
| **Gaze Estimation** | Estimate gaze direction | [gaze_estimation.ipynb](examples/gaze_estimation.ipynb) |

### Additional Resources

@@ -400,4 +648,6 @@ Explore interactive examples for common tasks:
- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch)
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference)
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition)
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation)
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing)
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface)

197 README.md
@@ -1,17 +1,21 @@
# UniFace: All-in-One Face Analysis Library

<div align="center">

[](https://opensource.org/licenses/MIT)
[](https://www.python.org/)
[](https://pypi.org/project/uniface/)
[](https://github.com/yakhyo/uniface/actions)
[](https://pepy.tech/project/uniface)
[](https://deepwiki.com/yakhyo/uniface)
[](https://pepy.tech/project/uniface)
[](https://deepwiki.com/yakhyo/uniface)

</div>

<div align="center">
<img src=".github/logos/logo_web.webp" width=75%>
</div>

**UniFace** is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides high-performance face detection, recognition, landmark detection, and attribute analysis with hardware acceleration support across platforms.
**UniFace** is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides high-performance face detection, recognition, landmark detection, face parsing, gaze estimation, and attribute analysis with hardware acceleration support across platforms.

---

@@ -20,7 +24,11 @@
- **High-Speed Face Detection**: ONNX-optimized RetinaFace, SCRFD, and YOLOv5-Face models
- **Facial Landmark Detection**: Accurate 106-point landmark localization
- **Face Recognition**: ArcFace, MobileFace, and SphereFace embeddings
- **Face Parsing**: BiSeNet-based semantic segmentation with 19 facial component classes
- **Gaze Estimation**: Real-time gaze direction prediction with MobileGaze
- **Attribute Analysis**: Age, gender, and emotion detection
- **Anti-Spoofing**: Face liveness detection with MiniFASNet models
- **Face Anonymization**: Privacy-preserving face blurring with 5 methods (pixelate, gaussian, blackout, elliptical, median)
- **Face Alignment**: Precise alignment for downstream tasks
- **Hardware Acceleration**: ARM64 optimizations (Apple Silicon), CUDA (NVIDIA), CPU fallback
- **Simple API**: Intuitive factory functions and clean interfaces
@@ -97,9 +105,9 @@ faces = detector.detect(image)

# Process results
for face in faces:
    bbox = face['bbox'] # [x1, y1, x2, y2]
    confidence = face['confidence']
    landmarks = face['landmarks'] # 5-point landmarks
    bbox = face.bbox # np.ndarray [x1, y1, x2, y2]
    confidence = face.confidence
    landmarks = face.landmarks # np.ndarray (5, 2) landmarks
    print(f"Face detected with confidence: {confidence:.2f}")
```

@@ -117,8 +125,8 @@ recognizer = ArcFace()
faces1 = detector.detect(image1)
faces2 = detector.detect(image2)

embedding1 = recognizer.get_normalized_embedding(image1, faces1[0]['landmarks'])
embedding2 = recognizer.get_normalized_embedding(image2, faces2[0]['landmarks'])
embedding1 = recognizer.get_normalized_embedding(image1, faces1[0].landmarks)
embedding2 = recognizer.get_normalized_embedding(image2, faces2[0].landmarks)

# Compare faces
similarity = compute_similarity(embedding1, embedding2)
@@ -134,7 +142,7 @@ detector = RetinaFace()
landmarker = Landmark106()

faces = detector.detect(image)
landmarks = landmarker.get_landmarks(image, faces[0]['bbox'])
landmarks = landmarker.get_landmarks(image, faces[0].bbox)
# Returns 106 (x, y) landmark points
```

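README.md passes `faces1[0].landmarks` to `get_normalized_embedding` and compares the two results with `compute_similarity`. QUICKSTART.md instead computes `np.dot(emb1, emb2.T)[0][0]`, which implies embeddings stored as (1, D) row vectors. If the embeddings have unit L2 norm, as the method name suggests, cosine similarity reduces to that dot product. The sketch below writes out the general cosine formula on plain Python lists without assuming normalization. It is not `compute_similarity`'s source, and any accept/reject threshold is model-specific and not given in this diff:

```python
import math


def cosine_similarity(a, b):
    """cos(a, b) = dot(a, b) / (|a| * |b|); equals dot(a, b) when both have unit norm."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


e1 = l2_normalize([3.0, 4.0, 0.0])
e2 = l2_normalize([4.0, 3.0, 0.0])
print(round(cosine_similarity(e1, e2), 4))
print(round(sum(x * y for x, y in zip(e1, e2)), 4))  # same value: plain dot product
```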
@@ -147,11 +155,125 @@ detector = RetinaFace()
|
||||
age_gender = AgeGender()
|
||||
|
||||
faces = detector.detect(image)
|
||||
gender_id, age = age_gender.predict(image, faces[0]['bbox'])
|
||||
gender = 'Female' if gender_id == 0 else 'Male'
|
||||
print(f"{gender}, {age} years old")
|
||||
gender, age = age_gender.predict(image, faces[0].bbox)
|
||||
gender_str = 'Female' if gender == 0 else 'Male'
|
||||
print(f"{gender_str}, {age} years old")
|
||||
```
|
||||
|
||||
### Gaze Estimation
|
||||
|
||||
```python
|
||||
from uniface import RetinaFace, MobileGaze
|
||||
from uniface.visualization import draw_gaze
|
||||
import numpy as np
|
||||
|
||||
detector = RetinaFace()
|
||||
gaze_estimator = MobileGaze()
|
||||
|
||||
faces = detector.detect(image)
|
||||
for face in faces:
|
||||
x1, y1, x2, y2 = map(int, face.bbox[:4])
|
||||
face_crop = image[y1:y2, x1:x2]
|
||||
|
||||
pitch, yaw = gaze_estimator.estimate(face_crop)
|
||||
print(f"Gaze: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°")
|
||||
|
||||
# Visualize
|
||||
draw_gaze(image, face.bbox, pitch, yaw)
|
||||
```
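The gaze snippet crops `image[y1:y2, x1:x2]` straight from the detector box. Detector boxes can extend past the image border. With NumPy slicing, a negative start index counts from the end of the axis, so such a crop can come out empty or wrong. The following minimal sketch clamps the box before cropping. `clamp_bbox` is a hypothetical helper, not part of UniFace. It truncates coordinates to ints the same way `map(int, ...)` does in the snippet.

```python
def clamp_bbox(bbox, width, height):
    """Convert an [x1, y1, x2, y2, ...] box to ints and clip it to a width x height image.

    Extra trailing values (e.g. a score) are ignored. Returns None if the clipped
    box has no area, so callers can skip that face.
    """
    x1, y1, x2, y2 = (int(v) for v in bbox[:4])  # truncation, like map(int, ...)
    x1 = max(0, min(x1, width))
    x2 = max(0, min(x2, width))
    y1 = max(0, min(y1, height))
    y2 = max(0, min(y2, height))
    if x2 <= x1 or y2 <= y1:
        return None
    return x1, y1, x2, y2


# A box hanging off the left, right, and bottom edges of a 100x80 image.
print(clamp_bbox([-12.7, 30.2, 110.9, 90.0, 0.98], width=100, height=80))  # (0, 30, 100, 80)
```

Inside the detection loop, one would call `box = clamp_bbox(face.bbox, image.shape[1], image.shape[0])`, `continue` when it returns `None`, and slice with the returned coordinates.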
|
||||
|
||||
### Face Parsing
|
||||
|
||||
```python
|
||||
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps
import cv2
import numpy as np

# Initialize parser
parser = BiSeNet() # Uses ResNet18 by default

# Parse a face image (already cropped, BGR as loaded by OpenCV)
mask = parser.parse(face_image)

# Visualize with overlay
face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
|
||||
vis_result = vis_parsing_maps(face_rgb, mask, save_image=False)
|
||||
|
||||
# mask contains 19 classes: skin, eyes, nose, mouth, hair, etc.
|
||||
print(f"Unique classes: {len(np.unique(mask))}")
|
||||
```
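The snippet only reports how many distinct classes appear in `mask`. For per-component statistics, the following sketch counts pixels per class id and builds a binary mask for selected ids. It uses plain Python over a list-of-rows mask so it runs without NumPy. With NumPy arrays, `np.bincount(mask.ravel(), minlength=19)` and `np.isin(mask, ids)` give the same results. The class ids in the toy mask are placeholders: this README does not list the id-to-component mapping (which id is skin, hair, and so on), and none is assumed here.

```python
from collections import Counter

NUM_CLASSES = 19  # the README states the parser's mask has 19 classes


def class_pixel_counts(mask):
    """Return a list where index k holds the number of pixels labelled with class id k.

    Ids outside 0..NUM_CLASSES-1 are not reported.
    """
    counts = Counter(label for row in mask for label in row)
    return [counts.get(k, 0) for k in range(NUM_CLASSES)]


def select_classes(mask, class_ids):
    """Binary mask (1/0) that is 1 wherever the pixel's class id is in class_ids."""
    wanted = set(class_ids)
    return [[1 if label in wanted else 0 for label in row] for row in mask]


# Toy 3x4 mask; ids 0, 1 and 17 are placeholders, not a documented mapping.
toy_mask = [
    [0, 0, 1, 1],
    [0, 17, 17, 1],
    [0, 0, 17, 0],
]
counts = class_pixel_counts(toy_mask)
print(counts[0], counts[1], counts[17])  # 6 3 3
print(select_classes(toy_mask, [17]))  # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0]]
```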
|
||||
|
||||
### Face Anti-Spoofing
|
||||
|
||||
Detect whether a face is real or a spoof (printed photo, video replay, or mask):
|
||||
|
||||
```python
|
||||
from uniface import RetinaFace
|
||||
from uniface.spoofing import MiniFASNet
|
||||
|
||||
detector = RetinaFace()
|
||||
spoofer = MiniFASNet() # Uses V2 by default
|
||||
|
||||
faces = detector.detect(image)
|
||||
for face in faces:
|
||||
label_idx, score = spoofer.predict(image, face.bbox)
|
||||
# label_idx: 0 = Fake, 1 = Real
|
||||
label = 'Real' if label_idx == 1 else 'Fake'
|
||||
print(f"{label}: {score:.1%}")
|
||||
```
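`spoofer.predict` returns one `(label_idx, score)` pair per face per frame. On video, one noisy frame can flip the decision, so a common approach is to aggregate over several frames. The following sketch rests on assumptions that are not confirmed UniFace behavior. First, `score` is treated as the confidence of the *predicted* label (the snippet prints it as a percentage) and converted to a probability of "Real". Second, the list of per-frame results and the 0.5 threshold are illustrative choices, not library defaults. `real_probability` and `aggregate_liveness` are hypothetical helpers.

```python
def real_probability(label_idx, score):
    """Convert one (label_idx, score) pair to P(real).

    ASSUMPTION: score is the confidence of the predicted label, with 0 = Fake, 1 = Real.
    """
    return score if label_idx == 1 else 1.0 - score


def aggregate_liveness(predictions, threshold=0.5):
    """Average P(real) over several frames; return ('Real' or 'Fake', mean probability)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    probs = [real_probability(label, score) for label, score in predictions]
    mean_prob = sum(probs) / len(probs)
    return ("Real" if mean_prob >= threshold else "Fake"), mean_prob


# Four per-frame results for the same face; one frame was classified as Fake.
frames = [(1, 0.92), (1, 0.88), (0, 0.61), (1, 0.95)]
label, prob = aggregate_liveness(frames)
print(label, round(prob, 3))  # Real 0.785
```

Averaging probabilities rather than taking a majority vote keeps low-confidence frames from counting as much as confident ones.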
|
||||
|
||||
### Face Anonymization
|
||||
|
||||
Protect privacy by blurring, pixelating, or masking faces, using one of five methods:
|
||||
|
||||
```python
|
||||
from uniface import RetinaFace
|
||||
from uniface.privacy import BlurFace, anonymize_faces
|
||||
import cv2
|
||||
|
||||
# Method 1: One-liner with automatic detection
|
||||
image = cv2.imread("photo.jpg")
|
||||
anonymized = anonymize_faces(image, method='pixelate')
|
||||
cv2.imwrite("anonymized.jpg", anonymized)
|
||||
|
||||
# Method 2: Manual control with custom parameters
|
||||
detector = RetinaFace()
|
||||
blurrer = BlurFace(method='gaussian', blur_strength=5.0)
|
||||
|
||||
faces = detector.detect(image)
|
||||
anonymized = blurrer.anonymize(image, faces)
|
||||
|
||||
# Available blur methods:
|
||||
methods = {
|
||||
'pixelate': BlurFace(method='pixelate', pixel_blocks=10), # Blocky effect (news media standard)
|
||||
'gaussian': BlurFace(method='gaussian', blur_strength=3.0), # Smooth, natural blur
|
||||
'blackout': BlurFace(method='blackout', color=(0, 0, 0)), # Solid color boxes (maximum privacy)
|
||||
'elliptical': BlurFace(method='elliptical', margin=20), # Soft oval blur (natural face shape)
|
||||
'median': BlurFace(method='median', blur_strength=3.0) # Edge-preserving blur
|
||||
}
|
||||
|
||||
# Real-time webcam anonymization
|
||||
cap = cv2.VideoCapture(0)
|
||||
detector = RetinaFace()
|
||||
blurrer = BlurFace(method='pixelate')
|
||||
|
||||
while True:
|
||||
ret, frame = cap.read()
|
||||
if not ret:
|
||||
break
|
||||
|
||||
faces = detector.detect(frame)
|
||||
frame = blurrer.anonymize(frame, faces, inplace=True)
|
||||
|
||||
cv2.imshow('Anonymized', frame)
|
||||
if cv2.waitKey(1) & 0xFF == ord('q'):
|
||||
break
|
||||
|
||||
cap.release()
|
||||
cv2.destroyAllWindows()
|
||||
```
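The `pixelate` method produces a blocky effect. The sketch below illustrates the general idea, not `BlurFace`'s actual implementation: split the face region into a grid of cells and fill each cell with the mean of its pixels. It works on a grayscale region given as a list of rows, and `pixelate_region` with its `blocks` parameter is hypothetical. With OpenCV, the same effect is commonly produced by downscaling the region with `cv2.resize(..., interpolation=cv2.INTER_LINEAR)` and upscaling it back with `cv2.INTER_NEAREST`.

```python
def pixelate_region(region, blocks):
    """Pixelate a grayscale region (list of equal-length rows) into at most blocks x blocks cells.

    Each cell is filled with the integer mean of its pixels; the input is not modified.
    """
    height, width = len(region), len(region[0])
    cell_h = max(1, -(-height // blocks))  # ceiling division
    cell_w = max(1, -(-width // blocks))
    out = [row[:] for row in region]
    for top in range(0, height, cell_h):
        for left in range(0, width, cell_w):
            ys = range(top, min(top + cell_h, height))
            xs = range(left, min(left + cell_w, width))
            mean = sum(region[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


region = [
    [0, 10, 200, 220],
    [20, 30, 240, 250],
    [5, 5, 90, 90],
    [5, 5, 110, 110],
]
for row in pixelate_region(region, blocks=2):
    print(row)
# [15, 15, 227, 227]
# [15, 15, 227, 227]
# [5, 5, 100, 100]
# [5, 5, 100, 100]
```

Fewer blocks means larger cells and stronger anonymization. This is the trade-off that a block-count parameter such as `pixel_blocks` controls.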
|
||||
|
||||
|
||||
---
|
||||
|
||||
## Documentation
|
||||
@@ -170,16 +292,20 @@ print(f"{gender}, {age} years old")
|
||||
from uniface.detection import RetinaFace, SCRFD
|
||||
from uniface.recognition import ArcFace
|
||||
from uniface.landmark import Landmark106
|
||||
from uniface.privacy import BlurFace, anonymize_faces
|
||||
|
||||
from uniface.constants import SCRFDWeights
|
||||
|
||||
# Create detector with default settings
|
||||
detector = RetinaFace()
|
||||
|
||||
# Create with custom config
|
||||
detector = SCRFD(
|
||||
model_name='scrfd_10g_kps',
|
||||
conf_thresh=0.8,
|
||||
model_name=SCRFDWeights.SCRFD_10G_KPS, # SCRFDWeights.SCRFD_500M_KPS
|
||||
conf_thresh=0.4,
|
||||
input_size=(640, 640)
|
||||
)
|
||||
# Or with default settings: detector = SCRFD()
|
||||
|
||||
# Recognition and landmarks
|
||||
recognizer = ArcFace()
|
||||
@@ -198,6 +324,7 @@ detector = RetinaFace(
|
||||
conf_thresh=0.5,
|
||||
nms_thresh=0.4
|
||||
)
|
||||
# Or detector = RetinaFace()
|
||||
|
||||
# YOLOv5-Face detection
|
||||
detector = YOLOv5Face(
|
||||
@@ -205,6 +332,7 @@ detector = YOLOv5Face(
|
||||
conf_thresh=0.6,
|
||||
nms_thresh=0.5
|
||||
)
|
||||
# Or detector = YOLOv5Face()
|
||||
|
||||
# Recognition
|
||||
recognizer = ArcFace() # Uses default weights
|
||||
@@ -229,7 +357,7 @@ faces = detect_faces(image, method='retinaface', conf_thresh=0.8) # methods: re
|
||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------- |
|
||||
| `RetinaFace` | `model_name=RetinaFaceWeights.MNET_V2`, `conf_thresh=0.5`, `nms_thresh=0.4`, `input_size=(640, 640)`, `dynamic_size=False` | Supports 5-point landmarks |
|
||||
| `SCRFD` | `model_name=SCRFDWeights.SCRFD_10G_KPS`, `conf_thresh=0.5`, `nms_thresh=0.4`, `input_size=(640, 640)` | Supports 5-point landmarks |
|
||||
| `YOLOv5Face` | `model_name=YOLOv5FaceWeights.YOLOV5S`, `conf_thresh=0.6`, `nms_thresh=0.5`, `input_size=640` (fixed) | Landmarks supported;`input_size` must be 640 |
|
||||
| `YOLOv5Face` | `model_name=YOLOv5FaceWeights.YOLOV5S`, `conf_thresh=0.6`, `nms_thresh=0.5`, `input_size=640` (fixed) | Supports 5-point landmarks; models: YOLOV5N/S/M; `input_size` must be 640 |
|
||||
|
||||
**Recognition**
|
||||
|
||||
@@ -247,6 +375,24 @@ faces = detect_faces(image, method='retinaface', conf_thresh=0.8) # methods: re
|
||||
| `AgeGender` | `model_name=AgeGenderWeights.DEFAULT`; `input_size` auto-detected | Requires bbox; ONNXRuntime |
|
||||
| `Emotion` | `model_weights=DDAMFNWeights.AFFECNET7`, `input_size=(112, 112)` | Requires 5-point landmarks; TorchScript |
|
||||
|
||||
**Gaze Estimation**
|
||||
|
||||
| Class | Key params (defaults) | Notes |
|
||||
| ------------- | ------------------------------------------ | ------------------------------------ |
|
||||
| `MobileGaze` | `model_name=GazeWeights.RESNET34` | Returns (pitch, yaw) angles in radians; trained on Gaze360 |
|
||||
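The table notes that `MobileGaze` returns `(pitch, yaw)` in radians. The following sketch converts the pair to degrees and to a 3D unit direction vector. The axis convention is an ASSUMPTION borrowed from common gaze-estimation codebases: pitch = asin(-y), yaw = atan2(-x, -z), with the camera looking along the z axis. Check it against the model's training convention (Gaze360) before relying on the signs. The helper names are hypothetical.

```python
import math


def pitchyaw_to_degrees(pitch, yaw):
    """Convert (pitch, yaw) from radians to degrees."""
    return math.degrees(pitch), math.degrees(yaw)


def pitchyaw_to_vector(pitch, yaw):
    """Unit 3D gaze direction under the ASSUMED convention described above."""
    x = -math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return x, y, z


def vector_to_pitchyaw(x, y, z):
    """Inverse of pitchyaw_to_vector (valid for |pitch| < 90 degrees)."""
    return math.asin(-y), math.atan2(-x, -z)


pitch, yaw = math.radians(10.0), math.radians(-25.0)
vec = pitchyaw_to_vector(pitch, yaw)
print([round(c, 3) for c in vec])
print([round(a, 1) for a in pitchyaw_to_degrees(*vector_to_pitchyaw(*vec))])  # [10.0, -25.0]
```

The round trip through `vector_to_pitchyaw` is a quick way to confirm that a chosen convention is self-consistent before using it to draw arrows or compare against ground-truth vectors.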
|
||||
**Face Parsing**
|
||||
|
||||
| Class | Key params (defaults) | Notes |
|
||||
| ---------- | ---------------------------------------- | ------------------------------------ |
|
||||
| `BiSeNet` | `model_name=ParsingWeights.RESNET18`, `input_size=(512, 512)` | 19 facial component classes; BiSeNet architecture with ResNet backbone |
|
||||
|
||||
**Anti-Spoofing**
|
||||
|
||||
| Class | Key params (defaults) | Notes |
|
||||
| ------------- | ----------------------------------------- | ------------------------------------ |
|
||||
| `MiniFASNet` | `model_name=MiniFASNetWeights.V2` | Returns (label_idx, score); 0=Fake, 1=Real |
|
||||
|
||||
---
|
||||
|
||||
## Model Performance
|
||||
@@ -260,6 +406,7 @@ faces = detect_faces(image, method='retinaface', conf_thresh=0.8) # methods: re
|
||||
| retinaface_r34 | 94.16% | 93.12% | 88.90% | High accuracy |
|
||||
| scrfd_500m | 90.57% | 88.12% | 68.51% | Real-time applications |
|
||||
| scrfd_10g | 95.16% | 93.87% | 83.05% | Best accuracy/speed |
|
||||
| yolov5n_face | 93.61% | 91.52% | 80.53% | Lightweight/Mobile |
|
||||
| yolov5s_face | 94.33% | 92.61% | 83.15% | Real-time + accuracy |
|
||||
| yolov5m_face | 95.30% | 93.76% | 85.28% | High accuracy |
|
||||
|
||||
@@ -292,6 +439,9 @@ Interactive examples covering common face analysis tasks:
|
||||
| **Face Recognition** | Extract face embeddings and compare faces | [face_analyzer.ipynb](examples/face_analyzer.ipynb) |
|
||||
| **Face Verification** | Compare two faces to verify identity | [face_verification.ipynb](examples/face_verification.ipynb) |
|
||||
| **Face Search** | Find a person in a group photo | [face_search.ipynb](examples/face_search.ipynb) |
|
||||
| **Face Parsing** | Segment face into semantic components | [face_parsing.ipynb](examples/face_parsing.ipynb) |
|
||||
| **Face Anonymization** | Blur or pixelate faces for privacy protection | [face_anonymization.ipynb](examples/face_anonymization.ipynb) |
|
||||
| **Gaze Estimation** | Estimate gaze direction from face images | [gaze_estimation.ipynb](examples/gaze_estimation.ipynb) |
|
||||
|
||||
### Webcam Face Detection
|
||||
|
||||
@@ -311,9 +461,9 @@ while True:
|
||||
faces = detector.detect(frame)
|
||||
|
||||
# Extract data for visualization
|
||||
bboxes = [f['bbox'] for f in faces]
|
||||
scores = [f['confidence'] for f in faces]
|
||||
landmarks = [f['landmarks'] for f in faces]
|
||||
bboxes = [f.bbox for f in faces]
|
||||
scores = [f.confidence for f in faces]
|
||||
landmarks = [f.landmarks for f in faces]
|
||||
|
||||
draw_detections(
|
||||
image=frame,
|
||||
@@ -347,7 +497,7 @@ for person_id, image_path in person_images.items():
|
||||
faces = detector.detect(image)
|
||||
if faces:
|
||||
embedding = recognizer.get_normalized_embedding(
|
||||
image, faces[0]['landmarks']
|
||||
image, faces[0].landmarks
|
||||
)
|
||||
database[person_id] = embedding
|
||||
|
||||
@@ -356,7 +506,7 @@ query_image = cv2.imread("query.jpg")
|
||||
query_faces = detector.detect(query_image)
|
||||
if query_faces:
|
||||
query_embedding = recognizer.get_normalized_embedding(
|
||||
query_image, query_faces[0]['landmarks']
|
||||
query_image, query_faces[0].landmarks
|
||||
)
|
||||
|
||||
# Find best match
|
||||
@@ -482,7 +632,11 @@ uniface/
|
||||
│ ├── detection/ # Face detection models
|
||||
│ ├── recognition/ # Face recognition models
|
||||
│ ├── landmark/ # Landmark detection
|
||||
│ ├── parsing/ # Face parsing
|
||||
│ ├── gaze/ # Gaze estimation
|
||||
│ ├── attribute/ # Age, gender, emotion
|
||||
│ ├── spoofing/ # Face anti-spoofing
|
||||
│ ├── privacy/ # Face anonymization & blurring
|
||||
│ ├── onnx_utils.py # ONNX Runtime utilities
|
||||
│ ├── model_store.py # Model download & caching
|
||||
│ └── visualization.py # Drawing utilities
|
||||
@@ -498,6 +652,9 @@ uniface/
|
||||
- **RetinaFace Training**: [yakhyo/retinaface-pytorch](https://github.com/yakhyo/retinaface-pytorch) - PyTorch implementation and training code
|
||||
- **YOLOv5-Face ONNX**: [yakhyo/yolov5-face-onnx-inference](https://github.com/yakhyo/yolov5-face-onnx-inference) - ONNX inference implementation
|
||||
- **Face Recognition Training**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition) - ArcFace, MobileFace, SphereFace training code
|
||||
- **Face Parsing Training**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing) - BiSeNet face parsing training code and pretrained weights
|
||||
- **Gaze Estimation Training**: [yakhyo/gaze-estimation](https://github.com/yakhyo/gaze-estimation) - MobileGaze training code and pretrained weights
|
||||
- **Face Anti-Spoofing**: [yakhyo/face-anti-spoofing](https://github.com/yakhyo/face-anti-spoofing) - MiniFASNet ONNX inference (weights from [minivision-ai/Silent-Face-Anti-Spoofing](https://github.com/minivision-ai/Silent-Face-Anti-Spoofing))
|
||||
- **InsightFace**: [deepinsight/insightface](https://github.com/deepinsight/insightface) - Model architectures and pretrained weights
|
||||
|
||||
## Contributing
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -13,9 +13,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install -q uniface"
|
||||
]
|
||||
@@ -29,14 +37,14 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"1.3.0\n"
|
||||
"1.6.0\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -65,7 +73,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -95,7 +103,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -136,7 +144,7 @@
|
||||
"\n",
|
||||
" # Print face attributes\n",
|
||||
" for i, face in enumerate(faces, 1):\n",
|
||||
" print(f' Face {i}: {face.gender}, {face.age}y')\n",
|
||||
" print(f' Face {i}: {face.sex}, {face.age}y')\n",
|
||||
"\n",
|
||||
" # Prepare visualization (without text overlay)\n",
|
||||
" vis_image = image.copy()\n",
|
||||
@@ -159,7 +167,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -186,7 +194,7 @@
|
||||
" axes[1, idx].axis('off')\n",
|
||||
" info_text = f'{len(faces)} face(s)\\n'\n",
|
||||
" for i, face in enumerate(faces, 1):\n",
|
||||
" info_text += f'Face {i}: {face.gender}, {face.age}y\\n'\n",
|
||||
" info_text += f'Face {i}: {face.sex}, {face.age}y\\n'\n",
|
||||
"\n",
|
||||
" axes[1, idx].text(0.5, 0.5, info_text,\n",
|
||||
" ha='center', va='center',\n",
|
||||
@@ -207,7 +215,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -236,7 +244,7 @@
|
||||
" print(f' - Confidence: {face.confidence:.3f}')\n",
|
||||
" print(f' - Landmarks shape: {face.landmarks.shape}')\n",
|
||||
" print(f' - Age: {face.age} years')\n",
|
||||
" print(f' - Gender: {face.gender}')\n",
|
||||
" print(f' - Gender: {face.sex}')\n",
|
||||
" print(f' - Embedding shape: {face.embedding.shape}')\n",
|
||||
" print(f' - Embedding dimension: {face.embedding.shape[1]}D')"
|
||||
]
|
||||
@@ -252,14 +260,14 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Similarity between faces: 0.1201\n",
|
||||
"Similarity between faces: 0.1135\n",
|
||||
"Same person: No (threshold=0.6)\n"
|
||||
]
|
||||
}
|
||||
@@ -283,7 +291,7 @@
|
||||
"\n",
|
||||
"- `analyzer.analyze()` performs detection, recognition, and attribute prediction in one call\n",
|
||||
"- Each `Face` object contains: `bbox`, `confidence`, `landmarks`, `embedding`, `age`, `gender`\n",
|
||||
"- Gender is available as both ID (0=Female, 1=Male) and string via `face.gender` property\n",
|
||||
"- Gender is available as both ID (0=Female, 1=Male) and string via `face.sex` property\n",
|
||||
"- Face embeddings are L2-normalized (norm ≈ 1.0) for similarity computation\n",
|
||||
"- Use `face.compute_similarity(other_face)` to compare faces (returns cosine similarity)\n",
|
||||
"- Typical similarity threshold: 0.6 (same person if similarity > 0.6)"
|
||||
@@ -297,7 +305,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"display_name": "base",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@@ -311,7 +319,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.0"
|
||||
"version": "3.13.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
327
examples/face_anonymization.ipynb
Normal file
File diff suppressed because one or more lines are too long
@@ -13,9 +13,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install -q uniface"
|
||||
]
|
||||
@@ -29,14 +37,14 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"1.3.0\n"
|
||||
"1.6.0\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -61,7 +69,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -88,7 +96,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -99,7 +107,7 @@
|
||||
"<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1024x624>"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -119,7 +127,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -145,14 +153,14 @@
|
||||
"# Load image\n",
|
||||
"image = cv2.imread(image_path)\n",
|
||||
"\n",
|
||||
"# Detect faces - returns list of face dictionaries\n",
|
||||
"# Detect faces - returns list of Face objects\n",
|
||||
"faces = detector.detect(image)\n",
|
||||
"print(f'Detected {len(faces)} face(s)')\n",
|
||||
"\n",
|
||||
"# Unpack face data for visualization\n",
|
||||
"bboxes = [f['bbox'] for f in faces]\n",
|
||||
"scores = [f['confidence'] for f in faces]\n",
|
||||
"landmarks = [f['landmarks'] for f in faces]\n",
|
||||
"bboxes = [f.bbox for f in faces]\n",
|
||||
"scores = [f.confidence for f in faces]\n",
|
||||
"landmarks = [f.landmarks for f in faces]\n",
|
||||
"\n",
|
||||
"# Draw detections\n",
|
||||
"draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=0.6, fancy_bbox=True)\n",
|
||||
@@ -175,7 +183,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -203,9 +211,9 @@
|
||||
"faces = detector.detect(image, max_num=2)\n",
|
||||
"print(f'Detected {len(faces)} face(s)')\n",
|
||||
"\n",
|
||||
"bboxes = [f['bbox'] for f in faces]\n",
|
||||
"scores = [f['confidence'] for f in faces]\n",
|
||||
"landmarks = [f['landmarks'] for f in faces]\n",
|
||||
"bboxes = [f.bbox for f in faces]\n",
|
||||
"scores = [f.confidence for f in faces]\n",
|
||||
"landmarks = [f.landmarks for f in faces]\n",
|
||||
"\n",
|
||||
"draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=0.6, fancy_bbox=True)\n",
|
||||
"\n",
|
||||
@@ -222,7 +230,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -250,9 +258,9 @@
|
||||
"faces = detector.detect(image, max_num=5)\n",
|
||||
"print(f'Detected {len(faces)} face(s)')\n",
|
||||
"\n",
|
||||
"bboxes = [f['bbox'] for f in faces]\n",
|
||||
"scores = [f['confidence'] for f in faces]\n",
|
||||
"landmarks = [f['landmarks'] for f in faces]\n",
|
||||
"bboxes = [f.bbox for f in faces]\n",
|
||||
"scores = [f.confidence for f in faces]\n",
|
||||
"landmarks = [f.landmarks for f in faces]\n",
|
||||
"\n",
|
||||
"draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=0.6, fancy_bbox=True)\n",
|
||||
"\n",
|
||||
@@ -266,7 +274,8 @@
|
||||
"source": [
|
||||
"## Notes\n",
|
||||
"\n",
|
||||
"- `detect()` returns a list of dictionaries with keys: `bbox`, `confidence`, `landmarks`\n",
|
||||
"- `detect()` returns a list of `Face` objects with attributes: `bbox`, `confidence`, `landmarks`\n",
|
||||
"- Access attributes using dot notation: `face.bbox`, `face.confidence`, `face.landmarks`\n",
|
||||
"- Adjust `conf_thresh` and `nms_thresh` for your use case\n",
|
||||
"- Use `max_num` to limit detected faces"
|
||||
]
|
||||
|
||||
387
examples/face_parsing.ipynb
Normal file
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@@ -11,15 +11,6 @@
|
||||
"## 1. Install UniFace"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -q uniface"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
@@ -29,7 +20,24 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"1.3.0\n"
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install -q uniface"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"1.6.0\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -56,7 +64,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -72,12 +80,12 @@
|
||||
"analyzer = FaceAnalyzer(\n",
|
||||
" detector=RetinaFace(conf_thresh=0.5),\n",
|
||||
" recognizer=ArcFace()\n",
|
||||
")\n"
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -99,12 +107,12 @@
|
||||
"faces1 = analyzer.analyze(image1)\n",
|
||||
"faces2 = analyzer.analyze(image2)\n",
|
||||
"\n",
|
||||
"print(f'Detected {len(faces1)} and {len(faces2)} faces')\n"
|
||||
"print(f'Detected {len(faces1)} and {len(faces2)} faces')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -130,31 +138,7 @@
|
||||
"axes[1].axis('off')\n",
|
||||
"\n",
|
||||
"plt.tight_layout()\n",
|
||||
"plt.show()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Similarity: 0.1201\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"if faces1 and faces2:\n",
|
||||
" face1 = faces1[0]\n",
|
||||
" face2 = faces2[0]\n",
|
||||
"\n",
|
||||
" similarity = face1.compute_similarity(face2)\n",
|
||||
" print(f'Similarity: {similarity:.4f}')\n",
|
||||
"else:\n",
|
||||
" print('Error: Could not detect faces')\n"
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -166,7 +150,31 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Similarity: 0.1201\n",
|
||||
"Similarity: 0.1135\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"if faces1 and faces2:\n",
|
||||
" face1 = faces1[0]\n",
|
||||
" face2 = faces2[0]\n",
|
||||
"\n",
|
||||
" similarity = face1.compute_similarity(face2)\n",
|
||||
" print(f'Similarity: {similarity:.4f}')\n",
|
||||
"else:\n",
|
||||
" print('Error: Could not detect faces')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Similarity: 0.1135\n",
|
||||
"Threshold: 0.6\n",
|
||||
"Result: Different people\n"
|
||||
]
|
||||
@@ -180,12 +188,12 @@
|
||||
"\n",
|
||||
" print(f'Similarity: {similarity:.4f}')\n",
|
||||
" print(f'Threshold: {THRESHOLD}')\n",
|
||||
" print(f'Result: {\"Same person\" if is_same_person else \"Different people\"}')\n"
|
||||
" print(f'Result: {\"Same person\" if is_same_person else \"Different people\"}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -193,9 +201,9 @@
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Comparing multiple pairs:\n",
|
||||
"image0.jpg vs image1.jpg: 0.1201\n",
|
||||
"image0.jpg vs image2.jpg: 0.0951\n",
|
||||
"image1.jpg vs image2.jpg: -0.0047\n"
|
||||
"image0.jpg vs image1.jpg: 0.1135\n",
|
||||
"image0.jpg vs image2.jpg: 0.0833\n",
|
||||
"image1.jpg vs image2.jpg: -0.0082\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -220,7 +228,7 @@
|
||||
" img1_name = img1_path.split('/')[-1]\n",
|
||||
" img2_name = img2_path.split('/')[-1]\n",
|
||||
"\n",
|
||||
" print(f'{img1_name} vs {img2_name}: {sim:.4f}')\n"
|
||||
" print(f'{img1_name} vs {img2_name}: {sim:.4f}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
270
examples/gaze_estimation.ipynb
Normal file
File diff suppressed because one or more lines are too long
@@ -1,7 +1,7 @@
|
||||
[project]
|
||||
name = "uniface"
|
||||
version = "1.3.0"
|
||||
description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Age, and Gender Detection"
|
||||
version = "1.6.0"
|
||||
description = "UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Face Parsing, Gaze Estimation, Age, and Gender Detection"
|
||||
readme = "README.md"
|
||||
license = { text = "MIT" }
|
||||
authors = [{ name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" }]
|
||||
@@ -9,11 +9,14 @@ maintainers = [
|
||||
{ name = "Yakhyokhuja Valikhujaev", email = "yakhyo9696@gmail.com" },
|
||||
]
|
||||
|
||||
requires-python = ">=3.10,<3.14"
|
||||
requires-python = ">=3.11,<3.14"
|
||||
keywords = [
|
||||
"face-detection",
|
||||
"face-recognition",
|
||||
"facial-landmarks",
|
||||
"face-parsing",
|
||||
"face-segmentation",
|
||||
"gaze-estimation",
|
||||
"age-detection",
|
||||
"gender-detection",
|
||||
"computer-vision",
|
||||
@@ -21,6 +24,7 @@ keywords = [
|
||||
"onnx",
|
||||
"onnxruntime",
|
||||
"face-analysis",
|
||||
"bisenet",
|
||||
]
|
||||
|
||||
classifiers = [
|
||||
@@ -30,7 +34,6 @@ classifiers = [
|
||||
"License :: OSI Approved :: MIT License",
|
||||
"Operating System :: OS Independent",
|
||||
"Programming Language :: Python :: 3",
|
||||
"Programming Language :: Python :: 3.10",
|
||||
"Programming Language :: Python :: 3.11",
|
||||
"Programming Language :: Python :: 3.12",
|
||||
"Programming Language :: Python :: 3.13",
|
||||
@@ -69,7 +72,7 @@ uniface = ["py.typed"]
|
||||
|
||||
[tool.ruff]
|
||||
line-length = 120
|
||||
target-version = "py310"
|
||||
target-version = "py311"
|
||||
exclude = [
|
||||
".git",
|
||||
".ruff_cache",
|
||||
|
||||
@@ -7,8 +7,10 @@ Scripts for testing UniFace features.
|
||||
| Script | Description |
|
||||
|--------|-------------|
|
||||
| `run_detection.py` | Face detection on image or webcam |
|
||||
| `run_anonymization.py` | Face anonymization/blurring for privacy |
|
||||
| `run_age_gender.py` | Age and gender prediction |
|
||||
| `run_emotion.py` | Emotion detection (7 or 8 emotions) |
|
||||
| `run_gaze_estimation.py` | Gaze direction estimation |
|
||||
| `run_landmarks.py` | 106-point facial landmark detection |
|
||||
| `run_recognition.py` | Face embedding extraction and comparison |
|
||||
| `run_face_analyzer.py` | Complete face analysis (detection + recognition + attributes) |
|
||||
@@ -25,6 +27,11 @@ Scripts for testing UniFace features.
|
||||
python scripts/run_detection.py --image assets/test.jpg
|
||||
python scripts/run_detection.py --webcam
|
||||
|
||||
# Face anonymization
|
||||
python scripts/run_anonymization.py --image assets/test.jpg --method pixelate
|
||||
python scripts/run_anonymization.py --webcam --method gaussian
|
||||
python scripts/run_anonymization.py --image photo.jpg --method pixelate --pixel-blocks 5
|
||||
|
||||
# Age and gender
|
||||
python scripts/run_age_gender.py --image assets/test.jpg
|
||||
python scripts/run_age_gender.py --webcam
|
||||
@@ -33,6 +40,10 @@ python scripts/run_age_gender.py --webcam
|
||||
python scripts/run_emotion.py --image assets/test.jpg
|
||||
python scripts/run_emotion.py --webcam
|
||||
|
||||
# Gaze estimation
|
||||
python scripts/run_gaze_estimation.py --image assets/test.jpg
|
||||
python scripts/run_gaze_estimation.py --webcam
|
||||
|
||||
# Landmarks
|
||||
python scripts/run_landmarks.py --image assets/test.jpg
|
||||
python scripts/run_landmarks.py --webcam
|
||||
|
||||
@@ -31,7 +31,9 @@ def process_image(detector, image_path: Path, output_path: Path, threshold: floa
|
||||
bboxes = [f['bbox'] for f in faces]
|
||||
scores = [f['confidence'] for f in faces]
|
||||
landmarks = [f['landmarks'] for f in faces]
|
||||
draw_detections(image, bboxes, scores, landmarks, vis_threshold=threshold)
|
||||
draw_detections(
|
||||
image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
|
||||
)
|
||||
|
||||
cv2.putText(
|
||||
image,
|
||||
|
||||
@@ -43,7 +43,9 @@ def process_image(
|
||||
bboxes = [f['bbox'] for f in faces]
|
||||
scores = [f['confidence'] for f in faces]
|
||||
landmarks = [f['landmarks'] for f in faces]
|
||||
draw_detections(image, bboxes, scores, landmarks, vis_threshold=threshold)
|
||||
draw_detections(
|
||||
image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
|
||||
)
|
||||
|
||||
for i, face in enumerate(faces):
|
||||
gender_id, age = age_gender.predict(image, face['bbox'])
|
||||
@@ -77,7 +79,9 @@ def run_webcam(detector, age_gender, threshold: float = 0.6):
|
||||
bboxes = [f['bbox'] for f in faces]
|
||||
scores = [f['confidence'] for f in faces]
|
||||
landmarks = [f['landmarks'] for f in faces]
|
||||
draw_detections(frame, bboxes, scores, landmarks, vis_threshold=threshold)
|
||||
draw_detections(
|
||||
image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
|
||||
)
|
||||
|
||||
for face in faces:
|
||||
gender_id, age = age_gender.predict(frame, face['bbox']) # predict per face
|
||||
|
||||
207
scripts/run_anonymization.py
Normal file
@@ -0,0 +1,207 @@
# Face anonymization/blurring for privacy
# Usage: python run_anonymization.py --image path/to/image.jpg --method pixelate
#        python run_anonymization.py --webcam --method gaussian

import argparse
import os

import cv2

from uniface import RetinaFace
from uniface.privacy import BlurFace


def process_image(
    detector,
    blurrer: BlurFace,
    image_path: str,
    save_dir: str = 'outputs',
    show_detections: bool = False,
):
    """Process a single image."""
    image = cv2.imread(image_path)
    if image is None:
        print(f"Error: Failed to load image from '{image_path}'")
        return

    # Detect faces
    faces = detector.detect(image)
    print(f'Detected {len(faces)} face(s)')

    # Optionally draw detection boxes before blurring
    if show_detections and faces:
        from uniface.visualization import draw_detections

        preview = image.copy()
        bboxes = [face['bbox'] for face in faces]
        scores = [face['confidence'] for face in faces]
        landmarks = [face['landmarks'] for face in faces]
        draw_detections(preview, bboxes, scores, landmarks)

        # Show preview
        cv2.imshow('Detections (Press any key to continue)', preview)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    # Anonymize faces
    if faces:
        anonymized = blurrer.anonymize(image, faces)
    else:
        anonymized = image

    # Save output
    os.makedirs(save_dir, exist_ok=True)
    basename = os.path.splitext(os.path.basename(image_path))[0]
    output_path = os.path.join(save_dir, f'{basename}_anonymized.jpg')
    cv2.imwrite(output_path, anonymized)
    print(f'Output saved: {output_path}')


def run_webcam(detector, blurrer: BlurFace):
    """Run real-time anonymization on webcam."""
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        print('Cannot open webcam')
        return

    print("Press 'q' to quit")

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv2.flip(frame, 1)  # mirror for natural interaction (only after a frame was read)

        # Detect and anonymize
        faces = detector.detect(frame)
        if faces:
            frame = blurrer.anonymize(frame, faces, inplace=True)

        # Display info
        cv2.putText(
            frame,
            f'Faces blurred: {len(faces)} | Method: {blurrer.method}',
            (10, 30),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.7,
            (0, 255, 0),
            2,
        )

        cv2.imshow('Face Anonymization (Press q to quit)', frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


def main():
    parser = argparse.ArgumentParser(
        description='Face anonymization using various blur methods',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Anonymize image with pixelation (default)
  python run_anonymization.py --image photo.jpg

  # Use Gaussian blur with custom strength
  python run_anonymization.py --image photo.jpg --method gaussian --blur-strength 5.0

  # Real-time webcam anonymization
  python run_anonymization.py --webcam --method pixelate

  # Black boxes for maximum privacy
  python run_anonymization.py --image photo.jpg --method blackout

  # Custom pixelation intensity
  python run_anonymization.py --image photo.jpg --method pixelate --pixel-blocks 5
        """,
    )

    # Input/output
    parser.add_argument('--image', type=str, help='Path to input image')
    parser.add_argument('--webcam', action='store_true', help='Use webcam for real-time anonymization')
    parser.add_argument('--save-dir', type=str, default='outputs', help='Output directory (default: outputs)')

    # Blur method
    parser.add_argument(
        '--method',
        type=str,
        default='pixelate',
        choices=['gaussian', 'pixelate', 'blackout', 'elliptical', 'median'],
        help='Blur method (default: pixelate)',
    )

    # Method-specific parameters
    parser.add_argument(
        '--blur-strength',
        type=float,
        default=3.0,
        help='Blur strength for gaussian/elliptical/median (default: 3.0)',
    )
    parser.add_argument(
        '--pixel-blocks',
        type=int,
        default=20,
        help='Number of pixel blocks for pixelate (default: 20, lower=more pixelated)',
    )
    parser.add_argument(
        '--color',
        type=str,
        default='0,0,0',
        help='Fill color for blackout as R,G,B (default: 0,0,0 for black)',
    )
    parser.add_argument('--margin', type=int, default=20, help='Margin for elliptical blur (default: 20)')

    # Detection
    parser.add_argument(
        '--conf-thresh',
        type=float,
        default=0.5,
        help='Detection confidence threshold (default: 0.5)',
    )

    # Visualization
    parser.add_argument(
        '--show-detections',
        action='store_true',
        help='Show detection boxes before blurring (image mode only)',
    )

    args = parser.parse_args()

    # Validate input
    if not args.image and not args.webcam:
        parser.error('Either --image or --webcam must be specified')

    # Parse color
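    try:
        color_values = [int(x) for x in args.color.split(',')]
    except ValueError:
        # Non-integer components (e.g. "red") would otherwise raise an uncaught ValueError
        parser.error('--color must be in format R,G,B (e.g., 0,0,0)')
    if len(color_values) != 3:
        parser.error('--color must be in format R,G,B (e.g., 0,0,0)')
    color = tuple(color_values)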

    # Initialize detector
    print(f'Initializing face detector (conf_thresh={args.conf_thresh})...')
    detector = RetinaFace(conf_thresh=args.conf_thresh)

    # Initialize blurrer
    print(f'Initializing blur method: {args.method}')
    blurrer = BlurFace(
        method=args.method,
        blur_strength=args.blur_strength,
        pixel_blocks=args.pixel_blocks,
        color=color,
        margin=args.margin,
    )

    # Run
    if args.webcam:
        run_webcam(detector, blurrer)
    else:
        process_image(detector, blurrer, args.image, args.save_dir, args.show_detections)


if __name__ == '__main__':
    main()
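The diff does not show how `BlurFace` implements `--method pixelate`. The help text only says that fewer `--pixel-blocks` give a coarser result. A common way to pixelate a region is to split it into a grid of cells and fill each cell with its mean value. The following is a minimal pure-Python sketch of that idea on a grayscale region. `pixelate_region` is a hypothetical helper and is not BlurFace's code.

```python
def pixelate_region(region: list[list[int]], blocks: int) -> list[list[int]]:
    """Hypothetical helper (not BlurFace's code): split a grayscale region into
    about `blocks` x `blocks` cells and fill each cell with its integer mean.
    Fewer blocks give a coarser mosaic."""
    h, w = len(region), len(region[0])
    blocks = max(1, min(blocks, h, w))  # at most one block per pixel
    # Integer cell edges from 0 to h (rows) and 0 to w (columns); every cell is non-empty.
    ys = [i * h // blocks for i in range(blocks + 1)]
    xs = [j * w // blocks for j in range(blocks + 1)]
    out = [row[:] for row in region]  # leave the input untouched
    for by in range(blocks):
        for bx in range(blocks):
            cells = [(y, x) for y in range(ys[by], ys[by + 1]) for x in range(xs[bx], xs[bx + 1])]
            mean = sum(region[y][x] for y, x in cells) // len(cells)
            for y, x in cells:
                out[y][x] = mean
    return out


img = [[0, 0, 10, 10], [0, 0, 10, 10], [20, 20, 30, 30], [20, 20, 40, 40]]
print(pixelate_region(img, 2))
```

For a real BGR crop, you would apply the same averaging per channel. With OpenCV, you can get a similar effect by shrinking the crop to `blocks x blocks` with `cv2.resize` and enlarging it back with `interpolation=cv2.INTER_NEAREST`.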
@@ -51,7 +51,15 @@ def run_webcam(detector, threshold: float = 0.6):
         bboxes = [f['bbox'] for f in faces]
         scores = [f['confidence'] for f in faces]
         landmarks = [f['landmarks'] for f in faces]
-        draw_detections(frame, bboxes, scores, landmarks, vis_threshold=threshold, draw_score=True, fancy_bbox=True)
+        draw_detections(
+            image=frame,
+            bboxes=bboxes,
+            scores=scores,
+            landmarks=landmarks,
+            vis_threshold=threshold,
+            draw_score=True,
+            fancy_bbox=True,
+        )

         cv2.putText(
             frame,
@@ -42,7 +42,9 @@ def process_image(
     bboxes = [f['bbox'] for f in faces]
     scores = [f['confidence'] for f in faces]
     landmarks = [f['landmarks'] for f in faces]
-    draw_detections(image, bboxes, scores, landmarks, vis_threshold=threshold)
+    draw_detections(
+        image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
+    )

     for i, face in enumerate(faces):
         emotion, confidence = emotion_predictor.predict(image, face['landmarks'])
@@ -16,8 +16,8 @@ def draw_face_info(image, face, face_id):
     """Draw face ID and attributes above bounding box."""
     x1, y1, x2, y2 = map(int, face.bbox)
     lines = [f'ID: {face_id}', f'Conf: {face.confidence:.2f}']
-    if face.age and face.gender:
-        lines.append(f'{face.gender}, {face.age}y')
+    if face.age and face.sex:
+        lines.append(f'{face.sex}, {face.age}y')

     for i, line in enumerate(lines):
         y_pos = y1 - 10 - (len(lines) - 1 - i) * 25
@@ -41,7 +41,7 @@ def process_image(analyzer, image_path: str, save_dir: str = 'outputs', show_sim
         return

     for i, face in enumerate(faces, 1):
-        info = f' Face {i}: {face.gender}, {face.age}y' if face.age and face.gender else f' Face {i}'
+        info = f' Face {i}: {face.sex}, {face.age}y' if face.age and face.sex else f' Face {i}'
         if face.embedding is not None:
             info += f' (embedding: {face.embedding.shape})'
         print(info)
@@ -82,7 +82,7 @@ def process_image(analyzer, image_path: str, save_dir: str = 'outputs', show_sim
     bboxes = [f.bbox for f in faces]
     scores = [f.confidence for f in faces]
     landmarks = [f.landmarks for f in faces]
-    draw_detections(image, bboxes, scores, landmarks)
+    draw_detections(image=image, bboxes=bboxes, scores=scores, landmarks=landmarks, fancy_bbox=True)

     for i, face in enumerate(faces, 1):
         draw_face_info(image, face, i)
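In `draw_face_info`, each label line is placed at `y_pos = y1 - 10 - (len(lines) - 1 - i) * 25`. The last line therefore sits 10 px above the box, and earlier lines stack upward in 25 px steps. The sketch below pulls that arithmetic into a hypothetical `label_positions` helper that mirrors the formula, so the spacing can be checked in isolation.

```python
def label_positions(y1: int, num_lines: int, gap: int = 10, line_height: int = 25) -> list[int]:
    """Hypothetical helper mirroring draw_face_info's formula: baseline y of each
    label line, with the last line `gap` px above the box top and earlier lines
    stacked upward in `line_height` steps."""
    return [y1 - gap - (num_lines - 1 - i) * line_height for i in range(num_lines)]


print(label_positions(200, 3))
print(label_positions(30, 3))
```

The second call shows a consequence of the formula. For a face near the top edge of the frame, the earliest lines get negative y values, so that text is drawn off-screen.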
126
scripts/run_face_parsing.py
Normal file
@@ -0,0 +1,126 @@
# Face parsing on detected faces
# Usage: python run_face_parsing.py --image path/to/image.jpg
#        python run_face_parsing.py --webcam

import argparse
import os
from pathlib import Path

import cv2

from uniface import RetinaFace
from uniface.constants import ParsingWeights
from uniface.parsing import BiSeNet
from uniface.visualization import vis_parsing_maps


def process_image(detector, parser, image_path: str, save_dir: str = 'outputs'):
    image = cv2.imread(image_path)
    if image is None:
        print(f"Error: Failed to load image from '{image_path}'")
        return

    faces = detector.detect(image)
    print(f'Detected {len(faces)} face(s)')

    result_image = image.copy()

    for i, face in enumerate(faces):
        bbox = face['bbox']
        x1, y1, x2, y2 = map(int, bbox[:4])
        face_crop = image[y1:y2, x1:x2]

        if face_crop.size == 0:
            continue

        # Parse the face
        mask = parser.parse(face_crop)
        print(f' Face {i + 1}: parsed with {len(set(mask.flatten()))} unique classes')

        # Visualize the parsing result
        face_crop_rgb = cv2.cvtColor(face_crop, cv2.COLOR_BGR2RGB)
        vis_result = vis_parsing_maps(face_crop_rgb, mask, save_image=False)

        # Place the visualization back on the original image
        result_image[y1:y2, x1:x2] = vis_result

        # Draw bounding box
        cv2.rectangle(result_image, (x1, y1), (x2, y2), (0, 255, 0), 2)

    os.makedirs(save_dir, exist_ok=True)
    output_path = os.path.join(save_dir, f'{Path(image_path).stem}_parsing.jpg')
    cv2.imwrite(output_path, result_image)
    print(f'Output saved: {output_path}')


def run_webcam(detector, parser):
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        print('Cannot open webcam')
        return

    print("Press 'q' to quit")

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame = cv2.flip(frame, 1)
        faces = detector.detect(frame)

        for face in faces:
            bbox = face['bbox']
            x1, y1, x2, y2 = map(int, bbox[:4])
            face_crop = frame[y1:y2, x1:x2]

            if face_crop.size == 0:
                continue

            # Parse the face
            mask = parser.parse(face_crop)

            # Visualize the parsing result
            face_crop_rgb = cv2.cvtColor(face_crop, cv2.COLOR_BGR2RGB)
            vis_result = vis_parsing_maps(face_crop_rgb, mask, save_image=False)

            # Place the visualization back on the frame
            frame[y1:y2, x1:x2] = vis_result

            # Draw bounding box
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('Face Parsing', frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


def main():
    parser_arg = argparse.ArgumentParser(description='Run face parsing')
    parser_arg.add_argument('--image', type=str, help='Path to input image')
    parser_arg.add_argument('--webcam', action='store_true', help='Use webcam')
    parser_arg.add_argument('--save_dir', type=str, default='outputs')
    parser_arg.add_argument(
        '--model', type=str, default=ParsingWeights.RESNET18, choices=[ParsingWeights.RESNET18, ParsingWeights.RESNET34]
    )
    args = parser_arg.parse_args()

    if not args.image and not args.webcam:
        parser_arg.error('Either --image or --webcam must be specified')

    detector = RetinaFace()
    parser = BiSeNet(model_name=ParsingWeights(args.model))  # use the model selected with --model

    if args.webcam:
        run_webcam(detector, parser)
    else:
        process_image(detector, parser, args.image, args.save_dir)


if __name__ == '__main__':
    main()
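Both `process_image` and `run_webcam` slice `image[y1:y2, x1:x2]` directly with the detector's coordinates. In NumPy basic slicing, a negative start index counts from the far end of the axis. A box that pokes slightly past the top or left edge can therefore give an empty crop, which the `face_crop.size == 0` guard silently skips. It can also give a crop taken from the opposite side of the image. A minimal sketch of clamping the box first is below. `clamp_bbox` is a hypothetical helper and is not part of uniface.

```python
def clamp_bbox(bbox, width: int, height: int):
    """Hypothetical helper: convert an [x1, y1, x2, y2, ...] box to ints and clip it
    to the image so it is safe to use as image[y1:y2, x1:x2].
    Returns None if the clipped box has no area."""
    x1, y1, x2, y2 = (int(v) for v in bbox[:4])
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    if x2 <= x1 or y2 <= y1:
        return None
    return x1, y1, x2, y2


print(clamp_bbox([-5.2, 10.0, 50.9, 60.1, 0.99], width=40, height=100))
```

In the loops above, `x1, y1, x2, y2 = map(int, bbox[:4])` could be replaced by a call like `clamp_bbox(bbox, image.shape[1], image.shape[0])`, followed by `continue` when it returns `None`.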
104
scripts/run_gaze_estimation.py
Normal file
@@ -0,0 +1,104 @@
# Gaze estimation on detected faces
# Usage: python run_gaze_estimation.py --image path/to/image.jpg
#        python run_gaze_estimation.py --webcam

import argparse
import os
from pathlib import Path

import cv2
import numpy as np

from uniface import RetinaFace
from uniface.gaze import MobileGaze
from uniface.visualization import draw_gaze


def process_image(detector, gaze_estimator, image_path: str, save_dir: str = 'outputs'):
    image = cv2.imread(image_path)
    if image is None:
        print(f"Error: Failed to load image from '{image_path}'")
        return

    faces = detector.detect(image)
    print(f'Detected {len(faces)} face(s)')

    for i, face in enumerate(faces):
        bbox = face['bbox']
        x1, y1, x2, y2 = map(int, bbox[:4])
        face_crop = image[y1:y2, x1:x2]

        if face_crop.size == 0:
            continue

        pitch, yaw = gaze_estimator.estimate(face_crop)
        print(f' Face {i + 1}: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°')

        # Draw both bbox and gaze arrow with angle text
        draw_gaze(image, bbox, pitch, yaw, draw_angles=True)

    os.makedirs(save_dir, exist_ok=True)
    output_path = os.path.join(save_dir, f'{Path(image_path).stem}_gaze.jpg')
    cv2.imwrite(output_path, image)
    print(f'Output saved: {output_path}')


def run_webcam(detector, gaze_estimator):
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        print('Cannot open webcam')
        return

    print("Press 'q' to quit")

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame = cv2.flip(frame, 1)
        faces = detector.detect(frame)

        for face in faces:
            bbox = face['bbox']
            x1, y1, x2, y2 = map(int, bbox[:4])
            face_crop = frame[y1:y2, x1:x2]

            if face_crop.size == 0:
                continue

            pitch, yaw = gaze_estimator.estimate(face_crop)
            # Draw both bbox and gaze arrow
            draw_gaze(frame, bbox, pitch, yaw)

        cv2.putText(frame, f'Faces: {len(faces)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('Gaze Estimation', frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


def main():
    parser = argparse.ArgumentParser(description='Run gaze estimation')
    parser.add_argument('--image', type=str, help='Path to input image')
    parser.add_argument('--webcam', action='store_true', help='Use webcam')
    parser.add_argument('--save_dir', type=str, default='outputs')
    args = parser.parse_args()

    if not args.image and not args.webcam:
        parser.error('Either --image or --webcam must be specified')

    detector = RetinaFace()
    gaze_estimator = MobileGaze()

    if args.webcam:
        run_webcam(detector, gaze_estimator)
    else:
        process_image(detector, gaze_estimator, args.image, args.save_dir)


if __name__ == '__main__':
    main()
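`draw_gaze` receives `pitch` and `yaw` in radians (the script converts them with `np.degrees` only for printing) and draws an arrow from the face. The diff does not show how that arrow is computed. Several open-source gaze demos project the gaze direction onto the image plane with `dx = -L*sin(yaw)*cos(pitch)` and `dy = -L*sin(pitch)`. The sketch below assumes that convention. The sign conventions and the hypothetical `gaze_arrow` helper are assumptions, not uniface's implementation.

```python
import math


def gaze_arrow(center, pitch: float, yaw: float, length: float):
    """Hypothetical helper: end point of a gaze arrow drawn from `center`.

    Assumes angles in radians, dx = -L*sin(yaw)*cos(pitch), dy = -L*sin(pitch),
    and image y growing downward (so positive pitch points the arrow up)."""
    cx, cy = center
    dx = -length * math.sin(yaw) * math.cos(pitch)
    dy = -length * math.sin(pitch)
    return round(cx + dx), round(cy + dy)


print(gaze_arrow((100, 100), pitch=0.0, yaw=math.pi / 2, length=50))
```

A natural choice for `center` is the box centre `((x1 + x2) / 2, (y1 + y2) / 2)`. The arrow would then run from that point to the returned end point, for example with `cv2.arrowedLine`.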
201
scripts/run_spoofing.py
Normal file
@@ -0,0 +1,201 @@
# Face Anti-Spoofing Detection
# Usage:
#   Image:  python run_spoofing.py --image path/to/image.jpg
#   Video:  python run_spoofing.py --video path/to/video.mp4
#   Webcam: python run_spoofing.py --source 0

import argparse
import os
from pathlib import Path

import cv2
import numpy as np

from uniface import RetinaFace
from uniface.constants import MiniFASNetWeights
from uniface.spoofing import create_spoofer


def draw_spoofing_result(
    image: np.ndarray,
    bbox: list,
    label_idx: int,
    score: float,
    thickness: int = 2,
) -> None:
    """Draw bounding box with anti-spoofing result.

    Args:
        image: Input image to draw on.
        bbox: Bounding box in [x1, y1, x2, y2] format.
        label_idx: Prediction label index (0 = Fake, 1 = Real).
        score: Confidence score (0.0 to 1.0).
        thickness: Line thickness for bounding box.
    """
    x1, y1, x2, y2 = map(int, bbox[:4])

    # Color based on result (green for real, red for fake)
    is_real = label_idx == 1
    color = (0, 255, 0) if is_real else (0, 0, 255)

    # Draw bounding box
    cv2.rectangle(image, (x1, y1), (x2, y2), color, thickness)

    # Prepare label
    label = 'Real' if is_real else 'Fake'
    text = f'{label}: {score:.1%}'

    # Draw label background
    (tw, th), baseline = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.7, 2)
    cv2.rectangle(image, (x1, y1 - th - 10), (x1 + tw + 10, y1), color, -1)

    # Draw label text
    cv2.putText(image, text, (x1 + 5, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)


def process_image(detector, spoofer, image_path: str, save_dir: str = 'outputs') -> None:
    """Process a single image for face anti-spoofing detection."""
    image = cv2.imread(image_path)
    if image is None:
        print(f"Error: Failed to load image from '{image_path}'")
        return

    # Detect faces
    faces = detector.detect(image)
    print(f'Detected {len(faces)} face(s)')

    if not faces:
        print('No faces detected in the image.')
        return

    # Run anti-spoofing on each face
    for i, face in enumerate(faces, 1):
        label_idx, score = spoofer.predict(image, face['bbox'])
        # label_idx: 0 = Fake, 1 = Real
        label = 'Real' if label_idx == 1 else 'Fake'
        print(f' Face {i}: {label} ({score:.1%})')

        # Draw result on image
        draw_spoofing_result(image, face['bbox'], label_idx, score)

    # Save output
    os.makedirs(save_dir, exist_ok=True)
    output_path = os.path.join(save_dir, f'{Path(image_path).stem}_spoofing.jpg')
    cv2.imwrite(output_path, image)
    print(f'Output saved: {output_path}')


def process_video(detector, spoofer, source, save_dir: str = 'outputs') -> None:
    """Process video or webcam stream for face anti-spoofing detection."""
    # Handle webcam or video file
    if isinstance(source, int) or source.isdigit():
        cap = cv2.VideoCapture(int(source))
        is_webcam = True
        output_name = 'webcam_spoofing.mp4'
    else:
        cap = cv2.VideoCapture(source)
        is_webcam = False
        output_name = f'{Path(source).stem}_spoofing.mp4'

    if not cap.isOpened():
        print(f'Error: Failed to open video source: {source}')
        return

    # Get video properties
    fps = int(cap.get(cv2.CAP_PROP_FPS)) if not is_webcam else 30
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # Setup video writer
    os.makedirs(save_dir, exist_ok=True)
    output_path = os.path.join(save_dir, output_name)
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    writer = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

    print("Processing video... Press 'q' to quit")
    frame_count = 0

    try:
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break

            frame_count += 1

            # Detect faces
            faces = detector.detect(frame)

            # Run anti-spoofing on each face
            for face in faces:
                label_idx, score = spoofer.predict(frame, face['bbox'])
                draw_spoofing_result(frame, face['bbox'], label_idx, score)

            # Write frame
            writer.write(frame)

            # Display frame
            cv2.imshow('Face Anti-Spoofing', frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                print('Stopped by user.')
                break

    finally:
        cap.release()
        writer.release()
        cv2.destroyAllWindows()

    print(f'Processed {frame_count} frames')
    if not is_webcam:
        print(f'Output saved: {output_path}')


def main():
    parser = argparse.ArgumentParser(description='Face Anti-Spoofing Detection')
    parser.add_argument('--image', type=str, help='Path to input image')
    parser.add_argument('--video', type=str, help='Path to input video')
    parser.add_argument('--source', type=str, help='Video source (0 for webcam)')
    parser.add_argument(
        '--model',
        type=str,
        default='v2',
        choices=['v1se', 'v2'],
        help='Model variant: v1se or v2 (default: v2)',
    )
    parser.add_argument('--scale', type=float, default=None, help='Custom crop scale (default: auto)')
    parser.add_argument('--save_dir', type=str, default='outputs', help='Output directory')
    args = parser.parse_args()

    # Check that at least one input source is provided
    if not any([args.image, args.video, args.source]):
        parser.print_help()
        print('\nError: Please provide --image, --video, or --source')
        return

    # Select model variant
    model_name = MiniFASNetWeights.V1SE if args.model == 'v1se' else MiniFASNetWeights.V2

    # Initialize models
    print(f'Initializing models (MiniFASNet {args.model.upper()})...')
    detector = RetinaFace()
    spoofer = create_spoofer(model_name=model_name, scale=args.scale)

    # Process input
    if args.image:
        if not os.path.exists(args.image):
            print(f'Error: Image not found: {args.image}')
            return
        process_image(detector, spoofer, args.image, args.save_dir)

    elif args.video:
        if not os.path.exists(args.video):
            print(f'Error: Video not found: {args.video}')
            return
        process_video(detector, spoofer, args.video, args.save_dir)

    elif args.source:
        process_video(detector, spoofer, args.source, args.save_dir)


if __name__ == '__main__':
    main()
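The `MiniFASNetWeights` docstring states that V1SE crops the face with `scale=4.0` and V2 with `scale=2.7`. The script also lets `--scale` override that value. One typical way to apply such a scale is to grow the detector box around its centre, shrink the scale if the result would not fit, and shift the box back inside the image. The sketch below illustrates that approach with a hypothetical `scaled_crop_box` helper. `create_spoofer` may crop differently.

```python
def scaled_crop_box(bbox, scale: float, width: int, height: int):
    """Hypothetical helper: enlarge [x1, y1, x2, y2] by `scale` around its centre,
    reduce the scale if the enlarged box cannot fit, then shift (not shrink) it
    back inside a width x height image. Returns integer (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = bbox[:4]
    box_w, box_h = x2 - x1, y2 - y1
    if box_w <= 0 or box_h <= 0:
        raise ValueError('bbox must have positive width and height')
    # Never grow beyond what the image can hold.
    scale = min(scale, width / box_w, height / box_h)
    new_w, new_h = box_w * scale, box_h * scale
    cx, cy = x1 + box_w / 2, y1 + box_h / 2
    # Shift the enlarged box so it stays within [0, width] x [0, height].
    left = min(max(cx - new_w / 2, 0.0), width - new_w)
    top = min(max(cy - new_h / 2, 0.0), height - new_h)
    return int(left), int(top), int(left + new_w), int(top + new_h)


print(scaled_crop_box((40, 40, 60, 60), scale=2.0, width=200, height=200))
```

Shifting the box rather than clipping it keeps the crop's aspect ratio. This matters if the classifier expects a square input.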
@@ -55,7 +55,9 @@ def process_video(
         bboxes = [f['bbox'] for f in faces]
         scores = [f['confidence'] for f in faces]
         landmarks = [f['landmarks'] for f in faces]
-        draw_detections(frame, bboxes, scores, landmarks, vis_threshold=threshold)
+        draw_detections(
+            image=frame, bboxes=bboxes, scores=scores, landmarks=landmarks, vis_threshold=threshold, fancy_bbox=True
+        )

         cv2.putText(
             frame,
118
tests/test_parsing.py
Normal file
@@ -0,0 +1,118 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

import numpy as np
import pytest

from uniface.constants import ParsingWeights
from uniface.parsing import BiSeNet, create_face_parser


def test_bisenet_initialization():
    """Test BiSeNet initialization."""
    parser = BiSeNet()
    assert parser is not None
    assert parser.input_size == (512, 512)


def test_bisenet_with_different_models():
    """Test BiSeNet with different model weights."""
    parser_resnet18 = BiSeNet(model_name=ParsingWeights.RESNET18)
    parser_resnet34 = BiSeNet(model_name=ParsingWeights.RESNET34)

    assert parser_resnet18 is not None
    assert parser_resnet34 is not None


def test_bisenet_preprocess():
    """Test preprocessing."""
    parser = BiSeNet()

    # Create a dummy face image
    face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)

    # Preprocess
    preprocessed = parser.preprocess(face_image)

    assert preprocessed.shape == (1, 3, 512, 512)
    assert preprocessed.dtype == np.float32


def test_bisenet_postprocess():
    """Test postprocessing."""
    parser = BiSeNet()

    # Create dummy model output (batch_size=1, num_classes=19, H=512, W=512)
    dummy_output = np.random.randn(1, 19, 512, 512).astype(np.float32)

    # Postprocess
    mask = parser.postprocess(dummy_output, original_size=(256, 256))

    assert mask.shape == (256, 256)
    assert mask.dtype == np.uint8
    assert mask.min() >= 0
    assert mask.max() < 19  # 19 classes (0-18)


def test_bisenet_parse():
    """Test end-to-end parsing."""
    parser = BiSeNet()

    # Create a dummy face image
    face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)

    # Parse
    mask = parser.parse(face_image)

    assert mask.shape == (256, 256)
    assert mask.dtype == np.uint8
    assert mask.min() >= 0
    assert mask.max() < 19


def test_bisenet_callable():
    """Test that BiSeNet is callable."""
    parser = BiSeNet()
    face_image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)

    # Should work as callable
    mask = parser(face_image)

    assert mask.shape == (256, 256)
    assert mask.dtype == np.uint8


def test_create_face_parser_with_enum():
    """Test factory function with enum."""
    parser = create_face_parser(ParsingWeights.RESNET18)
    assert parser is not None
    assert isinstance(parser, BiSeNet)


def test_create_face_parser_with_string():
    """Test factory function with string."""
    parser = create_face_parser('parsing_resnet18')
    assert parser is not None
    assert isinstance(parser, BiSeNet)


def test_create_face_parser_invalid_model():
    """Test factory function with invalid model name."""
    with pytest.raises(ValueError, match='Unknown face parsing model'):
        create_face_parser('invalid_model')


def test_bisenet_different_input_sizes():
    """Test parsing with different input image sizes."""
    parser = BiSeNet()

    # Test with different sizes
    sizes = [(128, 128), (256, 256), (512, 512), (640, 480)]

    for h, w in sizes:
        face_image = np.random.randint(0, 255, (h, w, 3), dtype=np.uint8)
        mask = parser.parse(face_image)

        assert mask.shape == (h, w), f'Failed for size {h}x{w}'
        assert mask.dtype == np.uint8
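`test_bisenet_postprocess` defines the contract of `postprocess`. Logits shaped `(1, 19, 512, 512)` go in, and a `uint8` label map at `original_size` comes out, with values in `0..18`. One simple way to satisfy that contract is a per-pixel argmax over the class axis followed by a nearest-neighbour resize. The NumPy-free sketch below, built around a hypothetical `logits_to_mask`, only illustrates that idea. BiSeNet's own code may differ, for example by resizing the logits before taking the argmax.

```python
def logits_to_mask(logits, out_h: int, out_w: int):
    """Hypothetical helper: `logits` is a [C][H][W] nested list for one image.
    Returns an out_h x out_w nested list of class indices."""
    num_classes, in_h, in_w = len(logits), len(logits[0]), len(logits[0][0])
    # Per-pixel argmax over the class axis (ties go to the lowest class index).
    labels = [
        [max(range(num_classes), key=lambda c: logits[c][y][x]) for x in range(in_w)]
        for y in range(in_h)
    ]
    # Nearest-neighbour resize: map each output pixel back to a source pixel.
    return [[labels[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)] for y in range(out_h)]


logits = [
    [[1.0, 0.0], [0.0, 1.0]],  # class 0 scores
    [[0.0, 1.0], [1.0, 0.0]],  # class 1 scores
]
print(logits_to_mask(logits, 4, 4))
```

Because the argmax picks an index in `range(num_classes)`, every value is in `0..C-1`, which is the property the test's `mask.max() < 19` check relies on.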
@@ -13,13 +13,13 @@

 __license__ = 'MIT'
 __author__ = 'Yakhyokhuja Valikhujaev'
-__version__ = '1.3.0'
+__version__ = '1.6.0'


 from uniface.face_utils import compute_similarity, face_alignment
 from uniface.log import Logger, enable_logging
 from uniface.model_store import verify_model_weights
-from uniface.visualization import draw_detections
+from uniface.visualization import draw_detections, vis_parsing_maps

 from .analyzer import FaceAnalyzer
 from .attribute import AgeGender
@@ -37,8 +37,12 @@ from .detection import (
     detect_faces,
     list_available_detectors,
 )
+from .gaze import MobileGaze, create_gaze_estimator
 from .landmark import Landmark106, create_landmarker
+from .parsing import BiSeNet, create_face_parser
+from .privacy import BlurFace, anonymize_faces
 from .recognition import ArcFace, MobileFace, SphereFace, create_recognizer
+from .spoofing import MiniFASNet, create_spoofer

 __all__ = [
     '__author__',
@@ -49,8 +53,11 @@ __all__ = [
     'FaceAnalyzer',
     # Factory functions
     'create_detector',
+    'create_face_parser',
+    'create_gaze_estimator',
     'create_landmarker',
     'create_recognizer',
+    'create_spoofer',
     'detect_faces',
     'list_available_detectors',
     # Detection models
@@ -63,12 +70,22 @@ __all__ = [
     'SphereFace',
     # Landmark models
     'Landmark106',
+    # Gaze models
+    'MobileGaze',
+    # Parsing models
+    'BiSeNet',
     # Attribute models
     'AgeGender',
     'Emotion',
+    # Spoofing models
+    'MiniFASNet',
+    # Privacy
+    'BlurFace',
+    'anonymize_faces',
     # Utilities
     'compute_similarity',
     'draw_detections',
+    'vis_parsing_maps',
     'face_alignment',
     'verify_model_weights',
     'Logger',
@@ -36,42 +36,24 @@ class FaceAnalyzer:

     def analyze(self, image: np.ndarray) -> List[Face]:
         """Analyze faces in an image."""
-        detections = self.detector.detect(image)
-        Logger.debug(f'Detected {len(detections)} face(s)')
+        faces = self.detector.detect(image)
+        Logger.debug(f'Detected {len(faces)} face(s)')

-        faces = []
-        for idx, detection in enumerate(detections):
-            bbox = detection['bbox']
-            confidence = detection['confidence']
-            landmarks = detection['landmarks']
-
-            embedding = None
+        for idx, face in enumerate(faces):
             if self.recognizer is not None:
                 try:
-                    embedding = self.recognizer.get_normalized_embedding(image, landmarks)
-                    Logger.debug(f' Face {idx + 1}: Extracted embedding with shape {embedding.shape}')
+                    face.embedding = self.recognizer.get_normalized_embedding(image, face.landmarks)
+                    Logger.debug(f' Face {idx + 1}: Extracted embedding with shape {face.embedding.shape}')
                 except Exception as e:
                     Logger.warning(f' Face {idx + 1}: Failed to extract embedding: {e}')

-            age, gender_id = None, None
             if self.age_gender is not None:
                 try:
-                    gender_id, age = self.age_gender.predict(image, bbox)
-                    gender_str = 'Female' if gender_id == 0 else 'Male'
-                    Logger.debug(f' Face {idx + 1}: Age={age}, Gender={gender_str}')
+                    face.gender, face.age = self.age_gender.predict(image, face.bbox)
+                    Logger.debug(f' Face {idx + 1}: Age={face.age}, Gender={face.gender}')
                 except Exception as e:
                     Logger.warning(f' Face {idx + 1}: Failed to predict age/gender: {e}')

-            face = Face(
-                bbox=bbox,
-                confidence=confidence,
-                landmarks=landmarks,
-                embedding=embedding,
-                age=age,
-                gender_id=gender_id,
-            )
-            faces.append(face)
-
         Logger.info(f'Analysis complete: {len(faces)} face(s) processed')
         return faces
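After this change, `analyze` no longer builds new `Face` objects. Instead, it fills in fields on the objects the detector returned. Each model stage sits in its own `try`/`except`, so a failing model only leaves its own field unset. The following stripped-down sketch shows that pattern. `SketchFace` is a hypothetical stand-in for uniface's `Face`, and plain callables stand in for the recognizer and the age/gender model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class SketchFace:  # hypothetical stand-in for uniface's Face
    bbox: Tuple[float, float, float, float]
    embedding: Optional[list] = None
    gender: Optional[int] = None
    age: Optional[int] = None


def enrich(
    faces: List[SketchFace],
    embed: Optional[Callable[[SketchFace], list]] = None,
    age_gender: Optional[Callable[[SketchFace], Tuple[int, int]]] = None,
) -> List[SketchFace]:
    """Fill optional fields in place; a failure in one stage leaves only that field as None."""
    for face in faces:
        if embed is not None:
            try:
                face.embedding = embed(face)
            except Exception as exc:  # like analyze(): report and keep going
                print(f'embedding failed: {exc}')
        if age_gender is not None:
            try:
                face.gender, face.age = age_gender(face)
            except Exception as exc:
                print(f'age/gender failed: {exc}')
    return faces
```

Mutating the detector's objects avoids copying `bbox`, `confidence`, and `landmarks` into a second structure. The trade-off is that the detector's results are changed in place.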
@@ -62,11 +62,13 @@ class YOLOv5FaceWeights(str, Enum):
     Exported to ONNX from: https://github.com/yakhyo/yolov5-face-onnx-inference

     Model Performance (WIDER FACE):
-    - YOLOV5S: 7.1M params, 28MB, 94.33% Easy / 92.61% Medium / 83.15% Hard
-    - YOLOV5M: 21.1M params, 84MB, 95.30% Easy / 93.76% Medium / 85.28% Hard
+    - YOLOV5N: 11MB, 93.61% Easy / 91.52% Medium / 80.53% Hard
+    - YOLOV5S: 28MB, 94.33% Easy / 92.61% Medium / 83.15% Hard
+    - YOLOV5M: 82MB, 95.30% Easy / 93.76% Medium / 85.28% Hard
     """
-    YOLOV5S = "yolov5s_face"
-    YOLOV5M = "yolov5m_face"
+    YOLOV5N = "yolov5n"
+    YOLOV5S = "yolov5s"
+    YOLOV5M = "yolov5m"


 class DDAMFNWeights(str, Enum):
@@ -94,6 +96,43 @@ class LandmarkWeights(str, Enum):
     DEFAULT = "2d_106"


+class GazeWeights(str, Enum):
+    """
+    MobileGaze: Real-Time Gaze Estimation models.
+    Trained on Gaze360 dataset.
+    https://github.com/yakhyo/gaze-estimation
+    """
+    RESNET18 = "gaze_resnet18"
+    RESNET34 = "gaze_resnet34"
+    RESNET50 = "gaze_resnet50"
+    MOBILENET_V2 = "gaze_mobilenetv2"
+    MOBILEONE_S0 = "gaze_mobileone_s0"
+
+
+class ParsingWeights(str, Enum):
+    """
+    Face Parsing: Semantic Segmentation of Facial Components.
+    Trained on CelebAMask-HQ dataset.
+    https://github.com/yakhyo/face-parsing
+    """
+    RESNET18 = "parsing_resnet18"
+    RESNET34 = "parsing_resnet34"
+
+
+class MiniFASNetWeights(str, Enum):
+    """
+    MiniFASNet: Lightweight Face Anti-Spoofing models.
+    Trained on face anti-spoofing datasets.
+    https://github.com/yakhyo/face-anti-spoofing
+
+    Model Variants:
- V1SE: Uses scale=4.0 for face crop (squeese-and-excitation version)
|
||||
- V2: Uses scale=2.7 for face crop (improved version)
|
||||
"""
|
||||
V1SE = "minifasnet_v1se"
|
||||
V2 = "minifasnet_v2"
|
||||
|
||||
|
||||
MODEL_URLS: Dict[Enum, str] = {
|
||||
# RetinaFace
|
||||
RetinaFaceWeights.MNET_025: 'https://github.com/yakhyo/uniface/releases/download/weights/retinaface_mv1_0.25.onnx',
@@ -117,6 +156,7 @@ MODEL_URLS: Dict[Enum, str] = {
     SCRFDWeights.SCRFD_10G_KPS: 'https://github.com/yakhyo/uniface/releases/download/weights/scrfd_10g_kps.onnx',
     SCRFDWeights.SCRFD_500M_KPS: 'https://github.com/yakhyo/uniface/releases/download/weights/scrfd_500m_kps.onnx',
     # YOLOv5-Face
+    YOLOv5FaceWeights.YOLOV5N: 'https://github.com/yakhyo/yolov5-face-onnx-inference/releases/download/weights/yolov5n_face.onnx',
     YOLOv5FaceWeights.YOLOV5S: 'https://github.com/yakhyo/yolov5-face-onnx-inference/releases/download/weights/yolov5s_face.onnx',
     YOLOv5FaceWeights.YOLOV5M: 'https://github.com/yakhyo/yolov5-face-onnx-inference/releases/download/weights/yolov5m_face.onnx',
     # DDAFM
@@ -126,6 +166,18 @@ MODEL_URLS: Dict[Enum, str] = {
     AgeGenderWeights.DEFAULT: 'https://github.com/yakhyo/uniface/releases/download/weights/genderage.onnx',
     # Landmarks
     LandmarkWeights.DEFAULT: 'https://github.com/yakhyo/uniface/releases/download/weights/2d106det.onnx',
+    # Gaze (MobileGaze)
+    GazeWeights.RESNET18: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/resnet18_gaze.onnx',
+    GazeWeights.RESNET34: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/resnet34_gaze.onnx',
+    GazeWeights.RESNET50: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/resnet50_gaze.onnx',
+    GazeWeights.MOBILENET_V2: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/mobilenetv2_gaze.onnx',
+    GazeWeights.MOBILEONE_S0: 'https://github.com/yakhyo/gaze-estimation/releases/download/weights/mobileone_s0_gaze.onnx',
+    # Parsing
+    ParsingWeights.RESNET18: 'https://github.com/yakhyo/face-parsing/releases/download/weights/resnet18.onnx',
+    ParsingWeights.RESNET34: 'https://github.com/yakhyo/face-parsing/releases/download/weights/resnet34.onnx',
+    # Anti-Spoofing (MiniFASNet)
+    MiniFASNetWeights.V1SE: 'https://github.com/yakhyo/face-anti-spoofing/releases/download/weights/MiniFASNetV1SE.onnx',
+    MiniFASNetWeights.V2: 'https://github.com/yakhyo/face-anti-spoofing/releases/download/weights/MiniFASNetV2.onnx',
 }

 MODEL_SHA256: Dict[Enum, str] = {
@@ -151,6 +203,7 @@ MODEL_SHA256: Dict[Enum, str] = {
     SCRFDWeights.SCRFD_10G_KPS: '5838f7fe053675b1c7a08b633df49e7af5495cee0493c7dcf6697200b85b5b91',
     SCRFDWeights.SCRFD_500M_KPS: '5e4447f50245bbd7966bd6c0fa52938c61474a04ec7def48753668a9d8b4ea3a',
     # YOLOv5-Face
+    YOLOv5FaceWeights.YOLOV5N: 'eb244a06e36999db732b317c2b30fa113cd6cfc1a397eaf738f2d6f33c01f640',
     YOLOv5FaceWeights.YOLOV5S: 'fc682801cd5880e1e296184a14aea0035486b5146ec1a1389d2e7149cb134bb2',
     YOLOv5FaceWeights.YOLOV5M: '04302ce27a15bde3e20945691b688e2dd018a10e92dd8932146bede6a49207b2',
     # DDAFM
@@ -160,6 +213,18 @@ MODEL_SHA256: Dict[Enum, str] = {
     AgeGenderWeights.DEFAULT: '4fde69b1c810857b88c64a335084f1c3fe8f01246c9a191b48c7bb756d6652fb',
     # Landmark
     LandmarkWeights.DEFAULT: 'f001b856447c413801ef5c42091ed0cd516fcd21f2d6b79635b1e733a7109dbf',
+    # MobileGaze (trained on Gaze360)
+    GazeWeights.RESNET18: '23d5d7e4f6f40dce8c35274ce9d08b45b9e22cbaaf5af73182f473229d713d31',
+    GazeWeights.RESNET34: '4457ee5f7acd1a5ab02da4b61f02fc3a0b17adbf3844dd0ba3cd4288f2b5e1de',
+    GazeWeights.RESNET50: 'e1eaf98f5ec7c89c6abe7cfe39f7be83e747163f98d1ff945c0603b3c521be22',
+    GazeWeights.MOBILENET_V2: 'fdcdb84e3e6421b5a79e8f95139f249fc258d7f387eed5ddac2b80a9a15ce076',
+    GazeWeights.MOBILEONE_S0: 'c0b5a4f4a0ffd24f76ab3c1452354bb2f60110899fd9a88b464c75bafec0fde8',
+    # Face Parsing
+    ParsingWeights.RESNET18: '0d9bd318e46987c3bdbfacae9e2c0f461cae1c6ac6ea6d43bbe541a91727e33f',
+    ParsingWeights.RESNET34: '5b805bba7b5660ab7070b5a381dcf75e5b3e04199f1e9387232a77a00095102e',
+    # Anti-Spoofing (MiniFASNet)
+    MiniFASNetWeights.V1SE: 'ebab7f90c7833fbccd46d3a555410e78d969db5438e169b6524be444862b3676',
+    MiniFASNetWeights.V2: 'b32929adc2d9c34b9486f8c4c7bc97c1b69bc0ea9befefc380e4faae4e463907',
 }

 CHUNK_SIZE = 8192
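Each weights file is pinned to a digest in `MODEL_SHA256`, and `CHUNK_SIZE = 8192` suggests files are hashed in fixed-size blocks rather than read into memory whole. The body of `verify_model_weights` is not part of this diff, so the following is only a minimal sketch of streaming SHA-256 verification with Python's `hashlib`; the helper names `sha256_of_file` and `matches_digest` are hypothetical.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 8192  # same block size as the constant in the diff


def sha256_of_file(path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Hypothetical helper: hash a file in fixed-size chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_sha256: str) -> bool:
    """Hypothetical helper: compare against a pinned digest, case-insensitively."""
    return sha256_of_file(path).lower() == expected_sha256.lower()


if __name__ == "__main__":
    # Write a throwaway file spanning several chunks and verify it.
    payload = os.urandom(3 * CHUNK_SIZE + 17)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        print(matches_digest(path, hashlib.sha256(payload).hexdigest()))
    finally:
        os.remove(path)
```

Streaming keeps memory flat even for the larger ResNet-50 weights; a mismatch would typically mean a truncated or corrupted download.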
@@ -7,6 +7,8 @@ from typing import Any, Dict, List

 import numpy as np

+from uniface.face import Face
+
 from .base import BaseDetector
 from .retinaface import RetinaFace
 from .scrfd import SCRFD
@@ -16,7 +18,7 @@ from .yolov5 import YOLOv5Face
 _detector_cache: Dict[str, BaseDetector] = {}


-def detect_faces(image: np.ndarray, method: str = 'retinaface', **kwargs) -> List[Dict[str, Any]]:
+def detect_faces(image: np.ndarray, method: str = 'retinaface', **kwargs) -> List[Face]:
     """
     High-level face detection function.
@@ -26,18 +28,18 @@ def detect_faces(image: np.ndarray, method: str = 'retinaface', **kwargs) -> Lis
         **kwargs: Additional arguments passed to the detector.

     Returns:
-        List[Dict[str, Any]]: A list of dictionaries, where each dictionary represents a detected face and contains:
-            - 'bbox' (List[float]): [x1, y1, x2, y2] bounding box coordinates.
-            - 'confidence' (float): The confidence score of the detection.
-            - 'landmarks' (List[List[float]]): 5-point facial landmarks.
+        List[Face]: A list of Face objects, each containing:
+            - bbox (np.ndarray): [x1, y1, x2, y2] bounding box coordinates.
+            - confidence (float): The confidence score of the detection.
+            - landmarks (np.ndarray): 5-point facial landmarks with shape (5, 2).

     Example:
         >>> from uniface import detect_faces
         >>> image = cv2.imread("your_image.jpg")
         >>> faces = detect_faces(image, method='retinaface', conf_thresh=0.8)
         >>> for face in faces:
-        ...     print(f"Found face with confidence: {face['confidence']}")
-        ...     print(f"BBox: {face['bbox']}")
+        ...     print(f"Found face with confidence: {face.confidence}")
+        ...     print(f"BBox: {face.bbox}")
     """
     method_name = method.lower()
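`detect_faces` keeps detectors in the module-level `_detector_cache`, so repeated calls can avoid rebuilding ONNX sessions. How the cache key is built is not shown in this hunk; the sketch below assumes a key derived from the lowercased method name plus the sorted keyword arguments, and uses a hypothetical `DummyDetector` in place of a real model. It is not uniface's implementation.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical cache keyed by (method, sorted kwargs); not uniface's actual key scheme.
_cache: Dict[Tuple[str, Tuple[Tuple[str, Any], ...]], Any] = {}


class DummyDetector:
    """Stand-in for an expensive-to-construct detector."""

    instances = 0

    def __init__(self, **kwargs: Any) -> None:
        DummyDetector.instances += 1
        self.config = kwargs


_FACTORIES: Dict[str, Callable[..., Any]] = {"retinaface": DummyDetector}


def get_detector(method: str = "retinaface", **kwargs: Any) -> Any:
    """Return a cached detector, constructing it only on the first request."""
    method_name = method.lower()
    if method_name not in _FACTORIES:
        raise ValueError(f"Unsupported method: {method!r}")
    key = (method_name, tuple(sorted(kwargs.items())))
    if key not in _cache:
        _cache[key] = _FACTORIES[method_name](**kwargs)
    return _cache[key]


a = get_detector("RetinaFace", conf_thresh=0.8)
b = get_detector("retinaface", conf_thresh=0.8)
c = get_detector("retinaface", conf_thresh=0.5)
print(a is b, a is c, DummyDetector.instances)
```

Sorting `kwargs.items()` makes the key independent of argument order; the trade-off is that every keyword value must be hashable, so list-valued options would need converting first.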
@@ -7,6 +7,8 @@ from typing import Any, Dict, List

 import numpy as np

+from uniface.face import Face
+


 class BaseDetector(ABC):
     """
@@ -21,7 +23,7 @@ class BaseDetector(ABC):
         self.config = kwargs

     @abstractmethod
-    def detect(self, image: np.ndarray, **kwargs) -> List[Dict[str, Any]]:
+    def detect(self, image: np.ndarray, **kwargs) -> List[Face]:
         """
         Detect faces in an image.
@@ -30,18 +32,17 @@ class BaseDetector(ABC):
             **kwargs: Additional detection parameters

         Returns:
-            List[Dict[str, Any]]: List of detected faces, where each dictionary contains:
-                - 'bbox' (np.ndarray): Bounding box coordinates with shape (4,) as [x1, y1, x2, y2]
-                - 'confidence' (float): Detection confidence score (0.0 to 1.0)
-                - 'landmarks' (np.ndarray): Facial landmarks with shape (5, 2) for 5-point landmarks
-                  or (68, 2) for 68-point landmarks. Empty array if not supported.
+            List[Face]: List of detected Face objects, each containing:
+                - bbox (np.ndarray): Bounding box coordinates with shape (4,) as [x1, y1, x2, y2]
+                - confidence (float): Detection confidence score (0.0 to 1.0)
+                - landmarks (np.ndarray): Facial landmarks with shape (5, 2) for 5-point landmarks

         Example:
             >>> faces = detector.detect(image)
             >>> for face in faces:
-            ...     bbox = face['bbox'] # np.ndarray with shape (4,)
-            ...     confidence = face['confidence'] # float
-            ...     landmarks = face['landmarks'] # np.ndarray with shape (5, 2)
+            ...     bbox = face.bbox # np.ndarray with shape (4,)
+            ...     confidence = face.confidence # float
+            ...     landmarks = face.landmarks # np.ndarray with shape (5, 2)
         """
         pass
@@ -2,7 +2,7 @@
 # Author: Yakhyokhuja Valikhujaev
 # GitHub: https://github.com/yakhyo

-from typing import Any, Dict, List, Literal, Tuple
+from typing import Any, List, Literal, Tuple

 import numpy as np
@@ -14,6 +14,7 @@ from uniface.common import (
     resize_image,
 )
 from uniface.constants import RetinaFaceWeights
+from uniface.face import Face
 from uniface.log import Logger
 from uniface.model_store import verify_model_weights
 from uniface.onnx_utils import create_onnx_session
@@ -154,7 +155,7 @@ class RetinaFace(BaseDetector):
         max_num: int = 0,
         metric: Literal['default', 'max'] = 'max',
         center_weight: float = 2.0,
-    ) -> List[Dict[str, Any]]:
+    ) -> List[Face]:
         """
         Perform face detection on an input image and return bounding boxes and facial landmarks.
@@ -168,19 +169,19 @@ class RetinaFace(BaseDetector):
                 when using the "default" metric. Defaults to 2.0.

         Returns:
-            List[Dict[str, Any]]: List of face detection dictionaries, each containing:
-                - 'bbox' (np.ndarray): Bounding box coordinates with shape (4,) as [x1, y1, x2, y2]
-                - 'confidence' (float): Detection confidence score (0.0 to 1.0)
-                - 'landmarks' (np.ndarray): 5-point facial landmarks with shape (5, 2)
+            List[Face]: List of Face objects, each containing:
+                - bbox (np.ndarray): Bounding box coordinates with shape (4,) as [x1, y1, x2, y2]
+                - confidence (float): Detection confidence score (0.0 to 1.0)
+                - landmarks (np.ndarray): 5-point facial landmarks with shape (5, 2)

         Example:
             >>> faces = detector.detect(image)
             >>> for face in faces:
-            ...     bbox = face['bbox'] # np.ndarray with shape (4,)
-            ...     confidence = face['confidence'] # float
-            ...     landmarks = face['landmarks'] # np.ndarray with shape (5, 2)
+            ...     bbox = face.bbox # np.ndarray with shape (4,)
+            ...     confidence = face.confidence # float
+            ...     landmarks = face.landmarks # np.ndarray with shape (5, 2)
             ...     # Can pass landmarks directly to recognition
-            ...     embedding = recognizer.get_normalized_embedding(image, landmarks)
+            ...     embedding = recognizer.get_normalized_embedding(image, face.landmarks)
         """

         original_height, original_width = image.shape[:2]
@@ -229,12 +230,12 @@ class RetinaFace(BaseDetector):

         faces = []
         for i in range(detections.shape[0]):
-            face_dict = {
-                'bbox': detections[i, :4].astype(np.float32),
-                'confidence': float(detections[i, 4]),
-                'landmarks': landmarks[i].astype(np.float32),
-            }
-            faces.append(face_dict)
+            face = Face(
+                bbox=detections[i, :4],
+                confidence=float(detections[i, 4]),
+                landmarks=landmarks[i],
+            )
+            faces.append(face)

         return faces
@@ -293,7 +294,7 @@ class RetinaFace(BaseDetector):
             landmarks[: self.post_nms_topk],
         )

-        landmarks = landmarks.reshape(-1, 5, 2).astype(np.int32)
+        landmarks = landmarks.reshape(-1, 5, 2).astype(np.float32)

         return detections, landmarks
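Casting landmarks to `np.int32` truncated them toward zero, discarding sub-pixel precision that landmark consumers (such as the recognizer's landmark-based alignment referenced in the docstring example) could otherwise use. The small NumPy check below uses made-up coordinates to show the error the old cast introduced; it illustrates the effect of the change and is not uniface code.

```python
import numpy as np

# Made-up sub-pixel landmark predictions for one face, shape (5, 2).
landmarks = np.array(
    [[38.7, 51.9], [73.4, 51.2], [56.0, 71.8], [41.6, 92.5], [70.9, 92.1]],
    dtype=np.float32,
)

as_int = landmarks.astype(np.int32)      # old behaviour: truncates toward zero
as_float = landmarks.astype(np.float32)  # new behaviour: keeps the fractions

max_error = float(np.abs(as_float - as_int).max())
print(as_int[0], as_float[0])
print(f"max truncation error: {max_error:.2f} px")
```

Errors approaching a full pixel per point are small in absolute terms but are systematic (always toward zero), which is why keeping `float32` end to end is the safer choice.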
@@ -350,19 +351,12 @@ if __name__ == '__main__':

         # Process each detected face
         for face in faces:
-            # Extract bbox and landmarks from dictionary
-            bbox = face['bbox'] # [x1, y1, x2, y2]
-            landmarks = face['landmarks'] # [[x1, y1], [x2, y2], ...]
-            confidence = face['confidence']
+            # Extract bbox and landmarks from Face object
+            draw_bbox(frame, face.bbox, face.confidence)

-            # Pass bbox and confidence separately
-            draw_bbox(frame, bbox, confidence)
-
-            # Convert landmarks to numpy array format if needed
-            if landmarks is not None and len(landmarks) > 0:
-                # Convert list of [x, y] pairs to numpy array
-                points = np.array(landmarks, dtype=np.float32) # Shape: (5, 2)
-                draw_keypoints(frame, points)
+            # Draw landmarks if available
+            if face.landmarks is not None and len(face.landmarks) > 0:
+                draw_keypoints(frame, face.landmarks)

         # Display face count
         cv2.putText(
@@ -2,13 +2,14 @@
 # Author: Yakhyokhuja Valikhujaev
 # GitHub: https://github.com/yakhyo

-from typing import Any, Dict, List, Literal, Tuple
+from typing import Any, List, Literal, Tuple

 import cv2
 import numpy as np

 from uniface.common import distance2bbox, distance2kps, non_max_suppression, resize_image
 from uniface.constants import SCRFDWeights
+from uniface.face import Face
 from uniface.log import Logger
 from uniface.model_store import verify_model_weights
 from uniface.onnx_utils import create_onnx_session
@@ -193,7 +194,7 @@ class SCRFD(BaseDetector):
         max_num: int = 0,
         metric: Literal['default', 'max'] = 'max',
         center_weight: float = 2.0,
-    ) -> List[Dict[str, Any]]:
+    ) -> List[Face]:
         """
         Perform face detection on an input image and return bounding boxes and facial landmarks.
@@ -207,19 +208,19 @@ class SCRFD(BaseDetector):
                 when using the "default" metric. Defaults to 2.0.

         Returns:
-            List[Dict[str, Any]]: List of face detection dictionaries, each containing:
-                - 'bbox' (np.ndarray): Bounding box coordinates with shape (4,) as [x1, y1, x2, y2]
-                - 'confidence' (float): Detection confidence score (0.0 to 1.0)
-                - 'landmarks' (np.ndarray): 5-point facial landmarks with shape (5, 2)
+            List[Face]: List of Face objects, each containing:
+                - bbox (np.ndarray): Bounding box coordinates with shape (4,) as [x1, y1, x2, y2]
+                - confidence (float): Detection confidence score (0.0 to 1.0)
+                - landmarks (np.ndarray): 5-point facial landmarks with shape (5, 2)

         Example:
             >>> faces = detector.detect(image)
             >>> for face in faces:
-            ...     bbox = face['bbox'] # np.ndarray with shape (4,)
-            ...     confidence = face['confidence'] # float
-            ...     landmarks = face['landmarks'] # np.ndarray with shape (5, 2)
+            ...     bbox = face.bbox # np.ndarray with shape (4,)
+            ...     confidence = face.confidence # float
+            ...     landmarks = face.landmarks # np.ndarray with shape (5, 2)
             ...     # Can pass landmarks directly to recognition
-            ...     embedding = recognizer.get_normalized_embedding(image, landmarks)
+            ...     embedding = recognizer.get_normalized_embedding(image, face.landmarks)
         """

         original_height, original_width = image.shape[:2]
@@ -251,7 +252,7 @@ class SCRFD(BaseDetector):

         detections = pre_det[keep, :]
         landmarks = landmarks[order, :, :]
-        landmarks = landmarks[keep, :, :].astype(np.int32)
+        landmarks = landmarks[keep, :, :].astype(np.float32)

         if 0 < max_num < detections.shape[0]:
             # Calculate area of detections
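The `max_num`, `metric`, and `center_weight` parameters cap how many faces are returned. Only the first lines of that branch appear in this diff, so the following is a minimal sketch modeled on the InsightFace-style selection the parameter names suggest (an assumption, not the uniface body): rank by box area for `'max'`, or by area minus `center_weight` times the squared distance from the box centre to the image centre for `'default'`, then keep the top `max_num`. The function name `select_top_faces` is hypothetical.

```python
import numpy as np


def select_top_faces(
    detections: np.ndarray,
    image_shape: tuple,
    max_num: int,
    metric: str = "max",
    center_weight: float = 2.0,
) -> np.ndarray:
    """Hypothetical helper: indices of at most `max_num` detections.

    Rows of `detections` are assumed to be [x1, y1, x2, y2, score].
    """
    if not 0 < max_num < detections.shape[0]:
        return np.arange(detections.shape[0])

    # Box areas.
    area = (detections[:, 2] - detections[:, 0]) * (detections[:, 3] - detections[:, 1])

    # Squared distance between each box centre and the image centre.
    img_h, img_w = image_shape[:2]
    dx = (detections[:, 0] + detections[:, 2]) / 2 - img_w / 2
    dy = (detections[:, 1] + detections[:, 3]) / 2 - img_h / 2
    offset_sq = dx**2 + dy**2

    # 'max' ranks by area only; 'default' also penalises off-centre boxes.
    values = area if metric == "max" else area - center_weight * offset_sq
    return np.argsort(values)[::-1][:max_num]


dets = np.array(
    [
        [0, 0, 60, 60, 0.9],      # large box in the corner
        [70, 70, 120, 120, 0.8],  # smaller box near the centre
        [10, 150, 20, 160, 0.7],  # tiny box
    ],
    dtype=np.float32,
)
print(select_top_faces(dets, (200, 200, 3), max_num=1, metric="max"))
print(select_top_faces(dets, (200, 200, 3), max_num=1, metric="default"))
```

With `'max'` the large corner box wins; with `'default'` the centre penalty favours the smaller, centred box, which is usually the subject in a selfie-style frame.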
@@ -280,12 +281,12 @@ class SCRFD(BaseDetector):

         faces = []
         for i in range(detections.shape[0]):
-            face_dict = {
-                'bbox': detections[i, :4].astype(np.float32),
-                'confidence': float(detections[i, 4]),
-                'landmarks': landmarks[i].astype(np.float32),
-            }
-            faces.append(face_dict)
+            face = Face(
+                bbox=detections[i, :4],
+                confidence=float(detections[i, 4]),
+                landmarks=landmarks[i],
+            )
+            faces.append(face)

         return faces
@@ -324,19 +325,12 @@ if __name__ == '__main__':

         # Process each detected face
         for face in faces:
-            # Extract bbox and landmarks from dictionary
-            bbox = face['bbox'] # [x1, y1, x2, y2]
-            landmarks = face['landmarks'] # [[x1, y1], [x2, y2], ...]
-            confidence = face['confidence']
+            # Extract bbox and landmarks from Face object
+            draw_bbox(frame, face.bbox, face.confidence)

-            # Pass bbox and confidence separately
-            draw_bbox(frame, bbox, confidence)
-
-            # Convert landmarks to numpy array format if needed
-            if landmarks is not None and len(landmarks) > 0:
-                # Convert list of [x, y] pairs to numpy array
-                points = np.array(landmarks, dtype=np.float32) # Shape: (5, 2)
-                draw_keypoints(frame, points)
+            # Draw landmarks if available
+            if face.landmarks is not None and len(face.landmarks) > 0:
+                draw_keypoints(frame, face.landmarks)

         # Display face count
         cv2.putText(
@@ -2,13 +2,14 @@
 # Author: Yakhyokhuja Valikhujaev
 # GitHub: https://github.com/yakhyo

-from typing import Any, Dict, List, Literal, Tuple
+from typing import Any, List, Literal, Tuple

 import cv2
 import numpy as np

 from uniface.common import non_max_suppression
 from uniface.constants import YOLOv5FaceWeights
+from uniface.face import Face
 from uniface.log import Logger
 from uniface.model_store import verify_model_weights
 from uniface.onnx_utils import create_onnx_session
@@ -259,7 +260,7 @@ class YOLOv5Face(BaseDetector):
         max_num: int = 0,
         metric: Literal['default', 'max'] = 'max',
         center_weight: float = 2.0,
-    ) -> List[Dict[str, Any]]:
+    ) -> List[Face]:
         """
         Perform face detection on an input image and return bounding boxes and facial landmarks.
@@ -273,19 +274,19 @@ class YOLOv5Face(BaseDetector):
                 when using the "default" metric. Defaults to 2.0.

         Returns:
-            List[Dict[str, Any]]: List of face detection dictionaries, each containing:
-                - 'bbox' (np.ndarray): Bounding box coordinates with shape (4,) as [x1, y1, x2, y2]
-                - 'confidence' (float): Detection confidence score (0.0 to 1.0)
-                - 'landmarks' (np.ndarray): 5-point facial landmarks with shape (5, 2)
+            List[Face]: List of Face objects, each containing:
+                - bbox (np.ndarray): Bounding box coordinates with shape (4,) as [x1, y1, x2, y2]
+                - confidence (float): Detection confidence score (0.0 to 1.0)
+                - landmarks (np.ndarray): 5-point facial landmarks with shape (5, 2)

         Example:
             >>> faces = detector.detect(image)
             >>> for face in faces:
-            ...     bbox = face['bbox'] # np.ndarray with shape (4,)
-            ...     confidence = face['confidence'] # float
-            ...     landmarks = face['landmarks'] # np.ndarray with shape (5, 2)
+            ...     bbox = face.bbox # np.ndarray with shape (4,)
+            ...     confidence = face.confidence # float
+            ...     landmarks = face.landmarks # np.ndarray with shape (5, 2)
             ...     # Can pass landmarks directly to recognition
-            ...     embedding = recognizer.get_normalized_embedding(image, landmarks)
+            ...     embedding = recognizer.get_normalized_embedding(image, face.landmarks)
         """

         original_height, original_width = image.shape[:2]
@@ -330,11 +331,11 @@ class YOLOv5Face(BaseDetector):

         faces = []
         for i in range(detections.shape[0]):
-            face_dict = {
-                'bbox': detections[i, :4].astype(np.float32),
-                'confidence': float(detections[i, 4]),
-                'landmarks': landmarks[i].astype(np.float32),
-            }
-            faces.append(face_dict)
+            face = Face(
+                bbox=detections[i, :4],
+                confidence=float(detections[i, 4]),
+                landmarks=landmarks[i],
+            )
+            faces.append(face)

         return faces
@@ -14,14 +14,19 @@ __all__ = ['Face']


 @dataclass
 class Face:
-    """Detected face with analysis results."""
+    """
+    Detected face with analysis results.
+    """

+    # Required attributes
     bbox: np.ndarray
     confidence: float
     landmarks: np.ndarray
+
+    # Optional attributes
     embedding: Optional[np.ndarray] = None
     age: Optional[int] = None
-    gender_id: Optional[int] = None # 0: Female, 1: Male
+    gender: Optional[int] = None # 0 or 1

     def compute_similarity(self, other: 'Face') -> float:
         """Compute cosine similarity with another face."""
@@ -34,18 +39,28 @@ class Face:
         return asdict(self)

     @property
-    def gender(self) -> str:
+    def sex(self) -> str:
         """Get gender as a string label (Female or Male)."""
-        if self.gender_id is None:
+        if self.gender is None:
             return None
-        return 'Female' if self.gender_id == 0 else 'Male'
+        return 'Female' if self.gender == 0 else 'Male'

+    @property
+    def bbox_xyxy(self) -> np.ndarray:
+        """Get bounding box coordinates in (x1, y1, x2, y2) format."""
+        return self.bbox.copy()
+
+    @property
+    def bbox_xywh(self) -> np.ndarray:
+        """Get bounding box coordinates in (x1, y1, w, h) format."""
+        return np.array([self.bbox[0], self.bbox[1], self.bbox[2] - self.bbox[0], self.bbox[3] - self.bbox[1]])
+
     def __repr__(self) -> str:
         parts = [f'Face(confidence={self.confidence:.3f}']
         if self.age is not None:
             parts.append(f'age={self.age}')
-        if self.gender_id is not None:
-            parts.append(f'gender={self.gender}')
+        if self.gender is not None:
+            parts.append(f'sex={self.sex}')
         if self.embedding is not None:
             parts.append(f'embedding_dim={self.embedding.shape[0]}')
         return ', '.join(parts) + ')'
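After this change, `sex` derives a label from the integer `gender` field, `bbox_xywh` converts corner coordinates to origin-plus-size, and `compute_similarity` is documented as cosine similarity between embeddings (its body is outside this hunk). The self-contained sketch below re-creates just those behaviours on a hypothetical `FaceSketch` dataclass so they can be run without model weights; the cosine formula is the standard one and is assumed, not copied, for `compute_similarity`.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class FaceSketch:
    """Hypothetical stand-in mirroring a few Face fields and properties."""

    bbox: np.ndarray
    confidence: float
    landmarks: np.ndarray
    embedding: Optional[np.ndarray] = None
    gender: Optional[int] = None  # 0: Female, 1: Male

    @property
    def sex(self) -> Optional[str]:
        if self.gender is None:
            return None
        return 'Female' if self.gender == 0 else 'Male'

    @property
    def bbox_xywh(self) -> np.ndarray:
        # (x1, y1, x2, y2) -> (x1, y1, width, height)
        x1, y1, x2, y2 = self.bbox
        return np.array([x1, y1, x2 - x1, y2 - y1])

    def compute_similarity(self, other: 'FaceSketch') -> float:
        # Assumed standard cosine similarity: dot product over the product of norms.
        a, b = self.embedding, other.embedding
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


lm = np.zeros((5, 2), dtype=np.float32)
f1 = FaceSketch(np.array([10.0, 20.0, 110.0, 170.0]), 0.99, lm, np.array([1.0, 0.0]), gender=1)
f2 = FaceSketch(np.array([0.0, 0.0, 50.0, 50.0]), 0.95, lm, np.array([1.0, 1.0]))

print(f1.bbox_xywh)
print(f1.sex, f2.sex)  # Male None
print(round(f1.compute_similarity(f2), 4))  # 0.7071
```

If embeddings are already L2-normalised, as the recognizer's `get_normalized_embedding` name suggests, the cosine reduces to a plain dot product.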
54
uniface/gaze/__init__.py
Normal file
@@ -0,0 +1,54 @@
+# Copyright 2025 Yakhyokhuja Valikhujaev
+# Author: Yakhyokhuja Valikhujaev
+# GitHub: https://github.com/yakhyo
+
+from .base import BaseGazeEstimator
+from .models import MobileGaze
+
+
+def create_gaze_estimator(method: str = 'mobilegaze', **kwargs) -> BaseGazeEstimator:
+    """
+    Factory function to create gaze estimators.
+
+    This function initializes and returns a gaze estimator instance based on the
+    specified method. It acts as a high-level interface to the underlying
+    model classes.
+
+    Args:
+        method (str): The gaze estimation method to use.
+            Options: 'mobilegaze' (default).
+        **kwargs: Model-specific parameters passed to the estimator's constructor.
+            For example, `model_name` can be used to select a specific
+            backbone from `GazeWeights` enum (RESNET18, RESNET34, RESNET50,
+            MOBILENET_V2, MOBILEONE_S0).
+
+    Returns:
+        BaseGazeEstimator: An initialized gaze estimator instance ready for use.
+
+    Raises:
+        ValueError: If the specified `method` is not supported.
+
+    Examples:
+        >>> # Create the default MobileGaze estimator (ResNet34 backbone)
+        >>> estimator = create_gaze_estimator()
+
+        >>> # Create with MobileNetV2 backbone
+        >>> from uniface.constants import GazeWeights
+        >>> estimator = create_gaze_estimator(
+        ...     'mobilegaze',
+        ...     model_name=GazeWeights.MOBILENET_V2
+        ... )
+
+        >>> # Use the estimator
+        >>> pitch, yaw = estimator.estimate(face_crop)
+    """
+    method = method.lower()
+
+    if method in ('mobilegaze', 'mobile_gaze', 'gaze'):
+        return MobileGaze(**kwargs)
+    else:
+        available = ['mobilegaze']
+        raise ValueError(f"Unsupported gaze estimation method: '{method}'. Available: {available}")
+
+
+__all__ = ['create_gaze_estimator', 'MobileGaze', 'BaseGazeEstimator']
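The factory's usage example passes a `face_crop` to `estimate`. Slicing `image[y1:y2, x1:x2]` directly with detector boxes can misbehave when a box extends past the frame: a negative start index wraps around in NumPy, which can yield an empty or wrong crop. A minimal sketch of clamping the box first; `crop_face` is a hypothetical helper, not part of uniface.

```python
from typing import Optional

import numpy as np


def crop_face(image: np.ndarray, bbox: np.ndarray) -> Optional[np.ndarray]:
    """Hypothetical helper: clamp an [x1, y1, x2, y2] box to the image, then crop.

    Returns None when the clamped box has no area.
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = (int(round(float(v))) for v in bbox[:4])
    # Clamp to valid pixel ranges so negative values cannot wrap around.
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    if x2 <= x1 or y2 <= y1:
        return None
    return image[y1:y2, x1:x2]


frame = np.zeros((100, 200, 3), dtype=np.uint8)
box = np.array([-10.0, 20.0, 60.0, 130.0])  # partly outside the frame

naive = frame[int(box[1]):int(box[3]), int(box[0]):int(box[2])]
safe = crop_face(frame, box)
print(naive.shape, safe.shape)  # (80, 0, 3) (80, 60, 3)
```

The naive slice comes back with zero width because `-10` is read as "ten columns from the right"; the clamped crop is what you would pass to `estimator.estimate(...)`.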
108
uniface/gaze/base.py
Normal file
@@ -0,0 +1,108 @@
+# Copyright 2025 Yakhyokhuja Valikhujaev
+# Author: Yakhyokhuja Valikhujaev
+# GitHub: https://github.com/yakhyo
+
+from abc import ABC, abstractmethod
+from typing import Tuple
+
+import numpy as np
+
+
+class BaseGazeEstimator(ABC):
+    """
+    Abstract base class for all gaze estimation models.
+
+    This class defines the common interface that all gaze estimators must implement,
+    ensuring consistency across different gaze estimation methods. Gaze estimation
+    predicts the direction a person is looking based on their face image.
+
+    The gaze direction is represented as pitch and yaw angles in radians:
+    - Pitch: Vertical angle (positive = looking up, negative = looking down)
+    - Yaw: Horizontal angle (positive = looking right, negative = looking left)
+    """
+
+    @abstractmethod
+    def _initialize_model(self) -> None:
+        """
+        Initialize the underlying model for inference.
+
+        This method should handle loading model weights, creating the
+        inference session (e.g., ONNX Runtime), and any necessary
+        setup procedures to prepare the model for prediction.
+
+        Raises:
+            RuntimeError: If the model fails to load or initialize.
+        """
+        raise NotImplementedError('Subclasses must implement the _initialize_model method.')
+
+    @abstractmethod
+    def preprocess(self, face_image: np.ndarray) -> np.ndarray:
+        """
+        Preprocess the input face image for model inference.
+
+        This method should take a raw face crop and convert it into the format
+        expected by the model's inference engine (e.g., normalized tensor).
+
+        Args:
+            face_image (np.ndarray): A cropped face image in BGR format with
+                shape (H, W, C).
+
+        Returns:
+            np.ndarray: The preprocessed image tensor ready for inference,
+                typically with shape (1, C, H, W).
+        """
+        raise NotImplementedError('Subclasses must implement the preprocess method.')
+
+    @abstractmethod
+    def postprocess(self, outputs: Tuple[np.ndarray, np.ndarray]) -> Tuple[float, float]:
+        """
+        Postprocess raw model outputs into gaze angles.
+
+        This method takes the raw output from the model's inference and
+        converts it into pitch and yaw angles in radians.
+
+        Args:
+            outputs: Raw outputs from the model inference. The format depends
+                on the specific model architecture.
+
+        Returns:
+            Tuple[float, float]: A tuple of (pitch, yaw) angles in radians.
+        """
+        raise NotImplementedError('Subclasses must implement the postprocess method.')
+
+    @abstractmethod
+    def estimate(self, face_image: np.ndarray) -> Tuple[float, float]:
+        """
+        Perform end-to-end gaze estimation on a face image.
+
+        This method orchestrates the full pipeline: preprocessing the input,
+        running inference, and postprocessing to return the gaze direction.
+
+        Args:
+            face_image (np.ndarray): A cropped face image in BGR format.
+                The face should be roughly centered and
+                well-framed within the image.
+
+        Returns:
+            Tuple[float, float]: A tuple of (pitch, yaw) angles in radians:
+                - pitch: Vertical gaze angle (positive = up, negative = down)
+                - yaw: Horizontal gaze angle (positive = right, negative = left)
+
+        Example:
+            >>> estimator = create_gaze_estimator()
+            >>> pitch, yaw = estimator.estimate(face_crop)
+            >>> print(f"Looking: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°")
+        """
+        raise NotImplementedError('Subclasses must implement the estimate method.')
+
+    def __call__(self, face_image: np.ndarray) -> Tuple[float, float]:
+        """
+        Provides a convenient, callable shortcut for the `estimate` method.
+
+        Args:
+            face_image (np.ndarray): A cropped face image in BGR format.
+
+        Returns:
+            Tuple[float, float]: A tuple of (pitch, yaw) angles in radians.
+        """
+        return self.estimate(face_image)
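`BaseGazeEstimator` splits inference into `preprocess`, model execution, and `postprocess`. `MobileGaze` in `uniface/gaze/models.py` configures bin-based decoding with 90 bins of 4 degrees and a 180-degree offset, but its `postprocess` body is not included in this section. The sketch below assumes the common decoding for such binned models: softmax over each head's logits, take the expected bin index, scale by the bin width, subtract the offset, and convert degrees to radians. It also mirrors `MobileGaze.preprocess` (BGR to RGB, ImageNet normalisation, NCHW layout) in NumPy only, with the `cv2.resize` step omitted. All function names are hypothetical.

```python
import numpy as np

BINS, BINWIDTH, ANGLE_OFFSET = 90, 4, 180  # values set in MobileGaze.__init__
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_sketch(face_bgr: np.ndarray) -> np.ndarray:
    """BGR uint8 (H, W, 3) -> normalised float32 (1, 3, H, W); resizing omitted."""
    rgb = face_bgr[..., ::-1].astype(np.float32) / 255.0  # BGR -> RGB, scale to [0, 1]
    rgb = (rgb - IMAGENET_MEAN) / IMAGENET_STD             # per-channel ImageNet normalisation
    return np.transpose(rgb, (2, 0, 1))[np.newaxis].astype(np.float32)  # HWC -> NCHW


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)


def decode_bins(logits: np.ndarray) -> float:
    """Assumed decoding: expected bin index -> degrees -> radians."""
    idx = np.arange(BINS, dtype=np.float32)
    probs = softmax(logits.reshape(-1))
    degrees = float(np.sum(probs * idx)) * BINWIDTH - ANGLE_OFFSET
    return float(np.radians(degrees))


def postprocess_sketch(pitch_logits: np.ndarray, yaw_logits: np.ndarray) -> tuple:
    return decode_bins(pitch_logits), decode_bins(yaw_logits)


# A sharp peak at bin 45 decodes to 45 * 4 - 180 = 0 degrees.
peak = np.full(BINS, -50.0, dtype=np.float32)
peak[45] = 50.0
pitch, yaw = postprocess_sketch(peak, peak)
print(np.degrees(pitch), np.degrees(yaw))

tensor = preprocess_sketch(np.zeros((448, 448, 3), dtype=np.uint8))
print(tensor.shape, tensor.dtype)
```

Taking the softmax expectation rather than the argmax bin gives a continuous angle instead of snapping to 4-degree steps; uniform logits, for instance, decode to 44.5 * 4 - 180 = -2 degrees.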
|
||||
187
uniface/gaze/models.py
Normal file
187
uniface/gaze/models.py
Normal file
@@ -0,0 +1,187 @@
|
||||
# Copyright 2025 Yakhyokhuja Valikhujaev
|
||||
# Author: Yakhyokhuja Valikhujaev
|
||||
# GitHub: https://github.com/yakhyo
|
||||
|
||||
from typing import Tuple
|
||||
|
||||
import cv2
|
||||
import numpy as np
|
||||
|
||||
from uniface.constants import GazeWeights
|
||||
from uniface.log import Logger
|
||||
from uniface.model_store import verify_model_weights
|
||||
from uniface.onnx_utils import create_onnx_session
|
||||
|
||||
from .base import BaseGazeEstimator
|
||||
|
||||
__all__ = ['MobileGaze']
|
||||
|
||||
|
||||
class MobileGaze(BaseGazeEstimator):
|
||||
"""
|
||||
MobileGaze: Real-Time Gaze Estimation with ONNX Runtime.
|
||||
|
||||
MobileGaze is a gaze estimation model that predicts gaze direction from a single
|
||||
face image. It supports multiple backbone architectures including ResNet 18/34/50,
|
||||
MobileNetV2, and MobileOne S0. The model uses a classification approach with binned
|
||||
angles, which are then decoded to continuous pitch and yaw values.
|
||||
|
||||
The model outputs gaze direction as pitch (vertical) and yaw (horizontal) angles
|
||||
in radians.
|
||||
|
||||
Reference:
|
||||
https://github.com/yakhyo/gaze-estimation
|
||||
|
||||
Args:
|
||||
model_name (GazeWeights): The enum specifying the gaze model backbone to load.
|
||||
Options: RESNET18, RESNET34, RESNET50, MOBILENET_V2, MOBILEONE_S0.
|
||||
Defaults to `GazeWeights.RESNET18`.
|
||||
input_size (Tuple[int, int]): The resolution (width, height) for the model's
|
||||
input. Defaults to (448, 448).
|
||||
|
||||
Attributes:
|
||||
input_size (Tuple[int, int]): Model input dimensions.
|
||||
input_mean (list): Per-channel mean values for normalization (ImageNet).
|
||||
input_std (list): Per-channel std values for normalization (ImageNet).
|
||||
|
||||
Example:
|
||||
>>> from uniface.gaze import MobileGaze
|
||||
>>> from uniface import RetinaFace
|
||||
>>>
|
||||
>>> detector = RetinaFace()
|
||||
>>> gaze_estimator = MobileGaze()
|
||||
>>>
|
||||
>>> # Detect faces and estimate gaze for each
|
||||
>>> faces = detector.detect(image)
|
||||
>>> for face in faces:
|
||||
... bbox = face['bbox']
|
||||
... x1, y1, x2, y2 = map(int, bbox[:4])
|
||||
... face_crop = image[y1:y2, x1:x2]
|
||||
... pitch, yaw = gaze_estimator.estimate(face_crop)
|
||||
... print(f"Gaze: pitch={np.degrees(pitch):.1f}°, yaw={np.degrees(yaw):.1f}°")
|
||||
"""
|
||||
|
||||
    def __init__(
        self,
        model_name: GazeWeights = GazeWeights.RESNET34,
        input_size: Tuple[int, int] = (448, 448),
    ) -> None:
        Logger.info(f'Initializing MobileGaze with model={model_name}, input_size={input_size}')

        self.input_size = input_size
        self.input_mean = [0.485, 0.456, 0.406]
        self.input_std = [0.229, 0.224, 0.225]

        # Model specific parameters for bin-based classification (Gaze360 config)
        self._bins = 90
        self._binwidth = 4
        self._angle_offset = 180
        self._idx_tensor = np.arange(self._bins, dtype=np.float32)

        self.model_path = verify_model_weights(model_name)
        self._initialize_model()

    def _initialize_model(self) -> None:
        """
        Initialize the ONNX model from the stored model path.

        Raises:
            RuntimeError: If the model fails to load or initialize.
        """
        try:
            self.session = create_onnx_session(self.model_path)

            # Get input configuration
            input_cfg = self.session.get_inputs()[0]
            input_shape = input_cfg.shape
            self.input_name = input_cfg.name
            self.input_size = tuple(input_shape[2:4][::-1])  # Update from model

            # Get output configuration
            outputs = self.session.get_outputs()
            self.output_names = [output.name for output in outputs]

            if len(self.output_names) != 2:
                raise ValueError(f'Expected 2 output nodes (pitch, yaw), got {len(self.output_names)}')

            Logger.info(f'MobileGaze initialized with input size {self.input_size}')

        except Exception as e:
            Logger.error(f"Failed to load gaze model from '{self.model_path}'", exc_info=True)
            raise RuntimeError(f'Failed to initialize gaze model: {e}') from e

    def preprocess(self, face_image: np.ndarray) -> np.ndarray:
        """
        Preprocess a face crop for gaze estimation.

        Args:
            face_image (np.ndarray): A cropped face image in BGR format.

        Returns:
            np.ndarray: Preprocessed image tensor with shape (1, 3, H, W).
        """
        # Convert BGR to RGB
        image = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)

        # Resize to model input size
        image = cv2.resize(image, self.input_size)

        # Normalize to [0, 1] and apply normalization
        image = image.astype(np.float32) / 255.0
        mean = np.array(self.input_mean, dtype=np.float32)
        std = np.array(self.input_std, dtype=np.float32)
        image = (image - mean) / std

        # HWC -> CHW -> NCHW
        image = np.transpose(image, (2, 0, 1))
        image = np.expand_dims(image, axis=0).astype(np.float32)

        return image

    def _softmax(self, x: np.ndarray) -> np.ndarray:
        """Apply softmax along axis 1."""
        e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return e_x / e_x.sum(axis=1, keepdims=True)

    def postprocess(self, outputs: Tuple[np.ndarray, np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
        """
        Postprocess raw model outputs into gaze angles.

        The model treats pitch and yaw as classification over 90 angle bins of
        4 degrees each. This method converts each set of bin logits into a
        continuous angle with a soft-argmax and returns pitch and yaw in radians.

        Args:
            outputs (Tuple[np.ndarray, np.ndarray]): Raw (pitch_logits, yaw_logits)
                from the model, each with shape (1, 90), one logit per bin.

        Returns:
            Tuple[np.ndarray, np.ndarray]: A tuple of (pitch, yaw) angles in radians.
        """
        pitch_logits, yaw_logits = outputs

        # Convert logits to probabilities
        pitch_probs = self._softmax(pitch_logits)
        yaw_probs = self._softmax(yaw_logits)

        # Soft-argmax: expected bin index, mapped to degrees as index * binwidth - offset
        pitch_deg = np.sum(pitch_probs * self._idx_tensor, axis=1) * self._binwidth - self._angle_offset
        yaw_deg = np.sum(yaw_probs * self._idx_tensor, axis=1) * self._binwidth - self._angle_offset

        # Convert degrees to radians (first item of the batch)
        pitch = np.radians(pitch_deg[0])
        yaw = np.radians(yaw_deg[0])

        return pitch, yaw

    def estimate(self, face_image: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Perform end-to-end gaze estimation on a face image.

        This method orchestrates the full pipeline: preprocessing the input,
        running inference, and postprocessing to return the gaze direction.

        Args:
            face_image (np.ndarray): A cropped face image in BGR format.

        Returns:
            Tuple[np.ndarray, np.ndarray]: A tuple of (pitch, yaw) angles in radians.
        """
        input_tensor = self.preprocess(face_image)
        outputs = self.session.run(self.output_names, {self.input_name: input_tensor})
        pitch, yaw = self.postprocess((outputs[0], outputs[1]))

        return pitch, yaw
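`MobileGaze.postprocess` decodes each angle with a soft-argmax over 90 bins of 4 degrees, shifted by 180 degrees. The standalone NumPy sketch below reproduces only that arithmetic on synthetic logits, so the bin-to-angle mapping can be checked without loading the ONNX model; `decode_bins` is a hypothetical helper, not part of UniFace.

```python
import numpy as np

# Binning used by MobileGaze (Gaze360 config): 90 bins, 4 degrees each, offset by 180 degrees
BINS, BINWIDTH, OFFSET = 90, 4, 180


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax along axis 1."""
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / e_x.sum(axis=1, keepdims=True)


def decode_bins(logits: np.ndarray) -> np.ndarray:
    """Map (batch, 90) bin logits to angles in degrees via soft-argmax."""
    probs = softmax(logits)
    idx = np.arange(BINS, dtype=np.float32)
    # Expected bin index, then the linear map index * 4 - 180
    return np.sum(probs * idx, axis=1) * BINWIDTH - OFFSET


# A sharply peaked distribution at bin 50 decodes to about 50 * 4 - 180 = 20 degrees
logits = np.zeros((1, BINS), dtype=np.float32)
logits[0, 50] = 50.0
angle_deg = decode_bins(logits)[0]
print(f"{angle_deg:.2f} deg = {np.radians(angle_deg):.4f} rad")
```

With uniform logits the expected index is 44.5, so the decoded angle is 44.5 * 4 - 180 = -2 degrees rather than 0: a small asymmetry that follows directly from indexing the bins from 0 to 89.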
61
uniface/parsing/__init__.py
Normal file
@@ -0,0 +1,61 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from typing import Union

from uniface.constants import ParsingWeights

from .base import BaseFaceParser
from .bisenet import BiSeNet

__all__ = ['BaseFaceParser', 'BiSeNet', 'create_face_parser']


def create_face_parser(
    model_name: Union[str, ParsingWeights] = ParsingWeights.RESNET18,
) -> BaseFaceParser:
    """
    Factory function to create a face parsing model instance.

    This function provides a convenient way to instantiate face parsing models
    without directly importing the specific model classes. It supports both
    string-based and enum-based model selection.

    Args:
        model_name (Union[str, ParsingWeights]): The face parsing model to create.
            Can be either a string or a ParsingWeights enum value.
            Available options:
                - 'parsing_resnet18' or ParsingWeights.RESNET18 (default)
                - 'parsing_resnet34' or ParsingWeights.RESNET34

    Returns:
        BaseFaceParser: An instance of the requested face parsing model.

    Raises:
        ValueError: If the model_name is not recognized.

    Examples:
        >>> # Using enum
        >>> from uniface.parsing import create_face_parser
        >>> from uniface.constants import ParsingWeights
        >>> parser = create_face_parser(ParsingWeights.RESNET18)
        >>>
        >>> # Using string
        >>> parser = create_face_parser('parsing_resnet18')
        >>>
        >>> # Parse a face image
        >>> mask = parser.parse(face_crop)
    """
    # Convert string to enum if necessary
    if isinstance(model_name, str):
        try:
            model_name = ParsingWeights(model_name)
        except ValueError as e:
            # Use a distinct loop variable so it does not shadow the caught exception `e`
            valid_models = [w.value for w in ParsingWeights]
            raise ValueError(
                f"Unknown face parsing model: '{model_name}'. Valid options are: {', '.join(valid_models)}"
            ) from e

    # All parsing models use the same BiSeNet class
    return BiSeNet(model_name=model_name)
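`create_face_parser` accepts either a `ParsingWeights` member or its string value, and turns an unknown string into a `ValueError` that lists the valid options. The sketch below shows the same lookup-by-value pattern with a stand-in enum, `DemoParsingWeights`, whose two values mirror the documented options; both it and `resolve_model_name` are hypothetical and exist only for illustration.

```python
from enum import Enum
from typing import Union


class DemoParsingWeights(Enum):
    """Hypothetical stand-in mirroring the documented ParsingWeights values."""

    RESNET18 = 'parsing_resnet18'
    RESNET34 = 'parsing_resnet34'


def resolve_model_name(model_name: Union[str, DemoParsingWeights]) -> DemoParsingWeights:
    """Convert a string to its enum member; enum members pass through unchanged."""
    if isinstance(model_name, str):
        try:
            # Enum lookup by value, e.g. DemoParsingWeights('parsing_resnet34')
            return DemoParsingWeights(model_name)
        except ValueError as err:
            valid = ', '.join(w.value for w in DemoParsingWeights)
            raise ValueError(f"Unknown face parsing model: '{model_name}'. Valid options are: {valid}") from err
    return model_name


print(resolve_model_name('parsing_resnet34'))
try:
    resolve_model_name('parsing_resnet50')
except ValueError as exc:
    print(exc)
```

Chaining with `from err` keeps the original enum lookup failure in the traceback while the new message tells the caller what to pass instead.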
106
uniface/parsing/base.py
Normal file
@@ -0,0 +1,106 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from abc import ABC, abstractmethod
from typing import Tuple

import numpy as np


class BaseFaceParser(ABC):
    """
    Abstract base class for all face parsing models.

    This class defines the common interface that all face parsing models must implement,
    ensuring consistency across different parsing methods. Face parsing segments a face
    image into semantic regions such as skin, eyes, nose, mouth, hair, etc.

    The output is a segmentation mask where each pixel is assigned a class label
    representing a facial component.
    """

    @abstractmethod
    def _initialize_model(self) -> None:
        """
        Initialize the underlying model for inference.

        This method should handle loading model weights, creating the
        inference session (e.g., ONNX Runtime), and any necessary
        setup procedures to prepare the model for prediction.

        Raises:
            RuntimeError: If the model fails to load or initialize.
        """
        raise NotImplementedError('Subclasses must implement the _initialize_model method.')

    @abstractmethod
    def preprocess(self, face_image: np.ndarray) -> np.ndarray:
        """
        Preprocess the input face image for model inference.

        This method should take a raw face crop and convert it into the format
        expected by the model's inference engine (e.g., normalized tensor).

        Args:
            face_image (np.ndarray): A face image in BGR format with
                shape (H, W, C).

        Returns:
            np.ndarray: The preprocessed image tensor ready for inference,
                typically with shape (1, C, H, W).
        """
        raise NotImplementedError('Subclasses must implement the preprocess method.')

    @abstractmethod
    def postprocess(self, outputs: np.ndarray, original_size: Tuple[int, int]) -> np.ndarray:
        """
        Postprocess raw model outputs into a segmentation mask.

        This method takes the raw output from the model's inference and
        converts it into a segmentation mask at the original image size.

        Args:
            outputs (np.ndarray): Raw outputs from the model inference.
            original_size (Tuple[int, int]): Original image size (width, height).

        Returns:
            np.ndarray: Segmentation mask with the same size as the original image.
        """
        raise NotImplementedError('Subclasses must implement the postprocess method.')

    @abstractmethod
    def parse(self, face_image: np.ndarray) -> np.ndarray:
        """
        Perform end-to-end face parsing on a face image.

        This method orchestrates the full pipeline: preprocessing the input,
        running inference, and postprocessing to return the segmentation mask.

        Args:
            face_image (np.ndarray): A face image in BGR format. The face should be
                roughly centered and well-framed within the image.

        Returns:
            np.ndarray: Segmentation mask with the same size as input image,
                where each pixel value represents a facial component class.

        Example:
            >>> parser = create_face_parser()
            >>> mask = parser.parse(face_crop)
            >>> print(f"Mask shape: {mask.shape}, unique classes: {np.unique(mask)}")
        """
        raise NotImplementedError('Subclasses must implement the parse method.')

    def __call__(self, face_image: np.ndarray) -> np.ndarray:
        """
        Provides a convenient, callable shortcut for the `parse` method.

        Args:
            face_image (np.ndarray): A face image in BGR format.

        Returns:
            np.ndarray: Segmentation mask with the same size as input image.
        """
        return self.parse(face_image)
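The `preprocess` contract above (a BGR `(H, W, C)` image in, a `(1, C, H, W)` tensor out) is met by both BiSeNet and MobileGaze in this change with the same steps: BGR-to-RGB, scaling to [0, 1], ImageNet mean/std normalization, and an HWC-to-NCHW transpose. A NumPy-only sketch of those steps, assuming the crop is already at the model's input size (the real implementations call `cv2.resize` first); `to_nchw` is a hypothetical name.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_nchw(face_bgr: np.ndarray) -> np.ndarray:
    """Convert a uint8 BGR (H, W, 3) crop into a normalized float32 (1, 3, H, W) tensor."""
    rgb = face_bgr[:, :, ::-1]  # BGR -> RGB by reversing the channel axis
    x = rgb.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD  # per-channel normalization, broadcast over H and W
    x = np.transpose(x, (2, 0, 1))  # HWC -> CHW
    return np.expand_dims(x, axis=0).astype(np.float32)  # CHW -> NCHW


# A pure-blue BGR image: after the channel flip, blue lands in RGB channel index 2
crop = np.zeros((4, 6, 3), dtype=np.uint8)
crop[..., 0] = 255  # blue in BGR order
tensor = to_nchw(crop)
print(tensor.shape, tensor[0, :, 0, 0])
```

For a 3-channel image, reversing the last axis gives the same channel order as `cv2.cvtColor(..., cv2.COLOR_BGR2RGB)`, which is why the sketch can avoid OpenCV.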
166
uniface/parsing/bisenet.py
Normal file
@@ -0,0 +1,166 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from typing import Tuple

import cv2
import numpy as np

from uniface.constants import ParsingWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session

from .base import BaseFaceParser

__all__ = ['BiSeNet']


class BiSeNet(BaseFaceParser):
    """
    BiSeNet: Bilateral Segmentation Network for Face Parsing with ONNX Runtime.

    BiSeNet is a semantic segmentation model that segments a face image into
    facial components such as skin, eyes, nose, mouth, and hair. This implementation
    uses a ResNet backbone and outputs a segmentation mask in which each pixel is
    assigned a class label.

    The model predicts 19 facial component classes, including:
        - Background, skin, eyebrows, eyes, nose, mouth, lips, ears, hair, etc.

    Reference:
        https://github.com/yakhyo/face-parsing

    Args:
        model_name (ParsingWeights): The enum specifying the parsing model to load.
            Options: RESNET18, RESNET34.
            Defaults to `ParsingWeights.RESNET18`.
        input_size (Tuple[int, int]): The resolution (width, height) for the model's
            input. Defaults to (512, 512). Once the model is loaded, this value is
            replaced by the input size read from the ONNX model.

    Attributes:
        input_size (Tuple[int, int]): Model input dimensions (width, height).
        input_mean (np.ndarray): Per-channel mean values for normalization (ImageNet).
        input_std (np.ndarray): Per-channel std values for normalization (ImageNet).

    Example:
        >>> import numpy as np
        >>> from uniface.parsing import BiSeNet
        >>> from uniface import RetinaFace
        >>>
        >>> detector = RetinaFace()
        >>> parser = BiSeNet()
        >>>
        >>> # Detect faces and parse each face
        >>> faces = detector.detect(image)
        >>> for face in faces:
        ...     bbox = face['bbox']
        ...     x1, y1, x2, y2 = map(int, bbox[:4])
        ...     face_crop = image[y1:y2, x1:x2]
        ...     mask = parser.parse(face_crop)
        ...     print(f"Mask shape: {mask.shape}, unique classes: {np.unique(mask)}")
    """

    def __init__(
        self,
        model_name: ParsingWeights = ParsingWeights.RESNET18,
        input_size: Tuple[int, int] = (512, 512),
    ) -> None:
        Logger.info(f'Initializing BiSeNet with model={model_name}, input_size={input_size}')

        self.input_size = input_size
        self.input_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        self.input_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

        self.model_path = verify_model_weights(model_name)
        self._initialize_model()

    def _initialize_model(self) -> None:
        """
        Initialize the ONNX model from the stored model path.

        Raises:
            RuntimeError: If the model fails to load or initialize.
        """
        try:
            self.session = create_onnx_session(self.model_path)

            # Get input configuration
            input_cfg = self.session.get_inputs()[0]
            input_shape = input_cfg.shape
            self.input_name = input_cfg.name
            self.input_size = tuple(input_shape[2:4][::-1])  # Update from model

            # Get output configuration
            outputs = self.session.get_outputs()
            self.output_names = [output.name for output in outputs]

            Logger.info(f'BiSeNet initialized with input size {self.input_size}')

        except Exception as e:
            Logger.error(f"Failed to load parsing model from '{self.model_path}'", exc_info=True)
            raise RuntimeError(f'Failed to initialize parsing model: {e}') from e

    def preprocess(self, face_image: np.ndarray) -> np.ndarray:
        """
        Preprocess a face image for parsing.

        Args:
            face_image (np.ndarray): A face image in BGR format.

        Returns:
            np.ndarray: Preprocessed image tensor with shape (1, 3, H, W).
        """
        # Convert BGR to RGB
        image = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)

        # Resize to model input size
        image = cv2.resize(image, self.input_size, interpolation=cv2.INTER_LINEAR)

        # Normalize to [0, 1] and apply normalization
        image = image.astype(np.float32) / 255.0
        image = (image - self.input_mean) / self.input_std

        # HWC -> CHW -> NCHW
        image = np.transpose(image, (2, 0, 1))
        image = np.expand_dims(image, axis=0).astype(np.float32)

        return image

    def postprocess(self, outputs: np.ndarray, original_size: Tuple[int, int]) -> np.ndarray:
        """
        Postprocess model output to segmentation mask.

        Args:
            outputs (np.ndarray): Raw model output (per-class scores) with shape
                (1, num_classes, H, W).
            original_size (Tuple[int, int]): Original image size (width, height).

        Returns:
            np.ndarray: Segmentation mask resized to original dimensions.
        """
        # Get the class with the highest score for each pixel
        predicted_mask = outputs.squeeze(0).argmax(0).astype(np.uint8)

        # Resize back to original size; nearest-neighbor keeps every value a valid class label
        restored_mask = cv2.resize(predicted_mask, original_size, interpolation=cv2.INTER_NEAREST)

        return restored_mask

    def parse(self, face_image: np.ndarray) -> np.ndarray:
        """
        Perform end-to-end face parsing on a face image.

        This method orchestrates the full pipeline: preprocessing the input,
        running inference, and postprocessing to return the segmentation mask.

        Args:
            face_image (np.ndarray): A face image in BGR format.

        Returns:
            np.ndarray: Segmentation mask with the same size as input image.
        """
        original_size = (face_image.shape[1], face_image.shape[0])  # (width, height)
        input_tensor = self.preprocess(face_image)
        outputs = self.session.run(self.output_names, {self.input_name: input_tensor})

        return self.postprocess(outputs[0], original_size)
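BiSeNet's `postprocess` takes the per-pixel argmax over the class axis and resizes the label map with `cv2.INTER_NEAREST`, because interpolating integer labels (for example with bilinear resizing) would produce in-between values that are not valid classes. The NumPy sketch below mimics those two steps with a simple index-mapping nearest-neighbor resize; `logits_to_mask` is hypothetical, and its rounding at block borders is not guaranteed to match OpenCV pixel for pixel.

```python
import numpy as np


def logits_to_mask(logits: np.ndarray, original_size: tuple) -> np.ndarray:
    """Turn (1, num_classes, h, w) scores into a uint8 label mask of size (width, height)."""
    labels = logits[0].argmax(axis=0).astype(np.uint8)  # (h, w) class index per pixel
    out_w, out_h = original_size
    h, w = labels.shape
    # Nearest-neighbor resize: each output pixel copies the source pixel it maps onto
    rows = (np.arange(out_h) * h // out_h).astype(np.intp)
    cols = (np.arange(out_w) * w // out_w).astype(np.intp)
    return labels[rows[:, None], cols[None, :]]


# Two classes on a 2x2 grid: class 1 wins only in the top-left cell
logits = np.zeros((1, 2, 2, 2), dtype=np.float32)
logits[0, 1, 0, 0] = 5.0
mask = logits_to_mask(logits, original_size=(4, 6))  # width 4, height 6
print(mask.shape, np.unique(mask))
```

As with `cv2.resize`, the size is given as (width, height) while the returned array is indexed (height, width).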
52
uniface/privacy/__init__.py
Normal file
@@ -0,0 +1,52 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from typing import Optional

import numpy as np

from .blur import BlurFace


def anonymize_faces(
    image: np.ndarray,
    detector: Optional[object] = None,
    method: str = 'pixelate',
    blur_strength: float = 3.0,
    pixel_blocks: int = 10,
    conf_thresh: float = 0.5,
    **kwargs,
) -> np.ndarray:
    """One-line face anonymization with automatic detection.

    Args:
        image (np.ndarray): Input image (BGR format).
        detector: Face detector instance with a `detect(image)` method. If None,
            a RetinaFace detector is created.
        method (str): Blur method: 'gaussian', 'pixelate', 'blackout', 'elliptical',
            or 'median'. Defaults to 'pixelate'.
        blur_strength (float): Blur intensity. Defaults to 3.0.
        pixel_blocks (int): Pixelation factor for 'pixelate'; larger values give
            coarser blocks. Defaults to 10.
        conf_thresh (float): Detection confidence threshold, used only when the
            detector is created internally. Defaults to 0.5.
        **kwargs: Additional arguments passed to RetinaFace when it is created
            internally; ignored if `detector` is given.

    Returns:
        np.ndarray: Anonymized image.

    Example:
        >>> from uniface.privacy import anonymize_faces
        >>> anonymized = anonymize_faces(image, method='pixelate')
    """
    if detector is None:
        try:
            from uniface import RetinaFace

            detector = RetinaFace(conf_thresh=conf_thresh, **kwargs)
        except ImportError as err:
            raise ImportError('Could not import RetinaFace. Please ensure UniFace is properly installed.') from err

    faces = detector.detect(image)
    blurrer = BlurFace(method=method, blur_strength=blur_strength, pixel_blocks=pixel_blocks)
    return blurrer.anonymize(image, faces)


__all__ = ['BlurFace', 'anonymize_faces']
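`anonymize_faces` relies on duck typing: any object whose `detect(image)` returns a list of dicts with a `'bbox'` entry can be passed as `detector`. The self-contained sketch below exercises that contract with a hypothetical `StubDetector` and a simplified blackout step that clips boxes to the image bounds the same way `BlurFace.blur_regions` does; it does not import UniFace, and `blackout_faces` is illustrative only.

```python
from typing import Dict, List

import numpy as np


class StubDetector:
    """Hypothetical detector returning fixed boxes; stands in for RetinaFace."""

    def __init__(self, boxes: List[List[float]]):
        self.boxes = boxes

    def detect(self, image: np.ndarray) -> List[Dict]:
        return [{'bbox': box} for box in self.boxes]


def blackout_faces(image: np.ndarray, detector) -> np.ndarray:
    """Fill each detected box with black on a copy, clipping boxes to the image bounds."""
    out = image.copy()
    h, w = out.shape[:2]
    for face in detector.detect(out):
        x1, y1, x2, y2 = map(int, face['bbox'])
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(w, x2), min(h, y2)
        if x2 > x1 and y2 > y1:  # skip boxes that fall entirely outside the image
            out[y1:y2, x1:x2] = 0
    return out


image = np.full((10, 10, 3), 200, dtype=np.uint8)
result = blackout_faces(image, StubDetector([[2, 2, 5, 5], [8, 8, 15, 15], [20, 20, 30, 30]]))
print(int((result == 0).all(axis=2).sum()))  # number of blacked-out pixels
```

The second box is clipped to a 2x2 corner and the third lies outside the image, so it is skipped rather than raising an indexing error.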
193
uniface/privacy/blur.py
Normal file
@@ -0,0 +1,193 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from typing import Dict, List, Tuple, Union

import cv2
import numpy as np

__all__ = ['BlurFace']


def _gaussian_blur(region: np.ndarray, strength: float = 3.0) -> np.ndarray:
    """Apply Gaussian blur to a region."""
    h, w = region.shape[:2]
    # Kernel scales with the smaller side; `| 1` forces an odd size, as OpenCV requires
    kernel_size = max(3, int((min(h, w) / 7) * strength)) | 1
    return cv2.GaussianBlur(region, (kernel_size, kernel_size), 0)


def _median_blur(region: np.ndarray, strength: float = 3.0) -> np.ndarray:
    """Apply median blur to a region."""
    h, w = region.shape[:2]
    kernel_size = max(3, int((min(h, w) / 7) * strength)) | 1
    return cv2.medianBlur(region, kernel_size)


def _pixelate_blur(region: np.ndarray, blocks: int = 10) -> np.ndarray:
    """Pixelate a region by downsampling it by a factor of `blocks` and upscaling it back."""
    h, w = region.shape[:2]
    temp_h, temp_w = max(1, h // blocks), max(1, w // blocks)
    temp = cv2.resize(region, (temp_w, temp_h), interpolation=cv2.INTER_LINEAR)
    return cv2.resize(temp, (w, h), interpolation=cv2.INTER_NEAREST)


def _blackout_blur(region: np.ndarray, color: Tuple[int, int, int] = (0, 0, 0)) -> np.ndarray:
    """Replace region with solid color."""
    return np.full_like(region, color)


class EllipticalBlur:
    """Elliptical blur with soft, feathered edges.

    This blur applies Gaussian blur within an elliptical mask that follows
    the natural oval shape of faces. Because the feathered mask extends beyond
    the bounding box, it operates on the full image rather than on a cropped region.

    Args:
        blur_strength (float): Blur intensity multiplier. Defaults to 3.0.
        margin (int): Extra pixels to extend ellipse beyond bbox. Defaults to 20.
    """

    def __init__(self, blur_strength: float = 3.0, margin: int = 20):
        self.blur_strength = blur_strength
        self.margin = margin

    def __call__(
        self,
        image: np.ndarray,
        bboxes: List[Union[Tuple, List]],
        inplace: bool = False,
    ) -> np.ndarray:
        if not inplace:
            image = image.copy()

        h, w = image.shape[:2]

        for bbox in bboxes:
            x1, y1, x2, y2 = map(int, bbox)
            center_x, center_y = (x1 + x2) // 2, (y1 + y2) // 2
            axes_x = (x2 - x1) // 2 + self.margin
            axes_y = (y2 - y1) // 2 + self.margin

            # Create soft elliptical mask
            mask = np.zeros((h, w), dtype=np.float32)
            cv2.ellipse(mask, (center_x, center_y), (axes_x, axes_y), 0, 0, 360, 255, -1)
            mask = cv2.GaussianBlur(mask, (51, 51), 0) / 255.0
            mask = mask[:, :, np.newaxis]

            kernel_size = max(3, int((min(axes_y, axes_x) * 2 / 7) * self.blur_strength)) | 1
            blurred = cv2.GaussianBlur(image, (kernel_size, kernel_size), 0)
            # Alpha-blend and write back into `image`, so inplace=True really modifies the caller's array
            image[:] = (blurred * mask + image * (1 - mask)).astype(np.uint8)

        return image


class BlurFace:
    """Face blurring with multiple anonymization methods.

    Args:
        method (str): Blur method: 'gaussian', 'pixelate', 'blackout', 'elliptical', or 'median'.
            Defaults to 'pixelate'.
        blur_strength (float): Intensity for gaussian/elliptical/median. Defaults to 3.0.
        pixel_blocks (int): Pixelation factor for pixelate; the region is downsampled by
            this factor, so larger values give coarser blocks. Defaults to 15.
        color (Tuple[int, int, int]): Fill color (BGR) for blackout. Defaults to (0, 0, 0).
        margin (int): Edge margin for elliptical. Defaults to 20.

    Example:
        >>> blurrer = BlurFace(method='pixelate')
        >>> anonymized = blurrer.anonymize(image, faces)
    """

    VALID_METHODS = {'gaussian', 'pixelate', 'blackout', 'elliptical', 'median'}

    def __init__(
        self,
        method: str = 'pixelate',
        blur_strength: float = 3.0,
        pixel_blocks: int = 15,
        color: Tuple[int, int, int] = (0, 0, 0),
        margin: int = 20,
    ):
        self.method = method.lower()
        self._blur_strength = blur_strength
        self._pixel_blocks = pixel_blocks
        self._color = color
        self._margin = margin

        if self.method not in self.VALID_METHODS:
            raise ValueError(f"Invalid blur method: '{method}'. Choose from: {sorted(self.VALID_METHODS)}")

        if self.method == 'elliptical':
            self._elliptical = EllipticalBlur(blur_strength, margin)

    def _blur_region(self, region: np.ndarray) -> np.ndarray:
        if self.method == 'gaussian':
            return _gaussian_blur(region, self._blur_strength)
        elif self.method == 'median':
            return _median_blur(region, self._blur_strength)
        elif self.method == 'pixelate':
            return _pixelate_blur(region, self._pixel_blocks)
        elif self.method == 'blackout':
            return _blackout_blur(region, self._color)

    def anonymize(
        self,
        image: np.ndarray,
        faces: List[Dict],
        inplace: bool = False,
    ) -> np.ndarray:
        """Anonymize faces in an image.

        Args:
            image (np.ndarray): Input image (BGR format).
            faces (List[Dict]): Face detections with 'bbox' key containing [x1, y1, x2, y2].
            inplace (bool): Modify image in-place if True. Defaults to False.

        Returns:
            np.ndarray: Image with anonymized faces.
        """
        if not faces:
            return image if inplace else image.copy()

        bboxes = [face['bbox'] for face in faces]
        return self.blur_regions(image, bboxes, inplace)

    def blur_regions(
        self,
        image: np.ndarray,
        bboxes: List[Union[Tuple, List]],
        inplace: bool = False,
    ) -> np.ndarray:
        """Blur specific rectangular regions in an image.

        Args:
            image (np.ndarray): Input image (BGR format).
            bboxes (List): Bounding boxes as [x1, y1, x2, y2].
            inplace (bool): Modify image in-place if True. Defaults to False.

        Returns:
            np.ndarray: Image with blurred regions.
        """
        if not bboxes:
            return image if inplace else image.copy()

        if self.method == 'elliptical':
            return self._elliptical(image, bboxes, inplace)

        if not inplace:
            image = image.copy()

        h, w = image.shape[:2]

        for bbox in bboxes:
            x1, y1, x2, y2 = map(int, bbox)
            x1, y1 = max(0, x1), max(0, y1)
            x2, y2 = min(w, x2), min(h, y2)

            if x2 > x1 and y2 > y1:
                image[y1:y2, x1:x2] = self._blur_region(image[y1:y2, x1:x2])

        return image

    def __repr__(self) -> str:
        return f"BlurFace(method='{self.method}')"
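The Gaussian and median helpers size their kernel from the region as `max(3, int((min(h, w) / 7) * strength)) | 1`, where `| 1` sets the lowest bit so the size is always odd. `_pixelate_blur` shrinks the region to `h // blocks` by `w // blocks` pixels before scaling it back up. The pure-Python sketch below only evaluates those two formulas for a few region sizes; the function names are illustrative, and the defaults follow `BlurFace` (`strength=3.0`, `pixel_blocks=15`).

```python
def blur_kernel_size(h: int, w: int, strength: float = 3.0) -> int:
    """Kernel size used by the Gaussian/median helpers: scales with the smaller side, always odd."""
    return max(3, int((min(h, w) / 7) * strength)) | 1


def pixelate_grid(h: int, w: int, blocks: int = 15) -> tuple:
    """Downsampled (height, width) used by the pixelation helper before upscaling back."""
    return max(1, h // blocks), max(1, w // blocks)


for h, w in [(140, 100), (20, 20), (5, 5)]:
    print((h, w), 'kernel:', blur_kernel_size(h, w), 'pixelate grid:', pixelate_grid(h, w))
```

For a 140x100 face this gives a 43x43 kernel and a 9x6 pixel grid; for faces around 20 pixels or smaller, the default factor of 15 collapses the whole region into a single block.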
@@ -55,10 +55,4 @@ def create_recognizer(method: str = 'arcface', **kwargs) -> BaseRecognizer:
         raise ValueError(f"Unsupported method: '{method}'. Available: {available}")
 
 
-__all__ = [
-    'create_recognizer',
-    'ArcFace',
-    'MobileFace',
-    'SphereFace',
-    'BaseRecognizer',
-]
+__all__ = ['create_recognizer', 'BaseRecognizer', 'ArcFace', 'MobileFace', 'SphereFace']
64
uniface/spoofing/__init__.py
Normal file
@@ -0,0 +1,64 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from typing import Optional

from uniface.constants import MiniFASNetWeights

from .base import BaseSpoofer
from .minifasnet import MiniFASNet

__all__ = [
    'BaseSpoofer',
    'MiniFASNet',
    'MiniFASNetWeights',
    'create_spoofer',
]


def create_spoofer(
    model_name: MiniFASNetWeights = MiniFASNetWeights.V2,
    scale: Optional[float] = None,
) -> MiniFASNet:
    """
    Factory function to create a face anti-spoofing model.

    This is a convenience function that creates a MiniFASNet instance
    with the specified model variant and optional custom scale.

    Args:
        model_name (MiniFASNetWeights): The model variant to use.
            Options:
                - MiniFASNetWeights.V2: Improved version (default), uses scale=2.7
                - MiniFASNetWeights.V1SE: Squeeze-and-excitation version, uses scale=4.0
            Defaults to MiniFASNetWeights.V2.
        scale (Optional[float]): Custom crop scale factor for face region.
            If None, uses the default scale for the selected model variant.

    Returns:
        MiniFASNet: An initialized face anti-spoofing model.

    Example:
        >>> from uniface.spoofing import create_spoofer, MiniFASNetWeights
        >>> from uniface import RetinaFace
        >>>
        >>> # Create with default settings (V2 model)
        >>> spoofer = create_spoofer()
        >>>
        >>> # Create with V1SE model
        >>> spoofer = create_spoofer(model_name=MiniFASNetWeights.V1SE)
        >>>
        >>> # Create with custom scale
        >>> spoofer = create_spoofer(scale=3.0)
        >>>
        >>> # Use with face detector
        >>> detector = RetinaFace()
        >>> faces = detector.detect(image)
        >>> for face in faces:
        ...     label_idx, score = spoofer.predict(image, face['bbox'])
        ...     # label_idx: 0 = Fake, 1 = Real
        ...     label = 'Real' if label_idx == 1 else 'Fake'
        ...     print(f'{label}: {score:.2%}')
    """
    return MiniFASNet(model_name=model_name, scale=scale)
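`create_spoofer` forwards `scale=None` to `MiniFASNet`, which then falls back to a per-variant default (2.7 for V2 and 4.0 for V1SE, per the docstring above). The MiniFASNet cropping code is not part of this excerpt, so the sketch below is only an assumption about how such a crop scale is commonly applied: enlarge the box around its center by `scale`, then clip it to the image. `resolve_scale`, `scaled_crop_box`, and the string keys in `DEMO_DEFAULT_SCALES` are all hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical stand-in for the per-variant defaults documented in create_spoofer
DEMO_DEFAULT_SCALES = {'v2': 2.7, 'v1se': 4.0}


def resolve_scale(variant: str, scale: Optional[float] = None) -> float:
    """Use an explicit scale if given, otherwise the variant's default."""
    return scale if scale is not None else DEMO_DEFAULT_SCALES[variant]


def scaled_crop_box(bbox: Sequence[float], scale: float, img_w: int, img_h: int) -> Tuple[int, int, int, int]:
    """Assumed cropping rule: grow the box around its center by `scale`, then clip to the image."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return (
        max(0, int(cx - half_w)),
        max(0, int(cy - half_h)),
        min(img_w, int(cx + half_w)),
        min(img_h, int(cy + half_h)),
    )


print(scaled_crop_box([100, 100, 200, 200], resolve_scale('v2'), 640, 480))
print(scaled_crop_box([0, 0, 100, 100], resolve_scale('v1se'), 640, 480))
```

A larger scale feeds the classifier more context around the face (background, edges of a printed photo or screen), which is one plausible reason the two variants use different defaults.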
117
uniface/spoofing/base.py
Normal file
@@ -0,0 +1,117 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from abc import ABC, abstractmethod
from typing import List, Tuple, Union

import numpy as np


class BaseSpoofer(ABC):
    """
    Abstract base class for all face anti-spoofing models.

    This class defines the common interface that all anti-spoofing models must implement,
    ensuring consistency across different spoofing detection methods. Anti-spoofing models
    detect whether a face is real (live person) or fake (photo, video, mask, etc.).

    The prediction returns a tuple of (label_idx, score):
        - label_idx: 0 = Fake (spoof), 1 = Real (live)
        - score: Confidence score for the predicted label (0.0 to 1.0)
    """

@abstractmethod
|
||||
def _initialize_model(self) -> None:
|
||||
"""
|
||||
Initialize the underlying model for inference.
|
||||
|
||||
This method should handle loading model weights, creating the
|
||||
inference session (e.g., ONNX Runtime), and any necessary
|
||||
setup procedures to prepare the model for prediction.
|
||||
|
||||
Raises:
|
||||
RuntimeError: If the model fails to load or initialize.
|
||||
"""
|
||||
raise NotImplementedError('Subclasses must implement the _initialize_model method.')
|
||||
|
||||
@abstractmethod
|
||||
    def preprocess(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> np.ndarray:
        """
        Preprocess the input image for model inference.

        This method should crop the face region using the bounding box,
        resize it to the model's expected input size, and normalize
        the pixel values as required by the model.

        Args:
            image (np.ndarray): Input image in BGR format with shape (H, W, C).
            bbox (Union[List, np.ndarray]): Face bounding box in [x1, y1, x2, y2] format.

        Returns:
            np.ndarray: The preprocessed image tensor ready for inference,
                typically with shape (1, C, H, W).
        """
        raise NotImplementedError('Subclasses must implement the preprocess method.')

    @abstractmethod
    def postprocess(self, outputs: np.ndarray) -> Tuple[int, float]:
        """
        Postprocess raw model outputs into a prediction result.

        This method takes the raw output from the model's inference and
        converts it into a label index and confidence score.

        Args:
            outputs (np.ndarray): Raw outputs from the model inference (logits).

        Returns:
            Tuple[int, float]: A tuple of (label_idx, score) where:
                - label_idx: 0 = Fake (spoof), 1 = Real (live)
                - score: Confidence score for the predicted label (0.0 to 1.0)
        """
        raise NotImplementedError('Subclasses must implement the postprocess method.')

    @abstractmethod
    def predict(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> Tuple[int, float]:
        """
        Perform end-to-end anti-spoofing prediction on a face.

        This method orchestrates the full pipeline: preprocessing the input,
        running inference, and postprocessing to return the prediction.

        Args:
            image (np.ndarray): Input image in BGR format containing the face.
            bbox (Union[List, np.ndarray]): Face bounding box in [x1, y1, x2, y2] format.
                This is typically obtained from a face detector.

        Returns:
            Tuple[int, float]: A tuple of (label_idx, score) where:
                - label_idx: 0 = Fake (spoof), 1 = Real (live)
                - score: Confidence score for the predicted label (0.0 to 1.0)

        Example:
            >>> spoofer = MiniFASNet()
            >>> detector = RetinaFace()
            >>> faces = detector.detect(image)
            >>> for face in faces:
            ...     label_idx, score = spoofer.predict(image, face['bbox'])
            ...     label = 'Real' if label_idx == 1 else 'Fake'
            ...     print(f'{label}: {score:.2%}')
        """
        raise NotImplementedError('Subclasses must implement the predict method.')

    def __call__(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> Tuple[int, float]:
        """
        Provide a callable shortcut for the `predict` method.

        Args:
            image (np.ndarray): Input image in BGR format.
            bbox (Union[List, np.ndarray]): Face bounding box in [x1, y1, x2, y2] format.

        Returns:
            Tuple[int, float]: A tuple of (label_idx, score) where:
                - label_idx: 0 = Fake (spoof), 1 = Real (live)
                - score: Confidence score for the predicted label (0.0 to 1.0)
        """
        return self.predict(image, bbox)
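The abstract methods above define a three-stage contract: `preprocess` builds an input tensor, the subclass runs its model, and `postprocess` turns logits into `(label_idx, score)`. As a rough, self-contained sketch of how a subclass plugs into such a contract, the code below uses a hypothetical `SpooferSketch` base and a toy `ToySpoofer` whose "model" is just the mean brightness of the crop; none of these names exist in uniface. As a design variation, it implements `predict` once in the base class (template-method style) instead of leaving it abstract as `BaseSpoofer` does, and it assumes the 0 = Fake, 1 = Real convention documented above.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple, Union

import numpy as np


class SpooferSketch(ABC):
    """Hypothetical base class mirroring the preprocess -> infer -> postprocess contract."""

    @abstractmethod
    def preprocess(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> np.ndarray:
        """Return an NCHW float32 tensor for the face inside bbox."""

    @abstractmethod
    def _run(self, tensor: np.ndarray) -> np.ndarray:
        """Return raw logits with shape (1, num_classes)."""

    def postprocess(self, outputs: np.ndarray) -> Tuple[int, float]:
        # Numerically stable softmax over the class axis
        e_x = np.exp(outputs - outputs.max(axis=1, keepdims=True))
        probs = e_x / e_x.sum(axis=1, keepdims=True)
        label_idx = int(np.argmax(probs[0]))
        return label_idx, float(probs[0, label_idx])

    def predict(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> Tuple[int, float]:
        # Template method: every subclass shares this pipeline
        return self.postprocess(self._run(self.preprocess(image, bbox)))

    def __call__(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> Tuple[int, float]:
        return self.predict(image, bbox)


class ToySpoofer(SpooferSketch):
    """Toy subclass: mean crop brightness stands in for a real network (illustration only)."""

    def preprocess(self, image, bbox):
        x1, y1, x2, y2 = map(int, bbox[:4])
        face = image[y1:y2, x1:x2].astype(np.float32)
        return np.transpose(face, (2, 0, 1))[np.newaxis]  # HWC -> NCHW

    def _run(self, tensor):
        brightness = float(tensor.mean()) / 255.0
        # Hypothetical logits: brighter crops lean toward label 1 ("Real")
        return np.array([[1.0 - brightness, brightness]], dtype=np.float32)


image = np.full((64, 64, 3), 200, dtype=np.uint8)
label_idx, score = ToySpoofer()(image, [8, 8, 40, 40])
print(label_idx, round(score, 3))
```

Because `__call__` only delegates to `predict`, a subclass that overrides the stages automatically gets the callable shortcut as well.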
225
uniface/spoofing/minifasnet.py
Normal file
@@ -0,0 +1,225 @@
# Copyright 2025 Yakhyokhuja Valikhujaev
# Author: Yakhyokhuja Valikhujaev
# GitHub: https://github.com/yakhyo

from typing import List, Optional, Tuple, Union

import cv2
import numpy as np

from uniface.constants import MiniFASNetWeights
from uniface.log import Logger
from uniface.model_store import verify_model_weights
from uniface.onnx_utils import create_onnx_session

from .base import BaseSpoofer

__all__ = ['MiniFASNet']

# Default crop scales for each model variant
DEFAULT_SCALES = {
    MiniFASNetWeights.V1SE: 4.0,
    MiniFASNetWeights.V2: 2.7,
}

class MiniFASNet(BaseSpoofer):
    """
    MiniFASNet: Lightweight Face Anti-Spoofing with ONNX Runtime.

    MiniFASNet is a face anti-spoofing model that detects whether a face is real
    (live person) or fake (photo, video replay, mask, etc.). It supports two model
    variants: V1SE (with squeeze-and-excitation) and V2 (improved version).

    The model takes a face region cropped from the image using a bounding box
    and predicts whether it is a real or spoofed face.

    Reference:
        https://github.com/yakhyo/face-anti-spoofing

    Args:
        model_name (MiniFASNetWeights): The enum specifying the model variant to load.
            Options: V1SE (scale=4.0), V2 (scale=2.7).
            Defaults to `MiniFASNetWeights.V2`.
        scale (Optional[float]): Custom crop scale factor for the face region.
            If None, uses the default scale for the selected model variant.
            V1SE uses 4.0, V2 uses 2.7. The effective scale is reduced when the
            enlarged crop would not fit inside the image.

    Attributes:
        scale (float): Crop scale factor for face region extraction.
        input_size (Tuple[int, int]): Model input dimensions (width, height).

    Example:
        >>> from uniface.spoofing import MiniFASNet
        >>> from uniface import RetinaFace
        >>>
        >>> detector = RetinaFace()
        >>> spoofer = MiniFASNet()
        >>>
        >>> # Detect faces and check if they are real
        >>> faces = detector.detect(image)
        >>> for face in faces:
        ...     label_idx, score = spoofer.predict(image, face['bbox'])
        ...     # label_idx: 0 = Fake, 1 = Real
        ...     label = 'Real' if label_idx == 1 else 'Fake'
        ...     print(f'{label}: {score:.2%}')
    """

    def __init__(
        self,
        model_name: MiniFASNetWeights = MiniFASNetWeights.V2,
        scale: Optional[float] = None,
    ) -> None:
        Logger.info(f'Initializing MiniFASNet with model={model_name.name}')

        # Use default scale for the model variant if not specified
        self.scale = scale if scale is not None else DEFAULT_SCALES.get(model_name, 2.7)

        self.model_path = verify_model_weights(model_name)
        self._initialize_model()

    def _initialize_model(self) -> None:
        """
        Initialize the ONNX model from the stored model path.

        Raises:
            RuntimeError: If the model fails to load or initialize.
        """
        try:
            self.session = create_onnx_session(self.model_path)

            # Get input configuration
            input_cfg = self.session.get_inputs()[0]
            self.input_name = input_cfg.name
            # Input shape is (batch, channels, height, width); cv2.resize expects (width, height)
            self.input_size = tuple(input_cfg.shape[2:4][::-1])  # (width, height)

            # Get output configuration
            output_cfg = self.session.get_outputs()[0]
            self.output_name = output_cfg.name

            Logger.info(f'MiniFASNet initialized with input size {self.input_size}, scale={self.scale}')

        except Exception as e:
            Logger.error(f"Failed to load MiniFASNet model from '{self.model_path}'", exc_info=True)
            raise RuntimeError(f'Failed to initialize MiniFASNet model: {e}') from e

    def _xyxy_to_xywh(self, bbox: Union[List, np.ndarray]) -> List[int]:
        """Convert bounding box from [x1, y1, x2, y2] to [x, y, w, h] format."""
        x1, y1, x2, y2 = bbox[:4]
        return [int(x1), int(y1), int(x2 - x1), int(y2 - y1)]

    def _crop_face(self, image: np.ndarray, bbox_xywh: List[int]) -> np.ndarray:
        """
        Crop and resize the face region from the image using the scale factor.

        The crop is centered on the face bounding box and scaled to capture
        more context around the face, which is important for anti-spoofing.

        Args:
            image: Input image in BGR format.
            bbox_xywh: Face bounding box in [x, y, w, h] format.

        Returns:
            Face region resized to `self.input_size`.
        """
        src_h, src_w = image.shape[:2]
        x, y, box_w, box_h = bbox_xywh

        # Cap the scale so the enlarged box cannot exceed the image dimensions
        scale = min((src_h - 1) / box_h, (src_w - 1) / box_w, self.scale)
        new_w = box_w * scale
        new_h = box_h * scale

        # Calculate center of the bounding box
        center_x = x + box_w / 2
        center_y = y + box_h / 2

        # Calculate new bounding box coordinates, clipped to the image borders
        x1 = max(0, int(center_x - new_w / 2))
        y1 = max(0, int(center_y - new_h / 2))
        x2 = min(src_w - 1, int(center_x + new_w / 2))
        y2 = min(src_h - 1, int(center_y + new_h / 2))

        # Crop (x2 and y2 inclusive) and resize to the model input size
        cropped = image[y1 : y2 + 1, x1 : x2 + 1]
        resized = cv2.resize(cropped, self.input_size)

        return resized

    def preprocess(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> np.ndarray:
        """
        Preprocess the input image for model inference.

        Crops the face region, converts to float32, and arranges
        dimensions for the model (NCHW format).

        Args:
            image: Input image in BGR format with shape (H, W, C).
            bbox: Face bounding box in [x1, y1, x2, y2] format.

        Returns:
            Preprocessed image tensor with shape (1, C, H, W).
        """
        # Convert bbox format
        bbox_xywh = self._xyxy_to_xywh(bbox)

        # Crop and resize face region
        face = self._crop_face(image, bbox_xywh)

        # Convert to float32 (no normalization needed for this model)
        face = face.astype(np.float32)

        # HWC -> CHW -> NCHW
        face = np.transpose(face, (2, 0, 1))
        face = np.expand_dims(face, axis=0)

        return face

    def _softmax(self, x: np.ndarray) -> np.ndarray:
        """Apply softmax to logits along axis 1."""
        # Subtract the per-row maximum before exponentiating for numerical stability
        e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return e_x / e_x.sum(axis=1, keepdims=True)

    def postprocess(self, outputs: np.ndarray) -> Tuple[int, float]:
        """
        Postprocess raw model outputs into a prediction result.

        Applies softmax to convert logits to probabilities and
        returns the predicted label index and confidence score.

        Args:
            outputs: Raw outputs from the model inference (logits).

        Returns:
            Tuple[int, float]: A tuple of (label_idx, score) where:
                - label_idx: 0 = Fake (spoof), 1 = Real (live)
                - score: Confidence score for the predicted label (0.0 to 1.0)
        """
        probs = self._softmax(outputs)
        # The batch size is 1, so the flattened argmax equals the class index
        label_idx = int(np.argmax(probs))
        score = float(probs[0, label_idx])

        return label_idx, score

    def predict(self, image: np.ndarray, bbox: Union[List, np.ndarray]) -> Tuple[int, float]:
        """
        Perform end-to-end anti-spoofing prediction on a face.

        Args:
            image: Input image in BGR format containing the face.
            bbox: Face bounding box in [x1, y1, x2, y2] format.

        Returns:
            Tuple[int, float]: A tuple of (label_idx, score) where:
                - label_idx: 0 = Fake (spoof), 1 = Real (live)
                - score: Confidence score for the predicted label (0.0 to 1.0)
        """
        # Preprocess
        input_tensor = self.preprocess(image, bbox)

        # Run inference
        outputs = self.session.run([self.output_name], {self.input_name: input_tensor})[0]

        # Postprocess and return
        return self.postprocess(outputs)
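`_crop_face` treats `self.scale` as an upper bound: the scale is first capped by the image-to-box size ratios, and the enlarged box is then clipped to the frame. The standalone sketch below reproduces that box arithmetic without OpenCV so the effect of the V2 (2.7) and V1SE (4.0) defaults can be checked by hand; `scaled_crop_box` is a hypothetical helper written for illustration, not part of uniface.

```python
from typing import Tuple


def scaled_crop_box(
    image_hw: Tuple[int, int],
    bbox_xyxy: Tuple[float, float, float, float],
    max_scale: float,
) -> Tuple[int, int, int, int]:
    """Return the inclusive (x1, y1, x2, y2) crop box, following the logic of _crop_face."""
    src_h, src_w = image_hw
    # Same conversion as _xyxy_to_xywh
    x, y = int(bbox_xyxy[0]), int(bbox_xyxy[1])
    box_w = int(bbox_xyxy[2] - bbox_xyxy[0])
    box_h = int(bbox_xyxy[3] - bbox_xyxy[1])

    # The requested scale is an upper bound: it shrinks if the enlarged box would not fit
    scale = min((src_h - 1) / box_h, (src_w - 1) / box_w, max_scale)
    new_w, new_h = box_w * scale, box_h * scale
    center_x, center_y = x + box_w / 2, y + box_h / 2

    # Clip the enlarged box to the image borders
    x1 = max(0, int(center_x - new_w / 2))
    y1 = max(0, int(center_y - new_h / 2))
    x2 = min(src_w - 1, int(center_x + new_w / 2))
    y2 = min(src_h - 1, int(center_y + new_h / 2))
    return x1, y1, x2, y2


# A 100x120 face in a 640x480 frame (image_hw is height, width)
print(scaled_crop_box((480, 640), (200, 150, 300, 270), 2.7))  # -> (115, 48, 385, 372)
print(scaled_crop_box((480, 640), (200, 150, 300, 270), 4.0))  # -> (50, 0, 449, 449)
```

In the second call the height ratio (479 / 120, about 3.99) is smaller than 4.0, so the crop is capped by the frame height and its top edge is clipped to row 0.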
@@ -7,6 +7,52 @@ from typing import List, Tuple, Union
import cv2
import numpy as np

# Face parsing component names (19 classes)
FACE_PARSING_LABELS = [
    'background',
    'skin',
    'l_brow',
    'r_brow',
    'l_eye',
    'r_eye',
    'eye_g',
    'l_ear',
    'r_ear',
    'ear_r',
    'nose',
    'mouth',
    'u_lip',
    'l_lip',
    'neck',
    'neck_l',
    'cloth',
    'hair',
    'hat',
]

# Color palette for face parsing visualization, indexed by class id
FACE_PARSING_COLORS = [
    [0, 0, 0],
    [255, 85, 0],
    [255, 170, 0],
    [255, 0, 85],
    [255, 0, 170],
    [0, 255, 0],
    [85, 255, 0],
    [170, 255, 0],
    [0, 255, 85],
    [0, 255, 170],
    [0, 0, 255],
    [85, 0, 255],
    [170, 0, 255],
    [0, 85, 255],
    [0, 170, 255],
    [255, 255, 0],
    [255, 255, 85],
    [255, 255, 170],
    [255, 0, 255],
]

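`vis_parsing_maps` looks up `FACE_PARSING_COLORS[class_index]` for each pixel value, so the position of a name in `FACE_PARSING_LABELS` appears to be its class id. Assuming that convention, the sketch below turns component names back into a boolean mask, for example to isolate the lips or hair from a parsing result; `LABEL_TO_ID` and `component_mask` are hypothetical helpers, not uniface APIs.

```python
import numpy as np

# Same 19 names as FACE_PARSING_LABELS; the list index is assumed to be the class id
FACE_PARSING_LABELS = [
    'background', 'skin', 'l_brow', 'r_brow', 'l_eye', 'r_eye', 'eye_g',
    'l_ear', 'r_ear', 'ear_r', 'nose', 'mouth', 'u_lip', 'l_lip',
    'neck', 'neck_l', 'cloth', 'hair', 'hat',
]
LABEL_TO_ID = {name: idx for idx, name in enumerate(FACE_PARSING_LABELS)}


def component_mask(segmentation_mask: np.ndarray, *names: str) -> np.ndarray:
    """Return a boolean (H, W) mask that is True wherever any named component appears."""
    ids = [LABEL_TO_ID[name] for name in names]
    return np.isin(segmentation_mask, ids)


# Toy 2x3 mask of class ids: background, hair, hair / skin, u_lip, l_lip
mask = np.array([[0, 17, 17], [1, 12, 13]])
lips = component_mask(mask, 'u_lip', 'l_lip')
print(lips.astype(int))
```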
def draw_detections(
    *,
@@ -126,3 +172,159 @@ def draw_fancy_bbox(
    # Bottom-right corner
    cv2.line(image, (x2, y2), (x2, y2 - corner_length), color, thickness)
    cv2.line(image, (x2, y2), (x2 - corner_length, y2), color, thickness)

def draw_gaze(
    image: np.ndarray,
    bbox: np.ndarray,
    pitch: np.ndarray,
    yaw: np.ndarray,
    *,
    draw_bbox: bool = True,
    fancy_bbox: bool = True,
    draw_angles: bool = True,
):
    """
    Draws gaze direction with optional bounding box on an image.

    Args:
        image: Input image to draw on (modified in-place).
        bbox: Face bounding box [x1, y1, x2, y2].
        pitch: Vertical gaze angle in radians.
        yaw: Horizontal gaze angle in radians.
        draw_bbox: Whether to draw the bounding box. Defaults to True.
        fancy_bbox: Use fancy corner-style bbox. Defaults to True.
        draw_angles: Whether to display pitch/yaw values as text. Defaults to True.
    """
    x_min, y_min, x_max, y_max = map(int, bbox[:4])

    # Calculate dynamic line thickness based on image size (same as draw_detections)
    line_thickness = max(round(sum(image.shape[:2]) / 2 * 0.003), 2)

    # Calculate dynamic font scale based on bbox height (same as draw_detections)
    bbox_h = y_max - y_min
    font_scale = max(0.4, min(0.7, bbox_h / 200))
    font_thickness = 2

    # Draw bounding box if requested
    if draw_bbox:
        if fancy_bbox:
            draw_fancy_bbox(image, bbox, color=(0, 255, 0), thickness=line_thickness)
        else:
            cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), line_thickness)

    # Calculate center of the bounding box
    x_center = (x_min + x_max) // 2
    y_center = (y_min + y_max) // 2

    # Calculate the direction of the gaze; the arrow length equals the bbox width
    length = x_max - x_min
    dx = int(-length * np.sin(pitch) * np.cos(yaw))
    dy = int(-length * np.sin(yaw))

    point1 = (x_center, y_center)
    point2 = (x_center + dx, y_center + dy)

    # Calculate dynamic center point radius based on line thickness
    center_radius = max(line_thickness + 1, 4)

    # Draw gaze direction
    cv2.circle(image, (x_center, y_center), radius=center_radius, color=(0, 0, 255), thickness=-1)
    cv2.arrowedLine(
        image,
        point1,
        point2,
        color=(0, 0, 255),
        thickness=line_thickness,
        line_type=cv2.LINE_AA,
        tipLength=0.25,
    )

    # Draw angle values
    if draw_angles:
        text = f'P:{np.degrees(pitch):.0f}deg Y:{np.degrees(yaw):.0f}deg'
        (text_width, text_height), baseline = cv2.getTextSize(
            text, cv2.FONT_HERSHEY_SIMPLEX, font_scale, font_thickness
        )

        # Draw background rectangle for text
        cv2.rectangle(
            image,
            (x_min, y_min - text_height - baseline - 10),
            (x_min + text_width + 10, y_min),
            (0, 0, 255),
            -1,
        )

        # Draw text
        cv2.putText(
            image,
            text,
            (x_min + 5, y_min - 5),
            cv2.FONT_HERSHEY_SIMPLEX,
            font_scale,
            (255, 255, 255),
            font_thickness,
        )

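`draw_gaze` projects the two angles onto the image plane with `dx = -L * sin(pitch) * cos(yaw)` and `dy = -L * sin(yaw)`, where `L` is the bounding-box width. The sketch below isolates that arithmetic in a hypothetical `gaze_endpoint` helper (not part of uniface) so the arrow endpoints can be inspected without drawing. Note that, as written, `pitch` drives the horizontal offset and `yaw` the vertical one; whether that matches the vertical/horizontal wording in the docstring presumably depends on the angle convention of the gaze model supplying the angles.

```python
import numpy as np


def gaze_endpoint(bbox, pitch: float, yaw: float):
    """Return (start, end) arrow points using the same projection as draw_gaze."""
    x_min, y_min, x_max, y_max = map(int, bbox[:4])
    x_center, y_center = (x_min + x_max) // 2, (y_min + y_max) // 2
    length = x_max - x_min  # arrow length scales with the face width
    dx = int(-length * np.sin(pitch) * np.cos(yaw))
    dy = int(-length * np.sin(yaw))
    return (x_center, y_center), (x_center + dx, y_center + dy)


for pitch_deg, yaw_deg in [(0, 0), (20, 0), (0, 20)]:
    start, end = gaze_endpoint([100, 100, 200, 220], np.radians(pitch_deg), np.radians(yaw_deg))
    print(pitch_deg, yaw_deg, start, end)
```

A zero angle pair yields a degenerate arrow at the box center, which is why `draw_gaze` also draws a filled circle there.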
def vis_parsing_maps(
    image: np.ndarray,
    segmentation_mask: np.ndarray,
    *,
    save_image: bool = False,
    save_path: str = 'result.png',
) -> np.ndarray:
    """
    Visualize a face parsing segmentation mask by overlaying colored regions on the image.

    Args:
        image: Input face image in RGB format with shape (H, W, 3).
        segmentation_mask: Segmentation mask with shape (H, W) where each pixel
            value represents a facial component class (0-18).
        save_image: Whether to save the visualization to disk. Defaults to False.
        save_path: Path to save the visualization if save_image is True.

    Returns:
        np.ndarray: Blended image with segmentation overlay in BGR format.

    Example:
        >>> import cv2
        >>> from uniface.parsing import BiSeNet
        >>> from uniface.visualization import vis_parsing_maps
        >>>
        >>> parser = BiSeNet()
        >>> face_image = cv2.imread('face.jpg')
        >>> mask = parser.parse(face_image)
        >>>
        >>> # Visualize
        >>> face_rgb = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
        >>> result = vis_parsing_maps(face_rgb, mask)
        >>> cv2.imwrite('parsed_face.jpg', result)
    """
    # Create numpy arrays for image and segmentation mask
    image = np.array(image).copy().astype(np.uint8)
    segmentation_mask = segmentation_mask.copy().astype(np.uint8)

    # Create a color mask
    segmentation_mask_color = np.zeros((segmentation_mask.shape[0], segmentation_mask.shape[1], 3))

    # Highest class index present in the mask (not necessarily the total number of classes)
    num_classes = np.max(segmentation_mask)

    for class_index in range(1, num_classes + 1):
        class_pixels = np.where(segmentation_mask == class_index)
        segmentation_mask_color[class_pixels[0], class_pixels[1], :] = FACE_PARSING_COLORS[class_index]

    segmentation_mask_color = segmentation_mask_color.astype(np.uint8)

    # Convert image to BGR format for blending
    bgr_image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

    # Blend the image with the segmentation mask
    blended_image = cv2.addWeighted(bgr_image, 0.6, segmentation_mask_color, 0.4, 0)

    # Save the result if required
    if save_image:
        # The JPEG quality flag only takes effect when save_path has a JPEG extension
        cv2.imwrite(save_path, blended_image, [int(cv2.IMWRITE_JPEG_QUALITY), 100])

    return blended_image
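The per-class loop in `vis_parsing_maps` runs one `np.where` pass per class id. A possible vectorized alternative, sketched here rather than taken from uniface, converts the palette to an array and indexes it with the mask directly. It produces the same color image as long as every mask value is a valid palette index, since the loop leaves class 0 black and palette entry 0 is also black. The toy `palette` below is just the first four colors of `FACE_PARSING_COLORS`, and both function names are hypothetical.

```python
import numpy as np

# First four entries of FACE_PARSING_COLORS, enough for this toy example
palette = np.array([[0, 0, 0], [255, 85, 0], [255, 170, 0], [255, 0, 85]], dtype=np.uint8)


def colorize_loop(mask: np.ndarray) -> np.ndarray:
    """Per-class loop, structured like vis_parsing_maps."""
    color = np.zeros((*mask.shape, 3))
    for class_index in range(1, int(mask.max()) + 1):
        rows, cols = np.where(mask == class_index)
        color[rows, cols, :] = palette[class_index]
    return color.astype(np.uint8)


def colorize_lookup(mask: np.ndarray) -> np.ndarray:
    """Vectorized alternative: integer-array indexing maps every class id to its color at once."""
    return palette[mask]


mask = np.random.default_rng(0).integers(0, len(palette), size=(4, 5), dtype=np.uint8)
print(np.array_equal(colorize_loop(mask), colorize_lookup(mask)))
```

Like the loop, the lookup raises `IndexError` if the mask contains a class id outside the palette.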