
# InsightFace Edge Inference and Deployment

In this tutorial, we give examples and benchmarks of running InsightFace models on edge devices, mainly using 8-bit quantization to accelerate inference.

## Recognition

For the recognition tutorial, we use an open-source model, IR50@Glint360K, and evaluate it on a hard private 1:N test set (N=50,000). The metrics are Rank-1 accuracy and TAR@FAR<=1e-3.
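The protocol of the private test set is not public, so as a rough illustration only, the NumPy sketch below shows how Rank-1 accuracy and TAR at a fixed FAR can be computed from cosine similarities between L2-normalized features. The `gallery_ids`/`probe_ids` label arrays are hypothetical.

```python
import numpy as np

def evaluate_1n(gallery, probes, gallery_ids, probe_ids, far_target=1e-3):
    """Toy 1:N evaluation on L2-normalized float32 features.

    gallery: (N, D) gallery features; probes: (M, D) probe features.
    gallery_ids / probe_ids hold the subject label of each feature.
    """
    sims = probes @ gallery.T  # cosine similarities, shape (M, N)

    # Rank-1: the top-scoring gallery entry has the probe's identity.
    rank1 = np.mean(gallery_ids[sims.argmax(axis=1)] == probe_ids)

    # TAR@FAR: pick the threshold so that at most `far_target` of the
    # impostor (non-mated) pairs are accepted, then measure the fraction
    # of mated pairs that still score above it.
    mated = probe_ids[:, None] == gallery_ids[None, :]
    threshold = np.quantile(sims[~mated], 1.0 - far_target)
    tar = np.mean(sims[mated] >= threshold)
    return rank1, tar
```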

Granularity and symmetry are both quantization settings, mostly determined by the hardware provider. Symmetric quantization stores the quantized values as signed INT8, while asymmetric quantization uses unsigned UINT8 with a zero-point offset.
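To make these settings concrete, here is a minimal NumPy sketch (not any backend's actual implementation): symmetric quantization maps values around a fixed zero into INT8, asymmetric quantization shifts the range with a zero-point into UINT8, and per-channel granularity computes one scale per output channel instead of one per tensor.

```python
import numpy as np

def quantize_symmetric(x):
    # Per-tensor symmetric: one scale, zero-point fixed at 0, signed INT8.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize with q * scale

def quantize_asymmetric(x):
    # Per-tensor asymmetric: a zero-point shifts the range onto UINT8.
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / 255.0
    zero_point = int(np.clip(np.round(-lo / scale), 0, 255))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point  # dequantize with (q - zero_point) * scale

def quantize_symmetric_per_channel(w):
    # Per-channel symmetric: one scale per output channel of a conv
    # weight tensor laid out as (out_channels, in_channels, kH, kW).
    scale = np.abs(w).reshape(w.shape[0], -1).max(axis=1) / 127.0
    q = np.clip(np.round(w / scale[:, None, None, None]), -127, 127)
    return q.astype(np.int8), scale
```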

| Hardware | Provider | Type | Backend | Time | Granularity | Symmetry | Rank-1 Acc | TAR@FAR<=1e-3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| V100 | NVIDIA | GPU | onnxruntime | 4ms | - | - | 80.94 | 30.77 |
| Jetson NX | NVIDIA | GPU | TensorRT | 16ms | Per-channel | Symmetric | 79.26 | 31.07 |
| A311D | Khadas | ASIC | Tengine | 26ms | Per-tensor | Asymmetric | 77.83 | 26.58 |
| A311D* | Khadas | ASIC | Tengine | 26ms | Per-tensor | Asymmetric | 79.38 | 28.59 |
| NXP-IMX8P | NXP | ASIC | Tengine | 24ms | Per-tensor | Asymmetric | 77.87 | 26.80 |
| NXP-IMX8P* | NXP | ASIC | Tengine | 24ms | Per-tensor | Asymmetric | 79.42 | 28.39 |
| RV1126 | Rockchip | ASIC | RKNN | 38ms | Per-tensor | Asymmetric | 75.60 | 24.23 |
| RV1126* | Rockchip | ASIC | RKNN | 38ms | Per-tensor | Asymmetric | 77.82 | 26.30 |

The suffix * denotes mixed mode: the float32 model embeds the gallery while the quantized model embeds the probe images. The resulting features are all float32, as sketched below.
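A minimal runnable sketch of this mixed mode; `embed_fp32` and `embed_int8` below are hypothetical stand-ins for the float32 and quantized networks, and the image arrays are placeholder data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: in deployment embed_fp32 runs the float32
# network offline and embed_int8 runs the quantized network on-device.
# Both emit float32 feature vectors.
def embed_fp32(images):
    return rng.standard_normal((len(images), 512)).astype(np.float32)

def embed_int8(images):
    return rng.standard_normal((len(images), 512)).astype(np.float32)

def l2_normalize(f):
    return f / np.linalg.norm(f, axis=1, keepdims=True)

gallery_images = np.zeros((100, 3, 112, 112), np.float32)  # placeholder data
probe_images = np.zeros((5, 3, 112, 112), np.float32)

gallery_feats = l2_normalize(embed_fp32(gallery_images))   # full precision
probe_feats = l2_normalize(embed_int8(probe_images))       # quantized model
matches = (probe_feats @ gallery_feats.T).argmax(axis=1)   # 1:N identification
```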

Example code for running the quantized networks can currently be found in the Tengine repository. Later, we will put a copy here and provide a full tutorial on quantizing recognition models from scratch.
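Until that tutorial lands, here is a hedged sketch of post-training static quantization using ONNX Runtime's quantization tool, not the Tengine tool chain used in the table above. The model file name, input layout, and calibration data are placeholders; substitute your own exported recognition model.

```python
import numpy as np
import onnxruntime
from onnxruntime.quantization import (CalibrationDataReader, QuantType,
                                      quantize_static)

class FaceCalibReader(CalibrationDataReader):
    """Feeds preprocessed face crops to calibrate activation ranges."""
    def __init__(self, images, input_name):
        # images: (N, 3, 112, 112) float32, preprocessed exactly as at
        # inference time; one image per calibration batch here.
        self._batches = iter([{input_name: img[None]} for img in images])

    def get_next(self):
        return next(self._batches, None)

# "ir50_glint360k.onnx" and "calib_faces.npy" are placeholder names for
# an exported recognition model and a few hundred calibration images.
sess = onnxruntime.InferenceSession("ir50_glint360k.onnx",
                                    providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
calib_images = np.load("calib_faces.npy")

quantize_static(
    "ir50_glint360k.onnx",
    "ir50_glint360k_int8.onnx",
    FaceCalibReader(calib_images, input_name),
    per_channel=True,                  # per-channel weight scales
    activation_type=QuantType.QUInt8,  # asymmetric UINT8 activations
    weight_type=QuantType.QInt8,       # symmetric INT8 weights
)
```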

## Detection

TODO