# InsightFace Edge Inference and Deployment

In this tutorial, we give examples and benchmarks of running InsightFace models on edge devices, mainly using 8-bit quantization to accelerate inference.
## Recognition

In the recognition tutorial, we use an open-source model, *IR50@Glint360K*, and evaluate it on a hard private 1:N test set (N=50000). The metrics are Rank1 accuracy and TAR@FAR<=e-3.
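
The exact evaluation protocol of the private test set is not published here, so the following is only a minimal sketch of one common way to compute these two metrics from embeddings. It assumes cosine similarity, one gallery entry per identity, and a threshold chosen so that the fraction of accepted impostor pairs stays at or below the target FAR. The function name `rank1_and_tar_at_far` and its arguments are illustrative and do not come from InsightFace.

```python
import numpy as np

def rank1_and_tar_at_far(probe_feats, probe_ids, gallery_feats, gallery_ids, far=1e-3):
    """Return (rank1, tar) for a 1:N search using cosine similarity."""
    # L2-normalize so that the dot product equals cosine similarity.
    p = probe_feats / np.linalg.norm(probe_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sim = p @ g.T  # shape: (num_probes, N)

    # Rank1: the best-scoring gallery entry must carry the probe's identity.
    top1_ids = gallery_ids[np.argmax(sim, axis=1)]
    rank1 = float(np.mean(top1_ids == probe_ids))

    # Split all probe/gallery pairs into genuine and impostor scores.
    is_genuine = probe_ids[:, None] == gallery_ids[None, :]
    genuine = sim[is_genuine]
    impostor = np.sort(sim[~is_genuine])[::-1]  # descending

    # Accept scores strictly above impostor[k]; at most k impostor pairs pass,
    # so the false accept rate is <= far.
    k = int(np.floor(far * impostor.size))
    threshold = impostor[k]
    tar = float(np.mean(genuine > threshold))
    return rank1, tar
```

With N=50000 gallery entries, the similarity matrix has 50000 columns per probe, so in practice the probes would be processed in batches.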

Granularity and symmetry describe the quantization settings, which are mostly determined by the hardware provider. Symmetric quantization stores quantized values as INT8, while asymmetric quantization uses UINT8.
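
To make these settings concrete, here is a minimal NumPy sketch of symmetric INT8 quantization (per-tensor or per-channel) and per-tensor asymmetric UINT8 quantization. It uses simple min/max ranges and is not the implementation used by TensorRT, Tengine, or RKNN, which rely on their own calibration procedures. All function names are illustrative.

```python
import numpy as np

def quantize_symmetric_int8(x, axis=None):
    """Symmetric INT8: zero-point is fixed at 0.

    axis=None gives one scale for the whole tensor (per-tensor);
    axis=i gives one scale per slice along axis i (per-channel).
    """
    if axis is None:
        max_abs = np.max(np.abs(x))
    else:
        reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
        max_abs = np.max(np.abs(x), axis=reduce_axes, keepdims=True)
    scale = np.maximum(max_abs, 1e-12) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_asymmetric_uint8(x):
    """Per-tensor asymmetric UINT8: a zero-point shifts the range to [0, 255]."""
    lo = min(float(x.min()), 0.0)  # keep 0.0 exactly representable
    hi = max(float(x.max()), 0.0)
    scale = max(hi - lo, 1e-12) / 255.0
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point=0):
    """Map quantized integers back to float32."""
    return ((q.astype(np.float32) - zero_point) * scale).astype(np.float32)
```

Per-channel scales usually cost less accuracy than a single per-tensor scale when channels have very different value ranges, at the price of storing one scale per channel.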

| Hardware    | Provider | Type | Backend     | Time | Granularity | Symmetry   | Rank1-Acc | TAR@FAR<=e-3 |
| ----------- | -------- | ---- | ----------- | ---- | ----------- | ---------- | --------- | ------------ |
| V100 | NVIDIA | GPU | onnxruntime | 4ms | - | - | 80.94 | 30.77 |
| Jetson NX | NVIDIA | GPU | TensorRT | 16ms | Per-channel | Symmetric | 79.26 | 31.07 |
| A311D | Khadas | ASIC | Tengine | 26ms | Per-tensor | Asymmetric | 77.83 | 26.58 |
| A311D* | Khadas | ASIC | Tengine | 26ms | Per-tensor | Asymmetric | 79.38 | 28.59 |
| NXP-IMX8P | NXP | ASIC | Tengine | 24ms | Per-tensor | Asymmetric | 77.87 | 26.80 |
| NXP-IMX8P* | NXP | ASIC | Tengine | 24ms | Per-tensor | Asymmetric | 79.42 | 28.39 |
| RV1126 | Rockchip | ASIC | RKNN | 38ms | Per-tensor | Asymmetric | 75.60 | 24.23 |
| RV1126*    | Rockchip | ASIC | RKNN        | 38ms | Per-tensor  | Asymmetric | 77.82     | 26.30        |

The suffix `*` denotes mixed mode: the float32 model is used for gallery images, while the quantized model is used for probe images. The resulting features are all float32.
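
As a rough illustration of mixed mode, the sketch below enrolls the gallery with a float32 model and embeds probes with a quantized one, then matches them by cosine similarity. Both "models" are hypothetical stand-ins (a single random linear layer and a copy with its weights rounded to INT8), not the real IR50 network or any vendor runtime. The names `embed_fp32`, `embed_quantized`, and `identify` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8)).astype(np.float32)  # stand-in model weights

def embed_fp32(x):
    # Hypothetical float32 model: one linear layer.
    return (x @ W).astype(np.float32)

def embed_quantized(x):
    # Hypothetical quantized model: same layer with per-tensor symmetric
    # INT8 weights; the output feature is still float32.
    scale = np.abs(W).max() / 127.0
    W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return (x @ (W_q.astype(np.float32) * scale)).astype(np.float32)

def l2_normalize(f):
    return f / np.linalg.norm(f, axis=1, keepdims=True)

def identify(probe_feats, gallery_feats):
    # Index of the most similar gallery entry for each probe.
    sim = l2_normalize(probe_feats) @ l2_normalize(gallery_feats).T
    return np.argmax(sim, axis=1)

# Synthetic "images": probes are slightly perturbed copies of the gallery.
gallery_images = rng.standard_normal((5, 16)).astype(np.float32)
probe_images = gallery_images + 0.01 * rng.standard_normal((5, 16)).astype(np.float32)

gallery_feats = embed_fp32(gallery_images)   # enrolled with the float32 model
probe_feats = embed_quantized(probe_images)  # embedded with the quantized model
matches = identify(probe_feats, gallery_feats)
```

Because both feature sets are float32, the same similarity and thresholding code can be shared between the float and quantized paths.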

Example code for running the quantized networks can currently be found in [Tengine](https://github.com/OAID/Tengine/tree/tengine-lite/demos). Later, we will put a copy here and give a full tutorial on how to quantize recognition models from scratch.

## Detection
TODO