polish paddle-arcface

2025-12-30 08:02:27 +00:00 · 2021-07-13 07:25:33 +00:00
parent 785f105e19
commit e3dbe007ee
34 changed files with 4464 additions and 0 deletions
--- a/detection/blazeface_paddle/README.md
+++ b/detection/blazeface_paddle/README.md
@@ -0,0 +1 @@
+README_en.md
--- a/detection/blazeface_paddle/README_ch.md
+++ b/detection/blazeface_paddle/README_ch.md
@@ -0,0 +1,356 @@
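+Simplified Chinese | [English](README_en.md)
+
+# Face Detection Models
+
+* [1. Introduction](#简介)
+* [2. Model Zoo](#模型库)
+* [3. Installation](#安装)
+* [4. Data Preparation](#数据准备)
+* [5. Configuration](#参数配置)
+* [6. Training and Evaluation](#训练与评估)
+  * [6.1 Training](#训练)
+  * [6.2 Evaluation on WIDER-FACE](#评估)
+  * [6.3 Inference Deployment](#推理部署)
+  * [6.4 Improving Inference Speed](#推理速度提升)
+  * [6.5 Face Detection Demo](#人脸检测demo)
+* [7. References](#参考文献)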
+
+<a name="简介"></a>
+
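+## 1. Introduction
+
+`Arcface-Paddle` is an open-source deep face detection and recognition toolkit built on PaddlePaddle. It currently provides three pretrained models: `BlazeFace` for face detection, and `ArcFace` and `MobileFace` for face recognition.
+
+- This part covers face detection and is built on PaddleDetection.
+- For face recognition, see [Face recognition](../../recognition/arcface_paddle/README_ch.md).
+- For whl-package inference deployment based on PaddleInference, see [Whl package inference deployment](https://github.com/littletomatodonkey/insight-face-paddle).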
+
+
+<a name="模型库"></a>
+
+## 2. Model Zoo
+
+### mAP on the WIDER-FACE dataset
+
+| Model | Input size | Images/GPU | Epochs | Easy/Medium/Hard Set | CPU latency | GPU latency | Model size (MB) | Pretrained model | Inference model | Config |
+|:------------:|:--------:|:----:|:-------:|:-------:|:-------:|:---------:|:----------:|:---------:|:---------:|:--------:|
+| BlazeFace-FPN-SSH | 640×640 | 8 | 1000 | 0.9187 / 0.8979 / 0.8168 | 31.7ms | 5.6ms | 0.646 | [download link](https://paddledet.bj.bcebos.com/models/blazeface_fpn_ssh_1000e.pdparams) | [download link](https://paddle-model-ecology.bj.bcebos.com/model/insight-face/blazeface_fpn_ssh_1000e_v1.0_infer.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/face_detection/blazeface_fpn_ssh_1000e.yml) |
+
+
+**Note:**  
+- The `Easy/Medium/Hard Set` mAP values are obtained with multi-scale evaluation. For details, see [Evaluation on WIDER-FACE](#评估).
+- Speed is measured at a resolution of 640×640; CPU latency is measured on an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz with 5 CPU threads. For details, see [Improving Inference Speed](#推理速度提升).
+- Test environment:
+  - CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
+  - GPU: a single NVIDIA Tesla V100
+
+
+<a name="安装"></a>
+
+## 3. Installation
+
+Please refer to the [installation tutorial](../../recognition/arcface_paddle/install_ch.md) to install PaddlePaddle and PaddleDetection.
+
+<a name="数据准备"></a>
+
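+## 4. Data Preparation
+We use the [WIDER-FACE dataset](http://shuoyang1213.me/WIDERFACE/) for training and testing; the official website provides a detailed description of the data.
+- WIDER-Face data source:  
+The `wider_face` dataset is loaded from the following directory structure: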
+
+  ```
+  dataset/wider_face/
+  ├── wider_face_split
+  │   ├── wider_face_train_bbx_gt.txt
+  │   ├── wider_face_val_bbx_gt.txt
+  ├── WIDER_train
+  │   ├── images
+  │   │   ├── 0--Parade
+  │   │   │   ├── 0_Parade_marchingband_1_100.jpg
+  │   │   │   ├── 0_Parade_marchingband_1_381.jpg
+  │   │   │   │   ...
+  │   │   ├── 10--People_Marching
+  │   │   │   ...
+  ├── WIDER_val
+  │   ├── images
+  │   │   ├── 0--Parade
+  │   │   │   ├── 0_Parade_marchingband_1_1004.jpg
+  │   │   │   ├── 0_Parade_marchingband_1_1045.jpg
+  │   │   │   │   ...
+  │   │   ├── 10--People_Marching
+  │   │   │   ...
+  ```
+
+- Download the dataset manually:  
+To download the WIDER-FACE dataset, run the following command:
+```
+cd dataset/wider_face && ./download_wider_face.sh
+```
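The annotation files in `wider_face_split` are plain text: for each image, a line with the image path, a line with the number of faces, then one line per face that starts with `x1 y1 w h` (top-left corner, width and height in pixels) followed by six attribute flags (blur, expression, illumination, invalid, occlusion, pose). An image with no faces still has one all-zero row. The following is a minimal sketch for inspecting these files; `parse_wider_annotations` is a hypothetical helper, not part of PaddleDetection, and it keeps only the box coordinates.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, w, h) in pixels


def parse_wider_annotations(lines: List[str]) -> Dict[str, List[Box]]:
    """Parse WIDER FACE *_bbx_gt.txt content into {image_path: [boxes]}.

    Hypothetical helper for illustration: it keeps the first four numbers
    of each face row and drops zero-size placeholder boxes.
    """
    result: Dict[str, List[Box]] = {}
    it = iter(line.strip() for line in lines if line.strip())
    for image_path in it:
        num_faces = int(next(it))
        # Images with zero faces still carry one all-zero placeholder row.
        rows = max(num_faces, 1)
        boxes: List[Box] = []
        for _ in range(rows):
            x1, y1, w, h = (int(v) for v in next(it).split()[:4])
            if w > 0 and h > 0:
                boxes.append((x1, y1, w, h))
        result[image_path] = boxes
    return result
```

For example, passing `open("dataset/wider_face/wider_face_split/wider_face_val_bbx_gt.txt").read().splitlines()` returns a dict keyed by the relative image paths listed in the file.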
+
+<a name="参数配置"></a>
+
+## 5. Configuration
+
+We train with the `configs/face_detection/blazeface_fpn_ssh_1000e.yml` configuration. A summary of the file:
+
+```yaml
+
+_BASE_: [
+  '../datasets/wider_face.yml',
+  '../runtime.yml',
+  '_base_/optimizer_1000e.yml',
+  '_base_/blazeface_fpn.yml',
+  '_base_/face_reader.yml',
+]
+weights: output/blazeface_fpn_ssh_1000e/model_final
+multi_scale_eval: True
+
+```
+
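+`blazeface_fpn_ssh_1000e.yml` depends on other configuration files; in this example, it depends on:
+
+```
+wider_face.yml: paths of the training and validation data
+
+runtime.yml: common runtime parameters, such as whether to use the GPU and how often (in epochs) to save checkpoints
+
+optimizer_1000e.yml: learning-rate and optimizer settings
+
+blazeface_fpn.yml: model and backbone network settings
+
+face_reader.yml: data reader settings, such as the batch size and the number of data-loading worker processes, plus preprocessing applied after loading, such as resizing and data augmentation
+```
+
+Modify these files as needed, e.g. the dataset path and batch size.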
+
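The `_BASE_` list composes the final configuration from several files. Below is a minimal sketch of this kind of inheritance, assuming bases are merged in list order (later files override earlier ones) and the including file's own keys, such as `weights` and `multi_scale_eval`, are applied last. It illustrates the idea and is not PaddleDetection's loader; YAML parsing is replaced by an in-memory dict of hypothetical file contents so the example runs without PyYAML.

```python
import copy
from typing import Any, Dict

Config = Dict[str, Any]


def deep_merge(base: Config, override: Config) -> Config:
    """Return a new dict where keys in `override` win; nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_with_base(name: str, files: Dict[str, Config]) -> Config:
    """Resolve `_BASE_` recursively: bases in list order, then the file itself."""
    cfg = copy.deepcopy(files[name])  # never mutate the caller's data
    merged: Config = {}
    for base_name in cfg.pop("_BASE_", []):
        merged = deep_merge(merged, load_with_base(base_name, files))
    return deep_merge(merged, cfg)
```

With this rule, a value such as the batch size set in the top-level file takes precedence over the one in `face_reader.yml`, while keys it does not mention are inherited unchanged.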
+For the base model configuration, see `configs/face_detection/_base_/blazeface.yml`.
+The improved model adds an FPN and SSH neck; its configuration is in `configs/face_detection/_base_/blazeface_fpn.yml`, where FPN and SSH can be configured as needed:
+```yaml
+BlazeNet:
+   blaze_filters: [[24, 24], [24, 24], [24, 48, 2], [48, 48], [48, 48]]
+   double_blaze_filters: [[48, 24, 96, 2], [96, 24, 96], [96, 24, 96],
+                           [96, 24, 96, 2], [96, 24, 96], [96, 24, 96]]
+   act: hard_swish # activation of the BlazeBlocks in the backbone: relu for the base model; hard_swish is required when adding FPN and SSH
+
+BlazeNeck:
+   neck_type : fpn_ssh # options: only_fpn, only_ssh, fpn_ssh
+   in_channel: [96,96]
+```
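`act` switches the activation used in the BlazeBlocks. For reference, hard-swish is usually defined as `x * relu6(x + 3) / 6` (the MobileNetV3 form; Paddle's `hard_swish` is assumed here to use the same default constants). A short NumPy sketch comparing it with ReLU:

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """ReLU: max(x, 0)."""
    return np.maximum(x, 0.0)


def hard_swish(x: np.ndarray) -> np.ndarray:
    """x * relu6(x + 3) / 6: zero for x <= -3, identity for x >= 3."""
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0
```

Unlike ReLU, hard-swish lets small negative values through (for example, `hard_swish(-1.5)` is `-0.375`), while both are zero for large negative inputs and equal `x` for inputs of 3 or more.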
+
+<a name="训练与评估"></a>
+
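+## 6. Training and Evaluation
+
+<a name="训练"></a>
+
+### 6.1 Training
+First, download the pretrained model file:
+```bash
+wget https://paddledet.bj.bcebos.com/models/pretrained/blazenet_pretrain.pdparams
+```
+PaddleDetection provides single-GPU and multi-GPU training modes to meet different training needs.
+* Single-GPU training
+```bash
+export CUDA_VISIBLE_DEVICES=0 # not needed on Windows or macOS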
+python tools/train.py -c configs/face_detection/blazeface_fpn_ssh_1000e.yml -o pretrain_weight=blazenet_pretrain
+```
+
+* Multi-GPU training
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3 # not needed on Windows or macOS
+python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/face_detection/blazeface_fpn_ssh_1000e.yml -o pretrain_weight=blazenet_pretrain
+```
+* Resume training
+
+  If training is interrupted for any reason, resume it from a saved checkpoint with the `-r` option:
+
+```bash
+export CUDA_VISIBLE_DEVICES=0 # not needed on Windows or macOS
+python tools/train.py -c configs/face_detection/blazeface_fpn_ssh_1000e.yml -r output/blazeface_fpn_ssh_1000e/100
+```
+* Training schedule
+
+`BlazeFace` is trained on 4 GPUs with `batch_size=32` per GPU (a total `batch_size` of 128) and a learning rate of 0.002 for 1000 epochs.
+
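The schedule above implies some simple arithmetic. The WIDER FACE training split has 12,880 images, so one epoch at a total batch size of 128 is about 100 or 101 iterations (depending on whether the last partial batch is dropped), and 1000 epochs is roughly 100k iterations. If you train on a different number of GPUs, a common heuristic, the linear scaling rule (not something this README prescribes), is to scale the learning rate with the total batch size. The helper names below are hypothetical.

```python
import math

WIDER_TRAIN_IMAGES = 12880  # size of the WIDER FACE training split


def iterations_per_epoch(num_images: int, per_gpu_batch: int, num_gpus: int,
                         drop_last: bool = False) -> int:
    """Number of optimizer steps needed to see every training image once."""
    total_batch = per_gpu_batch * num_gpus
    if drop_last:
        return num_images // total_batch
    return math.ceil(num_images / total_batch)


def scaled_lr(base_lr: float, base_total_batch: int, new_total_batch: int) -> float:
    """Linear scaling heuristic: learning rate proportional to total batch size."""
    return base_lr * new_total_batch / base_total_batch
```

Under this heuristic, training on a single GPU with `batch_size=32` would use `scaled_lr(0.002, 128, 32)`, i.e. 0.0005; treat it as a starting point rather than a tuned value.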
+
+**Note:** Evaluation during training is not currently supported for face detection models.
+
+<a name="评估"></a>
+
+### 6.2 Evaluation on WIDER-FACE
+- Step 1: evaluate and generate the result files:
+```shell
+python -u tools/eval.py -c configs/face_detection/blazeface_fpn_ssh_1000e.yml \
+       -o weights=output/blazeface_fpn_ssh_1000e/model_final \
+       multi_scale_eval=True BBoxPostProcess.nms.score_threshold=0.1
+```
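+Set `multi_scale_eval=True` for multi-scale evaluation. When evaluation finishes, the results are written as txt files to `output/pred`.
+
+- Step 2: download the official evaluation scripts and ground-truth files:
+```bash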
+wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip
+unzip eval_tools.zip && rm -f eval_tools.zip
+```
+
+- Step 3: run the evaluation
+
+Method 1: Python evaluation.
+
+```bash
+git clone https://github.com/wondervictor/WiderFace-Evaluation.git
+cd WiderFace-Evaluation
+# build
+python3 setup.py build_ext --inplace
+# run the evaluation
+python3 evaluation.py -p /path/to/PaddleDetection/output/pred -g /path/to/eval_tools/ground_truth
+```
+
+Method 2: MATLAB evaluation.
+
+In `eval_tools/wider_eval.m`, set the prediction directory and the legend name of the plotted curve:
+
+```matlab
+pred_dir = './pred';
+legend_name = 'Paddle-BlazeFace';
+```
+
+`wider_eval.m` is the main entry point of the evaluation module. Run it with:
+
+```bash
+matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;"
+```
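Both evaluation tools report an average precision (AP) per difficulty subset, computed from a precision-recall curve. A common way to turn such a curve into a single number is the all-point interpolated (VOC-style) AP sketched below in NumPy. It assumes recall and precision have already been computed at decreasing score thresholds, and it is an illustration rather than code taken from either evaluation script, which may differ in details such as how thresholds are sampled.

```python
import numpy as np


def voc_ap(recall: np.ndarray, precision: np.ndarray) -> float:
    """All-point interpolated AP: area under the monotone precision envelope."""
    # Pad with sentinels so the curve starts at recall 0 and ends at recall 1.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas at the points where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```

For a toy curve with recall `[0.5, 1.0]` and precision `[1.0, 0.5]`, this gives 0.5 × 1.0 + 0.5 × 0.5 = 0.75.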
+<a name="推理部署"></a>
+
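+### 6.3 Inference Deployment
+
+Model files saved during training contain both the forward (prediction) and backward (back-propagation) passes. Industrial deployment only needs the forward pass, so the model must first be exported to a deployment format.
+PaddleDetection provides the `tools/export_model.py` script for exporting models: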
+
+```bash
+python tools/export_model.py -c configs/face_detection/blazeface_fpn_ssh_1000e.yml --output_dir=./inference_model \
+ -o weights=output/blazeface_fpn_ssh_1000e/best_model BBoxPostProcess.nms.score_threshold=0.1
+```
+
+The inference model is exported to the `inference_model/blazeface_fpn_ssh_1000e` directory as `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`. If `--output_dir` is not specified, the model is exported to `output_inference`.
+
+* Here the NMS post-processing `score_threshold` is set to 0.1, which greatly speeds up GPU inference with almost no effect on mAP. For more on model export, see the [model export documentation](https://github.com/PaddlePaddle/PaddleDetection/deploy/EXPORT_MODEL.md).
+
+PaddleDetection supports PaddleInference, PaddleServing, and PaddleLite deployment on server, mobile, and embedded platforms, with complete Python and C++ deployment solutions.
+* Here we use Python as an example of deploying the model with PaddleInference:
+
+```bash
+python deploy/python/infer.py --model_dir=./inference_model/blazeface_fpn_ssh_1000e --image_file=demo/road554.png --use_gpu=True
+```
+* `infer.py` also supports prediction on video files and camera input. For more, see [Python deployment](https://github.com/PaddlePaddle/PaddleDetection/deploy/python.md).
+
+* For more deployment documentation, see the [deployment documentation](https://github.com/PaddlePaddle/PaddleDetection/deploy/README.md).
+
+<a name="推理速度提升"></a>
+
+### 6.4 Improving Inference Speed
+To reproduce our reported speed, modify the input size in the inference config file `./inference_model/blazeface_fpn_ssh_1000e/infer_cfg.yml` as follows:
+```yaml
+mode: fluid
+draw_threshold: 0.5
+metric: WiderFace
+arch: Face
+min_subgraph_size: 3
+Preprocess:
+- is_scale: false
+  mean:
+  - 123
+  - 117
+  - 104
+  std:
+  - 127.502231
+  - 127.502231
+  - 127.502231
+  type: NormalizeImage
+- interp: 1
+  keep_ratio: false
+  target_size:
+  - 640
+  - 640
+  type: Resize
+- type: Permute
+label_list:
+- face
+```
+For faster CPU inference, install the [paddlepaddle-0.0.0](https://paddle-wheel.bj.bcebos.com/develop-cpu-mkl/paddlepaddle-0.0.0-cp37-cp37m-linux_x86_64.whl) wheel (a CPU build with MKL, which MKLDNN depends on) and enable MKLDNN to accelerate inference.
+
+```bash
+# benchmark on GPU:
+python deploy/python/infer.py --model_dir=./inference_model/blazeface_fpn_ssh_1000e --image_dir=./path/images --run_benchmark=True --use_gpu=True
+
+# benchmark on CPU:
+# download the paddle whl package
+wget https://paddle-wheel.bj.bcebos.com/develop-cpu-mkl/paddlepaddle-0.0.0-cp37-cp37m-linux_x86_64.whl
+# install paddlepaddle-0.0.0
+pip install paddlepaddle-0.0.0-cp37-cp37m-linux_x86_64.whl
+# run inference
+python deploy/python/infer.py --model_dir=./inference_model/blazeface_fpn_ssh_1000e --image_dir=./path/images --enable_mkldnn=True --run_benchmark=True --cpu_threads=5
+```
+
+<a name="人脸检测demo"></a>
+
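+### 6.5 Face Detection Demo
+
+This section shows how to run face detection with the provided BlazeFace model.
+
+First, download the test image and the font file.
+
+```bash
+# download the sample image for face detection
+wget https://raw.githubusercontent.com/littletomatodonkey/insight-face-paddle/main/demo/friends/query/friends1.jpg
+# download the font used for visualization
+wget https://raw.githubusercontent.com/littletomatodonkey/insight-face-paddle/main/SourceHanSansCN-Medium.otf
+```
+
+The sample image is shown below.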
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/littletomatodonkey/insight-face-paddle/main/demo/friends/query/friends1.jpg"  width = "800" />
+</div>
+
+
+Run detection with the following command.
+
+```shell
+# run face detection
+python3.7 test_blazeface.py --input=friends1.jpg  --output="./output"
+```
+
+The visualization results are saved in the `output` directory and look like this:
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/littletomatodonkey/insight-face-paddle/main/demo/friends/output/friends1.jpg"  width = "800" />
+</div>
+
+
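+For more on parameter descriptions, building the index library, face recognition, and whl-package inference deployment, see [Whl package inference deployment](https://github.com/littletomatodonkey/insight-face-paddle).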
+
+<a name="参考文献"></a>
+
+## 7. References
+
+```
+@misc{long2020ppyolo,
+title={PP-YOLO: An Effective and Efficient Implementation of Object Detector},
+author={Xiang Long and Kaipeng Deng and Guanzhong Wang and Yang Zhang and Qingqing Dang and Yuan Gao and Hui Shen and Jianguo Ren and Shumin Han and Errui Ding and Shilei Wen},
+year={2020},
+eprint={2007.12099},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+@misc{ppdet2019,
+title={PaddleDetection, Object detection and instance segmentation toolkit based on PaddlePaddle.},
+author={PaddlePaddle Authors},
+howpublished = {\url{https://github.com/PaddlePaddle/PaddleDetection}},
+year={2019}
+}
+@article{bazarevsky2019blazeface,
+title={BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs},
+author={Valentin Bazarevsky and Yury Kartynnik and Andrey Vakunov and Karthik Raveendran and Matthias Grundmann},
+year={2019},
+eprint={1907.05047},
+archivePrefix={arXiv}
+}
+```
--- a/detection/blazeface_paddle/README_en.md
+++ b/detection/blazeface_paddle/README_en.md
@@ -0,0 +1,312 @@
+[简体中文](README_ch.md) | English
+
+# Face Detection
+
+* [1. Introduction](#Introduction)
+* [2. Model Zoo](#Model_Zoo)
+* [3. Installation](#Installation)
+* [4. Data Pipeline](#Data_Pipline)
+* [5. Configuration File](#Configuration_file)
+* [6. Training and Inference](#Training_and_Inference)
+  * [6.1 Training](#Training)
+  * [6.2 Evaluation on WIDER FACE](#Evaluation)
+  * [6.3 Inference Deployment](#Inference_deployment)
+  * [6.4 Improving Inference Speed](#Increase_in_inference_speed)
+* [7. Citations](#Citations)
+
+<a name="Introduction"></a>
+
+## 1. Introduction
+
+`Arcface-Paddle` is an open-source deep face detection and recognition toolkit powered by PaddlePaddle. It currently provides three pretrained models: `BlazeFace` for face detection, and `ArcFace` and `MobileFace` for face recognition.
+
+- This tutorial covers face detection based on `PaddleDetection`.
+- For face recognition, please refer to the [face recognition tutorial](../../recognition/arcface_paddle/README_ch.md).
+- For whl-package inference using PaddleInference, please refer to [whl package inference](https://github.com/littletomatodonkey/insight-face-paddle).
+
+<a name="Model_Zoo"></a>
+
+## 2. Model Zoo
+
+### mAP on WIDER FACE
+
+| Model | Input size | Images/GPU | Epochs | Easy/Medium/Hard Set | CPU latency | GPU latency | Model size (MB) | Pretrained model | Inference model | Config |
+|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:---------:|:----------:|:---------:|:--------:|:--------:|
+| BlazeFace-FPN-SSH  | 640×640  |    8    | 1000     | 0.9187 / 0.8979 / 0.8168 | 31.7ms  |  5.6ms | 0.646 |[download link](https://paddledet.bj.bcebos.com/models/blazeface_fpn_ssh_1000e.pdparams)  |  [download link](https://paddle-model-ecology.bj.bcebos.com/model/insight-face/blazeface_fpn_ssh_1000e_v1.0_infer.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/face_detection/blazeface_fpn_ssh_1000e.yml) |
+
+**NOTE:**  
+- The `Easy/Medium/Hard Set` mAP values are obtained with multi-scale evaluation. For details, refer to [Evaluation](#Evaluation).
+- Speed is measured at a resolution of `640×640`; CPU latency is measured on an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz with 5 CPU threads. For details, refer to [Improving inference speed](#Increase_in_inference_speed).
+- The benchmark environment is:
+  - CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
+  - GPU: a single NVIDIA Tesla V100
+
+<a name="Installation"></a>
+
+## 3. Installation
+
+Please refer to the [installation tutorial](../../recognition/arcface_paddle/install_en.md) to install PaddlePaddle and PaddleDetection.
+
+
+<a name="Data_Pipline"></a>
+
+## 4. Data Pipeline
+We use the [WIDER FACE dataset](http://shuoyang1213.me/WIDERFACE/) for training and testing; the official website provides a detailed description of the data.
+- WIDER Face data source:  
+The `wider_face` dataset is loaded from the following directory structure:
+
+  ```
+  dataset/wider_face/
+  ├── wider_face_split
+  │   ├── wider_face_train_bbx_gt.txt
+  │   ├── wider_face_val_bbx_gt.txt
+  ├── WIDER_train
+  │   ├── images
+  │   │   ├── 0--Parade
+  │   │   │   ├── 0_Parade_marchingband_1_100.jpg
+  │   │   │   ├── 0_Parade_marchingband_1_381.jpg
+  │   │   │   │   ...
+  │   │   ├── 10--People_Marching
+  │   │   │   ...
+  ├── WIDER_val
+  │   ├── images
+  │   │   ├── 0--Parade
+  │   │   │   ├── 0_Parade_marchingband_1_1004.jpg
+  │   │   │   ├── 0_Parade_marchingband_1_1045.jpg
+  │   │   │   │   ...
+  │   │   ├── 10--People_Marching
+  │   │   │   ...
+  ```
+
+- Download dataset manually:  
+To download the WIDER FACE dataset, run the following commands:
+```
+cd dataset/wider_face && ./download_wider_face.sh
+```
+
+<a name="Configuration_file"></a>
+
+## 5. Configuration file
+
+We use the `configs/face_detection/blazeface_fpn_ssh_1000e.yml` configuration for training. The summary of the configuration file is as follows:
+
+```yaml
+_BASE_: [
+  '../datasets/wider_face.yml',
+  '../runtime.yml',
+  '_base_/optimizer_1000e.yml',
+  '_base_/blazeface_fpn.yml',
+  '_base_/face_reader.yml',
+]
+weights: output/blazeface_fpn_ssh_1000e/model_final
+multi_scale_eval: True
+```
+
+`blazeface_fpn_ssh_1000e.yml` depends on other configuration files; in this example, it depends on:
+
+```
+wider_face.yml: paths of the training and validation data
+
+runtime.yml: common runtime parameters, such as whether to use the GPU and how often (in epochs) to save checkpoints
+
+optimizer_1000e.yml: learning-rate and optimizer settings
+
+blazeface_fpn.yml: model and backbone network settings
+
+face_reader.yml: data reader settings, such as the batch size and the number of data-loading worker processes, plus preprocessing applied after loading, such as resizing and data augmentation
+```
+
+Modify these files as needed, e.g. the dataset path and batch size.
+
+For the configuration of the base model, please refer to `configs/face_detection/_base_/blazeface.yml`.
+The improved model adds an FPN and SSH neck; its configuration is in `configs/face_detection/_base_/blazeface_fpn.yml`, where FPN and SSH can be configured as needed:
+
+```yaml
+BlazeNet:
+   blaze_filters: [[24, 24], [24, 24], [24, 48, 2], [48, 48], [48, 48]]
+   double_blaze_filters: [[48, 24, 96, 2], [96, 24, 96], [96, 24, 96],
+                           [96, 24, 96, 2], [96, 24, 96], [96, 24, 96]]
+   act: hard_swish # activation of the BlazeBlocks in the backbone: relu for the base model; hard_swish is required when adding FPN and SSH
+
+BlazeNeck:
+   neck_type : fpn_ssh # options: only_fpn, only_ssh, fpn_ssh
+   in_channel: [96,96]
+```
+
+<a name="Training_and_Inference"></a>
+
+## 6. Training and Inference
+
+<a name="Training"></a>
+
+### 6.1 Training
+First, download the pretrained model:
+```bash
+wget https://paddledet.bj.bcebos.com/models/pretrained/blazenet_pretrain.pdparams
+```
+PaddleDetection provides single-GPU and multi-GPU training modes to meet different training needs.
+* Single-GPU training
+```bash
+export CUDA_VISIBLE_DEVICES=0 # not needed on Windows or macOS
+python tools/train.py -c configs/face_detection/blazeface_fpn_ssh_1000e.yml -o pretrain_weight=blazenet_pretrain
+```
+
+* Multi-GPU training
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3 # not needed on Windows or macOS
+python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/face_detection/blazeface_fpn_ssh_1000e.yml -o pretrain_weight=blazenet_pretrain
+```
+* Resume training from a checkpoint
+
+  If training is interrupted, resume it from a saved checkpoint with the `-r` option:
+
+```bash
+export CUDA_VISIBLE_DEVICES=0 # not needed on Windows or macOS
+python tools/train.py -c configs/face_detection/blazeface_fpn_ssh_1000e.yml -r output/blazeface_fpn_ssh_1000e/100
+```
+* Training hyperparameters
+
+`BlazeFace` is trained on 4 GPUs with `batch_size=32` per GPU (a total `batch_size` of 128) and a learning rate of 0.002 for 1000 epochs.
+
+
+**NOTE:** Evaluation during training is not supported for face detection models.
+
+<a name="Evaluation"></a>
+
+### 6.2 Evaluation on WIDER FACE
+- Step 1: evaluate and generate the result files:
+```shell
+python -u tools/eval.py -c configs/face_detection/blazeface_fpn_ssh_1000e.yml \
+       -o weights=output/blazeface_fpn_ssh_1000e/model_final \
+       multi_scale_eval=True BBoxPostProcess.nms.score_threshold=0.1
+```
+Set `multi_scale_eval=True` for multi-scale evaluation. When evaluation finishes, the results are written as txt files to `output/pred`.
+
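The official WIDER FACE evaluation tools read one txt file per image, grouped into event sub-directories: the first line is the image name, the second line the number of detections, followed by one `x y w h score` line per detection (top-left corner, width, height, confidence); see the dataset website for the authoritative description. The sketch below writes that layout from `[x1, y1, x2, y2, score]` boxes. `format_wider_prediction` and `write_wider_prediction` are hypothetical helpers for checking or producing such files, not the code PaddleDetection uses to write `output/pred`, and writing the image name without its extension is an assumption.

```python
import os
from typing import Sequence


def format_wider_prediction(image_name: str, boxes: Sequence[Sequence[float]]) -> str:
    """Format [x1, y1, x2, y2, score] boxes as WIDER FACE prediction text."""
    lines = [image_name, str(len(boxes))]
    for x1, y1, x2, y2, score in boxes:
        # Convert corner coordinates to top-left corner plus width/height.
        lines.append(f"{x1:.1f} {y1:.1f} {x2 - x1:.1f} {y2 - y1:.1f} {score:.3f}")
    return "\n".join(lines) + "\n"


def write_wider_prediction(pred_dir: str, event: str, image_file: str,
                           boxes: Sequence[Sequence[float]]) -> str:
    """Write pred_dir/<event>/<image stem>.txt and return its path."""
    stem = os.path.splitext(os.path.basename(image_file))[0]
    event_dir = os.path.join(pred_dir, event)
    os.makedirs(event_dir, exist_ok=True)
    path = os.path.join(event_dir, stem + ".txt")
    with open(path, "w") as f:
        f.write(format_wider_prediction(stem, boxes))
    return path
```

Comparing a few files in `output/pred` against `format_wider_prediction` output is a quick way to confirm the layout before running the evaluation scripts.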
+- Step 2: download the official evaluation scripts and ground-truth files:
+
+```bash
+wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip
+unzip eval_tools.zip && rm -f eval_tools.zip
+```
+
+- Step 3: run the evaluation:
+
+Method One: Python evaluation:
+
+```bash
+git clone https://github.com/wondervictor/WiderFace-Evaluation.git
+cd WiderFace-Evaluation
+# Compile
+python3 setup.py build_ext --inplace
+# Start evaluation
+python3 evaluation.py -p /path/to/PaddleDetection/output/pred -g /path/to/eval_tools/ground_truth
+```
+
+Method Two: MATLAB evaluation:
+
+In `eval_tools/wider_eval.m`, set the prediction directory and the legend name of the plotted curve:
+
+```matlab
+pred_dir = './pred';
+legend_name = 'Paddle-BlazeFace';
+```
+
+`wider_eval.m` is the main entry point of the evaluation module. Run it with:
+
+```bash
+matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;"
+```
+<a name="Inference_deployment"></a>
+
+### 6.3 Inference deployment
+
+Model files saved during training contain both the forward (prediction) and backward (back-propagation) passes. Deployment only needs the forward pass, so the model must first be exported to a deployment format.
+The `tools/export_model.py` script in PaddleDetection exports the model:
+
+```bash
+python tools/export_model.py -c configs/face_detection/blazeface_fpn_ssh_1000e.yml --output_dir=./inference_model \
+ -o weights=output/blazeface_fpn_ssh_1000e/best_model BBoxPostProcess.nms.score_threshold=0.1
+```
+The inference model is exported to the `inference_model/blazeface_fpn_ssh_1000e` directory as `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`. If `--output_dir` is not specified, the model is exported to `output_inference`.
+
+* The NMS `score_threshold` is set to 0.1 for export because it greatly speeds up GPU inference with little effect on mAP. For more documentation on model export, please refer to the [export doc](https://github.com/PaddlePaddle/PaddleDetection/deploy/EXPORT_MODEL.md).
+
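Why `score_threshold` matters for speed is easiest to see in a plain NMS routine: candidates below the threshold are discarded before the greedy suppression loop, whose cost grows roughly quadratically with the number of candidates, so a higher threshold leaves NMS far less work. The NumPy sketch below is a generic greedy NMS written under that assumption; the `iou_threshold` default of 0.3 is illustrative and not the value in the BlazeFace config, and PaddleDetection's own post-processing runs inside the exported model rather than in code like this.

```python
import numpy as np


def nms(boxes: np.ndarray, scores: np.ndarray,
        score_threshold: float = 0.1, iou_threshold: float = 0.3) -> np.ndarray:
    """Greedy NMS over [x1, y1, x2, y2] boxes; returns kept indices into `boxes`."""
    candidates = np.where(scores >= score_threshold)[0]  # score filter first
    order = candidates[np.argsort(-scores[candidates])]  # highest score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of box i with every remaining candidate.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop candidates that overlap the kept box too much.
        order = rest[iou <= iou_threshold]
    return np.array(keep, dtype=np.int64)
```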
+PaddleDetection supports deployment with PaddleInference, PaddleServing, and PaddleLite on server, mobile, and embedded platforms, and provides complete deployment solutions for Python and C++.
+* Here, we take Python as an example to illustrate how to use PaddleInference for model deployment:
+```bash
+python deploy/python/infer.py --model_dir=./inference_model/blazeface_fpn_ssh_1000e --image_file=demo/road554.png --use_gpu=True
+```
+* `infer.py` also supports prediction on video files and camera input. For more information, please refer to [Python deployment](https://github.com/PaddlePaddle/PaddleDetection/deploy/python.md).
+
+* For more documentation on deployment, please refer to: [deploy doc](https://github.com/PaddlePaddle/PaddleDetection/deploy/README.md).
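To call the exported model from your own code instead of `deploy/python/infer.py`, the sketch below uses the `paddle.inference` API (`Config`, `create_predictor`), which `test_blazeface.py` also imports. It assumes the exported model takes the inputs `image`, `im_shape` and `scale_factor` and that its first output holds boxes as rows of `[class_id, score, x1, y1, x2, y2]`, which is how PaddleDetection 2.x detection exports are commonly laid out; check the names with `predictor.get_input_names()` and `predictor.get_output_names()` on your model. `run_exported_model` and `filter_detections` are hypothetical helpers, and Paddle is imported inside the function so the filtering helper can be used on its own.

```python
import numpy as np


def filter_detections(bboxes: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Keep rows [class_id, score, x1, y1, x2, y2] whose score >= threshold."""
    if bboxes.size == 0:
        return bboxes.reshape(0, 6)
    return bboxes[bboxes[:, 1] >= threshold]


def run_exported_model(model_dir: str, image: np.ndarray, im_shape: np.ndarray,
                       scale_factor: np.ndarray, use_gpu: bool = False) -> np.ndarray:
    """Run one preprocessed CHW float32 image through an exported model.

    Assumes model.pdmodel / model.pdiparams in `model_dir` and the input
    names 'image', 'im_shape', 'scale_factor' (verify on your model).
    """
    from paddle.inference import Config, create_predictor  # imported lazily

    config = Config(f"{model_dir}/model.pdmodel", f"{model_dir}/model.pdiparams")
    if use_gpu:
        config.enable_use_gpu(200, 0)  # 200 MB initial memory pool on GPU 0
    else:
        config.disable_gpu()
    predictor = create_predictor(config)

    feeds = {
        "image": image[np.newaxis].astype("float32"),  # add batch dimension
        "im_shape": im_shape.reshape(1, 2).astype("float32"),
        "scale_factor": scale_factor.reshape(1, 2).astype("float32"),
    }
    for name in predictor.get_input_names():
        handle = predictor.get_input_handle(name)
        handle.reshape(feeds[name].shape)
        handle.copy_from_cpu(feeds[name])
    predictor.run()
    # The first output is assumed to hold the [N, 6] box array.
    output_name = predictor.get_output_names()[0]
    return predictor.get_output_handle(output_name).copy_to_cpu()
```

A typical call would be `filter_detections(run_exported_model("./inference_model/blazeface_fpn_ssh_1000e", chw, im_shape, scale_factor), threshold=0.5)`, where 0.5 mirrors `draw_threshold` in `infer_cfg.yml`.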
+
+<a name="Increase_in_inference_speed"></a>
+
+### 6.4 Improving Inference Speed
+
+To reproduce our reported speed, modify the input size in the inference config file `./inference_model/blazeface_fpn_ssh_1000e/infer_cfg.yml` as follows:
+```yaml
+mode: fluid
+draw_threshold: 0.5
+metric: WiderFace
+arch: Face
+min_subgraph_size: 3
+Preprocess:
+- is_scale: false
+  mean:
+  - 123
+  - 117
+  - 104
+  std:
+  - 127.502231
+  - 127.502231
+  - 127.502231
+  type: NormalizeImage
+- interp: 1
+  keep_ratio: false
+  target_size:
+  - 640
+  - 640
+  type: Resize
+- type: Permute
+label_list:
+- face
+```
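The `Preprocess` list in this file is applied in order: `NormalizeImage` with `is_scale: false` subtracts the per-channel `mean` and divides by `std` on raw 0-255 pixel values, `Resize` stretches the image to 640×640 without keeping the aspect ratio, and `Permute` converts HWC to CHW. Below is a NumPy-only sketch of that pipeline. It assumes an RGB `uint8` input and uses nearest-neighbour resizing as a stand-in for the bilinear interpolation that `interp: 1` selects (OpenCV's `INTER_LINEAR`), so pixel values will differ slightly from PaddleDetection's own preprocessing.

```python
import numpy as np

MEAN = np.array([123, 117, 104], dtype=np.float32)
STD = np.array([127.502231] * 3, dtype=np.float32)


def normalize(img: np.ndarray) -> np.ndarray:
    """NormalizeImage with is_scale=false: (pixel - mean) / std on 0-255 values."""
    return (img.astype(np.float32) - MEAN) / STD


def resize_nearest(img: np.ndarray, target_h: int, target_w: int) -> np.ndarray:
    """Nearest-neighbour stand-in for Resize(keep_ratio=false)."""
    h, w = img.shape[:2]
    rows = (np.arange(target_h) * h // target_h).astype(np.int64)
    cols = (np.arange(target_w) * w // target_w).astype(np.int64)
    return img[rows[:, None], cols[None, :]]


def preprocess(img_rgb: np.ndarray, target_size=(640, 640)):
    """Return (chw_image, im_shape, scale_factor) following the infer_cfg.yml order."""
    h, w = img_rgb.shape[:2]
    th, tw = target_size
    out = normalize(img_rgb)
    out = resize_nearest(out, th, tw)
    chw = out.transpose(2, 0, 1)  # Permute: HWC -> CHW
    im_shape = np.array([th, tw], dtype=np.float32)
    scale_factor = np.array([th / h, tw / w], dtype=np.float32)
    return chw, im_shape, scale_factor
```

`scale_factor` records how much each axis was stretched, which a detector can use to map boxes back to the original image size.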
+
+To speed up CPU inference, install the [paddlepaddle-0.0.0](https://paddle-wheel.bj.bcebos.com/develop-cpu-mkl/paddlepaddle-0.0.0-cp37-cp37m-linux_x86_64.whl) wheel (a CPU build with MKL, which MKLDNN depends on) and set `--enable_mkldnn=True`.
+
+```bash
+# benchmark on GPU:
+python deploy/python/infer.py --model_dir=./inference_model/blazeface_fpn_ssh_1000e --image_dir=./path/images --run_benchmark=True --use_gpu=True
+
+# benchmark on CPU with MKLDNN:
+# download the whl package
+wget https://paddle-wheel.bj.bcebos.com/develop-cpu-mkl/paddlepaddle-0.0.0-cp37-cp37m-linux_x86_64.whl
+# install paddlepaddle-0.0.0
+pip install paddlepaddle-0.0.0-cp37-cp37m-linux_x86_64.whl
+python deploy/python/infer.py --model_dir=./inference_model/blazeface_fpn_ssh_1000e --image_dir=./path/images --enable_mkldnn=True --run_benchmark=True --cpu_threads=5
+
+```
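`--run_benchmark=True` makes `infer.py` report timing for you. To time a predictor call in your own code, a common pattern is to discard a few warm-up runs (the first calls include one-off work such as memory allocation and graph optimization) and then aggregate wall-clock time over many repetitions. `benchmark` below is a hypothetical helper sketching that pattern; it is not the measurement code behind the numbers in the model table.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 10, repeats: int = 100) -> Dict[str, float]:
    """Time fn() after `warmup` untimed calls; return mean/median latency in ms."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"mean_ms": statistics.mean(samples), "median_ms": statistics.median(samples)}
```

With a Paddle Inference predictor whose inputs have already been copied in, `benchmark(predictor.run)` times only the forward pass; the median is less sensitive than the mean to occasional slow runs.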
+
+<a name="Citations"></a>
+
+## 7. Citations
+
+```
+@misc{long2020ppyolo,
+title={PP-YOLO: An Effective and Efficient Implementation of Object Detector},
+author={Xiang Long and Kaipeng Deng and Guanzhong Wang and Yang Zhang and Qingqing Dang and Yuan Gao and Hui Shen and Jianguo Ren and Shumin Han and Errui Ding and Shilei Wen},
+year={2020},
+eprint={2007.12099},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+@misc{ppdet2019,
+title={PaddleDetection, Object detection and instance segmentation toolkit based on PaddlePaddle.},
+author={PaddlePaddle Authors},
+howpublished = {\url{https://github.com/PaddlePaddle/PaddleDetection}},
+year={2019}
+}
+@article{bazarevsky2019blazeface,
+title={BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs},
+author={Valentin Bazarevsky and Yury Kartynnik and Andrey Vakunov and Karthik Raveendran and Matthias Grundmann},
+year={2019},
+eprint={1907.05047},
+archivePrefix={arXiv}
+}
+```
--- a/detection/blazeface_paddle/test_blazeface.py
+++ b/detection/blazeface_paddle/test_blazeface.py
@@ -0,0 +1,593 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import argparse
+import requests
+import logging
+import imghdr
+import pickle
+import tarfile
+from functools import partial
+
+import cv2
+import numpy as np
+from sklearn.metrics.pairwise import cosine_similarity
+from tqdm import tqdm
+from prettytable import PrettyTable
+from PIL import Image, ImageDraw, ImageFont
+import paddle
+from paddle.inference import Config
+from paddle.inference import create_predictor
+
+__all__ = ["parser"]
+BASE_INFERENCE_MODEL_DIR = os.path.expanduser("~/.insightface/ppmodels/")
+BASE_DOWNLOAD_URL = "https://paddle-model-ecology.bj.bcebos.com/model/insight-face/{}.tar"
+
+
+def parser(add_help=True):
+    def str2bool(v):
+        return v.lower() in ("true", "t", "1")
+
+    parser = argparse.ArgumentParser(add_help=add_help)
+    
+    parser.add_argument(
+        "--det_model",
+        type=str,
+        default="BlazeFace",
+        help="The detection model.")
+    parser.add_argument(
+        "--use_gpu",
+        type=str2bool,
+        default=True,
+        help="Whether to use GPU for prediction. Default: True.")
+    parser.add_argument(
+        "--enable_mkldnn",
+        type=str2bool,
+        default=True,
+        help="Whether to use MKLDNN for prediction; valid only when --use_gpu is False. Default: True."
+    )
+    parser.add_argument(
+        "--cpu_threads",
+        type=int,
+        default=1,
+        help="The number of CPU threads; valid only when --use_gpu is False. Default: 1."
+    )
+    parser.add_argument(
+        "--input",
+        type=str,
+        help="The path or directory of image(s) or video to be predicted.")
+    parser.add_argument(
+        "--output", type=str, default="./output/", help="The directory of prediction result.")
+    parser.add_argument(
+        "--det_thresh",
+        type=float,
+        default=0.8,
+        help="The score threshold for detection post-processing. Default: 0.8.")
+    return parser
+
+
+def print_config(args):
+    args = vars(args)
+    table = PrettyTable(['Param', 'Value'])
+    for param in args:
+        table.add_row([param, args[param]])
+    width = len(str(table).split("\n")[0])
+    print("{}".format("-" * width))
+    print("PaddleFace".center(width))
+    print(table)
+    print("Powered by PaddlePaddle!".rjust(width))
+    print("{}".format("-" * width))
+
+
+def download_with_progressbar(url, save_path):
+    """Download from url with progressbar.
+    """
+    if os.path.isfile(save_path):
+        os.remove(save_path)
+    response = requests.get(url, stream=True)
+    total_size_in_bytes = int(response.headers.get("content-length", 0))
+    block_size = 1024  # 1 Kibibyte
+    progress_bar = tqdm(total=total_size_in_bytes, unit="iB", unit_scale=True)
+    with open(save_path, "wb") as file:
+        for data in response.iter_content(block_size):
+            progress_bar.update(len(data))
+            file.write(data)
+    progress_bar.close()
+    if total_size_in_bytes == 0 or progress_bar.n != total_size_in_bytes or not os.path.isfile(
+            save_path):
+        raise Exception(
+            f"Something went wrong while downloading model/image from {url}")
+
+
+def check_model_file(model):
+    """Check that the model files exist; download and untar them if they do not.
+    """
+    model_map = {
+        "ArcFace": "arcface_iresnet50_v1.0_infer",
+        "BlazeFace": "blazeface_fpn_ssh_1000e_v1.0_infer",
+        "MobileFace": "mobileface_v1.0_infer"
+    }
+
+    if os.path.isdir(model):
+        model_file_path = os.path.join(model, "inference.pdmodel")
+        params_file_path = os.path.join(model, "inference.pdiparams")
+        if not os.path.exists(model_file_path) or not os.path.exists(
+                params_file_path):
+            raise Exception(
+                "The specified model directory is invalid. The directory must include 'inference.pdmodel' and 'inference.pdiparams'."
+            )
+
+    elif model in model_map:
+        storage_directory = partial(os.path.join, BASE_INFERENCE_MODEL_DIR,
+                                    model)
+        url = BASE_DOWNLOAD_URL.format(model_map[model])
+
+        tar_file_name_list = [
+            "inference.pdiparams", "inference.pdiparams.info",
+            "inference.pdmodel"
+        ]
+        model_file_path = storage_directory("inference.pdmodel")
+        params_file_path = storage_directory("inference.pdiparams")
+        if not os.path.exists(model_file_path) or not os.path.exists(
+                params_file_path):
+            tmp_path = storage_directory(url.split("/")[-1])
+            logging.info(f"Download {url} to {tmp_path}")
+            os.makedirs(storage_directory(), exist_ok=True)
+            download_with_progressbar(url, tmp_path)
+            with tarfile.open(tmp_path, "r") as tarObj:
+                for member in tarObj.getmembers():
+                    filename = None
+                    for tar_file_name in tar_file_name_list:
+                        if tar_file_name in member.name:
+                            filename = tar_file_name
+                    if filename is None:
+                        continue
+                    file = tarObj.extractfile(member)
+                    with open(storage_directory(filename), "wb") as f:
+                        f.write(file.read())
+            os.remove(tmp_path)
+        if not os.path.exists(model_file_path) or not os.path.exists(
+                params_file_path):
+            raise Exception(
+                f"Something went wrong while downloading and extracting the model [{model}] files!"
+            )
+    else:
+        raise Exception(
+            "The specified model name is invalid. Supported: 'BlazeFace' for detection, or a local directory that contains the model files ('inference.pdmodel' and 'inference.pdiparams')."
+        )
+
+    return model_file_path, params_file_path
+
+
+def normalize_image(img, scale=None, mean=None, std=None, order='chw'):
+    if isinstance(scale, str):
+        scale = eval(scale)
+    scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+    mean = mean if mean is not None else [0.485, 0.456, 0.406]
+    std = std if std is not None else [0.229, 0.224, 0.225]
+
+    shape = (3, 1, 1) if order == 'chw' else (1, 1, 3)
+    mean = np.array(mean).reshape(shape).astype('float32')
+    std = np.array(std).reshape(shape).astype('float32')
+
+    if isinstance(img, Image.Image):
+        img = np.array(img)
+
+    assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+    return (img.astype('float32') * scale - mean) / std
+
+
+def to_CHW_image(img):
+    if isinstance(img, Image.Image):
+        img = np.array(img)
+    return img.transpose((2, 0, 1))
+
+
+class ColorMap(object):
+    def __init__(self, num):
+        super().__init__()
+        self.get_color_map_list(num)
+        self.color_map = {}
+        self.ptr = 0
+
+    def __getitem__(self, key):
+        return self.color_map[key]
+
+    def update(self, keys):
+        for key in keys:
+            if key not in self.color_map:
+                i = self.ptr % len(self.color_list)
+                self.color_map[key] = self.color_list[i]
+                self.ptr += 1
+
+    def get_color_map_list(self, num_classes):
+        color_map = num_classes * [0, 0, 0]
+        for i in range(0, num_classes):
+            j = 0
+            lab = i
+            while lab:
+                color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
+                color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
+                color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
+                j += 1
+                lab >>= 3
+        self.color_list = [
+            color_map[i:i + 3] for i in range(0, len(color_map), 3)
+        ]
+
+
+class ImageReader(object):
+    def __init__(self, inputs):
+        super().__init__()
+        self.idx = 0
+        if isinstance(inputs, np.ndarray):
+            self.image_list = [inputs]
+        else:
+            imgtype_list = {'jpg', 'bmp', 'png', 'jpeg', 'rgb', 'tif', 'tiff'}
+            self.image_list = []
+            if os.path.isfile(inputs):
+                if imghdr.what(inputs) not in imgtype_list:
+                    raise Exception(
+                        f"Unsupported input file type; only the following types are supported: {imgtype_list}"
+                    )
+                self.image_list.append(inputs)
+            elif os.path.isdir(inputs):
+                tmp_file_list = os.listdir(inputs)
+                warn_tag = False
+                for file_name in tmp_file_list:
+                    file_path = os.path.join(inputs, file_name)
+                    if not os.path.isfile(file_path):
+                        warn_tag = True
+                        continue
+                    if imghdr.what(file_path) in imgtype_list:
+                        self.image_list.append(file_path)
+                    else:
+                        warn_tag = True
+                if warn_tag:
+                    logging.warning(
+                        f"The input directory contains subdirectories or unsupported file types; only the following types are supported: {imgtype_list}"
+                    )
+            else:
+                raise Exception(
+                    f"The input path does not exist! Please check input: {inputs}"
+                )
+
+    def __iter__(self):
+        return self
+
+    def __next__(self):
+        if self.idx >= len(self.image_list):
+            raise StopIteration
+
+        data = self.image_list[self.idx]
+        if isinstance(data, np.ndarray):
+            self.idx += 1
+            return data, "tmp.png"
+        path = data
+        _, file_name = os.path.split(path)
+        img = cv2.imread(path)
+        if img is None:
+            logging.warning(f"Error in reading image: {path}! Ignored.")
+            self.idx += 1
+            return self.__next__()
+        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+        self.idx += 1
+        return img, file_name
+
+    def __len__(self):
+        return len(self.image_list)
+
+
+class VideoReader(object):
+    def __init__(self, inputs):
+        super().__init__()
+        videotype_list = {"mp4"}
+        if os.path.splitext(inputs)[-1][1:] not in videotype_list:
+            raise Exception(
+                f"Unsupported input file type; only the following types are supported: {videotype_list}"
+            )
+        if not os.path.isfile(inputs):
+            raise Exception(
+                f"The input path does not exist! Please check input: {inputs}"
+            )
+        self.capture = cv2.VideoCapture(inputs)
+        self.file_name = os.path.split(inputs)[-1]
+
+    def get_info(self):
+        info = {}
+        width = int(self.capture.get(cv2.CAP_PROP_FRAME_WIDTH))
+        height = int(self.capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
+        # keep the source frame rate; fall back to 30 if OpenCV cannot read it
+        fps = self.capture.get(cv2.CAP_PROP_FPS)
+        info["file_name"] = self.file_name
+        info["fps"] = fps if fps and fps > 0 else 30
+        info["shape"] = (width, height)
+        info["fourcc"] = cv2.VideoWriter_fourcc(*'mp4v')
+        return info
+
+    def __iter__(self):
+        return self
+
+    def __next__(self):
+        ret, frame = self.capture.read()
+        if not ret:
+            raise StopIteration
+        return frame, self.file_name
+
+
+class ImageWriter(object):
+    def __init__(self, output_dir):
+        super().__init__()
+        if output_dir is None:
+            raise Exception(
+                "Please specify the directory for saving prediction results with --output."
+            )
+        if not os.path.exists(output_dir):
+            os.makedirs(output_dir)
+        self.output_dir = output_dir
+
+    def write(self, image, file_name):
+        path = os.path.join(self.output_dir, file_name)
+        cv2.imwrite(path, cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
+
+
+class VideoWriter(object):
+    def __init__(self, output_dir, video_info):
+        super().__init__()
+        if output_dir is None:
+            raise Exception(
+                "Please specify the directory for saving prediction results with --output."
+            )
+        if not os.path.exists(output_dir):
+            os.makedirs(output_dir)
+        output_path = os.path.join(output_dir, video_info["file_name"])
+        self.writer = cv2.VideoWriter(output_path, video_info["fourcc"],
+                                      video_info["fps"], video_info["shape"])
+
+    def write(self, frame, file_name):
+        self.writer.write(frame)
+
+    def __del__(self):
+        if hasattr(self, "writer"):
+            self.writer.release()
+
+
+class BasePredictor(object):
+    def __init__(self, predictor_config):
+        super().__init__()
+        self.predictor_config = predictor_config
+        self.predictor, self.input_names, self.output_names = self.load_predictor(
+            predictor_config["model_file"], predictor_config["params_file"])
+
+    def load_predictor(self, model_file, params_file):
+        config = Config(model_file, params_file)
+        if self.predictor_config["use_gpu"]:
+            config.enable_use_gpu(200, 0)
+            config.switch_ir_optim(True)
+        else:
+            config.disable_gpu()
+            config.set_cpu_math_library_num_threads(self.predictor_config[
+                "cpu_threads"])
+
+            if self.predictor_config["enable_mkldnn"]:
+                try:
+                    # cache 10 different shapes for mkldnn to avoid memory leak
+                    config.set_mkldnn_cache_capacity(10)
+                    config.enable_mkldnn()
+                except Exception:
+                    logging.error(
+                        "The current environment does not support `mkldnn`, so mkldnn is disabled."
+                    )
+        config.disable_glog_info()
+        config.enable_memory_optim()
+        # use zero copy
+        config.switch_use_feed_fetch_ops(False)
+        predictor = create_predictor(config)
+        input_names = predictor.get_input_names()
+        output_names = predictor.get_output_names()
+        return predictor, input_names, output_names
+
+    def preprocess(self, img):
+        raise NotImplementedError
+
+    def postprocess(self, output):
+        raise NotImplementedError
+
+    def predict(self, img):
+        raise NotImplementedError
+
+
+class Detector(BasePredictor):
+    def __init__(self, det_config, predictor_config):
+        super().__init__(predictor_config)
+        self.det_config = det_config
+        self.target_size = self.det_config["target_size"]
+        self.thresh = self.det_config["thresh"]
+
+    def preprocess(self, img):
+        resize_h, resize_w = self.target_size
+        img_shape = img.shape
+        img_scale_x = resize_w / img_shape[1]
+        img_scale_y = resize_h / img_shape[0]
+        img = cv2.resize(
+            img, None, None, fx=img_scale_x, fy=img_scale_y,
+            interpolation=cv2.INTER_LINEAR)
+        img = normalize_image(
+            img,
+            scale=1. / 255.,
+            mean=[0.485, 0.456, 0.406],
+            std=[0.229, 0.224, 0.225],
+            order='hwc')
+        img_info = {}
+        img_info["im_shape"] = np.array(
+            img.shape[:2], dtype=np.float32)[np.newaxis, :]
+        img_info["scale_factor"] = np.array(
+            [img_scale_y, img_scale_x], dtype=np.float32)[np.newaxis, :]
+
+        img = img.transpose((2, 0, 1)).copy()
+        img_info["image"] = img[np.newaxis, :, :, :]
+        return img_info
+
+    def postprocess(self, np_boxes):
+        expect_boxes = (np_boxes[:, 1] > self.thresh) & (np_boxes[:, 0] > -1)
+        return np_boxes[expect_boxes, :]
+
+    def predict(self, img):
+        inputs = self.preprocess(img)
+        for input_name in self.input_names:
+            input_tensor = self.predictor.get_input_handle(input_name)
+            input_tensor.copy_from_cpu(inputs[input_name])
+        self.predictor.run()
+        output_tensor = self.predictor.get_output_handle(self.output_names[0])
+        np_boxes = output_tensor.copy_to_cpu()
+        # boxes_num = self.detector.get_output_handle(self.detector_output_names[1])
+        # np_boxes_num = boxes_num.copy_to_cpu()
+        box_list = self.postprocess(np_boxes)
+        return box_list
+
+class FaceDetector(object):
+    def __init__(self, args, print_info=True):
+        super().__init__()
+        if print_info:
+            print_config(args)
+
+        self.font_path = os.path.join(
+            os.path.abspath(os.path.dirname(__file__)),
+            "SourceHanSansCN-Medium.otf")
+        self.args = args
+
+        predictor_config = {
+            "use_gpu": args.use_gpu,
+            "enable_mkldnn": args.enable_mkldnn,
+            "cpu_threads": args.cpu_threads
+        }
+
+        model_file_path, params_file_path = check_model_file(
+            args.det_model)
+        det_config = {"thresh": args.det_thresh, "target_size": [640, 640]}
+        predictor_config["model_file"] = model_file_path
+        predictor_config["params_file"] = params_file_path
+        self.det_predictor = Detector(det_config, predictor_config)
+        self.color_map = ColorMap(100)
+
+    def preprocess(self, img):
+        img = img.astype(np.float32, copy=False)
+        return img
+
+    def draw(self, img, box_list, labels):
+        self.color_map.update(labels)
+        im = Image.fromarray(img)
+        draw = ImageDraw.Draw(im)
+
+        for i, dt in enumerate(box_list):
+            bbox, score = dt[2:], dt[1]
+            label = labels[i]
+            color = tuple(self.color_map[label])
+
+            xmin, ymin, xmax, ymax = bbox
+
+            font_size = max(int((xmax - xmin) // 6), 10)
+            font = ImageFont.truetype(self.font_path, font_size)
+
+            text = "{} {:.4f}".format(label, score)
+            th = sum(font.getmetrics())
+            # FreeTypeFont.getsize was removed in Pillow 10; textlength gives the text width
+            tw = draw.textlength(text, font=font)
+            start_y = max(0, ymin - th)
+
+            draw.rectangle(
+                [(xmin, start_y), (xmin + tw + 1, start_y + th)], fill=color)
+            draw.text(
+                (xmin + 1, start_y),
+                text,
+                fill=(255, 255, 255),
+                font=font,
+                anchor="la")
+            draw.rectangle(
+                [(xmin, ymin), (xmax, ymax)], width=2, outline=color)
+        return np.array(im)
+
+    def predict_np_img(self, img):
+        input_img = self.preprocess(img)
+        box_list = None
+        np_feature = None
+        if hasattr(self, "det_predictor"):
+            box_list = self.det_predictor.predict(input_img)
+        return box_list, np_feature
+
+    def init_reader_writer(self, input_data):
+        if isinstance(input_data, np.ndarray):
+            self.input_reader = ImageReader(input_data)
+            if hasattr(self, "det_predictor"):
+                self.output_writer = ImageWriter(self.args.output)
+        elif isinstance(input_data, str):
+            if input_data.endswith('mp4'):
+                self.input_reader = VideoReader(input_data)
+                info = self.input_reader.get_info()
+                self.output_writer = VideoWriter(self.args.output, info)
+            else:
+                self.input_reader = ImageReader(input_data)
+                if hasattr(self, "det_predictor"):
+                    self.output_writer = ImageWriter(self.args.output)
+        else:
+            raise Exception(
+                "The input data is invalid. Only an image path, a video path (.mp4), or a directory containing images is supported."
+            )
+
+    def predict(self, input_data, print_info=False):
+        """Predict input_data.
+
+        Args:
+            input_data (str | numpy.ndarray): The path of an image or video (.mp4), a directory containing images, or image data as a NumPy array.
+            print_info (bool, optional): Whether to print the prediction results. Defaults to False.
+
+        Yields:
+            dict: {
+                "box_list": The prediction results of detection.
+                "features": The output of recognition.
+                "labels": The results of retrieval.
+                }
+        """
+        self.init_reader_writer(input_data)
+        for img, file_name in self.input_reader:
+            if img is None:
+                logging.warning(f"Error in reading img {file_name}! Ignored.")
+                continue
+            box_list, np_feature = self.predict_np_img(img)
+            labels = []
+            if box_list is not None:
+                labels = ["face"] * len(box_list)
+                result = self.draw(img, box_list, labels=labels)
+                self.output_writer.write(result, file_name)
+            if print_info:
+                logging.info(f"File: {file_name}, predict label(s): {labels}")
+            yield {
+                "box_list": box_list,
+                "features": np_feature,
+                "labels": labels
+            }
+        logging.info(f"Predict complete!")
+
+
+# for CLI
+def main(args=None):
+    logging.basicConfig(level=logging.INFO)
+
+    args = parser().parse_args(args)
+    predictor = FaceDetector(args)
+    res = predictor.predict(args.input, print_info=True)
+    for _ in res:
+        pass
+
+
+if __name__ == "__main__":
+    main()
--- a/recognition/arcface_paddle/README.md
+++ b/recognition/arcface_paddle/README.md
@@ -0,0 +1 @@
+README_en.md
--- a/recognition/arcface_paddle/README_ch.md
+++ b/recognition/arcface_paddle/README_ch.md
@@ -0,0 +1,230 @@
+简体中文 | [English](README_en.md)
+
+# Arcface-Paddle
+
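+## 1. Introduction
+
+`Arcface-Paddle` is an open-source deep face detection and recognition toolkit built on PaddlePaddle. It currently provides three pretrained models: `BlazeFace` for face detection, and `ArcFace` and `MobileFace` for face recognition.
+
+- This part covers face recognition.
+- For face detection, see [BlazeFace-based face detection](../../detection/blazeface_paddle/README_ch.md).
+- For whl-package inference deployment based on PaddleInference, see [Whl package inference deployment](https://github.com/littletomatodonkey/insight-face-paddle).
+
+Note: Many thanks to [GuoQuanhao](https://github.com/GuoQuanhao) for reproducing the [Arcface baseline model](https://github.com/GuoQuanhao/arcface-Paddle) with PaddlePaddle.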
+
+## 2. Environment setup
+
+Please refer to [Installation](./install_ch.md) to set up the required environment.
+
+## 3. Data preparation
+
+### 3.1 Enter the repo directory
+
+```
+cd arcface_paddle/
+```
+
+### 3.2 Download and extract the dataset
+
+Use the following commands to download and extract the MS1M dataset.
+
+```shell
+# download the dataset
+wget https://paddle-model-ecology.bj.bcebos.com/data/insight-face/MS1M_bin.tar
+# extract the dataset
+tar -xf MS1M_bin.tar
+```
+
+Note:
+* To install wget on Windows, see this [link](https://www.cnblogs.com/jeshy/p/10518062.html); to install the tar command on Windows, see this [link](https://www.cnblogs.com/chooperman/p/14190107.html).
+* If wget is not installed on macOS, you can install it with the following commands.
+
+```shell
+# install homebrew
+/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
+# install wget
+brew install wget
+```
+
+
+After extraction, the directory structure is as follows.
+
+```
+Arcface-Paddle/MS1M_bin
+|_ images
+|  |_ 00000000.bin
+|  |_ ...
+|  |_ 05822652.bin
+|_ label.txt
+|_ agedb_30.bin
+|_ cfp_ff.bin
+|_ cfp_fp.bin
+|_ lfw.bin
+```
+
+* Label file format:
+
+  ```
+   # delimiter: "\t"
+   # the following is the content of label.txt
+   images/00000000.bin 0
+   ...
+  ```
+
+To use a custom dataset, organize it in the format above and replace the dataset directory in the configuration file with your own.
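A minimal sketch of generating a `label.txt` in this format for a custom dataset. It assumes a hypothetical layout in which each subdirectory of `root` holds the files of one identity; `write_label_file` and `read_label_file` are illustrative names, not part of this repo, and the written paths are relative to `root`.

```python
import os


def write_label_file(root, out_path, exts=(".bin",)):
    """Write one '<relative path><TAB><label>' line per matching file.

    Hypothetical helper: assumes a root/<identity>/<file> layout and assigns
    labels 0, 1, 2, ... to the non-empty identity subdirectories in sorted order.
    """
    identities = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
    lines = []
    label = 0
    for identity in identities:
        files = [n for n in sorted(os.listdir(os.path.join(root, identity)))
                 if n.lower().endswith(exts)]
        if not files:
            continue  # skip identities without usable files so labels stay contiguous
        for name in files:
            # forward slashes, like "images/00000000.bin" in the example above
            lines.append(f"{identity}/{name}\t{label}")
        label += 1
    with open(out_path, "w") as f:
        f.writelines(line + "\n" for line in lines)
    return label, len(lines)  # (number of identities, number of samples)


def read_label_file(path):
    """Parse a label file back into (relative_path, label) pairs."""
    samples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                rel_path, label = line.split("\t")
                samples.append((rel_path, int(label)))
    return samples
```

The referenced files still have to be in the bin format expected by the training code; the helper only produces the index.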
+
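+Note:
+* To make data loading with `Dataloader` more convenient, the original `train.rec` file is converted into many `bin` files, each corresponding to exactly one original image. If your data consists of raw image files, refer to Section 3.3 to convert them into bin files.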
+
+### 3.3 Converting between raw image files and bin files
+
+To convert raw image files into the bin files used for training here, use the following command.
+
+```shell
+python3.7 tools/convert_image_bin.py --image_path="your/input/image/path" --bin_path="your/output/bin/path" --mode="image2bin"
+```
+
+To convert bin files back into raw image files, use the following command.
+
+```shell
+python3.7 tools/convert_image_bin.py --image_path="your/input/bin/path" --bin_path="your/output/image/path" --mode="bin2image"
+```
+
+## 4. Model training
+
+Once the configuration is ready, start training as follows.
+
+```bash
+python3.7 train.py \
+    --network 'MobileFaceNet_128' \
+    --lr=0.1 \
+    --batch_size 512 \
+    --weight_decay 2e-4 \
+    --embedding_size 128 \
+    --logdir="log" \
+    --output "emore_arcface" \
+    --resume 0
+```
+
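+The command above takes the following parameters:
+
+* `network`: model name, default `MobileFaceNet_128`;
+* `lr`: initial learning rate, default `0.1`;
+* `batch_size`: batch size, default `512`;
+* `weight_decay`: weight decay (regularization) coefficient, default `2e-4`;
+* `embedding_size`: length of the face embedding, default `128`;
+* `logdir`: directory for VisualDL (VDL) logs, default `"log"`;
+* `output`: directory where model files are saved during training, default `"emore_arcface"`;
+* `resume`: whether to restore the classification-layer weights. `1` initializes them from previously saved weight files, and `0` reinitializes them. To restore them, the `output` directory must contain both `rank:0_softmax_weight_mom.pkl` and `rank:0_softmax_weight.pkl`.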
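Before launching with `--resume 1`, it can help to check that both files named above are present. This is a sketch; `missing_resume_files` and `choose_resume_flag` are hypothetical helpers, and only the two rank-0 file names given above are checked.

```python
import os

# The two files the `resume` description requires in the `output` directory.
RESUME_FILES = ("rank:0_softmax_weight_mom.pkl", "rank:0_softmax_weight.pkl")


def missing_resume_files(output_dir):
    """Return the required resume files that are not present in output_dir."""
    return [name for name in RESUME_FILES
            if not os.path.isfile(os.path.join(output_dir, name))]


def choose_resume_flag(output_dir):
    """Return 1 if the classification-layer weights can be restored, else 0."""
    return 0 if missing_resume_files(output_dir) else 1
```

For example, `--resume {choose_resume_flag("emore_arcface")}` would fall back to reinitialization instead of failing when a file is missing.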
+
+* Example of the log output during training:
+
+  ```
+  ...
+  Speed 500.89 samples/sec   Loss 55.5692   Epoch: 0   Global Step: 200   Required: 104 hours, lr_backbone_value: 0.000000, lr_pfc_value: 0.000000
+  ...
+  [lfw][2000]XNorm: 9.890562
+  [lfw][2000]Accuracy-Flip: 0.59017+-0.02031
+  [lfw][2000]Accuracy-Highest: 0.59017
+  [cfp_fp][2000]XNorm: 12.920007
+  [cfp_fp][2000]Accuracy-Flip: 0.53329+-0.01262
+  [cfp_fp][2000]Accuracy-Highest: 0.53329
+  [agedb_30][2000]XNorm: 12.188049
+  [agedb_30][2000]Accuracy-Flip: 0.51967+-0.02316
+  [agedb_30][2000]Accuracy-Highest: 0.51967
+  ...
+  ```
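To track validation accuracy without VisualDL, the `Accuracy-Flip` lines can be pulled out of the log text. The sketch assumes the line format shown in the sample above; `parse_flip_accuracy` is a hypothetical helper, not a tool shipped with this repo.

```python
import re

# Matches lines such as "[lfw][2000]Accuracy-Flip: 0.59017+-0.02031"
FLIP_RE = re.compile(r"\[(\w+)\]\[(\d+)\]Accuracy-Flip: ([\d.]+)\+-([\d.]+)")


def parse_flip_accuracy(log_text):
    """Return {(dataset, step): (mean_accuracy, std)} for every Accuracy-Flip line."""
    results = {}
    for m in FLIP_RE.finditer(log_text):
        dataset, step, mean, std = m.groups()
        results[(dataset, int(step))] = (float(mean), float(std))
    return results
```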
+
+
+During training, you can monitor the loss in real time with `VisualDL`. For more information, see [VisualDL](https://github.com/PaddlePaddle/VisualDL/).
+
+
+## 5. Model evaluation
+
+Start model evaluation as follows.
+
+```bash
+python3.7 valid.py \
+    --network MobileFaceNet_128 \
+    --checkpoint emore_arcface
+```
+
+The command above takes the following parameters:
+
+* `network`: model name, default `MobileFaceNet_128`;
+* `checkpoint`: directory containing the saved model weights, default `emore_arcface`.
+
+**Note:** The command above evaluates the model file `./emore_arcface/MobileFaceNet_128.pdparams`. To evaluate a different model file, change both `network` and `checkpoint`.
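Per the note above, the evaluated file is `<checkpoint>/<network>.pdparams`. The sketch below mirrors that naming rule as inferred from the note (it is not read from `valid.py`); `checkpoint_path` and `available_networks` are hypothetical helpers for picking a matching `--network` value.

```python
import os


def checkpoint_path(network="MobileFaceNet_128", checkpoint="emore_arcface"):
    """Weights file implied by the note: <checkpoint>/<network>.pdparams."""
    return os.path.join(checkpoint, network + ".pdparams")


def available_networks(checkpoint):
    """Network names that have a .pdparams file in the checkpoint directory."""
    if not os.path.isdir(checkpoint):
        return []
    return sorted(os.path.splitext(name)[0] for name in os.listdir(checkpoint)
                  if name.endswith(".pdparams"))
```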
+
+## 6. Model export
+PaddlePaddle supports inference with its prediction engine. Export an inference model to freeze the trained weights:
+
+```bash
+python export_inference_model.py --network MobileFaceNet_128 --output ./inference_model/ --pretrained_model ./emore_arcface/MobileFaceNet_128.pdparams
+```
+
+After export, the `./inference_model/` directory contains:
+
+```
+./inference_model/
+|_ inference.pdmodel
+|_ inference.pdiparams
+```
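A quick check that the export produced both files, as a sketch; `inference_model_paths` is a hypothetical helper, and its default directory matches the `--output` used in the export command above.

```python
import os

INFERENCE_FILES = ("inference.pdmodel", "inference.pdiparams")


def inference_model_paths(model_dir="./inference_model/"):
    """Return (model_file, params_file), raising if either exported file is missing."""
    paths = tuple(os.path.join(model_dir, name) for name in INFERENCE_FILES)
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(f"missing exported file(s): {missing}")
    return paths
```

The returned pair is the `(model_file, params_file)` pair that `paddle.inference.Config` takes, which is how the BlazeFace detector in this repository loads its exported model.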
+
+## 7. Accuracy and speed benchmark
+
+The models below were trained on the MS1M training set. The table lists their accuracy on lfw, cfp_fp, and agedb30, together with their CPU and GPU inference latency.
+
+| Model                     | lfw   | cfp_fp | agedb30  | CPU latency | GPU latency | Download |
+| ------------------------- | ----- | ------ | ------- |-------|  -------- | ---- |
+| MobileFaceNet-Paddle      | 0.9945 | 0.9343  | 0.9613 | 4.3ms | 2.3ms   | [Download](https://paddle-model-ecology.bj.bcebos.com/model/insight-face/mobileface_v1.0_infer.tar) |
+| MobileFaceNet-mxnet | 0.9950 | 0.8894  | 0.9591   |  7.3ms | 4.7ms   | - |
+| ArcFace-Paddle      | 0.9973 | 0.9743  | 0.9788 | - | -   | [Download](https://paddle-model-ecology.bj.bcebos.com/model/insight-face/arcface_iresnet50_v1.0_infer.tar) |
+
+* Note: The backbone of `ArcFace-Paddle` here is iResNet50. It is relatively large and not recommended for CPU or mobile devices, so no inference times are given for it.
+
+**Test environment:**
+  * CPU: Intel(R) Xeon(R) Gold 6184 CPU @ 2.40GHz
+  * GPU: a single NVIDIA Tesla V100
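Expressed as ratios, the latencies in the table give the speedup of the Paddle MobileFaceNet over the mxnet one. This sketch only restates the table's numbers and assumes both rows were measured in the test environment above; `speedup` is an illustrative helper.

```python
# Latencies (ms) copied from the benchmark table.
LATENCY_MS = {
    "MobileFaceNet-Paddle": {"cpu": 4.3, "gpu": 2.3},
    "MobileFaceNet-mxnet": {"cpu": 7.3, "gpu": 4.7},
}


def speedup(baseline, candidate, device):
    """How many times faster `candidate` is than `baseline` on `device`."""
    return LATENCY_MS[baseline][device] / LATENCY_MS[candidate][device]


for device in ("cpu", "gpu"):
    ratio = speedup("MobileFaceNet-mxnet", "MobileFaceNet-Paddle", device)
    print(f"{device}: {ratio:.2f}x")
```

Running it prints `cpu: 1.70x` and `gpu: 2.04x`.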
+
+
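+## 8. Model inference
+
+Combined with face detection, the full "detection + recognition" face recognition pipeline can be run.
+
+First, download the index, the query image, and the font file.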
+
+```bash
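+# download the index for face recognition; since the demo image comes from Friends, the gallery is built from face images of the Friends characters
+wget https://raw.githubusercontent.com/littletomatodonkey/insight-face-paddle/main/demo/friends/index.bin
+# download the demo image for face recognition
+wget https://raw.githubusercontent.com/littletomatodonkey/insight-face-paddle/main/demo/friends/query/friends2.jpg
+# download the font used for visualization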
+wget https://raw.githubusercontent.com/littletomatodonkey/insight-face-paddle/main/SourceHanSansCN-Medium.otf
+```
+
+The demo image is shown below.
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/littletomatodonkey/insight-face-paddle/main/demo/friends/query/friends2.jpg"  width = "800" />
+</div>
+
+
+An example script that chains `detection + recognition` prediction:
+
+```shell
+# run detection and recognition together
+python3.7 test_recognition.py --det --rec --index=index.bin --input=friends2.jpg --output="./output"
+```
+
+The visualization result is saved in the `output` directory and is shown below.
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/littletomatodonkey/insight-face-paddle/main/demo/friends/output/friends2.jpg"  width = "800" />
+</div>
+
+
+For more on parameter descriptions, index construction, and whl-package inference deployment, see [Whl package inference deployment](https://github.com/littletomatodonkey/insight-face-paddle).
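The retrieval step compares each detected face's embedding with the gallery in `index.bin`. The sketch below shows one common way to do such a lookup, cosine similarity with a rejection threshold; it is an illustration under assumptions, not the insight-face-paddle implementation, and the in-memory gallery format, the `match_identity` name and the 0.5 threshold are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def match_identity(query, gallery, threshold=0.5):
    """Return (label, cosine score) of the closest gallery embedding.

    gallery is a list of (label, embedding) pairs; if the best score is below
    threshold, the face is reported as "unknown".
    """
    q = l2_normalize(query)
    best_label, best_score = "unknown", -1.0
    for label, emb in gallery:
        score = sum(a * b for a, b in zip(q, l2_normalize(emb)))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return "unknown", best_score
    return best_label, best_score
```

Normalizing both sides first makes the dot product equal to the cosine similarity, so scores lie in [-1, 1] regardless of embedding magnitude.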
--- a/recognition/arcface_paddle/README_en.md
+++ b/recognition/arcface_paddle/README_en.md
@@ -0,0 +1,179 @@
+[简体中文](README_ch.md) | English
+
+# Arcface-Paddle
+
+Please first refer to [Installation](../install_en.md) to set up the environment.
+
+
+## 1. Data preparation
+
+### 1.1 Enter the recognition directory
+
+```
+cd arcface_paddle/
+```
+
+### 1.2 Download and extract the dataset
+
+Use the following commands to download and extract the MS1M dataset.
+
+
+```shell
+# download dataset
+wget https://paddle-model-ecology.bj.bcebos.com/data/insight-face/MS1M_bin.tar
+# extract dataset
+# unzip dataset
+tar -xf MS1M_bin.tar
+```
+
+**Note:**
+* If you want to install `wget` on Windows, please refer to this [link](https://www.cnblogs.com/jeshy/p/10518062.html). If you want to install `tar` on Windows, please refer to this [link](https://www.cnblogs.com/chooperman/p/14190107.html).
+* If `wget` is not installed on macOS, you can install it with the following commands.
+
+```shell
+# install homebrew
+/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
+# install wget
+brew install wget
+```
+
+After extraction, the directory structure is as follows.
+
+```
+Arcface-Paddle/MS1M_bin
+|_ images
+|  |_ 00000000.bin
+|  |_ ...
+|  |_ 05822652.bin
+|_ label.txt
+|_ agedb_30.bin
+|_ cfp_ff.bin
+|_ cfp_fp.bin
+|_ lfw.bin
+```
+
+* Label file format:
+
+  ```
+   # delimiter: "\t"
+   # the following is the content of label.txt
+   images/00000000.bin 0
+   ...
+  ```
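For a custom dataset, a quick summary of `label.txt` shows how many images and identities it holds. The sketch assumes (not verified against the dataloader) that labels are class indices and should therefore run contiguously from 0; `label_stats` is a hypothetical helper.

```python
from collections import Counter


def label_stats(lines):
    """Summarize label-file lines of the form '<path><TAB><label>'.

    Returns (num_images, num_classes, contiguous), where contiguous is True
    when the labels are exactly 0 .. num_classes - 1.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines like those in the example
        _, label = line.split("\t")
        counts[int(label)] += 1
    num_classes = len(counts)
    contiguous = set(counts) == set(range(num_classes))
    return sum(counts.values()), num_classes, contiguous
```

Usage: `with open("MS1M_bin/label.txt") as f: print(label_stats(f))`.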
+
+To use a custom dataset, arrange your data in the format above and replace the dataset directory in the configuration with your own.
+
+**Note:**
+* To read data with the `Dataloader` API, we convert `train.rec` into many small `bin` files, each holding a single image. If your dataset contains only raw image files, you can either rewrite the dataloader or follow section 1.3 to convert the images into `bin` files.
+
+
+### 1.3 Converting between raw image files and bin files
+
+To convert raw image files into the `bin` files used directly for training, use the following command.
+
+```shell
+python3.7 tools/convert_image_bin.py --image_path="your/input/image/path" --bin_path="your/output/bin/path" --mode="image2bin"
+```
+
+To convert `bin` files back into raw image files, use the following command.
+
+```shell
+python3.7 tools/convert_image_bin.py --image_path="your/input/bin/path" --bin_path="your/output/image/path" --mode="bin2image"
+```
+
+## 2. Model training
+
+After preparing the configuration file, start training as follows.
+
+```bash
+python3.7 train.py \
+    --network 'MobileFaceNet_128' \
+    --lr=0.1 \
+    --batch_size 512 \
+    --weight_decay 2e-4 \
+    --embedding_size 128 \
+    --logdir="log" \
+    --output "emore_arcface" \
+    --resume 0
+```
+
+The parameters are:
+
+* `network`: model name, default `MobileFaceNet_128`;
+* `lr`: initial learning rate, default `0.1`;
+* `batch_size`: batch size, default `512`;
+* `weight_decay`: weight decay (regularization) coefficient, default `2e-4`;
+* `embedding_size`: length of the face embedding, default `128`;
+* `logdir`: directory for VisualDL (VDL) logs, default `"log"`;
+* `output`: directory where models are saved, default `"emore_arcface"`;
+* `resume`: whether to restore the classification-layer parameters. `1` restores them and `0` reinitializes them. To resume training, the output directory must contain `rank:0_softmax_weight_mom.pkl` and `rank:0_softmax_weight.pkl`.
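To relate `--batch_size` to the `Global Step` counter in the training log, the number of steps per epoch follows from the dataset size. This sketch assumes that the `images/` files run contiguously from `00000000.bin` to `05822652.bin`, as the tree in section 1.2 suggests, and that `batch_size` is per card; both are assumptions, and `steps_per_epoch` is a hypothetical helper.

```python
# Assumed contiguous file names 00000000.bin ... 05822652.bin (section 1.2 tree).
NUM_MS1M_IMAGES = 5822652 + 1


def steps_per_epoch(num_images, batch_size, num_cards=1, drop_last=False):
    """Optimizer steps in one epoch.

    Assumption: batch_size is per card, so the global batch is batch_size * num_cards.
    """
    global_batch = batch_size * num_cards
    if drop_last:
        return num_images // global_batch
    return (num_images + global_batch - 1) // global_batch  # ceiling division


print(steps_per_epoch(NUM_MS1M_IMAGES, 512))  # 11373
```

With `drop_last=True` the count is 11372, since the final partial batch of 189 images is discarded.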
+
+* Example of the training log output:
+
+  ```
+  ...
+  Speed 500.89 samples/sec   Loss 55.5692   Epoch: 0   Global Step: 200   Required: 104 hours, lr_backbone_value: 0.000000, lr_pfc_value: 0.000000
+  ...
+  [lfw][2000]XNorm: 9.890562
+  [lfw][2000]Accuracy-Flip: 0.59017+-0.02031
+  [lfw][2000]Accuracy-Highest: 0.59017
+  [cfp_fp][2000]XNorm: 12.920007
+  [cfp_fp][2000]Accuracy-Flip: 0.53329+-0.01262
+  [cfp_fp][2000]Accuracy-Highest: 0.53329
+  [agedb_30][2000]XNorm: 12.188049
+  [agedb_30][2000]Accuracy-Flip: 0.51967+-0.02316
+  [agedb_30][2000]Accuracy-Highest: 0.51967
+  ...
+  ```
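If you want the training curve outside VisualDL, the `Speed ... Loss ... Global Step` lines can be parsed from the log. The regular expression assumes exactly the line format in the sample above; `parse_train_lines` is a hypothetical helper.

```python
import re

# Matches the leading part of lines such as
# "Speed 500.89 samples/sec   Loss 55.5692   Epoch: 0   Global Step: 200 ..."
TRAIN_RE = re.compile(
    r"Speed ([\d.]+) samples/sec\s+Loss ([\d.]+)\s+"
    r"Epoch: (\d+)\s+Global Step: (\d+)")


def parse_train_lines(log_text):
    """Yield a dict with speed, loss, epoch and step for each training log line."""
    for m in TRAIN_RE.finditer(log_text):
        speed, loss, epoch, step = m.groups()
        yield {"speed": float(speed), "loss": float(loss),
               "epoch": int(epoch), "step": int(step)}
```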
+
+
+During training, you can monitor the loss in real time with `VisualDL`. For more information, please refer to [VisualDL](https://github.com/PaddlePaddle/VisualDL/).
+
+
+## 3. Model evaluation
+
+The model evaluation process can be started as follows.
+
+```bash
+python3.7 valid.py \
+    --network MobileFaceNet_128 \
+    --checkpoint emore_arcface
+```
+
+The parameters are:
+
+* `network`: model name, default `MobileFaceNet_128`;
+* `checkpoint`: directory containing the saved model weights, default `emore_arcface`.
+
+**Note:** The command above evaluates the model `./emore_arcface/MobileFaceNet_128.pdparams`. To evaluate a different model, change both the network name and the checkpoint.
+
+## 4. Model performance
+
+Training dataset: MS1M
+
+| Model structure           | lfw   | cfp_fp | agedb30  | CPU time cost | GPU time cost |
+| ------------------------- | ----- | ------ | ------- | -------| -------- |
+| MobileFaceNet-Paddle      | 0.9945 | 0.9343  | 0.9613 | 4.3ms | 2.3ms   |
+| MobileFaceNet-mxnet | 0.9950 | 0.8894  | 0.9591   |  7.3ms | 4.7ms   |
+
+**Environment:**
+  * CPU: Intel(R) Xeon(R) Gold 6184 CPU @ 2.40GHz
+  * GPU: a single NVIDIA Tesla V100
+
+
+## 5. Export model
+PaddlePaddle supports inference with its prediction engine. First, export the inference model:
+
+```bash
+python export_inference_model.py --network MobileFaceNet_128 --output ./inference_model/ --pretrained_model ./emore_arcface/MobileFaceNet_128.pdparams
+```
+
+After export, the inference model files are as follows:
+
+```
+./inference_model/
+|_ inference.pdmodel
+|_ inference.pdiparams
+```
--- a/recognition/arcface_paddle/backbones/init.py
+++ b/recognition/arcface_paddle/backbones/init.py
@@ -0,0 +1,16 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .mobilefacenet import MobileFaceNet_128
+from .iresnet import iresnet18, iresnet34, iresnet50, iresnet100, iresnet200
--- a/recognition/arcface_paddle/backbones/iresnet.py
+++ b/recognition/arcface_paddle/backbones/iresnet.py
@@ -0,0 +1,255 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# reference: https://raw.githubusercontent.com/GuoQuanhao/arcface-Paddle/main/backbones/iresnet.py
+
+import paddle
+from paddle import nn
+
+__all__ = ['iresnet18', 'iresnet34', 'iresnet50', 'iresnet100', 'iresnet200']
+
+
+def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
+    """3x3 convolution with padding"""
+    return nn.Conv2D(
+        in_planes,
+        out_planes,
+        kernel_size=3,
+        stride=stride,
+        padding=dilation,
+        groups=groups,
+        bias_attr=False,
+        dilation=dilation)
+
+
+def conv1x1(in_planes, out_planes, stride=1):
+    """1x1 convolution"""
+    return nn.Conv2D(
+        in_planes, out_planes, kernel_size=1, stride=stride, bias_attr=False)
+
+
+class IBasicBlock(nn.Layer):
+    expansion = 1
+
+    def __init__(self,
+                 inplanes,
+                 planes,
+                 stride=1,
+                 downsample=None,
+                 groups=1,
+                 base_width=64,
+                 dilation=1):
+        super(IBasicBlock, self).__init__()
+        if groups != 1 or base_width != 64:
+            raise ValueError(
+                'BasicBlock only supports groups=1 and base_width=64')
+        if dilation > 1:
+            raise NotImplementedError(
+                "Dilation > 1 not supported in BasicBlock")
+        self.bn1 = nn.BatchNorm2D(inplanes, epsilon=1e-05, momentum=0.1)
+        self.conv1 = conv3x3(inplanes, planes)
+        self.bn2 = nn.BatchNorm2D(planes, epsilon=1e-05, momentum=0.1)
+        self.prelu = nn.PReLU(planes)
+        self.conv2 = conv3x3(planes, planes, stride)
+        self.bn3 = nn.BatchNorm2D(planes, epsilon=1e-05, momentum=0.1)
+        self.downsample = downsample
+        self.stride = stride
+
+    def forward(self, x):
+        identity = x
+        out = self.bn1(x)
+        out = self.conv1(out)
+        out = self.bn2(out)
+        out = self.prelu(out)
+        out = self.conv2(out)
+        out = self.bn3(out)
+        if self.downsample is not None:
+            identity = self.downsample(x)
+        out += identity
+        return out
+
+
+class IResNet(nn.Layer):
+    fc_scale = 7 * 7
+
+    def __init__(self,
+                 block,
+                 layers,
+                 dropout=0,
+                 num_features=512,
+                 zero_init_residual=False,
+                 groups=1,
+                 width_per_group=64,
+                 replace_stride_with_dilation=None,
+                 fp16=False):
+        super(IResNet, self).__init__()
+        self.fp16 = fp16
+        self.inplanes = 64
+        self.dilation = 1
+        if replace_stride_with_dilation is None:
+            replace_stride_with_dilation = [False, False, False]
+        if len(replace_stride_with_dilation) != 3:
+            raise ValueError("replace_stride_with_dilation should be None "
+                             "or a 3-element tuple, got {}".format(
+                                 replace_stride_with_dilation))
+        self.groups = groups
+        self.base_width = width_per_group
+        self.conv1 = nn.Conv2D(
+            3,
+            self.inplanes,
+            kernel_size=3,
+            stride=1,
+            padding=1,
+            bias_attr=False)
+        self.bn1 = nn.BatchNorm2D(self.inplanes, epsilon=1e-05, momentum=0.1)
+        self.prelu = nn.PReLU(self.inplanes)
+        self.layer1 = self._make_layer(block, 64, layers[0], stride=2)
+        self.layer2 = self._make_layer(
+            block,
+            128,
+            layers[1],
+            stride=2,
+            dilate=replace_stride_with_dilation[0])
+        self.layer3 = self._make_layer(
+            block,
+            256,
+            layers[2],
+            stride=2,
+            dilate=replace_stride_with_dilation[1])
+        self.layer4 = self._make_layer(
+            block,
+            512,
+            layers[3],
+            stride=2,
+            dilate=replace_stride_with_dilation[2])
+        self.bn2 = nn.BatchNorm2D(
+            512 * block.expansion, epsilon=1e-05, momentum=0.1)
+        self.dropout = nn.Dropout(p=dropout)
+        self.fc = nn.Linear(512 * block.expansion * self.fc_scale,
+                            num_features)
+        self.features = nn.BatchNorm1D(
+            num_features, momentum=0.1, epsilon=1e-05)
+        self.features.weight = paddle.create_parameter(
+            shape=self.features.weight.shape,
+            dtype='float32',
+            default_initializer=nn.initializer.Constant(value=1.0))
+        # nn.init.constant_(self.features.weight, 1.0)
+        # stop_gradient changed from True to False, so this weight is trainable
+        self.features.weight.stop_gradient = False
+        #self.features.weight.requires_grad = False
+
+        for m in self.sublayers():
+            if isinstance(m, nn.Conv2D):
+                m.weight = paddle.create_parameter(
+                    shape=m.weight.shape,
+                    dtype='float32',
+                    default_initializer=nn.initializer.Normal(
+                        mean=0.0, std=0.1))
+                # nn.init.normal_(m.weight, 0, 0.1)
+            elif isinstance(m, (nn.BatchNorm2D, nn.GroupNorm)):
+                m.weight = paddle.create_parameter(
+                    shape=m.weight.shape,
+                    dtype='float32',
+                    default_initializer=nn.initializer.Constant(value=1.0))
+                m.bias = paddle.create_parameter(
+                    shape=m.bias.shape,
+                    dtype='float32',
+                    default_initializer=nn.initializer.Constant(value=0.0))
+                # nn.init.constant_(m.weight, 1)
+                # nn.init.constant_(m.bias, 0)
+
+        if zero_init_residual:
+            for m in self.sublayers():
+                if isinstance(m, IBasicBlock):
+                    m.bn2.weight = paddle.create_parameter(
+                        shape=m.bn2.weight.shape,
+                        dtype='float32',
+                        default_initializer=nn.initializer.Constant(value=0.0))
+                    # nn.init.constant_(m.bn2.weight, 0)
+
+    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
+        downsample = None
+        previous_dilation = self.dilation
+        if dilate:
+            self.dilation *= stride
+            stride = 1
+        if stride != 1 or self.inplanes != planes * block.expansion:
+            downsample = nn.Sequential(
+                conv1x1(self.inplanes, planes * block.expansion, stride),
+                nn.BatchNorm2D(
+                    planes * block.expansion, epsilon=1e-05, momentum=0.1), )
+        layers = []
+        layers.append(
+            block(self.inplanes, planes, stride, downsample, self.groups,
+                  self.base_width, previous_dilation))
+        self.inplanes = planes * block.expansion
+        for _ in range(1, blocks):
+            layers.append(
+                block(
+                    self.inplanes,
+                    planes,
+                    groups=self.groups,
+                    base_width=self.base_width,
+                    dilation=self.dilation))
+
+        return nn.Sequential(*layers)
+
+    def forward(self, x):
+        # Enable mixed precision only when fp16=True.
+        with paddle.amp.auto_cast(enable=self.fp16):
+            x = self.conv1(x)
+            x = self.bn1(x)
+            x = self.prelu(x)
+            x = self.layer1(x)
+            x = self.layer2(x)
+            x = self.layer3(x)
+            x = self.layer4(x)
+            x = self.bn2(x)
+            x = paddle.cast(x, dtype='float32')
+            x = paddle.flatten(x, 1)
+            x = self.dropout(x)
+        # fc has float32 weights and runs outside auto_cast, so its input must be float32.
+        x = self.fc(paddle.cast(x, dtype='float32') if self.fp16 else x)
+        x = self.features(x)
+        return x
+
+
+def _iresnet(arch, block, layers, pretrained, progress, **kwargs):
+    model = IResNet(block, layers, **kwargs)
+    if pretrained:
+        raise ValueError("Pretrained weights are not available for {}.".format(arch))
+    return model
+
+
+def iresnet18(pretrained=False, progress=True, **kwargs):
+    return _iresnet('iresnet18', IBasicBlock, [2, 2, 2, 2], pretrained,
+                    progress, **kwargs)
+
+
+def iresnet34(pretrained=False, progress=True, **kwargs):
+    return _iresnet('iresnet34', IBasicBlock, [3, 4, 6, 3], pretrained,
+                    progress, **kwargs)
+
+
+def iresnet50(pretrained=False, progress=True, **kwargs):
+    return _iresnet('iresnet50', IBasicBlock, [3, 4, 14, 3], pretrained,
+                    progress, **kwargs)
+
+
+def iresnet100(pretrained=False, progress=True, **kwargs):
+    return _iresnet('iresnet100', IBasicBlock, [3, 13, 30, 3], pretrained,
+                    progress, **kwargs)
+
+
+def iresnet200(pretrained=False, progress=True, **kwargs):
+    return _iresnet('iresnet200', IBasicBlock, [6, 26, 60, 6], pretrained,
+                    progress, **kwargs)
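The depth in each `iresnet*` factory name can be cross-checked against its `layers` list, and the same bookkeeping explains `fc_scale = 7 * 7`. This is a minimal sketch, assuming each `IBasicBlock` is counted as two 3x3 convolutions, the stem `conv1` and the `fc` layer are counted once each (1x1 downsample convs are ignored), and the input is 112x112. With this counting rule the suffix is reproduced for iresnet18/34/50/100, while `[6, 26, 60, 6]` gives 198 for iresnet200. The helper names are hypothetical and not part of the repository.

```python
# Layer lists copied from the iresnet* factory functions in iresnet.py.
IRESNET_LAYERS = {
    "iresnet18": [2, 2, 2, 2],
    "iresnet34": [3, 4, 6, 3],
    "iresnet50": [3, 4, 14, 3],
    "iresnet100": [3, 13, 30, 3],
    "iresnet200": [6, 26, 60, 6],
}


def nominal_depth(layers):
    """Count two 3x3 convs per IBasicBlock, plus the stem conv and the fc layer."""
    return 2 * sum(layers) + 2


def final_feature_side(input_side=112, num_stride2_stages=4):
    """layer1..layer4 each halve the map once; the stem conv uses stride 1."""
    side = input_side
    for _ in range(num_stride2_stages):
        side //= 2
    return side


if __name__ == "__main__":
    for name, layers in IRESNET_LAYERS.items():
        print(name, "->", nominal_depth(layers), "weighted layers")
    side = final_feature_side()
    print("final map: {0}x{0}, fc_scale = {1}".format(side, side * side))
```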
--- a/recognition/arcface_paddle/backbones/mobilefacenet.py
+++ b/recognition/arcface_paddle/backbones/mobilefacenet.py
@@ -0,0 +1,162 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+from paddle import nn
+import math
+
+__all__ = ['MobileFaceNet_128']
+
+MobileFaceNet_BottleNeck_Setting = [
+    # t: expansion, c: output channels, n: repeats, s: stride of the first repeat
+    [2, 64, 5, 2],
+    [4, 128, 1, 2],
+    [2, 128, 6, 1],
+    [4, 128, 1, 2],
+    [2, 128, 2, 1]
+]
+
+
+class BottleNeck(nn.Layer):
+    def __init__(self, inp, oup, stride, expansion):
+        super().__init__()
+        self.connect = stride == 1 and inp == oup
+
+        self.conv = nn.Sequential(
+            # 1*1 conv
+            nn.Conv2D(
+                inp, inp * expansion, 1, 1, 0, bias_attr=False),
+            nn.BatchNorm2D(inp * expansion),
+            nn.PReLU(inp * expansion),
+
+            # 3*3 depth wise conv
+            nn.Conv2D(
+                inp * expansion,
+                inp * expansion,
+                3,
+                stride,
+                1,
+                groups=inp * expansion,
+                bias_attr=False),
+            nn.BatchNorm2D(inp * expansion),
+            nn.PReLU(inp * expansion),
+
+            # 1*1 conv
+            nn.Conv2D(
+                inp * expansion, oup, 1, 1, 0, bias_attr=False),
+            nn.BatchNorm2D(oup), )
+
+    def forward(self, x):
+        if self.connect:
+            return x + self.conv(x)
+        else:
+            return self.conv(x)
+
+
+class ConvBlock(nn.Layer):
+    def __init__(self, inp, oup, k, s, p, dw=False, linear=False):
+        super().__init__()
+        self.linear = linear
+        if dw:
+            self.conv = nn.Conv2D(
+                inp, oup, k, s, p, groups=inp, bias_attr=False)
+        else:
+            self.conv = nn.Conv2D(inp, oup, k, s, p, bias_attr=False)
+
+        self.bn = nn.BatchNorm2D(oup)
+        if not linear:
+            self.prelu = nn.PReLU(oup)
+
+    def forward(self, x):
+        x = self.conv(x)
+        x = self.bn(x)
+        if self.linear:
+            return x
+        else:
+            return self.prelu(x)
+
+
+class MobileFaceNet(nn.Layer):
+    def __init__(self,
+                 feature_dim=128,
+                 bottleneck_setting=MobileFaceNet_BottleNeck_Setting,
+                 **args):
+        super().__init__()
+        self.conv1 = ConvBlock(3, 64, 3, 2, 1)
+        self.dw_conv1 = ConvBlock(64, 64, 3, 1, 1, dw=True)
+
+        self.cur_channel = 64
+        block = BottleNeck
+        self.blocks = self._make_layer(block, bottleneck_setting)
+
+        self.conv2 = ConvBlock(128, 512, 1, 1, 0)
+        self.linear7 = ConvBlock(512, 512, 7, 1, 0, dw=True, linear=True)
+        self.linear1 = ConvBlock(512, feature_dim, 1, 1, 0, linear=True)
+
+        for m in self.sublayers():
+            if isinstance(m, nn.Conv2D):
+                # fan-in: (in_channels / groups) * kernel_h * kernel_w
+                n = m.weight.shape[1] * m.weight.shape[2] * m.weight.shape[3]
+                m.weight = paddle.create_parameter(
+                    shape=m.weight.shape,
+                    dtype='float32',
+                    default_initializer=nn.initializer.Normal(
+                        mean=0.0, std=math.sqrt(2.0 / n)))
+                # Kaiming (He) normal init: std = sqrt(2 / fan_in)
+            elif isinstance(m, (nn.BatchNorm, nn.BatchNorm2D, nn.GroupNorm)):
+                m.weight = paddle.create_parameter(
+                    shape=m.weight.shape,
+                    dtype='float32',
+                    default_initializer=nn.initializer.Constant(value=1.0))
+                m.bias = paddle.create_parameter(
+                    shape=m.bias.shape,
+                    dtype='float32',
+                    default_initializer=nn.initializer.Constant(value=0.0))
+
+    def _make_layer(self, block, setting):
+        layers = []
+        for t, c, n, s in setting:
+            for i in range(n):
+                if i == 0:
+                    layers.append(block(self.cur_channel, c, s, t))
+                else:
+                    layers.append(block(self.cur_channel, c, 1, t))
+                self.cur_channel = c
+
+        return nn.Sequential(*layers)
+
+    def forward(self, x):
+        x = self.conv1(x)
+        x = self.dw_conv1(x)
+        x = self.blocks(x)
+        x = self.conv2(x)
+        x = self.linear7(x)
+        x = self.linear1(x)
+        x = x.reshape([x.shape[0], x.shape[1] * x.shape[2] * x.shape[3]])
+        return x
+
+
+def MobileFaceNet_128(feature_dim=128, **args):
+    model = MobileFaceNet(feature_dim=feature_dim, **args)
+    return model
+
+
+# if __name__ == "__main__":
+#     paddle.set_device("cpu")
+#     x = paddle.rand([2, 3, 112, 112])
+#     net = MobileFaceNet()
+#     print(net)
+
+#     x = net(x)
+#     print(x.shape)
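Why `linear7` uses a 7x7 kernel and `conv2` takes 128 input channels can be checked by tracing shapes through `MobileFaceNet_BottleNeck_Setting`. The pure-Python sketch below assumes a 112x112 input (the size used by the commented-out test and by `infer.py`) and mirrors `_make_layer`: only the first repeat of each row uses stride `s`, and `BottleNeck` adds a skip connection only when stride is 1 and channels are unchanged. `conv_out` and `trace_mobilefacenet` are hypothetical helpers.

```python
# (t, c, n, s) rows copied from MobileFaceNet_BottleNeck_Setting.
SETTING = [[2, 64, 5, 2], [4, 128, 1, 2], [2, 128, 6, 1], [4, 128, 1, 2],
           [2, 128, 2, 1]]


def conv_out(size, k, s, p):
    """Output side length of a square convolution without dilation."""
    return (size + 2 * p - k) // s + 1


def trace_mobilefacenet(input_size=112):
    size = conv_out(input_size, 3, 2, 1)  # conv1: 3x3, stride 2
    size = conv_out(size, 3, 1, 1)        # dw_conv1: 3x3 depthwise, stride 1
    channels, residual = 64, 0
    for t, c, n, s in SETTING:
        for i in range(n):
            stride = s if i == 0 else 1
            # BottleNeck.connect: skip connection only when shapes are preserved.
            if stride == 1 and channels == c:
                residual += 1
            # The depthwise 3x3 conv (padding 1) is the only strided conv in a block.
            size = conv_out(size, 3, stride, 1)
            channels = c
    before_linear7 = conv_out(size, 1, 1, 0)            # conv2: 1x1 -> 512 channels
    after_linear7 = conv_out(before_linear7, 7, 1, 0)   # linear7: 7x7, no padding
    return channels, before_linear7, after_linear7, residual


if __name__ == "__main__":
    print(trace_mobilefacenet())
```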
--- a/recognition/arcface_paddle/config.py
+++ b/recognition/arcface_paddle/config.py
@@ -0,0 +1,36 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from easydict import EasyDict as edict
+
+config = edict()
+config.dataset = "emore"
+config.sample_rate = 1
+config.momentum = 0.9
+
+config.data_dir = "./MS1M_bin"
+config.file_list = "MS1M_bin/label.txt"
+config.num_classes = 85742
+config.num_image = 5822653
+config.num_epoch = 32
+config.warmup_epoch = 1
+config.val_targets = ["lfw", "cfp_fp", "agedb_30"]
+
+
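+def lr_step_func(epoch):
+    # The quadratic warmup branch is disabled: `epoch < -1` is never true for
+    # epoch >= 0. The multiplier drops by 10x once epoch >= m - 1 for each milestone m.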
+    return ((epoch + 1) / (4 + 1))**2 if epoch < -1 else 0.1**len(
+        [m for m in [6, 12, 18, 24] if m - 1 <= epoch])
+
+
+config.lr_func = lr_step_func
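The step schedule in `lr_step_func` is easy to misread because of the `m - 1 <= epoch` test. The sketch below copies the function verbatim and prints the epochs (0-based) at which the multiplier changes: it falls to 0.1 at epoch 5, 0.01 at 11, 0.001 at 17 and 1e-4 at 23. Note that `lr_step_func` itself does not read `config.warmup_epoch`. `lr_schedule` is a hypothetical helper that ASSUMES the trainer computes `lr = base_lr * lr_step_func(epoch)`; `base_lr` is a placeholder, since the visible config does not define a base learning rate.

```python
def lr_step_func(epoch):
    # Copied from config.py. The warmup branch never fires for epoch >= 0.
    return ((epoch + 1) / (4 + 1))**2 if epoch < -1 else 0.1**len(
        [m for m in [6, 12, 18, 24] if m - 1 <= epoch])


def lr_schedule(base_lr, num_epoch=32):
    """Per-epoch LR, ASSUMING lr = base_lr * lr_step_func(epoch)."""
    return [base_lr * lr_step_func(e) for e in range(num_epoch)]


if __name__ == "__main__":
    # Print only the epochs where the multiplier changes.
    previous = None
    for epoch in range(32):
        factor = lr_step_func(epoch)
        if factor != previous:
            print("epoch {:2d}: multiplier {:g}".format(epoch, factor))
            previous = factor
```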
--- a/recognition/arcface_paddle/dataloader/__init__.py
+++ b/recognition/arcface_paddle/dataloader/__init__.py
@@ -0,0 +1,15 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .common_dataset import CommonDataset
--- a/recognition/arcface_paddle/dataloader/common_dataset.py
+++ b/recognition/arcface_paddle/dataloader/common_dataset.py
@@ -0,0 +1,66 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle.io import Dataset
+from paddle.vision import transforms
+import os
+from PIL import Image
+import random
+import paddle
+import numpy as np
+
+from dataloader.kv_helper import read_img_from_bin
+
+
+class CommonDataset(Dataset):
+    def __init__(self, root_dir, label_file):
+        super(CommonDataset, self).__init__()
+        self.root_dir = root_dir
+        self.label_file = label_file
+        self.full_lines = self.get_file_list(label_file)
+        self.delimiter = "\t"
+        self.transform = transforms.Compose([
+            transforms.RandomHorizontalFlip(),
+            transforms.ToTensor(),
+            transforms.Normalize(
+                mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
+        ])
+
+        self.num_samples = len(self.full_lines)
+
+    def get_file_list(self, label_file):
+        with open(label_file, "r") as fin:
+            full_lines = fin.readlines()
+
+        print("finish reading file, image num: {}".format(len(full_lines)))
+        return full_lines
+
+    def __getitem__(self, idx):
+        try:
+            line = self.full_lines[idx]
+
+            img_path, label = line.split(self.delimiter)
+            label = int(label)
+            label = paddle.to_tensor(label, dtype='int64')
+            img_path = os.path.join(self.root_dir, img_path)
+            img = read_img_from_bin(img_path)[:, :, ::-1]
+            img = self.transform(img)
+            return img, label
+
+        except Exception as e:
+            print("data read failed: {}, exception info: {}".format(line, e))
+            # randint is inclusive at both ends, so the last valid index is len(self) - 1.
+            return self.__getitem__(random.randint(0, len(self) - 1))
+
+    def __len__(self):
+        return self.num_samples
--- a/recognition/arcface_paddle/dataloader/kv_helper.py
+++ b/recognition/arcface_paddle/dataloader/kv_helper.py
@@ -0,0 +1,68 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import pickle
+import struct
+import random
+import multiprocessing
+import numpy as np
+import cv2
+import json
+
+
+def readkv(f):
+    """readkv"""
+    keylendata = f.read(4)
+    if len(keylendata) != 4:
+        return None
+    keylen = struct.unpack('I', keylendata)[0]
+    if keylen > 5000:
+        raise Exception('wrong key len: ' + str(keylen))
+    key = f.read(keylen)
+    valuelen = struct.unpack('I', f.read(4))[0]
+    value = f.read(valuelen)
+    return key, value
+
+
+def writekv(f, k, v, flush=True):
+    """writekv"""
+    f.write(struct.pack('I', len(k)))
+    f.write(k)
+    f.write(struct.pack('I', len(v)))
+    f.write(v)
+    if flush:
+        f.flush()
+    return
+
+
+def trans_img_to_bin(img_name, output_path):
+    with open(img_name, "rb") as fin:
+        img = fin.read()
+    key = os.path.split(img_name)[-1]
+    with open(output_path, "wb") as fout:
+        writekv(fout, key.encode(), pickle.dumps(img, -1))
+    return
+
+
+def read_img_from_bin(input_path):
+    # A file may hold several key/value records, but in practice only one is stored.
+    with open(input_path, "rb") as fin:
+        r = readkv(fin)
+        assert r is not None
+        _, value = r
+        value = pickle.loads(value)
+        value = np.frombuffer(value, dtype='uint8')
+        img = cv2.imdecode(value, 1)
+    return img
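The `.bin` record format written by `trans_img_to_bin` and read by `read_img_from_bin` can be exercised without touching disk. The self-contained sketch below re-implements the same framing (4-byte key length, key, 4-byte value length, pickled value) on an in-memory `io.BytesIO` buffer. The sample bytes are made up and are not a decodable image, so the `cv2.imdecode` step is skipped. The struct format `'I'` has no byte-order prefix, so it uses the machine's native byte order.

```python
import io
import pickle
import struct


def writekv(f, k, v):
    """Same framing as kv_helper.writekv: len(key), key, len(value), value."""
    f.write(struct.pack('I', len(k)))  # 'I' = native-order unsigned int
    f.write(k)
    f.write(struct.pack('I', len(v)))
    f.write(v)


def readkv(f):
    """Same parsing as kv_helper.readkv; returns None at end of stream."""
    keylendata = f.read(4)
    if len(keylendata) != 4:
        return None
    keylen = struct.unpack('I', keylendata)[0]
    key = f.read(keylen)
    valuelen = struct.unpack('I', f.read(4))[0]
    return key, f.read(valuelen)


def roundtrip(name, encoded_bytes):
    """Write one record the way trans_img_to_bin does, then read it back."""
    buf = io.BytesIO()
    # The value is the pickled, still-encoded file content (e.g. JPEG bytes).
    writekv(buf, name.encode(), pickle.dumps(encoded_bytes, -1))
    buf.seek(0)
    key, value = readkv(buf)
    return key.decode(), pickle.loads(value), readkv(buf)
```

In `read_img_from_bin`, the unpickled bytes are then passed to `np.frombuffer` and `cv2.imdecode(value, 1)`, which yields a BGR image; `CommonDataset` converts it to RGB with `[:, :, ::-1]`.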
--- a/recognition/arcface_paddle/eval/__init__.py
+++ b/recognition/arcface_paddle/eval/__init__.py
@@ -0,0 +1,13 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
--- a/recognition/arcface_paddle/eval/verification.py
+++ b/recognition/arcface_paddle/eval/verification.py
@@ -0,0 +1,275 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import datetime
+import os
+import pickle
+from io import BytesIO
+from PIL import Image
+import cv2
+import numpy as np
+import sklearn
+import paddle
+from scipy import interpolate
+from sklearn.decomposition import PCA
+from sklearn.model_selection import KFold
+
+
+class LFold:
+    def __init__(self, n_splits=2, shuffle=False):
+        self.n_splits = n_splits
+        if self.n_splits > 1:
+            self.k_fold = KFold(n_splits=n_splits, shuffle=shuffle)
+
+    def split(self, indices):
+        if self.n_splits > 1:
+            return self.k_fold.split(indices)
+        else:
+            return [(indices, indices)]
+
+
+def calculate_roc(thresholds,
+                  embeddings1,
+                  embeddings2,
+                  actual_issame,
+                  nrof_folds=10,
+                  pca=0):
+    assert (embeddings1.shape[0] == embeddings2.shape[0])
+    assert (embeddings1.shape[1] == embeddings2.shape[1])
+    nrof_pairs = min(len(actual_issame), embeddings1.shape[0])
+    nrof_thresholds = len(thresholds)
+    k_fold = LFold(n_splits=nrof_folds, shuffle=False)
+
+    tprs = np.zeros((nrof_folds, nrof_thresholds))
+    fprs = np.zeros((nrof_folds, nrof_thresholds))
+    accuracy = np.zeros((nrof_folds))
+    indices = np.arange(nrof_pairs)
+
+    if pca == 0:
+        diff = np.subtract(embeddings1, embeddings2)
+        dist = np.sum(np.square(diff), 1)
+
+    for fold_idx, (train_set, test_set) in enumerate(k_fold.split(indices)):
+        if pca > 0:
+            print('doing pca on', fold_idx)
+            embed1_train = embeddings1[train_set]
+            embed2_train = embeddings2[train_set]
+            _embed_train = np.concatenate((embed1_train, embed2_train), axis=0)
+            pca_model = PCA(n_components=pca)
+            pca_model.fit(_embed_train)
+            embed1 = pca_model.transform(embeddings1)
+            embed2 = pca_model.transform(embeddings2)
+            embed1 = sklearn.preprocessing.normalize(embed1)
+            embed2 = sklearn.preprocessing.normalize(embed2)
+            diff = np.subtract(embed1, embed2)
+            dist = np.sum(np.square(diff), 1)
+
+        # Find the best threshold for the fold
+        acc_train = np.zeros((nrof_thresholds))
+        for threshold_idx, threshold in enumerate(thresholds):
+            _, _, acc_train[threshold_idx] = calculate_accuracy(
+                threshold, dist[train_set], actual_issame[train_set])
+        best_threshold_index = np.argmax(acc_train)
+        for threshold_idx, threshold in enumerate(thresholds):
+            tprs[fold_idx, threshold_idx], fprs[
+                fold_idx, threshold_idx], _ = calculate_accuracy(
+                    threshold, dist[test_set], actual_issame[test_set])
+        _, _, accuracy[fold_idx] = calculate_accuracy(
+            thresholds[best_threshold_index], dist[test_set],
+            actual_issame[test_set])
+
+    tpr = np.mean(tprs, 0)
+    fpr = np.mean(fprs, 0)
+    return tpr, fpr, accuracy
+
+
+def calculate_accuracy(threshold, dist, actual_issame):
+    predict_issame = np.less(dist, threshold)
+    tp = np.sum(np.logical_and(predict_issame, actual_issame))
+    fp = np.sum(np.logical_and(predict_issame, np.logical_not(actual_issame)))
+    tn = np.sum(
+        np.logical_and(
+            np.logical_not(predict_issame), np.logical_not(actual_issame)))
+    fn = np.sum(np.logical_and(np.logical_not(predict_issame), actual_issame))
+
+    tpr = 0 if (tp + fn == 0) else float(tp) / float(tp + fn)
+    fpr = 0 if (fp + tn == 0) else float(fp) / float(fp + tn)
+    acc = float(tp + tn) / dist.size
+    return tpr, fpr, acc
+
+
+def calculate_val(thresholds,
+                  embeddings1,
+                  embeddings2,
+                  actual_issame,
+                  far_target,
+                  nrof_folds=10):
+    assert (embeddings1.shape[0] == embeddings2.shape[0])
+    assert (embeddings1.shape[1] == embeddings2.shape[1])
+    nrof_pairs = min(len(actual_issame), embeddings1.shape[0])
+    nrof_thresholds = len(thresholds)
+    k_fold = LFold(n_splits=nrof_folds, shuffle=False)
+
+    val = np.zeros(nrof_folds)
+    far = np.zeros(nrof_folds)
+
+    diff = np.subtract(embeddings1, embeddings2)
+    dist = np.sum(np.square(diff), 1)
+    indices = np.arange(nrof_pairs)
+
+    for fold_idx, (train_set, test_set) in enumerate(k_fold.split(indices)):
+
+        # Find the threshold that gives FAR = far_target
+        far_train = np.zeros(nrof_thresholds)
+        for threshold_idx, threshold in enumerate(thresholds):
+            _, far_train[threshold_idx] = calculate_val_far(
+                threshold, dist[train_set], actual_issame[train_set])
+        if np.max(far_train) >= far_target:
+            f = interpolate.interp1d(far_train, thresholds, kind='slinear')
+            threshold = f(far_target)
+        else:
+            threshold = 0.0
+
+        val[fold_idx], far[fold_idx] = calculate_val_far(
+            threshold, dist[test_set], actual_issame[test_set])
+
+    val_mean = np.mean(val)
+    far_mean = np.mean(far)
+    val_std = np.std(val)
+    return val_mean, val_std, far_mean
+
+
+def calculate_val_far(threshold, dist, actual_issame):
+    predict_issame = np.less(dist, threshold)
+    true_accept = np.sum(np.logical_and(predict_issame, actual_issame))
+    false_accept = np.sum(
+        np.logical_and(predict_issame, np.logical_not(actual_issame)))
+    n_same = np.sum(actual_issame)
+    n_diff = np.sum(np.logical_not(actual_issame))
+    # print(true_accept, false_accept)
+    # print(n_same, n_diff)
+    val = float(true_accept) / float(n_same)
+    far = float(false_accept) / float(n_diff)
+    return val, far
+
+
+def evaluate(embeddings, actual_issame, nrof_folds=10, pca=0):
+    # Calculate evaluation metrics
+    thresholds = np.arange(0, 4, 0.01)
+    embeddings1 = embeddings[0::2]
+    embeddings2 = embeddings[1::2]
+    tpr, fpr, accuracy = calculate_roc(
+        thresholds,
+        embeddings1,
+        embeddings2,
+        np.asarray(actual_issame),
+        nrof_folds=nrof_folds,
+        pca=pca)
+    thresholds = np.arange(0, 4, 0.001)
+    val, val_std, far = calculate_val(
+        thresholds,
+        embeddings1,
+        embeddings2,
+        np.asarray(actual_issame),
+        1e-3,
+        nrof_folds=nrof_folds)
+    return tpr, fpr, accuracy, val, val_std, far
+
+
+# Returns numpy arrays, not paddle Tensors.
+@paddle.no_grad()
+def load_bin(path, image_size):
+    try:
+        with open(path, 'rb') as f:
+            bins, issame_list = pickle.load(f)  # py2
+    except UnicodeDecodeError as e:
+        with open(path, 'rb') as f:
+            bins, issame_list = pickle.load(f, encoding='bytes')  # py3
+    data_list = []
+    for flip in [0, 1]:
+        data = np.empty(
+            shape=[len(issame_list) * 2, 3, image_size[0], image_size[1]],
+            dtype=np.float32)
+        data_list.append(data)
+    for idx in range(len(issame_list) * 2):
+        _bin = bins[idx]
+        img = np.array(Image.open(BytesIO(_bin)), dtype=np.float32)
+        if img.shape[0] != image_size[0] or img.shape[1] != image_size[1]:
+            # cv2.resize takes dsize as (width, height).
+            img = cv2.resize(img, (image_size[1], image_size[0]))
+        img = img.transpose(2, 0, 1)
+        for flip in [0, 1]:
+            if flip == 1:
+                img = np.flip(img, 2)
+            data_list[flip][idx][:] = img
+        if idx % 1000 == 0:
+            print('loading bin', idx)
+    print(data_list[0].shape)
+    return data_list, issame_list
+
+
+@paddle.no_grad()
+def test(data_set, backbone, batch_size, nfolds=10):
+    print('testing verification..')
+    data_list = data_set[0]
+    issame_list = data_set[1]
+    embeddings_list = []
+    time_consumed = 0.0
+    for i in range(len(data_list)):
+        data = data_list[i]
+        embeddings = None
+        ba = 0
+        while ba < data.shape[0]:
+            bb = min(ba + batch_size, data.shape[0])
+            count = bb - ba
+            _data = data[bb - batch_size:bb]
+            time0 = datetime.datetime.now()
+            img = ((_data / 255) - 0.5) / 0.5
+            # Convert the numpy batch to a paddle Tensor
+            img = paddle.to_tensor(img)
+            net_out: paddle.Tensor = backbone(img)
+            _embeddings = net_out.detach().cpu().numpy()
+            time_now = datetime.datetime.now()
+            diff = time_now - time0
+            time_consumed += diff.total_seconds()
+            if embeddings is None:
+                embeddings = np.zeros((data.shape[0], _embeddings.shape[1]))
+            embeddings[ba:bb, :] = _embeddings[(batch_size - count):, :]
+            ba = bb
+        embeddings_list.append(embeddings)
+
+    _xnorm = 0.0
+    _xnorm_cnt = 0
+    for embed in embeddings_list:
+        for i in range(embed.shape[0]):
+            _em = embed[i]
+            _norm = np.linalg.norm(_em)
+            _xnorm += _norm
+            _xnorm_cnt += 1
+    _xnorm /= _xnorm_cnt
+
+    embeddings = embeddings_list[0].copy()
+    try:
+        embeddings = sklearn.preprocessing.normalize(embeddings)
+    except:
+        print(embeddings)
+    acc1 = 0.0
+    std1 = 0.0
+    embeddings = embeddings_list[0] + embeddings_list[1]
+    embeddings = sklearn.preprocessing.normalize(embeddings)
+    print(embeddings.shape)
+    print('infer time', time_consumed)
+    _, _, accuracy, val, val_std, far = evaluate(
+        embeddings, issame_list, nrof_folds=nfolds)
+    acc2, std2 = np.mean(accuracy), np.std(accuracy)
+    return acc1, std1, acc2, std2, _xnorm, embeddings_list
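`evaluate` scans thresholds in [0, 4) because `test` L2-normalizes the embeddings before calling it, and for unit vectors ||a - b||^2 = 2 - 2 cos(a, b), which lies in [0, 4]. The sketch below picks one threshold on a single set of pairs by maximizing accuracy, predicting "same" when the distance is below the threshold as `calculate_accuracy` does. `calculate_roc` instead selects the threshold on the training folds and reports accuracy on the held-out fold. The helper names and the synthetic data are hypothetical.

```python
import numpy as np


def squared_l2(emb1, emb2):
    """Squared L2 distance between L2-normalized rows; lies in [0, 4]."""
    emb1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    emb2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    return np.sum(np.square(emb1 - emb2), axis=1)


def best_threshold(dist, issame, thresholds=np.arange(0, 4, 0.01)):
    """Threshold maximizing pair accuracy, predicting 'same' when dist < t."""
    issame = np.asarray(issame, dtype=bool)
    accs = [np.mean(np.less(dist, t) == issame) for t in thresholds]
    i = int(np.argmax(accs))
    return float(thresholds[i]), float(accs[i])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(200, 16))
    positives = anchors + 0.1 * rng.normal(size=anchors.shape)  # same identity
    negatives = rng.normal(size=anchors.shape)                  # different identity
    dist = np.concatenate([squared_l2(anchors, positives),
                           squared_l2(anchors, negatives)])
    issame = np.array([True] * 200 + [False] * 200)
    print(best_threshold(dist, issame))
```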
--- a/recognition/arcface_paddle/export_inference_model.py
+++ b/recognition/arcface_paddle/export_inference_model.py
@@ -0,0 +1,54 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import argparse
+
+import paddle
+import paddle.nn.functional as F
+from paddle.jit import to_static
+
+import backbones
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--network", type=str)
+    parser.add_argument("--pretrained_model", type=str)
+    parser.add_argument("--output_path", type=str, default="./inference")
+
+    return parser.parse_args()
+
+
+def load_dygraph_pretrain(model, path=None):
+    if not os.path.exists(path):
+        raise ValueError(f"The path of pretrained model file does not exist: {path}.")
+    param_state_dict = paddle.load(path)
+    model.set_dict(param_state_dict)
+    return
+
+
+def main():
+    args = parse_args()
+
+    net = eval("backbones.{}".format(args.network))()
+    load_dygraph_pretrain(net, path=args.pretrained_model)
+    net.eval()
+
+    net = to_static(net, input_spec=[paddle.static.InputSpec(shape=[None, 3, 112, 112], dtype='float32')])
+    paddle.jit.save(net, os.path.join(args.output_path, "inference"))
+
+
+if __name__ == "__main__":
+    main()
--- a/recognition/arcface_paddle/infer.py
+++ b/recognition/arcface_paddle/infer.py
@@ -0,0 +1,69 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import argparse
+import numpy as np
+import cv2
+import paddle
+import backbones
+
+
+def read_img(img_path=None):
+    if img_path is None:
+        img = np.random.randint(0, 255, size=(112, 112, 3), dtype=np.uint8)
+    else:
+        img = cv2.imread(img_path)
+        img = cv2.resize(img, (112, 112))
+    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+    scale = 1. / 255.
+    mean = [0.5, 0.5, 0.5]
+    std = [0.5, 0.5, 0.5]
+    mean = np.array(mean).reshape((1, 1, 3)).astype('float32')
+    std = np.array(std).reshape((1, 1, 3)).astype('float32')
+    img = (img.astype('float32') * scale - mean) / std
+    img = img.transpose((2, 0, 1))
+    img = np.expand_dims(img, 0)
+    return img
+
+
+def main(args):
+    backbone = getattr(backbones, args.network)()
+    model_params = args.network + '.pdparams'
+    print('INFO: network {} selected, loading {}'.format(args.network, model_params))
+    state_dict = paddle.load(os.path.join(args.checkpoint, model_params))
+    backbone.set_state_dict(state_dict)
+    backbone.eval()
+    img = read_img(args.img)
+    input_tensor = paddle.to_tensor(img)
+    feat = backbone(input_tensor).numpy()
+    return feat
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description='Paddle ArcFace Testing')
+    parser.add_argument(
+        '--network',
+        type=str,
+        default='MobileFaceNet_128',
+        help='backbone network')
+    parser.add_argument(
+        '--img',
+        type=str,
+        default=None,
+        help='path of the input image; a random image is used if omitted')
+    parser.add_argument(
+        '--checkpoint',
+        type=str,
+        default='emore_arcface',
+        help='checkpoint dir')
+    args = parser.parse_args()
+    feat = main(args)
+    print('feature shape: {}'.format(feat.shape))
--- a/recognition/arcface_paddle/install_ch.md
+++ b/recognition/arcface_paddle/install_ch.md
@@ -0,0 +1,124 @@
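+Simplified Chinese | [English](install_en.md)
+
+# Installation
+
+---
+This chapter describes how to install ArcFace-paddle and its dependencies.
+
+
+## 1. Install PaddlePaddle
+
+ArcFace-paddle requires `PaddlePaddle 2.1` or later. Follow the steps below to install PaddlePaddle.
+
+### 1.1 Environment requirements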
+
+- python 3.x
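+- cuda >= 10.1 (if using paddlepaddle-gpu)
+- cudnn >= 7.6.4 (if using paddlepaddle-gpu)
+- nccl >= 2.1.2 (if using distributed training/evaluation)
+- gcc >= 8.2
+
+We recommend running ArcFace-paddle in the docker image we provide. For details on using docker and nvidia-docker, see this [link](https://www.runoob.com/docker/docker-tutorial.html).
+
+With CUDA 10.1, a GPU driver version of 418.39 or later is recommended; with CUDA 10.2, 440.33 or later. For other CUDA versions and their required driver versions, see this [link](https://docs.nvidia.com/deploy/cuda-compatibility/index.html).
+
+
+If you do not use docker, skip section 1.2 and start from section 1.3.
+
+
+### 1.2 (Recommended) Prepare a docker environment
+
+The image is downloaded automatically the first time you use it, which may take a while.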
+
+```
+# Switch to the working directory
+cd /home/Projects
+# The container only needs to be created on the first run; skip this step on later runs
+# Create a docker container named face_paddle and map the current directory to /paddle in the container
+
+# To use docker on CPU, create the container with docker instead of nvidia-docker,
+# and set the container's shared memory (shm-size) to 8G or more
+sudo docker run --name face_paddle -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0 /bin/bash
+
+# To use the GPU version of the container, create it with the following command instead
+sudo nvidia-docker run --name face_paddle -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0-gpu-cuda10.2-cudnn7 /bin/bash
+```
+
+
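+You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to find an image that matches your machine.
+
+```
+# Press Ctrl+P+Q to detach from the container; re-enter it with the following command
+sudo docker exec -it face_paddle /bin/bash
+```
+
+### 1.3 Install PaddlePaddle using pip
+
+Run the following command to install the latest GPU version of PaddlePaddle with pip.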
+
+```bash
+pip3 install paddlepaddle-gpu --upgrade -i https://mirror.baidu.com/pypi/simple
+```
+
+To use PaddlePaddle on CPU, install it with the following command.
+
+```bash
+pip3 install paddlepaddle --upgrade -i https://mirror.baidu.com/pypi/simple
+```
+
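+**Note:**
+* If you installed the CPU version of PaddlePaddle first and later want to switch to the GPU version, uninstall the CPU version before installing the GPU version; otherwise the installed PaddlePaddle versions can get mixed up.
+* You can also compile and install PaddlePaddle from source; follow the instructions in the [PaddlePaddle installation guide](http://www.paddlepaddle.org.cn/install/quick).
+
+
+### 1.4 Verify the installation
+
+Use the following code to verify that PaddlePaddle is installed correctly.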
+
+```python
+import paddle
+paddle.utils.run_check()
+```
+
+Check the PaddlePaddle version with the following command:
+
+```bash
+python3 -c "import paddle; print(paddle.__version__)"
+```
+
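+Note:
+- PaddlePaddle compiled from source reports version 0.0.0; make sure you compile from the source code of PaddlePaddle 2.0 or later.
+- ArcFace-paddle relies on PaddlePaddle's high-performance distributed training. If you compile from source, make sure to enable the compile option **WITH_DISTRIBUTE=ON**. For the full list of compile options, see the [compile options table](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#id3).
+- When running in docker, set `--shm-size=8g` when creating the container so that it has enough shared memory for Paddle's data-loading acceleration; use a larger value if possible.
+- If you only want to use the recognition module, skip section 3 below; if you only want to use the detection module, skip section 2 below.
+
+
+## 2. Prepare the environment for the recognition module
+
+Enter the `rec` folder and install the requirements with the following commands.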
+
+```shell
+cd rec
+pip3 install -r requirement.txt
+```
+
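+## 3. Prepare the environment for the detection module
+
+The detection module depends on PaddleDetection. First clone the PaddleDetection code, then install its requirements with the following commands.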
+
+```bash
+# Clone the PaddleDetection repository
+cd <path/to/clone/PaddleDetection>
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+
+cd PaddleDetection
+# Install other dependencies
+pip3 install -r requirements.txt
+```
+
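+For more installation tutorials, see [Install doc](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL_cn.md).
+
+## 4. Prepare the environment for the detection + recognition inference system (InsightFacePaddle)
+`InsightFacePaddle` is built on the `PaddlePaddle` inference engine and provides end-to-end detection and recognition prediction; see [InsightFacePaddle](./system/README_ch.md) for details. Its dependencies are listed in `./system/requirements.txt` and can be installed with the following commands.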
+
+```bash
+cd ./system
+pip3 install --upgrade -r requirements.txt -i https://mirror.baidu.com/pypi/simple
+```
--- a/recognition/arcface_paddle/install_en.md
+++ b/recognition/arcface_paddle/install_en.md
@@ -0,0 +1,120 @@
+[简体中文](install_ch.md) | English
+
+# Installation
+
+---
+This tutorial describes how to install ArcFace-paddle and its dependencies.
+
+## 1. Install PaddlePaddle
+
+`PaddlePaddle 2.1` or later is required for ArcFace-paddle. You can use the following steps to install PaddlePaddle.
+
+### 1.1 Environment requirements
+
+- python 3.x
+- cuda >= 10.1 (necessary if you want to use paddlepaddle-gpu)
+- cudnn >= 7.6.4 (necessary if you want to use paddlepaddle-gpu)
+- nccl >= 2.1.2 (necessary if you want to use distributed training/eval)
+- gcc >= 8.2
+
+Docker is recommended for running ArcFace-paddle. For more information about docker and nvidia-docker, see this [tutorial](https://www.runoob.com/docker/docker-tutorial.html).
+
+When you use CUDA 10.1, the driver version must be 418.39 or later. When you use CUDA 10.2, the driver version must be 440.33 or later. For other CUDA versions and their required driver versions, see this [link](https://docs.nvidia.com/deploy/cuda-compatibility/index.html).
+
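+To check the installed driver against these minimums, the following minimal sketch may help. It is not part of ArcFace-paddle: the `MIN_DRIVER` table only covers the two CUDA versions listed above, the helper names are hypothetical, and it assumes `nvidia-smi` is on your `PATH`.
+
+```python
+import shutil
+import subprocess
+
+# Minimum driver versions from the text above (hypothetical table name).
+MIN_DRIVER = {"10.1": "418.39", "10.2": "440.33"}
+
+
+def _as_tuple(version):
+    """Turn a dotted version such as '440.33.01' into (440, 33, 1)."""
+    return tuple(int(part) for part in version.strip().split("."))
+
+
+def driver_meets_minimum(cuda_version, driver_version):
+    """Return True if driver_version is at least the minimum for cuda_version."""
+    minimum = MIN_DRIVER[cuda_version]  # KeyError for CUDA versions not listed
+    return _as_tuple(driver_version) >= _as_tuple(minimum)
+
+
+def installed_driver_version():
+    """Return the driver version reported by nvidia-smi, or None if unavailable."""
+    if shutil.which("nvidia-smi") is None:
+        return None
+    result = subprocess.run(
+        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
+        capture_output=True, text=True)
+    lines = result.stdout.splitlines()
+    return lines[0].strip() if result.returncode == 0 and lines else None
+
+
+if __name__ == "__main__":
+    driver = installed_driver_version()
+    if driver is None:
+        print("No NVIDIA driver found via nvidia-smi")
+    else:
+        print("Driver", driver, "meets the CUDA 10.2 minimum:",
+              driver_meets_minimum("10.2", driver))
+```
+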
+If you do not want to use docker, skip section 1.2 and go directly to section 1.3.
+
+
+### 1.2 (Recommended) Prepare a docker environment
+
+The first time you use this docker image, it is downloaded automatically, which may take a while.
+
+```
+# Switch to the working directory
+cd /home/Projects
+# Create the container only on the first run; skip this command when you run it again
+# Create a docker container named face_paddle and map the current directory to the /paddle directory of the container
+# It is recommended to set a shared memory greater than or equal to 8G through the --shm-size parameter
+sudo docker run --name face_paddle -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0 /bin/bash
+
+# Use the following command to create a container if you want to use GPU in the container
+sudo nvidia-docker run --name face_paddle -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0-gpu-cuda10.2-cudnn7 /bin/bash
+```
+
+You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to find an image that matches your machine.
+
+```
+# Press Ctrl+P+Q to detach from the container; re-enter it with the following command:
+sudo docker exec -it face_paddle /bin/bash
+```
+
+### 1.3 Install PaddlePaddle using pip
+
+If you want to use PaddlePaddle on GPU, you can use the following command to install PaddlePaddle.
+
+```bash
+pip3 install paddlepaddle-gpu --upgrade -i https://mirror.baidu.com/pypi/simple
+```
+
+If you want to use PaddlePaddle on CPU, you can use the following command to install PaddlePaddle.
+
+```bash
+pip3 install paddlepaddle --upgrade -i https://mirror.baidu.com/pypi/simple
+```
+
+**Note:**
+* If you have already installed the CPU version of PaddlePaddle and want to use the GPU version, uninstall the CPU version first and then install the GPU version, so that the two packages do not get mixed up.
+* You can also compile PaddlePaddle from source; see the [PaddlePaddle Installation tutorial](http://www.paddlepaddle.org.cn/install/quick) for more compilation options.
+
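+To find out whether both the CPU and GPU packages are installed at the same time, you can query the installed distributions. This is a sketch assuming Python 3.8+ (for `importlib.metadata`); `installed_paddle_dists` is a hypothetical helper, not part of PaddlePaddle.
+
+```python
+from importlib import metadata
+
+# The CPU and GPU builds of PaddlePaddle are separate pip distributions.
+PADDLE_DISTS = ("paddlepaddle", "paddlepaddle-gpu")
+
+
+def installed_paddle_dists(lookup=metadata.version):
+    """Return {distribution name: version} for each installed PaddlePaddle build."""
+    found = {}
+    for name in PADDLE_DISTS:
+        try:
+            found[name] = lookup(name)
+        except metadata.PackageNotFoundError:
+            pass  # this build is not installed
+    return found
+
+
+if __name__ == "__main__":
+    dists = installed_paddle_dists()
+    if len(dists) > 1:
+        print("Both CPU and GPU builds are installed:", dists)
+    else:
+        print("Installed PaddlePaddle build:", dists or "none")
+```
+
+If both packages are reported, one option is to uninstall both (`pip3 uninstall paddlepaddle paddlepaddle-gpu`) and then reinstall only the one you need.
+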
+### 1.4 Verify the installation
+
+Use the following code to verify that PaddlePaddle is installed correctly:
+
+```python
+import paddle
+paddle.utils.run_check()
+```
+
+Check the PaddlePaddle version:
+
+```bash
+python3 -c "import paddle; print(paddle.__version__)"
+```
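+
+You can also check the version against the `PaddlePaddle 2.1` requirement in a script. A minimal sketch: `check_paddle_version` is a hypothetical helper that assumes version strings of the form `major.minor.patch` (possibly with a suffix such as `rc0`). PaddlePaddle built from source reports `0.0.0`, so that case is reported separately.
+
+```python
+import re
+
+REQUIRED = (2, 1)  # ArcFace-paddle needs PaddlePaddle 2.1 or later
+
+
+def check_paddle_version(version):
+    """Classify a paddle.__version__ string as 'ok', 'too old' or 'source build'."""
+    if version == "0.0.0":
+        # Source builds report 0.0.0, so the version alone is not conclusive.
+        return "source build"
+    major, minor = (int(x) for x in re.findall(r"\d+", version)[:2])
+    return "ok" if (major, minor) >= REQUIRED else "too old"
+
+
+if __name__ == "__main__":
+    try:
+        import paddle
+    except ImportError:
+        print("PaddlePaddle is not installed")
+    else:
+        print(paddle.__version__, "->", check_paddle_version(paddle.__version__))
+        print("compiled with CUDA:", paddle.is_compiled_with_cuda())
+```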
+
+Note:
+- PaddlePaddle compiled from source reports version 0.0.0; make sure you compile from the source code of PaddlePaddle 2.0 or later.
+- ArcFace-paddle relies on PaddlePaddle's distributed training. If you compile from source, enable it with the compile option **WITH_DISTRIBUTE=ON**. For more compilation options, please refer to [Instruction](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#id3).
+- When running in docker, set `--shm-size=8g` when creating the container so that it has enough shared memory for Paddle's dataloader acceleration; use a larger value if possible.
+- If you only want to use the recognition module, you can skip section 3. If you only want to use the detection module, you can skip section 2.
+
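+Inside the container, you can check whether the shared-memory setting took effect by looking at the size of `/dev/shm`. This sketch assumes a Linux container and that Docker interprets `--shm-size=8G` as 8 × 1024³ bytes; `shm_is_sufficient` is a hypothetical helper.
+
+```python
+import os
+import shutil
+
+GIB = 1024 ** 3
+
+
+def shm_is_sufficient(total_bytes, min_gib=8):
+    """Return True if a shared-memory size of total_bytes is at least min_gib GiB."""
+    return total_bytes >= min_gib * GIB
+
+
+if __name__ == "__main__":
+    # Docker mounts a tmpfs at /dev/shm whose size is set by --shm-size
+    # (64 MB when the flag is omitted).
+    if os.path.isdir("/dev/shm"):
+        total = shutil.disk_usage("/dev/shm").total
+        print("/dev/shm: {:.2f} GiB, sufficient: {}".format(
+            total / GIB, shm_is_sufficient(total)))
+    else:
+        print("/dev/shm not found (not a Linux system?)")
+```
+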
+## 2. Prepare for the environment of recognition
+
+Enter the `rec` folder and run the following command to install the requirements.
+
+
+```shell
+cd rec
+pip3 install -r requirement.txt
+```
+
+## 3. Prepare for the environment of detection
+
+The detection module depends on PaddleDetection. Clone PaddleDetection and install its requirements with the following commands.
+
+
+```bash
+# clone PaddleDetection repo
+cd <path/to/clone/PaddleDetection>
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+
+cd PaddleDetection
+# install requirements
+pip3 install -r requirements.txt
+```
+
+For more installation tutorials, please refer to [Install doc](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL.md)
+
+## 4. Prepare for the environment of InsightFacePaddle (detection & recognition)
+
+`InsightFacePaddle` is an inference system for detection and recognition built on the PaddlePaddle inference engine. Please refer to [InsightFacePaddle](./system/README_en.md) for more information. Its dependencies are listed in `./system/requirements.txt`; install them with the following commands.
+
+```bash
+cd ./system
+pip3 install --upgrade -r requirements.txt -i https://mirror.baidu.com/pypi/simple
+```
--- a/recognition/arcface_paddle/losses.py
+++ b/recognition/arcface_paddle/losses.py
@@ -0,0 +1,45 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+from paddle import nn
+
+
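+class CosFace(nn.Layer):
+    """CosFace (additive cosine margin): the target-class logit becomes
+    s * (cos(theta) - m); all other logits are s * cos(theta)."""
+
+    def __init__(self, s=64.0, m=0.40, num_classes=85742):
+        super(CosFace, self).__init__()
+        self.s = s
+        self.m = m
+        # Width of the one-hot margin mask; must equal the number of logit columns.
+        self.num_classes = num_classes
+
+    def forward(self, cosine, label):
+        # Margin m on the target class only, 0 elsewhere.
+        m_hot = paddle.nn.functional.one_hot(
+            label.astype('long'), num_classes=self.num_classes) * self.m
+        cosine = cosine - m_hot
+        ret = cosine * self.s
+        return ret
+
+
+class ArcFace(nn.Layer):
+    """ArcFace (additive angular margin): the target-class logit becomes
+    s * cos(theta + m); all other logits are s * cos(theta)."""
+
+    def __init__(self, s=64.0, m=0.50, num_classes=85742):
+        super(ArcFace, self).__init__()
+        self.s = s
+        self.m = m
+        # Width of the one-hot margin mask; must equal the number of logit columns.
+        self.num_classes = num_classes
+
+    def forward(self, cosine: paddle.Tensor, label):
+        m_hot = paddle.nn.functional.one_hot(
+            label.astype('long'), num_classes=self.num_classes) * self.m
+        theta = cosine.acos()  # angle between the feature and each class center
+        theta = theta + m_hot
+        return theta.cos() * self.s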
--- a/recognition/arcface_paddle/partial_fc.py
+++ b/recognition/arcface_paddle/partial_fc.py
@@ -0,0 +1,168 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import paddle
+import paddle.nn as nn
+from paddle.nn.functional import normalize, linear
+import pickle
+
+
+class PartialFC(nn.Layer):
+    """
+    Author: {Xiang An, Yang Xiao, XuHan Zhu} in DeepGlint,
+    Partial FC: Training 10 Million Identities on a Single Machine
+    See the original paper:
+    https://arxiv.org/abs/2010.05222
+    """
+
+    @paddle.no_grad()
+    def __init__(self,
+                 rank,
+                 world_size,
+                 batch_size,
+                 resume,
+                 margin_softmax,
+                 num_classes,
+                 sample_rate=1.0,
+                 embedding_size=512,
+                 prefix="./"):
+        super(PartialFC, self).__init__()
+        self.num_classes: int = num_classes
+        self.rank: int = rank
+        self.world_size: int = world_size
+        self.batch_size: int = batch_size
+        self.margin_softmax: callable = margin_softmax
+        self.sample_rate: float = sample_rate
+        self.embedding_size: int = embedding_size
+        self.prefix: str = prefix
+        self.num_local: int = num_classes // world_size + int(
+            rank < num_classes % world_size)
+        self.class_start: int = num_classes // world_size * rank + min(
+            rank, num_classes % world_size)
+        self.num_sample: int = int(self.sample_rate * self.num_local)
+
+        self.weight_name = os.path.join(
+            self.prefix, "rank:{}_softmax_weight.pkl".format(self.rank))
+        self.weight_mom_name = os.path.join(
+            self.prefix, "rank:{}_softmax_weight_mom.pkl".format(self.rank))
+
+        if resume:
+            try:
+                self.weight: paddle.Tensor = paddle.load(self.weight_name)
+                print("softmax weight resume successfully!")
+            except (FileNotFoundError, KeyError, IndexError):
+                self.weight = paddle.normal(0, 0.01, (self.num_local,
+                                                      self.embedding_size))
+                print("softmax weight resume fail!")
+
+            try:
+                self.weight_mom: paddle.Tensor = paddle.load(
+                    self.weight_mom_name)
+                print("softmax weight mom resume successfully!")
+            except (FileNotFoundError, KeyError, IndexError):
+                self.weight_mom: paddle.Tensor = paddle.zeros_like(self.weight)
+                print("softmax weight mom resume fail!")
+        else:
+            self.weight = paddle.normal(0, 0.01,
+                                        (self.num_local, self.embedding_size))
+            self.weight_mom: paddle.Tensor = paddle.zeros_like(self.weight)
+            print("softmax weight init successfully!")
+            print("softmax weight mom init successfully!")
+
+        self.index = None
+        if int(self.sample_rate) == 1:
+            self.update = lambda: 0
+            self.sub_weight = paddle.create_parameter(
+                shape=self.weight.shape,
+                dtype='float32',
+                default_initializer=paddle.nn.initializer.Assign(self.weight))
+            self.sub_weight_mom = self.weight_mom
+        else:
+            self.sub_weight = paddle.create_parameter(
+                shape=[1, 1],
+                dtype='float32',
+                default_initializer=paddle.nn.initializer.Assign(
+                    paddle.empty((1, 1))))
+
+    def save_params(self):
+        with open(self.weight_name, 'wb') as file:
+            pickle.dump(self.weight.numpy(), file)
+        with open(self.weight_mom_name, 'wb') as file:
+            pickle.dump(self.weight_mom.numpy(), file)
+
+    @paddle.no_grad()
+    def sample(self, total_label):
+        index_positive = (self.class_start <= total_label).numpy() & (
+            total_label < self.class_start + self.num_local).numpy()
+        total_label = total_label.numpy()
+        total_label[~index_positive] = -1
+        total_label[index_positive] -= self.class_start
+        total_label = paddle.to_tensor(total_label)
+
+    def forward(self, total_features, norm_weight):
+        logits = linear(total_features, paddle.t(norm_weight))
+        return logits
+
+    @paddle.no_grad()
+    def update(self):
+        self.weight_mom[self.index] = self.sub_weight_mom
+        self.weight[self.index] = self.sub_weight
+
+    def prepare(self, label, optimizer):
+        # label [64, 1]
+        total_label = label.detach()
+        self.sample(total_label)
+        optimizer._parameter_list[0] = self.sub_weight
+        norm_weight = normalize(self.sub_weight)
+        return total_label, norm_weight
+
+    def forward_backward(self, label, features, optimizer):
+        total_label, norm_weight = self.prepare(label, optimizer)
+        total_features = features.detach()
+        total_features.stop_gradient = False
+
+        logits = self.forward(total_features, norm_weight)
+        logits = self.margin_softmax(logits, total_label)
+
+        with paddle.no_grad():
+            max_fc = paddle.max(logits, axis=1, keepdim=True)
+
+            # calculate exp(logits); subtracting the row max keeps exp() stable
+            logits_exp = paddle.exp(logits - max_fc)
+            logits_sum_exp = logits_exp.sum(axis=1, keepdim=True)
+
+            # calculate prob (softmax)
+            logits_exp = logits_exp.divide(logits_sum_exp)
+
+            # get one-hot; its width must match the number of logit columns
+            grad = logits_exp
+            one_hot = paddle.nn.functional.one_hot(
+                total_label.astype('long'), num_classes=logits.shape[1])
+
+            # calculate loss: mean negative log-probability of the target class
+            loss = one_hot.multiply(grad).sum(axis=1)
+            loss_v = paddle.clip(loss, 1e-30).log().mean() * (-1)
+
+            # calculate grad
+            grad -= one_hot
+            grad = grad.divide(
+                paddle.to_tensor(
+                    self.batch_size * self.world_size, dtype='float32'))
+        (logits.multiply(grad)).backward()
+
+        x_grad = paddle.to_tensor(total_features.grad, stop_gradient=False)
+        return x_grad, loss_v
--- a/recognition/arcface_paddle/requirement.txt
+++ b/recognition/arcface_paddle/requirement.txt
@@ -0,0 +1,8 @@
+paddlepaddle-gpu==2.0.2
+visualdl >= 2.0.0b
+opencv-python
+pillow
+numpy
+easydict
+scipy
+scikit-learn
--- a/recognition/arcface_paddle/shell/export_inference_model.sh
+++ b/recognition/arcface_paddle/shell/export_inference_model.sh
@@ -0,0 +1 @@
+python export_inference_model.py --network MobileFaceNet_128 --output ./inference_model/ --pretrained_model ./emore_arcface/MobileFaceNet_128.pdparams
--- a/recognition/arcface_paddle/shell/infer.sh
+++ b/recognition/arcface_paddle/shell/infer.sh
@@ -0,0 +1,6 @@
+export CUDA_VISIBLE_DEVICES=1
+
+nohup python3.7 infer.py \
+    --network 'MobileFaceNet_128' \
+    --img='00000000.jpg' \
+    --checkpoint 'emore_arcface' > "infer_log.log" 2>&1 &
--- a/recognition/arcface_paddle/shell/train.sh
+++ b/recognition/arcface_paddle/shell/train.sh
@@ -0,0 +1,15 @@
+export CUDA_VISIBLE_DEVICES=1
+
+log_name="log"
+
+
+# If you want to reduce batchsize because of GPU memory,
+# you can reduce batch size and lr proportionally.
+python3.7 train.py \
+    --network 'MobileFaceNet_128' \
+    --lr=0.1 \
+    --batch_size 512 \
+    --weight_decay 2e-4 \
+    --embedding_size 128 \
+    --logdir="${log_name}" \
+    --output "emore_arcface"
--- a/recognition/arcface_paddle/shell/val.sh
+++ b/recognition/arcface_paddle/shell/val.sh
@@ -0,0 +1,6 @@
+export CUDA_VISIBLE_DEVICES=0
+
+
+nohup python3.7 valid.py \
+    --network 'MobileFaceNet_128' \
+    --checkpoint='emore_arcface' > "valid_log.log" 2>&1 &
--- a/recognition/arcface_paddle/test_recognition.py
+++ b/recognition/arcface_paddle/test_recognition.py
@@ -0,0 +1,720 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import argparse
+import requests
+import logging
+import imghdr
+import pickle
+import tarfile
+from functools import partial
+
+import cv2
+import numpy as np
+from sklearn.metrics.pairwise import cosine_similarity
+from tqdm import tqdm
+from prettytable import PrettyTable
+from PIL import Image, ImageDraw, ImageFont
+import paddle
+from paddle.inference import Config
+from paddle.inference import create_predictor
+
+__all__ = ["InsightFace", "parser"]
+BASE_INFERENCE_MODEL_DIR = os.path.expanduser("~/.insightface/ppmodels/")
+BASE_DOWNLOAD_URL = "https://paddle-model-ecology.bj.bcebos.com/model/insight-face/{}.tar"
+
+
+def parser(add_help=True):
+    def str2bool(v):
+        return v.lower() in ("true", "t", "1")
+
+    parser = argparse.ArgumentParser(add_help=add_help)
+    parser.add_argument(
+        "--det", action="store_true", help="Whether to run face detection.")
+    parser.add_argument(
+        "--rec", action="store_true", help="Whether to run face recognition.")
+
+    parser.add_argument(
+        "--det_model",
+        type=str,
+        default="BlazeFace",
+        help="The detection model.")
+    parser.add_argument(
+        "--rec_model",
+        type=str,
+        default="MobileFace",
+        help="The recognition model.")
+    parser.add_argument(
+        "--use_gpu",
+        type=str2bool,
+        default=True,
+        help="Whether to use GPU for prediction. Default: True.")
+    parser.add_argument(
+        "--enable_mkldnn",
+        type=str2bool,
+        default=True,
+        help="Whether to use MKLDNN for prediction; valid only when --use_gpu is False. Default: True."
+    )
+    parser.add_argument(
+        "--cpu_threads",
+        type=int,
+        default=1,
+        help="The number of CPU threads; valid only when --use_gpu is False. Default: 1."
+    )
+    parser.add_argument(
+        "--input",
+        type=str,
+        help="The path or directory of image(s) or video to be predicted.")
+    parser.add_argument(
+        "--output", type=str, default="./output/", help="The directory of prediction results.")
+    parser.add_argument(
+        "--det_thresh",
+        type=float,
+        default=0.8,
+        help="The score threshold of detection postprocessing. Default: 0.8.")
+    parser.add_argument(
+        "--index", type=str, default=None, help="The path of the index file.")
+    parser.add_argument(
+        "--cdd_num",
+        type=int,
+        default=5,
+        help="The number of candidates in recognition retrieval. Default: 5."
+    )
+    parser.add_argument(
+        "--rec_thresh",
+        type=float,
+        default=0.45,
+        help="The threshold of recognition postprocessing. Default: 0.45.")
+    parser.add_argument(
+        "--max_batch_size",
+        type=int,
+        default=1,
+        help="The maximum batch size for recognition. Default: 1.")
+
+    return parser
+
+
+def print_config(args):
+    args = vars(args)
+    table = PrettyTable(['Param', 'Value'])
+    for param in args:
+        table.add_row([param, args[param]])
+    width = len(str(table).split("\n")[0])
+    print("{}".format("-" * width))
+    print("PaddleFace".center(width))
+    print(table)
+    print("Powered by PaddlePaddle!".rjust(width))
+    print("{}".format("-" * width))
+
+
+def download_with_progressbar(url, save_path):
+    """Download from url with progressbar.
+    """
+    if os.path.isfile(save_path):
+        os.remove(save_path)
+    response = requests.get(url, stream=True)
+    response.raise_for_status()
+    total_size_in_bytes = int(response.headers.get("content-length", 0))
+    block_size = 1024  # 1 Kibibyte
+    progress_bar = tqdm(total=total_size_in_bytes, unit="iB", unit_scale=True)
+    with open(save_path, "wb") as file:
+        for data in response.iter_content(block_size):
+            progress_bar.update(len(data))
+            file.write(data)
+    progress_bar.close()
+    if total_size_in_bytes == 0 or progress_bar.n != total_size_in_bytes or not os.path.isfile(
+            save_path):
+        raise Exception(
+            f"Something went wrong while downloading model/image from {url}")
+
+
+def check_model_file(model):
+    """Check the model files exist and download and untar when no exist.
+    """
+    model_map = {
+        "ArcFace": "arcface_iresnet50_v1.0_infer",
+        "BlazeFace": "blazeface_fpn_ssh_1000e_v1.0_infer",
+        "MobileFace": "mobileface_v1.0_infer"
+    }
+
+    if os.path.isdir(model):
+        model_file_path = os.path.join(model, "inference.pdmodel")
+        params_file_path = os.path.join(model, "inference.pdiparams")
+        if not os.path.exists(model_file_path) or not os.path.exists(
+                params_file_path):
+            raise Exception(
+                f"The specified model directory is invalid: {model}. The directory must include 'inference.pdmodel' and 'inference.pdiparams'."
+            )
+
+    elif model in model_map:
+        storage_directory = partial(os.path.join, BASE_INFERENCE_MODEL_DIR,
+                                    model)
+        url = BASE_DOWNLOAD_URL.format(model_map[model])
+
+        tar_file_name_list = [
+            "inference.pdiparams", "inference.pdiparams.info",
+            "inference.pdmodel"
+        ]
+        model_file_path = storage_directory("inference.pdmodel")
+        params_file_path = storage_directory("inference.pdiparams")
+        if not os.path.exists(model_file_path) or not os.path.exists(
+                params_file_path):
+            tmp_path = storage_directory(url.split("/")[-1])
+            logging.info(f"Download {url} to {tmp_path}")
+            os.makedirs(storage_directory(), exist_ok=True)
+            download_with_progressbar(url, tmp_path)
+            with tarfile.open(tmp_path, "r") as tarObj:
+                for member in tarObj.getmembers():
+                    filename = None
+                    for tar_file_name in tar_file_name_list:
+                        if tar_file_name in member.name:
+                            filename = tar_file_name
+                    if filename is None:
+                        continue
+                    file = tarObj.extractfile(member)
+                    with open(storage_directory(filename), "wb") as f:
+                        f.write(file.read())
+            os.remove(tmp_path)
+        if not os.path.exists(model_file_path) or not os.path.exists(
+                params_file_path):
+            raise Exception(
+                f"Something went wrong while downloading and extracting the files of model [{model}]!"
+            )
+    else:
+        raise Exception(
+            f"Invalid model name: {model}. Supported names are 'BlazeFace' for detection and 'ArcFace' or 'MobileFace' for recognition. A local directory containing the model files ('inference.pdmodel' and 'inference.pdiparams') is also supported."
+        )
+
+    return model_file_path, params_file_path
+
+
+def normalize_image(img, scale=None, mean=None, std=None, order='chw'):
+    if isinstance(scale, str):
+        scale = eval(scale)
+    scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+    mean = mean if mean is not None else [0.485, 0.456, 0.406]
+    std = std if std is not None else [0.229, 0.224, 0.225]
+
+    shape = (3, 1, 1) if order == 'chw' else (1, 1, 3)
+    mean = np.array(mean).reshape(shape).astype('float32')
+    std = np.array(std).reshape(shape).astype('float32')
+
+    if isinstance(img, Image.Image):
+        img = np.array(img)
+
+    assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+    return (img.astype('float32') * scale - mean) / std
+
+
+def to_CHW_image(img):
+    if isinstance(img, Image.Image):
+        img = np.array(img)
+    return img.transpose((2, 0, 1))
+
+
+class ColorMap(object):
+    def __init__(self, num):
+        super().__init__()
+        self.get_color_map_list(num)
+        self.color_map = {}
+        self.ptr = 0
+
+    def __getitem__(self, key):
+        return self.color_map[key]
+
+    def update(self, keys):
+        for key in keys:
+            if key not in self.color_map:
+                i = self.ptr % len(self.color_list)
+                self.color_map[key] = self.color_list[i]
+                self.ptr += 1
+
+    def get_color_map_list(self, num_classes):
+        color_map = num_classes * [0, 0, 0]
+        for i in range(0, num_classes):
+            j = 0
+            lab = i
+            while lab:
+                color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
+                color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
+                color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
+                j += 1
+                lab >>= 3
+        self.color_list = [
+            color_map[i:i + 3] for i in range(0, len(color_map), 3)
+        ]
+
+
+class ImageReader(object):
+    def __init__(self, inputs):
+        super().__init__()
+        self.idx = 0
+        if isinstance(inputs, np.ndarray):
+            self.image_list = [inputs]
+        else:
+            imgtype_list = {'jpg', 'bmp', 'png', 'jpeg', 'rgb', 'tif', 'tiff'}
+            self.image_list = []
+            if os.path.isfile(inputs):
+                if imghdr.what(inputs) not in imgtype_list:
+                    raise Exception(
+                        f"Unsupported input file type; supported types: {imgtype_list}"
+                    )
+                self.image_list.append(inputs)
+            elif os.path.isdir(inputs):
+                tmp_file_list = os.listdir(inputs)
+                warn_tag = False
+                for file_name in tmp_file_list:
+                    file_path = os.path.join(inputs, file_name)
+                    if not os.path.isfile(file_path):
+                        warn_tag = True
+                        continue
+                    if imghdr.what(file_path) in imgtype_list:
+                        self.image_list.append(file_path)
+                    else:
+                        warn_tag = True
+                if warn_tag:
+                    logging.warning(
+                        f"The input directory contains subdirectories or unsupported files, which are ignored. Supported types: {imgtype_list}"
+                    )
+            else:
+                raise Exception(
+                    f"The input path does not exist! Please check input: {inputs}"
+                )
+
+    def __iter__(self):
+        return self
+
+    def __next__(self):
+        if self.idx >= len(self.image_list):
+            raise StopIteration
+
+        data = self.image_list[self.idx]
+        if isinstance(data, np.ndarray):
+            self.idx += 1
+            return data, "tmp.png"
+        path = data
+        _, file_name = os.path.split(path)
+        img = cv2.imread(path)
+        if img is None:
+            logging.warning(f"Error in reading image: {path}! Ignored.")
+            self.idx += 1
+            return self.__next__()
+        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+        self.idx += 1
+        return img, file_name
+
+    def __len__(self):
+        return len(self.image_list)
+
+
+class VideoReader(object):
+    def __init__(self, inputs):
+        super().__init__()
+        videotype_list = {"mp4"}
+        if os.path.splitext(inputs)[-1][1:].lower() not in videotype_list:
+            raise Exception(
+                f"Unsupported input file type; supported types: {videotype_list}"
+            )
+        if not os.path.isfile(inputs):
+            raise Exception(
+                f"The input path does not exist! Please check input: {inputs}"
+            )
+        self.capture = cv2.VideoCapture(inputs)
+        self.file_name = os.path.split(inputs)[-1]
+
+    def get_info(self):
+        info = {}
+        width = int(self.capture.get(cv2.CAP_PROP_FRAME_WIDTH))
+        height = int(self.capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
+        fourcc = cv2.VideoWriter_fourcc(* 'mp4v')
+        info["file_name"] = self.file_name
+        info["fps"] = 30
+        info["shape"] = (width, height)
+        info["fourcc"] = cv2.VideoWriter_fourcc(* 'mp4v')
+        return info
+
+    def __iter__(self):
+        return self
+
+    def __next__(self):
+        ret, frame = self.capture.read()
+        if not ret:
+            raise StopIteration
+        return frame, self.file_name
+
+
+class ImageWriter(object):
+    def __init__(self, output_dir):
+        super().__init__()
+        if output_dir is None:
+            raise Exception(
+                "Please specify the directory of saving prediction results by --output."
+            )
+        if not os.path.exists(output_dir):
+            os.makedirs(output_dir)
+        self.output_dir = output_dir
+
+    def write(self, image, file_name):
+        path = os.path.join(self.output_dir, file_name)
+        cv2.imwrite(path, cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
+
+
+class VideoWriter(object):
+    def __init__(self, output_dir, video_info):
+        super().__init__()
+        if output_dir is None:
+            raise Exception(
+                "Please specify the directory of saving prediction results by --output."
+            )
+        if not os.path.exists(output_dir):
+            os.makedirs(output_dir)
+        output_path = os.path.join(output_dir, video_info["file_name"])
+        fourcc = cv2.VideoWriter_fourcc(* 'mp4v')
+        self.writer = cv2.VideoWriter(output_path, video_info["fourcc"],
+                                      video_info["fps"], video_info["shape"])
+
+    def write(self, frame, file_name):
+        self.writer.write(frame)
+
+    def __del__(self):
+        if hasattr(self, "writer"):
+            self.writer.release()
+
+
+class BasePredictor(object):
+    def __init__(self, predictor_config):
+        super().__init__()
+        self.predictor_config = predictor_config
+        self.predictor, self.input_names, self.output_names = self.load_predictor(
+            predictor_config["model_file"], predictor_config["params_file"])
+
+    def load_predictor(self, model_file, params_file):
+        config = Config(model_file, params_file)
+        if self.predictor_config["use_gpu"]:
+            config.enable_use_gpu(200, 0)
+            config.switch_ir_optim(True)
+        else:
+            config.disable_gpu()
+            config.set_cpu_math_library_num_threads(self.predictor_config[
+                "cpu_threads"])
+
+            if self.predictor_config["enable_mkldnn"]:
+                try:
+                    # cache 10 different shapes for mkldnn to avoid memory leak
+                    config.set_mkldnn_cache_capacity(10)
+                    config.enable_mkldnn()
+                except Exception:
+                    logging.error(
+                        "The current environment does not support `mkldnn`, so mkldnn is disabled."
+                    )
+        config.disable_glog_info()
+        config.enable_memory_optim()
+        # use zero copy
+        config.switch_use_feed_fetch_ops(False)
+        predictor = create_predictor(config)
+        input_names = predictor.get_input_names()
+        output_names = predictor.get_output_names()
+        return predictor, input_names, output_names
+
+    def preprocess(self):
+        raise NotImplementedError
+
+    def postprocess(self):
+        raise NotImplementedError
+
+    def predict(self, img):
+        raise NotImplementedError
+
+
+class Detector(BasePredictor):
+    def __init__(self, det_config, predictor_config):
+        super().__init__(predictor_config)
+        self.det_config = det_config
+        self.target_size = self.det_config["target_size"]
+        self.thresh = self.det_config["thresh"]
+
+    def preprocess(self, img):
+        resize_h, resize_w = self.target_size
+        img_shape = img.shape
+        img_scale_x = resize_w / img_shape[1]
+        img_scale_y = resize_h / img_shape[0]
+        img = cv2.resize(
+            img, None, None, fx=img_scale_x, fy=img_scale_y, interpolation=1)
+        img = normalize_image(
+            img,
+            scale=1. / 255.,
+            mean=[0.485, 0.456, 0.406],
+            std=[0.229, 0.224, 0.225],
+            order='hwc')
+        img_info = {}
+        img_info["im_shape"] = np.array(
+            img.shape[:2], dtype=np.float32)[np.newaxis, :]
+        img_info["scale_factor"] = np.array(
+            [img_scale_y, img_scale_x], dtype=np.float32)[np.newaxis, :]
+
+        img = img.transpose((2, 0, 1)).copy()
+        img_info["image"] = img[np.newaxis, :, :, :]
+        return img_info
+
+    def postprocess(self, np_boxes):
+        expect_boxes = (np_boxes[:, 1] > self.thresh) & (np_boxes[:, 0] > -1)
+        return np_boxes[expect_boxes, :]
+
+    def predict(self, img):
+        inputs = self.preprocess(img)
+        for input_name in self.input_names:
+            input_tensor = self.predictor.get_input_handle(input_name)
+            input_tensor.copy_from_cpu(inputs[input_name])
+        self.predictor.run()
+        output_tensor = self.predictor.get_output_handle(self.output_names[0])
+        np_boxes = output_tensor.copy_to_cpu()
+        box_list = self.postprocess(np_boxes)
+        return box_list
+
+
+class Recognizer(BasePredictor):
+    def __init__(self, rec_config, predictor_config):
+        super().__init__(predictor_config)
+        if rec_config["index"] is not None:
+            self.load_index(rec_config["index"])
+        self.rec_config = rec_config
+        self.cdd_num = self.rec_config["cdd_num"]
+        self.thresh = self.rec_config["thresh"]
+        self.max_batch_size = self.rec_config["max_batch_size"]
+
+    def preprocess(self, img, box_list=None):
+        img = normalize_image(
+            img,
+            scale=1. / 255.,
+            mean=[0.5, 0.5, 0.5],
+            std=[0.5, 0.5, 0.5],
+            order='hwc')
+        if box_list is None:
+            height, width = img.shape[:2]
+            box_list = [np.array([0, 0, 0, 0, width, height])]
+        batch = []
+        input_batches = []
+        cnt = 0
+        for idx, box in enumerate(box_list):
+            box[box < 0] = 0
+            xmin, ymin, xmax, ymax = list(map(int, box[2:]))
+            face_img = img[ymin:ymax, xmin:xmax, :]
+            face_img = cv2.resize(face_img, (112, 112)).transpose(
+                (2, 0, 1)).copy()
+            batch.append(face_img)
+            cnt += 1
+            if cnt % self.max_batch_size == 0 or (idx + 1) == len(box_list):
+                input_batches.append(np.array(batch))
+                batch = []
+        return input_batches
+
+    def postprocess(self):
+        pass
+
+    def retrieval(self, np_feature):
+        labels = []
+        for feature in np_feature:
+            similarity = cosine_similarity(self.index_feature,
+                                           feature).squeeze()
+            abs_similarity = np.abs(similarity)
+            candidate_idx = np.argpartition(abs_similarity,
+                                            -self.cdd_num)[-self.cdd_num:]
+            remove_idx = np.where(abs_similarity[candidate_idx] < self.thresh)
+            candidate_idx = np.delete(candidate_idx, remove_idx)
+            candidate_label_list = list(np.array(self.label)[candidate_idx])
+            if len(candidate_label_list) == 0:
+                maxlabel = ""
+            else:
+                maxlabel = max(candidate_label_list,
+                               key=candidate_label_list.count)
+            labels.append(maxlabel)
+        return labels
+
+    def load_index(self, file_path):
+        with open(file_path, "rb") as f:
+            index = pickle.load(f)
+        self.label = index["label"]
+        self.index_feature = np.array(index["feature"]).squeeze()
+
+    def predict(self, img, box_list=None):
+        batch_list = self.preprocess(img, box_list)
+        feature_list = []
+        for batch in batch_list:
+            for input_name in self.input_names:
+                input_tensor = self.predictor.get_input_handle(input_name)
+                input_tensor.copy_from_cpu(batch)
+            self.predictor.run()
+            output_tensor = self.predictor.get_output_handle(self.output_names[
+                0])
+            np_feature = output_tensor.copy_to_cpu()
+            feature_list.append(np_feature)
+        return np.array(feature_list)
+
+
+class InsightFace(object):
+    def __init__(self, args, print_info=True):
+        super().__init__()
+        if print_info:
+            print_config(args)
+
+        self.font_path = os.path.join(
+            os.path.abspath(os.path.dirname(__file__)),
+            "SourceHanSansCN-Medium.otf")
+        self.args = args
+
+        predictor_config = {
+            "use_gpu": args.use_gpu,
+            "enable_mkldnn": args.enable_mkldnn,
+            "cpu_threads": args.cpu_threads
+        }
+        if args.det:
+            model_file_path, params_file_path = check_model_file(
+                args.det_model)
+            det_config = {"thresh": args.det_thresh, "target_size": [640, 640]}
+            predictor_config["model_file"] = model_file_path
+            predictor_config["params_file"] = params_file_path
+            self.det_predictor = Detector(det_config, predictor_config)
+            self.color_map = ColorMap(100)
+
+        if args.rec:
+            model_file_path, params_file_path = check_model_file(
+                args.rec_model)
+            rec_config = {
+                "max_batch_size": args.max_batch_size,
+                "resize": 112,
+                "thresh": args.rec_thresh,
+                "index": args.index,
+                "cdd_num": args.cdd_num
+            }
+            predictor_config["model_file"] = model_file_path
+            predictor_config["params_file"] = params_file_path
+            self.rec_predictor = Recognizer(rec_config, predictor_config)
+
+    def preprocess(self, img):
+        img = img.astype(np.float32, copy=False)
+        return img
+
+    def draw(self, img, box_list, labels):
+        self.color_map.update(labels)
+        im = Image.fromarray(img)
+        draw = ImageDraw.Draw(im)
+
+        for i, dt in enumerate(box_list):
+            bbox, score = dt[2:], dt[1]
+            label = labels[i]
+            color = tuple(self.color_map[label])
+
+            xmin, ymin, xmax, ymax = bbox
+
+            font_size = max(int((xmax - xmin) // 6), 10)
+            font = ImageFont.truetype(self.font_path, font_size)
+
+            text = "{} {:.4f}".format(label, score)
+            th = sum(font.getmetrics())
+            # FreeTypeFont.getsize was removed in Pillow 10; getbbox returns (left, top, right, bottom)
+            tw = font.getbbox(text)[2]
+            start_y = max(0, ymin - th)
+
+            draw.rectangle(
+                [(xmin, start_y), (xmin + tw + 1, start_y + th)], fill=color)
+            draw.text(
+                (xmin + 1, start_y),
+                text,
+                fill=(255, 255, 255),
+                font=font,
+                anchor="la")
+            draw.rectangle(
+                [(xmin, ymin), (xmax, ymax)], width=2, outline=color)
+        return np.array(im)
+
+    def predict_np_img(self, img):
+        input_img = self.preprocess(img)
+        box_list = None
+        np_feature = None
+        if hasattr(self, "det_predictor"):
+            box_list = self.det_predictor.predict(input_img)
+        if hasattr(self, "rec_predictor"):
+            np_feature = self.rec_predictor.predict(input_img, box_list)
+        return box_list, np_feature
+
+    def init_reader_writer(self, input_data):
+        if isinstance(input_data, np.ndarray):
+            self.input_reader = ImageReader(input_data)
+            if hasattr(self, "det_predictor"):
+                self.output_writer = ImageWriter(self.args.output)
+        elif isinstance(input_data, str):
+            if input_data.endswith('mp4'):
+                self.input_reader = VideoReader(input_data)
+                info = self.input_reader.get_info()
+                self.output_writer = VideoWriter(self.args.output, info)
+            else:
+                self.input_reader = ImageReader(input_data)
+                if hasattr(self, "det_predictor"):
+                    self.output_writer = ImageWriter(self.args.output)
+        else:
+            raise Exception(
+                "Invalid input data. Only supports the path of an image, a video (.mp4), or a directory containing images."
+            )
+
+    def predict(self, input_data, print_info=False):
+        """Predict input_data.
+
+        Args:
+            input_data (str | numpy.ndarray): The path of an image or a video (.mp4), a directory containing images, or image data as a numpy.ndarray.
+            print_info (bool, optional): Whether to print the prediction results. Defaults to False.
+
+        Yields:
+            dict: {
+                "box_list": The prediction results of detection.
+                "features": The output of recognition.
+                "labels": The results of retrieval.
+                }
+        """
+        self.init_reader_writer(input_data)
+        for img, file_name in self.input_reader:
+            if img is None:
+                logging.warning(f"Error in reading img {file_name}! Ignored.")
+                continue
+            box_list, np_feature = self.predict_np_img(img)
+            if np_feature is not None:
+                labels = self.rec_predictor.retrieval(np_feature)
+            else:
+                labels = ["face"] * len(box_list)
+            if box_list is not None:
+                result = self.draw(img, box_list, labels=labels)
+                self.output_writer.write(result, file_name)
+            if print_info:
+                logging.info(f"File: {file_name}, predict label(s): {labels}")
+            yield {
+                "box_list": box_list,
+                "features": np_feature,
+                "labels": labels
+            }
+        logging.info("Predict complete!")
+
+
+# for CLI
+def main(args=None):
+    logging.basicConfig(level=logging.INFO)
+
+    args = parser().parse_args()
+    predictor = InsightFace(args)
+    res = predictor.predict(args.input, print_info=True)
+    for _ in res:
+        pass
+
+
+if __name__ == "__main__":
+    main()
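`Recognizer.retrieval` above labels each face by comparing its embedding with every gallery embedding in the index, keeping the `cdd_num` most similar entries whose similarity passes `thresh`, and taking a majority vote over their labels. The standalone NumPy sketch below mirrors that logic for a single query. The `cosine_similarity` helper here is an assumption (the repository's own helper is not part of this diff), and `vote_label` and its `min(cdd_num, ...)` guard are illustrative additions, not project code.

```python
import numpy as np


def cosine_similarity(gallery: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Assumed helper: cosine similarity of each gallery row (N, D) with one query (D,)."""
    gallery_norm = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    return gallery_norm @ query_norm  # shape (N,)


def vote_label(gallery, labels, query, cdd_num=5, thresh=0.45):
    """Top-k retrieval followed by a majority vote, mirroring Recognizer.retrieval."""
    sim = np.abs(cosine_similarity(gallery, query))
    k = min(cdd_num, len(sim))  # guard so argpartition never asks for more than N items
    # argpartition places the k largest similarities in the last k slots (unordered).
    candidate_idx = np.argpartition(sim, -k)[-k:]
    # Keep only candidates at or above the similarity threshold.
    candidate_idx = candidate_idx[sim[candidate_idx] >= thresh]
    candidates = [labels[i] for i in candidate_idx]
    if not candidates:
        return ""  # no match; the original code also yields an empty label
    return max(candidates, key=candidates.count)


if __name__ == "__main__":
    gallery = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
    labels = ["alice", "alice", "bob"]
    print(vote_label(gallery, labels, np.array([1.0, 0.05]), cdd_num=2, thresh=0.5))  # alice
```

As in the original, absolute similarity is used, so a strongly anti-correlated gallery entry also counts as a match; an empty string signals that no candidate passed the threshold.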
--- a/recognition/arcface_paddle/test_time.py
+++ b/recognition/arcface_paddle/test_time.py
@@ -0,0 +1,115 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import numpy as np
+import time
+import argparse
+from paddle.inference import Config
+from paddle.inference import create_predictor
+
+
+def parse_args():
+    def str2bool(v):
+        return v.lower() in ("true", "t", "1")
+
+    # general params
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--use_gpu", type=str2bool, default=False)
+    parser.add_argument("--gpu_mem", type=int, default=1000)
+
+    # params for predict
+    parser.add_argument("--model_file", type=str)
+    parser.add_argument("--params_file", type=str)
+    parser.add_argument("-b", "--batch_size", type=int, default=1)
+    parser.add_argument("--ir_optim", type=str2bool, default=True)
+    parser.add_argument("--use_mkldnn", type=str2bool, default=True)
+    parser.add_argument("--cpu_num_threads", type=int, default=10)
+    parser.add_argument("--model", type=str)
+
+    return parser.parse_args()
+
+
+def create_paddle_predictor(args):
+    config = Config(args.model_file, args.params_file)
+
+    if args.use_gpu:
+        config.enable_use_gpu(args.gpu_mem, 0)
+    else:
+        config.disable_gpu()
+        # the thread count applies to CPU inference with or without MKLDNN
+        config.set_cpu_math_library_num_threads(args.cpu_num_threads)
+        if args.use_mkldnn:
+            config.enable_mkldnn()
+            config.set_mkldnn_cache_capacity(100)
+
+    config.disable_glog_info()
+    config.switch_ir_optim(args.ir_optim)  # default true
+
+    config.enable_memory_optim()
+    # use zero copy
+    config.switch_use_feed_fetch_ops(False)
+    predictor = create_predictor(config)
+
+    return predictor
+
+
+class Predictor(object):
+    def __init__(self, args):
+
+        self.args = args
+
+        self.paddle_predictor = create_paddle_predictor(args)
+        input_names = self.paddle_predictor.get_input_names()
+        self.input_tensor = self.paddle_predictor.get_input_handle(input_names[
+            0])
+
+        output_names = self.paddle_predictor.get_output_names()
+        self.output_tensor = self.paddle_predictor.get_output_handle(
+            output_names[0])
+
+    def predict(self, batch_input):
+        self.input_tensor.copy_from_cpu(batch_input)
+        self.paddle_predictor.run()
+        batch_output = self.output_tensor.copy_to_cpu()
+        return batch_output
+
+    def benchmark_predict(self):
+        test_num = 500
+        warmup_num = 10
+        test_time = 0.0
+        for i in range(0, test_num + warmup_num):
+            inputs = np.random.rand(self.args.batch_size, 3, 112,
+                                    112).astype(np.float32)
+            start_time = time.perf_counter()
+            batch_output = self.predict(inputs).flatten()
+            # the first warmup_num iterations are not timed
+            if i >= warmup_num:
+                test_time += time.perf_counter() - start_time
+            # time.sleep(0.01)  # sleep for T4 GPU
+
+        print("{0}\tbatch size: {1}\ttime(ms): {2}".format(
+            self.args.model, self.args.batch_size, 1000 * test_time / test_num))
+
+
+if __name__ == "__main__":
+    args = parse_args()
+    assert os.path.exists(
+        args.model_file), "The path of 'model_file' does not exist: {}".format(
+            args.model_file)
+    assert os.path.exists(
+        args.params_file
+    ), "The path of 'params_file' does not exist: {}".format(args.params_file)
+
+    predictor = Predictor(args)
+    assert args.model is not None
+    predictor.benchmark_predict()
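`test_time.py` measures latency by running 10 untimed warm-up iterations and then averaging 500 timed calls on random `(3, 112, 112)` inputs. A framework-independent sketch of the same pattern follows; `benchmark` is a hypothetical helper that accepts any callable, so `Predictor.predict` could be passed as `run`. It uses `time.perf_counter`, a monotonic clock intended for measuring short intervals.

```python
import time
from typing import Callable, Tuple

import numpy as np


def benchmark(run: Callable[[np.ndarray], object],
              batch_size: int = 1,
              warmup: int = 10,
              iters: int = 500,
              shape: Tuple[int, int, int] = (3, 112, 112)) -> float:
    """Mean latency of `run` in milliseconds, excluding warm-up calls."""
    total = 0.0
    for i in range(warmup + iters):
        # Fresh random NCHW float32 batch per call, as in test_time.py.
        inputs = np.random.rand(batch_size, *shape).astype(np.float32)
        start = time.perf_counter()
        run(inputs)
        if i >= warmup:  # warm-up calls run but are not timed
            total += time.perf_counter() - start
    return 1000.0 * total / iters
```

Under these assumptions, `benchmark(predictor.predict, batch_size=args.batch_size)` would reproduce the script's measurement. Excluding the first iterations keeps one-off costs such as lazy initialization out of the average.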
--- a/recognition/arcface_paddle/train.py
+++ b/recognition/arcface_paddle/train.py
@@ -0,0 +1,170 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from dataloader import CommonDataset
+
+from paddle.io import DataLoader
+from config import config as cfg
+from partial_fc import PartialFC
+from utils.utils_callbacks import CallBackVerification, CallBackLogging, CallBackModelCheckpoint
+from utils.utils_logging import AverageMeter
+import paddle.nn.functional as F
+from paddle.nn import ClipGradByNorm
+from visualdl import LogWriter
+import paddle
+import backbones
+import argparse
+import losses
+import time
+import os
+import sys
+
+
+def main(args):
+    world_size = 1
+    rank = 0
+
+    if not os.path.exists(args.output):
+        os.makedirs(args.output)
+    else:
+        time.sleep(2)
+
+    writer = LogWriter(logdir=args.logdir)
+
+    # if you 
+    trainset = CommonDataset(root_dir=cfg.data_dir, label_file=cfg.file_list)
+    # trainset = BinDataset(root_dir=cfg.data_dir)
+    train_loader = DataLoader(
+        dataset=trainset,
+        batch_size=args.batch_size,
+        shuffle=True,
+        drop_last=True,
+        num_workers=0)
+
+    backbone = getattr(backbones, args.network)()
+    backbone.train()
+
+    clip_by_norm = ClipGradByNorm(5.0)
+    margin_softmax = getattr(losses, args.loss)()
+
+    module_partial_fc = PartialFC(
+        rank=rank,
+        world_size=world_size,
+        resume=args.resume,
+        batch_size=args.batch_size,
+        margin_softmax=margin_softmax,
+        num_classes=cfg.num_classes,
+        sample_rate=cfg.sample_rate,
+        embedding_size=args.embedding_size,
+        prefix=args.output)
+
+    scheduler_backbone_decay = paddle.optimizer.lr.LambdaDecay(
+        learning_rate=args.lr, lr_lambda=cfg.lr_func, verbose=True)
+    scheduler_backbone = paddle.optimizer.lr.LinearWarmup(
+        learning_rate=scheduler_backbone_decay,
+        warmup_steps=cfg.warmup_epoch,
+        start_lr=0,
+        end_lr=args.lr / 512 * args.batch_size,
+        verbose=True)
+    opt_backbone = paddle.optimizer.Momentum(
+        parameters=backbone.parameters(),
+        learning_rate=scheduler_backbone,
+        momentum=0.9,
+        weight_decay=args.weight_decay,
+        grad_clip=clip_by_norm)
+
+    scheduler_pfc_decay = paddle.optimizer.lr.LambdaDecay(
+        learning_rate=args.lr, lr_lambda=cfg.lr_func, verbose=True)
+    scheduler_pfc = paddle.optimizer.lr.LinearWarmup(
+        learning_rate=scheduler_pfc_decay,
+        warmup_steps=cfg.warmup_epoch,
+        start_lr=0,
+        end_lr=args.lr / 512 * args.batch_size,
+        verbose=True)
+    opt_pfc = paddle.optimizer.Momentum(
+        parameters=module_partial_fc.parameters(),
+        learning_rate=scheduler_pfc,
+        momentum=0.9,
+        weight_decay=args.weight_decay,
+        grad_clip=clip_by_norm)
+
+    start_epoch = 0
+    total_step = int(
+        len(trainset) / args.batch_size / world_size * cfg.num_epoch)
+    if rank == 0:
+        print("Total Step is: %d" % total_step)
+
+    callback_verification = CallBackVerification(2000, rank, cfg.val_targets,
+                                                 cfg.data_dir)
+    callback_logging = CallBackLogging(10, rank, total_step, args.batch_size,
+                                       world_size, writer)
+    callback_checkpoint = CallBackModelCheckpoint(rank, args.output,
+                                                  args.network)
+
+    loss = AverageMeter()
+    global_step = 0
+    for epoch in range(start_epoch, cfg.num_epoch):
+        for step, (img, label) in enumerate(train_loader):
+            label = label.flatten()
+            global_step += 1
+            sys.stdout.flush()
+            features = F.normalize(backbone(img))
+            x_grad, loss_v = module_partial_fc.forward_backward(
+                label, features, opt_pfc)
+            sys.stdout.flush()
+            (features.multiply(x_grad)).backward()
+            sys.stdout.flush()
+            opt_backbone.step()
+            opt_pfc.step()
+            module_partial_fc.update()
+            opt_backbone.clear_gradients()
+            opt_pfc.clear_gradients()
+            sys.stdout.flush()
+
+            lr_backbone_value = opt_backbone._global_learning_rate().numpy()[0]
+            lr_pfc_value = opt_pfc._global_learning_rate().numpy()[0]
+
+            loss.update(loss_v, 1)
+            callback_logging(global_step, loss, epoch, lr_backbone_value,
+                             lr_pfc_value)
+            sys.stdout.flush()
+            callback_verification(global_step, backbone)
+        callback_checkpoint(global_step, backbone, module_partial_fc)
+        scheduler_backbone.step()
+        scheduler_pfc.step()
+    writer.close()
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description='PaddlePaddle ArcFace Training')
+    parser.add_argument(
+        '--network',
+        type=str,
+        default='MobileFaceNet_128',
+        help='backbone network')
+    parser.add_argument(
+        '--loss', type=str, default='ArcFace', help='loss function')
+    parser.add_argument('--lr', type=float, default=0.1, help='learning rate')
+    parser.add_argument(
+        '--batch_size', type=int, default=512, help='batch size')
+    parser.add_argument(
+        '--weight_decay', type=float, default=2e-4, help='weight decay')
+    parser.add_argument(
+        '--embedding_size', type=int, default=128, help='embedding size')
+    parser.add_argument('--logdir', type=str, default='./log', help='log dir')
+    parser.add_argument(
+        '--output', type=str, default='emore_arcface', help='output dir')
+    parser.add_argument('--resume', type=int, default=0, help='model resuming')
+    args = parser.parse_args()
+    main(args)
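Both optimizers in `train.py` use the same schedule: `LinearWarmup` ramps the rate from `start_lr=0` to `args.lr / 512 * args.batch_size` over `cfg.warmup_epoch` scheduler steps, after which the wrapped `LambdaDecay` takes over and scales `args.lr` by `cfg.lr_func`. Because `scheduler_*.step()` is called once per epoch, warm-up is measured in epochs. The plain-Python sketch below computes the rate for a given epoch. `cfg.lr_func` is not shown in this diff, so `step_decay` and its milestones are hypothetical, and the sketch assumes `lr_lambda` receives the absolute epoch index.

```python
from typing import Callable, Sequence


def step_decay(epoch: int,
               milestones: Sequence[int] = (10, 16, 22),
               gamma: float = 0.1) -> float:
    """Hypothetical stand-in for cfg.lr_func: multiply by gamma at each milestone passed."""
    return gamma ** sum(epoch >= m for m in milestones)


def lr_at_epoch(epoch: int,
                base_lr: float,
                batch_size: int,
                warmup_epochs: int,
                lr_lambda: Callable[[int], float] = step_decay) -> float:
    """Learning rate for `epoch` under linear warm-up followed by a lambda decay."""
    if epoch < warmup_epochs:
        # Linear scaling rule from train.py: the warm-up target is relative to batch 512.
        warmup_target = base_lr / 512 * batch_size
        # start_lr=0, so the rate ramps linearly from 0 toward warmup_target.
        return warmup_target * epoch / warmup_epochs
    # Assumption: lr_lambda receives the absolute epoch index.
    return base_lr * lr_lambda(epoch)
```

With the default `--batch_size 512` the warm-up target equals `args.lr`. For other batch sizes the sketch shows a jump when warm-up ends, because `LambdaDecay` is built from the unscaled `args.lr` while the warm-up target is scaled.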
--- a/recognition/arcface_paddle/utils/init.py
+++ b/recognition/arcface_paddle/utils/init.py
@@ -0,0 +1,13 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
--- a/recognition/arcface_paddle/utils/utils_callbacks.py
+++ b/recognition/arcface_paddle/utils/utils_callbacks.py
@@ -0,0 +1,144 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import List
+import paddle
+import logging
+from eval import verification
+from utils.utils_logging import AverageMeter
+from partial_fc import PartialFC
+import time
+
+
+class CallBackVerification(object):
+    def __init__(self,
+                 frequent,
+                 rank,
+                 val_targets,
+                 rec_prefix,
+                 image_size=(112, 112)):
+        self.frequent: int = frequent
+        self.rank: int = rank
+        self.highest_acc: float = 0.0
+        self.highest_acc_list: List[float] = [0.0] * len(val_targets)
+        self.ver_list: List[object] = []
+        self.ver_name_list: List[str] = []
+        if self.rank == 0:
+            self.init_dataset(
+                val_targets=val_targets,
+                data_dir=rec_prefix,
+                image_size=image_size)
+
+    def ver_test(self,
+                 backbone: paddle.nn.Layer,
+                 global_step: int,
+                 batch_size: int):
+        results = []
+        for i in range(len(self.ver_list)):
+            acc1, std1, acc2, std2, xnorm, embeddings_list = verification.test(
+                self.ver_list[i], backbone, batch_size, 10)
+            logging.info('[%s][%d]XNorm: %f' %
+                         (self.ver_name_list[i], global_step, xnorm))
+            logging.info('[%s][%d]Accuracy-Flip: %1.5f+-%1.5f' %
+                         (self.ver_name_list[i], global_step, acc2, std2))
+            if acc2 > self.highest_acc_list[i]:
+                self.highest_acc_list[i] = acc2
+            logging.info('[%s][%d]Accuracy-Highest: %1.5f' % (
+                self.ver_name_list[i], global_step, self.highest_acc_list[i]))
+            results.append(acc2)
+        return results
+
+    def init_dataset(self, val_targets, data_dir, image_size):
+        for name in val_targets:
+            path = os.path.join(data_dir, name + ".bin")
+            if os.path.exists(path):
+                data_set = verification.load_bin(path, image_size)
+                self.ver_list.append(data_set)
+                self.ver_name_list.append(name)
+
+    def __call__(self, num_update, backbone: paddle.nn.Layer, batch_size=10):
+        if self.rank == 0 and num_update > 0 and num_update % self.frequent == 0:
+            backbone.eval()
+            self.ver_test(backbone, num_update, batch_size)
+            backbone.train()
+
+
+class CallBackLogging(object):
+    def __init__(self,
+                 frequent,
+                 rank,
+                 total_step,
+                 batch_size,
+                 world_size,
+                 writer=None):
+        self.frequent: int = frequent
+        self.rank: int = rank
+        self.time_start = time.time()
+        self.total_step: int = total_step
+        self.batch_size: int = batch_size
+        self.world_size: int = world_size
+        self.writer = writer
+
+        self.init = False
+        self.tic = 0
+
+    def __call__(self,
+                 global_step,
+                 loss: AverageMeter,
+                 epoch: int,
+                 lr_backbone_value,
+                 lr_pfc_value):
+        if self.rank == 0 and global_step > 0 and global_step % self.frequent == 0:
+            if self.init:
+                try:
+                    speed: float = self.frequent * self.batch_size / (
+                        time.time() - self.tic)
+                    speed_total = speed * self.world_size
+                except ZeroDivisionError:
+                    speed_total = float('inf')
+
+                time_now = (time.time() - self.time_start) / 3600
+                time_total = time_now / ((global_step + 1) / self.total_step)
+                time_for_end = time_total - time_now
+                if self.writer is not None:
+                    self.writer.add_scalar('time_for_end', time_for_end,
+                                           global_step)
+                    self.writer.add_scalar('loss', loss.avg, global_step)
+                msg = "Speed %.2f samples/sec   Loss %.4f   Epoch: %d   Global Step: %d   Required: %1.f hours, lr_backbone_value: %f, lr_pfc_value: %f" % (
+                    speed_total, loss.avg, epoch, global_step, time_for_end,
+                    lr_backbone_value, lr_pfc_value)
+                logging.info(msg)
+                loss.reset()
+                self.tic = time.time()
+            else:
+                self.init = True
+                self.tic = time.time()
+
+
+class CallBackModelCheckpoint(object):
+    def __init__(self, rank, output="./", model_name="mobilefacenet"):
+        self.rank: int = rank
+        self.output: str = output
+        self.model_name: str = model_name
+
+    def __call__(self,
+                 global_step,
+                 backbone: paddle.nn.Layer,
+                 partial_fc: PartialFC=None):
+        if global_step > 100 and self.rank == 0:
+            paddle.save(backbone.state_dict(),
+                        os.path.join(self.output,
+                                     self.model_name + ".pdparams"))
+        if global_step > 100 and partial_fc is not None:
+            partial_fc.save_params()
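`CallBackLogging` derives two numbers from wall-clock time. The first is throughput: samples per `frequent` steps on one rank, scaled by `world_size`. The second is a linear estimate of the remaining hours. The sketch below reproduces that arithmetic in plain Python so the logged figures can be checked by hand. `throughput` and `estimate_hours_remaining` are hypothetical helpers, not part of this repository. The estimate assumes every remaining step costs as much as the average step so far.

```python
def throughput(frequent: int, batch_size: int, elapsed_sec: float,
               world_size: int) -> float:
    """Samples/sec across all ranks, mirroring speed_total (hypothetical).

    Assumes every rank processes batch_size samples per step, so the
    per-rank speed is simply multiplied by world_size.
    """
    if elapsed_sec == 0:
        return float("inf")  # mirrors the ZeroDivisionError fallback
    per_rank = frequent * batch_size / elapsed_sec
    return per_rank * world_size


def estimate_hours_remaining(elapsed_hours: float, global_step: int,
                             total_step: int) -> float:
    """Linear ETA in the style of CallBackLogging (hypothetical helper)."""
    progress = (global_step + 1) / total_step  # fraction of steps done
    projected_total = elapsed_hours / progress  # hours for the whole run
    return projected_total - elapsed_hours


# 50 steps of 128 samples on each of 8 GPUs, taking 10 seconds.
print("Speed %.2f samples/sec" % throughput(50, 128, 10.0, 8))
# -> Speed 5120.00 samples/sec
# 2 hours elapsed after step 999 of 4000, i.e. 25% of training done.
print("Required: %.1f hours" % estimate_hours_remaining(2.0, 999, 4000))
# -> Required: 6.0 hours
```

In printf-style formatting, `%1.f` means a minimum width of 1 with zero decimal places, so it prints whole hours. `%.1f` keeps one decimal place.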
--- a/recognition/arcface_paddle/utils/utils_logging.py
+++ b/recognition/arcface_paddle/utils/utils_logging.py
@@ -0,0 +1,55 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+import os
+import sys
+
+
+class AverageMeter(object):
+    """Computes and stores the average and current value
+    """
+
+    def __init__(self):
+        self.val = None
+        self.avg = None
+        self.sum = None
+        self.count = None
+        self.reset()
+
+    def reset(self):
+        self.val = 0
+        self.avg = 0
+        self.sum = 0
+        self.count = 0
+
+    def update(self, val, n=1):
+        self.val = val
+        self.sum += val * n
+        self.count += n
+        self.avg = self.sum / self.count
+
+
+def init_logging(log_root, rank, models_root):
+    if rank == 0:
+        log_root.setLevel(logging.INFO)
+        formatter = logging.Formatter("Training: %(asctime)s-%(message)s")
+        handler_file = logging.FileHandler(
+            os.path.join(models_root, "training.log"))
+        handler_stream = logging.StreamHandler(sys.stdout)
+        handler_file.setFormatter(formatter)
+        handler_stream.setFormatter(formatter)
+        log_root.addHandler(handler_file)
+        log_root.addHandler(handler_stream)
+        log_root.info('rank_id: %d' % rank)
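`AverageMeter.update(val, n)` weights each value by `n`. When `val` is a per-batch mean loss and `n` is the batch size, `avg` is therefore the mean over all samples seen since the last `reset()`, not the mean of the batch means. `init_logging` attaches handlers only on rank 0, so in a multi-GPU run only one process writes `training.log`. The sketch below checks both behaviours using only the standard library. `weighted_running_mean` and `init_logging_sketch` are hypothetical stand-ins that mirror the code above; they are not functions from this repository.

```python
import logging
import os
import sys
import tempfile


def weighted_running_mean(updates):
    """Mirror AverageMeter.update: each (val, n) adds val * n to the sum."""
    total, count = 0.0, 0
    for val, n in updates:
        total += val * n
        count += n
    return total / count


def init_logging_sketch(logger, rank, models_root):
    """Rank-gated setup mirroring init_logging (hypothetical helper)."""
    if rank != 0:
        return  # non-zero ranks attach no handlers
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("Training: %(asctime)s-%(message)s")
    handlers = [
        logging.FileHandler(os.path.join(models_root, "training.log")),
        logging.StreamHandler(sys.stdout),
    ]
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    logger.info("rank_id: %d", rank)


# A batch of 3 samples with mean loss 2.0, then 1 sample with loss 4.0.
print(weighted_running_mean([(2.0, 3), (4.0, 1)]))  # 2.5, not (2.0 + 4.0) / 2

root = tempfile.mkdtemp()  # stands in for models_root; must already exist
init_logging_sketch(logging.getLogger("sketch.rank0"), 0, root)
init_logging_sketch(logging.getLogger("sketch.rank1"), 1, root)
print(os.path.exists(os.path.join(root, "training.log")))  # True
```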
--- a/recognition/arcface_paddle/valid.py
+++ b/recognition/arcface_paddle/valid.py
@@ -0,0 +1,53 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import paddle
+import argparse
+import backbones
+from utils.utils_callbacks import CallBackVerification
+
+
+def main(args):
+    '''
+    For CallBackVerification, pass the validation sets to evaluate, e.g.
+    ["lfw"], or several at once such as ["lfw", "cplfw", "calfw"].
+
+    For callback_verification, 12000 must be divisible by batch_size,
+    because the verification dataset contains 12000 images.
+    '''
+    backbone = getattr(backbones, args.network)()
+    model_params = args.network + '.pdparams'
+    print('INFO: ' + args.network + ' chosen, loading ' + model_params)
+    state_dict = paddle.load(os.path.join(args.checkpoint, model_params))
+    backbone.set_state_dict(state_dict)
+    callback_verification = CallBackVerification(
+        1, 0, ["lfw", "cfp_fp", "agedb_30"], "MS1M_v2")
+    callback_verification(1, backbone, batch_size=50)
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description='Paddle ArcFace Testing')
+    parser.add_argument(
+        '--network',
+        type=str,
+        default='MobileFaceNet_128',
+        help='backbone network')
+    parser.add_argument(
+        '--checkpoint',
+        type=str,
+        default='emore_arcface',
+        help='checkpoint dir')
+    args = parser.parse_args()
+    main(args)
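Two details of `valid.py` are easy to check in isolation. First, the verification batch size must divide the 12000 images in the dataset evenly. Second, the backbone class is looked up by name from `args.network`. The sketch below validates a batch size up front and resolves the constructor with `getattr` rather than `eval`, which avoids executing an arbitrary command-line string. `check_batch_size`, `build_backbone`, and the `SimpleNamespace` stand-in for the `backbones` module are hypothetical. The value 12000 comes from the docstring above.

```python
from types import SimpleNamespace

NUM_VERIFICATION_IMAGES = 12000  # dataset length stated in the docstring


def check_batch_size(batch_size: int,
                     num_images: int = NUM_VERIFICATION_IMAGES) -> int:
    """Raise early if batch_size does not evenly divide the dataset."""
    if batch_size <= 0 or num_images % batch_size != 0:
        raise ValueError(
            f"batch_size={batch_size} must evenly divide {num_images}")
    return batch_size


def build_backbone(module, name: str):
    """Instantiate module.<name>() without eval (hypothetical helper)."""
    constructor = getattr(module, name, None)
    if constructor is None:
        raise ValueError(f"unknown backbone: {name}")
    return constructor()


# Stand-in for the real `backbones` package; the constructor returns a string.
fake_backbones = SimpleNamespace(
    MobileFaceNet_128=lambda: "MobileFaceNet_128 instance")

print(check_batch_size(50))  # 50, since 12000 / 50 = 240
print(build_backbone(fake_backbones, "MobileFaceNet_128"))
```

An unknown `--network` value then fails with a clear `ValueError` instead of an arbitrary `eval` error.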
@@ -0,0 +1 @@
+python export_inference_model.py --network MobileFaceNet_128 --output ./inference_model/ --pretrained_model ./emore_arcface/MobileFaceNet_128.pdparams