diff --git a/benchmarks/train/nvidia_a10.md b/benchmarks/train/nvidia_a10.md
new file mode 100644
index 0000000..f18620b
--- /dev/null
+++ b/benchmarks/train/nvidia_a10.md
@@ -0,0 +1,48 @@
+# Training performance report on NVIDIA A10
+
+[NVIDIA A10 Tensor Core GPU](https://www.nvidia.com/en-us/data-center/products/a10-gpu/) 
+
+The A10 can be used to train deep learning models thanks to its FP16 and TF32 support.
+
+
+
+## Test Server Spec
+
+| Key          | Value                                            |
+| ------------ | ------------------------------------------------ |
+| System       | ServMax G408-X2 Rackmountable Server             |
+| CPU          | 2 x Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz    |
+| Memory       | 384GB, 12 x Samsung 32GB DDR4-2933               |
+| GPU          | 8 x NVIDIA A10 22GB                              |
+| Cooling      | 2x Customized GPU Kit for GPU support FAN-1909L2 |
+| Hard Drive   | Intel SSD S4500 1.9TB/SATA/TLC/2.5"              |
+| OS           | Ubuntu 16.04.7 LTS                               |
+| Installation | CUDA 11.1, cuDNN 8.0.5                           |
+| Installation | Python 3.7.10                                    |
+| Installation | PyTorch 1.9.0 (conda)                            |
+
+This server was donated by [AMAX](https://www.amaxchina.com/). Many thanks!
+
+
+
+## Experiments on arcface_torch
+
+The following table reports training speed. Please also note that:
+
+1. The training dataset is stored in MXNet record format on the SSD.
+
+2. The embedding size is set to 512 in all experiments.
+
+3. We use a large dataset containing about 618K identities to simulate real-world cases.
+
+| Dataset     | Classes | Backbone    | Batch-size | FP16 | TF32 | Samples/sec |
+| ----------- | ------- | ----------- | ---------- | ---- | ---- | ----------- |
+| WebFace600K | 618K    | IResNet-50  | 1024       | ×    | ×    | ~2040       |
+| WebFace600K | 618K    | IResNet-50  | 1024       | ×    | √    | ~2255       |
+| WebFace600K | 618K    | IResNet-50  | 1024       | √    | ×    | ~3300       |
+| WebFace600K | 618K    | IResNet-50  | 1024       | √    | √    | ~3360       |
+| WebFace600K | 618K    | IResNet-50  | 2048       | √    | √    | ~3940       |
+| WebFace600K | 618K    | IResNet-100 | 1024       | √    | √    | ~2210       |
+| WebFace600K | 618K    | IResNet-180 | 1024       | √    | √    | ~1410       |
+
+