Files
insightface/benchmarks/train/nvidia_a100.md
2022-01-21 08:17:30 +08:00

54 lines
3.1 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Training performance report on NVIDIA A100
[NVIDIA A100 Tensor Core GPU](https://www.nvidia.com/en-us/data-center/a100/)
## Test Server Spec
| Key | Value |
| ------------ | ------------------------------------------------ |
| System | ServMax G408-X2 Rackmountable Server |
| CPU | 2 x Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz |
| Memory | 384GB, 12 x Samsung 32GB DDR4-2933 |
| GPU | 8 x NVIDIA A100 80GB |
| Cooling | 2x Customized GPU Kit for GPU support FAN-1909L2 |
| Hard Drive | Intel SSD S4500 1.9TB/SATA/TLC/2.5" |
| OS | Ubuntu 16.04.7 LTS |
| Installation | CUDA 11.1, cuDNN 8.0.5 |
| Installation | Python 3.7.10 |
| Installation | PyTorch 1.9.0 (conda) |
This server is donated by [AMAX](https://www.amaxchina.com/), many thanks!
## Experiments on arcface_torch
We report training speed in following table, please also note that:
1. The training dataset is in mxnet record format and located on SSD hard drive.
2. Embedding-size are all set to 512.
3. We use large datasets with about 618K/2M identities to simulate real cases.
4. We test the 10K batch-size on real dataset to take the full advantage of 80GB memory.
5. We also test on huge synthetic datasets which include 50M~80M classes.
| Dataset | Classes | Backbone | Batch-size | PFC | FP16 | TF32 | Samples/sec | GPU Mem(GB) |
| ----------- | ------- | ----------- | ---------- | ---- | ---- | ---- | ----------- | ----------- |
| WebFace600K | 618K | IResNet-50 | 1024 | × | × | × | ~3670 | ~18.2 |
| WebFace600K | 618K | IResNet-50 | 1024 | × | × | √ | ~4760 | ~15.0 |
| WebFace600K | 618K | IResNet-50 | 1024 | × | √ | × | ~5170 | ~10.1 |
| WebFace600K | 618K | IResNet-50 | 1024 | × | √ | √ | ~5400 | ~10.1 |
| WebFace600K | 618K | IResNet-50 | 2048 | × | √ | √ | ~7780 | ~16.4 |
| WebFace600K | 618K | IResNet-50 | 10240 | × | √ | √ | ~9400 | ~66.7 |
| WebFace600K | 618K | IResNet-100 | 1024 | × | √ | √ | ~3700 | ~13.1 |
| WebFace600K | 618K | IResNet-180 | 1024 | × | √ | √ | ~2380 | ~17.5 |
| WebFace2M | 2M | IResNet-100 | 1024 | × | √ | √ | ~3480 | ~20.5 |
| WebFace2M | 2M | IResNet-180 | 1024 | × | √ | √ | ~2350 | ~25.0 |
| WebFace2M | 2M | IResNet-300 | 1024 | × | √ | √ | ~1541 | ~32.6 |
| Virtual | 50M | IResNet-50 | 1024 | 0.1 | √ | √ | ~2700 | ~54.1 |
| Virtual | 70M | IResNet-50 | 1024 | 0.1 | √ | √ | ~2170 | ~73.7 |
| Virtual | 80M | IResNet-50 | 1024 | 0.1 | √ | √ | ~1080 | ~79.6 |