mirror of
https://github.com/deepinsight/insightface.git
synced 2025-12-30 08:02:27 +00:00
3.1 KiB
3.1 KiB
Training performance report on NVIDIA A100
Test Server Spec
| Key | Value |
|---|---|
| System | ServMax G408-X2 Rackmountable Server |
| CPU | 2 x Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz |
| Memory | 384GB, 12 x Samsung 32GB DDR4-2933 |
| GPU | 8 x NVIDIA A100 80GB |
| Cooling | 2x Customized GPU Kit for GPU support FAN-1909L2 |
| Hard Drive | Intel SSD S4500 1.9TB/SATA/TLC/2.5" |
| OS | Ubuntu 16.04.7 LTS |
| Installation | CUDA 11.1, cuDNN 8.0.5 |
| Installation | Python 3.7.10 |
| Installation | PyTorch 1.9.0 (conda) |
This server is donated by AMAX, many thanks!
Experiments on arcface_torch
We report training speed in following table, please also note that:
- The training dataset is in mxnet record format and located on SSD hard drive.
- Embedding-size are all set to 512.
- We use large datasets with about 618K/2M identities to simulate real cases.
- We test the 10K batch-size on real dataset to take the full advantage of 80GB memory.
- We also test on huge synthetic datasets which include 50M~80M classes.
| Dataset | Classes | Backbone | Batch-size | PFC | FP16 | TF32 | Samples/sec | GPU Mem(GB) |
|---|---|---|---|---|---|---|---|---|
| WebFace600K | 618K | IResNet-50 | 1024 | × | × | × | ~3670 | ~18.2 |
| WebFace600K | 618K | IResNet-50 | 1024 | × | × | √ | ~4760 | ~15.0 |
| WebFace600K | 618K | IResNet-50 | 1024 | × | √ | × | ~5170 | ~10.1 |
| WebFace600K | 618K | IResNet-50 | 1024 | × | √ | √ | ~5400 | ~10.1 |
| WebFace600K | 618K | IResNet-50 | 2048 | × | √ | √ | ~7780 | ~16.4 |
| WebFace600K | 618K | IResNet-50 | 10240 | × | √ | √ | ~9400 | ~66.7 |
| WebFace600K | 618K | IResNet-100 | 1024 | × | √ | √ | ~3700 | ~13.1 |
| WebFace600K | 618K | IResNet-180 | 1024 | × | √ | √ | ~2380 | ~17.5 |
| WebFace2M | 2M | IResNet-100 | 1024 | × | √ | √ | ~3480 | ~20.5 |
| WebFace2M | 2M | IResNet-180 | 1024 | × | √ | √ | ~2350 | ~25.0 |
| WebFace2M | 2M | IResNet-300 | 1024 | × | √ | √ | ~1541 | ~32.6 |
| Virtual | 50M | IResNet-50 | 1024 | 0.1 | √ | √ | ~2700 | ~54.1 |
| Virtual | 70M | IResNet-50 | 1024 | 0.1 | √ | √ | ~2170 | ~73.7 |
| Virtual | 80M | IResNet-50 | 1024 | 0.1 | √ | √ | ~1080 | ~79.6 |