Training
1. Requirements
python==3.6
cuda==10.1
cudnn==7.6.5
mxnet-cu101==1.6.0.post0
pip install easydict mxboard opencv-python tqdm
nccl
openmpi==4.0.0
horovod==0.19.2
2. Run with horovodrun
Typically one GPU will be allocated per process, so if a server has 8 GPUs, you will run 8 processes. In horovodrun, the number of processes is specified with the -np flag.
To run on a machine with 8 GPUs:
horovodrun -np 8 -H localhost:8 bash config.sh
To run on two machines with 16 GPUs:
horovodrun -np 16 -H ip1:8,ip2:8 bash config.sh
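Inside the training script, each process binds to exactly one GPU through its Horovod local rank. A minimal sketch of this pattern with Horovod's MXNet bindings (the variable names are illustrative, not taken from config.sh):
# Each horovodrun process picks the GPU that matches its local rank.
import mxnet as mx
import horovod.mxnet as hvd

hvd.init()                             # one process per GPU
ctx = mx.gpu(hvd.local_rank())         # bind this process to its local GPU
print('rank %d of %d uses GPU %d' % (hvd.rank(), hvd.size(), hvd.local_rank()))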
3. Run with MPI
bash run.sh
Failures due to SSH issues
The host where horovodrun is executed must be able to SSH to all other hosts without any prompts.
Troubleshooting
1. Horovod installed successfully?
Run horovodrun --check-build to check the Horovod installation.
# Horovod v0.19.2:
#
# Available Frameworks:
# [ ] TensorFlow
# [X] PyTorch
# [X] MXNet
#
# Available Controllers:
# [X] MPI
# [X] Gloo
#
# Available Tensor Operations:
# [X] NCCL
# [ ] DDL
# [ ] CCL
# [X] MPI
# [X] Gloo
2. MXNet Version!
Some versions of MXNet have bugs when used with Horovod.
It is recommended to try version 1.5 or 1.6.
The community has found that Horovod cannot be installed with MXNet 1.5.1.
3. Check CUDA version!
# Make sure your CUDA version matches your MXNet package, e.g. mxnet-cu101 requires CUDA 10.1
/usr/local/cuda/bin/nvcc -V
# nvcc: NVIDIA (R) Cuda compiler driver
# Copyright (c) 2005-2019 NVIDIA Corporation
# Built on Wed_Apr_24_19:10:27_PDT_2019
# Cuda compilation tools, release 10.1, V10.1.168
4. Block IO
You can turn on debug mode to check whether IO is the cause of slow training speed.
5. Training Speed.
If you find that IO is the bottleneck of your training speed, you can mount the dataset to RAM using the following commands.
# If your machine has 256 GB of RAM
sudo mkdir /train_tmp
sudo mount -t tmpfs -o size=140G tmpfs /train_tmp
Our Method
1. Model parallel for the classification layer
Class centers are evenly distributed across different GPUs. Only three communication steps are needed to compute the softmax exactly, without any loss of accuracy.
1. Synchronization of features
Make sure each GPU holds the features from all GPUs, as shown in AllGather(x_i).
2. Synchronization of the softmax denominator
We first calculate the local sum on each GPU and then compute the global sum through communication, as shown in Allreduce(sum(exp(logits_i))).
3. Synchronization of the feature gradients
The gradient of the logits can be calculated independently, and so can the gradient of the features. Finally, we collect the gradients on each GPU and send them back to the backbone, as shown in Allreduce(delta(X)). A sketch of all three steps follows this list.
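The sketch below illustrates the three communication steps with Horovod's MXNet bindings. It is a simplified illustration, not the exact implementation in this repository: the names (parallel_softmax_step, local_centers) are assumptions, the one-hot and margin terms of the loss are omitted, and it assumes your Horovod build exposes allgather/allreduce for MXNet tensors (as horovodrun --check-build reports above).
# Simplified sketch of the three communication steps (illustrative only).
import mxnet as mx
import horovod.mxnet as hvd

def parallel_softmax_step(x_i, local_centers):
    # x_i:           (local_batch, dim)    features computed on this GPU
    # local_centers: (local_classes, dim)  this GPU's shard of class centers

    # 1. AllGather(x_i): every GPU receives the features from all GPUs.
    x = hvd.allgather(x_i)                                # (global_batch, dim)

    # Local logits against this GPU's shard of class centers.
    logits_i = mx.nd.dot(x, local_centers, transpose_b=True)
    exp_i = mx.nd.exp(logits_i)

    # 2. Allreduce(sum(exp(logits_i))): sum the local denominators into the
    #    global softmax denominator.
    local_sum = mx.nd.sum(exp_i, axis=1, keepdims=True)   # (global_batch, 1)
    denom = hvd.allreduce(local_sum, average=False)
    prob_i = mx.nd.broadcast_div(exp_i, denom)            # local slice of softmax

    # Gradient of the loss w.r.t. the features for the local classes
    # (the one-hot term of the cross-entropy gradient is omitted here).
    grad_x_i = mx.nd.dot(prob_i, local_centers)

    # 3. Allreduce(delta(X)): sum the partial feature gradients from all GPUs
    #    before sending them back to the backbone.
    grad_x = hvd.allreduce(grad_x_i, average=False)       # (global_batch, dim)
    return prob_i, grad_x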
2. Approximate softmax
A subset of the class centers is enough to approximate the softmax computation (the positive class centers must be included in this subset). This can be done with the following code:
centers_p = func_positive(label) # select the positive class centers according to the sample labels
centers_n = func_negative(centers_p) # randomly sample negative class centers after excluding the positive ones
centers_final = concat(centers_n, centers_p) # class centers that participate in the softmax computation
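A runnable sketch of this sampling step, written in NumPy for clarity. The names sample_class_centers and num_sample, and the 1,000-class example, are illustrative assumptions; func_positive and func_negative correspond to the unique and choice calls below.
import numpy as np

def sample_class_centers(weight, labels, num_sample):
    # weight: (num_classes, dim)  full class-center matrix
    # labels: (batch,)            integer labels of the current batch
    positive = np.unique(labels)                      # positive class centers
    mask = np.ones(weight.shape[0], dtype=bool)
    mask[positive] = False
    negative_pool = np.flatnonzero(mask)              # classes not in the batch
    num_negative = max(num_sample - len(positive), 0)
    negative = np.random.choice(negative_pool, num_negative, replace=False)
    index = np.concatenate([positive, negative])      # classes used in the softmax
    return weight[index], index

# Example: 1,000 classes with 512-d centers, keep only 100 centers per step.
weight = np.random.randn(1000, 512).astype(np.float32)
labels = np.random.randint(0, 1000, size=64)
centers_final, index = sample_class_centers(weight, labels, num_sample=100)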
