insightface/recognition/arcface_torch/run.sh
anxiang 0cc88e2a24 Simplify running commands(single node and multi nodes)
1. Update training logs(glint360k)
2. Update install docs
3. Fix distributed training
2021-03-21 15:07:33 +08:00


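# Single-node launch: 8 worker processes, one per visible GPU, rendezvous on localhost:1234.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=1234 train.py
# Cleanup: force-kill (SIGKILL) every remaining process whose command line contains "train", including unrelated ones.
ps -ef | grep "train" | grep -v grep | awk '{print "kill -9 "$2}' | sh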
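
The commit message mentions multi-node runs, while the command above covers a single node only. The following is a minimal dry-run sketch of how the same launcher flags might be adapted for two machines. It only prints the command each node would run. `NNODES`, `MASTER_ADDR`, and the `build_launch_cmd` helper are illustrative assumptions and are not part of the original `run.sh`; the address `10.0.0.1` is a placeholder.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the launch command for one node of a multi-node job.
# NNODES, MASTER_ADDR and build_launch_cmd are hypothetical, for illustration only.
NNODES=2                  # assumption: total number of machines
MASTER_ADDR="10.0.0.1"    # assumption: placeholder IP of the rank-0 node
MASTER_PORT=1234          # same port as the single-node command

build_launch_cmd() {
    local node_rank="$1"  # 0 on the master node, 1..NNODES-1 on the others
    echo "CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch" \
         "--nproc_per_node=8 --nnodes=${NNODES} --node_rank=${node_rank}" \
         "--master_addr=\"${MASTER_ADDR}\" --master_port=${MASTER_PORT} train.py"
}

# Print the command for each node; nothing is executed here.
build_launch_cmd 0
build_launch_cmd 1
```

Every node uses the same `--master_addr` and `--master_port`; only `--node_rank` differs per machine. Printing rather than executing keeps the sketch safe to try on a machine without GPUs.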