Sync from bytedesk-private: update

2026-05-22 05:07:47 +00:00 · 2024-12-14 10:43:18 +08:00
parent 476eebb101
commit 5e082909e4
3421 changed files with 812709 additions and 0 deletions
--- a/modules/python/vendors/FunASR/runtime/triton_gpu/README.md
+++ b/modules/python/vendors/FunASR/runtime/triton_gpu/README.md
@@ -0,0 +1,81 @@
+## Triton Inference Serving Best Practice for SenseVoice
+
+### Quick Start
+Directly launch the service using docker compose.
+```sh
+docker compose up --build
+```
+
+### Build Image
+Build the docker image from scratch. 
+```sh
+# build from scratch; run this from the directory that contains the Dockerfile/ folder
+docker build . -f Dockerfile/Dockerfile.sensevoice -t soar97/triton-sensevoice:24.05
+```
+
+### Create Docker Container
+```sh
+your_mount_dir=/mnt:/mnt
+docker run -it --name "sensevoice-server" --gpus all --net host -v $your_mount_dir --shm-size=2g soar97/triton-sensevoice:24.05
+```
+
+### Export SenseVoice Model to ONNX
+Follow the official FunASR guide to export the SenseVoice ONNX file. You also need to download the tokenizer file yourself.
+
+### Launch Server
+The model repository should have the following directory tree (the ONNX model and the tokenizer are symlinked in):
+```sh
+model_repo_sense_voice_small
+|-- encoder
+|   |-- 1
+|   |   `-- model.onnx -> /your/path/model.onnx
+|   `-- config.pbtxt
+|-- feature_extractor
+|   |-- 1
+|   |   `-- model.py
+|   |-- am.mvn
+|   |-- config.pbtxt
+|   `-- config.yaml
+|-- scoring
+|   |-- 1
+|   |   `-- model.py
+|   |-- chn_jpn_yue_eng_ko_spectok.bpe.model -> /your/path/chn_jpn_yue_eng_ko_spectok.bpe.model
+|   `-- config.pbtxt
+`-- sensevoice
+    |-- 1
+    `-- config.pbtxt
+
+8 directories, 10 files
+
+
+# launch the service 
+tritonserver --model-repository /workspace/model_repo_sense_voice_small \
+             --pinned-memory-pool-byte-size=512000000 \
+             --cuda-memory-pool-byte-size=0:1024000000
+```
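
A missing file or a dangling symlink in the model repository usually only shows up as a load error at server startup. As a minimal sketch, the check below compares a repository directory against the tree listed above. The helper name `missing_entries` is hypothetical and not part of FunASR or Triton; the expected paths are copied from the tree.

```python
from pathlib import Path

# Relative paths copied from the directory tree in this README.
EXPECTED_FILES = [
    "encoder/1/model.onnx",
    "encoder/config.pbtxt",
    "feature_extractor/1/model.py",
    "feature_extractor/am.mvn",
    "feature_extractor/config.pbtxt",
    "feature_extractor/config.yaml",
    "scoring/1/model.py",
    "scoring/chn_jpn_yue_eng_ko_spectok.bpe.model",
    "scoring/config.pbtxt",
    "sensevoice/config.pbtxt",
]
# The sensevoice model has an empty version directory in the tree.
EXPECTED_DIRS = ["sensevoice/1"]


def missing_entries(repo_root):
    """Return the expected files and directories that are absent under repo_root.

    Path.is_file() follows symlinks, so a symlink whose target does not
    exist is reported as missing as well.
    """
    root = Path(repo_root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing
```

Calling `missing_entries("/workspace/model_repo_sense_voice_small")` before `tritonserver` should return an empty list when the layout matches; any returned path points at a file that still needs to be copied or linked.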
+
+
+### Benchmark using Dataset
+```sh
+git clone https://github.com/yuekaizhang/Triton-ASR-Client.git
+cd Triton-ASR-Client
+num_task=32
+python3 client.py \
+    --server-addr localhost \
+    --server-port 10086 \
+    --model-name sensevoice \
+    --compute-cer \
+    --num-tasks $num_task \
+    --batch-size 16 \
+    --manifest-dir ./datasets/aishell1_test
+```
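
Before starting a benchmark run, it can help to confirm that the server has finished loading its models. The sketch below polls Triton's standard HTTP readiness endpoint (`/v2/health/ready`). It assumes the HTTP endpoint is enabled on Triton's default port 8000, which this README does not state; the function name `triton_is_ready` is hypothetical, and the port should be adjusted to match your deployment.

```python
import urllib.error
import urllib.request


def triton_is_ready(host="localhost", http_port=8000, timeout=2.0):
    """Return True if the server answers 200 on /v2/health/ready.

    http_port=8000 is Triton's default HTTP port and is an assumption here;
    it is not the port passed to client.py in the benchmark command.
    """
    url = f"http://{host}:{http_port}/v2/health/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx reply: treat as not ready.
        return False
```

A benchmark script could loop on `triton_is_ready()` with a short sleep and only launch `client.py` once it returns `True`.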
+
+The benchmark results below are based on the Aishell1 test set with a single V100. The total audio duration is 36108.919 seconds.
+
+| concurrent-tasks | batch-size-per-task | processing time (s) | RTF |
+|------------------|---------------------|---------------------|--------|
+| 32 (onnx fp32)   | 16                  | 67.09               | 0.0019 |
+| 32 (onnx fp32)   | 1                   | 82.04               | 0.0023 |
+
+(Note: for batch-size-per-task=1 cases, tritonserver could use dynamic batching to improve throughput.)
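
The RTF column can be reproduced from the other numbers, assuming the usual definition of real-time factor as processing time divided by total audio duration. The short sketch below recomputes both rows; the function name `real_time_factor` is illustrative only.

```python
TOTAL_AUDIO_SECONDS = 36108.919  # total Aishell1 test audio, from the text above


def real_time_factor(processing_seconds, audio_seconds=TOTAL_AUDIO_SECONDS):
    """RTF = processing time / audio duration (lower is faster)."""
    return processing_seconds / audio_seconds


# Processing times taken from the benchmark table.
for label, seconds in [("batch-size 16", 67.09), ("batch-size 1", 82.04)]:
    print(f"{label}: RTF = {real_time_factor(seconds):.4f}")
# prints:
# batch-size 16: RTF = 0.0019
# batch-size 1: RTF = 0.0023
```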
+
+## Acknowledgement
+This part originates from the NVIDIA CISI project. We also have TTS and NLP solutions deployed on Triton Inference Server. If you are interested, please contact us.