Running the Kaldi decoders on an embedded platform

Published: 2023/12/29

This post is part of a series. For the other posts, see the overview: Porting and implementing Kaldi on an embedded platform.

Preface

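The earlier post, Cross-compiling the Kaldi source, already built the decoders for the embedded platform. The decoders include GMM, nnet2, nnet3, and others, and GMM decoding is further divided into monophone and triphone decoding. This post describes how to set the decoder parameters and run the decoders on the embedded platform.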

GMM decoders

1. Online recognition: online-gmm-decode-faster

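Taking the monophone model as an example, the model files needed for decoding are found under the training output directory s5/exp/mono: 40.mdl (or final.mdl; the two are the same file), graph/HCLG.fst, and graph/words.txt.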
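
Before starting the decoder on the target, it can help to confirm that the files listed above were actually copied over. The following is a minimal sketch; require_files is a hypothetical helper written for this post, not part of Kaldi, and the paths in the usage comment assume the files sit in the current directory as in the command below.

```shell
#!/bin/sh
# require_files: hypothetical helper (not part of Kaldi).
# Returns 0 only if every path given exists and is non-empty;
# reports each missing or empty file on stderr.
require_files() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage sketch (assumed layout: model files in the current directory):
#   require_files ./40.mdl ./graph/HCLG.fst ./graph/words.txt || exit 1
```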

./online-gmm-decode-faster --rt-min=0.5 --rt-max=0.7 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 ./40.mdl ./graph/HCLG.fst ./graph/words.txt 1:2:3:4:5

Test results:

2. Decoding wav files: online-wav-gmm-decode-faster

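Again taking the monophone model as an example, the model files needed for decoding are found under the training output directory s5/exp/mono: 40.mdl (or final.mdl; the two are the same file), graph/HCLG.fst, and graph/words.txt. Unlike online decoding, this decoder also needs an input.scp file that maps each utterance ID to a wav file, one "ID path" pair per line. The ID is the name the decoder prints after "File:" in its output.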
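
As an illustration of the input.scp layout, the sketch below builds one from a directory of wav files, using each file name without its extension as the ID. make_wav_scp and the ./wav directory are assumptions made for this example. The helper also assumes file names contain no whitespace, since an scp line is split at the first space.

```shell
#!/bin/sh
# make_wav_scp: hypothetical helper (not part of Kaldi).
# Writes one "<id> <wav path>" line per .wav file found in a directory.
make_wav_scp() {
    wav_dir=$1
    out=$2
    : > "$out"                          # start with an empty file
    for wav in "$wav_dir"/*.wav; do
        [ -e "$wav" ] || continue       # the directory held no .wav files
        id=$(basename "$wav" .wav)      # utt1.wav -> utt1
        printf '%s %s\n' "$id" "$wav" >> "$out"
    done
}

# Usage sketch (assumed directory name):
#   make_wav_scp ./wav ./input.scp
```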

Decoding command:

./online-wav-gmm-decode-faster --verbose=1 --rt-min=0.8 --rt-max=0.85 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:./input.scp ./final.mdl ./graph/HCLG.fst ./graph/words.txt 1:2:3:4:5 ark,t:./trans.txt ark,t:ali.txt

Test results:
File: utt1
WARNING (online-wav-gmm-decode-faster[5.4.0~14-dffb]:Read():wave-reader.cc:260) Expected 28304 bytes in RIFF chunk, but after first data block there will be 36 + 28152 bytes (we do not support reading multiple data chunks).
二十七鄧

File: utt2
WARNING (online-wav-gmm-decode-faster[5.4.0~14-dffb]:Read():wave-reader.cc:260) Expected 26636 bytes in RIFF chunk, but after first data block there will be 36 + 26484 bytes (we do not support reading multiple data chunks).
二十較多

Note: for tri1, tri2, or tri3 decoding, the two decoders above need some additional parameters, as follows:

./online-gmm-decode-faster --verbose=1 --rt-min=0.8 --rt-max=0.85 --max-active=14000 --beam=1.0 --acoustic-scale=0.1 --left-context=3 --right-context=3 tri2b/final.mdl tri2b/HCLG.fst tri2b/words.txt '1:2:3:4:5' tri2b/12.mat

./online-wav-gmm-decode-faster --verbose=1 --rt-min=0.8 --rt-max=0.85 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 --left-context=3 --right-context=3 scp:./input.scp ./final.mdl ./HCLG.fst ./words.txt 1:2:3:4:5 ark,t:./trans.txt ark,t:ali.txt ./12.mat

------------------------------------------------------------------------------------------------------------------------------------------------------------

3. The GMM decoder in online2bin: online2-wav-gmm-latgen-faster

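Unlike the decoders in onlinebin, this one needs some configuration files to be generated first. The command is as follows (monophone as the example):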

steps/online/prepare_online_decoding.sh data/train/ data/lang exp/mono exp/mono/final.mdl exp/mono_online

The configuration files are generated in the mono_online folder. Then run the decoder with the following command:

./online2-wav-gmm-latgen-faster --do-endpointing=false --config=./mono_online/conf/online_decoding.conf --add-pitch=false --max-active=7000 --beam=13 --lattice-beam=6.0 --acoustic-scale=0.083333 --word-symbol-table=./mono/words.txt ./mono/HCLG.fst 'ark:echo utterance-id1 utterance-id1|' "scp:echo utterance-id1 1.wav|" 'ark:/dev/null'

In this command, the value of --add-pitch depends on whether pitch features were added during training. If they were, set it to true and also set --online-pitch-config to the path of pitch.conf.
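
The pitch rule can be captured in a small wrapper so the two options are always set together. This is only a sketch: pitch_flags is a hypothetical helper, and the convention of passing an empty argument for a model trained without pitch is an assumption of this example, not something Kaldi defines.

```shell
#!/bin/sh
# pitch_flags: hypothetical helper (not part of Kaldi).
# Argument: path to pitch.conf if the model was trained with pitch,
#           or an empty string if it was not.
# Prints the matching decoder options on stdout.
pitch_flags() {
    pitch_conf=$1
    if [ -z "$pitch_conf" ]; then
        printf '%s\n' '--add-pitch=false'
    elif [ -f "$pitch_conf" ]; then
        printf '%s\n' "--add-pitch=true --online-pitch-config=$pitch_conf"
    else
        echo "pitch config not found: $pitch_conf" >&2
        return 1
    fi
}

# Usage sketch: leave the substitution unquoted so that the shell splits
# the output into separate options (the path must not contain spaces).
#   flags=$(pitch_flags "") || exit 1
#   ./online2-wav-gmm-latgen-faster $flags (remaining options and arguments as above)
```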

DNN decoders

1. The DNN decoder in online2bin: online2-wav-nnet3-latgen-faster

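As with the decoder in section 3, configuration files must be generated first. Because no ivector was used in training, no ivector parameters are passed when generating the configuration files.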

./steps/online/nnet3/prepare_online_decoding.sh data/lang_chain exp/chain/tdnn_1b_all_sp ./chain_conf
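
The decoder command that follows reads its configuration from ./model/chain_conf, while the script above wrote it to ./chain_conf, so the directory has presumably been copied to the target. The generated configuration refers to other files by path, and if those paths point into the host's training tree they will not resolve on the device. The sketch below rewrites a path prefix in a configuration file. relocate_conf is a hypothetical helper, and both prefixes in the usage comment are illustrative assumptions. Inspect the generated file first to see which paths it actually contains.

```shell
#!/bin/sh
# relocate_conf: hypothetical helper (not part of Kaldi).
# Replaces every literal occurrence of OLD with NEW in a config file.
# Assumes neither prefix contains backslashes (awk -v interprets them).
relocate_conf() {
    old=$1
    new=$2
    conf=$3
    [ -n "$old" ] || return 1           # an empty prefix would never advance
    awk -v old="$old" -v new="$new" '
        {
            line = $0; out = ""
            # literal (non-regex) replacement, one match at a time
            while ((i = index(line, old)) > 0) {
                out = out substr(line, 1, i - 1) new
                line = substr(line, i + length(old))
            }
            print out line
        }' "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Usage sketch (both prefixes are assumed, not taken from the post):
#   relocate_conf /home/user/kaldi/egs/s5/chain_conf ./model/chain_conf \
#       ./model/chain_conf/conf/online.conf
```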

./online2-wav-nnet3-latgen-faster --do-endpointing=false --online=false --frame-subsampling-factor=3 --config=./model/chain_conf/conf/online.conf --add-pitch=false --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --word-symbol-table=./model/graph/words.txt ./model/chain_conf/final.mdl ./model/graph/HCLG.fst 'ark:echo utterance-id1 utterance-id1|' "scp:echo utterance-id1 test.wav|" 'ark:/dev/null'
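
The online2 decoders take a spk2utt rspecifier followed by a wav rspecifier, and the inline echo pipes above cover a single utterance. To decode several files in one run, the two tables can be written to disk instead. The sketch below follows the same convention as the commands above, treating each utterance as its own speaker. make_online2_lists and the directory names are assumptions of this example, and file names are assumed to contain no whitespace.

```shell
#!/bin/sh
# make_online2_lists: hypothetical helper (not part of Kaldi).
# For each .wav file in WAV_DIR, writes
#   OUT_DIR/wav.scp : "<utt-id> <wav path>"
#   OUT_DIR/spk2utt : "<utt-id> <utt-id>"   (one speaker per utterance)
make_online2_lists() {
    wav_dir=$1
    out_dir=$2
    : > "$out_dir/wav.scp"
    : > "$out_dir/spk2utt"
    for wav in "$wav_dir"/*.wav; do
        [ -e "$wav" ] || continue       # the directory held no .wav files
        id=$(basename "$wav" .wav)
        printf '%s %s\n' "$id" "$wav" >> "$out_dir/wav.scp"
        printf '%s %s\n' "$id" "$id" >> "$out_dir/spk2utt"
    done
}

# Usage sketch (assumed directory names): after
#   make_online2_lists ./wav ./lists
# pass 'ark:./lists/spk2utt' and 'scp:./lists/wav.scp' in place of the
# two echo pipes in the decoder command.
```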