mxnet speech_recognition: pitfall notes
MXNet's example directory contains speech_recognition, based on Baidu's DeepSpeech2. This post records my attempt to use that project to train a speech recognition model.
https://github.com/apache/incubator-mxnet/tree/master/example/speech_recognition
The project requires building your own MXNet with WarpCTC support.
I started on XUbuntu 19.10 and hit the following error while compiling warp-ctc:
/usr/local/cuda/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
This machine has a GPU; the error says gcc versions newer than 8 are not supported, and XUbuntu 19.10 ships gcc 9.2.1.
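A common workaround (which I did not use; I switched OS instead) is to install gcc-8 alongside the default compiler and point the warp-ctc build at it. Depending on how the project's CMake locates CUDA, something along these lines should work:
sudo apt install gcc-8 g++-8
cmake -DCMAKE_C_COMPILER=gcc-8 -DCMAKE_CXX_COMPILER=g++-8 -DCUDA_HOST_COMPILER=/usr/bin/gcc-8 ../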
So I fell back to XUbuntu 18.04, where compiling MXNet failed with:
CMake 3.13 or higher is required. You are running version 3.10.2
Now the cmake was considered too old, so I switched operating systems yet again.
XUbuntu 19.04 ships cmake 3.13.4 and gcc 8.3.0, which finally satisfies both of the picky requirements above.
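If you would rather not reinstall the OS just for cmake, a newer cmake can also be pulled in from PyPI (an alternative I did not actually try here):
pip install --user cmake
~/.local/bin/cmake --version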
Build and install the warp-ctc project:
git clone https://github.com/baidu-research/warp-ctc.git
cd warp-ctc
mkdir build
cd build
cmake ../
make
sudo make install
Build MXNet
https://mxnet.apache.org/get_started/ubuntu_setup
sudo apt install ninja-build ccache libopenblas-dev libopencv-dev
git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet
cd mxnet
Edit make/config.mk and uncomment the following two lines:
WARPCTC_PATH = $(HOME)/warp-ctc
MXNET_PLUGINS += plugin/warpctc/warpctc.mk
To enable CUDA support, also set:
USE_CUDA = 1
USE_CUDNN = 1
Start compiling:
make -j8
For a GPU-enabled build, a reference command is:
make -j8 USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1
Install it into the system:
cd python
pip install --user -e .
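A quick sanity check that the locally built package is the one Python actually picks up:
python -c "import mxnet; print(mxnet.__version__)"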
Install the companion packages:
pip install mxboard
pip install soundfile
First download two audio files and try a short training run:
https://github.com/samsungsds-rnd/deepspeech.mxnet/tree/master/Libri_sample
Download that entire directory from the link above into example/speech_recognition/, then create a new file Libri_sample.json with the following content:
{"duration": 2.9450625, "text": "and sharing her house which was near by", "key": "./Libri_sample/3830-12531-0030.wav"}
{"duration": 3.94, "text": "we were able to impart the information that we wanted", "key": "./Libri_sample/3830-12529-0005.wav"}
Run the following commands:
mkdir checkpoints
mkdir log
python main.py --configfile default.cfg
If you hit the error libwarpctc.so: cannot open shared object file: No such file or directory, run:
export LD_LIBRARY_PATH=/usr/local/lib
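A more permanent alternative is to register /usr/local/lib (where make install put libwarpctc.so by default) with the dynamic linker. On many Ubuntu systems that path is already listed under /etc/ld.so.conf.d/, so refreshing the cache is enough; otherwise add it first:
sudo sh -c 'echo /usr/local/lib > /etc/ld.so.conf.d/warpctc.conf'
sudo ldconfig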
If MXNet was built without GPU support above, training fails with MXNetError: Compile with USE_CUDA=1 to enable GPU usage.
If you want to train on the CPU instead, edit default.cfg and set:
context = cpu0
Once training finishes you can run prediction: copy default.cfg to predict.cfg and change
mode = train
to
mode = predict
then run:
python main.py --configfile predict.cfg
I then trained again on my Hasee laptop (with an RTX 2060); the prediction output was:
[ INFO][2020/01/13 16:54:22.581] label: we were able to impart the information that we wanted
[ INFO][2020/01/13 16:54:22.581] pred : we were able to impart the information that we wanted , cer: 0.000000 (distance: 0/ label length: 53)
[ INFO][2020/01/13 16:54:22.582] label: and sharing her house which was near by
[ INFO][2020/01/13 16:54:22.582] pred : and sharing her house which was near by , cer: 0.000000 (distance: 0/ label length: 39)
Completely correct. Very exciting: it really can recognize speech.
Now for something bigger: download more training data, and prepare about 250 GB of disk space:
http://www.openslr.org/resources/12/dev-clean.tar.gz
#http://www.openslr.org/resources/12/dev-other.tar.gz
http://www.openslr.org/resources/12/test-clean.tar.gz
#http://www.openslr.org/resources/12/test-other.tar.gz
http://www.openslr.org/resources/12/train-clean-100.tar.gz
#http://www.openslr.org/resources/12/train-clean-360.tar.gz
#http://www.openslr.org/resources/12/train-other-500.tar.gz
Then extract everything:
tar xvf dev-clean.tar.gz
#tar xvf dev-other.tar.gz
tar xvf test-clean.tar.gz
#tar xvf test-other.tar.gz
tar xvf train-clean-100.tar.gz
#tar xvf train-clean-360.tar.gz
#tar xvf train-other-500.tar.gz
Then transcode everything to wav using example/speech_recognition/flac_to_wav.sh:
./flac_to_wav.sh
This converts every flac file in the subdirectories to wav, saving the result next to the original.
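Roughly speaking, the script amounts to a loop like the following (my own sketch; it assumes ffmpeg is installed, and the real script may use different encoder options):
find . -iname "*.flac" | while read -r f; do
    ffmpeg -i "$f" -ar 16000 "${f%.flac}.wav"
done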
Then clone https://github.com/baidu-research/ba-dls-deepspeech.git; like it or not, we still need its create_desc_json.py script:
python create_desc_json.py ~/LibriSpeech/train-clean-100 train_corpus.json
python create_desc_json.py ~/LibriSpeech/dev-clean validation_corpus.json
python create_desc_json.py ~/LibriSpeech/test-clean test_corpus.json
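create_desc_json.py essentially walks a LibriSpeech directory, reads each *.trans.txt transcript file, and emits one JSON line per utterance in the same format as Libri_sample.json above. A rough approximation of that logic (my own sketch, not the original code):
import json, os, sys
import soundfile as sf
data_dir, out_path = sys.argv[1], sys.argv[2]
with open(out_path, "w") as out:
    for root, _, files in os.walk(data_dir):
        for name in files:
            if not name.endswith(".trans.txt"):
                continue
            for line in open(os.path.join(root, name)):
                utt_id, text = line.strip().split(" ", 1)
                wav = os.path.join(root, utt_id + ".wav")  # produced by flac_to_wav.sh
                audio, sr = sf.read(wav)
                out.write(json.dumps({"duration": len(audio) / sr, "text": text.lower(), "key": wav}) + "\n")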
Then make a few small edits to deepspeech.cfg:
train_json = train_corpus.json
test_json = test_corpus.json
val_json = validation_corpus.json
Start training:
mkdir checkpoints
mkdir log
python main.py --configfile deepspeech.cfg
In my testing this training is extremely memory hungry. The 16 GB of DDR4 in my Hasee could not cope at all; the process would simply die with "Killed", so I had to add 64 GB of swap:
sudo dd if=/dev/zero of=swapfile bs=64M count=1024
sudo mkswap swapfile
sudo swapon swapfile
It looks like roughly 46 GB of swap gets used on top of the 16 GB of RAM, so I estimate you need more than 64 GB of memory in total to run it.
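To confirm the swap is active and to keep an eye on memory usage while training runs:
swapon --show
free -h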
Modify a copy of this config file into a predict.cfg the same way as above, and you can try out the recognition quality for yourself.
總結(jié)
以上是生活随笔為你收集整理的mxnet speech_recognition踩坑记的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: github分段下载
- 下一篇: mxnet deepspeech网络结构