Training DeepSpeech using TorchElastic
Reduce cost and horizontally scale deepspeech.pytorch using TorchElastic with Kubernetes.
End-to-End Speech To Text Models Using Deepspeech.pytorch
Deepspeech.pytorch provides training, evaluation and inference of End-to-End (E2E) speech-to-text models, in particular the highly popularised DeepSpeech2 architecture. Deepspeech.pytorch was developed to give users the flexibility and simplicity to scale, train and deploy their own speech recognition models, whilst maintaining a minimalist design. It is a lightweight package for research iterations and integrations that fills the gap between audio research and production.
Scale Training Horizontally Using TorchElastic
Training production E2E speech-to-text models currently requires thousands of hours of labelled transcription data; in recent cases, we see numbers exceeding 50k hours of labelled audio data. Training on datasets of this size requires optimised multi-GPU training and hyper-parameter configurations. As we move towards leveraging unlabelled audio data for our speech recognition models, following the announcement of wav2vec 2.0, scaling and throughput will continue to be crucial for training larger models on larger datasets.
Multiple advancements in the field have improved training iteration times, such as the growth of cuDNN, the introduction of Automatic Mixed Precision and, in particular, multi-machine training. Many implementations have appeared to assist with multi-machine training, such as KubeFlow, but they usually come with a vast feature set that replaces the entire training workflow. Implementations from scratch require significant engineering effort and, in our experience, do not offer the robustness required to scale reliably. TorchElastic provides native PyTorch scaling capabilities and fits the lightweight paradigm of deepspeech.pytorch whilst giving users enough customisation and freedom. In essence, TorchElastic makes it possible to scale training in PyTorch with minimal effort, saving the time otherwise spent building complex custom infrastructure and shortening the path from research to production.
Reduce Scaling Cost By Using Preemptible Instances
One method to reduce costs when scaling is to utilise preemptible instances: virtual machines that are not currently being used by on-demand customers can be obtained at a substantially lower price. Comparing NVIDIA V100 prices on Google Cloud, this is a 3x cost saving. A DeepSpeech training run using the popular LibriSpeech dataset costs around $510 using V100s on Google Cloud; utilising preemptible instances reduces this to $153, a massive cost reduction that allows for more research training cycles.
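As a rough sanity check on those figures, the sketch below recomputes the preemptible cost from the on-demand number. The hourly prices are assumptions (approximate Google Cloud V100 list prices at the time of writing, which vary by region and over time), not authoritative quotes.

```python
# Back-of-the-envelope check of the cost figures above.
# The hourly prices below are assumptions, not authoritative quotes.
ON_DEMAND_PER_GPU_HOUR = 2.48      # USD, on-demand V100 (approximate)
PREEMPTIBLE_PER_GPU_HOUR = 0.74    # USD, preemptible V100 (approximate)

on_demand_cost = 510.0                                # quoted LibriSpeech run cost
gpu_hours = on_demand_cost / ON_DEMAND_PER_GPU_HOUR   # ~206 GPU-hours
preemptible_cost = gpu_hours * PREEMPTIBLE_PER_GPU_HOUR

print(f"~{gpu_hours:.0f} GPU-hours -> ~${preemptible_cost:.0f} on preemptible V100s")
# ~206 GPU-hours -> ~$152, consistent with the ~$153 figure quoted above.
```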
However, due to their short life cycle, preemptible instances come with the caveat that interruptions can happen at any time, and your code needs to handle this. Depending on the training pipeline this can be complex, as re-initialising training may require keeping track of a large number of variables. One way to solve this is to abstract the "state" of training so it can be saved, loaded and resumed upon failures in the cluster, which also makes it simpler to track new variables in the future.
Implementing state in training deepspeech.pytorch. See in full here.

TorchElastic makes abstracting state really easy. Following the example guidelines, the crucial functions to implement save and resume state within the training code, and TorchElastic handles the rest. After integration, deepspeech.pytorch is able to handle interruptions seamlessly from previously saved state checkpoints. Deepspeech.pytorch also supports saving and loading state from a Google Cloud Storage bucket automatically, allowing us to mount read-only data drives to the node(s) and store our final models within an object store. TorchElastic cleans up a lot of boilerplate code, relieving the need to worry about distributed ranks, local GPU devices and distributed communication with other nodes.
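As a rough illustration of the idea (this is a minimal sketch, not the actual deepspeech.pytorch implementation linked above), resumable training can be reduced to a single object that knows how to snapshot and restore everything the training loop needs; the class and attribute names below are assumptions made for the sketch.

```python
import os

import torch


class TrainingState:
    """Minimal sketch of a resumable training state (illustrative only)."""

    def __init__(self, model, optimizer, epoch=0, step=0):
        self.model = model
        self.optimizer = optimizer
        self.epoch = epoch
        self.step = step

    def save(self, checkpoint_path):
        # Persist everything needed to resume after a preemption.
        torch.save(
            {
                "model": self.model.state_dict(),
                "optimizer": self.optimizer.state_dict(),
                "epoch": self.epoch,
                "step": self.step,
            },
            checkpoint_path,
        )

    def load(self, checkpoint_path):
        # Fresh start if no checkpoint exists yet; otherwise restore and continue.
        if not os.path.exists(checkpoint_path):
            return self
        snapshot = torch.load(checkpoint_path, map_location="cpu")
        self.model.load_state_dict(snapshot["model"])
        self.optimizer.load_state_dict(snapshot["optimizer"])
        self.epoch = snapshot["epoch"]
        self.step = snapshot["step"]
        return self
```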
deepspeech.pytorch Training Config

We rely on Kubernetes to handle interruptions and node management. TorchElastic supplies us with PyTorch distributed integration to ensure we're able to scale across GPU-enabled machines using Elastic Jobs. With the TorchElastic Kubernetes Operator (TEK) we're able to transparently integrate distributed deepspeech.pytorch within our K8s GKE cluster.
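For context, below is a minimal, generic sketch of the worker-side setup an elastic launcher expects: the launcher populates environment variables such as RANK, WORLD_SIZE and LOCAL_RANK, so the training script only needs to initialise the default process group and wrap the model in DistributedDataParallel. This is a plain PyTorch sketch under those assumptions, not the exact deepspeech.pytorch code.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel


def setup_distributed(model):
    # The elastic launcher sets RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT
    # and LOCAL_RANK in the environment, so env:// initialisation is enough.
    dist.init_process_group(backend="nccl", init_method="env://")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    # Gradients are then synchronised across all workers in the elastic job.
    return DistributedDataParallel(model, device_ids=[local_rank])
```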
Both fault-tolerant jobs (where nodes can fail at any moment) and node pools that change dynamically based on demand are supported. This is particularly useful when training DeepSpeech models on an auto-scaling node pool of preemptible GPU instances, whilst utilising as much of the pool as possible.
Scaling and Managing Hyper-parameters
When node pools are dynamic, stability of hyper-parameters is key to handling variable-sized compute pools. AdamW is a popular adaptive learning rate algorithm that provides stability once initially tuned, especially when node pools are dynamic and resources can be terminated or introduced at any time. When scaling to a substantial number of GPUs, fault tolerance has to be taken into consideration to ensure the pool is utilised and training completes even with disruptions. There are many other mini-batch and learning rate scheduler hyper-parameters that are also crucial, but thankfully, with the recent addition of Hydra to deepspeech.pytorch, keeping track of these hyper-parameters is incredibly easy. In the future, deepspeech.pytorch will support various scaling hyper-parameter configurations for users to extend.
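As a small, hedged sketch of how such hyper-parameters can be tracked with Hydra and fed into AdamW: the config fields (optim.learning_rate, optim.weight_decay) and the stand-in model below are illustrative assumptions, not the actual deepspeech.pytorch config schema, and the script assumes a conf/config.yaml defining those fields.

```python
import hydra
import torch
from omegaconf import DictConfig


@hydra.main(config_path="conf", config_name="config")
def train(cfg: DictConfig) -> None:
    # Stand-in model for this sketch; deepspeech.pytorch builds its own network.
    model = torch.nn.Linear(161, 29)
    # AdamW's decoupled weight decay keeps learning-rate behaviour stable when
    # the number of workers (and hence the effective batch size) changes mid-run.
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=cfg.optim.learning_rate,
        weight_decay=cfg.optim.weight_decay,
    )
    # ... training loop, learning-rate scheduler and checkpointing go here ...


if __name__ == "__main__":
    train()
```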
Future Steps
As dataset sizes increase and research continues to show exciting ways to leverage unlabelled audio with new architectures inspired by NLP, scaling and throughput will be key to training speech-to-text models. Deepspeech.pytorch aims to be transparent and simple, and to allow users to build on and extend the library for their use cases.
Here are a few future directions for deepspeech.pytorch:
Integrate TorchScript to introduce seamless production integration across Python and C++ back-ends
Introduce Trains by Allegro AI for Visualisation and Job Scheduling
Benchmark DeepSpeech using the new A2 A100 VMs available on Google Cloud, for further throughput/cost benefits
Move towards abstracting the model, integrating recent advancements in model architectures such as ContextNet, and additional loss functions such as the ASG criterion
To get started with training your own DeepSpeech models using TorchElastic, have a look at the k8s template and the README. Feel free to reach out with any questions or create an issue here!
Source: https://medium.com/pytorch/training-deepspeech-using-torchelastic-ad013539682