Generating Chirps with Neural Networks
The sound of birdsong is varied, beautiful, and relaxing. In the pre-Covid times, I made a focus timer which would play some recorded bird sounds during breaks, and I always wondered whether such sounds could be generated. After some trial and error, I landed on a proof-of-concept architecture which can successfully reproduce a single chirp and has parameters which can be adjusted to alter the generated sound.
Since generating bird sounds seems like a somewhat novel application, I think it is worth sharing this approach. Along the way, I also learned how to take TensorFlow models apart and graft parts of them together. The code blocks below show how this is done. The full code can be found here.
The approach in theory
The generator will be composed of two parts. The first part will take the entire sound and encode key pieces of information about its overall shape in a small number of parameters.
The second part will take a small bit of sound, along with the information about the overall shape, and predict the next little bit of sound.
The second part can be called iteratively on itself with adjusted parameters to produce an entirely new chirp!
Encoding the parameters
An autoencoder structure is used for deriving the key parameters of the sound. This structure takes the entire soundwave and reduces it, through a series of (encoding) layers, down to a small number of components (the waist), before reproducing the sound in full from a series of expanding (decoding) layers. Once trained, the autoencoder model is cut off at the waist so that all it does is reduce the full sound down to the key parameters.
For the proof of concept, a single chirp was used; this chirp:
Soundwave representation of the employed chirp. It comes from the Cornell Guide to Bird Sounds: Essential Set for North America, the same set used for the Bird Sounds Chrome Experiment.
One problem with using just a single sound is that the autoencoder might simply hide all the information about the sound in the biases of the decoding layers, leaving the waist with all zero weights. To mitigate this, the sound was morphed during training by altering its amplitude and shifting it around a little.
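Such an augmentation step might look like the sketch below. The exact scale range and maximum shift are my assumptions for illustration; the article does not specify them.

```python
import numpy as np

def augment_chirp(wave, rng):
    """Morph the training chirp: rescale its amplitude and shift it in time.

    The scale range (0.6-1.0) and maximum shift (50 samples) are assumed
    values, not the ones used in the original code.
    """
    scaled = wave * rng.uniform(0.6, 1.0)   # alter the amplitude
    shift = int(rng.integers(-50, 51))      # shift it around a little
    return np.roll(scaled, shift)

rng = np.random.default_rng(0)
chirp = np.sin(np.linspace(0, 100, 3000))   # stand-in for the real chirp
batch = np.stack([augment_chirp(chirp, rng) for _ in range(8)])
print(batch.shape)  # (8, 3000)
```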
The encoder portion of the autoencoder consists of a series of convolutional layers which compress a 3000-ish long sound wave down to around 20 numbers, hopefully retaining important information along the way. Since sounds are composed of many different sine waves, allowing many convolutional filters of different sizes to pass over the sound can in theory capture key information about the composite waves. A waist size of 20 was chosen mainly because this seems like a somewhat surmountable number of adjustable parameters.
In this first approach, the layers are stacked sequentially. In a future version, it may be advantageous to use a structure akin to inception-net blocks to run convolutions of different sizes in parallel.
The decoder portion of the model consists of two dense layers, one of length 400, and one of length 3000 — the same length as the input sound. The activation function of the final layer is tanh, as the sound wave representations have values between -1 and 1.
Here is what this looks like visualized:
Made with PlotNeuralNet.

And here is the code:
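The original code block did not survive the page conversion, so here is a hedged reconstruction in Keras. The input length (3000), waist size (20), decoder sizes (400 and 3000), and the tanh output come from the text above; the convolutional filter counts, kernel sizes, and strides are my assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

SOUND_LEN = 3000   # "3000-ish" samples per chirp
WAIST = 20         # number of encoded parameters

# Encoder: stacked 1-D convolutions squeezing the wave down to 20 numbers.
inp = layers.Input(shape=(SOUND_LEN, 1))
x = layers.Conv1D(16, 64, strides=4, padding="same", activation="relu")(inp)
x = layers.Conv1D(32, 32, strides=4, padding="same", activation="relu")(x)
x = layers.Conv1D(64, 16, strides=4, padding="same", activation="relu")(x)
x = layers.Flatten()(x)
waist = layers.Dense(WAIST, activation="relu", name="waist")(x)

# Decoder: two dense layers, 400 then 3000, with tanh on the output
# because the wave values lie between -1 and 1.
x = layers.Dense(400, activation="relu")(waist)
out = layers.Dense(SOUND_LEN, activation="tanh")(x)

autoencoder = tf.keras.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")

# Untrained sanity check: the model maps a sound back to a same-length sound.
recon = autoencoder.predict(np.zeros((2, SOUND_LEN, 1), dtype="float32"), verbose=0)
print(recon.shape)  # (2, 3000)
```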
Training the Generator
The structure of the generator begins with the encoding portion of the autoencoder network. The output at the waist is combined with some fresh input representing the bit of the sound wave immediately preceding that which is to be predicted. In this case, the previous 200 values of the sound wave are used as input, and the next 10 are predicted.
The combined inputs are fed into a series of dense layers. The sequential dense layers allow the network to learn the relationship between the previous values, information on the overall shape of the sound, and the following values. The final dense layer is of length 10 and activated with a tanh function.
Here is what this network looks like:
Made with PlotNeuralNet.

The layers coming from the autoencoder network are frozen so that additional training resources are not spent on them.
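A sketch of how this generator might be assembled and its borrowed layers frozen. In the real code the encoder layers come from the trained autoencoder; here a small stand-in encoder is built inline, and the hidden dense sizes are assumptions. The 200-value window and 10-value prediction are from the text above.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

SOUND_LEN, WAIST, WINDOW, STEP = 3000, 20, 200, 10

# Stand-in for the trained encoder half of the autoencoder.
enc_in = layers.Input(shape=(SOUND_LEN, 1))
x = layers.Conv1D(16, 64, strides=8, padding="same", activation="relu")(enc_in)
x = layers.Flatten()(x)
waist = layers.Dense(WAIST, activation="relu", name="waist")(x)
encoder = tf.keras.Model(enc_in, waist)

# Freeze the borrowed layers so no training resources are spent on them.
encoder.trainable = False

# Fresh input: the 200 wave values immediately preceding the prediction.
prev = layers.Input(shape=(WINDOW,))
combined = layers.Concatenate()([waist, prev])
h = layers.Dense(128, activation="relu")(combined)  # hidden sizes assumed
h = layers.Dense(64, activation="relu")(h)
nxt = layers.Dense(STEP, activation="tanh")(h)      # next 10 values in [-1, 1]

generator = tf.keras.Model([enc_in, prev], nxt)
generator.compile(optimizer="adam", loss="mse")

out = generator.predict(
    [np.zeros((2, SOUND_LEN, 1), "float32"), np.zeros((2, WINDOW), "float32")],
    verbose=0)
print(out.shape)  # (2, 10)
```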
Generating some sounds
Training this network takes only a couple of minutes as the data is not very varied and therefore relatively easy to learn, particularly for the autoencoder network. One final flourish is to produce two new networks from the trained models.
The first is simply the encoder portion of the autoencoder, but now separated. We need this part to produce some initial good parameters.
The second model is the same as the generator network, but with the parts from the autoencoder network replaced with a new input source. This is done so that the trained generator no longer requires the entire soundwave as input, but only the encoded parameters capturing the key information about the sound. With these separated out as a new input, we can freely manipulate them when generating chirps.
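One way to perform this graft, sketched with the same assumed layer sizes as before: rebuild the prediction head with a plain `Input` in place of the encoder output, then copy the trained weights across layer by layer.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

WAIST, WINDOW, STEP = 20, 200, 10

# The encoded parameters now arrive through an ordinary input instead of
# being computed from the full sound wave by the encoder.
params = layers.Input(shape=(WAIST,), name="encoded_params")
prev = layers.Input(shape=(WINDOW,), name="previous_window")

combined = layers.Concatenate()([params, prev])
h = layers.Dense(128, activation="relu", name="dense_1")(combined)
h = layers.Dense(64, activation="relu", name="dense_2")(h)
nxt = layers.Dense(STEP, activation="tanh", name="dense_out")(h)

standalone = tf.keras.Model([params, prev], nxt)

# In the real code, the weights would then be copied over from the
# trained generator's matching layers, e.g.:
# for name in ("dense_1", "dense_2", "dense_out"):
#     standalone.get_layer(name).set_weights(
#         trained_generator.get_layer(name).get_weights())

out = standalone.predict(
    [np.zeros((1, WAIST), "float32"), np.zeros((1, WINDOW), "float32")],
    verbose=0)
print(out.shape)  # (1, 10)
```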
The following sounds were generated without modifying the parameters; they are very close to the original sound, but are not perfect reproductions. The generator network is only able to reach an accuracy of between 60% and 70%, so some variability is to be expected.
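The iterative generation then only needs the parameter vector and a rolling window of the last 200 samples, extending the wave 10 samples at a time. The sketch below uses a trivial stand-in for the trained model so it runs on its own; in the real code the call would be `standalone.predict([params, window])`.

```python
import numpy as np

WINDOW, STEP, SOUND_LEN = 200, 10, 3000

def predict_next(params, window):
    """Stand-in for the trained model: maps 20 parameters and the last
    200 samples to the next 10 samples, all within [-1, 1]."""
    smoothed = np.convolve(window, np.ones(STEP) / STEP, mode="valid")[:STEP]
    return np.tanh(smoothed + 0.01 * params.sum())

params = np.full(20, 0.5)         # encoded parameters (dummy values here)
wave = list(np.zeros(WINDOW))     # seed window of silence
while len(wave) < WINDOW + SOUND_LEN:
    window = np.asarray(wave[-WINDOW:])
    wave.extend(predict_next(params, window))  # append 10 new samples
wave = np.asarray(wave[WINDOW:])  # drop the seed
print(wave.shape)  # (3000,)
```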
Sounds generated without modifying the encoded parameters.

Modifying the parameters
The advantage of generating bird sounds is in part that new variations on a theme can be produced. This can be done by modifying the parameters produced by the encoder network. In the above case, the encoder produced these parameters:
Not all of the 20 nodes produced non-zero parameters, but there are enough of them to experiment with. There is a lot of complexity to be explored with 12 adjustable parameters that all can be adjusted to arbitrary degrees in both directions. Since this is a proof of concept, it will suffice to present some choice sounds generated by adjusting just a single parameter in each case:
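In code, such a tweak is just scaling one entry of the encoded vector before generation. The values below are hypothetical, arranged so that 12 of the 20 waist nodes are non-zero as described above; the real values come from the separated encoder.

```python
import numpy as np

# Hypothetical encoded parameters: 12 of the 20 waist nodes non-zero.
# The actual values would come from the separated encoder model.
params = np.array([0.8, 0.0, 1.3, 0.0, 0.4, 2.1, 0.0, 0.9, 0.0, 1.7,
                   0.6, 0.0, 1.1, 0.3, 0.0, 0.0, 2.4, 0.5, 0.0, 1.0])

variant = params.copy()
variant[5] *= 1.5   # adjust a single parameter; feeding `variant` to the
                    # standalone generator would yield an altered chirp
```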
Sounds generated after modifying one of the encoded parameters in each case.

Here are the soundwave representations of the three examples:
Soundwave representation of the generated chirps.

Next Steps
It seems that generating bird sounds using a neural network is possible, although it remains to be seen how practicable it is. The above approach uses just a single sound, so a natural next step would be to attempt to train the model on multiple different sounds. It is not clear from the outset that this would work. However, if the model as constructed fails on multiple sounds, it would still be possible to train different models on different sounds and simply stack them to produce different sounds.
A larger problem is that not all produced sounds are viable, particularly when modifying the parameters. A fair share of produced sounds are more akin to computer beeps than bird song. Some sound like an angry computer that really doesn’t want you to do what you just tried to do. One way to mitigate this would be to train a separate model to detect bird sounds (perhaps along these lines), and use that to reject or accept generated output.
Computational costs are also a constraint with the current approach; generating a chirp takes an order of magnitude longer than playing the sound, which is not ideal if the idea is to generate beautiful soundscapes on the fly. The main mitigation which comes to mind here is to increase the length of each prediction, possibly at the cost of accuracy. One could also, of course, simply spend the time to pre-generate acceptable soundscapes.
Conclusion
An autoencoder network and a short-term prediction network can be grafted together to produce a bird sound generator with adjustable parameters, which can be manipulated to create new and interesting bird sounds.
As with many projects, part of the motivation is to learn in the process. In particular, I did not know how to pull apart trained models and graft parts of them together. The models used above can be used as an example to guide other learners who want to experiment with such approaches.
Original article: https://towardsdatascience.com/generating-chirps-with-neural-networks-41628e72efb2