Introduction to openSMILE

Published: 2025/4/5

Overview of the openSMILE software

--This section is reproduced from another author, with the notes reorganized.
--Source link: http://blog.sina.com.cn/s/blog_8d351dfc0102w85j.html

I. Introduction

1. About openSMILE

openSMILE is a tool that runs from the command line rather than through a graphical interface; it extracts features from audio according to a config file. It is now widely used by researchers and companies around the world.

openSMILE is applicable to speech recognition (feature extraction front-end, keyword spotting, etc.), affective computing (emotion recognition, affect-sensitive virtual agents, etc.), and Music Information Retrieval (chord labeling, beat tracking, onset detection, etc.). With the 2.0 open-source release, the developers target the wider multimedia community by including the popular OpenCV library for video processing and video feature extraction.

Figure 1. Block diagram of a speech recognition system and where openSMILE is applied

2. Input and output file formats

Data input: openSMILE can read data from the following file formats:

– RIFF-WAVE (PCM) (for MP3, MP4, OGG, etc. a converter needs to be used)

– Comma Separated Value (CSV)

– HTK parameter files (as produced by the HTK toolkit)

– WEKA's ARFF format

– Video streams via OpenCV

Data output: For writing data to files, the same formats as on the input side are supported, along with an additional binary matrix format:

– RIFF-WAVE (PCM uncompressed audio)

– Comma Separated Value (CSV)

– HTK parameter file

– WEKA ARFF file

– LibSVM feature file format

– Binary float matrix format

3. openSMILE supports the following four categories of feature-extraction operations:

1) Signal Processing: The following functionality is provided for general signal processing or signal pre-processing (prior to feature extraction):

– Windowing functions (Rectangular, Hamming, Hann (raised cosine), Gauss, Sine, Triangular, Bartlett, Bartlett-Hann, Blackman, Blackman-Harris, Lanczos)

– Pre-/De-emphasis (i.e. 1st order high/low-pass)

– Re-sampling (spectral domain algorithm)

– FFT (magnitude, phase, complex) and inverse

– Scaling of spectral axis via spline interpolation (open-source version only)

– dBA weighting of magnitude spectrum

– Autocorrelation function (ACF) (via IFFT of power spectrum)

– Average magnitude difference function (AMDF)
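As a rough illustration of two of the pre-processing steps listed above, the sketch below applies first-order pre-emphasis and a Hamming window to a single frame in plain Python. It uses the textbook definitions, y[n] = x[n] - k*x[n-1] and w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)); openSMILE's own implementation may differ in details, and the function names are hypothetical, not part of openSMILE.

```python
import math


def preemphasis(samples, k=0.97):
    """First-order high-pass filter: y[n] = x[n] - k * x[n-1], with y[0] = x[0]."""
    out = []
    prev = 0.0  # assume silence before the first sample
    for x in samples:
        out.append(x - k * prev)
        prev = x
    return out


def hamming(n):
    """Textbook symmetric Hamming window of length n."""
    if n < 1:
        raise ValueError("window length must be at least 1")
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(frame, window):
    """Multiply a frame by a window of the same length, element by element."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    return [x * w for x, w in zip(frame, window)]


# Example: pre-emphasize a short frame, then taper it with a Hamming window.
frame = [0.0, 0.5, 1.0, 0.5, 0.0]
windowed = apply_window(preemphasis(frame), hamming(len(frame)))
```

The value k = 0.97 is a commonly used pre-emphasis coefficient; it is a parameter, not a fixed property of the method.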

2) Data Processing: openSMILE can perform a number of operations for feature normalization, modification, and differentiation:

– Mean-Variance normalization (off-line and on-line)

– Range normalization (off-line and on-line)

– Delta-Regression coefficients (and simple differential)

– Weighted Differential

– Various vector operations: length, element-wise addition, multiplication, logarithm, and power

– Moving average filter for smoothing of contour over time
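To make two of these operations concrete, here is a minimal sketch of off-line mean-variance normalization and of delta regression coefficients for a single feature contour. The delta formula is the common HTK-style regression, d[t] = sum_n n*(c[t+n] - c[t-n]) / (2*sum_n n^2), with the contour edges padded by repetition; whether openSMILE pads the same way is an assumption, and the function names are hypothetical.

```python
import statistics


def mean_variance_normalize(contour):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(contour)
    std = statistics.pstdev(contour)
    if std == 0.0:
        # A constant contour has no variance to scale by.
        return [0.0 for _ in contour]
    return [(x - mean) / std for x in contour]


def delta_regression(contour, window=2):
    """HTK-style delta regression coefficients with edge values repeated."""
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    last = len(contour) - 1
    deltas = []
    for t in range(len(contour)):
        num = 0.0
        for n in range(1, window + 1):
            ahead = contour[min(t + n, last)]   # clamp at the right edge
            behind = contour[max(t - n, 0)]     # clamp at the left edge
            num += n * (ahead - behind)
        deltas.append(num / denom)
    return deltas


contour = [0.0, 1.0, 2.0, 3.0, 4.0]
normalized = mean_variance_normalize(contour)
deltas = delta_regression(contour)  # 1.0 in the middle of a unit-slope ramp
```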

3) Audio features (low-level): The following (audio specific) low-level descriptors can be computed by openSMILE:

– Frame Energy

– Frame Intensity / Loudness (approximation)

– Critical Band spectra (Mel/Bark/Octave, triangular masking filters)

– Mel-/Bark-Frequency-Cepstral Coefficients (MFCC)

– Auditory Spectra

– Loudness approximated from auditory spectra

– Perceptual Linear Predictive (PLP) Coefficients

– Perceptual Linear Predictive Cepstral Coefficients (PLP-CC)

– Linear Predictive Coefficients (LPC)

– Line Spectral Pairs (LSP, aka. LSF)

– Fundamental Frequency (via ACF/Cepstrum method and via Subharmonic-Summation (SHS))

– Probability of Voicing from ACF and SHS spectrum peak

– Voice-Quality: Jitter and Shimmer

– Formant frequencies and bandwidths

– Zero- and Mean-Crossing rate

– Spectral features (arbitrary band energies, roll-off points, centroid, entropy, maxpos, minpos, variance (= spread), skewness, kurtosis, slope)

– Psychoacoustic sharpness, spectral harmonicity

– CHROMA (octave warped semitone spectra) and CENS features (energy normalised and smoothed CHROMA)

– CHROMA-derived Features for Chord and Key recognition
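Two of the simplest descriptors in this list, frame energy and zero-crossing rate, can be sketched in a few lines. The definitions below (root-mean-square energy, and sign changes divided by the number of adjacent sample pairs) are common textbook choices and are assumptions here; openSMILE offers its own variants, and the function names are hypothetical.

```python
import math


def rms_energy(frame):
    """Root-mean-square energy of one frame of samples."""
    if not frame:
        raise ValueError("frame must not be empty")
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        raise ValueError("need at least two samples")
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


# A frame that flips sign at every sample crosses zero at every step.
frame = [3.0, -3.0, 3.0, -3.0]
energy = rms_energy(frame)
zcr = zero_crossing_rate(frame)
```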

4) Functionals: In order to map contours of audio and video low-level descriptors onto a vector of fixed dimensionality, the following functionals can be applied:

– Extreme values and positions

– Means (arithmetic, quadratic, geometric)

– Moments (standard deviation, variance, kurtosis, skewness)

– Percentiles and percentile ranges

– Regression (linear and quadratic approximation, regression error)

– Centroid

– Peaks

– Segments

– Sample values

– Times/durations

– Onsets/Offsets

– Discrete Cosine Transformation (DCT)

– Zero-Crossings

– Linear Predictive Coding (LPC) coefficients and gain
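The idea behind functionals, turning a contour of any length into a vector of fixed size, can be sketched with a handful of the statistics named above (extremes and their positions, arithmetic mean, standard deviation, linear regression slope). The selection, the dictionary keys, and the function names are illustrative assumptions, not openSMILE's actual output names.

```python
import statistics


def linear_slope(contour):
    """Least-squares slope of the contour against its frame index."""
    n = len(contour)
    mean_t = (n - 1) / 2.0
    mean_y = statistics.fmean(contour)
    num = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(contour))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den if den else 0.0


def functionals(contour):
    """Summarize a contour of any length as a fixed set of statistics."""
    indices = range(len(contour))
    max_pos = max(indices, key=contour.__getitem__)
    min_pos = min(indices, key=contour.__getitem__)
    return {
        "max": contour[max_pos],
        "maxPos": max_pos,
        "min": contour[min_pos],
        "minPos": min_pos,
        "mean": statistics.fmean(contour),
        "stddev": statistics.pstdev(contour),
        "slope": linear_slope(contour),
    }


# Contours of different lengths map to vectors with the same dimensionality.
short = functionals([5.0, 1.0])
longer = functionals([0.0, 1.0, 2.0, 3.0])
```

Because every input yields the same set of values, utterances of different durations can be fed to a classifier that expects fixed-length feature vectors.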

4. Config file format and how to run openSMILE

1) Config file format

Figure 2. Overview of openSMILE's component types and openSMILE's basic architecture

Figure 2 shows the overall data-flow architecture of openSMILE, where the data memory is the central link between all dataSource, dataProcessor, and dataSink components.

Figure 3. Incremental processing with ring-buffers. Partially filled buffers (left) and filled buffers with warped read/write pointers (right).

The ring-buffer based incremental processing is illustrated in Figure 3. Three levels are present in this setup: wave, frames, and pitch. A cWaveSource component writes samples to the 'wave' level. The write positions in the levels are indicated by a red arrow. A cFramer produces frames of size 3 from the wave samples (non-overlapping), and writes these frames to the 'frames' level. A cPitch component (a component with this name does not exist; it has been chosen here only for illustration purposes) extracts pitch features from the frames and writes them to the 'pitch' level. In Figure 3 (right) the buffers have been filled, and the write pointers have been warped. Data that lies more than 'buffersize' frames in the past has been overwritten.
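The overwrite behaviour described for Figure 3 can be mimicked with a small ring buffer. This is only a sketch of the general technique, assuming a fixed buffer size and absolute item indices; the class name RingLevel and its methods are hypothetical and do not correspond to openSMILE's C++ data memory classes.

```python
class RingLevel:
    """A fixed-size level whose oldest items are overwritten by new writes."""

    def __init__(self, buffersize):
        if buffersize < 1:
            raise ValueError("buffersize must be positive")
        self.buffersize = buffersize
        self.slots = [None] * buffersize
        self.write_pos = 0  # absolute index of the next item to be written

    def write(self, item):
        # The physical slot wraps around; the absolute index keeps growing.
        self.slots[self.write_pos % self.buffersize] = item
        self.write_pos += 1

    def read(self, index):
        if index < 0 or index >= self.write_pos:
            raise IndexError("item has not been written yet")
        if index < self.write_pos - self.buffersize:
            raise IndexError("item has been overwritten")
        return self.slots[index % self.buffersize]


# Write six samples into a level that only holds four.
wave = RingLevel(buffersize=4)
for sample in range(6):
    wave.write(sample)

newest = wave.read(5)  # still available
oldest = wave.read(2)  # oldest item that has not been overwritten
```

Reading index 0 or 1 at this point raises IndexError, which corresponds to data lying more than 'buffersize' items in the past.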

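2) Running openSMILE

openSMILE extracts audio features from the command line. The command format is:

SMILExtract -C config/demo/demo1_energy.conf -I wav_samples/speech01.wav -O speech01.energy.csv

Here -C specifies the config file used for feature extraction, -I the input data source, and -O the output feature file. In addition, running SMILExtract -h prints openSMILE's usage information and exits.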
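When many files have to be processed, the same command can be assembled and launched from a script. The sketch below only relies on the -C, -I and -O options described above and assumes that the SMILExtract executable is on the PATH; the helper names and the file paths in the example are hypothetical.

```python
import subprocess


def build_command(config, wav_in, features_out, executable="SMILExtract"):
    """Assemble the argument list for one SMILExtract call."""
    return [executable, "-C", config, "-I", wav_in, "-O", features_out]


def run_extract(config, wav_in, features_out):
    """Run SMILExtract once; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(config, wav_in, features_out), check=True)


# Hypothetical paths, for illustration only.
command = build_command("config/my_features.conf", "audio/clip01.wav", "clip01.csv")
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces. Calling run_extract with the same three arguments would start the actual extraction.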

3) Config file example

An example openSMILE config file is shown below:

; don't change the next line
[componentInstances:cComponentManager]
; configure the default data memory:
instance[dataMemory].type=cDataMemory
; configure an example data source (name = source1):
instance[source1].type=cWaveSource
instance[frame].type=cFramer
instance[pe].type=cVectorPreemphasis
……
// component configuration
; the following sections configure the components listed above
[source1:cWaveSource]
; the following sets the level this component writes to
; the level will be created by this component
; no other components may write to a level having the same name
writer.dmLevel=wave
filename=input.wav

[frame:cFramer]
reader.dmLevel=wave
writer.dmLevel=frames
frameSize=0.0250
frameStep=0.010

[pe:cVectorPreemphasis]
reader.dmLevel=frames
writer.dmLevel=framespe
k=0.97
de=0
……
// data output configuration
// ----- you might need to customize the arff output to suit your needs: ------
[arffsink:cArffSink]
reader.dmLevel=framespe
; do not print "frameIndex" attribute to ARFF file
frameIndex=0
frameTime=1
; name of output file as commandline option
filename=cm[arffout(O){output.arff}:name of WEKA Arff output file]
; name of @relation in the ARFF file
relation=cm[corpus{SMILEfeatures}:corpus name, arff relation]

; name of the current instance (usually file name of input wave file)
instanceName=cm[instname(N){noname}:name of arff instance]
;; name of class label
class[0].name=emotion
class[0].type=cm[classes{unknown}:all classes for arff file attribute]
target[0].all=cm[classlabel(a){unknown}:instance class label]
; append to an existing file, so multiple calls of SMILExtract on different
; input files append to the same output ARFF file
append=1

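The simple example above shows how a config file is written: each component is declared as an instance and then configured in its own section, with reader and writer levels chaining the components together. To extract different audio features, edit the config file accordingly; the parameters of each feature extractor can be adjusted as needed.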
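To get a feel for the cFramer settings in the example, the following sketch converts frameSize = 0.0250 and frameStep = 0.010 (both in seconds) into sample counts and estimates how many full frames one second of audio yields. The 16 kHz sample rate is an assumption, not something the config states, and the treatment of a trailing partial frame (dropped here) may differ from what openSMILE does.

```python
def frame_layout(num_samples, sample_rate, frame_size_s, frame_step_s):
    """Return (frame size, frame step, number of full frames) in samples."""
    size = round(frame_size_s * sample_rate)
    step = round(frame_step_s * sample_rate)
    if num_samples < size:
        return size, step, 0
    # One frame at offset 0, then one more for every full step that still fits.
    count = 1 + (num_samples - size) // step
    return size, step, count


# One second of audio at an assumed 16 kHz sample rate.
size, step, count = frame_layout(16000, 16000, 0.0250, 0.010)
```

With these settings each frame covers 400 samples and starts 160 samples after the previous one, so consecutive frames overlap by 15 ms.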

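5. Further notes

openSMILE is open-source software written entirely in C++, and it can be used to analyze many kinds of time-series data. If your data calls for it, you can modify the openSMILE source code and build your own executable to process that data.

openSMILE is an effective tool for audio feature extraction. With a tool like this available, we can concentrate on finding our own research contribution instead of being limited to writing a feature extractor, and we can identify more quickly the points that deserve closer study. In any field it pays to make good use of existing tools in development and research: standing on the shoulders of giants is more likely to lead to success than working in isolation.

Note: more information about openSMILE is available on the official site, http://openSMILE.sourceforge.net/, where openSMILE_book_2.0-rc1.pdf can be downloaded.

openSMILE development site: http://audeering.com/research/opensmile/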
