python pcm decibels: detecting silence in PCM voice files
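A PCM file stores the raw audio waveform as a binary stream, with no file header.
(1) First determine the sample size, i.e. the number of bits per sample, which is usually 8 or 16.
(2) Then determine whether the audio is stereo or mono. In stereo data the samples of the two channels are interleaved, so each channel's samples must be extracted separately.
(3) Then determine whether the samples are signed. For example, a signed 16-bit sample ranges from -32768 to 32767.
(4) Determine the byte order: big-endian or little-endian. This is the order in which the file was written (often the native order of the recording machine), not necessarily the order of the machine reading it. For details see http://blog.csdn.net/u013378306/article/details/78904238
(5) Using these four parameters, parse the PCM file and convert it into a file of decimal numbers.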
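Steps (1) to (5) can be sketched in Python, assuming the parameters are already known. The helper name `decode_pcm` and its arguments are illustrative rather than from the original. It uses `int.from_bytes` to handle sample size, signedness and byte order, and it assumes stereo data is interleaved frame by frame (L R L R ...).

```python
def decode_pcm(data, sample_bits=16, channels=1, signed=True, big_endian=False):
    """Decode raw PCM bytes into one list of integer samples per channel.

    Hypothetical helper: sample_bits must be a multiple of 8 (e.g. 8 or 16).
    Channels are assumed to be interleaved frame by frame (L R L R ...).
    """
    width = sample_bits // 8                  # bytes per sample
    frame = width * channels                  # bytes per frame (one sample per channel)
    order = "big" if big_endian else "little"
    result = [[] for _ in range(channels)]
    usable = len(data) - len(data) % frame    # drop an incomplete trailing frame
    for start in range(0, usable, frame):
        for ch in range(channels):
            off = start + ch * width
            result[ch].append(int.from_bytes(data[off:off + width], order, signed=signed))
    return result
```

For this article's file (16-bit, mono, signed, little-endian), `decode_pcm(open("3.pcm", "rb").read())[0]` would give the sample list. For 8-bit unsigned data, pass `sample_bits=8, signed=False`.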
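Note: on Windows you can determine (1) to (3) by playing the PCM file in Cool Edit with those parameters set. Alternatively, you can test them with the following Java code.
The audio in this example is 1 second of silence, then someone saying "你好" ("hello"), then two seconds of silence. The PCM file can be downloaded from http://download.csdn.net/download/u013378306/10175068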
package test;

import java.io.File;
import java.nio.file.Files;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;

public class test {
    /**
     * Plays a raw PCM file with the given format so you can check by ear whether the parameters are right.
     * @throws Exception if the file cannot be read or no audio line is available
     */
    public static void main(String[] args) throws Exception {
        File file = new File("3.pcm");
        System.out.println(file.length());
        byte[] audioData = Files.readAllBytes(file.toPath());  // read the whole file

        float sampleRate = 20000;     // sampleRate: samples per second
        int sampleSizeInBits = 16;    // sampleSizeInBits: bits per sample
        int channels = 1;             // channels: 1 for mono, 2 for stereo
        boolean signed = true;        // signed: whether the samples are signed or unsigned
        boolean bigEndian = false;    // bigEndian: byte order within a sample (false = little-endian)

        AudioFormat af = new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, af, audioData.length);
        SourceDataLine sdl = (SourceDataLine) AudioSystem.getLine(info);
        sdl.open(af);
        sdl.start();

        // write() may accept fewer bytes than requested, so loop until everything is written.
        int offset = 0;
        while (offset < audioData.length) {
            offset += sdl.write(audioData, offset, audioData.length - offset);
        }
        sdl.drain();  // wait for playback to finish
        sdl.close();
    }
}
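Once the test confirms the parameters, you can parse the PCM file. The following Java code converts a mono PCM file with 16-bit little-endian samples into a file of decimal values.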
package test;

import java.io.File;
import java.io.FileWriter;
import java.nio.file.Files;

public class ffff {
    /**
     * Parses 16-bit, little-endian, mono PCM into a space-separated decimal text file.
     * @param args unused
     */
    public static void main(String[] args) {
        try {
            File file = new File("3.pcm");
            System.out.println(file.length());
            byte[] buffers = Files.readAllBytes(file.toPath());

            StringBuilder rs = new StringBuilder();
            // Two bytes per sample; stop before a trailing odd byte.
            for (int i = 0; i + 1 < buffers.length; i += 2) {
                // Little-endian: low byte first, high byte second.
                // The high byte keeps its sign; the low byte is masked to 0..255.
                short s = (short) ((buffers[i + 1] << 8) | (buffers[i] & 0xff));
                rs.append(' ').append(s);
            }
            writeFile(rs.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void writeFile(String s) {
        try (FileWriter fw = new FileWriter("hello3.txt")) {
            fw.write(s);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
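Here is a rough Python equivalent of the Java converter above that uses the standard `struct` module. The function name `pcm16le_to_text` is hypothetical. It assumes the same 16-bit, mono, little-endian format and writes the same space-separated layout, so the plotting script below can read its output.

```python
import struct

def pcm16le_to_text(pcm_path, txt_path):
    """Convert 16-bit little-endian mono PCM into space-separated decimal text.

    Returns the number of samples written. A trailing odd byte is ignored.
    """
    with open(pcm_path, "rb") as f:
        data = f.read()
    count = len(data) // 2
    # "<" = little-endian, "h" = signed 16-bit integer
    samples = struct.unpack("<%dh" % count, data[:count * 2])
    with open(txt_path, "w") as f:
        # A leading space before each value matches the Java version's " " + s output.
        f.write("".join(" %d" % s for s in samples))
    return count
```

Calling `pcm16le_to_text("3.pcm", "hello3.txt")` would produce the file used in the next step.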
After it runs, open hello3.txt. At the start the amplitudes are small, almost all below 100 in absolute value:
-15 -12 -18 -24 -17 -8 -8 -17 -22 -14 -5 -18 -47 -67 -60 -41 -28 -28 -23 -12 -6 -9 -13 -8 0 6 21 49 68 48 -2 -43 -47 -32 -22 -10 22 56
While "你好" is being spoken, the amplitude becomes much larger:
-2507 -2585 -2600 -2596 -2620 -2670 -2703 -2674 -2581 -2468 -2378 -2305 -2200 -2018 -1774 -1523 -1307 -1127 -962 -806 -652 -505 -384 -313 -281 -241 -163
Then, during the two seconds of silence, the amplitude becomes small again:
5 3 0 -4 -5 -6 -6 -7 -7 -8 -9 -8 -10 -10 -11 -10 -11 -11 -11 -11 -11 -11 -10 -9 -7 -6 -3 -2 -2 -3 -3 -3 -1 2 4 4
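To put numbers on the contrast, here is a small Python check over the three excerpts quoted above. The values are copied verbatim, and the names `quiet_start`, `speech` and `quiet_end` are just labels. It prints the peak absolute amplitude of each excerpt.

```python
quiet_start = [-15, -12, -18, -24, -17, -8, -8, -17, -22, -14, -5, -18, -47, -67,
               -60, -41, -28, -28, -23, -12, -6, -9, -13, -8, 0, 6, 21, 49, 68,
               48, -2, -43, -47, -32, -22, -10, 22, 56]
speech = [-2507, -2585, -2600, -2596, -2620, -2670, -2703, -2674, -2581, -2468,
          -2378, -2305, -2200, -2018, -1774, -1523, -1307, -1127, -962, -806,
          -652, -505, -384, -313, -281, -241, -163]
quiet_end = [5, 3, 0, -4, -5, -6, -6, -7, -7, -8, -9, -8, -10, -10, -11, -10, -11,
             -11, -11, -11, -11, -11, -10, -9, -7, -6, -3, -2, -2, -3, -3, -3, -1,
             2, 4, 4]

def peak(samples):
    """Largest absolute amplitude in a list of samples."""
    return max(abs(s) for s in samples)

for name, part in [("quiet_start", quiet_start), ("speech", speech), ("quiet_end", quiet_end)]:
    print(name, peak(part))
# quiet_start 68
# speech 2703
# quiet_end 11
```

Any threshold between these peaks would separate speech from silence in this recording. The right value for other recordings is an assumption that has to be tuned.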
The waveform can be displayed with the following Python code:
import matplotlib.pyplot as plt

# Open hello3.txt in text mode ("r"). The code in the original article used "rb"
# (binary mode), which is wrong for this text file.
with open("hello3.txt", "r") as file:
    text = file.read()

# Samples are separated by spaces; split() with no argument also skips empty tokens.
ays = [int(y) for y in text.split()]
axs = list(range(1, len(ays) + 1))  # x axis: sample index, starting at 1

plt.plot(axs, ays, "ro")  # red dots: amplitude of each sample
plt.show()                # show the plot on the screen
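This produces the waveform plot.
The amplitudes here agree with what Audacity shows. The plot simply stretches them vertically so that they are easier to see by eye.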
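Update, 2019-11-20:
From later practice: silence can be detected per unit of time, by checking the amplitude of the data within each time span.
(My math isn't great, so I'll just use single letters as symbols.)
Let the time unit be t and the audio sample rate S. If the amplitude (or decibel level) stays small throughout a continuous span of length t, the audio can be treated as silence (no sound being recorded).
The data to check has length L = S*t, so the target is an array of length L. If the amplitude (decibel) values in this span are all below a threshold, treat it as approximately silent.
Example: with a sample rate of 16000, treat 2 seconds without sound as no input. That is, every value in an array of 2*16000 samples is below the threshold.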
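The window rule above (L = S*t, with every sample below a threshold) can be sketched in Python as follows. The name `is_silent` and the example threshold of 100 are hypothetical. The threshold needs tuning for each microphone and recording.

```python
def is_silent(samples, sample_rate, window_seconds, threshold):
    """Return True if the last L = S * t samples all satisfy |x| < threshold.

    samples: decoded PCM samples (ints), most recent last.
    threshold: hypothetical amplitude threshold; tune it for your recording.
    """
    window = int(sample_rate * window_seconds)  # L = S * t
    if window <= 0 or len(samples) < window:
        return False  # not enough audio yet to decide
    return all(abs(s) < threshold for s in samples[-window:])
```

With S = 16000 and t = 2, this checks the last 32000 samples. The per-sample comparison could be replaced by one of the RMS or decibel measures described later.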
Stack Overflow answer:
How can I detect silence when a recording operation is started in Java?
Calculate the dB or RMS value for a group of sound frames and decide at what level it is considered to be 'silence'.
What is PCM data?
Data that is in Pulse-code modulation format.
How can I calculate PCM data in Java?
I do not understand that question. But guessing it has something to do with the speech-recognition tag, I have some bad news.
This might theoretically be done using the Java Speech API. But there are apparently no 'speech to text' implementations available for the API (only 'text to speech').
I have to calculate RMS for a speech-recognition project, but I do not know how to calculate it in Java.
For a single channel that is represented by signal sizes in a double ranging from -1 to 1, you might use this method.
/** Computes the RMS volume of a group of signal sizes ranging from -1 to 1. */
public double volumeRMS(double[] raw) {
    double sum = 0d;
    if (raw.length == 0) {
        return sum;
    } else {
        for (int ii = 0; ii < raw.length; ii++) {
            sum += raw[ii];
        }
    }
    double average = sum / raw.length;

    // Subtracting the mean removes any DC offset, so this is the RMS of the AC part.
    double sumMeanSquare = 0d;
    for (int ii = 0; ii < raw.length; ii++) {
        sumMeanSquare += Math.pow(raw[ii] - average, 2d);
    }
    double averageMeanSquare = sumMeanSquare / raw.length;
    double rootMeanSquare = Math.sqrt(averageMeanSquare);

    return rootMeanSquare;
}
There is a byte buffer to save input values from the line. What should I do with this buffer?
If using the volumeRMS(double[]) method, convert the byte values to an array of double values ranging from -1 to 1. ;)
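The conversion the answer describes can be sketched in Python. The sketch assumes 16-bit little-endian mono bytes, as in this article's file, and divides by 32768 so that -32768 maps to -1.0. Both function names are hypothetical. `volume_rms` repeats the computation of `volumeRMS`: the RMS after subtracting the mean.

```python
import math
import struct

def pcm16le_to_doubles(data):
    """Convert signed 16-bit little-endian mono PCM bytes to floats in [-1, 1)."""
    count = len(data) // 2                         # ignore a trailing odd byte
    samples = struct.unpack("<%dh" % count, data[:count * 2])
    return [s / 32768.0 for s in samples]

def volume_rms(raw):
    """Same computation as volumeRMS: RMS after removing the mean (DC offset)."""
    if not raw:
        return 0.0
    average = sum(raw) / len(raw)
    mean_square = sum((x - average) ** 2 for x in raw) / len(raw)
    return math.sqrt(mean_square)

buf = struct.pack("<4h", 16384, -16384, 16384, -16384)
print(volume_rms(pcm16le_to_doubles(buf)))  # 0.5
```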
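My own approach is to compute the decibel level of the audio; see the article "Calculating decibels from PCM audio data" (通過pcm音頻數據計算分貝).
Many applications need to display the live sound level in decibels. Below are three algorithms for computing it. (The examples assume 16-bit samples, i.e. each sample is a short whose maximum value is 32767.)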
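1: Computing a level from the magnitude of the audio data
First sum the absolute values of all samples and divide by the number of samples to get the average energy.
Then scale it proportionally so that 32767 maps to 100, which gives a quantized value from 0 to 100.
Human voices usually fall in the lower part of that range, so the quantized values would mostly land in a small band of about 1 to 20 and would not respond noticeably to changes.
So we amplify the value 5 times and clamp any result above 100 to 100. (Note that this is a linear scale, not true decibels.)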
#include <stdlib.h>

#define VOLUMEMAX 32767

// Parameters: PCM data and the number of samples.
// Returns: a level from 0 to 100 (linear, amplified 5x, clamped at 100).
int SimpleCalculate_DB(short* pcmData, int sample)
{
    int ret = 0;
    if (sample > 0) {
        long long sum = 0;  // 64-bit so long buffers cannot overflow the sum
        for (int i = 0; i < sample; i++) {
            sum += abs(pcmData[i]);
        }
        ret = (int)(sum * 500.0 / ((double)sample * VOLUMEMAX));
        if (ret >= 100) {
            ret = 100;
        }
    }
    return ret;
}
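Here is a Python port of method 1 that lets you try it on the decoded samples from earlier. The name `simple_calculate_db` mirrors the C function. Like the C version, it returns a linear 0 to 100 level and not real decibels.

```python
VOLUMEMAX = 32767

def simple_calculate_db(pcm_data):
    """Port of SimpleCalculate_DB: mean |sample| scaled to 0..100 with 5x gain."""
    if not pcm_data:
        return 0
    total = sum(abs(s) for s in pcm_data)
    ret = int(total * 500.0 / (len(pcm_data) * VOLUMEMAX))  # truncates like the C cast
    return min(ret, 100)                                    # clamp at 100

print(simple_calculate_db([3277] * 4))  # 50
```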
2: Computing the root mean square (RMS), i.e. the energy level
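#include <cmath>
#include <cstddef>
#include <cstdint>

static const float kMaxSquaredLevel = 32768.f * 32768.f;
constexpr float kMinLevel = 30.f;

// Returns the RMS level as a positive number of dB below full scale (0 = loudest, 30 = quietest).
int Process(const int16_t* data, size_t length)
{
    if (length == 0) {
        return static_cast<int>(kMinLevel);  // no data: report the quietest level
    }
    float sum_square_ = 0;
    size_t sample_count_ = 0;
    for (size_t i = 0; i < length; ++i) {
        sum_square_ += data[i] * data[i];
    }
    sample_count_ += length;
    float rms = sum_square_ / (sample_count_ * kMaxSquaredLevel);
    // 20log_10(x^0.5) = 10log_10(x)
    rms = 10 * std::log10(rms);  // log10(0) is -inf, which the clamp below handles
    if (rms < -kMinLevel)
        rms = -kMinLevel;
    rms = -rms;
    return static_cast<int>(rms + 0.5);
}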
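The same RMS level in Python is a sketch that ports `Process` with the same constants. The name `rms_level` is hypothetical. One difference is needed: Python's `math.log10(0)` raises an error instead of returning -inf, so all-zero input is handled explicitly. The result is the number of dB below full scale, clamped to 0 to 30, so larger numbers mean quieter audio.

```python
import math

K_MAX_SQUARED_LEVEL = 32768.0 * 32768.0
K_MIN_LEVEL = 30.0  # anything quieter than -30 dBFS is reported as 30

def rms_level(samples):
    """Port of Process(): RMS level as positive dB below full scale, 0..30."""
    if not samples:
        return int(K_MIN_LEVEL)
    sum_square = sum(s * s for s in samples)
    ratio = sum_square / (len(samples) * K_MAX_SQUARED_LEVEL)
    if ratio <= 0:
        return int(K_MIN_LEVEL)      # all zeros: log10(0) would raise in Python
    db = 10 * math.log10(ratio)      # 20*log10(sqrt(x)) == 10*log10(x)
    db = max(db, -K_MIN_LEVEL)       # clamp at -30 dB
    return int(-db + 0.5)            # round to the nearest whole dB
```

For silence detection you might treat a level at or near 30 as silent. That cut-off is an assumption to be tuned.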
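3: Get the maximum amplitude of the audio data (the largest absolute value, 0 to 32767) and divide it by 1000 to get an index from 0 to 32. Then look up the level stored at that index in an array. (Extracted from WebRTC.)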
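#include <cstddef>
#include <cstdint>
#include <cstdlib>

const int8_t permutation[33] = {0,1,2,3,4,4,5,5,5,5,6,6,6,6,6,7,7,7,7,8,8,8,9,9,9,9,9,9,9,9,9,9,9};

// Largest absolute sample value, clamped to 32767 (|-32768| does not fit in int16_t).
int16_t WebRtcSpl_MaxAbsValueW16C(const int16_t* vector, size_t length)
{
    int maximum = 0;
    for (size_t i = 0; i < length; i++) {
        int absolute = abs((int)vector[i]);
        if (absolute > maximum) {
            maximum = absolute;
        }
    }
    if (maximum > 32767) {
        maximum = 32767;
    }
    return (int16_t)maximum;
}

// Call once per audio frame; returns the current level 0..9.
int8_t ComputeLevel(const int16_t* data, size_t length)
{
    // static: this state must persist across calls (WebRTC keeps it in class members).
    // As plain locals they would reset on every call and the level would never update.
    static int16_t _absMax = 0;
    static int16_t _count = 0;
    static int8_t _currentLevel = 0;

    int16_t absValue = WebRtcSpl_MaxAbsValueW16C(data, length);
    if (absValue > _absMax)
        _absMax = absValue;
    if (_count++ == 10) {  // refresh the level on every 11th call
        _count = 0;
        int32_t position = _absMax / 1000;  // 0..32
        if ((position == 0) && (_absMax > 250)) {
            position = 1;  // only stay at 0 when the peak is very small
        }
        _currentLevel = permutation[position];
        _absMax >>= 2;  // decay the running peak (divide by 4)
    }
    return _currentLevel;
}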
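The same peak-based level can be sketched in Python, with the state kept in an object so it persists between frames. The class name `PeakLevelMeter` is hypothetical. Feed it one frame of samples at a time: with the default `update_every=10` the level refreshes on every 11th frame, which works out to roughly every 110 ms for 10 ms frames.

```python
PERMUTATION = [0, 1, 2, 3, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7,
               8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

class PeakLevelMeter:
    """Sketch of ComputeLevel(): maps the recent peak amplitude to a 0..9 level."""

    def __init__(self, update_every=10):
        self.abs_max = 0
        self.count = 0
        self.current_level = 0
        self.update_every = update_every

    def process(self, frame):
        # Peak absolute value, clamped to 32767 like WebRtcSpl_MaxAbsValueW16C.
        peak = min(max((abs(s) for s in frame), default=0), 32767)
        self.abs_max = max(self.abs_max, peak)
        if self.count == self.update_every:
            self.count = 0
            position = self.abs_max // 1000          # 0..32
            if position == 0 and self.abs_max > 250:
                position = 1
            self.current_level = PERMUTATION[position]
            self.abs_max >>= 2                       # decay the running peak
        else:
            self.count += 1
        return self.current_level
```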