WAV Audio File Format Analysis
I. Audio File
- /usr/share/sounds/deepin/stereo/desktop-login.wav
II. File Information
    syli@syli-PC:~/work/repo/Demo/pa$ soxi desktop-login.wav
    Input File     : 'desktop-login.wav'
    Channels       : 2
    Sample Rate    : 44100
    Precision      : 16-bit
    Duration       : 00:00:07.00 = 308700 samples = 525 CDDA sectors
    File Size      : 1.23M
    Bit Rate       : 1.41M
    Sample Encoding: 16-bit Signed Integer PCM

    syli@syli-PC:~/work/repo/Demo/pa$ ls -al desktop-login.wav
    -rw-r--r-- 1 root root 1234878 6月 14 14:53 desktop-login.wav

    syli@syli-PC:~/work/repo/Demo/pa$ file desktop-login.wav
    desktop-login.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, stereo 44100 Hz
III. Analysis and Verification
1. Audio duration
    duration = samples / (sample rate)
    7 s = 308700 samples / 44100
2. File size
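Here "1 sample" means one sample for every channel (one frame), which is also what soxi counts as "samples".
    1 sample = (Sample Encoding) * Channels / 8 bit
    1 sample = 16 (bit depth) * 2 / 8 (bits per byte) = 4 (bytes)
    size = (1 sample size) * samples
    size = 4 * 308700 = 1,234,800 (bytes)
    Total file size = 1,234,878 (bytes)
    Non-audio data size = 1,234,878 - 1,234,800 = 78 (bytes)
3. Bit rate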
    Bit Rate = (Sample Rate) * (1 sample size in bits) (bit/s)
             = (Sample Rate) * (Sample Encoding) * Channels (bit/s)
    1.41M = 44100 * 16 * 2 / 1000 / 1000 (Mbit/s)
4. Header data
- View the hexadecimal data
    hexdump -C desktop-login.wav
- 16-bit stereo example
| Bytes | Value | Description |
| --- | --- | --- |
| 1 - 4 | "RIFF" | Marks the file as a RIFF file. Characters are each 1 byte long. Always 0x52494646. |
| 5 - 8 | File size (integer) | Chunk size: size of the overall file minus 8 bytes, in bytes (32-bit integer), i.e. the number of bytes from the next address to the end of the file. Typically, you'd fill this in after creation. Everything from offset 0x08 to the end of the file is the content of the "RIFF" chunk, which contains the "fmt " and "data" subchunks. Here 0x0012d7b6 = 1,234,870 = total file size - 8. |
| 9 - 12 | "WAVE" | File type header (form type). For our purposes, it always equals "WAVE". |
| 13 - 16 | "fmt " | Format chunk marker (0x666D7420). Note the trailing space. |
| 17 - 20 | 16 | Subchunk size: length in bytes of the format data that follows. |
| 21 - 22 | 1 | Audio format, 2-byte integer. 1 means PCM (uncompressed). |
| 23 - 24 | 2 | Number of channels, 2-byte integer. Here: 2. |
| 25 - 28 | 44100 | Sample rate, 32-bit integer. Common values are 44100 (CD) and 48000 (DAT). Sample rate = number of samples per second, in Hz. Here 0xAC44 = 44100. |
| 29 - 32 | 176400 | Byte rate: number of data bytes per second, SampleRate * Channels * BitsPerSample / 8. Here 0x02B110 = 176400. |
| 33 - 34 | 4 | Bytes needed for one sample across all channels: BitsPerSample * Channels / 8. |
| 35 - 36 | 16 | Bits per sample (bit depth of a single sample); may be 8, 16, or 32. |
| 37 - 40 | "data" | "data" chunk header. Marks the beginning of the data section. The bytes 0x64 61 74 61 spell the string "data". |
| 41 - 44 | File size (data) | Subchunk size: size of the data section. Here 0x0012d770 = 1,234,800. |
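The table lists values such as 0x0012d7b6, while the hexdump shows the same field as the little-endian byte sequence b6 d7 12 00. The following is a minimal sketch of decoding the canonical 44-byte layout from the table; the struct and function names (`wav_header_t`, `wav_parse_header44`) are illustrative and not part of any library, and the sketch assumes the "data" subchunk starts exactly at byte 37.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of the canonical 44-byte PCM WAV header (hypothetical struct). */
typedef struct {
    uint32_t riff_size;        /* bytes 5-8: file size minus 8 */
    uint32_t fmt_size;         /* bytes 17-20: 16 for plain PCM */
    uint16_t audio_format;     /* bytes 21-22: 1 = PCM */
    uint16_t channels;         /* bytes 23-24 */
    uint32_t sample_rate;      /* bytes 25-28 */
    uint32_t byte_rate;        /* bytes 29-32 */
    uint16_t block_align;      /* bytes 33-34: bytes per frame */
    uint16_t bits_per_sample;  /* bytes 35-36 */
    uint32_t data_size;        /* bytes 41-44 */
} wav_header_t;

/* Decode little-endian integers byte by byte, independent of host order. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the four markers are not where the
 * canonical layout puts them. */
int wav_parse_header44(const unsigned char buf[44], wav_header_t *h)
{
    assert(buf != NULL && h != NULL);
    if (memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0 ||
        memcmp(buf + 12, "fmt ", 4) != 0 || memcmp(buf + 36, "data", 4) != 0)
        return -1;
    h->riff_size       = le32(buf + 4);
    h->fmt_size        = le32(buf + 16);
    h->audio_format    = le16(buf + 20);
    h->channels        = le16(buf + 22);
    h->sample_rate     = le32(buf + 24);
    h->byte_rate       = le32(buf + 28);
    h->block_align     = le16(buf + 32);
    h->bits_per_sample = le16(buf + 34);
    h->data_size       = le32(buf + 40);
    return 0;
}
```

Feeding it the first 44 bytes of desktop-login.wav should reproduce the values in the table, for example a sample rate of 44100 and a data size of 1,234,800.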
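- If the fmt SubChunk Size equals 0x10 (16), the header carries no extra information and the WAV header is 44 bytes long. If it equals 0x12 (18), extra information is present and the header is longer than 44 bytes.
- When the WAV header carries extra information, the fmt SubChunk Size is 18 and the fmt subchunk is followed by another subchunk holding custom additional information; the "data" subchunk comes only after that.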
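Because the header length depends on the fmt size and on any extra subchunks, a reader cannot rely on the PCM data starting at offset 44. A hedged sketch of the usual alternative, walking the subchunks until "data" is found, is shown below; `wav_find_data` is a hypothetical helper name, and the sketch assumes the start of the file is already in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the RIFF subchunks and return the offset of the first PCM byte,
 * or -1 if no "data" subchunk header lies inside buf[0..len).
 * On success *data_size receives the size stored in the "data" header. */
long wav_find_data(const unsigned char *buf, size_t len, uint32_t *data_size)
{
    assert(buf != NULL && data_size != NULL);
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;

    size_t pos = 12;                      /* first subchunk after "WAVE" */
    while (len - pos >= 8) {
        uint32_t size = rd_le32(buf + pos + 4);
        if (memcmp(buf + pos, "data", 4) == 0) {
            *data_size = size;
            return (long)(pos + 8);
        }
        if (size > len - pos - 8)         /* subchunk body is truncated */
            return -1;
        pos += 8 + (size_t)size;
        if ((size & 1u) && pos < len)     /* bodies are padded to even length */
            pos++;
    }
    return -1;
}
```

With a 16-byte fmt subchunk directly followed by "data" this returns 44; with an 18-byte fmt subchunk it returns 46, matching the two cases described above.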
5. PCM data
    pcm size = (bytes per sample) * samples
             = ((Sample Encoding) * Channels / 8 bits) * samples
             = 16 * 2 / 8 * 308700
             = 1,234,800 bytes
6. End-of-file layout
    77176 0012d770  ff ff 00 00 01 00 02 00  01 00 01 00 ff ff 00 00  |................|
    77177 0012d780  00 00 01 00 00 00 ff ff  ff ff ff ff 02 00 02 00  |................|
    77178 0012d790  01 00 ff ff ff ff 00 00  00 00 00 00 4c 49 53 54  |............LIST|
    77179 0012d7a0  1a 00 00 00 49 4e 46 4f  49 53 46 54 0e 00 00 00  |....INFOISFT....|
    77180 0012d7b0  4c 61 76 66 35 36 2e 34  30 2e 31 30 31 00        |Lavf56.40.101.|
    77181 0012d7be
Computing the file size:
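    77180 * 16 - 2 = 1,234,878
Here 77180 is the decimal line number of the last hexdump row; every row holds 16 bytes and the last row is 2 bytes short. The final offset 0x0012d7be gives the same value, 1,234,878.
    PCM audio data size = 1,234,878 - 44 (header) - 34 (trailer) = 1,234,800
Lavf56.40.101: this shows that the audio file was written by ffmpeg. Lavf stands for libavformat, one of ffmpeg's components, and the digits that follow are its version number.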
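The arithmetic in this section can be cross-checked with a few small helpers. This is only a sketch with made-up function names; the one extra fact it uses is that a RIFF chunk occupies an 8-byte header plus its body, so the trailing LIST chunk with size 0x1a seen in the hexdump accounts for the 34 trailer bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one frame (one sample for every channel). */
uint32_t wav_frame_bytes(uint16_t channels, uint16_t bits_per_sample)
{
    return (uint32_t)channels * bits_per_sample / 8;
}

/* Size of the PCM payload for a given number of frames. */
uint64_t wav_data_bytes(uint64_t frames, uint16_t channels,
                        uint16_t bits_per_sample)
{
    return frames * wav_frame_bytes(channels, bits_per_sample);
}

/* Playback time in seconds. */
double wav_duration_s(uint64_t frames, uint32_t sample_rate)
{
    assert(sample_rate > 0);
    return (double)frames / (double)sample_rate;
}

/* Bit rate in bit/s. */
uint64_t wav_bit_rate(uint32_t sample_rate, uint16_t channels,
                      uint16_t bits_per_sample)
{
    return (uint64_t)sample_rate * channels * bits_per_sample;
}

/* Bytes a RIFF chunk occupies on disk: 8-byte header, body,
 * and one pad byte when the body length is odd. */
uint64_t riff_chunk_total(uint32_t body_size)
{
    return 8u + (uint64_t)body_size + (body_size & 1u);
}
```

For this file, `wav_data_bytes(308700, 2, 16)` gives 1,234,800, `wav_bit_rate(44100, 2, 16)` gives 1,411,200 bit/s (the 1.41M reported by soxi), and 44 + 1,234,800 + `riff_chunk_total(0x1a)` adds up to the 1,234,878 bytes reported by ls.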
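IV. Basic Audio Concepts
- PCM (Pulse Code Modulation): a method for digitally representing sampled analog signals. It is the standard form of digital audio in computers, compact discs, digital telephony, and other digital audio applications. In a PCM stream, the amplitude of the analog signal is sampled at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps.
- channel: the number of audio channels; common layouts are mono, stereo, and surround.
- sample: one sampling of the signal. "Sample bits" usually means the number of bits of one sample on one channel (commonly 8, 16, 24, or 32 bits).
- rate: the sampling rate, i.e. the number of samplings per second, counted in frames.
- frame: one frame is the set of samples of all channels at one sampling instant, i.e. frame = channels * (sample bits).
- Interleaved: a way of laying out audio data. In interleaved mode the data is stored as consecutive frames: the left-channel and right-channel samples of frame 1 are recorded first (assuming stereo), then frame 2 follows. In non-interleaved mode, all left-channel samples of a period are recorded first, followed by the right-channel samples, so the data is stored channel by channel. Interleaved mode is used in most cases.
- period: each time the hardware buffer has room for period size frames, the hardware raises an interrupt to tell the ALSA driver to write data to the hardware.
- Period size: the number of frames of audio data handled per hardware interrupt. Reads and writes on an audio device are measured in frames.
- buffer size: the size of the data buffer, which is made up of several periods. buffer size = period size * periods, where periods is the number of hardware interrupts needed to process one full buffer.
- xrun: when a sound card period elapses, an interrupt tells the ALSA driver that data must be filled in or read out. However, ALSA reads and writes happen only when the user calls writei and readi; ALSA does not buffer data on its own. If the upper layer does not call writei or readi in time, the result is an overrun (during capture, the buffer fills up before the data is read out) or an underrun (data is needed for playback, but the ALSA driver has nothing to write). Both are collectively called xrun.
- softvol: softvol is an Advanced Linux Sound Architecture (ALSA) plugin that adds a software volume control to the ALSA mixer (alsamixer). This is useful when the sound card has no hardware volume control. The plugin is built into ALSA and does not need to be installed separately. Another use case is when the hardware volume control cannot amplify the sound beyond a certain threshold, leaving audio files too quiet. In that case a software amplifier can be created to raise the volume at the cost of some quality.
- UCM: the ALSA Use Case Manager describes how to set up the mixer for particular use cases (such as "play audio" or "call"). It also describes how to modify the mixer state to route audio to certain outputs and inputs, and how to control those devices.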
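The frame, period, buffer, and interleaving definitions above reduce to a little arithmetic and one index mapping. The sketch below uses made-up helper names and plain C only; it does not call ALSA, and the 1024-frame period used in the note afterwards is just an example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes in one frame: one sample for every channel. */
size_t pcm_frame_bytes(unsigned channels, unsigned sample_bits)
{
    return (size_t)channels * sample_bits / 8;
}

/* buffer size = period size * periods (both counted in frames). */
size_t pcm_buffer_frames(size_t period_frames, unsigned periods)
{
    return period_frames * periods;
}

/* Time covered by a number of frames, in microseconds. */
uint64_t pcm_frames_to_us(size_t frames, unsigned rate)
{
    assert(rate > 0);
    return (uint64_t)frames * 1000000u / rate;
}

/* Convert interleaved 16-bit samples (L R L R ...) to the non-interleaved
 * layout (all samples of channel 0, then all of channel 1, ...).
 * out must have room for frames * channels samples. */
void pcm_deinterleave_s16(const int16_t *in, size_t frames,
                          unsigned channels, int16_t *out)
{
    assert(in != NULL && out != NULL);
    for (size_t f = 0; f < frames; f++)
        for (unsigned c = 0; c < channels; c++)
            out[(size_t)c * frames + f] = in[f * channels + c];
}
```

For example, a period of 1024 frames at 44100 Hz lasts roughly 23 ms, and four such periods give a buffer of 4096 frames.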
Frame calculation example:
Here is an alternative example for the above discussion. Say we want to work with a stereo, 16-bit, 44.1 kHz stream, one-way (meaning, either in playback or in capture direction). Then we have:
- 'stereo' = number of channels: 2
- 1 analog sample is represented with 16 bits = 2 bytes
- 1 frame represents 1 analog sample from all channels; here we have 2 channels, and so:
    1 frame = (num_channels) * (1 sample in bytes) = (2 channels) * (2 bytes (16 bits) per sample) = 4 bytes (32 bits)
- To sustain 2x 44.1 kHz analog rate, the system must be capable of this data transfer rate, in Bytes/sec:
    Bps_rate = (num_channels) * (1 sample in bytes) * (analog_rate) = (1 frame) * (analog_rate) = (2 channels) * (2 bytes/sample) * (44100 samples/sec) = 2*2*44100 = 176400 Bytes/sec
V. Minimal Playback Demo
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>            /* usleep */
    #include <alsa/asoundlib.h>    /* standard install path of the ALSA header */

    #define MESSAGE(format, ...) \
        printf("[%s][%s][%d]: " format "\n", __FILE__, __FUNCTION__, __LINE__, ##__VA_ARGS__)

    static snd_output_t *alsa_log;     /* renamed from "log" to avoid clashing with log() */
    static unsigned buffer_time = 0;
    static unsigned period_time = 0;
    static int start_delay = 0;
    static int stop_delay = 0;

    void dump_hw_params(snd_pcm_t *handle, snd_pcm_hw_params_t *params, snd_output_t *out)
    {
        fprintf(stderr, "Params of device \"%s\":\n", snd_pcm_name(handle));
        fprintf(stderr, "--------------------\n");
        snd_pcm_hw_params_dump(params, out);
        fprintf(stderr, "--------------------\n");
    }

    snd_pcm_t *device_create(void)
    {
        int ret = -1;                           /* return value */
        const char *hw_name = "default";        /* sound card device name */
        int direction = 0;
        unsigned int channel = 2;
        unsigned int sample_rate = 44100;       /* set_rate_near expects unsigned int * */
        snd_pcm_t *handle;                      /* PCM device handle */
        snd_pcm_hw_params_t *hw_params = NULL;  /* hardware info and PCM stream configuration */

        /* step 1: open the PCM device; a last argument of 0 means the standard mode */
        ret = snd_pcm_open(&handle, hw_name, SND_PCM_STREAM_PLAYBACK, 0);
        if (ret < 0) {
            /* ALSA returns negative error codes and does not set errno */
            fprintf(stderr, "snd_pcm_open: %s\n", snd_strerror(ret));
            return NULL;
        }
        MESSAGE();

        /* step 2: allocate the snd_pcm_hw_params_t structure */
        ret = snd_pcm_hw_params_malloc(&hw_params);
        if (ret < 0) {
            fprintf(stderr, "snd_pcm_hw_params_malloc: %s\n", snd_strerror(ret));
            goto failed;
        }
        MESSAGE();

        /* step 3: initialize hw_params with the full configuration space */
        ret = snd_pcm_hw_params_any(handle, hw_params);
        if (ret < 0) {
            fprintf(stderr, "snd_pcm_hw_params_any: %s\n", snd_strerror(ret));
            goto failed;
        }
        MESSAGE();

        /* step 4: set the access type (snd_pcm_readi/snd_pcm_writei access) */
        ret = snd_pcm_hw_params_set_access(handle, hw_params, SND_PCM_ACCESS_RW_INTERLEAVED);
        if (ret < 0) {
            fprintf(stderr, "snd_pcm_hw_params_set_access: %s\n", snd_strerror(ret));
            goto failed;
        }
        MESSAGE();

        /* step 5: set the sample format to SND_PCM_FORMAT_S16_LE */
        ret = snd_pcm_hw_params_set_format(handle, hw_params, SND_PCM_FORMAT_S16_LE);
        if (ret < 0) {
            fprintf(stderr, "snd_pcm_hw_params_set_format: %s\n", snd_strerror(ret));
            goto failed;
        }
        MESSAGE();

        /* step 6: set the sample rate; if the hardware does not support the
         * requested rate, the nearest supported one is used */
        ret = snd_pcm_hw_params_set_rate_near(handle, hw_params, &sample_rate, &direction);
        if (ret < 0) {
            fprintf(stderr, "snd_pcm_hw_params_set_rate_near: %s\n", snd_strerror(ret));
            goto failed;
        }
        MESSAGE();

        /* step 7: set the number of channels */
        ret = snd_pcm_hw_params_set_channels(handle, hw_params, channel);
        if (ret < 0) {
            fprintf(stderr, "snd_pcm_hw_params_set_channels: %s\n", snd_strerror(ret));
            goto failed;
        }
        MESSAGE();

        /* get the buffer time */
        ret = snd_pcm_hw_params_get_buffer_time_max(hw_params, &buffer_time, 0);
        MESSAGE("buffer_time:%u", buffer_time);
        if (buffer_time > 500000)
            buffer_time = 500000;

        /* calc period time */
        if (buffer_time > 0)
            period_time = buffer_time / 4;
        MESSAGE("period time:%u", period_time);

        /* set period time */
        if (period_time > 0)
            ret = snd_pcm_hw_params_set_period_time_near(handle, hw_params, &period_time, 0);
        MESSAGE("period time:%u", period_time);
        MESSAGE("buffer time:%u", buffer_time);

        /* set buffer time */
        if (buffer_time > 0)
            ret = snd_pcm_hw_params_set_buffer_time_near(handle, hw_params, &buffer_time, 0);
        MESSAGE("buffer time:%u", buffer_time);

        /* step 8: apply hw_params to the device */
        ret = snd_pcm_hw_params(handle, hw_params);
        if (ret < 0) {
            fprintf(stderr, "snd_pcm_hw_params: %s\n", snd_strerror(ret));
            goto failed;
        }
        MESSAGE();

        /* for debug info */
        dump_hw_params(handle, hw_params, alsa_log);

    #if 0
        /* soft params */
        snd_pcm_uframes_t n;
        snd_pcm_uframes_t chunk_size = 1024;
        snd_pcm_uframes_t buffer_size = 0;
        snd_pcm_uframes_t start_threshold, stop_threshold;
        snd_pcm_sw_params_t *swparams;

        snd_pcm_hw_params_get_period_size(hw_params, &chunk_size, 0);
        snd_pcm_hw_params_get_buffer_size(hw_params, &buffer_size);
        snd_pcm_sw_params_alloca(&swparams);
        snd_pcm_sw_params_current(handle, swparams);

        n = chunk_size;
        ret = snd_pcm_sw_params_set_avail_min(handle, swparams, n);

        n = buffer_size;
        start_threshold = n + (double) sample_rate * start_delay / 1000000;
        if (start_threshold < 1)
            start_threshold = 1;
        if (start_threshold > n)
            start_threshold = n;
        ret = snd_pcm_sw_params_set_start_threshold(handle, swparams, start_threshold);

        stop_threshold = buffer_size + (double) sample_rate * stop_delay / 1000000;
        ret = snd_pcm_sw_params_set_stop_threshold(handle, swparams, stop_threshold);
        ret = snd_pcm_sw_params(handle, swparams);

        /* for debug info */
        snd_pcm_sw_params_dump(swparams, alsa_log);
    #endif

        snd_pcm_hw_params_free(hw_params);
        return handle;

    failed:
        if (hw_params != NULL)
            snd_pcm_hw_params_free(hw_params);
        snd_pcm_close(handle);
        return NULL;
    }

    void device_play(snd_pcm_t *pcm_handle, FILE *fp)
    {
        const size_t size = 5512;       /* bytes per read = 1378 frames of 4 bytes */
        char *buffer = malloc(size);

        if (buffer == NULL) {
            perror("malloc");
            return;
        }
        MESSAGE("size=%zu", size);

        /* Note: the file is fed to the device from its first byte, so a WAV
         * header, if present, is played as a few frames of noise. */
        for (;;) {
            size_t nread = fread(buffer, 1, size, fp);
            if (nread == 0) {
                fprintf(stderr, "end of file on input\n");
                break;
            }

            /* the last read may be short, so derive the frame count from it */
            snd_pcm_uframes_t frames = nread / 4;
            if (frames == 0)
                break;

            /* step 9: write the audio data to the PCM device.
             * The parentheses matter: without them ret would receive the
             * result of the "< 0" comparison instead of the return value. */
            snd_pcm_sframes_t written;
            while ((written = snd_pcm_writei(pcm_handle, buffer, frames)) < 0) {
                usleep(2000);
                if (written == -EPIPE) {
                    /* EPIPE means underrun */
                    fprintf(stderr, "underrun occurred\n");
                    /* prepare the device so that it is ready to use again */
                    snd_pcm_prepare(pcm_handle);
                    MESSAGE();
                } else {
                    fprintf(stderr, "error from writei: %s\n", snd_strerror((int) written));
                    goto done;
                }
            }
        }

    done:
        free(buffer);
        MESSAGE();
    }

    void device_destroy(snd_pcm_t *pcm_handle)
    {
        /* step 10: drain pending frames, then close the PCM device handle */
        snd_pcm_drain(pcm_handle);
        snd_pcm_close(pcm_handle);
        MESSAGE();
    }

    int main(int argc, char *argv[])
    {
        FILE *fp;
        snd_pcm_t *handle;      /* PCM device handle, see pcm.h */

        if (argc != 2) {
            printf("error: play [music name]\n");
            return -1;
        }

        fp = fopen(argv[1], "rb");
        if (fp == NULL) {
            perror("fopen");
            return -1;
        }

        snd_output_stdio_attach(&alsa_log, stderr, 0);

        handle = device_create();
        if (handle == NULL) {
            snd_output_close(alsa_log);
            fclose(fp);
            return -1;
        }

        device_play(handle, fp);
        device_destroy(handle);

        snd_output_close(alsa_log);
        fclose(fp);
        return 0;
    }
Appendix 1: References
- WAV header format description
- https://docs.fileformat.com/audio/wav/
- https://juejin.cn/post/6844904051964903431