
Reading WAV Audio Files

Published: 2023/12/14


http://bigsec.net/b52/scipydoc/wave_pyaudio.html

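Python can read and write WAV files with the standard wave module. Real-time audio input and output requires installing pyAudio, and pyMedia can be used to decode and play MP3.

WAV is an audio file format developed by Microsoft and IBM. It is usually used to store uncompressed audio data (Pulse Code Modulation, PCM). A WAV file has three important parameters: the number of channels, the sampling rate, and the bit depth.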

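Number of channels: mono or stereo.
Sampling rate: the number of samples taken from the audio signal per second. Common values are 8 kHz, 16 kHz, 32 kHz, 48 kHz, 11.025 kHz, 22.05 kHz, and 44.1 kHz.
Bit depth (quantization): the number of bits used to represent one sample, typically 8, 16, 24, or 32 bits. Audio stored on a CD is stereo, 44.1 kHz, 16 bit.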
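As a rough worked example, the three parameters together determine how much raw PCM data one second of audio occupies. The sketch below applies this to the CD format mentioned above; pcm_bytes_per_second is an illustrative helper name, not part of any library.

```python
def pcm_bytes_per_second(nchannels, framerate, bits_per_sample):
    """Raw (uncompressed) PCM data rate in bytes per second."""
    bytes_per_sample = bits_per_sample // 8
    return nchannels * framerate * bytes_per_sample


# CD audio: stereo, 44.1 kHz, 16 bit
cd_rate = pcm_bytes_per_second(nchannels=2, framerate=44100, bits_per_sample=16)
print(cd_rate)  # 176400
```

So one second of CD-quality audio takes 176400 bytes before any file header is added.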

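If you need to record and edit audio files yourself, Audacity is recommended. It is an open-source, cross-platform, multi-track recording and editing program. In my work I record audio with Audacity and then export it as a WAV file for Python programs to process.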

1. C:\Windows\media

    (base) yongqiang@yongqiang:~$ cd /mnt/f/yongqiang_work/
    (base) yongqiang@yongqiang:/mnt/f/yongqiang_work$ ll
    total 260
    drwxrwxrwx 1 yongqiang yongqiang   4096 Jun  4 00:47 ./
    drwxrwxrwx 1 yongqiang yongqiang   4096 Jun  3 22:11 ../
    -rwxrwxrwx 1 yongqiang yongqiang 191788 Sep 15  2018 Windows_Ding.wav*
    -rwxrwxrwx 1 yongqiang yongqiang  70060 Sep 15  2018 ding.wav*
    (base) yongqiang@yongqiang:/mnt/f/yongqiang_work$
    (base) yongqiang@yongqiang:/mnt/f/yongqiang_work$ pwd
    /mnt/f/yongqiang_work
    (base) yongqiang@yongqiang:/mnt/f/yongqiang_work$

2. Reading a WAV audio file

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # yongqiang cheng

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import wave
    import numpy as np

    # WAV file
    audio_file = "/mnt/f/yongqiang_work/ding.wav"
    object = wave.open(audio_file, "rb")

    # (nchannels, sampwidth, framerate, nframes, comptype, compname)
    params = object.getparams()
    nchannels, sampwidth, framerate, nframes, comptype, compname = params[:6]
    print("nchannels =", nchannels)
    print("sampwidth =", sampwidth)
    print("framerate =", framerate)
    print("nframes =", nframes)
    print("comptype =", comptype)
    print("compname =", compname)

    # Returns number of audio channels (1 for mono, 2 for stereo).
    print("object.getnchannels() =", object.getnchannels())

    # Returns sample width in bytes.
    print("object.getsampwidth() =", object.getsampwidth())

    # Returns sampling frequency.
    print("object.getframerate() =", object.getframerate())

    # Returns number of audio frames.
    print("object.getnframes() =", object.getnframes())

    # Returns compression type ('NONE' is the only supported type).
    print("object.getcomptype() =", object.getcomptype())

    # Human-readable version of getcomptype(). Usually 'not compressed' parallels 'NONE'.
    print("object.getcompname() =", object.getcompname())

    # Reads and returns at most n frames of audio, as a bytes object.
    str_data = object.readframes(nframes)
    object.close()

    /home/yongqiang/miniconda3/envs/tf_cpu_1.4.1/bin/python /home/yongqiang/pycharm_work/yongqiang.py
    nchannels = 2
    sampwidth = 2
    framerate = 44100
    nframes = 17504
    comptype = NONE
    compname = not compressed
    object.getnchannels() = 2
    object.getsampwidth() = 2
    object.getframerate() = 44100
    object.getnframes() = 17504
    object.getcomptype() = NONE
    object.getcompname() = not compressed

    Process finished with exit code 0

3. Reading a WAV audio file and plotting the waveform

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # yongqiang cheng

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import wave
    import numpy as np
    import matplotlib.pyplot as plt

    # WAV file
    audio_file = "/mnt/f/yongqiang_work/ding.wav"
    object = wave.open(audio_file, "rb")

    # (nchannels, sampwidth, framerate, nframes, comptype, compname)
    params = object.getparams()
    nchannels, sampwidth, framerate, nframes, comptype, compname = params[:6]
    print("nchannels =", nchannels)
    print("sampwidth =", sampwidth)
    print("framerate =", framerate)
    print("nframes =", nframes)
    print("comptype =", comptype)
    print("compname =", compname)

    # Returns number of audio channels (1 for mono, 2 for stereo).
    print("object.getnchannels() =", object.getnchannels())

    # Returns sample width in bytes.
    print("object.getsampwidth() =", object.getsampwidth())

    # Returns sampling frequency.
    print("object.getframerate() =", object.getframerate())

    # Returns number of audio frames.
    print("object.getnframes() =", object.getnframes())

    # Returns compression type ('NONE' is the only supported type).
    print("object.getcomptype() =", object.getcomptype())

    # Human-readable version of getcomptype(). Usually 'not compressed' parallels 'NONE'.
    print("object.getcompname() =", object.getcompname())

    # Reads and returns at most n frames of audio, as a bytes object.
    str_data = object.readframes(nframes)
    # nframes = 17504, channels = 2, sampwidth = 2
    # str_data (bytes: 70016) = nframes * channels * sampwidth = 17504 * 2 * 2 = 70016
    object.close()

    # np.frombuffer replaces np.fromstring, which is deprecated for binary data.
    wave_data = np.frombuffer(str_data, dtype=np.short)
    wave_data.shape = -1, 2
    wave_data = wave_data.T
    time = np.arange(0, nframes) * (1.0 / framerate)

    plt.subplot(211)
    plt.plot(time, wave_data[0])
    plt.xlabel("left channel - time (seconds)")
    plt.subplot(212)
    plt.plot(time, wave_data[1], c="g")
    plt.xlabel("right channel - time (seconds)")
    plt.show()

    /home/yongqiang/miniconda3/envs/pt-1.4_py-3.6/bin/python /home/yongqiang/pycharm_work/yongqiang.py
    nchannels = 2
    sampwidth = 2
    framerate = 44100
    nframes = 17504
    comptype = NONE
    compname = not compressed
    object.getnchannels() = 2
    object.getsampwidth() = 2
    object.getframerate() = 44100
    object.getnframes() = 17504
    object.getcomptype() = NONE
    object.getcompname() = not compressed

    Process finished with exit code 0

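Call wave.open to open the WAV file. Note that the file must be opened with the "rb" (binary read) mode:

    audio_file = "/mnt/f/yongqiang_work/ding.wav"
    object = wave.open(audio_file, "rb")

open returns an instance of the Wave_read class. Its methods read the format information and the data of the WAV file: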

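getparams: returns all of the format information of the WAV file at once, as a tuple: number of channels, sample width (in bytes), sampling rate, number of frames, compression type, and a description of the compression type. The wave module supports only uncompressed data, so the last two items can be ignored.

    # (nchannels, sampwidth, framerate, nframes, comptype, compname)
    params = object.getparams()
    nchannels, sampwidth, framerate, nframes, comptype, compname = params[:6]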

The methods getnchannels, getsampwidth, getframerate, getnframes, getcomptype, and getcompname each return one specific piece of this information on its own.
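The relationship between getparams and the individual getters can be checked with a small self-contained sketch. To avoid depending on a file on disk, it first writes a tiny WAV file into an in-memory buffer; the buffer, the four frames of silence, and the variable names are assumptions made only for this illustration.

```python
import io
import wave

# Build a tiny stereo, 16-bit, 44.1 kHz WAV in memory (4 frames of silence).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(2)
    writer.setsampwidth(2)        # bytes per sample
    writer.setframerate(44100)
    writer.writeframes(b"\x00" * (4 * 2 * 2))  # 4 frames * 2 channels * 2 bytes

buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    params = reader.getparams()
    # Collect the individual getters in the same order as the tuple fields.
    single_values = (
        reader.getnchannels(),
        reader.getsampwidth(),
        reader.getframerate(),
        reader.getnframes(),
        reader.getcomptype(),
        reader.getcompname(),
    )

print(tuple(params) == single_values)  # True
print(params.nchannels, params.sampwidth, params.framerate, params.nframes)  # 2 2 44100 4
```

In Python 3 the value returned by getparams is a named tuple, so the fields can be read by name as well as by position.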

readframes: reads the audio data. Its argument specifies how many frames to read, and it returns the binary data as a bytes object (in Python 2, an ordinary str).

    # Reads and returns at most n frames of audio, as a bytes object.
    str_data = object.readframes(nframes)
    # nframes = 17504, channels = 2, sampwidth = 2
    # str_data (bytes: 70016) = nframes * channels * sampwidth = 17504 * 2 * 2 = 70016
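The "at most n frames" behavior and the byte-count formula can be seen in a short sketch. It uses an in-memory WAV with 10 frames of silence, which is an assumption for illustration only and not the ding.wav file used in this article.

```python
import io
import wave

nchannels, sampwidth, framerate, nframes = 2, 2, 44100, 10

# Write 10 stereo 16-bit frames of silence into an in-memory WAV.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(nchannels)
    writer.setsampwidth(sampwidth)
    writer.setframerate(framerate)
    writer.writeframes(b"\x00" * (nframes * nchannels * sampwidth))

buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    first = reader.readframes(4)    # 4 frames requested, 4 available
    rest = reader.readframes(100)   # 100 requested, only 6 remain
    empty = reader.readframes(1)    # nothing left to read

# Each frame occupies nchannels * sampwidth = 4 bytes.
print(len(first), len(rest), len(empty))  # 16 24 0
```

Each call continues where the previous one stopped, so a large file can be processed in chunks instead of being read in one call.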

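Next, the binary data has to be converted into an array that can be used for computation, according to the number of channels and the sample width:

    wave_data = np.frombuffer(str_data, dtype=np.short)

np.frombuffer (the replacement for np.fromstring, which is deprecated for binary input) converts the bytes into an array, and its dtype argument specifies the element type. Because each sample in our file takes two bytes, the short type is used. At this point wave_data is a one-dimensional array of short values, but because the file is stereo, it consists of alternating left and right samples: LRLRLRLR....LR (L is a left-channel sample, R is a right-channel sample). After changing the shape of wave_data: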

    wave_data.shape = -1, 2

each row holds one frame (left sample, right sample). Transposing it gives one row per channel:

    wave_data = wave_data.T
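The de-interleaving steps above can be traced on a handful of hand-made samples. This is a minimal sketch: the eight sample values are invented for illustration, and reshape is used here as an equivalent of assigning to shape.

```python
import struct

import numpy as np

# Four stereo frames: left samples 1..4, right samples -1..-4.
# "<8h" packs eight little-endian 16-bit integers in L R L R ... order.
raw = struct.pack("<8h", 1, -1, 2, -2, 3, -3, 4, -4)

samples = np.frombuffer(raw, dtype="<i2")  # 1-D array: L R L R L R L R
frames = samples.reshape(-1, 2)            # one row per frame: [L, R]
left, right = frames.T                     # one row per channel

print(left.tolist())   # [1, 2, 3, 4]
print(right.tolist())  # [-1, -2, -3, -4]
```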

Finally, the time of each sample is computed from the number of frames and the sampling rate:

    time = np.arange(0, nframes) * (1.0 / framerate)
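As a quick check of the time axis, the sketch below plugs in the values reported for ding.wav (framerate 44100, nframes 17504) and also derives the total duration of the clip. The variable duration is introduced here only for illustration.

```python
import numpy as np

framerate = 44100  # samples per second, as reported for ding.wav
nframes = 17504    # number of frames, as reported for ding.wav

time = np.arange(0, nframes) * (1.0 / framerate)  # time[i]: instant of frame i
duration = nframes / framerate                    # total length in seconds

print(len(time))           # 17504
print(round(duration, 3))  # 0.397
```

The clip is therefore a little under 0.4 seconds long, which matches the range of the x axis in the plots.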

4. sample width in bytes

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # yongqiang cheng

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import wave
    import numpy as np
    import matplotlib.pyplot as plt

    # WAV file
    audio_file = "/mnt/f/yongqiang_work/ding.wav"
    object = wave.open(audio_file, "rb")

    # (nchannels, sampwidth, framerate, nframes, comptype, compname)
    params = object.getparams()
    nchannels, sampwidth, framerate, nframes, comptype, compname = params[:6]
    print("nchannels =", nchannels)
    print("sampwidth =", sampwidth)
    print("framerate =", framerate)
    print("nframes =", nframes)
    print("comptype =", comptype)
    print("compname =", compname)

    # Returns number of audio channels (1 for mono, 2 for stereo).
    print("object.getnchannels() =", object.getnchannels())

    # Returns sample width in bytes.
    print("object.getsampwidth() =", object.getsampwidth())

    # Returns sampling frequency.
    print("object.getframerate() =", object.getframerate())

    # Returns number of audio frames.
    print("object.getnframes() =", object.getnframes())

    # Returns compression type ('NONE' is the only supported type).
    print("object.getcomptype() =", object.getcomptype())

    # Human-readable version of getcomptype(). Usually 'not compressed' parallels 'NONE'.
    print("object.getcompname() =", object.getcompname())

    # Reads and returns at most n frames of audio, as a bytes object.
    str_data = object.readframes(nframes)
    # nframes = 17504, channels = 2, sampwidth = 2
    # str_data (bytes: 70016) = nframes * channels * sampwidth = 17504 * 2 * 2 = 70016
    num_bytes = len(str_data)
    # num_bytes = 70016
    print("num_bytes =", num_bytes, "bytes")
    object.close()

    # np.frombuffer replaces np.fromstring, which is deprecated for binary data.
    wave_data = np.frombuffer(str_data, dtype=np.short)
    wave_data.shape = -1, 2
    wave_data = wave_data.T
    time = np.arange(0, nframes) * (1.0 / framerate)

    plt.subplot(211)
    plt.plot(time, wave_data[0])
    plt.xlabel("left channel - time (seconds)")
    plt.subplot(212)
    plt.plot(time, wave_data[1], c="g")
    plt.xlabel("right channel - time (seconds)")
    plt.show()

    /home/yongqiang/miniconda3/envs/pt-1.4_py-3.6/bin/python /home/yongqiang/pytorch_work/end2end-asr-pytorch-example/yongqiang.py
    nchannels = 2
    sampwidth = 2
    framerate = 44100
    nframes = 17504
    comptype = NONE
    compname = not compressed
    object.getnchannels() = 2
    object.getsampwidth() = 2
    object.getframerate() = 44100
    object.getnframes() = 17504
    object.getcomptype() = NONE
    object.getcompname() = not compressed
    num_bytes = 70016 bytes

    Process finished with exit code 0
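The script above hard-codes np.short because the sample width of ding.wav is 2 bytes. A possible generalization, sketched here under the assumption of integer PCM data, picks the NumPy dtype from the sample width and the number of columns from the channel count. The helper bytes_to_channels and its lookup table are hypothetical names introduced for this example, and 3-byte (24-bit) samples are deliberately rejected because NumPy has no matching integer type.

```python
import numpy as np

# Hypothetical lookup: sample width in bytes -> NumPy dtype.
# 8-bit WAV PCM is unsigned; 16-bit and 32-bit are signed little-endian.
_DTYPES = {1: np.dtype("u1"), 2: np.dtype("<i2"), 4: np.dtype("<i4")}


def bytes_to_channels(raw, nchannels, sampwidth):
    """Convert interleaved PCM bytes into an array with one row per channel."""
    if sampwidth not in _DTYPES:
        raise ValueError("unsupported sample width: %d bytes" % sampwidth)
    samples = np.frombuffer(raw, dtype=_DTYPES[sampwidth])
    return samples.reshape(-1, nchannels).T


# Two stereo frames of 16-bit samples: (L=1, R=2) and (L=3, R=4).
channels = bytes_to_channels(
    b"\x01\x00\x02\x00\x03\x00\x04\x00", nchannels=2, sampwidth=2
)
print(channels.tolist())  # [[1, 3], [2, 4]]
```

With the values from this article the call would be bytes_to_channels(str_data, nchannels, sampwidth), which also works for mono files because the reshape uses the real channel count.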

References