Reading WAV Audio Files
http://bigsec.net/b52/scipydoc/wave_pyaudio.html
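Python's standard library can read and write WAV files. Real-time audio input and output requires installing PyAudio, and PyMedia can be used for MP3 decoding and playback.
WAV is an audio file format developed by Microsoft and IBM, most often used to store uncompressed audio data (Pulse Code Modulation, PCM). A WAV file has three important parameters: the number of channels, the sampling rate, and the bit depth (quantization bits).
Number of channels: mono (one channel) or stereo (two channels).
Sampling rate: the number of times per second the audio signal is sampled. Common values are 8 kHz, 11.025 kHz, 16 kHz, 22.05 kHz, 32 kHz, 44.1 kHz, and 48 kHz.
Bit depth: the number of bits used to represent each sample, typically 8, 16, 24, or 32 bits. The audio stored on a CD is stereo, 44.1 kHz, 16-bit.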
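The three parameters together fix how much storage uncompressed PCM needs. The short sketch below works this out for the CD format mentioned above; the helper name pcm_byte_rate is an illustrative choice, not something from the wave module.

```python
def pcm_byte_rate(nchannels, sample_rate_hz, bits_per_sample):
    """Bytes of uncompressed PCM audio produced per second."""
    return nchannels * sample_rate_hz * bits_per_sample // 8


# CD audio: stereo, 44.1 kHz, 16-bit.
cd_rate = pcm_byte_rate(nchannels=2, sample_rate_hz=44100, bits_per_sample=16)
print("CD audio:", cd_rate, "bytes per second")  # 176400

# Storage needed for one minute of CD audio, in bytes.
one_minute = cd_rate * 60
print("One minute:", one_minute, "bytes")
```

The same product (channels × sample width in bytes × number of frames) gives the size of the data that a WAV reader returns for a whole file.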
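If you need to record and edit audio files yourself, Audacity is a good choice. It is open-source, cross-platform, multi-track recording and editing software. A typical workflow is to record the audio signal in Audacity and then export it as a WAV file for a Python program to process.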
1. C:\Windows\media
    (base) yongqiang@yongqiang:~$ cd /mnt/f/yongqiang_work/
    (base) yongqiang@yongqiang:/mnt/f/yongqiang_work$ ll
    total 260
    drwxrwxrwx 1 yongqiang yongqiang   4096 Jun  4 00:47 ./
    drwxrwxrwx 1 yongqiang yongqiang   4096 Jun  3 22:11 ../
    -rwxrwxrwx 1 yongqiang yongqiang 191788 Sep 15  2018 Windows_Ding.wav*
    -rwxrwxrwx 1 yongqiang yongqiang  70060 Sep 15  2018 ding.wav*
    (base) yongqiang@yongqiang:/mnt/f/yongqiang_work$
    (base) yongqiang@yongqiang:/mnt/f/yongqiang_work$ pwd
    /mnt/f/yongqiang_work
    (base) yongqiang@yongqiang:/mnt/f/yongqiang_work$

2. Reading a WAV file
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # yongqiang cheng

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import wave
    import numpy as np

    # WAV file
    audio_file = "/mnt/f/yongqiang_work/ding.wav"
    object = wave.open(audio_file, "rb")

    # (nchannels, sampwidth, framerate, nframes, comptype, compname)
    params = object.getparams()
    nchannels, sampwidth, framerate, nframes, comptype, compname = params[:6]
    print("nchannels =", nchannels)
    print("sampwidth =", sampwidth)
    print("framerate =", framerate)
    print("nframes =", nframes)
    print("comptype =", comptype)
    print("compname =", compname)

    # Returns number of audio channels (1 for mono, 2 for stereo).
    print("object.getnchannels() =", object.getnchannels())

    # Returns sample width in bytes.
    print("object.getsampwidth() =", object.getsampwidth())

    # Returns sampling frequency.
    print("object.getframerate() =", object.getframerate())

    # Returns number of audio frames.
    print("object.getnframes() =", object.getnframes())

    # Returns compression type ('NONE' is the only supported type).
    print("object.getcomptype() =", object.getcomptype())

    # Human-readable version of getcomptype(). Usually 'not compressed' parallels 'NONE'.
    print("object.getcompname() =", object.getcompname())

    # Reads and returns at most n frames of audio, as a bytes object.
    str_data = object.readframes(nframes)
    object.close()

Output:

    /home/yongqiang/miniconda3/envs/tf_cpu_1.4.1/bin/python /home/yongqiang/pycharm_work/yongqiang.py
    nchannels = 2
    sampwidth = 2
    framerate = 44100
    nframes = 17504
    comptype = NONE
    compname = not compressed
    object.getnchannels() = 2
    object.getsampwidth() = 2
    object.getframerate() = 44100
    object.getnframes() = 17504
    object.getcomptype() = NONE
    object.getcompname() = not compressed

    Process finished with exit code 0

3. Reading a WAV file
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # yongqiang cheng

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import wave
    import numpy as np
    import matplotlib.pyplot as plt

    # WAV file
    audio_file = "/mnt/f/yongqiang_work/ding.wav"
    object = wave.open(audio_file, "rb")

    # (nchannels, sampwidth, framerate, nframes, comptype, compname)
    params = object.getparams()
    nchannels, sampwidth, framerate, nframes, comptype, compname = params[:6]
    print("nchannels =", nchannels)
    print("sampwidth =", sampwidth)
    print("framerate =", framerate)
    print("nframes =", nframes)
    print("comptype =", comptype)
    print("compname =", compname)

    # Returns number of audio channels (1 for mono, 2 for stereo).
    print("object.getnchannels() =", object.getnchannels())

    # Returns sample width in bytes.
    print("object.getsampwidth() =", object.getsampwidth())

    # Returns sampling frequency.
    print("object.getframerate() =", object.getframerate())

    # Returns number of audio frames.
    print("object.getnframes() =", object.getnframes())

    # Returns compression type ('NONE' is the only supported type).
    print("object.getcomptype() =", object.getcomptype())

    # Human-readable version of getcomptype(). Usually 'not compressed' parallels 'NONE'.
    print("object.getcompname() =", object.getcompname())

    # Reads and returns at most n frames of audio, as a bytes object.
    str_data = object.readframes(nframes)
    # nframes = 17504, channels = 2, sampwidth = 2
    # str_data (bytes: 70016) = nframes * channels * sampwidth = 17504 * 2 * 2 = 70016
    object.close()

    # np.frombuffer replaces the deprecated binary mode of np.fromstring.
    wave_data = np.frombuffer(str_data, dtype=np.short)
    wave_data.shape = -1, 2
    wave_data = wave_data.T
    time = np.arange(0, nframes) * (1.0 / framerate)

    plt.subplot(211)
    plt.plot(time, wave_data[0])
    plt.xlabel("left channel - time (seconds)")
    plt.subplot(212)
    plt.plot(time, wave_data[1], c="g")
    plt.xlabel("right channel - time (seconds)")
    plt.show()

Output:

    /home/yongqiang/miniconda3/envs/pt-1.4_py-3.6/bin/python /home/yongqiang/pycharm_work/yongqiang.py
    nchannels = 2
    sampwidth = 2
    framerate = 44100
    nframes = 17504
    comptype = NONE
    compname = not compressed
    object.getnchannels() = 2
    object.getsampwidth() = 2
    object.getframerate() = 44100
    object.getnframes() = 17504
    object.getcomptype() = NONE
    object.getcompname() = not compressed

    Process finished with exit code 0

Python opens a WAV file with wave.open. Pass "rb" as the mode to open the file for reading (binary mode):
    audio_file = "/mnt/f/yongqiang_work/ding.wav"
    object = wave.open(audio_file, "rb")

wave.open returns an instance of the Wave_read class. Call its methods to read the WAV file's format and data:
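getparams: returns all of the WAV file's format information at once as a tuple (a named tuple in Python 3.4 and later): number of channels, sample width in bytes, sampling rate, number of frames, compression type, and a description of the compression type. The wave module only supports uncompressed data, so the last two items can be ignored.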
    # (nchannels, sampwidth, framerate, nframes, comptype, compname)
    params = object.getparams()
    nchannels, sampwidth, framerate, nframes, comptype, compname = params[:6]

The getnchannels, getsampwidth, getframerate, getnframes, getcomptype, and getcompname methods each return one of these values individually.
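As a minimal sketch of the two access styles described above, the following writes a short silent WAV file to a temporary directory and reads its parameters back. The demo file, its path, and its 8 kHz format are assumptions made only so the example runs on its own; it is not the ding.wav used in this article. It assumes Python 3.4 or later, where wave.open works as a context manager and getparams returns a named tuple.

```python
import os
import tempfile
import wave

# Hypothetical demo file: one second of silence, stereo, 16-bit, 8 kHz.
path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(path, "wb") as writer:
    writer.setnchannels(2)
    writer.setsampwidth(2)
    writer.setframerate(8000)
    writer.writeframes(b"\x00" * (8000 * 2 * 2))

# The with statement closes the file automatically.
with wave.open(path, "rb") as reader:
    params = reader.getparams()
    # Named fields and the individual getters report the same values.
    print("nchannels =", params.nchannels, reader.getnchannels())
    print("sampwidth =", params.sampwidth, reader.getsampwidth())
    print("framerate =", params.framerate, reader.getframerate())
    print("nframes =", params.nframes, reader.getnframes())

# Duration in seconds follows from the frame count and the sampling rate.
duration = params.nframes / params.framerate
print("duration =", duration, "seconds")
```

Reading fields by name avoids depending on the order of the tuple.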
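readframes: reads the audio data. Its argument specifies how many frames to read (a frame holds one sample for each channel), and it returns the raw binary data as a bytes object (in Python 2 this was an ordinary str).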
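readframes does not have to read the whole file in one call. The sketch below reads a file in fixed-size chunks and records how many bytes each call returns; the demo file and the chunk size of 256 frames are assumptions chosen only for illustration.

```python
import os
import tempfile
import wave

# Hypothetical demo file: 1000 frames of silence, stereo, 16-bit, 8 kHz.
path = os.path.join(tempfile.mkdtemp(), "chunks.wav")
with wave.open(path, "wb") as writer:
    writer.setnchannels(2)
    writer.setsampwidth(2)
    writer.setframerate(8000)
    writer.writeframes(bytes(1000 * 2 * 2))

chunk_frames = 256  # illustrative chunk size, in frames
chunk_sizes = []
with wave.open(path, "rb") as reader:
    # One frame holds one sample per channel.
    frame_bytes = reader.getnchannels() * reader.getsampwidth()
    while True:
        chunk = reader.readframes(chunk_frames)
        if not chunk:  # an empty bytes object marks the end of the data
            break
        chunk_sizes.append(len(chunk))

print("bytes per frame =", frame_bytes)
print("chunk sizes =", chunk_sizes)
print("total bytes =", sum(chunk_sizes))
```

Each full chunk is chunk_frames × frame_bytes long, and the final chunk is shorter because readframes returns at most the requested number of frames.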
    # Reads and returns at most n frames of audio, as a bytes object.
    str_data = object.readframes(nframes)
    # nframes = 17504, channels = 2, sampwidth = 2
    # str_data (bytes: 70016) = nframes * channels * sampwidth = 17504 * 2 * 2 = 70016

Next, the binary data must be converted into an array that can be used for computation, according to the number of channels and the sample width:
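    wave_data = np.frombuffer(str_data, dtype=np.short)

np.frombuffer converts the bytes into an array (it replaces the binary mode of np.fromstring, which is deprecated), and its dtype argument specifies the element type of the result. Because this file stores each sample in two bytes, the short (16-bit integer) type is used. The resulting wave_data is a one-dimensional array of shorts, but because the file is stereo, it consists of left-channel and right-channel samples interleaved: LRLRLRLR....LR (L is a left-channel sample, R is a right-channel sample). Change the shape of wave_data to two columns: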
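    wave_data.shape = -1, 2

Then transpose it so that each row holds one channel:

    wave_data = wave_data.T

Finally, compute the time of each sample from the number of frames and the sampling rate:

    time = np.arange(0, nframes) * (1.0 / framerate)

4. Sample width in bytes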
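The sample width also decides which array type the raw bytes should be decoded with. The following sketch generalizes the conversion steps from the previous section over the sample width. The DTYPES table and the frames_to_channels helper are illustrative names, not part of wave or NumPy, and the input is a hand-made byte string rather than data read from a file. It assumes integer PCM data, and 24-bit samples (width 3) are left out because NumPy has no native 3-byte integer type.

```python
import numpy as np

# Assumed mapping from sample width (bytes) to a NumPy dtype for PCM WAV data.
# 8-bit WAV samples are unsigned; 16-bit and 32-bit samples are signed
# little-endian integers.
DTYPES = {1: np.uint8, 2: "<i2", 4: "<i4"}


def frames_to_channels(raw, nchannels, sampwidth):
    """Decode interleaved PCM bytes into an array with one row per channel."""
    samples = np.frombuffer(raw, dtype=DTYPES[sampwidth])
    # Interleaved layout: L R L R ... -> rows of frames -> one row per channel.
    return samples.reshape(-1, nchannels).T


# Hand-made input: three stereo frames, 16-bit, interleaved as L R L R L R.
raw = np.array([1, -1, 2, -2, 3, -3], dtype="<i2").tobytes()
channels = frames_to_channels(raw, nchannels=2, sampwidth=2)
print("left  =", channels[0].tolist())   # [1, 2, 3]
print("right =", channels[1].tolist())   # [-1, -2, -3]

# Time of each frame in seconds, assuming a 44100 Hz sampling rate.
time = np.arange(channels.shape[1]) / 44100
```

The number of bytes consumed is always frames × channels × sample width, which is the relationship the script below checks with len(str_data).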
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # yongqiang cheng

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import wave
    import numpy as np
    import matplotlib.pyplot as plt

    # WAV file
    audio_file = "/mnt/f/yongqiang_work/ding.wav"
    object = wave.open(audio_file, "rb")

    # (nchannels, sampwidth, framerate, nframes, comptype, compname)
    params = object.getparams()
    nchannels, sampwidth, framerate, nframes, comptype, compname = params[:6]
    print("nchannels =", nchannels)
    print("sampwidth =", sampwidth)
    print("framerate =", framerate)
    print("nframes =", nframes)
    print("comptype =", comptype)
    print("compname =", compname)

    # Returns number of audio channels (1 for mono, 2 for stereo).
    print("object.getnchannels() =", object.getnchannels())

    # Returns sample width in bytes.
    print("object.getsampwidth() =", object.getsampwidth())

    # Returns sampling frequency.
    print("object.getframerate() =", object.getframerate())

    # Returns number of audio frames.
    print("object.getnframes() =", object.getnframes())

    # Returns compression type ('NONE' is the only supported type).
    print("object.getcomptype() =", object.getcomptype())

    # Human-readable version of getcomptype(). Usually 'not compressed' parallels 'NONE'.
    print("object.getcompname() =", object.getcompname())

    # Reads and returns at most n frames of audio, as a bytes object.
    str_data = object.readframes(nframes)
    # nframes = 17504, channels = 2, sampwidth = 2
    # str_data (bytes: 70016) = nframes * channels * sampwidth = 17504 * 2 * 2 = 70016
    num_bytes = len(str_data)
    # num_bytes = 70016
    print("num_bytes =", num_bytes, "bytes")
    object.close()

    # np.frombuffer replaces the deprecated binary mode of np.fromstring.
    wave_data = np.frombuffer(str_data, dtype=np.short)
    wave_data.shape = -1, 2
    wave_data = wave_data.T
    time = np.arange(0, nframes) * (1.0 / framerate)

    plt.subplot(211)
    plt.plot(time, wave_data[0])
    plt.xlabel("left channel - time (seconds)")
    plt.subplot(212)
    plt.plot(time, wave_data[1], c="g")
    plt.xlabel("right channel - time (seconds)")
    plt.show()

Output:

    /home/yongqiang/miniconda3/envs/pt-1.4_py-3.6/bin/python /home/yongqiang/pytorch_work/end2end-asr-pytorch-example/yongqiang.py
    nchannels = 2
    sampwidth = 2
    framerate = 44100
    nframes = 17504
    comptype = NONE
    compname = not compressed
    object.getnchannels() = 2
    object.getsampwidth() = 2
    object.getframerate() = 44100
    object.getnframes() = 17504
    object.getcomptype() = NONE
    object.getcompname() = not compressed
    num_bytes = 70016 bytes

    Process finished with exit code 0

References
http://bigsec.net/b52/scipydoc/wave_pyaudio.html