maad.sound.spectrogram
- maad.sound.spectrogram(x, fs, window='hann', nperseg=1024, noverlap=None, flims=None, tlims=None, mode='psd', verbose=False, display=False, savefig=None, **kwargs)
Compute a spectrogram using the short-time Fourier transform from an audio signal.
- The function can compute different outputs according to the parameter ‘mode’:
power (mode = ‘psd’)
amplitude (mode = ‘amplitude’) => sqrt(power)
complex with real and imaginary parts (mode = ‘complex’)
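As a rough illustration of how the three outputs relate, the sketch below builds a complex STFT with plain NumPy and derives the amplitude and power from it. The test tone, the framing and the scaling are assumptions made for this example; maad applies its own normalization, so absolute values will differ from those returned by maad.sound.spectrogram.

```python
import numpy as np

# Synthetic 1 kHz tone (illustrative input, not taken from the maad docs)
fs = 8000
nperseg = 256
hop = nperseg // 2                      # 50 % overlap
x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)

# Cut the signal into overlapping segments and apply a Hann window
frames = np.lib.stride_tricks.sliding_window_view(x, nperseg)[::hop]
window = np.hanning(nperseg)

# 'complex'-like output: one column of complex FFT coefficients per segment
# (unnormalized NumPy FFT, not maad's scaling)
Z = np.fft.rfft(frames * window, axis=1).T

amplitude = np.abs(Z)                   # 'amplitude'-like output: modulus
power = amplitude ** 2                  # 'psd'-like output: squared modulus

# amplitude is the square root of power, as stated above
mode_relation_holds = np.allclose(np.sqrt(power), amplitude)

# Frequency of the strongest bin, averaged over time
freqs = np.fft.rfftfreq(nperseg, d=1 / fs)
peak_hz = freqs[np.argmax(power.mean(axis=1))]
```

Only the relation between the three quantities carries over to maad; the number of bins and the units depend on the library's own conventions.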
- Parameters:
- x : 1d ndarray
Vector containing the sound waveform
- fs : int
The sampling frequency in Hz
- window : str or tuple or array_like, optional, defaults to ‘hann’
Desired window to use. If window is a string or tuple, it is passed to get_window to generate the window values, which are DFT-even by default. See get_window for a list of windows and required parameters. If window is array_like it will be used directly as the window and its length must be nperseg.
- nperseg : int, optional. Defaults to 1024.
Length of the segment used to compute the FFT. No zero padding is applied. For fast computation, use a power of 2. This parameter sets the frequency resolution: the spectrogram contains nperseg/2 frequency bins between 0 Hz and fs/2 Hz, with a resolution df = fs/nperseg. It also sets the duration (dt) of each frame: dt = nperseg/fs. The larger nperseg is, the coarser the time resolution (dt) but the finer the frequency resolution (df).
- noverlap : int, optional. Defaults to None.
Number of points to overlap between segments. If None, noverlap = nperseg // 2.
- mode : str, optional. Default is ‘psd’
Choose the output between:
- ‘psd’ : Power Spectral Density
- ‘amplitude’ : modulus of the STFT (sqrt(psd))
- ‘complex’ : real and imaginary parts of the STFT
- flims, tlims : list of 2 scalars [min, max], optional, default is None
flims gives the min and max frequency bounds. tlims gives the min and max time bounds.
- verbose : boolean, optional, default is False
Print messages to the console or terminal if verbose is True
- display : boolean, optional, default is False
Display the signal if True
- savefig : string, optional, default is None
Root filename (with full path) used to save the figures. A postfix is added to the root filename.
- **kwargs, optional. These parameters are passed to the plt.plot and savefig functions
- savefilename : str, optional, default : ‘_filt_audiogram.png’
Postfix of the figure filename
- db_range : scalar, optional, default : 100
If db_range is a number, anything lower than -db_range is set to -db_range and anything larger than 0 is set to 0
- figsize : tuple of integers, optional, default : (4,10)
width, height in inches.
- title : string, optional, default : ‘Spectrogram’
title of the figure
- xlabel : string, optional, default : ‘Time [s]’
label of the horizontal axis
- ylabel : string, optional, default : ‘Amplitude [AU]’
label of the vertical axis
- cmap : string or Colormap object, optional, default is ‘gray’
See https://matplotlib.org/examples/color/colormaps_reference.html for the list of existing colormaps. Examples: ‘hsv’, ‘hot’, ‘bone’, ‘tab20c’, ‘jet’, ‘seismic’, ‘viridis’…
- vmin, vmax : scalar, optional, default: None
vmin and vmax are used in conjunction with norm to normalize luminance data. Note if you pass a norm instance, your settings for vmin and vmax will be ignored.
- extent : list of scalars [left, right, bottom, top], optional, default: None
The location, in data-coordinates, of the lower-left and upper-right corners. If None, the image is positioned such that the pixel centers fall on zero-based (row, column) indices.
- dpi : integer, optional, default is 96
Dots per inch. For a printed version, choose a high dpi (e.g. dpi=300) => slow. For a screen version, choose a low dpi (e.g. dpi=96) => fast.
- format : string, optional, default is ‘png’
Format to save the figure
… and more, see matplotlib
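To make the nperseg trade-off concrete, the short sketch below evaluates the formulas given for nperseg (df = fs/nperseg, dt = nperseg/fs) and the documented default noverlap = nperseg // 2. The helper name stft_resolution and the sampling frequency of 44100 Hz are hypothetical, chosen only for illustration; the hop size assumes standard STFT framing and is not stated in this page.

```python
def stft_resolution(fs, nperseg=1024, noverlap=None):
    """Hypothetical helper: resolutions implied by the documented formulas."""
    if noverlap is None:
        noverlap = nperseg // 2          # documented default
    return {
        "df": fs / nperseg,              # frequency resolution in Hz
        "dt": nperseg / fs,              # duration of one segment in s
        "n_bins": nperseg // 2,          # number of frequency bins (as documented)
        "hop": nperseg - noverlap,       # assumed: samples between segment starts
    }

fs = 44100                               # example sampling frequency (assumption)
res_short = stft_resolution(fs, nperseg=512)
res_default = stft_resolution(fs)        # nperseg=1024
res_long = stft_resolution(fs, nperseg=4096)

# Print the trade-off: larger nperseg -> finer df, coarser dt
for name, res in [("512", res_short), ("1024", res_default), ("4096", res_long)]:
    print(f"nperseg={name}: df={res['df']:.2f} Hz, dt={res['dt'] * 1000:.1f} ms")
```

Whatever the values of fs and nperseg, df multiplied by dt equals 1, which is why frequency resolution can only be gained at the expense of time resolution.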
- Returns:
- Sxx : 2d ndarray of floats
Spectrogram: matrix containing K frames with N/2 frequency bins, with K*N <= length(wave). Sxx unit is power (Sxx_power) if mode is ‘psd’. Sxx unit is amplitude (Sxx_ampli) if mode is ‘amplitude’ or ‘complex’.
- tn : 1d ndarray of floats
time vector (horizontal x-axis)
- fn : 1d ndarray of floats
Frequency vector (vertical y-axis)
- extent : list of scalars [left, right, bottom, top]
The location, in data-coordinates, of the lower-left and upper-right corners.
Notes
This function takes care of energy conservation, which is crucial when working with sound pressure level (dB SPL).
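The principle behind energy conservation is Parseval's theorem: the energy of a signal is the same whether it is summed in the time domain or in the frequency domain, provided the transform is normalized accordingly. The sketch below checks this with NumPy's unnormalized FFT on random data; it illustrates the general principle only and is not maad's internal normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)            # arbitrary test signal (assumption)

# Energy in the time domain
energy_time = np.sum(x ** 2)

# Energy in the frequency domain; NumPy's FFT is unnormalized,
# so the sum of squared moduli must be divided by the signal length
X = np.fft.fft(x)
energy_freq = np.sum(np.abs(X) ** 2) / len(x)

energies_match = np.isclose(energy_time, energy_freq)
```

A spectrogram that keeps this property lets the energy computed from the spectrum be compared directly with the energy of the waveform, as the examples on this page do.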
Examples
>>> import maad
>>> s,fs = maad.sound.load("../data/rock_savanna.wav")
Compute energy of signal s
>>> E1 = sum(s**2)
>>> maad.util.power2dB(E1)
44.861029507805256
Compute the spectrogram with ‘psd’ output (if N<4096, the energy is lost)
>>> N = 4096
>>> Sxx_power,tn,fn,ext = maad.sound.spectrogram(s, fs, nperseg=N, noverlap=N//2, mode='psd')
Display Power Spectrogram
>>> Sxx_dB = maad.util.power2dB(Sxx_power) # convert into dB
>>> fig_kwargs = {'vmax': Sxx_dB.max(),
...               'vmin': -70,
...               'extent': ext,
...               'figsize': (4,13),
...               'title': 'Power spectral density (PSD)',
...               'xlabel': 'Time [sec]',
...               'ylabel': 'Frequency [Hz]',
...               }
>>> fig, ax = maad.util.plot2d(Sxx_dB, **fig_kwargs)
Compute mean power spectrogram
>>> S_power_mean = maad.sound.avg_power_spectro(Sxx_power)
energy => power x time
>>> E2 = sum(S_power_mean*len(s))
>>> maad.util.power2dB(E2)
44.93083283875093
Compute the spectrogram with ‘amplitude’ output
>>> Sxx_ampli,tn,fn,_ = maad.sound.spectrogram(s, fs, nperseg=N, noverlap=N//2, mode='amplitude')
For energy conservation => convert Sxx_ampli (amplitude) into power before doing the average.
>>> S_ampli_mean = maad.sound.avg_amplitude_spectro(Sxx_ampli) >>> S_power_mean = S_ampli_mean**2
energy => power x time
>>> E3 = sum(S_power_mean*len(s))
>>> maad.util.power2dB(E3)
44.93083283875093
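To judge how well the energy is conserved in the examples above, the dB values can be compared directly. The sketch below takes the three printed values and converts their difference back to a linear energy ratio, assuming the usual definition dB = 10*log10(power); that definition is an assumption here and is not restated in this page.

```python
# dB values printed in the examples above
E1_dB = 44.861029507805256   # energy of the waveform
E2_dB = 44.93083283875093    # energy from the 'psd' spectrogram
E3_dB = 44.93083283875093    # energy from the 'amplitude' spectrogram

# Difference in dB between spectrogram-based and waveform-based energy
diff_dB = E2_dB - E1_dB

# Back to a linear ratio, assuming dB = 10 * log10(power)
ratio = 10 ** (diff_dB / 10)

print(f"difference: {diff_dB:.3f} dB, energy ratio: {ratio:.4f}")
```

The difference is about 0.07 dB, and the 'psd' and 'amplitude' routes give identical values.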