maad.sound.spectrogram

maad.sound.spectrogram(x, fs, window='hann', nperseg=1024, noverlap=None, flims=None, tlims=None, mode='psd', verbose=False, display=False, savefig=None, **kwargs)

Compute a spectrogram using the short-time Fourier transform from an audio signal.

The function can compute different outputs according to the parameter 'mode':
  • power (mode = 'psd')

  • amplitude (mode = 'amplitude') => sqrt(power)

  • complex with real and imaginary parts (mode = 'complex')
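The relationship between the three outputs can be illustrated with a minimal NumPy-only sketch. It does not call maad: the test tone, the framing loop, and the absence of any scaling are assumptions made for illustration, so the absolute values will differ from what maad.sound.spectrogram returns. Note also that np.fft.rfft yields nperseg//2 + 1 bins, whereas this page documents nperseg/2 bins.

```python
import numpy as np

# Hypothetical test signal: 1 s of a 440 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)

nperseg = 256
hop = nperseg // 2                  # 50 % overlap
window = np.hanning(nperseg)

# Slice the signal into overlapping, windowed frames (no zero padding).
n_frames = (len(x) - nperseg) // hop + 1
frames = np.stack(
    [x[i * hop : i * hop + nperseg] * window for i in range(n_frames)]
)

# 'complex'-like output: one-sided FFT of each frame, shape (freq, time).
Sxx_complex = np.fft.rfft(frames, axis=1).T

# 'amplitude'-like output: modulus of the complex values.
Sxx_ampli = np.abs(Sxx_complex)

# 'psd'-like output (unscaled): squared modulus, so amplitude == sqrt(power).
Sxx_power = Sxx_ampli ** 2
```

Whatever scaling a real implementation applies, the amplitude output remains the square root of the power output, and the complex output carries the phase that the other two discard.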

Parameters:
x : 1d ndarray

Vector containing the sound waveform

fs : int

The sampling frequency in Hz

window : str or tuple or array_like, optional, default is 'hann'

Desired window to use. If window is a string or tuple, it is passed to get_window to generate the window values, which are DFT-even by default. See get_window for a list of windows and required parameters. If window is array_like it will be used directly as the window and its length must be nperseg.

nperseg : int, optional, default is 1024

Length of the segment used to compute the FFT. No zero padding is applied. For fast computation, use a power of 2. This parameter sets the frequency resolution: the spectrogram contains nperseg/2 frequency bins between 0 Hz and fs/2 Hz, with a resolution df = fs/nperseg. It also sets the duration (dt) of each frame: dt = nperseg/fs. The larger nperseg is, the coarser the time resolution (dt) but the finer the frequency resolution (df).
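The trade-off between df and dt follows directly from the two formulas above. The helper below is a hypothetical illustration (stft_resolution is not part of maad) and the 44100 Hz sampling frequency is an assumed value.

```python
def stft_resolution(fs, nperseg):
    """Return (df, dt): frequency resolution in Hz and segment duration in s."""
    df = fs / nperseg       # width of one frequency bin
    dt = nperseg / fs       # duration of one segment
    return df, dt


fs = 44100  # assumed sampling frequency in Hz
for nperseg in (256, 1024, 4096):
    df, dt = stft_resolution(fs, nperseg)
    print(f"nperseg={nperseg:5d}  df={df:8.3f} Hz  dt={dt * 1000:7.2f} ms")
```

Since df * dt = 1 for any choice of nperseg, refining one resolution necessarily coarsens the other.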

noverlap : int, optional, default is None

Number of points to overlap between segments. If None, noverlap = nperseg // 2.
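To see how noverlap affects the number of time frames, here is a small sketch. The function frame_layout is hypothetical, and it assumes frames are taken without padding the signal edges; the exact frame count returned by maad may differ if its framing differs.

```python
def frame_layout(n_samples, nperseg=1024, noverlap=None):
    """Return (hop, n_frames) for an STFT framing without edge padding."""
    if noverlap is None:
        noverlap = nperseg // 2              # documented default
    if not 0 <= noverlap < nperseg:
        raise ValueError("noverlap must satisfy 0 <= noverlap < nperseg")
    hop = nperseg - noverlap                 # samples between frame starts
    if n_samples < nperseg:
        return hop, 0                        # signal too short for one frame
    n_frames = (n_samples - nperseg) // hop + 1
    return hop, n_frames


# One second of audio at 44.1 kHz with the default 50 % overlap.
hop, n_frames = frame_layout(44100)
```

A larger noverlap gives a smaller hop and therefore more frames, at the cost of more computation; it does not change df or the duration of each segment.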

mode : str, optional, default is 'psd'

Choose the output among:
  • 'psd' : power spectral density

  • 'amplitude' : modulus of the STFT (sqrt(psd))

  • 'complex' : real and imaginary parts of the STFT

flims, tlims : list of 2 scalars [min, max], optional, default is None

flims sets the minimum and maximum frequency bounds. tlims sets the minimum and maximum time bounds.
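As an illustration of what such limits do, the sketch below crops a spectrogram matrix with boolean masks on the frequency and time vectors. crop_spectrogram is a hypothetical helper written for this example, the data are synthetic, and the inclusive handling of the boundaries is an assumption rather than documented maad behavior.

```python
import numpy as np


def crop_spectrogram(Sxx, tn, fn, flims=None, tlims=None):
    """Keep the rows/columns whose frequency/time lie within the limits."""
    f_mask = np.ones(len(fn), dtype=bool)
    t_mask = np.ones(len(tn), dtype=bool)
    if flims is not None:
        f_mask = (fn >= flims[0]) & (fn <= flims[1])
    if tlims is not None:
        t_mask = (tn >= tlims[0]) & (tn <= tlims[1])
    # np.ix_ selects the sub-matrix at the intersection of both masks.
    return Sxx[np.ix_(f_mask, t_mask)], tn[t_mask], fn[f_mask]


# Synthetic example: 9 frequency bins (0 to 4000 Hz), 11 frames (0 to 1 s).
fn = np.arange(0, 4001, 500.0)
tn = np.arange(11) / 10
Sxx = np.random.default_rng(0).random((len(fn), len(tn)))

Sxx_c, tn_c, fn_c = crop_spectrogram(
    Sxx, tn, fn, flims=[1000, 3000], tlims=[0.15, 0.65]
)
```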

verbose : boolean, optional, default is False

Print messages to the console or terminal if True.

display : boolean, optional, default is False

Display the signal if True

savefig : string, optional, default is None

Root filename (with full path) used to save the figures. A postfix is added to the root filename.

**kwargs, optional. This parameter is used by plt.plot and savefig functions
  • savefilename : str, optional, default is '_filt_audiogram.png'

    Postfix of the figure filename

  • db_range : scalar, optional, default is 100

    If db_range is a number, any value lower than -db_range is set to -db_range and any value larger than 0 is set to 0.

  • figsize : tuple of integers, optional, default is (4,10)

    width, height in inches.

  • title : string, optional, default is 'Spectrogram'

    title of the figure

  • xlabel : string, optional, default is 'Time [s]'

    label of the horizontal axis

  • ylabel : string, optional, default : ‘Amplitude [AU]’

  • cmap : string or Colormap object, optional, default is 'gray'

    See https://matplotlib.org/examples/color/colormaps_reference.html for all the existing colormaps. Examples: 'hsv', 'hot', 'bone', 'tab20c', 'jet', 'seismic', 'viridis'…

  • vmin, vmax : scalar, optional, default is None

    vmin and vmax are used in conjunction with norm to normalize luminance data. Note if you pass a norm instance, your settings for vmin and vmax will be ignored.

  • extent : list of scalars [left, right, bottom, top], optional, default is None

    The location, in data-coordinates, of the lower-left and upper-right corners. If None, the image is positioned such that the pixel centers fall on zero-based (row, column) indices.

  • dpi : integer, optional, default is 96

    Dots per inch. For print, choose a high dpi (e.g. dpi=300), which is slower. For screen display, choose a low dpi (e.g. dpi=96), which is faster.

  • format : string, optional, default is 'png'

    Format to save the figure

… and more, see matplotlib
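The clipping rule described for db_range in the list above can be sketched with np.clip. The helper clip_db is hypothetical and only mirrors the documented rule; it is not maad's plotting code.

```python
import numpy as np


def clip_db(Sxx_dB, db_range=100):
    """Limit dB values to the interval [-db_range, 0]."""
    return np.clip(Sxx_dB, -db_range, 0)


# Values below -100 dB are raised to -100; values above 0 dB are lowered to 0.
values = np.array([-150.0, -40.0, 5.0])
clipped = clip_db(values, db_range=100)
```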

Returns:
Sxx : 2d ndarray of floats

Spectrogram: matrix containing K frames with N/2 frequency bins, with K*N <= length(wave). Sxx is in power units (Sxx_power) if mode is 'psd'. Sxx is in amplitude units (Sxx_ampli) if mode is 'amplitude' or 'complex'.

tn : 1d ndarray of floats

Time vector (horizontal x-axis)

fn : 1d ndarray of floats

Frequency vector (vertical y-axis)

extent : list of scalars [left, right, bottom, top]

The location, in data-coordinates, of the lower-left and upper-right corners.

Notes

This function preserves the energy of the signal, which is crucial when working with sound pressure level (dB SPL).
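The principle behind energy conservation is Parseval's theorem: the energy computed from the samples equals the energy computed from the DFT coefficients, once the latter is divided by the number of samples. The sketch below checks this on a single unwindowed DFT of synthetic noise. It is an illustration of the principle only and does not reproduce the window and overlap corrections that a spectrogram requires.

```python
import numpy as np

# Synthetic signal: 4096 samples of white noise (assumed test data).
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)

# Energy in the time domain.
E_time = np.sum(x ** 2)

# Energy in the frequency domain, using the two-sided DFT.
# NumPy's FFT is unnormalized, hence the division by len(x).
X = np.fft.fft(x)
E_freq = np.sum(np.abs(X) ** 2) / len(x)

print(np.isclose(E_time, E_freq))
```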

Examples

>>> import maad
>>> s,fs = maad.sound.load("../data/rock_savanna.wav")

Compute energy of signal s

>>> E1 = sum(s**2)
>>> maad.util.power2dB(E1)
44.861029507805256

Compute the spectrogram with 'psd' output (with N < 4096, part of the energy is lost)

>>> N = 4096
>>> Sxx_power,tn,fn,ext = maad.sound.spectrogram(s, fs, nperseg=N, noverlap=N//2, mode='psd')

Display Power Spectrogram

>>> Sxx_dB = maad.util.power2dB(Sxx_power) # convert into dB
>>> fig_kwargs = {'vmax': Sxx_dB.max(),
...               'vmin':-70,
...               'extent':ext,
...               'figsize':(4,13),
...               'title':'Power spectrogram density (PSD)',
...               'xlabel':'Time [sec]',
...               'ylabel':'Frequency [Hz]',
...               }
>>> fig, ax = maad.util.plot2d(Sxx_dB,**fig_kwargs)

Compute mean power spectrogram

>>> S_power_mean = maad.sound.avg_power_spectro(Sxx_power)

energy => power x time

>>> E2 = sum(S_power_mean*len(s)) 
>>> maad.util.power2dB(E2)
44.93083283875093

Compute the spectrogram with ‘amplitude’ output

>>> Sxx_ampli,tn,fn,_ = maad.sound.spectrogram(s, fs, nperseg=N, noverlap=N//2, mode='amplitude')

For energy conservation => convert Sxx_ampli (amplitude) into power before doing the average.

>>> S_ampli_mean = maad.sound.avg_amplitude_spectro(Sxx_ampli)
>>> S_power_mean = S_ampli_mean**2

energy => power x time

>>> E3 = sum(S_power_mean*len(s)) 
>>> maad.util.power2dB(E3)
44.93083283875093