Template matching

Template matching is a simple but powerful method for detecting a stereotyped sound of interest using a template signal. This example shows how to detect such a sound using the normalized cross-correlation of spectrograms. For more detail on how to apply this technique to a large dataset, see references [1,2].

References

Load required modules

import matplotlib.pyplot as plt
from maad import sound, util
from maad.rois import template_matching

Compute spectrograms

The first step is to compute the spectrogram of the template and of the target audio. Both signals must use the same spectrogram parameters to give adequate results. For simplicity, we take the template from the target audio itself, but it can also be loaded from another file.

# Set spectrogram parameters
tlims = (9.8, 10.5)
flims = (6000, 12000)
nperseg = 1024
noverlap = 512
window = 'hann'
db_range = 80

# load data
s, fs = sound.load('../../data/spinetail.wav')

# Compute spectrogram for template signal
Sxx_template, _, _, _ = sound.spectrogram(s, fs, window, nperseg, noverlap, flims, tlims)
Sxx_template = util.power2dB(Sxx_template, db_range)

# Compute spectrogram for target audio
Sxx_audio, tn, fn, ext = sound.spectrogram(s, fs, window, nperseg, noverlap, flims)
Sxx_audio = util.power2dB(Sxx_audio, db_range)
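
To see why the parameters must match, it helps to look at how they set the size of each spectrogram cell. The sketch below is illustrative only: `spectrogram_resolution` is a hypothetical helper, the formulas assume a standard short-time Fourier transform, and the sampling rate of 44100 Hz is an assumption rather than a value read from the file.

```python
def spectrogram_resolution(fs, nperseg, noverlap):
    """Return (frequency step in Hz, time step in s) of an STFT spectrogram."""
    df = fs / nperseg               # spacing between frequency bins
    dt = (nperseg - noverlap) / fs  # spacing between time frames
    return df, dt

fs = 44100  # assumed sampling rate, for illustration only
flims = (6000, 12000)

# Parameters used in this example
df_a, dt_a = spectrogram_resolution(fs, nperseg=1024, noverlap=512)
# A different, hypothetical parameter set
df_b, dt_b = spectrogram_resolution(fs, nperseg=512, noverlap=256)

# Approximate number of frequency bins inside the 6-12 kHz band
bins_a = int((flims[1] - flims[0]) / df_a)
bins_b = int((flims[1] - flims[0]) / df_b)

print(f"nperseg=1024: df={df_a:.1f} Hz, dt={dt_a * 1000:.1f} ms, bins={bins_a}")
print(f"nperseg=512:  df={df_b:.1f} Hz, dt={dt_b * 1000:.1f} ms, bins={bins_b}")
```

With different parameters the two spectrograms have a different number of frequency rows and a different time step, so the template can no longer be compared cell by cell with the target.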

Compute the cross-correlation of spectrograms

Compute the cross-correlation of the spectrograms and find peaks in the resulting signal using the template_matching function. The function returns only temporal information on the location of each detection, so the frequency limits must be added afterwards.

peak_th = 0.3 # set the threshold to find peaks
xcorrcoef, rois = template_matching(Sxx_audio, Sxx_template, tn, ext, peak_th)
rois['min_f'] = flims[0]
rois['max_f'] = flims[1]
print(rois)

peak_time  xcorrcoef      min_t      max_t  min_f  max_f
 0.220590   0.806825   0.011610   0.568889   6000  12000
 1.346757   0.340795   0.998458   1.695057   6000  12000
 2.867664   0.573861   2.519365   3.215964   6000  12000
 3.065034   0.494980   2.716735   3.413333   6000  12000
 6.443537   0.363005   6.095238   6.791837   6000  12000
 8.092154   0.795268   7.743855   8.440454   6000  12000
 9.079002   0.859184   8.730703   9.427302   6000  12000
10.158730   1.000000   9.810431  10.507029   6000  12000
11.935057   0.385691  11.586757  12.283356   6000  12000
12.794195   0.475053  12.445896  13.142494   6000  12000
15.545760   0.546829  15.197460  15.894059   6000  12000
15.719909   0.523297  15.371610  16.068209   6000  12000
16.358458   0.639122  16.010159  16.706757   6000  12000
17.333696   0.347990  16.985397  17.681995   6000  12000
18.320544   0.720789  17.972245  18.668844   6000  12000
19.260952   0.813835  18.912653  19.527982   6000  12000
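
In the output above, the detection at 10.158730 s has a coefficient of 1.000000. This is expected, because the template was cut from that stretch of the recording (tlims = (9.8, 10.5)). The two steps involved, correlating and then keeping peaks above a threshold, can be sketched in plain NumPy as follows. This is a simplified illustration on synthetic data, not the implementation used by template_matching; `normalized_xcorr` and `pick_peaks` are hypothetical helper names, and the local-maximum rule is an assumption.

```python
import numpy as np

def normalized_xcorr(target, template):
    """Slide `template` along the time axis of `target` and return the
    correlation coefficient at each offset.

    Both inputs are 2D arrays (frequency x time) with the same number of rows.
    """
    n_freq, n_time = template.shape
    if target.shape[0] != n_freq:
        raise ValueError("target and template must share the frequency axis")
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    n_offsets = target.shape[1] - n_time + 1
    out = np.zeros(n_offsets)
    for k in range(n_offsets):
        w = target[:, k:k + n_time]
        w = w - w.mean()
        denom = t_norm * np.sqrt((w ** 2).sum())
        # Guard against flat windows, where the coefficient is undefined
        out[k] = (t * w).sum() / denom if denom > 0 else 0.0
    return out

def pick_peaks(xcorr, threshold):
    """Return indices of interior local maxima of `xcorr` that reach `threshold`."""
    inner = xcorr[1:-1]
    is_peak = (inner > xcorr[:-2]) & (inner >= xcorr[2:]) & (inner >= threshold)
    return np.flatnonzero(is_peak) + 1

# Synthetic data: random background with the template pasted in at frame 30
rng = np.random.default_rng(0)
template = rng.normal(size=(16, 10))
target = rng.normal(size=(16, 100))
target[:, 30:40] = template

xcorr = normalized_xcorr(target, template)
peaks = pick_peaks(xcorr, threshold=0.8)
print(peaks, xcorr[peaks])
```

Because each window is mean-centred and scaled by its own energy, the coefficient stays between -1 and 1 regardless of the overall level of the recording, which is what makes a single threshold such as peak_th usable across a file.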

Plot results

Finally, you can plot the detection results or save them as a CSV file.

Sxx, tn, fn, ext = sound.spectrogram(s, fs, window, nperseg, noverlap)
fig, ax = plt.subplots(2,1, figsize=(8, 5), sharex=True)
util.plot_spectrogram(Sxx, ext, db_range=80, ax=ax[0], colorbar=False)
util.overlay_rois(Sxx, util.format_features(rois, tn, fn), fig=fig, ax=ax[0])
ax[1].plot(tn[0: xcorrcoef.shape[0]], xcorrcoef)
ax[1].hlines(peak_th, 0, tn[-1], linestyle='dotted', color='0.75')
ax[1].plot(rois.peak_time, rois.xcorrcoef, 'x')
ax[1].set_xlabel('Time [s]')
ax[1].set_ylabel('Correlation coefficient')
plt.show()
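
The code above covers plotting. For the other option, saving the detections, a minimal sketch is given below. It assumes rois is a pandas DataFrame, as its printed form suggests, and builds a two-row stand-in from the output shown earlier so that the snippet runs on its own. The file name mentioned in the comment is hypothetical.

```python
import io
import pandas as pd

# Stand-in for the `rois` table, using two rows from the printed output
rois = pd.DataFrame({
    'peak_time': [9.079002, 10.158730],
    'xcorrcoef': [0.859184, 1.000000],
    'min_t': [8.730703, 9.810431],
    'max_t': [9.427302, 10.507029],
    'min_f': [6000, 6000],
    'max_f': [12000, 12000],
})

# Write to an in-memory buffer here; pass a path such as 'rois.csv'
# (hypothetical name) instead of the buffer to save to disk
buffer = io.StringIO()
rois.to_csv(buffer, index=False)

# Read the table back to confirm the round trip
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

Passing index=False leaves out the row index, so the file contains only the detection columns.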
