SC - Lezione 26
Checklist
Domande, Keyword e Vocabulary
- STFT and Shazam Music Recognition
- 2-Dimensional DFT/IDFT
- 2D DFT/IDFT as matrix-vector multiplication
- Application of the I2DFT
- Example of 2D-DFT
- 2D Discrete Convolution
Appunti SC - Lezione 26
STFT and Shazam music recognition
The music recognition of Shazam uses a form of spectrogram (STFT, Spectogram) for its searching algorithm. To recognize a song, Shazam creates a fingerprint of the audio.
A simplified version of the Shazam Algorithm:
- Apply the spectogram of an audio
- Feature Extraction: identify points of interests (called peaks or landmarks) in the frequency-time domain, that are most likely to be unique to the song and resistant to distortion.
- Fingerprint Creation: Transform these point into a compact fingerprint of the audio. Just a subset of these features are stored
- Matching and Identification: the fingerprint is compared with a database of song and the most similar is picked and presented to the user
In the example on the Matlab Book:
Shazam’s technology involves more sophisticated methods for fingerprint creation, peak selection, and matching, including hashing techniques to efficiently compare fingerprints and identify matches within a large database of stored fingerprints.
2-Dimensional DFT / IDFT
Instead of vectors here we have matrices.
Let
for
Which can be rewritten using root of unity as:
- We have two summations that goes from
to to iterate over each index of the vector - We use two different root of units

2D DFT/IDFT as matrix-vector multiplication
Recall the 1D case: 1D DFT as matrix-vector multiplication
In the case of the 2D we can write:
and are the DFT matrix of order and respectivelty
Another possible representation is to express the matrix-vector multiplication using the stacked input matrix
Stacked means that
Then we use the kronecker product of the DFT matrices for rows and columns. The output will be a matrix in
2D IDFT
The IDFT is defined accordingly:
for
Application of the I2DFT: in image filtering for implementing a high-pass filter by setting all the entries of the DFT matrix, that have a magnitude below a specificed threshold, to zero.
This can be used for example to compress an image and retain only the 
Example of 2D-DFT
Notice that the 2D-DFT of a constant matrix is the Dirac Delta Function (i.e. an infinite pulse):
The 2DFT of a gaussian is still a gaussian.
The width of the Gaussian in the frequency domain is inversely proportional to the width in the spatial domain;
2D Discrete Convolution
The discrete two-dimensional convolution of two matrices,
referes to the value of the matrix shifted by rows and columns from . If this position is outside of then it is streated as zero
What’s the size of the matrix ?
It depends on the sizes of the matrix
There are three possible cases.
- full convolution the output matrix
has a size of . - Uses a larger temporary matrix
- Similar to linear convolution in 1D case
- Uses a larger temporary matrix
- same convolution: the output matrix
as a size of . - The matrix
is padded such that the convolution result is centered and retains its original size
- The matrix
- valid convolution: the matrix has a size of
, but the output matrix included only those parts that are computed without the zero-padding edges.
For a clarification image see: the-most-intuitive-and-easiest-guide-for-convolutional-neural-network-3607be47480
