SC - Lezione 26

Checklist

Domande, Keyword e Vocabulary

STFT and Shazam Music Recognition
2-Dimensional DFT/IDFT
2D DFT/IDFT as matrix-vector multiplication
Application of the I2DFT
Example of 2D-DFT
2D Discrete Convolution

Appunti SC - Lezione 26

STFT and Shazam music recognition

The music recognition of Shazam uses a form of spectrogram (STFT, Spectogram) for its searching algorithm. To recognize a song, Shazam creates a fingerprint of the audio.

A simplified version of the Shazam Algorithm:

Apply the spectogram of an audio
Feature Extraction: identify points of interests (called peaks or landmarks) in the frequency-time domain, that are most likely to be unique to the song and resistant to distortion.
Fingerprint Creation: Transform these point into a compact fingerprint of the audio. Just a subset of these features are stored
Matching and Identification: the fingerprint is compared with a database of song and the most similar is picked and presented to the user

In the example on the Matlab Book: Shazam’s technology involves more sophisticated methods for fingerprint creation, peak selection, and matching, including hashing techniques to efficiently compare fingerprints and identify matches within a large database of stored fingerprints.

2-Dimensional DFT / IDFT

Instead of vectors here we have matrices.

Let be a matrix in . The DFT of this matrix is a matrix in with entries:

for and

Which can be rewritten using root of unity as:

We have two summations that goes from to to iterate over each index of the vector
We use two different root of units

2D DFT/IDFT as matrix-vector multiplication

Recall the 1D case: 1D DFT as matrix-vector multiplication

In the case of the 2D we can write:

and are the DFT matrix of order and respectivelty

Another possible representation is to express the matrix-vector multiplication using the stacked input matrix .

Stacked means that is represented as a vector of components, made using the columns of the matrix.

Then we use the kronecker product of the DFT matrices for rows and columns. The output will be a matrix in .

2D IDFT

The IDFT is defined accordingly:

for and

Application of the I2DFT: in image filtering for implementing a high-pass filter by setting all the entries of the DFT matrix, that have a magnitude below a specificed threshold, to zero.

This can be used for example to compress an image and retain only the most significant Fourier coefficients (by magnitude) and set the rest to zero.

Example of 2D-DFT

Notice that the 2D-DFT of a constant matrix is the Dirac Delta Function (i.e. an infinite pulse): The 2DFT of a gaussian is still a gaussian.

The width of the Gaussian in the frequency domain is inversely proportional to the width in the spatial domain;

2D Discrete Convolution

The discrete two-dimensional convolution of two matrices, , called input array or image. And referred to as filter or kernel, is a matrix defined as:

referes to the value of the matrix shifted by rows and columns from . If this position is outside of then it is streated as zero

What’s the size of the matrix ?

It depends on the sizes of the matrix and and the padding used during the convolution.

There are three possible cases.

full convolution the output matrix has a size of .
- Uses a larger temporary matrix
- Similar to linear convolution in 1D case
same convolution: the output matrix as a size of .
- The matrix is padded such that the convolution result is centered and retains its original size
valid convolution: the matrix has a size of , but the output matrix included only those parts that are computed without the zero-padding edges.

For a clarification image see: the-most-intuitive-and-easiest-guide-for-convolutional-neural-network-3607be47480

Obsidian + 🪴 Quartz 4.0