Short Time Fourier Transform and Windowing
I wrote this series after attending a course on Digital Signal Processing at my university. I will try not to go into too much detail. Thus there are many essential aspects that I deliberately don’t cover (such as complex numbers). You should be able to follow along even if you don’t have a technical background. Afterwards, if you find it interesting, I would recommend you to attend a DSP course at your university.
In the previous article, we learned how to perform the Discrete Fourier Transform (DFT) on a number of samples. With this, we can perform a single measurement and determine the fundamental frequency. In order to build a tuner, however, we need not only to determine the fundamental frequency, but also to track it continuously. Unfortunately, this is not as trivial as it seems. We have to take some precautions to prevent unwanted artifacts.
Tracking the fundamental frequency is what we will do in the first part of this article, with the Short-Time Fourier Transform and Windowing. In the second part of this article I will discuss the most complicated element of the guitar tuner: determining the frequency with high resolution. Remember that the frequency we get from the DFT is a whole number. That is not enough precision to tune a guitar or any other musical instrument for that matter. Thus we have to resort to alternative entropy to estimate the real frequency. This I will discuss in Phase Retrieval and High-Resolution Frequency Determination.
1. Short-Time Fourier Transform
When we use the microphone of an iPhone to record audio, we take 44,100 samples per second. These samples are distributed evenly, thus there is roughly 2.26 μs between each sample. With the Short-Time Fourier Transform[1], we perform the DFT for each samples. Now let us look one more time at a simple case of the Discrete Fourier Transform because there are two things that we should definitely keep in mind (fig. 1).
We can draw two conclusions from this:
- The number of different frequency bins we get from the DFT is equal to the number of samples we used for the DFT.
- The frequency spectrum is always mirrored around the Y-axis. The reason why stems from Euler’s formula and since I haven’t mentioned complex numbers so far, I won’t discuss the details here either. The result however, is that the number of frequencies we can measure, is only half the number of samples we used.
With the Short-Time Fourier Transform, we are going to perform the DFT after every 4096 samples. Thus, we are going to do roughly 10 DFTs per second. Thus, it is important to keep in mind that we have to multiply the frequencies we get from each DFT by 10 (roughly). If you don’t understand why, think of it as a poll: 100 people are asked to answer a question, than the results are projected on a population 10 times as big as the test group. In the same fashion, we can differentiate between 2048 frequencies (remember the 2 conclusions) that range from 11 Hz to 22,050 Hz. The distance between each frequency is again roughly 10 Hz so we can measure 11 Hz, 22 Hz, 32 Hz, etc. Of course, this isn’t precise enough to build a guitar tuner but at the second part of this article, I will show we can improve the resolution of the frequencies.
2. Windowing
So far, we have only taken into account signals that fit in a single DFT and therefore assumed that they are periodic. That assumption is a luxury that in reality is never true. In fact, the assumption results in spectral leakage. I visualized this artifact in fig. 2.
The solution is simple: we need to zero both ends of the waveform. This is done by multiplying the waveform with a window function[2]. It is called a window function because it limits the waveform to a specific window. There are multiple window functions, named after their inventors. We will stick with the Hann window, named after Julius von Hann. In fig. 3 we can see that using the window function, we can significantly reduce the wiggling of amplitudes in the frequency domain.
3. Spectrogram
In this series, I have visualized the changing frequencies through video animations. This isn’t the most efficient way though. For one, we cannot include videos in printed papers. Therefore, the frequency domain of a signal is often visualized using a spectrogram [1].
In fig. 5, I have plotted such a spectrogram of the signal from fig. 4. The main advantage of the spectrogram is that you can see the frequency spectrum over the entire duration of a recording in one plot.
We won’t need the spectrogram for our guitar tuner application but it is essential that you know about it anyway because they are used everywhere.
4. Phase Retrieval
There is one more essential component of the Fourier transform that I have not yet mentioned. So far, we have only used the Fourier transform to measure the presence of discrete frequencies in a signal. Remember the equation for a sine wave (fig. 6):
So far, we have covered the frequency and the amplitude. However, we should not forget time (or phase offset). I visualized three sine waves in fig. 7. All three sines are exactly identical but they start at a different phase.
Fortunately, we can also use the Fourier transform to determine the phase difference between each of these waves[3]. Again, this may seem trivial but in the next section, I will show how you can use the phase to increase precision of the frequencies we can determine. For now, it is important to understand that each of the sines in fig. 7 is periodic (as are all sines). That means that at the start of each period the phase is and at the end it will be .
In order to illustrate this, I plotted a simple sine wave and its phase over time using the Fourier transform in fig. 8.
It is clear that at the end of each period, the phase (blue dot) is back at its original position.
5. High-Resolution Frequency Determination
When searching on Google for using the Fourier transform to build a guitar tuner, you will most likely stumble upon a page on StackOverflow explaining that you cannot use Fourier transform because the frequency resolution is finite and you can’t detect small frequency changes.
While the frequency resolution is indeed finite, the latter statement is incorrect. Phase information is often overlooked but in this case can help us to achieve high-resolution frequency determination.
This algorithm was first published in 1993[4] and is implemented in AudioKit[5], the framework we are going to use next week for our guitar tuner app. The paper first describes how we can estimate the fundamental frequency bin (which basically is the nearest whole number) with the DFT, which is what we did over the last 2 weeks. It then explains how the frequency resolution can be increased further by using the phase information that is retrieved from the DFT.
Let’s look at two sine waves now, one with a whole frequency and one that does not fit in one window (fig. 9).
With the DFT, we first determine that the nearest frequency bin of both waves is . In the right polar plot, we can see that the initial phase difference is which is exactly the additional resolution we are looking for. Of course, both sines started at the same phase (). Therefore I visualize the sine wave progressing over time and the phase difference in the polar plot. To calculate the sub-bin frequency we then only have to compute the phase difference twice and divide it by the time between those computations. We only use the phase that corresponds to the Fourier coefficient with the highest magnitude, i.e. the estimated fundamental frequency.
For sake of completeness, let us see how it works for different frequencies. In fig. 10 I visualized another pair of sine waves.
Lastly, there is one more case that we need to address. If the frequency is matched to a higher bin, for example to bin , we see a negative phase difference. In other words: the actual sine waves is a bit slower than we would expect based on the frequency bin it is in. I have visualized such a case in fig. 11.
The reason why it works is quite simple. We first estimate the fundamental frequency, which we know to be off by at most or . Based on that frequency, we can calculate the expected phase at any time. Because the sine wave does not have the exact same frequency as we expected, the phase will differ from the expected phase we calculated. All we have to do is divide the difference between that phase by and we have a really good estimation of the actual frequency.
In the tests that I performed, this algorithm was able to correctly determine the frequency with up to 4 decimals precision. That is by far a better resolution than we even need for our guitar tuner app. Concluding: yes, we can use Fourier transform for building a tuner and it works amazingly well.
6. What’s Next
In the next article, we are finally going to bring theory into practice by building our very own guitar tuner app for iOS in Swift. We will be using an audio processing framework that implements the algorithms discussed in these articles. We will also look at the challenges that arise when building the UI.
This is all the math we are going to need so if you made it here: congratulations. The next article will contain only practical matters, no difficult techniques or algorithms. Just make sure you have Xcode installed and up and running!
7. References
- Rioul, Olivier and Vetterli, Martin “ Wavelets and signal processing ” IEEE signal processing magazine 8. (1991) : 14–38
- Harris, Fredric J “ On the use of windows for harmonic analysis with the discrete Fourier transform ” Proceedings of the IEEE 66.1 (1978) : 51–83
- Filenup, James R “ Phase retrieval algorithms: a comparison ” Applied Optics 21.15 (1982) : 2758–2769
- Brown, Judith C and Puckette, Miller S “ A high resolution fundamental frequency determination based on phase changes of the Fourier transform ” The Journal of the Acoustical Society of America 94.2 (1993) : 662–667
- Prochazka, Aurelius “ AudioKit ”