I wrote this series after attending a course on Digital Signal Processing at my university. I will try not to go into too much detail, so there are many essential aspects that I deliberately don't cover (such as complex numbers). You should be able to follow along even if you don't have a technical background. Afterwards, if you find it interesting, I would recommend attending a DSP course at your university.

In my first series of articles, I will attempt to introduce you to Joseph Fourier's masterpiece: a collection of transformations that are used everywhere. In order to do this, I will first look at its most obvious use case: audio processing. At the end of the series I will show how the Fourier transform can be used to create a digital tuner for iOS in Swift (psst. fig. 1).

Audio, or sound, is most commonly observed as a vibration of air. Musical instruments and the speaker in your phone produce these vibrations, which the human brain then interprets as sound. The number of vibrations per second is called the frequency, measured in hertz (Hz).

Let's look at a guitar first. Most guitars have 6 strings. The lowest string is very thick, which makes it vibrate slowly. The highest string is very thin and vibrates much faster in comparison. Concretely, the lowest string vibrates about 80 times per second and the highest string about 330 times per second[1].

Below is an animation of the picking of guitar strings in slow-motion (fig. 3). It is clear that the lower and thicker strings vibrate much more slowly than the higher and thinner strings. The result is that the human brain interprets the thin strings as higher tones and the thick strings as lower tones.

Note that the same mechanism applies to pianos as well, even though you cannot see the strings that are attached to each key.

1. Wave Equations

When a device generates sound, its vibration can be described by a simple wave equation (fig. 4). For example, guitar strings and the vibrating diaphragm of a speaker are two such devices. The wave equation is defined over time $\text{t}$ and has a frequency parameter. In fig. 5 and fig. 6 we can see that a lower frequency results in fewer oscillations and a lower tone. The average human ear can perceive frequencies between 20 Hz (low tones) and 20,000 Hz (very high tones)[2].
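Since the tuner will eventually be written in Swift, here is a minimal sketch of such a wave equation as a function. I am assuming the common form $x(t) = A \sin(2\pi f t)$, where $A$ is the amplitude and $f$ the frequency in Hz; the function name `waveValue` is my own.

```swift
import Foundation

// A pure tone as a simple wave equation: x(t) = A * sin(2 * pi * f * t),
// where A is the amplitude and f the frequency in Hz.
func waveValue(amplitude: Double, frequency: Double, t: Double) -> Double {
    return amplitude * sin(2.0 * .pi * frequency * t)
}
```

For example, a 80 Hz tone starts at 0 and reaches its full amplitude a quarter of a period later, at $t = 1/(4 \cdot 80)$ seconds.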

2. Chords

In the real world, sound never consists of only a single frequency. When playing the guitar, for example, your fingers will generate some additional, almost inaudible noise. Besides that, you will often pick more than one string at a time. When two devices generate sound at the same time, the vibrations of air are simply summed. In fig. 7, the three individual tones of an E-minor chord are plotted, each in a different color. In fig. 8, the individual tones are summed together, which is the sound a recording device will pick up.
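This summing is straightforward to sketch in Swift: each tone is sampled at time $t$ and the samples are added. The three frequencies below (roughly 82, 247, and 330 Hz for E2, B3, and E4) are illustrative values for three notes of an E-minor chord, and the function names are my own.

```swift
import Foundation

// A unit-amplitude pure tone at a given frequency (Hz), evaluated at time t.
func tone(_ frequency: Double, _ t: Double) -> Double {
    return sin(2.0 * .pi * frequency * t)
}

// What a microphone records when several tones sound at once:
// simply the sum of the individual vibrations at each instant.
func chordSample(frequencies: [Double], t: Double) -> Double {
    return frequencies.map { tone($0, t) }.reduce(0, +)
}

// Three illustrative frequencies for an E-minor chord (E2, B3, E4).
let eMinor = [82.0, 247.0, 330.0]
```

Evaluating `chordSample(frequencies: eMinor, t:)` over a range of `t` produces a waveform like the one in fig. 8: no single tone is visible anymore, only their sum.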

This poses a problem because, as you can see, recovering the frequency from a recording (such as fig. 8) seems impossible when multiple devices produce sound at the same time. Even the slightest high-frequency noise will completely ruin any attempt to recover the frequency from a waveform. Fortunately, this is where the Fourier transform comes in.

3. Continuous Fourier Transform

Ideally, what we want is to be able to measure the presence of any particular frequency at a particular moment in time. In the previous section, we learned that each sample of audio can contain multiple frequencies, for example when you hear both a flute and a bass at the same time. Precisely, what we want is a function $\text{X}(\omega)$ that can tell us what amplitude a frequency $\omega$ has. For example, when playing the low E string of a guitar, we want to be able to determine that $\text{X}(80) > \text{X}(196)$.

We can compute $\text{X}(\omega)$ with the Continuous Fourier Transform (CFT)[3]. Its equation is given in fig. 9. The CFT transforms a signal $\text{x}(\text{t})$ into $\text{X}(\omega)$. The terminology for this is: transforming a signal from time-domain ($\text{t}$) to frequency-domain ($\omega$).
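For reference, the standard form of the CFT (the equation in fig. 9) is:

$$\text{X}(\omega) = \int_{-\infty}^{\infty} \text{x}(\text{t})\, e^{-i\omega \text{t}}\, d\text{t}$$

Note that sign and normalization conventions vary between textbooks, so fig. 9 may differ from this form by a constant factor.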

Depending on your personal preferences, this equation may or may not frighten you. Fortunately, I have made the effort to visualize the process of the CFT. In fig. 10 I have plotted a basic sine wave both in time-domain (left, $\text{x}(\text{t})$) and frequency-domain (right, $\text{X}(\omega)$). It is clear that increasing the frequency on the left moves the peak in the graph on the right. Looking more carefully, you should also notice that the location of the peak in the right graph is exactly the frequency of the sine on the left.

Hopefully you are still a bit skeptical. After all, the function in time-domain in fig. 10 is really basic. Does it still work with composite sines? Absolutely! In fig. 11 we see that even with multiple sines, the Fourier transform can differentiate between their frequencies.
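You can convince yourself of this numerically. Below is a crude sketch in Swift that approximates $|\text{X}(\omega)|$ by replacing the CFT integral with a sum over samples of the signal (we will treat sampling properly in the next article). Everything here is my own illustration: the function name, the 8000 Hz sample rate, and the choice of test frequencies are all assumptions, and this is deliberately not an efficient implementation.

```swift
import Foundation

// A crude numerical sketch of |X(f)|: the CFT integral of x(t) * e^{-i omega t}
// is approximated by a Riemann sum over the samples. The frequency is given
// in Hz and converted to angular frequency (omega = 2 * pi * f) internally.
func spectrumMagnitude(signal: [Double], sampleRate: Double, frequency: Double) -> Double {
    let dt = 1.0 / sampleRate
    let omega = 2.0 * Double.pi * frequency
    var re = 0.0
    var im = 0.0
    for (n, x) in signal.enumerated() {
        let t = Double(n) * dt
        re += x * cos(omega * t) * dt   // real part of x(t) * e^{-i omega t}
        im -= x * sin(omega * t) * dt   // imaginary part
    }
    return (re * re + im * im).squareRoot()
}

// One second of a composite signal: an 80 Hz and a 196 Hz sine summed,
// sampled 8000 times per second.
let sampleRate = 8000.0
let samples = (0..<8000).map { n -> Double in
    let t = Double(n) / sampleRate
    return sin(2.0 * .pi * 80.0 * t) + sin(2.0 * .pi * 196.0 * t)
}
```

Evaluating `spectrumMagnitude` at 80 Hz and 196 Hz yields large values, while a frequency that is not present in the signal (say 130 Hz) yields a value near zero: exactly the two peaks of fig. 11.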

4. What’s Next

In the next two articles, you will become familiar with the operations that are necessary to implement a guitar tuner in Swift for iOS; the tuner itself is built in the 3rd article. A screen capture of the final result is displayed in fig. 1.

In the 2nd article, we will examine the problems that arise when capturing real sound instead of basic generated sine waves. Specifically, I'll be talking about sampling, the Discrete Fourier Transform (DFT), and the difference between the CFT and the DFT.

In the 3rd article, we will learn how to use the DFT in real-time with short windows, as is necessary to create a guitar tuner.

5. References

1. French, M. and Hosler, D. "The Mechanics of Guitars." Experimental Techniques 25.3 (2001): 45–48.
2. Smith, Steven W., et al. "The Scientist and Engineer's Guide to Digital Signal Processing." (1997).
3. Papoulis, Athanasios. "The Fourier Transform and Its Applications." New York (1962).