|<-- Back to Previous Page||TOC||Next Section -->|
Chapter 3: The Frequency Domain
Section 3.4: The DFT, FFT, and IFFT
The most common tools used to perform Fourier analysis and synthesis are called the fast Fourier transform (FFT) and the inverse fast Fourier transform (IFFT). The FFT and IFFT are optimized (very fast) computer-based algorithms that perform a generalized mathematical process called the discrete Fourier transform (DFT). The DFT is the actual mathematical transformation that the data go through when converted from one domain to another (time to frequency). Basically, the DFT is just a slow version of the FFTtoo slow for our impatient ears and brains!
FFTs, IFFTs, and DFTs became really important to a lot of disciplines when engineers figured out how to take samples quickly enough to generate enough data to re-create sound and other analog phenomena digitally. Remember, they dont just work on sounds; they work on any continuous signal (images, radio waves, seismographic data, etc.).
An FFT of a time domain signal takes the samples and gives us a new set of numbers representing the frequencies, amplitudes, and phases of the sine waves that make up the sound weve analyzed. It is these data that are displayed in the sonograms we looked at in Section 1.2.
Figure 3.23 shows the first 16 bins of a typical FFT analysis after the conversion is made from real and imaginary numbers to amplitude/phase pairs. We left out the phases, because, well, it was too much trouble to just make up a bunch of arbitrary phases between 0 and 2. In a lot of cases, you might not need them (and in a lot of cases, you would!). In this case, the sample rate is 44.1 kHz and the FFT size is 1,024, so the bin width (in frequency) is the Nyquist frequency (44,100/2 = 22,050) divided by the FFT size, or about 22 Hz.
Amplitude values are assumed to be between 0 and 1, and notice that theyre quite small because they all must sum to 1 (and there are a lot of bins!).
We confess that we just sort of made up the numbers; but notice that we made them up to represent a sound that has a simple, more or less harmonic structure with a fundamental somewhere in the 66 Hz to 88 Hz range (you can see its harmonics at around 2, 3, 4, 5, and 6 times its frequency, and note that the harmonics decrease in amplitude more or less like they would in a sawtooth wave).
How the FFT Works
The way the FFT works is fairly straightforward. It takes a chunk of time called a frame (a certain number of samples) and considers that chunk to be a single period of a repeating waveform. The reason that this works is that most sounds are "locally stationary"; meaning that over any short period of time, the sound really does look like a regularly repeating function.
The following is a way to consider this mathematicallytaking a window over some portion of some signal that we want to consider as a periodic function.
Now, once weve got a periodic function, all we need to do is figure out, using the FFT, what the component sine waves of that waveform are.
As weve seen, it is possible to represent any periodic waveform as a sum of phase-shifted sine waves. In theory, the number of component sine waves is infinitethere is no limit to how many frequency components a sound might have. In practice, we need to limit ourselves to some predetermined number. This limit has a serious effect on the accuracy of our analysis.
Heres how that works: rather than looking for the frequency content of the sound at all possible frequencies (an infinitely large number100.000000001 Hz, 100.000000002 Hz, 100.000000003 Hz, etc.), we divide up the frequency spectrum into a number of frequency bands and call them bins. The size of these bins is determined by the number of samples in our analysis frame (the chunk of time mentioned above). The number of bins is given by the formula:
number of bins = frame size/2
So lets say that we decide on a frame size of 1,024 samples. This is a common choice because most FFT algorithms in use for sound processing require a number of samples that is a power of two, and its important not to get too much or too little of the sound.
A frame size of 1,024 samples gives us 512 frequency bands. If we assume that were using a sample rate of 44.1 kHz, we know that we have a frequency range (remember the Nyquist theorem) of 0 kHz to 22.05 kHz. To find out how wide each of our frequency bins is, we use the following formula:
bin width = frequency/number of bins
This formula gives us a bin width of about 43 Hz. Remember that frequency perception is logarithmic, so 43 Hz gives us worse resolution at the low frequencies and better resolution at higher frequencies.
By selecting a certain frame size and its corresponding bandwidth, we avoid the problem of having to compute an infinite number of frequency components in a sound. Instead, we just compute one component for each frequency band.
Software That Uses the FFT
There are many software packages available that will do FFTs and IFFTs of your data for you and then let you mess around with the frequency content of a sound. In Chapter 4 well talk about some of the many strange and wonderful things that can be done to a sound in the frequency domain.
The details of how the FFT works are well beyond the scope of this book. What is important for our purposes is that you understand the general idea of analyzing a sound by breaking it into its frequency components and, conversely, by using a bunch of frequency components to synthesize a new sound. The FFT has been understood for a long time now, and most computer music platforms have tools for Fourier analysis and synthesis.
The y-axis tells us the amplitude of each component frequency. Since we’re looking at just one frame of an FFT, we usually assume a periodic, unchanging signal. A histogram is generally most useful for investigating the steady-state portion of a sound. (Figure 3.30 is a screen grab from SoundHack.)
|<-- Back to Previous Page||Next Section -->|
©Burk/Polansky/Repetto/Roberts/Rockmore. All rights reserved.