You can have the smallest music in the world, all compressed down into mere bytes, but if the instrument playing it sounds rubbish then what’s the point? We need a virtual Pianola, a (small) piece of code that responds to commands and makes a noise like a piano should. We need a piano synthesizer.
First things first: what does a piano actually sound like? You can’t just grab any old piece of piano music and listen to it; you need a clean, controlled recording of the piano playing a single note, otherwise you can’t analyse it properly. Preferably, you’ll have a whole set of these recordings covering the range of piano notes, with each note recorded at a few different velocities (the velocity is how hard you hit the piano key).
Luckily for me, I don’t need to go out with a microphone and find a grand piano to take recordings from, because other people have already done exactly that. SoundFont files are the result: a bank of recordings of an instrument playing a range of notes at a range of velocities. Most SoundFonts are expensive to buy, but I located two high-quality examples online which are free to download. Extracting the notes from a SoundFont is easy, and you end up with a few dozen sound files like this one:
Piano note A3 forte
The first thing you notice when you listen to them all is that the notes all sound different from each other. Not just higher- and lower-pitched versions of each other, but qualitatively different. If, for instance, you take a middle C4 note and pitch-shift it down by two octaves, you do not arrive at an acceptable-sounding C2 note; it will sound too thin and plinky. The lower notes on a piano have a deep, sonorous, bell-like quality that the higher notes do not have.
The second thing you notice is that even the different velocities sound different from each other; they are not just louder and quieter. The harder you hit a note, the brighter it sounds, and when you study it with the help of a sound-analysis program, you see that this is because the stronger note contains more powerful high-frequency components.
One of the most useful ways to study an audio signal is to treat it the same way the human ear does. Your ear contains thousands of tiny hairs of different lengths, which vibrate at different sound frequencies. When you hear a sound, each hair in your ear vibrates to a greater or lesser extent according to how much of its resonant frequency the sound contains. If you hear a very pure tone, only a few of the hairs in your ear will vibrate. If you hear white noise, they’ll all be vibrating together. Your brain decodes all of this and turns it back into sound in your mind.
The computer equivalent of this process is called a Fourier transform, and when you run an audio signal through it, out pops a neat graph of loudness versus frequency. This is what the piano note above looks like when you run it through a Fourier transform:
[Graph: Fourier transform of the piano note A3]
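As a sketch of that analysis step in code, here is roughly how you would run a note through a Fourier transform. This is my illustration in Python with NumPy (not tooling from the article), and the “note” is synthesised with made-up overtone amplitudes rather than loaded from a SoundFont, so the example is self-contained:

```python
import numpy as np

RATE = 44100   # samples per second
FUND = 220.0   # A3 fundamental, in Hz

# Stand-in for the recorded note: a fundamental plus a few
# weaker overtones at whole-number multiples (made-up amplitudes).
t = np.arange(RATE) / RATE  # one second of audio
note = sum((0.5 ** n) * np.sin(2 * np.pi * FUND * n * t) for n in range(1, 6))

# Fourier transform: audio signal in, loudness-versus-frequency out.
spectrum = np.abs(np.fft.rfft(note))
freqs = np.fft.rfftfreq(len(note), d=1.0 / RATE)

# The biggest spike sits at the fundamental.
peak = freqs[np.argmax(spectrum)]
print(peak)  # → 220.0
```

With a real recording you would load the samples from a file (for example with scipy.io.wavfile) instead of synthesising them.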
The note is A3, which has a frequency of 220Hz. The first thing you notice is that the graph is arranged into a series of spikes. The biggest spike on the left is the fundamental tone at 220Hz. The other spikes to the right of it are called overtones, and they give the piano note its character. In the musical world, the character a note gets from its overtones is called its timbre.
Something else you may notice is that the overtones are evenly spaced: they are whole-number multiples of the fundamental frequency. Overtones like this are called harmonics. In fact, a piano’s overtones aren’t quite perfect harmonics, but they are close enough for government work. We don’t need to study the difference until the next article, when we talk about physical modelling.
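You can check that whole-number spacing numerically: pick out the spikes in the spectrum and divide each one by the fundamental. A sketch in Python with NumPy, again using a synthetic note with perfect harmonics (a real piano’s spikes would come out slightly sharp of these values):

```python
import numpy as np

RATE = 44100
FUND = 220.0

# Synthetic note with five perfect harmonics (made-up amplitudes).
t = np.arange(RATE) / RATE
note = sum((0.5 ** n) * np.sin(2 * np.pi * FUND * n * t) for n in range(1, 6))

spectrum = np.abs(np.fft.rfft(note))
freqs = np.fft.rfftfreq(len(note), d=1.0 / RATE)

# A spike is any bin well above the noise floor.
peaks = freqs[spectrum > 0.01 * spectrum.max()]
print(peaks / FUND)  # → [1. 2. 3. 4. 5.]
```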
As we’ve already stated, the overtones provide the character of the note; they give the musical instrument its voice. To illustrate this, listen to the note below, which is simply a pure tone at 220Hz with no overtones, and compare it with the piano note from above.
Pure tone note A3
Quite a significant difference!
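If you want to generate such a pure tone yourself, it is just a single sine wave. A minimal sketch in Python with NumPy, writing two seconds of 220Hz out as a 16-bit WAV file (the filename is arbitrary):

```python
import wave

import numpy as np

RATE = 44100
t = np.arange(RATE * 2) / RATE              # two seconds
tone = 0.5 * np.sin(2 * np.pi * 220.0 * t)  # A3, no overtones

samples = (tone * 32767).astype(np.int16)   # 16-bit PCM
with wave.open("pure_a3.wav", "wb") as f:
    f.setnchannels(1)     # mono
    f.setsampwidth(2)     # 2 bytes per sample
    f.setframerate(RATE)
    f.writeframes(samples.tobytes())
```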
An idea strikes me around this point in the study. What if, for each piano note, I simply store the height and frequency of the most important spikes in that graph? We could probably get away with recording about 20–30 spikes, and that would give us a reasonable reproduction of the piano’s timbre. Throw those through an inverse Fourier transform, and pull the audio signal out the other side.
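Rebuilding a note from the stored spikes amounts to additive synthesis: each (frequency, power) pair becomes one sinusoid, and you sum them all. A sketch in Python with NumPy; the partial table here is invented for illustration, not measured from the recording:

```python
import numpy as np

RATE = 44100

# Hypothetical stored table for A3: 25 of the strongest spikes,
# each a (frequency in Hz, relative power) pair. A real table
# would come out of the Fourier analysis of the recording.
partials = [(220.0 * n, 1.0 / n) for n in range(1, 26)]

# Inverse step: sum one sinusoid per stored spike.
t = np.arange(RATE) / RATE
note = sum(power * np.sin(2 * np.pi * freq * t) for freq, power in partials)
note /= np.abs(note).max()  # normalise to +/-1 before playback
```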
I don’t have a recording handy, but I can assure you that it sounds pretty good: a fairly reasonable piano-like sound. Unfortunately, it has two critical drawbacks:
- Those spikes decay at different rates on a real piano: the high-frequency overtones decay faster than the lower ones. The result is a bright, punchy start to the note and a duller, ringing tail-off. We’ve lost that, and it’s hard work to get it back.
- The bass notes definitely do not sound good with only 30 overtones. They are so rich in overtones that we need at least 100! Say that for each overtone we store its frequency (4 bytes) and its power (another 4 bytes). That’s 800 bytes for one bass note at one velocity level. The piano has 88 notes, and we’re going to need several key velocities. It doesn’t scale up well.
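The arithmetic behind that second bullet, with an assumed four velocity layers (the article only says “several”):

```python
overtones = 100             # per bass note
bytes_per_overtone = 4 + 4  # frequency + power, 4 bytes each
notes = 88
velocities = 4              # assumed; the article just says "several"

per_note = overtones * bytes_per_overtone
total = per_note * notes * velocities
print(per_note, total)  # → 800 281600, over four entire 64KB budgets
```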
In the end, rebuilding a signal like this from overtones can be thought of as a very strange and unique form of lossy compression. As we learned in the previous article, data compression doesn’t work very well at getting us down to the sizes required for packing everything into a 64KB demo.
So we go back to the drawing board, in search of a more procedural solution.
Thankfully, brighter minds have been applying their considerable capabilities to the same problem for years, albeit for different reasons, and they have been achieving impressive results with a technique called physical modelling. Briefly, physical modelling is a way of simulating the actual inner workings of the piano, in order to arrive at a completely procedural synthesis of the sound.
In the next article we will explore some of the current approaches to physically modelling the piano, and then focus on the one I chose for the demo.