
 

How Synthesisers Work

 

by Colin Pykett

 

Posted: April 2006

Last revised: 12 November 2012

Copyright ©  C E Pykett

 

Abstract  Despite what some might claim, digital electronic organs are little different to the synthesisers used by pop musicians.  All of these, together with other sound devices such as computer sound cards, use the same basic principles to generate sound.  They are decidedly complex pieces of hardware and software, and they have attracted much attention in the professional literature dealing with digital signal processing and computer music.  Unfortunately the majority of this is intended for the specialist who is familiar with topics such as digital filtering, interpolation and software synthesis.  If you do not know what these terms mean, that merely proves the point.  Even if you have met these terms, you might have been put off by the mathematical framework which so often accompanies them.  Because so little information is available describing how a synthesiser works at a relatively simple level, this article attempts to fill the gap by explaining in simplified terms how synthesisers have evolved from their "Moog" ancestry in the 1960's to the present day.  It comes with a promise that homomorphisms, finite difference calculus, z-transforms and cubic splines will not be mentioned!  However the important but rather complicated subject of frequency shifting by interpolation and decimation is discussed, but as simply as possible.

 

Contents

(click on the headings to access the desired section)

 

Introduction

 

History

 

What is a Synthesiser?

 

Polyphony

 

Multitimbrality

 

Room for Confusion

 

Software Synthesisers and Polyphony

 

MIDI

 

Analogue Synthesisers in Detail

Analogue Synthesiser Architecture

DAHDSR – Delay, Attack, Hold, Decay, Sustain, Release

VCA – Voltage Controlled Amplifier

VCF – Voltage Controlled Filter

VCO – Voltage Controlled Oscillator

Digital Synthesisers in Detail

Digital Synthesiser Architecture

Oscillators in Digital Synthesisers

Sampled sounds and looping

Frequency Shifting – Interpolation and Decimation

Aliasing

Linear Interpolation

Higher Order Interpolation

Notes and References

 

Introduction

 

Some while ago I was asked to explain how a synthesiser worked.  I must admit to an inward groan, not because I do not like explaining things to people, but because of the vastness of the field.  It is not the sort of thing that can be covered in a few words.  Therefore my first suggestion was that the enquirer should look on the Internet.  His immediate reply was that he had done so but had found that little information existed, and most of that which did exist was so technical that it was not much use to him.  At the time I was surprised to hear this, but a few mouse clicks later I found he was absolutely right.  As I write this, I have just repeated the exercise.  Typing ‘synthesiser’ into Google in its world wide mode produced over 700,000 hits.  Using the American spelling (synthesizer) resulted in over 20,000,000.

 

Yet it is still the case that little material exists which explains how they work, and I have just this moment given up trying to find anything of much value.  Papers, dissertations and theses on electronic music which immediately dive into z-transforms, finite difference calculus and the like are not really all that useful to the majority of the population, and certainly not to musicians. To my mind they fall mainly into the “see how clever I am” category.

 

Therefore this article describes how synthesisers work as simply as possible.  It begins with an overview of some important aspects of both analogue and digital synths, and it then goes on to look in detail at their design.  Some aspects of modern digital synthesisers are discussed in more depth, including the important subject of real time frequency shifting by interpolation or decimation which was irrelevant in the analogue days.  All but the simplest mathematics has been avoided, and the article does not require a detailed understanding of digital signal processing (DSP).  Digital electronic organs are sometimes referred to because they also are synthesisers, but most of the article is written at a more general level.  Sampled sound synthesis is the main focus of the article but passing reference is also made to additive synthesis.  Physical modelling is not covered at all but its recent appearance in digital organs is described in detail elsewhere on this site [4].

 

History

 

Today’s synthesiser technology has roots reaching back well before the legendary Dr Robert Moog, who died last year (2005).  One of the most famous names associated with the earliest analogue synths was Harald Bode, born in 1909 in Germany, who was already well into pioneering entirely novel electronic musical instruments by the 1930’s.  Later in a long and continuously productive career he collaborated with other famous names including Moog himself, and was largely responsible for introducing devices such as the ring modulator (really just an analogue multiplier) and the vocoder into electronic music (try ‘Sparky’s Magic Piano’ for a very early application of the vocoder to music).

 

Until about the 1960’s, the equipment in electronic music studios consisted of lots of expensive and separate boxes which were patched together as required, and many of these (such as signal generators) were not sold with electronic music in mind.  But as the decade progressed two key things happened.  Firstly, analogue integrated circuits became commercially available at reasonable prices, mainly due to the US space programme to put a man on the moon. By the late ‘60’s you could buy a 702 or 709 operational amplifier for around one pound (!) from firms such as SGS-Fairchild, and not long afterwards the ubiquitous industry standard 741 device appeared which is still widely used to this day.  Voltage controlled amplifier and filter chips, analogue multipliers and other useful devices such as bucket brigade delay lines also materialised.  The second notable event was the invention of the Dolby noise reduction system in the UK by Ray Dolby in 1965.  This reduced noise on analogue tape systems, thereby enabling multi-tracking to become a practical reality in much the same way as tracks are still laid down today using a digital sequencer.  Prior to that it was impossible to lay down more than a few tracks using analogue tape because of the problem caused by the tape noise building up as more tracks were added to the final mix.

 

For these reasons interest in electronic music exploded during the late 60’s, and some earlier ideas of Bode and others to assemble an electronic instrument from a number of standardised modules finally became reality in the form of the first true analogue synthesisers.  Carlos’s “Switched On Bach” LP, with its sleeve picture of a bemused Cantor of Leipzig wearing headphones in front of a collection of large Moog synthesisers, was an instant hit and it remains a sought-after piece of classic vinyl.  Nevertheless commercial products at first remained so expensive that they were inaccessible to the majority of musicians and even some smaller studios.  But because the interest which had been aroused was so great, a range of analogue synthesisers was widely marketed during the 1970’s and early 80’s as DIY kits which could  be assembled at relatively low cost.  Some were also available in fully-assembled form.  The well known designer Tim Orr was responsible for some of these, one of which was the ‘Transcendent’ marketed by Powertran Electronics in the UK after its design had been serialised in ‘Electronics Today International’.  This was a monophonic (one note at a time) design but with a multitimbral (several simultaneous voices) capability. A 4-voice Transcendent is shown in Figure 1.

 

 

Figure 1.  A version of Tim Orr’s ‘Transcendent’ analogue synthesiser c. 1981 (Powertran DIY kit)

   

Digital synthesis techniques were also being developed, but at first only huge mainframe computers had enough power to make them work in real time.  Low cost digital synthesisers had to await the appearance of reasonably fast microprocessors in the mid-1980's, though even then they only possessed rudimentary capabilities.  It was often still necessary to buy separate boxes to perform operations such as digital sampling of audio waveforms, which could then be downloaded into a PC.  Only by the mid-1990's was the digital synth revolution really under way as far as the small studio or home enthusiast was concerned.  By then items such as computer sound cards were appearing with a useful range of capabilities, and the cost of professional digital synths was also coming down to more realistic levels.  Some home systems (e.g. the Atari range) were marketed with a semi-customised music capability, including such rare features for the time as built-in MIDI interfaces.  Like analogue synths, these still attract a considerable cult following.

 

What is a synthesiser?

 

A synthesiser is a device for simulating virtually any musical sound, whether melodic or percussive.  It uses a number of building blocks to implement standard functions which either generate sounds or modify them, and because the word synthesise means to manufacture, this is how the instrument got its name.  A synthesiser manufactures sounds from scratch according to the whim of the player.  What we shall be discussing in this article are time domain synthesisers, which use ordinary waveforms, whose amplitudes and frequencies vary with time, as the source of the sounds.  Frequency domain synthesisers are less common – they use additive synthesis to produce sounds by adding together the necessary harmonics with the desired strengths.  There are also some other synthesis techniques which will not be discussed here, but another article on this website includes a survey of the main synthesis methods [2].

 

The building blocks of a time domain synthesiser are of two types – either they generate a waveform, or they modulate (modify) it in real time in terms of its amplitude and/or frequency.  When a key is pressed on the keyboard, a waveform of the correct type and frequency is generated and simultaneously it is modulated.  Both the type of waveform used and the manner in which it is modulated will have been previously determined by the player because of the way the synth was set up beforehand.  Note that this brief description applies equally to both analogue and digital synths.  Although today’s digital synths are considerably more flexible than the earlier analogue ones in the sense they offer far more options regarding waveform generation and modulation, they are still fundamentally the same sort of instrument as far as the performer and composer are concerned.

 

Polyphony

 

An important practical difference between analogue and digital synths concerns polyphony – the earliest analogue synths were monophonic, meaning that only one note at a time could be played.  Only by using multi-tracking techniques on analogue tape could a single performer build up a fully polyphonic piece of music (hence the importance of Dolby noise reduction).  Alternatively, several performers playing several synths could generate a polyphonic rendition in real time.   However both options were so restrictive, time consuming and expensive that polyphonic operation soon became available, but because of the limitations of analogue technology you were lucky to find a synth which offered a polyphony of more than four notes or so.  By contrast, even the humblest digital synth today running on a PC will typically offer a polyphony of 64 notes (but see the ‘Room for Confusion’ section below).

 

Multitimbrality

 

Timbrality means the range of voices a synthesiser offers, each voice corresponding to a different type of tone colour (otherwise known as timbre, hence the name).  Thus multitimbrality implies the existence of simultaneous multiple voices.  For example, a multitimbral synth with sufficient voices could simulate an orchestra in which different instruments are playing simultaneously, such as violins, oboes, trumpets, etc.  Or it could imitate a pipe organ being played with several stops in use, each with a different tone colour or pitch.

 

A restriction in the analogue days concerned the number of voices which could be used simultaneously.   Today’s digital synths usually allow you to mix up to 16 voices simultaneously, this limitation reflecting MIDI as much as anything else, and we shall discuss MIDI later on.  Older analogue synths generally offered fewer voices, such as the four-voice capability of the Transcendent synth pictured in Figure 1 (though this could be increased by an expander system).

 

Room for Confusion

 

In modern digital synths, the polyphony figure has to be examined rather carefully.  At best there is simply an opportunity for confusion to arise, and at worst manufacturers sometimes advertise a polyphony value which cannot always be achieved in practice.

 

The reason for this is that the polyphony figure (conventionally relating to the number of notes which can be keyed at the same time) and the timbrality figure (conventionally relating to the number of voices which can sound at the same time) are not always independent of each other.  This is because a digital synth will usually contain a limited number of identical hardware circuits which can output sound – we shall be examining the structure and capabilities of these circuits in detail later on.  Each circuit will be assigned dynamically by the synth’s internal computer to a particular voice of a particular note as the instrument is played.

 

Let us assume there are 64 hardware circuits (a typical figure), and that 4 voices are being used for each note.  Each voice of each note played has to be generated by a separate circuit.  This means that a maximum of only 16 notes can be keyed simultaneously (64 divided by 4) before the system runs out of sound generating capability.  Therefore the polyphonic capability of the instrument, using the conventional understanding of the term, is not 64 at all in this situation but only 16.  In other words, you would only be able to play a maximum of 16 notes at once.  It would only increase to 64 if you used only 1 voice per note.  At the other extreme, if you were using 16 voices the number of notes you could key simultaneously would only be 4.  Therefore a useful rule to bear in mind for an instrument of this type is that the advertised polyphony equals the number of voices used per note multiplied by the maximum number of notes you will be able to key simultaneously.
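To make the arithmetic concrete, here is a minimal sketch in Python.  The figure of 64 generator circuits and the voice counts are simply the illustrative values used above, not those of any particular instrument.

```python
# Effective polyphony of a hardware synth with a fixed pool of note generators.
# Illustrative values only: 64 generator circuits, as in the example above.

def notes_available(total_generators, voices_per_note):
    """Maximum number of notes that can sound at once."""
    return total_generators // voices_per_note

for voices in (1, 4, 16):
    print(f"{voices:2d} voice(s) per note -> "
          f"{notes_available(64, voices)} simultaneous notes")

# Output:
#  1 voice(s) per note -> 64 simultaneous notes
#  4 voice(s) per note -> 16 simultaneous notes
# 16 voice(s) per note -> 4 simultaneous notes
```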

 

Software Synthesisers and Polyphony

 

The discussion above related to digital synthesisers. It assumed that each sound generating circuit was implemented in hardware, which it often is, although the circuits are not built on circuit boards using discrete transistors etc.  They are contained within one or more extremely complex integrated circuits.  Sometimes however, the speed of modern computers is such that each sound generating circuit is virtual - it exists only as part of the software program rather than as a hardware entity.  Synthesisers of this type are called software synthesisers, or just soft synths.  Manufacturers of soft synths like to pretend that there is no polyphonic limit to their products, although what they really mean is that it is difficult to predict what that limit will be.  There will always be a limit, but because it is not directly related to hardware it will depend in an unpredictable manner on what you are asking the synth to do in terms of the number of simultaneous voices used and the number of notes being played.  Soft synths are also strongly dependent on the computer being used, and if it is a PC for example, then the faster the PC the more polyphony you will get before the synth runs out of steam.  A hardware synth will simply refuse to play any more notes once all of its sound generating circuits have been used up, or it might drop some notes currently sounding in favour of the latest ones just keyed.  A soft synth, on the other hand, will typically signal that it has reached its polyphony limit when the sound starts to break up or when it responds more slowly to the player.

 

MIDI

 

MIDI stands for Musical Instrument Digital Interface, and it was part of the enabling technology which accompanied the continuing high level of interest in synthesisers during the 1980’s.  MIDI was invented by Dave Smith, who first suggested it publicly to a meeting of the Audio Engineering Society in 1981.  It produces no sounds of its own but is intended solely to enable the products of various manufacturers to adhere to a common standard such that they can be connected together reasonably simply.  Using MIDI, several other synthesisers can be played from one keyboard by chaining them together via their MIDI In and MIDI Out sockets. 

 

Although the MIDI system itself is digital, it was nevertheless used to control analogue synths in its early days.  This was because, although they were MIDI compatible, their actual sound generating circuitry was still analogue.  This is one reason why the MIDI official specification today still has a special ‘monophonic’ mode which is a lingering echo from the days of analogue synths, though it is of course hardly ever used now.

 

MIDI is a vast subject which cannot be covered here.  However another article on this website discusses it in more detail and also gives some references which can be consulted for more information [1].

 

Analogue Synthesisers in Detail

 

Analogue synthesiser architecture

 

 

Figure 2.  A typical analogue synthesiser 

 

A block schematic diagram of a typical analogue synthesiser is shown at Figure 2.  Such a set-up is capable of producing the sound corresponding to a single voice of a single note.  Therefore several such complete assemblies (apart from the summing node at the right hand side, which is common to many or all of them) would be required for a polyphonic synth.  A multitimbral capability would require at least some of the elements of the assembly to be duplicated. Since all the elements, i.e. the contents of each box, were built on separate printed circuit boards with individual transistors, early integrated circuit packages, etc, it can be appreciated how difficult it was to get the size and cost of these analogue synths down to reasonable levels, and to provide more than a few simultaneous voices.  For the same reason only the most advanced and expensive analogue synths had a polyphonic capability.

 

The diagram does not necessarily represent any specific model or type of synthesiser, nor does it include all functions or facilities that could conceivably have been used.  For example, a white noise generator is not included, nor is a MIDI interface shown.  Although it represents analogue technology, it is also typical of today’s digital instruments even though the names of some of the modules are different, therefore we do not lose generality by discussing this architecture first.  It is also easier to understand in some respects.

 

The basic signal flow is shown by the red lines and it is very simple.  Thus the sound is first generated by a VCO (voltage controlled oscillator), it is then modified by a VCF (voltage controlled filter) and then by a VCA (voltage controlled amplifier).  The audio output from the VCA is then summed with the outputs from other modules before being passed to a power amplifier and then the loudspeakers.  Effects such as reverberation and chorus would also be added at this point.

 

When a key is pressed, two signals are generated by the keyboard interface circuitry.  One is an analogue control voltage, typically varying between 0 and 10 volts, which controls the frequency of the VCO.  The voltage depends on which key was pressed, thus the frequency (pitch) of the waveform produced by the VCO will correspond to that key.  The second signal is a gate pulse which (in this example) goes from zero to a high level while the key remains pressed, and reverts to zero when it is released.  This pulse is applied to several DAHDSR circuit blocks.  DAHDSR stands for Delay, Attack, Hold, Decay, Sustain, Release.  It is a more elaborate version of the ADSR (Attack, Decay, Sustain, Release) circuit which was often used.  We shall discuss the more complicated version here because it is more representative of today’s digital synths.

 

DAHDSR - Delay, Attack, Hold, Decay, Sustain, Release

 

 

Figure 3.  The DAHDSR curve

 

In Figure 3 is sketched a DAHDSR curve, which is generated by all the boxes having this label in the synthesiser schematic of Figure 2 whenever a key is pressed and released.  It is a control voltage envelope with six distinct regions as the name implies.  However not all of them have to be used on every occasion, and because all are adjustable you can simply reduce the “value” of the regions you do not want to zero.  With analogue synths the adjustments were made by turning actual knobs, which is one reason why these synths bristled with controls.  With digital synths the adjustments are made on a computer display, although many musicians today still lament the passing of real controls and their replacement by virtual ones.  Much of the time you will not want the Delay and Hold parameters, therefore their respective time periods would be reduced to zero.  The characteristic then reduces to the better known, though less flexible, ADSR curve.

 

VCA – Voltage Controlled Amplifier

To explain the many functions which the DAHDSR curve performs it is easiest to start with the VCA – the voltage controlled amplifier.  When a key is pressed, nothing happens at first during the Delay period (which might of course have been set to zero anyway, by virtue of the previous discussion).  The control voltage applied to the VCA then rises over the Attack period.  Because the gain of the VCA depends on the instantaneous value of the control voltage, this means the volume or amplitude of the sound you hear at the output increases from zero over a time denoted by the Attack parameter.  The maximum value reached during this phase is sometimes called the Initial Attenuation level.  Then the amplitude might remain constant during the Hold period (again this might have been set to zero), before falling to some lower level over a time defined by the Decay value.  After this the amplitude remains constant, at a level defined by Sustain, for the time that the key is held down.  Finally when the key is released the sound drops to zero over the Release period.
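As an illustration of how such an envelope can be generated digitally, here is a minimal Python sketch of a straight-line DAHDSR generator.  The function, its parameter names and the sample rate are my own illustrative assumptions rather than the workings of any particular synth.

```python
import numpy as np

def dahdsr_envelope(delay, attack, hold, decay, sustain_level, release,
                    key_down_time, sample_rate=44100):
    """Straight-line DAHDSR control envelope (illustrative sketch only).

    delay, attack, hold, decay, release : segment lengths in seconds
    sustain_level : level held while the key stays down (0 to 1)
    key_down_time : total time in seconds for which the key is held
    """
    def ramp(start, end, seconds):
        n = max(int(seconds * sample_rate), 0)
        return np.linspace(start, end, n, endpoint=False)

    env = np.concatenate([
        ramp(0.0, 0.0, delay),            # Delay: nothing happens at first
        ramp(0.0, 1.0, attack),           # Attack: level rises to its peak
        ramp(1.0, 1.0, hold),             # Hold: level stays at the peak
        ramp(1.0, sustain_level, decay),  # Decay: level falls to the sustain value
    ])
    # Sustain: level held until the key is released
    sustain_time = max(key_down_time - delay - attack - hold - decay, 0.0)
    env = np.concatenate([env, ramp(sustain_level, sustain_level, sustain_time)])
    # Release: level falls back to zero once the key is lifted
    start = env[-1] if len(env) else 0.0
    return np.concatenate([env, ramp(start, 0.0, release)])

# Example: an organ-like envelope with Delay, Hold and Decay set to zero (cf. Figure 4)
organ = dahdsr_envelope(delay=0, attack=0.03, hold=0, decay=0,
                        sustain_level=1.0, release=0.05, key_down_time=1.0)
```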

 

Note that the Delay, Attack, Hold, Decay and Release parameters are all expressed as time intervals (typically milliseconds), whereas the Sustain parameter is an amplitude level (typically dB).  In organ work the Decay phase is more often than not irrelevant as far as the overall sound is concerned, because once the Attack phase is over we do not usually want the amplitude to reduce again before the steady state sound commences. Such an overshoot can produce a most peculiar effect if overdone.  However it can have importance if a separate DAHDSR characteristic is applied to each harmonic of the sound separately in a frequency domain synthesiser.  For example, the second harmonic of a diapason pipe often rises more rapidly than the fundamental to a higher peak value, before dropping back to a steady state (Sustain) level.

 

Do not confuse the Decay and Release parts of the curve; the ending of a sound when the key is released is governed by the Release value, not the Decay one.  Some odd effects can be produced if the wrong value is tweaked inadvertently when adjusting the sounds.

 

 

Figure 4.  Typical organ sound envelope applied to the VCA

 

The form of a typical volume (amplitude) envelope for a simulated organ pipe is shown in Figure 4.  Only three of the possible six regions of the DAHDSR characteristic are used in this case.  Thus the Delay, Hold and Decay regions are not used.  Typically the Attack time will last for between 10 and 50 cycles of the fundamental frequency of the pipe, the exact value depending on the type of organ pipe being simulated.  The Release time will be similar, but it can be extended if required to simulate reverberation in a simple manner.

 

 

Figure 5.  Typical piano sound envelope applied to the VCA

 

A typical piano or harpsichord volume envelope is shown in Figure 5.  Here the sound begins rapidly during a short Attack phase, then it immediately begins to Decay rather more slowly, and it terminates abruptly when the key is Released.  In this case no Delay, Hold or Sustain phases were used.  The characteristic would be modified if the piano sustain (damper) pedal was used, by allowing the sound to continue decaying even after the key had been released.

 

You will have noted that in these diagrams all regions of the DAHDSR curve have been represented by straight lines, thus the volume would ramp up or down linearly with time.  This is often literally true with digital synthesisers today, whereas with analogue ones the corresponding line segments were more often curved (apart from the horizontal Hold and Sustain lines).  This was because the voltage envelopes were derived from simple passive resistance-capacitance networks which produced an exponential (curved) voltage versus time response.  Much discussion takes place about this today, and it is the case that the actual volume envelopes observed with real musical instruments are made up from curved lines rather than straight ones.  Therefore this is one way in which the old fashioned analogue synth was better at imitating real instruments than some examples of its modern counterpart, hence this is at least one good and objective reason justifying the nostalgia for these old synths.  However, in practice it is doubtful that the differences amount to much most of the time.  If you dispute this statement, you can try implementing both types of curve on a digital synth (if it allows you to do this) and see if you can detect the differences by ear.
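To make the comparison tangible, the following sketch contrasts a straight-line release ramp with the exponential curve that a simple resistance-capacitance network would produce.  The release time and the RC time constant are purely illustrative choices.

```python
import numpy as np

sample_rate = 44100
release_time = 0.5                                   # seconds (illustrative value)
t = np.arange(int(release_time * sample_rate)) / sample_rate

# Straight-line release, as commonly implemented in digital synths
linear = 1.0 - t / release_time

# Exponential release, as produced by a passive RC network in an analogue synth.
# A time constant of one fifth of the release time brings the level close to
# zero by the end of the segment.
tau = release_time / 5.0
exponential = np.exp(-t / tau)

# Both curves start at full level and finish near zero, but the shapes between
# those points differ markedly; listening tests are the only way to decide
# whether the difference matters for a given sound.
```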

 

Figure 2 also shows an LFO (Low Frequency Oscillator) driving the VCA, and this can be used to cyclically modulate its gain to derive a simple amplitude vibrato effect.  In turn the LFO is controlled by yet another DAHDSR circuit, and this enables the frequency and/or the amplitude of the LFO waveform to be varied during the keying interval.  For example, using an initial Delay interval followed by an Attack phase would allow the vibrato to start after the note itself was sounding, and the vibrato intensity to increase thereafter.  Such a vibrato characteristic is often used by orchestral players or singers, who sometimes do not begin to wobble their notes until a short time after the notes themselves have begun sounding.  
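A minimal sketch of this delayed-onset vibrato applied to the gain of the VCA might look like the following; the LFO rate, depth and delay values are arbitrary illustrations.

```python
import numpy as np

sample_rate = 44100
duration = 2.0                                       # seconds the note is held
t = np.arange(int(duration * sample_rate)) / sample_rate

# Envelope for the LFO itself: a Delay followed by an Attack, so the wobble
# fades in only after the note has started (values are illustrative)
lfo_delay, lfo_attack = 0.5, 0.5                     # seconds
lfo_depth = np.clip((t - lfo_delay) / lfo_attack, 0.0, 1.0)

# 6 Hz LFO modulating the gain of the VCA by up to +/- 20 per cent
lfo_rate = 6.0
gain = 1.0 + 0.2 * lfo_depth * np.sin(2 * np.pi * lfo_rate * t)

# Apply the time-varying gain to a steady tone standing in for the VCO/VCF output
tone = np.sin(2 * np.pi * 440.0 * t)
output = gain * tone
```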

 

VCF – Voltage Controlled Filter

In Figure 2 a DAHDSR circuit also drives the VCF.  This enables the characteristics of the filter to be changed during the interval between keying a note and releasing it.  It is difficult to go into great detail about this type of facility because the effects available depend strongly on what type of filter is available.  For example, it could be a high pass, band pass, low pass or resonant filter.  The DAHDSR envelope could in principle be made to modify any combination of the filter parameters during the keying interval.  Typically a low pass filter will exist whose cutoff frequency can be varied, and a more elaborate one might allow its Q factor at the cutoff frequency to be modified also.  Such a filter is useful when simulating a piano, whose higher harmonics reduce more rapidly during the Decay phase than the lower frequency ones.  This could be simulated by causing the DAHDSR envelope to progressively reduce the cutoff frequency of the filter during the Decay phase of the note.
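As a concrete illustration, here is a sketch of a simple one-pole low-pass filter whose cutoff frequency is swept downwards during a note, in the spirit of the piano example.  The one-pole design and the figures used are assumptions made for clarity; real synth filters are considerably more sophisticated.

```python
import numpy as np

def one_pole_lowpass(signal, cutoff_hz, sample_rate=44100):
    """One-pole low-pass filter whose cutoff may vary from sample to sample.

    cutoff_hz can be a single value or an array the same length as the signal.
    """
    cutoff = np.broadcast_to(np.asarray(cutoff_hz, dtype=float), signal.shape)
    out = np.zeros(len(signal))
    y = 0.0
    for n, (x, fc) in enumerate(zip(signal, cutoff)):
        a = 1.0 - np.exp(-2.0 * np.pi * fc / sample_rate)   # smoothing coefficient
        y += a * (x - y)
        out[n] = y
    return out

# Example: sweep the cutoff from 8 kHz down to 500 Hz over one second so that
# the higher harmonics die away faster than the lower ones, piano fashion
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
note = np.sign(np.sin(2 * np.pi * 220.0 * t))        # a bright square wave stand-in
cutoff_sweep = np.linspace(8000.0, 500.0, len(t))
filtered = one_pole_lowpass(note, cutoff_sweep, sample_rate)
```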

 

VCO – Voltage Controlled Oscillator

In analogue synthesisers the VCO generated a relatively simple type of waveform whose frequency (musical pitch) was derived from the keyboard control voltage.  Typically sawtooth, sine, pulse, square or triangle waves were available.  In modern digital synths the waveform can be much more complex, such as a complete sample of the sound which is to be reproduced, and more will be said about this later.
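The classic analogue waveforms are easy to sketch digitally.  The naive formulas below are adequate for illustration, although they are not band-limited and so would alias at high pitches in a real digital synth; the function and its names are simply assumptions for this example.

```python
import numpy as np

def vco(waveform, freq_hz, duration=1.0, sample_rate=44100):
    """Naive (non-band-limited) versions of the classic analogue VCO waveforms."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    phase = (freq_hz * t) % 1.0                      # position within each cycle, 0..1
    if waveform == "sine":
        return np.sin(2 * np.pi * phase)
    if waveform == "sawtooth":
        return 2.0 * phase - 1.0
    if waveform == "square":
        return np.where(phase < 0.5, 1.0, -1.0)
    if waveform == "pulse":                          # 25 per cent duty cycle as an example
        return np.where(phase < 0.25, 1.0, -1.0)
    if waveform == "triangle":
        return 4.0 * np.abs(phase - 0.5) - 1.0
    raise ValueError(f"unknown waveform: {waveform}")

saw_a440 = vco("sawtooth", 440.0)                    # one second of sawtooth at A440
```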

 

Figure 2 also shows another LFO (Low Frequency Oscillator) driving the VCO, and this can be used to modulate its frequency to derive a vibrato effect.  This is different to the LFO applied to the VCA, which varies the amplitude of the sound as described earlier.   As with the VCA, the LFO for the VCO is also controlled by an independent DAHDSR circuit, and this enables the frequency and/or the amplitude of the LFO waveform to be varied during the keying interval.

 

 

Digital Synthesisers in Detail

 

The digital representation of audio waveforms in computers, synthesisers, etc as binary digits (bits) or words is not described in this article, as it is assumed the reader has some basic knowledge of the subject.  However it is described elsewhere on this website in Appendix 1 to reference [2].  That discussion relates specifically to digital organs, but it is sufficiently general to be of value here.

 

Digital synthesiser architecture

 

 

Figure 6.  Typical digital synthesiser schematic  

 

A top-level schematic diagram of a typical digital synthesiser is shown at Figure 6.  Among other things, it contains N circuits capable of generating sound, and I have called these note generators.  Each generator can be assigned under computer control to any voice of any note keyed (the computer itself is not shown in the diagram).  The polyphony of such a synth would therefore have the value N, though beware of what this actually might mean in practice (see the ‘Room for Confusion’ section above).  A typical figure for N would be at least 64.  The outputs of all note generators are added together and sent to the stereo output ports, together with effects such as reverberation and chorus with programmable amounts.  Keyboard input is via MIDI, and it includes key velocity sensitivity as standard which analogue synths often did not cater for.

 

The entire assembly is digital, and it runs at a constant sample rate throughout, e.g. 44100 Hz [3].  The sample rate imposes a continuous rhythmic pulse on the signal and control flows within the entire synth, and it does not vary.  At least 16 bit word lengths are used, giving a dynamic range of 96 dB.  The constant sample rate means that some operations in a digital synth are more difficult and complicated than in an analogue one, especially frequency (pitch) shifting.  I will come onto this later.  Also note that a twin channel (stereo) capability is maintained throughout, even though I have only actually depicted this at the output.

 

 

Figure 7.  Typical note generator schematic 

 

Each note generator might have an architecture similar to that of Figure 7.  The signal flow is shown in red and is little different to that shown earlier for the analogue synth.  The three main processing blocks are still there, consisting of an oscillator followed by a filter followed by an amplifier.  Therefore this confirms that a digital synth is essentially the same type of instrument as an analogue one as far as the performer is concerned, and anything that could be done on an analogue synth can be done on a digital one, and then some.  However there are many detailed differences, one of them being that the analogue concept of voltage control of the parameters in these three processing units no longer exists.  All control is digital via a computer.  In the example shown in Figure 7 the parameters which can be controlled are the frequency of the oscillator, the cutoff frequency of the filter and the gain of the amplifier.  The various ways this can be done are under the control of the four modulation modules at the left of the diagram.  The functions these perform are similar to their counterparts described earlier for the analogue synth, but they are not identical and neither are the names applied to the modules.  The differences are deliberate to illustrate that, while all synths are similar, few are identical.  
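To tie the blocks together, here is a toy note generator in the spirit of Figure 7: an oscillator feeding a low-pass filter feeding an amplifier whose gain follows a simple volume envelope.  Everything in it (the waveform, the filter design and the parameter values) is an illustrative assumption rather than a description of any particular synth, and the modulation modules are omitted for brevity.

```python
import numpy as np

SAMPLE_RATE = 44100

def note_generator(freq_hz, key_down=1.0, release=0.05, cutoff_hz=3000.0):
    """Toy note generator: oscillator -> low-pass filter -> amplifier."""
    n = int((key_down + release) * SAMPLE_RATE)
    t = np.arange(n) / SAMPLE_RATE

    # Oscillator: a naive sawtooth standing in for a stored sample
    osc = 2.0 * ((freq_hz * t) % 1.0) - 1.0

    # Filter: a fixed one-pole low-pass (no cutoff modulation in this sketch)
    a = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / SAMPLE_RATE)
    filtered = np.empty(n)
    y = 0.0
    for i, x in enumerate(osc):
        y += a * (x - y)
        filtered[i] = y

    # Amplifier: gain follows a simple attack/sustain/release volume envelope
    env = np.minimum(t / 0.02, 1.0)                          # 20 ms attack
    releasing = t > key_down
    env[releasing] = np.maximum(1.0 - (t[releasing] - key_down) / release, 0.0)
    return filtered * env

middle_c = note_generator(261.6)      # middle C held for one second
```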

 

Oscillator frequency and filter cutoff are controlled by a DAHDSR ‘Modulation Envelope’ module (outlined in green) in much the same way as in the case of the analogue synth described earlier.  In particular, the DAHDSR control envelope is triggered in just the same way at the instant of key depression, and its Release phase commences when the key is lifted.  Although this module is shown permanently connected to the oscillator and filter control lines, in fact it can be connected to just one or the other, or to neither.  This is also true for all the other modules.  The choice is up to the performer.

 

Vibrato is applied by varying the frequency of the oscillator by means of a ‘Vibrato LFO’ (purple).  A second, independent, ‘Modulation LFO’ (blue) can also be connected to vary the oscillator frequency, and this is sometimes useful to get some interesting effects when the amplitudes and frequencies of the two LFO’s are different.  This second LFO can also be used to modulate the filter cutoff frequency and the gain of the output amplifier.

 

Finally, a second DAHDSR module called ‘Volume Envelope’ (red) varies only the gain of the output amplifier.  As with the analogue synth, this is probably the most frequently used and most important module because it applies a volume control envelope to the overall sound during the keying interval.  Therefore it enables a range of widely different effects to be obtained (organ and piano envelopes, etc).

 

 

 

Figure 8.  Typical Digital Synthesiser Display

 

A picture of a typical computer display for the type of digital synthesiser under discussion is shown in Figure 8.  This is in fact a screenshot of the SoundFont editing program ‘Vienna’ (version 2.3), and it was chosen because SoundFonts are probably the most standardised and widely available synthesiser format in general use today.  Choosing a standard and well known format was done simply to facilitate the discussions in this article rather than necessarily implying any recommendation for their use, and for the same reason the note generator schematic depicted in Figure 7 also broadly corresponds to this standard.

 

Nevertheless, if you are unfamiliar with digital synths you could do a lot worse than to gain experience by cutting your teeth on Vienna and SoundFonts, if only because much of what you need is free.  A lot of information on SoundFonts can be found at www.creative.com/soundblaster/soundfont.  The Vienna editor can also be downloaded free from that site.  The latest version of Vienna (version 2.4) to my mind is in some ways not so easy to use as version 2.3, which at the time of writing is still available for free download.  This is because the designers of 2.4 have tried to squeeze too much onto the screen in my opinion, rather than leaving it up to the user to request the less frequently used information via menus and dialogue boxes as in version 2.3.  Notwithstanding this minor gripe, for a pretty amazing piece of free music software Vienna takes some beating.  I often wonder why those who whinge about it so much don’t just go out and spend their hard earned cash on something they like better, instead of boring us all to death on the Internet.  Free is a pretty good price.

 

To use Vienna you will also need a SoundFont-compatible computer sound card.  However, please note that although I am happy to get involved in correspondence on most aspects of this article, I cannot offer advice on sound cards.  This is a subject which changes so dramatically and continuously that anything I might say one day will almost certainly be out of date the next.  Although intended only as a sound editor, Vienna can in fact perform many of the functions of a digital synthesiser as well – you can use it as a fairly complete polyphonic multitimbral synth.  But to enjoy the full capabilities of a SoundFont based synthesiser you will also need a sequencer such as Steinberg’s Cubase which is not free, though ‘lite’ versions of it such as Cubasis are sometimes bundled with the software which comes with better quality sound cards.

 

For convenience in following this discussion, the four boxes outlined in colour in Figure 8 correspond to those in Figure 7.  Note that these colours do not actually appear when you use Vienna as I added them deliberately to the picture to facilitate this discussion.  The large pane containing the virtual keyboard can be ‘played’ by the mouse, or the corresponding keys will ‘move’ if you have an external MIDI keyboard connected to the PC.  Below the keyboard can be seen a number of bars, which together comprise an Instrument in Vienna terminology, or a Voice in more generalised synth terminology.  Each of these bars is only active over the range of keys (key group) specified, and one of them has been highlighted as indicated by the blue band.  All of its parameters have also appeared in the four boxes mentioned earlier, and any of these can be adjusted at will.  Tuning, filter parameters, reverb, chorus and pan (position in the stereo image) for the highlighted key group can also be adjusted in the two panes at the bottom left of the display.

 

Oscillators in digital synthesisers

 

Sampled sounds and looping

Although the oscillator, filter and amplifier processing blocks in analogue and digital synths perform broadly comparable functions, there are major differences in detail, particularly for the oscillator. For an analogue synth the oscillator waveforms available were restricted to the simple ones which could be generated easily such as sawtooth, sine or square waves.  In a digital synth the oscillator is the point at which you can insert the sampled sound of your choice, and some synths of this type are called wavetable synths.  The difference between the oscillators for analogue and digital synths is considerable, as the following will show.

 

 

Figure 9.  Sampled sound of a piano note in a digital synthesiser

 

A sample of an actual piano note is illustrated in Figure 9, taken from one of the displays within Vienna.  The rapid rise of the amplitude of the sound as the hammer hits the string is at the left hand end of the display, with the gradual decay following it.  Note that this is the real acoustic waveform of a struck piano string as recorded by a microphone; it was not produced by the application of an artificial DAHDSR envelope to an artificial waveform as would have been necessary with an analogue synthesiser.  As the sound continues to decay it eventually enters a region in which its amplitude remains relatively constant for a while, although much lower than that which it had when the note was first keyed.  At the right hand end of the display can be seen a green line and a blue line close together.  These are the loop points of the waveform.

 

 

Figure 10.  Loop points of the piano waveform

 

The loop points are shown on an expanded horizontal (time) scale in Figure 10.  The concept of looping is necessary to enable a wavetable synthesiser to sustain a sampled note for as long as necessary – in this case, for as long as a key on the keyboard of the synth is held down.  What happens is that when replay of the stored waveform reaches the blue line, the synth jumps back to the green line, and it then loops continuously between the two points until you release the key.  By this means the synth can simulate a note held for as long as necessary.
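In code, the looping logic amounts to little more than the following sketch; the sample data and loop points here are invented purely for illustration, and release handling is omitted.

```python
import numpy as np

def play_looped(sample, loop_start, loop_end, num_output_samples):
    """Replay a stored sample, looping between loop_start and loop_end (sample
    indices) for as long as output is required, i.e. while the key is held."""
    out = np.empty(num_output_samples)
    pos = 0
    for n in range(num_output_samples):
        out[n] = sample[pos]
        pos += 1
        if pos >= loop_end:          # reached the 'blue line': jump back
            pos = loop_start         # to the 'green line' and carry on
    return out

# Example with a made-up sine 'sample' (100 samples per cycle) and arbitrary loop points
wave = np.sin(2 * np.pi * np.arange(1000) / 100.0)
held_note = play_looped(wave, loop_start=600, loop_end=900, num_output_samples=5000)
```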

 

 

Figure 11.  Joining up the loop points of a piano waveform

 

Of course, it is necessary to ensure that there is no abrupt transition between the waveforms at the two loop points, otherwise this will be heard as an unacceptable audible discontinuity as long as the sound loops between them.  It is necessary to choose the loop points carefully to avoid this, and Vienna enables you to do it as shown by Figure 11.  Basically, you have to try to match up the waveform at the right hand edge of the left pane with that at the left hand edge of the right pane.

 

You do not have to go through this process if using ready-made commercial SoundFonts such as the General MIDI fonts which are widely available, as it has already been done for you by the SoundFont designer. It is only if you want to add new sounds of your own that you have to look at their waveforms in a SoundFont editor such as Vienna and assign loop points to them.

 

Frequency Shifting – Interpolation and Decimation

Another major difference between the oscillators in analogue and digital synthesisers concerns the way in which frequency (pitch) shifting of the waveform is done.  In analogue synths the initial pitch of the waveform was defined by the analogue DC control voltage from the keyboard, this voltage corresponding to the key which was pressed.  Within the synth, the keyboard control voltage could also be varied slightly to implement vibrato, pitch bend or anything else which required the frequency of the note to change while it was sounding.  This made things pretty easy at an engineering level.

 

In digital synths things are much more complicated.  If it was possible to vary the sample rate of the stored waveform within the oscillator it would be fairly simple to implement frequency shifts.  The sample rate would be reduced to reduce the frequency, and vice versa.  Unfortunately this is not possible, because all modules within a digital synthesiser must operate at exactly the same sample rate at all times [3].  This is because a large number of binary arithmetic operations have to be done on each voltage sample of a waveform as it passes through the system, such as when you want to alter the gain of an amplifier.  In digital terms, altering the gain means multiplying each sample of the waveform passing through the amplifier by a constant factor (the gain) using a binary multiplier.  If the sample rate of the data in different processing modules was not the same, it would become almost impossibly difficult to design the hardware to do the necessary arithmetic. At an engineering level digital synths are fearsomely complicated as it is, rather like computers, and they only work because certain rules are imposed to keep the complexity within bounds. One of these rules, both for computers and synthesisers, is that the whole thing must work at a constant sample rate, like a rapid heart beat pumping the successive digital samples through the veins of the entire synth.

 

Frequency (pitch) shifting therefore has to be done in a way which does not require the sample rate of the oscillator waveforms to be altered, and therefore the process of interpolation is used to increase the frequency.  The inverse process, called decimation, reduces the frequency.  For convenience, the two processes will be called interpolation in what follows as they are conceptually similar.  Any discussion of interpolation can get quite complicated, and much of the material available elsewhere involves advanced mathematics from the start.  The latter has been avoided here, although it is fair to say that from this point onwards the article gets rather more specialised.  I have included this discussion of interpolation in the hope that it might throw light on a difficult subject in a reasonably straightforward way, as without some understanding of interpolation you cannot claim to fully understand digital synthesisers and their limitations.  As one example of such limitations, the amount of arithmetical number crunching required by interpolation means that soft synths (those which perform all operations by software) can easily require a processor of almost supercomputer proportions to equal the range of frequency shifting options which the humblest hardware-based synths (such as sound cards) take in their stride.  Another example is the distortion introduced by interpolators under certain circumstances.

 

 

Figure 12.  Illustrating interpolation and decimation

 

The top part of Figure 12 shows a small section of an analogue waveform which was digitally sampled and stored in the oscillator of a synthesiser.  The numbers stored correspond to the amplitudes of the waveform at the sampling instants indicated by the black vertical lines.  Typically this original sample rate might be 44100 samples per second.  Sketched beneath using red lines is a new, higher, sample rate.  If the waveform was to be sampled at this rate, but then read out at the original (lower) rate, its frequency would be reduced.  This process is called decimation.  On the other hand, if the red lines were spaced more widely in time than the black ones, corresponding to a lower sample rate, we would increase the frequency of the waveform by then reading it out at the original sample rate.  This process is called interpolation.  As stated above, the two processes are similar and we shall refer to them both as interpolation.

 

Unfortunately we do not have stored samples of the waveform at the ‘red’ sampling instants because the waveform was only sampled and stored at the ‘black’ sample rate.  However, by doing some reasonably simple mathematics we can calculate approximately what those samples would be.  This is the process of interpolation, and it is of interest that the mathematics of interpolation was developed in the form discussed here by Isaac Newton in the 17th century, long before digital sampling was dreamed of.  However, being Newton, he took it to a level far beyond that which we shall struggle with in this article, and beyond what most ordinary mortals could understand even today.

 

Aliasing

Aliasing occurs when replaying a digital waveform if it contains frequency components higher than what is called the Nyquist frequency.  The Nyquist frequency equals half the sample rate.  An aliased waveform contains these spurious high frequencies reflected into the frequency band below the Nyquist frequency, and they can be heard as strange whining or whistling sounds.  When doing interpolation with the intention of raising the frequency of a waveform, care has to be taken not to introduce aliased frequencies.  This can happen because the effective sample rate, and thus the effective Nyquist frequency, of the original waveform is reduced in these circumstances.  Therefore an interpolator usually includes an initial stage of digital filtering to remove sufficient of the high frequencies before the interpolation itself is performed.  Aliasing is not a problem when reducing the frequency, however.  
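A trivial sketch shows why this pre-filtering is needed when shifting pitch upwards; the figures are purely illustrative.

```python
def aliases_after_pitch_shift(component_hz, shift_ratio, sample_rate=44100):
    """Will a component at component_hz exceed the Nyquist frequency once the
    waveform has been shifted up in pitch by shift_ratio?"""
    nyquist = sample_rate / 2.0
    shifted = component_hz * shift_ratio
    return shifted > nyquist, shifted

# A 15 kHz component shifted up by a fifth (ratio 1.5) lands at 22.5 kHz, just
# above the 22.05 kHz Nyquist limit, so it would alias back into the audio band
# unless it had been filtered out before the interpolation was done.
print(aliases_after_pitch_shift(15000.0, 1.5))       # (True, 22500.0)
```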

 

Further discussion on aliasing can be found in Appendix 1 to reference [2] elsewhere on this website.

 

Linear interpolation

 

 

Figure 13.  Linear interpolation.

 

Figure 13 shows a magnified portion of the analogue waveform between three ‘black’ sampling instants, where we know all the ‘black’ sampled voltage and time values because they are stored as the oscillator waveform sample.  It also shows the corresponding ‘red’ sampling instants where we do not in general know what the voltage values are.  Because of the short time considered, the waveform is approximately a straight line during this interval, and therefore we can work out what the sampled voltage values would be at the ‘red’ sampling instants using only the equation of a straight line.  This process is called linear interpolation.  In mathematical parlance the problem is this:  knowing the ‘black’ voltage values v_(k+1) and v_k at times t_(k+1) and t_k respectively, what is the unknown ‘red’ voltage sample v at time t?  The unknown sample lies between two of the known samples.

 

The equation of any straight line in Figure 13 is:

 

v = c0 + c1 t    (1)

 

where c0 and c1 are constants.  If you are familiar with this type of equation you will know that the constants are, respectively, the intercept of the straight line on the vertical axis and the gradient of the line.  You will also probably be able to show that these constants are given by:

 

c1 = (v_(k+1) - v_k) / (t_(k+1) - t_k)        (2)

and

c0 = v_k - c1 t_k   (3)

 

Therefore, substituting these values back into equation 1 enables any ‘red’ voltage value to be found in terms of the two ‘black’ values which bracket it.

 

A linear interpolator solves the three equations above many times to generate a complete new oscillator waveform sampled at a different rate, and because it needs two samples of the original waveform to generate each new sample, it is sometimes called a 2 point interpolator.  Although it is the simplest form of interpolator, it is obvious from the equations above that it involves quite a few multiplications, divisions, subtractions and additions to generate each new sample.  It is this feature which causes difficulties for a soft synth in which all the arithmetic has to be done by software.  Such an apparently simple function as vibrato requires a complete new waveform to be generated by interpolation many times a second, to simulate the continual frequency changes required.  Therefore soft synths can literally grind to a halt unless they run on very powerful computers, and for this reason they may not offer the range of frequency modulation facilities which a hardware-based synth enjoys.  Of course, such limitations will not exactly be advertised by the manufacturers and therefore you must be prepared for disappointment when you discover them for yourself.
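For the curious, here is a minimal Python sketch of a 2 point (linear) interpolator used to shift the pitch of a stored waveform.  It follows equations (1) to (3) directly; the function and its parameters are illustrative assumptions, and a real synth would do this with far more care, including pre-filtering against aliasing.

```python
import numpy as np

def linear_resample(samples, ratio):
    """Resample a stored waveform by linear (2 point) interpolation.

    ratio > 1 raises the pitch (the read-out position moves through the stored
    waveform faster than one sample per output sample); ratio < 1 lowers it.
    Each output value lies on the straight line joining the two stored samples
    which bracket the required instant, exactly as in equations (1) to (3)
    (with t_(k+1) - t_k = 1 because the stored samples are evenly spaced).
    """
    t_new = np.arange(0, len(samples) - 1, ratio)    # the 'red' sampling instants
    k = t_new.astype(int)                            # index of the 'black' sample before each
    frac = t_new - k                                 # fractional position between k and k+1
    return samples[k] + (samples[k + 1] - samples[k]) * frac

# Example: shift a 440 Hz sine wave up by one semitone (ratio = 2 ** (1/12))
sr = 44100
t = np.arange(sr) / sr
a440 = np.sin(2 * np.pi * 440.0 * t)
a466 = linear_resample(a440, 2 ** (1 / 12))
```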

 

Higher order interpolation

Although linear interpolation is the simplest form, it suffers from other disadvantages.  The most obvious is the assumption that the waveform is linear, because it is not.  All waveforms are curved in practice otherwise they would never return to zero!  This results in harmonic distortion in a linearly interpolated waveform which is required to loop between two loop points, because instead of the waveform curving nicely it is represented as a sequence of nasty linear segments.  Fortunately, much of the time this does not matter too much because many audio waveforms are over-sampled.  This means that the maximum frequency they contain is significantly less than the Nyquist frequency, which equals half the sample rate.  For a waveform sampled at 44100 Hz the Nyquist frequency is 22050 Hz, yet few audio waveforms in synthesisers would contain upper harmonics anywhere near that frequency, and most of us couldn’t hear them even if they did.  Although over-sampled waveforms are still curvy, the rate at which they curve is often sufficiently gradual to allow them to be approximated reasonably accurately as straight lines between successive samples.  In this over-sampled situation linear interpolators can work quite well.  The harmonic distortion they introduce gets worse as the frequency (pitch) of the waveform increases because the amount of over-sampling gets progressively less, but for the same reason much of the distortion is pushed progressively beyond the range of audibility.

 

Nevertheless, the distortion introduced by linear interpolation runs counter to the efforts made to maintain extremely high signal to noise ratio and extremely low distortion figures in digital equipment, and therefore it is rather lame just to say that “it’s not too bad”.  Consequently most interpolators use more than two points of the original waveform when calculating each sample of the interpolated one, and this is the same thing as saying that they assume a curvy line instead of a straight one when doing the calculations.  Many hardware-based synthesisers use 8 point interpolation, meaning that 8 samples of the original waveform are required to generate each new sample.  Even quite cheap wavetable synths, such as those in some sound cards, use 8 point interpolation.  For example, the E-mu 10K1 and 10K2 sound engines which have been used in Creative Labs sound cards for some years have an impressive 8 point hardware interpolator for each of their 64 note generators.  The number of calculations required per second is pretty fearsome and it can only be realised using dedicated high speed hardware arithmetic units in sound engines such as these.  It is instructive to ask soft synth manufacturers or devotees what order of interpolation they use, because doing higher order interpolation consumes so much processor power that it is rare to find a soft synth using more than 2 point interpolation!
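As an illustration of what using more points means in practice, here is a 4 point (cubic Lagrange) interpolator.  This is just one common textbook choice, and it is not claimed to be the scheme used in the sound engines mentioned above.

```python
import numpy as np

def cubic_point(samples, t):
    """Estimate the waveform value at fractional position t from the four stored
    samples which surround it (4 point cubic Lagrange interpolation)."""
    k = int(t)                       # index of the stored sample just before t
    x = t - k                        # fractional position between samples k and k+1
    y0, y1, y2, y3 = samples[k - 1], samples[k], samples[k + 1], samples[k + 2]
    # Lagrange basis polynomials for points at x = -1, 0, 1 and 2
    return (y0 * (-x * (x - 1) * (x - 2) / 6.0)
            + y1 * ((x + 1) * (x - 1) * (x - 2) / 2.0)
            + y2 * (-(x + 1) * x * (x - 2) / 2.0)
            + y3 * ((x + 1) * x * (x - 1) / 6.0))

# Compare against 2 point (linear) interpolation on a curvy waveform: a sine
# wave sampled at 16 samples per cycle
wave = np.sin(2 * np.pi * np.arange(64) / 16.0)
t = 10.37
linear_est = wave[10] + (wave[11] - wave[10]) * 0.37
cubic_est = cubic_point(wave, t)
exact = np.sin(2 * np.pi * t / 16.0)
print(linear_est, cubic_est, exact)  # the cubic estimate is much the closer of the two
```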

 

However, as with life generally, we take two steps forward only to take another back.  A problem with higher order interpolation is that the interpolator can sometimes think the waveform is more or less curvy than it actually is, and consequently its calculated sample points can be erroneous. The result is harmonic distortion, just as with the simple linear interpolator.  One way the problem can arise is due to noise superimposed on the original signal, and because the interpolator cannot distinguish between signal and noise, it merrily computes its new samples in a way that can sometimes amplify the noise.  In a periodic waveform such as that produced when a synth is looping between loop points, noise appears as spurious harmonics, i.e. as harmonic distortion.  With synths which use real acoustic samples there will always be some degree of noise on the waveform, and this can result in unsatisfactory interpolator performance.  It is impossible to predict how severe the problem will be because it is data-dependent, but sometimes the results can be audible.  Many people will have experienced the peculiar noises sometimes produced when the interpolator in a synth shifts the frequency of a sample, for example while it is executing vibrato.

 

I have done experiments on this aspect of interpolation and some of the results were interesting.  For example, interpolating a sine wave pitched at middle C and sampled at 44100 Hz showed that anything above 4 point interpolation introduced progressively more harmonic distortion.  This was probably because of the minute amount of quantisation noise present, which was audibly undetectable.  The type of interpolation that has been discussed in this article is called polynomial interpolation, which assumes that the original sample points lie on a perfectly smooth curve.  Noise disturbs the smoothness of the curve, and polynomial interpolators are therefore rather sensitive to it.  However it is a complex subject, and one which cannot be taken further here.

 

Notes and References

 

1. “A MIDI Pedalboard Encoder”, C E Pykett, currently on this website (read).

 

2. “Voicing Electronic Organs”, C E Pykett, currently on this website (read).

 

3. A digital synthesiser will often allow samples at arbitrary sample rates to be inserted, but this does not necessarily mean that they are replayed at those rates within the synth.  Usually if not always they will be automatically interpolated to the internal sample rate used within the system. This process will not usually be obvious to the user.

 

4. "Physical Modelling in Digital Organs", C E Pykett, currently on this website (read).