VOICING ELECTRONIC ORGANS
Posted: July 2003
Last revised: 23 December 2009
Copyright © C E Pykett
Abstract
The subject of how today's digital electronic organs are voiced is many-facetted. However there is one factor usually paraded as a benefit which in fact is a decided hindrance. This factor is the complete lack of any intrinsic organ-like character in the hardware and software environment of a digital organ, unlike a pipe organ which can only ever sound like what it is. By contrast, an electronic organ could just as readily emit the sound of a barking dog as an organ pipe. Therefore this virtually unbounded flexibility demands a detailed understanding of the speaking characteristics of organ pipes before the organ can be voiced effectively. Consequently the majority of this article is concerned with this topic. It describes the types of digital organ available, how they work and the pros and cons of the various systems. It then covers matters such as how recordings of organ pipes can be made, how these may require pre-processing before being incorporated into an instrument and how sounds can be created without the need to first make recordings. Spectrum analysis is discussed in detail because its capabilities and limitations are widely misunderstood, particularly when applied to attack and release transients.
The processes used in voicing digital electronic organs are not well known and
consequently there is much ill-informed buzz and speculation about the subject,
as evidenced by Internet chat forums, for example.
The reasons for the situation include the lack of information available from
manufacturers, and an apparent misunderstanding of topics such as spectrum
analysis and the physics of organ pipes even on the part of some of those in the
electronic music business.
This article surveys the types of digital organ available and then moves on to how
they are voiced.
The most important feature is a detailed understanding of the nature of organ
pipe sounds, which is far more significant than the type of digital system used
to simulate them.
Without such knowledge no system, no matter how sophisticated, will achieve
This is because any digital organ, unlike its pipe counterpart, is not
intrinsically organ-like in any way.
Until it is programmed in minute detail to emit the sounds of organ pipes it
could just as easily become a digital dog-barking machine.
Therefore this article considers in detail the processes necessary to identify
and extract the essential features from the sounds of real pipes, including
their transient behaviour.
The material in this article requires no more than a high school understanding of science, together
with a similar level of computer literacy. Some musical understanding and
experience of the pipe organ is also assumed.
Appendix 1 outlines the essentials of how digital signals are generated and
handled by computers for those who are unfamiliar with the subject. Appendix 2
covers additive synthesis, a topic which occurs repeatedly throughout the article.
However in both cases the Appendices explain a number of detailed technical
points, so they are recommended reading even for those with greater experience.
Spectrum analysis is covered in considerable detail in the article because not
only is it pivotal to an understanding of pipe sounds, but it is a subject
of the more detailed aspects can be omitted without a significant loss of
These are identified by a smaller typeface.
However this does not imply lesser importance, and anyone seriously into the
subject needs to be familiar with them.
Digital electronic organs came in two varieties
when this article was first posted in 2003, though a third (physical modelling)
made a later appearance in about 2006 in the form of a MIDI expander box made by
Viscount. In 2009 this company introduced a complete organ
using physical modelling technology. Frequently the first two
varieties are called sampling and real time synthesis.
An example of the former is the Allen organ and of the latter, the Bradford
Computing Organ (and those which use derivatives of the Bradford technology).
However this nomenclature is unfortunate and misleading in one respect: the
term “real time” is used widely in the field of digital signal processing to describe a system
which is capable of responding to or
reproducing events at the rate
at which they occur in the real world.
Since both types of digital organ and indeed any musical instrument that was
ever made necessarily must have this attribute, it cannot be appropriated
legitimately to describe just one variety. In any case, it will be pointed out
later that at least one type of “real time” organ does not operate strictly
in real time at all.
Moreover, a defect of some digital organs in the past which still persists
sometimes today is that they cannot keep up with the demands of the player in
some circumstances - and this was true particularly of "real time"
This inexcusable attribute makes the use of the term “real time” ridiculous.
The use of the term to denote a particular kind of electronic organ is
usually only for sales and marketing purposes.
The terms time domain and frequency domain synthesis, which are also
the property of the wider signal processing community, are much better and we
shall use them in this article.
Samplers are examples of time domain synthesisers, and the so-called real time
organs to be described use frequency domain methods.
But what are these methods in plain language?
To understand them better it is helpful to take a trip into history.
In the beginning, starting around the 1920’s, all electronic organs used analogue
But in 1971 there appeared a patent which was materially associated with the
development and subsequent marketing of the Allen digital organ.
Although it was by no means unique among a plethora of similar patents both
before and since, it was in many ways a remarkable document and even today it
repays careful study.
Among other things it described a “generator” which could be assigned to any
key and which would read out from a memory one or more stored waveforms, at a
rate related to the key in question.
When the key was released the generator would be released also, after the
waveform had decayed gradually.
All of this was done years before the microprocessor had appeared in general
use, and before memory chips were widely available.
In fact the whole thing was a vast hard wired arrangement of simple binary logic
elements and the memory was contrived from diode arrays.
In practice the system was realised using a set of novel custom integrated
circuits designed by the North American Rockwell Corporation which
filed the patent.
This story is interesting not only for its technical content, but because it
portrays the evolution of virtually every digital organ since – a new
technique protected by patents filed by a specialist firm or other developer,
the subsequent development of custom integrated circuits of high complexity
(and, later, software as well), and the dissemination of these solely to one or
more organ manufacturers under tight commercial licensing conditions.
A few years later another key patent was filed in the
UK in 1976 by the National Research and Development
Corporation, a long-defunct invention of the interventionist government of the
day, acting for Bradford University. After a curiously long wait this patent was published in 1980.
It described an alternative system in which the only stored waveform was a sine
wave (a pure tone), or even just part (one quadrant) of a sine wave, together with lists of
numbers describing the harmonic structures of the tones to be simulated.
By then cheap microprocessors had become available, considerably simplifying the
design problems, although they were not capable enough to undertake many of the
real time computations required.
Consequently this first Bradford Computing Organ design also contained esoteric
electronic chips such as hardware binary multipliers.
This system has been used by a number of organ manufacturers and so we see again
the same evolutionary pattern as before – a novel design licensed
to various organ
firms, most of which had played little or no part in the initial development
(although in the case of the Bradford system a few did).
It is useful to take stock of the historical context in which these two patents arose. A large part of the earlier one had nothing to do with the storage and generation of sounds at all, rather it described methods of key scanning and multiplexing which today are taken for granted. But the point is that when the patent was conferred these techniques (obvious though they were, even at the time) were also then the subject of legal protection. Thus during the 1970's and 80's other organ manufacturers were prevented, in theory, from even using these relatively trivial techniques let alone the novel method of sound generation itself. One consequence was that virtually all other firms continued to build analogue organs in this era. Although a minority of these was very good (but expensive), the rest were cheap and awful. In the midst of this situation the Bradford patent emerged, because it was able to sidestep the problem of infringement by describing a method of sound generation using frequency domain techniques rather than time domain ones. In the 1980's a few firms in Europe began to use the Bradford technology, including Ahlborn and Wyvern. By the mid-1980's the Musicom firm had also appeared which developed a derivative of the Bradford technology, and one of the first commercial users of this system was Copeman Hart in the UK. A few years later the situation relaxed still further with the expiry of the early time domain patents. Then firms worldwide were able to use time domain methods freely, and there was a flood of electronic organs in the early 1990's using direct storage of waveforms. The analogue organ market collapsed equally rapidly at that time.
Perhaps the most important message to take away from this story is that frequency domain synthesis arose during a period when the protection afforded to time domain synthesis was tight and vigorously defended. Because the Bradford work was funded by the British taxpayer it is reasonable to assume that it was driven by a desire to sidestep the time domain legal straitjacket and open the door wider to UK industry. Because of this background, some of the claims one still hears about the alleged superiority of frequency domain methods do not always have a sound factual basis.
The situation depicted above has been repeated several times, particularly after the
original patents expired.
Both of the systems mentioned have been the subject of continuous development
and in addition others have appeared.
Today, various specialist firms source the hardware and software necessary for
the systems, and these are usually updated regularly.
It is rare for the electronic organ manufacturer himself to assemble his
products from scratch as he used to do in the analogue days.
But from a business perspective, the fact that some manufacturers have made the
decision to procure their enabling technology from a single source might imply a
strategy with some risk. This could have unfortunate implications not
only for themselves but for their customers in the long term.
We have seen that the
Bradford system began to be used by a few manufacturers in the 1980’s, and
shortly afterwards the technology also became available in an
alternative form from Musicom. Some
firms, such as Wyvern,
apparently continued with the Bradford system, though their implementation as described in 1999
was somewhat dated by current standards.
The systems are modular in that they can be expanded to suit various sizes of
organ, measured in terms of keyboards and stops.
Depending partly on the size of the organ and partly on how much flexibility the
customer is prepared to pay for, these organs incorporate a number of “music
modules” (the name might vary depending on the manufacturer), each of which
can provide typically 64 independent note generators.
A single module of this size could only cope with a single department of a few
stops without running the risk of missing notes, another undesirable phenomenon
exhibited by the cheaper organs of either type.
The Allen digital organ, and it is probably safe to say most others, continues to
use time domain methods of sound production.
This technique also characterises most if not all pop music instruments such as
synthesisers and computer sound cards. It therefore has an enormous user base
world wide, with correspondingly wide acceptance and a large amount of pooled
The pros and cons of the systems continue to excite debate, not always conducted
objectively or with sufficient insight, among manufacturers, dealers,
performers, owners and others with an interest.
The third method of sound production in use today was mentioned above. Called physical modelling, it is based on a physical analysis of how musical instruments actually work, and the conversion of this understanding into mathematical equations and computer algorithms. Computers today are so powerful and fast that these equations can be solved in real time to generate the sound that the instrument itself would emit. The method has its roots in research begun several decades ago, and the clarinet was one of the first to be fully characterised in this manner. An article elsewhere on this site discusses physical modelling in detail, so it is not described further in this already rather long article.
A problem besetting the subject is the paucity of information and documentation
available in the public domain from electronic organ manufacturers and those who
supply them with their enabling technology.
One can surmise why this is so, but unfortunately it does not serve the
interests of customers and it also allows an unhelpful surfeit of misinformation
and misunderstanding to flourish.
This can be observed on a daily basis merely by dipping into various Internet
discussion forums, or from the letters columns in the organ literature where
correspondents frequently plead for clarification.
Fortunately, even digital electronic organs cannot transcend the capabilities of
computer science and engineering or the laws of physics, regardless of claim and
counter-claim and strenuous efforts by the ad-men.
Therefore by examining the subject at this level we can identify their strengths
and weaknesses and the areas which are most important in terms of voicing,
regardless of the firms or technologies involved.
It is for this reason also that terms such as time or frequency domain synthesis
are used in this article, to avoid confusing matters of fact and principle with
particular commercial offerings.
We have just seen that
commercial music systems for the mass consumer market such as synthesisers use
time domain methods, and this is also true of many if not the majority of
digital electronic organs. It is also true for PC-based systems which
often call themselves virtual pipe organs.
The characteristic feature of time domain synthesis is the direct storage of
waveforms, or time series as they are more rigorously termed, as sequences of numbers.
Thus the digital organ described in  was a time domain system because it
relied on the storage of waveforms in memory, and these were read out and
supplied to the loudspeakers when keys were pressed.
Appendix 1 outlines the essentials of digital signal representation.
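The readout process just described can be sketched in a few lines of Python. This is only an illustrative sketch and not the method of any particular manufacturer: the 44.1 kHz sample rate, the table size, the synthetic two-harmonic waveform standing in for a recorded pipe sample, the MIDI-style note numbering and all the function names are assumptions made for the example.

```python
import math

SAMPLE_RATE = 44100  # output samples per second (assumed value)


def make_wavetable(size=1024):
    """One stored cycle: a synthetic stand-in for a recorded pipe waveform."""
    return [
        math.sin(2 * math.pi * i / size) + 0.5 * math.sin(4 * math.pi * i / size)
        for i in range(size)
    ]


def key_frequency(midi_note):
    """Equal temperament pitch in Hz, taking note 69 (A) as 440 Hz."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)


def read_out(table, freq_hz, n_samples):
    """Step through the stored cycle at a rate related to the key's pitch."""
    step = freq_hz * len(table) / SAMPLE_RATE  # table positions per output sample
    phase = 0.0
    out = []
    for _ in range(n_samples):
        out.append(table[int(phase) % len(table)])  # nearest-lower table entry
        phase += step
    return out
```

A call such as read_out(make_wavetable(), key_frequency(69), 100) yields 100 output samples; a key an octave higher steps through the same stored cycle twice as fast.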
Frequency domain synthesis techniques are less common than time domain ones in the
commercial music world, at least at the mass consumer level, and the only ones
to be described are those which use a form of additive synthesis.
These techniques are classified as frequency domain ones in this article because
the stored values relate to frequency artefacts (spectra) rather than those
based on time (waveforms). Appendix 2 summarises the process of additive synthesis.
We have just seen that in time domain synthesis the memory of the system is
occupied by samples of the actual waveforms we want to hear.
With frequency domain synthesis the only stored waveform as such is typically
one cycle of a sine wave, or even just one quadrant of a sine wave.
This is because the only waveform necessary in additive synthesis is a pure sine
wave at each harmonic frequency.
For this reason the Bradford patent  did not infringe those such as
protecting the Allen organ.
But additive synthesis relies on adding various harmonics, so it is also
necessary to store tables of numbers representing the strengths of the harmonics
making up the spectrum of an organ stop.
Thus for a diapason, typically 15 to 30 numbers would be needed for each
However, as with the time domain system, each stop requires several spectra to
enable the variation in tone quality across the keyboard to be captured.
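As a rough illustration of the idea, the following Python sketch builds one cycle of a waveform from a stored sine table and a short list of harmonic strengths. It is a minimal sketch under stated assumptions rather than a description of the Bradford implementation: the table holds a full cycle rather than one quadrant, every harmonic is given the same (sine) phase, and the table size and names are invented for the example.

```python
import math

SIZE = 256  # entries in the stored sine table (assumed)
SINE = [math.sin(2 * math.pi * i / SIZE) for i in range(SIZE)]  # the only stored waveform


def additive_cycle(harmonic_amps):
    """Build one cycle by summing the sine table read at each harmonic's rate.

    harmonic_amps[0] is the strength of the fundamental, harmonic_amps[1]
    that of the second harmonic, and so on (linear amplitudes, not dB).
    """
    cycle = []
    for i in range(SIZE):
        sample = 0.0
        for n, amp in enumerate(harmonic_amps, start=1):
            # The nth harmonic steps through the sine table n times as fast.
            sample += amp * SINE[(n * i) % SIZE]
        cycle.append(sample)
    return cycle
```

The list passed in plays the part of the table of 15 to 30 numbers mentioned above; the number of harmonics should stay well below half the table size so that each one is still represented properly.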
Figure 3 shows a conventional representation of an amplitude spectrum. Each vertical line represents a harmonic and its height is the amplitude or strength of that harmonic. The harmonic amplitudes in this example are in dB (decibels), and although this representation is standard practice it can give rise to confusion. The use of the term “amplitude” is important here as it implies that an increase of 6 dB means a doubling of amplitude. An alternative representation uses harmonic intensities or powers (again the words are important as they have an exact meaning), which are the square of amplitude. In this case a doubling of intensity means a change of 3 dB. More explanation of decibel notation is available in . Therefore you have to look closely when decibels are used to find out which type of spectrum – amplitude or power – is implied, otherwise major errors can ensue in the subsequent synthesis. Sometimes the term SPL (Sound Pressure Level) appears on the dB axis and this denotes an amplitude measure, not intensity or power.
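The difference between the two decibel conventions can be made concrete with a short Python sketch. The function names are invented for the example; the formulas are the standard ones, 20 log10 for amplitude ratios and 10 log10 for power or intensity ratios.

```python
import math


def amplitude_ratio_to_db(ratio):
    """Decibels for a ratio of amplitudes (doubling gives about 6 dB)."""
    return 20 * math.log10(ratio)


def power_ratio_to_db(ratio):
    """Decibels for a ratio of powers or intensities (doubling gives about 3 dB)."""
    return 10 * math.log10(ratio)


def db_to_amplitude_ratio(db):
    """Inverse of amplitude_ratio_to_db."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Inverse of power_ratio_to_db."""
    return 10 ** (db / 10)
```

Reading a harmonic quoted as -20 dB with the wrong convention shows the size of the possible error: as an amplitude it is one tenth of the reference, whereas as a power it is one hundredth.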
In frequency domain synthesis each of the individual spectra making up one stop can
be assigned to a different keygroup, a range of notes, though in the commercial systems we
have mentioned each is usually assigned to a single note which is sometimes
referred to as a voicing point.
The computer system then usually derives an interpolated spectrum for every
other note from the two nearest voicing points.
Interpolation is a mathematical technique but it really means blending or
(to use an image processing term) morphing, the production of something
new from several constituents. If a note keyed corresponds precisely to a
voicing point then only that spectrum is used further in synthesising the
If the note lies between two voicing points, an interpolated (blended) spectrum
is calculated in which each harmonic has an amplitude which is some arithmetical
function of the same harmonic in the two spectra at the nearest voicing points.
In some systems the mathematical form of the interpolation function can itself
Interpolation disguises sudden audible discontinuities between voicing points,
although the operation of the interpolator can sometimes be detected by playing
chromatic scales up and down the keyboard – one can hear the changes in tone
quality between the voicing points, and indeed often find out where they are.
This illustrates at once the blatant crudeness of such instruments compared to
real pipe organs, where such phenomena are of course absent.
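The blending between two voicing points might be sketched as follows. The text above says only that each harmonic is "some arithmetical function" of its two neighbours, so the straight-line (linear) blend used here is an assumption, as are the use of linear amplitudes rather than decibels, the note numbering and the function name.

```python
def interpolate_spectrum(note, low_note, low_spec, high_note, high_spec):
    """Blend two voicing-point spectra for a note lying between them.

    low_spec and high_spec are lists of harmonic amplitudes (same length)
    belonging to the voicing points at low_note and high_note.
    """
    if not low_note <= note <= high_note:
        raise ValueError("note must lie between the two voicing points")
    if high_note == low_note:
        return list(low_spec)
    # Weight runs from 0 at the lower voicing point to 1 at the upper one.
    weight = (note - low_note) / (high_note - low_note)
    return [(1 - weight) * a + weight * b for a, b in zip(low_spec, high_spec)]
```

With voicing points at notes 48 and 72, note 60 receives the average of the two spectra, while a note that coincides with a voicing point simply receives that spectrum unchanged.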
The spectra themselves cannot be used directly for generating sound – for this
they need to be converted to waveforms.
This is done in the Bradford system, for example, in two stages. The spectra for
each note of a stop are first calculated by interpolation as described above
when the stop is drawn.
Then a corresponding waveform for each note, sometimes just a single cycle in
the early systems, is
derived by additive synthesis from each spectrum and placed in a temporary note
These processes occur whether keys are currently pressed or not.
The system then waits until a key is pressed, at which point the waveform for
that note and that stop is read out from the note memory just as in a time
This method of doing additive synthesis gets over the problem of doing it in
real time as the music is played, which would require an almost impossibly fast
Therefore some organ systems colloquially called “real time” are in fact
This is another reason for referring to them here as frequency domain systems.
In some systems the synthesised waveforms can be updated rapidly as the music is
played to simulate effects such as “live winding”, in which the speech of
pipes in a real organ is affected by the dynamics of the winding system.
This is discussed at more length later on.
It is a feature made much of by some manufacturers and pundits, but in fact it
uses only basic computer techniques.
It is no different in principle to the monitor on the humblest PC whose picture
content can be changed rapidly in response to the demands of a computer game,
In theory, a more efficient way to perform additive synthesis is to compute the Inverse Fourier Transform (IFT) using an FFT algorithm. Just as the forward Fourier Transform converts time domain data into the frequency domain, the inverse transform converts frequency data into the time domain. Therefore it can be used to generate a waveform from a set of harmonic amplitudes, i.e. from a spectrum. As with the forward transform, additional efficiency is gained by ignoring the phase of each harmonic. However if the transform length is small, care has to be exercised to ensure that the FFT is actually faster than the "brute force" method described above.
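The inverse transform route can be illustrated with the sketch below, which uses only the Python standard library. It is a teaching sketch and not a fast implementation: a small recursive radix-2 routine stands in for a production FFT, every harmonic is given sine phase (that is, phase is ignored), and the transform length and function names are assumptions.

```python
import cmath
import math


def ifft_unscaled(bins):
    """Recursive radix-2 inverse FFT without the 1/N scaling.

    len(bins) must be a power of two.
    """
    n = len(bins)
    if n == 1:
        return list(bins)
    even = ifft_unscaled(bins[0::2])
    odd = ifft_unscaled(bins[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def waveform_from_spectrum(harmonic_amps, size=64):
    """One cycle of a real waveform from a list of harmonic amplitudes."""
    if size & (size - 1) or size <= 2 * len(harmonic_amps):
        raise ValueError("size must be a power of two exceeding twice the harmonic count")
    bins = [0j] * size
    for n, amp in enumerate(harmonic_amps, start=1):
        # A sine of amplitude amp occupies bin n and its mirror image.
        bins[n] = -0.5j * amp * size
        bins[size - n] = 0.5j * amp * size
    return [value.real / size for value in ifft_unscaled(bins)]
```

For a given list of amplitudes the result agrees, to within rounding error, with summing the sine waves one by one. In pure Python the transform is unlikely to be quicker for short lengths, which echoes the caution above about small transforms.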
It might be wondered why the frequency domain system is used, because it appears at
first sight to be more complicated than the time domain sampler technique.
When it was first developed one answer doubtless related to the patent
Nevertheless it is a reasonable question to ponder on, and the advantages and
disadvantages of each method will be considered later.
We now leave this discussion of synthesis methods and turn to some related topics
which apply to all of them.
The term ADSR occurs repeatedly in electronic music, and it is necessary to
understand what it means.
It is an expression which arose in the early days of monophonic analogue
synthesisers and it stands for Attack, Decay, Sustain and Release.
Figure 4 sketches an ADSR curve, which is the amplitude-versus-time envelope
of the sound of a single note of a single stop (although in the case of
frequency domain systems it can also relate to a single harmonic of a sound).
When a note is keyed the amplitude increases from zero over a time denoted by
the Attack parameter.
The maximum value reached during this phase is sometimes called the Initial
Then the amplitude might fall to some lower level over a second time period
defined by the Decay value.
After this the amplitude remains constant, at a level defined by Sustain, for
the time that the key is held down.
Finally when the key is released the sound drops to zero over the Release
The Attack, Sustain and Release phases correspond exactly to those in Figure 1
which illustrated how sound is generated from a finite wave sample.
Note that the Attack, Decay and Release parameters are all expressed as time
intervals (typically milliseconds), whereas the Sustain parameter is an
amplitude level (typically expressed in dB).
In organ work the Decay phase is more often than not irrelevant as far as the
overall sound is concerned, because once the Attack phase is over we do not
usually want the amplitude to reduce again before the steady state commences.
Such an overshoot can produce a most peculiar effect if overdone.
However it can have importance if a separate ADSR characteristic is applied to
each harmonic of the sound in
frequency domain synthesis.
For example, the second harmonic of a diapason pipe often rises more rapidly
than the fundamental to a higher peak value, before dropping back to a steady
state (Sustain) level.
Do not confuse the Decay and Release parts of the curve; the ending of a sound when
the key is released is governed by the Release value, not the Decay one.
Some odd effects can be produced if the wrong value is adjusted inadvertently
when voicing or regulating a stop!
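The shape of an ADSR envelope can be expressed as a small Python function. This is a sketch under simplifying assumptions: the segments are straight lines, the Sustain level is given as a linear amplitude rather than in dB, and the parameter names are invented for the example.

```python
def adsr_level(t_ms, attack_ms, decay_ms, sustain, release_ms, key_down_ms, peak=1.0):
    """Envelope amplitude at t_ms after the key is pressed.

    The key is held for key_down_ms. Attack, Decay and Release are times in
    milliseconds; sustain and peak are amplitude levels.
    """

    def level_while_held(t):
        if t < attack_ms:
            return peak * t / attack_ms  # Attack: rise from zero to the peak
        if t < attack_ms + decay_ms:
            # Decay: fall from the peak towards the Sustain level
            return peak + (sustain - peak) * (t - attack_ms) / decay_ms
        return sustain  # Sustain: constant while the key is held

    if t_ms < 0:
        return 0.0
    if t_ms <= key_down_ms:
        return level_while_held(t_ms)
    elapsed = t_ms - key_down_ms
    if elapsed >= release_ms:
        return 0.0
    # Release: fall to zero from whatever level had been reached
    return level_while_held(key_down_ms) * (1.0 - elapsed / release_ms)
```

Setting decay_ms to zero and sustain equal to peak gives the shape appropriate to most organ work, in which the amplitude rises during the Attack and then stays put until the key is released.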
The term Initial Attenuation does not enjoy a universally acknowledged definition, although it is used
in all synthesis methods in some way or other even though its name may vary. It
governs the overall attenuation applied to each waveform and is quite
independent of the other parameters in the ADSR characteristic.
It is one of the most useful parameters available in a voicing program for
getting the regulation of a stop correct over the compass, that is, the way its
loudness varies across the keyboard.
In turn the quality of regulation of a stop materially influences its blending
characteristics with the others.
When a sample is loaded into a time domain digital organ it will always be replayed
at some fixed maximum level (referred to here as an Initial Attenuation of 0 dB)
unless that level is changed.
This assumes there is no Decay phase, so that the Initial Attenuation and
Sustain levels in Figure 4 are one and the same.
This will be appropriate for the vast majority of organ samples.
Naturally, the level of the sample corresponding to one keygroup might not be
appropriate for the adjacent ones, so to prevent an audible discontinuity
appearing the Initial Attenuation parameter for the corresponding samples has to
It is given some other value such as –6 dB, meaning that an attenuation by a
factor of 2 would be applied to that keygroup, although in practice the correct
values have to be found by trial and error listening tests.
Attenuation is also important for adjusting the relative loudness of the various stops.
If this were not done a Dulciana would sound as loud as a Trombone, and to
prevent this the Dulciana might need to be attenuated by 35 dB or more relative
to the Trombone.
The computer in an organ has no way of divining what we want it to do unless we
tell it explicitly!
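The arithmetic behind these attenuation figures can be sketched as below. The function names are invented for the example, and the attenuation is taken to be an amplitude measure in dB, as in the factor of 2 quoted above for 6 dB.

```python
def attenuation_to_gain(attenuation_db):
    """Amplitude multiplier for an attenuation given as a positive number of dB."""
    return 10 ** (-attenuation_db / 20)


def attenuate(samples, attenuation_db):
    """Scale every sample of a waveform by the given attenuation."""
    gain = attenuation_to_gain(attenuation_db)
    return [gain * s for s in samples]
```

An attenuation of 6 dB gives a multiplier of about 0.5, while the 35 dB suggested for the Dulciana gives a multiplier of roughly 0.018, so its samples would be replayed at under 2% of the amplitude of the Trombone's.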
Although this discussion referred to time domain systems it also affects frequency domain ones.
At the point when sounds are emitted from the organ, both systems are replaying
waveforms stored in computer memory.
For many purposes it does not matter how they got there.
It is necessary to say a few words about how digital organs use the stored
waveforms or harmonic tables to produce sound when several notes are keyed at once.
One of the most important issues concerns polyphony, which is the ability of an
organ to produce many notes simultaneously as opposed to the monophonic
operation which characterised the earliest synthesisers.
The following brief discussion is included to assist an understanding of why polyphony is important, and why unlimited polyphony as in the pipe organ can be difficult to achieve in electronic ones. It should be appreciated that the hardware and software details mentioned are generic, rather than relating to any particular make of digital organ.
All organs can be considered to use generators to actually produce the sounds, although these are very different from the tone generators used in analogue organs. Moreover, the actual term used might vary depending on the manufacturer’s preferences. Each generator can be envisaged as a flexible digital circuit arrangement, or a very fast software module, which can accept any waveform for any stop and output it to any amplifier and loudspeaker. When a key is pressed the computer determines which waveform has to be used for a particular stop and loads it in some way into a free (currently unused) generator, although usually the waveform would not be physically relocated within the computer’s memory. Instead the generator would be loaded with the address of the waveform where it resides in memory. However this is a detail which is not of particular interest at present. Also one need not assume that each generator necessarily handles the complete waveform for a sound. Sometimes only specific harmonics are loaded into a generator, and these are then combined afterwards to produce the composite sound. Such techniques are associated mainly with frequency domain (additive) synthesis.
The computer also loads quite a lot of additional
information into the generator relating to the ADSR envelope, Initial
Attenuation, subtractive filter parameters (if relevant), looping parameters (if
relevant), the output channel to be used, etc before switching the generator on.
The note then sounds until the key is released.
If another key were to be pressed while the earlier note was still sounding it is
easy to see that another generator has to be available.
In fact if it is decided that up to 8 notes, say, need to be catered for
simultaneously on each department of the organ then there need to
be at least 8 generators per department.
But this is an oversimplified view of things, and we need to consider how to
handle the use of multiple stops simultaneously as well as multiple notes.
An organ has to cater not only for polyphonic operation on a single stop but on
multiple stops simultaneously.
It would be a poor organ on which one could only play one stop at a time.
In electronic music parlance this leads to the concept of multitimbrality, a
monstrous word no doubt intended to impress the gullible.
However it merely means the ability of an instrument to sound several stops of
different timbres at once.
In an organ of any pretension it is essential to be able to introduce the subtle
variations in tuning which characterise a pipe organ.
If a single note is played on a pipe organ with two unison stops drawn they will
almost always be slightly out of tune as revealed by very slow beats.
In a recently tuned organ one beat every few seconds would be typical.
If this feature is not incorporated the sound of the digital organ will be thin
and unconvincing, and it can only be achieved by having enough generators to
assign to every key currently pressed of each stop currently in use.
Thus if there were 8 stops on a particular department and a polyphonic
capability of 8 notes was required, then a total of 64 generators would be
required just for this small department.
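The bookkeeping implied here can be sketched as a simple pool of generators. This is a generic illustration only, in keeping with the caveat above that the details do not relate to any particular make of organ: the class and method names are invented, and a real system would also load envelope, attenuation and output channel data into each generator.

```python
class GeneratorPool:
    """Hands out one generator per (stop, note) pair until none are left."""

    def __init__(self, size):
        self.free = list(range(size))  # indices of currently unused generators
        self.active = {}               # (stop, note) -> generator index

    def key_down(self, note, drawn_stops):
        """Assign a generator to the note on every drawn stop.

        Returns the stops that actually sound; others are missed if the
        pool runs out.
        """
        sounding = []
        for stop in drawn_stops:
            if (stop, note) in self.active:
                sounding.append(stop)  # already sounding, nothing to allocate
                continue
            if not self.free:
                break                  # pool exhausted: remaining stops are dropped
            self.active[(stop, note)] = self.free.pop()
            sounding.append(stop)
        return sounding

    def key_up(self, note):
        """Return all generators used by this note to the free list."""
        for key in [k for k in self.active if k[1] == note]:
            self.free.append(self.active.pop(key))
```

With 64 generators, 8 drawn stops and 8 keys held, the pool is exactly used up; a ninth key then gets no generators at all until one of the others is released.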
The generators are quite complex pieces of hardware and software which interface in an
intimate manner to the computer itself, so that the program in the computer can
control all of their functions.
It is this intrinsic complexity which demands the use of specialised integrated
circuits for digital organs, as it would be virtually impossible and definitely
uneconomic to build all the generators required using ordinary components.
Each “music module” of the Bradford system contains 64 generators, the same
as some modern computer sound cards.
The ability to add more sound generating hardware to a digital organ at will, such
as by specifying the number of “music modules” to be used, means that enough
generators can be provided in theory to cater for the size of stop list under
However in practice, even with the advantages conferred by integration,
economies have to be made sometimes.
To ensure that sufficient generators remain free while the organ is being
played, some notes on some stops will either not sound at all or they will be
treated in a different manner under certain circumstances.
For example if a loud reed is being used at the same time as a quiet flue stop
is drawn on the same manual, the quiet stop simply might be ignored by the
system in the belief that the organist and his audience would not notice.
Or the waveforms for both the stops might be added together before being loaded
as a composite waveform into a single generator, thereby removing the “free
phase” effect that we have just seen is so desirable.
Some systems brutally drop notes and/or stops completely and randomly when an
overload situation approaches, rather than attempting to fail gracefully in the
problems arise if many couplers, particularly octave or suboctave couplers, are
used while playing.
A single octave coupler can double the number of generators required as soon as
it is drawn. This explains why such
couplers are used rather sparingly in most digital organs. Also mixtures can represent a considerable resource overhead if the individual
ranks of the mixture are assigned to separate generators, to properly simulate
independent ranks of pipes.
For this reason mixtures may use a single composite waveform which then only
requires a single generator per note played, just as with an ordinary stop.
However when this is done, it is then much more difficult (often impossible) to
simulate the slight independence of tuning between the ranks; the mixture will
be tempered to an unrealistic degree of exactness in this case.
all digital organs have to work in real time.
Unfortunately not all of them do, and sometimes players find that a perceptible
time lag arises when the music is fast or highly polyphonic, when many stops are
in use, or when some or all of these factors arise at once.
In the worst cases the system can grind to a halt, and notes can be lost or can
continue to sound indefinitely.
The problems arise because the computers in the system simply cannot always cope
with the demands placed on them by the player.
It is fair to say that these problems are probably less frequent today than they
appeared to be some years ago, but they are of course completely indefensible in
an instrument of any sort.
If an electronic organ is dependent on MIDI, this can introduce similar problems
of its own.
TIME VERSUS FREQUENCY DOMAIN – PROS AND CONS
This section begins a discussion of the advantages and disadvantages of time domain
and frequency domain methods of synthesising sounds.
Not all conceivable systems can be considered, so the discussion is limited to
those mentioned previously which either store waveforms directly or those which
compute the waveforms from stored frequency information.
A number of points need to be made or repeated first to facilitate the
discussion: firstly, it is useful to keep in mind that both types of organ produce
the sounds from waveforms stored in memory at the time the sounds are being
The fact that the waveforms in one type of organ are permanently resident
whereas in the other they are computed as required is sometimes of lesser
A second point is that, because of this difference, the amount of storage
required for the time domain system can be much greater than for the frequency
Each stored sample might be several seconds in length, and because there are
many samples the total amount of memory might exceed that for a system which
stores harmonic amplitude information.
In practice this is not usually a disadvantage given the ready availability and
cheapness of memory today.
Moreover, the difference is not always as great as might be supposed, because
the most sophisticated frequency domain systems also have to store large data
tables to simulate the transient portions of the waveforms, and even more if
real time fluctuations in quasi-steady-state sounds are simulated (see below).
A third difference is that the computing power required in frequency domain
systems is generally greater than in time domain ones, though again the
difference might not be as significant as often supposed when very high quality
simulation is demanded.
A major difference in capability between the two systems in principle is that time domain organs can reproduce a previously recorded pipe waveform directly whereas those using frequency domain methods cannot. In the latter case the waveforms have to be reconstructed from frequency information, and this information therefore has to be derived offline beforehand from the paradigm waveforms or in other ways. A frequency domain organ does not reproduce an exact copy of what you hear from an organ pipe because it will necessarily have been processed in some way first. In practice the difference in many cases is less important than sometimes claimed however, because making recordings of organ pipes to the necessary high standard required for direct insertion into a time domain sampler is extremely difficult. Extraneous noises are but one problem to overcome. In these cases the waveforms have to be cleaned up and then resynthesised offline, and similar processes are involved in both cases. Another major problem affecting time domain samplers is that of looping the waveform to simulate a note longer than that of the stored sample, and often it is impossible to find loop points on the raw waveform such that the ear cannot detect that looping is taking place. This can occur because of noise present in the signal, and although the noise can usually be suppressed, it is often undesirable to do this because it is the windy sound of some flue pipes which gives them much of their character. Even if loop points can be found in these cases, the repetitive "swishing" of the noise as it loops becomes offensive to the ear once it has detected its presence.
Claims are sometimes made today for digital processing techniques which profess to be able to loop "difficult" waveforms undetectably, but the plain fact is that they alter the waveform to such an extent that it can no longer be regarded as a sample of the sound as originally recorded. About the only way to surmount the looping problem without corrupting the original waveform is to use samples of such a length, in excess of ten seconds or so, that their duration is unlikely to be exceeded in normal playing. If this should occur then looping will, of course, have to be used to extend the apparent duration of the samples, but with such long samples it will only be required infrequently. Although memory is cheap today, the total memory requirement can nevertheless become problematical when hundreds or thousands of such long samples have to be stored, but it can be reduced considerably by carefully selecting the sample rate of each sample in relation to its audio bandwidth. For example, it is wasteful in terms of memory to use a 44.1 kHz sample rate for samples representing the lowest notes of a pedal Bourdon when a rate of 8 kHz or less would suffice.
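The memory saving from choosing a lower sample rate is easily estimated. The sketch below is a rough illustration only: the function name is hypothetical, and the figures of ten seconds per sample and 16 bit (2 byte) mono storage are assumptions made for the sake of the example.

```python
def sample_memory_bytes(duration_s, sample_rate_hz, bytes_per_sample=2):
    """Memory needed for one mono sample.

    bytes_per_sample=2 (16 bit storage) is an assumption for illustration.
    """
    return round(duration_s * sample_rate_hz) * bytes_per_sample


high_rate = sample_memory_bytes(10, 44_100)  # 882000 bytes
low_rate = sample_memory_bytes(10, 8_000)    # 160000 bytes
print(high_rate, low_rate, high_rate / low_rate)
```

On these assumptions the 8 kHz version occupies less than a fifth of the memory of the 44.1 kHz one. The lower rate is only usable if the sound contains nothing of importance above half the sample rate, that is, above 4 kHz in this case.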
Regardless of the length of the stored samples, yet another problem affecting time domain samplers concerns the termination of the sound. When the key is released it is necessary for the system to jump immediately to the release phase of the sound, and if a characteristic release transient and/or the recorded ambience of the auditorium is required it can be difficult to achieve a seamless and undetectable "join" between the previous steady state phase and release. Most systems which claim to have solved this problem in fact have shortcomings which are only detected when playing the instrument, though there are ways to achieve it.
Frequency domain synthesis also has peculiar difficulties in some situations,
such as when the frequency structure of the waveform is changing rapidly.
This can occur during transient phases of the waveforms.
Capturing a complicated transient directly simply by recording it is sometimes
the better, indeed the only, way to incorporate it in a digital organ.
Many transients are so complicated that they are impossible to synthesise in any
other way because of issues such as partials which begin as band limited noise
rather than as discrete frequencies. Even analysing such transients can be
next to impossible because of the mathematical
limitations of spectrum analysis, and if you cannot analyse a transient in the
first place then you have no hope of being able to re-synthesise it accurately
using frequency domain techniques. Also, the addition of realistic-sounding wind noise to the tones can
often be difficult with these systems. All these issues raise important details which will be expanded in later
claims are made for the frequency domain system that it is better at mimicking
the subtle variations which occur while a real organ pipe is speaking.
Pipes respond sensitively to their environment in myriad ways, and the onset of
speech can be affected by factors such as how many other stops are drawn and how
many notes are demanded simultaneously.
Pipes can interact through a common wind supply and
acoustically with others nearby, and there is some truth in the statement that a
pipe never speaks the same way twice.
Critics of the time domain system point out that to incorporate such effects it
would be necessary to store an impracticably large number of samples, whereas
the frequency domain system can simulate them by computing the necessary
variations to the waveforms while they are sounding.
In practice the truth, as usual, lies between these extremes, and time domain
systems have reached a level of considerable sophistication in view of their
ubiquity. It is just as easy (or
difficult) to incorporate multiple time domain samples as it is to incorporate
multiple tables of harmonic amplitudes or other data. More than ten years ago at least one make of time domain organ was already
able to simulate the gentle variations, including the “burbling”, of a reed
stop while the key was held down for an extended period.
Therefore, particularly today, the ability to include these and many other
effects is not solely the province of frequency domain systems.
Both systems require the storage of large amounts of data to properly simulate
these subtleties, together with highly sophisticated hardware and software host
In terms of information theory, a time domain system is in all circumstances and in every way equivalent to a frequency domain one. Neither has an advantage over the other in this respect, and anyone who maintains it does must have a technical insight which implies they should apply for the next Nobel prize.
characteristics attributable to unsteady winding were delightfully set out by a
19th century description of Austin’s Universal Wind Chest system:
description continues in similar vein to describe robbing caused by the
conveyances and trunking but not, curiously, inadequate pallets.
Perhaps this is because even Austin’s system could not dispense with these.
Nevertheless it is fascinating to thus see how fashions have swung from attempts
to eradicate these “defects” to today’s strenuous endeavours to simulate
them in electronic organs, and to re-introduce them in some pipe instruments!
Another claim sometimes made in favour of frequency domain organs is that it is easier
to voice them.
In order to change the harmonic structure of a stop in these instruments, it is
only necessary to alter the tables of stored harmonic amplitudes.
Manufacturers’ voicing software generally facilitates this by allowing the
voicer to make these adjustments in real time while a note is sounding, through
the use of convenient on-screen displays containing virtual “gain controls”
for each harmonic (of which there might be up to 250 or so).
However it is also straightforward to implement such a scheme for time domain
Having altered the harmonic structure of a sound in a time domain organ, it is
then necessary to do an Inverse Fourier Transform (additive synthesis) to
generate the new version of the waveform.
This has to be done each time the slightest adjustment to the harmonic
amplitudes is made.
However the speed of modern computers should make this somewhat longer process
transparent to the voicer.
If it does not, the voicing software and hardware being used are non-optimum and
out of date.
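The additive synthesis step referred to here can be illustrated by a minimal sketch, which builds one cycle of a waveform by summing sine waves whose amplitudes come from a table of harmonics. The function name, the use of sine phase throughout and the default of zero phase for every harmonic are assumptions made for illustration; a real voicing package would offer far more than this.

```python
import math


def additive_cycle(harmonic_amplitudes, num_samples, phases=None):
    """Return one cycle of a waveform built from sine harmonics.

    harmonic_amplitudes[0] is the fundamental, [1] the second harmonic, etc.
    phases (radians) default to zero for every harmonic (an assumption).
    """
    count = len(harmonic_amplitudes)
    if phases is None:
        phases = [0.0] * count
    # The highest harmonic must lie below half the number of samples,
    # otherwise it would alias.
    if 2 * count >= num_samples:
        raise ValueError("too many harmonics for this cycle length")
    cycle = []
    for n in range(num_samples):
        t = n / num_samples  # position within the cycle, 0 <= t < 1
        value = sum(
            amp * math.sin(2 * math.pi * (k + 1) * t + phase)
            for k, (amp, phase) in enumerate(zip(harmonic_amplitudes, phases))
        )
        cycle.append(value)
    return cycle


# Fundamental at full amplitude plus a second harmonic at half amplitude.
cycle = additive_cycle([1.0, 0.5], 64)
```

Each time the voicer alters an entry in the amplitude table, a function of this kind has to be run again to regenerate the waveform.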
In practice, one area where frequency domain techniques run out of steam before time domain ones is in the number of waveform recipes which can be stored. Storage of separate samples (or even multiple samples) for every note of every stop is quite feasible for the cheapest and most basic time domain synthesisers today - even the humble SoundFont based synths used in many computer sound cards can do this. However the latest Bradford frequency domain organ simulator (BEST) still has difficulty in competing in this vital area according to its inventors, and therefore it has to fall back on interpolating between a smaller number of voicing points as described previously. This shortcoming certainly reflected the rather dated technology on which earlier Bradford systems were hosted (thirty-year-old Z80/S80 low speed 8 bit processors are apparently still used in some commercial organ systems which use Bradford technology).
the two techniques appear to be moving closer to each other in that some
frequency domain organs also rely on storage of waveform samples, such as BEST.
The addition of wind noise to the sounds of certain stops is achieved in these
cases by adding sampled noise.
This is not surprising because noise contains an infinite number of Fourier
components; it cannot be synthesised from a spectrum of finite size.
This illustrates another disadvantage of the frequency domain system.
Some starting transients may also be stored directly as waveforms rather than as
extremely complex data tables.
It would be comfortable to be able to say that neither system has a definite or
incontrovertible edge over the other.
Both can simulate organ tones to a high degree of realism, but both can
also produce disappointing results.
As with pipe organs themselves it boils down to how well they have been voiced,
assuming that the hardware and software environment has the necessary degree of
flexibility and sophistication to allow the voicer to flex his muscles as it
And, as with pipes, the voicer has to know his business inside out, which is
something which only comes from experience and an “ear” for the job.
Remember that the frequency domain system arose at a time when the patent
situation was weighted heavily against the majority of organ
manufacturers. This fact, rather than spurious technical
claims, was largely responsible for its genesis.
But I think it is possible to make an objective choice in favour of one system over the other, and therefore I shall conclude this section on the pros and cons of the two methods by stating my own preference. Because frequency domain ("real time") organs cannot reproduce an exact copy of the sound emitted by an organ pipe, I find them inferior to time domain ("sampling") systems in practice. It is not possible to model, and thus program, every conceivable nuance of pipe speech for the same reasons that it is impossible to achieve accurate weather forecasts. In both cases some factors are difficult to reduce to the mere numbers that computer programs demand, and we do not know about other factors at all. Also the dated hardware environments on which some frequency domain systems are hosted simply cannot compete in capability with those used by the synthesiser world at large for time domain synthesis. Therefore "real time" organs tend to produce a rather blander and smoother approximation to pipe sounds than "sampling" ones, often quite acceptable but less characterful nevertheless. The variations of the sounds from one actual pipe to another and those which occur while actual pipes are sounding tend to be captured better by "sampling" than by "real time" organs. I have not yet heard a "real time" digital organ which comes anywhere near being able to simulate the character of a Schnitger or Silbermann instrument for example. Manufacturers of these types of organ seem content with the numbing blandness of mid-20th century British and American organs. To illustrate this, the two clips which follow were produced by the "sampling" and "real time" synthesis methods respectively:
"Sampled" sounds - mp3 file (3.8 MB/4m 9s)
(Extracts from Mein junges Leben hat ein End, Sweelinck. About 1/2 minute from each movement)
"Real time" sounds - mp3 file (719 KB/46s)
(Extract from Ave Verum, Mozart)
This section begins a discussion of some of the processes involved in voicing a
The term “voicing” is used rather loosely here to include many activities,
ranging from the collection of raw sound samples through to voicing operations
themselves, but it is assumed the reader will have no difficulty in
understanding the meaning of the terms from the contexts in which they appear.
The voicing process requires two categories of
The first includes such things as making recordings of pipe waveforms, and
converting them into tables of harmonic amplitudes and other data to be
synthesised back into sounds.
Thus spectrum analysis must be
included in the range of processing functions available so that the harmonic
structure of the waveforms can be derived.
Much if not all of the necessary hardware and software can be obtained
commercially, though a customised software suite will often be better suited to
The second category includes activities associated with the minutiae of loading
all these data into a particular organ and adjusting them to taste within more
or less narrow limits.
This requires a software and hardware environment
necessarily specific to a particular organ; there are many organs on the market
and they are constantly evolving.
Moreover, details concerning this environment can usually only be obtained from
the manufacturer at his discretion.
Therefore only the first category of activities can be described here.
However it covers an extremely important range of topics for the reason now to
A digital organ when first switched on is nothing but a vast empty vacuum which
cannot make the merest squeak on its own. It is about as much use as a pipe
organ without any pipes.
The vacuum has to be filled by specifying almost every conceivable
characteristic of every stop, sometimes on a note-by-note basis.
This statement is particularly true of frequency domain organs, which are
incapable of accepting waveform samples.
In these instruments an enormous amount of data has to be laboriously inserted
before even a single new stop can be heard. By “new stop” is meant one which
has not been used before in that organ.
For example, merely to hear the steady state sound of a stop (i.e. neglecting
its attack transient) the voicing points have to be defined, the harmonics for
each voicing point have to be inserted, the way that the interpolator treats the
variation of each harmonic between each pair of voicing points has to be
To specify the transients in addition, it is typically necessary to define which
frequencies are present in the transients, how the frequencies might change with
time and the specific ADSR characteristics for each.
Having done all this, the stop may then require a huge amount of additional data
before aspects such as live winding can be reproduced.
In other words it is necessary to input hundreds if not thousands of items of
data before a frequency domain organ can emit sounds.
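To give a feel for what interpolating between voicing points involves, here is a minimal sketch. It assumes simple linear interpolation of each harmonic amplitude between the two nearest voicing points, which is only one of many possible schemes; the function name, the use of note numbers as keys and the amplitude values are all hypothetical.

```python
from bisect import bisect_right


def interpolate_harmonics(voicing_points, note):
    """Estimate harmonic amplitudes for a note lying between voicing points.

    voicing_points maps a note number to a list of harmonic amplitudes,
    all lists having the same length. Linear interpolation is an assumption.
    """
    notes = sorted(voicing_points)
    # Outside the voiced range, reuse the nearest voicing point unchanged.
    if note <= notes[0]:
        return list(voicing_points[notes[0]])
    if note >= notes[-1]:
        return list(voicing_points[notes[-1]])
    upper = bisect_right(notes, note)
    lo_note, hi_note = notes[upper - 1], notes[upper]
    frac = (note - lo_note) / (hi_note - lo_note)
    return [
        a + frac * (b - a)
        for a, b in zip(voicing_points[lo_note], voicing_points[hi_note])
    ]


# Two hypothetical voicing points an octave apart, two harmonics each.
points = {36: [1.0, 0.5], 48: [0.8, 0.1]}
midway = interpolate_harmonics(points, 42)  # roughly [0.9, 0.3]
```

Even in this toy form the voicer must supply a table of amplitudes for every voicing point before a single note can sound.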
A time domain organ can be got going much more quickly because a set of samples
for a new stop can be loaded into it very rapidly.
Nevertheless, the collateral information necessary to tonally finish and
regulate the instrument properly is considerable and consumes a lot of time.
this situation to that for real pipes.
A pipe can only emit the sound for which it was designed, more or less.
If a diapason pipe is made it will only ever sound like a diapason, whose
precise tone is determined by its scale, the cut-up, etc.
All the voicer can do is to ensure it comes onto speech properly and, within
narrow limits, emits the sound which its designer and maker intended.
Obviously he could not change a diapason into a reed.
Yet that is exactly what is possible within the vacuum of a digital organ, which
contains nothing intrinsically organ-like at all.
It could just as easily emit the sound of a barking dog or a car engine revving
as an organ.
This virtually unbounded flexibility is usually paraded as a benefit but in fact it is a decided disadvantage. It is the main reason why digital organ manufacturers have been forced to copy real organ tone. Without an a priori detailed knowledge of the structure of pipe sounds they could never have conjured out of thin air the myriad parameters necessary to enable a digital organ to sound like a pipe organ. This is not to deny that with sufficient experience it is possible to create a new sound without recourse to real pipe data. But most digital organ sounds are developed on the basis of a detailed examination of pipe waveforms and, to reinforce the point again, they could never have got off the ground even in principle were it not for such examination. Therefore the derivation, analysis and understanding of real pipe sounds materially affects all digital organs and it is the subject of prime importance in voicing. The subsequent process of loading the results of the studies into a particular make of organ is a mechanical process by comparison.
In principle, therefore, there are two ways to insert a new sound into any digital
organ: one can start either from real pipe sounds or the sound can be
synthesised from scratch (but with a priori knowledge and experience).
In practice a combination of both methods is also used, often of necessity
because of problems such as unacceptable levels of noise or other forms of
corruption of real pipe sounds.
Noise can result from the blower, the organ action, traffic, etc.
In these cases the original data have to be processed to remove the offending
In what follows we shall refer to these three methods as “real”,
“synthetic” and “hybrid” voicing for brevity.
of the steps necessary to insert a new sound into any form of digital organ are
For time domain organs one would first wish to obtain a satisfactory sample of
the waveform of interest, including its transient structure, using any of the
three methods outlined above.
In the “synthetic” (and sometimes the “hybrid”) case the waveform would
typically be generated offline from a table of harmonic amplitudes using an
additive synthesis (inverse Fourier Transform) program. The harmonic amplitudes
would be produced by the voicer based on his intuition, judgement and
Then the waveform would be loaded into the organ system and parameters such as
ADSR adjusted until it matched other samples relating to the same stop, and
those of other stops.
Note that the generation of the transient portions of the waveforms in
“synthetic” or “hybrid” synthesis is much more difficult than for the
steady state portions, hence the frequent reliance on the “real” voicing
method in practice.
For frequency domain organs the waveform itself cannot be loaded into the system.
In the “real” case a spectrum analysis of the harmonic structure of a real
pipe sound would be used to populate the necessary tables of harmonic
amplitudes. In the “hybrid” case these values would be modified
heuristically in some way. In the “synthetic” case the amplitudes would be
created from scratch as described above.
In all cases, difficulties related to transient structure will often be
encountered when dealing with frequency domain synthesis.
Making Recordings of Organ Stops
Recordings of real organ pipes are used in the “real” and “hybrid” methods of
voicing as defined above.
The advantages of using recordings of the sounds to be simulated include the
ability to reproduce the starting and termination transients of the tones, which can be very
complex in structure and therefore difficult or impossible to synthesise in any
By simply using a recording of the waveform we do not need to concern ourselves
about this complexity; the transients will be reproduced automatically as part
and parcel of the process.
However this is only possible with time domain organs, or frequency domain ones
which also allow samples to be used.
Having said this, a major disadvantage is the
difficulty of making recordings of sufficiently high quality to be used
directly, and most people find it is only after trying to do it that the
problems become apparent.
There is also an aesthetic consideration of considerable importance.
Frequently one sees a firm trumpeting the fact that
recordings of one famous organ or another have been incorporated into their
products, yet they seem not to realise that they are alienating a substantial
part of their potential market.
By admitting they are merely adept copyists of first rate organ tone rather than
being able to generate it from scratch, they are also admitting doubts about
their ability to understand and implement the voicing techniques which are the
subject of this article.
Some critics go so far as to argue that producing “pirated” samples from
recordings of pipe organs is a form of copyright infringement, which does
not apply to generating them synthetically.
This is why both methods are discussed here.
Anyone who has yet to try making recordings of organ pipes will probably be surprised by the difficulty of achieving a quality sufficient for direct use in a digital sampling system. The many issues involved include the following:
Figure 6 illustrates a similar situation but the signal phases cancel, resulting in a
reduction of the signal level recorded for that harmonic.
Because each harmonic is at a different frequency the
phases of the reflected waves in each case will differ because of simple
geometry: the path lengths between the pipe, microphone and wall remain the
same, whereas the distance between the peaks of the waves
(the wavelength) varies with frequency.
Thus the amount of reinforcement or cancellation will be different for each
Therefore standing waves will produce a distortion of the true spectrum shape
for the pipe under consideration.
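The effect on each harmonic can be illustrated with a deliberately simplified model: a single reflection which arrives at the microphone after travelling a fixed extra distance. The function name, the reflection coefficient of 0.9 and the speed of sound of 343 m/s are assumptions for illustration; a real building has many reflections and the picture is far more complicated.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate room temperature value


def relative_levels(fundamental_hz, num_harmonics, extra_path_m,
                    reflection_coeff=0.9):
    """Level of each harmonic at the microphone relative to the direct wave.

    Models one reflected wave which has travelled extra_path_m further than
    the direct wave. Values above 1 mean reinforcement, below 1 cancellation.
    """
    levels = []
    for k in range(1, num_harmonics + 1):
        frequency = k * fundamental_hz
        # Phase lag of the reflected wave: the path difference is fixed but
        # the wavelength shrinks as the frequency rises.
        phase = 2 * math.pi * frequency * extra_path_m / SPEED_OF_SOUND
        levels.append(abs(1 + reflection_coeff * cmath.exp(-1j * phase)))
    return levels


# Extra path of half a wavelength at 100 Hz: the fundamental is almost
# cancelled while the second harmonic is reinforced.
levels = relative_levels(100.0, 4, SPEED_OF_SOUND / (2 * 100.0))
```

With the same geometry the odd harmonics are attenuated and the even ones boosted, so the recorded spectrum no longer represents that of the pipe itself.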
In general, the more attractive the ambience of the building in terms of
reverberation, the more problematical will standing waves be when making
recordings in it.
This is because it is precisely the phenomena which give rise to standing waves
(reflections within the building) which also generate ambience.
A multiplicity of hard stone surfaces results in multiple reflections in which
the amplitudes of the reflected waves decay slowly; there is little energy loss
at each reflection.
It is not unusual to find harmonics in the recorded waveform which almost
completely vanish during the steady state phase of a pipe sounding within such
buildings, particularly at the lower frequencies.
With experience one can often tell when a particular recording is over-contaminated
by standing wave distortions using a simple listening criterion.
Because the standing waves do not build up or die away instantaneously, the
nature of the replayed sound when it begins and ends gives clues as to the
relative proportions of direct and reflected energy.
At the beginning of the sound the microphone picks up the direct wave from the
pipe before the reflected waves appear.
At the end the reverse applies, because the direct wave is cut off abruptly
while the reflected ones continue to die away.
If there is significant standing wave distortion the timbre of the sound will be
heard to change suddenly when the pipe ceases speaking and the sound dies away, and this will often
reveal which frequency bands were over-attenuated or over-amplified during the steady
In these circumstances the signal from another microphone should be tried
because the standing wave effects at another location will, of course, be
In an attempt to remove the
problems of standing waves, some recordings are made in anechoic conditions, in
which the pipe radiates in a free acoustic field because no reflections occur.
In the past some firms have gained advertising mileage by claiming that they
make their pipe recordings in an anechoic chamber.
However anyone who has been in such a chamber will have experienced the utterly
unnatural sound within, and if such sounds were to be subsequently radiated from the loudspeakers in a small room it is unlikely they would
In a large auditorium the situation is different and it is arguable that the
anechoic recording method might be the more appropriate.
Therefore the anechoic approach cannot always be recommended nor dismissed for
An alternative approach to
achieve near-anechoic conditions is to mount the pipes some distance above
ground out of doors.
Both techniques border on the impractical for most purposes and they are
certainly an expensive way to make recordings.
To conclude, the best criterion is to train one’s ears to judge whether a particular recording is acceptable or not; if the sounds on the recording are what you as the voicer want to hear, then there is no reason why you should not use them in a digital organ.
Pre-processing Recorded Waveforms
Some form of pre-processing is
almost always required before a recorded waveform can be used.
Even if it is of the highest technical quality it will usually need to be
processed in some way before being loaded into a time domain digital organ.
For example, there will often be a silent section at the beginning which needs
to be edited out, or action noise at the point of pallet opening will likewise
need to be removed.
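Removing a silent lead-in is simple enough to sketch in a few lines. The function name and the threshold value are hypothetical, and the samples are assumed to be held as a plain list of signed integers; in practice the same job would normally be done by eye and ear in a wave editor.

```python
def trim_leading_silence(samples, threshold):
    """Drop samples from the start until one exceeds threshold in magnitude.

    The threshold is chosen by the user (an assumption); too high a value
    would clip the quiet onset of the attack transient.
    """
    for index, value in enumerate(samples):
        if abs(value) > threshold:
            return samples[index:]
    return samples[:0]  # nothing exceeded the threshold


trimmed = trim_leading_silence([0, 1, -2, 300, -500, 10], 50)
print(trimmed)  # [300, -500, 10]
```

Action noise cannot be removed so mechanically, because it overlaps the start of the pipe sound rather than preceding it.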
These examples are of the
simplest types of pre-processing and they can be executed using widely
One way to proceed is to use an ordinary personal computer with one of the many
time domain editors available commercially.
They are referred to here as time domain editors because they are basic programs
which only operate on the waveform itself; they do not usually offer frequency
domain facilities such as spectrum analysis.
A useful editor is CTWave by Creative Technology who make the
SoundBlaster range of computer sound cards, and it is often supplied with a
sound card or available from many sources on the Internet.
It runs under the Microsoft Windows operating system and requires a sound card
The sound card, connected to a high quality audio system, is essential so that
you can hear the results of the editing process.
Editors of this type generally operate on Windows WAV files both for input and
With modern computers and sound
cards it is possible to generate an input file for the editor using a digital
link from the recorder which was used to record the sample.
If a Minidisc recorder was used the link will often be an optical one, or a Firewire/SB1394
connection might be available.
Doing things this way avoids the slight loss of quality which would result if
the recording had to be played back in analogue mode and re-digitised to
generate the WAV file.
However if this is unavoidable, editors such as CTWave also incorporate
the necessary digitising facilities when used in conjunction with a sound card.
The output WAV file containing the edited version of the waveform can readily be
converted to whatever data type is required by the organ system because the file
format is standardised and available in the public domain.
The use of Windows, a PC and WAV files makes for a very economical and flexible
housekeeping system for managing and processing the recorded and edited data.
To perform spectrum analysis a frequency domain editor is required. These are not as easily obtainable as time domain editors, indeed they can be very expensive and even then the facilities offered might be little more than rudimentary. Because of this problem, customised editing software is often better able to perform the necessary frequency domain functions for digital organ applications. Commercial editors often seem to be limited to the radix 2 FFT, meaning that data lengths are restricted to powers of two as explained previously. This is quite unnecessary today.
Two important cases where spectrum analysis is required will now be described, the software being different for each.
If the recorded signal is very clean, with a high signal to noise ratio and satisfactory in every other way as judged by listening tests, it is possible to derive its harmonic structure in the steady state simply by analysing a single cycle of the steady state waveform. The harmonic amplitudes so produced can then be inserted into a frequency domain organ directly if desired.
Firstly the single cycle is identified or
cut out of the steady state part of the recorded waveform using standard
on-screen time domain editing functions.
It is desirable that the selected cycle begins and ends as close to the zero
line as possible, otherwise harmonic information will be lost or false harmonic information will be introduced.
The number of data samples present in the selected cycle then needs to be
examined for two reasons.
If a radix 2 FFT algorithm is being used, the number of samples will need to be
increased by interpolation to the next highest power of two.
Also the maximum number of harmonics which can be extracted from the data cannot
be more than half the original number of data points, even after interpolating
to some higher figure.
Interpolation cannot put information into the data which was not there to start with.
Interpolating to a higher sampling rate is the same thing as oversampling, a
technique widely used in devices such as CD players.
An interpolator is included in the
toolkit of most frequency domain wave editors.
Note that it is dangerous to interpolate downwards; for example, if the number
of samples in the selected cycle is 328 it might be tempting to specify an FFT
length of 256 data points.
However, unless the interpolator also includes automatic digital low pass
filtering, there is a risk that aliased frequencies will be introduced into the data.
Therefore in this case it would be safer to interpolate upwards to 512 points
and then to use no more than 164 (= 328/2) harmonics in the resulting spectrum.
Incidentally, these issues illustrate how awkward it can be if one is forced to
use a radix 2 FFT editor.
They do not arise if a mixed radix FFT algorithm is available, which will
operate on any data sample length.
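The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the rule described above; the function name fft_plan is hypothetical and does not come from any of the editors mentioned.

```python
def fft_plan(samples_in_cycle):
    """Return (radix 2 FFT length, maximum usable harmonics) for one cycle."""
    fft_length = 1
    while fft_length < samples_in_cycle:
        fft_length *= 2                      # go up to the next power of two, never down
    # The harmonic limit is set by the original data, not by the interpolated length.
    max_harmonics = samples_in_cycle // 2
    return fft_length, max_harmonics


print(fft_plan(328))   # (512, 164)
```

With a mixed radix algorithm the first figure would simply be the original sample count, and only the harmonic limit would still matter.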
Despite its simplicity and the fact it is
often used, single cycle analysis is not a robust means of deriving a spectrum.
Its main shortcoming is that noise or any other undesired artefact in the signal
is always forced into the harmonics.
This can result in major spectral distortions.
The reason this occurs is as follows.
The spectrum of any data sample of length (number of data points) N is of
length N/2 + 1, representing the theoretical maximum number
of harmonics plus the zero frequency (DC) component.
Because there is only one cycle in the input data, the harmonics will always
occupy adjacent frequency slots in the spectrum – there are no empty spaces between them.
Therefore noise in the data can only appear in the spectrum positions occupied
by the harmonics themselves.
Besides noise on the signal, other factors
which can corrupt the sample being analysed include careless editing so that the
beginning and end of the chosen cycle do not join up – there is a
discontinuity which introduces false harmonic information.
Upwards interpolation can reduce this problem though.
Then the chosen cycle might not be a “good” one for several reasons, such as
its being too close to the attack transient of the pipe.
These errors can all be reduced by taking several cycles and averaging their spectra.
However an effective way of deciding whether the spectrum is a “good” one is
to re-synthesise the cycle of data from it by using the harmonics in the
spectrum in an inverse Fourier Transform.
The re-synthesised data can be turned into a continuous waveform by repeated
looping and then simply listened to.
An easy way to do this is to use the facilities of a computer sound card if it
is not possible to insert the data into an organ.
The sound which results should be very close or identical to that of the original recording.
If it is not, the data should be rejected and you will have to start again.
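The analyse-then-resynthesise check can be sketched as follows, assuming a plain (slow) discrete Fourier transform written out by hand rather than any particular editor's FFT. The 64-sample test cycle and the function names are hypothetical stand-ins for a cycle cut from a real recording.

```python
import cmath
import math


def analyse_cycle(cycle):
    """Plain DFT of one cycle: (amplitude, phase) for slots 0 (DC) to N/2."""
    n = len(cycle)
    harmonics = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(cycle))
        # DC and (for even N) the top slot appear once; all others appear twice.
        weight = 1.0 if k == 0 or 2 * k == n else 2.0
        harmonics.append((weight * abs(total) / n, cmath.phase(total)))
    return harmonics


def resynthesise(harmonics, n):
    """Inverse transform written as additive synthesis of cosines."""
    return [sum(amp * math.cos(2 * math.pi * k * i / n + phase)
                for k, (amp, phase) in enumerate(harmonics))
            for i in range(n)]


# Hypothetical cycle with three harmonics, standing in for an edited recording.
n = 64
cycle = [math.sin(2 * math.pi * i / n)
         + 0.5 * math.sin(2 * math.pi * 2 * i / n)
         + 0.25 * math.cos(2 * math.pi * 3 * i / n) for i in range(n)]

harmonics = analyse_cycle(cycle)
rebuilt = resynthesise(harmonics, n)
looped = rebuilt * 200      # repeat the cycle to make a continuous tone for audition
worst_error = max(abs(a - b) for a, b in zip(cycle, rebuilt))
```

For clean data such as this the rebuilt cycle matches the input almost exactly. With a real recording the comparison has to be made by ear, since noise forced into the harmonics will be faithfully rebuilt along with the wanted signal.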
Noise constitutes a major problem in spectrum analysis, and it means that any spectrum
derived from a noisy signal is only an estimate of the harmonic structure.
The single cycle method is not robust because it is particularly vulnerable to
these problems. The method assumes that the signal is deterministic (can be
fully predicted) rather than stochastic (cannot be predicted because of the
presence of random processes such as noise).
However the problems can be reduced by using multiple cycles instead.
In this case a spectrum is derived using several, perhaps many, cycles of the waveform. If the number of cycles analysed is p, the harmonics in the spectrum will occur at every pth frequency slot instead of each successive slot as for the single cycle case where p = 1. This has the advantage that much of the noise and other unwanted artefacts in the signal will be distributed in the gaps between the harmonics, thus the amplitudes of the harmonics themselves will generally be less affected. The noise is not forced solely into the harmonic frequencies as with single cycle analysis. Although the harmonic amplitudes are still only estimates of their true values, the estimates in this case will be more accurate than in the single cycle example.
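The spacing of the harmonics can be demonstrated with a small synthetic example. The sketch below assumes a hand-written discrete Fourier transform and an invented test signal: two harmonics plus an unwanted component at 1.5 times the fundamental, which stands in for an artefact that is not harmonically related.

```python
import cmath
import math


def amplitude_spectrum(data):
    """Plain DFT returning the amplitude in each slot from 0 to N/2."""
    n = len(data)
    amplitudes = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(data))
        weight = 1.0 if k == 0 or 2 * k == n else 2.0
        amplitudes.append(weight * abs(total) / n)
    return amplitudes


samples_per_cycle = 32
p = 4                                    # number of cycles analysed
segment = []
for i in range(p * samples_per_cycle):
    t = i / samples_per_cycle            # time measured in cycles of the fundamental
    wanted = math.sin(2 * math.pi * t) + 0.5 * math.sin(2 * math.pi * 2 * t)
    unwanted = 0.2 * math.sin(2 * math.pi * 1.5 * t)    # artefact, not a harmonic
    segment.append(wanted + unwanted)

spectrum = amplitude_spectrum(segment)
occupied = [k for k, a in enumerate(spectrum) if a > 1e-6]
print(occupied)    # [4, 6, 8]
```

The harmonics fall in slots 4 and 8 (every pth slot with p = 4) and keep their true amplitudes, while the unwanted component lands in slot 6 between them. Real noise is of course random rather than a single tone, so only part of it is diverted in this way.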
As in the single cycle case, the data segment to be used can be selected manually from the original waveform using a waveform editor. The required number of cycles is cut out from the waveform, taking care to ensure that the beginning and end of the selected segment lie close to the zero line so they do not introduce a significant discontinuity into the spectrum analysis. However it is better in this case to apply a graded data window to the segment, which reduces the effects of such discontinuities. There are many window functions which can be used but an appropriate one for these purposes is the Hamming function. This is available in the toolkit of several up-market commercial editors. CoolEditPro was a widely used example when this article was first posted, but it is no longer available; others such as WaveLab are now widely used. The pros and cons of various data windows cannot be entered into here but a rigorous treatment, though not for the faint hearted, is available in the classic text by Blackman and Tukey.
The Hamming window function has the shape illustrated in Figure 7. In this diagram the vertical axis represents the value of the function, whereas the numbers on the horizontal axis are arbitrary. This axis merely indicates the extent of the data segment selected for analysis, values around 20 lying in the centre.
Figure 7. The Hamming window function
The Hamming window is based on a cosine function, so it has a maximum value of 1 at the centre.
It does not fall quite to zero at the extremes of the window, having a value of
0.08 at these points.
The data values in the selected waveform segment are multiplied by the
corresponding values of the Hamming function before the Fourier Transform is
executed, thus the window has the effect of shading off the values towards each end.
Perhaps surprisingly, the use of the Hamming function does not make much
difference to the spectrum values for those components of the signal which are
deterministic and continuous.
In other words the harmonics emerge relatively unchanged, whereas aspects such
as discontinuities at the beginning and end of the data segment have much less
effect than would have been the case if a window function had not been used.
This is because of the shading effect.
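For readers who wish to experiment, the window can be generated directly from its standard definition, 0.54 - 0.46 cos(2 pi i / (N - 1)). The sketch below assumes a 41-point window so that index 20 is the centre; that length is chosen only for illustration, and the function names are hypothetical.

```python
import math


def hamming(n_points):
    """Standard Hamming window: 0.54 - 0.46 cos(2 pi i / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n_points - 1))
            for i in range(n_points)]


def apply_window(segment):
    """Multiply each data value by the corresponding window value."""
    return [x * w for x, w in zip(segment, hamming(len(segment)))]


window = hamming(41)    # 41 points, so that index 20 is the centre
print(round(window[0], 2), round(window[20], 2), round(window[40], 2))   # 0.08 1.0 0.08
```

The windowed segment returned by apply_window is what would then be passed to the Fourier Transform.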
An example of a spectrum derived this way is shown in Figure 8.
Figure 8. Noisy multiple cycle spectrum after Hamming window applied
The pipe in question was at treble F# on a
4 foot Principal rank (fundamental frequency about 1480 Hz) and about 60 cycles
of the fundamental were used in the spectrum analysis.
However things were made deliberately difficult for the purpose of illustrating
the effectiveness of this form of analysis – the signal to noise ratio of the
recording was intentionally made very poor on account of the high level of
blower noise allowed to contaminate the recording, together with the
considerable distance of the microphone from the pipe.
The signal to noise ratio could have been much improved by using a high pass
filter at the microphone output to suppress the outband noise from the blower,
as described in the earlier section dealing with recording techniques.
As it is, the high level of blower noise can be seen from the noisy nature of
the spectrum between the harmonics.
Also the high DC level will be noted from the line at zero frequency; in fact
this also was due to the blower because of the large amplitude low frequency
fluctuations it impressed on the signal.
Thus the desired segment had, by chance, a large DC offset which was no doubt
increased because no attention was paid to ensuring the selected segment started
and ended close to the zero line.
Although all the advice given so far was
therefore deliberately ignored, it is remarkable how sharp the
harmonic spikes are.
It would have been impossible to have done a single cycle analysis in this case
because the desired signal could barely be seen on top of the much larger noise
fluctuations from the blower, therefore a clean cycle could not have been
selected on which to operate.
The discussion above brings us conveniently
to the subject of how to remove noise from a recording and, having done so, how
to reconstitute a clean signal.
From a noisy spectrum such as Figure 8 a noise-free
signal can be reconstituted very simply.
One merely reads off the amplitudes of the harmonics, of which there are 6 in
this case, and uses them in an inverse Fourier Transform (additive synthesis).
This will generate a few cycles of a noise-free sound sample at the pitch of the
fundamental frequency (about 1480 Hz) which can be auditioned by looping.
If this reconstituted signal does not sound close enough to the original one,
allowing for the noisy nature of the latter, then another attempt at the process
will have to be made.
However it needs to be emphasised that Figure 8 is a spectrum of deliberately
poor quality data, and in practice there are bound to be difficulties in cases
as bad as this.
Nevertheless it illustrates the method well.
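A minimal additive synthesis sketch along these lines is given below. The six amplitudes are invented for illustration and are not the values read from Figure 8, all phases are assumed to be zero, and the function names are hypothetical. The block length is chosen so that it holds a whole number of cycles and can therefore be looped without a discontinuity.

```python
import math


def loop_length(fundamental_hz, sample_rate_hz):
    """Smallest number of samples holding a whole number of cycles."""
    return sample_rate_hz // math.gcd(fundamental_hz, sample_rate_hz)


def additive_synthesis(fundamental_hz, amplitudes, sample_rate_hz=44100):
    """Sum sine harmonics with the given amplitudes over one loopable block."""
    n_samples = loop_length(fundamental_hz, sample_rate_hz)
    return [sum(amp * math.sin(2 * math.pi * (h + 1) * fundamental_hz * i / sample_rate_hz)
                for h, amp in enumerate(amplitudes))
            for i in range(n_samples)]


# Illustrative amplitudes only; these are NOT the values read from Figure 8.
amplitudes = [1.0, 0.6, 0.3, 0.15, 0.08, 0.04]
sample = additive_synthesis(1480, amplitudes)
print(len(sample))   # 2205 samples, containing exactly 74 cycles at 1480 Hz
```

Because no noise term is included, whatever lay between the harmonics in the measured spectrum is simply discarded.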
All of the discussion so far has considered
only the steady state part of the waveform of interest, that is, the part of the
waveform after the attack transient has died away and before the release transient begins.
The analysis of the transients themselves is much more difficult and, among
other matters, it requires close attention to be paid to the capabilities and
limitations of spectrum analysis.
This is why the essentials of spectrum analysis were considered in so much
detail previously, and further discussion is now necessary.
During the transient phases of a waveform
the frequency structure often changes rapidly.
Particular frequencies (we cannot necessarily call them harmonics because they
may not have an exact harmonic relationship during the transient) might grow and decay in an
unpredictable manner as the pipe settles down to stable speech. If not
harmonically related to start with, these frequencies will then become so as the
steady state approaches.
When the frequency structure is changing with time one has to apply spectrum
analysis carefully to ensure that important aspects of the structure are not
missed either in terms of time or frequency.
The most important point to keep in mind is
the “uncertainty principle of spectrum analysis”, namely that time and
frequency can never be measured simultaneously to a high degree of precision.
The situation can best be explained by first considering a tuned filter of the
sort often used in analogue circuits.
If the filter is tuned very sharply to a certain frequency (i.e. if it has a
high Q factor), its output amplitude will only change relatively slowly
regardless of how fast the input amplitude might change.
For example, if a sharp voltage step is applied to such a filter it will
“ring” for a certain number of cycles, the number being related to its Q.
Therefore by observing this extended output signal, it is impossible to be sure
of exactly when the input was applied or removed.
In other words, precise frequency measurement implies that time is imprecise.
The reverse also applies in that making a precise measurement of when
something happens means that the frequency content of the event can only be known imprecisely.
Exactly the same applies to the case when
digital spectrum analysis rather than analogue filtering is used to measure frequency.
The physics, the fundamental mathematical relationships and therefore the
numbers are all exactly the same as for the analogue case, despite anything
else which might be claimed.
Before a digital analysis can be carried out it is necessary to wait for a
certain time for the required number of data points to arrive.
If using a radix 2 FFT with a transform size of 512 points, that time would be
about 11.5 milliseconds using the common sampling rate of 44.1 kHz.
It is a feature of spectrum analysis that the frequency resolution, which is the
smallest frequency interval which can be measured, is the reciprocal of the data length in seconds.
In this case the resolution would therefore be 1/11.5 kHz approximately, or
about 85 Hz.
Frequency differences of less than this do not appear at all in the spectrum
because adjacent frequency slots are spaced by this value.
Moreover, a feature which might occur in the spectrum, such as a group of lines
near a particular frequency, could have occurred anywhere within the 11.5 msec window.
There is no way of telling exactly where, and therefore no way of telling
exactly when the event happened.
To make the time resolution better one has to use a shorter data window, but
then the frequency resolution gets worse.
Using a window of length 256 data points in this case would mean that the
frequency resolution changes to about 170 Hz.
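The trade-off is easily tabulated. The sketch below simply evaluates the two reciprocal relationships for a given transform size; the function name is hypothetical. The unrounded results differ slightly from the approximate figures quoted above, which does not affect the argument.

```python
def analysis_limits(n_points, sample_rate_hz=44100):
    """Return (window duration in ms, frequency resolution in Hz)."""
    duration_ms = 1000.0 * n_points / sample_rate_hz
    resolution_hz = sample_rate_hz / n_points     # reciprocal of the duration in seconds
    return duration_ms, resolution_hz


for n_points in (512, 256):
    duration_ms, resolution_hz = analysis_limits(n_points)
    print(f"{n_points} points: {duration_ms:.1f} ms, {resolution_hz:.0f} Hz")
# 512 points: 11.6 ms, 86 Hz
# 256 points: 5.8 ms, 172 Hz
```

Halving the window doubles the spacing of the frequency slots, whatever transform algorithm is used.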
Because of this inescapable straitjacket,
some of what one hears about the capabilities of particular organs to reproduce
transient structure is nonsense.
Let us take an example.
Typically an attack transient for a flue pipe lasts for a time equivalent to
about 10 cycles of the fundamental frequency.
There are wide variations – pipe organ voicers speak of flutes being
“quicker” than strings for example - but this is a working figure for the
purposes of this discussion.
Therefore a transient will last for about 40 msec for an 8 foot stop
sounding middle C.
Recently a certain digital organ was said to be able to reproduce a transient in
which “the 7th harmonic in the transient started off 0.5% flat”.
At middle C the frequency of the 7th
harmonic is 1834 Hz, therefore a 0.5% frequency difference would be about 9 Hz.
Because the entire transient only lasts for 40 msec, the maximum possible
frequency resolution within it is the reciprocal of this figure, namely about 25 Hz.
This is much greater than the frequency difference of 9 Hz.
Consequently it would be impossible, even in theory, to measure such a small
frequency deviation within the duration of such a transient, let alone to say
exactly where it occurred. Moreover, attempting to simulate such events is
pointless for another reason, because the ear itself uses spectrum analysis and
it could not therefore respond to such incompatible frequency and time
parameters even in theory.
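The numbers in this example can be checked with a short calculation. The sketch below assumes the working figure of 10 cycles for the transient and uses unrounded values, so the results differ slightly from the round figures quoted above; the function name is hypothetical.

```python
def transient_check(fundamental_hz, harmonic, deviation_percent, cycles_in_transient=10):
    """Return (transient duration in s, best resolution in Hz, claimed deviation in Hz)."""
    transient_s = cycles_in_transient / fundamental_hz
    resolution_hz = 1.0 / transient_s             # cannot be finer than 1 / duration
    deviation_hz = harmonic * fundamental_hz * deviation_percent / 100.0
    return transient_s, resolution_hz, deviation_hz


duration, resolution, deviation = transient_check(262, 7, 0.5)
print(f"{1000 * duration:.0f} ms, {resolution:.0f} Hz, {deviation:.1f} Hz")   # 38 ms, 26 Hz, 9.2 Hz
measurable = deviation >= resolution              # False for these figures
```

The claimed deviation is roughly a third of the finest interval the transient could ever resolve, which is the point being made.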
Nevertheless, to be fair it might be the
case that a particular digital organ system could attempt to simulate such an event.
For example one can envisage data tables for each of the chosen frequencies in a
transient in which the ADSR envelope for each can be specified, together with
the way in which the frequency varies in time.
But in this case the basis on which these data were derived would have to be
suspect; it would be false to claim they were obtained from measurements of the
behaviour of real organ pipes if the numbers were similar to those above.
It is legitimate, therefore, to question whether the capability claimed of such
an organ system is useful or relevant.
Vague phrases used in marketing such as “modelled synthesis captures the true
dynamic of the pipe” could relate to physically meaningless situations if the
data defining the synthesis were not chosen carefully.
Provided we do not ask the analysis to do more than the mathematics allows, it is possible to observe within these limits how the sound in a transient varies both with time and frequency.
Figure 9. Time-frequency spectrogram of a real transient
Figure 9 is a high resolution spectrogram, an attempt to represent the three dimensions
of amplitude, frequency and time on the two dimensional page.
The sound was from a flue pipe as it came onto speech, and the amplitude axis
was plotted using linear units, rather than logarithmic ones such as decibels.
This was done here merely so that the display appeared cleaner and less
cluttered for the purposes of discussion.
Several features are of interest.
Firstly the low level noise due to the blower can be observed rumbling away
close to zero frequency.
Secondly, several harmonics are visible though many more would have been
observed if the vertical axis had used logarithmic units.
Thirdly, the singular behaviour of the fundamental can be seen in which it rises
rapidly to a peak, dies away and then climbs more slowly to reach its steady
state value. Such features must not always be assumed to represent
real transient effects, though. If the sounds are recorded in a
reverberant environment such as a large church, it is common to find particular
harmonics which behave in such a manner. When the pipe first starts
emitting sound there will usually be a short delay before reflected sound of
comparable intensity arrives at the microphone. When this happens the
amplitude at certain frequencies will either grow or decay abruptly, depending
on the relative phase relationships, and it is quite possible it is this
phenomenon which we observe here. Therefore, during the transient phase of
pipe sounds, we are hearing not only the way that the pipe is settling down to
stable speech but the way the standing waves and room modes in the building are stabilising
also. The important point to note here is that the effect will be
significantly different for different microphone positions, because at different
positions the standing wave effects will also change. This is why much of what one hears about transient structure and the necessity of
copying it faithfully verges on the absurd. Phrases such as "the
second harmonic of a Principal pipe comes onto speech first" are often so
much bluster with no basis in reality. This might be the case sometimes,
but certainly not always. It is necessary not only to establish what the
features of the transient are, but to understand why they are as they are
before they can be intelligently simulated in a digital organ. For what it
is worth, a key characteristic of the pipe used for the data in Figure 9 turned
out to be that the third harmonic grew in quite a leisurely manner compared to
the first and second, and this was an important ingredient in getting a
realistic simulation of its starting transient.
Assuming that we decide the effects depicted in the diagram are meaningful, one way to simulate this transient in a frequency domain organ would merely be to examine the 35 separate spectra more closely, and read off the amplitudes of the significant harmonics for each one. Depending on how the organ was programmed to receive its input data, one could then construct a separate ADSR envelope for each harmonic or simply copy the measured amplitude values into the appropriate data tables. The same data could also be used in an off-line multiple cycle additive synthesis program to reconstruct the actual waveform less the blower noise, and this could then be used as a noise free transient sample in a time domain organ. This would be an example of “hybrid” voicing as defined earlier, and it results in an extremely realistic simulation of the recorded transient.
When analysing a new transient it is useful
to begin with a spectrogram of the type shown above, having first ensured that
the spectrum analysis parameters are sensible as already described.
The pictorial representation thus obtained is better than many pages full of
tables and statistics – a picture is worth a thousand words.
In the case
shown, the data blocks for the successive spectrum analyses were overlapped in
time by half a block length to ensure a smooth time progression.
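A spectrogram built from half-overlapped blocks can be sketched as below, assuming a hand-written discrete Fourier transform and a Hamming window on each block. The block length of 512 points is an assumption made only to show how a figure of 35 spectra could arise; the toy signal and the function names are likewise hypothetical.

```python
import cmath
import math


def block_starts(n_samples, block_len):
    """Start indices of blocks overlapped by half a block length."""
    hop = block_len // 2
    return list(range(0, n_samples - block_len + 1, hop))


def spectrogram(samples, block_len):
    """One Hamming-windowed magnitude spectrum per half-overlapped block."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (block_len - 1))
              for i in range(block_len)]
    rows = []
    for start in block_starts(len(samples), block_len):
        block = [samples[start + i] * window[i] for i in range(block_len)]
        rows.append([abs(sum(x * cmath.exp(-2j * math.pi * k * i / block_len)
                             for i, x in enumerate(block))) / block_len
                     for k in range(block_len // 2 + 1)])
    return rows


print(len(block_starts(9216, 512)))    # 35

# Toy signal: silence, then a tone with a period of 8 samples (slot 8 of a 64-point block).
signal = [0.0] * 160 + [math.sin(2 * math.pi * i / 8) for i in range(160)]
rows = spectrogram(signal, 64)
track = [row[8] for row in rows]       # how slot 8 grows from block to block
```

Reading one slot across successive rows, as in the last line, gives the growth curve of a single frequency component.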
The amount of data required to define a transient can be reduced by using fewer spectra. The complete spectrogram will often indicate where significant changes in the frequency structure occur, and only spectra from these epochs need be used in many cases. One would generally need to use one spectrum at the start of the transient and one at the end also. This truncated set of data can be referred to as a set of “waypoint spectra”, and it is possible to use a computer program which “morphs” or interpolates each frequency component in the spectra between the waypoints to generate a close frequency domain approximation to the complete transient. The set of waypoint spectra can be used in a multiple cycle additive synthesiser to generate a synthetic transient waveform. This can then be used directly in a time domain organ, or maybe a frequency domain organ can be instructed to do its own morphing when presented with the waypoint data only.
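The morphing step can be illustrated with a very simple interpolator. The sketch below assumes straight-line interpolation of each harmonic amplitude between waypoints, which is only one of many possible rules; the waypoint values and the function name are invented for the example.

```python
def morph(waypoints):
    """Linearly interpolate harmonic amplitudes between waypoint spectra.

    waypoints: list of (frame_number, [amplitude per harmonic]), in time order.
    Returns one amplitude list per frame from the first waypoint to the last.
    """
    frames = []
    for (f0, a0), (f1, a1) in zip(waypoints, waypoints[1:]):
        for frame in range(f0, f1):
            fraction = (frame - f0) / (f1 - f0)
            frames.append([x + (y - x) * fraction for x, y in zip(a0, a1)])
    frames.append(list(waypoints[-1][1]))
    return frames


# Hypothetical waypoints for three harmonics: start, an intermediate epoch, steady state.
waypoints = [(0, [0.0, 0.0, 0.0]),
             (10, [1.0, 0.5, 0.0]),
             (30, [0.8, 0.6, 0.4])]
frames = morph(waypoints)
print(len(frames))   # 31
```

Each entry of frames could then drive an additive synthesiser for one short time step.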
We have seen that the sound emitted by organ pipes comprises two quite different regimes. During the steady state phase, the statistics describing the waveform are almost stationary and the waveform itself is essentially deterministic, meaning that spectrum analysis is a good technique for establishing its frequency structure. During the transient phases at the start and end of the waveform the reverse applies - the waveform is highly non-stationary in a statistical sense and therefore spectrum analysis is not a good tool with which to establish its time versus frequency characteristics. Because frequency domain digital organs rely on frequency domain (spectral) data, it follows that they will have greater difficulty in simulating transients than their time domain cousins.
An alternative means for gaining
insight into how the waveforms evolve during the transient phases is to use
wavelet processing, a technique which is well matched to analysing
non-stationary time series. Broadly, the difference between conventional
spectrum analysis and wavelet analysis can be understood as follows. In
spectrum analysis the waveform is effectively multiplied by a series of sine
waves at a range of closely spaced frequencies. The power of the signal
resulting from each multiplication then represents the power at that particular
frequency existing in the waveform. Because each prototype sine wave is
continuous and of a duration equal to that of the waveform segment selected for
processing, the method is not good at identifying variations in energy which
occur over timescales smaller than that of the selected segment. Such
variations tend to get smoothed out, as we have seen. Wavelet processing,
on the other hand, uses a set of much shorter prototype pulse-type waveforms
rather than sine waves made up of many identical cycles, and moreover they are of a wide range of different
shapes. In effect each wavelet is slid along the acoustic waveform from
the organ pipe to reveal how the signal power corresponding to that wavelet
varies with time. The results of a wavelet analysis can be represented as
a three dimensional plot of the type shown in Figure 9, except that the
limitations of ordinary spectrum analysis are not so much in evidence.
Time resolution can be very high, for example.
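The idea of sliding a short pulse along the waveform can be sketched as follows. The pulse used here, a sinusoid under a Gaussian envelope, is an assumption chosen for simplicity and is only one of the many wavelet shapes available; the toy signal and the function names are hypothetical.

```python
import cmath
import math


def short_wavelet(freq_hz, sample_rate_hz, cycles=4):
    """A brief complex pulse: a sinusoid shaped by a Gaussian envelope."""
    sigma_s = cycles / (2 * math.pi * freq_hz)       # envelope width in seconds
    half = int(3 * sigma_s * sample_rate_hz)         # keep three widths each side
    return [math.exp(-0.5 * (i / sample_rate_hz / sigma_s) ** 2)
            * cmath.exp(2j * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(-half, half + 1)]


def slide(signal, wavelet):
    """Power of the match between wavelet and signal at each centre position."""
    half = len(wavelet) // 2
    power = []
    for centre in range(half, len(signal) - half):
        total = sum(signal[centre - half + j] * wavelet[j].conjugate()
                    for j in range(len(wavelet)))
        power.append((centre, abs(total) ** 2))
    return power


# Toy signal sampled at 8 kHz: silence, then a 500 Hz tone starting at sample 400.
rate = 8000
signal = [0.0] * 400 + [math.sin(2 * math.pi * 500 * i / rate) for i in range(400)]
power = slide(signal, short_wavelet(500, rate))
final = power[-1][1]
onset = next(centre for centre, p in power if p > 0.25 * final)
```

The value of onset falls close to sample 400, where the tone actually begins. The pulse spans only 61 samples, under 8 msec at this sampling rate, so the time at which the energy appears is located much more closely than a long data block would allow.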
Wavelet processing can reveal a lot about the fine structure making up transient waveforms. Unfortunately much of it is academic when related to digital organs for at least two reasons. Firstly, at the end of the day a frequency domain organ will still need to receive its input data in the form of tables describing how the power or amplitude of the signal varies with time at a number of frequencies. Certain representations of the data in "wavelet space" can augment but not entirely supplant these data, which conventional spectrum analysis produces. Secondly, we have noted already that we only perceive the aural world after a spectrum analysis has been carried out by our ears. Our brains operate largely on the amplitude versus time envelopes (the ADSR characteristics if you will) of the outputs from the enormous number of active bandpass filters in the cochlea. Therefore there are limits beyond which it is futile to proceed when trying to simulate transient waveforms - the limitations of spectrum analysis apply just as much to our ears as to any other application which uses the technique.
Having described the main processes used to study the acoustic behaviour of organ pipes, it is useful to summarise the software tools which are necessary to facilitate the studies. Some, perhaps all, can be obtained commercially but their utility tends to be swamped by a mass of options and functionality which is never used. The overhead cost of purchasing the components of the system and assembling them so they all work together would also be prohibitive for many individuals or organisations. This is partly because not all commercial software, particularly spectrum analysis programs, works properly. I discovered (the hard way) that a well known and expensive editor gave the wrong answers for the harmonic amplitudes as you moved the cursor over them in the spectrum display window! Because of such problems I have developed an interactive suite of customised analysis and voicing software which has been in use and continuously upgraded over nearly 20 years. Currently it runs under Windows XP so that maximum use can be made of the labour-saving graphical interface that it offers.
The main elements of the software suite, constituting a minimum set of tools, are:
Fig 10. Screenshot illustrating automatic harmonic detection
Figure 11. External hardware synthesiser for voicing
The connection to the computer port is at the rear
Fig 12. Screenshot illustrating the interactive tonal design program
The software is used in conjunction with a two manual and pedal voicing console to enable stops to be properly integrated into a tonal scheme from the point of view of a performer. The keyboards interact directly with the voicing software on a computer. The console is built in skeletal form so it can easily be dismantled and transported to alternative sites; this enables, for example, tonal recipes to be tested in a building before an organ has been installed or purchased. It also enables captured pipe sounds to be re-synthesised and auditioned in the same building to confirm their authenticity. The necessary computer equipment and other hardware, including the dedicated audio synthesiser referred to above, is contained in a small free-standing mobile rack assembly. The voicing console is illustrated below (Figure 13).
Fig 13. The interactive voicing console
Appendix 1 – Digitising Pipe Organ Waveforms
Contents: Analogue-to-digital conversion; sampled signals; aliasing and the Nyquist sampling rate; dynamic range and quantisation noise
Analogue-to-digital conversion
Computers, and therefore digital electronic
organs, can only operate with numbers expressed in binary notation.
Therefore to enable them to process the sounds of organ pipes which actually
consist of continuous (“analogue”) air pressure variations, the sounds have
to be converted to strings of binary numbers.
This process is called digitising the waveform of an organ pipe.
It is done by a device called an analogue-to-digital converter (ADC) connected
to a microphone or to a recording made from a microphone.
Figure 1-1 shows a waveform which is a pure tone, a sine wave, though any waveform can be used. The dots indicate the instantaneous values or voltage of the waveform at 20 equally-spaced intervals along the time axis. Thus the dots have values which vary between –1 and +1. If we were to give a computer these 20 values it would be perfectly happy and, with the aid of a suitable program, it could do anything we wished with the sine wave. Therefore the process of converting an analogue signal into a digital one is conceptually simple – it is only necessary to sample it regularly and convert the sampled voltages into binary numbers. This is what the ADC does.
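What the ADC does can be imitated in a few lines. The sketch below assumes that the 20 samples span exactly one cycle of the sine wave and that each value is stored as a 16 bit two's complement number, as on a CD. Both are assumptions made for illustration, as is the function name.

```python
import math


def digitise(n_samples=20, bits=16):
    """Sample one cycle of a unit sine wave and convert each value to a binary code."""
    full_scale = 2 ** (bits - 1) - 1                  # 32767 for 16 bits
    values = [math.sin(2 * math.pi * i / n_samples) for i in range(n_samples)]
    codes = [round(v * full_scale) for v in values]
    # Two's complement bit pattern, as the number would be stored
    words = [format(code & (2 ** bits - 1), f"0{bits}b") for code in codes]
    return values, codes, words


values, codes, words = digitise()
print(codes[5], words[5])      # 32767 0111111111111111
print(codes[15], words[15])    # -32767 1000000000000001
```

The rounding in the third line of the function is where quantisation error enters: each sample is forced to the nearest available code.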
Aliasing and the Nyquist sampling rate
Two factors are important for present purposes.
Firstly the rate at which the samples are taken must be at least twice the
highest frequency present in the signal.
This is called the Nyquist Rate to immortalise the communications
theorist who first realised this. Thus if the waveform is of a Diapason pipe at
middle C, its fundamental frequency will be 262 Hz.
Typically it will have about 15 harmonics, so the highest frequency in this case
will be 15 x
262 = 3930 Hz.
Thus it must be sampled at no less than twice this frequency, 7860 Hz.
In practice a considerable margin is desirable, and the much higher industry
standard sampling rate of 44.1 kHz might well be used.
This would certainly be the case if the pipe was recorded on a Minidisc, for example.
If the sampling rate is not high enough, spurious tones called aliases
will be heard when the recording is replayed, in the form of peculiar whines or
The situation is simple to visualise by
recalling that the wagon wheels in Western movies sometimes seem to rotate
slowly or even go backwards – this is merely because the frame rate of the
cinema or TV system is not fast enough to capture the much higher rate of spoke
movement on a rapidly rotating wheel.
What we see on the screen is therefore a spurious, aliased, frequency.
(The backwards rotation also illustrates the reality of negative frequencies,
which the Complex Fourier Transform can detect.
Fourier analysis is considered in detail elsewhere in the article, but the
difference between positive and negative frequencies is not discussed further to
keep the necessary detail to a minimum).
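The arithmetic of the Diapason example, and the frequency at which an alias would appear, can be checked with the short sketch below. The function names are invented for illustration, and the 6000 Hz sampling rate is a hypothetical value chosen only because it is too low for this pipe. The alias is found by the usual rule that a sampled frequency is folded back about half the sampling rate.

```python
def nyquist_rate(fundamental_hz, n_harmonics):
    """Minimum sampling rate: twice the frequency of the highest harmonic."""
    return 2 * n_harmonics * fundamental_hz

def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a sampled tone after folding about half the sampling rate."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_rate(262, 15))          # 7860, as in the middle C Diapason example
print(alias_frequency(3930, 44100))   # 3930: sampled fast enough, so no alias appears
print(alias_frequency(3930, 6000))    # 2070: a spurious tone, unrelated to the pipe
```

In the last case the 15th harmonic at 3930 Hz would be heard at 2070 Hz, which is not a harmonic of 262 Hz at all. This is why aliases are so objectionable.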
Dynamic range and quantisation noise
The second important factor concerns the number of
bits available in each binary number to represent the values of the digitised
waveform.
If there are not enough, the waveforms will sound noisy when replayed or when
the digital organ emits them from its loudspeakers.
The peculiar type of noise encountered is
called quantisation noise.
For this reason current digital sound systems such as CD and Minidisc players
use 16 bits for each number.
This gives noise free reproduction for subjective purposes.
With 16 bits there are over 65000 separate steps available, giving a signal to
noise ratio of 96 dB (6 dB per bit).
Compare this to the 60 dB or so which was all that the old analogue magnetic
tape systems offered (on a good day!).
However, although sometimes noisy in other ways, analogue systems do not suffer from
quantisation noise.
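The figures quoted for a 16 bit system follow directly from the number of bits, as the sketch below shows. It computes the ratio of full scale to a single quantisation step, which is the basis of the "6 dB per bit" rule of thumb. The function names are invented for the example.

```python
import math

def quantisation_steps(bits):
    """Number of distinct values a binary number of the given length can take."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Ratio of full scale to one quantisation step, expressed in decibels."""
    return 20 * math.log10(2 ** bits)

print(quantisation_steps(16))            # 65536
print(round(dynamic_range_db(16), 1))    # 96.3
print(round(dynamic_range_db(1), 2))     # 6.02, the contribution of each bit
```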
Appendix 2 – Additive Synthesis
Contents: What is additive synthesis; the Inverse Fourier Transform; the myth of Real Time
Synthesis; Phase optimisation; the myth of Anharmonicity; Multiple Cycle Synthesis
Additive synthesis is the process of adding
a number of harmonics together to produce a composite tone.
Because our subjective perception of tone colour is strongly influenced by the
relative proportions of the harmonics, it is possible to derive a huge variety
of tones simply by adjusting the numbers of harmonics used and their amplitudes.
Extremely accurate simulation of organ tones can be achieved in this way.
Because the process is the reverse of spectrum analysis, which uses the Fourier
Transform to reveal which harmonics are present in a periodic signal, additive
synthesis is often called the Inverse Fourier Transform or IFT.
The process is illustrated graphically in Figure 2-1.
Here we have only two harmonics, the fundamental (blue) and the second harmonic (red). These are pure tones or sine waves, and their amplitudes are different, that of the fundamental being greater than that of the second harmonic. One can conceive of the sine waves as voltages from electrical oscillators set at frequencies exactly an octave apart. Adding these voltages together then produces a summed waveform which is shown by the yellow curve. It is no longer a sine wave, the kink in the curve produced by the low level second harmonic being obvious. Exactly this process was used in some analogue electronic organs such as the Compton Electrone and the Hammond organ with its harmonic drawbars. In a digital organ the waves are all digitised (see Appendix 1) so they appear as a string of numbers. The computer adds these together to produce the summed waveform. Unless there are only a few numbers, it is usually faster to do the additive synthesis by computing the Inverse Fourier Transform directly with an efficient FFT algorithm, as mentioned in the main text.
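The addition of two harmonics can be sketched as follows. This is the direct, harmonic-by-harmonic form of additive synthesis rather than the faster FFT route, and the amplitudes (1.0 and 0.3) and the 64 samples per cycle are assumed values chosen only to resemble the situation in Figure 2-1. The function name is invented for the example.

```python
import math

def additive_cycle(amplitudes, n_samples=64, phases=None):
    """Sum sine-wave harmonics into one cycle of a composite waveform.

    amplitudes[0] belongs to the fundamental, amplitudes[1] to the second
    harmonic, and so on. All starting phases default to zero.
    """
    if phases is None:
        phases = [0.0] * len(amplitudes)
    cycle = []
    for i in range(n_samples):
        t = i / n_samples   # position within the cycle, from 0 up to (not including) 1
        value = sum(a * math.sin(2 * math.pi * (k + 1) * t + p)
                    for k, (a, p) in enumerate(zip(amplitudes, phases)))
        cycle.append(value)
    return cycle

fundamental = additive_cycle([1.0])        # the blue curve
second = additive_cycle([0.0, 0.3])        # the red curve, an octave higher
summed = additive_cycle([1.0, 0.3])        # the yellow curve
```

Each number in `summed` is simply the sum of the corresponding numbers in `fundamental` and `second`, which is exactly what the computer in a digital organ does with its digitised sine waves.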
Real Time Synthesis – often a myth
Although the process is
conceptually simple, making it work efficiently involves further complication.
In digital organs it is often impossible to get the addition to work in real
time if there are many harmonics – the computer simply cannot work fast
enough.
Thus the synthesis itself is not actually done in real time at all in this case,
making the use of the term “real time” an inappropriate and misleading
adjective for organs which use it.
One way the problem is overcome for the Bradford computing organ is described elsewhere in this article under the
heading of frequency domain synthesis.
Another way is to compute the IFT in the manner described in the previous paragraph.
Another problem concerns the shape of the summed waveform. Figure 2-2 shows the waveshape when 5 harmonics are summed, the feature of interest being the large peak which develops at the start of each cycle. As more and more harmonics are used, this peak gets larger and sharper. Such a waveform has most of its energy concentrated into the initial spike, with the rest consisting of low energy ripples. In both digital and analogue systems this is an inconvenient waveshape to handle because it can cause either type of system to run out of “headroom”. Moreover, because of the inefficient power distribution with time across the waveform, it represents a waveform of relatively low energy for the peak amplitude it exhibits. This means the signal to noise ratio of the system handling such a waveform is less than optimum.
The peak arises because each harmonic in Figures 2-1 and 2-2 has a starting phase of zero. In other words, each harmonic starts at zero volts and then begins to rise in a positive direction. The peak can be suppressed by assigning a different starting phase to each harmonic, and this leads to a waveform in which the power is more evenly distributed with time. Such a waveform is shown in Figure 2-3 for the same set of 5 harmonics used before. Note that the peak amplitude of the optimised waveform has been reduced by about one third. When many harmonics are involved the reduction can be dramatic. Even though the amplitude has been reduced, a major benefit of this type of waveform is that it sounds louder if all other factors remain the same. This is because of the more uniform distribution of power with time. Thus the signal to noise ratio of the organ system is also increased. However the tone colour is not altered because our ears are insensitive to phase.
The process which determines the
starting phase of each harmonic is part of the computer program in the organ
Unfortunately it seems to be little used, one reason no doubt being the
additional time overhead involved in computing the phases.
The optimum phases depend intimately on the waveshape in question; there is no
single solution to the problem.
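One crude way of finding a better set of starting phases is sketched below. It is not the method used in any particular organ system: it simply tries a number of random phase sets and keeps whichever gives the lowest peak, which is an assumption made purely for illustration. The five harmonic amplitudes are also hypothetical. The sketch brings out two points made above: the energy (RMS level) of the waveform does not depend on the phases, whereas the peak amplitude does.

```python
import math
import random

def synthesise(amplitudes, phases, n_samples=256):
    """One cycle of the sum of sine harmonics with the given starting phases."""
    return [sum(a * math.sin(2 * math.pi * (k + 1) * i / n_samples + p)
                for k, (a, p) in enumerate(zip(amplitudes, phases)))
            for i in range(n_samples)]

def peak(wave):
    return max(abs(v) for v in wave)

def rms(wave):
    return math.sqrt(sum(v * v for v in wave) / len(wave))

def search_phases(amplitudes, trials=500, seed=1):
    """Random search: keep the phase set with the lowest peak (zero phase is the baseline)."""
    rng = random.Random(seed)
    best_phases = [0.0] * len(amplitudes)
    best_peak = peak(synthesise(amplitudes, best_phases))
    for _ in range(trials):
        candidate = [rng.uniform(0.0, 2 * math.pi) for _ in amplitudes]
        candidate_peak = peak(synthesise(amplitudes, candidate))
        if candidate_peak < best_peak:
            best_phases, best_peak = candidate, candidate_peak
    return best_phases, best_peak

amplitudes = [1.0, 0.8, 0.6, 0.4, 0.2]            # hypothetical 5-harmonic spectrum
zero_wave = synthesise(amplitudes, [0.0] * 5)     # all harmonics start at zero phase
phases, low_peak = search_phases(amplitudes)
better_wave = synthesise(amplitudes, phases)

print("peak with zero phases:", round(peak(zero_wave), 3))
print("peak after search:    ", round(low_peak, 3))
print("RMS before and after: ", round(rms(zero_wave), 3), round(rms(better_wave), 3))
```

Even this simple search has to synthesise the waveform hundreds of times, which gives some idea of the time overhead mentioned above. A different set of amplitudes would lead to a different set of phases.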
Anharmonicity – another myth
One commonly hears that
non-harmonic frequencies are present in real organ tones, and therefore that
these have to be used when synthesising them using additive synthesis.
This assertion, which is complete nonsense in the way it is usually posed, is
based on a number of misunderstandings and it arises because people confuse the
forced and natural vibration frequencies of organ pipes.
The subject is discussed in detail in .
In steady state sound emission we only perceive a pipe as having a definite
pitch because the waveform is precisely periodic. If it were not, the notion of
pitch would be vague, as it is for bells and chimes etc, whose overtones are not
harmonically related.
In a periodic waveform the harmonics are exact integer multiples of the
fundamental frequency, therefore if the frequency of one of them varies, the
others must also vary to retain periodicity.
The pitch of the pipe will then change also.
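The link between periodicity and exact harmonics can be demonstrated numerically. The sketch below compares a waveform with a copy of itself one fundamental period later. The partial amplitudes and the mistuned ratio of 2.07 are hypothetical values chosen only for illustration, and the function name is invented for the example.

```python
import math

def repeat_error(ratios, amplitudes, fundamental_hz=262.0, n_points=1000):
    """Largest difference between the waveform and itself one fundamental period later."""
    period = 1.0 / fundamental_hz

    def wave(t):
        return sum(a * math.sin(2 * math.pi * r * fundamental_hz * t)
                   for r, a in zip(ratios, amplitudes))

    return max(abs(wave(t + period) - wave(t))
               for t in (i * period / n_points for i in range(n_points)))

# Exact integer multiples of the fundamental: the waveform repeats every period
harmonic = repeat_error([1, 2, 3], [1.0, 0.5, 0.25])

# Second partial mistuned (hypothetical ratio 2.07): the waveform no longer repeats
mistuned = repeat_error([1, 2.07, 3], [1.0, 0.5, 0.25])

print(harmonic)   # zero apart from rounding error
print(mistuned)   # a substantial fraction of the mistuned partial's amplitude
```

With exact harmonics the two copies agree to within rounding error. With a single mistuned partial they do not, so the waveform is no longer periodic at the fundamental frequency.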
However, it is possible for the various
frequencies making up a rapidly changing waveform not to be precisely
harmonically related until the pipe settles down to stable speech.
Such waveforms occur during the attack and release transients of pipes.
But the question which then must be asked is whether the structure is
sufficiently non-harmonic for it to be detectable using a spectrum analysis.
This also is an area where there is much misunderstanding and woolly thinking,
and it is discussed extensively in this article in the sections dealing with
spectrum and transient analysis.
If it is impossible to detect such structure using spectrum analysis methods,
then our ears will not detect it either because we can only perceive the aural
world after a spectrum analysis has been performed.
Multiple Cycle Synthesis
If a single set of harmonics is
used in additive synthesis, all cycles of the waveform produced will be
identical.
Therefore it is only necessary to actually synthesise a single cycle, which can
then be repeated indefinitely by looping round it.
It is sometimes necessary to use a series of spectra, thus several sets of harmonic amplitudes, as the starting point of the additive synthesis, each spectrum generating a different cycle of the waveform. When the successive cycles are put together a waveform will result which varies with time. This technique is used to synthesise an attack or release transient for a pipe tone, or to relieve the monotony of the otherwise identical cycles during the steady state sound.
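Multiple cycle synthesis can be sketched as below. The four spectra are hypothetical and much shorter than anything which would be used for a real attack transient. Each is turned into one cycle by straightforward additive synthesis with zero starting phases, and the cycles are then placed end to end. The function names are invented for the example.

```python
import math

def one_cycle(amplitudes, n_samples=64):
    """One cycle of the waveform defined by a single set of harmonic amplitudes."""
    return [sum(a * math.sin(2 * math.pi * (k + 1) * i / n_samples)
                for k, a in enumerate(amplitudes))
            for i in range(n_samples)]

def multi_cycle(spectra, n_samples=64):
    """Place one synthesised cycle per spectrum end to end to give a time-varying waveform."""
    wave = []
    for amplitudes in spectra:
        wave.extend(one_cycle(amplitudes, n_samples))
    return wave

# Hypothetical attack: the upper harmonics appear first, then the fundamental builds up
attack_spectra = [
    [0.1, 0.3, 0.2],
    [0.4, 0.3, 0.15],
    [0.8, 0.25, 0.1],
    [1.0, 0.2, 0.1],
]
attack = multi_cycle(attack_spectra)       # four different cycles, played once
steady = one_cycle(attack_spectra[-1])     # the final cycle, to be looped indefinitely
```

Because every cycle here starts from zero with all phases at zero, successive cycles join without a jump. In practice many more spectra would be needed for the variation from cycle to cycle to be gradual.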
NOTES AND REFERENCES
1. “Multiplexing System for Selection of Notes and Voices in an Electronic Musical Instrument”, US Patent 3610799, 5 October 1971.
2. “Digital Generator for Musical Notes”, UK Patent 1580690, 3 December 1980.
3. In 1999 Wyvern Organs defended their use of the Z80 microprocessor, which first appeared over two decades ago, in a candid and useful description of their organ system (see “Electronic Technology”, C Peacock, Organists’ Review, August 1999, p. 275).
4. Some systems assign each sample to a single key rather than a keygroup, and interpolation (a blending technique) between samples is then used depending on which key is pressed. This produces a smoother variation in tone quality across the compass.
5. In fact it is not necessary to store any sine wave information at all in principle, because if the computer is fast enough the individual numbers representing the sine wave could be computed as needed on the fly (i.e. in real time).
6. “The Tonal Structure of Organ Flute Stops”, C E Pykett 2003, currently on this website. (read)
7. “MIDI for Organists”, C E Pykett 2001, currently on this website. (read)
8. “Novel System of Organ Building by Mr J T Austin, Jun, of Detroit, USA”, in Organs and Tuning, T Elliston, Weekes and Co, 1898.
9. “A Proper Organ has Pipes”, J Brennan, The Organbuilder, Vol 17, November 1999.
10. “The Measurement of Power Spectra”, R B Blackman and J W Tukey (Dover 1958)
11. “How the Flue Pipe Speaks”, C E Pykett 2000, currently on this website. (read)
12. "Fundamentals of Musical Acoustics", A H Benade, Dover, New York, 1990. ISBN 0 486 26484 X
13. Simulating every note separately in the Bradford system was described by its inventors as "resource hungry and therefore expensive" in a recent article. This makes it at once inferior to virtually any modern time domain sampler in this important respect. See "Music's measure: using digital synthesis to create instrument tone", Organists' Review, May 2007, p. 35, Peter and Lucy Comerford. The words quoted are on p. 38 of the article.
14. "Physical Modelling in Digital Organs", C E Pykett 2009, currently on this website (read).