7.1 Format of the SOUND resource
by Lance Ewing
<be@ihug.co.nz>
Last updated: 18 August 1997
NOTE: The original version of this document did not cover
every aspect of the sound format. It made no mention that the
volume control and noise voice were also part of AGI's sound
format. It turns out that the data contained in a sound resource
is so much like the data sent to the PCjr's TI chip that I have
included a lot of Peter Norton's TI sound chip section from the
"Programmer's Guide to the IBM PC".
INTRODUCTION
Most people who think of AGI games remember that they played
their music and sounds over the PC speaker. What they may not
know is that all sounds are composed of four parts: one is the
melody, two are accompaniment, and the final one is noise. The
IBM PC can only play one note at a time, so all AGI games for the
PC play the melody by itself. The other three parts are still
included in the data, though, because some PC compatibles,
including the IBM PCjr, have more than one sound generator.
HISTORY
According to Donald B. Trivette, author of 'The Official Book
of King's Quest', a year before the IBM PCjr was announced IBM
asked Sierra to create a game that would show off the new
computer's color graphics capabilities. IBM supplied the company
with a prototype Junior, and Roberta set to work designing a new
type of adventure game. The game produced was called King's
Quest. This is important because the IBM PCjr had a different
method of sound generation than the IBM compatibles of today. The
sound data was stored in a form that made it easy to send to the
Junior's sound generators. This format appears to have remained
right through the AGI games until 1989-90, when SCI took over,
even though the PCjr had long since been surpassed by the 286 and
386.
SOUND AND THE IBM PCjr
The best known source of sound in the Junior is the TI
SN76496A sound generator chip. This source has four separate
sound voices. Three of these are tone generators and the fourth
is a noise source. All four voices have an independent volume
control, providing an evenly graduated set of 15 volume levels,
plus a zero volume (off). Each of the three pure voices has an
independently selected frequency. The noise voice has three
preselected frequencies and a fourth option, which borrows the
frequency of the third pure voice. The data stored in the AGI
games is designed to be sent to these four voices.
THE TONE GENERATORS
A tone is produced on a voice by passing the sound chip a
3-bit register address and then a 10-bit frequency divisor. The
register address specifies which voice the tone will be produced
on. This is done through port 192 on the IBM PCjr by sending it 2
bytes in the following format:
First Byte
7   6   5   4   3   2   1   0
1   .   .   .   .   .   .   .     Identifies first byte (command byte)
.   R0  R1  R2  .   .   .   .     Register number in TI chip (0, 2, 4).
.   .   .   .   F6  F7  F8  F9    4 of 10 bits in frequency count.
Second Byte
7   6   5   4   3   2   1   0
0   .   .   .   .   .   .   .     Identifies second byte (completing byte)
.   X   .   .   .   .   .   .     Unused, ignored.
.   .   F0  F1  F2  F3  F4  F5    6 of 10 bits in frequency count.
Register Addresses:
R0  R1  R2
0   0   0     Holds voice 1 frequency number.
0   1   0     Holds voice 2 frequency number.
1   0   0     Holds voice 3 frequency number.
The actual frequency produced is the 10-bit frequency divisor
given by F0 to F9 divided into 1/32 of the system clock frequency
(3.579 MHz) which turns out to be 111,860 Hz. Keeping all this in
mind, the following is the formula for calculating the frequency:
F = 111860 / (((Byte-2 AND 0x3F) * 16) + (Byte-1 MOD 16));
Here Byte-1 and Byte-2 are the first and second bytes shown
above.
Note: The order of the two bytes is reversed in AGI sound data.
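The formula above can be sketched in Python as follows. This is
only an illustration: the function names are made up for this
example, and rejecting a divisor of zero is an assumption of the
sketch, not something taken from the chip documentation.

```python
CLOCK_DIV_32 = 111860  # 1/32 of the 3.579 MHz system clock, per the text


def tone_register(first_byte: int) -> int:
    """Return the 3-bit register number (R0 R1 R2) from the command byte."""
    return (first_byte >> 4) & 0x07


def tone_divisor(first_byte: int, second_byte: int) -> int:
    """Combine the two bytes sent to the chip into the 10-bit divisor."""
    high6 = second_byte & 0x3F  # F0..F5, the upper six bits
    low4 = first_byte & 0x0F    # F6..F9, the lower four bits (same as MOD 16)
    return high6 * 16 + low4


def tone_frequency(first_byte: int, second_byte: int) -> float:
    """Frequency in Hz for a first (command) byte and second byte."""
    divisor = tone_divisor(first_byte, second_byte)
    if divisor == 0:
        # Assumption of this sketch: treat a zero divisor as invalid.
        raise ValueError("frequency divisor is zero")
    return CLOCK_DIV_32 / divisor
```

For example, the byte pair 0x80, 0x16 selects register 0 (voice 1)
with a divisor of 352.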
ATTENUATION
Each voice in the TI sound chip has an independent sound-level
control, which is calculated in terms of decibels of attenuation,
or softening. There are four bits used to control the volume.
These bits, labeled A0 through A3, can be set independently or
added together to produce sixteen volume levels as shown below.
A0  A1  A2  A3    Value   Attenuation (decibels)
.   .   .   1     1       2
.   .   1   .     2       4
.   1   .   .     4       8
1   .   .   .     8       16
1   1   1   1             Volume off
When a bit is set on, the sound is attenuated (reduced) by a
specific amount: either 2, 4, 8, or 16 decibels. When all four
bits are set on, the sound is turned completely off. When all
four bits are off, the sound is at its fullest volume.
The attenuation is set by sending a byte of the following
format to the TI sound chip:
7   6   5   4   3   2   1   0
1   .   .   .   .   .   .   .     Identifies first byte (command byte)
.   R0  R1  R2  .   .   .   .     Register number in TI chip (1, 3, 5, or 7).
.   .   .   .   A0  A1  A2  A3    4 attenuation bits
Register Addresses:
R0  R1  R2
0   0   1     Holds voice 1 attenuation.
0   1   1     Holds voice 2 attenuation.
1   0   1     Holds voice 3 attenuation.
1   1   1     Holds noise voice attenuation.
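As an illustration, an attenuation byte could be decoded like
this. The function name is hypothetical, and returning None for
"volume off" is just a convention chosen for this sketch.

```python
from typing import Optional, Tuple


def decode_attenuation(byte: int) -> Tuple[int, Optional[int]]:
    """Split an attenuation byte into (register, decibels).

    The decibel value is None when all four attenuation bits are set,
    which the text describes as "volume off".
    """
    register = (byte >> 4) & 0x07  # R0 R1 R2: 1, 3, 5 or 7
    value = byte & 0x0F            # A0..A3 as a number from 0 to 15
    if value == 0x0F:
        return register, None      # sound completely off
    return register, value * 2     # each step is 2 dB of attenuation
```

With this, 0x90 decodes to register 1 at full volume (0 dB), and
0xFF decodes to the noise voice (register 7) switched off.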
THE NOISE GENERATOR
There are two modes for the noise operation, besides the four
frequency selections. One, called periodic noise, produces a
steady sound; the other, called white noise, produces a hissing
sound. These two modes are controlled by a bit known as the FB
bit. When FB is 0, the periodic noise is generated; when FB is 1,
the white noise is produced.
Two bits, known as NF0 and NF1, control the frequency at which
the noise generator works. Three of the four possible
combinations of NF0 and NF1 set an independent noise frequency
based on the timer. The fourth combination borrows the frequency
from the third of the three pure voices made by the tone
generators.
NF0  NF1   Noise Frequency
0    0     1,193,180 / 512  = 2330
0    1     1,193,180 / 1024 = 1165
1    0     1,193,180 / 2048 = 583
1    1     Borrowed from voice 3
The noise frequency is set by sending a byte of the following
format to the TI sound chip:
7   6   5   4   3   2   1   0
1   .   .   .   .   .   .   .     Identifies first byte (command byte)
.   1   1   0   .   .   .   .     Register number in TI chip (6)
.   .   .   .   X   .   .   .     Unused, ignored; can be set to 0 or 1
.   .   .   .   .   FB  .   .     1 for white noise, 0 for periodic
.   .   .   .   .   .   NF0 NF1   2 noise frequency control bits
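A small sketch of decoding the noise control byte is given below.
The names are invented for this example. The frequencies are the
ones tabulated above, and None stands for "borrow the frequency of
voice 3".

```python
from typing import Optional, Tuple

# Noise frequencies in Hz for NF0/NF1 values 0, 1 and 2, as tabulated above.
NOISE_FREQUENCIES = {
    0: 1193180 / 512,
    1: 1193180 / 1024,
    2: 1193180 / 2048,
}


def decode_noise(byte: int) -> Tuple[bool, int, Optional[float]]:
    """Return (is_white_noise, nf_value, frequency_hz_or_None)."""
    is_white = bool(byte & 0x04)  # FB bit: 1 = white noise, 0 = periodic
    nf = byte & 0x03              # NF0 (bit 1) and NF1 (bit 0)
    # nf == 3 is the fourth combination: use voice 3's frequency instead.
    return is_white, nf, NOISE_FREQUENCIES.get(nf)
```

For instance, 0xE4 decodes as white noise at roughly 2330 Hz.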
AGI SOUND FILES
We now know enough about the PCjr's TI sound chip to discuss
the AGI sound format. The sound is stored as four separate units
of data, one for each voice. Each sound file stored in the VOL
files has an 8-byte header which contains offsets into the file.
The format is as follows:
Byte   Meaning
0-1    Offset of first voice data.
2-3    Offset of second voice data.
4-5    Offset of third voice data.
6-7    Offset of noise voice data.
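Reading the header might look like the following sketch. It
assumes the offsets are stored low byte first, as stated in the
summary appendix, and the function name is only illustrative.

```python
import struct
from typing import Tuple


def read_voice_offsets(data: bytes) -> Tuple[int, int, int, int]:
    """Return the four voice offsets from the 8-byte SOUND header."""
    if len(data) < 8:
        raise ValueError("SOUND resource is shorter than its header")
    # "<4H" = four unsigned 16-bit values, little-endian (low byte first).
    return struct.unpack_from("<4H", data, 0)
```

The fourth value is the offset of the noise voice data.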
The data starting at each voice offset is stored as 5-byte
notes which give the frequency and duration of a note played on
that voice. The 5 bytes have the following meanings:
Byte   Meaning
0-1    Duration (16-bit word)
2-3    Frequency divisor of the format described in the PCjr
       section above except the two bytes are around the other
       way.
4      Attenuation of the note in the format described above in
       the PCjr section.
Note that the last three bytes were in the opposite order in
version 1 of the AGI interpreter. The order given above is the
reverse of the order in which the bytes would be output to the TI
sound chip.
Each voice's data section in the SOUND resource file is
usually terminated by two consecutive 0xFF codes. Another way of
checking for the end is to see if it has reached the start of the
next voice section, or in the case of the noise voice, the end of
the SOUND data.
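Both end-of-voice checks described above can be combined in one
reader. This is a minimal sketch: the function name and the tuple
layout are choices made for the example, and the caller is assumed
to pass the start of the next voice section (or the end of the
SOUND data) as the end argument.

```python
import struct
from typing import List, Tuple

# (duration, second frequency byte, first frequency byte, attenuation byte)
Note = Tuple[int, int, int, int]


def read_notes(data: bytes, start: int, end: int) -> List[Note]:
    """Read 5-byte notes from data[start:end] until a terminator or the end."""
    notes = []
    pos = start
    while pos + 5 <= end:
        # Two consecutive 0xFF codes usually terminate a voice section.
        if data[pos] == 0xFF and data[pos + 1] == 0xFF:
            break
        # 16-bit little-endian duration followed by three single bytes.
        notes.append(struct.unpack_from("<HBBB", data, pos))
        pos += 5
    return notes
```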
PLAYING THE SOUNDS ON A SOUND CARD
Writing a program to play the tunes will require four
pointers, one per voice, to keep track of the current position in
each voice segment, since all four voices are played
simultaneously. The first voice is the melody and is the only
voice played on the PC speaker in today's PC compatibles, the
other voices being ignored. I'd imagine that other platforms such
as the Amiga and Macintosh would probably play all three tone
voices.
A program would start by reading each of the four offsets in
the header. It would then enter a loop which begins by reading
the first note of each voice section. The durations are then
monitored, and when each note finishes, the next note for that
voice is read. Note that the notes for each voice will usually
finish at different times. The program finishes when all of the
voice sections have been entirely played. This will usually occur
for each voice at the same time, though I don't think that is
guaranteed.
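The loop described above could be sketched as follows. The
sketch does not produce any sound; it only steps through the
voices one duration unit at a time and reports when each note
starts. The names play and on_note are invented for this example,
and each note is simplified to a (duration, payload) pair.

```python
from typing import Callable, List, Sequence, Tuple


def play(voices: Sequence[List[Tuple[int, object]]],
         on_note: Callable[[int, int, object], None]) -> int:
    """Step through all voices in parallel; return the total tick count."""
    positions = [0] * len(voices)  # one pointer per voice
    remaining = [0] * len(voices)  # ticks left on each voice's current note
    tick = 0
    while True:
        active = False
        for v, notes in enumerate(voices):
            # When the current note has finished, read the next one.
            while remaining[v] == 0 and positions[v] < len(notes):
                duration, payload = notes[positions[v]]
                positions[v] += 1
                remaining[v] = duration
                on_note(tick, v, payload)
            if remaining[v] > 0:
                remaining[v] -= 1
                active = True
        if not active:
            break  # every voice section has been played entirely
        tick += 1
    return tick
```

A real player would replace on_note with code that programs the
sound hardware, and would wait one duration unit per pass through
the outer loop.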
Then of course you could always convert the AGI SOUND to a
MIDI file and play that which will sound a hundred times better
:)
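One step in such a conversion is mapping a frequency to the
nearest MIDI note number. The sketch below uses the standard
equal-temperament relation in which MIDI note 69 is A at 440 Hz.
It is not taken from any AGI tool, and the function name is made
up for this example.

```python
import math


def frequency_to_midi_note(freq_hz: float) -> int:
    """Return the nearest MIDI note number for a frequency in Hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # 12 semitones per octave, with note 69 fixed at 440 Hz.
    return round(69 + 12 * math.log2(freq_hz / 440.0))
```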
CALCULATING FREQUENCIES WHEN PLAYING NOTES ON A SOUND CARD
My program reads in the duration as a 16 bit word. It then
loads the two following bytes and calculates the frequency as
follows:
Freq. = 111860 / (((Byte-2 AND 0x3F) * 16) + (Byte-3 MOD 16));
The 111860 comes from the PCjr discussion above. Note that the
bytes are in the opposite order from that mentioned in the PCjr
information.
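Applied to a whole 5-byte note, with bytes numbered from 0 as in
the note layout given earlier, the calculation might look like
this. The function name is hypothetical, and rejecting a zero
divisor is an assumption of the sketch.

```python
CLOCK_DIV_32 = 111860  # from the PCjr discussion


def note_frequency(note: bytes) -> float:
    """Frequency in Hz of one 5-byte AGI note (bytes numbered 0 to 4)."""
    if len(note) != 5:
        raise ValueError("an AGI note is exactly 5 bytes long")
    # Byte 2 carries the upper six divisor bits, byte 3 the lower four,
    # which is the reverse of the order sent to the PCjr chip.
    divisor = (note[2] & 0x3F) * 16 + (note[3] % 16)
    if divisor == 0:
        raise ValueError("frequency divisor is zero")
    return CLOCK_DIV_32 / divisor
```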
Remember also that the SOUND format includes volume
information for each voice. The exact conversion from the decibel
values to the volume control on today's sound cards is uncertain
at this stage.
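One possible starting point, offered only as an assumption, is
the usual decibel-to-amplitude relation, gain = 10^(-dB/20),
applied to the 2 dB steps of the attenuation value. Whether this
matches what a given sound card expects is exactly the open
question mentioned above.

```python
def attenuation_to_gain(value: int) -> float:
    """Map a 4-bit attenuation value (0 to 15) to a linear gain from 0.0 to 1.0."""
    if not 0 <= value <= 15:
        raise ValueError("attenuation value must be in the range 0 to 15")
    if value == 15:
        return 0.0  # all four bits set: volume off
    decibels = value * 2  # each step attenuates by 2 dB
    return 10 ** (-decibels / 20)
```

Under this mapping a value of 0 gives full volume (gain 1.0) and a
value of 10 (20 dB) gives one tenth of the amplitude.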
APPENDIX 1: SOUND FORMAT SUMMARY
The header consists of four two-byte offsets, one for each
voice. The low byte is first, followed by the high byte. Each
offset points to the note data for the relevant voice. The note
data for a voice consists entirely of five-byte note entries of
the following format:
FIRST BYTE
SECOND BYTE
Note duration (low byte and then high byte).
THIRD BYTE
---> In the case of a tone voice,
7   6   5   4   3   2   1   0
0   .   .   .   .   .   .   .     Always 0.
.   X   .   .   .   .   .   .     Unused, ignored.
.   .   F0  F1  F2  F3  F4  F5    6 of 10 bits in frequency count.
---> In the case of the noise voice, this byte is equal to zero.
FOURTH BYTE
---> In the case of a tone voice,
7   6   5   4   3   2   1   0
1   .   .   .   .   .   .   .     Always 1.
.   R0  R1  R2  .   .   .   .     Register number in TI chip (0, 2, 4).
.   .   .   .   F6  F7  F8  F9    4 of 10 bits in frequency count.
F = frequency = 111860 / (((Byte-3 AND 0x3F) * 16) + (Byte-4 MOD 16))
R = register address
---> In the case of the noise voice,
7   6   5   4   3   2   1   0
1   .   .   .   .   .   .   .     Always 1.
.   1   1   0   .   .   .   .     Register number in TI chip (6)
.   .   .   .   X   .   .   .     Unused, ignored; can be set to 0 or 1
.   .   .   .   .   FB  .   .     1 for white noise, 0 for periodic
.   .   .   .   .   .   NF0 NF1   2 noise frequency control bits
NF0  NF1   Noise Frequency
0    0     1,193,180 / 512  = 2330
0    1     1,193,180 / 1024 = 1165
1    0     1,193,180 / 2048 = 583
1    1     Borrowed from voice 3
FIFTH BYTE
7   6   5   4   3   2   1   0
1   .   .   .   .   .   .   .     Always 1 (command byte).
.   R0  R1  R2  .   .   .   .     Register number in TI chip (1, 3, 5, or 7).
.   .   .   .   A0  A1  A2  A3    4 attenuation bits
A0  A1  A2  A3    Value   Attenuation (decibels)
.   .   .   1     1       2
.   .   1   .     2       4
.   1   .   .     4       8
1   .   .   .     8       16
1   1   1   1             Volume off
Register Addresses:
R0  R1  R2    Parameter
0   0   0     Voice 1 frequency control number (10 bits)
0   0   1     Voice 1 attenuation (4 bits)
0   1   0     Voice 2 frequency control number (10 bits)
0   1   1     Voice 2 attenuation (4 bits)
1   0   0     Voice 3 frequency control number (10 bits)
1   0   1     Voice 3 attenuation (4 bits)
1   1   0     Noise voice control (4 bits; 3 used)
1   1   1     Noise voice attenuation (4 bits)
The note data for one voice is terminated by two consecutive
0xFF values.
APPENDIX 2: AGI v1.12 SOUND FORMAT
The sound format used in version 1.12 of the AGI interpreter
was quite different from the format described above for AGIv2 and
AGIv3. It still uses the PCjr format for the note data but it
does not store the duration as a separate field. The best way to
describe it is by an example:
90 80 16 B0 A0 15 D0 C0 0E FF E4 00 80 17 A0 16 C0 11 00 80 16
B1 A0 14 C0 12 00 80 16 B2 A0 16 C0 13 00 ...
The first thing to point out is that the PCjr note data is in
the opposite order to AGIv2. Secondly, all four parts are
included together rather than in separate sections. Taking the
above example, let's look at the first note and show the
equivalent AGIv2 notation.
90 80 16 --> 03 00 16 80 90
Now, the duration isn't immediately obvious, but we will come
to that in a short while. The following three groups of bytes
give the first note for the second part, the third part, and the
noise part (at least as far as this example is concerned).
B0 A0 15 --> 03 00 15 A0 B0
D0 C0 0E --> 03 00 0E C0 D0
FF E4 00 --> 33 00 00 E4 FF
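The reordering shown in these examples can be sketched as
follows. The function name is hypothetical, and the duration has
to be supplied by the caller because the three-byte group does not
contain it. The default of 3 simply mirrors the examples above.

```python
def v1_group_to_v2(group: bytes, duration: int = 3) -> bytes:
    """Reorder a 3-byte AGI v1.12 note group into a 5-byte AGIv2 note."""
    if len(group) != 3:
        raise ValueError("a v1.12 note group is exactly 3 bytes long")
    attenuation, freq_first, freq_second = group
    return bytes([
        duration & 0xFF,         # duration, low byte
        (duration >> 8) & 0xFF,  # duration, high byte
        freq_second,             # AGIv2 stores the second frequency byte first
        freq_first,
        attenuation,
    ])
```

For example, v1_group_to_v2(bytes.fromhex("908016")) gives the
bytes 03 00 16 80 90.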
The data that follows these initial four starting notes is
basically any changes in the note values with each duration step
of 3. For example,
80 17 --> 03 00 17 80 90
Note that 0x90 doesn't need to be stored because that byte has
retained its value. Every 0x00 byte that is encountered is the
end of one set of note changes. Each set of note changes is the
equivalent of a duration of 3 in the AGIv2 format. Continuing
with our example,
A0 16 --> 03 00 16 A0 B0
C0 11 --> 03 00 11 C0 D0
The example now encounters a 0x00 byte which means that the
noise voice isn't changed at this point. In fact, from the AGIv2
equivalent note above, you will see that the noise note will not
change until 51 (or 0x33) sets of note changes have been
processed.
80 16 --> 03 00 16 80 90
B1 A0 14 --> 03 00 14 A0 B1
C0 12 --> 03 00 12 C0 D0
How exactly the AGIv1.12 interpreter knows which voice is
having its notes changed, and which bytes of the note are being
changed, is not yet certain. On some occasions a set of changes
will contain only one byte, which corresponds to one of the bytes
that make up one of the voices' note values, but how it knows
which one is a mystery to me.
On other occasions, there could be a whole chain of 0x00
bytes, which means that during that whole time none of the voices
are changing their note values.