8.2 Format of WORDS.TOK
by Lance Ewing
<be@ihug.co.nz>
Last updated: 31 August 1997
The WORDS.TOK file is used to store the game's vocabulary, i.e.
the dictionary of words that the interpreter understands. Each
word is stored along with a word number, which is used as an
argument value by the 'said' test command. Many words can share
the same word number, which means that these words are synonyms
for each other as far as the game is concerned.
The file itself is both packed and encrypted. Words are stored
in alphabetical order, which is required for the compression
method to work.
THE FIRST SECTION
At the start of the file is a section that is always 26x2
bytes long. This section contains a two byte entry for every
letter of the alphabet. It is essentially an index which gives
the starting location of the words beginning with the
corresponding letter.
| Byte  | Purpose                             |
| 0-1   | Hi and then Lo byte for 'A' offset. |
| ..... |                                     |
| 50-51 | Hi and then Lo byte for 'Z' offset. |
| 52-   | Words section.                      |
The important thing to note from the above is that the normal
Lo-Hi byte order convention used everywhere else in the AGI
system is not used here. For example, 0x00 and 0x24 means 0x0024,
not 0x2400. This method is used later on for word numbers as
well.
All offsets are taken from the beginning of the file. If no
words start with a particular letter, then the offset in that
field will be 0x0000.
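As an illustration of the index layout described above, here is a
minimal Python sketch that reads the 26 Hi-Lo offsets. The function
name read_index and the sample header bytes are made up for this
example; they are not part of any AGI tool.

```python
import struct
from typing import Dict


def read_index(data: bytes) -> Dict[str, int]:
    """Map each letter 'a'..'z' to its offset from the start of the file.

    An offset of 0 means no words start with that letter.
    """
    # ">26H" reads 26 unsigned 16-bit values, Hi byte first (big-endian),
    # which is the reverse of the Lo-Hi order used elsewhere in AGI.
    offsets = struct.unpack_from(">26H", data, 0)
    return {chr(ord("a") + i): off for i, off in enumerate(offsets)}


# Hypothetical header: only 'a' has words, starting at 0x0034 (byte 52).
header = bytes([0x00, 0x34]) + bytes(50)
index = read_index(header)
```

With this sample header, index["a"] is 52 and every other letter maps
to 0.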
THE WORDS SECTION
Words are stored in a compressed way in which each word will
use part of the previous word as a starting point for itself. For
example, "forearm" and "forest" both have the
prefix "fore". If "forest" comes immediately
after "forearm", then the data for "forest"
will specify that it will start with the first four characters of
the previous word. Whether this method is meant to further
confuse would-be cheaters or to help the searching process, I
don't yet know, but it most certainly isn't purely for compression,
since the WORDS.TOK file is usually quite small and no attempt is
made to compress any of the larger files (before AGI version 3,
that is).
| Prefix | Char.1 | Char.2 | ...... | Last Char | WordNum Hi | WordNum Lo |
Prefix - Number of characters to include from the start of the
previous word.
Char.n - 0x7F xor Char.n gives the ASCII code for the character.
Last Char - 0x7F xor (Char.n & 0x7F) gives ASCII code. Top
bit is set to indicate end of word.
WordNum Hi - Hi byte of word number.
WordNum Lo - Lo byte of word number.
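The entry format above can be sketched in Python as follows. This is
only a sketch: it assumes the words section runs to the end of the
data, and the names decode_words and encode_entry, as well as the word
numbers 20 and 21, are invented for the example.

```python
from typing import List, Tuple


def decode_words(data: bytes, start: int = 52) -> List[Tuple[str, int]]:
    """Decode (word, word number) pairs from the words section."""
    words = []
    prev = ""
    pos = start
    while pos < len(data):
        prefix_len = data[pos]  # characters reused from the previous word
        pos += 1
        chars = []
        while pos < len(data):
            b = data[pos]
            pos += 1
            # Mask off the end-of-word bit, then undo the 0x7F xor.
            chars.append(chr((b & 0x7F) ^ 0x7F))
            if b & 0x80:  # top bit set: this was the last character
                break
        if pos + 2 > len(data):
            break  # incomplete entry or trailing padding: stop
        word = prev[:prefix_len] + "".join(chars)
        number = (data[pos] << 8) | data[pos + 1]  # Hi byte, then Lo byte
        pos += 2
        words.append((word, number))
        prev = word
    return words


def encode_entry(prefix_len: int, suffix: str, number: int) -> bytes:
    """Hypothetical helper that builds one entry for the demonstration."""
    body = bytearray(ord(c) ^ 0x7F for c in suffix)
    body[-1] |= 0x80  # mark the last character
    return bytes([prefix_len]) + bytes(body) + number.to_bytes(2, "big")


# "forest" reuses the first four characters of "forearm".
sample = encode_entry(0, "forearm", 20) + encode_entry(4, "st", 21)
decoded = decode_words(sample, start=0)
```

Here decoded contains the two words "forearm" and "forest" with the
made-up word numbers 20 and 21.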
If a word does not use any part of the previous word, then the
prefix field is equal to zero. This will always be the case for
the first word starting with a new letter. There is nothing to
indicate where the words starting with one letter finish and the
next set starts; in fact, the words section is just one continuous
chain of words conforming to the above format. The index section
mentioned earlier is not needed in order to read the words, which
suggests that the whole WORDS.TOK format is organised so that
words can be found quickly.
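One way the index could be used for a quick lookup is sketched below
in Python: jump to the offset for the word's first letter, then walk
the chain until the word is found or the first letter changes. This
is a guess at how a reader might use the format, not a description of
the interpreter's actual search. The names find_word and entry, the
sample file and the word numbers are all invented for the example.

```python
from typing import Optional


def find_word(data: bytes, target: str) -> Optional[int]:
    """Return the word number for target, or None if it is not found."""
    if not target or not "a" <= target[0] <= "z":
        return None
    slot = 2 * (ord(target[0]) - ord("a"))
    pos = int.from_bytes(data[slot:slot + 2], "big")  # Hi-Lo offset
    if pos == 0:
        return None  # no words start with this letter
    prev = ""
    while pos < len(data):
        prefix_len = data[pos]
        pos += 1
        suffix = ""
        while pos < len(data):
            b = data[pos]
            pos += 1
            suffix += chr((b & 0x7F) ^ 0x7F)
            if b & 0x80:  # end-of-word bit
                break
        if pos + 2 > len(data):
            break  # incomplete entry: stop
        word = prev[:prefix_len] + suffix
        number = int.from_bytes(data[pos:pos + 2], "big")
        pos += 2
        if word == target:
            return number
        if word[:1] != target[0]:
            break  # the chain has moved on to the next letter
        prev = word
    return None


def entry(prefix_len: int, suffix: str, number: int) -> bytes:
    """Hypothetical helper that builds one entry for the sample file."""
    body = bytearray(ord(c) ^ 0x7F for c in suffix)
    body[-1] |= 0x80
    return bytes([prefix_len]) + bytes(body) + number.to_bytes(2, "big")


# Sample file: only the 'f' slot (bytes 10-11) is non-zero and points
# at byte 52, where the words section begins.
index = bytearray(52)
index[10:12] = (52).to_bytes(2, "big")
sample_file = bytes(index) + entry(0, "forearm", 20) + entry(4, "st", 21)
result = find_word(sample_file, "forest")
```

With this sample file, result is 21, while a word under a letter with
a zero offset (for example "apple") returns None without any scanning.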
A NOTE ABOUT WORD NUMBERS
Some word numbers have special meaning. They are listed below:
| Word # | Meaning |
| 0      | Words are ignored (e.g. the, at). |
| 1      | Anyword, e.g. |
|        |   if (said(take, anyword)) { |
|        |     print("You can't - Blackbeard has chopped both your arms off."); |
|        |   } |
| 9999   | ROL (Rest Of Line). It does not matter what the rest of the input list is. |