SCUMM v5 sound — SOUN resources and sound-gated waits
What MI1's SOUN resources contain, and the one place sound timing leaks
into game logic: cutscenes and transitions pace themselves by busy-waiting
on sound completion, so the interpreter has to know how long each sound
plays. This note describes the data and that behavior; how GrogVM times
sounds to satisfy it is covered in engine/audio.md.
1. Sound-gated waits
The canonical pacing idiom (g#57, g#107, g#108, g#122, g#131, g#143, room-1 ENCD, …):
startSound N
breakHere
isSoundRunning g0 = sound N
equalZero g0 -> -10 ; g0 != 0 (still running) → loop back to breakHere
<loadRoom / startSound next / putActorInRoom / …>
equalZero jumps when the value is non-zero, so this is "yield each
game-frame, re-poll, fall through when the sound ends." For the transition
to be paced at all, the interpreter must report isSoundRunning truthfully
for the sound's real length — which means knowing that length.
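A minimal sketch of the bookkeeping this implies, in Python for illustration only. `SoundClock`, `run_wait_gate`, and the abstract "tick" unit are hypothetical names and assumptions made for this example; they are not GrogVM's actual implementation (see engine/audio.md for that).

```python
class SoundClock:
    """Hypothetical bookkeeping: remember the tick at which each sound ends."""

    def __init__(self):
        self.now = 0        # current time in ticks (the unit is an assumption)
        self.ends_at = {}   # sound id -> tick at which the sound finishes

    def start_sound(self, sound_id, length_ticks):
        self.ends_at[sound_id] = self.now + length_ticks

    def is_sound_running(self, sound_id):
        # 1 while the sound has time left, 0 once finished or never started
        return 1 if self.ends_at.get(sound_id, 0) > self.now else 0

    def advance(self, ticks=1):
        self.now += ticks


def run_wait_gate(clock, sound_id, length_ticks):
    """Mirror the idiom: startSound, then breakHere + poll until the sound ends.

    Returns how many times the script yielded before falling through.
    """
    clock.start_sound(sound_id, length_ticks)
    yields = 0
    while True:
        clock.advance()     # breakHere: give up one tick
        yields += 1
        if not clock.is_sound_running(sound_id):
            return yields   # g0 == 0: equalZero does not jump, fall through


held = run_wait_gate(SoundClock(), 28, 5)      # known length: held 5 ticks
unpaced = run_wait_gate(SoundClock(), 28, 0)   # unknown length: 1 tick only
```

With a length of zero (the interpreter does not know how long the sound plays) the gate releases after a single yield, which is the unpaced transition described above.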
The two distinct loop shapes:
- equalZero -> -10 (jump back to breakHere): a real wait gate that holds until the sound ends. These are the ones that must hold.
- equalZero -> 2 (jump forward over the startSound): "start it unless it's already running." Not a wait.
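The two shapes can be told apart by the direction of the jump that follows the isSoundRunning poll. The sketch below is a heuristic for illustration: the function name and the labels are hypothetical, and generalizing from the two offsets seen here (-10 and 2) to "any backward" and "any forward" jump is an assumption.

```python
def classify_sound_poll(jump_offset):
    """Classify the equalZero jump that follows an isSoundRunning poll.

    Assumption: a backward jump re-enters the breakHere loop, a forward
    jump skips the following startSound. Only the sign is inspected.
    """
    if jump_offset < 0:
        return "wait-gate"             # loops back: hold until the sound ends
    if jump_offset > 0:
        return "start-unless-running"  # skips forward: not a wait
    return "unknown"                   # offset 0 is not covered by this note


shapes = [classify_sound_poll(-10), classify_sound_poll(2)]
```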
VAR_MUSIC_TIMER (14) is a separate clock — auto-incremented per jiffy,
polled by the credits cutscene — unrelated to isSoundRunning. There is no
wait-for-sound opcode; all sound waits are isSoundRunning polls.
2. The SOUN resource formats
SOUN blocks live in MONKEY.001, indexed by the DSOU lane
({room, offset} per sound id); resolve one exactly like a global script.
The top-level SOUN block uses the inclusive size convention (header
included); everything nested below it is payload-only (exclusive).
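The two size conventions matter when walking a block. A sketch in Python, assuming the usual SCUMM v5 header of a 4-byte ASCII tag followed by a 32-bit big-endian size (the header layout is not stated in this note, so treat it as an assumption). `read_block` and the sample buffer are hypothetical; the nested `TEST` tag is synthetic, not a real MI1 chunk.

```python
import struct

HEADER_SIZE = 8  # assumed: 4-byte tag + 32-bit big-endian size


def read_block(buf, offset, size_includes_header):
    """Return (tag, payload_start, payload_end) for the block at offset."""
    tag, size = struct.unpack_from(">4sI", buf, offset)
    payload_start = offset + HEADER_SIZE
    if size_includes_header:
        payload_end = offset + size          # inclusive: top-level SOUN
    else:
        payload_end = payload_start + size   # exclusive: nested chunks
    return tag.decode("ascii"), payload_start, payload_end


# Synthetic buffer: a 20-byte SOUN holding one nested 4-byte-payload chunk.
nested = b"TEST" + struct.pack(">I", 4) + b"\x01\x02\x03\x04"
buf = b"SOUN" + struct.pack(">I", HEADER_SIZE + len(nested)) + nested

outer = read_block(buf, 0, size_includes_header=True)
inner = read_block(buf, outer[1], size_includes_header=False)
```

Both calls end at the same byte, which is a cheap consistency check when the conventions are mixed.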
MI1 (CD-DOS-VGA) has 105 SOUN blocks in three timing-relevant shapes.
SOU container → device renditions
A SOU block holds one or more renditions of the same sound for
different hardware, the first listed being the primary one:
- SBL: digitized PCM. Its AUdt chunk wraps a Creative Voice (VOC) block-1: a type byte (0x01), a 24-bit LE length, a time-constant, a codec byte (0x00 = 8-bit unsigned PCM), then the samples. The sample rate is encoded in the time-constant (rate = 1e6 / (256 - tc)), so it is per-sound, not fixed. (MI1's AUhd payload is a constant 00 00 80 of unclear meaning; MI1's SBL sounds carry tc = 110 → ~6849 Hz. Playback length = sampleBytes / rate.)
- ROL / ADL / SPK: standard MIDI (Roland MT-32 / AdLib / PC-speaker). MThd gives ticks-per-quarter (division, 480 in MI1); the length is each MTrk's delta-times summed against the tempo map (FF 51 03 Set-Tempo events; default 500000 µs/quarter).
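Both length rules reduce to a few lines of arithmetic. The Python sketch below is illustrative: the function names are hypothetical, the VOC part assumes the 24-bit length also counts the time-constant and codec bytes (as in the Creative VOC format), and the MIDI part takes an already-decoded single track as `(delta_ticks, new_tempo_or_None)` pairs instead of parsing MTrk bytes.

```python
def voc_block1_seconds(chunk):
    """Playback length in seconds of a VOC block-1 (8-bit unsigned PCM)."""
    if chunk[0] != 0x01:
        raise ValueError("not a VOC block-1")
    length = int.from_bytes(chunk[1:4], "little")   # 24-bit LE length
    tc, codec = chunk[4], chunk[5]
    if codec != 0x00:
        raise ValueError("only codec 0x00 (8-bit unsigned PCM) handled")
    rate = 1_000_000 / (256 - tc)                   # per-sound sample rate
    sample_bytes = length - 2   # assumption: length includes tc + codec bytes
    return sample_bytes / rate


def midi_track_seconds(events, division, tempo=500_000):
    """Sum delta-times against the tempo map for one decoded track.

    events: iterable of (delta_ticks, new_tempo_or_None), where new_tempo
    is the microseconds-per-quarter value of a Set-Tempo event.
    """
    seconds = 0.0
    for delta, new_tempo in events:
        # The delta elapses under the tempo in force before this event.
        seconds += delta * tempo / division / 1e6
        if new_tempo is not None:
            tempo = new_tempo
    return seconds


# Synthetic data, not taken from MI1: 6849 samples at tc = 110.
chunk = bytes([0x01]) + (6849 + 2).to_bytes(3, "little") + bytes([110, 0x00]) + bytes(6849)
pcm_len = voc_block1_seconds(chunk)

# Three quarter notes at division 480; the tempo halves after the second.
midi_len = midi_track_seconds([(480, None), (480, 250_000), (480, None)], 480)
```

A multi-track file would need the tempo events merged across tracks before summing; this sketch leaves that out.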
24-byte CD-audio trigger
Sound ids 100–129 are not SOU containers but 24-byte commands
(0x18 …) that trigger a redbook CD track:
- byte 16 = CD track number (1–24).
- byte 17 = loop flag: 0x01 one-shot, 0xff looping.
- bytes 18–20 = start position within the track, binary MSF (minutes, seconds, frames at 75/s), so a trigger can cue mid-track. A one-shot's playback length is then the track's remainder after the cue, not the whole track.
Fine print (MI1): one trigger uses the cue, #108 = track 17 from bytes
01 23 30 (hex; 1 m 35 s 48 f ≈ 95.6 s, see §4). Bytes 8/9 look like a volume
pair (0xc8 c8 everywhere except 0xff ff on the two track-17 triggers);
bytes 21–23 are always zero (presumably an end position, unused — playback
runs to the track's end).
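The fields above decode as follows. This Python sketch is illustrative: `CdTrigger`, `parse_cd_trigger`, and `one_shot_seconds` are hypothetical names, byte offsets are assumed to be zero-based from the first byte of the 24-byte command, and the sample bytes are synthetic (only the fields documented here are filled in; the loop flag chosen for the sample is an assumption).

```python
from dataclasses import dataclass

CD_FRAMES_PER_SECOND = 75


@dataclass
class CdTrigger:
    track: int
    looping: bool
    start_seconds: float


def parse_cd_trigger(data):
    """Decode the documented fields of a 24-byte CD-audio trigger."""
    if len(data) != 24:
        raise ValueError("CD trigger must be exactly 24 bytes")
    flag = data[17]
    if flag not in (0x01, 0xFF):
        raise ValueError(f"unexpected loop flag 0x{flag:02x}")
    minutes, seconds, frames = data[18], data[19], data[20]  # binary MSF
    start = minutes * 60 + seconds + frames / CD_FRAMES_PER_SECOND
    return CdTrigger(track=data[16], looping=(flag == 0xFF), start_seconds=start)


def one_shot_seconds(trigger, track_seconds):
    """A one-shot plays the remainder of the track after the cue."""
    return track_seconds - trigger.start_seconds


raw = bytearray(24)
raw[0] = 0x18                          # leading command byte
raw[16] = 17                           # CD track number
raw[17] = 0x01                         # loop flag (assumed for this sample)
raw[18:21] = bytes([0x01, 0x23, 0x30]) # cue: 1 m 35 s 48 f
trigger = parse_cd_trigger(bytes(raw))
```

`one_shot_seconds` needs the track length from the external TrackN file, which this sketch takes as a parameter.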
The track audio is not in MONKEY.001 — it ships as separate
TrackN.* files (the IT CD-DOS-VGA rip uses FLAC TrackN.fla, the EN rip
MP3 TrackN.mp3; the original pressing had true redbook CD sessions). A
trigger's playback length is therefore that track file's length. The IT and
EN encodes agree to within a couple of frames (same music).
MI2 has no external track files — its sounds are all SOU containers
(SBL/MIDI in MONKEY2.001), so the CD-trigger shape doesn't appear.
3. Which sounds gate
Every MI1 wait-gated sound is one of: an SBL effect (#28, ~2.7 s), a MIDI
piece (#50, ~4.8 s), or a one-shot CD track (#104–107 = track 6, the
~12.5 s "Il Viaggio" voyage theme; #117 = track 7). No wait-gate ever polls
a looping sound, so none can hang. Looping CD music (byte 17 = 0xff)
plays until explicitly stopped and never gates a wait.
4. A worked example — the title → lookout music handoff
The opening of MI1 shows how the pieces compose. CD track 17 holds two musical segments back to back: the title theme, then the lookout piece from ≈ 95.6 s in. Two different triggers play the two halves:
- The credits/title cutscene (global #152, room 10) starts #110 (track 17 from the top, one-shot) and stops it on both exits: the natural end waits on VAR_MUSIC_TIMER > 5700 (≈ 95 s, the length of the title segment), and the ESC override path runs the same stopSound 110; endCutScene. The theme never survives the title.
- Boot then proceeds to the lookout (room 38). Its ENCD starts the room's AdLib bed #98 only when the boot script is not running (getScriptRunning(1) gate), i.e. on later revisits (room 37 starts #98 unconditionally). On the boot path the lookout cutscene (room-local #203, via local #200) instead starts #108: track 17 again, cued at 01 23 30, the lookout segment.
So the 5700-jiffy gate and the #108 cue point to the same seam inside one CD track; the music "changes" at the lookout because playback jumps to the track's second half.
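The arithmetic behind that seam, as a small Python check. The 60-jiffies-per-second rate is an assumption inferred from "5700 ≈ 95 s"; the function names are hypothetical.

```python
JIFFIES_PER_SECOND = 60     # assumption: implied by 5700 jiffies ≈ 95 s
CD_FRAMES_PER_SECOND = 75


def jiffies_to_seconds(jiffies):
    return jiffies / JIFFIES_PER_SECOND


def msf_to_seconds(minutes, seconds, frames):
    return minutes * 60 + seconds + frames / CD_FRAMES_PER_SECOND


gate_s = jiffies_to_seconds(5700)           # end of the title segment
cue_s = msf_to_seconds(0x01, 0x23, 0x30)    # #108 cue into track 17
gap_s = cue_s - gate_s                      # distance between the two marks
```

Under that assumed rate the two marks land well under a second apart, which is consistent with both pointing at the same boundary in the track.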
5. Beyond timing — the audio payload
Beyond the timing above, a SOUN also fully describes its audio: the
SBL samples, the MIDI note stream for each device, and the iMUSE control
data (soundKludge / VAR_SOUNDRESULT; MI1 uses them 0 times, so they
loud-halt). Synthesizing it is output-backend territory; see
engine/audio.md §4.