Architecture — Layers & Seams

GrogVM is a from-scratch TypeScript reimplementation of SCUMM v5 that runs natively in the browser, built to learn the engine by rebuilding it. It currently targets The Secret of Monkey Island (CD, VGA, 256-color). Monkey Island 2: LeChuck's Revenge (DOS) is the next planned target: it shares the same engine version, container layout, and graphics pipeline, but it is not currently supported.

This page is the map: the layers, what each one owns, and the seams between them. Subsystem behavior lives in the sibling docs, linked throughout.

At a glance

        ┌────────────┬───────────────────────────────────────────┐
        │  pages     │  every page is markdown; path = route     │
        ├────────────┼───────────────────────────────────────────┤
        │  site      │  static page shell — nav, theme           │
        ├────────────┼───────────────────────────────────────────┤
        │  app       │  islands: Library · Explorer · Player     │
        ├────────────┼───────────────────────────────────────────┤
        │  platform  │  browser adapters — files, IndexedDB,     │
        │            │  routing                                  │
        ├────────────┼───────────────────────────────────────────┤
        │  engine    │  headless core — no DOM, no browser       │
        │            │  globals; same code runs in Node          │
        │            │  seams: renderer · clock · audio backend  │
        └────────────┴───────────────────────────────────────────┘
          imports point only downward

          beside the stack: build — the owned markdown→HTML
          generator; offline tooling, nothing imports it

The app ships as static files with no application backend. The engine at the bottom is fully headless: everything it needs from the outside world (a renderer, a clock, an audio backend) is injected, which is what lets the same code run a real game in the browser and a scripted playthrough in Node.

1. Scope & non-goals

The non-goals shape the code as much as the goals do:

2. Principles

  1. Learning first. Code clarity beats performance; a 50-line decoder that reads clearly beats a 10-line one full of bit tricks. The engine doubles as an inspection tool, so debug and inspection surfaces are permanent features, not scaffolding.
  2. Verify behavior, not bookkeeping. A claim about the engine is trusted once the real game has been driven to the point that exercises it — render the pixels, reproduce the flow (see the test harness).
  3. Fail loud. An unimplemented opcode or a VM-level error does not get a silent stub: the VM halts into a first-class, inspectable halt state that carries the trace that led to it.
  4. Determinism by injection. Nothing inside the engine reads wall-clock time or global entropy: the clock and the random source are injected, so the same inputs replay the same game headlessly.
  5. No premature abstraction. Interfaces exist only where a second implementation is real. The handful that do exist — the renderer, the audio backend, the clock — are exactly what lets the engine run without a DOM.

3. The layer map

What each layer in the stack owns:

Fine print — the static build. The site is a multi-page static build: each page is a real HTML file, so refresh and deep links work and the content pages are crawler-indexable without a custom server or SPA fallback. Each page also publishes its markdown beside the HTML at <page url>.md (append .md to the path), with relative .md links rewritten to absolute so the markdown stands on its own. Per-page bundles mean the engine chunk loads only on the screens that run it. Client-only parameters — which installed game to open — ride the query string, since they are IndexedDB keys local to one browser profile and not meaningful to share.

4. The engine core

One picture before the parts — how a tick becomes pixels:

   game files (in memory, decrypted up front, block tree parsed once)
                        │
                        │  injected resolvers — the VM asks for
                        │  "costume 12", never sees bytes or offsets
                        ▼
  clock ──jiffy──▶ ┌───────────────────────────────────────────┐
  (injected)       │  VM — the mainline                        │
                   │  scripts · sentences · actor walking ·    │
                   │  costume animation · camera · audio clock │
                   └────────────────────┬──────────────────────┘
                                        │  compose (indexed pixels)
                                        ▼
                   ┌───────────────────────────────────────────┐
                   │  frame compositor → screen composer       │
                   │  room scene → camera slice → verb band +  │
                   │  dialog text → one 320×200 framebuffer    │
                   └────────────────────┬──────────────────────┘
                                        │  present (palette + indices)
                                        ▼
                            renderer seam (injected)
                     Canvas2D in the browser · an in-memory
                     recorder under the test harness

Resources — everything in memory

A game is two files: an index (directories of every resource) and a resource container, both XOR-encrypted byte-wise with 0x69 (see the index file). On open, both are read fully into memory and decrypted up front, and the container's block tree is parsed once — a few megabytes per game, which buys a simple synchronous engine with no streaming, no read-ahead, and no I/O during play.

Individual resources are decoded on demand through resolver functions injected into the VM at boot — the VM asks for "costume 12" and never knows about bytes, offsets, or encryption. Decoded costumes and sounds are cached for the life of the session; the games' resource sets are small enough that nothing is ever evicted. Rooms are loaded one at a time, fully decoded (background, z-planes, objects, walk boxes) on entry.

The VM — the engine's mainline

The VM is deliberately broader than a bytecode interpreter: like the original engine's mainline, its per-jiffy tick is the single canonical driver of everything that advances with time — script scheduling, sentence processing, actor walking, costume animation, camera pan and follow, and the audio clock. The browser session and the headless harness both drive this same tick, so the timing model lives in one place (see timing for the jiffy/frame split it implements).

Beyond the dispatch loop, the VM owns the world state the bytecode acts on: variables and script slots, actors, object state, the camera, dialog state, and box-flag overrides. Scripts schedule cooperatively — each runs until it yields — and a VM-level error freezes the machine into an inspectable halt snapshot with a trace ring of the last dispatched opcodes.

The design is intended to run both MI1 and MI2 on one engine. Code branches on the detected game identity at the small number of known difference points (see game identity); the codebase is never forked per game. Current support is MI1 only.

Graphics — indexed pixels end to end

Decoders (backgrounds, costumes, fonts, z-planes) produce palette indices, never colors. The palette travels separately, and RGBA exists only inside the renderer at present time. Nothing pre-multiplies the palette into pixels, which is what keeps palette effects cheap and renderers swappable.

The renderer seam

The engine talks to an injected renderer with a deliberately small contract: take a palette, a transparent index, a surface size, and an indexed framebuffer to present. Two implementations exist — a Canvas2D renderer in the browser, and an in-memory recorder that lets headless tests assert on real presented pixels.

Each produced frame is recomposed from scratch — room background, drawn objects, then actors sorted by depth with z-plane occlusion. No dirty-rect tracking: a 320×200-class indexed buffer is cheap enough that the simple approach wins.

Frame ownership — the engine emits the complete screen

The engine composes and presents the entire visible game image as one indexed framebuffer: the camera's window into the room (objects, actors, z-plane occlusion), the verb/inventory panel, the sentence line, and dialog / system text. Room-scene assembly lives in the frame compositor; the screen composer layers the verb band and text over the camera-sliced scene and is what actually crosses the renderer seam (see the session, "Frame production"). The whole pipeline stays pure indexed-pixel work — text glyphs and verb sprites stamp CLUT indices; RGBA conversion still happens only at present.

Because the presented framebuffer is the complete frame, a renderer implementation renders the whole game, and frame-level pixel tests can assert dialog and verb pixels through the in-memory recorder. The shell paints only non-game chrome over the blit: the cursor crosshair and the debug overlays. The engine also owns the verb hit-test — the session uses it for the hover highlight and the shell for click routing, so painted verbs and clickable verbs can never disagree.

Sound — a timing seam first

Sound sits behind an injected audio backend. Its first job is not output but time: scripts pace cutscenes by polling whether a sound is still playing, so the backend tracks real durations on the engine's own tick clock — deterministic and savestate-safe (see audio timing).

The session — one object runs the game

The shell holds a single engine session that wires VM, compositor, renderer, and an injected clock into a small control surface — play, pause, single-step, input, and snapshot/restore. Its loop throttles and batches ticks at a fixed 60 Hz, auto-pauses an idle game, and never touches the DOM (see the session).

Save states

A save is a versioned snapshot of the live VM — variables, script slots with their bytecode, actors, objects, audio timers. Each live slot's bytecode is stored verbatim because its program counter points mid-stream: re-deriving the source on restore (global script, entry script, local, verb, sentence) is fragile, and the bytecode itself is small. Restore boots a fresh VM and applies the snapshot. The format is GrogVM's own (original SCUMM saves are a non-goal) and moves forward without compatibility shims: a format change bumps the version rather than accreting fallbacks.

The disassembler — the static companion

Alongside the executing dispatcher lives a read-only disassembler used for reverse-engineering and the script inspectors. Both are consumers of a single opcode registry: each opcode family is defined once — its opcode bytes, its operand layout (written against an operand-reader interface with a live implementation that dereferences variables and a static one that labels them), its execution, and its disassembly text. An operand encoding therefore exists in exactly one place and the two consumers cannot drift (see opcode dispatch). The one exception is expression, whose stream interleaves nested opcode execution with decoding: its live side stays a streaming evaluator and only the subop shapes are shared. A corpus integration test disassembles every script in the installed game and holds the line at zero misalignments.

5. The player & the explorer

The Player hosts one session and shows two faces of the same running tick: the clean play surface, and a read-only debug drawer with live VM inspection (slots, variables, trace, actors, walk-box overlay) that reads the session's VM directly — privileged access, by design, because a learning tool keeps its internals visible. The drawer observes; it never drives the clock, so a stray click in it can't desync the game you're playing. The Explorer is a separate, session-free screen: pure static analysis of an installed game's resources using the same decoders the VM uses, so it works even on a game the VM cannot yet play.

The engine always works in native low-resolution pixels — mouse input, geometry, and the framebuffer all share the original game's units. The shell upscales the canvas with CSS (image-rendering: pixelated); no scaling exists inside the engine.

6. Testing

Three tiers, one principle — the browser is for integration and rendering issues, never the primary verifier:

7. Decided trade-offs

Choices that could have gone another way, and why they went this way: