Lessons learned from my first dive into WebAssembly

It began as a water sort puzzle solver, constructed similarly to my British Square solver. It was nearly playable, so I added a user interface with SDL2. My wife enjoyed it on her desktop, but wished to play on her phone. So then I needed to either rewrite it in JavaScript and hope the solver was still fast enough for real-time use, or figure out WebAssembly (WASM). I succeeded, and now my game runs in browsers (source). Like before, next I ported my pkg-config clone to the WASM System Interface (WASI), whipped up a proof-of-concept UI, and it too runs in browsers. Neither uses a language runtime, resulting in little 8kB and 28kB WASM binaries respectively. In this article I share my experiences and techniques.

WASM is a specification defining an abstract stack machine with a Harvard architecture, and related formats. There are just four types, i32, i64, f32, and f64. It also has “linear” octet-addressable memory starting at zero, with no alignment restrictions on loads and stores. Address zero is a valid, writable address, which resurfaces some old school, high level language challenges regarding null pointers. There are 32-bit and 64-bit flavors, though the latter remains experimental. That suits me: I appreciate smaller pointers on 64-bit hosts, and I wish I could opt into them more often (e.g. x32).

As browser tech goes, they chose an apt name: WebAssembly is to the web as JavaScript is to Java.

There are distinct components at play, and much of the online discussion doesn’t do a great job drawing lines between them:

A combination of compiler-linker-runtime is conventionally called a toolchain. However, because almost any Clang installation can target WASM out-of-the-box, and we’re skipping the language runtime, you can compile any of the programs discussed in this article, including my game, with nothing more than Clang (invoking wasm-ld implicitly). If you have a WASM runtime, which includes your browser, you can run them, too! Though this article will mostly focus on WASI, and you’ll need a WASI-capable runtime to run those examples, which doesn’t include browsers (short of implementing the API with JavaScript).

I wasn’t particularly happy with the WASM runtimes I tried, so I cannot enthusiastically recommend one. I’d love if I could point to one and say, “Use the same Clang to compile the runtime that you’re using to compile WASM!” Alas, I had issues compiling, the runtime was buggy, or WASI was incomplete. However, wazero (Go) was the easiest for me to use and it worked well enough, so I will use it in examples:

$ go install github.com/tetratelabs/wazero/cmd/wazero@latest

The WASM Binary Toolkit (WABT) is good to have on hand when working with WASM, particularly wasm2wat to inspect WASM modules, sort of like objdump or readelf. It converts WASM to the WebAssembly Text Format (WAT).

Learning WASM I had quite some difficulty finding information. Outside of the WASM specification, which, despite its length, is merely a narrow slice of the ecosystem, important technical details are scattered all over the place. Some is only available as source code, some buried in comments in GitHub issues, and some lost behind dead links as repositories have moved. Large parts of LLVM are undocumented beyond a mention of existence. WASI has no documentation in a web-friendly format (so I have nothing to link from here when I mention its system calls), just some IDL sources in a Git repository. An old wasi.h was the most readable, complete source of truth I could find.

Fortunately WASM is old enough that LLMs are well-versed in it, and simply asking questions, or for usage examples, was more effective than searching online. If you’re stumped on how to achieve something in the WASM ecosystem, try asking a state-of-the-art LLM for help.

Example programs

Let’s go over concrete examples to lay some foundations. Consider this simple C function:

float norm(float x, float y)
{
    return x*x + y*y;
}

To compile to WASM (32-bit) with Clang, we use --target=wasm32:

$ clang -c --target=wasm32 -O example.c

The object file example.o is in WASM format, so WABT can examine it. Here’s the output of wasm2wat -f, where -f produces output in the “folded” format, which is how I prefer to read it.

(module
  (type (;0;) (func (param f32 f32) (result f32)))
  (import "env" "__linear_memory" (memory (;0;) 0))
  (func $norm (type 0) (param f32 f32) (result f32)
    (f32.add
      (f32.mul
        (local.get 0)
        (local.get 0))
      (f32.mul
        (local.get 1)
        (local.get 1)))))

We can see the ABI taking shape: Clang has predictably mapped float into f32. It similarly maps char, short, int and long onto i32. In 64-bit WASM, the Clang ABI is LP64 and maps long onto i64. There’s also a $norm function which takes two f32 parameters and returns an f32.
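If you want the compiler to confirm that mapping rather than take my word for it, here is a minimal sketch. It assumes the __wasm32__ and __wasm64__ macros that Clang predefines for these targets, and norm64 is a hypothetical double-precision twin of norm that I'd expect to show up with f64 parameters. On any other target the guarded checks simply vanish.

```c
// Compile-time checks of the type widths described above. They only
// apply when targeting WASM; elsewhere they compile to nothing.
#if defined(__wasm32__)
_Static_assert(sizeof(int)    == 4, "int fits i32");
_Static_assert(sizeof(long)   == 4, "long fits i32 on wasm32");
_Static_assert(sizeof(void *) == 4, "pointers fit i32 on wasm32");
#elif defined(__wasm64__)
_Static_assert(sizeof(long)   == 8, "long fits i64 on wasm64 (LP64)");
_Static_assert(sizeof(void *) == 8, "pointers fit i64 on wasm64");
#endif
_Static_assert(sizeof(float)     == 4, "float fits f32");
_Static_assert(sizeof(double)    == 8, "double fits f64");
_Static_assert(sizeof(long long) == 8, "long long fits i64");

// Hypothetical f64 counterpart of norm.
double norm64(double x, double y)
{
    return x*x + y*y;
}
```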

Getting a little more complex:

__attribute((import_name("f")))
void f(int *);

__attribute((export_name("example")))
void example(int x)
{
    f(&x);
}

The import_name function attribute indicates the module will not define it, even in another translation unit, and that it intends to import it. That is, wasm-ld will place it in the import table. The export_name function attribute indicates it’s an entry point, and so wasm-ld will list it in the export table. Linking it will make things a little clearer:

$ clang --target=wasm32 -nostdlib -Wl,--no-entry -O example.c

The -nostdlib is because we won’t be using a language runtime, and --no-entry to tell the linker not to implicitly export a function (default: _start) as an entry point. You might think this is connected with the WASM start function, but wasm-ld does not support the start section at all! We’ll have use for an entry point later. The folded WAT:

(module $a.out
  (type (;0;) (func (param i32)))
  (import "env" "f" (func $f (type 0)))
  (func $example (type 0) (param i32)
    (local i32)
    (global.set $__stack_pointer
      (local.tee 1
        (i32.sub
          (global.get $__stack_pointer)
          (i32.const 16))))
    (i32.store offset=12
      (local.get 1)
      (local.get 0))
    (call $f
      (i32.add
        (local.get 1)
        (i32.const 12)))
    (global.set $__stack_pointer
      (i32.add
        (local.get 1)
        (i32.const 16))))
  (table (;0;) 1 1 funcref)
  (memory (;0;) 2)
  (global $__stack_pointer (mut i32) (i32.const 66560))
  (export "memory" (memory 0))
  (export "example" (func $example)))

There’s a lot to unfold:

As mentioned earlier, address zero is valid as far as the WASM runtime is concerned, though dereferences are still undefined in C. This makes it more difficult to catch bugs. Given a null pointer this function would most likely read a zero at address zero and the program keeps running:

int get(int *p)
{
    return *p;
}

In WAT:

(func $get (type 0) (param i32) (result i32)
  (i32.load
    (local.get 0)))

Since the “hardware” won’t fault for us, ask Clang to do it instead:

$ clang ... -fsanitize=undefined -fsanitize-trap ...

Now in WAT:

(module
  (type (;0;) (func (param i32) (result i32)))
  (import "env" "__linear_memory" (memory (;0;) 0))
  (func $get (type 0) (param i32) (result i32)
    (block  ;; label = @1
      (block  ;; label = @2
        (br_if 0 (;@2;)
          (i32.eqz
            (local.get 0)))
        (br_if 1 (;@1;)
          (i32.eqz
            (i32.and
              (local.get 0)
              (i32.const 3)))))
      (unreachable))
    (i32.load
      (local.get 0))))

Given a null pointer, get executes the unreachable instruction, causing the runtime to trap. In practice this is unrecoverable. Consider: nothing will restore __stack_pointer, and so the stack will “leak” the existing frames. (This can be worked around by exporting __stack_pointer and __stack_high via the --export linker flag, then restoring the stack pointer in the runtime after traps.)

WASM was extended with bulk memory operations, and so there are single instructions for memset and memmove, which Clang maps onto the built-ins:

void clear(void *buf, long len)
{
    __builtin_memset(buf, 0, len);
}

(Below LLVM 20 you’ll need -mbulk-memory.) In WAT we see this as memory.fill:

(module
  (type (;0;) (func (param i32 i32)))
  (import "env" "__linear_memory" (memory (;0;) 0))
  (func $clear (type 0) (param i32 i32)
    (block  ;; label = @1
      (br_if 0 (;@1;)
        (i32.eqz
          (local.get 1)))
      (memory.fill
        (local.get 0)
        (i32.const 0)
        (local.get 1)))))

That’s great! I wish this worked so well outside of WASM. It’s one reason w64devkit has -lmemory, after all. Similarly __builtin_trap() maps onto the unreachable instruction, so we can reliably generate those as well.
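A minimal sketch combining the two built-ins: an ASSERT macro (my own name, not part of any library) on top of __builtin_trap, guarding a copy through __builtin_memmove. With bulk memory enabled I would expect the copy to become a single memory.copy instruction, though I have not shown that WAT here, so treat it as an assumption.

```c
// Trap if the condition is false. On WASM the trap should be the
// unreachable instruction; elsewhere it aborts the program.
#define ASSERT(c) while (!(c)) __builtin_trap()

// Copy len bytes, which may overlap, trapping on a negative length.
void copy(void *dst, void *src, long len)
{
    ASSERT(len >= 0);
    __builtin_memmove(dst, src, len);
}
```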

What about structures? They’re passed by address. Parameter structures go on the stack, then their addresses are passed. To return a structure, a function accepts an implicit out parameter in which to write the return. This isn’t unusual, except that it’s challenging to manage across module boundaries, i.e. in imports and exports, because caller and callee are in different address spaces. It’s especially tricky to return a structure from an export, as the caller must somehow allocate space in the callee’s address space for the result. The multi-value extension solves this, but using it in C involves an ABI change, which is still experimental.
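One way around the export problem, sketched here under my own assumptions: make the out parameter explicit, and export a second function that hands the host an address inside the module's own memory. Pair, get_scratch, minmax, and the EXPORT macro are all hypothetical names. The macro expands to nothing off-WASM so the same source builds on a desktop.

```c
#if defined(__wasm__)
#  define EXPORT(s) __attribute((export_name(s)))
#else
#  define EXPORT(s)
#endif

typedef struct {
    int lo;
    int hi;
} Pair;  // hypothetical result structure

static Pair scratch;  // lives in the module's linear memory

// The host calls this first to learn an address it may pass back in.
EXPORT("get_scratch")
Pair *get_scratch(void)
{
    return &scratch;
}

// Explicit out parameter instead of a structure return.
EXPORT("minmax")
void minmax(int a, int b, Pair *out)
{
    out->lo = a < b ? a : b;
    out->hi = a < b ? b : a;
}
```

After the call, the host would read two i32 values from the exported memory at the address it got from get_scratch.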

Allocating memory

WASM programs can grow linear memory using an sbrk-like memory.grow instruction. It operates in quantities of pages (64kB), and returns the old memory size. Because memory starts at zero, the old memory size is also the base address of the new allocation. Clang provides access to this instruction via an undocumented built-in.

unsigned long __builtin_wasm_memory_grow(int, unsigned long delta);

The first parameter selects a memory because someday there might be more than one. From this built-in we can define sbrk:

void *sbrk(long size)
{
    unsigned long npages = (size + 0xffffu) >> 16;  // round up
    unsigned long old    = __builtin_wasm_memory_grow(0, npages);
    if (old == -1ul) {
        return 0;
    }
    return (void *)(old << 16);
}

To which Clang compiles (note the memory.grow):

(func $sbrk (param i32) (result i32)
  (select
    (i32.const 0)
    (i32.shl
      (local.tee 0
        (memory.grow
          (i32.shr_u
            (i32.add
              (local.get 0)
              (i32.const 65535))
            (i32.const 16))))
      (i32.const 16))
    (i32.eq
      (local.get 0)
      (i32.const -1))))

Now we can properly allocate memory for arenas or other allocators:

typedef struct {
    char *beg;
    char *end;
} Arena;

Arena newarena(long cap)
{
    Arena a = {0};
    a.beg = sbrk(cap);
    if (a.beg) {
        a.end = a.beg + cap;
    }
    return a;
}
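To go with it, a minimal sketch of a bump allocator over that Arena. The alloc name and its signature are my assumptions for illustration, as is the uptr typedef built on the compiler-provided __UINTPTR_TYPE__. It assumes align is a power of two and returns zeroed memory, or null when the arena is exhausted.

```c
typedef struct {
    char *beg;
    char *end;
} Arena;

typedef __UINTPTR_TYPE__ uptr;  // pointer-sized unsigned integer

// Allocate size zeroed bytes at the given power-of-two alignment.
// Returns null if the request does not fit.
void *alloc(Arena *a, long size, long align)
{
    long pad   = (long)(-(uptr)a->beg & (uptr)(align - 1));
    long avail = a->end - a->beg - pad;
    if (size < 0 || size > avail) {
        return 0;  // out of memory
    }
    char *p = a->beg + pad;
    a->beg  = p + size;
    return __builtin_memset(p, 0, size);
}
```

Nothing here is specific to WASM, so it can be exercised over a static buffer on a desktop before it ever touches sbrk.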

Water Sort Game

Something you might not have expected: My water sort game imports no functions! It only exports three functions:

void      game_init(i32 seed);
DrawList *game_render(i32 width, i32 height, i32 mousex, i32 mousey);
void      game_update(i32 input, i32 mousex, i32 mousey, i64 now);

The game uses IMGUI-style rendering. The caller passes in the inputs, and the game returns a kind of display list telling it what to draw. In the SDL version these turn into SDL renderer calls. In the web version, these turn into canvas draws, and “mouse” inputs may be touch events. It plays and feels the same on both platforms. Simple!
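To make the shape concrete, here is a minimal sketch of such an interface. It is not the game's actual DrawList, whose layout I haven't shown. Rect, the fixed capacity of 16, and demo_render are all hypothetical. In a WASM build demo_render would carry an export_name attribute, and the host would walk the returned list in linear memory.

```c
typedef int i32;

typedef struct {
    i32 x, y, w, h;
    i32 color;  // 0xRRGGBB
} Rect;         // hypothetical draw command

typedef struct {
    i32  len;
    Rect rects[16];
} DrawList;     // hypothetical layout

static DrawList list;

static void push(i32 x, i32 y, i32 w, i32 h, i32 color)
{
    if (list.len < 16) {
        list.rects[list.len++] = (Rect){x, y, w, h, color};
    }
}

// Inputs in, display list out: no drawing happens inside the module.
DrawList *demo_render(i32 width, i32 height, i32 mousex, i32 mousey)
{
    list.len = 0;
    push(0, 0, width, height, 0x202020);  // background

    // A 32x32 "button" in the corner, highlighted under the mouse.
    i32 size = 32;
    i32 hot  = mousex >= 0 && mousex < size &&
               mousey >= 0 && mousey < size;
    push(0, 0, size, size, hot ? 0xffff00 : 0x808080);
    return &list;
}
```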

I didn’t realize it at the time, but building the SDL version first was critical to my productivity. Debugging WASM programs is really dang hard! WASM tooling has yet to catch up with 1995, let alone 2025. Source-level debugging is still experimental and impractical. Developing applications on the WASM platform is about as ergonomic as developing in MS-DOS. Instead, develop on a platform much better suited for it, then port your application to WASM after you’ve got the issues worked out. The less WASM-specific code you write, the better, even if it means writing more code overall. Treat it as you would some weird embedded target.

The game comes with 10,000 seeds. I generated ~200 million puzzles, sorted them by difficulty, and skimmed the top 10k most challenging. In the game they’re still sorted by increasing difficulty, so it gets harder as you make progress.

WASM System Interface

WASI allows us to get a little more hands on. Let’s start with a Hello World program. A WASI application exports a traditional _start entry point which returns nothing and takes no arguments. I’m also going to set up some basic typedefs:

typedef unsigned char       u8;
typedef   signed int        i32;
typedef   signed long long  i64;
typedef   signed long       iz;

void _start(void)
{
}

wasm-ld will automatically export this function, so we don’t need an export_name attribute. This program successfully does nothing:

$ clang --target=wasm32 -nostdlib -o hello.wasm hello.c
$ wazero run hello.wasm && echo ok
ok

To write output WASI defines fd_write():

typedef struct {
    u8 *buf;
    iz  len;
} IoVec;

#define WASI(s) __attribute((import_module("wasi_unstable"),import_name(s)))
WASI("fd_write")  i32  fd_write(i32, IoVec *, iz, iz *);

Technically those iz variables are supposed to be size_t, passed through WASM as i32, but this is a foreign function, I know the ABI, and so I can do as I please. I absolutely love that WASI barely uses null-terminated strings, not even for paths, which is a breath of fresh air, but they still marred the API with unsigned sizes. Which I choose to ignore.

This function is shaped like POSIX writev(), but does not move the file cursor. That requires fd_seek, or, probably better, tracking it yourself and using fd_pwrite, but that won’t matter for this program. I’ve also set it up for import, including a module name. The oldest, most stable version of WASI is called wasi_unstable. (I suppose it shouldn’t be surprising that finding information in this ecosystem is difficult.)

Every returning WASI function returns an errno value, with zero as success rather than some kind of in-band signaling. Hence the final out parameter unlike POSIX writev().

Armed with this function, let’s use it:

void _start(void)
{
    u8    msg[] = "hello world\n";
    IoVec iov   = {msg, sizeof(msg)-1};
    iz    len   = 0;
    fd_write(1, &iov, 1, &len);
}

Then:

$ clang --target=wasm32 -nostdlib -o hello.wasm hello.c
$ wazero run hello.wasm
hello world

Keep going and you’ll have something like printf before long. If the write fails, we should probably communicate the error with at least the exit status. Because _start doesn’t return a status, we need to exit, for which we have proc_exit. It doesn’t return, so no errno return value.

WASI("proc_exit") void proc_exit(i32);

void _start(void)
{
    // ...
    i32 err = fd_write(1, &iov, 1, &len);
    proc_exit(!!err);
}
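Since fd_write is shaped like writev, I assume it may also report fewer bytes than requested, so a robust program wants a retry loop. A minimal sketch follows, with writeall as a hypothetical helper name. Off-WASM it swaps in a stand-in fd_write (entirely my invention, for testing) that accepts at most 5 bytes per call and records them, which exercises the short-write path on a desktop.

```c
typedef unsigned char u8;
typedef   signed int  i32;
typedef   signed long iz;

typedef struct {
    u8 *buf;
    iz  len;
} IoVec;

#if defined(__wasm__)
#define WASI(s) __attribute((import_module("wasi_unstable"),import_name(s)))
WASI("fd_write") i32 fd_write(i32, IoVec *, iz, iz *);
#else
// Hypothetical native stand-in: short writes into a capture buffer.
static u8 captured[64];
static iz ncaptured;

static i32 fd_write(i32 fd, IoVec *iov, iz niov, iz *len)
{
    (void)fd;
    (void)niov;
    iz n = iov->len < 5 ? iov->len : 5;
    if (ncaptured + n > (iz)sizeof(captured)) {
        return 1;  // any nonzero errno
    }
    for (iz i = 0; i < n; i++) {
        captured[ncaptured++] = iov->buf[i];
    }
    *len = n;
    return 0;
}
#endif

// Write all of buf, retrying after short writes. Returns zero on
// success, otherwise the errno from fd_write, or -1 (not a WASI
// value) if a call succeeds without making progress.
i32 writeall(i32 fd, u8 *buf, iz len)
{
    while (len > 0) {
        IoVec iov = {buf, len};
        iz    n   = 0;
        i32   err = fd_write(fd, &iov, 1, &n);
        if (err) {
            return err;
        }
        if (n <= 0) {
            return -1;
        }
        buf += n;
        len -= n;
    }
    return 0;
}
```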

To get the command line arguments, call args_sizes_get to get the size, allocate some memory, then args_get to read the arguments. Same goes for the environment with a similar pair of functions. The sizes do not include a null pointer terminator, which is sensible.

Now that you know how to find and use these functions, you don’t need me to go through each one. However, opening files is a special, complicated case:

WASI("path_open") i32 path_open(i32,i32,u8*,iz,i32,i64,i64,i32,i32*);

That’s 9 parameters — and I had thought Win32 CreateFileW was over the top. It’s even more complex than it looks. It works more like POSIX openat(), except there’s no current working directory and so no AT_FDCWD. Every file and directory is opened relative to another directory, and absolute paths are invalid. If there’s no AT_FDCWD, how does one open the first directory? That’s called a preopen and it’s core to the file system security mechanism of WASI.

The WASM runtime preopens zero or more directories before starting the program and assigns them the lowest numbered file descriptors starting at file descriptor 3 (after standard input, output, and error). A program intending to use path_open must first traverse the file descriptors, probing for preopens with fd_prestat_get and retrieving their path name with fd_prestat_dir_name. This name may or may not map back onto a real system path, and so this is a kind of virtual file system for the WASM module. The probe stops on the first error.
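The probe can be sketched as below. This assumes my reading of the old wasi.h: a prestat is a one-byte tag (zero for a directory) followed by the name length, and the two functions take the parameters shown. find_preopen, Prestat, and the 256-byte name limit are hypothetical. Off-WASM, stand-in functions (also my invention) pretend the runtime preopened "/" as descriptor 3 and "/tmp" as descriptor 4.

```c
typedef unsigned char u8;
typedef   signed int  i32;
typedef   signed long iz;

typedef struct {
    u8 tag;  // assumed: 0 means directory
    iz len;  // length of the preopen's name
} Prestat;

#if defined(__wasm__)
#define WASI(s) __attribute((import_module("wasi_unstable"),import_name(s)))
WASI("fd_prestat_get")      i32 fd_prestat_get(i32, Prestat *);
WASI("fd_prestat_dir_name") i32 fd_prestat_dir_name(i32, u8 *, iz);
#else
// Hypothetical native stand-ins for testing on a desktop.
static const char *fake[] = {"/", "/tmp"};

static i32 fd_prestat_get(i32 fd, Prestat *p)
{
    if (fd < 3 || fd >= 5) return 1;  // any nonzero errno
    iz n = 0;
    while (fake[fd-3][n]) n++;
    p->tag = 0;
    p->len = n;
    return 0;
}

static i32 fd_prestat_dir_name(i32 fd, u8 *buf, iz len)
{
    if (fd < 3 || fd >= 5) return 1;
    for (iz i = 0; i < len && fake[fd-3][i]; i++) {
        buf[i] = (u8)fake[fd-3][i];
    }
    return 0;
}
#endif

// Return the descriptor of the preopen with exactly this name, or -1.
i32 find_preopen(u8 *name, iz len)
{
    for (i32 fd = 3;; fd++) {
        Prestat st = {0};
        if (fd_prestat_get(fd, &st)) {
            return -1;  // the first error ends the probe
        }
        u8 buf[256];
        if (st.tag != 0 || st.len != len || st.len > (iz)sizeof(buf)) {
            continue;
        }
        if (fd_prestat_dir_name(fd, buf, st.len)) {
            continue;
        }
        iz i = 0;
        while (i < len && buf[i] == name[i]) i++;
        if (i == len) {
            return fd;
        }
    }
}
```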

To open an absolute path, it must find a matching preopen, then from it construct a path relative to that directory. This part I much dislike, as the module must contain complex path parsing functionality even in the simple case. Opening files is the most complex piece of the whole API.
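Here is a minimal sketch of the simple case, to give a sense of the path handling involved. Str, the S macro, and relative_to are hypothetical names. It only strips a matching preopen prefix on a component boundary. It makes no attempt at ".." or "." components, which a real implementation would need to consider.

```c
typedef unsigned char u8;
typedef   signed long iz;

typedef struct {
    u8 *buf;
    iz  len;
} Str;  // hypothetical counted string, no null terminator needed

#define S(s) (Str){(u8 *)s, sizeof(s)-1}

// If path lies under the preopen named dir, return the remainder of
// path relative to dir (empty for dir itself). Otherwise len is -1.
Str relative_to(Str dir, Str path)
{
    Str none = {0, -1};

    // Ignore trailing slashes on the preopen name, so "/" becomes "".
    while (dir.len > 0 && dir.buf[dir.len-1] == '/') {
        dir.len--;
    }

    if (path.len < dir.len) {
        return none;
    }
    for (iz i = 0; i < dir.len; i++) {
        if (dir.buf[i] != path.buf[i]) {
            return none;
        }
    }

    Str r = {path.buf + dir.len, path.len - dir.len};
    if (r.len > 0 && r.buf[0] != '/') {
        return none;  // e.g. "/usrlocal" is not under "/usr"
    }
    while (r.len > 0 && r.buf[0] == '/') {
        r.buf++;  // drop separators: relative paths have no leading slash
        r.len--;
    }
    return r;
}
```

The result would then be passed to path_open along with the preopen's file descriptor.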

I mentioned before that program data is below the Clang stack. With the stack growing down, this sounds like a bad idea. A stack overflow quietly clobbers your data, and is difficult to recognize. More sensible to put the stack at the bottom so that it overflows off the bottom of memory and causes a fast fault. Fortunately there’s a switch for that:

$ clang --target=wasm32 ... -Wl,--stack-first ...

This is probably what you want to use by default, and I don’t know why it’s not the default for the linker.

u-config

The above is in action in the u-config WASM port. You can download the WASM module, pkg-config.wasm, used in the web demo to run it in your favorite WASI-capable WASM runtime:

$ wazero run pkg-config.wasm --modversion pkg-config
0.33.3

Though there are no preopens, so it cannot read any files. The -mount option maps real file system paths to preopens. This mounts the entire root file system read-only (ro) as /.

$ wazero run -mount /::ro pkg-config.wasm --cflags sdl2
-I/usr/include/SDL2 -D_REENTRANT

I doubt this is useful for anything, but it was a vehicle for learning and trying WASM, and the results are pretty neat.


null program

Chris Wellons
