Journey to Poom


Doomed Assault

As in many gamedev stories, Poom should not have existed.

It came to be possible on Pico8 thanks to a (still unpublished) project, an Assault demake - a game I used to play at arcades as a kid (dual stick! excellent music! ultra low bass explosions!).

Early 2020, I was quite proud of my silky smooth rotozoom engine running entirely in memory, well below 50% cpu with enemy units.

@Eniko & @Electricgryphon had already paved the way, I went slightly further:

load #assault (requires a 0.1.12c version to run at full speed)

Engine is looking good, time to export to HTML and demo it. I run a couple of tests on my home computers and mobiles... 

It did not go well: performance was all over the place and it required a powerful PC to run at full speed.

Reporting the bug (?) to Zep (Joseph White, Pico8 author), it became apparent the game relied too much on binary operations and trashed the web player. The delicate balance of simulated API costs did not account for so many "low level" operations per frame.

Too many binary ops? Nah...

poke4(
  mem,
  bor(
      bor(
          bor(rotr(band(shl(m[bor(band(mx,0xffff),
              band(lshr(srcy,16),0x0.ffff))],shl(band(srcx,7),2)),0xf000),28),
          rotr(band(shl(m[bor(band(mx+mdx1,0xffff),
                band(lshr(srcy-ddy1,16),0x0.ffff))],shl(band(srcx+ddx1,7),2)),0xf000),24)),
          bor(rotr(band(shl(m[bor(band(mx+mdx2,0xffff),
                band(lshr(srcy-ddy2,16),0x0.ffff))],shl(band(srcx+ddx2,7),2)),0xf000),20),
          rotr(band(shl(m[bor(band(mx+mdx3,0xffff),
                band(lshr(srcy-ddy3,16),0x0.ffff))],shl(band(srcx+ddx3,7),2)),0xf000),16))
      ),
      bor(
      bor(rotr(band(shl(m[bor(band(mx+mdx4,0xffff),
        band(lshr(srcy-ddy4,16),0x0.ffff))],shl(band(srcx+ddx4,7),2)),0xf000),12),
      rotr(band(shl(m[bor(band(mx+mdx5,0xffff),
        band(lshr(srcy-ddy5,16),0x0.ffff))],shl(band(srcx+ddx5,7),2)),0xf000),8)),
      bor(rotr(band(shl(m[bor(band(mx+mdx6,0xffff),
        band(lshr(srcy-ddy6,16),0x0.ffff))],shl(band(srcx+ddx6,7),2)),0xf000),4),
      band(shl(m[bor(band(mx+mdx7,0xffff),
        band(lshr(srcy-ddy7,16),0x0.ffff))],shl(band(srcx+ddx7,7),2)),0xf000))
      )
  )
)

Long story short, Pico8 0.2 is out shortly after - binary operators and tline ("textured line") are a thing...

The new addition to the Pico8 API manual reads:

tline x0 y0 x1 y1 mx my [mdx mdy] [layers]
    Draw a textured line from (x0,y0) to (x1,y1), sampling colour values from the map.

Porting Assault to tline proved the huge potential of the function:

45% cpu - 1024x1024 pix map - rotating enemies - nuclear blast!

Half joking, I told Zep about the flood of Doom clones we would have in no time...

What if, say, I dig myself a bit more into binary space partitioning (BSP trees) & portals?

A sign that current project is going to have some competition...

Doom Engine? Nah!

The good thing about approaching a (very successful) game 30 years late is that documentation & tooling are top notch!

Driven by portability & extensibility, level geometry is now expressed using an open text format (UDMF), saving the need for tedious binary unpacking of WAD structure (to some extent...) and perfect for quick hacking.

May 9-10th: armed with Real-Time Collision Detection from Christer Ericson, ANTLR for UDMF parsing and a good pinch of Python I got my first BSP compiler up and a running Pico8 renderer:


each polygon has its own color - notice split on pillar
https://twitter.com/FSouchu/status/1259520453128990721?s=2

Sorry Assault, see you on the other side!

Next 2 weeks are spent digging into WAD structure (I think I know what a linedef is by now...) and going over Zdoom wiki a million times.

My custom compiler was quickly dropped in favor of the "official" zbsp compiler (nothing beats years of bug fixing!), Python code fully decodes binary WAD files, complex geometry is supported, including textured walls & floors:


https://twitter.com/FSouchu/status/1266474492890624011?s=20

Slow motion rendering (without textures), each color represents a convex sub-sector:


Quake To The Rescue

Since textures are in, the engine departed from "canon" front-to-back Doom rendering to back to front.

I really never thought front to back would work anyway - programming for Pico8 is more akin to targeting a very low end GPU.

As per Doom standard, perspective correct texturing & shading work for cheap as long as the world is made of flat walls & floors.

Fast depth shading (the farther you look, the darker it is) is done using standard palette swapping. Depth information indexes a gradient table, stored in Pico8 ram. Copying a table row with memcpy makes a full palette swap faster than regular pal() calls, saving precious cycles in the core wall & floor rasterization loop:

-- pal1: normalized integer depth
if(pal0!=pal1) memcpy(0x5f00,0x4300|pal1<<4,16) pal0=pal1
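The same idea can be sketched outside Pico8. The Python below is only an illustration under stated assumptions: the two addresses come from the snippet above, while the fake RAM, the 16 depth levels and the contents of the gradient rows are made up for the example.

```python
PALETTE_RAM = 0x5F00    # draw palette address, taken from the snippet above
GRADIENT_BASE = 0x4300  # gradient table address, taken from the snippet above


def build_memory(levels=16):
    """Return a fake 64 KB RAM holding `levels` palette rows of 16 bytes.

    Row contents are invented: row d maps colour c to max(c - d, 0).
    """
    ram = bytearray(0x10000)
    for depth in range(levels):
        for colour in range(16):
            ram[GRADIENT_BASE + depth * 16 + colour] = max(colour - depth, 0)
    return ram


def set_depth_palette(ram, depth, current):
    """Copy the 16-byte row for `depth` into palette RAM.

    The copy is skipped when the depth did not change, which mirrors
    the pal0/pal1 comparison in the original one-liner.
    """
    if depth != current:
        src = GRADIENT_BASE | (depth << 4)  # 16 bytes per row
        ram[PALETTE_RAM:PALETTE_RAM + 16] = ram[src:src + 16]
    return depth


ram = build_memory()
current = set_depth_palette(ram, 3, -1)
```

One block copy replaces 16 individual colour remaps, and the "did it change" check avoids even that copy while consecutive spans share the same depth.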

Back to front rendering means losing the natural "narrowing" of level scanning of standard Doom (see Black Book for an extensive explanation!) and the limited performance impact of large levels that comes with it.

That's where having read Michael Abrash's Black Book years ago helped.

I knew Quake had the same issue and solved it using a Potentially Visible Set (PVS). A PVS, generated at compile time, is the set of all potentially visible convex sub-sectors from any given sub-sector (read it twice, slowly!).

Of course, zbsp doesn't generate such information...

Some more Python code & maths (and cursing!), it works!!!


bold numbers are all sub-sectors visible from sub-sector 38

Surprisingly, there is little literature on how to generate a proper PVS, most notable sources I found:

Source Engine PVS - A Closer Look

Potentially Visible Sets explained 

PVS is encoded as a bitfield, stored as 32-bit chunks to keep memory usage under control (a much simpler scheme than the RLE encoding of the Quake PVS):

-- pvs (packed as a bit array)
unpack_array(function()
    -- visible sub-sector id (16 bits)
    local id=unpack_variant()
    -- pack as a bitfield
    pvs[id\32]=bor(pvs[id\32],0x0.0001<<(id&31))
end)
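For readers less used to Pico8 fixed-point numbers, here is a minimal Python sketch of the same packing, plus the matching visibility test. The function names are hypothetical, and plain integers stand in for the 0x0.0001 fixed-point bit used in the Lua code.

```python
def pack_pvs(visible_ids):
    """Pack sub-sector ids into 32-bit chunks, keyed by chunk index.

    Only chunks that contain at least one visible id are stored,
    which keeps sparse sets small.
    """
    chunks = {}
    for sid in visible_ids:
        index = sid // 32
        chunks[index] = chunks.get(index, 0) | (1 << (sid & 31))
    return chunks


def is_visible(chunks, sid):
    """Return True when sub-sector `sid` is flagged in the packed set."""
    return ((chunks.get(sid // 32, 0) >> (sid & 31)) & 1) == 1


pvs = pack_pvs([0, 31, 32, 70])
```

Testing a sub-sector then costs one table lookup and one bit test, whatever the size of the level.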

Virtual Sprites

Working some more on getting rotating sprites into game engine, it is obvious sprite real-estate is going to be a major pain point. For reference, a single soldier pose is eating almost half of the space available. And that's for a "small" sprite...


I knew the game would be multi-cart, that is, storage may not be my main concern and Pico8 has a large 2MB of Lua RAM to play with.

What if sprites could fit in memory, how to then best use built-in Pico8 sprite scaling capabilities (read, sspr)?

What if I had some kind of fast memory allocator? A least recently used (LRU) cache is a good design pattern for this case, simple enough if chunks are of fixed size.

What if I split sprites into little (but not too small) chunks, say 16x16 and get their actual sprite location from that "virtual memory" allocator?

July 15th, this is indeed a very workable approach - memcpy is fast enough to swap required sprite tiles on the fly. Sprites can be up to 16 tiles, i.e. up to 64 by 64 pixels, allowing large monsters like Cyberdemon at 1/2 of their original size.
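The allocator idea can be sketched in a few lines of Python. This is not the Poom code, only a minimal LRU cache with fixed-size slots: the class name, the slot count and the `copies` counter (which stands in for the memcpy of a 16x16 tile) are assumptions made for the example.

```python
from collections import OrderedDict


class TileCache:
    """Least recently used cache mapping tile ids to sprite sheet slots."""

    def __init__(self, slots):
        self.used = OrderedDict()      # tile id -> slot, oldest entry first
        self.free = list(range(slots))
        self.copies = 0                # number of simulated tile copies

    def get(self, tile_id):
        """Return the slot holding `tile_id`, loading the tile if needed."""
        if tile_id in self.used:
            # cache hit: mark the tile as most recently used
            self.used.move_to_end(tile_id)
            return self.used[tile_id]
        if self.free:
            slot = self.free.pop()
        else:
            # cache full: recycle the slot of the least recently used tile
            _, slot = self.used.popitem(last=False)
        self.used[tile_id] = slot
        self.copies += 1               # the real engine would memcpy here
        return slot


cache = TileCache(2)
slot_a = cache.get("a")
slot_b = cache.get("b")
```

Because every chunk has the same size, any free slot fits any tile, so there is no fragmentation to manage.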

Say hi to Cyb'!

Virtual Sprite Engine (tm!) integrated in game - monsters can be much more than a blurry pixelated mess!

Actors are registered in each sub-sector they are touching, based on their radius. Multiple actors on a given sub-sector are sorted using a basic insertion sort, assuming the number of "co-located" actors is usually low:

      -- all things in sub-sector
      for thing,_ in pairs(segs.things) do
        local x,y=thing[1],thing[2]
        local ax,az=m1*x+m3*y+m4,m9*x+m11*y+m12
        -- todo: take radius into account 
        if az>8 and az<854 and ax<az and -ax<az then
          -- default: insert at end of sorted array
          local w,thingi=128/az,#things+1
          -- basic insertion sort
          for i,otherthing in ipairs(things) do          
            if(otherthing[1]>w) thingi=i break
          end
          -- perspective scaling
          add(things,{w,thing,63.5+ax*w,63.5-(thing[3]+m8)*w},thingi)
        end
      end

Once sorted, sprites are rendered right after sector's polygons.

Note that registering an actor in multiple sub-sectors also solves the issue of overlapping sprites/polygons.

See how back to front rendering of convex sectors is erasing part of the Cacodaemon in the gif below. The sprite is rendered multiple times, and the last render (in the nearest sub-sector) fixes the image:

 

Registering actors per sub-sector (i.e. BSP leaves) is also used to speed up player/actor & actor/actor collision detection.

Summer time, at this point, my goal is clear: make the engine as easy as possible for an artist to work with, as I'll need a team to realize the vision.

On to gameplay! 

Decorate Love Letter

The DECORATE format is brilliant!

Each "thing" runs its own little state machine, can reference sprites and call game functions:

actor ZombieMan : Monster 3004
{
  Health 20
  Radius 20
  Height 56
  Speed 8
  States
  {
  Spawn:
    POSS A 10 A_Look;
    Loop
  See:
    POSS A 8
    POSS B 8 A_Chase;
    Loop
  Missile:
    POSS E 10 A_FaceTarget;
    POSS F 8 A_FireBullets(22.5, 0, 1, 9, "BulletPuff");
    POSS E 8
    Goto See
  Death:
    POSS H 5
    POSS I 5 // A_Scream
    POSS J 5 // A_NoBlocking
    POSS K 5
    POSS L 60
    Stop
  }
}

The Python compiler supports a limited set of features, but enough to support key Doom gameplay elements (states, animations, function calls with parameters).

The runtime part is simple enough to fit in less than 20 lines of code:

-- vm update
tick=function(self)
  while ticks!=-1 do
    -- wait (+reset random startup delay)
    if(ticks>0) ticks+=delay-1 delay=0 return true
    -- done, next step
    if(ticks==0) i+=1
::loop::
    local state=states[i]
    -- stop (or end of vm instructions)
    if(not state or state.jmp==-1) del_thing(self) return
    -- loop or goto
    if(state.jmp) self:jump_to(state.jmp) goto loop
    -- effective state
    self.state=state
    -- get ticks
    ticks=state[1]
    -- trigger function (if any)
    -- provide owner and self (eg. for weapons)
    if(state.fn) state.fn(self.owner or self,self)
  end
end
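To make the control flow easier to follow, here is a simplified Python sketch of such a state machine runner. It is not a port of the Lua above: the dictionary layout of a state (`ticks`, `fn`, `jmp`) and the `Actor` class are assumptions, and the random startup delay and the "hold forever" case are left out.

```python
class Actor:
    """Runs a list of states; each state lasts `ticks` game ticks."""

    def __init__(self, states):
        self.states = states
        self.i = 0            # index of the current state
        self.ticks = 0        # ticks left in the current state
        self.started = False
        self.alive = True

    def tick(self):
        """Advance one game tick; return False once the actor is removed."""
        if self.started:
            self.ticks -= 1
            if self.ticks > 0:
                return True   # still waiting in the current state
            self.i += 1       # state is over, move to the next one
        self.started = True
        while True:
            if self.i >= len(self.states):
                self.alive = False
                return False  # ran past the last instruction
            state = self.states[self.i]
            jmp = state.get("jmp")
            if jmp == -1:
                self.alive = False
                return False  # explicit Stop
            if jmp is not None:
                self.i = jmp  # Loop or Goto
                continue
            self.ticks = state["ticks"]
            fn = state.get("fn")
            if fn:
                fn(self)      # trigger the bound game function once
            if self.ticks > 0:
                return True
            self.i += 1       # zero-length state, fall through


calls = []
zombie = Actor([{"ticks": 2, "fn": lambda actor: calls.append("look")},
                {"ticks": 1},
                {"jmp": -1}])
results = [zombie.tick() for _ in range(4)]
```

Here the first state lasts two ticks and calls its function once on entry, the second lasts one tick, and the final Stop removes the actor on the fourth tick.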

Best of all, everything in game is described using the same syntax: weapons, bullets, items & monsters!

By August, I have a Python package that can be easily installed. The compiler supports many key Doom features (doors, platforms, monsters, infighting, multiple weapons, pick ups, difficulty levels...).

Main "compilation" pipeline steps:

  • Read WAD entries
  • Extract normal & pain palettes
  • Read actors (from DECORATE file)
    • Split actor sprites into unique tiles
    • Read properties
    • Decode state machine & function bindings
    • Decode sprite properties 
  • For all maps (from gameinfo file)
    • Convert texture into unique set of tiles (max. 128)
    • Read skybox image (if any)
    • Read level data
      • sectors, sides, vertices, linedefs, pvs, sub-sector & BSP nodes
      • specials (triggers)
      • active textures
      • things (e.g. monsters, weapons...)
  • Actors & map data are packed into multiple carts

The Right Match

End August, I am reaching out to Paranoid Cactus (of X-Zero fame and I am a total fan of his work!), let's wait and see...

Paranoid Cactus (Simon Hulsinga IRL) replies a couple of days after and seems to be interested - good news!

September 9th, Simon delivers a first test level:


wow (it became my favorite expression throughout the project)

Code is almost complete, we should be shipping by what, end September?

....

If we knew...

8192 Tokens Forever

It so happens that Simon is a multi-classed gamedev, mastering code, art, music & gameplay (yeah, life is unfair!).

Paranoid Cactus takes the lead on gameplay decisions, we both get into a routine of challenging current features, reworking the engine to support ever increasing details while keeping tokens in check. To name a few:

  • Dedicated title screen
  • Weapon wheel
  • Save player state between levels
  • Flying monster support
  • Transparent textures
  • Secret sectors
  • Skybox
  • Sound blocked by walls
  • Non-bullet weapons (e.g. hands)

Code goes into massive refactoring, always close to the danger zone, always finding new ways to squeeze our last idea in! 

Example token optimization technique, where item identifier, property name and unpacking function are declared as a large text block:

-- layout:
-- property mask
-- property class name
-- property unpack function
local properties_factory=split("0x0.0001,health,unpack_variant,0x0.0002,armor,unpack_variant,0x0.0004,amount,unpack_variant,0x0.0008,maxamount,unpack_variant,0x0.0010,icon,unpack_chr,0x0.0020,slot,mpeek,0x0.0040,ammouse,unpack_variant,0x0.0080,speed,unpack_variant,0x0.0100,damage,unpack_variant,0x0.0200,ammotype,unpack_ref,0x0.0800,mass,unpack_variant,0x0.1000,pickupsound,unpack_variant,0x0.2000,attacksound,unpack_variant,0x0.4000,hudcolor,unpack_variant,0x0.8000,deathsound,unpack_variant,0x1,meleerange,unpack_variant,0x2,maxtargetrange,unpack_variant,0x4,ammogive,unpack_variant,0x8,trailtype,unpack_ref,0x10,drag,unpack_fixed",",",1)
-- properties: property bitfield of current actor
for i=1,#properties_factory,3 do
    if properties_factory[i]&properties!=0 then
        -- unpack & assign value to actor         
        actor[properties_factory[i+1]]=_ENV[properties_factory[i+2]](actors)
    end
end
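The pattern itself is not Pico8 specific. Below is a small Python sketch of the same table-driven idea, with made-up property names, integer masks and reader functions (none of them taken from Poom): one flat comma separated string is split once, then walked three entries at a time.

```python
# flat layout: mask, property name, reader name (all hypothetical)
PROPERTIES = "1,health,read_int,2,armor,read_int,4,icon,read_chr".split(",")


def read_int(stream):
    """Read the next value as an integer."""
    return next(stream)


def read_chr(stream):
    """Read the next value as a single character."""
    return chr(next(stream))


# readers are looked up by name, like _ENV[...] in the Lua version
READERS = {"read_int": read_int, "read_chr": read_chr}


def unpack_actor(mask, stream):
    """Read only the properties whose bit is set in `mask`."""
    actor = {}
    for i in range(0, len(PROPERTIES), 3):
        bit, name, reader = int(PROPERTIES[i]), PROPERTIES[i + 1], PROPERTIES[i + 2]
        if mask & bit:
            actor[name] = READERS[reader](stream)
    return actor


actor = unpack_actor(0b101, iter([20, 65]))
```

On Pico8 the gain is in tokens: a single string literal costs far fewer tokens than the equivalent table of numbers, names and function references.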

Bumpy Zone

So far it looks like everything went smooth and nice... It did not!

Collisions

Collision went through major refactoring multiple times to make the world feel solid - one of the key points of Doom.

I was actually surprised to find that Doom used a different data model to handle collision (the well known BLOCKMAP), when the BSP would have been a perfect match (Black Book & Carmack notes confirm it was a missed opportunity).

Poom collision code generates a list of linedefs traversed along a ray, using the BSP to traverse the world in order.

One day you have a super solid collision routine, the next day you got that gif:

Root cause is that a BSP tree loses sector spatial relationships, e.g. 2 walls might be in 2 different parts of the tree, leaving a kind of "gap" when handling collision in sequence.

Solution was to treat each wall as a capsule (i.e. a ray with a radius), ensuring the player path cannot fall in between wall segments.
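A static version of the capsule test is easy to sketch in Python. This is a simplification and not the Poom routine (which walks linedefs along a ray): the function names are hypothetical and the check only answers "is this point within `radius` of that wall segment?".

```python
import math


def distance_to_segment(px, py, ax, ay, bx, by):
    """Distance from point (px, py) to the segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)  # degenerate wall: a single point
    # project the point on the wall line, clamped to the segment ends
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def hits_wall(px, py, radius, wall):
    """Capsule test: True when the point is closer than `radius` to the wall."""
    ax, ay, bx, by = wall
    return distance_to_segment(px, py, ax, ay, bx, by) < radius


wall = (0, 0, 10, 0)
blocked = hits_wall(13, 4, 5.5, wall)
```

Clamping to the segment ends is what rounds the wall tips: two walls sharing a vertex both report a hit around that vertex, so no gap is left between them whatever the order they are visited in.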

Black sectors

PVS calculation has also been quite tricky to get right in all cases. When failing, whole sectors would end up disappearing from screen (dubbed "black sectors" by me and Simon).

Left side: standing in red sector, zone circled yellow is clearly MIA! 

Getting back to Christer Ericson's reference, the correct PVS calculation algorithm ended up as:

from a given sub-sector (convex zone):
    iterate over all double sided linedef
        register connected sub-sector ("other sub-sector")
        # find all anti-portals
        for all double sided linedef from other sub-sector:
            if other linedef is front or straddling current linedef:
                register linedef pair as a portal
# clip & find visible geometry
while portal set is not empty:
    create clip region ("anti-penumbra")
    for all sides of destination portal:
        if side is back or straddling:
            clip segment
            if anything remains:
                register a new portal
                mark sub-sector visible

Compression

Compression was not supported until late in game development, I already had an LZW encoder for Assault that worked rather well.

Thing is, I soon realized that LZW decompression is a memory hog, certainly not compatible with a game already stretching available RAM...

Google to the rescue, I was sure the embedded crowd had something for me. 

A random post on some Arduino forum led me to this: https://www.excamera.com/sphinx/article-compression.html

Compression code in Python, decompression code small enough to fit token budget and best of all, fixed memory cap!

Compression ratio is about 50%, does a good enough job on picture and world data.

Thanks James Bowman!

Out of Memory

That roadblock was a big one - playing levels back to back would tip the game over the 2MB limit.

Below is the kind of chart I used to find out if I was battling a garbage collection bug or dependency cycles or some other bugs...


The game went through heavy refactoring at this point:

  • hot reload after death was removed (ensures clean memory slate)
  • most data structures were converted to arrays, code readability suffered but memory usage went back to the safe zone
-- 11KB
for i=1,100 do
    add(buf,{id=1,tex=2,sp=3,hp=4})
end
-- 7KB
for i=1,100 do
    add(buf,{1,2,3,4})
end

Artwork & Level Design

Poom's appeal would have been nothing without the right pixels - and we needed many many pixels...

Left to right: Original Doom Imp | automatic 50% scaling + palette conversion| hand fixed

The best part of September to December was spent redrawing the selected cast to reach the right level of quality.

What Went Right

Choices

Pico8 games are all about making choices, even with such a large game, not every idea could make it.

Fun & gameplay came mostly first, second "hey, it fits"! I trusted Simon when it came to game design decisions, still trying to lure him with new engine features like transparent textures!

Beta Test

Mid November, the game is ready for testing.

As obvious as it seems, once you have spent some months working on a game, having a pair of fresh eyes is invaluable.

Sending out invites to Tom Hall (of Doom design fame and resident Pico8 Discord member) and Henri Stadolnik (of Fuz fame), they quickly proposed a good number of quality of life tweaks & great insight into level progression.

The most notable addition was mouse support (quoting Tom "I want mouse support"), wiring "mouse lock" browser handler to Pico8 GPIO to communicate coordinates back to the game.

Demoing that to Zep certainly helped bring mouse lock as a native Pico8 feature (with an under-the-covers patch delivered in the wee hours of the night!) - Thanks!

Conclusion

Beyond the technique, the main takeaway is that such a large game would have been difficult to pull off without teamwork, kind words & contributions from the extended Pico8 community (hey @farbs, @nucleartide, @sam, @valeradhd...!).

Game was our biggest success so far totalling 10K+ downloads, 100K+ web sessions, hundreds of comments...

Thank you all for that (and thanks Id Software for such a timeless game)!

Did reading this give you some new gameplay ideas? Want to rework sprites or try to improve the engine?
Hesitate no more, a full blown SDK is there - the same tools we used to make the game! 
Poom SDK (support Discord: https://discord.gg/Bmc4nxjfuE)

