Dec. 21st, 2006

wolfwings: (city of villains)

Stuck up in San Jose because there's nowhere for me to sleep at my grandmother's this week, or possibly next week either. I'll find out this weekend what's up.

On the coding front, I'm almost done re-writing (since I can't really call it a conversion anymore, it's a full re-write from base math/concepts now) the bijective Arithmetic encoder in straight C.


I've realized one other nice trick that comes as a side-effect of bijective compression. The routines, when properly written, by their very definition do not require 'end of file' markers, or anything beyond X symbols for an X-symbol alphabet. That is, since both round trips have to restore the original input (decompress(compress(file)) hands back the file, and so does compress(decompress(file))), you can't have an explicit 'end of file' marker, and you can't tuck things like file-length or original file-name tags inside the resulting files either. They are truly self-contained, opaque, compressed blocks of raw data. This makes them great for encapsulating in other formats, though it also makes them incredibly fragile: if any bits get changed the entire morass goes splat.
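As a rough illustration (the function names, signatures, and buffer sizes here are made up for the example, not the actual routines), bijectivity just means both round trips must hand back exactly what went in:

#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interfaces, for illustration only; the real encoder's
 * names and signatures differ. Each returns the length written to out. */
size_t bij_compress  (const unsigned char *in, size_t len, unsigned char *out);
size_t bij_decompress(const unsigned char *in, size_t len, unsigned char *out);

void check_bijective(const unsigned char *buf, size_t len)
{
    unsigned char mid[65536], back[65536];
    size_t m, b;

    /* Treat buf as plain data: compress then decompress must restore it. */
    m = bij_compress(buf, len, mid);
    b = bij_decompress(mid, m, back);
    assert(b == len && memcmp(back, buf, len) == 0);

    /* Treat buf as compressed data: decompress then compress must restore it too. */
    m = bij_decompress(buf, len, mid);
    b = bij_compress(mid, m, back);
    assert(b == len && memcmp(back, buf, len) == 0);
}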

On a side-note, I've started looking into adding this compression to the Quake 3 networking protocol. This led me into looking up better methods of compressing floating-point numbers, and a rather straight-forward observation: IEEE floating-point numbers are not formatted in a way that compresses well, but the format is easy to do comparison operations on. Amusingly, some simple bit-shuffling makes the numbers a lot more compressible byte-wise. Long story short: glue the sign bit onto the top of the Mantissa, and deal with the Exponent separately since it's already an 8-bit number. I'm sure that's in some publication out there somewhere, but I only had reason to figure it out for myself tonight for the first time ever.
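Roughly, the shuffle looks like this for a 32-bit float in C (helper names are just for the sketch, not lifted from my actual code):

#include <stdint.h>
#include <string.h>

/* Split an IEEE-754 single into the 8-bit Exponent and a 24-bit field
 * with the sign bit glued onto the top of the 23-bit Mantissa. */
static void split_float(float f, uint8_t *exponent, uint32_t *sign_mantissa)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* raw IEEE-754 layout: S EEEEEEEE MMM... */

    *exponent      = (uint8_t)((bits >> 23) & 0xFFu);           /* 8-bit exponent  */
    *sign_mantissa = ((bits >> 31) << 23) | (bits & 0x7FFFFFu); /* sign atop mantissa */
}

static float join_float(uint8_t exponent, uint32_t sign_mantissa)
{
    uint32_t bits = ((sign_mantissa >> 23) << 31)   /* restore sign bit  */
                  | ((uint32_t)exponent << 23)      /* restore exponent  */
                  | (sign_mantissa & 0x7FFFFFu);    /* restore mantissa  */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

The exponent stream and the sign/mantissa stream then get fed to separate heaps, which is where the next bit comes in.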

The only complication will be the sheer quantity of encoder heaps I'll have to track if I want to support ongoing heaps (the core bit of math-fu in the arithmetic encoder) between server frames instead of only one-shots for each frame. Each heap takes up about 2k, and I think I'll ideally need four heaps tracked (floating-point Exponent, floating-point Sign/Mantissa, 32/16-bit Integers, and Text). So, about 8k per client connection per frame of sync delay... roughly 8k per client per 50ms of latency between them and the server, on average. Worst-case, that's about 10MB of memory if a fully-loaded 64-player Quake 3 server had every player at >999 ping; at that point I believe some kind of 'skip the delta, here's the full state' mode would be better. More realistically, I'll probably allocate a fixed pool of heaps ahead of time, and if no more can be allocated it'll force a client onto 'zero state' frames and flush all their heaps back into the pool. That just means I'll have to allocate one of the 'reserved' bits to flag a frame as 'heap based on delta-from frame' or 'heap based on zero', but that's pretty minor.
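A minimal sketch of what that fixed pool might look like (the pool size and names here are placeholders I made up, nothing final):

#include <stddef.h>

#define HEAP_SIZE     2048  /* each statistics heap is roughly 2k          */
#define HEAPS_PER_SET    4  /* Exponent, Sign/Mantissa, Integers, Text     */
#define POOL_SETS      256  /* placeholder pool size                       */

typedef struct {
    unsigned char data[HEAP_SIZE];
} heap_t;

typedef struct {
    heap_t heaps[HEAPS_PER_SET];
    int    in_use;
} heap_set_t;

static heap_set_t pool[POOL_SETS];

/* Grab a set of four heaps for one frame of delta state; returns NULL when
 * the pool is exhausted, which the caller treats as "force a zero-state
 * frame and flush this client's older heaps back into the pool". */
static heap_set_t *heap_set_alloc(void)
{
    for (size_t i = 0; i < POOL_SETS; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    return NULL;
}

static void heap_set_free(heap_set_t *set)
{
    set->in_use = 0;
}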

wolfwings: (Default)
So I decided to take the plunge into GooglePages, now that I'm done treble-verifying, proof-reading, and testing the code I've been working on for a week or so.

Easy2Port Code Repository, v0.1

The arithmetic/range coding code ends up being under 20k total, in four distinct modules, with a ported example compressor/decompressor included. None of the modules (one .c and one .h file each) know anything about the internal workings of the others, and are fully 'black boxed', though the statistical heap module does have a calling convention designed specifically for range coding. Everything passed the strictest syntax checks and other automated checking I could put it through, and I believe all the code is simplified enough that it should be easy to port to other languages.
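To give an idea of the black-boxing, here's a hypothetical sketch of the shape of the heap module's interface (these are not the actual names or parameters from the repository): the range coder only ever sees an opaque handle plus the cumulative-frequency calls it needs.

/* Hypothetical heap interface, for illustration only. */
typedef struct stat_heap stat_heap;   /* opaque: internals live in the .c file */

stat_heap *stat_heap_create(unsigned symbols);
void       stat_heap_destroy(stat_heap *h);

/* Calling convention aimed at range coding: map a symbol to its cumulative
 * low/high counts and the current total, or map a scaled value back to a
 * symbol, then bump that symbol's frequency. */
void       stat_heap_encode(stat_heap *h, unsigned symbol,
                            unsigned *low, unsigned *high, unsigned *total);
unsigned   stat_heap_decode(stat_heap *h, unsigned scaled_value,
                            unsigned *low, unsigned *high, unsigned *total);
void       stat_heap_update(stat_heap *h, unsigned symbol);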

Thoughts? Suggestions?

Edit: Now that I'm awake, I've been working with the GooglePages editor a bit more. My server's up, but I'm actually starting to like GP's interface a lot. It's certainly convenient. =o.O=
