wolfwings: (Default)
[personal profile] wolfwings
...but readable code is so much more usable.

I've always been a fan of data-compression and encryption technologies, or as I think of them and numerous related areas, data-transformation systems since (for Lossless compression and encryption, the only thing I've had much interest in) that's all they really are is transformations.

Lately I've been most interested in a surprisingly little-researched area, namely the compression of hundreds or thousands of very small items, typically each less than a hundred bytes. This has obvious applications in on-line gaming, and for example to compress values in a large database of keys in the case of MU*'s such as Fuzzball.

Far too much research has been written on absolute compression ratios, but a LOT of those techniques actually decrease the compression-ratio on very small files because of the 'warm up' time before the techniques really shine. Very much akin to optimizing a vehicle for stop-and-go versus highway driving.

I've been stepping through a couple of various compression technologies, namely Dynamic Vitter Huffman compression lately, and ran across something I found useful and interesting. Namely a Bijective version Huffman compression. This has a very unique trait, namely that any bitstream is valid. Meaning this is very useful for things such as gaming traffic because you can always safely call the decompress routine on a packet of data regardless of content and it will transform without error.

And this led me to downloading the code, and trying to understand it. I... don't believe I've ever seen such horribly-mangled code before in my life, not even in horribly Pascal/x86 Assembler mashups from my demo-coding tutorial-reading days.

And this is leading me to rewrite the entire block of code, but while I was doing that I realized... I really don't know of any good repository to 'publish' example source code for things like Huffman Compression, or Range Coding. Google can find examples, but there's no single site to post Wiki-editable or even just commentable code chunks that can be used directly but are meant to be illustrative and readable more than fast or efficient.

Anyone think setting up such a site would actually be useful? Something half-way between Wikipedia, MathWorld, and SourceForge meant more for discussion and community editing of a single file of self-contained source code, but very heavilly biased towards functional source code that's easy to port (for example favoring straight C over C++ or C#, and Assembly being unwelcome) and not just commentary and mathematical equations.

(no subject)

Date: 2006-11-30 02:46 am (UTC)
From: [identity profile] alexismckee.livejournal.com
I for one would be positive of the idea of a code wiki, but don't you think it's been done before? ... Wikipedia, Mathworld and Sourceforge neither do this well, I agree :(

Oh, and in the bijective Huffman; how can decoding any bitstream without an error be useful? Surely you want an error if the thing is incorrect, no?

The usefulness is somewhat subtle.

Date: 2006-11-30 07:42 am (UTC)
From: [identity profile] wolfwings.livejournal.com
Basically if you know any bitstream can be considered valid to pass to the decompressor, you also know that any bitstream is safe to pass as well. It's also more useful if you're going to use encryption after the compression (standard practice as far as I know), because it prevents scanning the compressed-data stream for validity as a way to test encryption keys.

And I hadn't seen anyplace that did what I described, specifically inviting code that was readable and easy to port, instead of already-optimized, or in a language that can be hard to port from. So I made this post both to mention such an idea, and see if anyone knew of such a place. :-)

Re: The usefulness is somewhat subtle.

Date: 2006-11-30 07:49 am (UTC)
From: [identity profile] alexismckee.livejournal.com
Hmmm, it somehow seems to me that the only thing you really do then is to omit an integrity check, but I'm sure I'm way off track... Mind sharing the links/docs on it?

Re: The usefulness is somewhat subtle.

Date: 2006-11-30 09:03 am (UTC)
From: [identity profile] wolfwings.livejournal.com
This is a good FAQ on Bijective Compression in relation to cryptgraphy.

The most commonly referenced Bijective Compression site is really hard to read and understand unfortunately.

I look at using bijective compression as convenient, because it doesn't need to know if a piece of data is clean or tainted before application of bijective compression, it's always a safe transformation to apply. Sure, it's a bit of coder lazyness I suppose, but it is convenient to have some types of compression that you don't have to worry about crafted or just badly mangled datastreams breaking the decompressor with mathematical certainty

Re: The usefulness is somewhat subtle.

Date: 2006-11-30 09:19 am (UTC)
From: [identity profile] alexismckee.livejournal.com
Well, as I can see their only real argument for the use of bijective compression is that it eliminates a particularly annoying kind of brute force cracking... ?

(no subject)

Date: 2006-11-30 03:39 am (UTC)
From: [identity profile] qdot.livejournal.com
Waaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaay back in the day of 3+ years ago, we had Sweetcode, which now seems to only exist as a RSS remnant. Wasn't really a wiki-ish type place, but still had stuff like you described.

(no subject)

Date: 2006-11-30 06:01 am (UTC)
From: [identity profile] teddytiger.livejournal.com
Off topic but I love your icon, qdot. I have Neverhood, spent $50 to buy it fresh in the box offa Ebay.
From: [identity profile] wolfwings.livejournal.com
I'll rummage around and see if I can get the MediaWiki stuff set up somewhere then.

Another site...

Date: 2006-12-02 01:46 am (UTC)
From: [identity profile] wolfwings.livejournal.com
...started but left to go stagnant by the looks of things. Around the same time Sweetcode went bamf as well, apparently.

Style Credit