In short... I started by digging through a morass of useful, but very... bloated code.
It was originally a 928k 'library' to include into a program (yes, almost a megabyte!) that would have either a 512k, 288k, or 128k codepath to solve, with lots of conditional branches and other chaos. These are rough values; the actual code size could be reduced here and there via various means and simple optimization switches, but it usually ended up being quite large.
From experimentation, code refactoring, and splitting of code across symmetric boundaries, I've learned that I will be able to reduce the actual code down to under 16k of code+data, possibly under 8k, with no conditional branches except for the X/Y looping on the initial upconversion, and linear loops on the scanning, scaling, and downconversion operations. Even including initialization routines, the entire library will be under 32k in size, for any size of upscaling, instead of almost a megabyte depending on how many upscaling settings you included support for.
This code will use first-generation MMX math extensively, since the core algorithm is only meant to be used on newer machines: it IS a highly detailed algorithm compared to most other ones.
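For anyone who wants to picture the loop structure, here is a rough sketch in plain C. It is NOT the actual library code: the function names (upconvert, scale_q8, downconvert, clamp_u8), the 16-bit working buffer, and the 8.8 fixed-point gain are all made up for illustration, and the real scaling math is far more involved. It only shows the shape described above: one X/Y loop on the way in, flat linear loops after that, and a clamp done with masks instead of if-statements. MMX's packed saturating instructions (PACKUSWB, for example) do that kind of clamp in hardware, several values at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a working value to 0..255 with masks, no if-statements. */
static uint8_t clamp_u8(int32_t v)
{
    int32_t below = -(int32_t)(v < 0);    /* all ones if v < 0, else 0   */
    int32_t above = -(int32_t)(v > 255);  /* all ones if v > 255, else 0 */
    v &= ~below;                          /* negative values become 0    */
    v = (v & ~above) | (255 & above);     /* oversized values become 255 */
    return (uint8_t)v;
}

/* Upconversion: the only place with an X/Y loop. Widens 8-bit input
 * (rows are 'stride' bytes apart) into a packed 16-bit working buffer. */
static void upconvert(const uint8_t *src, size_t w, size_t h, size_t stride,
                      int16_t *dst)
{
    for (size_t y = 0; y < h; ++y)
        for (size_t x = 0; x < w; ++x)
            dst[y * w + x] = (int16_t)src[y * stride + x];
}

/* Scaling stand-in: one flat pass over the buffer. A real scaler does much
 * more; this just applies an assumed 8.8 fixed-point gain (256 = 1.0). */
static void scale_q8(int16_t *buf, size_t n, int32_t gain_q8)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] = (int16_t)((int32_t)buf[i] * gain_q8 / 256);
}

/* Downconversion: another flat pass, clamping back to 8 bits. */
static void downconvert(const int16_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = clamp_u8(src[i]);
}
```

Once the data is in the working buffer, nothing cares about rows or columns any more, so every later pass is a single counted loop, which is exactly the sort of thing that maps well onto packed MMX operations.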
And that's something of a summary of what I've been managing to pull off, without the technical junk that seemed to confuse other people so badly. :-)