oneKpaq Linux version

Dependencies

You may want to install libdispatch, or compression will be really really
slow. If there isn't a distro package for it, you can grab it
here, and then edit
your config.mk to set HAS_LIBDISPATCH to 1, and LIBDISPATCH_INC_DIR and
LIBDISPATCH_LIB_DIR to the right values.

In any case, libdispatch is not a hard requirement. But you'll really really
want it.

Compiling

$ make

Running

For ease of use (to compress a binary immediately into another executable
binary), you probably want to run the Python script instead of the C++ program:

$ ./onekpaq.py --help
usage: onekpaq.py [-h] [--onekpaq ONEKPAQ] [--nasm NASM] [--stub STUB]
                  {1,2,3,4} {1,2,3} input [input ...] output

positional arguments:
  {1,2,3,4}          Selects the compression and decompression mode: '1' is 'single block,
                     small decoder', '2' is 'multi-block, small decoder', '3' is 'single block,
                     fast decoder', '4' is 'multi-block, fast decoder'.
  {1,2,3}            Compression complexity to use. Higher values mean higher compression, but
                     also longer compression times.
  input              Input block(s)
  output             Output executable

optional arguments:
  -h, --help         show this help message and exit
  --onekpaq ONEKPAQ  The onekpaq_encode binary to use
  --nasm NASM        The nasm binary to use
  --stub STUB        The assembly decompressor stub to use

$ ./onekpaq.py 3 3 hello hello.okp && ./hello.okp
Hello world!

Currently, this program has to be executed in the root directory of this
repository.

Original README


               .--------. .--------.   .---.
               `-----.   |`------.  `.  \   \
                     |   |        |  |.--    \
                     |   |.------'   .`---    \
                     `---'`---------'      `---'

----------------------------------------------------------------
                     oneKpaq v1.0 by TS/TDA
----------------------------------------------------------------

Overview

oneKpaq is compressor/decompressor pair for 1k / 4k intros.
It does not include any of the dynamic loader parts or hash
loading - this is pure compressor for all platfroms who want or
need a proper compressor. (x86 platform that is)

Some of its algorithms are inspired from PAQ family of packers
thus the naming. Other big inspiration was the crinkler (from
what is publicly known of it) The embeddable decompressor is
written in x86 assembly and it is 128 bytes at smallest.

Currently only 32-bit version is fully finished. if you need
64-bit, contact me. (Also 16-bit version could be doable)

Design

I set out to re-make the data compression for my future 4k/1k
intros with following guidelines

Compression ratio as close to PAQ as possible with simplified
decompressor - Same as in Crinkler
Very small decompressor to be usable in 1K prods
Static context models with adaptive multi section support for
improved compression ratio
Decompression time < 10 seconds on a modern CPU

From these ideas, I set out to experiment and ended up on
following design (copycatting following ideas from others):

8-byte window + decoded bits as a context w. mask bits for
bytes resulting a total of 255 models to choose from
model weights in the scale of 1/2^n (total being 1)
PAQ1 count boosting
Logistic mixing
Hashless implementation, slow but uses very little memory
Two modes for compression speed: small and fast (i.e. 1K/4K)
Two modes for compression efficiency: single section or
adaptive multi-section support for different kinds of data
in same binary

Usage

The compressor code is written using modern-ish C++ compatible
with recent clang and gcc although with gcc you have to modify
build flags a bit. It also should compile on VS2015, although
I was lazy and did not generate .vsproj for it, ABI would be
different as well. For the assmebly I used the ubiquitos nasm.
For parallelization I used dispatch library. Although it is
optional, I strongly recommended using it.

For the simplicity, the supported operating modes have been
enumerated as follows:

Single section decoder. Slow decoder for single section data: Decompressor is 128 bytes
Multi section decode. Slow decoder for multi-section data: Decompressor is 150 bytes
Single section decoder. Fast decoder for single section data: Decompressor is 143 bytes
Multi section decode. Fast decoder for multi-section data: Decompressor is 165 bytes

For the compressor, three complexity modes for compressing has
been defined: Low, Medium and High. These will translate to
minutes, hours and a day(s) of compressor running time.
Unsuprisingly, this is something you want to cache, and is
in fact cached by default so once searched models will be used
again.

Look for the oneKpaq_main.cpp how to use the compressor and
AsmDecode.cpp how to use the decompressor. Typical freestanding
usage of the decompressor would be something like this:

	mov ebx,source_of_compressed_code
	push ebx
	mov edi,destination_in_bss
%include "onekpaq_decompressor32.asm"
	ret

Please mind and define ONEKPAQ_DECOMPRESSOR_SHIFT for the
correct value produced, or include setting this value in your
build process.

Obviously that is 12 bytes of extra setup code, which can be
optimized further, depending on the chosen binary layout

There is also testbench directory, where is a simple test of the
current command line encoder.

Quirks

The source pointer for the input stream is not in the beginning
of the source data, but in the certain offset of it. That's
why compressor provides 2 buffers for input. It is meant that
those are concatenated and source pointer is provided between.

Source data will be completely ruined by the decompressor, it
is used as a scratch memory in the ongoing process. For this
reason the input data must reside in R/W page area.

Destination data needs to reside in BSS (or some zero filled
memory block) and this must extend from -9 offset to +1 offset
Although these extra bytes will stay as zeros, they will be read
and written to in the process of decompression.

For the small size modes, there are theoretical inputs which
produce invalid bitstream. Although I have tried to trigger it,
I have yet to stumble on it. You might not be that lucky though,
I would appreciate input in this case.

The assembly code uses the usual tricks + a little bit of
fudging to get better results when compressed itself (i.e. shell
dropping) This and the ifdefs for different modes makes it look
a bit convoluted :) Some documentation is included in the asm
source though.

Comparison

Comparison is the though part. Amount of data used typically is
too short to have any meaningful statistical analysis. Other
than that I would need a lots of different representative files
to make some sort of proper comparison. Even then, because
making 1K/4K intro is a iterative process thus this process
biases the data for the current algorithm used - against any
other, it is almost impossible to say anything for certain.

Still, my feeling is that this compares well against Crinkler,
I won't pretend it would be better, but probably within 1-3%
of the compression ratio of what Crinkler can do. All the other
algorithms* do not even get close (Except PAQ8, but that is not
for 1K/4K). However, there is room for improvement, which I'll
be planning to do when I get more data points...

*) deflate, bzip2, lzw, lzma2

Thanks

Special thanks to Firehawk - Without his help, these projects
would have stayed at the drawing board. (Also thx for the logo)

temisu/oneKpaq