Original yEnc website
Variable (Best) Offset
February, 2002
yEnc v1 uses a fixed offset of 42. This is meant to move the byte values that tend to occur least often on average onto the critical values, so that fewer values need to be escaped. (Each time a critical value occurs it must be escaped, which adds an extra character to the output and increases the file size.)
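
As a rough illustration of the fixed-offset scheme, the sketch below adds 42 to every byte and escapes the critical results. It assumes the yEnc v1 escaping rule ("=" followed by the value plus 64) and treats NUL, LF, CR and the escape character "=" itself as critical; the graphs on this page only mark 0, 10 and 13. The function name yenc_encode_fixed is made up for this example, and line splitting and the =ybegin/=yend lines are left out.

```python
# Output values that must be escaped: NUL, LF, CR and the escape character '='.
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}

def yenc_encode_fixed(data: bytes, offset: int = 42) -> bytes:
    """Encode bytes with a fixed offset, escaping critical output values.

    Simplified sketch: no line wrapping, no header or trailer lines.
    """
    out = bytearray()
    for b in data:
        c = (b + offset) % 256
        if c in CRITICAL:
            out.append(0x3D)              # escape marker '='
            out.append((c + 64) % 256)    # escaped value
        else:
            out.append(c)
    return bytes(out)

# Input values 214, 224, 227 and 19 land on 0, 10, 13 and 61 after adding 42,
# so four of these five bytes need an extra character.
encoded = yenc_encode_fixed(bytes([0, 214, 224, 227, 19]))
print(encoded, len(encoded))
```

Five input bytes become nine output bytes here, which is the kind of growth the offset is supposed to keep rare.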

While a fixed offset makes the encoding simple to understand and implement, this technique can only hope to perform a little better than average at best, and no better at all at worst.

A variable offset would mean that no more escaping is done than is necessary for each file, always. This may not mean much improvement on average, but it all but eliminates the possibility of the worst-case scenario.
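
To make the worst case concrete, here is a small sketch (the helper name escapes is hypothetical, and the critical values are taken as 0, 10 and 13). A file consisting only of the byte value 214 is escaped on every single byte with offset 42, since 214 + 42 = 256 wraps around to 0. With a freely chosen offset the escape count can never exceed 3/256 of the file size, because each input byte lands on a critical value for exactly 3 of the 256 possible offsets.

```python
from collections import Counter

CRITICAL = (0, 10, 13)  # assumed critical output values

def escapes(data: bytes, offset: int) -> int:
    """Count how many bytes of data would need escaping with this offset."""
    counts = Counter(data)
    # Input value (c - offset) mod 256 is the one that maps onto critical value c.
    return sum(counts[(c - offset) % 256] for c in CRITICAL)

worst = bytes([214]) * 1000

fixed = escapes(worst, 42)                              # every byte maps to 0
best = min(escapes(worst, off) for off in range(256))   # some offset avoids all escapes
total = sum(escapes(worst, off) for off in range(256))  # each byte counted 3 times

print(fixed, best, total)
```

Since the escape counts over all 256 offsets add up to three times the file size, the smallest of them is at most 3/256 of the file, or about 1.2%.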

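A variable offset would be easy to implement: just add an "offset=xxx" parameter to the =ybegin line. Calculating the best offset is very easy too: count how often each byte value occurs in the file and pick the offset that maps the least frequent values onto the critical values.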
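
A minimal sketch of that calculation, under a few assumptions: the critical values are 0, 10 and 13, the function names are invented for this example, and the "offset=" keyword is only the proposal made here, not part of yEnc v1. The sketch tries all 256 offsets and keeps the one with the fewest escapes.

```python
CRITICAL = (0, 10, 13)  # assumed critical output values

def histogram(data: bytes) -> list:
    """Count how often each byte value (0...255) occurs."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return counts

def best_offset(counts, critical=CRITICAL) -> int:
    """Return the offset (0...255) that causes the fewest escapes."""
    def cost(offset):
        # Input value (c - offset) mod 256 ends up on critical value c.
        return sum(counts[(c - offset) % 256] for c in critical)
    return min(range(256), key=cost)

def ybegin_line(name: str, size: int, offset: int, line: int = 128) -> str:
    """Build a =ybegin line carrying the proposed (hypothetical) offset parameter."""
    return f"=ybegin line={line} size={size} offset={offset} name={name}"

# Example input: every byte value once, except 5, 15 and 18, which never occur.
data = bytes(b for b in range(256) if b not in (5, 15, 18))
offset = best_offset(histogram(data))
print(ybegin_line("a.bin", len(data), offset))
```

For this input the only offset that needs no escaping at all is 251 (= -5), which maps the three missing values onto 0, 10 and 13. A decoder would subtract the offset read from the header instead of the fixed 42.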

Here are some examples of file scans used to calculate the best offset. The bar graph displays how often each byte value (0...255) occurs. Note: the red line is the value 0, and the two purple lines are the values 10 and 13. The light green and two darker green lines are the values calculated to be the best ones to map onto the aforementioned critical values. Note: the scale is different in every picture!

JPEG
32MB of JPEGs have been scanned.

The value that occurs the most is 0, at 1.4%. The value that occurs the least is 47. The best offset is -34 (= +222).
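
The two notations for each offset on this page describe the same thing, because offsets work modulo 256. A quick check (the helper name as_unsigned is made up for this example):

```python
def as_unsigned(offset: int) -> int:
    """Map a signed offset onto the equivalent value in 0...255."""
    return offset % 256

# The four best offsets reported on this page, in signed form.
pairs = {off: as_unsigned(off) for off in (-34, -6, -193, -47)}
print(pairs)

# With the JPEG offset, the least frequent value 47 lands on critical value 13.
mapped = (47 + as_unsigned(-34)) % 256
print(mapped)
```

This gives 222, 250, 63 and 209 for the four offsets, matching the values in parentheses.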
CAB
52MB of CABs have been scanned.

Note that all the values occur almost equally often, a side effect of compression.

The value that occurs the most is 0, at 0.5%. The value that occurs the least is 16. The best offset is -6 (= +250).
ALL
165MB of various files have been scanned.

Note the small peak in the middle - this is due to the (only) 11MB of WAVs that were part of the files being scanned!

The value that occurs the most is 0, at 5.9%. The value that occurs the least is 33. The best offset is -193 (= +63).
WAV
11MB of WAVs have been scanned.

Note the characteristic shape of the graph, which is typical for WAV files.

The value that occurs the most is 128, at 6.5%. The value that occurs the least is 233. The best offset is -47 (= +209).



lordsnow website Copyright (C) 1999-2002 Bastiaan Ruiter