CRC32 vs MD5 vs ...
February, 2002
To detect whether a file has been corrupted during transmission, the usual approach is to calculate a digital "signature". This is done at the poster's end, and the signature is transmitted along with the file to the receiver. The receiver also calculates the file's digital signature and checks whether it is the same as the one that the poster supplied. If they match, then the file has come through uncorrupted.

What is CRC32?
CRC stands for Cyclic Redundancy Check. The 32 comes from the fact that it calculates a 32-bit checksum. CRC32 is an algorithm for calculating a compact identifier for a file. It's used in programs like PKZIP to identify files and make sure that they are intact. The identifier is not truly unique: there is roughly a 1 in 2^32 chance of two different files getting the same CRC32 checksum and being mistaken for the same file.
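
As a rough illustration of the check described above, here is a minimal Python sketch that computes a file's CRC32 in chunks and compares it with the value the poster supplied. It assumes the CRC-32 variant implemented by Python's standard zlib module (the same one used by ZIP tools); the function names and the 64 KB chunk size are illustrative choices only.

```python
import zlib

def crc32_of_file(path, chunk_size=64 * 1024):
    """Return the CRC32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as f:
        # Feed the file through zlib.crc32 one chunk at a time so that
        # large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

def matches_posted_crc(path, posted_hex):
    """Compare the locally computed CRC32 with the poster's hex string."""
    return crc32_of_file(path) == int(posted_hex, 16)
```

A receiver would call matches_posted_crc("download.bin", "cbf43926") with whatever file name and hex value apply; a False result means the file was damaged or is a different file.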

CRCs have the interesting property that they will detect ALL single burst error events of their size or less, regardless of location (for CRC32, any burst of up to 32 bits). MD5 digests do not have that property. Unfortunately for CRCs, it is a trivial matter to generate a data sequence, or modify an existing one, to produce a specific CRC. So a CRC should not be relied on where there may be willful file alteration.
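
The burst-error property can be tried out with a small experiment. The sketch below is only a spot check, not a proof: it flips random bursts of 1 to 32 bits in a message and confirms that the CRC32 changes every time. It assumes zlib's CRC-32 and numbers bits in the order that variant processes them (least significant bit of each byte first); the helper names are made up for this example, and the data must be at least 4 bytes long.

```python
import random
import zlib

def apply_burst(data, start_bit, pattern):
    """XOR `pattern` into `data`, with the pattern's lowest bit at `start_bit`.

    Bits are numbered LSB-first within each byte, matching the order in
    which zlib's CRC-32 consumes them.
    """
    value = int.from_bytes(data, "little") ^ (pattern << start_bit)
    return value.to_bytes(len(data), "little")

def bursts_always_detected(data, trials=1000, seed=0):
    """Return True if every random burst of 1..32 bits changed the CRC32."""
    rng = random.Random(seed)
    total_bits = len(data) * 8
    original = zlib.crc32(data)
    for _ in range(trials):
        burst_len = rng.randint(1, 32)
        # A burst of length n flips its first and last bit; the bits in
        # between may or may not be flipped.
        pattern = rng.getrandbits(burst_len) | 1 | (1 << (burst_len - 1))
        start = rng.randint(0, total_bits - burst_len)
        if zlib.crc32(apply_burst(data, start, pattern)) == original:
            return False
    return True
```

Calling bursts_always_detected(b"some test data" * 10) should return True for any seed, since no burst of 32 bits or fewer can leave a CRC32 unchanged.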
What is MD5?
Like CRCs, MD5 produces a resulting value (for MD5 it is called a digest) from a stream of bits. An MD5 digest is much closer to unique than a CRC checksum: it is 128 bits long, so there are 2^128 possible values. However, it lacks the CRC's guaranteed burst-error detection, it's a lot more complex, and it comes at a considerable cost in speed. MD5 is intended more for use with security.
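
For comparison, a minimal sketch of computing an MD5 digest with Python's standard hashlib module. The function name and chunk size are illustrative assumptions; the result is a 32-character hex string representing the 128-bit digest.

```python
import hashlib

def md5_of_file(path, chunk_size=64 * 1024):
    """Return the MD5 digest of a file as a 32-character hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Update the digest chunk by chunk so large files are not
        # loaded into memory all at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The receiver compares the returned string with the digest the poster supplied, exactly as with a CRC32 value.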

A free-to-use implementation of MD5 can be found here: ftp.funet.fi/pub/crypt/hash/mds/md5/

A positive side effect of having CRC32 checksums or MD5 digests is that the poster can put them in the Subject: line, and the downloader can use them to check whether they already have the file. (Alternative: use the References: header?)
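
A downloader-side check along these lines might look like the sketch below. The "crc32=xxxxxxxx" tag in the subject is purely a hypothetical format invented for this example, not something defined by yEnc or any newsreader, and the function names are likewise made up.

```python
import re

# Hypothetical subject tag, e.g. "holiday.jpg (1/1) crc32=cbf43926".
CRC_TAG = re.compile(r"crc32=([0-9a-fA-F]{8})")

def crc_from_subject(subject):
    """Extract the CRC32 from a subject line, or None if no tag is present."""
    match = CRC_TAG.search(subject)
    return int(match.group(1), 16) if match else None

def already_have(subject, local_crcs):
    """Return True if the subject's CRC32 is in the set of local file CRCs."""
    crc = crc_from_subject(subject)
    return crc is not None and crc in local_crcs
```

Here local_crcs would be a set of CRC32 values computed from the files already on disk. A match only suggests that the file is already present, since two different files can share a checksum.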

The good thing about CRC32 is that it's a widely used standard. MD5 seems like a good alternative, except that it needs more computational power. Which leads me to introduce "Tiger - A Fast Hash Function": it's free of restrictions, patent-free, about 2 to 3 times faster than MD5, and has a 192-bit (24-byte) output (128-bit and 160-bit outputs are also options). The white paper and the C and Java source can be downloaded from here: www.cs.technion.ac.il/~biham/Reports/Tiger



lordsnow website Copyright (C) 1999-2002 Bastiaan Ruiter