CRC32 vs MD5 vs ...
To detect whether a file has been corrupted during transmission, the usual approach is
to calculate a digital "signature". This is done at the poster's end, and the
signature is transmitted along with the file to the receiver. The receiver also calculates
the file's digital signature and checks that it is the same as the one the poster
supplied. If they match, the file has almost certainly come through uncorrupted.
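
The exchange described above can be sketched in a few lines of Python. This is only an
illustration: it uses CRC32 from the standard zlib module as the "signature", and the
helper names crc32_hex and arrived_intact are made up for this example.

```python
import zlib

def crc32_hex(data: bytes) -> str:
    """Return the CRC32 of data as 8 uppercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08X")

def arrived_intact(received: bytes, checksum: str) -> bool:
    """Receiver's end: recompute the checksum and compare with the poster's."""
    return crc32_hex(received) == checksum

# Poster's end: compute the checksum and send it along with the file.
original = b"example file contents"
sent_checksum = crc32_hex(original)

# Simulated transmission damage: a single byte has changed.
corrupted = b"example file c0ntents"

print(arrived_intact(original, sent_checksum))   # intact copy
print(arrived_intact(corrupted, sent_checksum))  # damaged copy
```

The same pattern works with any checksum or digest; only crc32_hex would change.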
What is CRC32?
CRC stands for Cyclic Redundancy Check. The 32 comes from the fact that it calculates a
32-bit checksum. CRC32 is an algorithm for calculating a short fingerprint of a file. It's
used in programs like PKZIP to identify files and make sure that they are unaltered. The
checksum is not truly unique: two different files have roughly a 1 in 2^32 chance of
getting the same CRC32 checksum and being mistaken for the same file.
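
For readers who want to see what the calculation looks like, here is a minimal bit-by-bit
sketch of the CRC32 variant used by PKZIP and zlib. It is written for clarity rather than
speed (real implementations use lookup tables), and the function name crc32_bitwise is
just a label for this example.

```python
import zlib

POLY = 0xEDB88320  # bit-reflected form of the CRC-32 generator polynomial

def crc32_bitwise(data: bytes) -> int:
    """Compute CRC32 one bit at a time, least significant bit first."""
    crc = 0xFFFFFFFF                  # initial register value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF           # final inversion

# The conventional test string for CRC implementations.
check = crc32_bitwise(b"123456789")
print(format(check, "08X"))                 # CBF43926
print(check == zlib.crc32(b"123456789"))    # agrees with zlib
```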
CRCs have the interesting property that they will detect ALL single burst errors no longer
than the CRC's own width (32 bits for CRC32), regardless of location. MD5 digests do not
have that property. Unfortunately for CRCs, it is a trivial matter to generate a data
sequence, or modify an existing one, to produce a specific CRC. So where there may be
willful file alteration, a CRC alone cannot be trusted.
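
The burst-error property can be checked empirically. The sketch below damages up to four
consecutive bytes (a burst of at most 32 bits) at random positions and counts how many
times the CRC32 fails to change. The helper names, trial count and seed are arbitrary
choices for this example, and a random test like this illustrates the guarantee rather
than proving it.

```python
import random
import zlib

def apply_burst(data: bytes, start: int, pattern: bytes) -> bytes:
    """XOR pattern into data starting at byte offset start."""
    out = bytearray(data)
    for i, p in enumerate(pattern):
        out[start + i] ^= p
    return bytes(out)

def undetected_bursts(data: bytes, trials: int = 2000, seed: int = 1) -> int:
    """Count random bursts of at most 32 bits that leave the CRC32 unchanged."""
    rng = random.Random(seed)
    good = zlib.crc32(data)
    missed = 0
    for _ in range(trials):
        pattern = bytes(rng.randrange(256) for _ in range(4))
        if not any(pattern):
            continue  # an all-zero pattern changes nothing
        start = rng.randrange(len(data) - 3)
        if zlib.crc32(apply_burst(data, start, pattern)) == good:
            missed += 1
    return missed

message = bytes(range(256)) * 4
print(undetected_bursts(message))  # 0
```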
What is MD5?
Like CRCs, MD5 produces a resulting value (for MD5 it is called a digest) from a
stream of bits. An MD5 digest is 128 bits long, so it is far less likely to collide by
accident than a 32-bit CRC checksum. However, it does not offer a CRC's guaranteed
detection of short burst errors, it is a lot more complex, and it is considerably slower
to compute. MD5 is intended more for security use.
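
As a rough illustration of computing a digest from a stream, the sketch below feeds data
to Python's standard hashlib module in chunks, so the whole file never has to sit in
memory. The function name md5_of_stream and the chunk size are assumptions made for this
example.

```python
import hashlib
import io

def md5_of_stream(stream, chunk_size: int = 65536) -> str:
    """Read a binary stream in chunks and return its MD5 digest as hex."""
    digest = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()

example = md5_of_stream(io.BytesIO(b"abc"))
print(example)                           # 900150983cd24fb0d6963f7d28e17f72
print(hashlib.md5().digest_size * 8)     # digest width in bits: 128
```

For a real file, open it in binary mode and pass the file object in place of the BytesIO.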
A free-to-use implementation of MD5 can be found here:
ftp.funet.fi/pub/crypt/hash/mds/md5/
A positive side effect of having CRC32 checksums or MD5 digests is that the poster can
put them in the Subject: line, and the downloader can use them to check whether he already
has the file. (Alternative: use the References: header?)
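
One way the Subject: line idea might work is sketched below. The tag format
"[CRC32:XXXXXXXX]" is purely hypothetical and not an existing convention, as are the
function names. The downloader is assumed to keep a set of checksums for the files it
already has.

```python
import re
import zlib

# Hypothetical tag format, invented for this example.
TAG = re.compile(r"\[CRC32:([0-9A-Fa-f]{8})\]")

def make_subject(name: str, data: bytes) -> str:
    """Poster's end: append the file's CRC32 to the subject text."""
    return f"{name} [CRC32:{zlib.crc32(data) & 0xFFFFFFFF:08X}]"

def already_have(subject: str, local_checksums: set) -> bool:
    """Downloader's end: True if the subject's checksum matches a local file."""
    match = TAG.search(subject)
    return bool(match) and match.group(1).upper() in local_checksums

subject = make_subject("file.bin", b"123456789")
print(subject)                               # file.bin [CRC32:CBF43926]
print(already_have(subject, {"CBF43926"}))   # True
```

A match only suggests the file is already present; with a 32-bit checksum, different
files can occasionally share a value.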
The good thing about CRC32 is that it's a widely used standard. MD5 seems like a good
alternative, except that it needs more computational power. That leads me to introduce
"Tiger - A Fast Hash Function": it's free of restrictions, patent-free, reportedly about
2 to 3 times faster than MD5, and has a 192-bit (24-byte) output (truncated 128-bit and
160-bit variants are also an option). The white paper and C and Java source can be
downloaded from here:
www.cs.technion.ac.il/~biham/Reports/Tiger