Re: [SLUG] Are md5sums errors frequent?

From: Ian C. Blenke (ian@blenke.com)
Date: Sun Jul 23 2006 - 17:26:21 EDT


Eben King wrote:
> An md5sum has what, 32 hex digits? That's 128 bits, so there can only
> be 2^32 (about 4.3 billion) different md5sums. So the odds of two
> different files having the same md5sum is 1/(2^32).

128 bits means 2^128 possible bit combinations, not 2^32. Each of the
32 hex digits carries 4 bits, and every added bit doubles the count.

In decimal, a 128-bit hash can have 3.4 x 10^38 possible values, which is:

    340,282,366,920,938,463,463,374,607,431,768,211,456 possible hashes

That's a _very_ big number.
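
To check the arithmetic yourself, here is a minimal Python sketch. The
variable names are only illustrative.

```python
# Count the possible values of a hash with a given number of bits.
md5_bits = 128
md5_space = 2 ** md5_bits

print(f"{md5_space:,}")      # exact value, with thousands separators
print(f"{md5_space:.1e}")    # the same value in scientific notation

# The 2^32 figure from the quoted message, for comparison.
print(f"{2 ** 32:,}")
```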

There are other hashes out there as well, like SHA1 (160 bit). You
probably have "sha1sum" on your box.
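
If you'd rather compare the digest sizes from a script than from the
command line, Python's standard hashlib module can do it. This is only
a sketch; the sample input is arbitrary and not taken from any real
download.

```python
import hashlib

data = b"hello world\n"  # arbitrary sample input (an assumption)

# Map each algorithm name to (bits in the digest, hex digits printed).
sizes = {}
for name in ("md5", "sha1", "sha256"):
    h = hashlib.new(name, data)
    sizes[name] = (h.digest_size * 8, len(h.hexdigest()))
    print(f"{name:7s} {sizes[name][0]:4d} bits  {sizes[name][1]:3d} hex digits")
```

Each hex digit encodes 4 bits, so the number of hex digits is always the
bit length divided by 4.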

An IPv4 address, by comparison, is an 8 "hex digit" number with 32 bits.
That means there are 2^32 possible IPs (about 4.3 billion), though the
actual yield is far smaller than this due to subnetting, netblock
assignments, and various other human-imposed limitations.

>> I assumed that the odds of a wrong md5sum on a legitmate downloaded
>> file, where there were no apparent problems during the download,
>> would be something in the realm of winning the lottery.
>
> Well, a small lottery. ISTR the Florida Lotto (pick 6 balls out of
> 40) is 13 billion or so combinations.

It's actually _far_ less likely than that. It's like the lottery winners
of lottery winners of lottery winners...
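
To put a rough number on "lottery winners of lottery winners", here is a
small Python sketch. It assumes a plain pick-6-of-40 draw, as described
in the quoted message, and treats an accidental MD5 match as a 1 in
2^128 event. Both are simplifying assumptions, and the combination count
it prints is just the binomial coefficient, not any real lottery's
published odds.

```python
import math

# Assumption: a "pick 6 of 40" lottery where order does not matter.
lotto_combinations = math.comb(40, 6)

# Assumption: an accidental match is a 1 in 2^128 event.
md5_space = 2 ** 128

# How many such lotteries would you have to win in a row to hit
# odds as long as 1 in 2^128?
wins_in_a_row = math.log(md5_space) / math.log(lotto_combinations)

print(f"6-of-40 combinations: {lotto_combinations:,}")
print(f"equivalent consecutive wins: {wins_in_a_row:.2f}")
```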

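OTOH, both MD5 and SHA1 have been shown to be vulnerable to deliberate
collision attacks: colliding MD5 inputs have been published, and a
faster-than-brute-force collision attack on SHA1 has been described.
That doesn't change the odds of an accidental mismatch, and neither
has a practical preimage attack.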

    http://en.wikipedia.org/wiki/SHA-1
    http://en.wikipedia.org/wiki/MD5

As computing progresses and algorithms are weakened, increasing the bit
lengths to increase compute times is the only way to address these
issues (barring the creation and adoption of new hash algorithms).
SHA256 is generally recommended over SHA1 at the moment, and MD5 is
generally discounted for cryptographic use if it can be avoided.

If you're really paranoid, use two (or more) different hashing
algorithms. It's astronomically less likely that a different input
will produce matching hashes under both algorithms at once.
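
A minimal sketch of that two-hash check in Python follows. The function
names digests() and verify() are made up for this example, and the
choice of MD5 plus SHA256 is just one reasonable pairing.

```python
import hashlib

def digests(path, algorithms=("md5", "sha256"), chunk_size=1 << 20):
    """Return {algorithm: hexdigest} for a file, reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # Read in chunks so large downloads don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}

def verify(path, expected):
    """Return True only if every expected digest matches the file."""
    actual = digests(path, tuple(expected))
    return all(actual[name] == expected[name].lower() for name in expected)
```

You would call verify("some.iso", {"md5": "...", "sha256": "..."}) with
the digests published by the download site; the file name and the
elided digest strings here are placeholders.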

- Ian C. Blenke <ian@blenke.com> http://ian.blenke.com/


