Friday, November 23, 2007

Exploiting md5 and other hashing functions collisions for fun and profit

The crytographic hash function MD5 has been broken. In March 2005, Xiaoyun Wang and Hongbo Yu of Shandong University in China published an article in which they describe an algorithm that can find two different sequences of 128 bytes with the same MD5 hash.

More links can also be found on Wikipedia's MD5 page.

SHA-0 has also been broken and The security of SHA-1 has been somewhat compromised by cryptography researchers. Chinese cryptographers showed that SHA-1 is not collision-free. That is, they developed an algorithm for finding collisions faster than brute force. There was also an attack reported in RIPEMD.

Practical applications of md5 collisions:

  • Magnus Daum and Stefan Lucks have created two PostScript files with identical MD5 hash, of which one is a letter of recommendation, and the other is a security clearance.
  • Eduardo Diaz has described a scheme by which two programs could be packed into two archives with identical MD5 hash. A special "extractor" program turn one archive into a "good" program and the other into an "evil" one.
  • Here's a pair of valid X.509 certificates that have identical signatures. The hash function used is MD5.
  • Here's a paper demonstrating a technique for finding MD5 collisions quickly: eight hours on 1.6 GHz computer.
  • Hashclash - Vulnerability of software integrity and code signing applications to chosen-prefix collisions for MD5
  • The Status of MD5 after a recent attack (1996 whitepaper)
The following is an improvement of Diaz's example, which does not need a special extractor. Here are two pairs of executable programs (one pair runs on Windows, one pair on Linux).
  • Windows version:
    • hello.exe. MD5 Sum: cdc47d670159eef60916ca03a9d4a007
    • erase.exe. MD5 Sum: cdc47d670159eef60916ca03a9d4a007
  • Linux version (i386):
    • hello. MD5 Sum: da5c61e1edc0f18337e46418e48c1290
    • erase. MD5 Sum: da5c61e1edc0f18337e46418e48c1290

What does this mean? You should use at least 2 hashing algorithms (RIPEMD-160, Tiger, WHIRLPOOL, SHA-256, SHA-512), as the chances of finding the same collisions in more than 1 hashing algorithm are practically 0.


Cosu said...

md5 is getting a serious beating as time passes. There are a number of hash repositories that can get you the phrase that produces a given hash. Computers are producing via bruteforce more and more phrases.

This is useful when trying to find passwords from a given md5 hash. There are a lot of auth systems that still use just plain md5 to authenticate the user.

Bellow you can find three useful links:

The last one also has an SHA1 repository.
Have fun and don't use such services for evil!

Piku said...

It has already been hacked These guys created 12 PDF files with same MD5.
I recommend using SHA1 (or better).

cmihai said...

2 More Cosu:

Anonymous said...

1 more

cmihai said...

MD5 considered harmful today
Creating a rogue CA certificate: