06.10.05
The end of MD5.
Bruce Schneier, a security expert, talks about a new set of MD5 collisions generated by two researchers in Bochum. This renders MD5 not safe, i.e. completely useless. A very interesting read indeed.
For those of you who have never heard of MD5 before, a simple explanation is in order. Keep in mind that I am not a cryptography expert, and I am trying to understand these things myself. MD5 is what is known as a hash function. What it does effectively is, given a text, to produce a smaller text, consisting only of 128 characters, called the hash value, which can be computed very fast. The simplest way this can be used is to test if some huge download you just performed went ok. Some distributors offer the MD5 value of the file you are downloading, which they have computed in advance. Then after downloading the file, you can compute the MD5 value of the file you downloaded, and if the two values agree, then you can be almost certain that the file you downloaded is exactly the same as the file that the distributor has in their server.
This is really great, and one can start thinking of all sorts of applications for it. For instance, a friend could send you a file, and via some other route send you the MD5 value for the file. Then you can compute the MD5 value of the file you have received, and if they match up, then you can be pretty certain that no-one tempered with the file on its way to you. The key thing about MD5 is that “its inverse is not computable”, i.e. in plain English, there is no way to produce specific MD5 values on demand, by just doing something clever to the file. In fact, this is so in a very strong sense, it’s not that we don’t know of any way, but that we believe that no-one could ever find a way. Simpler and faster methods than MD5 have been used since the advent of computers, to check that there are no errors during the transmission of data.
The problem that arises though is the following: Since what we did is take the characters in some text, however many they are, and producing 128 characters out of them somehow, there will be lots of texts that give the same set of 128 characters, and hence have the same MD5 value. I.e. the hash-function is not 1-1 as we say. So how do we know that the file we received is not one of those other millions of files that have the same MD5 value? The simple answer is, we don’t. But what we believe is that the chance that this other file will be meaningful is miniscule. In other words, we believe that even if someone were to tamper with our file on its way to us, they would not be able to produce a file that has the same MD5 value and can harm us.
Alas, this is no longer the case for MD5. The above sited document shows two files in PostScript format, a standard printing language, that have the same MD5, but while one of them is an innocuous letter of recommendation, the other one can be used to gain access to secret information. Researchers have known this has been possible for a long time, but no-one before had ever provided such a clear cut example of just how non-secure MD5 really is. What is even more scary, this is not just an individual example, this is a method that can be used in essentially every programming language, and that includes for instance PDF files. This is particularly dangerous, since MD5 is very widely used at this point.
So what can be done? Well, lots of things actually! There are lots of other hash-functions that are still considered secure, Tiger for instance, and SHA-2. One could possibly also apply a number of hash-functions to the same file, as secondary checks. The important thing though is to become aware of these issues, and keep them in mind. The internet is certainly not a safe place, but lots of really smart people are working really hard to make it a safer place.
Later.
Sameer said,
June 10, 2005 at 6:01 pm
I think the next generation of hashes will / should use two different hashing techniques to create a multiple hash. Then the probablity of creating collisions for both hashes will be almost zero.