## MD5 Collisions and Factorisation

In “Primes, or are they?” I produced a pair of numbers with the same hash, one of which was prime and one of which wasn’t. The snag was that the one that wasn’t wasn’t particularly easy to factor, so it took the fun out of it somewhat.

I was musing about this the other day and realised that the answer was to search for a pair where one is prime and the other is easy to factor. Obvious when you think about it. So, I wrote code to find colliding pairs and check for small factors (where “small” means “less than 10 million”). Turns out it isn’t that hard to find numbers with a few small factors, but the more you want, the harder it gets. Much harder quite quickly, it seems.

Rather than bore you with two really big numbers, I thought the statistics on how many factors were more interesting.

90 have 0 small factors

334 have 1 small factors

523 have 2 small factors

508 have 3 small factors

361 have 4 small factors

157 have 5 small factors

75 have 6 small factors

31 have 7 small factors

11 have 8 small factors

3 have 9 small factors

As you can see, generating these collisions is easy. Getting a lot of small factors is harder. Which means, of course, that the non-prime is *still* hard to factor. Damn.

Here’s the formula. If the length of the number is n bits, and the length of the largest prime is m bits, set u = n/m, and the probability that n can be factored completely over the primes of m bits or less is 1 in u^u (that’s u to the u power). Or in other words you have to generate about u^u numbers to get one that is this “smooth”.

Comment by Cyphrpunk — 20 Oct 2005 @ 0:43

One more point. For your example, you were trying to factor a 1152 bit number over a prime base that went up to 24 bit numbers. That mean u = 1152/24 or 48, and the number of times you will have to try is u^u or approximately 5×10^80. You probably don’t have that much time on your hands.

However you don’t need such small primes. Your goal is to find a modulus that lets you compute discrete logs. Using the Pohlig-Hellman algorithm you want it to factor into primes each of which is small enough to let you solve the DL problem modulo that prime. Using straightforward DL algorithms for those sub-problems, like Pollard rho, takes time proportional to the square root of the prime size. If you can afford to spend say 2^40 time to find a discrete log, this lets you use primes of up to 80 bits in size.

For such primes, u = 1152/80 or 14.4, and u^u is about 2^56, so it would take you that many tries to find such a modulus. Again that’s way too long.

If you are willing to take longer on your discrete log, say up to 2^64 time (maybe you’re a super spy agency), you could use primes up to 128 bits and u = 1152/128, for u^u = 400 million which would be doable for such an organization.

If you just want to come up with a factorable number in about 2000 tries, which is about as much time as you took to create your table, that requires a u of about 4.828 and a prime size of 1152/4.828 or 239 bits. That’s about the best you could hope for in terms of smooth moduli, that you might find one all of whose factors are less than 239 bits, if you try 2000 times.

Comment by Cyphrpunk — 21 Oct 2005 @ 0:01

[…] Inspired by a comment on one of my posts, I thought of a better way to produce smooth collisions than searching and factoring. The essence of it is this… […]

Pingback by Links » Smooth(er) MD5 Collisions — 22 Oct 2005 @ 7:03