There’s been a lot of heat about Cryptocat lately. But not much light. In summary, one side says you can’t trust software you download from the ‘net to not reveal all your secrets, and the other side says that’s all we got, so suck it up. So, how do we fix this problem?

First off, lets take a look at the core of the problem: if you download something from the ‘net, how can you be sure what you got is what was advertised? One of the much-lauded benefits of open source is that it can be reviewed – experts can take a look and see whether it really does what it says. So, that deals with half the problem. But how do we know we got what the experts reviewed?

I propose that the answer is publicly verifiable logs. The idea is that anyone can operate a log of “stuff” that can be verified by anyone else. What do I mean by “verified”? I mean that if two people see the log, they can mutually check that they saw the same thing. Of course, this is trivial if you are prepared to send the whole log to each other – just check they’re identical. The trick is to do this verification efficiently.

Luckily we have a way to do that: Merkle Trees. These allow us to summarise the log with a short chunk of binary (the “root”). If we both get the same root, then we both have the same log. What’s more, they also allow an efficient proof that any particular item is in the log – given the item, the log can show a chain of hashes leading to the root. This chain proves that the item actually is in the log summarised by the root.

What’s more, with only a bit more cunningness, we can also efficiently show that any version of the log (with more data appended) contains any prior version. In other words, we can show that the log never deletes anything, but only grows by adding new things at the end.

Got it? To reiterate: it is possible to create a log that can demonstrate that everyone sees the same version, and that as it grows, everyone continues to see the same data added to it. What’s more, these things can be done efficiently[1].

Now we have that piece of machinery, how do we use it to solve the “Cryptocat problem”? Simple: every time Cryptocat does a new release, it pushes a copy of the source into the verifiable log. Every time you download Cryptocat, you verify that the version you are given is in the public log, and refuse to run it if not. And we’re done.

If Cryptocat ever decides to release a version that, say, reveals your keys, or decrypts your chats for a third party, then that version is on display for all to see. Cryptocat will get caught – and likely caught quite quickly. If Cryptocat tries to avoid this publication, then you won’t run it, so you’ll be safe.

Admittedly this does not actually _prevent_ Cryptocat from shafting you, but it does mean it is very unlikely to get away with it, and having done it once, it will probably not get the chance to do it to anyone again…

Note that it doesn’t matter if the author of Cryptocat is the one who made the change, or someone who hacked his site, or a man-in-the-middle. If they do not publish source, then you won’t run it. And if they do publish source, they get caught.

Incidentally, I originally proposed publicly verifiable logs for fixing PKI but they have many uses. Also, for Certificate Transparency, we are implementing a publicly verifiable log. I would be very happy to help with a version for logging software instead of certificates.

[1] To get an idea of what I mean by “efficiently” a proof that two log versions are consistent or that a particular item is in a particular log version consists of log_2(n) hashes, where n is the number of items in the log. So, for a log with a billion items, this proof would have around 30 entries, each, say, 32 bytes long. So, it takes me less than 1 kB for a proof about a log with a billion entries. How about a trillion? Just ten more entries, i.e. under 1,300 bytes.