Ben Laurie blathering

Verifiable Logs: Solving The “Cryptocat Problem”

There’s been a lot of heat about Cryptocat lately. But not much light. In summary, one side says you can’t trust software you download from the ‘net to not reveal all your secrets, and the other side says that’s all we got, so suck it up. So, how do we fix this problem?

First off, let’s take a look at the core of the problem: if you download something from the ‘net, how can you be sure what you got is what was advertised? One of the much-lauded benefits of open source is that it can be reviewed – experts can take a look and see whether it really does what it says. So, that deals with half the problem. But how do we know we got what the experts reviewed?

I propose that the answer is publicly verifiable logs. The idea is that anyone can operate a log of “stuff” that can be verified by anyone else. What do I mean by “verified”? I mean that if two people see the log, they can mutually check that they saw the same thing. Of course, this is trivial if you are prepared to send the whole log to each other – just check they’re identical. The trick is to do this verification efficiently.

Luckily we have a way to do that: Merkle Trees. These allow us to summarise the log with a short chunk of binary (the “root”). If we both get the same root, then we both have the same log. What’s more, they also allow an efficient proof that any particular item is in the log – given the item, the log can show a chain of hashes leading to the root. This chain proves that the item actually is in the log summarised by the root.
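To make the “chain of hashes” concrete, here is a minimal Python sketch of a Merkle root and an inclusion proof. It is an illustration, not the code of any real log: the function names are made up for this example, and the choice of SHA-256 with 0x00/0x01 prefixes for leaves and interior nodes (the convention Certificate Transparency uses) is an assumption.

```python
import hashlib

def leaf_hash(data: bytes) -> bytes:
    # Assumed convention: leaves are hashed with a 0x00 prefix.
    return hashlib.sha256(b"\x00" + data).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # Assumed convention: interior nodes are hashed with a 0x01 prefix.
    return hashlib.sha256(b"\x01" + left + right).digest()

def split_point(n: int) -> int:
    """Largest power of two strictly less than n (for n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k

def merkle_root(items):
    """Short summary (the "root") of the whole list of items."""
    if not items:
        return hashlib.sha256(b"").digest()
    if len(items) == 1:
        return leaf_hash(items[0])
    k = split_point(len(items))
    return node_hash(merkle_root(items[:k]), merkle_root(items[k:]))

def inclusion_proof(items, index):
    """Sibling hashes from the leaf at `index` up to the root.

    Each entry is (sibling_hash, sibling_is_on_the_left).
    """
    if len(items) == 1:
        return []
    k = split_point(len(items))
    if index < k:
        return inclusion_proof(items[:k], index) + [(merkle_root(items[k:]), False)]
    return inclusion_proof(items[k:], index - k) + [(merkle_root(items[:k]), True)]

def verify_inclusion(item: bytes, proof, root: bytes) -> bool:
    """Recompute the root from the item and the chain of sibling hashes."""
    h = leaf_hash(item)
    for sibling, sibling_is_left in proof:
        h = node_hash(sibling, h) if sibling_is_left else node_hash(h, sibling)
    return h == root

log = [b"release-1", b"release-2", b"release-3", b"release-4", b"release-5"]
root = merkle_root(log)
proof = inclusion_proof(log, 2)
print(verify_inclusion(b"release-3", proof, root))   # True
print(verify_inclusion(b"backdoored", proof, root))  # False
```

The verifier only needs the item, the proof, and the root; it never sees the rest of the log.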

What’s more, with only a bit more cunningness, we can also efficiently show that any version of the log (with more data appended) contains any prior version. In other words, we can show that the log never deletes anything, but only grows by adding new things at the end.
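The append-only check can be sketched in the same style. The code below follows the recursive consistency-proof construction described in the Certificate Transparency specification (RFC 6962), but the verifier is a simplified recursive one written for this illustration, and all function names are hypothetical.

```python
import hashlib

def leaf_hash(data):
    return hashlib.sha256(b"\x00" + data).digest()

def node_hash(left, right):
    return hashlib.sha256(b"\x01" + left + right).digest()

def split_point(n):
    """Largest power of two strictly less than n (for n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k

def merkle_root(items):
    if len(items) == 1:
        return leaf_hash(items[0])
    k = split_point(len(items))
    return node_hash(merkle_root(items[:k]), merkle_root(items[k:]))

def consistency_proof(m, items, whole=True):
    """Hashes showing the first m items are a prefix of items (0 < m <= len)."""
    n = len(items)
    if m == n:
        # The verifier already knows the old root if this subtree is the old log.
        return [] if whole else [merkle_root(items)]
    k = split_point(n)
    if m <= k:
        return consistency_proof(m, items[:k], whole) + [merkle_root(items[k:])]
    return consistency_proof(m - k, items[k:], False) + [merkle_root(items[:k])]

def _rebuild(m, n, proof, old_root, whole):
    """Recompute (old_root, new_root) candidates from the proof hashes."""
    if m == n:
        if whole:
            if proof:
                raise ValueError("proof too long")
            return old_root, old_root
        if len(proof) != 1:
            raise ValueError("bad proof length")
        return proof[0], proof[0]
    if not proof:
        raise ValueError("proof too short")
    k = split_point(n)
    if m <= k:
        old, new = _rebuild(m, k, proof[:-1], old_root, whole)
        return old, node_hash(new, proof[-1])
    old, new = _rebuild(m - k, n - k, proof[:-1], old_root, False)
    return node_hash(proof[-1], old), node_hash(proof[-1], new)

def verify_consistency(m, n, proof, old_root, new_root):
    """True if the size-n log with new_root extends the size-m log with old_root."""
    if not 0 < m <= n:
        return False
    try:
        old, new = _rebuild(m, n, proof, old_root, True)
    except ValueError:
        return False
    return old == old_root and new == new_root

log = [b"v1", b"v2", b"v3", b"v4", b"v5", b"v6", b"v7"]
old_root = merkle_root(log[:3])   # what a client saw earlier
new_root = merkle_root(log)       # what the log publishes now
proof = consistency_proof(3, log)
print(verify_consistency(3, len(log), proof, old_root, new_root))  # True
```

A client that remembers only the old root and size can therefore detect a log that rewrote or dropped an earlier entry, without ever holding the entries themselves.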

Got it? To reiterate: it is possible to create a log that can demonstrate that everyone sees the same version, and that as it grows, everyone continues to see the same data added to it. What’s more, these things can be done efficiently[1].

Now we have that piece of machinery, how do we use it to solve the “Cryptocat problem”? Simple: every time Cryptocat does a new release, it pushes a copy of the source into the verifiable log. Every time you download Cryptocat, you verify that the version you are given is in the public log, and refuse to run it if not. And we’re done.
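As a rough sketch of the client side of that rule, the Python below refuses to hand back a download unless it can be shown to be in the log. Several things here are assumptions made for the illustration: the log stores a SHA-256 digest of each release rather than the full source, the client already has a root it trusts, and the names `check_before_running` and `UnloggedRelease` are hypothetical.

```python
import hashlib

def sha256(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def verify_inclusion(entry: bytes, proof, root: bytes) -> bool:
    """Walk from the entry's leaf hash up to the root using sibling hashes."""
    node = sha256(b"\x00", entry)
    for sibling, sibling_is_left in proof:
        if sibling_is_left:
            node = sha256(b"\x01", sibling, node)
        else:
            node = sha256(b"\x01", node, sibling)
    return node == root

class UnloggedRelease(Exception):
    """Raised when a download cannot be shown to be in the public log."""

def check_before_running(download: bytes, proof, trusted_root: bytes) -> bytes:
    """Return the download only if its digest is provably in the log."""
    entry = sha256(download)  # assumption: the log entry is the release digest
    if not verify_inclusion(entry, proof, trusted_root):
        raise UnloggedRelease("not in the public log; refusing to run")
    return download

# Toy log holding two releases, built by hand to keep the example short.
release_1 = b"cryptocat 1.0 source"
release_2 = b"cryptocat 1.1 source"
leaf_1 = sha256(b"\x00", sha256(release_1))
leaf_2 = sha256(b"\x00", sha256(release_2))
trusted_root = sha256(b"\x01", leaf_1, leaf_2)
proof_for_release_2 = [(leaf_1, True)]

check_before_running(release_2, proof_for_release_2, trusted_root)  # accepted

try:
    check_before_running(b"cryptocat 1.1 source + key leak",
                         proof_for_release_2, trusted_root)
except UnloggedRelease as err:
    print("blocked:", err)
```

Where the trusted root comes from, and how many independent parties watch the log, is outside this sketch.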

If Cryptocat ever decides to release a version that, say, reveals your keys, or decrypts your chats for a third party, then that version is on display for all to see. Cryptocat will get caught – and likely caught quite quickly. If Cryptocat tries to avoid this publication, then you won’t run it, so you’ll be safe.

Admittedly this does not actually _prevent_ Cryptocat from shafting you, but it does mean it is very unlikely to get away with it, and having done it once, it will probably not get the chance to do it to anyone again…

Note that it doesn’t matter if the author of Cryptocat is the one who made the change, or someone who hacked his site, or a man-in-the-middle. If they do not publish source, then you won’t run it. And if they do publish source, they get caught.

Incidentally, I originally proposed publicly verifiable logs for fixing PKI but they have many uses. Also, for Certificate Transparency, we are implementing a publicly verifiable log. I would be very happy to help with a version for logging software instead of certificates.

[1] To get an idea of what I mean by “efficiently”: a proof that two log versions are consistent, or that a particular item is in a particular log version, consists of about log_2(n) hashes, where n is the number of items in the log. So, for a log with a billion items, this proof would have around 30 entries, each, say, 32 bytes long. So, it takes me less than 1 kB for a proof about a log with a billion entries. How about a trillion? Just ten more entries, i.e. under 1,300 bytes.
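The arithmetic in this footnote can be checked in a few lines of Python. The 32-byte hash size is the footnote's own “say, 32 bytes”; the helper name `proof_size` is made up for the example.

```python
HASH_BYTES = 32  # e.g. SHA-256, matching the footnote's "say, 32 bytes"

def proof_size(n: int):
    """Return (number of hashes, total bytes) for a log of n items."""
    hashes = (n - 1).bit_length()  # ceil(log2(n)) for n >= 1, in exact integers
    return hashes, hashes * HASH_BYTES

print(proof_size(10**9))   # (30, 960)
print(proof_size(10**12))  # (40, 1280)
```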


  1. Maybe I’m missing something or maybe it’s implicit, but should the logs be broadcast to many broadly-distributed (geographically and across different institutions) sites?

    It seems to me that if you download an app from some site X and then you download a log from site X, you can’t be any better off, whether the attacker is in control of X or your communications route to X.

    Comment by Johnicholas — 14 Aug 2012 @ 17:26

  2. Cool idea 😉 I have a few questions,

    I’m assuming that some details around what it means to “push a copy of the source into the verifiable log” were intentionally glossed. But I’m not sure exactly what you intended them to be.

    How does one decide whether to trust the public log? Signed by cryptocat devs? If yes, how do we handle the browser infrastructure around what javascript verifiable log signing keys we trust? And if we have to do something like that to trust the log, which would require much browser work, is there much of an advantage to the (very cool, very clever) merkle tree over modifying browsers to handle signed javascript directly?

    I guess I keep getting a little confused on the threats we’re trying to handle here as well. You point out that this doesn’t prevent anyone from burning users, just from burning users without detection. But if the primary concern w/ web only cryptocat is that MITM is easier than people think because SSL is less secure than people think… I guess I’m just not sure that this is much comfort.

    Comment by Emile — 14 Aug 2012 @ 17:41

  3. Git at its core is an implementation of a very sophisticated system to manage Merkle trees of source code which can be signed with GPG keys.

    Comment by Thomas Koch — 14 Aug 2012 @ 19:00

  4. Ben, am I right in thinking that in the case of web-delivered apps, this technique would necessarily require some sort of installed code (i.e. browser extension, plug-in, native function etc.)? I.e. the “checker” can’t also be part of the js code to be checked?

    Comment by Jack — 15 Aug 2012 @ 14:37

  5. Doesn’t matter anyway. If someone has the skill and access to perform this level of MITM attack how much harder can it be for them to compromise the OS? If they have OS control you can’t trust any code that’s being executed.

    Comment by madman600 — 15 Aug 2012 @ 16:33

  6. #2: Because the log is verifiable, you don’t actually need to trust it. The important thing is that the set of logs needs to be reasonably small and widely known, so that they can be monitored.

    #4: Yes, I am not suggesting that this can be done without browser or OS support.

    Comment by Ben — 15 Aug 2012 @ 17:46

  7. Re: Thomas Koch,

    Yes, I was thinking along those lines too; and started wondering if we couldn’t have things like:

    (Obviously, can’t just use ‘:’ between repo URI, path to file in repo, and revision IDs, but you get the idea.) Then, with some handwaving at all the code signing key trust infrastructure in the browser, that seems like it might give us exactly what we want.

    But, of course, that seems pretty heavy weight to let a browser grab some javascript 😉

    Comment by Emile — 15 Aug 2012 @ 17:49

  8. Doh, right, can’t just type html in as an example and have it show up. Comment seven had a script example where the “src” attribute was (git repo URI):(path to .js in repo):(git revision hash). But it got sanitized to /dev/null by comment system.

    Comment by Emile — 15 Aug 2012 @ 17:52

  9. #6: Oh, I think I was assuming we had a log-per-site kind of thing.

    In fact, the more I think about this, the more I think I didn’t actually understand the proposal.

    What are the nodes in the tree? If part of the point is to never transmit the whole log (tree) around, but rather for each client to construct it themselves as they download new versions of stuff, how is it structured such that clients don’t have to have all versions of the file in order to construct an accurate log? Where are clients getting the external root hash of the log from, in order to verify that the log they are constructing is the same?

    Comment by Emile — 15 Aug 2012 @ 18:11

  10. I don’t understand how a browser plugin prevents a MITM attack. If the attacker has the power to compromise SSL and is running a proxy-type MITM setup… they could use the same client code, so how would a browser plugin be able to tell?

    Comment by madman600 — 16 Aug 2012 @ 16:20

  11. This is a great idea, but not necessarily enough to solve “the cryptocat problem”. For traditional applications, sure, a verifiable log + source audits would be enough: the source typically doesn’t include eval(download(some_uri)), and if it does, a source audit could in principle spot it.

    However, one of the complaints about security-critical webapps is the browser security model. In webapps, the same-origin policy means that every piece of Javascript can be hijacked/subverted by any other piece of Javascript from the same domain. So this doesn’t protect you unless it can be made mandatory for entire origins, with no unsigned/unlogged code from the whole domain being executed?

    Comment by smcv — 16 Aug 2012 @ 17:50
