Network Working GroupB. Laurie
Internet-DraftA. Langley
Expires: September 2, 2012E. Kasper
 March 1, 2012


Certificate Transparency
certificate-transparency-draft

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

This Internet-Draft will expire on September 2, 2012.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.



1.  Design

The aim of Certificate Transparency is to have every issued public end-entity TLS certificate recorded in one or more certificate logs. In order to detect mis-issuance of certificates, all logs are public. In particular, domain owners will be able to monitor logs for certificates issued on their own domain.

In order to protect clients from unlogged certificates, logs provide audit proofs of all recorded certificates, and clients can choose not to trust certificates that are not accompanied by an appropriate audit proof. Audit proofs are embedded in the TLS handshake to for privacy and performance reasons.

In order to ensure the correct behaviour of logs, logs also provide a global audit proof over the entire log. Any misbehaviour of logs can be detected through cross-checks on the global audit proof.

Logs are only expected to certify that they have seen a certificate and thus, we do not specify any revocation mechanism for audit proofs in this document. Logs will be append-only, and audit proofs will be valid indefinitely.



2.  Cryptographic components



2.1.  Merkle Hash Trees

Logs use binary Merkle hash trees for efficient audit proofs. The hashing algorithm is SHA-256. The input to the Merkle tree hash is a list of data entries; these entries will be hashed to form the leaves of the Merkle hash tree. The output is a single 32-byte root hash. Given an ordered list of n inputs, D[0:n] = {d(0), d(1), ..., d(n-1)}, the Merkle Tree Hash (MTH) is thus defined as follows:

The hash of an empty list is the hash of an empty string:

MTH({}) = SHA-256().

The hash of a list with one entry is:

MTH({d(0)}) = SHA-256(0 || d(0)).

For n > 1, let k be the largest power of two smaller than n. The Merkle Tree Hash of an n-element list D[0:n] is then defined recursively as

MTH(D[0:n]) = SHA-256(1 || MTH(D[0:k]) || MTH(D[k:n])),

where || is concatenation and D[k1:k2] denotes the length (k2 - k1) list {d(k1), d(k1+1),..., d(k2-1)}.

Note that we do not require the length of the input list to be a power of two. The resulting Merkle tree may thus not be balanced, however, its shape is uniquely determined by the number of leaves. This Merkle tree is essentially the same as the history tree proposal except our current definition omits dummy leaves.



2.1.1.  Merkle audit paths

A Merkle audit path for a leaf in a Merkle hash tree is the list of all additional nodes in the Merkle tree required to compute the Merkle Tree Hash for that tree. If the root computed from the audit path matches the true root, then the audit path is proof that the leaf exists in the tree.

Given an ordered list of n inputs to the tree, D[0:n] = {d(0), ..., d(n-1)}, the Merkle audit path PATH(m, D[0:n]) for the (m+1)th input d(m), 0 <= m < n, is defined as follows:

The path for the single leaf in a tree with a one-element input list D[0:1] = {d(0)} is empty:

PATH(0, {d(0)}) = {}

For n > 1, let k be the largest power of two smaller than n. The path for the (m+1)th element d(m) in a list of n > m elements is then defined recursively as

PATH(m, D[0:n]) = PATH(m, D[0:k]) : MTH(D[k:n]) for m < k; and

PATH(m, D[0:n]) = PATH(m - k, D[k:n]) : MTH(D[0:k]) for k <= m < n,

where : is concatenation of lists and D[k1:k2] denotes the length (k2 - k1) list {d(k1), d(k1+1),..., d(k2-1)} as before.



2.1.2.  Merkle consistency proofs

Merkle consistency proofs prove the append-only property of the tree. A Merkle consistency proof for a Merkle Tree Hash MTH(D[0:n]) and a previously advertised hash MTH(D[0:m]) of the first m leaves, m <= n, is the list of nodes in the Merkle tree required to verify that the first m inputs D[0:m] are equal in both trees. Thus, a consistency proof must contain a set of intermediate nodes (i.e., commitments to inputs) sufficient to verify MTH(D[0:n]), such that (a subset of) the same nodes can be used to verify MTH(D[0:m]). We define an algorithm that outputs the (unique) minimal consistency proof.

Given an ordered list of n inputs to the tree, D[0:n] = {d(0), ..., d(n-1)}, the Merkle consistency proof PROOF(m, D[0:n]) for a previous root hash MTH(D[0:m]), 0 < m < n, is defined as PROOF(m, D[0:n]) = SUBPROOF(m, D[0:n], true):

The subproof for m = n is empty if m is the value for which PROOF was originally requested (meaning that the subtree root hash MTH(D[0:m]) is known):

SUBPROOF(m, D[0:m], true) = {}

The subproof for m = n is the root hash committing inputs D[0:m] otherwise:

SUBPROOF(m, D[0:m], false) = {MTH(D[0:m])}

For m < n, let k be the largest power of two smaller than n. The subproof is then defined recursively.

If m <= k, the right subtree entries D[k:n] only exist in the current tree. We prove that the left subtree entries D[0:k] are consistent and add a commitment to D[k:n]:

SUBPROOF(m, D[0:n], b) = SUBPROOF(m, D[0:k], b) : MTH(D[k:n]).

If m > k, the left subtree entries D[0:k] are identical in both trees. We prove that the right subtree entries D[k:n] are consistent and add a commitment to D[0:k].

SUBPROOF(m, D[0:n], b) = SUBPROOF(m - k, D[k:n], false) : MTH(D[0:k]).

Here : is concatenation of lists and D[k1:k2] denotes the length (k2 - k1) list {d(k1), d(k1+1),..., d(k2-1)} as before.

The number of nodes in the resulting proof is bounded above by ceil(log2(n)) + 1.



2.1.3.  Example

The binary Merkle tree with 7 leaves:

            hash
           /    \
          /      \
         /        \
        /          \
       /            \
      k              l
     / \            / \
    /   \          /   \
   /     \        /     \
  g       h      i      j
 / \     / \    / \     |
 a b     c d    e f     d6
 | |     | |    | |
d0 d1   d2 d3  d4 d5

The audit path for d0 is [b, h, l].

The audit path for d3 is [c, g, l].

The audit path for d4 is [f, j, k].

The audit path for d6 is [i, k].

The same tree, built incrementally in four steps:

    hash0          hash1=k
    / \              /  \
   /   \            /    \
  /     \          /      \
  g      c         g       h
 / \     |        / \     / \
 a b     d2       a b     c d
 | |              | |     | |
d0 d1            d0 d1   d2 d3

          hash2                    hash
          /  \                    /    \
         /    \                  /      \
        /      \                /        \
       /        \              /          \
      /          \            /            \
     k            i          k              l
    / \          / \        / \            / \
   /   \         e f       /   \          /   \
  /     \        | |      /     \        /     \
 g       h      d4 d5    g       h      i      j
/ \     / \             / \     / \    / \     |
a b     c d             a b     c d    e f     d6
| |     | |             | |     | |    | |
d0 d1   d2 d3           d0 d1   d2 d3  d4 d5

The consistency proof between hash0 and hash is PROOF(3, D[0:7]) = [c, d, g, l]. c, g are used to verify hash0, and d, l are additionally used to verify hash.

The consistency proof between hash1 and hash is PROOF(4, D[0:7]) = [l]. hash can be verified, using hash1=k and l.

The consistency proof between hash2 and hash is PROOF(6, D[0:7]) = [i, j, k]. k, i are used to verify hash1, and j is additionally used to verify hash.



3.  Log Format



3.1.  Log Entries

Certificate owners may submit each leaf certificate to certificate logs for public auditing. Each submitted certificate must be accompanied by all additional certificates required to verify the certificate. The root certificate may be omitted.

Logs may verify the leaf certificate, using their set of trusted roots. Logs shall periodically publish received certificates. Logs may refuse to publish certificates they cannot validate.

Each certificate entry in the log MUST include the following components:

enum { x509_chain_entry(0), (65535) } LogEntryType;

struct {
    LogEntryType type;
    select (type) {
    case x509_chain_entry: X509ChainEntry;
    } entry;
} LogEntry;

struct {
    ASN.1Cert leaf_certificate;
    ASN.1Cert certificate_chain<0..2^24-1>;
} X509ChainEntry;

[benl: Should we consider a small number for the certificate_chain? Or perhaps note that a log can impose a limit?]

leaf_certificate is the end-entity certificate submitted for auditing.

certificate_list is the collection of additional certificates required to verify the leaf certificate. This collection MUST be ordered by the content of each certificate, read first byte first, in ascending order. [agl: why aren't we requiring them to be in verification order? It would make the logs simplier which is a good thing.]



3.2.  Log Segments

A certificate log MUST periodically publish all new log entries as a log segment. The log MUST sign these entries by constructing a binary Log Segment Merkle Hash Tree with log entries as consecutive inputs to the tree, and signing the root of that tree. The hashing algorithm is SHA-256. The signature algorithm is ECDSA with the curve NIST P-256.

Log segments MUST have an incrementing sequence number. The first sequence number MUST be zero and the increment MUST be one. At every update, logs MUST sign a secondary Segment Info Merkle Hash Tree of all log segment signatures issued so far, ordered in the tree according to the sequence number.

Structure of the signature:

enum { log_segment(0), segment_info(1), (255) } TreeType;
digitally-signed struct {
    TreeType tree_type;
    uint32 sequence_number;
    select (tree_type) {
        case log_segment: uint32 tree_size;
        case segment_info: struct { };
    }
    opaque sha256_root_hash[32];
} MerkleTreeSignature;

The encoding of the digitally-signed element is defined in [RFC5246] (Dierks, T. and E. Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.2,” August 2008.).

tree_type is the type of the Merkle tree (either a Log Segment tree, or a Segment Info tree).

sequence_number is the number of the corresponding segment.

tree_size is the number of leaves in the tree. In the case of a SegmentInfo signature, tree size equals sequence_number + 1 and is thus omitted.

sha256_root_hash is the root of the Merkle Hash Tree.

For every segment, logs MUST publish all the log entries corresponding to that segment, as well as a segment info record containing the signatures on the Log Segment Merkle Hash Tree and the current Segment Info Merkle Hash Tree.

Structure of the segment info record:

struct {
    uint32 sequence_number;
    uint32 timestamp;
    uint32 segment_size;
    MerkleTreeSignature signed_log_segment_tree;
    MerkleTreeSignature signed_segment_info_tree;
} SegmentInfo;

sequence_number is the number of the segment.

segment_size is the size of the segment, i.e., the number of leaves in the log segment tree.

signed_log_segment_tree is the signature on the Log Segment Merkle Hash Tree.

signed_segment_info_tree is the signature on the current Segment Info Merkle Hash Tree.



3.3.  Audit Proofs

It is possible to audit the entire log by computing the root_hash values corresponding to individual segment trees, reconstructing the current Segment Info Merkle Hash Tree and verifying the signed_segment_info_tree signature. We rely on cross-checks of this signature between auditors to verify that their views of the log are consistent.

Additionally, logs provide two types of audit proofs for partial checks.

Logs MUST provide servers with an audit proof for each logged certificate. Servers may present audit proofs from one or more logs to the client together with the certificate. Clients may reject certificates that do not have a desired number of valid audit proofs. [ben: can anyone request these proofs? don't see why not]

Logs MUST also provide audit proofs for the Segment Info Merkle Hash Tree. Clients may request such audit proofs for a segment to verify that the segment appears in the global log.

Structure of the audit proof:

struct {
    opaque sha256_hash[32];
} MerkleNode;
struct {
    uint32 sequence_number;
    select (TreeType) {
        case log_segment: uint32 tree_size;
        case segment_info: struct { };
    }
    uint32 leaf_index;
    MerkleTreeSignature signed_tree;
    MerkleNode audit_path<0..2^16-1>;
} AuditProof;

[ekasper: If we want to allow other hash functions than sha2, the hash function of the audit_path can be read from the SignatureAndHashAlgorithm field in signed_tree. (We require the two hash functions to always be the same.)]

sequence_number is the segment number associated with the tree.

tree_size is the number of leaves in the tree. In the case of a SegmentInfo AuditProof, tree size equals sequence_number + 1 and is thus omitted.

leaf_index is the index of the audited node in the Merkle tree, from 0 to tree_size - 1.

audit_path is a list of additional nodes in the Merkle tree required for reconstructing the root hash. Nodes must be listed from leaf to root level, i.e., in the order they are used in the Merkle Tree Hash computation, as defined in Section 2.1.1 (Merkle audit paths). Notice that the left-right ordering is determined by the position of the leaf. The leaf node under audit as well as the root node shall be omitted from the path.

signed_tree is the signature on the tree.

Servers who send an audit proof for a certificate must also serve all intermediate certificates recorded in the certificate_list of the X509ChainEntry. Clients who verify the certificate audit proof use these certificates to reconstruct the leaf node of the Log Segment Merkle Hash Tree. The leaf node, leaf_index and audit_path together determine the root hash of the tree. A valid audit proof MUST satisfy the following:

Clients may construct a different certificate chain than the one recorded in the X509ChainEntry to verify the validity of the leaf certificate. Clients who do so must additionally check that the public key sequence of the constructed chain matches the public key sequence in the LogEntry certificate_list, with only two exceptions [agl: here we're talking about sequences of public keys, but we ordered the certificates by contents, destroying the sequence]:



4.  Efficiency Considerations

The two-tier Merkle proof design serves the purpose of keeping communication overhead low. Logs should choose an appropriate refresh rate. The log refresh period should be short enough to keep certificate issuance latency low and certificate audit proofs small. At the same time, it should be long enough to make integrity cross-checks efficient (clients should not have to query and compare segment audit proofs for new segments too often).

Auditing the log for integrity does not require third parties to maintain a copy of the entire log. The Segment Info Merkle Hash Tree root hash can be updated incrementally as new segments become available, without recomputing the entire tree. Third party auditors need only store a logarithmic number of intermediate nodes in the Segment Info Merkle Hash Tree.

Additionally, the Merkle consistency proofs defined in Section 2.1.2 (Merkle consistency proofs) can be used to efficiently prove the append-only property of an incremental update to the Segment Info Merkle Hash Tree, without auditing the entire tree. [ekasper: are logs required to provide these proofs?]



5. References

[RFC5246] Dierks, T. and E. Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.2,” RFC 5246, August 2008.


Authors' Addresses

  Ben Laurie
  
  Adam Langley
  
  Emilia Kasper