Ben Laurie blathering

Stupid: A Metalanguage For Cryptography

Various threads lately have got me thinking about implementing cryptography and cryptographic protocols. As I have mentioned before, this is hard. But obviously the task itself is the same every time, by its very nature – if I want to interoperate with others, then I must implement effectively the same algorithm as them. So why do we ever implement anything more than once? There are various reasons, varying as to their level of bogosity. Here’s a few

  • Trust: “I don’t want to trust anyone else’s code”. This, in my view is mostly bogus. If you don’t trust others to write your crypto, then you’ve got some pretty big problems on your hands…
    • You’re likely to be using some pretty heavyweight stuff like SSL and/or X.509, and reimplementing those is a seriously major job. Are you really going to do that?
    • Unless you are also a cryptographer, then you’re trusting the guys that designed the crypto you’re implementing anyway.
    • Ditto protocol desginer.
  • Languages: an implementation in Java isn’t going to work so well in Python. And although its true that you can plug C implementations into almost everything, there are legitimate and not-so-legitimate reasons for not wanting to do so…
    • You don’t trust native code: see above.
    • It makes your distribution hard to build and/or use and tricky to install.
    • You are running on a platform that doesn’t allow native code, for example, a web hosting company, or Google’s App Engine.
    • Native code has buffer overflows and MyFavouriteLanguage doesn’t: true, but probably the least of your worries, given all the other things that you can get wrong, at least if the native code is widely used and well tested.
  • Efficiency: you are not in the long tail of users who’s transactions per second is measured in fractions. In this case, you may well want specialised implementations that exploit every ounce of power in your platform.

Of these, reimplementation for efficiency clearly needs a completely hand-crafted effort. Trust issues are, in my view, largely bogus, but if you really want to go that way, be my guest. So what does that leave? People who want it in their chosen language, are quite happy to have someone else implement it and are not in need of the most efficient implementation ever. However, they would like correctness!

This line of thinking let me spend the weekend implementing a prototype of a language I call “Stupid”. The idea is to create a language that will permit the details of cryptography and cryptographic protocols to be specified unambiguously, down to the bits and bytes, and then compile that language into the language of your choice. Because we want absolute clarity, Stupid does not want to be like advanced languages, like OCaml and Haskell, or even C, where there’s all sorts of implicit type conversions and undefined behaviour going on – it wants it to be crystal clear to the programmer (or reviewer) exactly what is happening at every stage. This also aids the process of compiling into the target language, of course. So, the size of everything wants to be measured in bits, not vague things like “long” or “size_t”. Bits need to be in known places (for example, big-endian). Operations need to take known inputs and produce known outputs. Sign extension and the like do not want to happen magically. Overflow and underflow should be errors, unless you specifically stated that they were not, and so on.

To that end, I wrote just enough compiler to take as input a strawman Stupid grammar sufficient to do SHA-256, and produce various languages as output, in order to get a feel for what such a language might look like, and how hard it would be to implement.

The result is: you can do something rough in a weekend 🙂

Very rough – but it seems clear to me that proceeding down this road with more care would be very useful indeed. We could write all the cryptographic primitives in Stupid, write relatively simple language plugins for each target language and we’d have substantially improved the crypto world. So, without further ado, what does my proto-Stupid look like? Well, here’s SHA-256, slightly simplified (it only processes one block, I was going cross-eyed before I got round to multiple blocks). Note, I haven’t tested this yet, but I am confident that it implements (or can be easily extended to implement) everything needed to make it work – and the C output the first language plugin produces builds just fine with gcc -Wall -Werror. I will test it soon, and generate another language, just to prove the point. In case the code makes your eyes glaze over, see below for some comments on it…

"This code adapted from Wikipedia pseudocode";

"Note 2: All constants in this pseudo code are in big endian";

"Initialize variables";
"(first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19):";
uint32 h0 = 0x6a09e667;
uint32 h1 = 0xbb67ae85;
uint32 h2 = 0x3c6ef372;
uint32 h3 = 0xa54ff53a;
uint32 h4 = 0x510e527f;
uint32 h5 = 0x9b05688c;
uint32 h6 = 0x1f83d9ab;
uint32 h7 = 0x5be0cd19;

"Initialize table of round constants";
"(first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311):";
array(uint32, 64) k =
(0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2);

"For now, dummy in the message instead of declaring a function wrapper";
"Also, for now, allow enough room in the input for padding, etc, to simplify the loop";
uint32 message_bits = 123;
array(uint8, 64) message =
(0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde, 0xf0,
0x0f, 0xed, 0xcb, 0xa9, 0x87, 0x65, 0x43, 0x21);
uint32 pad_byte = 0;
uint32 pad_bit = 0;
uint32 tmp = 0;
uint32 tmp2 = 0;
array(uint32, 16) w = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
uint32 i = 0;
uint32 s0 = 0;
uint32 s1 = 0;
uint32 a = 0;
uint32 b = 0;
uint32 c = 0;
uint32 d = 0;
uint32 e = 0;
uint32 f = 0;
uint32 g = 0;
uint32 h = 0;
uint32 maj = 0;
uint32 t1 = 0;
uint32 t2 = 0;
uint32 ch = 0;

"append the bit '1' to the message";

"note that we're using a 32-bit length for now";
"all the op32, op8 etc are _without_ wrap (where applicable) - i.e. wrap is an error";
"they also require left and right to both be the correct type and size";
"also, we have no precedence, it is up to you to bracket things";
"rshift is with zero padding";

pad_bit = 7 minus32 (message_bits mod32 8);
pad_byte = (message_bits plus32 1) rshift32 8;
message[pad_byte] = message[pad_byte] or8 (1 lshift8 pad_bit);

"append k bits '0', where k is the minimum number >= 0 such that the
resulting message length (in bits) is congruent to 448 (mod 512)";

"eq32 and friends return a boolean value (which is not even a bit)";

if (pad_bit eq32 0) {
pad_bit = 7;
pad_byte = pad_byte plus32 1;
} else {
pad_bit = pad_bit minus32 1;

"bor is like C || (i.e. RHS is only executed if LHS is false)";

"448/8 = 56";
while (((pad_byte mod32 512) ne32 56) bor (pad_bit ne32 7)) {
message[pad_byte] = message[pad_byte] and8 (not8 (1 lshift8 pad_bit));
if (pad_bit eq32 0) {
pad_bit = 7;
pad_byte = pad_byte plus32 1;
} else {
pad_bit = pad_bit minus32 1;

"append length of message (before pre-processing), in bits, as 64-bit big-endian integer";

message[pad_byte] = 0;
message[pad_byte plus32 1] = 0;
message[pad_byte plus32 2] = 0;
message[pad_byte plus32 3] = 0;

message[pad_byte plus32 7] = mask32to8 message_bits;
tmp = message_bits rshift32 8;
message[pad_byte plus32 6] = mask32to8 message_bits;
tmp = message_bits rshift32 8;
message[pad_byte plus32 5] = mask32to8 message_bits;
tmp = message_bits rshift32 8;
message[pad_byte plus32 4] = mask32to8 message_bits;

"for each chunk (we only have one, so don't bother with the loop for now)";

" break chunk into sixteen 32-bit big-endian words w[0..15]";
tmp = 0;
while(tmp ne32 16) {
tmp2 = tmp lshift32 2;
w[tmp] = ((widen8to32 message[tmp2]) lshift32 24)
plus32 ((widen8to32 message[tmp2 plus32 1]) lshift32 16)
plus32 ((widen8to32 message[tmp2 plus32 2]) lshift32 8)
plus32 (widen8to32 message[tmp2 plus32 3]);
tmp = tmp plus32 1;

" Extend the sixteen 32-bit words into sixty-four 32-bit words";
i = 16;
while(i ne32 64) {
s0 = (w[i minus32 15] rrotate32 7) xor32 (w[i minus32 15] rrotate32 18) xor32 (w[i minus32 15] rshift32 3);
s1 = (w[i minus32 2] rrotate32 17) xor32 (w[i minus32 2] rrotate32 19) xor32 (w[i minus32 2] rshift32 10);
w[i] = w[i minus32 16] plus32 s0 plus32 w[i minus32 7] plus32 s1;

" Initialize hash value for this chunk:";

a = h0;
b = h1;
c = h2;
d = h3;
e = h4;
f = h5;
g = h6;
h = h7;

" Main loop:";

i = 0;
while(i ne32 64) {
s0 = (a rrotate32 2) xor32 (a rrotate32 13) xor32 (a rrotate32 22);
maj = (a and32 b) xor32 (a and32 c) xor32 (b and32 c);
t2 = s0 plus32 maj;
s1 = (e rrotate32 6) xor32 (e rrotate32 11) xor32 (e rrotate32 25);
ch = (e and32 f) xor32 ((not32 e) and32 g);
t1 = h plus32 s1 plus32 ch plus32 k[i] plus32 w[i];
h = g;
g = f;
f = e;
e = d plus32 t1;
d = c;
c = b;
b = a;
a = t1 plus32 t2;

" Add this chunk's hash to result so far:";

h0 = h0 plus32 a;
h1 = h1 plus32 b;
h2 = h2 plus32 c;
h3 = h3 plus32 d;
h4 = h4 plus32 e;
h5 = h5 plus32 f;
h6 = h6 plus32 g;
h7 = h7 plus32 h;

"end of outer loop (when we do it)";

"Obviously I can also do this part, but right now I am going cross-eyed";
"Produce the final hash value (big-endian):
digest = hash = h0 append h1 append h2 append h3 append h4 append h5 append h6 append h7";

Notice that every operator specifies the input and output sizes. For example plus32 means add two 32-bit numbers to get a 32-bit result, with wrap being an error (this probably means, by the way, that the last few plus32s should be plus32_with_overflow, since SHA-256 actually expects overflow for these operations). So far we only deal with unsigned quantities; some “overflows” are actually expected when dealing with negative numbers, so that would have to be specified differently. Also, I didn’t deal with the size of constants, because I wasn’t sure about a good notation, though I am leaning towards 23_8 to mean an 8-bit representation of 23 (subscripted, like in TeX).

Because Stupid really is stupid, it should be very easy to write static analysis code for it, enabling checks to be omitted sometimes – for example, the fact that we only subtract 1 from pad_bit if pad_bit is non-zero means that we would not have to check for underflow in that case.

Anyway, I’m a bit bemused after writing a lot of rather repetitive code for the compiler, so I think I’ll wait for reactions before commenting further – but it does seem to me that this is a project worth pursuing. The compiler itself, whilst somewhat crude, particularly since it doesn’t yet do most of the checks I suggest should be there, is pretty small and easily understood: less than 1,500 lines of Perl and YAPP. I won’t bore you with the details, but if you want a peek, here’s a tarball.


  1. I right now call ditto on the inevitable “I program in Stupid” t-shirt design….


    Comment by Pamela Dingle — 24 Jan 2010 @ 21:58

  2. Dibs, I meant. Dibs!!!

    Comment by Pamela Dingle — 24 Jan 2010 @ 21:59

  3. Love the unexpected type conversion into a smiley!
    Yet another hazard for language designers.

    Comment by Andrew Yeomans — 25 Jan 2010 @ 7:56

  4. Recommended reading:

    Comment by Adam Langley — 25 Jan 2010 @ 12:18

  5. This reminds me greatly of the commercial product Cryptol, which used one source language to specify cryptographic primitives that could be compiled to C, optimized for any of several CPUs, or to VHDL for FPGAs. I wonder if Galois ever released an open version…

    Looks like yes:

    Comment by Brian Sniffen — 25 Jan 2010 @ 15:05

  6. Cryptol looks interesting (if rather less low level than Stupid), but it does not appear to be open.

    Comment by Ben — 25 Jan 2010 @ 15:19

  7. in your tarball, sub Stupid::XOr32::emitCode seems to be the same definition as sub Stupid::RShift32::emitCode, with the code looking more like a shift than an xor.

    Comment by benc — 25 Jan 2010 @ 16:35

  8. Thanks for that, fixed. Also added a missing increment in the SHA-256 code.

    Comment by Ben — 25 Jan 2010 @ 17:04

  9. Re. Cryptol, the Galois Cryptol implementation is freely available, but not open source (e.g. see the download link: ).

    The language standard is open source.

    An example of Cryptol, to implement the Skein algorithm,

    Comment by Don Stewart — 26 Jan 2010 @ 4:26

  10. Don: give me a break. That’s not “freely available”. It doesn’t produce output. It is limited to 1 year. I can’t talk about it if I use it.

    Comment by Ben — 27 Jan 2010 @ 16:16

  11. This initiative is interesting, but I think the problem is more important than that, it is not only to emit code for a given cipher or cryptosystem to a target language but it is also to generate or interface with a reliable low level library providing common arithmetic operations (because you can’t trust arithmetic operations implemented by a given language), this library would likely be independent of the target language.

    Comment by seb — 26 Jan 2010 @ 12:15

  12. But iw won’t help much the guys who want to rewrite something for performance reasons — code generated by Stupid won’t have the performance they want and there are no facilities to check that given transformations of output do not change the meaning.

    I think that they’d be helped more by a C compiler which could accept some sort of hints regarding code generation (ie. so that I would be able to say f.e.: this condition is equivalent to overflow in that operation; use the overflow flag).

    Comment by Robert Obryk — 28 Jan 2010 @ 11:57

  13. Robert: I said it is not aimed at people who need performance, however, it is clear that you could write an optimiser for Stupid.

    Comment by Ben — 28 Jan 2010 @ 12:01

  14. […] can see the amusement I can derive from Stupid is going to be endless. If somewhat […]

    Pingback by Links » Stupid Haskell, Google Code — 30 Jan 2010 @ 16:40

  15. We used to have this; it was called “machine language”. Even “assembler language” was OK until we added macros to it. If I were writing crypto code I’d want to do it in assembler (without macros) or in something like Stupid.

    I’m tempted to like the idea of this initiative, but what’s nagging at the back of my mind is that the name may be too appropriate; if it facilitates the creation of crypto implementations by people who really aren’t smart enough to produce them, it’s a bad thing.

    Still, if it makes stupid mistakes by skilled implementors less likely, it’s a good thing.

    Maybe the right approach is to open source everything but the code generator (including a really slow emulator), and provide code generation as a cloud service at $10,000 per invocation, to drive off the unserious.

    Comment by Bob Blakley — 30 Jan 2010 @ 21:07

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress