
Ben Laurie blathering

27 Jan 2008

Open Source Is Just Economics

Filed under: General,Open Source,Programming — Ben @ 5:39

A number of conversations I have had recently indicate to me that a lot of the world still doesn’t get what’s behind open source. It’s easy: economics.

The first thing you can trivially explain is why people work on open source at all. This has been a source of a vast amount of speculation, particularly irritatingly by sociologists. Ben Hyde has a fantastic list to which I will only add the explanation I love to hate: geek pride. We do it just to show off to each other.

Nope, it’s all bollocks – the motivation is simple: by solving your common problem together, you reduce your costs. There is absolutely no point in financing five different companies to produce five different products that don’t quite do what you want – far better to tweak the open source thing to do exactly what you need (often expressed as “scratching your itch” around the ASF).

Some people whine that, because this is an option open only to geeks, open source is not really available to completely open participation. Well, kinda. If you aren’t a geek yourself, you can always hire one. What do you mean, you don’t want to spend your money on free stuff? Why not? We all spend our time on it. Time that we could convert into money, if we so chose.

So why don’t we? Because participating in the open source projects we participate in is worth more to us, in purely monetary terms, in the long run. This is why I no longer have much to do with Apache: it does what I need. I have no itch to scratch.

This leads me into the second easily explainable fact. People complain that open source projects don’t care about users. It’s true. They don’t – they care about people who are participating in the costs of producing the software. If you aren’t contributing, why would your voice matter?

Of course, you have to be careful when applying these obvious truths to what you see around you. For example, the presence of companies like Red Hat in the market complicates analysis. They have their own set of economic drivers, including the needs of their customers, which they then apply to the calculation around their participation in various projects. As the reach of open source extends, so do end users actually start to get an indirect say in what happens. But it costs them. Money.

Back in the good old days, it was so much simpler. All it cost me then was time.

25 Jan 2008

Caja, Shindig and OpenSocial

Filed under: Open Source,Programming,Security — Ben @ 23:07

It’s been a while since I wrote about Caja, but we’ve been working hard on it and it has come along in leaps and bounds, thanks to my excellent team at Google.

Today I’m very pleased to be able to point you at a test gadget container which supports Cajoling of gadgets. This is based on the open source OpenSocial container, Shindig.

Here’s the announcement, and there’s also some documentation on how to get things working with Caja. We’ve even included a couple of malicious gadgets which are defeated by Caja.

Feedback, as always, welcome.

19 Jan 2008

Deputy, Delta and Type Checking in C

Filed under: Programming,Security — Ben @ 19:28

Another thing I never write about but am very interested in is static analysis. For the non-geeks amongst my readers, static analysis is all about looking at code to see what you can figure out about it. For example, you might try to find input values that cause a buffer overflow. Or you might check to see that strings are correctly escaped before being posted to a Web page (that is, the bug that is at the heart of cross-site scripting has been avoided).

Of course static analysis is usually done by programs, perhaps with the assistance of the programmer, rather than by people, so I am always on the look out for new approaches and new software. Unfortunately, as in many areas of academia, the gap between theory and practice is rather large so I do not find myself exactly overwhelmed with choice.

So far the only thing that I’ve been even a little happy with is Coverity. This still gets it wrong about half the time, but that’s a pretty tolerable ratio given how painful a manual audit would be. In contrast, some of the other tools I have tried over the years have false positive rates well over 99.9%. Most of them just plain don’t work. Pretty much all of them are not supported. And those that are, like Coverity, cost a fortune.

If I was not a convert to the cause of static analysis, I would despair. As it is, I do occasionally feel tempted to sit down for a year or two and tackle the problem myself but sanity soon prevails and I put that idea off for another decade. So, I was happy to come across Deputy recently, after an animated thread or two on the Robust Open Source mailing list (which I am shocked to discover I have been on since it started, way back in 1998 – archives here and here).

Deputy attempts to provide type safety in C programmes. This is, of course, impossible … but it has a good attempt at it. Although ordinary programmers might not think so, to the academic type safety means enforcing things like array lengths, so our favourite C security problem, the buffer overflow, would be a thing of the past if we had typesafe programs.

Anyone who has read the code I wrote for safe stacks in OpenSSL or the module API in Apache 2.0 will know that I am a big fan of type safety in C. Both of these try to ensure that if you get confused about what type you should be using, you will get a compile-time failure. Unfortunately C provides the programmer with a plethora of ways to both deliberately and accidentally avoid any safety nets you might put out for him. The idea behind Deputy is to make it possible to do the type checking rather more rigorously. In order to allow this, you have to provide Deputy with extra clues.

The syntax is a little idiosyncratic, but generally the annotation is quite straightforward, for example

void * (DALLOC(n) malloc)(size_t n);

would tell it that malloc is a memory allocator that returns n bytes of memory. Deputy catches many errors at compile time, but those it can’t it will attempt to catch at runtime instead, by injecting extra code to make sure pointers stay within bounds, for example. I haven’t got that far, though, because my benchmark for these projects is to use them on something real, like OpenSSL. I am pleased to report, though, that Deputy has so far built several OpenSSL source files without driving me completely crazy. But more on that later.
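As a rough illustration of the runtime half of that bargain, here is a hand-written sketch of the sort of bounds check a tool could inject before a store through a pointer. The names in_bounds and checked_store are made up for this example, and this is not what Deputy actually emits; it only shows the shape of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: is index a valid position in an array of count elements? */
static int in_bounds(size_t index, size_t count)
{
    return index < count;
}

/* Hypothetical stand-in for an injected check: stop the program rather
 * than write outside the buffer. */
static void checked_store(char *buf, size_t count, size_t index, char value)
{
    assert(buf != NULL);
    if (!in_bounds(index, count))
        abort();
    buf[index] = value;
}
```

With a four-byte buffer, checked_store(buf, 4, 3, 'x') writes the last byte, while an index of 4 would abort instead of overflowing.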

In the course of using Deputy I have been reminded of two things worth mentioning in passing. One is a trick we use in OpenSSL to do type checking. If you want to ensure that something is of type T, then you can write this:

(1 ? x : (T)0)

Weird, huh? How it works is that both sides of a ? : operator must have compatible types, so if x is not of type T, then you will get a compile-time error. Very handy in macros, especially where you are abusing types heavily – for example when you are implementing a generic stack, but you want to ensure that any particular stack consists only of one type of object (see safestack.h in OpenSSL for an example).
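A minimal sketch of the trick wrapped in a macro, applied to pointer types as in the generic stack case. The names CHECKED_PTR_OF, widget, gadget, generic_id and widget_id are invented for this example and are not OpenSSL's own macros. Note two caveats: depending on the compiler and its flags, the mismatch may be reported as a warning rather than a hard error, and a void * argument passes the check, because C allows void * on one side of ? : with any object pointer on the other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: yields x unchanged, but the compiler must diagnose
 * the expression if x is not compatible with "pointer to T". */
#define CHECKED_PTR_OF(T, x) (1 ? (x) : (T *)0)

typedef struct { int id; } widget;
typedef struct { int id; } gadget;

/* A generic helper that has thrown away all type information. */
static int generic_id(const void *p)
{
    assert(p != NULL);
    return *(const int *)p; /* id is the first member of both structs */
}

/* Type-checked front end: only accepts a widget pointer. */
#define widget_id(w) generic_id(CHECKED_PTR_OF(widget, w))

/* widget w = { 42 };  widget_id(&w);   compiles cleanly
 * gadget g = { 7 };   widget_id(&g);   pointer type mismatch diagnostic */
```

The macro costs nothing at runtime: the condition is the constant 1, so the (T *)0 arm is never evaluated and exists only for the type checker.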

The other is delta. Delta is a very cute tool that cuts down a file with an “interesting” feature to a smaller one with the same feature. For example, suppose (as happened to me) I have an error that I can’t reproduce in a small example. Now what? Delta to the rescue. Today I had a problem with Deputy wanting me to add extra annotation that seemed unnecessary. Small examples of essentially the same code did not show the same issue. What to do? Delta reduced the original offending source from 2424 lines to just 18 that produce the same bug. And it did it in about 5 minutes.

For interest, here are the 18 lines

typedef unsigned int __uint32_t;
typedef __uint32_t __size_t;
typedef __size_t size_t;
void *malloc(size_t);
void *memcpy(void * , const void * , size_t);
# 77 "mem.c"
static void * (DALLOC(n) *malloc_func)(size_t n) = malloc;
static void *default_malloc_ex(size_t num, const char *file, int line) {
return malloc_func(num);
}
static void *(*malloc_ex_func)(size_t, const char *file, int line) = default_malloc_ex;
void *CRYPTO_realloc_clean(void *str, int old_len, int num, const char *file, int line) {
void *ret = ((void *)0);
ret=malloc_ex_func(num,file,line);
if(ret) {
memcpy(ret,str,old_len);
}
}

Funnily enough delta was created to assist in debugging another static analysis system, Oink. So far, I’ve never used it for anything else.
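The kind of reduction described above can be sketched as a loop that keeps throwing lines away for as long as the result is still “interesting”. This is a deliberately naive, one-line-at-a-time version for illustration, not the algorithm the real delta tool uses; reduce_lines, interesting_fn and the toy predicate mentions_both are all hypothetical names. In real use the predicate would run the compiler on the candidate file and check for the same error message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Caller-supplied test: nonzero if the candidate made of the lines with
 * keep[i] != 0 still shows the interesting behaviour. */
typedef int (*interesting_fn)(const char *const *lines,
                              const unsigned char *keep, size_t n);

/* Greedy reduction: drop each line in turn, keep the change only if the
 * candidate is still interesting, repeat until nothing more can go.
 * Returns the number of lines kept; keep[] marks which ones. */
static size_t reduce_lines(const char *const *lines, unsigned char *keep,
                           size_t n, interesting_fn still_interesting)
{
    int changed = 1;
    size_t i, kept = 0;

    assert(still_interesting != NULL);
    for (i = 0; i < n; i++)
        keep[i] = 1;

    while (changed) {
        changed = 0;
        for (i = 0; i < n; i++) {
            if (!keep[i])
                continue;
            keep[i] = 0;                 /* try without this line */
            if (still_interesting(lines, keep, n))
                changed = 1;             /* still reproduces: leave it out */
            else
                keep[i] = 1;             /* needed: put it back */
        }
    }
    for (i = 0; i < n; i++)
        kept += keep[i];
    return kept;
}

/* Toy predicate: "interesting" means the kept lines still mention both
 * malloc_func and memcpy. */
static int mentions_both(const char *const *lines,
                         const unsigned char *keep, size_t n)
{
    int has_alloc = 0, has_copy = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!keep[i])
            continue;
        if (strstr(lines[i], "malloc_func") != NULL)
            has_alloc = 1;
        if (strstr(lines[i], "memcpy") != NULL)
            has_copy = 1;
    }
    return has_alloc && has_copy;
}
```

Each trial costs one run of the predicate, so a loop like this needs at least as many compiler runs as there are lines; a practical tool would want to try discarding larger chunks before falling back to single lines.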

13 Jan 2008

Be Careful With The Social Graph

Filed under: Anonymity/Privacy,Identity Management,Security — Ben @ 19:50

Bob Blakley is concerned that if we open up the social graph, then we’ll kill social networking (if I were you I’d skip the rather complicated and irrelevant analogy he kicks off with: to mangle my friend Jennifer Granick’s oft-given advice, we should talk about the thing itself and not what it is like). His core point is that it’s not OK for Scoble to move his relationship data from one system to another because he doesn’t own that data – it is jointly owned by him and those with whom he has relationships.

Whilst I agree that it may not be OK to move such data around, I think Bob is wrong about the details. Plus he picked a terrible example: it hardly matters what Scoble did with his friends list because anyone can already see it.

And this precisely illustrates what seems most important to me: when I share social data, I do so under certain conditions, both explicit and implicit. What I care about, really, is that those conditions continue to be met. I don’t really mind who does the enforcing, so long as it is enforced. So, it seems to me that it’s OK to create the social graph, you just have to be exceedingly careful what you do with it.

This presents two, in my view, enormous technical challenges. The first is dealing with a variety of different conditions applying to different parts of the graph. Even representing what those conditions are in any usable way is a huge task but then you also need to figure out how to combine them, both when multiple conditions apply to the same piece of data (for example, because you figured it out twice in different ways) and when the combination of various pieces of data, each with its own conditions, yields something new.

Once you’ve done that you are faced with a much larger problem: working out what the implicit conditions were and enforcing those, too. The huge adverse reaction we saw to Facebook’s Beacon feature shows that such implicit conditions can be unobvious.

Anyway, the bottom line is that those in favour of the social graph tend to see it as some nodes, representing people, and edges, representing relationships. What they ignore is the vast cloud of intertwined agreements and understandings woven around all those edges and nodes. But those are absolutely vital to the social graph. Without them, as Bob says:

Opening the social graph will destroy social networks, and turn them into sterile public spaces in which formation of meaningful and intimate relationships is not possible.

So, by all means, open the social graph but do it really carefully.

One thing I’ll note in passing: it is very common, in human relationships, to reveal far more than you are supposed to – under condition that the recipient of the revelation maintains absolute secrecy about it. For example, everyone knows that Alice is bonking Bob except Alice’s husband and Bob’s wife. This is because a series of “absolute secrecy” conditions and careful thought have neatly partitioned the world with respect to this piece of information. Usually. Should a good social graph emulate this?

12 Jan 2008

Me-ville Versus The Global Village

Thanks to Adriana, I just came across an intriguing post on VRM. In it, two completely different versions of VRM are presented (he thinks he presented four, but I claim that the “vendor control” end of the spectrum is CRM, not VRM).

In Me-Ville, everything is anonymous and reputation/value-based. In the Global Village, it’s all about long-term relationships. I think this divide is interesting and sums up the differences in the approach taken by techies, like Alec Muffett and me, versus the approach the fluffier, social people, like Adriana Lukas and Doc Searls, would like to take.

Who’s right? Well, normally I’d say I am, but I’m not sure I really know in this case. But recognition is the first step towards reconciliation.

7 Jan 2008

Presence is a Privacy Problem

Filed under: Anonymity/Privacy,Crypto,Security — Ben @ 20:16

I don’t know why I’ve never written about this before. One thing that’s always bugged me about instant messaging is that I can’t choose who sees my presence and who doesn’t. As a result, I don’t advertise presence, as people who IM with me will know.

Why do I care? Mostly because I am being a purist. But the purist point is this: by my presence information I give away information that can be correlated across channels. To take Kim Cameron’s favourite example, if my alter ego LeatherBoy always comes online at the same time as me, someone who can view both alter egos can eventually make the correlation. There are other channels – for example if LeatherBoy is always online when I buy something at Amazon, then, again, one can start to entertain the notion that we are the same.

There are people I wouldn’t mind assisting in organising their time by advertising my presence to. And probably others to whom I’d like it to be fabricated. But I can’t do that. IM is broken.

I did toy with turning it on with the definition of idle turned up really high (like, after 100 minutes), but the problem there is that you can work out when I actually went idle from the time I was advertised as idle, and likewise the time I come back online. Clients don’t (currently) offer the option of being somewhat random about when they start to advertise a status change.

At least, though, I can fix that problem by modifying the client code. The selective presence problem is less tractable: the protocols do not support it.
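The client-side fix for the timing problem could be as small as adding a random amount to the idle threshold each time a status change is about to be advertised. A minimal sketch, assuming a hypothetical helper called jittered_delay that a client would call to decide how long to wait; it is not taken from any real IM client. It uses rand() for brevity, which is predictable, so a client facing a determined observer would want a better source of randomness.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: how many seconds to wait before advertising a
 * status change. Returns base_secs plus a random extra in the range
 * [0, max_jitter_secs), so the advertised time no longer reveals the
 * real one exactly. */
static unsigned jittered_delay(unsigned base_secs, unsigned max_jitter_secs)
{
    unsigned jitter = 0;
    unsigned delay;

    if (max_jitter_secs > 0)
        jitter = (unsigned)rand() % max_jitter_secs;
    delay = base_secs + jitter;
    assert(delay >= base_secs); /* no unsigned wrap-around */
    return delay;
}
```

For the 100 minute threshold mentioned above, jittered_delay(6000, 600) gives a wait of somewhere between 100 and 110 minutes. The wider the jitter window, the less an observer learns from any single status change, at the cost of presence being less useful to the people it is meant to help.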
