Links

Ben Laurie blathering

30 Oct 2007

On Liberty

Filed under: Civil Liberties — Ben @ 15:15

In 1859, John Stuart Mill published “On Liberty”. My attention was drawn to it by a young adult of my acquaintance who had been set it for homework. It’s a fairly long essay, but it can really be summarised by this paragraph:

But there is a sphere of action in which society, as distinguished from the individual, has, if any, only an indirect interest; comprehending all that portion of a person’s life and conduct which affects only himself, or if it also affects others, only with their free, voluntary, and undeceived consent and participation. When I say only himself, I mean directly, and in the first instance: for whatever affects himself, may affect others through himself; and the objection which may be grounded on this contingency, will receive consideration in the sequel. This, then, is the appropriate region of human liberty. It comprises, first, the inward domain of consciousness; demanding liberty of conscience, in the most comprehensive sense; liberty of thought and feeling; absolute freedom of opinion and sentiment on all subjects, practical or speculative, scientific, moral, or theological. The liberty of expressing and publishing opinions may seem to fall under a different principle, since it belongs to that part of the conduct of an individual which concerns other people; but, being almost of as much importance as the liberty of thought itself, and resting in great part on the same reasons, is practically inseparable from it. Secondly, the principle requires liberty of tastes and pursuits; of framing the plan of our life to suit our own character; of doing as we like, subject to such consequences as may follow: without impediment from our fellow-creatures, so long as what we do does not harm them, even though they should think our conduct foolish, perverse, or wrong. Thirdly, from this liberty of each individual, follows the liberty, within the same limits, of combination among individuals; freedom to unite, for any purpose not involving harm to others: the persons combining being supposed to be of full age, and not forced or deceived.

In other words

  • Think what you want.
  • Write what you want.
  • Do what you want, so long as it does not harm others.
  • Be with who you want.

These seem self-evident to me. How have we managed to move so far from these basic principles?

BBC on the iPlayer

Filed under: Anonymity/Privacy,Digital Rights,Open Source — Ben @ 13:56

An interesting podcast with Ashley Highfield, Director, Future Media & Technology.

We’re not doing enough [about open source] and it is something I want to turn up the heat on

Well, that’s a good start, but he then goes on to say

The problem at the moment, there is no open source DRM. It’s almost a contradiction in terms, if you have DRM how can you have it open source? Because open source people will be able to find out how it works and get round it.

Oh, dear. Because, of course, no-one will work out how the Microsoft DRM works, just like they haven’t worked out all the other DRMs out there. Not.

In any case, this entirely misses the point: there is no DRM on the broadcast signal, nor was there on old-fashioned video tapes. Why are downloads different? Why is it not sufficient to rely on the law, as has always happened in the past? Why not assume that your users are mostly honest rather than treat them like criminals?

Clearly there’s a vast amount of money to be made by selling “DRM” solutions to gullible old media companies. It is sad that the BBC, who don’t even have to protect their profits, do not have the collective brains to see through this scam.

Perhaps there is light at the end of the tunnel?

Where do we go from here? … The solution then is to say either we look at a future beyond DRM or we’re going to find it very hard to put our content onto open source solutions.

But he is just teasing – they don’t actually look at this future, so I guess their choice is to not put their content onto open source solutions!

On eating your bandwidth

We do make people aware of it

so that’s alright then. He goes on to say

We’ve also got to … work better with the ISPs to ensure that they don’t throttle … iPlayer type content

I think he needs to add Parliament to his list of people to work better with, after the recent lunacy from Lord Triesman.

They go on to try to justify the use of DRM in terms of maintaining contact with their audience and their responsibility for the quality of the broadcasts – others could, it seems, put out crappy versions of their free stuff. But hold on, why would anyone download the crappy version when you could have the good version for free from the BBC? Not explained; I suppose it must be obvious.

But it’ll all be alright in the future broadcasting panopticon, when omniscient and omnipotent Auntie can rule, godlike, over all use of “their” content.

Once we get to that stage, where the content, wherever it goes, can have all the rules associated with how it should behave, and once it’s able to tell us who’s viewing it, where they’re viewing it … then it doesn’t really matter where the content goes

Oh goody! So if I lie back and allow total privacy rape, then kind, generous Auntie will consider relaxing DRM.

29 Oct 2007

Consultation Considered “Potentially” Harmful?

Filed under: Civil Liberties — Ben @ 14:27

Apparently the government wants to know about “risks to children from exposure to potentially harmful or inappropriate material on the internet and in video games”.

As soon as people start talking about potential harm, alarm bells start ringing. This phrase, repeated ad nauseam throughout the consultation, suggests to me a willingness to accept opinion and hearsay in lieu of hard evidence. And that, in turn, suggests to me that the conclusion of the consultation is foregone: the Internet is “potentially” harmful, as are video games, we should do more to protect vulnerable children, censorship is good, regulation is good, liberty is bad, free thinking is bad. You read it here first.

The Open Rights Group are planning to submit a response – if you’d like to help form that response, your comments are welcome.

23 Oct 2007

Aubergine and Halloumi: a Snack

Filed under: Recipes — Ben @ 15:31

Ingredients

Aubergine
Halloumi
Olive oil
Cumin seeds
Salt
Pepper

Slice the aubergine longways into slices about 1/4″ thick. Brush with olive oil, salt, pepper and lots of whole cumin seeds. Slice the halloumi into slightly thinner slices. Grill halloumi and aubergine slices until brown on both sides. Combine, one slice of each, and eat.

A nice starter if you don’t mind a staggered start – must be served hot!

Groklaw on the iPlayer

Filed under: General — Ben @ 9:37

Groklaw has an interesting interview with Mark Taylor of the Open Source Consortium about the BBC’s iPlayer. Some fascinating things I didn’t know come out in this interview

the BBC management team who are responsible for the iPlayer are a checklist of senior employees from Microsoft who were involved with Windows Media. A gentleman called Erik Huggers who’s responsible for the iPlayer project in the BBC, his immediately previous job was director at Microsoft for Europe, Middle East & Africa responsible for Windows Media. He presided over the division of Windows Media when it was the subject of the European Commission’s antitrust case. He was the senior director responsible. He’s now shown up responsible for the iPlayer project.

What a coincidence, given that Acacia have also been in the news over the employment of an ex-MSFT senior manager

Acacia Research Corporation (NASDAQ:ACTG) announced today that its Acacia Technologies group, a leader in technology licensing, has named Brad Brunell as Senior Vice President.

Mr. Brunell joins Acacia from Microsoft, where during his 16 year career he held a number of management positions, including General Manager, Intellectual Property Licensing.

But back to the iPlayer…

One of the points that they made to us was that they [the BBC Trust] basically relied upon the information conveyed to them by the BBC management team responsible for the iPlayer and that’s not something that they intend to continue doing

it’s peer-to-peer, and in fact one of the more worrying aspects is that you have no control over your node. It loads at boot time under Windows, the BBC can use as much of your bandwidth as they please (laughter), in fact I think OFCOM, you know, made some kind of estimate as to how many hundreds of millions of pounds that would cost everyone

Useful Legal Resource

Filed under: General — Ben @ 5:09

I was bemoaning the lack of an online copy of all our laws when Lilian Edwards pointed out BAILII. Good call!

12 Oct 2007

Configuring Apache httpd

Filed under: Open Source,Programming — Ben @ 11:02

(I’m sure most people just call it Apache but at least one vocal person in the ASF has always insisted we should call it Apache httpd, as opposed to, say, Apache Tomcat (which everyone calls Tomcat, anyway))

Since my work on the Bandit identity selector, I have been keen to get the other end working – that is, the server side. As Java drives me nuts, I was pleased to be reminded of the existence of an Apache module, mod_auth_infocard (sorry, “Apache Authentication Module for CardSpace”), from Ping Identity. So, I’ve been playing with it – but I haven’t finished; more on that later. Today I want to talk about configuring Apache, using it as an example.

The Apache developers (against my occasional protests) have always insisted on distributing the most awesomely revolting “default” configuration file with Apache. Distributions tend to go in for even huger ones, too. It has always been a source of great distress to me because almost none of that configuration is actually needed. The end result is that people end up with configurations that are hard to maintain, because they don’t know which bits are actually necessary for their site, and which bits are just left lying around afterwards.

So, I have always maintained that the right way to configure Apache (and pretty much any other software) is to start with no configuration and keep fixing it until it does what you want. Since I’ve just had to do exactly that for mod_auth_infocard, I thought I’d document the process, which involves a bit of magick, but mostly just reading.

First off, I want to get Apache running standalone, without the module added in. My first step is to just run it…

$ httpd
(13)Permission denied: make_sock: could not bind to address [::]:80
(13)Permission denied: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
Unable to open logs

I’m not root, so no surprise that I can’t open port 80. So, I want to write my own configuration file and put it on a different port. Just to show I make mistakes, too, my first attempt was this

$ httpd -d .
httpd: Could not open configuration file ./etc/apache22/httpd.conf: No such file or directory

Close, but no cigar. At least I can take a good guess where the default config is now :-). Next attempt

$ httpd -d `pwd` -f `pwd`/conf/httpd.conf
httpd: Could not open configuration file /home/ben/hgq/Apache_Module_for_CardSpace/www-example/conf/httpd.conf: No such file or directory

That’s because I haven’t created it yet, so I put an empty file there

$ httpd -d `pwd` -f `pwd`/conf/httpd.conf
no listening sockets available, shutting down
Unable to open logs

Progress, of a kind. I happen to know the directive to use to set the listening socket, but if I didn’t, I’d do this

$ httpd -L|more
<Directory (core.c)
Container for directives affecting resources located in the specified directories
Allowed in *.conf only outside <Directory>, <Files> or <Location>
<Location (core.c)
Container for directives affecting resources accessed through the specified URL paths
Allowed in *.conf only outside <Directory>, <Files> or <Location>
.
.
.

and so forth. I add this to conf/httpd.conf

Listen 8080

and the next run gives me

[Fri Oct 12 10:29:20 2007] [warn] (2)No such file or directory: Failed to enable the 'httpready' Accept Filter
(13)Permission denied: httpd: could not open error log file /var/log/httpd-error.log.
Unable to open logs

The first is a warning only, so I’ll ignore it. The second doesn’t say so, but is in fact fatal, so I’d better fix it. Next update to httpd.conf

ErrorLog logs/error.log

This will be relative to the server root (set with the -d flag), so I also have to create the logs directory

[Fri Oct 12 10:31:56 2007] [warn] (2)No such file or directory: Failed to enable the 'httpready' Accept Filter

is all I get this time. But, having been here before, I know I need to also look at the error log

$ cat logs/error.log
[Fri Oct 12 10:31:56 2007] [error] (13)Permission denied: could not create /var/run/httpd.pid
[Fri Oct 12 10:31:56 2007] [error] httpd: could not log pid to file /var/run/httpd.pid

again, an easy fix

PidFile run/httpd.pid

and, of course, create the run directory. Now we get (in the error log)

[Fri Oct 12 10:39:06 2007] [emerg] (2)No such file or directory: Couldn't create accept lock (/var/run/accept.lock.26590) (5)

fixed with

LockFile run/accept.lock

and now I see

[Fri Oct 12 10:40:00 2007] [notice] Apache/2.2.6 (FreeBSD) configured -- resuming normal operations

This means it’s running – I check by browsing there, and get a page

Not Found

The requested URL / was not found on this server.

with the corresponding error

[Fri Oct 12 10:40:57 2007] [error] [client 193.133.15.218] File does not exist: /www

Since I’m not currently interested in serving any documents, I won’t fix this error, but FYI you can change this directory with DocumentRoot. OK, so I have a running Apache. My next task is to get the module running, and this is where the magick comes in. First off, because the server is now working, I have to either restart it, or stop and start it each time, so I write a little script to save typing

#!/bin/sh

[ -f run/httpd.pid ] && kill `cat run/httpd.pid`
httpd -d `pwd` -f `pwd`/conf/httpd.conf
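One thing that script doesn’t do is wait for the old server to exit before starting the new one, so the fresh httpd can occasionally lose the race for port 8080. Here is a sketch of the same restart in Python that polls until the old process has gone. It makes the same assumptions as the shell version (run from the server root, httpd on the PATH, PidFile set to run/httpd.pid); the helper names are made up for illustration.

```python
import os
import signal
import subprocess
import time

PIDFILE = "run/httpd.pid"  # matches the PidFile directive in conf/httpd.conf


def read_pid(path=PIDFILE):
    """Return the pid httpd recorded, or None if there is no pid file."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None


def is_running(pid):
    """Signal 0 checks that a process exists without disturbing it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists, it just isn't ours
    return True


def wait_for_exit(pid, timeout=10.0, interval=0.1):
    """Poll until pid has gone; return False if it is still alive at timeout."""
    deadline = time.monotonic() + timeout
    while is_running(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def restart(server_root=None):
    """Stop any httpd running from this server root, then start a fresh one."""
    root = server_root or os.getcwd()
    pid = read_pid(os.path.join(root, PIDFILE))
    if pid is not None and is_running(pid):
        os.kill(pid, signal.SIGTERM)
        if not wait_for_exit(pid):
            raise RuntimeError("old httpd (pid %d) did not exit" % pid)
    conf = os.path.join(root, "conf", "httpd.conf")
    subprocess.run(["httpd", "-d", root, "-f", conf], check=True)


if __name__ == "__main__":
    restart()
```

Like the shell script, it trusts the pid file, so a stale file whose pid has been reused by something else would still get that process signalled.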

In order to load the module, I add

LoadModule auth_infocard_module ../src/.libs/mod_auth_infocard.so

obviously, your paths may vary. Now when I try a run, we’re back to a non-starting server

httpd: Syntax error on line 12 of /disk1.1/usr/home/ben/hgq/Apache_Module_for_CardSpace/www-example/conf/httpd.conf: Cannot load /disk1.1/usr/home/ben/hgq/Apache_Module_for_CardSpace/src/.libs/mod_auth_infocard.so into server: /disk1.1/usr/home/ben/hgq/Apache_Module_for_CardSpace/src/.libs/mod_auth_infocard.so: Undefined symbol "_ZTVN10__cxxabiv117__class_type_infoE"

It’s a little curious to call this a syntax error. What’s really happening is that the module references symbols from shared libraries that have not been loaded. The undefined symbol, to the seasoned programmer, is clearly a mangled C++ name (strictly, this one names a vtable rather than a function). Since it is C++, I could guess that the missing library is the standard C++ library, and indeed, adding this (before the LoadModule)

LoadFile /usr/lib/libstdc++.so

moves us onwards

httpd: Syntax error on line 12 of /disk1.1/usr/home/ben/hgq/Apache_Module_for_CardSpace/www-example/conf/httpd.conf: Cannot load /disk1.1/usr/home/ben/hgq/Apache_Module_for_CardSpace/src/.libs/mod_auth_infocard.so into server: /disk1.1/usr/home/ben/hgq/Apache_Module_for_CardSpace/src/.libs/mod_auth_infocard.so: Undefined symbol "_ZNK11xercesc_2_713XMLAttDefList14isSerializableEv"

Again, I can guess that this is from Xerces (I had to configure it when I was building the module – that gives me a clue!) … but suppose I couldn’t? Then what? c++filt to the rescue

$ c++filt _ZNK11xercesc_2_713XMLAttDefList14isSerializableEv
xercesc_2_7::XMLAttDefList::isSerializable() const
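c++filt is the right tool here, but it’s easy to see roughly what it is undoing. These names follow the Itanium C++ ABI mangling scheme: _Z starts a mangled name, N…E wraps a nested name built from length-prefixed components, K marks a const member function, a trailing v is an empty (void) parameter list, and _ZTV means “vtable for”. The toy Python decoder below understands only those pieces, which happen to cover the two symbols in this post; it is an illustration, not a general demangler, and the function names are made up.

```python
def _source_name(sym, i):
    """Read one <length><identifier> component at sym[i]; return (name, next_i)."""
    j = i
    while j < len(sym) and sym[j].isdigit():
        j += 1
    if j == i:
        raise ValueError("expected a length-prefixed name at offset %d" % i)
    length = int(sym[i:j])
    return sym[j:j + length], j + length


def _nested_name(sym, i):
    """Parse N [K] <component>+ E at sym[i]; return (components, is_const, next_i)."""
    if sym[i:i + 1] != "N":
        raise ValueError("expected a nested name at offset %d" % i)
    i += 1
    is_const = sym[i:i + 1] == "K"
    if is_const:
        i += 1
    parts = []
    while True:
        if i >= len(sym):
            raise ValueError("unterminated nested name")
        if sym[i] == "E":
            return parts, is_const, i + 1
        name, i = _source_name(sym, i)
        parts.append(name)


def toy_demangle(sym):
    """Demangle vtables and no-argument (possibly const) member functions only."""
    if sym.startswith("_ZTV"):
        parts, _, _ = _nested_name(sym, 4)
        return "vtable for " + "::".join(parts)
    if sym.startswith("_ZN"):
        parts, is_const, i = _nested_name(sym, 2)
        if sym[i:] != "v":
            raise ValueError("only (void) parameter lists are handled")
        return "::".join(parts) + "()" + (" const" if is_const else "")
    raise ValueError("unsupported symbol: " + sym)


if __name__ == "__main__":
    print(toy_demangle("_ZNK11xercesc_2_713XMLAttDefList14isSerializableEv"))
    print(toy_demangle("_ZTVN10__cxxabiv117__class_type_infoE"))
```

The first line printed matches the c++filt output above; the second shows that the earlier libstdc++ symbol is “vtable for __cxxabiv1::__class_type_info”.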

The Xerces library gets loaded

LoadFile /usr/local/lib/libxerces-c.so

The rest is more grind of the same nature (btw, if you can’t find where a symbol lives, I would recommend the judicious use of find, nm and grep). The final httpd.conf looks like this

Listen 8080
ErrorLog logs/error.log
PidFile run/httpd.pid
LockFile run/accept.lock

LoadFile /usr/lib/libstdc++.so
LoadFile /usr/local/lib/libxerces-c.so
LoadFile /usr/local/lib/libxml-security-c.so
LoadFile /usr/local/lib/libxml2.so
LoadFile /usr/lib/libssl.so

LoadModule auth_infocard_module ../src/.libs/mod_auth_infocard.so

Note that the module is loaded, but isn’t doing anything yet. That’s for another thrilling episode.
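The find/nm/grep hunt for where a symbol lives can also be scripted. A sketch in Python, assuming an nm that accepts -D (dynamic symbol table) and that the candidate libraries sit under /usr/local/lib; both function names are hypothetical. nm prints defined symbols with an address and undefined ones without, so a line with three fields and a type other than U counts as a definition.

```python
import glob
import subprocess


def defined_symbols(nm_output):
    """Names that nm lists as defined in a library.

    Defined symbols print as "<address> <type> <name>"; undefined ones
    (type U, or an undefined weak w) have no address, so only two fields.
    """
    defined = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] != "U":
            defined.add(fields[2])
    return defined


def libraries_defining(symbol, pattern="/usr/local/lib/*.so"):
    """Run `nm -D` on each library matching pattern; list those defining symbol."""
    hits = []
    for lib in sorted(glob.glob(pattern)):
        result = subprocess.run(["nm", "-D", lib], capture_output=True, text=True)
        if result.returncode == 0 and symbol in defined_symbols(result.stdout):
            hits.append(lib)
    return hits


if __name__ == "__main__":
    print(libraries_defining("_ZNK11xercesc_2_713XMLAttDefList14isSerializableEv"))
```

Each library it reports is a candidate for another LoadFile line ahead of the LoadModule.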

11 Oct 2007

Crypto Porn

Filed under: Crypto — Ben @ 21:07

Dan Bernstein makes pretty pictures of crypto.

(via the cryptography mailing list)

7 Oct 2007

rss2email, quilt and FeedBurner

Filed under: Open Source,Programming — Ben @ 17:31

(It recently occurred to me that I rarely talk about what I do best, which is write code. So, this is an experimental kind of post wherein I write in far too much detail about some piece of coding. I’d be interested to know whether people want to read this kind of stuff)

Recently, FeedBurner did something that I found irritating – they removed the author’s name from the HTML in their RSS feed and instead put it in its own field. This didn’t actually break my RSS reading setup, which uses rss2email to convert RSS to, err, email (though apparently I am not the only one affected by changes in FeedBurner’s RSS feed) but it did mean I could no longer tell who had written any particular post on, for example, BoingBoing.

Today I got annoyed enough about this to decide to do something about it. Since, of course, I am using open source tools, I can fix them. Normally the way I would proceed with this would be to compare the version I am currently running against the original source, using diff, of course, then upgrade to the latest version and apply my changes to it. Well, usually. Sometimes you have to do a make distclean or some other variant to avoid getting generated files in the diff.

This is always a slightly painful process, so over the years I have played with a couple of ways to make it less painful.

One early experiment was to use CVS vendor branches. I’ve never really got on well with this, for various reasons. Firstly, the standard advice for merging vendor changes into the main tree is to run

$ cvs checkout -jFSF:yesterday -jFSF wdiff

pretty obviously this only works if you don’t import more than once a day, though you can fix this using tags, but my main problem is that I’ve always found this command completely meaningless to me. Which is perhaps why I suffered from my other problem with this approach, which was that over time it appeared to gradually drift away from both the vendor source and my patches, in apparently random ways.

More recently, I’ve tended to just grab the tarball, unpack it, rename it (typically to <package>-ben), unpack it a second time and make my changes to the -ben version. Then when I’m done I can do

$ make clean
$ cd ..
$ diff -urN <package> <package>-ben

and presto, a patch. One snag with this scheme has always been that you end up with one monolithic patch for everything. This causes two issues. Firstly, when I want to apply the patch to a new version, it’s hard to see which changes go together, especially when they span multiple files, which makes it tricky to be sure I resolve conflicts correctly. Secondly, if I want to contribute the patches back upstream, which I often do, developers usually want patches separated by functionality, so they can review them more easily.
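For comparison, the same monolithic patch can be produced from Python with the standard difflib module. A minimal sketch that walks both trees and treats a file missing on one side as empty, roughly what diff -urN does; it ignores binary files, permissions and files without a trailing newline, and tree_patch is a made-up name.

```python
import difflib
import os


def _read_lines(path):
    """File contents as a list of lines; a missing file counts as empty (like -N)."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.readlines()


def _relative_files(root):
    """Every file under root, as paths relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def tree_patch(old_root, new_root):
    """One unified diff covering both trees, roughly like `diff -urN old new`."""
    chunks = []
    for rel in sorted(_relative_files(old_root) | _relative_files(new_root)):
        old = _read_lines(os.path.join(old_root, rel))
        new = _read_lines(os.path.join(new_root, rel))
        if old != new:
            chunks.extend(difflib.unified_diff(
                old, new,
                fromfile=os.path.join(old_root, rel),
                tofile=os.path.join(new_root, rel)))
    return "".join(chunks)


if __name__ == "__main__":
    import sys
    sys.stdout.write(tree_patch(sys.argv[1], sys.argv[2]))
```

Saved as, say, tree_patch.py and run as `python tree_patch.py <package> <package>-ben`, it still gives you exactly one big patch, which is the snag described above.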

It turns out that this is hardly a new problem, and a friend of mine recently turned me on to quilt. quilt is pretty cool. It automates the production of diffs. It has the idea of a “stack” of patches, so I can divide stuff up according to functionality, and have a patch for each, which I can apply and unapply at the drop of a hat. The patches themselves just live as, well, patchfiles, so I can send them in emails and stuff without any problems. So, for my inaugural use of quilt, I decided to attempt my rss2email upgrade using it.

Unfortunately, despite my claim above to be somewhat organised about patching software, it turns out that I didn’t actually save the original version of rss2email that I started from, and I can’t find it on the web, either. I blame rss2email’s somewhat eccentric distribution method, which doesn’t start with a tarball, but instead just hands you links to individual files. I seem to remember I had to seek some of them out first time around, too. In the end I decided to just start from scratch. I know what I want, so I just need to keep hacking until I get it.

Step one is to add the convenience script I use to run rss2email, r2e. First off, tell quilt we’re making a new patch

$ quilt new add_r2e.diff

now add the new file to the patch

$ quilt add r2e

once that’s done, I can create r2e (apparently I have to do the add before the actual creation), and get quilt to update the patch accordingly

$ quilt refresh

and if I want, take a look at it

$ quilt diff
Index: rss2email/r2e
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ rss2email/r2e 2007-10-07 11:50:44.000000000 +0100
@@ -0,0 +1,4 @@
+#!/bin/sh
+# need this line to run installed version from cron...
+#cd ~/.rss2email/
+/usr/local/bin/python rss2email.py feeds.dat $*

Next, I have a somewhat different version of config.py from the distributed one, so

$ quilt new my_config.diff
$ quilt add config.py
...edit config.py...
$ quilt refresh

an interesting thing to note here is that as I went along I wanted to make further changes to config.py even though I now had other patches stacked on top of this one. A cute feature of quilt is that you can still do that, so long as later patches don’t make conflicting changes, by making the edit, then doing

$ quilt refresh my_config.diff
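Whether that’s safe comes down to whether any patch above the one being refreshed touches the same files. A toy Python model of that check, using the first five patches this post ends up with; it is not how quilt implements it, and sharing a file only means a conflict is possible, not certain.

```python
def later_patches_touching(series, files_by_patch, patch):
    """Return the patches applied after `patch` that modify any of its files.

    series: patch names in application order, as listed in patches/series.
    files_by_patch: maps each patch name to the set of files it modifies.
    """
    if patch not in series:
        raise ValueError("%s is not in the series" % patch)
    mine = files_by_patch.get(patch, set())
    later = series[series.index(patch) + 1:]
    return [p for p in later if mine & files_by_patch.get(p, set())]


SERIES = ["add_r2e.diff", "my_config.diff", "local_config.diff",
          "testing.diff", "verbosity.diff"]
FILES = {
    "add_r2e.diff": {"r2e"},
    "my_config.diff": {"config.py"},
    "local_config.diff": {"rss2email.py", "local_config.py"},
    "testing.diff": {"Makefile"},
    "verbosity.diff": {"rss2email.py"},
}

if __name__ == "__main__":
    print(later_patches_touching(SERIES, FILES, "my_config.diff"))     # []
    print(later_patches_touching(SERIES, FILES, "local_config.diff"))  # ['verbosity.diff']
```

The empty list is why refreshing my_config.diff in place is painless; the non-empty one corresponds to the “more recent patches modify files” warning quilt prints for local_config.diff.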

If later patches do conflict, then you can either pop patches until you get back to this one, make your change, refresh, then push, resolving conflicts as you go, or create a new patch at the top of the stack that makes the change. Which I’d do would depend on whether the change fits logically in the existing patch or not. The patch isn’t very fascinating, but for completeness, here it is

$ quilt diff -P my_config.diff
Index: rss2email/config.py
===================================================================
--- rss2email.orig/config.py 2007-10-07 11:59:57.000000000 +0100
+++ rss2email/config.py 2007-10-07 13:01:33.000000000 +0100
@@ -1,5 +1,6 @@
-SMTP_SEND = 1
-SMTP_SERVER = "my.mailserver.com"
-AUTHREQUIRED = 0
-SMTP_USER="username"
-SMTP_PASS="password"
+SMTP_SEND = 0
+#SMTP_SERVER = "my.mailserver.com"
+#AUTHREQUIRED = 0
+#SMTP_USER="username"
+#SMTP_PASS="password"
+HTML_MAIL = 1

Next, I wanted to be able to make changes to the config for debugging, without having to keep different versions of the config file for “production” and debug versions. So, I decided to add a second “local config” file, called, amazingly, local_config.py.

$ quilt new local_config.diff
$ quilt add rss2email.py
$ quilt add local_config.py
... edit ...
$ quilt refresh

I’m cheating slightly here: I am anticipating my next change, which is to add more verbosity so I can see what’s going on. Here’s the output from quilt when asked to show this patch a bit later in the process

$ quilt diff -P local_config.diff
Index: rss2email/local_config.py
===================================================================
--- /dev/null   1970-01-01 00:00:00.000000000 +0000
+++ rss2email/local_config.py   2007-10-07 12:08:33.000000000 +0100
@@ -0,0 +1,2 @@
+VERBOSE = 1
+VERYVERBOSE = 1
Index: rss2email/rss2email.py
===================================================================
--- rss2email.orig/rss2email.py 2007-10-07 12:07:15.000000000 +0100
+++ rss2email/rss2email.py      2007-10-07 12:32:31.691629000 +0100
@@ -206,6 +206,12 @@
 except:
        pass
 
+# Read options from local config file, if present (useful for debugging)
+try:
+       from local_config import *
+except:
+       pass
+
 ### Import Modules ###
 
 import cPickle as pickle, md5, time, os, traceback, urllib2, sys, types
Warning: more recent patches modify files in patch local_config.diff

Note the handy warning at the end.
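The try/except idiom in that patch uses a bare except, which will also quietly swallow, say, a syntax error in local_config.py. A Python 3 sketch of the same optional-override idea that only ignores a genuinely missing module; load_overrides and its dict-based interface are hypothetical, not rss2email’s, and copying only UPPER_CASE names is a choice made here rather than what `import *` does.

```python
import importlib


def load_overrides(defaults, module_name="local_config"):
    """Return defaults updated with any UPPER_CASE settings from module_name.

    Only a missing module is ignored; a syntax error (or any other failure)
    inside the module still gets reported, unlike a bare except.
    """
    settings = dict(defaults)
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        if exc.name != module_name:
            raise  # the module exists, but something it imports does not
        return settings
    for name in dir(module):
        if name.isupper():
            settings[name] = getattr(module, name)
    return settings


if __name__ == "__main__":
    print(load_overrides({"VERBOSE": 0, "VERYVERBOSE": 0, "HTML_MAIL": 1}))
```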

I try to avoid ever having to rely on my memory (though I do still find I sometimes have to think hard to remember the name of a piece of software I only occasionally use, so I can find it on my disk again – any suggestions?), so the next thing I do is add a Makefile for testing

$ quilt new testing.diff
$ quilt add Makefile
... edit ...
$ quilt refresh

and the diff, by yet another means

$ cat patches/testing.diff 
Index: rss2email/Makefile
===================================================================
--- /dev/null   1970-01-01 00:00:00.000000000 +0000
+++ rss2email/Makefile  2007-10-07 12:20:07.000000000 +0100
@@ -0,0 +1,5 @@
+test:
+       rm -f feeds.dat
+       ./r2e new ben@links.org
+       ./r2e add http://www.boingboing.net/atom.xml
+       ./r2e run --no-send

(quilt maintains the patches/ directory for you). Finally I’m ready to do some real work! I want to know what would be sent in email, and what the parsed RSS looks like. I think you have got the hang of creating patches by now, so I’ll just show you the patch itself…

$ cat patches/verbosity.diff 
Index: rss2email/rss2email.py
===================================================================
--- rss2email.orig/rss2email.py 2007-10-07 12:32:31.691629000 +0100
+++ rss2email/rss2email.py      2007-10-07 12:48:45.000000000 +0100
@@ -430,11 +430,18 @@
 #@timelimit(FEED_TIMEOUT)
 def parse(url, etag, modified):
        if PROXY == '':
-               return feedparser.parse(url, etag, modified)
+               parsed = feedparser.parse(url, etag, modified)
        else:
                proxy = urllib2.ProxyHandler( {"http":PROXY} )
-               return feedparser.parse(url, etag, modified, handlers = [proxy])
-
+               parsed = feedparser.parse(url, etag, modified, handlers = [proxy])
+       if VERYVERBOSE:
+               import pprint
+               pp = pprint.PrettyPrinter(indent = 2)
+               for entry in parsed['entries']:
+                       print "++++++++++"
+                       pp.pprint(entry)
+                       print "++++++++++"
+       return parsed
 
 ### Program Functions ###
 
@@ -707,7 +714,17 @@
                if action == "run": 
                        if args and args[0] == "--no-send":
                                def send(sender, recipient, subject, body, contenttype, extraheaders=None, smtpserver=None):
-                                       if VERBOSE: print 'Not sending:', unu(subject)
+                                       if VERYVERBOSE:
+                                               print "From: ", sender
+                                               print "To: ", recipient
+                                               print "Subject: ", subject
+                                               print "Content-type: ", contenttype
+                                               for hdr in extraheaders.keys():
+                                                       print hdr, ": ", extraheaders[hdr]
+                                               print
+                                               print unu(body)
+                                               print "-------------------"
+                                       elif VERBOSE: print 'Not sending:', unu(subject)
 
                        if args and args[-1].isdigit(): run(int(args[-1]))
                        else: run()

Now I can see what is going on!
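rss2email’s send() is Python 2 code; on Python 3 the standard email package can render the would-be message itself, so each header need not be printed by hand. A sketch; render_dry_run is a hypothetical function, not part of rss2email, and it assumes contenttype is a text subtype such as "html" or "plain".

```python
from email.message import EmailMessage


def render_dry_run(sender, recipient, subject, body, contenttype="html",
                   extraheaders=None):
    """Return the message that would be sent, as text, without sending it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    for name, value in (extraheaders or {}).items():
        msg[name] = value
    # Sets Content-Type: text/<contenttype> plus a suitable transfer encoding.
    msg.set_content(body, subtype=contenttype)
    return msg.as_string()


if __name__ == "__main__":
    print(render_dry_run('"Boing Boing" <bozo@dev.null.invalid>', "ben@links.org",
                         "China's net cops apparently trying to block RSS",
                         "<p>China's net cops apparently trying to block RSS</p>",
                         extraheaders={"User-Agent": "rss2email"}))
```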

(At this point, I get less Popper and more Feyerabend, as I am now writing this post as I work on the code, instead of after the fact)

I can’t actually remember the changes I made to the original rss2email so, as I said, I am results-oriented here. My first complaint is that the author no longer appears in the output, and if I do a make, I can see that this is still true, even using the updated version, as this sample shows

From: "Boing Boing" <bozo@dev.null.invalid>
To: ben@links.org
Subject: China's net cops apparently trying to block RSS
Content-type: html
Date : Sun, 07 Oct 2007 12:01:44 -0000
User-Agent : rss2email

China’s net cops apparently trying to block RSS

I’ve been poring through emails and web comments from Boing Boing tv viewers today, and noticed a number of messages that read more or less like this:

Hi, I’m in mainland China, and for some reason I can’t subscribe to subscribe to Boing Boing tv’s RSS feed. — and come to think of it, I can’t subscribe to feeds for Boing Boing or Boing Boing Gadgets, either. Dude WTF?

Our RSS feeds are not broken, nor are they the only ones affected, not by a long shot. According to various reports, authorities in China are attempting to block *all* RSS feeds to keep out information that may be critical of the nation’s government. Link to item on Ars Technica.

URL: http://feeds.feedburner.com/~r/boingboing/iBag/~3/166381853/china-blocks-all-rss.html

Note that this isn’t quite exactly what was output – I removed FeedBurner’s snoopy images. More on that later. But as you can see, no mention of an author (though the output is quite a bit prettier than I’m used to). Looking at the parsed RSS feed, though, I see

{ 'author': u'Xeni Jardin',
  'content': [ { 'base': 'http://feeds.feedburner.com/boingboing/iBag',
                 'language': None,
                 'type': 'text/html',
                 .
                 .
                 .

At this point I should note that the version of rss2email I’ve been running up to now did not, as far as I can tell, in any way process this field. Also, I’ve exchanged email with BoingBoing and they say they haven’t changed anything. I conclude, therefore, that FeedBurner has, as people suspect, probably changed the format (from including the author name in the post content to only having it in the markup). However, the new version does look for author information, which it tries to include as the “From” field in the email. Here’s what it does

def getName(r, entry):
	"""Get the best name."""

	feed = r.feed
	if r.url in OVERRIDE_FROM.keys():
		return OVERRIDE_FROM[r.url]
	
	name = feed.get('title', '')

	if 'name' in entry.get('author_detail', []): # normally {} but py2.1
		if entry.author_detail.name:
			if name: name += ": "
			det=entry.author_detail.name
			try:
			    name +=  entry.author_detail.name
			except UnicodeDecodeError:
			    name +=  unicode(entry.author_detail.name, 'utf-8')

	elif 'name' in feed.get('author_detail', []):
		if feed.author_detail.name:
			if name: name += ", "
			name += feed.author_detail.name
	
	return name

which would work fine, if only there were an author_detail field!

The short answer is that this is a bug in feedparser.py (this is another really good reason for using quilt: this particular patch will have to go to someone different to get incorporated)

$ cat patches/add_author.diff 
Index: rss2email/feedparser.py
===================================================================
--- rss2email.orig/feedparser.py        2006-01-11 05:00:52.000000000 +0000
+++ rss2email/feedparser.py     2007-10-07 15:29:10.000000000 +0100
@@ -976,7 +976,10 @@
             author = context.get(key)
             if not author: return
             emailmatch = re.search(r'''(([a-zA-Z0-9\_\-\.\+]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?))''', author)
-            if not emailmatch: return
+            context.setdefault('%s_detail' % key, FeedParserDict())
+            if not emailmatch:
+                context['%s_detail' % key]['name'] = author
+                return
             email = emailmatch.group(0)
             # probably a better way to do the following, but it passes all the tests
             author = author.replace(email, '')
@@ -987,7 +990,6 @@
             if author and (author[-1] == ')'):
                 author = author[:-1]
             author = author.strip()
-            context.setdefault('%s_detail' % key, FeedParserDict())
             context['%s_detail' % key]['name'] = author
             context['%s_detail' % key]['email'] = email
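The key change is that the `_detail` dict is now created before the email check, so an author string with no email address still gets a name. A minimal Python 3 sketch of that behaviour, using plain dicts instead of FeedParserDict; `sync_author_detail` is a hypothetical name, and the parenthesis stripping is an approximation, since the diff shows only part of feedparser’s clean-up.

```python
import re

# The email pattern from the feedparser code in the patch above.
EMAIL_RE = re.compile(r'''(([a-zA-Z0-9\_\-\.\+]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?))''')

def sync_author_detail(context, key='author'):
    """Fill context['<key>_detail'] from context[key], as the patched code does."""
    author = context.get(key)
    if not author:
        return
    # The patch creates the detail dict *before* looking for an email...
    detail = context.setdefault('%s_detail' % key, {})
    match = EMAIL_RE.search(author)
    if not match:
        # ...so an author with no email address still gets a name.
        detail['name'] = author
        return
    email = match.group(0)
    # Approximate clean-up: drop the address, then any wrapping parentheses.
    name = author.replace(email, '').strip()
    if name.startswith('('):
        name = name[1:]
    if name.endswith(')'):
        name = name[:-1]
    detail['name'] = name.strip()
    detail['email'] = email

entry = {'author': 'Xeni Jardin'}
sync_author_detail(entry)
print(entry['author_detail'])  # prints {'name': 'Xeni Jardin'}
```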

and now the mail header looks like this:

From: "Boing Boing: Xeni Jardin"
To: ben@links.org
Subject: China's net cops apparently trying to block RSS
Content-type: html
Date : Sun, 07 Oct 2007 14:29:17 -0000
User-Agent : rss2email

Yay, we have an author! I even like the idea of it being in the From field. At this point I could probably stop, but something else has been irritating me: FeedBurner’s web bugs at the end of each post. So, I’m going to remove them. They look like this:

<a href="http://feeds.feedburner.com/~a/boingboing/iBag?a=GNLz23"><img src="http://feeds.feedburner.com/~a/boingboing/iBag?i=GNLz23" border="0" /></a></p><img src="http://feeds.feedburner.com/~r/boingboing/iBag/~4/166381853" height="1" width="1" />

It’s always a bit tricky removing something like this – you want to be sure you don’t accidentally remove some other similar-looking stuff. Regular expressions are the answer, of course. They are, however, a bastard to debug, since a mistake anywhere causes the whole thing to fail to match. My technique is to start at the left-hand end and extend the expression a piece at a time as I get it working. The one hint I have for Python is that prefixing a string with “r”, as in r'\w+', makes it a raw string, so the backslashes reach the regular expression intact instead of being treated as string escapes. Anyway, here’s the patch…
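Before the patch itself, here is a minimal Python 3 sketch of that left-to-right approach: split the pattern into pieces and report how many leading pieces still match, which points straight at the piece that broke. The helper `matching_prefix_length` and the split into pieces are hypothetical; the dots in the host name are escaped here (an unescaped `.` matches any character, which is harmless in this case but looser).

```python
import re

def matching_prefix_length(pieces, text):
    """Return how many leading regex pieces, joined together, still match in text."""
    matched = 0
    for k in range(1, len(pieces) + 1):
        if re.search(''.join(pieces[:k]), text) is None:
            break  # pieces[k - 1] is the first one that broke the match
        matched = k
    return matched

# The web bug quoted above.
SAMPLE = ('<a href="http://feeds.feedburner.com/~a/boingboing/iBag?a=GNLz23">'
          '<img src="http://feeds.feedburner.com/~a/boingboing/iBag?i=GNLz23" border="0" /></a></p>'
          '<img src="http://feeds.feedburner.com/~r/boingboing/iBag/~4/166381853" height="1" width="1" />')

# Raw strings: r'\w+' is the same string as '\\w+'.
PIECES = [
    r'<a href="http://feeds\.feedburner\.com/~a/[^/]+/iBag\?a=\w+">',
    r'<img src="http://feeds\.feedburner\.com/~a/[^/]+/iBag\?i=\w+" border="0" />',
    r'</a></p>',
    r'<img src="http://feeds\.feedburner\.com/~r/[^/]+/iBag/[^"]+" height="1" width="1" />',
]

print(matching_prefix_length(PIECES, SAMPLE))  # prints 4
```

If the third piece were mistyped as `</a></div>`, the function would return 2, showing that the first two pieces are fine and the third needs fixing.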

$ cat patches/remove_feedburner_webbugs.diff 
Index: rss2email/rss2email.py
===================================================================
--- rss2email.orig/rss2email.py 2007-10-07 14:51:42.000000000 +0100
+++ rss2email/rss2email.py      2007-10-07 17:03:55.000000000 +0100
@@ -58,6 +58,10 @@
 # 0: Just use the DEFAULT_FROM email instead.
 USE_PUBLISHER_EMAIL = 0
 
+# 1: Remove FeedBurner web bugs (only works if HTML_MAIL = 1)
+# 0: don't
+REMOVE_FEEDBURNER_WEB_BUGS = 1
+
 # 1: Use SMTP_SERVER to send mail.
 # 0: Call /usr/sbin/sendmail to send mail.
 SMTP_SEND = 0
@@ -297,6 +301,17 @@
 
 ### Parsing Utilities ###
 
+def maybeRemoveFeedBurnerWebBugs(html):
+       """If enabled, remove FeedBurner's web bugs from the HTML supplied"""
+
+       if not REMOVE_FEEDBURNER_WEB_BUGS:
+               return html
+
+       import re
+       return re.sub(r'<p><a href="http://feeds.feedburner.com/~a/[^/]+/iBag\?a=\w+"><img src="http://feeds.feedburner.com/~a/[^/]+/iBag\?i=\w+" border="0" /></a></p><img src="http://feeds.feedburner.com/~r/[^/]+/iBag/[^"]+" height="1" width="1" />',
+                     '', html)
+
+
 def getContent(entry, HTMLOK=0):
        """Select the best content from an entry, deHTMLizing if necessary.
        If raw HTML is best, an ('HTML', best) tuple is returned. """
@@ -321,7 +336,7 @@
        if conts:
                if HTMLOK:
                        for c in conts:
-                               if contains(c.type, 'html'): return ('HTML', c.value)
+                               if contains(c.type, 'html'): return ('HTML', maybeRemoveFeedBurnerWebBugs(c.value))
 
                if not HTMLOK: # Only need to convert to text if HTML isn't OK
                        for c in conts:

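Note that the function should hand the HTML back unchanged when the option is off; a bare `return` yields None, and getContent would then return ('HTML', None). A minimal standalone Python 3 sketch with that behaviour, where `remove_feedburner_web_bugs` and its `enabled` parameter are hypothetical stand-ins for the patch’s function and the REMOVE_FEEDBURNER_WEB_BUGS setting. The leading `<p>` is made optional because the sample as quoted above starts at `<a`, whereas the patch’s regex requires it.

```python
import re

# The three-part FeedBurner web bug, with the leading <p> optional.
WEB_BUG_RE = re.compile(
    r'(?:<p>)?'
    r'<a href="http://feeds\.feedburner\.com/~a/[^/]+/iBag\?a=\w+">'
    r'<img src="http://feeds\.feedburner\.com/~a/[^/]+/iBag\?i=\w+" border="0" /></a></p>'
    r'<img src="http://feeds\.feedburner\.com/~r/[^/]+/iBag/[^"]+" height="1" width="1" />'
)

def remove_feedburner_web_bugs(html, enabled=True):
    """Return html with FeedBurner web bugs removed, or unchanged if disabled."""
    if not enabled:
        # Return the input rather than None so callers always get HTML back.
        return html
    return WEB_BUG_RE.sub('', html)

SAMPLE = ('<a href="http://feeds.feedburner.com/~a/boingboing/iBag?a=GNLz23">'
          '<img src="http://feeds.feedburner.com/~a/boingboing/iBag?i=GNLz23" border="0" /></a></p>'
          '<img src="http://feeds.feedburner.com/~r/boingboing/iBag/~4/166381853" height="1" width="1" />')

post = '<p>Story text.</p>' + SAMPLE
print(remove_feedburner_web_bugs(post))  # prints <p>Story text.</p>
```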
If I’d done this in feedparser.py, it would probably also work for text-mode emails. I could probably be persuaded to put the patch there instead.

Anyway, now I’m done, so all that remains is some retroactive tweakery of the makefile: an install target, and a target that makes the patches available to anyone reading this. That yields a new version of the makefile patch…

$ cat patches/testing.diff 
Index: rss2email/Makefile
===================================================================
--- /dev/null   1970-01-01 00:00:00.000000000 +0000
+++ rss2email/Makefile  2007-10-07 17:16:37.000000000 +0100
@@ -0,0 +1,18 @@
+test:: newfeed
+       ./r2e run --no-send
+
+testmail:: newfeed
+       ./r2e run
+
+newfeed::
+       rm -f feeds.dat
+       ./r2e new ben@links.org
+       ./r2e add http://www.boingboing.net/atom.xml
+
+install::
+       cd ~/.rss2email && tar cvfz /tmp/r2e-backup.tgz .
+       cp *.py ~/.rss2email
+
+patches::
+       cd patches && tar cvfz /tmp/r2e-patches.tgz .
+       scp /tmp/r2e-patches.tgz sump2.links.org:files

1 Oct 2007

Ben vs. Bandit, Part 2

Filed under: Identity Management,Open Source,Programming — Ben @ 10:31

As I previously wrote, I’ve been trying to get Bandit’s identity selector to work on FreeBSD. The good news is: I succeeded (with the burning of some midnight oil and the assistance of Dale Olds). The bad news is that it is a bit fragile.

Anyway, here’s how. First off, gnome-keyring-daemon has to be running. If you are running Gnome, then no doubt there’s some cute way to do it, but I’m not (for what it’s worth, I’m currently using XFCE4 as my desktop). So, I have to start it by hand (my shell is bash, btw – if you use a csh-like shell, you’ll need to do something slightly different):

eval `gnome-keyring-daemon`
export GNOME_KEYRING_SOCKET
export GNOME_KEYRING_PID

Obviously one should script this for daily use – and, of course, you need to share the same daemon across all your shells. Next, you need digitalme, built in the last phase, somewhere on your path. Then run firefox (in such a way that it sees the keyring daemon environment variables you just set), install the XPI, if you haven’t already, and follow the instructions to test it. Or, if you want to win an iPhone, follow this process instead.
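One way to script the start-up is sketched below in Python 3. It assumes, as the `eval` above does, that gnome-keyring-daemon prints NAME=value lines and then returns, and that both it and firefox are on your PATH; `parse_keyring_env` and `launch_firefox_with_keyring` are hypothetical names. It only covers launching Firefox, not sharing one daemon across all your shells.

```python
import os
import subprocess

def parse_keyring_env(output):
    """Parse NAME=value lines, as printed by gnome-keyring-daemon, into a dict."""
    env = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or '=' not in line:
            continue  # skip blanks and anything that isn't an assignment
        name, value = line.split('=', 1)
        env[name] = value
    return env

def launch_firefox_with_keyring():
    """Start the keyring daemon, then run firefox with its variables exported."""
    out = subprocess.run(['gnome-keyring-daemon'], capture_output=True,
                         text=True, check=True).stdout
    env = dict(os.environ)
    env.update(parse_keyring_env(out))  # e.g. GNOME_KEYRING_SOCKET, GNOME_KEYRING_PID
    return subprocess.Popen(['firefox'], env=env)
```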

There are some possible pitfalls. If you run digitalme when the keyring daemon is not running, then you have shot yourself in the foot: digitalme creates its card store and expects to store some corresponding magic in a keyring it creates. If the daemon isn’t running, the store is created but the corresponding keyring is not, and it never recovers. The answer is to delete ~/.iss, where the card store lives, and start again.
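A cheap guard against that first pitfall is to check that the daemon is alive before running digitalme. A minimal Python 3 sketch, POSIX only, assuming GNOME_KEYRING_PID has been exported as above; `keyring_daemon_running` is a hypothetical name. It only proves some process with that PID exists (PIDs can be reused), not that the keyring socket actually works.

```python
import os

def keyring_daemon_running(env=None):
    """Best-effort POSIX check that the process in GNOME_KEYRING_PID is alive."""
    if env is None:
        env = os.environ
    try:
        pid = int(env.get('GNOME_KEYRING_PID', ''))
    except ValueError:
        return False  # variable missing or not a number
    if pid <= 0:
        return False  # 0 or negative would address a process group instead
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

A wrapper script could refuse to start digitalme when this returns False, rather than letting it create a card store with no matching keyring.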

A second way you can go wrong is when you create the information card: you might expect Bandit to automagically grab it and save it, but it doesn’t – you have to save the card to disk and manually import it into digitalme. I’m told this will be fixed at some point.

Thirdly, the first time you try to use the card it isn’t linked to your account, so it appears to fail. Read the screen carefully – it should be offering to set the link up.

Finally, and most annoyingly – on FreeBSD at least, gnome-keyring-daemon is weirdly flaky. It just randomly stops running, sometimes before you’ve even got Firefox started. When this happens you have no option but to restart both it and Firefox – and, if you’re unlucky, you will have caused the first of these problems and will have to delete the card store.

As you can see it’s all a bit of a hack at the moment, but at least it works. The only real issue, other than rough edges, is the flaky keyring daemon, which I am trying to debug now.
