Links

Ben Laurie blathering

1 Nov 2007

Caja: Capability Javascript

Filed under: Capabilities,Open Source,Programming,Security — Ben @ 11:44

I’ve been running a team at Google for a while now, implementing capabilities in Javascript. Fans of this blog will remember that long ago I did a thing called CaPerl. The idea in CaPerl was to compile a slightly modified version of Perl into Perl, enforcing capability security in the process.

Caja follows a similar path, except rather than modify Javascript, we restrict it to a large subset. This means that a Caja program will run without modification on a standard Javascript interpreter – though it won’t be secure, of course! When it is compiled then, like CaPerl, the result is standard Javascript that enforces capability security. What does this mean? It means that Web apps can embed untrusted third party code without concern that it might compromise either the application’s or the user’s security.

Caja will be open source, under the Apache License. We’re still debating whether we will drop our existing code for this as a starting point, or whether we want to take a different approach, but in any case, there’s plenty to be done.

Although the site has been up for a while, I was reluctant to talk about it until there was some way for you to be involved. Now there is – we have a public mailing list. Come along, read the docs (particularly the Halloween version of the spec) and join in the discussions. I’m very excited about this project and the involvement of some world class capability experts, including Mark Miller (of E fame) who is a full-time member of the Caja development team.

12 Oct 2007

Configuring Apache httpd

Filed under: Open Source,Programming — Ben @ 11:02

(I’m sure most people just call it Apache but at least one vocal person in the ASF has always insisted we should call it Apache httpd, as opposed to, say, Apache Tomcat (which everyone calls Tomcat, anyway))

Since my work on the Bandit identity selector, I have been keen to get the other end working – that is, the server side. As Java drives me nuts, I was pleased to be reminded of the existence of an Apache module, mod_auth_infocard (sorry, “Apache Authentication Module for CardSpace”), from Ping Identity. So, I’ve been playing with it – but I haven’t finished; more on that later. Today I want to talk about configuring Apache, using it as an example.

The Apache developers (against my occasional protests) have always insisted on distributing the most awesomely revolting “default” configuration file with Apache. Distributions tend to go in for even huger ones, too. It has always been a source of great distress to me because almost none of that configuration is actually needed. The end result is that people end up with configurations that are hard to maintain, because they don’t know which bits are actually necessary for their site, and which bits are just left lying around afterwards.

So, I have always maintained that the right way to configure Apache (and pretty much any other software) is to start with no configuration and keep fixing it until it does what you want. Since I’ve just had to do exactly that for mod_auth_infocard, I thought I’d document the process, which involves a bit of magick, but mostly just reading.

First off, I want to get Apache running standalone, without the module added in. My first step is to just run it…

$ httpd
(13)Permission denied: make_sock: could not bind to address [::]:80
(13)Permission denied: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
Unable to open logs

I’m not root, so no surprise that I can’t open port 80. So, I want to write my own configuration file and put it on a different port. Just to show I make mistakes, too, my first attempt was this

$ httpd -d .
httpd: Could not open configuration file ./etc/apache22/httpd.conf: No such file or directory

Close, but no cigar. At least I can take a good guess where the default config is now :-). Next attempt

$ httpd -d `pwd` -f `pwd`/conf/httpd.conf
httpd: Could not open configuration file /home/ben/hgq/Apache_Module_for_CardSpace/www-example/conf/httpd.conf: No such file or directory

That’s because I haven’t created it yet, so, I put an empty file there

$ httpd -d `pwd` -f `pwd`/conf/httpd.conf
no listening sockets available, shutting down
Unable to open logs

Progress, of a kind. I happen to know the directive to use to set the listening socket, but if I didn’t, I’d do this

$ httpd -L|more
<Directory (core.c)
Container for directives affecting resources located in the specified directories
Allowed in *.conf only outside <Directory>, <Files> or <Location>
<Location (core.c)
Container for directives affecting resources accessed through the specified URL paths
Allowed in *.conf only outside <Directory>, <Files> or <Location>
.
.
.

and so forth. I add this to conf/httpd.conf

Listen 8080

and the next run gives me

[Fri Oct 12 10:29:20 2007] [warn] (2)No such file or directory: Failed to enable the 'httpready' Accept Filter
(13)Permission denied: httpd: could not open error log file /var/log/httpd-error.log.
Unable to open logs

The first is a warning only, so I’ll ignore it. The second doesn’t say so, but is in fact fatal, so I’d better fix it. Next update to httpd.conf

ErrorLog logs/error.log

This will be relative to the server root (set with the -d flag), so I also have to create the logs directory

[Fri Oct 12 10:31:56 2007] [warn] (2)No such file or directory: Failed to enable the 'httpready' Accept Filter

is all I get this time. But, having been here before, I know I need to also look at the error log

$ cat logs/error.log
[Fri Oct 12 10:31:56 2007] [error] (13)Permission denied: could not create /var/run/httpd.pid
[Fri Oct 12 10:31:56 2007] [error] httpd: could not log pid to file /var/run/httpd.pid

again, an easy fix

PidFile run/httpd.pid

and, of course, create the run directory. Now we get (in the error log)

[Fri Oct 12 10:39:06 2007] [emerg] (2)No such file or directory: Couldn't create accept lock (/var/run/accept.lock.26590) (5)

fixed with

LockFile run/accept.lock

and now I see

[Fri Oct 12 10:40:00 2007] [notice] Apache/2.2.6 (FreeBSD) configured -- resuming normal operations

This means it’s running – I check by browsing there, and get a page

Not Found

The requested URL / was not found on this server.

with the corresponding error

[Fri Oct 12 10:40:57 2007] [error] [client 193.133.15.218] File does not exist: /www

Since I’m not currently interested in serving any documents, I won’t fix this error – but FYI, you can change this directory with DocumentRoot. OK, so I have a running Apache. My next task is to get the module running, and this is where the magick comes in. First off, because the server is now running, I have to restart it (or stop and start it) each time I change the configuration, so I write a little script to save typing

#!/bin/sh

[ -f run/httpd.pid ] && kill `cat run/httpd.pid`
httpd -d `pwd` -f `pwd`/conf/httpd.conf

In order to load the module, I add

LoadModule auth_infocard_module ../src/.libs/mod_auth_infocard.so

obviously, your paths may vary. Now when I try a run, we’re back to a non-starting server

httpd: Syntax error on line 12 of /disk1.1/usr/home/ben/hgq/Apache_Module_for_CardSpace/www-example/conf/httpd.conf: Cannot load /disk1.1/usr/home/ben/hgq/Apache_Module_for_CardSpace/src/.libs/mod_auth_infocard.so into server: /disk1.1/usr/home/ben/hgq/Apache_Module_for_CardSpace/src/.libs/mod_auth_infocard.so: Undefined symbol "_ZTVN10__cxxabiv117__class_type_infoE"

It’s a little curious to call this a syntax error. What’s really happening is that the module references some dynamic libraries that have not been loaded. The undefined symbol, to the seasoned programmer, is clearly a C++ mangled function name. Since it is C++, I could guess that the missing library is the standard C++ library, and indeed, adding this (before the LoadModule)

LoadFile /usr/lib/libstdc++.so

moves us onwards

httpd: Syntax error on line 12 of /disk1.1/usr/home/ben/hgq/Apache_Module_for_CardSpace/www-example/conf/httpd.conf: Cannot load /disk1.1/usr/home/ben/hgq/Apache_Module_for_CardSpace/src/.libs/mod_auth_infocard.so into server: /disk1.1/usr/home/ben/hgq/Apache_Module_for_CardSpace/src/.libs/mod_auth_infocard.so: Undefined symbol "_ZNK11xercesc_2_713XMLAttDefList14isSerializableEv"

Again, I can guess that this is from Xerces (I had to configure it when I was building the module – that gives me a clue!) … but suppose I couldn’t? Then what? c++filt to the rescue

$ c++filt _ZNK11xercesc_2_713XMLAttDefList14isSerializableEv
xercesc_2_7::XMLAttDefList::isSerializable() const

The Xerces library gets loaded

LoadFile /usr/local/lib/libxerces-c.so

The rest is more grind of the same nature (btw, if you can’t find where a symbol lives, I would recommend the judicious use of find, nm and grep). The final httpd.conf looks like this

Listen 8080
ErrorLog logs/error.log
PidFile run/httpd.pid
LockFile run/accept.lock

LoadFile /usr/lib/libstdc++.so
LoadFile /usr/local/lib/libxerces-c.so
LoadFile /usr/local/lib/libxml-security-c.so
LoadFile /usr/local/lib/libxml2.so
LoadFile /usr/lib/libssl.so

LoadModule auth_infocard_module ../src/.libs/mod_auth_infocard.so

Note that the module is loaded, but isn’t doing anything yet. That’s for another thrilling episode.
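For the record, the find/nm/grep hunt I recommended above can be sketched like this (the symbol and search paths are illustrative; adjust them for your system):

```shell
# Look for the shared library that defines a given (mangled) symbol.
# Symbol and paths below are examples, not fixed values.
SYM=_ZNK11xercesc_2_713XMLAttDefList14isSerializableEv
find /usr/lib /usr/local/lib -name 'lib*.so*' 2>/dev/null |
while read -r lib; do
    # nm -D lists dynamic symbols; "T" marks a defined code symbol
    nm -D "$lib" 2>/dev/null | grep -q "T $SYM" && echo "$lib"
done
```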

7 Oct 2007

rss2email, quilt and FeedBurner

Filed under: Open Source,Programming — Ben @ 17:31

(It recently occurred to me that I rarely talk about what I do best, which is write code. So, this is an experimental kind of post wherein I write in far too much detail about some piece of coding. I’d be interested to know whether people want to read this kind of stuff)

Recently, FeedBurner did something that I found irritating – they removed the author’s name from the HTML in their RSS feed and instead put it in its own field. This didn’t actually break my RSS reading setup, which uses rss2email to convert RSS to, err, email (though apparently I am not the only one affected by changes in FeedBurner’s RSS feed) but it did mean I could no longer tell who had written any particular post on, for example, BoingBoing.

Today I got annoyed enough about this to decide to do something about it. Since, of course, I am using open source tools, I can fix them. Normally the way I would proceed with this would be to compare the version I am currently running against the original source, using diff, of course, then upgrade to the latest version and apply my changes to it.

This is always a slightly painful process, so over the years I have played with a couple of ways to make it less painful. Well, usually. Sometimes you have to do a make distclean or some other variant to avoid getting generated files in the diff.

One early experiment was to use CVS vendor branches. I’ve never really got on well with this, for various reasons. Firstly, the standard advice for merging vendor changes into the main tree is to run

$ cvs checkout -jFSF:yesterday -jFSF wdiff

pretty obviously this only works if you don’t import more than once a day (though you can fix that using tags), but my main problem is that I’ve always found this command completely opaque. Which is perhaps why I suffered from my other problem with this approach: over time the tree appeared to drift gradually away from both the vendor source and my patches, in apparently random ways.

More recently, I’ve tended to just grab the tarball, unpack it, rename it (typically to <package>-ben) unpack it a second time and make my changes to the -ben version. Then when I’m done I can do

$ make clean
$ cd ..
$ diff -urN <package> <package-ben>

and presto, a patch. One snag with this scheme has always been that you end up with one monolithic patch for everything. This causes two issues: firstly, when I want to apply the patch to a new version, it’s hard to see which changes go together, especially when they span multiple files, so it can get tricky to resolve conflicts correctly. Secondly, if I want to contribute the patches back upstream, which I often do, developers usually want patches separated by functionality, so they can review them more easily.

It turns out that this is hardly a new problem, and a friend of mine recently turned me on to quilt. quilt is pretty cool. It automates the production of diffs. It has the idea of a “stack” of patches, so I can divide stuff up according to functionality, and have a patch for each, which I can apply and unapply at the drop of a hat. The patches themselves just live as, well, patchfiles, so I can send them in emails and stuff without any problems. So, for my inaugural use of quilt, I decided to attempt my rss2email upgrade using it.

Unfortunately, despite my claim above to be somewhat organised about patching software, it turns out that I didn’t actually save the original version of rss2email that I started from, and I can’t find it on the web, either. I blame rss2email‘s somewhat eccentric distribution method, which doesn’t start with a tarball, but instead just hands you links to individual files. I seem to remember I had to seek some of them out first time around, too. In the end I decided to just start from scratch. I know what I want, so I just need to keep hacking until I get it.

Step one is to add the convenience script I use to run rss2email, r2e. First off, tell quilt we’re making a new patch

$ quilt new add_r2e.diff

now add the new file to the patch

$ quilt add r2e

once that’s done, I can create r2e (apparently I have to do the add before the actual creation), and get quilt to update the patch accordingly

$ quilt refresh

and if I want, take a look at it

$ quilt diff
Index: rss2email/r2e
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ rss2email/r2e 2007-10-07 11:50:44.000000000 +0100
@@ -0,0 +1,4 @@
+#!/bin/sh
+# need this line to run installed version from cron...
+#cd ~/.rss2email/
+/usr/local/bin/python rss2email.py feeds.dat $*

Next, I have a somewhat different version of config.py from the distributed one, so

$ quilt new my_config.diff
$ quilt add config.py
...edit config.py...
$ quilt refresh

an interesting thing to note here is that as I went along I wanted to make further changes to config.py even though I now had other patches stacked on top of this one. A cute feature of quilt is that you can still do that, so long as later patches don’t make conflicting changes, by making the edit, then doing

$ quilt refresh my_config.diff

If later patches do conflict, then you can either pop patches until you get back to this one, make your change, refresh, then push, resolving conflicts as you go, or create a new patch at the top of the stack that makes the change. Which I’d do would depend on whether the change fits logically in the existing patch or not. The patch isn’t very fascinating, but for completeness, here it is

$ quilt diff -P my_config.diff
Index: rss2email/config.py
===================================================================
--- rss2email.orig/config.py 2007-10-07 11:59:57.000000000 +0100
+++ rss2email/config.py 2007-10-07 13:01:33.000000000 +0100
@@ -1,5 +1,6 @@
-SMTP_SEND = 1
-SMTP_SERVER = "my.mailserver.com"
-AUTHREQUIRED = 0
-SMTP_USER="username"
-SMTP_PASS="password"
+SMTP_SEND = 0
+#SMTP_SERVER = "my.mailserver.com"
+#AUTHREQUIRED = 0
+#SMTP_USER="username"
+#SMTP_PASS="password"
+HTML_MAIL = 1
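The pop/edit/refresh/push dance for the conflicting case goes something like this (schematic; the patch name is the one from this post, and you’ll obviously need quilt installed):

```shell
# Unwind the stack down to the patch that owns the change...
quilt pop my_config.diff
# ...make the edit in the working tree (edit config.py here)...
# ...fold it into the current patch...
quilt refresh
# ...then reapply everything above it, fixing any rejects as you go.
quilt push -a
```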

Next, I wanted to be able to make changes to the config for debugging, without having to keep different versions of the config file for “production” and debug versions. So, I decided to add a second “local config” file, called, amazingly, local_config.py.

$ quilt new local_config.diff
$ quilt add rss2email.py
$ quilt add local_config.py
... edit ...
$ quilt refresh

Slightly cheating here, I am anticipating my next change, which is to add more verbosity, so I can see what’s going on. Here’s the output from quilt when asked to show this patch a bit later in the process

$ quilt diff -P local_config.diff
Index: rss2email/local_config.py
===================================================================
--- /dev/null   1970-01-01 00:00:00.000000000 +0000
+++ rss2email/local_config.py   2007-10-07 12:08:33.000000000 +0100
@@ -0,0 +1,2 @@
+VERBOSE = 1
+VERYVERBOSE = 1
Index: rss2email/rss2email.py
===================================================================
--- rss2email.orig/rss2email.py 2007-10-07 12:07:15.000000000 +0100
+++ rss2email/rss2email.py      2007-10-07 12:32:31.691629000 +0100
@@ -206,6 +206,12 @@
 except:
        pass
 
+# Read options from local config file, if present (useful for debugging)
+try:
+       from local_config import *
+except:
+       pass
+
 ### Import Modules ###
 
 import cPickle as pickle, md5, time, os, traceback, urllib2, sys, types
Warning: more recent patches modify files in patch local_config.diff

Note the handy warning at the end.

I try to avoid ever having to rely on my memory (though I do still find I sometimes have to think hard to remember the name of a piece of software I only occasionally use, so I can find it on my disk again – any suggestions?), so the next thing I do is add a Makefile for testing

$ quilt new testing.diff
$ quilt add Makefile
... edit ...
$ quilt refresh

and the diff, by yet another means

$ cat patches/testing.diff 
Index: rss2email/Makefile
===================================================================
--- /dev/null   1970-01-01 00:00:00.000000000 +0000
+++ rss2email/Makefile  2007-10-07 12:20:07.000000000 +0100
@@ -0,0 +1,5 @@
+test:
+       rm -f feeds.dat
+       ./r2e new ben@links.org
+       ./r2e add http://www.boingboing.net/atom.xml
+       ./r2e run --no-send

(quilt maintains the patches/ directory for you). Finally I’m ready to do some real work! I want to know what would be sent in email, and what the parsed RSS looks like. I think you have got the hang of creating patches by now, so I’ll just show you the patch itself…

$ cat patches/verbosity.diff 
Index: rss2email/rss2email.py
===================================================================
--- rss2email.orig/rss2email.py 2007-10-07 12:32:31.691629000 +0100
+++ rss2email/rss2email.py      2007-10-07 12:48:45.000000000 +0100
@@ -430,11 +430,18 @@
 #@timelimit(FEED_TIMEOUT)
 def parse(url, etag, modified):
        if PROXY == '':
-               return feedparser.parse(url, etag, modified)
+               parsed = feedparser.parse(url, etag, modified)
        else:
                proxy = urllib2.ProxyHandler( {"http":PROXY} )
-               return feedparser.parse(url, etag, modified, handlers = [proxy])
-
+               parsed = feedparser.parse(url, etag, modified, handlers = [proxy])
+       if VERYVERBOSE:
+               import pprint
+               pp = pprint.PrettyPrinter(indent = 2)
+               for entry in parsed['entries']:
+                       print "++++++++++"
+                       pp.pprint(entry)
+                       print "++++++++++"
+       return parsed
 
 ### Program Functions ###
 
@@ -707,7 +714,17 @@
                if action == "run": 
                        if args and args[0] == "--no-send":
                                def send(sender, recipient, subject, body, contenttype, extraheaders=None, smtpserver=None):
-                                       if VERBOSE: print 'Not sending:', unu(subject)
+                                       if VERYVERBOSE:
+                                               print "From: ", sender
+                                               print "To: ", recipient
+                                               print "Subject: ", subject
+                                               print "Content-type: ", contenttype
+                                               for hdr in extraheaders.keys():
+                                                       print hdr, ": ", extraheaders[hdr]
+                                               print
+                                               print unu(body)
+                                               print "-------------------"
+                                       elif VERBOSE: print 'Not sending:', unu(subject)
 
                        if args and args[-1].isdigit(): run(int(args[-1]))
                        else: run()

Now I can see what is going on!

(At this point, I get less Popper and more Feyerabend, as I am now writing this post as I work on the code, instead of after the fact)

I can’t actually remember the changes I made to the original rss2email so, as I said, I am results-oriented here. My first complaint is that the author no longer appears in the output, and if I do a make, I can see that this is still true, even using the updated version, as this sample shows

From: "Boing Boing" <bozo@dev.null.invalid>
To: ben@links.org
Subject: China's net cops apparently trying to block RSS
Content-type: html
Date : Sun, 07 Oct 2007 12:01:44 -0000
User-Agent : rss2email

China’s net cops apparently trying to block RSS

I’ve been poring through emails and web comments from Boing Boing tv viewers today, and noticed a number of messages that read more or less like this:

Hi, I’m in mainland China, and for some reason I can’t subscribe to Boing Boing tv‘s RSS feed. — and come to think of it, I can’t subscribe to feeds for Boing Boing or Boing Boing Gadgets, either. Dude WTF?

Our RSS feeds are not broken, nor are they the only ones affected, not by a long shot. According to various reports, authorities in China are attempting to block *all* RSS feeds to keep out information that may be critical of the nation’s government. Link to item on Ars Technica.

URL: http://feeds.feedburner.com/~r/boingboing/iBag/~3/166381853/china-blocks-all-rss.html

Note that this isn’t quite what was output – I removed FeedBurner’s snoopy images. More on that later. But as you can see, no mention of an author (though the output is quite a bit prettier than I’m used to). Looking at the parsed RSS feed, though, I see

{ 'author': u'Xeni Jardin',
  'content': [ { 'base': 'http://feeds.feedburner.com/boingboing/iBag',
                 'language': None,
                 'type': 'text/html',
                 .
                 .
                 .

At this point I should note that the version of rss2email I’ve been running up to now did not, as far as I can tell, process this field in any way. Also, I’ve exchanged email with BoingBoing and they say they haven’t changed anything. I conclude, therefore, that FeedBurner has, as people suspect, probably changed the format (from including the author in the post content to only having it in its own field). However, the new version does look for author information, which it tries to include in the “From” field of the email. Here’s what it does

def getName(r, entry):
	"""Get the best name."""

	feed = r.feed
	if r.url in OVERRIDE_FROM.keys():
		return OVERRIDE_FROM[r.url]
	
	name = feed.get('title', '')

	if 'name' in entry.get('author_detail', []): # normally {} but py2.1
		if entry.author_detail.name:
			if name: name += ": "
			det=entry.author_detail.name
			try:
			    name +=  entry.author_detail.name
			except UnicodeDecodeError:
			    name +=  unicode(entry.author_detail.name, 'utf-8')

	elif 'name' in feed.get('author_detail', []):
		if feed.author_detail.name:
			if name: name += ", "
			name += feed.author_detail.name
	
	return name

which would work fine, if only there were an author_detail field!

The short answer is that this is a bug in feedparser.py (another really good reason for using quilt: this particular patch will have to go to someone different to get incorporated)

$ cat patches/add_author.diff 
Index: rss2email/feedparser.py
===================================================================
--- rss2email.orig/feedparser.py        2006-01-11 05:00:52.000000000 +0000
+++ rss2email/feedparser.py     2007-10-07 15:29:10.000000000 +0100
@@ -976,7 +976,10 @@
             author = context.get(key)
             if not author: return
             emailmatch = re.search(r'''(([a-zA-Z0-9\_\-\.\+]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?))''', author)
-            if not emailmatch: return
+            context.setdefault('%s_detail' % key, FeedParserDict())
+            if not emailmatch:
+                context['%s_detail' % key]['name'] = author
+                return
             email = emailmatch.group(0)
             # probably a better way to do the following, but it passes all the tests
             author = author.replace(email, '')
@@ -987,7 +990,6 @@
             if author and (author[-1] == ')'):
                 author = author[:-1]
             author = author.strip()
-            context.setdefault('%s_detail' % key, FeedParserDict())
             context['%s_detail' % key]['name'] = author
             context['%s_detail' % key]['email'] = email

and now the mail header looks like this

From: "Boing Boing: Xeni Jardin"
To: ben@links.org
Subject: China's net cops apparently trying to block RSS
Content-type: html
Date : Sun, 07 Oct 2007 14:29:17 -0000
User-Agent : rss2email

Yay, we have an author! I even like the idea of it being in the from field. At this point I could probably stop but another thing has been irritating me, and that’s FeedBurner’s web bugs at the end of each post. So, I’m going to remove them. They look like this

<a href="http://feeds.feedburner.com/~a/boingboing/iBag?a=GNLz23"><img src="http://feeds.feedburner.com/~a/boingboing/iBag?i=GNLz23" border="0" /></a></p><img src="http://feeds.feedburner.com/~r/boingboing/iBag/~4/166381853" height="1" width="1" />

It’s always a bit tricky removing something like this – you want to be sure you don’t accidentally remove some other similar-looking stuff. Regular expressions are the answer, of course. They are, however, a bastard to debug, since a mistake anywhere causes the whole thing to not match. My technique is to start at the left-hand end and extend the expression a piece at a time as I get it working. The one hint I have for python is that using an “r” like this, r'\w+', preserves backslashes. Anyway, here’s the patch…

$ cat patches/remove_feedburner_webbugs.diff 
Index: rss2email/rss2email.py
===================================================================
--- rss2email.orig/rss2email.py 2007-10-07 14:51:42.000000000 +0100
+++ rss2email/rss2email.py      2007-10-07 17:03:55.000000000 +0100
@@ -58,6 +58,10 @@
 # 0: Just use the DEFAULT_FROM email instead.
 USE_PUBLISHER_EMAIL = 0
 
+# 1: Remove FeedBurner web bugs (only works if HTML_MAIL = 1)
+# 0: don't
+REMOVE_FEEDBURNER_WEB_BUGS = 1
+
 # 1: Use SMTP_SERVER to send mail.
 # 0: Call /usr/sbin/sendmail to send mail.
 SMTP_SEND = 0
@@ -297,6 +301,17 @@
 
 ### Parsing Utilities ###
 
+def maybeRemoveFeedBurnerWebBugs(html):
+       """If enabled, remove FeedBurner's web bugs from the HTML supplied"""
+
+       if not REMOVE_FEEDBURNER_WEB_BUGS:
+               return html
+
+       import re
+       return re.sub(r'<p><a href="http://feeds.feedburner.com/~a/[^/]+/iBag\?a=\w+"><img src="http://feeds.feedburner.com/~a/[^/]+/iBag\?i=\w+" border="0" /></a></p><img src="http://feeds.feedburner.com/~r/[^/]+/iBag/[^"]+" height="1" width="1" />',
+                     '', html)
+
+
 def getContent(entry, HTMLOK=0):
        """Select the best content from an entry, deHTMLizing if necessary.
        If raw HTML is best, an ('HTML', best) tuple is returned. """
@@ -321,7 +336,7 @@
        if conts:
                if HTMLOK:
                        for c in conts:
-                               if contains(c.type, 'html'): return ('HTML', c.value)
+                               if contains(c.type, 'html'): return ('HTML', maybeRemoveFeedBurnerWebBugs(c.value))
 
                if not HTMLOK: # Only need to convert to text if HTML isn't OK
                        for c in conts:
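As an aside, the grow-it-from-the-left technique looks like this in practice (the sample HTML and pattern fragments below are simplified stand-ins, not the ones from the patch):

```python
import re

# Cut-down stand-in for FeedBurner's tracking image (illustrative only).
html = '<img src="http://feeds.feedburner.com/~r/x/~4/123" height="1" width="1" />'

# Extend the pattern one piece at a time; each stage should still match.
# The r'' prefix keeps backslashes from being eaten by Python's lexer.
stages = [
    r'<img src="http://feeds\.feedburner\.com',
    r'<img src="http://feeds\.feedburner\.com/~r/[^"]+"',
    r'<img src="http://feeds\.feedburner\.com/~r/[^"]+" height="1" width="1" />',
]
for pat in stages:
    print(pat, '=>', bool(re.search(pat, html)))
```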

If I’d done this in feedparser.py then it would probably also work for text-mode emails. I could be persuaded to put the patch there instead.

Anyway, now I’m done, so some retroactive tweakery of the makefile, to include an install target, and also to make sure that the patches are available to anyone reading this, yielding a new version of the makefile patch…

$ cat patches/testing.diff 
Index: rss2email/Makefile
===================================================================
--- /dev/null   1970-01-01 00:00:00.000000000 +0000
+++ rss2email/Makefile  2007-10-07 17:16:37.000000000 +0100
@@ -0,0 +1,18 @@
+test:: newfeed
+       ./r2e run --no-send
+
+testmail:: newfeed
+       ./r2e run
+
+newfeed::
+       rm -f feeds.dat
+       ./r2e new ben@links.org
+       ./r2e add http://www.boingboing.net/atom.xml
+
+install::
+       cd ~/.rss2email && tar cvfz /tmp/r2e-backup.tgz .
+       cp *.py ~/.rss2email
+
+patches::
+       cd patches && tar cvfz /tmp/r2e-patches.tgz .
+       scp /tmp/r2e-patches.tgz sump2.links.org:files

1 Oct 2007

Ben vs. Bandit, Part 2

Filed under: Identity Management,Open Source,Programming — Ben @ 10:31

As I previously wrote, I’ve been trying to get Bandit’s identity selector to work on FreeBSD. The good news is: I succeeded (with the burning of some midnight oil and the assistance of Dale Olds). The bad news is that it is a bit fragile.

Anyway, here’s how. First off, gnome-keyring-daemon has to be running. If you are running Gnome, then no doubt there’s some cute way to do it, but I’m not (for what it’s worth, I’m currently using XFCE4 as my desktop). So, I have to start it by hand (my shell is bash, btw – if you use a csh-like shell, you’ll need to do something slightly different)

eval `gnome-keyring-daemon`
export GNOME_KEYRING_SOCKET
export GNOME_KEYRING_PID

Obviously one should script this for daily use – and, of course, you need to share the same daemon across all your shells. Next, you need digitalme, built in the last phase, somewhere on your path. Then run firefox (in such a way that it sees the keyring daemon environment variables you just set), install the XPI, if you haven’t already, and follow the instructions to test it. Or, if you want to win an iPhone, follow this process instead.
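Such a script might look like this (the cache-file location is my invention; the point is that every shell sources the same daemon’s variables):

```shell
# Start gnome-keyring-daemon once, cache the KEY=value lines it prints,
# and source that cache from every shell.
# ~/.keyring-env is a made-up location; put it wherever you like.
CACHE="$HOME/.keyring-env"
if [ ! -s "$CACHE" ]; then
    gnome-keyring-daemon > "$CACHE"
fi
. "$CACHE"
export GNOME_KEYRING_SOCKET GNOME_KEYRING_PID
```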

There are some possible pitfalls. If you run digitalme when the keyring daemon is not running, then you have shot yourself in the foot: digitalme creates its card store and expects to store some corresponding magic in a keyring it creates. If the daemon isn’t running, the store is created but the corresponding keyring is not, and it never recovers. The answer is to delete ~/.iss, where the card store lives, and start again.

A second way you can go wrong is when you create the information card: you might expect Bandit to automagically grab it and save it, but it doesn’t – you have to save the card to disk and manually import it into digitalme. I’m told this will be fixed at some point.

Thirdly, the first time you try to use the card it isn’t linked to your account, so it appears to fail. Read the screen carefully – it should be offering to set the link up.

Finally, and most annoyingly – on FreeBSD at least, gnome-keyring-daemon is weirdly flaky. It just randomly stops running, sometimes before you’ve even got Firefox started. When this happens you have no option but to restart both it and Firefox – and, if you’re unlucky, you will have triggered the first of these problems and will have to delete the card store.

As you can see it’s all a bit of a hack at the moment, but at least it works. The only real issue, other than rough edges, is the flaky keyring daemon, which I am trying to debug now.

29 Sep 2007

More on Cardspace and Passport, or, A Day in the Life of an Open Source Developer

Filed under: Identity Management,Open Source,Programming — Ben @ 19:02

Dale Olds is surprised. It seems mean to leave him in this state, though it is somewhat ironic that an open source project should be choosing a thoroughly closed phone as a prize – so closed you can’t even install closed source add-ons. I’d rather have an N95, to be honest.

So, the first thing I should say is that I used the word “consumer” rather ill-advisedly. I blame OAuth, which I have been working on recently – it uses “consumer” for one of the roles in the protocol, so the word is on my mind. What I should have said was that there are few relying parties for OpenID of any significance (at least that are prepared to rely on anyone but themselves).

But OK, that aside, let’s see if I can win this phone! First off, Dale says I should read a press release. Yep, OK, Novell want us to be more aware of information cards. They also want us to know that we can do the whole thing with open source. This is, of course, fantastic. So, let’s have at it.

First off, I’m sent to the “Bandit Cards” home page. Apparently I can win an iPhone by merely getting hold of a Bandit Card – I’ll be entered into a draw. Hmm, shame, means I’m relying on luck and not my 1337 h4x0r sk1llz. OK, so I follow the link to create an account.

Bandit Create Account Page

OK, so let’s download one of those things.

Bandit Download Page

Hmm. No FreeBSD there, but that’s OK, this is open source. Surely I can build it. After a bit of poking around, I find a download page, from which I can retrieve a source RPM. Now, FreeBSD doesn’t understand RPMs out of the box, but it seems there’s a converter, so one quick portinstall rpm2cpio and a little bit of futzing later, I should be good to go…
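The conversion itself is a one-liner; here's a sketch (the filename is inferred from the build directory below, so treat it as illustrative):

```shell
#!/bin/sh
# Unpack a source RPM without the rpm tools: rpm2cpio strips the RPM
# header and emits the payload as a cpio archive, which cpio extracts.
unpack_srpm() {
    rpm2cpio "$1" | cpio -idmv
}

# e.g.: unpack_srpm digitalme-0.3.846.src.rpm
```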

[ben@euphrates ~/software/unpacked/digitalme-0.3.846]
$ ./configure
cmake: not found

Not come across cmake before, but FreeBSD’s ports system is at hand, as usual, and happily installs it for me. There, sadly, the fun appears to end:

-- Release DigitalMe build.
CMake Error: Command "/usr/local/bin/svn info /home/ben/software/unpacked/digitalme-0.3.846" failed with output:
svn: '/home/ben/software/unpacked/digitalme-0.3.846' is not a working copy

Well, quite so, it is not a working copy, because it is an RPM! However, a bit of poking suggests that this error is not as fatal as it seems – though a later error is

-- Unable to find GLIB_CONFIG_INCLUDE_DIR
-- Could not find GLib
-- Gnome Keyring not found.
CMake Error: Unable to find a secret store provider.
-- Configuring done

Is it just me, or is this rather misleading? The configuration appears to have failed, since there are no Makefiles, but it completes as if all was well. In any case, this is beginning to get a bit painful, but once more, after a bit of futzing (in CMakeModules/FindGLib.cmake and CMakeModules/FindGnomeKeyring.cmake) I manage to get it to find Glib and Gnome Keyring and we move on to the next problem

-- Looking for GTK2 ...
-- Unable to find GTK2_gdk_CONFIG_INCLUDE_DIR
CMake Error: Could not find GTK2
-- Configuring done

I’m beginning to get the hang of this – dealt with in seconds. And finally the ./configure completes without error. But still no Makefiles. Yet more poking suggests that I really should be running

./configure --debug-output

if I really want to know what’s going on. And what’s going on is this:

The end of a CMakeLists file was reached with an IF statement that was not closed properly.
Within the directory: /home/ben/software/unpacked/digitalme-0.3.846
The arguments are: NOT ${Subversion_svn_info_result} EQUAL 0
-- Configuring done

Nice. An error you only find out about if debugging is on. OK, so this exhausts my cmake-fu. Can’t figure out how to fix this one. But I am not daunted – I do what every open source developer would do – go to the bleeding edge

svn co https://forgesvn1.novell.com/svn/bandit/trunk

The code I’ve been playing with lives in the iss subdirectory. And yes! After some editing of the cmake configuration, this actually generates Makefiles! Yes! (Once I’ve sorted out the usual irritation of svn checking out into a directory called “trunk”, that is). Not that it builds – I get a ton of errors on a make. Turns out there’s a header with platform info in it, and FreeBSD is not configured – although I hate the GNU configure system, this kind of stuff makes me appreciate it! More hackery and I have some kind of configuration set up for FreeBSD. Then it’s just a matter of build, fix, build, rinse, wash, repeat until the compile completes. Which it does, eventually.

So I am now the proud possessor of a binary called digitalme. Now what, I wonder? I guess that’s tomorrow’s job, because now I have to cook.

For the truly geeky, here’s the diff:


Index: ftk/include/ftk.h
===================================================================
--- ftk/include/ftk.h (revision 960)
+++ ftk/include/ftk.h (working copy)
@@ -41,6 +41,7 @@
#undef FTK_SPARC
#undef FTK_SPARC_PLUS
#undef FTK_X86
+ #undef FTK_FREEBSD
#undef FTK_BIG_ENDIAN
#undef FTK_STRICT_ALIGNMENT
#undef FTK_GNUC
@@ -134,6 +135,11 @@
#else
#error Platform architecture not supported
#endif
+ #elif defined(__FreeBSD__)
+ #define FTK_FREEBSD
+ #define FTK_UNIX
+ #define FTK_OSTYPE_STR "FreeBSD"
+ #define FTK_X86
#elif defined( sun)
#define FTK_SOLARIS
#define FTK_OSTYPE_STR "Solaris"
@@ -410,7 +416,9 @@
#elif defined( FTK_UNIX)
#if defined( FTK_GNUC)
#define FTKAPI
- #define FTKEXP __attribute__ ((visibility("default")))
+// BEN: this causes a million warnings, so removing pending clearer understanding
+// #define FTKEXP __attribute__ ((visibility("default")))
+ #define FTKEXP
#else
#define FTKAPI
#define FTKEXP
Index: ftk/src/ftkunix.cpp
===================================================================
--- ftk/src/ftkunix.cpp (revision 960)
+++ ftk/src/ftkunix.cpp (working copy)
@@ -428,6 +428,13 @@
{
return( f_mapPlatformError( errno, NE_FTK_FLUSHING_FILE));
}
+
+#elif defined(FTK_FREEBSD)
+
+ if( fsync( m_fd) != 0)
+ {
+ return( f_mapPlatformError( errno, NE_FTK_FLUSHING_FILE));
+ }

#else

Index: ftk/src/ftkxpath.cpp
===================================================================
--- ftk/src/ftkxpath.cpp (revision 960)
+++ ftk/src/ftkxpath.cpp (working copy)
@@ -1889,7 +1889,7 @@
break;
}

-#if defined ( FTK_LINUX) || defined ( FTK_NLM) || defined( FTK_OSX)
+#if defined ( FTK_LINUX) || defined ( FTK_NLM) || defined( FTK_OSX) || defined ( FTK_FREEBSD)
if( ui64Num > ((0xFFFFFFFFFFFFFFFFULL / 10) + (uChar - FTK_UNICODE_0)))
#else
if( ui64Num > ((0xFFFFFFFFFFFFFFFF / 10) + (uChar - FTK_UNICODE_0)))
Index: CMakeModules/FindOpenSSL.cmake
===================================================================
--- CMakeModules/FindOpenSSL.cmake (revision 960)
+++ CMakeModules/FindOpenSSL.cmake (working copy)
@@ -23,19 +23,27 @@

# Locate OpenSSL files

+# BEN: Kludge in local version of 0.9.8 - FreeBSD uses 0.9.7, which
+# doesn't actually work - so this file should not check for 0.9.7.
+# Surely there's some way to do this without hacking this file?
+
if( NOT OPENSSL_FOUND)

find_path( OPENSSL_INCLUDE_DIR ssl.h
- PATHS /usr/include
+ PATHS /home/ben/work/openssl-0.9.8/include
+ /usr/include
/usr/local/include
PATH_SUFFIXES openssl
NO_DEFAULT_PATH
)
+# remove the trailing "openssl" (this is not a kludge, it is needed)
+ STRING( REGEX REPLACE "/openssl$" "" OPENSSL_INCLUDE_DIR "${OPENSSL_INCLUDE_DIR}")
MARK_AS_ADVANCED( OPENSSL_INCLUDE_DIR)

find_library( SSL_LIBRARY
NAMES ssl.0.9.8 ssl.0.9.7 ssl
- PATHS /usr/lib
+ PATHS /home/ben/work/openssl-0.9.8
+ /usr/lib
/usr/local/lib
NO_DEFAULT_PATH
)
@@ -43,7 +51,8 @@

find_library( CRYPTO_LIBRARY
NAMES crypto.0.9.8 crypto.0.9.7 crypto
- PATHS /usr/lib
+ PATHS /home/ben/work/openssl-0.9.8
+ /usr/lib
/usr/local/lib
NO_DEFAULT_PATH
)
Index: CMakeModules/FindGTK2.cmake
===================================================================
--- CMakeModules/FindGTK2.cmake (revision 960)
+++ CMakeModules/FindGTK2.cmake (working copy)
@@ -71,6 +71,7 @@
/usr/local/include
/usr/lib
PATH_SUFFIXES gtk-2.0/include
+ gtk-2.0
NO_DEFAULT_PATH
)
mark_as_advanced( GTK2_gdk_CONFIG_INCLUDE_DIR)
Index: CMakeModules/FindGLib.cmake
===================================================================
--- CMakeModules/FindGLib.cmake (revision 960)
+++ CMakeModules/FindGLib.cmake (working copy)
@@ -28,7 +28,8 @@
find_path( GLIB_INCLUDE_DIR glib.h
PATHS /opt/gtk/include
/opt/gnome/include
- /usr/include
+ /usr/include
+ /usr/local/include
PATH_SUFFIXES glib-2.0
NO_DEFAULT_PATH
)
@@ -41,7 +42,9 @@
/opt/gnome/lib
/usr/include
/usr/lib
+ /usr/local/include
PATH_SUFFIXES /glib-2.0/include
+ /glib-2.0
NO_DEFAULT_PATH
)
MARK_AS_ADVANCED( GLIB_CONFIG_INCLUDE_DIR)
Index: CMakeModules/FindGnomeKeyring.cmake
===================================================================
--- CMakeModules/FindGnomeKeyring.cmake (revision 960)
+++ CMakeModules/FindGnomeKeyring.cmake (working copy)
@@ -34,6 +34,7 @@
GNOME_KEYRING_INCLUDE_DIR gnome-keyring.h
PATHS /usr/include
/opt/gnome/include
+ /usr/local/include
PATH_SUFFIXES gnome-keyring-1
NO_DEFAULT_PATH
)

4 Jul 2007

Java Drives Me Nuts!

Filed under: Lazyweb,Programming,Rants — Ben @ 19:34

Though I will admit that a lot of the nut-drivingness has been taken out of it by Eclipse (even if it is black magyck). So, I’ve been playing with Higgins (btw, teehee!). Or, rather, trying to. It seems Higgins is a pile of different inter-related projects. Which is good, but each one has its own dependencies which it wants to find in a subdirectory called lib. The first issue here: when I discover that something depends on stax-api-1.0.1.jar, what am I supposed to make of that? I can do a bit of googling and discover that there is such a thing out there on the interweb, download it and plug it in. But surely there’s a better way? How do I know I got the right thing? Suck it and see?

And what about when the required library is called serialiser.jar? That’s just a teensy bit vague. Now what?

Then there’s the issue that each of these projects wants its own copy of each library. Which I can do, of course, but it’s tedious! Again, I ask, surely there’s a better way?

Someone please tell me this is a solved problem and I’m a moron for whining about it.
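For what it’s worth, the usual answer here is a dependency manager such as Maven or Ivy: each jar is declared by name and version and fetched from a shared repository, so projects don’t need their own lib copies and you know exactly which artifact you got. A sketch of the declaration (the coordinates are my guess, not something Higgins ships):

```xml
<!-- pom.xml fragment: Maven resolves and caches this jar once,
     shared across every project that declares it -->
<dependency>
  <groupId>stax</groupId>
  <artifactId>stax-api</artifactId>
  <version>1.0.1</version>
</dependency>
```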

(And I haven’t even started writing Java yet, that’s when the real nuts-drivingness sets in)
