Ben Laurie blathering

Jabber Pain

For a while, it’s been apparent to me that Jabber was occasionally dropping messages. Last week I finally got annoyed enough to investigate it in earnest.

Unfortunately, I started off on entirely the wrong track, and blamed GTalk (sorry, guys!) – but much investigation later, with help from some very patient friends (you know who you are: thanks!), I found that it was my own Jabber server that was to blame.

However, it was not an easy journey. First of all, how do you tell that messages are being dropped? I am pretty certain my server has been dropping messages since before Christmas – i.e. at least four weeks, and I am fairly certain it has been doing it ever since I first built it – which must be a year or two now. Could it be that it could drop messages for that long and no-one noticed? It seems to me, in retrospect, that it could! A wise friend of mine once said, “you know, 90% of what we say to each other could be completely different and it would make no difference”. This is even more true for IM. We send messages out. Sometimes we get answers. Sometimes we don’t. If we don’t, well, the other guy was away, or not interested, or got busy and forgot to respond. It’s fine. It was probably one of the 90%. When it’s one of the 10%, well, then we say it again. And this time we get an answer, and we’re both happy. So, you can go on for years and not notice that stuff is missing.

It wasn’t until I started badgering my friends to tell me when they thought messages were going missing that it became clear that they were, indeed. And not just a few – a lot! I now know that it was dropping about 50% of incoming messages (i.e. messages sent to me) and no outgoing messages. God knows what kind of rude bastard my friends think I am by now! An interesting feature is that it would drop them in batches – i.e. drop for 5 minutes, forward for 5 minutes, drop for 5 minutes and so on. If it had been every second message it would have been apparent sooner, I suspect, because the conversation would be quite choppy.
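To make the difference between batched and alternating drops concrete, here is a minimal sketch in Python (not the language of the scripts mentioned later) that summarises a delivery log into a loss rate and runs of consecutive outcomes. The function name `summarize_delivery` and both sample logs are hypothetical; the logs assume one test message every 10 seconds, so 30 samples is roughly 5 minutes.

```python
from itertools import groupby


def summarize_delivery(delivered):
    """Return (loss_rate, runs) for a sequence of booleans.

    delivered[i] is True if message i arrived and False if it was dropped.
    runs is a list of (arrived, length) pairs, one per stretch of
    consecutive identical outcomes.
    """
    delivered = list(delivered)
    if not delivered:
        return 0.0, []
    runs = [(arrived, len(list(group))) for arrived, group in groupby(delivered)]
    loss_rate = delivered.count(False) / len(delivered)
    return loss_rate, runs


# Hypothetical logs: both lose half the messages, in different patterns.
batched = [True] * 30 + [False] * 30 + [True] * 30 + [False] * 30
alternating = [True, False] * 60

batched_rate, batched_runs = summarize_delivery(batched)
alternating_rate, alternating_runs = summarize_delivery(alternating)

print("batched:", batched_rate, batched_runs)
print("alternating:", alternating_rate, len(alternating_runs), "runs")
```

Both logs have the same loss rate, but the batched one collapses into four long runs while the alternating one has a run per message, which is the choppy pattern that would have been noticed sooner.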

But even knowing that messages were being dropped was not the end of the story. How do you figure out what is to blame? In the typical scenario, because I run my own server, there are at least 3 connections and 4 pieces of software that could be at fault.

  1. The other guy’s client,
  2. the connection from that to their server,
  3. their server,
  4. the connection from their server to my server,
  5. my server,
  6. the connection from my server to my client,
  7. my client

As I said above, I started at the wrong end – with GTalk. With some help, it became apparent that GTalk was unlikely to be to blame (and because it was upstream from the other guy’s client, we could eliminate that, too). So the next easiest target to look at was my server – which I did, with the help of tcpdump and Wireshark, though investigation was complicated by both OTR and SSL, which make it very hard to interpret and track messages. Luckily the server-to-server connection was in plain text (which is one reason I use OTR), so it could be done, with difficulty – particularly since it turned out that my jabber daemon was the culprit – so I could see messages coming in in the traces, and no corresponding activity in the server-to-client connection. Sometimes.
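The elimination described above can be sketched as a small Python helper: given the last stage where a message was seen and the first stage where it was confirmed missing, whatever lies strictly between is suspect. The names `PATH` and `suspects` are hypothetical, and the sketch assumes both observations come from captures taken on the server host itself, so a message seen on the inbound link did arrive and a message absent from the outbound link was never sent.

```python
# The seven stages from the list above, in the order a message travels.
PATH = [
    "sender's client",
    "sender's client -> sender's server",
    "sender's server",
    "sender's server -> my server",
    "my server",
    "my server -> my client",
    "my client",
]


def suspects(last_seen, first_missing):
    """Return the stages strictly between the two observation points."""
    seen = PATH.index(last_seen)
    missing = PATH.index(first_missing)
    if missing <= seen:
        raise ValueError("first_missing must be downstream of last_seen")
    return PATH[seen + 1:missing]


# Seen on the server-to-server link, absent from the server-to-client link.
culprits = suspects("sender's server -> my server", "my server -> my client")
print(culprits)
```

With those two observation points, the only stage left between them is my own server.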

To cut a long story short, after much poking at my existing jabber server, which was jabberd14, I decided to replace it with jabberd2. But before I did that, I wanted to be really sure that jabberd14 was to blame, and that jabberd2 would fix it. So, I wanted the Jabber equivalent of ping. To my amazement, there appears to be no such thing! There is a Jabber ping extension but I can’t find anything that uses it. Which is the final reason I am writing this blog post – I wrote a pair of scripts that will do a Jabber ping test, and you should feel free to use them. And if you are using jabberd14, I’d really like to know if you, too, get message drops…

I was planning to make them count and produce statistics and such, but I got lazy. Since you can see both ends, eyeballing them is enough to let you know what’s going on – Ping does count how many it got back, though, so you can leave it running without watching it all the time. To run them, you need two Jabber accounts, one on the suspect server and the other elsewhere. You can run them like this:

./ account1 password1
./ account2 password2

Pong will actually answer multiple Pings running simultaneously. Ping pings every 10 seconds. Output should be reasonably obvious. Because Jabber does store-and-forward, Ping will ignore Pongs from a different session. And because they use different resources, you can use the same account at both ends, if you want. Like I say, I’d be really interested to hear from anyone that experiences drops – a couple of hundred pings was always enough to show them when I was testing.
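For anyone wanting to build something similar, here is a minimal, network-free sketch in Python of the bookkeeping described above: number each ping, count the pongs that come back, and ignore pongs left over from an earlier session. It is not the original scripts. The `ping <session> <seq>` message format, the `PingTracker` class and the `make_pong` helper are all assumptions made up for illustration; the post does not say how the real scripts tell sessions apart.

```python
import uuid


class PingTracker:
    """Tracks pings sent in one session and counts the pongs that match."""

    def __init__(self, session_id=None):
        # A fresh id per run lets us tell our pongs from stored, stale ones.
        self.session_id = session_id or uuid.uuid4().hex
        self.sent = 0
        self.answered = 0
        self.outstanding = set()

    def next_ping(self):
        """Return the body of the next ping (hypothetical message format)."""
        self.sent += 1
        self.outstanding.add(self.sent)
        return f"ping {self.session_id} {self.sent}"

    def handle_pong(self, body):
        """Record a pong; return True only if it answers a pending ping."""
        parts = body.split()
        if len(parts) != 3 or parts[0] != "pong":
            return False
        if parts[1] != self.session_id:
            return False  # pong from a different session, e.g. delivered late
        try:
            seq = int(parts[2])
        except ValueError:
            return False
        if seq not in self.outstanding:
            return False  # duplicate, or a number we never sent
        self.outstanding.remove(seq)
        self.answered += 1
        return True


def make_pong(ping_body):
    """What the answering side might send: echo the session and number."""
    parts = ping_body.split()
    if len(parts) == 3 and parts[0] == "ping":
        return f"pong {parts[1]} {parts[2]}"
    return None


tracker = PingTracker(session_id="run2")
first = tracker.next_ping()
second = tracker.next_ping()
tracker.handle_pong(make_pong(first))  # answered
tracker.handle_pong("pong run1 1")     # stale pong from an older run, ignored
print(tracker.answered, "of", tracker.sent, "answered; missing:",
      sorted(tracker.outstanding))
```

Because the answering side only echoes what it receives, one responder can serve several pingers at once, and whatever remains in `outstanding` at the end of a run is the list of drops.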

Oh yeah, and the good news: jabberd2 has now answered over 500 pings without a single drop. So, if you felt ignored, I hope things will improve!


  1. And here was I thinking you were just a rude bastard! 🙂

    Comment by Pat Patterson — 10 Jan 2009 @ 21:14

  2. And I just thought you were playing cool and not that interested in what I was saying! Well, did me no harm, I guess. So agree with your friend and his 90% thing. 🙂

    Glad it’s fixed now though!

    Comment by Adriana — 11 Jan 2009 @ 13:56

  3. unfortunately I realized a while ago that Jabber is NOT reliable at all.
    For me a message gets ALWAYS lost in the following scenario:

    1. my internet connection gets interrupted (e.g. out of wlan reception)
    2. the tcp connection to the jabber server is still alive (not yet a timeout, no RST of course)
    3. someone sends a message to me; the jabber server tries to send it to me
    4. my jabber server is kinda surprised, cause the delivery fails (TCP failure)

    one would expect, that the message is stored (like an offline message)…
    though, in my experience, jabber servers out there (at least ejabberd) don’t do that.

    same was reported here:

    Comment by bene — 12 Jan 2009 @ 13:10

  4. For what it’s worth, if your Pong is Google Talk, you’ll need $client->Connect (~line 18) to also have the additional options: componentname => '', tls => 1.

    If you’re getting errors about “No SASL mechanism found” or “Use of uninitialized value $sid in hash element”, that might be the reason.

    Thanks for the scripts, Ben!

    Comment by Steven N. Severinghaus — 11 Mar 2010 @ 20:56

  5. well , i wish i could refer to this problem with humour , like you do. but i had a rather tense conversation with my boyfriend via gtalk. and during the conversation i kinda was surprised at his stupid answers. we even argued over a certain phrase , whether he said it or not. when i came home from work and reread the conversation i was stunned! i discovered absolutely different conversation that was going on. i didn’t see half of his messages. obviously there was some problem in the local net at my company. but how come there was no indication whatsoever that half of the messages were simply missing in the chat window, while they all appeared in the history (i verified it later with my guy). did you ever get such a weird behavior? and i will ask it again , how come absolutely no indication of a problem , no interruption or something , the conversation ran smoothly (except for the emotions), i think there should be some sort of refresh button or something.

    Comment by natalie — 1 Aug 2010 @ 22:43
