XML Stream Parsing in Erlang

There’s a lively debate out there about how a game should communicate with its clients, in particular UDP vs. TCP. I won’t go into the details, but you can read some of it here. The choice for my ridiculous game is TCP+UDP: TCP for communications where I want reliability, and UDP for fast updates that can afford to be lost. I don’t want to build a reliable application protocol on top of UDP. I also like the idea that a player is either connected to a game or not, and a TCP connection encapsulates that idea.

So, I decided that a good first step would be to implement a simple TCP server in Erlang, extending what I introduced in previous posts. There are various examples out there, but I decided to follow “Building a Non-blocking TCP server using OTP principles” on TrapExit. It’s not too hard to do, and in theory it implements an echo server. However, one gotcha is that when you set up the socket, the example uses the {packet, 2} option everywhere, and I didn’t realize at first that this is an application-layer framing convention: every message is prefixed with a 2-byte length, and Erlang uses that length to decide when to hand you a complete message. So you will not be able to just connect with telnet and bounce things off of your server, because telnet won’t know to include the packet length. It’s easy enough to do this in Perl by prepending a packed message length to your message, but be forewarned. The easiest thing to do is to replace it with {packet, line}, which delivers one message per newline-terminated line and is probably what you’re expecting.
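To make the {packet, 2} gotcha concrete, here is a minimal sketch of the framing done by hand. The module and function names (packet2_sketch, frame/1, unframe/1) are made up for illustration; the only thing taken from Erlang itself is that {packet, 2} means a 2-byte big-endian length header in front of each message.

```erlang
-module(packet2_sketch).
-export([frame/1, unframe/1]).

%% Hypothetical helper: prepend the 2-byte big-endian length that a
%% socket opened with {packet, 2} expects in front of every message.
frame(Payload) when is_binary(Payload), byte_size(Payload) =< 16#FFFF ->
    <<(byte_size(Payload)):16/big, Payload/binary>>.

%% Hypothetical helper: split one complete message off the front of a
%% buffer, or report that more bytes are needed.
unframe(<<Len:16/big, Payload:Len/binary, Rest/binary>>) ->
    {ok, Payload, Rest};
unframe(Buffer) when is_binary(Buffer) ->
    {more, Buffer}.
```

Calling packet2_sketch:frame(<<"hello">>) gives <<0,5,104,101,108,108,111>>, i.e. two length bytes followed by the payload. Those two leading bytes are exactly what telnet never sends, which is why the server appears to ignore you.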

Frankly, I found this whole thing a bit annoying, because it wasn’t clear from the documentation that gen_fsm was going to magically receive messages from the socket when data arrived. But I eventually sorted this out and it seems to work okay.
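For what it’s worth, the “magic” is the socket’s active mode: the process that owns the socket gets ordinary Erlang messages of the form {tcp, Socket, Data}, and in a gen_fsm those land in handle_info/3. Below is a sketch of the same mechanism in a plain process, without OTP, assuming a listen socket opened with {packet, line} and {active, false}. The module name active_sketch and the function serve/1 are mine, not from the TrapExit tutorial.

```erlang
-module(active_sketch).
-export([serve/1]).

%% Accept a single connection on an already-open listen socket and echo
%% each line back. Illustrative only: one client, no supervision.
serve(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    %% Ask for exactly one message; the data then arrives in our mailbox.
    ok = inet:setopts(Socket, [{active, once}]),
    loop(Socket).

loop(Socket) ->
    receive
        {tcp, Socket, Line} ->
            ok = gen_tcp:send(Socket, Line),
            %% Re-arm the socket so the next line is delivered too.
            ok = inet:setopts(Socket, [{active, once}]),
            loop(Socket);
        {tcp_closed, Socket} ->
            ok;
        {tcp_error, Socket, Reason} ->
            {error, Reason}
    end.
```

Using {active, once} rather than {active, true} means a fast client cannot flood the process mailbox, since each message has to be requested explicitly.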

Now what I want to do is to implement a simple XML-based protocol for my game. The idea is that this will fit well with the finite state machine approach and let me extend it with new functionality pretty easily. It’s not very good for bandwidth usage, but I don’t expect to send much data this way. ejabberd is a good example of XML streaming in Erlang, and there are at least two libraries out there for parsing: xmerl (User’s Guide) and erlsom. Since I want streamed parsing, I’ll need an event-based parser with something like SAX. Neither of them has much documentation about event-driven usage (lack of documentation is starting to be my biggest problem with Erlang), but this thread would imply that I’m better off with Erlsom. When I tried it out, however, it wouldn’t work without a complete document. I want to implement something like the “stanzas” of XMPP.
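To show what “event-based, something like SAX” looks like in practice, here is a small sketch using xmerl_sax_parser, which ships with newer releases of xmerl than the one used in this post (so whether you have it is an assumption about your install). It collects the element names from the start-element events of a complete document. The module name sax_sketch and the function element_names/1 are made up for illustration.

```erlang
-module(sax_sketch).
-export([element_names/1]).

%% Return the local names of all elements, in document order, by
%% reacting to startElement events and ignoring everything else.
element_names(Xml) when is_binary(Xml) ->
    EventFun = fun({startElement, _Uri, LocalName, _QName, _Attrs}, _Location, Acc) ->
                       [LocalName | Acc];
                  (_OtherEvent, _Location, Acc) ->
                       Acc
               end,
    Options = [{event_fun, EventFun}, {event_state, []}],
    {ok, Names, _Rest} = xmerl_sax_parser:stream(Xml, Options),
    lists:reverse(Names).
```

This still feeds the parser a whole document. The same module documents a continuation_fun option for fetching more input when the parser runs dry, which is the piece a stanza-style protocol would need; I have not exercised it here.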

Of course, for some reason, doing even the simplest things in Erlang is like pulling teeth. I thought I’d play with xmerl to see if it might do what I want. I looked at the sparse documentation and at this blog entry, and wrote the simplest possible program for parsing an XML file.

-module(lame).
-include_lib("xmerl/include/xmerl.hrl").
-export([parsemo/0]).

%% Parse foo.xml and print the resulting tree plus any unparsed remainder.
parsemo() ->
    {Xml, Rest} = xmerl_scan:file("foo.xml"),
    io:format("~p ~p", [Xml, Rest]).

foo.xml is just about the simplest thing you could imagine:

<?xml version="1.0" encoding="utf-8" ?>
<foo>
bar
</foo>

I try to run this thing, and get this:

Erlang (BEAM) emulator version 5.5.5

Eshell V5.5.5 (abort with ^G)
1> c(lame).
{ok,lame}
2> lame:parsemo().

=ERROR REPORT==== 18-Feb-2008::16:44:14 ===
Error in process <0.31.0> with exit value: {undef,[{xmerl_scan,file,["foo.xml"]},{lame,parsemo,0},
{erl_eval,do_apply,5},{shell,exprs,6},{shell,eval_loop,3}]}

** exited: {undef,[{xmerl_scan,file,["foo.xml"]},
{lame,parsemo,0},
{erl_eval,do_apply,5},
{shell,exprs,6},
{shell,eval_loop,3}]} **

What the hell? For one thing, there is almost no information about what is wrong. This is a huge failing on the part of Erlang. The error messages are ridiculous. I assume that the problem is that it doesn’t know about xmerl and can’t resolve the function call. It managed to load it just fine with the include_lib, but it doesn’t seem that it’s finding xmerl_scan:file. I don’t even know where to begin, to figure out how to get Erlang to find the module. I mean, am I insane? This is the sort of thing that makes a tool a no-go for me. I could write a streaming XML parser in Perl in about ten minutes. I know this because I’ve done it before. XML parsing is the sort of thing that should never have to be implemented in a modern development toolset – there should be clearly documented libraries with standard interfaces.

Taking a deep breath and looking at Section 6.7 of Joe Armstrong’s book, I see that this cryptic error message indicates one of four possible failure modes in trying to find xmerl_scan:file: 1) no module xmerl_scan, 2) xmerl_scan hasn’t been compiled, 3) xmerl_scan.beam can’t be found, 4) there are multiple versions of xmerl_scan floating around in the path. I was under the impression that xmerl was included in the standard libraries. I installed Erlang 5.5.5 via apt-get erlang-base-hipe (Ubuntu Gutsy), and it would seem that xmerl_scan is around, inasmuch as man xmerl_scan gives me the “docs.” However, I’m not able to actually use it.
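The failure modes above can be narrowed down from the shell with the code module. Here is a sketch of a helper that does the check; the module name undef_check and the function diagnose/3 are hypothetical, while code:ensure_loaded/1, code:which/1 and erlang:function_exported/3 are standard calls.

```erlang
-module(undef_check).
-export([diagnose/3]).

%% Report why a call to Module:Function/Arity might raise undef.
%% Returns {ok, Where} when the function is loadable and exported.
diagnose(Module, Function, Arity) ->
    case code:ensure_loaded(Module) of
        {module, Module} ->
            case erlang:function_exported(Module, Function, Arity) of
                true ->
                    %% code:which/1 says which .beam file was picked up,
                    %% useful when several versions are on the path.
                    {ok, code:which(Module)};
                false ->
                    {error, {not_exported, code:which(Module)}}
            end;
        {error, Reason} ->
            %% The module could not be loaded at all, e.g. no .beam
            %% file for it anywhere on the code path.
            {error, {not_loadable, Reason}}
    end.
```

In my situation, undef_check:diagnose(xmerl_scan, file, 1) would presumably have come back with a not_loadable error, pointing at a missing package rather than at my code. If the module does load from an unexpected place, code:get_path() lists the directories that were searched.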

After much wringing of hands, I installed every Erlang-related package I could find via apt-cache, and eventually it seemed to work. I have no idea what the problem was. Now my silly parser outputs:

{xmlElement,foo,
            foo,
            [],
            {xmlNamespace,[],[]},
            [],
            1,
            [],
            [{xmlText,[{foo,1}],1,[],"\n bar\n",text}],
            [],
            ".",
            undeclared} []

which is at least a step in the right direction. When I try to run it on a partial string, however, it fails. I think this is the behavior that xmerl intends to have: it wants a complete document. I’m starting to think that implementing whatever ejabberd does is going to be the only way to resolve this, unfortunately. So, I guess that’ll be next time.
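The partial-string failure is easy to reproduce. Here is a sketch that wraps xmerl_scan:string/2 so the failure comes back as a value instead of killing the calling process; the module name partial_sketch and the function try_parse/1 are made up for illustration.

```erlang
-module(partial_sketch).
-export([try_parse/1]).

%% Try to parse a string as a complete XML document. On malformed or
%% truncated input xmerl aborts the caller, so trap that and return it.
try_parse(String) ->
    try xmerl_scan:string(String, [{quiet, true}]) of
        {Element, Rest} ->
            {ok, Element, Rest}
    catch
        _Class:Reason ->
            {error, Reason}
    end.
```

With this, partial_sketch:try_parse("<foo>bar</foo>") returns an ok tuple, while the truncated "<foo>bar" returns an error tuple. That is fine for whole files, but it is the wrong shape for a stream where the closing tag may simply not have arrived yet.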
