Learning about Geometric Algebra

17 July 2008 | 20:03 | Science and Math | No Comments

David said I should learn about Geometric Algebra. So I wrangled Steve into an explanation over beers at the Fort St. George. It looks very interesting and I wonder if there aren’t uses for it in machine learning. Here are some tutorials:



Walmart Growth Over Time

12 July 2008 | 7:22 | General | No Comments

This fantastic map shows the expansion of Walmart locations across the US over time.



Infinite Gaussian Mixture Modeling with FBM

29 May 2008 | 21:24 | Computing, Research, Science and Math | No Comments

I am writing a paper on nonparametric Bayesian density modeling and I would like to compare my technique to the standard approach of the infinite mixture of Gaussians (iMoG). You can read Carl Rasmussen’s paper to get a feel for what it’s all about. My plan is to look at hold-out log probabilities on real data. To do this, I need to have an implementation of iMoG that can give me predictive logprobs. There are a couple of MATLAB implementations (one, two), but they don’t (as far as I can tell) provide true predictive estimates directly. Rather, they give posterior samples from the parameters and you have to do something like (Chib and Jeliazkov, 2001) to get estimates of the logprobs.

Radford Neal, on the other hand, has his Software for Flexible Bayesian Modeling and Markov Chain Sampling (FBM) that implements mixtures of Gaussians. I have to learn how to use it anyway, since it seems to be the only implementation of another nonparametric density estimation procedure that I’m interested in: Dirichlet Diffusion Trees. So this post is going to be about figuring out how to do mixture models with FBM. Radford provides an example of bivariate density estimation here.

Here is my setup: my data is 5-dimensional, and I have 200 training cases and 28 test cases. I have whitened the data so that it has the sample statistics of a spherical Gaussian. I put these data into two files that are just comma-delimited with a line for each case and a column for each dimension. This is following the conventions described here as I understand them.
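For concreteness, here is a rough Python sketch of the preparation step: whiten the cases so the sample covariance is the identity, then write one comma-delimited line per case. This is only an illustration under my own assumptions. The synthetic data, the Cholesky-based whitening, and the helper names (`mean_and_cov`, `cholesky`, `whiten`, `to_csv`) are all hypothetical and not anything FBM provides, and whether the test cases should be whitened with the training statistics is a choice this sketch glosses over by whitening everything together.

```python
import csv
import io
import math
import random


def mean_and_cov(rows):
    """Sample mean and (unbiased) sample covariance of a list of rows."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def cholesky(a):
    """Lower-triangular L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def whiten(rows):
    """Centre the rows and solve L z = (x - mean) by forward substitution."""
    mu, cov = mean_and_cov(rows)
    low = cholesky(cov)
    out = []
    for r in rows:
        c = [x - m for x, m in zip(r, mu)]
        z = []
        for i in range(len(c)):
            z.append((c[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
        out.append(z)
    return out


def to_csv(rows):
    """One line per case, one comma-separated column per dimension."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Synthetic stand-in for the real data: 228 correlated 5-dimensional cases.
random.seed(0)
data = []
for _ in range(228):
    shared = random.gauss(0, 1)
    data.append([shared + (j + 1) * random.gauss(0, 1) for j in range(5)])

white = whiten(data)
train_csv = to_csv(white[:200])   # 200 training cases
test_csv = to_csv(white[200:])    # 28 test cases
```

The two strings can then be written to whatever file names you hand to data-spec.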

The first thing we do is use the mix-spec command. This command creates the “log file.” Log files in FBM are the “documents” that one operates on. They contain all of the model and results, etc. The syntax is:

mix-spec log-file N-inputs N-targets [ N-components ]
/ concentration SD-prior [ mean-prior ]

“log-file” is the name of the log “document” file - I’m going to call mine “mog.log.” “N-inputs” you can pretty much ignore for now, as it doesn’t seem to be implemented. Just use zero. “N-targets” is the number of variables that you wish to find the joint distribution over. In my case, this is going to be five. “N-components” is how many Gaussians you want in your mixture. If you leave it out you get the infinite Dirichlet process mixture, which is what we want. The “concentration” is the parameter that in a sense determines the “variance” of the weights that you get out of the Dirichlet prior. In the infinite case, we have to specify it as a constant multiplied by the number of components, and so we preface it with an “x”. I don’t know what a good choice is, so I’m going to pick 5 and write “x5”. The “SD-prior” is this big stack of priors on the widths of things, described here. I don’t feel like I understand very well how to specify these giant stacks of hyperpriors. The last parameter is the standard deviation of the mean, as far as I can tell. Overall, the command I’m issuing is:

> mix-spec mog.log 0 5 / x1 0.05:0.5:0.2 10

After this, you need to issue the model-spec command. The first argument is the log file, then the word “real” if you’re modeling real data. Then you issue another cryptic command about what I think is the prior on the actual mixture Gaussians. I’m just doing the example thing, since I don’t know better:

> model-spec mog.log real 0.05:0.5:0.5:1

Next, you give it data, using the data-spec command. The first argument is the log file, so “mog.log.” The next argument is “input attributes” which is zero, because it isn’t used. Then are “target attributes” which should be 5, since we have five dimensions. Then the slash and we specify our training file, followed by a “.” since we don’t have training inputs, then the same thing again for the test file:

> data-spec mog.log 0 5 / \
macaque-5d-train1.dat . macaque-5d-test1.dat .

So, now we have the model set up and we have data associated with it. Now it’s time for some inference. You have to invoke some special commands for the infinite case that I’m just going to take directly from the example:

> mc-spec mog.log repeat 20 met-indicators 10 gibbs-params gibbs-hypers

Now, I ran it for a bunch of iterations:

> mix-mc mog.log 10000

And after this, I looked at what it came up with:

> mix-display mog.log

But most importantly, I wanted to see the logprobs it found:

> mix-pred p mog.log 5000:

The “p” means “give me each log probability” and the “5000:” says “start after the 5000th iteration.”
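I don’t know how mix-pred computes these numbers internally, so take the following as a sketch of the general idea rather than a description of FBM. Given posterior samples of the mixture parameters, a Monte Carlo predictive density for a test case is the average of the per-sample densities, and in the log domain that average is best computed with a log-sum-exp. The one-dimensional Python below illustrates this; the function names and the toy samples are my own inventions.

```python
import math


def log_gauss(x, mean, sd):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_mixture(x, weights, means, sds):
    """Log density of x under one mixture of Gaussians."""
    return log_sum_exp([math.log(w) + log_gauss(x, m, s)
                        for w, m, s in zip(weights, means, sds)])


def predictive_logprob(x, samples):
    """log of (1/S) * sum over posterior samples of p(x | sample)."""
    per_sample = [log_mixture(x, *s) for s in samples]
    return log_sum_exp(per_sample) - math.log(len(samples))


# Toy "posterior samples": each is (weights, means, standard deviations).
samples = [
    ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]),
    ([0.3, 0.7], [-1.2, 0.9], [0.8, 1.1]),
]
test_points = [-0.5, 0.0, 2.0]
logprobs = [predictive_logprob(x, samples) for x in test_points]
mean_holdout_logprob = sum(logprobs) / len(logprobs)
```

The quantity I care about for the paper is the analogue of `mean_holdout_logprob` over the 28 test cases.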

This seems to maybe actually do what I want…



Rock Band - Best Cooperative Game Ever

7 May 2008 | 16:19 | Gaming | No Comments

Our downstairs neighbors, who are our very, very good friends, are moving away. They’re huge karaoke addicts and so we got them a fun parting gift: Rock Band for the PS2. This game is so much fun, it’s painful. Obviously, I know people have liked it, and Guitar Hero has been a big hit, but I’ve never tried it. What’s really, really neat, though, is that it is the best cooperative gaming setup I’ve ever seen. With the exception of minigames in Mario Party type games, cooperative play is pretty meager overall. In Rock Band, however, everyone is on equal footing and contributing to the experience. It’s pretty awesome. We pulled it out at a little house party and everyone jumped in. We’re planning a “battle of the bands” at an upcoming party where the bands are selected out of a hat and everyone has to do each instrument for at least one song. It should be pretty hilarious. The only disappointment is that all of the new downloadable content won’t be available for the PS2 version. We’re already itching for new songs.



Holly Dunsworth on NPR

5 May 2008 | 14:54 | General, Science and Math | No Comments

One of our very good friends, Holly Dunsworth, is going to be on National Public Radio (NPR) for “This I Believe” this coming weekend as a part of “Weekend Edition.” You can read her essay here. An excerpt:

I believe in evolution. It’s easy. It’s my life. I’m a paleoanthropologist. I study fossils of humans, apes, and monkeys and I teach college students about their place in nature.

Of course I believe in evolution.

But why then do I answer “no” when you ask me if I “believe in evolution”? Because if you have to ask me if I believe in evolution, than to you evolution is controversial and something to believe in as opposed to God. I answer “no” because I want to separate evolution from religion, from paralleling it with a belief in a deity.

No one has ever asked me if I believe in gravity or electricity. How absurd.



Video Card Ugliness

2 May 2008 | 16:03 | Computing, Gaming | No Comments

I have this fairly nice setup for my home office, since I work from home more or less exclusively. I like to play the occasional PC game, so I have a reasonable video card setup. I had two eVGA 7600GT video cards in an SLI configuration. Well, this all turned sour last Saturday. I pulled out the cards and…


[Image: two broken eVGA 7600GT video cards]

Can you see the problem? Here’s a pair of close-ups:

[Images: two broken eVGA 7600GT video cards; busted capacitors on a 7600GT]

So I ordered an upgrade (no SLI this time around, since I’m having to pay for it myself) of a PNY 8800GT. It seems pretty nice and was only about $150 after rebate from Newegg. I was tempted to go back to eVGA, but I had a pretty bad experience with them when I was setting this box up. I decided to branch out.



GP Product Model Paper Accepted at ICML 2008

1 May 2008 | 18:43 | Research, Science and Math | No Comments

Oliver Stegle and I just got our paper Gaussian Process Product Models for Nonparametric Nonstationarity (pdf) accepted at the 25th International Conference on Machine Learning in Helsinki. Here is the abstract:

Stationarity is often an unrealistic prior assumption for Gaussian process regression. One solution is to predefine an explicit nonstationary covariance function, but such covariance functions can be difficult to specify and require detailed prior knowledge of the nonstationarity. We propose the Gaussian process product model (GPPM) which models data as the pointwise product of two latent Gaussian processes to nonparametrically infer nonstationary variations of amplitude. This approach differs from other nonparametric approaches to covariance function inference in that it operates on the outputs rather than the inputs, resulting in a significant reduction in computational cost and required data for inference. We present an approximate inference scheme using Expectation Propagation. This variational approximation yields convenient GP hyperparameter selection and compact approximate predictive distributions.



Global Comparison of Acceptance of Evolution

20 April 2008 | 22:25 | Politics, Research, Science and Math | No Comments


I believe the original article is in Science, here.

The article is only available if you or your institution has an AAAS membership. This behavior seems a bit odd for a group calling itself an “Association for the Advancement of Science.”



Good Cop at Protest

16 March 2008 | 17:54 | Civil Liberties, General, Politics | No Comments

There are good cops out there, but sometimes the news makes it seem like they’re all up to no good. Here’s a short video of a good cop doing his job well at a protest in LA. It seemed like it was worth sharing.



Ungoogleable Erlang Documentation

20 February 2008 | 1:16 | Computing, Erlang | No Comments

While I’m complaining about Erlang: why doesn’t Google ever return any hits on the documentation? If I google “perl sprintf” the first hit is the documentation page http://perldoc.perl.org/functions/sprintf.html. The same thing happens if I Google “python array” or “lisp map” or “php echo.” If I type “erlang supervisor” I don’t get anything manual-like until the 18th hit with this page, which isn’t even the man page. Moreover, that page doesn’t even link to the actual manual page despite referring to it:

This section should be read in conjunction with supervisor(3), where all details about the supervisor behaviour is given.

In fact, even if you start at the documentation page, it’s not clear how you would find supervisor(3) aside from actually typing “man supervisor” at the command prompt. The Erlang Reference Manual (with its HORRIBLE HORRIBLE FRAMES) doesn’t actually talk about the supervisor behaviour, presumably because this is technically an OTP thing. Eventually, I go to the index and there it is. Why is this so hard? I feel like I can’t be the only one who finds this frustrating.



Erlang PostgreSQL Roundup

20 February 2008 | 0:56 | Britain, Computing, Erlang | No Comments

Like just about everything to do with Erlang, database driver support appears to be in total disarray. I’d like to be able to store data in a PostgreSQL database and access it reasonably well. Options appear to be

  • Erlang psql driver that is a fork or something of the code by Erlang Consulting. You can’t even directly download it. You have to check it out from SVN. It doesn’t even have a README file.
  • On Jungerl there is supposedly a psql project, but you can’t download it without apparently downloading all this other stuff. In fact, like the one above, you can only check it out from version control, in this case CVS. The CVS repository for it is viewable here. It only claims to be able to perform “simple commands” and the code doesn’t look like it’s been updated in a while.
  • Erlang Consulting has released some code, but no examples or anything about how to use it.
  • There is apparently an ODBC implementation, but from the mailing list it sounds like it is very slow and doesn’t work well in Linux, while also not implementing many of PostgreSQL’s features.
  • ejabberd also has an implementation

There’s a blog post here from 2006 where Ernie Makris claims to have written a kick-ass interface and is going to tell us all about it and post it. Unfortunately he never posts on his blog again. There is an extensive thread here where people express the same frustrations that I have. This thread is only a couple of months old, so maybe things have gotten somewhere.

It appears, anecdotally, that the one on Jungerl by Christian Sunesson is pretty stable, and it seems to be relatively standalone. I will give this one a shot first and see where it gets me. Connection seems straightforward enough:

{ok, Db} = pgsql:connect("host", "database", "user", "password").

My first little query:

pgsql:squery(Db, "SELECT NOW()").

After turning off SSL in the postgresql.conf file, I got back the quite-reasonable answer:

{ok,[{"SELECT",
[{desc,0,"now",timestamptz,text,8,-1,0}],
[[<<"2008-02-19 18:55:06.87229-05">>]]}]}

So, I’m cautiously optimistic that I might be able to make this work.

UPDATE:
It looks like the ejabberd stuff may actually be the way to go. It is a branch of the Jungerl work by Christian Sunesson and it appears to be under active development. Specifically, they have implemented things with gen_server, which would seem to be a big improvement.



XML Stream Parsing in Erlang, II

19 February 2008 | 22:37 | Computing, Erlang, Gaming, General | No Comments

In my previous post, I complained a lot about trying to get XML stream parsing working. Ultimately, I just decided to rip the guts out of ejabberd, rather than reinvent the wheel. The relevant files are xml_stream.erl, xml.erl, and expat_erl.c. You can see how to use it in ejabberd_receiver.erl. Frankly, these guys seem to have it pretty well figured out. The way I went about it is this: using the tcp-client gen_fsm from here, I added init code to load up the library in start_link, with something like this:

case erl_ddll:load_driver("ebin", expat_erl) of
    ok -> ok;
    {error, already_loaded} -> ok;
    {error, Reason} ->
        error_logger:error_msg("Could not load expat driver: ~p~n",
                               [erl_ddll:format_error(Reason)])
end

I added a bit to the fsm state record to manage the xml state. When a client connects and the process gets started for it in the 'WAIT_FOR_SOCKET'({socket_ready, Socket}, State) when is_port(Socket) call, I initialize the xml parser xml_stream:new(self()) and store that in the record. Upon receiving data in 'WAIT_FOR_DATA'({data, Data},, I call NewXmlState = xml_stream:parse(XmlState, Data) and update the record with the new state. Then, messages just appear in the fsm queue and I handle them. What I do, specifically, is set up another fsm and send it messages that have been parsed, so that it essentially abstracts away the tcp and the xml. My plan is to add udp functionality that uses the same interface, so that datagrams look just like stanza’d messages and the process doesn’t need to know where the data came from.
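The pattern here (create a parser once per connection, feed it whatever bytes arrive, and let complete stanzas pop out as events) is not specific to Erlang. As a rough illustration, here is the same idea in Python, whose standard library also wraps expat. This is not ejabberd’s API: the class `StanzaStream`, its `feed` method and the sample tags are hypothetical names I made up for the sketch.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class StanzaStream:
    """Incremental parser that emits each child of the root as a stanza."""

    def __init__(self, on_stanza):
        self._on_stanza = on_stanza
        self._depth = 0      # nesting depth, counting the root element as 1
        self._stack = []     # open elements below the root
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end
        self._parser.CharacterDataHandler = self._text

    def feed(self, chunk):
        # False means "more data may follow", so partial input is fine.
        self._parser.Parse(chunk, False)

    def _start(self, name, attrs):
        self._depth += 1
        if self._depth == 1:
            return           # the long-lived root (stream) element
        elem = ET.Element(name, attrs)
        if self._stack:
            self._stack[-1].append(elem)
        self._stack.append(elem)

    def _end(self, name):
        self._depth -= 1
        if not self._stack:
            return           # the root element itself was closed
        elem = self._stack.pop()
        if not self._stack:
            self._on_stanza(elem)   # a complete top-level stanza

    def _text(self, data):
        if not self._stack:
            return           # ignore whitespace between stanzas
        top = self._stack[-1]
        if len(top):
            last = top[-1]
            last.tail = (last.tail or "") + data
        else:
            top.text = (top.text or "") + data


# Data arrives in arbitrary pieces, as it would from a TCP socket.
received = []
stream = StanzaStream(received.append)
for chunk in ["<stream><move x='1'", " y='2'/><chat>he", "llo</chat>"]:
    stream.feed(chunk)
```

After the loop, `received` holds two complete elements even though neither arrived in a single chunk, which is the behavior I wanted from the Erlang side.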



XML Stream Parsing in Erlang

18 February 2008 | 23:53 | Computing, Erlang, Gaming | No Comments

There’s a lively debate out there about how one should communicate with clients in a game, in particular UDP vs TCP. I won’t go into details about it, but you can read some of the discussion here. The choice for my ridiculous game is TCP+UDP. I want to use TCP for various communications where I want reliability, and UDP for fast updates that can afford to be lost. I don’t want to deal with having to build a reliable application protocol on top of UDP. I like the idea that a player is connected to a game or not, and I can encapsulate that idea with TCP.

So, I decided that a good first step would be to implement a simple TCP server in Erlang, extending what I introduced in previous posts. There are various examples out there, but I decided to follow “Building a Non-blocking TCP server using OTP principles” on TrapExit. Not too hard to do, and theoretically implements an echo server. However, one gotcha is that when you set up the socket, you’ll use lots of {packet, 2} options and I didn’t realize at first that this is an application-layer thing that helps Erlang decide when to throw you an event. So you will not be able to just log into telnet and bounce things off of your server, because telnet won’t know to include the packet length. It’s easy enough to do this in Perl, by prepending a packed message length to your message, but be forewarned. The easiest thing to do is to replace it with {packet, line} and then it will maybe do what you’re expecting.
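To make the framing concrete: with {packet, 2}, each message on the wire is preceded by a 2-byte big-endian length, and the receiver only delivers a message once that many bytes have arrived. Here is a small Python sketch of what a client has to do. The function names `frame` and `unframe` are my own, and the same thing can be done with pack in Perl.

```python
import struct


def frame(payload):
    """Prefix a message with its length as a 2-byte big-endian integer."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 2-byte length header")
    return struct.pack(">H", len(payload)) + payload


def unframe(buffer):
    """Split a byte buffer into complete messages plus any leftover bytes."""
    messages = []
    while len(buffer) >= 2:
        (length,) = struct.unpack(">H", buffer[:2])
        if len(buffer) < 2 + length:
            break                      # message not fully received yet
        messages.append(buffer[2:2 + length])
        buffer = buffer[2 + length:]
    return messages, buffer


wire = frame(b"hello") + frame(b"world")
```

A telnet session sends only the raw text with no length header, which is why the server appears to ignore it.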

Frankly, I found this whole thing a bit annoying, because it wasn’t clear from the documentation that gen_fsm was going to magically receive messages from the socket when data arrived. But I eventually sorted this out and it seems to work okay.

Now what I want to do is to implement a simple XML-based protocol for my game. The idea is that this will fit well with the finite state machine approach, and allows me to extend it to include new functionality pretty easily. It’s not very good for bandwidth usage, but I don’t expect to send much data this way. ejabberd is a good example of XML streaming in Erlang, and there are at least two libraries out there for parsing: xmerl (User’s Guide) and erlsom. Since I want streamed parsing, I’ll need an event-based parser with something like SAX. Neither of them has much documentation about event-driven usage (lack of documentation is starting to be my biggest problem with Erlang), but this thread would imply that I’m better off with Erlsom. Trying it out, however, it wouldn’t work without a complete document. I want to implement something like the “stanzas” of XMPP.

Of course, for some reason, doing even the simplest things in Erlang is like pulling teeth. I thought I’d play with xmerl to see if it might do what I want. I look at the sparse documentation, and at this blog entry and write the simplest possible program for parsing an xml file.

-module(lame).
-include_lib("xmerl/include/xmerl.hrl").
-export([ parsemo/0 ]).

parsemo() ->
    {Xml, Rest} = xmerl_scan:file("foo.xml"),
    io:format("~p ~p", [Xml, Rest]).

foo.xml is just about the simplest thing you could imagine:

<?xml version="1.0" encoding="utf-8" ?>
<foo>
bar
</foo>

I try to run this thing, and get this:

Erlang (BEAM) emulator version 5.5.5

Eshell V5.5.5 (abort with ^G)
1> c(lame).
{ok,lame}
2> lame:parsemo().

=ERROR REPORT==== 18-Feb-2008::16:44:14 ===
Error in process <0.31.0> with exit value: {undef,[{xmerl_scan,file,["foo.xml"]},{lame,parsemo,0},
{erl_eval,do_apply,5},{shell,exprs,6},{shell,eval_loop,3}]}

** exited: {undef,[{xmerl_scan,file,["foo.xml"]},
{lame,parsemo,0},
{erl_eval,do_apply,5},
{shell,exprs,6},
{shell,eval_loop,3}]} **

What the hell? For one thing, there is almost no information about what is wrong. This is a huge failing on the part of Erlang. The error messages are ridiculous. I assume that the problem is that it doesn’t know about xmerl and can’t resolve the function call. It managed to load it just fine with the include_lib, but it doesn’t seem that it’s finding xmerl_scan:file. I don’t even know where to begin, to figure out how to get Erlang to find the module. I mean, am I insane? This is the sort of thing that makes a tool a no-go for me. I could write a streaming XML parser in Perl in about ten minutes. I know this because I’ve done it before. XML parsing is the sort of thing that should never have to be implemented in a modern development toolset - there should be clearly documented libraries with standard interfaces.

Taking a deep breath and looking at Section 6.7 of Joe Armstrong’s book, I see that this cryptic error message indicates one of four possible failure modes in trying to find xmerl_scan:file: 1) no module xmerl_scan, 2) xmerl_scan hasn’t been compiled, 3) xmerl_scan.beam can’t be found, 4) there are multiple versions of xmerl_scan floating around in the path. I was under the impression that xmerl was included in the standard libraries. I installed Erlang 5.5.5 via apt-get install erlang-base-hipe (Ubuntu Gutsy), and it would seem that xmerl_scan is around, inasmuch as man xmerl_scan gives me the “docs.” However, I’m not able to actually use it.

After much wringing of hands, I installed every erlang-related package I could find via apt-cache and eventually it seemed to work. I have no idea what the problem was. Now my silly parser outputs:

{xmlElement,foo,
foo,
[],
{xmlNamespace,[],[]},
[],
1,
[],
[{xmlText,[{foo,1}],1,[],"\n bar\n",text}],
[],
".",
undeclared} []

which is at least a step in the right direction. When I try to run it on a partial string, however, we fail. I think this is the behavior that xmerl intends to have. I’m starting to think that implementing whatever ejabberd does is going to be the only way to resolve this, unfortunately. So, I guess that’ll be next time.



Gene Expression Repositories

16 February 2008 | 18:36 | Research, Science and Math | No Comments

There’s been some discussion lately on the UAI mailing list about repositories for gene expression data. Here are some of the places people have pointed to:



Fast Floating-Point Exponential

6 February 2008 | 22:08 | Computing, Science and Math | 1 Comment

If you are writing code that is dominated by evaluation of the exponential function, you cannot do without these two papers:

Nicol N. Schraudolph. A Fast, Compact Approximation of the Exponential Function. Neural Computation, 11(4):853–862, 1999.

G. C. Cawley. On a Fast, Compact Approximation of the Exponential Function. Neural Computation, 12(9):2009–2012, 2000.

The implementations described yield 3x improvement in the performance of the exponential, with marginal introduction of error. Essentially, they use clever tricks exploiting the IEEE floating point implementation. If you want to skip straight to the guts, here is the implementation from the second reference:

#include <math.h>

#define EXP_A (1048576/M_LN2)
#define EXP_C 60801

inline double exponential(double y)
{
    union
    {
        double d;
#ifdef LITTLE_ENDIAN
        struct { int j, i; } n;
#else
        struct { int i, j; } n;
#endif
    }
    _eco;

    _eco.n.i = (int)(EXP_A*(y)) + (1072693248 - EXP_C);
    _eco.n.j = 0;

    return _eco.d;
}
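If you want to poke at the trick without a C compiler, here is a Python sketch that mimics the union by packing the computed integer into the upper 32 bits of a big-endian double and zeroing the lower 32 bits. The constants are the ones from the C code above; the name `fast_exp` and the use of `struct` are my own choices. It is purely illustrative and will be far slower than `math.exp` in Python.

```python
import math
import struct

EXP_A = 1048576 / math.log(2)   # 2**20 / ln(2)
EXP_C = 60801                   # correction constant from the C code above
EXP_BIAS = 1072693248           # 1023 * 2**20, the exponent bias shifted into place


def fast_exp(y):
    """Approximate exp(y) by writing directly into the bits of a double."""
    high_word = int(EXP_A * y) + (EXP_BIAS - EXP_C)
    # Upper 32 bits hold sign, exponent and top of the mantissa; lower 32 are zero.
    packed = struct.pack(">iI", high_word, 0)
    return struct.unpack(">d", packed)[0]


relative_errors = {y: fast_exp(y) / math.exp(y) - 1.0
                   for y in (-5.0, -1.0, 0.0, 0.5, 1.0, 3.0, 10.0)}
```

The entries of `relative_errors` all stay within a few percent, which is the trade-off the papers describe. The approach only makes sense for arguments of roughly magnitude 700 or less, where the result fits in the exponent field.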



Making Better Equations in Latex and Beamer

28 January 2008 | 18:31 | Computing, Research, Science and Math | 1 Comment

I use Latex Beamer to do all my talk slides. Problematically, however, it can be tempting to pack the slides full of equations. Obviously if you’re presenting mathematical results this is unavoidable. This page on using arrows in equations with Beamer is fantastic. I’m definitely using them in my next presentation.



Starting an Erlang Project, Part III

24 January 2008 | 21:44 | Computing, Erlang | No Comments

I’ve posted twice now about starting an Erlang project (one and two). I now have a directory structure and a Makefile that I’m happy with, plus some odds-and-ends from my first attempt at stubbing out an OTP application. I’m now going to begin again, with a healthy dose of Chapter 18 from Joe Armstrong’s book and a bit of Pete Kazmier’s Howto as well.

To remind you again, we’re calling this project “North Zulch” and prefixing all of our modules with “nz.” This sort of prefixing and underscoring seems to be the way that Erlang manages its namespace. It’s a bit ugly, but that’s the way it is. The first thing we want is an application callback module, more or less like we made before. We’ll name this file after the overall project: north_zulch.erl. Our first shot at this is:

-module(north_zulch).
-behaviour(application).
-export([ start/2, stop/1 ]).

start(_Type, _StartArgs) ->
    io:format("north_zulch is starting~n"),
    {ok, self()}.

stop(_State) ->
    io:format("north_zulch has stopped~n"),
    ok.

This obviously doesn’t do anything interesting at all, but it’s a good place to start. The start/2 function prints out a little message and then returns a tuple of the term ok and our process id. Generally, this process id will be for a supervisor, but right now we just give it this. stop/1 does a similar thing, but without a process id. Note that we prefix the various arguments with underscores to indicate that we’re not interested in the values. We put north_zulch in our MODULES list in the Makefile, run make to compile it and then run erl -pa ebin, which fires up Erlang and looks in the ebin directory for .beam files. Now we get

Erlang (BEAM) emulator version 5.5.5

Eshell V5.5.5  (abort with ^G)
1> north_zulch:start(foo,bar).
north_zulch is starting
{ok,<0.31.0>}
2> north_zulch:stop(foo).
north_zulch has stopped
ok

We give the two functions dummy values (since we aren’t using them), or we’ll get an error. So, this seems to work. However, the whole point is that this is an application callback module. We want to be able to deal with it as an OTP application. To do that we have to create an .app file, which we place in the ebin directory, named north_zulch.app. This file tells the application what to do with us.

%% -*- erlang -*-
{application, north_zulch,
 [{description, "North Zulch Master"},
  {vsn, "1.0"},
  {modules, []},
  {registered, []},
  {applications, [kernel, stdlib]},
  {mod, {north_zulch, []}},
  {env, []},
  {start_phases, []}
 ]}.

I won’t go into detail about this, but the important bits are: the first thing in the tuple must be the term application, followed by the name of the application - which must also be the name of the file. You can give it a description and a version number. You need to make sure to include the kernel and stdlib applications, because pretty much everything uses these. The mod field needs to be your application callback module, along with the arguments that will eventually get handed to start/2. Also, for you folks that care about such things, the top line sends Emacs into Erlang-mode, which it would not be in by default for a file ending in .app. Now, we can fire up our application like this:

Erlang (BEAM) emulator version 5.5.5

Eshell V5.5.5  (abort with ^G)
1> application:start(north_zulch).
north_zulch is starting
ok
2> application:stop(north_zulch).
north_zulch has stopped

=INFO REPORT==== 24-Jan-2008::14:32:32 ===
    application: north_zulch
    exited: stopped
    type: temporary
ok

Now I’m going to invoke (very slightly) deeper magic to allow us to run our application via “make run” on the command line like this:

$ make run
Running application north_zulch...
north_zulch is starting
north_zulch has stopped
$

The Makefile changes are:

APP       = north_zulch
MODULES   = $(APP)

--- SNIP ---

run: all
	@echo "Running application $(APP)..."
	@ $(ERL) -pa $(BINDIR) -eval 'application:start($(APP)).' \
               -noshell -s init stop

and remember - there are tabs before the commands, not spaces.

In the next installment, we’ll look at making the application actually do something - like run a supervisor, for instance.



Starting an Erlang Project, Part II

24 January 2008 | 19:28 | Computing, Erlang | No Comments

Continuing on with the previous Erlang post, there have been a couple of developments: 1) Programming Erlang arrived from Amazon, 2) trapexit.org came back online, and 3) I found this awesome blog.

To remind you of my immediate goal: I want to figure out how to organize a large-scale Erlang project. I want to learn the conventional project directory/file structure and create a useful build environment. Then I want to set up an infrastructure for configuration and logging. My initial stab at this involved using some stub generator code to get up and running. It seemed to do something pretty reasonable, and we walked through it up to the point where we implement the gen_server behaviour. I have to say, though, that I’m not a huge fan of stub generation. I like to know why every line of code is there, and for learning purposes, I want to set things up myself. So I’m going to sort of ignore what I did the other day and start again.

First, directory structure: In his book, Joe Armstrong is very agnostic about directories. He sort of has a “stick it where it makes sense” approach. I can appreciate that, but I like a little more structure. Stoyan Zhekov has exactly the attitude I’m looking for when he quotes the Ruby motto “Convention over configuration.” He says that the conventional way to go is:

  • ebin/ for compiled BEAM files
  • src/ for .erl source files
  • include/ for .hrl include files
  • priv/ for application-specific configuration

This is pretty much just like C/C++ development with regard to src and include, and also mirrors the tutorial by Pete Kazmier, with the exception that he puts his external Python program in the priv directory. Perhaps external programs and libraries would be better in bin and lib directories, respectively? If it seems like I’m being pedantic, well, I am. For me, a project is most fun when everything is still nice and beautiful and unhacky. I try to hang on to that as long as I can, so I like to make decisions like these carefully so I know I can live with them.
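If you want to stamp out that layout for a fresh project, a few lines of script will do it. Here is a Python sketch. The `make_skeleton` helper and the `LAYOUT` table are just my own illustration of the four conventional directories listed above, not part of any Erlang tooling.

```python
from pathlib import Path

# Conventional top-level directories and what each one holds.
LAYOUT = {
    "ebin": "compiled BEAM files",
    "src": ".erl source files",
    "include": ".hrl include files",
    "priv": "application-specific configuration",
}


def make_skeleton(root):
    """Create the conventional directories under root and list what exists."""
    root = Path(root)
    for name in LAYOUT:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Calling `make_skeleton("north_zulch")` would create the four directories under a `north_zulch` folder in the current directory.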

Next up is build management. When I write C or C++, I use Make and when I write Java, I use ant. They’ve got their quirks, but they’re well documented and you can cut-and-paste your build files from past projects with only a little bit of modification. So what does Joe use for building Erlang projects? Make! Stoyan does something that looks like Make, but actually defines an Emakefile, which is for an Erlangified implementation of a make-like concept. What I like about the Erlang make module is that it is obviously Erlang-aware. What I don’t like is that I expect to be integrating code from other languages, and it won’t be able to handle those dependencies. Also, the man page is pretty sparse. It doesn’t say anything about recursively using Emakefiles, etc. - things that are well-documented in GNU Make. So, I think I’m going to stick with good old Make for now. Without further ado, here is my basic Makefile (for GNU Make only):

MODULES   = foo bar baz

BEAMS     = $(MODULES:%=%.beam)

BINDIR    = ebin
SRCDIR    = src
INCDIR    = include
VPATH     = $(BINDIR):$(SRCDIR):$(INCDIR)

ERL       = erl
ERLC      = erlc
ERLCFLAGS = -W -smp

all: $(BEAMS)

%.beam : %.erl %.hrl
	$(ERLC) -b beam $(ERLCFLAGS) -I $(INCDIR) -o $(BINDIR) $<

.PHONY: clean

clean:
	rm -rf $(BINDIR)/*.beam $(INCDIR)/*~ $(SRCDIR)/*~ *~

This uses a couple of features: it remaps the list of modules to have .beam extensions, it uses VPATH to search directories for dependencies, and it uses the pattern-matching rules to enforce dependencies on .erl and .hrl files. Also, I always like to remove all the Emacs backup files when I clean things up. I use Subversion and am a compulsive committer, so it has almost never come up that I’ve found these files useful.

In the next installment, we’ll look (again) at setting up an application in our nice new project structure.



Starting an Erlang Project, Part I

23 January 2008 | 5:05 | Computing, Erlang, Gaming, General | No Comments

Per my previous post, I’m going to start a little Erlang project. I’ll call it “North Zulch,” or “NZ” in reference to the tiny little place where the family ranch is located.

There are three things that I’ve had to solve over and over (or at least cut and paste over and over), in every development project I’ve ever done. These things are: build management, configuration, and logging. These things have led me down every path one can imagine, in C/C++, Perl, Python and Java: make, ant, MakeMaker, Boost, log4j, log4cpp, Commons Configuration, whatever, I’ve probably tried to use it in a project at some point, and maybe rolled my own version as well.

So, the first thing I want to learn is how to manage a large project correctly in Erlang. There seem to be two concepts for this sort of thing: Applications and Releases. My understanding is that Applications let you bundle processes together into a coherent whole, while Releases let you make something “turn-key” and ready to run. I should also mention that I want this to be a distributed project. There are two basic reasons why you might want to have a distributed application: 1) reliability, and 2) scalability. For the hypothetical MMO framework we want, we need both. Sure, we don’t need the “Nine Nines” of the telecom world, but if the MMO is successful people will want to get their fix, so maybe we want three nines. Scalability is much more important. Since we’re fantasizing anyway, let’s imagine that we want to be able to have around a million users on an unsharded world. Obviously this is going to require a pretty hefty cluster of the sort of machines that are cheap to maintain and replace. Anyway, regardless of the stupidity of these numbers, the point is that NZ needs to be a distributed application.

In practical terms, the most useful things I’ve seen for getting started have been this page by Pete Kazmier, which talks about building an application using OTP principles, and this page at TrapExit, which has been down for several days. Also, there seems to be a lot of stuff going on at Erlware. The goal there seems to be to make a unified system like autotools for making Erlang code available to the public. That is a noble goal to be sure, but at this point Sinan and friends just seem like another learning curve to climb. In addition, the only tutorial around for the build system claims that the tool is broken.

The TrapExit page says

You will need to download the reference build system for this tutorial, this can be found at www.erlware.org under downloads otp_base-vsn

but it doesn’t look to me like there’s a downloads section at erlware.org. After some digging it appears that you can get the OTP base from Google Code. To complicate matters, there are two versions there. It seems that otp_base-R1-1.tgz is the stable release. I untarred this file and got an otp directory and various directories beneath it. I did cd otp/tools/utilities and executed ./appgen north_zulch nz (the first argument is the application name and the second argument is going to be prefixed to everything), which spewed out a bunch of obscenities:

[: 11: ==: unexpected operator
[: 24: ==: unexpected operator
[: 18: ==: unexpected operator
[: 32: ==: unexpected operator
-- SNIP --
[: 18: ==: unexpected operator
[: 32: ==: unexpected operator
rm: missing operand
Try `rm --help' for more information.

north_zulch has been generated and placed under lib/north_zulch
north_zulch_rel has been generated and placed under release/north_zulch_rel

I'm on Ubuntu Gutsy Gibbon, and it turns out that these are because the shell scripts start with #!/bin/sh when they really want #!/bin/bash. On Ubuntu, /bin/sh is dash, whose [ builtin doesn't understand the == operator that bash accepts. The error about rm: missing operand is because there's a command to remove Subversion directories and there aren't any in the tarball. So I went through and changed the shebang line in otp/tools/.appgen/subsitute.sh, otp/tools/.appgen/rename.sh, and otp/tools/utilities/appgen. Then I ran my command again and got

moving blank_app_rel.config.src to north_zulch_rel.config.src
moving blank_app_rel.rel.src to north_zulch_rel.rel.src
replacing %%APP_NAME%% with north_zulch in north_zulch_rel/north_zulch_rel.rel.src
moving ba_server.erl to nz_server.erl
moving ba_sup.erl to nz_sup.erl
moving blank_app.app.src to north_zulch.app.src
moving blank_app.appup.src to north_zulch.appup.src
moving blank_app.erl to north_zulch.erl
/home/rpa/Desktop/otp-diff2/tools/.appgen
replacing %%APP_NAME_UPPER_CASE%% with NORTH_ZULCH in north_zulch/Makefile
replacing %%APP_NAME_UPPER_CASE%% with NORTH_ZULCH in north_zulch/vsn.mk
replacing %%APP_NAME%% with north_zulch in north_zulch/src/Makefile
replacing %%APP_NAME_UPPER_CASE%% with NORTH_ZULCH in north_zulch/src/Makefile
replacing %%PFX%% with nz in north_zulch/src/Makefile
replacing %%APP_NAME%% with north_zulch in north_zulch/src/north_zulch.erl
replacing %%PFX%% with nz in north_zulch/src/north_zulch.erl
replacing %%PFX%% with nz in north_zulch/src/nz_sup.erl
replacing %%APP_NAME%% with north_zulch in north_zulch/src/nz_sup.erl
replacing %%PFX%% with nz in north_zulch/src/nz_server.erl
replacing %%APP_NAME%% with north_zulch in north_zulch/src/nz_server.erl
north_zulch has been generated and placed under lib/north_zulch
north_zulch_rel has been generated and placed under release/north_zulch_rel

which I am assuming means success. As promised, this created directories otp/lib/north_zulch and otp/release/north_zulch_rel with various things beneath them. A significant file here is otp/lib/north_zulch/vsn.mk, which contains the version number. I'm going to leave it at 1.0 for now. In that same directory are include, which is for shared macro and record definitions, and src, where all the Erlang code files go. Incidentally, I'm going to include north_zulch and nz in everything I write here, but you'll obviously want to change that for your application.

In the src directory are various stubs. The north_zulch.erl file defines the "application callback module." You can think of a callback module as an approximation to deriving from an abstract base class with virtual functions. The function names are there, and they'll get called under certain well-defined circumstances. It's up to you to make them do something reasonable. There are three functions exported so that the application OTP module knows what to do with us. start/2 is called to get us going, shutdown/0 is called to get us to end, and stop/1 is called when we're all done. We look at these and compare them to the docs here.

start(Type, StartArgs) ->
    case nz_sup:start_link(StartArgs) of
	{ok, Pid} ->
	    {ok, Pid};
	Error ->
	    Error
    end.

We get two arguments. Type is an atom that will generally be normal for non-distributed applications; it appears to be the only value that the spec identifies in the non-distributed case. The distributed case is more complicated, so we'll look at that later down the road. The default thing to do is to just run nz_sup:start_link(StartArgs), which starts up a Supervisor, regardless of the atom in Type, so we'll roll with it. If all goes well, we have to return {ok, Pid} or {ok, Pid, State}. If State is left out, then it just uses []. We'll ignore State for now. Pid is the process id of the main Supervisor, the boss. If start_link (which we'll talk about in a sec) goes well, it gives us Pid. The stop/1 function is pretty trivial and is called after everything is done. As far as I can tell, shutdown/0 is not actually part of the Application interface, and is just a convenient way for you to shut down your application if you want to. It just goes up and tells the Application module that it wants to be turned off:

shutdown() ->
    application:stop(north_zulch).
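To make the shape of the callback module concrete, here is a minimal sketch that can be compiled and poked at on its own. The module name nz_app_sketch is made up for illustration, and the module doubles as its own (childless) supervisor purely so that it is self-contained; the generated north_zulch.erl delegates to nz_sup instead. The restart limits in init/1 are arbitrary placeholder values.

```erlang
%% Hypothetical stand-alone sketch, not the generated north_zulch.erl.
-module(nz_app_sketch).
-behaviour(application).
-behaviour(supervisor).

%% application callbacks
-export([start/2, stop/1]).
%% supervisor callback (only here to keep the sketch self-contained)
-export([init/1]).

%% Called when the application starts. Type is normal in the
%% non-distributed case. We must return {ok, Pid} or {ok, Pid, State},
%% where Pid is the top-level supervisor.
start(_Type, _StartArgs) ->
    supervisor:start_link(?MODULE, []).

%% Called after the application has stopped; State is whatever start/2
%% returned as its third element, or [] if it returned {ok, Pid}.
stop(_State) ->
    ok.

%% A supervisor with no children: allow at most 1 restart in 5 seconds
%% (placeholder numbers).
init([]) ->
    {ok, {{one_for_one, 1, 5}, []}}.
```

Calling nz_app_sketch:start(normal, []) from the shell by hand should return {ok, Pid}, which is the same thing the application controller expects to get back.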

With that big file under our belt, we now look at nz_sup.erl which contains the code for the supervisor. The Supervisor (also, the spec) is a great idea to build as a system primitive. I think I’ve implemented it at least five times in various projects. The basic idea is, manage some underlings and make sure they do their thing, while isolating yourself from their errors. If one fails, start up another, modulo your configurable policy about such things. The docs have more info on restart behaviors. We export, for our own purposes, the function start_link/1, which got called by start/2 in the previous file:

start_link(StartArgs) ->
    supervisor:start_link({local, ?SERVER}, ?MODULE, []).

This just turns around and tells the Supervisor API that we want to be fired up as a supervisor callback module. supervisor:start_link/3 takes three arguments. The first is a tuple that tells the Supervisor module who we are. The first bit of the tuple can be local or global, and says where the name gets registered. The second bit is the name to register under, which in this case is ?SERVER. What does that mean? Well, things that start with a question mark in Erlang are macros. Up at the top of the file is

-define(SERVER, ?MODULE).

which defines it. ?MODULE is a predefined macro that expands to the name of the current module. Back to start_link/3, we also specify the name of the callback module - this one - using ?MODULE, and any arguments we want to pass along to init/1, which is the other significant function we need to export to use the supervisor:

init([]) ->
    RestartStrategy    = one_for_one,
    MaxRestarts        = 1000,
    MaxTimeBetRestarts = 3600,

    SupFlags = {RestartStrategy, MaxRestarts, MaxTimeBetRestarts},

    ChildSpecs =
	[
	 {nz_server,
	  {nz_server, start_link, []},
	  permanent,
	  1000,
	  worker,
	  [nz_server]}
	 ],
    {ok,{SupFlags, ChildSpecs}}.

This is pretty much where all the guts of the supervisor’s behavior are. The one argument you get is whatever you handed to start_link/3 before, so you could pass around various application data if you wanted. We’re not worrying about this at the moment, though, and we’re just going to look at what this gets up to. First off, we define some variables that control the supervisor’s behavior. RestartStrategy determines which children get restarted when one of them fails; one_for_one restarts only the child that died. If more than MaxRestarts restarts occur within MaxTimeBetRestarts seconds, then all children are terminated and the supervisor terminates itself. We construct a list of child specifications, which in this case only contains one element: the specs for managing the nz_server process. Within this specification are:

	 {nz_server,
	  {nz_server, start_link, []},
	  permanent,
	  1000,
	  worker,
	  [nz_server]}

The first component is a name that the supervisor uses to keep track of this child. In this case, we use the term nz_server, since there’s only one. The second component is a Module-Function-Arguments triplet that tells the supervisor what function should be called to start the child. We’re going to look at this later, but the function is start_link/0 in the nz_server module. The next argument can be permanent, transient, or temporary and specifies whether the child should be restarted if it dies. The fourth argument specifies how long, in milliseconds, the supervisor should wait for a child to shut down after telling it to terminate before killing it outright. We can create trees of supervisors, so the fifth argument specifies whether this is a worker or a supervisor itself. The final element will generally be a single-element list with the name of the callback module of the worker, in this case nz_server. We return a tuple with the supervisor flags and the child specs together, and they tell the Supervisor how it should act and how it should initialize its children.
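Here is a small sketch for experimenting with these two pieces outside of a running supervisor. The module name sup_sketch and its function names are made up. check/0 hands the child spec from above to supervisor:check_childspecs/1, which only validates the shape of the spec and does not start anything. too_many_restarts/4 is my own approximation of the restart-limit rule, written as a pure function over timestamps in seconds; it is not how OTP implements it internally, and the exact boundary behaviour is an assumption.

```erlang
%% Hypothetical helper module for playing with supervisor settings.
-module(sup_sketch).
-export([child_spec/0, check/0, too_many_restarts/4]).

%% The same child specification tuple used in nz_sup:init/1.
child_spec() ->
    {nz_server,                    % name the supervisor tracks the child by
     {nz_server, start_link, []},  % {Module, Function, Args} used to start it
     permanent,                    % always restart it if it dies
     1000,                         % shutdown timeout in milliseconds
     worker,                       % worker, as opposed to supervisor
     [nz_server]}.                 % callback module(s) of the child

%% Ask OTP whether the spec is well formed; returns ok or {error, Reason}.
check() ->
    supervisor:check_childspecs([child_spec()]).

%% RestartTimes are the times (in seconds) of earlier restarts and Now is
%% the time of the restart being attempted. Returns true when, counting
%% this one, more than MaxRestarts restarts fall within the last MaxTime
%% seconds, which is when the supervisor would give up.
too_many_restarts(RestartTimes, Now, MaxRestarts, MaxTime) ->
    Recent = [T || T <- [Now | RestartTimes], Now - T =< MaxTime],
    length(Recent) > MaxRestarts.
```

With the settings in init/1 above (1000 restarts in 3600 seconds), a child would have to crash more often than once every few seconds for an hour before the supervisor threw in the towel.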

The final source file we examine is nz_server.erl, which implements the application-specific work. At the very top of the file, we see

-behaviour(gen_server).

which means that it implements the gen_server (also, here) behaviour as a callback. The callbacks we need to define for this module to do its thing are init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2, and code_change/3. Again, if you come from a background like C++ or Java, you can think of these approximately as virtual event-handling functions. Rather than an onMouseClick() type override, we have various functions that are called when the server receives Erlang messages. These are all exported so that the gen_server module can get to them.

There are also two interface functions that are defined that don’t really have anything to do with gen_server. The first is start_link/0, which we mentioned before. This is called by the Supervisor to get things going:

start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

You can see that we immediately just hand control off to the generic server. gen_server:start_link/4 takes four arguments. The first is a tuple like the one in the previous start_link: it registers this process locally under the name ?SERVER. The second is the callback module (this one), the third is the argument handed to init/1, and the fourth is a list of options. The other “local” function is stop/0, which provides an interface for shutting down the gen_server by sending it an asynchronous message.
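For a feel of how the interface functions and the six callbacks fit together, here is a bare-bones sketch. It is not the generated nz_server.erl: the module name gs_sketch, the ping/0 function, and the idea of keeping a counter as the server state are all invented for illustration.

```erlang
%% Hypothetical minimal gen_server, not the generated nz_server.erl.
-module(gs_sketch).
-behaviour(gen_server).

%% interface functions
-export([start_link/0, stop/0, ping/0]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-define(SERVER, ?MODULE).

%% Start the server and register it locally under ?SERVER.
start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

%% Asynchronous: returns ok right away, the server stops when it
%% gets around to handling the message.
stop() ->
    gen_server:cast(?SERVER, stop).

%% Synchronous: waits for the reply from handle_call/3.
ping() ->
    gen_server:call(?SERVER, ping).

%% The state is just a count of pings handled so far.
init([]) ->
    {ok, 0}.

handle_call(ping, _From, Count) ->
    {reply, {pong, Count + 1}, Count + 1}.

handle_cast(stop, Count) ->
    {stop, normal, Count}.

%% Any other Erlang message sent to the process ends up here.
handle_info(_Info, Count) ->
    {noreply, Count}.

terminate(_Reason, _Count) ->
    ok.

code_change(_OldVsn, Count, _Extra) ->
    {ok, Count}.
```

After gs_sketch:start_link(), each gs_sketch:ping() call goes through handle_call/3 and bumps the counter, and gs_sketch:stop() goes through handle_cast/2 and shuts the server down with reason normal.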

To actually do anything interesting, we’ll need to implement the gen_server functions. I’ll leave that to the next installment. We’ll also look at releases, configurations and logging, which is what we said we were after in the first place!



Off On an Erlang Adventure

22 January 2008 | 23:57 | Computing, Erlang, Gaming | 1 Comment

I have decided for some reason that it would be fun to play with massively scalable systems. In particular, I thought it would be a hilarious waste of time to work on an MMO engine. I’m not sure why this seems like a good idea, given that I have more than enough on my plate as it is, but I justified it by saying that time that I might’ve spent playing a game will now be spent creating a game. We’ll see how it goes.

For scalability, and for an excuse to learn something new, I thought I’d play with Erlang. Erlang is a functional programming language that is completely geared toward concurrency. It’s not so fast with the number-crunching we’d expect to do with a game for physics and collision detection, but it is really well-suited for having zillions of semi-independent entities wandering around. Hopefully we can figure out a way to offload hard core number crunching to C via an FFI of some kind, but we’ll cross that bridge when we get to it. For now, I’m just trying to learn how to build a real-life application in Erlang.

Most of the documentation on Erlang seems to revolve around learning functional programming. I feel pretty good about FP already, due to having taken 6.001 back in the day. Tutorials about implementing the factorial function in Erlang aren’t really what I’m looking for. I’ve ordered Programming Erlang from Amazon and it’s due to arrive on Friday, so hopefully that’ll get me going. Stay tuned and maybe this will all be useful to you, too.