Starting an Erlang Project, Part I
Per my previous post, I’m going to start a little Erlang project. I’ll call it “North Zulch,” or “NZ” in reference to the tiny little place where the family ranch is located.
There are three things that I’ve had to solve over and over (or at least cut and paste over and over), in every development project I’ve ever done. These things are: build management, configuration, and logging. These things have led me down every path one can imagine, in C/C++, Perl, Python and Java: make, ant, MakeMaker, Boost, log4j, log4cpp, Commons Configuration, whatever, I’ve probably tried to use it in a project at some point, and maybe rolled my own version as well.
So, the first thing I want to learn is how to manage a large project correctly in Erlang. There seem to be two concepts for this sort of thing: Applications and Releases. My understanding is that Applications let you bundle processes together into a coherent whole, while Releases let you make something “turn-key” and ready to run. I should also mention that I want this to be a distributed project. There are two basic reasons why you might want to have a distributed application: 1) reliability, and 2) scalability. For the hypothetical MMO framework we want, we need both. Sure, we don’t need the “Nine Nines” of the telecom world, but if the MMO is successful people will want to get their fix, so maybe we want three nines. Scalability is much more important. Since we’re fantasizing anyway, let’s imagine that we want to be able to have around a million users on an unsharded world. Obviously this is going to require a pretty hefty cluster of the sort of machines that are cheap to maintain and replace. Anyway, regardless of the stupidity of these numbers, the point is that NZ needs to be a distributed application.
In practical terms, the most useful things I’ve seen for getting started have been this page by Pete Kazmier, which talks about building an application using OTP principles, and this page at TrapExit, which has been down for several days. Also, there seems to be a lot of stuff going on at Erlware. The goal there seems to be to make a unified system like autotools for making Erlang code available to the public. That is a noble goal to be sure, but at this point Sinan and friends just seem like another learning curve to climb. In addition, the only tutorial around for the build system claims that the tool is broken.
The TrapExit page says
You will need to download the reference build system for this tutorial, this can be found at www.erlware.org under downloads otp_base-vsn
but it doesn’t look to me like there’s a downloads section at erlware.org. After some digging it appears that you can get the OTP base from Google Code. To complicate matters, there are two versions there. It seems that otp_base-R1-1.tgz is the stable release. I untarred this file and got an otp directory and various directories beneath it. I did a cd otp/tools/utilities and executed ./appgen north_zulch nz (the first argument is the application name and the second argument is going to be prefixed to everything), which spewed out a bunch of obscenities:
[: 11: ==: unexpected operator
[: 24: ==: unexpected operator
[: 18: ==: unexpected operator
[: 32: ==: unexpected operator
-- SNIP --
[: 18: ==: unexpected operator
[: 32: ==: unexpected operator
rm: missing operand
Try `rm --help' for more information.
north_zulch has been generated and placed under lib/north_zulch
north_zulch_rel has been generated and placed under release/north_zulch_rel
I'm on Ubuntu Gutsy Gibbon, and it turns out that these errors appear because the shell scripts start with #!/bin/sh when they really want #!/bin/bash. On Ubuntu, /bin/sh is dash, which does not accept the == operator inside [ ]. The error about rm: missing operand is because there's a command to remove Subversion directories and there aren't any in the tarball. So I went through and changed the shebang line in otp/tools/.appgen/subsitute.sh, otp/tools/.appgen/rename.sh, and otp/tools/utilities/appgen. Then I ran my command again and got
moving blank_app_rel.config.src to north_zulch_rel.config.src
moving blank_app_rel.rel.src to north_zulch_rel.rel.src
replacing %%APP_NAME%% with north_zulch in north_zulch_rel/north_zulch_rel.rel.src
moving ba_server.erl to nz_server.erl
moving ba_sup.erl to nz_sup.erl
moving blank_app.app.src to north_zulch.app.src
moving blank_app.appup.src to north_zulch.appup.src
moving blank_app.erl to north_zulch.erl
/home/rpa/Desktop/otp-diff2/tools/.appgen
replacing %%APP_NAME_UPPER_CASE%% with NORTH_ZULCH in north_zulch/Makefile
replacing %%APP_NAME_UPPER_CASE%% with NORTH_ZULCH in north_zulch/vsn.mk
replacing %%APP_NAME%% with north_zulch in north_zulch/src/Makefile
replacing %%APP_NAME_UPPER_CASE%% with NORTH_ZULCH in north_zulch/src/Makefile
replacing %%PFX%% with nz in north_zulch/src/Makefile
replacing %%APP_NAME%% with north_zulch in north_zulch/src/north_zulch.erl
replacing %%PFX%% with nz in north_zulch/src/north_zulch.erl
replacing %%PFX%% with nz in north_zulch/src/nz_sup.erl
replacing %%APP_NAME%% with north_zulch in north_zulch/src/nz_sup.erl
replacing %%PFX%% with nz in north_zulch/src/nz_server.erl
replacing %%APP_NAME%% with north_zulch in north_zulch/src/nz_server.erl
north_zulch has been generated and placed under lib/north_zulch
north_zulch_rel has been generated and placed under release/north_zulch_rel
which I am assuming means success. As promised, this created directories otp/lib/north_zulch and otp/release/north_zulch_rel with various things beneath them. A significant file here is otp/lib/north_zulch/vsn.mk, which contains the version number. I'm going to leave it at 1.0 for now. In that same directory are include, which is for shared macro and record definitions, and src, where all the Erlang code files go. I'm going to use north_zulch and nz in everything I write here, but you'll obviously want to change that for your application.
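The generator output above mentions a north_zulch.app.src file. I haven't reproduced the generated one here, but as a rough sketch of what an application resource file for this project could end up looking like, assuming the three modules named in the output and the 1.0 version from vsn.mk (the description string and the registered names are my own guesses):

```erlang
%% north_zulch.app -- illustrative sketch only, not the file appgen generates.
{application, north_zulch,
 [{description, "North Zulch (sketch)"},      % made-up description
  {vsn, "1.0"},                               % the version left in vsn.mk
  {modules, [north_zulch, nz_sup, nz_server]},
  {registered, [nz_sup, nz_server]},          % assumed registered names
  {applications, [kernel, stdlib]},           % must be started before this one
  {mod, {north_zulch, []}}                    % callback module and its StartArgs
 ]}.
```

The mod entry is what ties the application to its callback module: the second element of that tuple is what arrives as StartArgs in start/2.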
In the src directory are various stubs. The north_zulch.erl file defines the "application callback module." You can think of a callback module as an approximation to deriving from an abstract base class with virtual functions. The function names are there, and they'll get called under certain well-defined circumstances. It's up to you to make them do something reasonable. There are three functions exported so that the application OTP module knows what to do with us. start/2 is called to get us going, shutdown/0 is called to get us to end, and stop/1 is called when we're all done. We look at these and compare them to the docs here.
start(Type, StartArgs) ->
    case nz_sup:start_link(StartArgs) of
        {ok, Pid} ->
            {ok, Pid};
        Error ->
            Error
    end.
We get two arguments. Type is an atom that will generally be normal for non-distributed applications. This appears to be the only value that the spec identifies in the non-distributed case, so we'll roll with it. The distributed case is more complicated, so we'll look at that later down the road. We see that the default thing to do is to just try to run nz_sup:start_link(StartArgs), which starts up a Supervisor, regardless of the atom in Type. In any case, if all goes well, we have to return either {ok, Pid} or {ok, Pid, State}. If State is left out, then it just uses []. We'll ignore State for now. Pid is the process id of the main Supervisor, the boss. If start_link (which we'll talk about in a sec) goes well, it gives us Pid. The stop/1 function is pretty trivial and is called after everything is done. As far as I can tell, shutdown/0 is not actually part of the Application interface, and is just a convenient way for you to shut down your application if you want to. It just goes up and tells the Application module that it wants to be turned off:
shutdown() ->
    application:stop(north_zulch).
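To make the two possible return values of start/2 concrete, here is a minimal sketch of my own, not the generated stub. The module name nz_app_sketch and the atom sketch_state are made up for illustration, and to keep it in one file the module doubles as a supervisor callback with no children, which you wouldn't do in a real project:

```erlang
-module(nz_app_sketch).   % hypothetical module, not part of the generated code
-behaviour(application).
-behaviour(supervisor).

%% application callbacks
-export([start/2, stop/1]).
%% supervisor callback
-export([init/1]).

%% Type is normal in the non-distributed case and is ignored here.
start(_Type, StartArgs) ->
    case supervisor:start_link(?MODULE, StartArgs) of
        {ok, Pid} ->
            %% The third element is optional; it is handed to stop/1 later.
            {ok, Pid, sketch_state};
        Error ->
            Error
    end.

%% Called after the application has stopped, with the State from start/2.
stop(_State) ->
    ok.

%% A supervisor with no children, just enough to give start/2 a Pid.
init(_Args) ->
    {ok, {{one_for_one, 1, 5}, []}}.
```

Returning {ok, Pid} instead would work the same way, with stop/1 receiving [] as its argument.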
With that big file under our belt, we now look at nz_sup.erl which contains the code for the supervisor. The Supervisor (also, the spec) is a great idea to build as a system primitive. I think I've implemented it at least five times in various projects. The basic idea is, manage some underlings and make sure they do their thing, while isolating yourself from their errors. If one fails, start up another, modulo your configurable policy about such things. The docs have more info on restart behaviors. We export, for our own purposes, the function start_link/1, which got called by start/2 in the previous file:
start_link(StartArgs) ->
    supervisor:start_link({local, ?SERVER}, ?MODULE, []).
This just turns around and tells the Supervisor API that we want to be fired up as a supervisor callback module. supervisor:start_link/3 has three arguments. The first is a tuple that tells the Supervisor module what name to register us under. The first bit of the tuple can be local or global. The second is the name to register, which in this case is ?SERVER. What does that mean? Well, things that start with a question mark in Erlang are macros. Up at the top of the file is
-define(SERVER, ?MODULE).
which defines it. ?MODULE is a predefined macro that expands to the name of the current module, so ?SERVER here is just nz_sup. Back to start_link/3, we also specify the name of the callback module - this one - using ?MODULE, and any arguments we want to pass along to init/1, which is the other significant function we need to export to use the supervisor:
init([]) ->
    RestartStrategy = one_for_one,
    MaxRestarts = 1000,
    MaxTimeBetRestarts = 3600,
    SupFlags = {RestartStrategy, MaxRestarts, MaxTimeBetRestarts},
    ChildSpecs =
        [
         {nz_server,
          {nz_server, start_link, []},
          permanent,
          1000,
          worker,
          [nz_server]}
        ],
    {ok,{SupFlags, ChildSpecs}}.
This is pretty much where all the guts of the supervisor's behavior are. The one argument you get is whatever you handed to start_link/3 before, so you could pass around various application data if you wanted. We're not worrying about this at the moment, though, and we're just going to look at what this gets up to. First off, we define some variables that control the supervisor's behavior. RestartStrategy determines what restarting should occur should a child fail; one_for_one means that only the child that died gets restarted. If more than MaxRestarts restarts occur within MaxTimeBetRestarts seconds, then all children are terminated and the supervisor terminates itself. We construct a list of child specifications, that in this case only contains one element: the specs for managing the nz_server process. Within this specification are:
{nz_server,
 {nz_server, start_link, []},
 permanent,
 1000,
 worker,
 [nz_server]}
The first component is a name that the supervisor uses to keep track of this child. In this case, we use the term nz_server, since there's only one. The second component is a Module-Function-Arguments triplet that tells the supervisor what function should be called to start the child. We're going to look at this later, but the function is start_link/0 in the nz_server module. The third component can be permanent, transient, or temporary and specifies whether the child should be restarted if it dies. The fourth component specifies how long, in milliseconds, the supervisor waits for the child to exit after asking it to shut down before killing it outright. We can create trees of supervisors, so the fifth component specifies whether this child is a worker or a supervisor itself. The final element will generally be a single-element list with the name of the callback module of the worker, in this case nz_server. We return a tuple with SupFlags and ChildSpecs together, and they tell the Supervisor how it should act, and how it should initialize its children.
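To see the restart policy in action without waiting for nz_server, here is a small self-contained sketch. Everything in it is made up for illustration: the module name nz_sup_sketch, the child id sketch_worker, the helper worker_pid/0, and the restart limits. The "worker" is just a bare linked process rather than a proper gen_server:

```erlang
-module(nz_sup_sketch).   % hypothetical module, not the generated nz_sup
-behaviour(supervisor).

-export([start_link/0, init/1, worker_start_link/0, worker_pid/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Give up if more than 3 restarts happen within 10 seconds.
    SupFlags = {one_for_one, 3, 10},
    ChildSpecs =
        [{sketch_worker,                      % id used by the supervisor
          {?MODULE, worker_start_link, []},   % {Module, Function, Args}
          permanent,                          % always restart
          1000,                               % shutdown timeout in ms
          worker,
          [?MODULE]}],
    {ok, {SupFlags, ChildSpecs}}.

%% Stand-in worker: a linked process that idles until told to stop.
%% This runs inside the supervisor process, so the link is to the supervisor.
worker_start_link() ->
    Pid = spawn_link(fun() -> receive stop -> ok end end),
    {ok, Pid}.

%% Ask the supervisor for the current pid of its only child.
worker_pid() ->
    [{sketch_worker, Pid, worker, _Modules}] =
        supervisor:which_children(?MODULE),
    Pid.
```

In the shell, calling nz_sup_sketch:start_link(), then exit(nz_sup_sketch:worker_pid(), kill), and then nz_sup_sketch:worker_pid() again a moment later should show a different pid, because the supervisor has started a replacement.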
The final source file we examine is nz_server.erl, which implements the application-specific work. At the very top of the file, we see
-behaviour(gen_server).
which means that it implements the gen_server (also, here) behaviour as a callback. The callbacks we need to define for this module to do its thing are init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2, and code_change/3. Again, if you come from a background like C++ or Java, you can think of these approximately as virtual event-handling functions. Rather than an onMouseClick() type override, we have various functions that are called when the server receives Erlang messages. These are all exported so that the gen_server module can get to them.
There are also two interface functions that are defined that don't really have anything to do with gen_server. The first is start_link/0, which we mentioned before. This is called by the Supervisor to get things going:
start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).
You can see that we immediately just hand control off to the generic server. It takes four arguments. The first is a tuple like the one in the previous start_link, telling it to register a name locally. The second is the callback module, which is this one. The last two are the argument handed to init/1 and a list of options, both empty here. The other interface function is stop/0, which provides a way to shut down the gen_server by sending it an asynchronous message.
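Putting the pieces so far together, a do-nothing server with the two interface functions and all six callbacks might look roughly like this. This is my own sketch, not the generated nz_server.erl; the module name nz_server_sketch and the no_state atom are made up, and handling the stop request in handle_cast/2 is an assumption about how the asynchronous shutdown is wired:

```erlang
-module(nz_server_sketch).   % hypothetical module, not the generated stub
-behaviour(gen_server).

%% interface functions
-export([start_link/0, stop/0]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-define(SERVER, ?MODULE).

start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

%% Asynchronous stop request; handled in handle_cast/2 below.
stop() ->
    gen_server:cast(?SERVER, stop).

%% Receives the third argument of gen_server:start_link/4.
init([]) ->
    {ok, no_state}.

%% Synchronous requests: reply ok to everything.
handle_call(_Request, _From, State) ->
    {reply, ok, State}.

%% Asynchronous requests: stop cleanly on stop, ignore anything else.
handle_cast(stop, State) ->
    {stop, normal, State};
handle_cast(_Msg, State) ->
    {noreply, State}.

%% Any other Erlang message sent to the process.
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

Because stop/0 uses a cast, it returns ok right away and the server exits shortly afterwards with reason normal.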
To actually do anything interesting, we'll need to implement the gen_server functions. I'll leave that to the next installment. We'll also look at releases, configurations and logging, which is what we said we were after in the first place!
December 3rd, 2008 at 8:33 pm
Hi ,
i want to build sample erlang application as given in the address:
http://www.trapexit.org/Building_An_OTP_Application
But i am not able to find that otp_base-R1-1.tgz file. It is giving details about Faxian and sinan.
Also I am not able to run that python script from my linux server as it doesn’t have access to the internet.
PLS tell me the exact location of the otp_base-R1-1.tgz file so that I can build that application