Great Post About the RNC Message

5 September 2008 | 16:35 | Civil Liberties, Politics | No Comments

Even though I often agree with them in principle, I pretty much ignore the Daily Kos people as being the Democratic equivalent of the Drudge Report and Rush Limbaugh. However, this post really spoke to me, and makes an observation that I have also been making: the Republicans just seem to want to slam Obama et al. without ever talking about issues. I feel confident that there are planks in McCain’s platform that I would agree with, but they never come up. A few weeks ago, I thought this was a valid complaint against the Obama campaign, but in the interim he’s become much clearer about the specific ways he wants to change things. The McCain campaign has not been doing this. From the article:

All I heard was a long stream of extremely bitter attacks against Barack Obama, none of which go even the slightest step towards solving the problems of this country. When I tuned in, Rudy Giuliani was firing off some attacks, but I expected that - every convention has some room for criticism of the opposition.

But Palin’s speech was obviously meant to be the centerpiece, the real statement about the direction of the Republican party.

And I heard absolutely nothing about their plans for the future.

The article also talks about Palin’s “joke” about “worrying about reading terrorists their rights” and the author is spot on. If we are really the “good guys” in the War on Terror, we have to fight the good fight and we have to support human rights to the very end, no matter what. What the hell is America about if not about fundamental human rights? That’s what the very idea of America is: freedom of speech, freedom of religion, freedom of the press, the right to bear arms, equality in representation, and the right to a fair trial.



NFL Television Broadcast Maps

5 September 2008 | 15:22 | NFL | No Comments

Brenda and I often wonder exactly which games will be shown in our area on Sundays. We no longer have this problem, however, thanks to this site. It provides maps of the major network coverage of the upcoming weekend’s games.





Brain Function Discussions

20 August 2008 | 16:14 | Research | No Comments

There has been a really interesting (and inbox-filling) debate about the correct course for brain research on the comp-neuro (that’s “Computational Neuroscience”) mailing list. There is a summary of the discussions here.



ICML Lectures Online

19 August 2008 | 21:24 | Research, Science and Math | No Comments

You can now see a video of a talk I gave at the 2008 ICML/UAI/COLT Workshop on Nonparametric Bayes over at VideoLectures.net, along with a bunch of other talks from the Helsinki conferences, including one given by Oliver Stegle about a paper we wrote together.



Installing Tom Minka’s Lightspeed on Linux

19 August 2008 | 17:26 | Computing, Matlab | No Comments

I am a huge fan of Tom Minka’s Lightspeed Toolbox for MATLAB. I am so dependent on it that I can’t even remember which functions are native to MATLAB and which are Tom’s inventions. However, Tom is at Microsoft Research Cambridge, so he doesn’t spend much time making sure that it installs easily on Linux. Every time I install it, I have to modify the installation file “install_lightspeed.m”. Here is the diff:

25,29c25
< if ispc
< w = fullfile(matlabroot,'toolbox\matlab\elmat\repmat.m');
< else
< w = fullfile(matlabroot,'toolbox/matlab/elmat/repmat.m');
< end
---
> w = fullfile(matlabroot,'toolbox\matlab\elmat\repmat.m');
84,85d79
< lapacklib = '-llapack';
< blaslib = '-lblas';
148c142
< eval(['mex -f ' fullfile(matlabroot, 'bin/matopts.sh') ' matfile.c']);
---
> mex -f matopts.sh matfile.c

Note that this doesn’t really fix everything. I still get errors regarding ‘matfile.c’ but it’s an improvement.



Radford Neal Has a Blog

19 August 2008 | 16:14 | Research, Science and Math | No Comments

You won’t find many people with Bayesian chops like Radford Neal at the University of Toronto. Fortunately for the rest of us, he’s started a blog.



AI and Statistics 2009 Call for Papers

17 August 2008 | 21:24 | Research, Science and Math | No Comments

AISTATS*09 Call for Papers
Twelfth International Conference on Artificial Intelligence and
Statistics
April 16-19, 2009, Clearwater, Florida USA

This is the twelfth conference on Artificial Intelligence and
Statistics, an interdisciplinary gathering of researchers at the
intersection of computer science, statistics, and related areas. Since
its inception in 1985, the primary goal of this conference has been to
broaden research in both of these fields by promoting the exchange of
ideas between them. We encourage the submission of all papers which
are in keeping with this objective.

Presentations will include invited talks, contributed talks, and
posters. Papers for poster sessions will be treated equally with
papers for oral presentation in publications.

Papers on all aspects of the interface between AI & Statistics are
strongly encouraged, including but not limited to:

* Active learning and experimental design
* Approximate and Monte Carlo Inference
* Bayesian statistics
* Causal modeling
* Density Estimation
* Information retrieval
* Game theory
* Cluster analysis and unsupervised learning
* Graphical models
* Integrated man-machine modeling methods
* Interpretability in modeling
* Kernel methods
* Knowledge discovery in databases
* Metadata and the design of statistical data bases
* Model uncertainty, multiple models
* Online learning
* Pattern recognition
* Prediction: classification and regression
* Probability and search
* Semi-supervised learning
* Statistical decision making
* Vision, robotics, natural language processing, speech
recognition
* Visualization of very large datasets

Submission Requirements:

Papers may be up to 8 double-column pages in length, and must be
submitted electronically by 23:59, Wednesday November 5, 2008,
Universal Time.

Acceptance notices will be emailed by early January, and camera-ready
final versions (same format) will be due in late February 2009. These
papers will be published in the refereed conference proceedings.

For more information, see the AI and Statistics Webpage:
http://www.ics.uci.edu/~aistats/

Program Chairs:
* David van Dyk, University of California, Irvine
* Max Welling, University of California, Irvine



Learning about Geometric Algebra

17 July 2008 | 20:03 | Science and Math | No Comments

David said I should learn about Geometric Algebra. So I wrangled Steve into an explanation over beers at the Fort St. George. It looks very interesting and I wonder if there aren’t uses for it in machine learning. Here are some tutorials:



Walmart Growth Over Time

12 July 2008 | 7:22 | General | No Comments

This fantastic map shows the expansion of Walmart locations across the US over time.



Infinite Gaussian Mixture Modeling with FBM

29 May 2008 | 21:24 | Computing, Research, Science and Math | No Comments

I am writing a paper on nonparametric Bayesian density modeling and I would like to compare my technique to the standard approach of the infinite mixture of Gaussians (iMoG). You can read Carl Rasmussen’s paper to get a feel for what it’s all about. My plan is to look at hold-out log probabilities on real data. To do this, I need to have an implementation of iMoG that can give me predictive logprobs. There are a couple of MATLAB implementations (one, two), but they don’t (as far as I can tell) provide true predictive estimates directly. Rather, they give posterior samples from the parameters and you have to do something like (Chib and Jeliazkov, 2001) to get estimates of the logprobs.

Radford Neal, on the other hand, has his Software for Flexible Bayesian Modeling and Markov Chain Sampling (FBM) that implements mixtures of Gaussians. I have to learn how to use it anyway, since it seems to be the only implementation of another nonparametric density estimation procedure that I’m interested in: Dirichlet Diffusion Trees. So this post is going to be about figuring out how to do mixture models with FBM. Radford provides an example of bivariate density estimation here.

Here is my setup: my data is 5-dimensional, and I have 200 training cases and 28 test cases. I have whitened the data so that it has the sample statistics of a spherical Gaussian. I put these data into two files that are just comma-delimited with a line for each case and a column for each dimension. This is following the conventions described here as I understand them.
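The file layout described above (one line per case, one comma-separated column per dimension) can be sketched in a few lines of Python. This is only an illustration of the layout as I understand it, not something FBM provides: the function names `format_cases` and `write_cases` are made up for this example, and the whitening step is assumed to have been done already.

```python
import csv
import io


def format_cases(cases):
    """Return cases as comma-delimited text: one line per case,
    one column per dimension. (Hypothetical helper, not part of FBM.)"""
    if not cases:
        raise ValueError("need at least one case")
    width = len(cases[0])
    if any(len(case) != width for case in cases):
        raise ValueError("every case must have the same number of dimensions")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for case in cases:
        # repr() keeps the full floating-point precision of each value.
        writer.writerow([repr(float(value)) for value in case])
    return buffer.getvalue()


def write_cases(path, cases):
    """Write the formatted cases to the file at `path`."""
    with open(path, "w", newline="") as handle:
        handle.write(format_cases(cases))
```

Calling `write_cases` once with the 200 training cases and once with the 28 test cases would give the two files used in the commands that follow. The check that every case has the same width is there because a ragged row is an easy mistake to miss in a plain text file.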

The first thing we do is use the mix-spec command. This command creates the “log file.” Log files in FBM are the “documents” that one operates on. They contain the model specification, the results, and so on. The syntax is:

mix-spec log-file N-inputs N-targets [ N-components ]
/ concentration SD-prior [ mean-prior ]

“log-file” is the name of the log “document” file - I’m going to call mine “mog.log.” “N-inputs” you can pretty much ignore for now, as it doesn’t seem to be implemented. Just use zero. “N-targets” is the number of variables that you wish to find the joint distribution over. In my case, this is going to be five. “N-components” is how many Gaussians you want in your mixture. If you leave it out you get the infinite Dirichlet process mixture, which is what we want. The “concentration” is the parameter that in a sense determines the “variance” of the weights that you get out of the Dirichlet prior. In the infinite case, we have to specify it as a constant multiplied by the number of components, and so we preface it with an “x”. I don’t know what a good choice is, so I’m going to pick 5 and write “x5”. The “SD-prior” is this big stack of priors on the widths of things, described here. I don’t feel like I understand very well how to specify these giant stacks of hyperpriors. The last parameter is the standard deviation of the mean, as far as I can tell. Overall, the command I’m issuing is:

> mix-spec mog.log 0 5 / x1 0.05:0.5:0.2 10

After this, you need to issue the model-spec command. The first argument is the log file, then the word “real” if you’re modeling real data. Then you issue another cryptic command about what I think is the prior on the actual mixture Gaussians. I’m just doing the example thing, since I don’t know better:

> model-spec mog.log real 0.05:0.5:0.5:1

Next, you give it data, using the data-spec command. The first argument is the log file, so “mog.log.” The next argument is “input attributes,” which is zero, because it isn’t used. Then comes “target attributes,” which should be 5, since we have five dimensions. Then comes the slash, after which we specify our training file, followed by a “.” since we don’t have training inputs, then the same thing again for the test file:

> data-spec mog.log 0 5 / \
macaque-5d-train1.dat . macaque-5d-test1.dat .

So, now we have the model set up and we have data associated with it. Now it’s time for some inference. You have to invoke some special commands for the infinite case that I’m just going to take directly from the example:

> mc-spec mog.log repeat 20 met-indicators 10 gibbs-params gibbs-hypers

Now, I ran it for a bunch of iterations:

> mix-mc mog.log 10000

And after this, I looked at what it came up with:

> mix-display mog.log

But most importantly, I wanted to see the logprobs it found:

> mix-pred p mog.log 5000:

The “p” means “give me each log probability” and the “5000:” says “start after the 5000th iteration.”
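Since the goal is to compare hold-out log probabilities between methods, here is a minimal Python sketch of how the per-case numbers might be summarized. It assumes you have already pulled one log probability per test case into a list. It does not parse mix-pred output, and the function names are invented for this example.

```python
import math


def summarize_logprobs(logprobs):
    """Return (mean, standard error) of per-case hold-out log probabilities."""
    n = len(logprobs)
    if n < 2:
        raise ValueError("need at least two test cases for a standard error")
    mean = sum(logprobs) / n
    # Unbiased sample variance, then the standard error of the mean.
    variance = sum((lp - mean) ** 2 for lp in logprobs) / (n - 1)
    return mean, math.sqrt(variance / n)


def paired_difference(logprobs_a, logprobs_b):
    """Summarize the per-case difference (A minus B) between two methods
    that were evaluated on the same test cases, in the same order."""
    if len(logprobs_a) != len(logprobs_b):
        raise ValueError("both methods must be scored on the same test cases")
    differences = [a - b for a, b in zip(logprobs_a, logprobs_b)]
    return summarize_logprobs(differences)
```

With only 28 test cases the standard error will be fairly wide, which is why the paired version is included: both methods see the same cases, so differencing case by case removes the variation that comes from some cases simply being harder than others.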

This seems to maybe actually do what I want…



Rock Band - Best Cooperative Game Ever

7 May 2008 | 16:19 | Gaming | No Comments

Our downstairs neighbors, who are our very, very good friends, are moving away. They’re huge karaoke addicts and so we got them a fun parting gift: Rock Band for the PS2. This game is so much fun, it’s painful. Obviously, I know people have liked it, and Guitar Hero has been a big hit, but I’ve never tried it. What’s really, really neat, though, is that it is the best cooperative gaming setup I’ve ever seen. With the exception of minigames in Mario Party type games, cooperative play is pretty meager overall. In Rock Band, however, everyone is on equal footing and contributing to the experience. It’s pretty awesome. We pulled it out at a little house party and everyone jumped in. We’re planning a “battle of the bands” at an upcoming party where the bands are selected out of a hat and everyone has to do each instrument for at least one song. It should be pretty hilarious. The only disappointment is that all of the new downloadable content won’t be available for the PS2 version. We’re already itching for new songs.



Holly Dunsworth on NPR

5 May 2008 | 14:54 | General, Science and Math | No Comments

One of our very good friends, Holly Dunsworth, is going to be on National Public Radio (NPR) for “This I Believe” this coming weekend as a part of “Weekend Edition.” You can read her essay here. An excerpt:

I believe in evolution. It’s easy. It’s my life. I’m a paleoanthropologist. I study fossils of humans, apes, and monkeys and I teach college students about their place in nature.

Of course I believe in evolution.

But why then do I answer “no” when you ask me if I “believe in evolution”? Because if you have to ask me if I believe in evolution, then to you evolution is controversial and something to believe in as opposed to God. I answer “no” because I want to separate evolution from religion, from paralleling it with a belief in a deity.

No one has ever asked me if I believe in gravity or electricity. How absurd.



Video Card Ugliness

2 May 2008 | 16:03 | Computing, Gaming | No Comments

I have this fairly nice setup for my home office, since I work from home more or less exclusively. I like to play the occasional PC game, so I have a reasonable video card setup. I had two eVGA 7600GT video cards in an SLI configuration. Well, this all turned sour last Saturday. I pulled out the cards and…


[Image: Two broken eVGA 7600GT video cards]

Can you see the problem? Here’s a pair of close-ups:

[Images: Two broken eVGA 7600GT video cards; busted capacitors on a 7600GT]

So I ordered an upgrade to a PNY 8800GT (no SLI this time around, since I’m having to pay for it myself). It seems pretty nice and was only about $150 after rebate from Newegg. I was tempted to go back to eVGA, but I had a pretty bad experience with them when I was setting this box up. I decided to branch out.



GP Product Model Paper Accepted at ICML 2008

1 May 2008 | 18:43 | Research, Science and Math | No Comments

Oliver Stegle and I just got our paper Gaussian Process Product Models for Nonparametric Nonstationarity (pdf) accepted at the 25th International Conference on Machine Learning in Helsinki. Here is the abstract:

Stationarity is often an unrealistic prior assumption for Gaussian process regression. One solution is to predefine an explicit nonstationary covariance function, but such covariance functions can be difficult to specify and require detailed prior knowledge of the nonstationarity. We propose the Gaussian process product model (GPPM) which models data as the pointwise product of two latent Gaussian processes to nonparametrically infer nonstationary variations of amplitude. This approach differs from other nonparametric approaches to covariance function inference in that it operates on the outputs rather than the inputs, resulting in a significant reduction in computational cost and required data for inference. We present an approximate inference scheme using Expectation Propagation. This variational approximation yields convenient GP hyperparameter selection and compact approximate predictive distributions.



Global Comparison of Acceptance of Evolution

20 April 2008 | 22:25 | Politics, Research, Science and Math | No Comments


I believe the original article is in Science, here.

The article is only available if you or your institution has an AAAS membership. This behavior seems a bit odd for a group calling itself an “Association for the Advancement of Science.”



Good Cop at Protest

16 March 2008 | 17:54 | Civil Liberties, General, Politics | No Comments

There are good cops out there, but sometimes the news makes it seem like they’re all up to no good. Here’s a short video of a good cop doing his job well at a protest in LA. It seemed like it was worth sharing.



Ungoogleable Erlang Documentation

20 February 2008 | 1:16 | Computing, Erlang | No Comments

While I’m complaining about Erlang: why doesn’t Google ever return any hits on the documentation? If I google “perl sprintf” the first hit is the documentation page http://perldoc.perl.org/functions/sprintf.html. The same thing happens if I Google “python array” or “lisp map” or “php echo.” If I type “erlang supervisor” I don’t get anything manual-like until the 18th hit with this page, which isn’t even the man page. Moreover, that page doesn’t even link to the actual manual page despite referring to it:

This section should be read in conjunction with supervisor(3), where all details about the supervisor behaviour is given.

In fact, even if you start at the documentation page, it’s not clear how you would find supervisor(3) aside from actually typing “man supervisor” at the command prompt. The Erlang Reference Manual (with its HORRIBLE HORRIBLE FRAMES) doesn’t actually talk about the supervisor behaviour, presumably because this is technically an OTP thing. Eventually, I go to the index and there it is. Why is this so hard? I feel like I can’t be the only one who finds this frustrating.



Erlang PostgreSQL Roundup

20 February 2008 | 0:56 | Britain, Computing, Erlang | No Comments

Like just about everything to do with Erlang, database driver support appears to be in total disarray. I’d like to be able to store data in a PostgreSQL database and access it reasonably well. Options appear to be

  • Erlang psql driver that is a fork or something of the code by Erlang Consulting. You can’t even directly download it. You have to check it out from SVN. It doesn’t even have a README file.
  • Jungerl claims to have a psql project, but you apparently can’t download it without downloading all this other stuff. In fact, like the one above, you can only check it out from version control, in this case CVS. The CVS repository for it is viewable here. It only claims to be able to perform “simple commands” and the code doesn’t look like it’s been updated in a while.
  • Erlang Consulting has released some code, but no examples or anything about how to use it.
  • There is apparently an ODBC implementation, but from the mailing list it sounds like it is very slow and doesn’t work well in Linux, while also not implementing many of PostgreSQL’s features.
  • ejabberd also has an implementation.

There’s a blog post here from 2006 where Ernie Makris claims to have written a kick-ass interface and is going to tell us all about it and post it. Unfortunately he never posted on his blog again. There is an extensive thread here where people express the same frustrations as I have. This thread is only a couple of months old, so maybe things have gotten somewhere.

It appears, anecdotally, that the one on Jungerl by Christian Sunesson is pretty stable, and it seems to be relatively standalone. I will give this one a shot first and see where it gets me. Connection seems straightforward enough:

{ok, Db} = pgsql:connect("host", "database", "user", "password").

My first little query:

pgsql:squery(Db, "SELECT NOW()").

After turning off SSL in the postgresql.conf file, I got back the quite-reasonable answer:

{ok,[{"SELECT",
[{desc,0,"now",timestamptz,text,8,-1,0}],
[[<<"2008-02-19 18:55:06.87229-05">>]]}]}

So, I’m cautiously optimistic that I might be able to make this work.

UPDATE:
It looks like the ejabberd stuff may actually be the way to go. It is a branch of the Jungerl work by Christian Sunesson and it appears to be under active development. Specifically, they have implemented things with gen_server, which would seem to be a big improvement.



XML Stream Parsing in Erlang, II

19 February 2008 | 22:37 | Computing, Erlang, Gaming, General | No Comments

In my previous post, I complained a lot about trying to get XML stream parsing working. Ultimately, I just decided to rip the guts out of ejabberd, rather than reinvent the wheel. The relevant files are xml_stream.erl, xml.erl, and expat_erl.c. You can see how to use it in ejabberd_receiver.erl. Frankly, these guys seem to have it pretty well figured out. The way I went about it is this: using the tcp-client gen_fsm from here, I added init code to load up the library in start_link, with something like this:

case erl_ddll:load_driver("ebin", expat_erl) of
    ok -> ok;
    {error, already_loaded} -> ok;
    {error, Reason} ->
        %% error_msg/2 expects its arguments as a list
        error_logger:error_msg("Could not load expat driver: ~p~n",
                               [erl_ddll:format_error(Reason)])
end

I added a bit to the fsm state record to manage the xml state. When a client connects and the process gets started for it in the 'WAIT_FOR_SOCKET'({socket_ready, Socket}, State) when is_port(Socket) call, I initialize the xml parser xml_stream:new(self()) and store that in the record. Upon receiving data in 'WAIT_FOR_DATA'({data, Data},, I call NewXmlState = xml_stream:parse(XmlState, Data) and update the record with the new state. Then, messages just appear in the fsm queue and I handle them. What I do, specifically, is set up another fsm and send it messages that have been parsed, so that it essentially abstracts away the tcp and the xml. My plan is to add udp functionality that uses the same interface, so that datagrams look just like stanza’d messages and the process doesn’t need to know where the data came from.
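For anyone who wants to see the shape of this outside of Erlang, here is a rough Python analogue using the standard library’s expat binding. It is not the ejabberd code, and the class name `StanzaStream` and the dictionary layout of a parsed element are invented for this sketch. The idea is the same, though: keep one parser per connection, feed it whatever bytes arrive, and hand each completed child of the stream’s root element to a callback.

```python
import xml.parsers.expat


class StanzaStream:
    """Incrementally parse an XML stream, emitting each direct child of
    the root element (a "stanza") as soon as its closing tag arrives."""

    def __init__(self, on_stanza):
        self._on_stanza = on_stanza
        self._depth = 0
        self._stack = []  # elements currently open below the root
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end
        self._parser.CharacterDataHandler = self._chars

    def _start(self, name, attrs):
        self._depth += 1
        if self._depth >= 2:  # depth 1 is the stream root itself
            self._stack.append(
                {"name": name, "attrs": dict(attrs), "text": "", "children": []}
            )

    def _chars(self, data):
        # Character data may arrive in several pieces, so accumulate it.
        if self._stack:
            self._stack[-1]["text"] += data

    def _end(self, name):
        self._depth -= 1
        if not self._stack:
            return  # the stream root was closed
        element = self._stack.pop()
        if self._stack:
            self._stack[-1]["children"].append(element)
        else:
            self._on_stanza(element)

    def feed(self, data):
        """Feed the next chunk; it need not end on a tag boundary."""
        self._parser.Parse(data, False)
```

The callback plays the role of the messages that show up in the fsm queue: the code that consumes stanzas never sees how the data was chunked on the wire, which is what would let a UDP source sit behind the same interface.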