I am writing a paper on nonparametric Bayesian density modeling and I would like to compare my technique to the standard approach of the infinite mixture of Gaussians (iMoG). You can read Carl Rasmussen’s paper to get a feel for what it’s all about. My plan is to look at hold-out log probabilities on real data. To do this, I need to have an implementation of iMoG that can give me predictive logprobs. There are a couple of MATLAB implementations (one, two), but they don’t (as far as I can tell) provide true predictive estimates directly. Rather, they give posterior samples from the parameters and you have to do something like (Chib and Jeliazkov, 2001) to get estimates of the logprobs.
Radford Neal, on the other hand, has his Software for Flexible Bayesian Modeling and Markov Chain Sampling (FBM) that implements mixtures of Gaussians. I have to learn how to use it anyway, since it seems to be the only implementation of another nonparametric density estimation procedure that I’m interested in: Dirichlet Diffusion Trees. So this post is going to be about figuring out how to do mixture models with FBM. Radford provides an example of bivariate density estimation here.
Here is my setup: my data is 5-dimensional, and I have 200 training cases and 28 test cases. I have whitened the data so that it has the sample statistics of a spherical Gaussian. I put these data into two files that are just comma-delimited with a line for each case and a column for each dimension. This is following the conventions described here as I understand them.
The first thing we do is use the mix-spec command. This command creates the “log file.” Log files in FBM are the “documents” that one operates on. They contain all of the model and results, etc. The syntax is:
mix-spec log-file N-inputs N-targets [ N-components ]
/ concentration SD-prior [ mean-prior ]
“log-file” is the name log “document” file - I’m going to call mine “mog.log.” “N-inputs” you can pretty much ignore for now, as it doesn’t seem to be implemented. Just use zero. “N-targets” is the number of variables that you wish to find the joint distribution over. In my case, this is going to be five. “N-components” is how many Gaussians you want in your mixture. If you leave it out you get the infinite Dirichlet process mixture, which is what we want. The “concentration” is the parameter that in a sense determines the “variance” of the weights that you get out of the Dirichlet prior. In the infinite case, we have to specify it as a constant multiplied by the number of components, and so we preface it with an “x”. I don’t know what a good choice is, so I’m going to pick 5 and write “x5″. The “SD-prior” is this big stack of priors on the widths of things, described here. I don’t feel like I understand very well how to specify these giant stacks of hyperpriors. The last parameter is the standard deviation of the mean, as far as I can tell. Overall, the command I’m issuing is:
> mix-spec mog.log 0 5 / x1 0.05:0.5:0.2 10
After this, you need to issue the model-spec command. The first argument is the log file, then the word “real” if you’re modeling real data. Then you issue another cryptic command about what I think is the prior on the actual mixture Gaussians. I’m just doing the example thing, since I don’t know better:
> model-spec mog.log real 0.05:0.5:0.5:1
Next, you give it data, using the data-spec command. The first argument is the log file, so “mog.log.” The next argument is “input attributes” which is zero, because it isn’t used. Then are “target attributes” which should be 5, since we have five dimensions. Then the slash and we specify our training file, followed by a “.” since we don’t have training inputs, then the same thing again for the test file:
> $FBM/data-spec $LOG 0 5 / \
macaque-5d-train1.dat . macaque-5d-test1.dat .
So, now we have the model set up and we have data associated with it. Now it’s time for some inference. You have to invoke some special commands for the infinite case that I’m just going to take directly from the example:
> mc-spec mog.log repeat 20 met-indicators 10 gibbs-params gibbs-hypers
Now, I ran it for a bunch of iterations:
> mix-mc mog.log 10000
And after this, I looked at what it came up with:
> mix-display mog.log
But most importantly, I wanted to see the logprobs it found:
> mix-pred p mog.log 5000:
The “p” means “give me each log probability” and the “5000:” says “start after the 5000th iteration.”
This seems to maybe actually do what I want…