The Baum-Welch algorithm for hidden Markov models: speed comparison between octave / python / R / scilab / matlab / C / C++


In this comparison, I have programmed in a comparable way the Baum-Welch algorithm for discrete Hidden Markov Models (HMM) in several programming languages often used by probabilists and statisticians.

"A comparable way" means that the code follows the same lines in all languages, but tries to take advantage of each language's strengths (e.g., vectorialization in matlab). Each code is probably improvable (I will appreciate any kind of advice: please send me an email!), but the aim of this comparison is precisely to compare "average" implementations under different programming languages.

Three scenarios are considered:

  1. the first scenario is "small" and of pedagogical interest: inspired by genomics, it involves a hidden chain with two states and a four-letter observation alphabet;
  2. the second, "medium" scenario involves 7 hidden states and 7 possible observations;
  3. the third, "large" scenario involves 50 hidden states and 50 possible observations.
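For concreteness, here is a minimal numpy sketch (my own illustration, not the benchmarked code) of how such a scenario can be set up and a sequence sampled. The uniform initial distribution and the row-normalized random parameter matrices are assumptions made for the sketch, not necessarily the parameters used in the experiments.

```python
import numpy as np

def sample_hmm(A, B, T, rng):
    """Sample a hidden path and observations from a discrete HMM.

    A: (K, K) transition matrix (rows sum to 1)
    B: (K, M) emission matrix (rows sum to 1)
    T: sequence length
    """
    K, M = B.shape
    x = np.empty(T, dtype=int)  # hidden states
    y = np.empty(T, dtype=int)  # observations
    x[0] = rng.integers(K)      # uniform initial state (an assumption)
    for t in range(T):
        if t > 0:
            x[t] = rng.choice(K, p=A[x[t - 1]])
        y[t] = rng.choice(M, p=B[x[t]])
    return x, y

rng = np.random.default_rng(0)
# "small" scenario: 2 hidden states, 4-letter observation alphabet
A = rng.random((2, 2)); A /= A.sum(axis=1, keepdims=True)
B = rng.random((2, 4)); B /= B.sum(axis=1, keepdims=True)
x, y = sample_hmm(A, B, 10000, rng)
```

The "medium" and "large" scenarios only change the shapes of A and B (7x7 and 7 symbols, or 50x50 and 50 symbols).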

The experiment run for each language consists of 100 Monte-Carlo repetitions, and was run on my desktop machine. In each repetition, a sequence of 10000 successive observations is created, filtered, and smoothed. Then, an approximation of the maximum likelihood estimate is computed using the Expectation-Maximization (EM) algorithm, starting from 10 random points drawn independently.
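The filtering/smoothing step corresponds to the classical scaled forward-backward recursions. A minimal numpy sketch (my own illustration, assuming the standard rescaling trick; the distributed code may differ in details) could look like this:

```python
import numpy as np

def filter_smooth(A, B, pi, y):
    """Scaled forward-backward pass for a discrete HMM.

    Returns filtered probabilities alpha[t] = P(x_t | y_0..y_t),
    smoothed probabilities gamma[t] = P(x_t | y_0..y_{T-1}),
    and the log-likelihood of the sequence.
    """
    T, K = len(y), A.shape[0]
    alpha = np.empty((T, K)); beta = np.empty((T, K)); c = np.empty(T)
    a = pi * B[:, y[0]]
    c[0] = a.sum(); alpha[0] = a / c[0]
    for t in range(1, T):
        a = (alpha[t - 1] @ A) * B[:, y[t]]   # predict, then correct
        c[t] = a.sum(); alpha[t] = a / c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, y[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)  # guard against rounding
    return alpha, gamma, np.log(c).sum()

# toy usage with made-up parameters
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.1, 0.1, 0.1], [0.1, 0.2, 0.3, 0.4]])
pi = np.array([0.5, 0.5])
y = np.array([0, 1, 3, 0, 2])
alpha, gamma, ll = filter_smooth(A, B, pi, y)
```

The rescaling by c[t] at each step avoids the numerical underflow that the unscaled recursions suffer from on long sequences.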

The EM iterations are stopped either when the change in parameters becomes smaller than 1e-4 in 1-norm, or after 100 iterations. With this stopping criterion, the number of iterations is almost always 100 (which gives more reliable results for the speed comparison).
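As an illustration of that stopping rule, here is a hedged numpy sketch of a Baum-Welch iteration loop stopping when the 1-norm change in (A, B) falls below 1e-4, or after 100 iterations. The update formulas are the standard scaled EM re-estimation formulas, written from scratch rather than taken from the distributed code:

```python
import numpy as np

def baum_welch(A, B, pi, y, tol=1e-4, max_iter=100):
    """EM for a discrete HMM; stop when the 1-norm change in (A, B)
    drops below tol, or after max_iter iterations."""
    T = len(y)
    K, M = B.shape
    for _ in range(max_iter):
        # E-step: scaled forward-backward
        alpha = np.empty((T, K)); beta = np.empty((T, K)); c = np.empty(T)
        a = pi * B[:, y[0]]; c[0] = a.sum(); alpha[0] = a / c[0]
        for t in range(1, T):
            a = (alpha[t - 1] @ A) * B[:, y[t]]
            c[t] = a.sum(); alpha[t] = a / c[t]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, y[t + 1]] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta                    # smoothed state probabilities
        xi = np.zeros((K, K))                   # expected transition counts
        for t in range(T - 1):
            xi += np.outer(alpha[t], B[:, y[t + 1]] * beta[t + 1] / c[t + 1]) * A
        # M-step: re-normalize the expected counts
        A_new = xi / xi.sum(axis=1, keepdims=True)
        B_new = np.zeros((K, M))
        for m in range(M):
            B_new[:, m] = gamma[y == m].sum(axis=0)
        B_new /= B_new.sum(axis=1, keepdims=True)
        pi = gamma[0] / gamma[0].sum()
        # stopping criterion: 1-norm change in the parameters
        delta = np.abs(A_new - A).sum() + np.abs(B_new - B).sum()
        A, B = A_new, B_new
        if delta < tol:
            break
    return A, B, pi

# toy usage on a short random sequence (made-up starting point)
rng = np.random.default_rng(0)
y = rng.integers(4, size=200)
B0 = rng.random((2, 4)); B0 /= B0.sum(axis=1, keepdims=True)
A_hat, B_hat, pi_hat = baum_welch(np.full((2, 2), 0.5), B0,
                                  np.array([0.5, 0.5]), y)
```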

The code, provided below in section "Download", allows everyone to replicate this experiment very easily, and to try other choices of parameters.


I decided to do this comparison because a colleague of mine, who was coding a variant of the Baum-Welch algorithm in R for some involved and demanding application, was facing speed problems: he wanted to know whether the programming language was to blame, and which option to choose to solve his problem. He wanted to avoid an overly costly solution, both in terms of price and of development time.
I realized that although I had a rough idea of the comparative efficiency of the popular languages, I could not tell him to what extent each of these options would solve his problem.
More generally, my main motivations in this comparison are the following:

Description of the code

In each language, a Makefile should make replicating this experiment very easy. Besides, the code may be re-used in applications or teaching; the following functions are provided:

The C++ code was written by Pierre Pudlo, who adapted the scilab code to use Armadillo, a C++ linear algebra library (possibly relying on LAPACK, BLAS, and ATLAS) that aims to provide a "good balance between speed and ease of use".

For R, matlab, octave and python, the C extension providing a much faster computation of the filtering/smoothing phase must be compiled first. This is straightforward using "make".

The python code makes use of the numpy library; its style is a bit "matlabic", as it uses matrix multiplications where loops could also be considered.
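To illustrate what "matlabic" means here: the prediction step of the forward recursion can be written either with explicit loops or as a single matrix-vector product. A small sketch with made-up parameters (both versions compute the same quantity, but the vectorized one is what the python code favors):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 7
A = rng.random((K, K)); A /= A.sum(axis=1, keepdims=True)  # transition matrix
alpha = rng.random(K); alpha /= alpha.sum()                # current filter
b = rng.random(K)          # emission probabilities of the next observation

# loop version: sum over the previous state explicitly
pred_loop = np.zeros(K)
for j in range(K):
    for i in range(K):
        pred_loop[j] += alpha[i] * A[i, j]
pred_loop *= b

# "matlabic" version: one matrix-vector product
pred_vec = (alpha @ A) * b
```

In interpreted languages, the vectorized form delegates the double loop to compiled linear algebra routines, which is where most of the speed difference between implementations comes from.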

The C code is compiled with -O2 optimization option.

The matlab code is used without modification in octave.

The high-level languages allow including functions programmed in C, through more or less convenient interfaces. I have tried this possibility, implementing only the filtering and smoothing in C. For python, I wrote a "hand-made" extension; for R, I used the standard (but easy to use) ".C" interface; for matlab, I used "mex"; Judd Storrs wrote me an email explaining how to use "mkoctfile --mex" for octave.


Here is a bundle with all the code.

If you are interested in one particular language (with the C extension when available), here are the R, matlab / octave, scilab, python, C++ with armadillo, and pure C codes. In all cases, the benchmark with 10 repetitions of scenario 1 can be run using "make benchmark". In most cases, editing the Makefile is sufficient to run other scenarios, or to use the C filtering/smoothing functions.


"Small" scenario 1:

|                        | python | python + C | R    | R + C  | scilab | matlab | matlab + C | octave | octave + C | C++ (armadillo) | pure C  |
|------------------------|--------|------------|------|--------|--------|--------|------------|--------|------------|-----------------|---------|
| creation of the sample | 0.076  | 0.076      | 0.25 | 0.25   | 0.32   | 0.031  | 0.031      | 0.59   | 0.59       | 0.00082         | 0.00053 |
| filtering/smoothing    | 0.29   | 0.00072    | 0.33 | 0.0028 | 0.14   | 0.076  | 0.00072    | 0.68   | 0.00072    | 0.0017          | 0.00054 |
| 10 EM                  | 296    | 2.14       | 333  | 9.20   | 145    | 76.5   | 1.60       | 691    | 3.38       | 2.5             | 0.94    |

"Medium" scenario 2:

|                        | python | python + C | R    | R + C  | scilab | matlab | matlab + C | octave | octave + C | C++ (armadillo) | pure C  |
|------------------------|--------|------------|------|--------|--------|--------|------------|--------|------------|-----------------|---------|
| creation of the sample | 0.14   | 0.14       | 0.27 | 0.27   | 0.33   | 0.032  | 0.032      | 0.62   | 0.62       | 0.0012          | 0.00058 |
| filtering/smoothing    | 0.32   | 0.0035     | 0.38 | 0.0067 | 0.18   | 0.081  | 0.0030     | 0.70   | 0.0027     | 0.0038          | 0.0022  |
| 10 EM                  | 331    | 15.2       | 395  | 19.4   | 229    | 87.1   | 10.9       | 777    | 13.0       | 9.55            | 2.81    |

"Large" scenario 3:

|                        | python | python + C | R    | R + C | scilab | matlab | matlab + C | octave | octave + C | C++ (armadillo) | pure C |
|------------------------|--------|------------|------|-------|--------|--------|------------|--------|------------|-----------------|--------|
| creation of the sample | 0.28   | 0.28       | 0.30 | 0.30  | 0.38   | 0.042  | 0.042      | 0.63   | 0.63       | 0.0058          | 0.0015 |
| filtering/smoothing    | 0.57   | 0.14       | 0.68 | 0.14  | 0.92   | 0.15   | 0.094      | 0.82   | 0.083      | 0.064           | 0.073  |
| 10 EM                  | 688    | 355        | 858  | 384   | 1473   | 282    | 315        | 980    | 331        | 246             | 43     |

Times are given in seconds. These measures are not meant to be extremely precise: what matters is the order of magnitude of the ratios.

Conclusions and comments