Speed up Python execution with Nuitka (a Python to C++ compiler)

= Overview

* Nuitka compiles Python 2.6 and 2.7 code into C++ that calls into libpython. It is claimed to run the CPython 2.6 test suite correctly.
* It can speed up programs that spend most of their time executing Python code (as opposed to calling into native libraries or doing I/O).
* Its authors report a 0 to 258% speedup on pystone micro-benchmarks. Some micro-benchmark figures are here.

* Written by Kay Hayen.
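Because the gain comes from executing Python code faster, it is worth checking where a program spends its time before compiling it. Below is a minimal sketch using the standard library's cProfile and pstats modules; the two workload functions are hypothetical examples, not part of Nuitka or pyevolve. If most of the internal time (tottime) is reported against your own Python functions, Nuitka has something to work with; if it is reported against built-ins or extension modules, it probably does not.

```python
# Works on Python 2.7 and Python 3.
import cProfile
import pstats
import random


def python_bound(values):
    """Hypothetical workload: a hand-written loop, so most of the time is
    spent executing Python bytecode (the case Nuitka targets)."""
    best = values[0]
    for v in values:
        if v > best:
            best = v
    return best


def native_bound(values):
    """Same answer, but the loop runs inside the C builtin max(), so there
    is little Python-level code left for a compiler to speed up."""
    return max(values)


def profile(func, *args):
    """Run func(*args) under cProfile, print the top 5 entries by internal
    time, and return func's result."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    pstats.Stats(profiler).sort_stats("tottime").print_stats(5)
    return result


if __name__ == "__main__":
    data = [random.random() for _ in range(10 ** 5)]
    profile(python_bound, data)  # time is reported against python_bound itself
    profile(native_bound, data)  # time is reported against the built-in max
```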

= Usage

* It works out of the box, with zero configuration. See the manual.

* Create an executable from your Python code:
 $ nuitka --exe ga1.py
* Optionally, recurse into imported modules (selected per module) with command-line switches:
 $ nuitka --exe --recurse-to=pyevolve ga1.py
* Run the executable instead of the Python script:
 $ ./ga1.exe
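Since the compiled binary should behave exactly like the interpreted script, a quick sanity check is to compare the output of both runs. A minimal sketch, assuming a deterministic program (a genetic algorithm such as a pyevolve script only produces identical output if its random seed is fixed); the helper name same_stdout is hypothetical:

```python
import subprocess
import sys


def same_stdout(cmd_a, cmd_b):
    """Run two commands and report whether their standard output is identical.

    Hypothetical helper: cmd_a would run the script under the interpreter and
    cmd_b the compiled binary. Raises subprocess.CalledProcessError if either
    command exits with a non-zero status."""
    out_a = subprocess.check_output(cmd_a)
    out_b = subprocess.check_output(cmd_b)
    return out_a == out_b
```

For the example above, that would be same_stdout([sys.executable, "ga1.py"], ["./ga1.exe"]).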

= Real world use case and benchmark: speeding up pyevolve

Machine learning libraries written in Python are good candidates for Nuitka because many of their programs are CPU bound while executing Python code. They are often batch programs that operate on large datasets and can take a long time to converge to an optimal solution, so faster execution is worth the upfront compile time.

The synthetic benchmark below is the stock pyevolve “2D List genome” example, with the size of each dimension increased. It gets a speedup of around 13% (in CPU time) when compiled to C++. Note that eval_func in this case is only a handful of lines. When that function contains more code (and is therefore more likely to contain the constructs that Nuitka speeds up most), the overall speedup tends to be better.
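For readers unfamiliar with pyevolve, a fitness function of that size looks roughly like the sketch below. It is a hypothetical stand-in written against a plain list of lists rather than pyevolve's G2DList class, so it runs without pyevolve installed. Running the same file with python and as a Nuitka-compiled binary lets you compare the timeit figures.

```python
import random
import timeit


def eval_func(grid):
    """Hypothetical fitness function for a 2D list genome (a list of rows).

    Awards 0.1 for every cell that is zero, so the score grows as the genome
    fills up with zeros. Illustrative only, not pyevolve's own example code."""
    score = 0.0
    for row in grid:
        for cell in row:
            if cell == 0:
                score += 0.1
    return score


def random_grid(height, width, low=0, high=10):
    """Build a height x width list of lists of random ints in [low, high]."""
    return [[random.randint(low, high) for _ in range(width)]
            for _ in range(height)]


if __name__ == "__main__":
    grid = random_grid(100, 100)
    # Time 100 evaluations of the same genome.
    seconds = timeit.timeit(lambda: eval_func(grid), number=100)
    print("100 evaluations: %.3f s" % seconds)
```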

Benchmark: pyevolve sample code speedup

$ /usr/bin/time python ga1.py > /dev/null
71.06user 0.10system 1:11.49elapsed 99%CPU (0avgtext+0avgdata 19856maxresident)k
0inputs+0outputs (0major+1400minor)pagefaults 0swaps

$ /usr/bin/time ./ga1.exe > /dev/null
60.93user 0.65system 1:01.76elapsed 99%CPU (0avgtext+0avgdata 25840maxresident)k
0inputs+0outputs (0major+1728minor)pagefaults 0swaps

speedup = 9.58 sec of CPU time (user + system: 71.16 vs 61.58 sec), or about 13.5%
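The speedup figure can be reproduced from the user and system times that GNU time reports. A minimal sketch; the regular expressions assume GNU time's default output format, as in the two runs above:

```python
import re

# Summary lines copied from the two /usr/bin/time runs above
# (GNU time's default output format).
PYTHON_RUN = "71.06user 0.10system 1:11.49elapsed 99%CPU"
NUITKA_RUN = "60.93user 0.65system 1:01.76elapsed 99%CPU"


def cpu_seconds(time_line):
    """Return user + system CPU seconds parsed from a GNU time summary line."""
    user = float(re.search(r"([\d.]+)user", time_line).group(1))
    system = float(re.search(r"([\d.]+)system", time_line).group(1))
    return user + system


before = cpu_seconds(PYTHON_RUN)  # 71.06 + 0.10 = 71.16 s
after = cpu_seconds(NUITKA_RUN)   # 60.93 + 0.65 = 61.58 s
saved = before - after
print("saved %.2f s (%.2f%% less CPU time, %.2fx faster)"
      % (saved, 100.0 * saved / before, before / after))
# prints: saved 9.58 s (13.46% less CPU time, 1.16x faster)
```

Note that "13.5% less CPU time" and "1.16x faster" describe the same measurement; which one gets called "the speedup" is a matter of convention.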

= Potentially better speedups with more compile time

$ nuitka --exe --recurse-all ga1.py

You can optionally recurse into all imported modules, or into standard library modules as well, for a better speedup. However, compile times become extremely long (several minutes).

= Gotchas

* Head-to-head micro-benchmarks vary widely, so as always, measure your own code. For example, for loops show large speedups (216%), but string concatenation is about the same (-0.6%). See here for the full list.

* Compiling Python programs to executables takes time. You can use the -j switch to run compilation jobs in parallel.

// Compile time of the above sample on a Core 2 Duo era dual-core machine:

$ /usr/bin/time nuitka --exe --recurse-to=pyevolve ga1.py > /dev/null
107.62user 2.30system 1:01.05elapsed 180%CPU (0avgtext+0avgdata 404720maxresident)k
0inputs+30928outputs (0major+350911minor)pagefaults 0swaps

* Nuitka requires a C++11-capable compiler (gcc 4.4 or later).
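To follow the "measure your own code" advice, the standard timeit module is enough: run the same file with python, then compile it with nuitka --exe, run the binary, and compare the numbers. The two workloads below are hypothetical micro-benchmarks in the spirit of the for-loop and string-concatenation cases, not the benchmark suite the figures above come from.

```python
import timeit


def for_loop(n):
    """Hypothetical loop-heavy micro-benchmark: sums 0..n-1 in a for loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def string_concat(n):
    """Hypothetical concatenation micro-benchmark: builds a string with +=.

    It still needs a loop, so its timing mixes loop and concatenation cost."""
    s = ""
    for _ in range(n):
        s += "x"
    return s


def best_of(func, arg, repeat=5, number=20):
    """Return the fastest of `repeat` timings, each of `number` calls to func(arg)."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    for func in (for_loop, string_concat):
        print("%-14s %.4f s" % (func.__name__, best_of(func, 10000)))
```

Taking the minimum of several repeats rather than the mean follows the timeit documentation's advice: higher values are usually caused by other processes interfering, not by variation in Python's own speed.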

= See also

Cython, Pythran, PyPy, Numba

 


One thought on “Speed up Python execution with Nuitka (a Python to C++ compiler)”

  1. “but string concatenation is about the same ( -0.6%)” Could this be due to the Schlemiel the Painter problem with C string concatenation? Perhaps for each string we scan from the start to the end until we find the null character and append there, so as the string grows, each concatenation takes longer.
