September 13, 2014

Further Adventures in Python Optimisation

Previously we found that PyPy achieves the best performance gain, executing fieldfunc_py in ~6 us. At the end of that article, I mentioned that the C implementation is up to 50x faster, managing the same calculation in ~0.12 us.

The naive conclusion is that the best thing is to simply call the C function to do the heavy lifting, achieving performance somewhere between PyPy and C. But nothing in life is easy...

Naive C Interfacing

The C code in the previous article was compiled using

gcc -O -shared -o -fPIC c_fieldfunc.c

The MagnetElement class was then amended to use the C function whenever possible:

class MagnetElement(object):
  def __init__(self, position, size, magnetisation, fieldcalcfunc=fieldfunc_fast_py):
    position, and magnetisation are all expected to be numpy arrays of 3
    elements each.

    size is a single number, implying all elements are square

    self.position = position
    self.size = size
    self.magnetisation = magnetisation
    self.moment = magnetisation * size * size * size
    self._fieldcalcfunc = fieldcalcfunc

      import ctypes
      cmodule = ctypes.cdll.LoadLibrary('')
      self._fieldcalcfunc = cmodule.fieldfunc
      self.fieldAt = self._cfieldAt
    except OSError, ex: 

  def _cfieldAt(self, p): 
    import ctypes
    def voidp(x):
      return ctypes.c_void_p(
    field = np.array([0,0,0], np.double)
    return field

  def fieldAt(self, p): 
    p is expected to be a numpy array of 3
    return self._fieldcalcfunc(p, self.position, self.moment)

This however yielded a per-run time of ~28 us!! So clearly there is significant cost in interfacing. The most obvious of these is creating a new np.array each time, and defining and calling voidp.

A Better C Interface

Below is an improved version:

class MagnetElement(object):
  def __init__(self, position, size, magnetisation, fieldcalcfunc=fieldfunc_fast_py):
    position, and magnetisation are all expected to be numpy arrays of 3
    elements each.

    size is a single number, implying all elements are square

    self.position = position
    self.size = size
    self.magnetisation = magnetisation
    self.moment = magnetisation * size * size * size
    self._fieldcalcfunc = fieldcalcfunc

      import ctypes
      cmodule = ctypes.cdll.LoadLibrary('')
      self._fieldcalcfunc = cmodule.fieldfunc
      self.fieldAt = self._cfieldAt
      self._field = np.array([0,0,0], np.double)

      def voidp(x):
        return ctypes.c_void_p(
      self._position_p = voidp(self.position)
      self._moment_p = voidp(self.moment)
      self._field_p = voidp(self._field)
    except OSError, ex: 

  def _cfieldAt(self, p): 
    import ctypes
    return self._field

  def fieldAt(self, p):
    p is expected to be a numpy array of 3
    return self._fieldcalcfunc(p, self.position, self.moment)

Now we are down to ~7 us, which is about what PyPy gave us. We are still slower, and more effort is involved compared to the installing PyPy + numpy. That said, this method has the advantage that it is compatible with existing Python and numpy installations, and can be used to optimise python code that uses parts of numpy that are not yet implemented in PyPy.


  • C interfacing needs to be done carefully to gain maximum benefit
  • Not quite as easy as using PyPy, but more compatible


While we achieved performance similar to PyPy using C interfacing in this one function, PyPy is going to give faster performance across the entire program, while Python+C will only speed up this one function. In my particular case, Python+C is over all ~2x as slow as PyPy.


Adventures in Python Optimisation

Recently I found myself looking at the following piece of code and trying to optimise it:

from __future__ import division

import numpy as np

MagneticPermeability = 4*np.pi*1e-7
MagneticPermeabilityOver4Pi = MagneticPermeability / (4*np.pi)

def fieldfunc_py(p, q, moment):
  r = p - q 
  rmag = np.sqrt(

  rdash3 = rmag**3
  rdash5 = rmag**5

  rminus5term = r * * 3 / rdash5
  rminus3term = moment / rdash3

  field = (rminus5term - rminus3term) * MagneticPermeabilityOver4Pi
  return field

This, I have been told, computes the field at p due to a magnetic di-pole at q.

To profile this code I wrote the following function:

def profile(fieldfunc, dtype=np.double):
  def V(*args):
    return np.array(args, dtype)

  q = V(0, 0, 0)
  moment = V(1, 0, 0)
  p = V(10,10,10)

  def f():
    fieldfunc(p, q, moment)

  # force the JIT or whatever to run

  from timeit import Timer
  t = Timer(f)
  runs = 100000
  print fieldfunc.__name__,'per run', t.timeit(number=runs)*1e6/runs, 'us'

This yields a per-run time of ~22 us.

Optimise Python/Numpy

The first optimisation then is to write a better optimised version:

def fieldfunc_fast_py(p, q, moment):
  r = p - q
  rmag2 =, r)

  rdash3 = rmag2**1.5
  rdash5 = rmag2**2.5

  rminus5term = r *, r) * 3 / rdash5
  rminus3term = moment / rdash3

  field = (rminus5term - rminus3term) * MagneticPermeabilityOver4Pi
  return field

The main difference is using and avoiding np.sqrt(). Using ** is faster than using square root and multiplication. These changes gets us down to ~18 us or so, so slight improvement.


Numba offers JIT compilation of python into native code, and is pretty easy to use and compatible with numpy. To use it the profile function was called as:


Using jit without type arguments causes numba to infer the type. This yields per-run time of ~22 us for fieldfunc_py and ~18 us for fieldfunc_fast_py. There is no noticable speed up, in fact per-run times increased slightly. This is possibly due to the fact that the code is already pretty well optimised, and the extra overhead of interfacing between compiled code and the python runtime.


PyPy is an altnerative implementation of Python that also uses JIT, but does it for the entire program. It is a little harder to use because it needs its own version of numpy, which is only partially implemented. For our needs this will suffice.

With PyPy, we see significant improvement. fieldfunc_py's per-run time went down to ~6 us, and fieldfunc_fast_py's per-run time went down to ~6 us also. This is a huge improvement.

Data Type Matters

The diligent reader may have noticed that profile accepts dtype as a keyword arguments. This was used to test the effects of using np.float32 vs np.float64. The following code was written:

for dtype in np.float32, np.float64:
  print 'With', dtype
  profile(fieldfunc_py, dtype)
  profile(fieldfunc_fast_py, dtype)

    from numba import jit
    print '==> Using numba jit'

This yielded the following output using CPython:

With <type 'numpy.float32'>
fieldfunc_py per run 30.8365797997 us
fieldfunc_fast_py per run 27.7728509903 us
==> Using numba jit
fieldfunc_py per run 22.2719597816 us
fieldfunc_fast_py per run 18.1778311729 us

With <type 'numpy.float64'>
fieldfunc_py per run 21.8909192085 us
fieldfunc_fast_py per run 17.5257897377 us
==> Using numba jit
fieldfunc_py per run 22.4718093872 us
fieldfunc_fast_py per run 18.2463383675 us

Choice of data type has a significant effect, and it counter-intuitive: the larger data type is faster. Interestingly the same isn't true for PyPy, which outputted:

With <type 'numpy.float32'>
fieldfunc_py per run 6.29536867142 us
fieldfunc_fast_py per run 6.067070961 us

With <type 'numpy.float64'>
fieldfunc_py per run 6.28163099289 us
fieldfunc_fast_py per run 6.17563962936 us

Which is essentially the same.

How Fast Can It Be Done?

The final question is, how fast can this be done? Here is a naive C implementation:

#include <math.h>
#include <stdio.h>

typedef double dtype;

dtype vdot(dtype *u, dtype *v) {
  return u[0]*v[0] + u[1]*v[1] + u[2]*v[2];

 * Computes w=u+v
void vadd(dtype *u, dtype *v, dtype *w) {
  w[0] = u[0] + v[0];
  w[1] = u[1] + v[1];
  w[2] = u[2] + v[2];

 * Computes w=u-v
void vsub(dtype *u, dtype *v, dtype *w) {
  w[0] = u[0] - v[0];
  w[1] = u[1] - v[1];
  w[2] = u[2] - v[2];
 * Computes w = x*u, returns w
void vmult(dtype x, dtype *u, dtype *w) {
  w[0] = x*u[0];
  w[1] = x*u[1];
  w[2] = x*u[2];

#define MagneticPermeability          (4*M_PI*1e-7f)
#define MagneticPermeabilityOver4Pi   (MagneticPermeability/(4*M_PI))

void vprint(dtype *v) {
  printf("(%e, %e, %e)\n", v[0], v[1], v[2]);

void fieldfunc(dtype *p, dtype *q, dtype *moment, dtype *out) {
  dtype r[3];
  vsub(p, q, r);
  dtype rmag2 = vdot(r,r);

  dtype rdash3 = pow(rmag2, 1.5);
  dtype rdash5 = pow(rmag2, 2.5);

  dtype t5[3];
  vmult(vdot(moment, r), r, t5);
  vmult(3/rdash5, t5, t5);

  dtype t3[3];
  vmult(1/rdash3, moment, t3);

  vsub(t5, t3, out);
  vmult(MagneticPermeabilityOver4Pi, out, out);

int main(int argc, char **argv) {
  dtype q[] = {0, 0, 0};
  dtype p[] = {10, 10, 10};
  dtype moment[] = {1, 0, 0};
  dtype field[3];

  fieldfunc(p, q, moment, field);

  int x;
  for (x = 0; x < 1000000; ++x) {
    fieldfunc(p, q, moment, field);

  return 0;

timing was done as follows:

$ gcc -o c_bench c_bench.c -lm && time ./c_bench 
(0.000000e+00, 1.924501e-11, 1.924501e-11)

real    0m0.116s
user    0m0.114s
sys     0m0.001s

Because we do 1 million iterations, per-run time is ~0.12 us, or around 50 times faster.


  • Most gain/effort comes from using PyPy
  • In this particular case, numba did not yield signficant speed up
  • C is still much faster


August 30, 2014

Play Child of Light in English on PS Vita

If you want to play Child of Light on PS Vita, you will not only need to download the online update, you will also need to set your system language to English (US). If you have it set as English (UK) then you will continue seeing the game in Japanese.

I know right?

August 27, 2014

From Lyx/Latex to Word

This is sort of a placeholder post. Busy meeting a deadline, but this should help future Steve and anyone else when you need to turn your Lyx document into a Word document while keeping the format mostly sane. Broken, but sane.
  1. Export as Latex (plain).
  2. Run
    • latex <name of tex file, with or without extension>

  3. Run
    • bibtex <filename>

  4. Run
    • latex <filename>

  5. Run
    • latex <filename>

  6. Try to run
    • htlatex <filename> "html,0,charset=utf-8" "" -dhtml/
      • html: format to output
      • 0: normally chapters go into their own page, putting 0 here forces everything into a single page
      • charset=utf-8: let us be civilised
      • -dhtml/: puts the output files in a html sub-directory. Note that you can't have a space between -d and the html/

  7. If the above fails with something like 'illegal storage address', and you get a warning about text4ht.env not been found, then you need to find where it is in your TeX installation, and:
    • export TEX4HTENV and try again
    • Copy text4ht.env into your working directory
      • This approach also lets you affect locally some export parameters. More on this later...

  8. Open the html to verify correctness. You might object to the poor graphics quality. In this case copy text4ht.env into the working directory if you haven't done so, and then modify it so it uses a high density when converting images.
    • See this answer for more details
    • In my case, since dvipng was been used, I replaced all instances of
      • -D 96
    • with
      • -D 300

  9. It also helps if you
    • strip away html comments
      • These look like <!-- xxx -->
    • centre aligned image divs
    • remove <hr/> instances

  10. These changes will make the import into Libre/OpenOffice go easier

  11. Open the html file in Libre/OpenOffice

  12. File > Export > ODT

  13. Close html file

  14. Open exported ODT

  15. Edit > Links

  16. Select all links

  17. Break Links

  18. Verify that the ODT file is now much larger!

  19. File > Save As > Word 97 (doc)

Phew! To help future visitors, a simple python script to fix up the html as I have described is included at the end of this post. You will need lxml and cssselector installed. Cheers, Steve
#!/usr/bin/env python

from lxml.html import parse, HtmlComment
from lxml import etree

def main(*args):
  if len(args) == 0:
    return 1

  doc = parse(args[0]).getroot()

  body = doc.cssselect('body')[0]

  # replace <hr/> with <br/> to make doc conversion easier
  for hr in body.cssselect('hr'):
    p = hr.getparent()

    br = etree.Element('br')

  # remove comments because for some reason libreoffice opens up 
  # html comments as document comments, slowing things down
  for node in doc.getiterator():
    if isinstance(node, HtmlComment):

  # centre align all figures
  for div in doc.cssselect('div.figure'):

  print etree.tostring(doc, method='html', encoding='utf-8')
if __name__ == '__main__':
  import sys

July 27, 2014

Reading Metadata Out of Nikon NIS Elements Generated TIFFs

Fluorescence image of 
cavitation effects,visualised 
using FITC-Dextran.
If, like me, you use Nikon's NIS Elements to do fluorescence imaging, you would have also noticed that nothing other than NIS Elements can read the metadata that is saved with TIFFs. ImageJ can't read it, the GIMP can't read it, tiffinfo can't read it. This means the only way to recover that metadata, which contains important things like exposure time and pixel size is to open the image in NIS Elements.

This sucks, because NIS Elements isn't exactly easily or widely available, and NIS Elements Viewer doesn't help either - it only opens ND3 files.

Fret not, however. The metadata isn't encoded in any particularly nasty way, and with a little exploration I was able to put together a small python script which will dump out all the relevant information.

Hope this helps someone. It would be nice if Nikon used the standard TIFF tags instead of putting everything in their own tags.


May 20, 2013

Go Away AVN

So it turns out that if you google vaccination on, Australian Vaccination Network (AVN) comes up as the second result, right after wikipedia.

Search results from for
"vaccination" as of 20th of May 2013
As some of you may know, AVN is an anti-vaccination organisation, part of the anti-vaccination movement, whose core beliefs rests on a retracted paper published by a doctor who was struck off the UK medical register after been found guilty of unprofessional and unethical conduct.

Vaccination is a corner stone of public health policy, and the only means of protection for the most vulnerable amongst us who are unable to receive vaccinations. Through vaccination smallpox was eradicated in 1979, and no child was ever again killed or maimed by it.

A child with smallpox
Due to the efforts of groups like the AVN however, there has been recent decline in the number of children receiving vaccinations against common childhood whooping cough, leading an outbreak of whooping cough in Queensland, and sadly, the death of several infants.

While the Australian Government is taken action against the AVN, the fact it has such a prominent position in Google's search results is undermining the overall effort. Thankfully this is something you and I can do something about, and that is what this blog post is for: part of an effort to increase the PageRank of legitimate sources of vaccination like

If you have a blog, make a post like this one, google vaccination on and click on every result that is NOT AVN.


December 16, 2012

Un/check all items in iTunes 11

I subscribe to The Moth, and had previously selected all episodes to be sync'd to my iPhone. Recently I found myself short on space, so I wanted to tell iTunes 11 to stop doing that. Previously, it was a simple case of multiple selection in a list and uncheck, but with iTunes 11, it looked like I was going to have to do it all by hand... for over 200 episodes!

By playing around, I found that if you command-click on a checkbox, it has the effect of un/checking all the checkboxes in the list. This made things more manageable.


October 17, 2012

Unknown control sequence \doublespacing

If you are getting the error
"Unknown control sequence: \doublespacing" when you try to compile your 
beamer presentation in Lyx, change
Document→Settings→Text Layout→Line spacing 
to Single.


September 14, 2012

Notes On Running Calibre On A Server

Calibre is a great tool, especially for converting between ebook formats. Here are some notes for getting it to run on a headless server.
  • The binary installer off the website works fine — ignore warnings about completion and desktop integration failing.
  • The installer will always pollute /usr/bin regardless of the installation directly you choose.
  • You will need the following libraries for mobi conversion:
    • libxi6
    • libxrandr2
    • libxfixes3
    • libxcursor1
  • If you get this message
    • SVG rasterizer unavailable, SVG will not be converted
    • You will need to install xvfb and use xvfb-run, like so:
xvfb-run ebook-convert blah.epub


Nook vs Kindle

Definitely the Kindle. Amazon offers an amazing service, especially their personal document service's email feature. That is simply divine. Furthermore, Amazon isn't nearly so annoying to use as a non-US citizen. I don't have to use a trick credit card or proxies to purchase ebooks. Amazon just sells them to me, easy peasy.

Disclaimer: I have a first generation Nook and a Kindle Touch. My comments above are however entirely based on the services provided, not the devices themselves. So this is probably more appropriate as Barnes and Noble vs Amazon.


August 28, 2012

Thoughts on The Design of SPOT2 GPS

  1. I love the fact the screws holding down the battery cover has little flip up handles so you can turn them with just your fingers, and that they are set so that they don't fall out once loosened. This is top quality design. However, the battery compartment door is not attached to the unit, and it should be, so there is no risk of losing that. Admittedly, it is pretty hard to lose something that is fairly large and orange
  2. Somewhat more importantly, the SOS button should be hooked into the power button, in the sense that if you press and hold the SOS button, it turns the device on and goes into SOS mode. As it is, activating SOS is a 2 step process, requiring you to first turn the device on.


August 23, 2012

Jumpcut is dead. Long live Jumpcut

I used to use Jumpcut a long time ago, and was a big fan of it. At some point, I think around 10.5, it stopped working, and I let it go -- too busy at the time to poke at it.

Recently however my interest in it was piqued again, and downloaded the 0.63 source. To my great delight, it compiles just fine, and once recompiled, works perfectly. Score one for good programming and open source!

If like me you have given up on Jumpcut, it is time to add this nifty utility back to your toolbox.


P.S. If you don't have access to Xcode, or the idea of compiling a program sounds like a bit too much, drop me a line at freespace *at* and I will make my copy available.