Monday, January 18, 2016

On calling Ceres from Scala - 4.2: Hello World, a brief note on speed

This is the sixth installment of a series documenting my progress on porting the Ceres solver API to Scala. All my sources quoted below are available on GitHub.

Hello World, A Brief Note on Speed


We all know how, in computer science, speed is often reported in terms of lies, big lies and benchmarks. However, silly benchmarks are fun, so let's write one.

On my 2.3 GHz i7 Mac, the Ceres-reported timings for a run of the native C++ implementation of the "helloworld" example are:


My Scala implementation, based on the SWIG-generated wrapper (HelloWorld.scala), instead clocks in at:

This is not an apples-to-apples comparison, as the C++ version's cost function uses automatic differentiation, but the AD overhead should be in the noise.

So Scala is about 3x slower in terms of iteration time, and about 53x slower in total time, and that's OMG horrible, right?

Well, not quite. The native version is statically optimized ("c++ -O3") in its entirety, whereas the Scala port calls back into JVM code at every cost function evaluation, and that code starts out interpreted. So, to be fair, we ought to give the JIT a chance to work its magic.

Running the optimization problem Scala-side 10 times in a row in the same process, but reinitializing all data structures at every run, should give us a better idea of the JIT-induced speedups.


Note the rather dramatic speedup after the first run, and the eventual convergence to essentially the same runtime as the native implementation.
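For readers who want to try the repeated-run measurement themselves, here is a minimal sketch of such a harness. It is not the code behind the timings above: `JitWarmup`, `solveOnce` and `timeRuns` are hypothetical names, and the workload is a stand-in (plain gradient descent on 0.5 * (10 - x)^2, the toy problem that helloworld solves) rather than a call through the SWIG wrapper. The only point it illustrates is timing the same freshly initialized problem several times within one JVM process.

```scala
object JitWarmup {
  // Stand-in workload (an assumption, NOT the Ceres wrapper): gradient
  // descent on 0.5 * (10 - x)^2, whose minimum is at x = 10.
  def solveOnce(): Double = {
    var x = 0.5                  // state is rebuilt from scratch on every call
    var i = 0
    while (i < 100000) {
      val residual = 10.0 - x    // the negative gradient of the cost is the residual
      x += 0.1 * residual        // fixed-size step toward the minimum
      i += 1
    }
    x
  }

  // Evaluates `body` `runs` times in this JVM and returns, for each run,
  // its result together with the elapsed wall-clock time in nanoseconds.
  def timeRuns[A](runs: Int)(body: => A): Vector[(A, Long)] =
    Vector.fill(runs) {
      val t0 = System.nanoTime()
      val result = body
      (result, System.nanoTime() - t0)
    }

  def main(args: Array[String]): Unit = {
    val timings = timeRuns(10)(solveOnce())
    timings.zipWithIndex.foreach { case ((x, ns), run) =>
      println(f"run ${run + 1}%2d: x = $x%.6f, ${ns / 1e3}%.1f us")
    }
  }
}
```

To time the real thing, replace the body passed to `timeRuns` with whatever builds and solves the problem through the wrapper. The absolute numbers will depend on the machine and the JVM, and a harness this naive is only good for eyeballing warm-up effects; anything more serious calls for a proper benchmarking tool.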

This is, of course, just a toy benchmark, but the short answer to "Yo, how much slower is this port going to be compared to native?" should be (a) there is no short answer and (b) tell me more about your use case.
