Thursday, August 20, 2015

JMH for Java MicroBenchmarks

Java Microbenchmark Harness (JMH) is an OpenJDK project. As the project home page says, it's a benchmarking harness that helps you build, run and analyse benchmarks written in Java and other JVM languages.
Writing benchmarks, I think, is a good tool to have in every developer's arsenal. To an extent, benchmarks are to performance what unit tests are to functionality. We want to know how our code performs, but we are very bad at predicting it, so the only real way to find out is with fact- and evidence-based tests, i.e. benchmarks. Benchmarks are great. But there's a dark side to this as well.
I love writing and running benchmarks. They give some sense of affirmation about how your code, or even some library, performs. But the bitter truth, and the dark side of benchmarks (or any performance test for that matter), is that they lie. There's no guarantee on the results you get from your benchmarks, due to many factors: the environment, JVM and compiler optimisations, and how CPUs work (e.g. cache misses and memory latencies). Overcoming the challenges posed by these factors is really hard. Sure, you can ignore them, but then everything is nothing but a lie.

It's not all doom and gloom though. These challenges are the very reason JMH was built, and understanding them will make you appreciate it even more. JMH uses different strategies to overcome them and provides a nice set of annotations to use in our benchmark code.

The best way to get started with JMH is to go through the examples provided on the project home page. They can be found here.

For the sake of this write-up, and to highlight some of the key concepts, the following is a very basic example of adding an item to a CopyOnWriteArrayList.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@State(Scope.Thread)
public class CopyOnArrayListBenchmark {

    private List<String> benchMarkingList;

    @Setup
    public void setup() {
        // runs before measurement starts, so this cost is not included in the results
        benchMarkingList = new CopyOnWriteArrayList<String>();
    }

    @Benchmark
    public void benchMarkArrayListAddStrings(Blackhole blackhole) {
        // sink the result so the add() call can't be eliminated as dead code
        blackhole.consume(benchMarkingList.add("foo"));
    }

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .warmupIterations(5)
                .measurementIterations(5)
                .mode(Mode.AverageTime)
                .forks(1)
                .threads(5)
                .include(CopyOnArrayListBenchmark.class.getSimpleName())
                .timeUnit(TimeUnit.MILLISECONDS)
                .build();

        new Runner(options).run();
    }
}

The annotations and the main method make this a JMH benchmark. There's a lot more that JMH provides, but I feel this very basic benchmark covers some important points. A quick run-down of the annotations used:

@State(Scope.Thread)
This annotation is used to define state for the benchmark. Generally we want to maintain some sort of state in our benchmarks. The different scopes available are:

  • Thread - each thread running the benchmark gets its own instance of the state object and its fields
  • Group - all threads within the same thread group share one instance of the state object
  • Benchmark - all threads running the benchmark share a single instance of the state object

In this example, state is defined by annotating the benchmark class itself, so the instance fields of the benchmark take on those state characteristics. The project examples show how separate, per-class states can be achieved.
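
As a rough sketch of the latter (the class, field and method names are made up for illustration, and java.util.concurrent.atomic.AtomicLong is assumed to be imported), a separate class can be annotated with @State and injected into a benchmark method as a parameter; with Scope.Benchmark every thread shares the one instance:

@State(Scope.Benchmark)
public static class SharedState {

    AtomicLong counter;

    @Setup
    public void setup() {
        counter = new AtomicLong();
    }
}

@Benchmark
public long benchMarkSharedIncrement(SharedState state) {
    // JMH creates the state object and injects it; all threads see the same instance
    return state.counter.incrementAndGet();
}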

@Setup
This is an annotation I find very useful. It's like the @Before annotation we'd normally use in JUnit tests. It gives us the opportunity to initialise fields and objects without impacting the actual benchmark. I find this very useful because, if not for this, we'd use various techniques to initialise objects and inadvertently incur the setup cost in the measurement itself.
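
To make that concrete, here is a sketch (the method name is made up) of the kind of benchmark @Setup saves us from; constructing the list inside the benchmark method means its cost is measured on every single invocation:

@Benchmark
public boolean benchMarkAddWithSetupCostIncluded() {
    // the list construction is measured together with the add() call we actually care about
    List<String> list = new CopyOnWriteArrayList<String>();
    return list.add("foo");
}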

@Benchmark
The method annotated with this is the actual benchmark. The results captured are for the code executed in this method. An important concept used here is the Blackhole. The reason for it is to avoid dead code: dead-code elimination is one of the key challenges in benchmarking (it's caused by compiler optimisations), and JMH provides infrastructure to eliminate/minimise it. In the example above, I could have simply returned the boolean produced by the List.add(T) method; returning a value from a benchmark tells JMH to limit dead-code elimination. However, if multiple values need to be kept alive, it's possible the compiler will treat the ones not returned as dead code. This is where Blackholes come in handy: by calling Blackhole.consume, we sink the values and JMH makes sure they are not optimised away. The example doesn't strictly need a Blackhole (it produces only one value, which could simply be returned); it's used here to highlight the feature.
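
As a sketch of the two approaches (the second method, and its extra size() call, are invented purely for illustration):

// A single result can simply be returned and JMH will keep it alive
@Benchmark
public boolean benchMarkAddReturningResult() {
    return benchMarkingList.add("foo");
}

// With more than one value, sink each into the Blackhole so neither is
// treated as dead code
@Benchmark
public void benchMarkAddAndSize(Blackhole blackhole) {
    blackhole.consume(benchMarkingList.add("foo"));
    blackhole.consume(benchMarkingList.size());
}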

And then we have the main method, which sets up the benchmark to run. The key highlight here is the various options it provides. I've used a few of the options I reach for most often, but there are more. These options also have annotation equivalents that can be used instead.
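
For example, roughly the same configuration can be expressed with annotations on the benchmark class itself (a sketch mirroring the values used in the main method above):

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(1)
@Threads(5)
@State(Scope.Thread)
public class CopyOnArrayListBenchmark {
    // ... benchmark methods as before ...
}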

JMH provides many benefits. However, the following are the ones that stood out for me and got me using it more and more.

  • Optimisation proof/aware:
Does a lot to address dead-code elimination, constant folding and loop unrolling
  • False sharing
Much has been written about how false sharing is a silent performance killer. JMH recognises this and has built-in support to prevent it.
  • Forking
Given how aggressively JVMs optimise based on runtime profiling, forking is important when running benchmarks: running everything in one JVM lets the profile of one benchmark influence another. Running the benchmark in multiple fresh forks helps eliminate this, so JMH forks tests by default. We still have the option of saying how many forks we want.
  • Data setup
Many times when writing benchmarks I have worried about initialisation time contributing to the measurement. The @Setup annotation handles this and is really useful.
  • Warming up
Warm-ups are crucial for benchmarks. The results gathered after running a test vary (sometimes significantly) depending on how many times it has already run; this is another instance of optimisations getting in the way of benchmarks. I have found myself loop-running the code under test several times just to warm it up and get a more sensible result. This is cumbersome, and running benchmarks within loops has its own problems (loop unrolling). So the ability to just state the number of warm-up iterations we need is pretty cool.
  • Running tests multiple times
Like warm-ups, the option to set the number of measurement iterations is pretty cool as well. If not for this, we'd typically run the test in a loop.
  • Support for threads
Running benchmarks across multiple threads is a lot of hard work. I have used barriers to get all threads prepped and then set them off together, but that still doesn't give an enormous amount of confidence that the threads actually run the way we hope; thread scheduling is not something we control. Plus, all the Runnable wrappers and cyclic barriers make the benchmark ugly, hard to read and error-prone. So the option in JMH of just saying how many threads I want the test to run with is just awesome.
  • State per thread/benchmark
Maintaining state during benchmarks can be very tricky, especially for tests we want to run on multiple threads. So the @State annotation comes in very handy and makes life so much better.
  • Parameters
This I find is another super cool feature of JMH. Just by using a @Param annotation and specifying an array of values, the benchmark is repeated for every parameter given. This is a very nice and concise way to declare the scope of the parameters under test (a small sketch follows this list). A nice example can be found here.
  • Artificially consume CPU
Sometimes when we run a benchmark across multiple threads we also want to burn some CPU cycles to simulate the CPU being busy while our code runs. This can't be a Thread.sleep, as we really want to burn CPU. Blackhole.consumeCPU(long) gives us the capability to do this (see the sketch after this list).
  • Measurement modes
Benchmark results can be gathered for throughput or latency (or both at times), and capturing results and producing percentiles requires careful thought as well. So JMH provides a very convenient way to set the measurement mode.
  • Built-in profilers (not your fully blown profiler; use with care)
JMH has this nice little feature where you can run a stack-trace or GC profiler. They are not commercial-grade profilers, but they are a good way to get some indication of which parts of the code are taking the most time and what kind of GC activity takes place when running the tests. I find these profilers a nice starting point before diving into a fully fledged profiler.
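
For the parameter and CPU-burning points above, here are two small sketches (the field, values and method names are made up for illustration):

@State(Scope.Thread)
public class ParameterisedAddBenchmark {

    // The whole benchmark is repeated once per value; JMH converts the
    // strings to the field's type
    @Param({"10", "1000", "100000"})
    private int initialSize;

    private List<String> list;

    @Setup
    public void setup() {
        list = new CopyOnWriteArrayList<String>();
        for (int i = 0; i < initialSize; i++) {
            list.add("item" + i);
        }
    }

    @Benchmark
    public boolean benchMarkAddToPrePopulatedList() {
        return list.add("foo");
    }

    @Benchmark
    public void benchMarkAddUnderSimulatedLoad(Blackhole blackhole) {
        blackhole.consume(list.add("foo"));
        // burns roughly this many "tokens" of CPU work (not a time value)
        Blackhole.consumeCPU(64);
    }
}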

As stated above, the examples are a great way to get cracking on writing benchmarks with JMH. I also found the resources by Aleksey Shipilëv (the guy responsible for JMH) to be very useful; he goes into detail about the challenges and how JMH solves them in this video.

As a closing remark, it's still worth noting that benchmarks should not be taken as absolutes. They are an indication of, and a close approximation to, the throughput/latency of the code under test. I find them most useful as a relative measure when I refactor code. So write benchmarks, run them, analyse the results, and do it regularly.