Saturday, May 25, 2013

Of DirectBuffer, int[], GC and erratic performance

This post is inspired by Vladimir Vukicevic. The JVM used is 1.7.0_21 64 bits, I7 930 @2.80Ghz with 16 Gb of RAM and windows 8, and my reason for posting it is that some of my friends were astonished by the results.  The original code can be found on Github there.

Here is the modified code:

/**
 * VM option
 *   -verbose:gc -server -XX:+PrintCompilation
 */

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Test direct buffer vs int[] vs List.
 */
public class NioTest {

    public static void main(String[] asArg) {
        final int MB = 1000 * 1000;
        IntBuffer buf = 
           ByteBuffer.allocateDirect(4 * 10 * MB)
              .order(ByteOrder.nativeOrder()).asIntBuffer();
        int cbMax = buf.capacity();
        int[] ab = new int[cbMax];
        // (1) 
        list = new ArrayList();         
        // end of (1)

        int x = 0;

        // how many times to run the test 
        // (to allow hotspot to optimize)
        int cIters = 200;

        for (int i = 1; i <= cIters; ++i) {
            System.out.println("*** iteration " + i);

            start("array[]: populating");
            for (int of = 0; of < cbMax; ++of) {
                ab[of] = of;
            }
            stop();

            buf.rewind();
            start("direct buffer: absolute populating");
            for (int of = 0; of < cbMax; ++of) {
                buf.put(of, of);
            }
            stop();

            buf.rewind();
            start("direct buffer: relative populating");
            for (int of = 0; of < cbMax; ++of) {
                buf.put(of, of);
            }
            stop();

            /**
              * Populate the array list.
              * Another small addition -> (2)
              */
            start("list: populating");
            list.clear();
            for (int of = 0; of < cbMax; ++of) {
                list.add(of);
            }
            stop();
            // end of (2)

            start("array[]: accessing");
            for (int of = 0; of < cbMax; ++of) {
                x |= ab[of];
            }
            stop();

            buf.rewind();
            start("direct buffer: absolute accessing");
            for (int of = 0; of < cbMax; ++of) {
                x |= buf.get(of);
            }
            stop();

            buf.rewind();
            start("direct buffer: relative accessing");
            for (int of = 0; of < cbMax; ++of) {
                x |= buf.get();
            }
            stop();
            
            /**
              * Direct access in an array list.
              * Another small addition -> (3)
              */
            start("list: direct accessing");
            for (int of = 0; of < cbMax; ++of) {
                x |= list.get(of);
            }
            stop();
            // end of (3)

        }

        // to avoid hotspot completely discarding some 
        // of the loops,
        // by using the variable "x" it can't optimize out 
        // the access inside the loop
        System.out.println("*** Test completed (x=" + x + ")");
    }

    /**
     * Start a test.
     *
     * @param sDesc the test description
     */
    public static void start(String sDesc) {
        s_sDesc = sDesc;
        s_lStart = System.currentTimeMillis();
    }

    /**
     * Finish a test and print the elapsed time.
     */
    public static void stop() {
        long lStop = System.currentTimeMillis();
        System.out.println(s_sDesc + "=" + (lStop - s_lStart) + "ms");
    }

    /**
     * Time at which the current test started.
     */
    static long s_lStart;

    /**
     * Description of the current test.
     */
    static String s_sDesc;
}

Before sharing the result, if the added sections (1), (2) and (3) are removed, the output is:
 
    68    1             java.lang.String::hashCode (55 bytes)
    70    2             java.lang.String::charAt (29 bytes)
*** iteration 1
    103    1 %           NioTest::main @ 80 (323 bytes)
array[]: populating=30.573388ms
    135    3             java.nio.Buffer::checkIndex (22 bytes)
    135    4             java.nio.DirectIntBufferU::put (18 bytes)
    135    6     n       sun.misc.Unsafe::putInt (0 bytes)   
    136    5             java.nio.DirectIntBufferU::ix (10 bytes)
direct buffer: absolute populating=59.400435ms
direct buffer: relative populating=57.650079ms
array[]: accessing=6.619388ms
    259    7             java.nio.DirectIntBufferU::get (16 bytes)
    259    8     n       sun.misc.Unsafe::getInt (0 bytes)   
  direct buffer: absolute accessing=45.15859ms
    305    9             java.nio.DirectIntBufferU::get (15 bytes)
    305   10             java.nio.Buffer::nextGetIndex (31 bytes)
direct buffer: relative accessing=46.361253ms
*** iteration 2
array[]: populating=8.189489ms
direct buffer: absolute populating=57.800777ms
direct buffer: relative populating=57.850401ms
array[]: accessing=6.623036ms
direct buffer: absolute accessing=42.51609ms
direct buffer: relative accessing=44.662711ms
[... repeat 197 times ...]
*** iteration 200
array[]: populating=8.223424ms
direct buffer: absolute populating=57.604104ms
direct buffer: relative populating=57.605928ms
array[]: accessing=6.539113ms
direct buffer: absolute accessing=41.6057ms
direct buffer: relative accessing=44.483917ms

It takes 5 iterations to finish all compilations...
 
*** iteration 5
array[]: populating=8.766374ms
direct buffer: absolute populating=57.51799ms
   1069   11             java.lang.String::indexOf (70 bytes))

With (1), (2) and (3) disabled, there are no GC pause triggered which in itself is quite normal.  A simple int[] is an order faster than the direct buffer. All in all the access time is quite good.

The surprise comes in when the List<Integer> is added to the code.  With (1), (2) and (3) enabled, a standard iteration looks like this:

*** iteration 200
array[]: populating=8.548172ms
direct buffer: absolute populating=46.513045ms
direct buffer: relative populating=47.481086ms
list: populating=90.392347ms
array[]: accessing=6.733597ms
direct buffer: absolute accessing=38.527891ms
direct buffer: relative accessing=39.543368ms
list: direct accessing=59.072768ms

Adding Integers to the List is obviously slow: 2 times more than direct buffer and 15 times slower than primitive int.  Accessing elements in the list is also slower: 1.5 times the direct buffer, and 9 times slower than the int[].

But, what is interesting is that even before the first iteration is started, a Full GC is triggered.

Here are the logs from start till end of the second (2nd) iterations:
 
     66    1             java.lang.String::hashCode (55 bytes)
     69    2             java.lang.String::charAt (29 bytes)
*** iteration 1
    103    1 %           NioTest::main @ 89 (418 bytes)
    124    1 %           NioTest::main @ -2 (418 bytes)   made not entrant
    124    2 %           NioTest::main @ 89 (418 bytes)
array[]: populating=47.584349ms
    152    3             java.nio.Buffer::checkIndex (22 bytes)
    152    4             java.nio.DirectIntBufferU::put (18 bytes)
    152    6     n       sun.misc.Unsafe::putInt (0 bytes)   
    153    5             java.nio.DirectIntBufferU::ix (10 bytes)
direct buffer: absolute populating=59.59346ms
direct buffer: relative populating=56.909727ms
    267    2 %           NioTest::main @ -2 (418 bytes)   made not entrant
    269    7             java.lang.Object:: (1 bytes)
    270    8             java.lang.Number:: (5 bytes)
    270    9             java.lang.Integer:: (10 bytes)
    270   10             java.util.ArrayList::add (29 bytes)
    271   11             java.util.ArrayList::ensureCapacityInternal (26 bytes)
    271   12             java.lang.Integer::valueOf (54 bytes)
    272    3 %           NioTest::main @ 200 (418 bytes)
[GC 64139K->55315K(251136K), 0.0236128 secs]
[GC 109374K->92973K(316672K), 0.0642352 secs]
[GC 224045K->173425K(316672K), 0.1279372 secs]
[Full GC 173425K->173131K(519872K), 3.4242516 secs]
list: populating=3821.272896ms
array[]: accessing=6.718271ms
4097   13             java.nio.DirectIntBufferU::get (16 bytes)
4098   14     n       sun.misc.Unsafe::getInt (0 bytes)   
direct buffer: absolute accessing=40.168782ms
4138   15             java.nio.DirectIntBufferU::get (15 bytes)
4138   16             java.nio.Buffer::nextGetIndex (31 bytes)
direct buffer: relative accessing=41.217827ms
4179   17             java.util.ArrayList::get (11 bytes)
4179   18             java.util.ArrayList::rangeCheck (22 bytes)
4179   19             java.util.ArrayList::elementData (7 bytes)
list: direct accessing=66.404046ms
*** iteration 2
array[]: populating=9.818703ms
direct buffer: absolute populating=47.35593ms
direct buffer: relative populating=47.253397ms
4350    4 %           java.util.ArrayList::clear @ 12 (39 bytes)
[GC 304203K->292317K(571520K), 0.4874512 secs]
[Full GC 292317K->111901K(627136K), 0.7334851 secs]
list: populating=1310.247517ms
array[]: accessing=6.688716ms
direct buffer: absolute accessing=38.789879ms
direct buffer: relative accessing=39.294516ms
list: direct accessing=60.193696ms

In that short time, there are two(2) full GCs pauses.  And one full pause just triggered by this snippet of code.

List list = new ArrayList();

Quite impressive...

Note that no other full GC will occur after those two. Some small GC collections are triggered at regular intervals, and such an iteration will look like this:
 
*** iteration 74
array[]: populating=8.292387ms
direct buffer: absolute populating=46.7203ms
direct buffer: relative populating=47.239167ms
[GC 1237773K->300333K(1635456K), 1.1714739 secs]
list: populating=1257.412101ms
array[]: accessing=6.637996ms
direct buffer: absolute accessing=38.250944ms
direct buffer: relative accessing=39.295611ms
list: direct accessing=62.972299ms

The GC kicks in when objects are added to the list.  Which sounds quite normal.  But, as a result, while populating the list is, for most iterations, within 86-90 ms, due to the GC that operation (or list of operations) can reach up to 1146.58 ms even after many iterations.

Example:
 
*** iteration 181
array[]: populating=8.487601ms
direct buffer: absolute populating=48.086431ms
direct buffer: relative populating=48.309376ms
[GC 1224941K->287437K(1632640K), 1.0567382 secs]
list: populating=1146.593101ms

It makes me think about when to use lists and Integer versus primitive types an arrays...