Sunday, June 24, 2007

JavaOne 2007 - Performance tips

by Eduardo Rodrigues
Hello everybody!

I know I've promised more posts with my impressions on JavaOne 2007. So, here it goes...

Some of the most interesting technical sessions I've attended to were on J2SE performance and monitoring. In fact, I would highlight TS-2906: "Garbage Collection-Friendly Programming" by John Coomes, Peter Kessler and Tony Printezis from the Java SE Garbage Collection Group at Sun Microsystems. They certainly gave me a new vision on the newest GCs available.

And what does GC-friendly programming have to do with performance? Well, if you manage to write code that doesn't needlessly spend GC processing, you'll be implicitly avoiding major performance impacts to your application.

Today there are different kinds of GCs and a variety of approaches for them too. We have generational GCs which keeps young and old objects separetely in the heap and uses specific algorithms for each generation. We also have the incremental GC which tries to minimize GC disruption working in parallel with the application. There's also the possibility of mixing both using a generational GC with the incremental approach being applied only to the old generation space. Besides, we have campacting and non-compacting GCs; copying, mark-sweep and mark-compact algorithms; linear and free lists allocation and so on. Yeah... I know... another alphabet soup. If you want to know further about them, here are some interesting resources:

The first and basic question should be "how do I create work for the GC?" and the most common answers would be: allocating new memory (higher allocation rate implies more frequent GCs), "live data" size (more work to determine what's live) and reference field updates (more overhead to the application and more work for the GC, especially for generational or incremental). With that in mind, there are some helpful tips for writing GC-friendly code:
  • Object Allocation

    In recent JVMs, object allocation is usually very cheap. It takes only 10 native instructions in fast common cases. As a matter of fact, if you think that C/C++ has faster allocation you're wrong. Reclaiming new objects is very cheap too (especially for young generation spaces in generational GCs). So, do not be affraid to allocate small objects for intermediate results and remember the following:

  • GCs, in general, love small immutable objects and gerational GCs love small and short-lived ones;

  • Always prefer short-lived immutable objects instead of long-lived mutable ones;

  • Avoid needless allocation but keep using clearer and simpler code, with more allocations instead of more obscure code with fewer allocations.

  • As a simple and great example of how the tiniest details may jeopardize performance, take a look at the code bellow:

    public void printVector(Vector v) {
       for (int i=0; v != null && i < v.size(); i++) {
          String s = (String) v.elementAt(i);

    This must look like a very inocent code but almost every part of it may be optimized for performance. Let's see... First of all, using the expression "v != null && i < v.size()" as the loop condition generates a totally unecessary overhead. Also, declaring the String s inside the loop implies needless allocation and, last but not least, using System.out.println is always an efficient way of making you code really slow (and that's inside the loop!). So, we could rewrite the code like this:

    public void printVector(Vector v) {
       if (v != null) {
          StringBuffer sb = new StringBuffer();
          int size = v.size();

          for (int i=0; i < size; i++) {


    And if we're using J2SE 1.5, we could do even better:

    public void printVector(Vector<String> v) {
    //using Generics to define the vector's content type

       if (v != null) {
          StringBuilder sb = new StringBuilder();
          //faster than StringBuffer since
          //it's not synchronized and thread-safety
          //is not a concern here

          for (String s : v) { //enhanced for loop
             sb.append( s.trim() );
             //we're using Generics, so
             //there's no need for casting
             sb.append( "\n" );


  • Large Objects

    Very large objects are obviously more expensive to allocate and to initalize (zeroing). Also, large objects of different sizes can cause memory fragmentation (especially if you're using a non-compacting GC). So, the message here is: always try to avoid large objects if you can.

  • Reference Field Nulling

    Differently of what many may think, nulling references rarely helps the GC. The exception is when you're implementing array-based data structures.

  • Local Variable Nulling

    This is totaly unecessary since the JIT (Just In-Time compiler) is able to do liveness analysis for itself. For example:

    void foo() {
       int[] array = new int[1024];
       //last use of array in method foo()
       array = null;
       //unnecessary! array is no
       //longer considered live by the GC

  • Explicit GCs

    Avoid them at all costs! Applications does not have all the information needed to decide when a garbage colletion should take place, besides, a call to System.gc() at the wrong time can hurt performance with no benefit. That's because, at least in HotSpottm, System.gc() does a "stop-the-world" full GC. A good way of preventing this is using -XX:+DisableExplicitGC option to ignore System.gc() calls when starting the JVM.

    Libraries can also make explicit System.gc() calls. An easy way to find out is to run FindBugs to check on them.

    If you're using Java RMI, keep in mind that it uses System.gc() for its distributed GC algorithm, so, try to decrease its frequency and use -XX:+ExplicitGCInvokesConcurrent option when starting the JVM.

  • Data Structure Sizing

    Avoid frequent resizing and try to size data structures as realistically as possible. For example, the code bellow will allocate the associated array twice:

    ArrayList list = new ArrayList();

    So, the correct should be:

    ArrayList list = new ArrayList(1024);

  • And remember... array copying operations, even when using direct memory copying methods (like System.arrayCopy() or Arrays.copyOf() in J2SE 6), should always be used carefully.

  • Object Pooling

    This is another old paradigm that must be broken since it brings terrible allocation performance. As you must remember from the first item above, GC loves short-lived immutable objects, not long-lived and highly mutable ones. Unused objects in pools are like bad tax since they are alive and the GC must process them. Besides, they provide no benefit because the application is not using them.

    If pools are too small, you have allocations anyway. If they are too large, you have too much footprint overhead and more pressure on the GC.

    Because any object pool must be thread-safe by default, the use of synchronized methods and/or blocks of code are implicit and that defeats the JVM's fast allocation mechanism.

    Of course, there are some exceptions like pools of objects that are expensive to allocate and/or initialize or that represent scarse resources like threads and database connections. But even in these cases, always prefer to use existing well-known libraries.
to be continued...


Augusto said...

Great blogs Eduardo, this one and umproouto.
I'm wondering how fast memory allocation in Java and C/C++ are.
How do you measure that?

Eduardo Rodrigues said...

Well, there's a vast material on this subject on the Internet. Here's some of them:

I hope it helps you.

name said...

vh1u9c Wonderful blog.

name said...

Good job!