Synchronizing GC in Java and V8

In the last post I wrote that I work on a project that involves a lot of interoperability between Java and V8 JavaScript engine. Here is an interesting problem I was investigating the last couple of days.

Both V8 and JVM use garbage collector for memory management. While using GC provides a lot of benefits sometimes having two garbage collectors in a single process can be tricky though. Suppose we have a super-charged version of LiveConnect where we have access to the full Java API.

var file = new java.io.File("readme.txt");

console.log("length=" + file.length());

These two lines of JavaScript may seem quite simple at first glance. We create an instance of java.io.File and call one of its methods. The tricky part is that we are doing this from JavaScript and we must take care that the actual Java instance would not be GC’ed before we call length method. In other words, we should provide some form of memory management. Suppose we decide to use JNI global references and we call NewGlobalRef every time when we create a new Java object from JavaScript. Accordingly we call DeleteGlobalRef when V8 makes Java object unreachable from JavaScript.

Let’s see a more complicated scenario.

var outStream = new java.io.FileOutputStream("log.txt");

var eventCallback = new com.example.EventCallback({
    onDataReceived: function(data) {
       outStream.write(data);
    }
});

var listener = new com.example.EventListener(eventCallback);

In this case we create an instance of com.example.EventCallback and provide its implementation in JavaScript. Now suppose that all these three JavaScript objects become unreachable and V8 is ready to GC them. Just because all of these objects are unreachable in JavaScript it does not mean that their actual counterparts in Java are unreachable. It’s possible that listener and eventCallback objects are still reachable through a stack of a listener Java thread.

gcchain

Now comes the interesting detail. While in JavaScript eventCallback has a reference to outStream through the function onDataReceived there is no such reference in Java and it is legitimate for Java GC to collect outStream object. The next time when the callback object calls write method there won’t be a corresponding Java object and the application will fail.

There are several solutions to this problem. One of them is to maintain the reachability in Java GC heap graph in sync with the one in JavaScript. After all, if there is an edge connecting eventCallback and outStream Java GC won’t try to collect the latter.

There are two options:

  • sync Java heap graph automatically
  • sync Java heap graph manually

As usual there is a trade-off. While the first option is very desirable there is a price to pay. We should analyze every closure in V8 that is GC’ed and traverse all objects reachable from there. This could slow down the GC by orders of magnitude.

The second option also has drawbacks. In general, JavaScript developers are not used to manual memory management. Introducing new memory management API could cause a lot of discomfort to the less experienced JavaScript developers.

scope(eventCallback, outStream);

Event if we make the API nice and simple, there is a burden of the mental model that JavaScript developers have to maintain. I tend to prefer this option though because many C/C++ developers proved it is possible to build high quality software using manual memory management.

In closing I would say that there are other solutions to this problem. I’ll discuss them in another blog post.

CLR Profilers and Windows Store apps

Last month Microsoft published a white paper about profiling Windows Store apps. The paper is very detailed and provides rich information how to build CLR profiler for Windows Store apps. I was very curious to read it because at the time when we released JustTrace Q3 2012 there was no documentation. After all, I was curious to know whether JustTrace is compliant with the guidelines Microsoft provided. It turns out it is. Almost.

At time of writing JustTrace profiler uses a few Win32 functions that are not officially supported for Windows Store apps. The only reason for this is the support for Windows XP. Typical example is CreateEvent which is not supported for Windows Store apps but is supported since Windows XP. Rather one should use CreateEventEx which is supported since Windows Vista.

One option is to drop the support for Windows XP. I am a bit reluctant though. At least such decision should be carefully thought and must be supported by actual data for our customers using Window XP. Another option is to accept the burden to develop and maintain two source code bases – one for Windows XP and another for Windows Vista and higher. Whatever decision we are going to make, it will be thoroughly thought out.

Let’s have a look at the paper. There is one very interesting detail about memory profiling.

The garbage collector and managed heap are not fundamentally different in a Windows Store app as compared to a desktop app.  However, there are some subtle differences that profiler authors need to be aware of.

It continues even more interesting.

When doing memory profiling, your Profiler DLL typically creates a separate thread from which to call ForceGC. This is nothing new.  But what might be surprising is that the act of doing a garbage collection inside a Windows Store app may transform your thread into a managed thread (for example, a Profiling API ThreadID will be created for that thread)

Very subtle indeed. For a detailed explanation, you can read the paper. Fortunately JustTrace is not affected by this change.

In conclusion, I think the paper is very good. It is a mandatory reading for anyone interested in building CLR profiler for Windows Store apps. I would encourage you to see CLR profiler implementation as well.

Garbage collection – part 1 of N

Recently I deal a lot with memory problems like leaks, stack/heap corruption, heap fragmentation, buffer overflow and the like. Surprisingly these things happen in the .NET world, especially when one deals with COM/PInvoke interoperability.

The CLR comes with a garbage collector (GC) which is a great thing. The GC has been around for many years and we accepted it as something granted and rarely think about it. This is a twofold thing. On one hand this is a prove that the GC does excellent job in most of the time. On the other hand the GC could become a big issue when you want to get the maximum possible performance.

I think it would be nice to explain some of the GC details. I hope this series of posts could help you build more GC friendly apps. Let’s start.

GC

I assume you know what GC is, so I won’t going to explain it. There are a lot of great materials on internet on this topic. I am going to state and the single and the most important thing: GC provides automatic dynamic memory management. As a consequence GC prevents the problems that were (and still are!) common in native/unmanaged applications:

  • dangling pointers
  • memory leaks
  • double free

During the years, the improper use of dynamic memory allocation became a big problem. Nowadays many of modern languages rely on GC. Here is a short list:

ActionScript Lua
AppleScript Objective-C
C# Perl
D PHP
Eiffel Python
F# Ruby
Go Scala
Haskell Smalltalk
Java VB.NET
JavaScript Visual Basic

I guess more than 75% of all developers are programming in these languages. It also important to say that there are attempts to introduce a basic form of “automatic” memory management in C++ as well. Although auto_ptr, shared_ptr, unique_ptr have limitations they are a step in the right direction.

You probably heard that GC is slow. I think there are two aspects of that statement:

  • for the most common LOB applications GC is not slower than manual memory management
  • for real-time applications GC is indeed slower than well crafted manual memory management

However most of us are not working on real-time applications. Also not everyone is capable of writing high performance code, this is indeed hard to do. There are good news though. With the advent of the research in GC theory there are signs that GC will become even faster the current state-of-the-art manual memory management. I am pretty sure that in the future no one will pay for a real-time software with manual memory management; it will be too risky.

GC anatomy

Every GC is composed of the following two components:

  • mutator
  • collector

The mutator is responsible for the memory allocation. It is called so because it mutates the object graph during the program execution. For example in the following pseudo-code:

string firstname = "Chris";
Person person = new Person();
person.Firstname = name;

the mutator is responsible for allocating the memory on the heap and updating the object graph by updating the field Firstname to reference the object firstname (we say that firstname is reachable from person through the field Firstname). It is important to say that these reference fields may be part from objects on the heap (as in our scenario from person object) but also may be contained in other object known as roots. The roots may be thread stacks, static variables, GC handles and so on. As a result from the mutator’s work any object can become unreachable from the roots (we say that such object becomes a garbage). This is where the second component comes.

The collector is responsible for the garbage collection of all unreachable objects and reclaim of their memory.

Let’s have a look at the roots. They are called roots because they are accessible directly, that is they are accessible to the mutator without going thought other objects. We denote the set of all roots objects as Roots.

Now let’s look at the objects allocated on the heap. We can denote this set as Objects. Each object O can be distinguished by its address. For simplification let’s assume that object fields can be only references to other objects. In really most of the object fields are primitive types (like bool, char, int, etc.) but these fields are not important for the object graph connectivity. It doesn’t matter if an int field has value 5 or 10. So for now let’s assume that objects have reference fields only. Let’s denote with |O| the number of the reference fields for the object O and with &O[i] denote the address of the i-th field of O. We write the usual pointer dereference for ptr as *ptr.

This notation allows us to define the set Pointers for an object O as

Pointers(O) ={ a | a=&O[i], where 0<=i<|O| }

For convenience we define Pointers(Roots)=Roots.

To recap – we have defined the following important sets:

  • Objects
  • Roots
  • Pointers(O)

After we defined some of the most important sets, we are going to define the following operations:

  • New()
  • Read(O, i)
  • Write(O, i, value)

New() operation obtains a new heap object from the allocator component. It simply returns the address of the allocated object. The pseudo-code for New() is:

New():
     return allocate()

It is important to say that allocate() function allocates a continuous block of memory. The reality is a bit more complex. We have different object types (e.g. Person, string, etc.) and usually New() takes parameters for the object type and in some cases for its size. Also it could happen that there is not enough memory. We will revisit New() definition later. For simplification we can assume that we are going to allocate object of one type only.

Read(O, i) operation returns the reference stored at the i-th field of the object O. The pseudo-code for Read(O, i) is:

Read(O, i):
     return O[i]

Write(O, i, value) operation updates the reference stored at the i-th field of the object O. The pseudo-code for Write(O, i, value) is:

Write(O, i, value):
     O[i] = value

Sometimes we have to explicitly say that an operation or function is atomic. When we need so, we write atomic in front of the operation name.

Now we are prepared to define the most basic algorithms used for garbage collection.

Mark-sweep algorithm

Earlier I wrote that New() definition is oversimplified. Let’s revisit its definition:

New():
    ref = allocate()
    if ref == null
        collect()
        ref = allocate()
        if ref == null
              error "out of memory"

    return ref

atomic collect():
     markFromRoots()
     sweep(HeapStart, HeapEnd)

The updated New() definition is bit more robust. It first tries to allocate memory. If there is not big enough continuous memory block it will collect the garbage (if there is any). Then it will repeat to allocate memory again. It could fail or succeed. What is important for this function definition is that it reveals when GC will trigger. Again, the reality is more complex but in general GC will trigger when the program tries to allocate memory.

Let’s define the missing markFromRoots and sweep functions.

markFromRoots():
     worklist = empty
     foreach field in Roots
          ref = *field
          if ref != null && !isMarked(ref)
                 setMarked(ref)
                 enqueue(worklist, ref)
                 mark()

mark():
     while !isEmpty(worklist)
          ref = dequeue(worklist)
          foreach field in Pointers(ref)
                child = *field
                if child != null && !isMarked(child)
                      setMarked(child)
                      enqueue(worklist, child)

sweep(start, end):
     scan = start
     while (scan < end)
           if isMarked(scan)
                 unsetMarked(scan)
           else free(scan)
           scan = nextObject(scan)

The algorithm is straightforward and simple. It starts from Roots and marks each reachable object. Then it iterates over the whole heap and frees the memory of every unreachable object. Also it remove the mark of the remaining objects. It is important to say that this algorithm needs two passes over the heap.

The algorithm does not solve the problem with the heap fragmentation. This naive implementation doesn’t work well in real world scenarios. In the next post of this series we will see how we can improve it. Stay tuned.