Memory management in NativeScript for Android

Note: This post will be a bit different from the previous ones. It’s intended to provide a brief history of why the current NativeScript for Android implementation is designed the way it is. So, this post will be most useful for my Telerik ex-colleagues. Think of it as a kind of historical documentation. Also, it is a chance to have a peek inside a developer’s mind 😉

I already gave you a hint about my current affairs. In February I took the opportunity to pursue new ventures at a new company. The fact that my new office is in the very next building to the Telerik HQ gives me an opportunity to keep close connections with my former colleagues. During one coffee break I was asked about the current memory management implementation. As I am no longer with Telerik, my former colleagues are missing some important history that explains why this feature is implemented the way it is. I tried to explain that particular technical issue briefly in a previous post; however, I couldn’t go much in depth because NativeScript had not been announced yet. So, here I’ll try to provide more details.

Note: Keep in mind that this post is about the NativeScript for Android platform, so I will focus only on that platform.

On the very first day of the project, we decided that we should explore what can be done with JavaScript-to-Java bidirectional marshalling. So, we set up a simple goal: make an app with a single button that increments a counter. Let’s see what the Android docs say about the button widget.

 public class MyActivity extends Activity {
     protected void onCreate(Bundle savedInstanceState) {
         super.onCreate(savedInstanceState);

         setContentView(R.layout.content_layout_id);

         final Button button = findViewById(R.id.button_id);
         button.setOnClickListener(new View.OnClickListener() {
             public void onClick(View v) {
                 // Code here executes on main thread after user presses button
             }
         });
     }
 }

After so many years, this is still the first code fragment you see on the site. And it should be so. This code fragment captures the very essence of what the button widget is and how it is used. We wanted to provide a JavaScript syntax which feels familiar to Java developers. So, we ended up with the following syntax:

var button = new android.widget.Button(context);
button.setOnClickListener(new android.view.View.OnClickListener({
   onClick: function() {
      // do some work
   }
}));

This example is shown countless times in the NativeScript docs and various presentation slides and materials. It is part of our first and main test/demo app.

Motivation: we wanted to provide a JavaScript syntax which is familiar to existing Android developers.

This decision has an important implication, namely the usage of JavaScript closures. To understand why closures are important for the implementation, let’s take a look at the following simple, but complete, Java example.

package com.example;

import android.app.Activity;
import android.os.Bundle;
import android.view.View;
import android.widget.Button;
import android.widget.LinearLayout;
import android.widget.TextView;

public class MyActivity extends Activity {
    private int count = 0;

    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        LinearLayout layout = new LinearLayout(this);
        layout.setFitsSystemWindows(false);
        layout.setOrientation(LinearLayout.VERTICAL);

        final TextView txt = new TextView(this);
        layout.addView(txt);

        Button btn = new Button(this);
        layout.addView(btn);
        btn.setText("Increment");
        btn.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View view) {
                txt.setText("Count:" + (++count));
            }
        });

        setContentView(layout);
    }
}

Behind the scenes, the Java compiler will generate an anonymous class that we can decompile and inspect closely. For the purpose of this post I am going to use the Fernflower decompiler. Here is the output for the MyActivity$1 class.

package com.example;

import android.view.View;
import android.view.View.OnClickListener;
import android.widget.TextView;

class MyActivity$1 implements OnClickListener {
   // $FF: synthetic field
   final TextView val$txt;
   // $FF: synthetic field
   final MyActivity this$0;

   MyActivity$1(MyActivity this$0, TextView var2) {
      this.this$0 = this$0;
      this.val$txt = var2;
   }

   public void onClick(View view) {
      this.val$txt.setText("Count:" + MyActivity.access$004(this.this$0));
   }
}

We can see that the Java compiler generates code that:
1) captures the variable txt
2) deals with the ++count expression

This means that the click handler object holds references to the objects it accesses in its closure. We can call this class stateful, as it has class members. A fairly trivial observation.

Let’s take a look again at the previous JavaScript code.

var button = new android.widget.Button(context);
button.setOnClickListener(new android.view.View.OnClickListener({
   onClick: function() {
      // do some work
   }
}));

We access the button widget and call its setOnClickListener method with some argument. This means that we should have an instantiated Java object which implements OnClickListener, so that the button can use it later. You can find the class implementation for that object in your project’s platform directory:

[proj_dir]/platforms/android/src/main/java/com/tns/gen/android/view/View_OnClickListener.java

Let’s see what the actual implementation is.

package com.tns.gen.android.view;

public class View_OnClickListener
       implements android.view.View.OnClickListener {
  public View_OnClickListener() {
    com.tns.Runtime.initInstance(this);
  }

  public void onClick(android.view.View param_0)  {
    java.lang.Object[] args = new java.lang.Object[1];
    args[0] = param_0;
    com.tns.Runtime.callJSMethod(this, "onClick", void.class, args);
  }
}

As we can see, this class acts as a proxy and doesn’t have fields. We can call this class stateless. We don’t store any information that describes its closure, if it has one.

So, we saw that the Java compiler generates classes that keep track of their closures, while NativeScript generates classes that don’t. This is a direct consequence of the fact that JavaScript is a dynamic language, where the information about the lexical scope is not enough for full static analysis. The full information about JavaScript closures can be obtained at run time only.

The ovals diagram I used in my previous post visualizes the missing object reference to the closed-over object. So, now we have an understanding of what happens in the NativeScript runtime for Android. The current NativeScript (version 3.3 at the time of writing) provides a mechanism to “compensate” for the missing object references. To put it simply, for each JavaScript closure accessible from Java we traverse all reachable Java objects in order to keep them alive until the closure becomes unreachable from Java. While we were able to describe the current solution in a single sentence, that doesn’t mean it doesn’t have drawbacks. This solution can be very slow if an object with a large hierarchy, like global, is reachable from some closure. In that case, the implication is that we will traverse the whole V8 heap on each GC.

Back in 2014, when we hit this issue for the first time, we discussed the option to customize part of the V8 garbage collector in order to provide faster heap traversal. The drawback is a slower upgrade cycle for V8, which means that the JavaScriptCore engine would provide more features at any given point in time. For example, it is not easy to explain to developers why they can use the class syntax on iOS but not on Android.

Motivation: we wanted to keep V8 customization to a minimum so we could achieve relative feature parity by upgrading the V8 engine as soon as possible.

So, now we know that traversing the V8 heap can be slow; what else? The current implementation is incomplete and case-by-case driven. This means that it is updated when there are important and common memory usage patterns. For example, we currently don’t traverse Map and Set objects.

Let’s see what can happen in practice. Create a default app.

tns create app1

Run the app and make sure it works as expected.

Now, we have to go through the process of designing a user scenario where the runtime will crash. We know that the current implementation doesn’t traverse Map and Set objects. So, we have to make a Java object which is reachable only through, let’s say, a Map object. This is only the first part of our exercise. We also must take care to make it reachable through a closure. Finally, we must give the GC a chance to collect it before we use it. So, let’s code it.

function crash() {
    var m = new Map();
    m.set('o', new java.lang.Object() /* via the map only */);
    var h = new android.os.Handler(android.os.Looper.getMainLooper());
    h.post(new java.lang.Runnable({
        run: function() {
            console.log(m.get('o').hashCode());
        }
    }));
}

That’s all. Finally, we have to integrate crash into our application. We can do so by modifying the onTap handler in [proj_dir]/app/main-view-model.js as follows:

viewModel.onTap = function() {
    crash();
    gc();
    java.lang.Runtime.getRuntime().gc();
    this.counter--;
    this.set("message", getMessage(this.counter));
}

Run the app and click the button. You should get an error screen.

Motivation: we wanted to evolve the V8 heap traversal on a case-by-case basis in order to traverse as little as possible.

Understanding this memory usage pattern (create an object, set up the object’s reachability, GC, and usage) is a simple but powerful tool. With the current implementation, the fix for Map and Set is similar to this one. Also, realizing that the missing references to the captured objects are the only reason for this error in the current implementation is critical for any further changes. This is well documented in the form of unit tests.

So far we have discussed the drawbacks of the current implementation. Let’s say a few words about its advantages. First, and foremost, it keeps the memory management model familiar to existing Java and JavaScript developers. This is important in order to attract new developers. If two technologies, X and Y, solve similar problems and offer similar licenses, tools, etc., developers are in favor of the one with the simpler “mental model”. While introducing an alloc/free or try/finally approach is powerful, it does not attract new developers because it sets a higher entry bar with a less explicit approach. Another advantage, which is mostly for the platform developers, is the fact that the current approach aligns well with many optimizations that can be applied. For example, taking advantage of (or introducing) GC generations for the needs of the NativeScript runtime. It also allows per-application fine-tuning of existing V8 flags (e.g., gc_interval, incremental_marking, minor_mc, etc.). Tweaking V8 flags won’t have a general impact when manual memory management is applied. In my opinion, tuning these flags is yet another way to help the average Joe shoot himself in the foot, but providing sane defaults and applying adaptive schemes could very possibly be a huge win.

It is important to note that whatever approach is applied, it must be done carefully because of the risk of an OOM exception. Introducing schemes like GC generations should take the objects’ memory weight into account. This would make obsolete the current approaches that use time and/or memory pressure heuristics. In general, such a GC generation approach would pay off well.

I hope I shed more light on this challenging problem. I am looking forward to seeing how the team is going to approach it. Good luck!

Dispose Pattern

There are a lot of blog posts and articles about the proper IDisposable implementation, so why write another one? I guess I am writing it because the topic is quite subjective, as there are many different scenarios for IDisposable usage.

I am not going to explore all the details of the IDisposable implementation. There are plenty of valuable articles on the topic, containing a lot of different opinions, as well as two MSDN guidelines which you may find contradicting. I would recommend reading them, though I would advise you to be a selective reader and not take everything for granted truth.

While most people talk about the Dispose and Finalize patterns as different things, I would recommend being careful and thinking about them as a single pattern. It is always hard, even impossible, to predict how your code will be used by other software developers. So, having a safe strategy and implementing the Dispose and Finalize patterns together might be the right choice.

Let’s see the following class definition:

using System;
using System.Runtime.InteropServices;

public class ProcessInfo : IDisposable
{
    [DllImport("kernel32")]
    static extern IntPtr OpenProcess(int dwDesiredAccess, bool bInheritHandle, int dwProcessId);

    [DllImport("kernel32")]
    static extern int GetCurrentProcessId();

    [DllImport("kernel32")]
    static extern bool CloseHandle(IntPtr hHandle);

    private readonly IntPtr hProcess;
    private bool disposed;

    public ProcessInfo()
    {
        const int PROCESS_VM_READ = 0x10;

        this.hProcess = OpenProcess(PROCESS_VM_READ, false, GetCurrentProcessId());
    }

    public void Dispose()
    {
        this.Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        // Dispose must be safe to call more than once.
        if (this.disposed)
        {
            return;
        }

        if (this.hProcess != IntPtr.Zero)
        {
            CloseHandle(this.hProcess);
        }

        this.disposed = true;
    }
}

It is not the best example, but let’s pretend this class has some useful methods and properties for the current process. Because the class implements the IDisposable interface, it is reasonable to expect something like the following code:

using (var procInfo = new ProcessInfo())
{
    // call some methods and properties
}

The code fragment above seems all good and fine. The tricky part is that the C# compiler will actually expand it to the following equivalent code:

ProcessInfo procInfo = new ProcessInfo();
try
{
    // call some methods and properties
}
finally
{
    if (procInfo != null)
    {
        procInfo.Dispose();
    }
}

Let’s focus on the constructor. The main purpose of a constructor is to construct an object. This means that the constructed object should be in a usable state. This concept is deeply embraced in the .NET base class library. Let’s see the following code fragment:

var fs = new FileStream("", FileMode.Open);

As you can guess, it will throw an ArgumentException. Let’s see another canonical example:

public class Person
{
    private const uint MIN_AGE = 1; // hypothetical minimum age, so the example compiles

    public Person(string name, uint age)
    {
        if (string.IsNullOrWhiteSpace(name))
        {
            throw new ArgumentException("invalid name", "name");
        }

        if (age < MIN_AGE)
        {
            throw new ArgumentException("invalid age", "age");
        }

        // store "name" and "age"
    }
    // Some useful methods and properties
}

Most of us have written such code. Often it is perfectly legal to throw an exception when we cannot construct a usable object. Back to our example:

using (var procInfo = new ProcessInfo())
{
    // call some methods and properties
}

If an exception is thrown in the constructor, then the procInfo variable won’t be assigned and therefore the Dispose() method won’t be called. Let’s modify the ProcessInfo constructor a bit:

public ProcessInfo()
{
    const int PROCESS_VM_READ = 0x10;

    this.hProcess = OpenProcess(PROCESS_VM_READ, false, GetCurrentProcessId());

    // something wrong
    throw new Exception();
}

Oops, we have a Win32 handle leak. One option is to use a try-catch clause inside the constructor and call the CloseHandle(…) method. Sometimes this is the best option.
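
A minimal sketch of that option could look like this (reusing the ProcessInfo members from above; the thrown Exception just simulates a failure in the constructor):

public ProcessInfo()
{
    const int PROCESS_VM_READ = 0x10;

    this.hProcess = OpenProcess(PROCESS_VM_READ, false, GetCurrentProcessId());

    try
    {
        // something wrong
        throw new Exception();
    }
    catch
    {
        // Close the native handle before the exception leaves the constructor.
        if (this.hProcess != IntPtr.Zero)
        {
            CloseHandle(this.hProcess);
        }

        throw;
    }
}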

Another option is to implement the Finalize pattern. It is easy, and I find it clearer:

~ProcessInfo()
{
    this.Dispose(false);
}

Now there are clear roles for the constructor and the Dispose/Finalize methods, and the Win32 handle leak is fixed.

Garbage collection – part 1 of N

Recently I have been dealing a lot with memory problems like leaks, stack/heap corruption, heap fragmentation, buffer overflows and the like. Surprisingly, these things happen in the .NET world, especially when one deals with COM/PInvoke interoperability.

The CLR comes with a garbage collector (GC), which is a great thing. The GC has been around for many years and we accept it as something granted and rarely think about it. This is a twofold thing. On one hand, it is proof that the GC does an excellent job most of the time. On the other hand, the GC can become a big issue when you want to get the maximum possible performance.

I think it would be nice to explain some of the GC details. I hope this series of posts can help you build more GC-friendly apps. Let’s start.

GC

I assume you know what a GC is, so I am not going to explain it; there are a lot of great materials on the internet on this topic. I am going to state only the single most important thing: the GC provides automatic dynamic memory management. As a consequence, the GC prevents the problems that were (and still are!) common in native/unmanaged applications:

  • dangling pointers
  • memory leaks
  • double free

Over the years, the improper use of dynamic memory allocation became a big problem. Nowadays many modern languages rely on a GC. Here is a short list:

ActionScript, AppleScript, C#, D, Eiffel, F#, Go, Haskell, Java, JavaScript, Lua, Objective-C, Perl, PHP, Python, Ruby, Scala, Smalltalk, VB.NET and Visual Basic.

I guess more than 75% of all developers are programming in these languages. It is also important to say that there are attempts to introduce a basic form of “automatic” memory management in C++ as well. Although auto_ptr, shared_ptr and unique_ptr have limitations, they are a step in the right direction.

You have probably heard that the GC is slow. I think there are two aspects of that statement:

  • for the most common LOB applications GC is not slower than manual memory management
  • for real-time applications GC is indeed slower than well crafted manual memory management

However, most of us are not working on real-time applications. Also, not everyone is capable of writing high-performance code; this is indeed hard to do. There is good news though. With the advance of research in GC theory, there are signs that GC will become even faster than the current state-of-the-art manual memory management. I am pretty sure that in the future no one will pay for real-time software with manual memory management; it will be too risky.

GC anatomy

Every GC is composed of the following two components:

  • mutator
  • collector

The mutator is responsible for the memory allocation. It is called so because it mutates the object graph during the program execution. For example, in the following pseudo-code:

string firstname = "Chris";
Person person = new Person();
person.Firstname = firstname;

the mutator is responsible for allocating the memory on the heap and updating the object graph by setting the field Firstname to reference the object firstname (we say that firstname is reachable from person through the field Firstname). It is important to say that these reference fields may be part of objects on the heap (as in our scenario with the person object), but they may also be contained in other entities known as roots. The roots may be thread stacks, static variables, GC handles and so on. As a result of the mutator’s work, any object can become unreachable from the roots (we say that such an object becomes garbage). This is where the second component comes in.

The collector is responsible for collecting all unreachable objects and reclaiming their memory.

Let’s have a look at the roots. They are called roots because they are accessible directly, that is, they are accessible to the mutator without going through other objects. We denote the set of all root objects as Roots.
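
To make roots concrete, here is a small illustrative C# fragment (the Person class here is just an assumption for the example):

class Person
{
    public string Firstname;
}

class RootsDemo
{
    static Person cached;                // a static variable is a root

    static void Main()
    {
        Person local = new Person();     // the local variable on the thread stack is a root
        cached = local;                  // the object is now reachable from two roots

        local = null;                    // still reachable through the static root
        cached = null;                   // unreachable from Roots: the object is garbage now
    }
}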

Now let’s look at the objects allocated on the heap. We can denote this set as Objects. Each object O can be distinguished by its address. For simplification, let’s assume that object fields can only be references to other objects. In reality, most object fields are primitive types (like bool, char, int, etc.), but these fields are not important for the object graph connectivity; it doesn’t matter if an int field has the value 5 or 10. So for now, let’s assume that objects have reference fields only. Let’s denote with |O| the number of reference fields of the object O, and with &O[i] the address of the i-th field of O. We write the usual pointer dereference for ptr as *ptr.

This notation allows us to define the set Pointers for an object O as

Pointers(O) = { a | a = &O[i], where 0 <= i < |O| }

For convenience we define Pointers(Roots) = Roots.
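
For illustration only, we can approximate Pointers(O) for a managed object with reflection; the Person type below is an assumption for the example:

using System;
using System.Reflection;

class Person
{
    public string Firstname;   // a reference field, part of Pointers(O)
    public Person Friend;      // a reference field, part of Pointers(O)
    public int Age;            // a primitive field, irrelevant for the object graph
}

class PointersDemo
{
    static void Main()
    {
        FieldInfo[] fields = typeof(Person).GetFields(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);

        foreach (FieldInfo f in fields)
        {
            if (!f.FieldType.IsValueType)   // keep the reference fields only
            {
                Console.WriteLine(f.Name);  // prints: Firstname, Friend
            }
        }
    }
}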

To recap – we have defined the following important sets:

  • Objects
  • Roots
  • Pointers(O)

After we defined some of the most important sets, we are going to define the following operations:

  • New()
  • Read(O, i)
  • Write(O, i, value)

New() operation obtains a new heap object from the allocator component. It simply returns the address of the allocated object. The pseudo-code for New() is:

New():
     return allocate()

It is important to say that the allocate() function allocates a contiguous block of memory. The reality is a bit more complex: we have different object types (e.g. Person, string, etc.), and usually New() takes parameters for the object type and, in some cases, for its size. It could also happen that there is not enough memory. We will revisit the New() definition later. For simplification, we can assume that we are going to allocate objects of one type only.

Read(O, i) operation returns the reference stored at the i-th field of the object O. The pseudo-code for Read(O, i) is:

Read(O, i):
     return O[i]

Write(O, i, value) operation updates the reference stored at the i-th field of the object O. The pseudo-code for Write(O, i, value) is:

Write(O, i, value):
     O[i] = value

Sometimes we have to explicitly say that an operation or function is atomic. When we need to, we write atomic in front of the operation name.

Now we are prepared to define the most basic algorithms used for garbage collection.

Mark-sweep algorithm

Earlier I wrote that New() definition is oversimplified. Let’s revisit its definition:

New():
    ref = allocate()
    if ref == null
        collect()
        ref = allocate()
        if ref == null
              error "out of memory"

    return ref

atomic collect():
     markFromRoots()
     sweep(HeapStart, HeapEnd)

The updated New() definition is a bit more robust. It first tries to allocate memory. If there is no big enough contiguous memory block, it collects the garbage (if there is any) and then tries to allocate again, which may fail or succeed. What is important about this definition is that it reveals when the GC is triggered. Again, the reality is more complex, but in general the GC is triggered when the program tries to allocate memory.
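
We can observe this behavior in .NET with a rough sketch (not a benchmark; the exact numbers vary by machine and runtime):

using System;

class AllocationTriggersGc
{
    static void Main()
    {
        int before = GC.CollectionCount(0);

        for (int i = 0; i < 1000000; i++)
        {
            var tmp = new byte[1024]; // allocate and immediately drop ~1 GB in total
        }

        int after = GC.CollectionCount(0);

        // No explicit GC.Collect() call, yet many gen0 collections happened,
        // all triggered by the allocations above.
        Console.WriteLine("Gen0 collections: " + (after - before));
    }
}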

Let’s define the missing markFromRoots and sweep functions.

markFromRoots():
     worklist = empty
     foreach field in Roots
          ref = *field
          if ref != null && !isMarked(ref)
                 setMarked(ref)
                 enqueue(worklist, ref)
                 mark()               // drain the worklist before taking the next root

mark():
     while !isEmpty(worklist)
          ref = dequeue(worklist)
          foreach field in Pointers(ref)
                child = *field
                if child != null && !isMarked(child)
                      setMarked(child)
                      enqueue(worklist, child)

sweep(start, end):
     scan = start
     while (scan < end)
           if isMarked(scan)
                 unsetMarked(scan)    // survivor: clear the mark for the next cycle
           else free(scan)            // unreachable: reclaim its memory
           scan = nextObject(scan)

The algorithm is straightforward and simple. It starts from Roots and marks each reachable object. Then it iterates over the whole heap, frees the memory of every unreachable object, and removes the mark from the surviving objects. It is important to say that this algorithm needs two passes: one over the live objects and one over the whole heap.
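
To translate the pseudo-code into something executable, here is a toy C# mark-sweep over an explicit object graph. A real collector works on raw memory, so this is only a sketch of the algorithm above:

using System;
using System.Collections.Generic;

class Node
{
    public string Name;
    public bool Marked;
    public List<Node> Fields = new List<Node>(); // plays the role of Pointers(O)
}

class MarkSweepDemo
{
    static readonly List<Node> Heap = new List<Node>();  // Objects
    static readonly List<Node> Roots = new List<Node>(); // Roots

    static Node New(string name)
    {
        var node = new Node { Name = name };
        Heap.Add(node); // allocate()
        return node;
    }

    static void MarkFromRoots()
    {
        var worklist = new Queue<Node>();
        foreach (Node r in Roots)
        {
            if (r != null && !r.Marked)
            {
                r.Marked = true;
                worklist.Enqueue(r);
                Mark(worklist); // drain the worklist before taking the next root
            }
        }
    }

    static void Mark(Queue<Node> worklist)
    {
        while (worklist.Count > 0)
        {
            Node parent = worklist.Dequeue();
            foreach (Node child in parent.Fields)
            {
                if (child != null && !child.Marked)
                {
                    child.Marked = true;
                    worklist.Enqueue(child);
                }
            }
        }
    }

    static void Sweep()
    {
        Heap.RemoveAll(node =>
        {
            if (node.Marked)
            {
                node.Marked = false;                     // unmark the survivor
                return false;
            }
            Console.WriteLine("collecting " + node.Name); // free(scan)
            return true;
        });
    }

    static void Main()
    {
        Node a = New("a"), b = New("b"), c = New("c");
        a.Fields.Add(b); // a -> b
        Roots.Add(a);    // only a is a root, so c is garbage

        MarkFromRoots();
        Sweep();         // prints: collecting c
    }
}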

The algorithm does not solve the problem of heap fragmentation, and this naive implementation doesn’t work well in real-world scenarios. In the next post of this series we will see how we can improve it. Stay tuned.