NativeScript Performance – Part 2

I spent the last two weeks measuring and optimizing the performance of NativeScript for Android. My main focus was application startup time, and I would like to share some good news.

Results

Let’s first see the results and then I will dig into the details. As in the previous tests, I used the same test devices:

  • Device1 – Nexus 5, Android 4.4.1, build KOT49E
  • Device2 – Nexus 6, Android 5.0.1, build LRX22C

I used the same application as well. Here are the results:

  • For Device1 the first startup time was reduced from an average of 3.1419 seconds to an average of 2.8262 seconds (10% improvement) [*]
  • For Device2 the first startup time was reduced from an average of 3.541 seconds to an average of 3.3147 seconds (6% improvement) [*]
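
The percentages above can be reproduced with a few lines of JavaScript. This is only a sketch of the arithmetic; the helper name improvementPercent is made up for illustration and is not part of NativeScript.

```javascript
// Relative improvement in percent: how much shorter the new time is
// compared with the old one. (Hypothetical helper, for illustration only.)
function improvementPercent(oldSeconds, newSeconds) {
    return (oldSeconds - newSeconds) / oldSeconds * 100;
}

// Average first startup times reported above, in seconds.
var device1 = improvementPercent(3.1419, 2.8262);
var device2 = improvementPercent(3.541, 3.3147);

console.log("Device1: " + device1.toFixed(1) + "%"); // roughly 10%
console.log("Device2: " + device2.toFixed(1) + "%"); // roughly 6%
```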

Details

Before I dig into the details, I would like to give you a quick reminder of how I measured the times. As in the previous tests, I used the built-in timing information that the Android ActivityManager provides. It is not the best measuring tool, but it is good enough for our purposes.

After detailed profiling with DDMS and the NDK profilers, I identified two areas for improvement:

  • asset extraction
  • proxy property access

Assets

The old implementation for asset extraction was based on AssetManager. While its API is very convenient, it is not well suited for optimal memory allocation. As a result, using AssetManager along with java.io.* classes generates a lot of temporary objects, which triggers the GC quite often. The solution we chose is to use the native libzip library. It is fast and, more importantly, it doesn’t mess with the GC.

For applications of a size similar to the test app, using libzip doesn’t help much. The actual improvement is around 30-40 milliseconds. However, for big apps (e.g. 500+ files) libzip really shines. You can easily get an improvement of 300-500ms, and in some scenarios more than a second. This was a good reason to reimplement the Java code in C++ and give NativeScript the ability to scale really well.

Java Object Wrappers

Proxies are an experimental ECMAScript 6 feature. In V8 (and, as a matter of fact, in any other JavaScript engine), direct property access is much faster than access through a proxy. This is easy to understand when you think about how the JIT compiler emits the code to access traditional properties. Also, while proxies are good for scripting simple object access, they don’t scale in more complex scenarios. Over time it becomes harder to implement the correct dispatch logic.

I am glad to say that we now use plain JavaScript objects to wrap Java objects. We also build the correct prototype chain to mirror the Java class hierarchy. This gives us an excellent opportunity to cache runtime objects at a more granular level. And as we are going to see, caching changes everything.

While using libzip helped a little bit, it is easy to do the math and see that using prototype chains is the main factor for the improved startup time.
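
To make the difference between the two wrapping strategies more concrete, here is a minimal sketch that runs in plain JavaScript. It is not the NativeScript runtime code: classInfo, callJava, wrapWithProxy, getPrototype and wrapWithPrototype are all hypothetical names, and the "Java call" is only simulated by recording the call in an array.

```javascript
// Hypothetical stand-in for Java class metadata. A real runtime would get
// this information from the Java side, not from a JavaScript table.
var classInfo = {
    "java.lang.Object": { parent: null, methods: ["hashCode", "equals"] },
    "java.util.Date": { parent: "java.lang.Object", methods: ["compareTo", "getTime"] }
};

var calls = []; // Records every simulated call into the "Java world".

// Hypothetical stand-in for a call that crosses into Java.
function callJava(className, method) {
    calls.push(className + "." + method);
    return calls.length;
}

// Approach 1: a proxy resolves every property access inside a trap.
function wrapWithProxy(className) {
    return new Proxy({}, {
        get: function (target, name) {
            return function () { return callJava(className, String(name)); };
        }
    });
}

// Approach 2: build one prototype object per Java class, link it to the
// prototype of the parent class, and cache it so it is built only once.
var prototypeCache = {};

function getPrototype(className) {
    if (prototypeCache[className]) {
        return prototypeCache[className];
    }
    var info = classInfo[className];
    var parentProto = info.parent ? getPrototype(info.parent) : Object.prototype;
    var proto = Object.create(parentProto);
    info.methods.forEach(function (method) {
        proto[method] = function () { return callJava(className, method); };
    });
    prototypeCache[className] = proto;
    return proto;
}

function wrapWithPrototype(className) {
    return Object.create(getPrototype(className));
}

var viaProxy = wrapWithProxy("java.util.Date");
var viaPrototype = wrapWithPrototype("java.util.Date");
viaProxy.compareTo();      // Resolved by the proxy trap on every access.
viaPrototype.compareTo();  // Plain property lookup on the Date prototype.
viaPrototype.hashCode();   // Found one level up, on the Object prototype.
```

In this sketch every wrapper created for the same class shares a single cached prototype object, so the per-class setup cost is paid once and later lookups are ordinary property accesses that the engine can optimize.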

Let’s see how the new caches impact other scenarios. Take a look at the following code fragment.

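var JavaDate = java.util.Date;
// Time 10,000 iterations; each one creates two Java objects
// and makes two Java method calls.
var start = new Date();
for (var i = 0; i < 10000; i++) {
    var d1 = new JavaDate();
    var d2 = new JavaDate();
    d1.compareTo(d2);
    d2.compareTo(d1);
}
var end = new Date();
// Elapsed time in milliseconds.
console.log("time=" + (end.getTime() - start.getTime()));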

This is not a real-world scenario. I wrote this code solely for testing purposes. My intent here is to exercise some Java-intensive code. Also, note that using JavaScript Date.getTime is not the best way to measure time, but as we are going to see, it is good enough for our purposes.

Here are the results.

  • On Device1 – using proxy objects it takes more than 12.5 seconds, using prototype chain it takes less than 2.6 seconds
  • On Device2 – using proxy objects it takes more than 11.6 seconds, using prototype chain it takes less than 2.2 seconds

In my opinion, there is no need for any further or more precise benchmarks. Simply put, using prototype chains along with proper caching is much faster than proxy objects.

Further Improvements

So far, we saw that the first startup of a simple application like CutenessIO takes around 3 seconds. Can we make it faster?

First, we have to set some reasonable expectations. Let’s see how fast HelloWorld applications written in Java and NativeScript start up. For the Java version I used the standard Eclipse project template (which is very similar to the one in Android Studio). I stripped out things like menus and fancy themes. My main goal was to make it as simple as possible (which is not much different from the standard empty project). I did the same for the NativeScript project.

Here are the results.

  • On Device1 – Java 200 milliseconds[*], NativeScript 641.5 milliseconds[*]
  • On Device2 – Java 333.5 milliseconds[*], NativeScript 875.3 milliseconds[*]

So, we have to investigate where the difference comes from. For the purpose of this article, I am going to pick Device1 (the analysis for Device2 is the same).

Let’s analyze a particular run.

  • Time for loading libNativeScript library: 7ms
  • Time for extracting assets: 30ms
  • Time for V8 initialization: 150ms
  • Time for calling Application.onCreate in JavaScript: 60ms
  • Time for calling Activity.onCreate in JavaScript: 100ms
  • Time from Application object initialization to Activity initialization: 510ms
  • Time to display main activity: 658ms

As we can see, the total time of asset extraction and V8 initialization is 180ms, which is roughly the time needed for a pure Java application to start. So far, it seems unlikely that we can reduce this time.

The total time spent running JavaScript is 160ms. This is a bit surprising. I would love to see the time spent in V8 be, say, 400ms, because this would mean that running JavaScript is 78% (400/510) of all time. A high percentage of time spent inside V8 is a good thing because it would give us an opportunity to optimize the performance. However, this would not be the case for most applications. We can think of NativeScript as a way to command the Java world from JavaScript. Hence, most of the work is done in Java. That’s the nature of NativeScript.
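
The sums and ratios used in this analysis can be checked with a short script. This is just the arithmetic from the list above written out; the variable names are made up for illustration, and the shares are taken relative to the 510ms between Application and Activity initialization, as in the text.

```javascript
// Timings for the analyzed Device1 run, in milliseconds (from the list above).
var run = {
    loadLibrary: 7,
    extractAssets: 30,
    initV8: 150,
    applicationOnCreate: 60,
    activityOnCreate: 100,
    applicationToActivity: 510
};

// Asset extraction plus V8 initialization.
var nativeSetup = run.extractAssets + run.initV8;                // 180
// Time spent in the two JavaScript callbacks.
var javaScript = run.applicationOnCreate + run.activityOnCreate; // 160
// Share of the Application-to-Activity time spent running JavaScript.
var javaScriptShare = javaScript / run.applicationToActivity * 100;
// The "what if" case from the text: 400ms spent in V8.
var hypotheticalShare = 400 / run.applicationToActivity * 100;

console.log("assets + V8 init: " + nativeSetup + "ms");
console.log("JavaScript: " + javaScript + "ms (" + javaScriptShare.toFixed(0) + "%)");
console.log("hypothetical 400ms: " + hypotheticalShare.toFixed(0) + "%");
```

With the measured numbers, JavaScript accounts for a little under a third of the 510ms, which is why most of the remaining time has to be looked for outside V8.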

So, we spent 160ms running a few lines of JavaScript. Can we do better? A careful analysis showed that most of this time is spent in JNI infrastructure calls and data marshalling. It seems hard to reduce, but not impossible. A possible option is to tweak the V8 engine and/or use libffi to generate thunks.

Another 200ms is spent in some run-once plumbing code. With a little effort, we could refactor the runtime to support components/modules and gain some performance. Finally, some time is spent inside the Java GC.

In closing, I would say that currently NativeScript for Android is performing well. There are no major performance issues. The current implementation is approaching the point where no big performance wins can be easily achieved. But easy is not interesting 😉 Stay tuned.

4 replies on “NativeScript Performance – Part 2”

  1. Congratulations for the awesome work. My question is: so during this initialization time what does the user see? Is it possible to show a splash screen with a custom image?

    1. By default Android shows a blank (white) activity screen until the main activity is loaded. Yes, it is possible to show a splash screen. The “awkward” thing here is that you have to implement it in native (Java) code. I think we can add support for this scenario in the NativeScript runtime. It should be similar to what WPF does.

  2. > As we can see, the total time of asset extraction and V8 initialization is 180ms which is roughly the time needed for pure Java application to start.

    Probably you can extract assets and initialize V8 in other threads.

    1. Extracting assets on another thread is an option, though I doubt it will help much; after all the time needed for this step is 30ms only. Unfortunately, initializing V8 on another thread is not possible because every JavaScript VM is tied to a particular thread. This means that if you want to process, say, a button click then you have to initialize the JavaScript VM on the UI thread. Well, technically it is possible but the synchronization mechanisms are always expensive. Doing a workload in a batch may help though.
