How Not To Retire Open-Source Projects

A couple of days ago I had a chat with a friend, an ex-Telerik employee, who was the driving force behind the JustDecompile product. We briefly discussed the recent Telerik announcement that the open-sourced JustDecompile has been retired.

This is the moment when you say, “Wait a moment! JustDecompile is not an open-source project.” This is correct. However, JustDecompileEngine, the decompilation engine behind JustDecompile, has been an open-source project since 2015. And if you try to access it on GitHub, you are greeted with a 404 error.

Can you see the problem? Let’s look at another retired Telerik product – JustAssembly.

Fortunately, there are a lot of forks that contain this particular JustDecompileEngine commit

I worked at Telerik for many years and I know in detail how the decisions for open-sourcing or discontinuing a project are made. Without revealing any details, let me assure you that a lot of people are involved in these processes and the decisions aren’t taken lightly. Counting sales and marketing people, product owners, team leads, support leads and VPs, it is easy to end up with 10-12 people involved.

It is easy to conclude that the people involved in this process don’t know what open-source software development is. This is somewhat typical for many corporations. It is funny. It would be even funnier if Telerik restored the JustDecompileEngine repo (wink-wink).

In case you decide to retire an open-source project I would suggest the following steps:

  1. Edit the main README.md file
    • briefly explain the reason for retiring the project
    • provide a list of alternative, preferably open-source, projects
  2. Disable CI builds, if any
  3. Consider changing the project license to an even more permissive one, if possible
  4. Archive the repo (make it read-only)

Anyway, JustDecompile is now CodemerxDecompile and JustDecompileEngine has a new home and is even better.

Playing with Z3 Theorem Prover

Once again it’s Christmas time which, to me, means time for leisure. While I am not an avid gamer, sometimes I play simple, non-engaging games on my phone. One of these games is Calculator: The Game. It is a very simple game: your goal is to calculate a number within a limited number of operations. Here is a screenshot.


For example, in this particular level your goal is to calculate -120 within 4 moves. Here is a brief description of the operations you can apply:

  • Button “x5” – multiply the current result by 5
  • Button “-6” – subtract 6 from the current result
  • Button “4” – append 4 to the current result

Let’s give these operations proper names: Mul5, Sub6 and Append4. The solution is to apply these operations in the following order:

  1. Append4 (end result: 4)
  2. Sub6 (end result: -2)
  3. Append4 (end result: -24)
  4. Mul5 (end result: -120, the level is completed)
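
Just to double-check the arithmetic, here is a quick sketch of the three operations in plain JavaScript (the lower-case helper names are mine, purely for illustration):

function mul5(x)    { return x * 5; }
function sub6(x)    { return x - 6; }
function append4(x) { return x < 0 ? x * 10 - 4 : x * 10 + 4; } // appending a digit to a negative number pushes it further from zero

console.log(mul5(append4(sub6(append4(0))))); // prints -120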

I love to play this kind of game for about 10-15 minutes. So yesterday, after I played for a bit, my mind was drifting and I was in the perfect mood for learning something new. Part of my mind was still busy with the game, so I took the opportunity to learn a bit about the Z3 Theorem Prover. Googling for “Z3 tutorial” and following the first search result landed me on the rise4fun Z3 page. It’s a wonderful online playground for Z3. I skimmed over the tutorial and felt confident enough that I got the basics, so I decided to give it a try.

First, we have to define the Mul5, Sub6 and Append4 operations.

(define-fun Mul5 ((x Int)) Int (* x 5))
(define-fun Sub6 ((x Int)) Int (- x 6))
(define-fun Append4 ((x Int)) Int
    (ite (< x 0) (- (* x 10) 4) (+ (* x 10) 4))
)

Then we have to model the gameplay. We have 4 moves (steps) and on each step we have to apply exactly one operation. We can model each step as follows:

c1*Mul5(x) + c2*Sub6(x) + c3*Append4(x)

The integer coefficients c1, c2 and c3 are constrained as follows:

0 <= c1 <= 1
0 <= c2 <= 1
0 <= c3 <= 1
c1 + c2 + c3 = 1

This guarantees us that exactly one operation will be applied on each step. Let’s code it for step 1.

(declare-fun s1Mul5 () Int)
(declare-fun s1Sub6 () Int)
(declare-fun s1Append4 () Int)
(assert (and (<= 0 s1Mul5) (<= s1Mul5 1)))
(assert (and (<= 0 s1Sub6) (<= s1Sub6 1)))
(assert (and (<= 0 s1Append4) (<= s1Append4 1)))
(assert (= 1 (+ s1Mul5 s1Sub6 s1Append4)))

(define-fun Step1 ((x Int)) Int
    (+ (* s1Mul5 (Mul5 x)) (* s1Sub6 (Sub6 x)) (* s1Append4 (Append4 x)))
)

The code for steps 2, 3 and 4 is similar to the code for step 1. Finally, the result from each step should be the input value for the next step.

(define-fun Level38 ((x Int)) Int
    (Step4 (Step3 (Step2 (Step1 x))))
)

I guess my game model is clumsy, but this is what I’ve got after 30 minutes of skimming over the tutorial and playing with the examples. Finally, we can test our model as follows:

(assert (= (Level38 0) -120))

(check-sat)
(get-model)

Here is the output.

sat
(model
  (define-fun s1Mul5 () Int
    0)
  (define-fun s1Sub6 () Int
    0)
  (define-fun s1Append4 () Int
    1)
  (define-fun s2Mul5 () Int
    0)
  (define-fun s2Sub6 () Int
    1)
  (define-fun s2Append4 () Int
    0)
  (define-fun s3Mul5 () Int
    0)
  (define-fun s3Sub6 () Int
    0)
  (define-fun s3Append4 () Int
    1)
  (define-fun s4Mul5 () Int
    1)
  (define-fun s4Sub6 () Int
    0)
  (define-fun s4Append4 () Int
    0)
)

So, Z3 calculated that the sequence s1Append4 → s2Sub6 → s3Append4 → s4Mul5 satisfies our model, which is indeed a solution for that game level. You can find the full code here and you can play with it on the rise4fun playground as well. Let’s try to find a solution for another end goal, say -130 instead of -120.

(assert (= (Level38 0) -130))

(check-sat)
(get-model)

Here is the output for the new solution.

sat
(model
  (define-fun s1Mul5 () Int
    0)
  (define-fun s1Sub6 () Int
    1)
  (define-fun s1Append4 () Int
    0)
  (define-fun s2Mul5 () Int
    0)
  (define-fun s2Sub6 () Int
    1)
  (define-fun s2Append4 () Int
    0)
  (define-fun s3Mul5 () Int
    0)
  (define-fun s3Sub6 () Int
    0)
  (define-fun s3Append4 () Int
    1)
  (define-fun s4Mul5 () Int
    0)
  (define-fun s4Sub6 () Int
    1)
  (define-fun s4Append4 () Int
    0)
)

Indeed, if you apply the steps s1Sub6 → s2Sub6 → s3Append4 → s4Sub6 (0 → -6 → -12 → -124 → -130), you will get the end result -130.

In closing, I would say that using Z3 seems quite intuitive and easy. Z3 provides bindings for C/C++, Java, .NET, Python and ML/OCaml, which makes it accessible from most popular programming languages.

Memory management in NativeScript for Android

Note: This post will be a bit different from the previous ones. It’s intended to provide a brief history of why the current NativeScript for Android implementation is designed the way it is. So, this post will be most useful for my Telerik ex-colleagues. Think of it as a kind of historical documentation. Also, it is a chance to have a peek inside a developer’s mind 😉

I already gave you a hint about my current affairs. In February I took the opportunity to pursue new ventures at a new company. The fact that my new office is in the building right next to Telerik HQ gives me the opportunity to keep close connections with my former colleagues. At one such coffee break I was asked about the current memory management implementation. As I am no longer with Telerik, my former colleagues are missing some important history that explains why this feature is implemented this way. I tried to briefly explain that particular technical issue in a previous post; however, I couldn’t go into much depth because NativeScript had not been announced yet. So, here I’ll try to provide more details.

Note: Keep in mind that this post is about NativeScript for Android, so I will focus only on that platform.

On the very first day of the project, we decided that we should explore what can be done with JavaScript-to-Java bidirectional marshalling. So, we set up a simple goal: make an app with a single button that increments a counter. Let’s see what the Android docs say about the button widget.

 public class MyActivity extends Activity {
     protected void onCreate(Bundle savedInstanceState) {
         super.onCreate(savedInstanceState);

         setContentView(R.layout.content_layout_id);

         final Button button = findViewById(R.id.button_id);
         button.setOnClickListener(new View.OnClickListener() {
             public void onClick(View v) {
                 // Code here executes on main thread after user presses button
             }
         });
     }
 }

After so many years, this is the first code fragment you see on the site. And it should be so. This code fragment captures the very essence of what the button widget is and how it is used. We wanted to provide JavaScript syntax which feels familiar to Java developers. So, we ended up with the following syntax:

var button = new android.widget.Button(context);
button.setOnClickListener(new android.view.View.OnClickListener({
   onClick: function() {
      // do some work
   }
}));

This example is shown countless times in NativeScript docs and various presentation slides/materials. It is part of our first and main test/demo app.

Motivation: we wanted to provide JavaScript syntax which is familiar to existing Android developers.

This decision brings an important implication, namely the usage of JavaScript closures. To understand why closures are important for the implementation, we could take a look at the following simple, but complete, Java example.

package com.example;

import android.app.Activity;
import android.os.Bundle;
import android.view.View;
import android.widget.Button;
import android.widget.LinearLayout;
import android.widget.TextView;

public class MyActivity extends Activity {
    private int count = 0;

    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        LinearLayout layout = new LinearLayout(this);
        layout.setFitsSystemWindows(false);
        layout.setOrientation(LinearLayout.VERTICAL);

        final TextView txt = new TextView(this);
        layout.addView(txt);

        Button btn = new Button(this);
        layout.addView(btn);
        btn.setText("Increment");
        btn.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View view) {
                txt.setText("Count:" + (++count));
            }
        });

        setContentView(layout);
    }
}

Behind the scenes, the Java compiler will generate an anonymous class that we can decompile and inspect closely. For the purpose of this post I am going to use the fernflower decompiler. Here is the output for the MyActivity$1 class.

package com.example;

import android.view.View;
import android.view.View.OnClickListener;
import android.widget.TextView;

class MyActivity$1 implements OnClickListener {
   // $FF: synthetic field
   final TextView val$txt;
   // $FF: synthetic field
   final MyActivity this$0;

   MyActivity$1(MyActivity this$0, TextView var2) {
      this.this$0 = this$0;
      this.val$txt = var2;
   }

   public void onClick(View view) {
      this.val$txt.setText("Count:" + MyActivity.access$004(this.this$0));
   }
}

We can see that the Java compiler generates code that:
1) captures the variable txt
2) deals with the ++count expression

This means that the click handler object holds references to the objects it accesses in its closure. We can call this class stateful, as it has class members. A fairly trivial observation.

Let’s take a look again at the previous JavaScript code.

var button = new android.widget.Button(context);
button.setOnClickListener(new android.view.View.OnClickListener({
   onClick: function() {
      // do some work
   }
}));

We access the button widget and call its setOnClickListener method with some argument. This means that we must have an instantiated Java object which implements OnClickListener so that the button can use it later. You can find the class implementation for that object in your project’s platform directory:

[proj_dir]/platforms/android/src/main/java/com/tns/gen/android/view/View_OnClickListener.java

Let’s see what the actual implementation is.

package com.tns.gen.android.view;

public class View_OnClickListener
       implements android.view.View.OnClickListener {
  public View_OnClickListener() {
    com.tns.Runtime.initInstance(this);
  }

  public void onClick(android.view.View param_0)  {
    java.lang.Object[] args = new java.lang.Object[1];
    args[0] = param_0;
    com.tns.Runtime.callJSMethod(this, "onClick", void.class, args);
  }
}

As we can see, this class acts as a proxy and doesn’t have fields. We can call this class stateless. We don’t store any information that we could use to describe its closure, if it has one.

So, we saw that the Java compiler generates classes that keep track of their closures, while NativeScript generates classes that don’t. This is a direct consequence of the fact that JavaScript is a dynamic language and the lexical scope information is not enough for full static analysis. The full information about JavaScript closures can be obtained at run time only.
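
Here is a contrived sketch of why this matters (it uses the same NativeScript syntax as above; the heavy flag is made up): whether the callback below keeps a Java object alive depends on a value known only at run time, so no static analysis of the callback source can tell the runtime what has to stay reachable.

function makeListener(heavy) {
    // whether 'obj' refers to a Java instance is known only at run time
    var obj = heavy ? new java.lang.Object() : null;
    return new android.view.View.OnClickListener({
        onClick: function (view) {
            console.log(obj === null ? "nothing captured" : obj.hashCode());
        }
    });
}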

The ovals diagram I used in my previous post visualizes the missing object reference to the closed-over object. So, now we have an understanding of what happens in the NativeScript runtime for Android. The current NativeScript, at the time of writing version 3.3, provides a mechanism to “compensate” for the missing object references. To put it simply, for each JavaScript closure accessible from Java we traverse all reachable Java objects in order to keep them alive until the closure becomes unreachable from Java. While we were able to describe the current solution in a single sentence, that doesn’t mean it has no drawbacks. This solution can be very slow if an object with a large hierarchy, like global, is reachable from some closure. If that is the case, the implication is that we will traverse the whole V8 heap on each GC.
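
To make the costly case concrete, here is a contrived example (button is the widget from the earlier snippet; appCache is a made-up name): the callback’s closure captures appCache, so on every V8 GC the runtime has to walk everything reachable from it while looking for Java objects to keep alive. The bigger the captured graph, the more expensive each GC becomes.

// a large, application-wide object graph
var appCache = { images: [], models: [], views: [] };

button.setOnClickListener(new android.view.View.OnClickListener({
    onClick: function () {
        // the closure captures 'appCache', so the runtime must traverse it on each GC
        console.log("cached images: " + appCache.images.length);
    }
}));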

Back in 2014, when we hit this issue for the first time, we discussed the option of customizing part of the V8 garbage collector in order to provide faster heap traversal. The drawback is a slower upgrade cycle for V8, which means that the JavaScriptCore engine would provide more features at any given point in time. For example, it is not easy to explain to developers why they can use class syntax on iOS but not on Android.

Motivation: we wanted to keep V8 customization to a minimum so we could achieve relative feature parity by upgrading the V8 engine as soon as possible.

So, now we know that traversing the V8 heap can be slow. What else? The current implementation is incomplete and driven case by case. This means that it is updated when important and common memory usage patterns show up. For example, currently we don’t traverse Map and Set objects.

Let’s see what can happen in practice. Create a default app.

tns create app1

Run the app and make sure it works as expected.

Now, we have to go through the process of designing a user scenario where the runtime will crash. We know that the current implementation doesn’t traverse Map and Set objects. So, we have to make a Java object which is reachable only through, let’s say, a Map object. This is only the first part of our exercise. We also must make sure it is reachable from a closure. Finally, we must give the GC a chance to collect it before we use it. So, let’s code it.

function crash() {
    var m = new Map();
    m.set('o', new java.lang.Object() /* via the map only */);
    var h = new android.os.Handler(android.os.Looper.getMainLooper());
    h.post(new java.lang.Runnable({
        run: function() {
            console.log(m.get('o').hashCode());
        }
    }));
}

That’s all. Finally, we have to integrate crash into our application. We can do so by modifying the onTap handler in [proj_dir]/app/main-view-model.js as follows:

viewModel.onTap = function() {
    crash();
    gc();
    java.lang.Runtime.getRuntime().gc();
    this.counter--;
    this.set("message", getMessage(this.counter));
}

Run the app and click the button. You should get an error screen similar to the following one.

Motivation: we wanted to evolve V8 heap traversing on case-by-case basis in order to traverse as little as possible.

Understanding this memory usage pattern (create an object, set up its reachability, GC, and then use it) is a simple but powerful tool. With the current implementation, the fix for Map and Set is similar to this one. Also, realizing that, in the current implementation, the missing references to the captured objects are the only reason for this error is critical for any further changes. This is well documented in the form of unit tests.
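
From the application side, until the runtime learns to traverse Map and Set, a workaround that follows directly from this reasoning is to keep a reference to the Java object somewhere the traversal can already see it, for example directly in the closure. Here is a sketch (my own variation of the crash example above, not an official recommendation):

function noCrash() {
    var m = new Map();
    var o = new java.lang.Object();
    m.set('o', o);
    var h = new android.os.Handler(android.os.Looper.getMainLooper());
    h.post(new java.lang.Runnable({
        run: function() {
            // referencing 'o' makes the Java instance reachable through the closure itself,
            // so the runtime keeps it alive even though the Map is not traversed
            console.log(o.hashCode());
        }
    }));
}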

So far we discussed the drawbacks of the current implementation. Let’s say a few words about its advantages. First, and foremost, it keeps the current memory management model familiar to existing Java and JavaScript developers. This is important in order to attract new developers. If two technologies, X and Y, solve similar problems and offer similar licenses, tools, etc., developers favor the one with the simpler “mental model”. While introducing an alloc/free or try/finally approach is powerful, it does not attract new developers because it raises the entry bar and is a less explicit approach. Another advantage, which is mostly for the platform developers, is the fact that the current approach aligns well with many optimizations that can be applied. For example, taking advantage of (or introducing) GC generations for the needs of the NativeScript runtime. Also, it allows per-application fine tuning of existing V8 flags (e.g. gc_interval, incremental_marking, minor_mc, etc.). Tweaking V8 flags won’t have a general impact when manual memory management is applied. In my opinion, tuning these flags is yet another way to help the regular Joe shoot himself in the foot, but providing sane defaults and applying adaptive schemes could very possibly be a huge win.

It is important to note that whatever approach is applied, it must be done carefully because of the risk of OOM exceptions. Introducing schemes like GC generations should take the objects’ memory weight into account. This would make the current approaches that use time and/or memory pressure heuristics obsolete. In general, such a GC generations approach will pay off well.

I hope I shed more light on this challenging problem. I am looking forward to seeing how the team is going to approach it. Good luck!

NativeScript release life cycle

I am glad to announce that yesterday we released NativeScript 2.4. This is all good, but in this post I would like to discuss the future. If you take a look at the next GitHub milestone you will see this:


So, why 2.5.0-RC?

There are many reasons for this. Firstly, it is hard to get a feature right the first time. And by “right” I don’t mean just correct, but also right from a technical perspective. It is not easy to put into words, but every developer knows the difference between a solution and the right solution. Secondly, our requirements are often incomplete and we need feedback as soon as possible in order to ship the right feature. Even with complete requirements, development would take longer, thus delaying the user feedback. Following the analogy with a minimum viable product (MVP), you can think of a minimum viable technical implementation that (almost) meets the current requirements. As with many other release approaches it is a trade-off. Shipping an RC is a sweet spot, as we can offer reasonable product quality in a timely manner. Each RC will be followed shortly by an official release. So far, our release history shows that a week or two is enough to fix all corner-case issues and apply other minor fixes.

Of course, there are drawbacks. Probably the biggest one is the increased operational cost of actually shipping an RC, or the required changes in the test infrastructure, for example. I think this is a good incentive to automate the existing processes and infrastructure even more, so it will be a win-win situation. Stay tuned.

Lessons Learned From OSS

It’s been more than 3 years since I started working on NativeScript. Working on an OSS project is different from anything I had done before, so I decided to share some of my experience. Here are some of the lessons I learned working on OSS and, as usually happens, all of these lessons were discovered by many other people before me.

Make an economic difference

This is a common characteristic of almost every successful OSS project. Take a look at Linux, Ruby on Rails, MySQL or Node.js for example. All these projects made an impact on our society for good reasons. They may change existing markets, create new jobs or simply make us more productive.

Technology that matters is technology that drives some economic difference.

Whatever OSS project you work on, make sure it’s worth it.

Engage the community

Doing an OSS project in isolation limits your chances of building a product that matters. Many OSS projects go in the wrong direction without user feedback. OSS projects may fail for a lot of reasons: they do the wrong thing, solve the wrong problem or solve the right problem in a non-user-friendly way. These issues can be prevented if a user community is in place. Building software for users other than yourself is challenging. Even if you have a clear vision of what the project should be, it is a good idea to discuss it in public. Leaving “gaps” in your project gives a chance for user contributions. Here is a quote from Christopher Alexander on this topic:

Master plans have two additional unhealthy characteristics. To begin with, the existence of a master plan alienates the users…. After all, the very existence of a master plan means, by definition, that the members of the community can have little impact on the future shape of their community, because most of the important decisions have already been made. In a sense, under a master plan people are living with a frozen future, able to affect only relatively trivial details. When people lose the sense of responsibility for the environment they live in, and realize that they are merely cogs in someone else’s machine, how can they feel any sense of identification with the community, or any sense of purpose there?

Second, neither the users nor the key decision makers can visualize the actual implications of the master plan.

Here is another observation, Conway’s law, which in its simplified form states:

organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations

In short, no matter how smart you are, your solutions will be constrained by your understanding of the problem. Engaging more people with a problem gives you more perspectives on it, and hence better chances to find a broader solution.

Find your pace

I love the following comment on the Linux 3.0 kernel because it captures an important aspect of OSS projects:

It’s hard to make revolutionary changes, especially when you have the open-source community making so many contributions. Just like a democratic political process, the changes are going to be incremental, but user driven. That’s why free software is so good.

Striving for innovation may cause as much damage as good, no matter how good your intentions are. Often the push for novelty is far less productive than what can be achieved by broadening participation. Achieving mass adoption through an incremental approach and solving today’s problems often gives the most sustainable path for growth.

Learn from the others

Finally, I would like to comment on some of Eric S. Raymond’s lessons for creating good OSS. I am going to comment only on those that are confirmed by my own experience.

  • Every good work of software starts by scratching a developer’s personal itch.
    I couldn’t agree more. Starting a project because you have a problem is one thing; starting a project because of the me-too effect is another story. Be careful if you start a project just to chase the market.
  • Good programmers know what to write. Great ones know what to rewrite (and reuse).
    Building your project around an existing one may give you a tremendous boost. You can leverage the existing community and shape your project quickly and efficiently.
  • If you treat your beta-testers as if they’re your most valuable resource, they will respond by becoming your most valuable resource.
    While this is common sense, I think it is one of the most underestimated things. Early adopters can give you valuable feedback and can become evangelists for your project.
  • The next best thing to having good ideas is recognizing good ideas from your users. Sometimes the latter is better.
    I already touched on this topic with Conway’s law. The more people are generating ideas, the better the chances for your project to succeed.
  • Often, the most striking and innovative solutions come from realizing that your concept of the problem was wrong.
    We learn from our mistakes, but often it is not that easy to see them. Having a healthy OSS community greatly improves the chances that someone will point out the mistake or review your code.
  • Any tool should be useful in the expected way, but a truly great tool lends itself to uses you never expected.
    I already touched on this topic with Christopher Alexander’s quote. When designing a feature, you can rarely predict all the use cases. Engaging the OSS community can provide you with more perspectives on the problem so you can design better features.
  • Provided the development coordinator has a communications medium at least as good as the Internet, and knows how to lead without coercion, many heads are inevitably better than one.
    It just says it all.

How Android Instant Run Works

Today I installed Android Studio 2.0 Beta 7 and I really enjoyed one of its new features, namely Instant Run. We built a similar feature for NativeScript, named LiveSync, so I was curious how the Instant Run feature works.

The first thing I noticed is instant-run.jar placed in the <my project>/build/intermediates/incremental-runtime-classes/debug directory. This library provides the Server class in the com.android.tools.fd.runtime package, where the most interesting things are. This class is a simple wrapper around android.net.LocalServerSocket, which opens a Unix domain socket named after the application package. And indeed, during a code update you can see messages like the following in the logcat window.

com.example.testapp I/InstantRun: Received connection from IDE: spawning connection thread

In short, when you change your code the IDE generates a new *.dex file which is uploaded into the files/instant-run/dex-temp directory, which is private to your application package. It then communicates with the local server, which uses the com.android.tools.fd.runtime.Restarter class. The Restarter class is an interesting one. It uses a technique very similar to the one we use in the NativeScript Companion App to either restart the app or recreate the activities. The latter is a nice feature which the current {N} companion app doesn’t support. I find it a bit risky, but it will probably work for most scenarios. I guess we could consider implementing something similar for {N}.

So far we know the basics of the Instant Run feature. Let’s see what these *.dex files are and how they are used. For the purpose of this article I am going to pick a scenario where I change code inside the Application’s onCreate method. Note that in this scenario the Instant Run feature won’t work since this method is called only once. I picked this scenario just to show that this feature has some limitations. Nevertheless, the generated code shows clearly how this feature is designed and how it should work in general. Take a look at the following implementation.

package com.example.testapp;
public class MyApplication extends Application {
    private static MyApplication app;
    private String msg;
    public MyApplication() {
        app = this;
    }
    @Override
    public void onCreate() {
        super.onCreate();
        int pid = android.os.Process.myPid();
        msg = "Hello World! PID=" + pid;
    }
    public static String getMessage() {
        return app.msg;
    }
}

Let’s change msg as follows

msg = "Hello New World! PID=" + pid;

Now if I click the Instant Run button, the IDE will generate a new classes.dex file inside the <my project>/build/intermediates/reload-dex/debug directory. This file contains the MyApplication$override class, where we can see the following code.

public static void onCreate(MyApplication $this) {
  Object[] arrayOfObject = new Object[0];
  MyApplication.access$super($this, "onCreate.()V", arrayOfObject);
  AndroidInstantRuntime.setStaticPrivateField($this, MyApplication.class, "app");
  int pid = Process.myPid();
  AndroidInstantRuntime.setPrivateField($this, "Hello New World! PID=" + pid, MyApplication.class, "msg");
}

By now it should be easy to guess how the original onCreate method is rewritten. The rewritten MyApplication.class file is located in <my project>/build/intermediates/transforms/instantRun/debug/folders/1/5/main/com/example/testapp folder.

public void onCreate() {
  IncrementalChange localIncrementalChange = $change;
  if (localIncrementalChange != null) {
    localIncrementalChange.access$dispatch("onCreate.()V", new Object[] { this });
    return;
  }
  super.onCreate();
    
  app = this;
  int pid = Process.myPid();
  this.msg = ("Hello World! PID=" + pid);
}

As you can see, there is nothing special going on. During compilation the new gradle-core-2.0.0-beta7.jar library uses classes like com.android.build.gradle.internal.incremental.IncrementalChangeVisitor to instrument the compiled code so that it can support the Instant Run feature.

I hope this post sheds some light on how the Android Instant Run feature works.

Thirty Years After

I found this wonderful Turbo Pascal ad while browsing InfoWorld 3 Jun 1985 issue.

With more than 250,000 users worldwide Turbo Pascal is the industry’s de facto standard. Turbo Pascal is praised by more engineers, hobbyists, students and professional programmers than any other development environment in the history of microcomputing. And yet, Turbo Pascal is simple and fun to use!

Wow! I wonder what will happen to today’s programming languages after 30 years.

Software Performance Engineering

Every time we build software we have functional requirements. Maybe these functional requirements are not well defined, but they are good enough to start prototyping, and you refine them as you go. After all, the functional requirements describe what you have to build. Sometimes, you have additional requirements that describe how your software system should operate, rather than how it should behave. We call them non-functional requirements. For example, if you build a web site, a non-functional requirement could define the maximum page response time. In this post I am not going to write about the SPE approach, but rather about the more general term of software performance engineering.

Context

Anytime a new technology emerges, people want to know how fast it is. We seem obsessed with this question. Once we understand the basic scenarios where a new tech is applicable, we start asking questions about its performance. Unfortunately, talking about performance is not easy because of all the misunderstanding around it.

Not quick. Efficient.

Let’s set up some context. With any technology we are trying to solve some problem, not everything. In some sense, any technology is bound to its time. In general, nowadays we solve different problems than 10 years ago. This is especially obvious in the IT industry, as it changes so fast. Thus, when we talk about a given technology, it is important to understand where it comes from and what problems it tries to solve. Obviously, a technology cannot solve all current problems and there is no best technology for everything and everybody.

We should understand another important aspect as well. Usually when a new technology emerges it has some compatibility with an older one. We don’t want people to relearn how to do trivial things. Often this can have a big impact on performance.

Having set up the context, it should be clear that interpreting performance results is also time-bound. What we considered fast 5 years ago may not be fast any longer. Now, let’s try to describe informally what performance is and how we try to build performant software. Performance is a general term describing various system aspects such as operation completion time, resource utilization, throughput, etc. While it is a quantitative discipline, it does not by itself define any criterion for what good or bad performance is. For the sake of this post, this definition is good enough to understand why performance is important in, say, algorithmic trading. In general, there is a clear connection between performance and cost.

IT Industry-Education Gap

Performance issues are rarely captured by a single value (e.g. CPU utilization) and thus they are difficult to understand. These problems become even more difficult in distributed and parallel systems. Despite the fact that performance is an important and difficult problem, most universities and other educational facilities fail to prepare their students to avoid and efficiently solve performance issues. The IT industry has recognized this fact and there are companies like Pluralsight, Udacity and Coursera that offer additional training on this topic.

In the rare cases where students are taught how to localize and solve performance issues, they use outdated textbooks from the ’80s. Current education cannot produce well-trained candidates, mostly because the teachers have outdated knowledge. On the other hand, many (online) education companies offer highly specialized performance courses in, say, web development, C++, Java or .NET, which cannot help students understand performance issues in depth.

Sometimes academia tries to help the IT industry by providing facilities like cache-oblivious algorithms or queueing network (QN) models, but abstracting away the real hardware often produces suboptimal solutions.

Engaging students in real-life projects can prepare them much better. It doesn’t matter whether it is an open-source project or a collaboration with the industry. At present, students just don’t have the chance to work on a big project and thus miss the opportunity to learn. Not surprisingly the best resources on solving performance issues are various blogs and case studies from big companies like Google, Microsoft, Intel and Twitter.

Performance Engineering

Often software engineers have to rewrite code or change the system architecture because of performance problems. To mitigate such expensive changes, many software engineers try to employ various tools and practices. Usually, these practices can be formalized as an iterative process which is part of the development process itself. A common simplified overview of such an iterative process might look as follows:

  • identify critical use cases
  • select a use case (by priority)
  • set performance goals
  • build/adjust performance model
  • implement the model
  • gather performance data (exercise the system)
  • report results

Different performance frameworks/approaches emphasize different stages and/or techniques for modelling. However, what they all have in common is that it is an iterative process. We set performance goals and perform quantitative measurements. We repeat the process until we meet the performance goals.

Most performance engineering practices heavily rely on tools and automation. Usually, they are part of various CI and testbed builds. This definitely streamlines the process and helps the software engineers. Still, there is a big caveat. Building a good performance model is not an easy task. You must have a very good understanding of the hardware and the whole system setup. The usual way to overcome this problem is to provide small, composable model templates for common tasks so the developer can build larger and more complex models.

In closing, I would say that there isn’t a silver bullet when it comes to solving performance issues. The main reason solving performance issues is difficult is that it requires a lot of knowledge and expertise in both the software and hardware areas. There is a lot of room for improvement in both education and industry.

What’s New in Chakra JavaScript Engine

A few weeks ago I decided to install the Windows 10 Mobile Insider Preview on my Nokia Lumia 630 and played a little bit with it. Since then, I had completely forgotten about it until yesterday, when I saw a notification for a pending software update (10.0.12562.84). So I grabbed the opportunity to see what has changed in the Chakra JavaScript engine and the JsRT API.

Overview

I am going to highlight only some of the architectural changes in Chakra; for more information you can read the MSDN documentation. Firstly, Microsoft decided to create a new chakra.dll library and keep the old jscript9.dll library for compatibility reasons. This is a good decision because it allows shorter release cycles and provides some space for experimentation as well. Secondly, it seems that Microsoft is all into performance optimizations right now. Some of the most important optimizations are:

  • concurrent JIT compiler
  • new simple JIT compiler (when bailout happens)
  • improved polymorphic inline cache
  • equivalent object type specialization
  • bounds checking elimination (array optimization)
  • minified code optimization (sounds interesting and very promising)
  • concurrent mark-and-sweep GC (mark phase)

Lastly, Microsoft decided to provide better support for the upcoming ECMAScript 6, which is a big win for everybody.

JsRT

This is where it becomes interesting. As I work on the NativeScript project, I would like to access WinRT APIs from JavaScript. In fact, Microsoft already supports this scenario in WinJS, but I am interested in accessing all WinRT APIs and being able to build XAML-based UI from JavaScript. Last September I blogged about how to embed Chakra in Windows Phone 8.1, but back then this scenario was practically not supported by Microsoft. There wasn’t even a jscript9.lib import library for ARM.

I am happy to say that those days are gone. Now, JsRT provides better support for WinRT projections. This is done through the following APIs:

  • JsProjectWinRTNamespace
  • JsInspectableToObject
  • JsObjectToInspectable

Let’s see how this works (I assume you have already installed Windows 10 Technical Preview and Visual Studio 2015 RC). Create a new WinRT library project (Visual C++ -> Windows -> Windows Universal -> Windows Runtime Component). In my case I named it WindowsRuntimeComponent1 and created a simple Greeter class as follows.

namespace WindowsRuntimeComponent1
{
    public ref class Greeter sealed
    {
    public:
        Platform::String^ SayHello()
        {
            return ref new Platform::String(L"Hello");
        }
    };
}

Create an empty app (Visual C++ -> Windows -> Windows Universal -> Blank App) and add a reference to the WindowsRuntimeComponent1 project. You have to define the macro USE_EDGEMODE_JSRT in order to use the new JsRT API, and link against chakrart.lib as well. Projecting WinRT classes is as easy as follows.

JsErrorCode err = JsProjectWinRTNamespace(L"WindowsRuntimeComponent1");
assert(JsNoError == err);

Now we are ready to consume the projected WinRT classes from JavaScript.

var g = new WindowsRuntimeComponent1.Greeter();
var s = g.sayHello();

I have to say that the debugging experience is almost perfect. I say “almost” only because I don’t see script debugging for ARM devices. I guess since this is Visual Studio 2015 RC, it is kind of expected. Also, you can always use the script debugger on the Windows Phone emulator since it runs x86 code.

You can find the sample project at GitHub.

Conclusion

Using the new JsRT together with the Windows 10 Universal Application Platform (UAP) makes it easy to write apps that use JavaScript scripting. The good thing is that UAP guarantees that your apps will work across all kinds of devices. There are some important limitations though:

  • cannot use XAML types (I guess it is still related to WebHostHidden attribute)
  • cannot extend types from JavaScript (again related to XAML)
  • cannot access Chakra in WinJS apps from WinRT components

I guess if you don’t want to build JavaScript/native bridges then the new JsRT is good enough. Resolving the above-mentioned issues would allow writing much more sophisticated apps though. Right now, you can use JsRT for simple scripting and nothing else. Making the Chakra engine an open-source project would solve these and other issues. It would allow people to contribute to and customize the engine. Will it ever happen? Only time will tell.

NativeScript Performance – Part 2

For the last two weeks I have been busy measuring and optimizing the performance of NativeScript for Android. My main focus was the application startup time and I would like to share some good news.

Results

Let’s first see the results and then I will dig into the details. As in the previous tests, I used the same test devices:

  • Device1 – Nexus 5, Android 4.4.1, build KOT49E
  • Device2 – Nexus 6, Android 5.0.1, build LRX22C

I used the same application as well. Here are the results:

  • For Device1 the first startup time was reduced from average 3.1419 seconds to average 2.8262 seconds (10% improvement) [*]
  • For Device2 the first startup time was reduced from average 3.541 seconds to average 3.3147 seconds (6% improvement) [*]

Details

Before I dig into the details, I would like to give you a quick reminder of how I measured the times. As in the previous tests, I used the built-in time/perf info that the Android ActivityManager provides. It is not the best measuring tool, but it is good enough for our purposes.

After detailed profiling with the DDMS and NDK profilers I identified two areas for improvement:

  • asset extraction
  • proxy property access

Assets

The old implementation of asset extraction was based on AssetManager. While its API is very convenient, it is not well suited for optimal memory allocation. As a result, using AssetManager along with the java.io.* classes generates a lot of temporary objects, which triggers the GC quite often. The solution we chose is to use the libzip C++ library. It is fast and, more importantly, it doesn’t mess with the GC.

For applications with a size similar to the test app, using libzip doesn’t help much. The actual improvement is around 30-40 milliseconds. However, for big apps (e.g. 500+ files) libzip really shines. You can easily get an improvement of 300-500ms, and in some scenarios more than a second. This was a good reason to reimplement the Java code in C++ and give NativeScript the ability to scale really well.

Java Object Wrappers

Proxies are an experimental ECMAScript 6 feature. In V8 (and, for that matter, in any other JavaScript engine), direct property access is much faster than access through a proxy. This is easy to understand when you think about how the JIT compiler emits the code to access traditional properties. Also, while proxies are good for scripting simple object access, they don’t scale to more complex scenarios. Over time it becomes harder to implement the correct dispatch logic.

I am glad to say that we now use plain JavaScript objects to wrap Java objects. We also build the correct prototype chain to map the Java class hierarchy. This gives us an excellent opportunity to cache runtime objects at a more granular level. And as we are going to see, caching changes everything.
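
Conceptually, the wrappers look roughly like this (a simplified sketch, not the actual runtime code; callJavaMethod is a made-up stand-in for the native bridge): each Java class gets a plain JavaScript constructor whose prototype mirrors the Java hierarchy, so method lookups are ordinary prototype-chain property reads that V8 can optimize and we can cache.

function JavaObjectWrapper(javaObjectId) { this.__id = javaObjectId; }
JavaObjectWrapper.prototype.hashCode = function () {
    return callJavaMethod(this.__id, "hashCode");          // hypothetical bridge call
};

// java.util.Date extends java.lang.Object, so its wrapper mirrors that relationship
function JavaDateWrapper(javaObjectId) { JavaObjectWrapper.call(this, javaObjectId); }
JavaDateWrapper.prototype = Object.create(JavaObjectWrapper.prototype);
JavaDateWrapper.prototype.compareTo = function (other) {
    return callJavaMethod(this.__id, "compareTo", other);  // hypothetical bridge call
};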

While using libzip helped a little bit, it is easy to do the math and see that using prototype chains is the main factor for the improved startup time.

Let’s see how the new caches impact other scenarios. Take a look at the following code fragment.

var JavaDate = java.util.Date;
var start = new Date();
for (var i=0; i<10000; i++) {
    var d1 = new JavaDate();
    var d2 = new JavaDate();
    d1.compareTo(d2);
    d2.compareTo(d1);
}
var end = new Date();
console.log("time=" + (end.getTime() - start.getTime()));

This is not a real-world scenario. I wrote this code solely for test purposes. My intent here is to exercise some Java-intensive code. Also, note that using JavaScript’s Date.getTime is not the best way to measure time, but as we are going to see it is good enough for our purposes.

Here are the results.

  • On Device1 – using proxy objects it takes more than 12.5 seconds, using prototype chain it takes less than 2.6 seconds
  • On Device2 – using proxy objects it takes more than 11.6 seconds, using prototype chain it takes less than 2.2 seconds

In my opinion, there is no need for any further or more precise benchmarks. Simply put, using prototype chains along with proper caching is much faster than proxy objects.

Further Improvements

So far, we saw that the first startup of a simple application like CutenessIO takes around 3 seconds. Can we make it faster?

First, we have to set some reasonable expectations. Let’s see how fast HelloWorld applications written in Java and NativeScript start up. For the Java version I used the standard Eclipse project template (which is very similar to the one in Android Studio). I stripped out things like menus and fancy themes. My main goal was to make it as simple as possible (which is not much different from the standard empty project). I did the same for the NativeScript project.

Here are the results.

  • On Device1 – Java 200 milliseconds[*], NativeScript 641.5 milliseconds[*]
  • On Device2 – Java 333.5 milliseconds[*], NativeScript 875.3 milliseconds[*]

So, we have to investigate where the difference comes from. For the purpose of this article, I am going to pick Device1 (the analysis for Device2 is the same).

Let’s analyze a particular run.

  • Time for loading libNativeScript library: 7ms
  • Time for extracting assets: 30ms
  • Time for V8 initialization: 150ms
  • Time for calling Application.onCreate in JavaScript: 60ms
  • Time for calling Activity.onCreate in JavaScript: 100ms
  • Time from Application object initialization to Activity initialization: 510ms
  • Time to display main activity: 658ms

As we can see, the total time for asset extraction and V8 initialization is 180ms, which is roughly the time needed for a pure Java application to start. For now, it seems unlikely that we can reduce this time.

The total time spent running JavaScript is 160ms. This is a bit surprising. I would love to see the time spent in V8 be, say, 400ms, because this would mean that running JavaScript is 78% (400/510) of all the time. A high percentage of time spent inside V8 is a good thing because it would give us an opportunity to optimize the performance. However, this would not be the case for most applications. We can think of NativeScript as a way to command the Java world from JavaScript. Hence, most of the work is done in Java. That’s the nature of NativeScript.

So, we spent 160ms running a few lines of JavaScript. Can we do better? A careful analysis showed that most of this time is spent in JNI infrastructure calls and data marshalling. It seems hard to reduce, but not impossible. A possible option is to tweak the V8 engine and/or use libffi to generate thunks.

Another 200ms is spent in some run-once plumbing code. With a little effort, we could refactor the runtime to support components/modules and gain some performance. Finally, some time is spent inside the Java GC.

In closing, I would say that currently NativeScript for Android is performing well. There are no major performance issues. The current implementation is approaching the point where no big performance wins can be easily achieved. But easy is not interesting 😉 Stay tuned.