Running JavaScriptCore on Windows Phone 8.1

After the first release of NativeScript I decided to spend some time playing with JavaScriptCore engine. We use it in NativeScript bridge for iOS and so far I heard good words about it from my colleagues. So I decided to play with JavaScriptCore and compare it to V8 engine.

At present NativeScript supports Android and iOS platforms only. We have plans to add support for Windows Phone as well and I thought it would be nice to have some experience with JavaScriptCore on Windows Phone before we make a choice between Chakra and JavaScriptCore.

The first difference I noticed between JavaScriptCore and V8 is that it has an API much closer to C style while the V8 API is entirely written in C++. This is not an issue and sometimes I consider it as an advantage.

The second difference, in my opinion, is that JavaScriptCore API is more simpler and expose almost no extension points. From this point of view I consider the V8 API the better one. Compared to JavaScriptCore, V8 provides much richer API for controlling internals of the engine like JIT compilation, object heap management and garbage collection.

Nevertheless it was fun to play with JavaScriptCore engine. You can find a sample project at GitHub.

Synchronizing GC in Java and V8

In the last post I wrote that I work on a project that involves a lot of interoperability between Java and V8 JavaScript engine. Here is an interesting problem I was investigating the last couple of days.

Both V8 and JVM use garbage collector for memory management. While using GC provides a lot of benefits sometimes having two garbage collectors in a single process can be tricky though. Suppose we have a super-charged version of LiveConnect where we have access to the full Java API.

var file = new java.io.File("readme.txt");

console.log("length=" + file.length());

These two lines of JavaScript may seem quite simple at first glance. We create an instance of java.io.File and call one of its methods. The tricky part is that we are doing this from JavaScript and we must take care that the actual Java instance would not be GC’ed before we call length method. In other words, we should provide some form of memory management. Suppose we decide to use JNI global references and we call NewGlobalRef every time when we create a new Java object from JavaScript. Accordingly we call DeleteGlobalRef when V8 makes Java object unreachable from JavaScript.

Let’s see a more complicated scenario.

var outStream = new java.io.FileOutputStream("log.txt");

var eventCallback = new com.example.EventCallback({
    onDataReceived: function(data) {
       outStream.write(data);
    }
});

var listener = new com.example.EventListener(eventCallback);

In this case we create an instance of com.example.EventCallback and provide its implementation in JavaScript. Now suppose that all these three JavaScript objects become unreachable and V8 is ready to GC them. Just because all of these objects are unreachable in JavaScript it does not mean that their actual counterparts in Java are unreachable. It’s possible that listener and eventCallback objects are still reachable through a stack of a listener Java thread.

gcchain

Now comes the interesting detail. While in JavaScript eventCallback has a reference to outStream through the function onDataReceived there is no such reference in Java and it is legitimate for Java GC to collect outStream object. The next time when the callback object calls write method there won’t be a corresponding Java object and the application will fail.

There are several solutions to this problem. One of them is to maintain the reachability in Java GC heap graph in sync with the one in JavaScript. After all, if there is an edge connecting eventCallback and outStream Java GC won’t try to collect the latter.

There are two options:

  • sync Java heap graph automatically
  • sync Java heap graph manually

As usual there is a trade-off. While the first option is very desirable there is a price to pay. We should analyze every closure in V8 that is GC’ed and traverse all objects reachable from there. This could slow down the GC by orders of magnitude.

The second option also has drawbacks. In general, JavaScript developers are not used to manual memory management. Introducing new memory management API could cause a lot of discomfort to the less experienced JavaScript developers.

scope(eventCallback, outStream);

Event if we make the API nice and simple, there is a burden of the mental model that JavaScript developers have to maintain. I tend to prefer this option though because many C/C++ developers proved it is possible to build high quality software using manual memory management.

In closing I would say that there are other solutions to this problem. I’ll discuss them in another blog post.

Java and V8 Interoperability

The project I currently work on involves a lot of Java/JavaScript (V8 JavaScript engine) interoperability. Fortunately, Java provides JNI and V8 has a nice C++ API which make the integration process very smooth. Most of the Java-JNI-V8 type marshaling is quite straightforward but there is one exception.

The JNI uses modified UTF-8 strings to represent various string types. Modified UTF-8 strings are the same as those used by the Java VM. Modified UTF-8 strings are encoded so that character sequences that contain only non-null ASCII characters can be represented using only one byte per character, but all Unicode characters can be represented.

I was well aware of this fact since the beginning of the project but somehow I neglected it. Until recently, when one of my colleagues showed me a peculiar bug that turned out to be related to the process of marshaling a non-trivial Unicode string.

At first, I tried a few quick and dirty workarounds just to prove that the root of problem is more complex it seemed. Then I realized that jstring type is not the best type when it comes to string interoperability with V8 engine. I decided to use jbyteArray type instead of jstring though I had some concerns about the performance overhead.

private static native void doSomething(byte[] strData);

String s = "some string";
byte[] strData = s.getBytes("UTF-8");
doSomething(strData);

The code doesn’t look ugly though the string version looks better. I did microbenchmarks and it turned out the performance is good enough for my purposes. Nevertheless, I decided to compare the performance with Nashorn JavaScript engine. As expected, Nashorn implementation was faster because it uses the same internal string format as the JVM.

On Agile Practices

I have recently read the article What Agile Teams Think of Agile Principles from Laurie Williams and it got me thinking. The study conclusion is as follows:

The authors of the Agile Manifesto and the original 12 principles spelled out the essence of the agile trend that has transformed the software industry over more than a dozen years. That is, they nailed it.

Here are the top 10 agile practices from the case study.

Agile practice Mean Standard Deviation
Continuous integration 4.5 0.8
Short iterations (30 days of less) 4.5 0.8
“Done” criteria 4.5 0.8
Automated tests run with each build 4.4 0.9
Automated unit testing 4.4 0.9
Iterations review/demos 4.3 0.8
“Potentially shippable” features at the end of each iteration 4.3 0.9
“Whole” multidisciplinary team with one goal 4.3 0.8
Synchronous communication 4.4 0.9
Embracing changing requirements 4.3 0.8

These are indeed practices instead of exact science and I am going to elaborate more on this topic. But first I would like to recap a few things from the history of the software industry.

Making successful software is hard. Many software projects failed in the past and many software projects are failing now. There are a lots of studies that confirm it. Some studies claim that more than 50% of all software projects fail. In order to improve the rate of successful projects we tried to adopt know-how from other industries. The software industry adopted metaphors like building software and software engineering. We started to apply waterfall methodologies and rigorous scientific methods for defining software requirements like UML. We tried many things in order to do better software but not much changed.

Then people came up with the idea of agile software methodology. Agile manifesto states:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

Agile methodology proposes different mindset. We started put emphasis on things like creativity and self-organizing teams more than engineering. Nowadays, we use metaphors like writing software much more often than 15 years ago. Some people go further by comparing programmers with writers and consequently in order to do good software we need good writers instead of good engineers. While I find such claims a bit controversial they are many people who share similar opinions. In general, today we talk about software craftsmanship instead of software engineering.

These two approaches are not mutually exclusive. I see good trends of merging both of them whenever it is reasonable. It is natural for people to select the best from both worlds. Still, agile methodology is considered young. Most of the software companies still publish their job offerings as “Software Engineer Wanted” instead of “Software Craftsman Wanted“. This is only one example of what we have inherited in IT industry. It does not matter how much an IT company boasts how agile it is, the fact is that we need time to fully adopt the new mindset. The good thing is that the new mindset focuses on the individual and I think this is the key for better software.

ECMAScript 262 And Browser Compatibility

I was curious to check how well the current browsers implement ECMAScript 262 specification. At the time of writing I have the following browsers installed on my machine:

  • Google Chrome (34.0.1847.116 m)
  • Microsoft IE (11.0.9600.17031 / 11.0.7)
  • Mozilla Firefox (28.0)

I guess Chrome, IE and Firefox present at least 90% of all desktop browsers. I tested them against the test suite provided by ECMA (http://test262.ecmascript.org) and it turned out none of them passed all the tests. Here is the list of failing tests for each browser.

I know that some of the smartest guys in IT work on these projects for many years. So, the next time I hear that JavaScript is a simple scripting language I won’t consider it seriously.

JustMock Lite is now open source

A few weeks ago something important happened. I am very happy to say that JustMock Lite is now open source. You can find it on GitHub.

In this post I would like to share some bits of history. JustMock was my first project at Telerik. It was created by a team of two. The managed API was designed and implemented by my friend and colleague Mehfuz and I implemented the unmanaged CLR profiler. The project was done in a very short time. I spent 6 weeks to implement all the functionality for so called elevated mocking. This includes mocking static, non-virtual methods and sealed types. After a few iterations JustMock was released early in April 2010.

I remember my very first day at Telerik. I had a meeting with Hristo Kosev and together we set the project goals. It turned out JustMock was just an appetizer for JustTrace. Back then we did not have much experience with the CLR unmanaged profiling API and Hristo wanted to extend Telerik product family with a performance and memory profiling tool. So, the plans were to start with JustMock and gain know-how before we build JustTrace. Step by step, we extended the team and JustMock/JustTrace team was created. Here is the door sign that the team used to have.

jmjt

Later the team changed its name to MATTeam (mocking and tracing team).

Looking back, I think we built two really good products. As far as I know, at the time of writing this post JustMock is still the only tool that can mock of the most types from mscorlib.dll assembly. JustTrace also has its merits. It was the first .NET profiler with support for profiling managed Windows Store apps. I left MATTeam an year ago and I hope soon I can tell you about what I work on. Stay tuned.

Native code profiling with JustTrace

The latest JustTrace version (Q1 2014) has some neat features. It is now possible to profile unmanaged applications with JustTrace. In this post I am going to show you how easy it is to profile native applications with JustTrace.

For the sake of simplicity I am going to profile notepad.exe editor as it is available on every Windows machine. First, we need to setup the symbol path folder so that JustTrace can decode correctly the native call stacks. This folder is the place where all required *.pdb files should be.

jtsettings

In most scenarios, we want to profile the code we wrote from within Visual Studio. If your build generates *.pdb files then it is not required to setup the symbols folder. However, in order to analyze the call stacks collected from notepad.exe we must download the debug symbols from Microsoft Symbol Server. The easiest way to obtain the debug symbol files is to use symchk.exe which comes with Microsoft Debugging Tools for Windows. Here is how we can download notepad.pdb file.

symchk.exe c:\Windows\System32\notepad.exe /s SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

[Note that in order to decode full call stacks you may need to download *.pdb files for other dynamic libraries such as user32.dll and kernelbase.dll for example. With symchk.exe you can download debug symbol files for more than one module at once. For more details you can check Using SymChk page.]

Now we are ready to profile notepad.exe editor. Navigate to New Profiling Session->Native Executable menu, enter the path to notepad.exe and click Run button. Once notepad.exe is started, open some large file and use the timeline UI control to select the time interval of interest.

jtnative

In closing, I would say that JustTrace has become a versatile profiling tool which is not constrained to the .NET world anymore. There are plenty of unmanaged applications written in C or C++ and JustTrace can help to improve their performance. You should give it a try.

Notes on Asynchronous I/O in .NET

Yesterday I worked on a pet project and I needed to read some large files in an asynchronous manner. The last time I had to solve similar problem was in the times of .NET v2.0 so I was familiar with FileStream constructors that have bool isAsync parameter and BeginRead/EndRead methods. This time, however, I decided to use the newer Task based API.

After some time working I noticed that there was a lot of repetition and my code was quite verbose. I googled for an asynchronous I/O library and I picked some popular one. Indeed the library hid the unwanted verbosity and the code became nice and tidy. After I finished the feature I was working on, I decided to run some performance tests. Oops, the performance was not good. It seemed like the bottleneck was in the file I/O. I started JustDecompile and quickly found out that the library was using FileStream.ReadAsync method. So far, so good.

Without much thinking I ran my app under WinDbg and set breakpoint at kernel32!ReadFile function. Once the breakpoint was hit I examined the stack:

0:007> ddp esp
0577f074  720fcf8b c6d04d8b
0577f078  000001fc
0577f07c  03e85328 05040302
0577f080  00100000
0577f084  0577f0f8 00000000
0577f088  00000000

Hmm, a few wrong things here. The breakpoint is hit on thread #7 and the OVERLAPPED argument is NULL. It seems like ReadAsync is executed in a new thread and the read operation is synchronous. After some poking with JustDecompile I found the reason. The FileStream object was created via FileStream(string path, FileMode mode) constructor which sets useAsync to false.

I created a small isolated project to test further ReadAsync behavior. I used a constructor that explicitly sets useAsync to true. I set the breakpoint and examined the stack:

0:000> ddp esp
00ffed54  726c0e24 c6d44d8b
00ffed58  000001f4
00ffed5c  03da5328 84838281
00ffed60  00100000
00ffed64  00000000
00ffed68  02e01e34 00000000
00ffed6c  e1648b9e

This time the read operation is started on the main thread and an OVERLAPPED argument is passed to the ReadFile function.

0:000> dd 02e01e34 
02e01e34  00000000 00000000 04c912f4 00000000
02e01e44  00000000 00000000 72158e40 02da30fc
02e01e54  02da318c 00000000 00000000 00000000
0:000> ? 04c912f4 
Evaluate expression: 80286452 = 04c912f4

A double check with SysInternals’ Process Monitor confirms it.

readmonitor

I emailed the author of the library and he was kind enough to response immediately. At first, he pointed me to the following MSDN page that demonstrates “correct” FileStream usage but after a short discussion he realized the unexpected behavior.

badasync

I don’t think this is a correct pattern and I quickly found at least two other MSDN resources that use explicit useAsync argument for the FileStream constructor:

In closing, I would say that simply using ReadAsync API doesn’t guarantee that the actual read operation would be executed in an asynchronous manner. You should be careful which FileStream constructor you use. Otherwise you could end up with a new thread that executes the I/O operation synchronously.

501 Must-Write Programs

In the spirit of 501 Must-Visit… book series I decided to write this post. My ambitions are much smaller so don’t expect a comprehensive “guide” to computer programming. Nevertheless, I just think it would be useful to show you short programs that demonstrate important computer ideas and techniques.

Fortunately, as software developers, we don’t need to know 501 things in order to accomplish our daily tasks. So my intention is write a list of programs with brief explanation for each one. I will extend the list whenever I find programs good enough to represent important computer techniques. At first sight, the programs in the list may seem random or unusual but they all demonstrate important aspects of commonly used concepts in computer programming. Here goes the first ten in no particular order:

  1. Nth Fibonacci number  – I know, I know… it is too simple. However, I think we often can learn from simple things. Let’s see what there is for us to learn. Fibonacci numbers are one of the canonical examples for recursion. They can be used to explain time complexity in a very easy manner as well. It’s also good to mention Binet’s formula and memoization technique.
  2. Conway’s Game of Life – I include this program in the list just because it is beautiful. It is an excellent example of how simple rules can lead to complex interaction. Cellular automation is used in computability theory, mathematics, physics, complexity science and theoretical biology just to mention a few. To the curious readers, I recommend to read about Wolfram code as well.
  3. Quine (self-replicating program) – It is really fun to write a quine in your favorite (Turing complete) programming language. While writing a quine program is fun and good for your creativity, it is worthy to note its deep connection with mathematical logic and fixed point in mathematics.
  4. Currency converter – While this program may not seem fun or a programming challenge, it’s a handy tool and it will make you familiar with units of measure concept which is important in numeric calculation.
  5. String searching – I guess this is one of most common tasks in our work. Every now and then we have to search for a given substring. While the naive implementation is easy to write, you will quickly find out that it is impractical for large strings due its time and space complexity. Then you will enter the wonderful world of Knuth–Morris–Pratt and Rabin–Karp string search algorithms. There are tons of scientific literature on the topic and most of it comes from bioinformatics and computational mathematics.
  6. Huffman coding – It is, by far, the easiest introduction into the information theory and lossless data compression. While nowadays there are much better data compression algorithms, the idea of Shannon entropy is still used.
  7. Line counting – Well, you may find this is too lame. Counting the lines of a text file is not a big deal until the file is small and can fit into the main memory. Then you quickly come with the idea of data buffer. While buffering is not a rocket science it is one of most practical and useful things in most code bases.
  8. Calculate definite integral – Yes, you read it right. At first you may wonder how calculus is related to computer programming. I chose this relatively simple program because it demonstrates important ideas used in numerical analysis such as approximation and sampling. Both ideas are very important in computer programming because everything in computer science is discrete in a broader sense. If your programming languages allows passing functions as parameters then you can write this program in functional programming style.
  9. Graph traversal – Graphs are one of the most commonly used data structures. While both depth-first search and breadth-first search algorithms are easy to understand, together they represent a duality which is common to find in the computer science.
  10. Tic-tac-toe game – Writing this game is one of the easiest introduction to game theory. The simplicity of the game allows you to comprehend hard ideas like NP-complete and NP-hard problems, backtracking algorithms, alpha-beta pruning and minmax rule.