Java and V8 Interoperability – Never Ending Journey

The project I currently work on involves a lot of Java/JavaScript (V8 JavaScript engine) interoperability. Fortunately, Java provides JNI and V8 has a nice C++ API which make the integration process very smooth. Most of the Java-JNI-V8 type marshaling is quite straightforward but there is one exception.

The JNI uses modified UTF-8 strings to represent various string types. Modified UTF-8 strings are the same as those used by the Java VM. Modified UTF-8 strings are encoded so that character sequences that contain only non-null ASCII characters can be represented using only one byte per character, but all Unicode characters can be represented.

I was well aware of this fact since the beginning of the project but somehow I neglected it. Until recently, when one of my colleagues showed me a peculiar bug that turned out to be related to the process of marshaling a non-trivial Unicode string.

At first, I tried a few quick and dirty workarounds just to prove that the root of problem is more complex it seemed. Then I realized that jstring type is not the best type when it comes to string interoperability with V8 engine. I decided to use jbyteArray type instead of jstring though I had some concerns about the performance overhead.

private static native void doSomething(byte[] strData);

String s = "some string";
byte[] strData = s.getBytes("UTF-8");
doSomething(strData);

The code doesn’t look ugly though the string version looks better. I did microbenchmarks and it turned out the performance is good enough for my purposes. Nevertheless, I decided to compare the performance with Nashorn JavaScript engine. As expected, Nashorn implementation was faster because it uses the same internal string format as the JVM.