Distributed System Validation

Last week I read an article about Netflix Chaos Monkey and it provoked some thoughts. I’ve read about Chaos Monkey before, but back then I didn’t appreciate some aspects of it. I guess the difference is that I recently started working on a distributed system project, and an important part of my work is validating what we are building.

So, how do we validate a distributed system? There is no need to reinvent the wheel, so I googled around. That’s how I ended up reading the Netflix article again. The idea is simple: terminate running “instances” and then watch how your distributed system behaves. There are more details to it, but the essence is that Chaos Monkey stops some of your instances/services (and makes them unavailable) so you can inspect how well your distributed system works in such scenarios. Note that this approach works so well that Chaos Monkey runs on both test and production systems, and it is considered a key component of any major distributed system.
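
To make the idea concrete, here is a minimal sketch of such a “chaos” routine in C#. The ICloudInstance interface and its Terminate method are hypothetical stand-ins, not Chaos Monkey’s actual API:

using System;
using System.Collections.Generic;

// Hypothetical types: ICloudInstance and Terminate() are illustrative
// stand-ins, not Chaos Monkey's real API.
public interface ICloudInstance
{
    string Id { get; }
    void Terminate();
}

public static class ChaosRunner
{
    private static readonly Random Rng = new Random();

    // Pick a random running instance and terminate it, then watch
    // how the rest of the system copes with the failure.
    public static void KillRandomInstance(IList<ICloudInstance> instances)
    {
        if (instances.Count == 0)
            return;

        var victim = instances[Rng.Next(instances.Count)];
        Console.WriteLine("Terminating instance " + victim.Id);
        victim.Terminate();
    }
}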

Let’s dig in further. Distributed systems, in fact all large projects, are built from many moving parts. It is often practically impossible to anticipate every problem that may occur in your system. Instead of waiting for problems, you can trigger them in a controlled manner and get feedback. This makes your system more resilient.

This gets me thinking about the way we build large projects, and distributed systems in particular. Nowadays agile methodologies are widely accepted and we build software in small iterations. The continuous project evolution makes correct reasoning hard, and it is practically impossible to validate an ever-changing system. Often we build a distributed system much like the way we play with Lego: we pick this web server, that database server, that network load balancer and so on, glue them together and add our services on top. During this process we rarely consider each component’s specifics and limitations. This development process leads to heisenbugs and other hard-to-reproduce issues.

Solutions like Chaos Monkey make this easier, as they provide a process that helps you validate your systems. Incorporating such a process/methodology into your software development process gives you better monitoring and better guarantees for a more resilient system.

Profiling Data Visualization

Every .NET performance profiling tool offers some form of data visualization. Usually, the profiling data is shown in a hierarchical representation such as a calling context tree (CCT) or a calling context ring chart (CCRC). In this post I would like to give a short description of the most commonly used profiling data visualizations.

In general, the CCT is well understood. Software developers find the CCT easy to work with, as it represents the program workflow. For example, if method A() calls method B(), which in turn calls method C(), then the CCT will represent this program workflow as follows:

A() -> B() -> C()

CCT data also contains the time spent inside each method (not shown here for the sake of simplicity). Many .NET profilers use a CCT or a CCRC to visualize their data.
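
To have something concrete to refer to, here is a minimal sketch of what a CCT node could look like in C#; the type and member names are illustrative and not taken from any particular profiler:

using System.Collections.Generic;

// Illustrative CCT node; real profilers store richer data than this.
class CctNode
{
    public string MethodName;
    public double ExclusiveTimeMs;   // time spent in this method only
    public double InclusiveTimeMs;   // this method plus all its callees
    public List<CctNode> Callees = new List<CctNode>();
}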

While the CCT is a useful and easy-to-understand data visualization, it has its limitations. We often create big applications with complex program workflows, and for such applications navigating the CCT becomes harder. Often the CCT size becomes overwhelming and developers cannot grasp the data. To make a big CCT understandable, profiling tools offer some form of aggregation. The most common aggregation is the so-called Hot Spot tree (HST). Sometimes it is called a caller context tree, but for the purpose of this post we will use the former name. Here is the HST for our previous example:

C() <- B() <- A()

We said that the HST is a form of aggregation, but we didn’t explain what we aggregate and how. An HST aggregates CCT nodes by summing the time spent inside a method for each unique call path. Let’s make this more concrete with a simple example. Suppose we have an application with the following program workflow (CCT):

A() -> B() -> C(/* 4s */)
           |
           |--> D() -> C(/* 6s */)

The time spent in method C() is 4 seconds when it is called from B(), and 6 seconds when it is called from D(). So, the total time spent in method C() is 10 seconds. We can build the HST for method C() by aggregating the time for each unique call path.

C(/* 10s */) <- B(/* 4s */) <- A(/* 4s */)
              |
              |-- D(/* 6s */) <- B(/* 6s */) <- A(/* 6s */)

The HST shows how the time spent in method C() is distributed across each unique call path. If we build an HST for every method, it becomes obvious why the HST is such a useful data visualization. Instead of focusing on the whole CCT, which may contain millions of nodes, we can focus on the top, say, 10 most expensive HSTs, as they show us the 10 most time-expensive methods. In fact, I find HSTs so useful that I would argue showing the CCT is not needed at all when solving difficult performance issues.
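
To illustrate the aggregation itself, here is a rough sketch (built on the illustrative CctNode type sketched earlier, so not any profiler’s real code) of how the hot spot data for a single method could be computed: walk the CCT, and for every node representing that method add its time to the bucket of its reversed call path.

using System.Collections.Generic;
using System.Linq;

static class HotSpotBuilder
{
    // Sum the exclusive time of every CCT node for "method",
    // keyed by its reversed (callee-to-caller) call path.
    public static Dictionary<string, double> BuildHotSpots(CctNode root, string method)
    {
        var timePerCallPath = new Dictionary<string, double>();
        Walk(root, new List<string>(), method, timePerCallPath);
        return timePerCallPath;
    }

    private static void Walk(CctNode node, List<string> path, string method,
                             Dictionary<string, double> result)
    {
        path.Add(node.MethodName);

        if (node.MethodName == method)
        {
            // Reverse the path so it reads "C() <- B() <- A()".
            var key = string.Join(" <- ", Enumerable.Reverse(path));
            double total;
            result.TryGetValue(key, out total);
            result[key] = total + node.ExclusiveTimeMs;
        }

        foreach (var callee in node.Callees)
            Walk(callee, path, method, result);

        path.RemoveAt(path.Count - 1);
    }
}

Running this over the example CCT above would yield two buckets for C(): “C() <- B() <- A()” with 4 seconds and “C() <- D() <- B() <- A()” with 6 seconds, which together give the 10 seconds shown in the HST.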

I would like to address the last sentence, as it is related to the DIKW pyramid. While the CCT is a useful profiling data visualization, it is mostly about data. Data is just numbers/symbols; it cannot answer the “who”, “what”, “where” and “when” questions. Processing a CCT into HSTs transforms data into information: HSTs can answer where time is spent inside an application. I am not going to address all the theoretical details here, but I would like to dig further into some details of performance profiling.

We saw why HSTs are useful, but sometimes we want to know more. For example, is our application CPU or I/O bound? Or maybe we are interested in the application dynamics (e.g. when it is CPU bound and when it is I/O bound). Component interaction is also an important question for many of us. The vendors of profiling tools recognize these needs and try to build better products. For example, Microsoft provides Tier Interaction Profiling, Telerik JustTrace provides Namespace grouping, JetBrains dotTrace provides Subsystems, SpeedTrace offers Layer Breakdown and so on. While all these visualizations are useful, sometimes a simple diagram works even better.
(Figure: Layers)
The point is that there is no silver bullet; a single profiling data visualization cannot answer every question. I think ideas like a Profiling Query Language (PQL) have a lot of potential. It doesn’t matter whether it ends up being PQL or LINQ over some well-established domain model (e.g. LINQ to Profiling Data); the language is only a detail. The important thing is that the collected data should be queryable. Once the data is queryable, the developer can write the right queries. Of course, each profiling tool can ship with a set of predefined common queries. I hope we will see PQL in action very soon 😉
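
To illustrate what I mean by queryable profiling data, here is a hypothetical LINQ sketch; ProfileSample, Method, DurationMs and IsIoBound are made-up names, and samples stands for whatever flat collection of samples a profiler might expose:

// Hypothetical: "samples" is an IEnumerable<ProfileSample> exposed by a profiler;
// ProfileSample, Method, DurationMs and IsIoBound are illustrative names only.
// Requires: using System.Linq;
var topIoMethods =
    (from s in samples
     where s.IsIoBound
     group s by s.Method into g
     orderby g.Sum(x => x.DurationMs) descending
     select new { Method = g.Key, TotalMs = g.Sum(x => x.DurationMs) })
    .Take(10);

A query like this could be one of the predefined reports, say a “top 10 I/O-bound methods” view.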

 

CLR Limitations

Yesterday I ran one of my apps on a VirtualBox VM and it suddenly crashed with an OutOfMemoryException. There is nothing special about my app: it allocates one large array of System.UInt64 and does some calculations. So, it seemed that my problem was related to the array size.

Array size limitations

Here are a few facts about my app:

  • it allocates a single ~6GB array and does memory-intensive calculations (a rough sketch of the allocation follows this list)
  • it uses the new .NET 4.5 configuration gcAllowVeryLargeObjects setting
  • it is compiled with “Any CPU” platform target (“Prefer 32-bit” option is not set)
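
For reference, the allocation in question boils down to something like this (the element count below is illustrative, not the app’s exact number):

// Roughly what the app does: one large array of System.UInt64.
// 800,000,000 elements * 8 bytes ≈ 6.4GB.
ulong[] data = new ulong[800000000];
// ... memory-intensive calculations over "data" ...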

My first thought was that the guest OS does not support 64-bit programs and that I should compile my app with the “x64” platform target to make this requirement explicit. It turned out that this was not the case: the guest OS is Windows 7 Ultimate x64 edition. This is where I got confused. I decided to run my app on the host OS, and it ran as expected.

Let’s recap. My host OS is Windows 7 Ultimate x64 Edition (same as the guest OS) and my app works there. On the guest OS my app crashes. The .NET version is 4.0.30319.18051 on both the host OS and the guest OS. The only difference is that the host OS has 16GB of physical memory while the guest OS has 4GB. However, my understanding was that the amount of physical memory should not cause an OutOfMemoryException.

The first thing I did was to read the MSDN documentation one more time. There isn’t much related to my issue; the only relevant part is the following:

Using this element in your application configuration file enables arrays that are larger than 2 GB in size, but does not change other limits on object size or array size:

  • The maximum number of elements in an array is UInt32.MaxValue.
  • The maximum index in any single dimension is 2,147,483,591 (0x7FFFFFC7) for byte arrays and arrays of single-byte structures, and 2,146,435,071 (0X7FEFFFFF) for other types.
  • The maximum size for strings and other non-array objects is unchanged.

I decided to create a small repro app that isolates the problem:

static void Main(string[] args)
{
    var arr = new object[0X7FEFFFFF];
    Console.WriteLine(arr.LongLength);
    Console.ReadLine();
}

(I also modified the app.config file accordingly.)

<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>

When I run this program on the host OS it works as expected. When I run the same binary on the VM I get an OutOfMemoryException. I googled and found a Stack Overflow comment from Sep 7, 2010 that pretty much confirms my understanding stated above. Still, the reality is that this simple app crashes on the guest OS. Clearly, there is an undocumented (please correct me if I am wrong) CLR limitation.

As I said before, the only difference between the host OS and the guest OS is the amount of physical memory. So, I decided to increase the guest OS memory. I had no clear idea what I was doing, so I set the memory to 5000MB (just some number larger than 4GB). This time, my app worked as expected.

(Screenshots: VirtualBox memory settings)

So, it seems that the physical memory is an important factor. I still don’t understand why, and if you know what causes this behavior, please drop a comment. I guess the CLR team has a good reason for that 4GB threshold, but it would be nice if it were properly documented.

Object size limitations

Once I figured out that the physical memory can also limit the array size, I became curious what the CLR limitations for regular objects are. I quickly found out that the maximum object size in my version of .NET is 128MB.

class ClassA
{
    public StructA a;
}

unsafe struct StructA
{
    public fixed byte data[128 * 1024 * 1024 - 8];
}

I can instantiate objects of ClassA without problems. However, when I try to add one more field (e.g. a byte or a bool) to the ClassA or StructA definition, I get the following error:

System.TypeLoadException was unhandled
Message: Size of field of type 'ClassA' from assembly
'Test, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'
is too large.

So, it seems that for my particular .NET version the maximum object size is 128MB. What if we try to instantiate the following array?

var arr = new StructA[1];

In this case I get the following error:

System.TypeLoadException was unhandled
Message: Array of type 'StructA' from assembly
'Test, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' 
cannot be created because base value type is too large.

It turns out that arrays have yet another limitation. In order to instantiate an array of StructA, I have to modify the StructA definition as follows:

unsafe struct StructA
{
    public fixed byte data[64 * 1024 - 4];
}

It seems that the maximum array base element size is limited to 64KB.

Conclusion

In closing, I would say it is always useful to know the CLR limitations. Sometimes they manifest themselves in unexpected ways, but in the most common scenarios it is unlikely that you will hit any of them.

OWIN and NGINX

For the past few months I have been working with non-Microsoft technologies and operating systems: Linux, Puppet, Docker (lightweight Linux containers), Apache, Nginx, Node.js and others. So far it has been fun and I’ve learned a lot. This week I saw a lot of news and buzz around OWIN and the Katana project. It seems that OWIN is a hot topic, so I decided to give it a try. In this post I will show you how to build an OWIN implementation and use it with the Nginx server.

Note: this is a proof of concept rather than production-ready code.

OWIN is a specification that defines a standard interface between .NET web servers and web applications. Its goal is to provide a simple and decoupled way for web frameworks and web servers to interact. As the specification states, there is no assembly called OWIN.dll or similar; it is just a way to build web applications without a dependency on a particular web server. Concrete implementations can provide an OWIN.dll assembly, though.

This is in contrast to traditional ASP.NET applications, which have a dependency on the System.Web.dll assembly. If implemented correctly, OWIN eliminates such dependencies. The benefit is that your web application becomes more portable, flexible and lightweight.

Let’s start with the implementation. I modeled my OWIN implementation after the one provided by Microsoft.

public interface IAppBuilder
{
    IDictionary<string, object> Properties { get; }
    object Build(Type returnType);
    IAppBuilder New();
    IAppBuilder Use(object middleware, params object[] args);
}

For the purpose of this post we will implement the Properties property and the Use method. Let’s define our AppFunc application delegate as follows:

delegate Task AppFunc(IDictionary<string, object> environment);

The examples from Katana project provide the following code template for the main function:

static void Main(string[] args)
{
    using (WebApplication.Start<Startup>("http://localhost:5000/"))
    {
        Console.WriteLine("Started");
        Console.ReadKey();
        Console.WriteLine("Stopping");
    }
}

I like it very much, so I decided to provide a WebApplication class with a single Start method:

public static class WebApplication
{
    public static IDisposable Start<TStartup>(string url)
    {
        return new WebServer(typeof(TStartup));
    }
}

We will provide the WebServer implementation later. Let’s first see the implementation of the Startup class:

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        var myModule = new LightweightModule();
        app.Use(myModule);
    }
}
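
The LightweightModule class itself is not shown in this post, so here is a minimal sketch of what such a middleware class could look like (the real one from the demo may differ): the only contract our server relies on is an Invoke method matching the AppFunc signature, which writes its response into the OWIN environment.

using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Minimal middleware sketch; the only requirement is the AppFunc-compatible
// Invoke method. The environment keys match the ones set by our WebServer.
public class LightweightModule
{
    public Task Invoke(IDictionary<string, object> environment)
    {
        var path = (string)environment["owin.RequestPath"];
        var body = (Stream)environment["owin.ResponseBody"];

        var bytes = Encoding.UTF8.GetBytes("Hello from " + path);
        body.Write(bytes, 0, bytes.Length);
        environment["owin.ResponseStatusCode"] = 200;

        return Task.FromResult<object>(null);
    }
}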

Let’s summarize. We have a console application whose Main method calls the Start method, passing the Startup type (as a generic argument) and a URL. The Start method starts a web server that listens for requests on the specified URL, and the server uses the Startup class to configure the web application. We don’t have any dependency on the System.Web.dll assembly. We have a nice and simple decoupling of the web server and the web application.

So far, so good. Let’s see how the web server configures the web application. In our OWIN implementation we use reflection over the TStartup type and try to find a Configuration method by naming convention and a predefined method signature. The Configuration method instantiates a LightweightModule object and passes it to the web server. The web server inspects the object’s type and tries to find an Invoke method compatible with the AppFunc signature. Once the Invoke method is found, it is called for every web request. Here is the actual Use method implementation:

public IAppBuilder Use(object middleware, params object[] args)
{
    var type = middleware.GetType();
    var flags = BindingFlags.Instance | BindingFlags.Public;
    var methods = type.GetMethods(flags);

    // TODO: call method "void Initialize(AppFunc next, ...)" with "args"

    var q = from m in methods
            where m.Name == "Invoke"
            let p = m.GetParameters()
            where (p.Length == 1)
                   && (p[0].ParameterType == typeof(IDictionary<string, object>))
                   && (m.ReturnType == typeof(Task))
            select m;

    var candidate = q.FirstOrDefault();

    if (candidate != null)
    {
        var appFunc = Delegate.CreateDelegate(typeof(AppFunc), middleware, candidate) as AppFunc;
        this.registeredMiddlewareObjects.Add(appFunc);
    }

    return this;
}
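
For completeness, the other half of the reflection described above, finding and invoking the Configuration(IAppBuilder) method on the startup type, could look roughly like this inside the WebServer constructor. This is only a sketch, not the exact code from the demo, and it assumes AppBuilder is our IAppBuilder implementation stored in the appBuilder field:

// Sketch: locate "void Configuration(IAppBuilder app)" by naming convention
// and invoke it. AppBuilder and the appBuilder field are assumptions here.
// Requires: using System; using System.Reflection;
public WebServer(Type startupType)
{
    this.appBuilder = new AppBuilder();

    var configuration = startupType.GetMethod(
        "Configuration",
        BindingFlags.Instance | BindingFlags.Public,
        null,
        new[] { typeof(IAppBuilder) },
        null);

    if (configuration == null || configuration.ReturnType != typeof(void))
        throw new InvalidOperationException("No suitable Configuration method found.");

    var startup = Activator.CreateInstance(startupType);
    configuration.Invoke(startup, new object[] { this.appBuilder });

    this.Start();
}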

Finally we come to the WebServer implementation. This is where Nginx comes into play. For the purpose of this post we will assume that the Nginx server is started and configured; you can easily extend this code to start Nginx via the System.Diagnostics.Process class. I built and tested this example with Nginx version 1.4.2. Let’s see how we have to configure the Nginx server. Open the nginx.conf file and find the following settings:

    server {
        listen       80;
        server_name  localhost;

and change the port to 5000 (this is the port we use in the example). A few lines below you should see the following settings:

        location / {
            root   html;
            index  index.html index.htm;
        }

You should modify it as follows:

        location / {
            root   html;
            index  index.html index.htm;
            fastcgi_index Default.aspx;
            fastcgi_pass 127.0.0.1:9000;
            include fastcgi_params;
        }

That’s all. In short, we configured Nginx to listen on port 5000 and set up the FastCGI settings. With these settings Nginx will pass every request to a FastCGI server at 127.0.0.1:9000 using the FastCGI protocol. FastCGI is a protocol for interfacing programs with a web server.

So, now we need a FastCGI server. Implementing a FastCGI server is not hard, but for the sake of this post we will use the SharpCGI implementation. We are going to use the SharpCGI library in the WebServer implementation. First, we have to start listening on port 9000:

private void Start()
{
    var config = new Options();
    config.Bind = BindMode.CreateSocket;
    var addr = IPAddress.Parse("127.0.0.1");
    config.EndPoint = new IPEndPoint(addr, 9000);
    config.OnError = Console.WriteLine;
    Server.Start(this.HandleRequest, config);
}

The code is straightforward, and the only piece we haven’t looked at yet is the HandleRequest method. This is where web requests are processed:

private void HandleRequest(Request req, Response res)
{
    var outputBuff = new byte[1000];

    // TODO: use middleware chaining instead a loop

    foreach (var appFunc in this.appBuilder.RegisteredMiddlewareObjects)
    {
        using (var ms = new MemoryStream(outputBuff))
        {
            this.appBuilder.Properties["owin.RequestPath"] = req.ScriptName.Value;
            this.appBuilder.Properties["owin.RequestQueryString"] = req.QueryString.Value;

            this.appBuilder.Properties["owin.ResponseBody"] = ms;
            this.appBuilder.Properties["owin.ResponseStatusCode"] = 0;

            var task = appFunc(this.appBuilder.Properties);

            // TODO: don't task.Wait() and use res.AsyncPut(outputBuff);

            task.Wait();
            res.Put(outputBuff);
        }
    }
}

This was the last piece of our OWIN implementation. This is where we call the web application’s specific method via the AppFunc delegate.

In closing, I think OWIN helps developers build better web applications. Please note that my implementation is neither complete nor production-ready; there is a lot of room for improvement. You can find the source code here:

OWINDemo.zip