Software Lifespan

Last weekend a friend of mine told me an interesting story. Some software on a build server had stopped working unexpectedly. It turned out the software in question expired after 10 years! Yep, you read that right. On top of this, it turned out that the company that made the software does not exist anymore. It took 5 days for the company my friend works for to find a contact for the author of the software. Fortunately the solution was easy – they had to uninstall the software, delete a “secret” key in the registry and then install the software again.

This story reveals some interesting things:
1) software written in 2002 is still in use today
2) the way we differentiate between trial and non-trial software

We all know the mantra that we should not make assumptions about how long our software will be in use. Out of curiosity I asked my friend how they happened to have a build machine that is 10 years old. It turned out it was a Windows XP machine that was migrated to a virtual server many years ago. Interesting, don’t you think? The hardware lifespan is much shorter than the software lifespan.

For comparison, Windows XP was released in 2001, but it doesn’t expire the way the software in question does. I can easily imagine how the particular expiration check is implemented. During the installation it takes the current date; if a valid serial key is entered it adds 10 years, otherwise it adds, say, 14 days. The interesting thing is the assumption that the software won’t be used after 10 years. Back then, I would probably have implemented that expiration check in a similar way. In the best case I would have added 20 years instead of 10. I remember those days very well; all the dynamics and expectations after the dot-com bubble. My point is that for small software vendors a 10-year period was equal to eternity. For companies like Microsoft a 10-year period is just another good step (at the time of writing different sources estimate that Windows XP has between 22% and 35% of the OS market share).
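Purely to illustrate the idea, here is a minimal sketch of such a check. This is my guess at the mechanism, not the actual implementation; the class name, the registry key path and the value name are all made up:

using System;
using Microsoft.Win32;

static class ExpirationCheck
{
    // Hypothetical "secret" registry key, similar to the one from the story.
    const string KeyPath = @"Software\SomeVendor\SomeBuildTool";

    // Called by the installer: stamp the expiration date once.
    public static void StampExpiration(bool hasValidSerial)
    {
        DateTime expires = hasValidSerial ? DateTime.Now.AddYears(10)
                                          : DateTime.Now.AddDays(14);
        using (RegistryKey key = Registry.CurrentUser.CreateSubKey(KeyPath))
        {
            key.SetValue("Expires", expires.ToString("yyyy-MM-dd"));
        }
    }

    // Called on every start: refuse to run after the stamped date.
    public static bool IsExpired()
    {
        using (RegistryKey key = Registry.CurrentUser.OpenSubKey(KeyPath))
        {
            string value = key == null ? null : key.GetValue("Expires") as string;
            return value == null || DateTime.Parse(value) < DateTime.Now;
        }
    }
}

Delete the key and the clock starts over – which is exactly the workaround my friend ended up with.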

Why do we need profiling tools?

Every project is defined by its requirements. The requirements can be functional and non-functional. Most of the time the developers are focused on functional requirements only. This is how it is now and it probably won’t change much in the near future. We may say that developers are obsessed with functional requirements. As a matter of fact, a few decades ago software engineers thought that the IDE of the future would look something like this:

This is quite different from today’s Visual Studio or Eclipse. The reason is not that it is technically impossible. On the contrary, it is technically possible. This is one of the reasons for the great enthusiasm of the software engineers back then. The reason this didn’t happen is simple: people are not that good at writing specifications. Today no one expects to build large software from a single, huge, monolithic specification. Instead we practice iterative processes, each time implementing a small part of the specification, the specification evolves during the project, and so on.

While we still struggle with tools for verifying functional requirements, we have made big progress with tools for verifying non-functional requirements. Usually we call such tools profilers. Today we have a lot of useful profiling tools. These tools analyze the software for performance and memory issues. Using a profiler as part of the development process can be a big cost-saver. Profilers liberate software engineers from the boring task of watching for performance issues all the time. Instead the developers can stay focused on the implementation and verification of the functional requirements.

Take for example the following code:

FileStream fs = …
using (var reader = new StreamReader(fs, ...))
{
    while (reader.BaseStream.Position < reader.BaseStream.Length)
    {
        string line = reader.ReadLine();

        // process the line
    }
}

This is a straightforward and simple piece of code. It has an explicit intention – process a text file by reading one line at a time. The performance issue here is that the Length property calls the native function GetFileSize(…), which is an expensive operation. So, if you are going to read a file with 1,000,000 lines, GetFileSize(…) will be called 1,000,000 times.
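A common way to keep the same intent without touching FileStream.Length on every iteration (this is just the usual idiom, not necessarily how the original code was fixed) is to let ReadLine() signal the end of the stream:

using (var reader = new StreamReader(fs))
{
    // ReadLine() returns null at the end of the stream,
    // so Length is never queried inside the loop.
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // process the line
    }
}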

Let’s have a look at another piece of code. This time the following code has quite different runtime behavior.

string[] lines = …
int i = 0;
while (i < lines.Length)
{
   ...
}

In both examples the pattern is the same. And this is exactly what we want. We want to use well-known and predictable patterns.

Take a look at the following two-minute video to see how easy it is to spot and fix such issues (you can find the sample solution at the end of the post).

After all, this is why we want to use such tools – they work for us. It is much easier to fix performance and memory issues in the implementation/construction phase rather than in the test/validation phase of the project iteration.

Being proactive and using profiling tools throughout the whole application lifecycle (ALM) will help you build better products and have happy customers.

WindowsFormsApplication1.zip

Enumerate managed processes

If you ever needed to enumerate managed (.NET) processes you probably found that this is a difficult task. There is no single API that is robust and guarantees a correct result. Let’s see what the available options are:

You can take a look at the following examples:

Unfortunately, each one of these approaches has a hidden trap. You cannot enumerate 64bit processes from a 32bit process and you cannot enumerate 32bit processes from a 64bit process. After all, it makes sense because most of these techniques rely on reading another process’s memory.
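For illustration, here is a minimal sketch of one popular way people try it – checking the loaded modules for mscoree.dll or clr.dll. The class name is mine, and the catch block is exactly where the bitness trap shows up:

using System;
using System.ComponentModel;
using System.Diagnostics;

static class ManagedProcessSketch
{
    public static void Enumerate()
    {
        foreach (Process process in Process.GetProcesses())
        {
            try
            {
                foreach (ProcessModule module in process.Modules)
                {
                    string name = module.ModuleName;
                    if (name.Equals("mscoree.dll", StringComparison.OrdinalIgnoreCase) ||
                        name.Equals("clr.dll", StringComparison.OrdinalIgnoreCase))
                    {
                        Console.WriteLine("{0} ({1}) looks managed", process.ProcessName, process.Id);
                        break;
                    }
                }
            }
            catch (Win32Exception)
            {
                // The trap: access is denied for some system processes and, more
                // importantly here, a 32bit process cannot read the modules of a
                // 64bit process (and vice versa), so those processes are skipped.
            }
            catch (InvalidOperationException)
            {
                // The process may have exited in the meantime.
            }
        }
    }
}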

This is a well-known issue. Microsoft has provided a tool called clrver.exe that suffers from the same problem. The following screenshots demonstrate it.

As we can see, the 32bit clrver.exe can enumerate 32bit processes only. If you try it on a 64bit process you get the error “Failed getting running runtimes, error code 8007012b”. The same is valid for the 64bit scenario. The 64bit clrver.exe can enumerate 64bit processes only. If you try it on a 32bit process you get “PID XXXX doesn’t have a runtime loaded”.

A lot of people have tried to solve this issue. The most common solution is to spawn a new process on 64bit Windows: if your process is 32bit then you have to spawn a 64bit process and vice versa. However, this is a really ugly “solution”. You have to deal with IPC and other things.

However, there is a Microsoft tool that can enumerate both 32bit and 64bit managed processes. This is, of course, Visual Studio. In the Attach to Process dialog you can see the “Type” column, which shows which CLR versions are loaded, if any.

I don’t know how Visual Studio does the trick so I implemented another solution. I am not very happy with the approach that I used, but it seems stable and fast. I also added support for Silverlight (see process 7232).

You can find sample source code at the end of the post.

EnumManagedProcesses.zip

Explaining Design Patterns

From time to time I have to explain design patterns to junior developers. There are many excellent books and web sites on this topic that I recommend. However, it turns out that often the developers cannot relate a particular design pattern to a real-world scenario. In such cases I try to give an example implementation from the .NET Framework. I find this article very helpful. I hope you find it helpful too.
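To give a flavor of what I mean (these are my usual picks, not taken from the article above): IEnumerable/IEnumerator is the Iterator pattern, .NET events are the Observer pattern, and the stream classes are a textbook Decorator:

using System.IO;
using System.IO.Compression;

class DecoratorExample
{
    static void Main()
    {
        // Decorator in the .NET Framework: each stream wraps another stream and
        // adds behavior, while the caller keeps working against the Stream abstraction.
        using (Stream file = File.Create("data.bin.gz"))
        using (Stream buffered = new BufferedStream(file))                             // adds buffering
        using (Stream compressed = new GZipStream(buffered, CompressionMode.Compress)) // adds compression
        using (var writer = new StreamWriter(compressed))
        {
            writer.WriteLine("decorated!");
        }
    }
}

Once a junior developer sees that they have been using these patterns all along, the formal definitions become much easier to digest.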

Ramblings on code refactoring

As a Telerik employee I use Visual Studio and JustCode (JustCode is a Visual Studio productivity tool which provides code analysis, navigation, refactoring and other goodies). I often refactor my code with the help of JustCode. However, over the years I have become more careful and, in a way, more reluctant to refactor code. I will try to write down my thoughts and explain what I mean.

I don’t entirely buy the idea that we are mere mortals and will never be able to write “perfect” code. I think that most developers do a good job and strive to write the best code they can. During the project they gain more domain knowledge and it is expected that they will want to improve the existing code. So they refactor their code, especially if there are good unit test suites to prevent regressions. Sometimes there is no need for refactoring – the code is good enough. Yes, this also happens 🙂 Sometimes we write bad code and we refactor it.

I also don’t buy the idea that developers do code refactoring only for the sake of it. Junior developers may be tempted by powerful tools like JustCode that make code refactoring very easy, but these are rare cases. Most of them are reasonable people and they refactor code only when there is a need for it.

Still, my observation is that immediately after code refactoring the number of bugs increases. You can easily check this by observing projects on sourceforge.net, codeplex.com and so on.

I think the reason for this lies in failing to convey the change in the mental model together with the code refactoring. In a team of 10 or more software developers each one updates their mental model spontaneously, and it is hard to keep all the developers in sync. Maybe that’s why code refactoring works more smoothly in small teams. In small teams the communication is much simpler and it is easier to have a shared mental model.

That’s why every time before I commit a code refactoring to the source code repository I think about how I am going to communicate the change to the other team members.

Garbage collection – part 1 of N

Recently I have been dealing a lot with memory problems like leaks, stack/heap corruption, heap fragmentation, buffer overflows and the like. Surprisingly these things happen in the .NET world, especially when one deals with COM/PInvoke interoperability.

The CLR comes with a garbage collector (GC), which is a great thing. The GC has been around for many years and we take it for granted and rarely think about it. This is a twofold thing. On one hand it is proof that the GC does an excellent job most of the time. On the other hand the GC could become a big issue when you want to get the maximum possible performance.

I think it would be nice to explain some of the GC details. I hope this series of posts could help you build more GC friendly apps. Let’s start.

GC

I assume you know what GC is, so I am not going to explain it. There are a lot of great materials on the internet on this topic. I am only going to state the single most important thing: GC provides automatic dynamic memory management. As a consequence, GC prevents problems that were (and still are!) common in native/unmanaged applications:

  • dangling pointers
  • memory leaks
  • double free

Over the years, improper use of dynamic memory allocation became a big problem. Nowadays many modern languages rely on GC. Here is a short list:

ActionScript, AppleScript, C#, D, Eiffel, F#, Go, Haskell, Java, JavaScript, Lua, Objective-C, Perl, PHP, Python, Ruby, Scala, Smalltalk, VB.NET, Visual Basic

I guess more than 75% of all developers are programming in these languages. It is also important to say that there are attempts to introduce a basic form of “automatic” memory management in C++ as well. Although auto_ptr, shared_ptr and unique_ptr have limitations, they are a step in the right direction.

You probably heard that GC is slow. I think there are two aspects of that statement:

  • for the most common LOB applications GC is not slower than manual memory management
  • for real-time applications GC is indeed slower than well crafted manual memory management

However, most of us are not working on real-time applications. Also, not everyone is capable of writing high performance code; this is indeed hard to do. There is good news though. With the advance of research in GC theory there are signs that GC will become even faster than the current state-of-the-art manual memory management. I am pretty sure that in the future no one will pay for real-time software with manual memory management; it will be too risky.

GC anatomy

Every GC is composed of the following two components:

  • mutator
  • collector

The mutator is responsible for the memory allocation. It is called that because it mutates the object graph during the program execution. For example, in the following pseudo-code:

string firstname = "Chris";
Person person = new Person();
person.Firstname = firstname;

the mutator is responsible for allocating the memory on the heap and updating the object graph by updating the field Firstname to reference the object firstname (we say that firstname is reachable from person through the field Firstname). It is important to say that these reference fields may be part of objects on the heap (as in our scenario with the person object) but may also be contained in other locations known as roots. The roots may be thread stacks, static variables, GC handles and so on. As a result of the mutator’s work, any object can become unreachable from the roots (we say that such an object becomes garbage). This is where the second component comes in.

The collector is responsible for collecting all unreachable objects (the garbage) and reclaiming their memory.

Let’s have a look at the roots. They are called roots because they are accessible directly, that is, they are accessible to the mutator without going through other objects. We denote the set of all root objects as Roots.
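To make the different kinds of roots more tangible, here is a small C# illustration; the Person class is just a stand-in matching the earlier example:

using System.Runtime.InteropServices;

class Person
{
    public string Firstname;
}

class RootsExample
{
    // A static variable is a root: everything reachable from it stays alive.
    static Person lastPerson;

    static void Main()
    {
        // A local variable lives on the thread stack, which is also a root.
        Person person = new Person();
        person.Firstname = "Chris";
        lastPerson = person;

        // A GC handle is yet another kind of root, common in interop scenarios.
        GCHandle handle = GCHandle.Alloc(person);
        // ... as long as the handle is allocated, person stays reachable ...
        handle.Free();
    }
}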

Now let’s look at the objects allocated on the heap. We can denote this set as Objects. Each object O can be distinguished by its address. For simplification let’s assume that object fields can only be references to other objects. In reality most object fields are primitive types (like bool, char, int, etc.) but these fields are not important for the object graph connectivity. It doesn’t matter if an int field has value 5 or 10. So for now let’s assume that objects have reference fields only. Let’s denote with |O| the number of reference fields of the object O and with &O[i] the address of the i-th field of O. We write the usual pointer dereference for ptr as *ptr.

This notation allows us to define the set Pointers for an object O as

Pointers(O) = { a | a = &O[i], where 0 <= i < |O| }

For convenience we define Pointers(Roots)=Roots.
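To make the notation concrete, take the person object from the mutator example above. If Person has a single reference field Firstname, then |person| = 1 and Pointers(person) = { &person[0] }, where *person[0] is the string object "Chris".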

To recap – we have defined the following important sets:

  • Objects
  • Roots
  • Pointers(O)

Now that we have defined the most important sets, we are going to define the following operations:

  • New()
  • Read(O, i)
  • Write(O, i, value)

The New() operation obtains a new heap object from the allocator component. It simply returns the address of the allocated object. The pseudo-code for New() is:

New():
     return allocate()

It is important to say that the allocate() function allocates a contiguous block of memory. The reality is a bit more complex. We have different object types (e.g. Person, string, etc.) and usually New() takes parameters for the object type and in some cases for its size. Also, it could happen that there is not enough memory. We will revisit the New() definition later. For simplification we can assume that we are going to allocate objects of one type only.

Read(O, i) operation returns the reference stored at the i-th field of the object O. The pseudo-code for Read(O, i) is:

Read(O, i):
     return O[i]

Write(O, i, value) operation updates the reference stored at the i-th field of the object O. The pseudo-code for Write(O, i, value) is:

Write(O, i, value):
     O[i] = value

Sometimes we have to explicitly say that an operation or function is atomic. When we do, we write atomic in front of the operation name.

Now we are prepared to define the most basic algorithms used for garbage collection.

Mark-sweep algorithm

Earlier I wrote that the New() definition is oversimplified. Let’s revisit it:

New():
    ref = allocate()
    if ref == null
        collect()
        ref = allocate()
        if ref == null
              error "out of memory"

    return ref

atomic collect():
     markFromRoots()
     sweep(HeapStart, HeapEnd)

The updated New() definition is a bit more robust. It first tries to allocate memory. If there is no contiguous memory block big enough, it will collect the garbage (if there is any). Then it will try to allocate memory again, which can fail or succeed. What is important about this definition is that it reveals when the GC will trigger. Again, the reality is more complex, but in general the GC triggers when the program tries to allocate memory.
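You can observe this in the CLR with the GC class. The numbers you get will vary from run to run; the point is only that collections happen as a side effect of allocation, without any explicit GC.Collect() call:

using System;

class AllocationTrigger
{
    static void Main()
    {
        int before = GC.CollectionCount(0);

        // Allocate a lot of short-lived objects.
        for (int i = 0; i < 1000000; i++)
        {
            var data = new byte[1024];
            data[0] = (byte)i;
        }

        int after = GC.CollectionCount(0);
        Console.WriteLine("Gen 0 collections during the loop: {0}", after - before);
    }
}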

Let’s define the missing markFromRoots and sweep functions.

markFromRoots():
     worklist = empty
     foreach field in Roots
          ref = *field
          if ref != null && !isMarked(ref)
                 setMarked(ref)
                 enqueue(worklist, ref)
                 mark()

mark():
     while !isEmpty(worklist)
          ref = dequeue(worklist)
          foreach field in Pointers(ref)
                child = *field
                if child != null && !isMarked(child)
                      setMarked(child)
                      enqueue(worklist, child)

sweep(start, end):
     scan = start
     while (scan < end)
           if isMarked(scan)
                 unsetMarked(scan)
           else free(scan)
           scan = nextObject(scan)

The algorithm is straightforward and simple. It starts from Roots and marks each reachable object. Then it iterates over the whole heap and frees the memory of every unreachable object. It also removes the mark from the remaining objects. It is important to say that this algorithm needs two passes over the heap.

The algorithm does not solve the problem of heap fragmentation, and this naive implementation doesn’t work well in real-world scenarios. In the next post of this series we will see how we can improve it. Stay tuned.

Introduction

Hi. My name is Mihail Slavchev, welcome to my blog! Twelve years ago I was a rookie software engineer. I started my journey as a C++ developer, then did Java for a while and finally landed in the .NET world. Today, I think I am still a rookie, exploring new and amazing territories.

Feel free to join me on my never-ending journey!