Tuesday, April 29, 2014

Mono unmanaged calls performance

Let's imagine that you have implemented some great algorithm in C# and are thinking about improving its performance. You might be tempted to rewrite a bottleneck in native C/C++ and call it from managed code using PInvoke. That may look like a good idea, because native code is generally faster than managed code, but even the moon has its dark side. In this case, the dark side is the cost of PInvoke calls to unmanaged functions. This post compares the speed of several approaches so that you can choose the one that best fits your needs.

PInvoke call

As an example, let's take a function that calculates the sum of the char codes in a string, something like this (note: all the source code is available on GitHub):

public static int ManagedCount(string s)
{
    int sum = 0;

    for (int j = 0; j < s.Length; j++) {
        sum+=(int)s[j];
    }
    return sum;
}

To add some complexity, we will pass an array of strings and an index into that array to the function.

public static int ManagedCount(string[] arr,int i)
{
    int sum = 0;

    for (int j = 0; j < arr [i].Length; j++) {
        sum+=(int)arr[i][j];
    }
 
    return sum;
}

OK, that's the managed function we will work with. Now let's translate it to native code.

int
unmanagedCount (guint16 **arr,int index)
{
    int sum=0;
    guint16 *str=arr[index];

    while(*str)
    {
        sum+=*str;
        str++;
    }

    return sum;
}

A string in the CLR is stored as two-byte (UTF-16) code units, so we use a guint16** pointer to access the array of strings. We also have to add a declaration to the *.cs file:

[DllImport ("libperf.so",EntryPoint="unmanagedCount")]
public static extern int UnmanagedCount(
    [MarshalAs(UnmanagedType.LPArray, ArraySubType=UnmanagedType.LPWStr)] 
    string[] arr, 
    int i
    );

The [DllImport] attribute says which native library to use and the name of the function in that library (EntryPoint). The [MarshalAs] attribute says that PInvoke must pass the first parameter as an array of two-byte (UTF-16) strings.

If you're unfamiliar with PInvoke, you should know one thing: on every call, PInvoke converts the parameters from managed types to unmanaged ones and then converts the return value back. The [MarshalAs] attribute on a parameter tells the CLR how it should be converted. These conversions take additional time and affect performance.
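One detail of the native version is worth spelling out: unmanagedCount walks each string until it meets a terminating zero, whereas ManagedCount uses the string's Length. The two agree only when the string contains no embedded '\0' character, which .NET strings are allowed to hold. The following is a minimal, Mono-independent sketch of that difference. It uses uint16_t from <stdint.h> in place of glib's guint16, and the names sum_until_nul and sum_with_length are hypothetical, not taken from the post's source code.

```c
#include <stddef.h>
#include <stdint.h>

/* Walks UTF-16 code units until the first zero, like unmanagedCount. */
int sum_until_nul(const uint16_t *str)
{
    int sum = 0;
    while (*str) {
        sum += *str;
        str++;
    }
    return sum;
}

/* Walks exactly len code units, like ManagedCount with s.Length. */
int sum_with_length(const uint16_t *str, size_t len)
{
    int sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += str[i];
    }
    return sum;
}
```

For the string "abc" both functions return 294. For the three-unit string 'a', '\0', 'b' the first returns 97 and the second returns 195.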

Now we can create an array of strings and call each function ten million times to measure the execution time.

Managed: 3 939 ms
PInvoke: 11 616 ms

You can see that PInvoke is about three times slower than the managed function, mostly because of these managed-to-unmanaged conversions. So you can't improve performance by PInvoking an unmanaged function if it is called very often.
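The post's actual benchmark loop lives in the C# project and is not reproduced here. As a rough illustration of the measurement pattern only (call a function many times, time the whole loop), here is a C sketch. The names count_fn, sum_code_units and time_calls are hypothetical, and clock() measures CPU time, which is an assumption about what is being timed rather than a statement about how the numbers above were obtained.

```c
#include <stdint.h>
#include <time.h>

typedef int (*count_fn)(const uint16_t *str);

/* Example workload: sum of code units in a zero-terminated UTF-16 string. */
int sum_code_units(const uint16_t *str)
{
    int sum = 0;
    while (*str) {
        sum += *str++;
    }
    return sum;
}

/* Calls fn(str) `iterations` times and stores the CPU time spent in
   *elapsed_ms. The accumulated result is returned so that the value
   of every call is actually used. */
long long time_calls(count_fn fn, const uint16_t *str,
                     long iterations, double *elapsed_ms)
{
    long long total = 0;
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        total += fn(str);
    }
    clock_t end = clock();
    *elapsed_ms = 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
    return total;
}
```

A caller would pass the function under test and an iteration count such as 10000000, then print the elapsed milliseconds for each variant.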

Internal call

Mono has a hidden feature that is not yet well known. It's called Internal Calls. Their primary purpose is to provide a way to implement some critical corlib methods in native code (memory allocation, copying of objects, interaction with sockets and so on). Their secondary purpose is to let a native application that embeds Mono expose its own native functions to managed code. With some magic, I found a way to use Internal Calls in an ordinary Mono application without embedding Mono or changing the corlib assembly.

First, declare the InternalCount method in the *.cs file:

[DllImport ("libperf.so",EntryPoint="internalCount")]
[MethodImpl(MethodImplOptions.InternalCall, MethodCodeType = MethodCodeType.Runtime)]
public static extern int InternalCount(string[] arr, int i);

The difference between a platform invoke declaration and an internal call declaration is that you place the [MethodImpl(MethodImplOptions.InternalCall)] attribute on the method. The MethodCodeType field is optional and may be omitted. There is another difference: you don't need to specify how the parameters are marshaled, because Internal Calls don't convert the method's parameters to unmanaged types; they are placed on the stack as is.

Then we have to write a registration function for our internal call. Add its declaration to the *.cs file:

[DllImport ("libperf.so",EntryPoint="init")]
public static extern void InitInternals();

And add this code to the C file:

#include <mono/metadata/loader.h>

void
init()
{
    mono_add_internal_call (
         "PInvokePerf.PerformanceTest::InternalCount(string[],int)",
         internalCount
         );
}

You can see that the init() function calls mono_add_internal_call. This function is defined in the Mono runtime, so you have to include the header <mono/metadata/loader.h>, add the Mono include search path to the compiler options and link with the Mono library. To find the include path for the headers, run this from the command line:

pkg-config --cflags mono-2

To find the library name and path, run

pkg-config --libs mono-2

An example of Makefile

The mono_add_internal_call function takes two parameters: the CLI method name (with an optional signature) and a pointer to the native function that will be called when the CLR calls the declared method. The method name is constructed as "Namespace.ClassName::MethodName" and may optionally be followed by the method signature (this is useful when you have overloaded methods).
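To make the naming rule concrete, here is a small sketch that assembles such a name from its parts. It only builds the string and does not call into Mono. The helper build_icall_name is hypothetical and is not part of the Mono API.

```c
#include <stdio.h>

/* Builds "Namespace.ClassName::MethodName" and, if signature is not NULL,
   appends "(signature)". Returns 0 on success, -1 if buf is too small. */
int build_icall_name(char *buf, size_t size,
                     const char *ns, const char *klass,
                     const char *method, const char *signature)
{
    int n;
    if (signature != NULL) {
        n = snprintf(buf, size, "%s.%s::%s(%s)", ns, klass, method, signature);
    } else {
        n = snprintf(buf, size, "%s.%s::%s", ns, klass, method);
    }
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Called with "PInvokePerf", "PerformanceTest", "InternalCount" and "string[],int", it produces the same string that init() passes to mono_add_internal_call.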

Now we are ready for the final part: the implementation of the internalCount function. Let's look at the function body:

int
internalCount (MonoArray *arr,int index)
{
    MonoString* el = mono_array_get(arr,MonoString *,index);
    int len = mono_string_length(el);
    gint32 sum=0;
    guint16 *str = mono_string_chars(el);
    int i;

    for(i = 0; i < len; i++)
    {
        sum += str[i];
    }

    return sum;
}

You may notice that the function has the MonoArray type in its signature, which represents the string[] type in C#. That is the most important difference from standard PInvoke: Internal Calls work directly with managed types, and you have to use the Mono API to access parameters and return values. The Mono API header files are in the directory reported by pkg-config --cflags mono-2, mentioned above.

Some comments about the code:

MonoString* el = mono_array_get(arr,MonoString *,index);
returns element from array arr with elements of type MonoString * at the index location

mono_string_length(el)
returns string length

mono_string_chars(el)
returns pointer to the internal char array of the managed string

Now everything is in place and we can run our InternalCount function. The first time I tried, I did it like this:

public static void Main (string[] args)
{
    //We must call InitInternals to initialize internal calls 
    PerformanceTest.InitInternals();
    PerformanceTest.InternalCount(arr,0);
}

Surprisingly, it worked as expected only in Mono AOT mode; when I ran the program in normal mode I got a MissingMethodException. I had to spend some time with the debugger and found an interesting thing.

When Mono starts to execute 'Main', the JIT compiler first compiles 'Main' and then, recursively, all the methods called from 'Main'. Since 'Main' references 'InternalCount', the JIT starts to compile 'InternalCount' too. During compilation it looks up the method name among the registered internal calls, because the method has the [MethodImpl(MethodImplOptions.InternalCall)] attribute. But it cannot find it, because 'InitInternals' has not run yet! In this case the JIT generates 'throw new MissingMethodException' in the IL, and all subsequent calls throw that exception, even if we register the proper internal call later.

To avoid this behaviour I hide the InternalCount method from the JIT. To do this, I moved all the meaningful code out of 'Main', and in 'Main' I create a delegate to that code and call it. The JIT compiles the delegate's target only when it starts to run, and the 'MissingMethodException' goes away! The code now looks like this:

delegate void HideFromJit();

public static void Main (string[] args)
{
    //Create array
    InitArray ();
    //Register internal calls
    PerformanceTest.InitInternals ();
    Console.WriteLine ("Performance measuring starting");

    //You can call it directly in AOT mode
    //Performance ();

    HideFromJit d=Performance;
    d ();
}

public static void Performance()
{
    //All the code is here
    for(int i = 0; i < 1000000; i++)
        PerformanceTest.InternalCount(arr,i%100);
}
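The ordering problem described above can be illustrated with a toy model in plain C. This is not Mono's implementation. It only mimics the observed behaviour, where a call is bound once at "compile" time and a failed lookup stays bound to a throwing stub. Every name in it (toy_add_internal_call, toy_compile_call, toy_call_site and so on) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef int (*icall_fn)(void);

#define MAX_ICALLS 8

static const char *icall_names[MAX_ICALLS];
static icall_fn icall_funcs[MAX_ICALLS];
static int icall_count = 0;

/* Stand-in for registering an internal call by name. */
void toy_add_internal_call(const char *name, icall_fn fn)
{
    if (icall_count < MAX_ICALLS) {
        icall_names[icall_count] = name;
        icall_funcs[icall_count] = fn;
        icall_count++;
    }
}

/* Stub bound when the lookup fails; models "throw MissingMethodException". */
int toy_missing_method(void) { return -1; }

/* A "compiled" call site: the target is resolved once and then cached. */
typedef struct {
    icall_fn target;
} toy_call_site;

/* Models the JIT compiling a call: look the name up now, never again. */
void toy_compile_call(toy_call_site *site, const char *name)
{
    site->target = toy_missing_method;
    for (int i = 0; i < icall_count; i++) {
        if (strcmp(icall_names[i], name) == 0) {
            site->target = icall_funcs[i];
            break;
        }
    }
}

/* Example native implementation to register in this model. */
int toy_internal_count(void) { return 42; }
```

In this model, a call site compiled before toy_add_internal_call runs keeps returning -1 even after registration, while a call site compiled afterwards reaches toy_internal_count. The delegate trick plays the role of delaying the second kind of compilation until registration is done.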

Finally, here is the performance comparison for these methods:

Method          Mono, no optimizations    Mono --optimize=unsafe
Managed         3 939 ms                  2 784 ms
PInvoke         11 616 ms                 11 804 ms
Internal Call   872 ms                    855 ms

Internal Calls are the clear winner, while PInvoke is the outsider with no chance of beating even managed code. The performance of PInvoke and Internal Calls differs by more than ten times!
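The ratios behind that claim are easy to check from the measured times. A minimal sketch, using the three figures from the "no optimizations" column above (the function and constant names are only illustrative):

```c
/* Timings from the "no optimizations" column, in milliseconds. */
static const double managed_ms  = 3939.0;
static const double pinvoke_ms  = 11616.0;
static const double internal_ms = 872.0;

/* How many times longer slow_ms is than fast_ms. */
double ratio(double slow_ms, double fast_ms)
{
    return slow_ms / fast_ms;
}
```

Here ratio(pinvoke_ms, internal_ms) is about 13.3, ratio(pinvoke_ms, managed_ms) is about 2.9, and ratio(managed_ms, internal_ms) is about 4.5.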

And here are the results for a byte buffer XOR algorithm:

Method          Mono, no optimizations    Mono --optimize=unsafe
Managed         2 068 ms                  1 507 ms
PInvoke         4 387 ms                  3 707 ms
Internal Call   1 372 ms                  1 381 ms

You can see that with the 'unsafe' optimization, managed code runs close to Internal Calls, but without optimizations it's 50% slower. I chose the 'unsafe' optimization because it shows the maximum speed boost for code working with arrays (it removes bounds checks). PInvoke is again in last place.
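The XOR benchmark code itself is not shown in this post, so the following is only a guess at the shape of such a kernel, written to show what removing bounds checks means. The checked variant tests the index on every access, roughly as a managed array access does; the unchecked variant trusts the caller. Both function names and the single-byte key are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* XORs count bytes with key, testing the index on each access the way a
   bounds-checked managed array access would. Returns -1 on a bad index. */
int xor_checked(uint8_t *buf, size_t buf_len, size_t count, uint8_t key)
{
    for (size_t i = 0; i < count; i++) {
        if (i >= buf_len) {
            return -1; /* models IndexOutOfRangeException */
        }
        buf[i] ^= key;
    }
    return 0;
}

/* Same loop without the per-element test; the caller must guarantee
   that count does not exceed the buffer size. */
void xor_unchecked(uint8_t *buf, size_t count, uint8_t key)
{
    for (size_t i = 0; i < count; i++) {
        buf[i] ^= key;
    }
}
```

The loop bodies are otherwise identical, which is consistent with the managed numbers moving closer to the Internal Call numbers once the checks are gone.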

Open questions:

  • GC movements. Should we pin managed data, or do something else, to be sure that the GC does not move the data while we are in the internal call?

Conclusion

Mono is a powerful framework that lets you do great things with native code as well as managed code. If you want to improve the performance of your managed code, don't use PInvoke to unmanaged code for frequently called functions, as its call overhead defeats the gain; look at Internal Calls instead. But be aware that the internal call mechanism is platform-dependent (it is specific to Mono), so you cannot run your great app on .NET if you use it. That said, you can always add a conditional #if directive and compile your app with the managed method for .NET and the internal call for Mono.

