Imagine that you have implemented a great algorithm in C# and now want to improve its performance. You might decide to rewrite the bottleneck in native C/C++ and call it from managed code using PInvoke. That looks like a good idea, because native code is generally faster than managed code, but even the moon has its dark side. In this case the dark side is the cost of PInvoke calls to unmanaged functions. This post compares the speed of several approaches so that you can choose the one that best fits your needs.
PInvoke call
As an example, let's take a function that calculates the sum of the character codes in a string. Something like this (note: all the source code is available on GitHub):
    public static int ManagedCount(string s)
    {
        int sum = 0;
        for (int j = 0; j < s.Length; j++) {
            sum += (int)s[j];
        }
        return sum;
    }
To add some complexity, we will pass an array of strings and an index into that array to the function.
    public static int ManagedCount(string[] arr, int i)
    {
        int sum = 0;
        for (int j = 0; j < arr[i].Length; j++) {
            sum += (int)arr[i][j];
        }
        return sum;
    }
OK, that's the managed function we will work with. Now let's translate it to native code.
    int unmanagedCount (guint16 **arr, int index)
    {
        int sum = 0;
        guint16 *str = arr[index];
        while (*str) {
            sum += *str;
            str++;
        }
        return sum;
    }
A string in the CLR is stored as two-byte (UTF-16) characters, so we use a guint16** pointer to access the array of strings. We also have to add a declaration to the *.cs file:
    [DllImport ("libperf.so", EntryPoint="unmanagedCount")]
    public static extern int UnmanagedCount(
        [MarshalAs(UnmanagedType.LPArray, ArraySubType=UnmanagedType.LPWStr)]
        string[] arr,
        int i
    );
The [DllImport] attribute specifies which native library to use and the name of the function in that library (EntryPoint). The [MarshalAs] attribute says that PInvoke must pass the first parameter as an array of strings with two-byte characters.
If you are unfamiliar with PInvoke, you should know one thing: on every call PInvoke converts the parameters from managed types to unmanaged ones, and converts the return value back afterwards. The [MarshalAs] attribute on a parameter tells the CLR how it should be converted. These conversions take additional time and affect performance.
Now we can create an array of strings and call these functions ten million times to measure the execution time.
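To make the cost of that conversion more tangible, here is a rough C model of what a marshalled call to unmanagedCount has to do. It is only a sketch under stated assumptions: `FakeManagedString` and `marshalled_call` are hypothetical names invented for illustration, `uint16_t` stands in for `guint16`, and Mono's real marshaller is not shown in this post and may work differently. The point is that every call copies all the strings into temporary native buffers before the native function even starts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a managed string: a length plus UTF-16 data
   that is NOT NUL-terminated. This is not Mono's real layout. */
typedef struct {
    int32_t length;
    const uint16_t *chars;
} FakeManagedString;

/* The native function from the post, with uint16_t instead of guint16. */
int unmanagedCount(uint16_t **arr, int index)
{
    int sum = 0;
    uint16_t *str = arr[index];
    while (*str) {
        sum += *str;
        str++;
    }
    return sum;
}

/* Models one marshalled call: copy every string into a fresh
   NUL-terminated native buffer, call the function, free the copies.
   Returns -1 if an allocation fails. */
int marshalled_call(const FakeManagedString *strings, int count, int index)
{
    assert(strings != NULL && index >= 0 && index < count);
    int result = -1;

    uint16_t **native = malloc((size_t)count * sizeof *native);
    if (native == NULL)
        return result;
    for (int i = 0; i < count; i++)
        native[i] = NULL;              /* so cleanup can free every slot */

    int copied = 1;
    for (int i = 0; i < count && copied; i++) {
        size_t n = (size_t)strings[i].length;
        native[i] = malloc((n + 1) * sizeof **native);
        if (native[i] == NULL) {
            copied = 0;
        } else {
            memcpy(native[i], strings[i].chars, n * sizeof **native);
            native[i][n] = 0;          /* add the terminator the callee expects */
        }
    }
    if (copied)
        result = unmanagedCount(native, index);

    for (int i = 0; i < count; i++)
        free(native[i]);
    free(native);
    return result;
}
```

In this model the copying work grows with the total size of the array, even though the function only reads one element.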
Managed: 3 939 ms
PInvoke: 11 616 ms
You can see that PInvoke is about three times slower than the managed function, mostly because of these managed-to-unmanaged conversions. So you can't improve performance by calling an unmanaged function through PInvoke if it is called very often.
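The benchmark harness itself is not shown here. A minimal sketch of this kind of timing loop, written in C for the native side, might look like the following. The names `count_codes` and `run_benchmark` are made up for this example, and `clock()` measures CPU time, which is only one of several reasonable choices.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Sum of the UTF-16 code units of a NUL-terminated string. */
static int count_codes(const uint16_t *str)
{
    int sum = 0;
    while (*str) {
        sum += *str;
        str++;
    }
    return sum;
}

/* Calls count_codes `iterations` times, stores the elapsed CPU time in
   milliseconds in *elapsed_ms and returns the accumulated checksum. */
long long run_benchmark(const uint16_t *str, long iterations, double *elapsed_ms)
{
    assert(str != NULL && elapsed_ms != NULL);
    long long checksum = 0;
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        checksum += count_codes(str);
    clock_t end = clock();
    *elapsed_ms = 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
    return checksum;
}
```

Returning the checksum gives the caller something to verify and keeps the work from being discarded outright, although an optimizing compiler may still simplify such a loop.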
Internal call
Mono has a hidden feature that is not well known yet, called Internal Calls. Their primary purpose is to provide a way to implement some critical methods of the corlib library in native code (memory allocation, copying of objects, interaction with sockets and so on). Their secondary purpose is to let a native application that embeds Mono expose its own native functions to managed code. With some magic I found a way to use Internal Calls in an ordinary Mono application, without embedding Mono or changing the corlib assembly.
First, declare the InternalCount method in the *.cs file:
    [DllImport ("libperf.so", EntryPoint="internalCount")]
    [MethodImpl(MethodImplOptions.InternalCall, MethodCodeType = MethodCodeType.Runtime)]
    public static extern int InternalCount(string[] arr, int i);
The difference between a platform invoke declaration and an internal call declaration is that you place the [MethodImpl(MethodImplOptions.InternalCall)] attribute on the method. The MethodCodeType field is optional and may be omitted. There is another difference: you don't need to specify how the parameters are marshaled, because Internal Calls don't convert the method's parameters to unmanaged types; the parameters are placed on the stack as is.
Then we have to write a registration function for our internal call. Add the declaration to the cs file:
    [DllImport ("libperf.so", EntryPoint="init")]
    public static extern void InitInternals();
And add the code to the c file:
    #include <mono/metadata/loader.h>

    void init()
    {
        mono_add_internal_call (
            "PInvokePerf.PerformanceTest::InternalCount(string[],int)",
            internalCount
        );
    }
You can see that the init() function calls mono_add_internal_call. This function is defined in the Mono runtime, so you have to include the header <mono/metadata/loader.h>, add the include search path to the compiler options and link with the Mono library. To find the include path for the headers, run from the command line
pkg-config --cflags mono-2
To find the library name and path, run
pkg-config --libs mono-2
An example of Makefile
The function mono_add_internal_call has two parameters: the CLI method name (with an optional signature) and a pointer to the native function that will be called when the CLR calls the declared method. The method name is constructed as "Namespace.ClassName::MethodName" and may optionally be followed by the method signature (this is useful when you have overloaded methods).
Now we are ready for the final part: the implementation of the internalCount function. Let's look at the function body:
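To illustrate the naming scheme, here is a small C sketch that assembles such a name from its parts. `build_icall_name` is a hypothetical helper written for this example and is not part of the Mono API; in practice you would usually just write the string literal by hand, as in init() above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "Namespace.ClassName::MethodName" into buf,
   followed by "(signature)" when a signature is given.
   Returns whatever snprintf returns. */
int build_icall_name(char *buf, size_t size,
                     const char *ns, const char *cls,
                     const char *method, const char *signature)
{
    assert(buf != NULL && ns != NULL && cls != NULL && method != NULL);
    assert(strlen(cls) > 0 && strlen(method) > 0);
    if (signature != NULL)
        return snprintf(buf, size, "%s.%s::%s(%s)", ns, cls, method, signature);
    return snprintf(buf, size, "%s.%s::%s", ns, cls, method);
}
```

Called with "PInvokePerf", "PerformanceTest", "InternalCount" and "string[],int", it produces the same string that is passed to mono_add_internal_call in this post.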
    int internalCount (MonoArray *arr, int index)
    {
        MonoString *el = mono_array_get(arr, MonoString *, index);
        int len = mono_string_length(el);
        gint32 sum = 0;
        guint16 *str = mono_string_chars(el);
        int i;
        for (i = 0; i < len; i++) {
            sum += str[i];
        }
        return sum;
    }
You may notice that the function has a MonoArray parameter in its signature, which represents the string[] type in C#. That is the most important difference from standard PInvoke: Internal Calls work directly with managed types, and you have to use the Mono API to access parameters and return values. The Mono API header files can be found in the directory reported by pkg-config --cflags mono-2, mentioned above.
Some comments about the code:
MonoString* el = mono_array_get(arr,MonoString *,index);
returns element from array arr with elements of type MonoString * at the index location
mono_string_length(el)
returns string length
mono_string_chars(el)
returns pointer to the internal char array of the managed string
Now everything is done and we can run our InternalCount function. When I did it for the first time, I did it like this:
    public static void Main (string[] args)
    {
        //We must call InitInternals to initialize internal calls
        PerformanceTest.InitInternals();
        PerformanceTest.InternalCount(arr, 0);
    }
But to my surprise it worked as expected only in Mono AOT mode; when I ran the program in normal mode I got a MissingMethodException. I had to spend some time with the debugger and found an interesting thing.
When Mono starts to execute the 'Main' method, the JIT compiler first compiles 'Main' and then, recursively, all the methods called from 'Main'. Since 'Main' references the 'InternalCount' method, the JIT starts to compile 'InternalCount' too. During compilation it looks the method name up among the registered internal calls, because the method has the [MethodImplOptions.InternalCall] attribute. But it cannot find it, because the 'InitInternals' function has not run yet! In this case the JIT generates 'throw new MissingMethodException' in IL, and all subsequent calls throw that exception, even if we register the proper internal call later.
To avoid this behaviour I hide the InternalCount method from the JIT. To do this I moved all the meaningful code out of the 'Main' function, and in 'Main' I create a delegate to my function and call it. The JIT compiles the delegate's target only when it starts to run, and the 'MissingMethodException' goes away! The code now looks like this:
    delegate void HideFromJit();

    public static void Main (string[] args)
    {
        //Create array
        InitArray ();
        //Register internal calls
        PerformanceTest.InitInternals ();
        Console.WriteLine ("Performance measuring starting");
        //You can call it directly in AOT mode
        //Performance ();
        HideFromJit d = Performance;
        d ();
    }

    public static void Performance()
    {
        //All the code is here
        for (int i = 0; i < 1000000; i++)
            PerformanceTest.InternalCount(arr, i % 100);
    }
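The ordering problem can be modelled in a few lines of C. This is a toy model, not Mono's JIT: the one-slot registry, `compile_call_site` and `missing_method_stub` are all hypothetical names, and the stub returning -1 merely stands in for throwing MissingMethodException. It shows why a call site that is resolved before registration stays broken, while one resolved afterwards works.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*icall_fn)(void);

/* One-slot registry standing in for the runtime's internal call table. */
static const char *registered_name = NULL;
static icall_fn registered_fn = NULL;

void register_icall(const char *name, icall_fn fn)
{
    registered_name = name;
    registered_fn = fn;
}

/* Stands in for the generated "throw new MissingMethodException". */
int missing_method_stub(void)
{
    return -1;
}

/* Simulates compiling a call site: the lookup result is fixed at this
   moment and is never looked up again. */
icall_fn compile_call_site(const char *name)
{
    assert(name != NULL);
    if (registered_name != NULL && strcmp(registered_name, name) == 0)
        return registered_fn;
    return missing_method_stub;
}

/* Placeholder for the real native implementation. */
int real_internal_count(void)
{
    return 42;
}
```

If `compile_call_site` runs before `register_icall`, the returned pointer keeps failing even after registration. Running it afterwards, which is what the delegate trick achieves by delaying compilation, returns the real function.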
Finally, here is the performance comparison for these methods:
Method | Mono no optimizations | Mono --optimize=unsafe |
---|---|---|
Managed | 3 939 ms | 2 784 ms |
PInvoke | 11 616 ms | 11 804 ms |
Internal Call | 872 ms | 855 ms |
Internal Calls are the clear winner, while PInvoke is an outsider with no chance of beating even managed code. The difference between PInvoke and Internal Call performance is more than tenfold!
And here are the results for a byte buffer XOR algorithm:
Method | Mono no optimizations | Mono --optimize=unsafe |
---|---|---|
Managed | 2 068 ms | 1 507 ms |
PInvoke | 4 387 ms | 3 707 ms |
Internal Call | 1 372 ms | 1 381 ms |
You can see that with the 'unsafe' optimization managed code runs almost as fast as Internal Calls, but without optimizations it is 50% slower. I chose the 'unsafe' optimization because it gives the biggest speed boost for code that works with arrays (it removes bounds checks). PInvoke is in last place again.
Open questions:
- GC movements. Should we pin managed data, or do something else, to be sure that the GC does not move the data while we are inside the internal call?
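The XOR benchmark code is not shown in this post, so the following C sketch is only an assumption about what such a kernel might look like (here with a single-byte key). It contrasts a loop that checks the index on every access, which is roughly what managed array access does by default, with a loop that trusts the caller, as native code and the 'unsafe' optimization do. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Checked variant: tests the index before every access.
   Returns 0 on success, -1 as soon as an index is out of range. */
int xor_checked(uint8_t *buf, size_t buf_len, size_t count, uint8_t key)
{
    assert(buf != NULL);
    for (size_t i = 0; i < count; i++) {
        if (i >= buf_len)
            return -1;          /* per-element bounds check */
        buf[i] ^= key;
    }
    return 0;
}

/* Unchecked variant: no per-element test, the caller must pass a valid count. */
void xor_unchecked(uint8_t *buf, size_t count, uint8_t key)
{
    assert(buf != NULL);
    for (size_t i = 0; i < count; i++)
        buf[i] ^= key;
}
```

The extra comparison in the checked loop is cheap, but in a loop body this small it is a noticeable share of the work, which fits the gap between the two managed columns in the table.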
Conclusion
Mono is a powerful framework that lets you do great things with native code as well as managed code. If you want to increase the performance of your managed code, don't use PInvoke to unmanaged code, as it hurts performance; look at Internal Calls instead. But be aware that the internal call mechanism is platform dependent, and you will not be able to run your great app on .NET if you use it. That said, you can always add a conditional #if and compile your app with the managed method for .NET and the internal call for Mono.
References