C# GarbageCollection of Delegates SOLVED

So, I solved it.

I have until now always run the tests in my project. Only recently, with the new safeapp c# bindings, have I been able to get those tests working. For some reason it didn’t occur to me to test it there.
But now I did, and I could reproduce the error immediately, by running the AccountOverflowTest with just a few modifications:

[Test]
    public async Task AccountOverflowTest() {
        var session = await Utils.CreateTestApp();
        var mdInfo = await session.MDataInfoActions.RandomPublicAsync(16000);
        var accountInfo = await session.GetAccountInfoAsyc();
        
        using (var permissionsHandle = await session.MDataPermissions.NewAsync())
        using (var userHandle = await session.Crypto.AppPubSignKeyAsync()) {
        await session.MDataPermissions.InsertAsync(permissionsHandle, userHandle, new PermissionSet {Insert = true, Delete = true});
        await session.MData.PutAsync(mdInfo, permissionsHandle, NativeHandle.Zero);
        for (var i = 0; i < (long)987654321 - 1; i++) { // <- way more iterations than you will reach..
            Debug.WriteLine(i); // <- So you can see how many iterations before you crash
            var entryHandle = await session.MDataEntryActions.NewAsync();
            await session.MDataEntryActions.InsertAsync(entryHandle, Utils.GetRandomData(10).ToList(), Utils.GetRandomData(15).ToList());
            await session.MData.MutateEntriesAsync(mdInfo, entryHandle);
            entryHandle.Dispose();
            }
        }

        session.Dispose();
    }

And naturally, we hit the exact same error, after about the same number of iterations (20-30):

Managed Debugging Assistant 'CallbackOnCollectedDelegate' 
A callback was made on a garbage collected delegate of type 'SafeApp.AppBindings!SafeApp.AppBindings.AppBindings+FfiResultCb::Invoke'. This may cause application crashes, corruption and data loss. When passing delegates to unmanaged code, they must be kept alive by the managed application until it is guaranteed that they will never be called.

So, I had done some googling on it a few weeks ago, didn’t find what I was looking for. I didn’t spend hours on it. I thought, that this can’t be ungoogleble, and went for another try.

The answer is much more mundane, and much less related to unmanaged code than I had first thought.

It’s as simple as instantiating something, that is passed on to native code, without a reference kept in managed code.

The entire AppBindings.cs file is full of this.

You see the answer in this post: https://stackoverflow.com/a/6193914

specifically:

The problem is that:

hhook = SetWindowsHookEx(WH_KEYBOARD_LL, hookProc, hInstance, 0);
is just syntactic sugar for:

hhook = SetWindowsHookEx(WH_KEYBOARD_LL, new keyboardHookProc(hookProc), hInstance, 0);
and so the keyboardHookProc object is just local and will get disposed of since SetWindowsHookEx doesn’t do anything to actually hold onto it in the managed world.

and… this is exactly what we see in AppBindings.cs.

Here’s an example:

public Task<List<byte>> MDataInfoSerialiseAsync(ref MDataInfo info) {
      var (ret, userData) = BindingUtils.PrepareTask<List<byte>>();
      MDataInfoSerialiseNative(ref info, userData, OnFfiResultByteListCb);
      return ret;
    }

we are passing in OnFfiResultByteListCb which is just syntactic sugar for
new FfiResultByteListCb(OnFfiResultByteListCb)

… and this instance is sent to native code, goes out of sscope, and is garbage collected.
… or to use the words of the guy from SO:

and so the FfiResultByteListCb object is just local and will get disposed of since MDataInfoSerialiseNative doesn’t do anything to actually hold onto it in the managed world.

I’m not sure if it’s a good idea to keep a collection of instantiated references, these must at some point be cleaned up. I think the very first versions of SafeApp c# bindings actually had something similar.
Wrap it in some IDisposable object and use it for a request? I will experiment.

11 Likes

Hey @oetyng,

Thanks for the snippets and the detailed info, it was really helpful to narrow down the issue. We were able to reproduce the issue intermittently at our end too.

While a subtle difference between the linked SO post was the method in the post not being a static method, looks like the actual delegate passed to the native code still has the same flow. Fixing this should not be too hard, we just need to get the code generator in safe_client_libs updated to hold a reference to the delegate passed to the native code as indicated in SO.

Thanks again @oetyng, really appreciate the debugging help. I’ll update here once we have an issue raised in JIRA to get this addressed and @nbaksalyar hopefully can get this sorted for us soon.

6 Likes

Yep, it was a typical Heisenbug to begin with. Quote from this link which had some good info:

The failure appears random because it depends on when garbage collection occurs.

Super! Really looking forward to the solution! :slight_smile:

2 Likes