Okay, this whole thing needs a new approach, this is getting nowhere.
I started with the first write operation that is made: create a db.
Iterating with a 100 ms delay: after 51 iterations (always the same count) the AppSession destructor is called (I do not know why yet), whereby Session.AppPtr is set to IntPtr.Zero, which is set up to cause an exception to be thrown on any subsequent access to this property. Meanwhile, in a parallel thread, a NativeHandle somewhere is being disposed, and the native call that frees it accesses Session.AppPtr. Boom…
To be continued. I suspect I will find quite a few things with this approach, before getting to the real problem of OP.
(What is being tested now is more or less the example code, copy-pasted from the SafeMessages app, or an equivalent flow.)
Questions so far:
- What is it that happens after 51 iterations of creating 2 MDs with write permissions inserted?
- Is it a general design flaw that we can reset Session.AppPtr before all current usage of it has ended?
Answers so far:
- I have no idea.
- I suspect so, and will try to redesign it.
Here follow some results, one section per operation type, with some input variations for each.
Create db iteration results
Delay: 100 ms
Iterations: 51
Event: AppSession destructor is called, thereby resetting Session.AppPtr before last use of it, causing ArgumentNullException thrown on a subsequent NativeHandle destructor call.
Delay: 200 ms, 1 000 ms
Iterations: 51
Event: Crash, with msg: The program ‘[4404] dotnet.exe’ has exited with code -1073741819 (0xc0000005) ‘Access violation’.
Delay: 5 000 ms
Iterations: 51
Event: An unhandled exception of type ‘System.NullReferenceException’ occurred in Unknown Module. Object reference not set to an instance of an object.
(No stacktrace, i.e. suspected origin: native code. Same as observed when running SAFEExamples.NoteBook after 17x2 or so writes!)
Delay: 10 000 ms
Iterations: 51
Event: AppSession destructor is called, thereby resetting Session.AppPtr before last use of it, causing ArgumentNullException thrown on a subsequent NativeHandle destructor call.
Thoughts:
Progress! Already at this stage of code depth (early, relatively speaking, from my high-abstraction point of view, and a lot more isolated than running all layers of abstraction) we are seeing similar problems!
Increasing the delay between iterations does not make the errors go away, but it does affect which kind of error we see… which does not make sense to me currently. But I guess my initial theory of corrupt memory somewhere is still a good candidate, considering these seemingly uncorrelated and dispersed error types.
EDIT:
Regarding question #2
This is just a very small detail of higher level implementation. Temporary solution:
I added a static reference to the AppPtr that is not reset when the AppSession destructor is called.
Probably not especially important, but if we want to be able to free all currently used NativeHandles, even after an AppSession has been disposed, then it needs some other approach. So, that would be something to think about for people using the code (or more probable: other examples not yet produced) for reference later.
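One possible alternative to the static reference, sketched under assumptions: keep the app pointer in a small reference-counted holder, so that it is only reset once the session and every outstanding handle have released it. `RefCountedAppPtr` and its members are hypothetical names, not part of the SafeApp bindings, and the `free` callback stands in for whatever native call actually frees the app.

```csharp
using System;
using System.Threading;

// Hypothetical holder (not part of the actual bindings): the native app pointer
// is only freed once the session and every outstanding handle have released it.
public sealed class RefCountedAppPtr
{
    private readonly Action<IntPtr> _free; // stand-in for the native "free app" call
    private IntPtr _ptr;
    private int _refCount = 1;             // the session itself holds the first reference

    public RefCountedAppPtr(IntPtr ptr, Action<IntPtr> free)
    {
        _ptr = ptr;
        _free = free;
    }

    public bool IsFreed => Volatile.Read(ref _refCount) == 0;

    // Called when a handle is created; returns the pointer for use in native calls.
    public IntPtr Acquire()
    {
        while (true)
        {
            var current = Volatile.Read(ref _refCount);
            if (current == 0)
                throw new ObjectDisposedException(nameof(RefCountedAppPtr));
            // Only take a reference if nobody changed the count in between.
            if (Interlocked.CompareExchange(ref _refCount, current + 1, current) == current)
                return _ptr;
        }
    }

    // Called when a handle is disposed, and once by the session itself.
    public void Release()
    {
        if (Interlocked.Decrement(ref _refCount) == 0)
        {
            var ptr = _ptr;
            _ptr = IntPtr.Zero;
            _free(ptr);
        }
    }
}
```

With this shape, the AppSession destructor would call `Release()` instead of zeroing the pointer directly, and each NativeHandle would pair one `Acquire()` with one `Release()`. The sketch does not guard against unbalanced `Release()` calls. .NET's own `SafeHandle` offers the same idea through `DangerousAddRef` and `DangerousRelease`, which may be a better fit than a hand-rolled counter.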
So, with that noise removed, I’m back at an inexplicable app crash. The debugging session just ends, and in EventViewer all we can see is that dotnet.exe crashed.
Back to the OP question:
Now, I’m getting closer to a question that is maybe possible to answer for someone out there.
I have boiled it down to a very limited set of operations:
// Creates db with address to category MD
public async Task CreateDbAsync(string databaseId)
{
    databaseId = DbIdForProtocol(databaseId);
    if (databaseId.Contains(".") || databaseId.Contains("@"))
        throw new NotSupportedException("Unsupported characters '.' and '@'.");
    // Check if account exists first and return error
    var dstPubIdDigest = await GetMdXorName(databaseId);
    using (var dstPubIdMDataInfoH = await MDataInfo.NewPublicAsync(dstPubIdDigest, 15001))
    {
        var accountExists = false;
        try
        {
            var keysH = await MData.ListKeysAsync(dstPubIdMDataInfoH);
            keysH.Dispose();
            accountExists = true;
        }
        catch (Exception)
        {
            // ignored - acct not found
        }
        if (accountExists)
        {
            throw new Exception("Id already exists.");
        }
    }
    // Create Self Permissions
    using (var categorySelfPermSetH = await MDataPermissionSet.NewAsync())
    {
        await Task.WhenAll(
            MDataPermissionSet.AllowAsync(categorySelfPermSetH, MDataAction.kInsert),
            MDataPermissionSet.AllowAsync(categorySelfPermSetH, MDataAction.kUpdate),
            MDataPermissionSet.AllowAsync(categorySelfPermSetH, MDataAction.kDelete),
            MDataPermissionSet.AllowAsync(categorySelfPermSetH, MDataAction.kManagePermissions));
        using (var streamTypesPermH = await MDataPermissions.NewAsync())
        {
            using (var appSignPkH = await Crypto.AppPubSignKeyAsync())
            {
                await MDataPermissions.InsertAsync(streamTypesPermH, appSignPkH, categorySelfPermSetH);
            }
            // Create Md for holding categories
            var categoriesMDataInfoH = await MDataInfo.RandomPrivateAsync(15001);
            await MData.PutAsync(categoriesMDataInfoH, streamTypesPermH, NativeHandle.Zero);
            var serializedCategoriesMdInfo = await MDataInfo.SerialiseAsync(categoriesMDataInfoH);
            // Finally update App Container (store db info to it)
            var database = new Database
            {
                DbId = databaseId,
                Categories = new DataArray { Type = "Buffer", Data = serializedCategoriesMdInfo }, // Points to Md holding stream types
            };
            var serializedDb = JsonConvert.SerializeObject(database);
            using (var appContH = await AccessContainer.GetMDataInfoAsync(AppContainerPath)) // appContainerHandle
            {
                var dbIdCipherBytes = await MDataInfo.EncryptEntryKeyAsync(appContH, database.DbId.ToUtfBytes());
                var dbCipherBytes = await MDataInfo.EncryptEntryValueAsync(appContH, serializedDb.ToUtfBytes());
                using (var appContEntryActionsH = await MDataEntryActions.NewAsync())
                {
                    await MDataEntryActions.InsertAsync(appContEntryActionsH, dbIdCipherBytes, dbCipherBytes);
                    await MData.MutateEntriesAsync(appContH, appContEntryActionsH);
                }
            }
        }
    }
}
This is very similar to the SafeMessages example MaidSafe have here: https://github.com/maidsafe/safe_mobile
Iterating 51 times over the above code will make the app crash with no signs of why (that I could find).
Below, I will start with the first interaction with safe_app.dll and see how many iterations I can go before breakdown. Then I will add another interaction and see how many iterations we can go before errors, and so on until we have added all interactions with safe_app.dll that we see in the code block above.
By doing this I would like to find the place where something goes wrong, and if not that: get more clues.
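The procedure can be sketched as a small harness, assuming each step is wrapped in a `Func<Task>`. `IterationProbe` and `CountUntilFailureAsync` are hypothetical names made up for this illustration. Since an access violation kills the process without a catchable exception, the iteration number is logged before every call, so the last logged number shows how far the run got.

```csharp
using System;
using System.Threading.Tasks;

public static class IterationProbe
{
    // Runs `operation` repeatedly and returns how many iterations completed
    // before the first managed exception (or `maxIterations` if none occurred).
    public static async Task<int> CountUntilFailureAsync(
        Func<Task> operation, int delayMs, int maxIterations, Action<string> log)
    {
        for (var i = 0; i < maxIterations; i++)
        {
            // Log before the call: a native crash ends the process without
            // an exception, and then this is the only trace that survives.
            log($"Starting iteration {i + 1}");
            try
            {
                await operation();
            }
            catch (Exception ex)
            {
                log($"Iteration {i + 1} failed: {ex.GetType().Name}: {ex.Message}");
                return i;
            }
            await Task.Delay(delayMs);
        }
        return maxIterations;
    }
}
```

A call could look like `await IterationProbe.CountUntilFailureAsync(() => GetMdXorName("some-id"), 1, 2000, Console.WriteLine);`, with the lambda body growing by one interaction per round.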
Each set of iterations ends with one of the following errors (which error shows up for which operations is not deterministic):
- AppSession destructor is called for some unknown reason. No logs in EventViewer.
- NullReferenceException occurred in Unknown Module, without stacktrace. No logs in EventViewer.
- ExecutionEngineException occurred in Unknown Module, without stacktrace. No logs in EventViewer.
- Crashes in the same inexplicable way as described in OP. Error message:
The program '[16784] dotnet.exe' has exited with code -1073741819 (0xc0000005) 'Access violation'.
and with the following logs in EventViewer:
Fault bucket 2253332025786264427, type 5
Event Name: BEX64
Response: Not available
Cab Id: 0
Problem signature:
P1: dotnet.exe
P2: 2.0.26021.1
P3: 5a3b026e
P4: StackHash_03ab
P5: 0.0.0.0
P6: 00000000
P7: PCH_B2_FROM_safe_app+0x0000000000584D45
P8: c0000005
P9: 0000000000000008
P10:
(this sounds to me like some memory access violation in safe_app.dll)
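For reference, the negative exit code and the hexadecimal value in the message are the same 32-bit number, and 0xC0000005 is the Windows status code for an access violation (STATUS_ACCESS_VIOLATION). A minimal check, where `ExitCodes.ToHex` is a name made up for this illustration:

```csharp
using System;

public static class ExitCodes
{
    // Reinterprets a signed 32-bit exit code as unsigned and formats it as hex.
    public static string ToHex(int exitCode) =>
        "0x" + unchecked((uint)exitCode).ToString("X8");
}

// ExitCodes.ToHex(-1073741819) returns "0xC0000005"
```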
Starting with the very first interaction:
Iterating over Sha3HashAsync
async Task<List<byte>> GetMdXorName(string plainTextId)
{
    return await NativeUtils.Sha3HashAsync(plainTextId.ToUtfBytes());
}
This is unproblematic for ~1285 iterations, with a 1 ms delay.
Next, add: MDataInfo.NewPublicAsync
databaseId = DbIdForProtocol(databaseId);
var dstPubIdDigest = await GetMdXorName(databaseId);
using (var dstPubIdMDataInfoH = await MDataInfo.NewPublicAsync(dstPubIdDigest, 15001))
{
    // no action here
}
This is unproblematic for ~730 iterations, with both 1 ms and 100 ms delay.
Next, add: MData.ListKeysAsync
databaseId = DbIdForProtocol(databaseId);
var dstPubIdDigest = await GetMdXorName(databaseId);
using (var dstPubIdMDataInfoH = await MDataInfo.NewPublicAsync(dstPubIdDigest, 15001))
{
    var accountExists = false;
    try
    {
        var keysH = await MData.ListKeysAsync(dstPubIdMDataInfoH);
        keysH.Dispose();
        accountExists = true;
    }
    catch (Exception)
    {
        // ignored - acct not found
    }
    if (accountExists)
    {
        throw new Exception("Id already exists.");
    }
}
This is unproblematic for ~540 iterations, with both 1 ms and 100 ms delay.
Next, add: MDataPermissionSet.NewAsync
using (var categorySelfPermSetH = await MDataPermissionSet.NewAsync())
{
}
~440 iterations
Next, add: 4 x MDataPermissionSet.AllowAsync
using (var categorySelfPermSetH = await MDataPermissionSet.NewAsync())
{
    await Task.WhenAll(
        MDataPermissionSet.AllowAsync(categorySelfPermSetH, MDataAction.kInsert),
        MDataPermissionSet.AllowAsync(categorySelfPermSetH, MDataAction.kUpdate),
        MDataPermissionSet.AllowAsync(categorySelfPermSetH, MDataAction.kDelete),
        MDataPermissionSet.AllowAsync(categorySelfPermSetH, MDataAction.kManagePermissions));
}
~410 iterations
(not yet committed state to the network.)
Next, add: MDataPermissions.NewAsync
using (var streamTypesPermH = await MDataPermissions.NewAsync())
{
    // no action
}
~375 iterations
Next: Crypto.AppPubSignKeyAsync
using (var streamTypesPermH = await MDataPermissions.NewAsync())
{
    using (var appSignPkH = await Crypto.AppPubSignKeyAsync())
    {
        // no action
    }
}
~350 iterations
Next: MDataPermissions.InsertAsync
using (var streamTypesPermH = await MDataPermissions.NewAsync())
{
    using (var appSignPkH = await Crypto.AppPubSignKeyAsync())
    {
        await MDataPermissions.InsertAsync(streamTypesPermH, appSignPkH, categorySelfPermSetH);
    }
}
~335 iterations
and so on, for each additional interaction, ie:
#1: 1285 x Interaction_1
#2: 721 x #1 + Interaction_2
#3: 537 x #2 + Interaction_3
#4: 438 x #3 + Interaction_4
#5: 377 x #4 + Interaction_5
#6: 347 x #5 + Interaction_6
#7: 323 x #6 + Interaction_7
#8: 310 x #7 + Interaction_8
#9: 296 x #8 + Interaction_9
#10: 285 x #9 + Interaction_10
#11: 224 x #10 + Interaction_11
#12: 148 x #11 + Interaction_12
#13: 133 x #12 + Interaction_13
#14: 66 x #13 + Interaction_14
#15: 66 x #14 + Interaction_15
#16: 51 x #15 + Interaction_16
#17: 50 x #16 + Interaction_17
Memory footprint is low: ~70 MB.
The pattern is getting clear: any interaction we have with safe_app.dll contributes to what eventually leads to the crash. The more interactions, the sooner we crash.
And this code is almost 100% copy paste from the MaidSafe examples.
I’m getting more confident this is not some error on my part (ofc not convinced yet though).
Is anyone reading this btw?
I will try running this in .NET Framework instead of .NET Core, to rule out that part.
Currently, a few possible sources for errors I can see on my end are:
- Some overlooked implementation in setup of reproduction.
- Some subtle mistake in the copy paste of code from MaidSafe example.
- Some .Net Core specific error.
- Some problem with the debugger. (Does not seem to be; I experience the same when running without it.)
- Conflicts with other applications (very clean machine though, installed and running almost nothing else)
- Some problem with my OS/VM.
Other possible sources:
- The vaults running locally
- Order of execution of calls to native code.
- The native code itself.
@nbaksalyar I have updated the Notebook repo with a unit test for reproducing this. I hope it can be useful if you have time to look at it.